Linux virtualization list

* [PATCH v6 01/12] nvdimm: preserve flush callback errors
From: Li Chen @ 2026-06-21 13:02 UTC (permalink / raw)
  To: Pankaj Gupta, Dan Williams, Vishal Verma, Dave Jiang, Ira Weiny,
	Alison Schofield, virtualization, nvdimm
  Cc: linux-kernel, Li Chen
In-Reply-To: <20260621130246.2973254-1-me@linux.beauty>

nvdimm_flush() currently converts any non-zero provider flush error to
-EIO. That loses useful errno values from provider callbacks.

A local virtio-pmem mkfs sanity test showed the masking clearly:

  wipefs: /dev/pmem0: cannot flush modified buffers: Input/output error
  mkfs.ext4: Input/output error while writing out and closing file system
  nd_region region0: dbg: nvdimm_flush rc=-5

The virtio-pmem callback can return -ENOMEM when async_pmem_flush() fails
to allocate a child flush bio, but nvdimm_flush() hides that as -EIO before
pmem_submit_bio() converts it to a block status.

Return the provider callback error directly. The generic flush path still
returns 0, and pmem_submit_bio() already handles errno-to-blk_status
conversion for bio completion.

Signed-off-by: Li Chen <me@linux.beauty>
---
v3->v4:
- New patch.

 drivers/nvdimm/region_devs.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index e35c2e18518f0..0cd96503c0596 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1114,10 +1114,8 @@ int nvdimm_flush(struct nd_region *nd_region, struct bio *bio)
 
 	if (!nd_region->flush)
 		rc = generic_nvdimm_flush(nd_region);
-	else {
-		if (nd_region->flush(nd_region, bio))
-			rc = -EIO;
-	}
+	else
+		rc = nd_region->flush(nd_region, bio);
 
 	return rc;
 }
-- 
2.52.0

^ permalink raw reply related
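
Note: the difference between the old and new nvdimm_flush() behaviour can be modelled in a few lines of userspace C. This is a minimal sketch, not the kernel code: flush_fn, failing_flush(), flush_masked() and flush_preserved() are hypothetical names, and the callback takes a bare context pointer instead of the real nd_region and bio arguments.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a provider flush callback (not the kernel type). */
typedef int (*flush_fn)(void *ctx);

/* Models a provider whose flush fails with an allocation error. */
int failing_flush(void *ctx)
{
	(void)ctx;
	return -ENOMEM;
}

/* Old behaviour: any non-zero callback result collapses to -EIO. */
int flush_masked(flush_fn fn, void *ctx)
{
	int rc = 0;

	if (fn(ctx))
		rc = -EIO;
	return rc;
}

/* New behaviour: the callback's return value is passed through unchanged. */
int flush_preserved(flush_fn fn, void *ctx)
{
	return fn(ctx);
}
```

With failing_flush() as the callback, flush_masked() reports -EIO while flush_preserved() reports -ENOMEM, which is the errno the caller can then translate to a block status.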

* [PATCH v6 00/12] nvdimm: virtio_pmem: fix request lifetime and converge broken queue failures
From: Li Chen @ 2026-06-21 13:02 UTC (permalink / raw)
  To: Pankaj Gupta, Dan Williams, Vishal Verma, Dave Jiang, Ira Weiny,
	Alison Schofield, virtualization, nvdimm
  Cc: linux-kernel

Hi,

The nvdimm flush helper currently converts any non-zero provider flush
callback error to -EIO. That hides useful errno values from providers. For
example, virtio-pmem may fail flush allocation with -ENOMEM, but that is
currently reported as -EIO by nvdimm_flush().

The raw failure seen in the local mkfs sanity test was:

  wipefs: /dev/pmem0: cannot flush modified buffers: Input/output error
  mkfs.ext4: Input/output error while writing out and closing file system
  nd_region region0: dbg: nvdimm_flush rc=-5

The first five patches keep provider flush errors intact, make
pmem_submit_bio() honor a failed REQ_PREFLUSH before copying data, keep
dataless bios out of the data loop, and avoid allocating a child flush bio
for virtio-pmem REQ_FUA handling. REQ_PREFLUSH and REQ_FUA are now issued
synchronously from pmem_submit_bio(). After that, virtio-pmem only allocates
its request object for the actual provider flush, and that allocation uses
GFP_NOIO so reclaim does not recurse into filesystem or block IO.

The rest of the series addresses virtio-pmem request lifetime and broken
virtqueue handling. The virtio-pmem flush path uses a virtqueue cookie/token
to carry a per-request context through completion. When the virtqueue is
broken or a notify fails, the submitter can return and free the request
object while the host/backend may still complete the published request. The
IRQ completion handler then dereferences freed memory when waking waiters,
which is reported by KASAN as a slab-use-after-free and may manifest as lock
corruption (e.g. "BUG: spinlock already unlocked") without KASAN.

In addition, the flush path has two wait sites: one for virtqueue descriptor
availability (-ENOSPC from virtqueue_add_sgs()) and one for request
completion. If the virtqueue becomes broken, forward progress is no longer
guaranteed and these waiters may sleep indefinitely unless the driver
converges the failure and wakes all wait sites. This version also orders
response publication with release/acquire, keeps DMA_FROM_DEVICE response
storage away from CPU-owned request fields, and wakes the in-flight
completion waiter when a queue is marked broken.

This series addresses these issues:

1/12 nvdimm: preserve flush callback errors
Return provider flush callback errors directly from nvdimm_flush().

2/12 nvdimm: pmem: keep PREFLUSH before data writes
Run REQ_PREFLUSH synchronously before copying data and fail the bio if the
flush fails.

3/12 nvdimm: pmem: guard data loop for dataless bios
Keep flush-only bios out of the data copy loop.

4/12 nvdimm: virtio_pmem: stop allocating child flush bio
Flush REQ_FUA synchronously instead of allocating a chained child bio.

5/12 nvdimm: virtio_pmem: use GFP_NOIO for flush requests
Use GFP_NOIO for the virtio-pmem request allocation.

6/12 nvdimm: virtio_pmem: always wake -ENOSPC waiters
Wake one -ENOSPC waiter for each reclaimed used buffer, decoupled from
token completion.

7/12 nvdimm: virtio_pmem: use READ_ONCE()/WRITE_ONCE() for wait flags
Use READ_ONCE()/WRITE_ONCE() for the wait_event() flags (done and
wq_buf_avail).

8/12 nvdimm: virtio_pmem: refcount requests for token lifetime
Refcount request objects so the token lifetime spans the window where it is
reachable through the virtqueue until completion/drain drops the virtqueue
reference.

9/12 nvdimm: virtio_pmem: publish done with release/acquire
Order response publication before the submitter observes request completion.

10/12 nvdimm: virtio_pmem: isolate DMA request buffers
Keep the DMA_FROM_DEVICE response buffer away from CPU-owned request fields.

11/12 nvdimm: virtio_pmem: converge broken virtqueue to -EIO
Track a device-level broken state to converge broken/notify failures to -EIO:
wake -ENOSPC waiters, wake the in-flight completion waiter, fail-fast new
requests, and report errors after the queue is marked broken.

12/12 nvdimm: virtio_pmem: drain requests in freeze
Drain outstanding requests in freeze() after resetting the device so waiters
do not sleep indefinitely and virtqueue_detach_unused_buf() only runs on a
quiesced queue.
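
Note: the token lifetime rule in 8/12 can be illustrated with a small userspace sketch. It assumes a hypothetical struct req with a C11 atomic counter; the series itself may well use a kernel refcount primitive, and none of these names come from the patches. The point is only that the submitter and the virtqueue each hold a reference, so the object is freed by whichever side drops the last one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical request object; field names are illustrative only. */
struct req {
	atomic_int refs;
	int done;
};

/* Allocate a request holding one reference, owned by the submitter. */
struct req *req_alloc(void)
{
	struct req *r = calloc(1, sizeof(*r));

	if (r)
		atomic_init(&r->refs, 1);
	return r;
}

/* Take an extra reference, e.g. when the request is published as a token. */
void req_get(struct req *r)
{
	atomic_fetch_add(&r->refs, 1);
}

/* Drop a reference; returns 1 if this call freed the object, else 0. */
int req_put(struct req *r)
{
	if (atomic_fetch_sub(&r->refs, 1) == 1) {
		free(r);
		return 1;
	}
	return 0;
}
```

If the submitter gives up early and calls req_put(), the object survives until the completion or drain path drops the queue's reference, so a late completion never touches freed memory.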

The original repros were on QEMU x86_64 with a virtio-pmem device exported
as /dev/pmem0. For this v6 reroll, the series applies to v7.1-rc7 and to
local next/master at 4fa3f5fabb30 ("Add linux-next specific files for
20260616").

Thanks,
Li Chen

Changelog:
v5->v6:
- Address Sashiko review feedback:
  - Add a data-loop guard for dataless bios in pmem_submit_bio().
  - Replace the child flush bio allocation with synchronous FUA flushing.
  - Keep GFP_NOIO only for the virtio-pmem request allocation.
  - Publish request completion with release/acquire ordering.
  - Isolate the DMA_FROM_DEVICE response buffer from CPU-owned fields.
  - Wake the in-flight host-completion waiter when marking the queue broken.
- Clear req_vq after del_vqs() and make drain tolerate a NULL queue.
v4->v5:
- Address review feedback about REQ_PREFLUSH ordering and active virtqueue
  detach.
- Add 2/8 so a failed REQ_PREFLUSH fails the bio before any data copy, and
  make REQ_PREFLUSH use a synchronous provider flush instead of a deferred
  child bio.
- Rework broken-queue handling so runtime failure marking only stops new
  submissions and wakes local -ENOSPC waiters; used/unused token draining is
  done after device reset in remove() and freeze().
- Remove the broken-state shortcut from the host-completion wait so the
  submitter never reads an uninitialized response field.
- Keep the raw broken-virtqueue dmesg in 7/8 while updating the teardown
  rationale.
- Renumber the old virtio-pmem fixes after the new pmem PREFLUSH patch.
v3->v4:
- Rebased the series onto v7.1-rc7 so it applies cleanly to Linux 7.1-rc7.
- Update the allocation site in 6/7 from kmalloc(sizeof(*req_data),
  GFP_KERNEL) to kmalloc_obj(*req_data) to match current nvdimm code.
- Add 1/7 to preserve provider flush callback errors in nvdimm_flush().
- Include the GFP_NOIO child flush bio allocation fix as 2/7.
- Renumber the old request lifetime and broken virtqueue fixes after the two
  new flush error patches.
v2->v3:
- Split patch 1 as suggested by Pankaj Gupta: keep the waiter wakeup
  ordering change in 1/5 and move READ_ONCE()/WRITE_ONCE() updates to
  2/5 (no functional change intended).
- Add log report to commit msg.
- Fold the export fix into 4/5 to keep the series bisectable when
  CONFIG_VIRTIO_PMEM=m.
v1->v2:
- Add the export patch to fix compile issue.

Links:
v5: https://lore.kernel.org/all/20260617122442.2118957-1-me@linux.beauty/
v4: https://lore.kernel.org/all/20260609120726.1714780-1-me@linux.beauty/
v3: https://lore.kernel.org/all/20260226025712.2236279-1-me@linux.beauty/#t
v2: https://lore.kernel.org/all/20251225042915.334117-1-me@linux.beauty/
v1: https://www.spinics.net/lists/kernel/msg5974818.html

Li Chen (12):
  nvdimm: preserve flush callback errors
  nvdimm: pmem: keep PREFLUSH before data writes
  nvdimm: pmem: guard data loop for dataless bios
  nvdimm: virtio_pmem: stop allocating child flush bio
  nvdimm: virtio_pmem: use GFP_NOIO for flush requests
  nvdimm: virtio_pmem: always wake -ENOSPC waiters
  nvdimm: virtio_pmem: use READ_ONCE()/WRITE_ONCE() for wait flags
  nvdimm: virtio_pmem: refcount requests for token lifetime
  nvdimm: virtio_pmem: publish done with release/acquire
  nvdimm: virtio_pmem: isolate DMA request buffers
  nvdimm: virtio_pmem: converge broken virtqueue to -EIO
  nvdimm: virtio_pmem: drain requests in freeze

 drivers/nvdimm/nd_virtio.c   | 224 +++++++++++++++++++++++++++--------
 drivers/nvdimm/pmem.c        |  52 ++++----
 drivers/nvdimm/region_devs.c |   6 +-
 drivers/nvdimm/virtio_pmem.c |  51 +++++++-
 drivers/nvdimm/virtio_pmem.h |  18 ++-
 5 files changed, 270 insertions(+), 81 deletions(-)

-- 
2.52.0

^ permalink raw reply
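
Note: the release/acquire ordering described for 9/12 can be sketched with C11 atomics. This is an assumption-laden userspace analogue, not the driver code: struct resp_slot, publish_done() and poll_done() are hypothetical names, and the kernel would use its own barrier primitives rather than stdatomic.h.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical completion slot: a payload plus a flag that publishes it. */
struct resp_slot {
	int ret;		/* response payload written by the completer */
	atomic_int done;	/* set last, with release semantics */
};

/* Completer side: write the payload, then publish it with a release store. */
void publish_done(struct resp_slot *s, int ret)
{
	s->ret = ret;
	atomic_store_explicit(&s->done, 1, memory_order_release);
}

/*
 * Submitter side: the acquire load pairs with the release store, so once
 * done is observed as set, the payload write is visible as well.
 * Returns 1 and fills *ret when complete, 0 otherwise.
 */
int poll_done(struct resp_slot *s, int *ret)
{
	if (!atomic_load_explicit(&s->done, memory_order_acquire))
		return 0;
	*ret = s->ret;
	return 1;
}
```

The payload is only read after the flag has been observed, which is the property that keeps the submitter from reading a response field that has not been written yet.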

* Re: [PATCH] crypto: virtio - bound the device-reported akcipher result
From: Michael S. Tsirkin @ 2026-06-21  5:33 UTC (permalink / raw)
  To: hexlabsecurity
  Cc: Herbert Xu, Jason Wang, Gonglei, virtualization, Xuan Zhuo,
	linux-crypto, linux-kernel, Eugenio Pérez, David S. Miller
In-Reply-To: <20260620-b4-disp-27caeeac-v1-1-956e8f9c4f01@proton.me>

On Sat, Jun 20, 2026 at 09:44:21PM -0500, Bryam Vargas via B4 Relay wrote:
> From: Bryam Vargas <hexlabsecurity@proton.me>
> 
> length


some kind of corruption here.

> virtio_crypto_dataq_akcipher_callback() sets the result length from the
> device-reported response length without bounding it to the destination
> buffer, which was allocated for the original request length.
> sg_copy_from_buffer() then reads that many bytes from the destination
> buffer; a backend reporting a larger length over-reads adjacent kernel
> heap into the caller's scatterlist (an out-of-bounds read).
> 
> Clamp the reported length to the originally requested destination length.
> A conforming device reports no more than that, so valid results are
> unaffected.
> 
> Fixes: a36bd0ad9fbf ("virtio-crypto: adjust dst_len at ops callback")
> Cc: stable@vger.kernel.org
> Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
> ---
>  drivers/crypto/virtio/virtio_crypto_akcipher_algs.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
> index d8d452cac391..64ea141f018c 100644
> --- a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
> +++ b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
> @@ -88,7 +88,8 @@ static void virtio_crypto_dataq_akcipher_callback(struct virtio_crypto_request *
>  	}
>  
>  	/* actual length may be less than dst buffer */
> -	akcipher_req->dst_len = len - sizeof(vc_req->status);
> +	akcipher_req->dst_len = min_t(unsigned int, len - sizeof(vc_req->status),
> +				      akcipher_req->dst_len);
>  	sg_copy_from_buffer(akcipher_req->dst, sg_nents(akcipher_req->dst),
>  			    vc_akcipher_req->dst_buf, akcipher_req->dst_len);
>  	virtio_crypto_akcipher_finalize_req(vc_akcipher_req, akcipher_req, error);
> 
> ---
> base-commit: 1a3746ccbb0a97bed3c06ccde6b880013b1dddc1
> change-id: 20260620-b4-disp-27caeeac-5b8b67962fdd
> 
> Best regards,
> -- 
> Bryam Vargas <hexlabsecurity@proton.me>
> 


^ permalink raw reply

* [PATCH] crypto: virtio - bound the device-reported akcipher result
From: Bryam Vargas via B4 Relay @ 2026-06-21  2:44 UTC (permalink / raw)
  To: Herbert Xu, Jason Wang, Michael S. Tsirkin, Gonglei
  Cc: virtualization, Xuan Zhuo, linux-crypto, linux-kernel,
	Eugenio Pérez, David S. Miller

From: Bryam Vargas <hexlabsecurity@proton.me>

length

virtio_crypto_dataq_akcipher_callback() sets the result length from the
device-reported response length without bounding it to the destination
buffer, which was allocated for the original request length.
sg_copy_from_buffer() then reads that many bytes from the destination
buffer; a backend reporting a larger length over-reads adjacent kernel
heap into the caller's scatterlist (an out-of-bounds read).

Clamp the reported length to the originally requested destination length.
A conforming device reports no more than that, so valid results are
unaffected.

Fixes: a36bd0ad9fbf ("virtio-crypto: adjust dst_len at ops callback")
Cc: stable@vger.kernel.org
Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
---
 drivers/crypto/virtio/virtio_crypto_akcipher_algs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
index d8d452cac391..64ea141f018c 100644
--- a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
+++ b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
@@ -88,7 +88,8 @@ static void virtio_crypto_dataq_akcipher_callback(struct virtio_crypto_request *
 	}
 
 	/* actual length may be less than dst buffer */
-	akcipher_req->dst_len = len - sizeof(vc_req->status);
+	akcipher_req->dst_len = min_t(unsigned int, len - sizeof(vc_req->status),
+				      akcipher_req->dst_len);
 	sg_copy_from_buffer(akcipher_req->dst, sg_nents(akcipher_req->dst),
 			    vc_akcipher_req->dst_buf, akcipher_req->dst_len);
 	virtio_crypto_akcipher_finalize_req(vc_akcipher_req, akcipher_req, error);

---
base-commit: 1a3746ccbb0a97bed3c06ccde6b880013b1dddc1
change-id: 20260620-b4-disp-27caeeac-5b8b67962fdd

Best regards,
-- 
Bryam Vargas <hexlabsecurity@proton.me>



^ permalink raw reply related
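
Note: the clamp in the patch above can be sketched as a standalone helper. bounded_result_len() and its parameter names are hypothetical, and the explicit underflow guard is an addition for this sketch only; the patch itself relies on min_t() against the requested destination length.

```c
#include <assert.h>
#include <stddef.h>

/*
 * reported:   bytes the device claims to have written, status included
 * status_len: size of the trailing status field
 * dst_cap:    destination length originally requested by the caller
 *
 * Returns the payload length, never more than dst_cap.
 */
size_t bounded_result_len(size_t reported, size_t status_len, size_t dst_cap)
{
	/* Guard the subtraction so a short report cannot wrap around. */
	size_t payload = reported > status_len ? reported - status_len : 0;

	return payload < dst_cap ? payload : dst_cap;
}
```

A conforming report such as 65 bytes for a 256-byte destination yields 64, while an oversized report of 300 bytes is cut back to 256.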

* [PATCH] drm/virtio: bound EDID block reads to the response buffer
From: Bryam Vargas via B4 Relay @ 2026-06-21  2:43 UTC (permalink / raw)
  To: Dmitry Osipenko, David Airlie, Gerd Hoffmann
  Cc: linux-kernel, Gurchetan Singh, Chia-I Wu, dri-devel,
	virtualization

From: Bryam Vargas <hexlabsecurity@proton.me>

virtio_get_edid_block() validates the read offset only against the
device-supplied resp->size field, never against the fixed-size resp->edid
array. The EDID block index is driven by the device-supplied extension
count, so a malicious virtio-gpu backend can advertise a large size
together with a high block count and read far past the array into adjacent
kernel memory, which is then surfaced in the parsed EDID (an out-of-bounds
read / info leak).

Also reject any read whose end exceeds the size of the edid array.
Conforming EDID responses stay within the array and are unaffected.

Fixes: b4b01b4995fb ("drm/virtio: add edid support")
Cc: stable@vger.kernel.org
Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
---
 drivers/gpu/drm/virtio/virtgpu_vq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 67865810a2e7..c8b9475a7472 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -897,7 +897,8 @@ static int virtio_get_edid_block(void *data, u8 *buf,
 	struct virtio_gpu_resp_edid *resp = data;
 	size_t start = block * EDID_LENGTH;

-	if (start + len > le32_to_cpu(resp->size))
+	if (start + len > le32_to_cpu(resp->size) ||
+	    start + len > sizeof(resp->edid))
 		return -EINVAL;
 	memcpy(buf, resp->edid + start, len);
 	return 0;

---
base-commit: 1a3746ccbb0a97bed3c06ccde6b880013b1dddc1
change-id: 20260620-b4-disp-22bba7bf-47d80f0c083b

Best regards,
-- 
Bryam Vargas <hexlabsecurity@proton.me>

^ permalink raw reply related
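
Note: the double bound in the patch above can be modelled in userspace as follows. This is a sketch under stated assumptions: struct edid_resp, EDID_CAP and get_block() are hypothetical stand-ins, the 1024-byte capacity is assumed for illustration, and the size field is read in host byte order rather than through le32_to_cpu().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_LEN 128	/* one EDID block */
#define EDID_CAP 1024	/* assumed capacity of the response array */

/* Hypothetical response layout for this sketch. */
struct edid_resp {
	uint32_t size;			/* device-supplied, untrusted */
	uint8_t edid[EDID_CAP];		/* fixed-size storage */
};

/* Copy one block, rejecting reads past either the claimed or real size. */
int get_block(const struct edid_resp *resp, uint8_t *buf,
	      unsigned int block, size_t len)
{
	size_t start = (size_t)block * BLOCK_LEN;

	if (start + len > resp->size || start + len > sizeof(resp->edid))
		return -EINVAL;
	memcpy(buf, resp->edid + start, len);
	return 0;
}
```

Even when the device claims a size of 4096, block 8 is rejected because its end (1152) lies past the 1024-byte array.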

* [PATCH v2] vdpa_sim: fix cleanup after worker creation failure
From: Linfeng Sun @ 2026-06-20 10:09 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang
  Cc: Xuan Zhuo, Eugenio Pérez, virtualization, linux-kernel,
	Linfeng Sun

vdpasim_create() leaves vdpasim->worker as an ERR_PTR when
kthread_run_worker() fails. The error path then drops the device
reference, which releases the partially initialized simulator.

vdpasim_free() unconditionally passes the worker pointer to
kthread_destroy_worker(), so the ERR_PTR is dereferenced and can trigger
a general protection fault.

Store the worker error, clear the pointer, and only clean up the worker
when it was successfully initialized. Also make the release path tolerate
partially initialized objects by guarding virtqueue and IOTLB cleanup,
since the same release path can be reached from other initialization
failures.

I found this bug myself, though the patch was written with AI assistance.

Fixes: 76acfa7bc54f ("vdpa_sim: use kthread worker")
Assisted-by: OpenAI-Codex:GPT-5
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Linfeng Sun <slf@hdu.edu.cn>
---
v1 -> v2:
- Remove the unnecessary vdpasim->iommu_pt check from IOTLB cleanup.
- Update the commit message to clarify the cleanup path for partially
  initialized devices.
- Add Fixes and Assisted-by tags.
- Link to v1: https://lore.kernel.org/r/20260612105054.1850453-1-slf@hdu.edu.cn/

 drivers/vdpa/vdpa_sim/vdpa_sim.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 8cb1cc2ea139..2aa34df4d0ad 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -231,8 +231,11 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
 	kthread_init_work(&vdpasim->work, vdpasim_work_fn);
 	vdpasim->worker = kthread_run_worker(0, "vDPA sim worker: %s",
 						dev_attr->name);
-	if (IS_ERR(vdpasim->worker))
+	if (IS_ERR(vdpasim->worker)) {
+		ret = PTR_ERR(vdpasim->worker);
+		vdpasim->worker = NULL;
 		goto err_iommu;
+	}
 
 	mutex_init(&vdpasim->mutex);
 	spin_lock_init(&vdpasim->iommu_lock);
@@ -742,18 +745,24 @@ static void vdpasim_free(struct vdpa_device *vdpa)
 	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
 	int i;
 
-	kthread_cancel_work_sync(&vdpasim->work);
-	kthread_destroy_worker(vdpasim->worker);
+	if (vdpasim->worker) {
+		kthread_cancel_work_sync(&vdpasim->work);
+		kthread_destroy_worker(vdpasim->worker);
+	}
 
-	for (i = 0; i < vdpasim->dev_attr.nvqs; i++) {
-		vringh_kiov_cleanup(&vdpasim->vqs[i].out_iov);
-		vringh_kiov_cleanup(&vdpasim->vqs[i].in_iov);
+	if (vdpasim->vqs) {
+		for (i = 0; i < vdpasim->dev_attr.nvqs; i++) {
+			vringh_kiov_cleanup(&vdpasim->vqs[i].out_iov);
+			vringh_kiov_cleanup(&vdpasim->vqs[i].in_iov);
+		}
 	}
 
 	vdpasim->dev_attr.free(vdpasim);
 
-	for (i = 0; i < vdpasim->dev_attr.nas; i++)
-		vhost_iotlb_reset(&vdpasim->iommu[i]);
+	if (vdpasim->iommu) {
+		for (i = 0; i < vdpasim->dev_attr.nas; i++)
+			vhost_iotlb_reset(&vdpasim->iommu[i]);
+	}
 	kfree(vdpasim->iommu);
 	kfree(vdpasim->iommu_pt);
 	kfree(vdpasim->vqs);
-- 
2.43.0


^ permalink raw reply related
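
Note: the error-pointer handling fixed above can be illustrated with a userspace model. err_ptr(), ptr_err() and is_err() imitate the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() helpers; struct sim, start_worker_fail(), sim_init() and sim_free() are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel error-pointer helpers. */
void *err_ptr(long err) { return (void *)(intptr_t)err; }
long ptr_err(const void *p) { return (long)(intptr_t)p; }
int is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }

/* Hypothetical device with an optional worker. */
struct sim {
	void *worker;
	int destroyed;	/* set when the release path tears down a worker */
};

/* Hypothetical worker creation that always fails. */
void *start_worker_fail(void)
{
	return err_ptr(-ENOMEM);
}

/* Store the error and clear the pointer so release never sees an ERR_PTR. */
int sim_init(struct sim *s)
{
	s->destroyed = 0;
	s->worker = start_worker_fail();
	if (is_err(s->worker)) {
		int ret = (int)ptr_err(s->worker);

		s->worker = NULL;
		return ret;
	}
	return 0;
}

/* Release path: only tear down a worker that was actually created. */
void sim_free(struct sim *s)
{
	if (s->worker)
		s->destroyed = 1;
}
```

After a failed sim_init() the worker pointer is NULL, so sim_free() skips the teardown instead of dereferencing an encoded errno.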

* Re: [PATCH net v4] virtio-net: fix len check in receive_big()
From: patchwork-bot+netdevbpf @ 2026-06-19  0:50 UTC (permalink / raw)
  To: Xiang Mei
  Cc: mst, jasowang, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
	kuba, pabeni, netdev, virtualization, linux-kernel,
	minhquangbui99, bestswngs
In-Reply-To: <20260616042837.2249468-1-xmei5@asu.edu>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 15 Jun 2026 21:28:37 -0700 you wrote:
> receive_big() bounds the device-announced length by
> (big_packets_num_skbfrags + 1) * PAGE_SIZE.  That is still too loose:
> add_recvbuf_big() sets sg[1] to start at offset
> sizeof(struct padded_vnet_hdr) into the first page, so the chain
> actually carries hdr_len + (PAGE_SIZE - sizeof(padded_vnet_hdr)) +
> big_packets_num_skbfrags * PAGE_SIZE bytes -- 20 bytes less than the
> check allows for the common hdr_len == 12 case.
> 
> [...]

Here is the summary with links:
  - [net,v4] virtio-net: fix len check in receive_big()
    https://git.kernel.org/netdev/net/c/9e5ad06ea826

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply
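
Note: the 20-byte gap quoted above follows directly from the two expressions. The sketch below works the arithmetic; loose_bound() and chain_capacity() are hypothetical names, and the padded header size of 32 bytes is inferred from the quoted figures (12 + 20), not taken from the driver source.

```c
#include <assert.h>
#include <stddef.h>

/* Bound used by the old check: (nfrags + 1) whole pages. */
size_t loose_bound(size_t nfrags, size_t page)
{
	return (nfrags + 1) * page;
}

/*
 * Bytes the buffer chain can actually carry: the header, the rest of the
 * first page after the padded header, and nfrags further whole pages.
 */
size_t chain_capacity(size_t nfrags, size_t page, size_t hdr_len,
		      size_t padded_hdr)
{
	return hdr_len + (page - padded_hdr) + nfrags * page;
}
```

The difference is padded_hdr - hdr_len regardless of the page size or fragment count, so with hdr_len == 12 and an assumed 32-byte padded header the old check is 20 bytes too permissive.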

* Re: [PATCH] virtio-blk: use little-endian types for the zoned fields
From: Stefan Hajnoczi @ 2026-06-18 15:18 UTC (permalink / raw)
  To: Michael Bommarito
  Cc: Michael S . Tsirkin, Jason Wang, Stefano Garzarella,
	Dmitry Fomichev, Damien Le Moal, Jens Axboe, Paolo Bonzini,
	virtualization, linux-block, linux-kernel
In-Reply-To: <20260617151727.4071754-1-michael.bommarito@gmail.com>

On Wed, Jun 17, 2026 at 11:17:27AM -0400, Michael Bommarito wrote:
> The zoned block-device fields in the virtio-blk header are typed
> __virtio{32,64}, so their endianness follows VIRTIO_F_VERSION_1. The
> zoned feature is only defined for VIRTIO 1.x devices, and the virtio
> specification defines all of its fields as little-endian. Commit
> b16a1756c716 ("virtio_blk: mark all zone fields LE") tagged them
> __le* for exactly this reason, but commit f1ba4e674feb ("virtio-blk:
> fix to match virtio spec") re-applied the reviewed version of the
> original zoned series -- which predated b16a1756 -- and silently
> restored the __virtio* typing together with the matching
> virtio*_to_cpu() / virtio_cread() accessors in the driver.
> 
> Restore the little-endian typing for the zoned configuration-space
> characteristics, the zone descriptor, the zone report header and the
> ZONE_APPEND in-header sector, and read them with le*_to_cpu() and
> virtio_cread_le() to match.
> 
> There is no functional change on any spec-compliant device: zoned
> requires VIRTIO_F_VERSION_1, and for a VERSION_1 device
> virtio*_to_cpu() is identical to le*_to_cpu(). The change makes the
> uapi types describe the actual wire format and removes a latent
> endianness mismatch for a (non-conformant) legacy device on a
> big-endian guest.
> 
> Fixes: f1ba4e674feb ("virtio-blk: fix to match virtio spec")
> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> ---
> Testing:
>  - Builds with no new warnings; sparse endian-clean (C=2,
>    __CHECK_ENDIAN__, CONFIG_BLK_DEV_ZONED=y) both before and after.
>  - Booted under QEMU with a host-managed zoned device exposed through
>    virtio-blk. Zone revalidation, blkzone report and a sequential
>    write / write-pointer check return correct values; blktests zbd
>    device tests 001-006 (sysfs+ioctl, report zone, reset, write split,
>    write ordering, revalidate) pass, with results identical before and
>    after this change -- expected, since on a VIRTIO_F_VERSION_1 device
>    virtio*_to_cpu() == le*_to_cpu().
> 
>  drivers/block/virtio_blk.c      | 38 +++++++++++++++------------------
>  include/uapi/linux/virtio_blk.h | 18 ++++++++--------
>  2 files changed, 26 insertions(+), 30 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

^ permalink raw reply
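
Note: reading a field that is always little-endian on the wire can be sketched without any kernel helpers. load_le32() and load_le64() are hypothetical userspace functions that play the role of le32_to_cpu() and le64_to_cpu() on raw bytes; they give the same result on little-endian and big-endian hosts.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble a 32-bit value from four little-endian bytes. */
uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] |
	       (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 |
	       (uint32_t)p[3] << 24;
}

/* Assemble a 64-bit value from eight little-endian bytes. */
uint64_t load_le64(const uint8_t *p)
{
	return (uint64_t)load_le32(p) | (uint64_t)load_le32(p + 4) << 32;
}
```

Because the decoding does not depend on a negotiated feature bit, the result matches the wire format unconditionally, which is the property the __le* typing expresses.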

* Re: [PATCH 0/2] tools: Fix tools/virtio test build
From: Michael S. Tsirkin @ 2026-06-18 12:04 UTC (permalink / raw)
  To: Yichong Chen
  Cc: jasowang, xuanzhuo, eperezma, rppt, ljs, akpm, virtualization,
	linux-kernel
In-Reply-To: <913D9291164C65D4+20260618095642.510565-1-chenyichong@uniontech.com>

On Thu, Jun 18, 2026 at 05:56:40PM +0800, Yichong Chen wrote:
> This series fixes build failures hit by:
> 
>   make -C tools/virtio test
> 
> Patch 1 adds tools/virtio compatibility definitions needed by
> recent virtio headers when building vhost_net_test.
> 
> Patch 2 makes tools/include/linux/overflow.h include stdint.h
> for SIZE_MAX, which is used by its size helper functions.
> 
> With the series applied, make -C tools/virtio test builds
> virtio_test, vringh_test and vhost_net_test successfully.
> 
> Yichong Chen (2):
>   tools/virtio: Add missing compat definitions for vhost_net_test
>   tools/include: Include stdint.h for SIZE_MAX in overflow.h


Which commit is this on top of?

>  tools/include/linux/overflow.h       |  2 +
>  tools/virtio/linux/completion.h      |  9 +++++
>  tools/virtio/linux/device.h          |  1 +
>  tools/virtio/linux/dma-mapping.h     |  1 +
>  tools/virtio/linux/mod_devicetable.h | 14 +++++++
>  tools/virtio/linux/slab.h            |  4 ++
>  tools/virtio/linux/virtio_features.h | 56 ++++++++++++++++++++++++++++
>  7 files changed, 87 insertions(+)
>  create mode 100644 tools/virtio/linux/completion.h
>  create mode 100644 tools/virtio/linux/mod_devicetable.h
>  create mode 100644 tools/virtio/linux/virtio_features.h
> 
> -- 
> 2.51.0


^ permalink raw reply

* [PATCH] tools/virtio: Remove unsupported --batch option from vhost_net_test
From: Yichong Chen @ 2026-06-18 10:02 UTC (permalink / raw)
  To: mst, jasowang
  Cc: xuanzhuo, eperezma, virtualization, linux-kernel, chenyichong

vhost_net_test has --batch in longopts, but not in help.

The parser never handles 'b', so --batch hits assert(0).

Remove the unsupported option.

Signed-off-by: Yichong Chen <chenyichong@uniontech.com>
---
 tools/virtio/vhost_net_test.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/tools/virtio/vhost_net_test.c b/tools/virtio/vhost_net_test.c
index 389d99a6d7c7..566e15420bb6 100644
--- a/tools/virtio/vhost_net_test.c
+++ b/tools/virtio/vhost_net_test.c
@@ -450,11 +450,6 @@ static const struct option longopts[] = {
 		.val = 'n',
 		.has_arg = required_argument,
 	},
-	{
-		.name = "batch",
-		.val = 'b',
-		.has_arg = required_argument,
-	},
 	{
 	}
 };
-- 
2.51.0


^ permalink raw reply related
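
Note: the mismatch described above, an option listed in longopts but never handled by the parser, can be reproduced with a small getopt_long() sketch. demo_opts, the "count" option and parse_demo() are hypothetical, and the sketch returns -1 where the real test program is described as hitting assert(0).

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical option table: "batch" is listed but has no handler below. */
static const struct option demo_opts[] = {
	{ .name = "count", .has_arg = required_argument, .val = 'n' },
	{ .name = "batch", .has_arg = required_argument, .val = 'b' },
	{ 0 }
};

/* Returns 0 if every option was handled, -1 on an unhandled one. */
int parse_demo(int argc, char **argv)
{
	int o;

	optind = 1;	/* restart scanning for each call */
	while ((o = getopt_long(argc, argv, "", demo_opts, NULL)) != -1) {
		switch (o) {
		case 'n':
			break;
		default:
			/* 'b' lands here because no case handles it. */
			return -1;
		}
	}
	return 0;
}
```

Either adding a case for the option or removing it from the table resolves the mismatch; the patch takes the second route.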

* [PATCH 0/2] tools: Fix tools/virtio test build
From: Yichong Chen @ 2026-06-18  9:56 UTC (permalink / raw)
  To: mst, jasowang
  Cc: xuanzhuo, eperezma, rppt, ljs, akpm, virtualization, linux-kernel,
	chenyichong

This series fixes build failures hit by:

  make -C tools/virtio test

Patch 1 adds tools/virtio compatibility definitions needed by
recent virtio headers when building vhost_net_test.

Patch 2 makes tools/include/linux/overflow.h include stdint.h
for SIZE_MAX, which is used by its size helper functions.

With the series applied, make -C tools/virtio test builds
virtio_test, vringh_test and vhost_net_test successfully.

Yichong Chen (2):
  tools/virtio: Add missing compat definitions for vhost_net_test
  tools/include: Include stdint.h for SIZE_MAX in overflow.h

 tools/include/linux/overflow.h       |  2 +
 tools/virtio/linux/completion.h      |  9 +++++
 tools/virtio/linux/device.h          |  1 +
 tools/virtio/linux/dma-mapping.h     |  1 +
 tools/virtio/linux/mod_devicetable.h | 14 +++++++
 tools/virtio/linux/slab.h            |  4 ++
 tools/virtio/linux/virtio_features.h | 56 ++++++++++++++++++++++++++++
 7 files changed, 87 insertions(+)
 create mode 100644 tools/virtio/linux/completion.h
 create mode 100644 tools/virtio/linux/mod_devicetable.h
 create mode 100644 tools/virtio/linux/virtio_features.h

-- 
2.51.0


^ permalink raw reply

* [PATCH 2/2] tools/include: Include stdint.h for SIZE_MAX in overflow.h
From: Yichong Chen @ 2026-06-18  9:56 UTC (permalink / raw)
  To: mst, jasowang
  Cc: xuanzhuo, eperezma, rppt, ljs, akpm, virtualization, linux-kernel,
	chenyichong
In-Reply-To: <20260618095642.510565-1-chenyichong@uniontech.com>

tools/include/linux/overflow.h uses SIZE_MAX in its size helpers.

Include stdint.h so overflow.h provides that dependency itself.

Signed-off-by: Yichong Chen <chenyichong@uniontech.com>
---
 tools/include/linux/overflow.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/include/linux/overflow.h b/tools/include/linux/overflow.h
index 3427d7880326..9d30ae0dbd1d 100644
--- a/tools/include/linux/overflow.h
+++ b/tools/include/linux/overflow.h
@@ -2,6 +2,8 @@
 #ifndef __LINUX_OVERFLOW_H
 #define __LINUX_OVERFLOW_H
 
+#include <stdint.h>
+
 #include <linux/compiler.h>
 
 /*
-- 
2.51.0


^ permalink raw reply related
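
Note: a size helper in the style of those in overflow.h shows why SIZE_MAX has to be visible. sat_mul() is a hypothetical name for this sketch, not one of the helpers in the header; it saturates at SIZE_MAX instead of wrapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>	/* provides SIZE_MAX */

/* Multiply two sizes, returning SIZE_MAX if the product would overflow. */
size_t sat_mul(size_t a, size_t b)
{
	if (a != 0 && b > SIZE_MAX / a)
		return SIZE_MAX;
	return a * b;
}
```

Without the stdint.h include, a translation unit that pulls in only this helper may fail to compile because SIZE_MAX is undefined.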

* [PATCH 1/2] tools/virtio: Add missing compat definitions for vhost_net_test
From: Yichong Chen @ 2026-06-18  9:56 UTC (permalink / raw)
  To: mst, jasowang
  Cc: xuanzhuo, eperezma, rppt, ljs, akpm, virtualization, linux-kernel,
	chenyichong
In-Reply-To: <20260618095642.510565-1-chenyichong@uniontech.com>

vhost_net_test builds virtio_ring.c in userspace.

Recent virtio headers pull in new helper headers.

They also use new allocation helpers and a DMA attribute.

Add the missing compat definitions.

Signed-off-by: Yichong Chen <chenyichong@uniontech.com>
---
 tools/virtio/linux/completion.h      |  9 +++++
 tools/virtio/linux/device.h          |  1 +
 tools/virtio/linux/dma-mapping.h     |  1 +
 tools/virtio/linux/mod_devicetable.h | 14 +++++++
 tools/virtio/linux/slab.h            |  4 ++
 tools/virtio/linux/virtio_features.h | 56 ++++++++++++++++++++++++++++
 6 files changed, 85 insertions(+)
 create mode 100644 tools/virtio/linux/completion.h
 create mode 100644 tools/virtio/linux/mod_devicetable.h
 create mode 100644 tools/virtio/linux/virtio_features.h

diff --git a/tools/virtio/linux/completion.h b/tools/virtio/linux/completion.h
new file mode 100644
index 000000000000..5e54b679721b
--- /dev/null
+++ b/tools/virtio/linux/completion.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_COMPLETION_H
+#define _LINUX_COMPLETION_H
+
+struct completion {
+	unsigned int done;
+};
+
+#endif /* _LINUX_COMPLETION_H */
diff --git a/tools/virtio/linux/device.h b/tools/virtio/linux/device.h
index 075c2140d975..abf100cb0023 100644
--- a/tools/virtio/linux/device.h
+++ b/tools/virtio/linux/device.h
@@ -1,4 +1,5 @@
 #ifndef LINUX_DEVICE_H
+#define LINUX_DEVICE_H
 
 struct device {
 	void *parent;
diff --git a/tools/virtio/linux/dma-mapping.h b/tools/virtio/linux/dma-mapping.h
index fddfa2fbb276..65e2974b3908 100644
--- a/tools/virtio/linux/dma-mapping.h
+++ b/tools/virtio/linux/dma-mapping.h
@@ -59,5 +59,6 @@ enum dma_data_direction {
  * instead.
  */
 #define DMA_MAPPING_ERROR		(~(dma_addr_t)0)
+#define DMA_ATTR_DEBUGGING_IGNORE_CACHELINES	0
 
 #endif
diff --git a/tools/virtio/linux/mod_devicetable.h b/tools/virtio/linux/mod_devicetable.h
new file mode 100644
index 000000000000..3ba594b8229d
--- /dev/null
+++ b/tools/virtio/linux/mod_devicetable.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_MOD_DEVICETABLE_H
+#define _LINUX_MOD_DEVICETABLE_H
+
+#include <linux/types.h>
+
+struct virtio_device_id {
+	__u32 device;
+	__u32 vendor;
+};
+
+#define VIRTIO_DEV_ANY_ID	0xffffffff
+
+#endif /* _LINUX_MOD_DEVICETABLE_H */
diff --git a/tools/virtio/linux/slab.h b/tools/virtio/linux/slab.h
index 319dcaa07755..13d94c6f663c 100644
--- a/tools/virtio/linux/slab.h
+++ b/tools/virtio/linux/slab.h
@@ -4,4 +4,8 @@
 #define GFP_ATOMIC 0
 #define __GFP_NOWARN 0
 #define __GFP_ZERO 0
+#define kmalloc_obj(VAR_OR_TYPE, ...) \
+	kmalloc(sizeof(VAR_OR_TYPE), GFP_KERNEL)
+#define kmalloc_objs(VAR_OR_TYPE, COUNT, ...) \
+	kmalloc_array((COUNT), sizeof(VAR_OR_TYPE), GFP_KERNEL)
 #endif
diff --git a/tools/virtio/linux/virtio_features.h b/tools/virtio/linux/virtio_features.h
new file mode 100644
index 000000000000..18c56610e209
--- /dev/null
+++ b/tools/virtio/linux/virtio_features.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_VIRTIO_FEATURES_H
+#define _LINUX_VIRTIO_FEATURES_H
+
+#include <linux/bug.h>
+#include <linux/string.h>
+#include <linux/types.h>
+
+#define VIRTIO_FEATURES_U64S	2
+#define VIRTIO_FEATURES_BITS	(VIRTIO_FEATURES_U64S * 64)
+
+#define VIRTIO_BIT(b)		(1ULL << ((b) & 0x3f))
+#define VIRTIO_U64(b)		((b) >> 6)
+
+#define VIRTIO_DECLARE_FEATURES(name)			\
+	union {						\
+		u64 name;					\
+		u64 name##_array[VIRTIO_FEATURES_U64S];	\
+	}
+
+static inline bool virtio_features_chk_bit(unsigned int bit)
+{
+	return bit < VIRTIO_FEATURES_BITS;
+}
+
+static inline bool virtio_features_test_bit(const u64 *features,
+					    unsigned int bit)
+{
+	return virtio_features_chk_bit(bit) &&
+	       !!(features[VIRTIO_U64(bit)] & VIRTIO_BIT(bit));
+}
+
+static inline void virtio_features_set_bit(u64 *features, unsigned int bit)
+{
+	if (virtio_features_chk_bit(bit))
+		features[VIRTIO_U64(bit)] |= VIRTIO_BIT(bit);
+}
+
+static inline void virtio_features_clear_bit(u64 *features, unsigned int bit)
+{
+	if (virtio_features_chk_bit(bit))
+		features[VIRTIO_U64(bit)] &= ~VIRTIO_BIT(bit);
+}
+
+static inline void virtio_features_zero(u64 *features)
+{
+	memset(features, 0, sizeof(features[0]) * VIRTIO_FEATURES_U64S);
+}
+
+static inline void virtio_features_from_u64(u64 *features, u64 from)
+{
+	virtio_features_zero(features);
+	features[0] = from;
+}
+
+#endif /* _LINUX_VIRTIO_FEATURES_H */
-- 
2.51.0
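
The bit-to-word mapping used by the virtio_features.h shim above can be
checked in isolation. The following is a minimal userspace sketch, not part
of the patch: the FEATURE_* macros and features_*() functions are
hypothetical stand-ins that mirror VIRTIO_BIT(), VIRTIO_U64() and the
virtio_features_*_bit() helpers, using uint64_t instead of the kernel's u64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the helpers in the patch; names are hypothetical. */
#define FEATURES_U64S	2
#define FEATURES_BITS	(FEATURES_U64S * 64)
#define FEATURE_MASK(b)	(1ULL << ((b) & 0x3f))	/* bit position inside one word */
#define FEATURE_WORD(b)	((b) >> 6)		/* index of the 64-bit word */

static bool features_test_bit(const uint64_t *features, unsigned int bit)
{
	/* Out-of-range bits read as "not set" instead of indexing past the array. */
	return bit < FEATURES_BITS &&
	       (features[FEATURE_WORD(bit)] & FEATURE_MASK(bit)) != 0;
}

static void features_set_bit(uint64_t *features, unsigned int bit)
{
	if (bit < FEATURES_BITS)
		features[FEATURE_WORD(bit)] |= FEATURE_MASK(bit);
}

static void features_clear_bit(uint64_t *features, unsigned int bit)
{
	if (bit < FEATURES_BITS)
		features[FEATURE_WORD(bit)] &= ~FEATURE_MASK(bit);
}

/* Usage: feature bit 70 lands in word 1 (70 >> 6) at bit 6 (70 & 0x3f). */
static void features_example(void)
{
	uint64_t features[FEATURES_U64S] = { 0, 0 };

	features_set_bit(features, 70);
	assert(features[0] == 0);
	assert(features[1] == (1ULL << 6));
	assert(features_test_bit(features, 70));

	features_set_bit(features, 200);	/* out of range: silently ignored */
	assert(!features_test_bit(features, 200));

	features_clear_bit(features, 70);
	assert(features[1] == 0);
}
```

The range check in every helper is what keeps an out-of-range bit number
from touching memory past the two-word array.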



* Re: [PATCH] virtio-blk: use little-endian types for the zoned fields
From: Stefano Garzarella @ 2026-06-18  7:41 UTC (permalink / raw)
  To: Michael Bommarito
  Cc: Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi, Dmitry Fomichev,
	Damien Le Moal, Jens Axboe, Paolo Bonzini, virtualization,
	linux-block, linux-kernel
In-Reply-To: <20260617151727.4071754-1-michael.bommarito@gmail.com>

On Wed, Jun 17, 2026 at 11:17:27AM -0400, Michael Bommarito wrote:
>The zoned block-device fields in the virtio-blk header are typed
>__virtio{32,64}, so their endianness follows VIRTIO_F_VERSION_1. The
>zoned feature is only defined for VIRTIO 1.x devices, and the virtio
>specification defines all of its fields as little-endian. Commit
>b16a1756c716 ("virtio_blk: mark all zone fields LE") tagged them
>__le* for exactly this reason, but commit f1ba4e674feb ("virtio-blk:
>fix to match virtio spec") re-applied the reviewed version of the
>original zoned series -- which predated b16a1756 -- and silently
>restored the __virtio* typing together with the matching
>virtio*_to_cpu() / virtio_cread() accessors in the driver.
>
>Restore the little-endian typing for the zoned configuration-space
>characteristics, the zone descriptor, the zone report header and the
>ZONE_APPEND in-header sector, and read them with le*_to_cpu() and
>virtio_cread_le() to match.
>
>There is no functional change on any spec-compliant device: zoned
>requires VIRTIO_F_VERSION_1, and for a VERSION_1 device
>virtio*_to_cpu() is identical to le*_to_cpu(). The change makes the
>uapi types describe the actual wire format and removes a latent
>endianness mismatch for a (non-conformant) legacy device on a
>big-endian guest.

Not for this patch, but at this point should we also do the same for the
fields gated by the following features, which IIUC were all added in 1.*:
- VIRTIO_BLK_F_MQ
- VIRTIO_BLK_F_DISCARD
- VIRTIO_BLK_F_WRITE_ZEROES
- VIRTIO_BLK_F_SECURE_ERASE

>
>Fixes: f1ba4e674feb ("virtio-blk: fix to match virtio spec")
>Suggested-by: Michael S. Tsirkin <mst@redhat.com>
>Assisted-by: Claude:claude-opus-4-8
>Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
>---
>Testing:
> - Builds with no new warnings; sparse endian-clean (C=2,
>   __CHECK_ENDIAN__, CONFIG_BLK_DEV_ZONED=y) both before and after.
> - Booted under QEMU with a host-managed zoned device exposed through
>   virtio-blk. Zone revalidation, blkzone report and a sequential
>   write / write-pointer check return correct values; blktests zbd
>   device tests 001-006 (sysfs+ioctl, report zone, reset, write split,
>   write ordering, revalidate) pass, with results identical before and
>   after this change -- expected, since on a VIRTIO_F_VERSION_1 device
>   virtio*_to_cpu() == le*_to_cpu().
>
> drivers/block/virtio_blk.c      | 38 +++++++++++++++------------------
> include/uapi/linux/virtio_blk.h | 18 ++++++++--------
> 2 files changed, 26 insertions(+), 30 deletions(-)
>
>diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>index b1c9a27fe00f3..5532cfbde7bfe 100644
>--- a/drivers/block/virtio_blk.c
>+++ b/drivers/block/virtio_blk.c
>@@ -99,7 +99,7 @@ struct virtblk_req {
> 		 * be the last byte.
> 		 */
> 		struct {
>-			__virtio64 sector;
>+			__le64 sector;
> 			u8 status;
> 		} zone_append;
> 	} in_hdr;
>@@ -335,14 +335,12 @@ static inline void virtblk_request_done(struct request *req)
> {
> 	struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
> 	blk_status_t status = virtblk_result(virtblk_vbr_status(vbr));
>-	struct virtio_blk *vblk = req->mq_hctx->queue->queuedata;
>
> 	virtblk_unmap_data(req, vbr);
> 	virtblk_cleanup_cmd(req);
>
> 	if (req_op(req) == REQ_OP_ZONE_APPEND)
>-		req->__sector = virtio64_to_cpu(vblk->vdev,
>-						vbr->in_hdr.zone_append.sector);
>+		req->__sector = le64_to_cpu(vbr->in_hdr.zone_append.sector);
>
> 	blk_mq_end_request(req, status);
> }
>@@ -589,13 +587,13 @@ static int virtblk_parse_zone(struct virtio_blk *vblk,
> {
> 	struct blk_zone zone = { };
>
>-	zone.start = virtio64_to_cpu(vblk->vdev, entry->z_start);
>+	zone.start = le64_to_cpu(entry->z_start);
> 	if (zone.start + vblk->zone_sectors <= get_capacity(vblk->disk))
> 		zone.len = vblk->zone_sectors;
> 	else
> 		zone.len = get_capacity(vblk->disk) - zone.start;
>-	zone.capacity = virtio64_to_cpu(vblk->vdev, entry->z_cap);
>-	zone.wp = virtio64_to_cpu(vblk->vdev, entry->z_wp);
>+	zone.capacity = le64_to_cpu(entry->z_cap);
>+	zone.wp = le64_to_cpu(entry->z_wp);
>
> 	switch (entry->z_type) {
> 	case VIRTIO_BLK_ZT_SWR:
>@@ -687,8 +685,7 @@ static int virtblk_report_zones(struct gendisk *disk, sector_t sector,
> 		if (ret)
> 			goto fail_report;
>
>-		nz = min_t(u64, virtio64_to_cpu(vblk->vdev, report->nr_zones),
>-			   nr_zones);
>+		nz = min_t(u64, le64_to_cpu(report->nr_zones), nr_zones);
> 		if (!nz)
> 			break;
>
>@@ -698,8 +695,7 @@ static int virtblk_report_zones(struct gendisk *disk, sector_t sector,
> 			if (ret)
> 				goto fail_report;
>
>-			sector = virtio64_to_cpu(vblk->vdev,
>-						 report->zones[i].z_start) +
>+			sector = le64_to_cpu(report->zones[i].z_start) +
> 				 vblk->zone_sectors;
> 			zone_idx++;
> 		}
>@@ -725,18 +721,18 @@ static int virtblk_read_zoned_limits(struct virtio_blk *vblk,
>
> 	lim->features |= BLK_FEAT_ZONED;
>
>-	virtio_cread(vdev, struct virtio_blk_config,
>-		     zoned.max_open_zones, &v);
>+	virtio_cread_le(vdev, struct virtio_blk_config,
>+			zoned.max_open_zones, &v);
> 	lim->max_open_zones = v;
> 	dev_dbg(&vdev->dev, "max open zones = %u\n", v);
>
>-	virtio_cread(vdev, struct virtio_blk_config,
>-		     zoned.max_active_zones, &v);
>+	virtio_cread_le(vdev, struct virtio_blk_config,
>+			zoned.max_active_zones, &v);
> 	lim->max_active_zones = v;
> 	dev_dbg(&vdev->dev, "max active zones = %u\n", v);
>
>-	virtio_cread(vdev, struct virtio_blk_config,
>-		     zoned.write_granularity, &wg);
>+	virtio_cread_le(vdev, struct virtio_blk_config,
>+			zoned.write_granularity, &wg);
> 	if (!wg) {
> 		dev_warn(&vdev->dev, "zero write granularity reported\n");
> 		return -ENODEV;
>@@ -750,8 +746,8 @@ static int virtblk_read_zoned_limits(struct virtio_blk *vblk,
> 	 * virtio ZBD specification doesn't require zones to be a power of
> 	 * two sectors in size, but the code in this driver expects that.
> 	 */
>-	virtio_cread(vdev, struct virtio_blk_config, zoned.zone_sectors,
>-		     &vblk->zone_sectors);
>+	virtio_cread_le(vdev, struct virtio_blk_config, zoned.zone_sectors,
>+			&vblk->zone_sectors);
> 	if (vblk->zone_sectors == 0 || !is_power_of_2(vblk->zone_sectors)) {
> 		dev_err(&vdev->dev,
> 			"zoned device with non power of two zone size %u\n",
>@@ -767,8 +763,8 @@ static int virtblk_read_zoned_limits(struct virtio_blk *vblk,
> 		lim->max_hw_discard_sectors = 0;
> 	}
>
>-	virtio_cread(vdev, struct virtio_blk_config,
>-		     zoned.max_append_sectors, &v);
>+	virtio_cread_le(vdev, struct virtio_blk_config,
>+			zoned.max_append_sectors, &v);
> 	if (!v) {
> 		dev_warn(&vdev->dev, "zero max_append_sectors reported\n");
> 		return -ENODEV;
>diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h
>index 3744e4da1b2a7..5af2a0300bb9d 100644
>--- a/include/uapi/linux/virtio_blk.h
>+++ b/include/uapi/linux/virtio_blk.h
>@@ -140,11 +140,11 @@ struct virtio_blk_config {
>

To avoid making this mistake again, how about adding a note here to 
clarify that all the fields listed below are defined only for VIRTIO 1.x 
devices and are therefore always little-endian?

Anyway, the patch LGTM:

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>


> 	/* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */
> 	struct virtio_blk_zoned_characteristics {
>-		__virtio32 zone_sectors;
>-		__virtio32 max_open_zones;
>-		__virtio32 max_active_zones;
>-		__virtio32 max_append_sectors;
>-		__virtio32 write_granularity;
>+		__le32 zone_sectors;
>+		__le32 max_open_zones;
>+		__le32 max_active_zones;
>+		__le32 max_append_sectors;
>+		__le32 write_granularity;
> 		__u8 model;
> 		__u8 unused2[3];
> 	} zoned;
>@@ -241,11 +241,11 @@ struct virtio_blk_outhdr {
>  */
> struct virtio_blk_zone_descriptor {
> 	/* Zone capacity */
>-	__virtio64 z_cap;
>+	__le64 z_cap;
> 	/* The starting sector of the zone */
>-	__virtio64 z_start;
>+	__le64 z_start;
> 	/* Zone write pointer position in sectors */
>-	__virtio64 z_wp;
>+	__le64 z_wp;
> 	/* Zone type */
> 	__u8 z_type;
> 	/* Zone state */
>@@ -254,7 +254,7 @@ struct virtio_blk_zone_descriptor {
> };
>
> struct virtio_blk_zone_report {
>-	__virtio64 nr_zones;
>+	__le64 nr_zones;
> 	__u8 reserved[56];
> 	struct virtio_blk_zone_descriptor zones[];
> };
>-- 
>2.53.0
>



* [PATCH] drm/virtio: warn when virtqueue has no free space for too long
From: Ryosuke Yasuoka @ 2026-06-18  7:18 UTC (permalink / raw)
  To: David Airlie, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter
  Cc: dri-devel, virtualization, linux-kernel, Ryosuke Yasuoka

virtio_gpu_queue_ctrl_sgs() and virtio_gpu_queue_cursor() wait for
virtqueue space using wait_event() with vqs_released as the only abort
condition. This covers the device removal path, where
virtio_gpu_release_vqs() sets the flag, but does not help when the host
simply stops processing the virtqueue while the device remains present.

In that case, the virtqueue fills up and subsequent command submissions
block indefinitely in D state with no diagnostic output, making the root
cause difficult to identify.

Replace the bare wait_event() with a wait_event_timeout() loop that
prints a warning every 5 seconds while the virtqueue remains full. The
wait still blocks indefinitely so driver behavior is unchanged. The
warnings help identify an unresponsive host device during
troubleshooting.

Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
---
When the host stops processing the virtio-gpu virtqueue without
triggering device removal, the bare wait_event() in
virtio_gpu_queue_ctrl_sgs() and virtio_gpu_queue_cursor() blocks
indefinitely with no diagnostic output. A DRM atomic commit worker
blocks in virtio_gpu_queue_fenced_ctrl_buffer() while holding the
modeset lock. During graceful shutdown, systemd (PID 1) needs the same
lock — either by writing to the console via fbcon, or by closing a DRM
file descriptor that triggers framebuffer cleanup — and blocks as well,
making the VM unrecoverable without a forced power-off.

  PID: 553    COMMAND: "kworker/u4:3"
   #0 __schedule
   #1 schedule
   #2 virtio_gpu_queue_fenced_ctrl_buffer [virtio_gpu]
   #3 virtio_gpu_primary_plane_update [virtio_gpu]
   ...

  PID: 1      COMMAND: "systemd"  (console write path)
   #0 __schedule
   #1 schedule
   #2 schedule_preempt_disabled
   #3 __ww_mutex_lock
   #4 drm_modeset_lock [drm]
   #5 drm_atomic_get_plane_state [drm]
   #6 drm_client_modeset_commit_atomic [drm]
   #7 drm_client_modeset_commit_locked [drm]
   #8 drm_fb_helper_pan_display [drm_kms_helper]
   #9 fb_pan_display
  #10 bit_update_start
  #11 fbcon_switch
  #12 redraw_screen
   ...

Reproduction steps:
1. Build QEMU with the fault injection patch [1] that adds an
   x-ctrl-queue-broken property to virtio-gpu.
2. Boot the VM and trigger the fault injection from the host.
3. Fill the ctrlq (e.g., move the mouse on the guest's display).
   The process gets stuck in virtio_gpu_queue_fenced_ctrl_buffer()
   in D state.
4. Run a graceful shutdown command (shutdown now or reboot).
5. The shutdown process hangs.

My earlier patch a46991b334f6 ("drm/virtio: abort virtqueue wait on
device removal to avoid hung task") covers the case where the shutdown
process reaches the device_shutdown() call path, which sets vqs_released
to unblock the wait. However, during graceful shutdown, systemd (PID 1)
gets stuck on the modeset lock before ever reaching device_shutdown(),
so vqs_released is never set and the wait is never unblocked.

I initially considered adding a module parameter to abort the wait with
-ENODEV on timeout:

  +static unsigned int virtio_gpu_vq_timeout;
  +MODULE_PARM_DESC(vq_timeout,
  +     "Timeout in seconds for virtqueue wait (0 = no timeout, default)");
  +module_param_named(vq_timeout, virtio_gpu_vq_timeout, uint, 0444);
  ...
  +             if (virtio_gpu_vq_timeout) {
  +                     if (!wait_event_timeout(vgdev->ctrlq.ack_queue,
  +                                             vq->num_free >= elemcnt ||
  +                                             vgdev->vqs_released,
  +                                             secs_to_jiffies(virtio_gpu_vq_timeout))) {
  +                             if (fence && vbuf->objs)
  +                                     virtio_gpu_array_unlock_resv(vbuf->objs);
  +                             free_vbuf(vgdev, vbuf);
  +                             drm_dev_exit(idx);
  +                             return -ENODEV;
  +                     }
  +             } else {
  +                     wait_event(vgdev->ctrlq.ack_queue,
  +                                vq->num_free >= elemcnt ||
  +                                vgdev->vqs_released);
  +             }

This approach aborts the wait and allows the graceful shutdown process
to eventually proceed, albeit with a delay.

But that approach has drawbacks: it allows users to set arbitrarily
short timeouts that could destabilize the driver, and aborting commands
mid-flight is a rough recovery path. An unconditional timeout was also
discussed previously [2] but is not appropriate without virtio
specification support.

This patch takes a safer approach: replace the bare wait_event() with
wait_event_timeout() and print a warning every 5 seconds while the
virtqueue remains full. The wait still blocks indefinitely and no
commands are aborted, so driver behavior is unchanged. The warnings
help identify an unresponsive host device during troubleshooting.
Once the user notices the warning, they can work around the hang by
unbinding the VT from fbcon, removing the device, or forcing a shutdown
via SysRq.

[1] https://gist.github.com/YsuOS/fbcd181752594af35f954953a1d260b8
[2] https://lore.kernel.org/all/8a986c52-964f-42a5-b063-fbe2b242ca36@collabora.com/
---
 drivers/gpu/drm/virtio/virtgpu_vq.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 68d097ad9d1d..a546130d3b6a 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -410,8 +410,13 @@ static int virtio_gpu_queue_ctrl_sgs(struct virtio_gpu_device *vgdev,
 	if (vq->num_free < elemcnt) {
 		spin_unlock(&vgdev->ctrlq.qlock);
 		virtio_gpu_notify(vgdev);
-		wait_event(vgdev->ctrlq.ack_queue,
-			   vq->num_free >= elemcnt || vgdev->vqs_released);
+		while (!wait_event_timeout(vgdev->ctrlq.ack_queue,
+					   vq->num_free >= elemcnt ||
+					   vgdev->vqs_released,
+					   5 * HZ) && !vgdev->vqs_released)
+			DRM_WARN("ctrlq waiting for host: no free space for %d secs\n",
+				 5);
+
 		/*
 		 * Set by virtio_gpu_release_vqs() to unblock
 		 * synchronize_srcu() wait in drm_dev_unplug().
@@ -592,8 +597,13 @@ static void virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
 	ret = virtqueue_add_sgs(vq, sgs, outcnt, 0, vbuf, GFP_ATOMIC);
 	if (ret == -ENOSPC) {
 		spin_unlock(&vgdev->cursorq.qlock);
-		wait_event(vgdev->cursorq.ack_queue,
-			   vq->num_free >= outcnt || vgdev->vqs_released);
+		while (!wait_event_timeout(vgdev->cursorq.ack_queue,
+					   vq->num_free >= outcnt ||
+					   vgdev->vqs_released,
+					   5 * HZ) && !vgdev->vqs_released)
+			DRM_WARN("cursorq waiting for host: no free space for %d secs\n",
+				 5);
+
 		/* See comment in virtio_gpu_queue_ctrl_sgs(). */
 		if (vgdev->vqs_released) {
 			free_vbuf(vgdev, vbuf);

---
base-commit: b9e2d5cdaab05c997be3a69d9b372d7676683e1b
change-id: 20260616-virtiogpu_add_timeout-33e985718c22

Best regards,
-- 
Ryosuke Yasuoka <ryasuoka@redhat.com>
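
The control flow of the new wait loop can be modelled without a kernel. The
following is a minimal userspace sketch, not driver code: struct fake_vq,
fake_wait_event_timeout() and wait_for_space() are hypothetical names, and
each call to the fake wait stands for one 5 second wait_event_timeout()
period. It only illustrates that the loop warns once per elapsed period,
keeps waiting until space appears, and stays quiet when the released flag is
set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these names exist in the driver. */
struct fake_vq {
	unsigned int num_free;		/* free descriptors in the queue */
	unsigned int stalled_periods;	/* timeouts that pass before the host frees space */
	bool released;			/* stand-in for vgdev->vqs_released */
};

/*
 * Models one wait_event_timeout() call covering one 5 second period:
 * returns 0 if the period elapsed with the condition still false,
 * non-zero once the condition is true.
 */
static long fake_wait_event_timeout(struct fake_vq *vq, unsigned int need)
{
	if (vq->num_free >= need || vq->released)
		return 1;
	if (vq->stalled_periods > 0) {
		vq->stalled_periods--;
		return 0;
	}
	vq->num_free = need;	/* the host finally processed the queue */
	return 1;
}

/* Same loop shape as the patch; returns how many warnings were printed. */
static unsigned int wait_for_space(struct fake_vq *vq, unsigned int need)
{
	unsigned int warnings = 0;

	while (!fake_wait_event_timeout(vq, need) && !vq->released) {
		fprintf(stderr, "waiting for host: no free space for %d secs\n", 5);
		warnings++;
	}
	return warnings;
}

static void wait_example(void)
{
	struct fake_vq stalled = { .num_free = 0, .stalled_periods = 3 };
	struct fake_vq removed = { .num_free = 0, .stalled_periods = 3,
				   .released = true };

	assert(wait_for_space(&stalled, 4) == 3);	/* three periods, three warnings */
	assert(stalled.num_free == 4);
	assert(wait_for_space(&removed, 4) == 0);	/* removal ends the wait silently */
}
```

In this model a host that never frees space would keep the loop running
forever, which matches the stated intent that the wait still blocks
indefinitely and only gains diagnostics.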


* Re: [PATCH v4 0/7] nvdimm: virtio_pmem: fix request lifetime and converge broken queue failures
From: Li Chen @ 2026-06-18  1:18 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Pankaj Gupta, Dan Williams, Vishal Verma, Dave Jiang, Ira Weiny,
	virtualization, nvdimm, linux-kernel
In-Reply-To: <aixTFVGZVDaCxMis@aschofie-mobl2.lan>

Hi Alison,


 ---- On Sat, 13 Jun 2026 02:42:29 +0800  Alison Schofield <alison.schofield@intel.com> wrote --- 
 > On Tue, Jun 09, 2026 at 08:07:14PM +0800, Li Chen wrote:
 > > Hi,
 > 
 > Hi Li Chen,
 > 
 > In case you missed it, a Sashiko AI review of this set has posted
 > feedback. Please take a look.
 > 
 > https://sashiko.dev/#/patchset/20260609120726.1714780-1-me%40linux.beauty
 > 
 > -- Alison

Thanks for checking and for the reminder.

I will also keep watching Sashiko's review results for this and future
rerolls, and will fold in any issues that look valid.


Regards,
Li



* Re: [GIT PULL] virtio,vhost,vdpa: features, fixes
From: pr-tracker-bot @ 2026-06-17 19:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Linus Torvalds, kvm, virtualization, netdev, linux-kernel, a0yami,
	ammarfaizi2, arnd, chenhuacai, chenhuacai, christfontanez,
	Damir.Shaikhutdinov, david, den, enelsonmoore, eperezma, ethan,
	evg28bur, filip.hejsek, francesco, graf, harald.mommer, jasowang,
	jiri, johan, johannes.thumshirn, lingshan.zhu, luis.hernandez093,
	lulu, mhi, michael.bommarito, mikhail.golubev-ciuchea, mkl, mst,
	mvaralar, nathan, oleg, pawel.moll, physicalmtea, polina.vishneva,
	q.h.hack.winter, r 
In-Reply-To: <20260617065516-mutt-send-email-mst@kernel.org>

The pull request you sent on Wed, 17 Jun 2026 06:55:16 -0400:

> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/d44ade05aa21468bd30652bc4492891b854a400a

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


* Re: [PATCH] drm/virtio: Use common error handling code in two functions
From: Markus Elfring @ 2026-06-17 16:13 UTC (permalink / raw)
  To: Dmitry Osipenko, virtualization, dri-devel, Chia-I Wu,
	David Airlie, Gerd Hoffmann, Gurchetan Singh, Maarten Lankhorst,
	Maxime Ripard, Simona Vetter, Thomas Zimmermann
  Cc: LKML, kernel-janitors
In-Reply-To: <54d6527d-30c2-4e73-846d-c0e5276c8f74@collabora.com>

>> Use additional labels so that a bit of exception handling can be better
>> reused at the end of two function implementations.
…
>> +++ b/drivers/gpu/drm/virtio/virtgpu_vram.c
>> @@ -212,16 +212,12 @@ int virtio_gpu_vram_create(struct virtio_gpu_device *vgdev,
>>  
>>  	/* Create fake offset */
>>  	ret = drm_gem_create_mmap_offset(obj);
>> -	if (ret) {
>> -		kfree(vram);
>> -		return ret;
>> -	}
>> +	if (ret)
>> +		goto free_vram;
>>  
>>  	ret = virtio_gpu_resource_id_get(vgdev, &vram->base.hw_res_handle);
>> -	if (ret) {
>> -		kfree(vram);
>> -		return ret;
>> -	}
>> +	if (ret)
>> +		goto free_vram;
>>  
>>  	virtio_gpu_cmd_resource_create_blob(vgdev, &vram->base, params, NULL,
>>  					    0);
>> @@ -237,6 +233,10 @@ int virtio_gpu_vram_create(struct virtio_gpu_device *vgdev,
>>  
>>  	*bo_ptr = &vram->base;
>>  	return 0;
>> +
>> +free_vram:
>> +	kfree(vram);
>> +	return ret;
>>  }
>>  
>>  void virtio_gpu_vram_map_deferred(struct virtio_gpu_object_vram *vram)
> 
> Please see [1], will be great if you could address the reported issues
> with this patch in v2

Are you indicating that you would prefer another coding style for the
goto chains?


>                       and add another patch fixing the
> virtio_gpu_resource_id_get() error handling.
> 
> [1]
> https://sashiko.dev/#/patchset/b7440806-e9e8-4027-afe1-f6fe9297d8b2%40web.de
Are you asking for the resource cleanup after a failed
virtio_gpu_resource_id_get() call to be done by a function other than kfree(vram)?

Regards,
Markus


* [PATCH] virtio-blk: use little-endian types for the zoned fields
From: Michael Bommarito @ 2026-06-17 15:17 UTC (permalink / raw)
  To: Michael S . Tsirkin, Jason Wang
  Cc: Stefan Hajnoczi, Stefano Garzarella, Dmitry Fomichev,
	Damien Le Moal, Jens Axboe, Paolo Bonzini, virtualization,
	linux-block, linux-kernel

The zoned block-device fields in the virtio-blk header are typed
__virtio{32,64}, so their endianness follows VIRTIO_F_VERSION_1. The
zoned feature is only defined for VIRTIO 1.x devices, and the virtio
specification defines all of its fields as little-endian. Commit
b16a1756c716 ("virtio_blk: mark all zone fields LE") tagged them
__le* for exactly this reason, but commit f1ba4e674feb ("virtio-blk:
fix to match virtio spec") re-applied the reviewed version of the
original zoned series -- which predated b16a1756 -- and silently
restored the __virtio* typing together with the matching
virtio*_to_cpu() / virtio_cread() accessors in the driver.

Restore the little-endian typing for the zoned configuration-space
characteristics, the zone descriptor, the zone report header and the
ZONE_APPEND in-header sector, and read them with le*_to_cpu() and
virtio_cread_le() to match.

There is no functional change on any spec-compliant device: zoned
requires VIRTIO_F_VERSION_1, and for a VERSION_1 device
virtio*_to_cpu() is identical to le*_to_cpu(). The change makes the
uapi types describe the actual wire format and removes a latent
endianness mismatch for a (non-conformant) legacy device on a
big-endian guest.

Fixes: f1ba4e674feb ("virtio-blk: fix to match virtio spec")
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
Testing:
 - Builds with no new warnings; sparse endian-clean (C=2,
   __CHECK_ENDIAN__, CONFIG_BLK_DEV_ZONED=y) both before and after.
 - Booted under QEMU with a host-managed zoned device exposed through
   virtio-blk. Zone revalidation, blkzone report and a sequential
   write / write-pointer check return correct values; blktests zbd
   device tests 001-006 (sysfs+ioctl, report zone, reset, write split,
   write ordering, revalidate) pass, with results identical before and
   after this change -- expected, since on a VIRTIO_F_VERSION_1 device
   virtio*_to_cpu() == le*_to_cpu().

 drivers/block/virtio_blk.c      | 38 +++++++++++++++------------------
 include/uapi/linux/virtio_blk.h | 18 ++++++++--------
 2 files changed, 26 insertions(+), 30 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index b1c9a27fe00f3..5532cfbde7bfe 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -99,7 +99,7 @@ struct virtblk_req {
 		 * be the last byte.
 		 */
 		struct {
-			__virtio64 sector;
+			__le64 sector;
 			u8 status;
 		} zone_append;
 	} in_hdr;
@@ -335,14 +335,12 @@ static inline void virtblk_request_done(struct request *req)
 {
 	struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
 	blk_status_t status = virtblk_result(virtblk_vbr_status(vbr));
-	struct virtio_blk *vblk = req->mq_hctx->queue->queuedata;
 
 	virtblk_unmap_data(req, vbr);
 	virtblk_cleanup_cmd(req);
 
 	if (req_op(req) == REQ_OP_ZONE_APPEND)
-		req->__sector = virtio64_to_cpu(vblk->vdev,
-						vbr->in_hdr.zone_append.sector);
+		req->__sector = le64_to_cpu(vbr->in_hdr.zone_append.sector);
 
 	blk_mq_end_request(req, status);
 }
@@ -589,13 +587,13 @@ static int virtblk_parse_zone(struct virtio_blk *vblk,
 {
 	struct blk_zone zone = { };
 
-	zone.start = virtio64_to_cpu(vblk->vdev, entry->z_start);
+	zone.start = le64_to_cpu(entry->z_start);
 	if (zone.start + vblk->zone_sectors <= get_capacity(vblk->disk))
 		zone.len = vblk->zone_sectors;
 	else
 		zone.len = get_capacity(vblk->disk) - zone.start;
-	zone.capacity = virtio64_to_cpu(vblk->vdev, entry->z_cap);
-	zone.wp = virtio64_to_cpu(vblk->vdev, entry->z_wp);
+	zone.capacity = le64_to_cpu(entry->z_cap);
+	zone.wp = le64_to_cpu(entry->z_wp);
 
 	switch (entry->z_type) {
 	case VIRTIO_BLK_ZT_SWR:
@@ -687,8 +685,7 @@ static int virtblk_report_zones(struct gendisk *disk, sector_t sector,
 		if (ret)
 			goto fail_report;
 
-		nz = min_t(u64, virtio64_to_cpu(vblk->vdev, report->nr_zones),
-			   nr_zones);
+		nz = min_t(u64, le64_to_cpu(report->nr_zones), nr_zones);
 		if (!nz)
 			break;
 
@@ -698,8 +695,7 @@ static int virtblk_report_zones(struct gendisk *disk, sector_t sector,
 			if (ret)
 				goto fail_report;
 
-			sector = virtio64_to_cpu(vblk->vdev,
-						 report->zones[i].z_start) +
+			sector = le64_to_cpu(report->zones[i].z_start) +
 				 vblk->zone_sectors;
 			zone_idx++;
 		}
@@ -725,18 +721,18 @@ static int virtblk_read_zoned_limits(struct virtio_blk *vblk,
 
 	lim->features |= BLK_FEAT_ZONED;
 
-	virtio_cread(vdev, struct virtio_blk_config,
-		     zoned.max_open_zones, &v);
+	virtio_cread_le(vdev, struct virtio_blk_config,
+			zoned.max_open_zones, &v);
 	lim->max_open_zones = v;
 	dev_dbg(&vdev->dev, "max open zones = %u\n", v);
 
-	virtio_cread(vdev, struct virtio_blk_config,
-		     zoned.max_active_zones, &v);
+	virtio_cread_le(vdev, struct virtio_blk_config,
+			zoned.max_active_zones, &v);
 	lim->max_active_zones = v;
 	dev_dbg(&vdev->dev, "max active zones = %u\n", v);
 
-	virtio_cread(vdev, struct virtio_blk_config,
-		     zoned.write_granularity, &wg);
+	virtio_cread_le(vdev, struct virtio_blk_config,
+			zoned.write_granularity, &wg);
 	if (!wg) {
 		dev_warn(&vdev->dev, "zero write granularity reported\n");
 		return -ENODEV;
@@ -750,8 +746,8 @@ static int virtblk_read_zoned_limits(struct virtio_blk *vblk,
 	 * virtio ZBD specification doesn't require zones to be a power of
 	 * two sectors in size, but the code in this driver expects that.
 	 */
-	virtio_cread(vdev, struct virtio_blk_config, zoned.zone_sectors,
-		     &vblk->zone_sectors);
+	virtio_cread_le(vdev, struct virtio_blk_config, zoned.zone_sectors,
+			&vblk->zone_sectors);
 	if (vblk->zone_sectors == 0 || !is_power_of_2(vblk->zone_sectors)) {
 		dev_err(&vdev->dev,
 			"zoned device with non power of two zone size %u\n",
@@ -767,8 +763,8 @@ static int virtblk_read_zoned_limits(struct virtio_blk *vblk,
 		lim->max_hw_discard_sectors = 0;
 	}
 
-	virtio_cread(vdev, struct virtio_blk_config,
-		     zoned.max_append_sectors, &v);
+	virtio_cread_le(vdev, struct virtio_blk_config,
+			zoned.max_append_sectors, &v);
 	if (!v) {
 		dev_warn(&vdev->dev, "zero max_append_sectors reported\n");
 		return -ENODEV;
diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h
index 3744e4da1b2a7..5af2a0300bb9d 100644
--- a/include/uapi/linux/virtio_blk.h
+++ b/include/uapi/linux/virtio_blk.h
@@ -140,11 +140,11 @@ struct virtio_blk_config {
 
 	/* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */
 	struct virtio_blk_zoned_characteristics {
-		__virtio32 zone_sectors;
-		__virtio32 max_open_zones;
-		__virtio32 max_active_zones;
-		__virtio32 max_append_sectors;
-		__virtio32 write_granularity;
+		__le32 zone_sectors;
+		__le32 max_open_zones;
+		__le32 max_active_zones;
+		__le32 max_append_sectors;
+		__le32 write_granularity;
 		__u8 model;
 		__u8 unused2[3];
 	} zoned;
@@ -241,11 +241,11 @@ struct virtio_blk_outhdr {
  */
 struct virtio_blk_zone_descriptor {
 	/* Zone capacity */
-	__virtio64 z_cap;
+	__le64 z_cap;
 	/* The starting sector of the zone */
-	__virtio64 z_start;
+	__le64 z_start;
 	/* Zone write pointer position in sectors */
-	__virtio64 z_wp;
+	__le64 z_wp;
 	/* Zone type */
 	__u8 z_type;
 	/* Zone state */
@@ -254,7 +254,7 @@ struct virtio_blk_zone_descriptor {
 };
 
 struct virtio_blk_zone_report {
-	__virtio64 nr_zones;
+	__le64 nr_zones;
 	__u8 reserved[56];
 	struct virtio_blk_zone_descriptor zones[];
 };
-- 
2.53.0
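
The "no functional change" argument in the commit message rests on
virtio*_to_cpu() decoding as little-endian whenever VIRTIO_F_VERSION_1 is
negotiated, and only following the guest's native byte order for legacy
devices. The following is a minimal userspace sketch of that rule, not
kernel code: load_le64(), load_be64() and load_virtio64() are hypothetical
names, and the guest byte order is passed as a parameter so the result does
not depend on the machine running the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decode 8 wire bytes as little-endian (the role le64_to_cpu() plays). */
static uint64_t load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Decode 8 wire bytes as big-endian. */
static uint64_t load_be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Model of virtio64_to_cpu(): little-endian when VERSION_1 was negotiated,
 * guest-native otherwise. Both flags are parameters here (an assumption of
 * this sketch) so the example does not depend on the real host byte order.
 */
static uint64_t load_virtio64(bool version_1, bool guest_big_endian,
			      const uint8_t *p)
{
	if (version_1 || !guest_big_endian)
		return load_le64(p);
	return load_be64(p);
}

static void endian_example(void)
{
	/* A zone start sector of 0x1000 as a spec-compliant device writes it. */
	const uint8_t wire[8] = { 0x00, 0x10, 0, 0, 0, 0, 0, 0 };

	assert(load_le64(wire) == 0x1000);
	/* VERSION_1: same result as the plain little-endian load on any guest. */
	assert(load_virtio64(true, false, wire) == 0x1000);
	assert(load_virtio64(true, true, wire) == 0x1000);
	/* Legacy device on a big-endian guest: the latent mismatch. */
	assert(load_virtio64(false, true, wire) == 0x0010000000000000ULL);
}
```

Only the last case differs from the little-endian load, and it corresponds
to the non-conformant legacy configuration the commit message mentions.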



* Re: [PATCH] drm/virtio: Use common error handling code in two functions
From: Dmitry Osipenko @ 2026-06-17 15:11 UTC (permalink / raw)
  To: Markus Elfring, virtualization, dri-devel, Chia-I Wu,
	David Airlie, Gerd Hoffmann, Gurchetan Singh, Maarten Lankhorst,
	Maxime Ripard, Simona Vetter, Thomas Zimmermann
  Cc: LKML, kernel-janitors
In-Reply-To: <b7440806-e9e8-4027-afe1-f6fe9297d8b2@web.de>

Hi,

On 6/9/26 21:08, Markus Elfring wrote:
> From: Markus Elfring <elfring@users.sourceforge.net>
> Date: Tue, 9 Jun 2026 20:00:07 +0200
> 
> Use additional labels so that a bit of exception handling can be better
> reused at the end of two function implementations.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
> ---
>  drivers/gpu/drm/virtio/virtgpu_vq.c   |  7 +++----
>  drivers/gpu/drm/virtio/virtgpu_vram.c | 16 ++++++++--------
>  2 files changed, 11 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
> index 67865810a2e7..05b19c73103a 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_vq.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
> @@ -318,15 +318,14 @@ static struct sg_table *vmalloc_to_sgt(char *data, uint32_t size, int *sg_ents)
>  
>  	*sg_ents = DIV_ROUND_UP(size, PAGE_SIZE);
>  	ret = sg_alloc_table(sgt, *sg_ents, GFP_KERNEL);
> -	if (ret) {
> -		kfree(sgt);
> -		return NULL;
> -	}
> +	if (ret)
> +		goto free_sgt;
>  
>  	for_each_sgtable_sg(sgt, sg, i) {
>  		pg = vmalloc_to_page(data);
>  		if (!pg) {
>  			sg_free_table(sgt);
> +free_sgt:
>  			kfree(sgt);
>  			return NULL;
>  		}
> diff --git a/drivers/gpu/drm/virtio/virtgpu_vram.c b/drivers/gpu/drm/virtio/virtgpu_vram.c
> index 4ae3cbc35dd3..ec5b669fccfa 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_vram.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_vram.c
> @@ -212,16 +212,12 @@ int virtio_gpu_vram_create(struct virtio_gpu_device *vgdev,
>  
>  	/* Create fake offset */
>  	ret = drm_gem_create_mmap_offset(obj);
> -	if (ret) {
> -		kfree(vram);
> -		return ret;
> -	}
> +	if (ret)
> +		goto free_vram;
>  
>  	ret = virtio_gpu_resource_id_get(vgdev, &vram->base.hw_res_handle);
> -	if (ret) {
> -		kfree(vram);
> -		return ret;
> -	}
> +	if (ret)
> +		goto free_vram;
>  
>  	virtio_gpu_cmd_resource_create_blob(vgdev, &vram->base, params, NULL,
>  					    0);
> @@ -237,6 +233,10 @@ int virtio_gpu_vram_create(struct virtio_gpu_device *vgdev,
>  
>  	*bo_ptr = &vram->base;
>  	return 0;
> +
> +free_vram:
> +	kfree(vram);
> +	return ret;
>  }
>  
>  void virtio_gpu_vram_map_deferred(struct virtio_gpu_object_vram *vram)

Please see [1]. It would be great if you could address the reported issues
with this patch in v2 and add another patch fixing the
virtio_gpu_resource_id_get() error handling.

[1]
https://sashiko.dev/#/patchset/b7440806-e9e8-4027-afe1-f6fe9297d8b2%40web.de

-- 
Best regards,
Dmitry

^ permalink raw reply

* [PATCH v5 8/8] nvdimm: virtio_pmem: drain requests in freeze
From: Li Chen @ 2026-06-17 12:24 UTC (permalink / raw)
  To: Pankaj Gupta, Dan Williams, Vishal Verma, Dave Jiang, Ira Weiny,
	Alison Schofield, virtualization, nvdimm
  Cc: linux-kernel, Li Chen
In-Reply-To: <20260617122442.2118957-1-me@linux.beauty>

virtio_pmem_freeze() currently deletes virtqueues and resets the device
without waking threads waiting for a virtqueue descriptor or a host
completion.

Mark the request virtqueue broken before reset. This makes new submissions
fail fast and lets -ENOSPC waiters leave the wait list. Reset the device
before draining used and unused request tokens, then delete the virtqueues.
This wakes waiters with -EIO. It also keeps the detach call on a quiesced
device.

Signed-off-by: Li Chen <me@linux.beauty>
---
Changes in v5:
- Reset the device before draining used and unused request tokens.
- Use the split broken-marking and post-reset drain helpers.
v2->v3:
- No change.
v3->v4:
- Rebased onto v7.1-rc7 and renumbered after the flush error patches.

 drivers/nvdimm/virtio_pmem.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
index 3bcc7b3671d21..9961bc2678d0f 100644
--- a/drivers/nvdimm/virtio_pmem.c
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -158,9 +158,21 @@ static void virtio_pmem_remove(struct virtio_device *vdev)
 
 static int virtio_pmem_freeze(struct virtio_device *vdev)
 {
-	vdev->config->del_vqs(vdev);
+	struct virtio_pmem *vpmem = vdev->priv;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	virtio_pmem_mark_broken(vpmem);
+	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
 	virtio_reset_device(vdev);
 
+	spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	virtio_pmem_drain(vpmem);
+	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+	vdev->config->del_vqs(vdev);
+
 	return 0;
 }
 
-- 
2.52.0

^ permalink raw reply related

* [PATCH v5 7/8] nvdimm: virtio_pmem: converge broken virtqueue to -EIO
From: Li Chen @ 2026-06-17 12:24 UTC (permalink / raw)
  To: Pankaj Gupta, Dan Williams, Vishal Verma, Dave Jiang, Ira Weiny,
	Alison Schofield, virtualization, nvdimm
  Cc: linux-kernel, Li Chen
In-Reply-To: <20260617122442.2118957-1-me@linux.beauty>

dmesg reports virtqueue failure and device reset:
virtio_pmem virtio2: failed to send command to
virtio pmem device, no free slots in the virtqueue
virtio_pmem virtio2: virtio pmem device
needs a reset

virtio_pmem_flush() can wait for a free virtqueue descriptor (-ENOSPC).
It can also wait for host completion. If the request virtqueue breaks,
those waiters may never make progress. One example is notify failure from
virtqueue_kick().

Track a device-level broken state and converge the failure to -EIO. New
requests fail fast, -ENOSPC waiters are unlinked and woken, and completed
requests are forced to report an error after the queue is marked broken.

Do not detach unused buffers from an active virtqueue. Runtime broken-queue
handling only stops new submissions and wakes local waiters. Removal resets
the device first. It then drains request tokens. After that, the device no
longer owns the buffers when the virtqueue reference is dropped.

Closes: https://lore.kernel.org/r/202512250116.ewtzlD0g-lkp@intel.com/
Signed-off-by: Li Chen <me@linux.beauty>
---
Changes in v5:
- Split broken marking from token draining.
- Do not call virtqueue_detach_unused_buf() on an active queue.
- Reset the device before draining tokens in remove().
- Do not let the host-completion wait return only because the device is
  marked broken.
v2->v3:
- Add raw dmesg excerpt to the patch description.
- Drop timestamps from the embedded dmesg.
- Fold the CONFIG_VIRTIO_PMEM=m export fix into this patch.
v3->v4:
- Rebased onto v7.1-rc7 and renumbered after the flush error patches.
- Use kmalloc_obj(*req_data) at the allocation site to match current nvdimm
  code.

 drivers/nvdimm/nd_virtio.c   | 96 +++++++++++++++++++++++++++++++-----
 drivers/nvdimm/virtio_pmem.c | 14 +++++-
 drivers/nvdimm/virtio_pmem.h |  5 ++
 3 files changed, 103 insertions(+), 12 deletions(-)

diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
index f5264f6afe44f..f649f70660097 100644
--- a/drivers/nvdimm/nd_virtio.c
+++ b/drivers/nvdimm/nd_virtio.c
@@ -17,6 +17,18 @@ static void virtio_pmem_req_release(struct kref *kref)
 	kfree(req);
 }
 
+static void virtio_pmem_signal_done(struct virtio_pmem_request *req)
+{
+	WRITE_ONCE(req->done, true);
+	wake_up(&req->host_acked);
+}
+
+static void virtio_pmem_complete_err(struct virtio_pmem_request *req)
+{
+	req->resp.ret = cpu_to_le32(1);
+	virtio_pmem_signal_done(req);
+}
+
 static void virtio_pmem_wake_one_waiter(struct virtio_pmem *vpmem)
 {
 	struct virtio_pmem_request *req_buf;
@@ -31,6 +43,45 @@ static void virtio_pmem_wake_one_waiter(struct virtio_pmem *vpmem)
 	wake_up(&req_buf->wq_buf);
 }
 
+static void virtio_pmem_wake_all_waiters(struct virtio_pmem *vpmem)
+{
+	struct virtio_pmem_request *req, *tmp;
+
+	list_for_each_entry_safe(req, tmp, &vpmem->req_list, list) {
+		list_del_init(&req->list);
+		WRITE_ONCE(req->wq_buf_avail, true);
+		wake_up(&req->wq_buf);
+	}
+}
+
+void virtio_pmem_mark_broken(struct virtio_pmem *vpmem)
+{
+	if (!READ_ONCE(vpmem->broken)) {
+		WRITE_ONCE(vpmem->broken, true);
+		dev_err_once(&vpmem->vdev->dev, "virtqueue is broken\n");
+	}
+
+	virtio_pmem_wake_all_waiters(vpmem);
+}
+EXPORT_SYMBOL_GPL(virtio_pmem_mark_broken);
+
+void virtio_pmem_drain(struct virtio_pmem *vpmem)
+{
+	struct virtio_pmem_request *req;
+	unsigned int len;
+
+	while ((req = virtqueue_get_buf(vpmem->req_vq, &len)) != NULL) {
+		virtio_pmem_complete_err(req);
+		kref_put(&req->kref, virtio_pmem_req_release);
+	}
+
+	while ((req = virtqueue_detach_unused_buf(vpmem->req_vq)) != NULL) {
+		virtio_pmem_complete_err(req);
+		kref_put(&req->kref, virtio_pmem_req_release);
+	}
+}
+EXPORT_SYMBOL_GPL(virtio_pmem_drain);
+
  /* The interrupt handler */
 void virtio_pmem_host_ack(struct virtqueue *vq)
 {
@@ -42,8 +93,10 @@ void virtio_pmem_host_ack(struct virtqueue *vq)
 	spin_lock_irqsave(&vpmem->pmem_lock, flags);
 	while ((req_data = virtqueue_get_buf(vq, &len)) != NULL) {
 		virtio_pmem_wake_one_waiter(vpmem);
-		WRITE_ONCE(req_data->done, true);
-		wake_up(&req_data->host_acked);
+		if (READ_ONCE(vpmem->broken))
+			virtio_pmem_complete_err(req_data);
+		else
+			virtio_pmem_signal_done(req_data);
 		kref_put(&req_data->kref, virtio_pmem_req_release);
 	}
 	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
@@ -71,6 +124,9 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
 		return -EIO;
 	}
 
+	if (READ_ONCE(vpmem->broken))
+		return -EIO;
+
 	req_data = kmalloc_obj(*req_data);
 	if (!req_data)
 		return -ENOMEM;
@@ -87,13 +143,18 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
 	sgs[1] = &ret;
 
 	spin_lock_irqsave(&vpmem->pmem_lock, flags);
-	 /*
-	  * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual
-	  * queue does not have free descriptor. We add the request
-	  * to req_list and wait for host_ack to wake us up when free
-	  * slots are available.
-	  */
+	/*
+	 * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual
+	 * queue does not have free descriptor. We add the request
+	 * to req_list and wait for host_ack to wake us up when free
+	 * slots are available.
+	 */
 	for (;;) {
+		if (READ_ONCE(vpmem->broken)) {
+			err = -EIO;
+			break;
+		}
+
 		err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req_data,
 					GFP_ATOMIC);
 		if (!err) {
@@ -115,17 +176,30 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
 		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
 
 		/* A host response results in "host_ack" getting called */
-		wait_event(req_data->wq_buf, READ_ONCE(req_data->wq_buf_avail));
+		wait_event(req_data->wq_buf,
+			   READ_ONCE(req_data->wq_buf_avail) ||
+			   READ_ONCE(vpmem->broken));
 		spin_lock_irqsave(&vpmem->pmem_lock, flags);
+
+		if (READ_ONCE(vpmem->broken))
+			break;
 	}
 
-	err1 = virtqueue_kick(vpmem->req_vq);
+	if (err == -EIO || virtqueue_is_broken(vpmem->req_vq))
+		virtio_pmem_mark_broken(vpmem);
+
+	err1 = true;
+	if (!err && !READ_ONCE(vpmem->broken)) {
+		err1 = virtqueue_kick(vpmem->req_vq);
+		if (!err1)
+			virtio_pmem_mark_broken(vpmem);
+	}
 	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
 	/*
 	 * virtqueue_add_sgs failed with error different than -ENOSPC, we can't
 	 * do anything about that.
 	 */
-	if (err || !err1) {
+	if (READ_ONCE(vpmem->broken) || err || !err1) {
 		dev_info(&vdev->dev, "failed to send command to virtio pmem device\n");
 		err = -EIO;
 	} else {
diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
index 77b1966619059..3bcc7b3671d21 100644
--- a/drivers/nvdimm/virtio_pmem.c
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -25,6 +25,7 @@ static int init_vq(struct virtio_pmem *vpmem)
 
 	spin_lock_init(&vpmem->pmem_lock);
 	INIT_LIST_HEAD(&vpmem->req_list);
+	WRITE_ONCE(vpmem->broken, false);
 
 	return 0;
 };
@@ -138,10 +139,21 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
 static void virtio_pmem_remove(struct virtio_device *vdev)
 {
 	struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
+	struct virtio_pmem *vpmem = vdev->priv;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	virtio_pmem_mark_broken(vpmem);
+	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+	virtio_reset_device(vdev);
+
+	spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	virtio_pmem_drain(vpmem);
+	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
 
 	nvdimm_bus_unregister(nvdimm_bus);
 	vdev->config->del_vqs(vdev);
-	virtio_reset_device(vdev);
 }
 
 static int virtio_pmem_freeze(struct virtio_device *vdev)
diff --git a/drivers/nvdimm/virtio_pmem.h b/drivers/nvdimm/virtio_pmem.h
index 1017e498c9b4c..7ad24a0443f61 100644
--- a/drivers/nvdimm/virtio_pmem.h
+++ b/drivers/nvdimm/virtio_pmem.h
@@ -48,6 +48,9 @@ struct virtio_pmem {
 	/* List to store deferred work if virtqueue is full */
 	struct list_head req_list;
 
+	/* Fail fast and wake waiters if the request virtqueue is broken. */
+	bool broken;
+
 	/* Synchronize virtqueue data */
 	spinlock_t pmem_lock;
 
@@ -57,5 +60,7 @@ struct virtio_pmem {
 };
 
 void virtio_pmem_host_ack(struct virtqueue *vq);
+void virtio_pmem_mark_broken(struct virtio_pmem *vpmem);
+void virtio_pmem_drain(struct virtio_pmem *vpmem);
 int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
 #endif
-- 
2.52.0

^ permalink raw reply related
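
Editor's sketch (not part of the patch): the teardown ordering described in
7/8, mark broken, then reset, then drain, can be modelled in plain userspace
C. Everything named demo_* below is hypothetical and only stands in for the
driver's state; the slot count, the waiter counter, and the -1 return from an
early drain are assumptions made for illustration, not driver behaviour.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_QUEUE_SLOTS 2	/* assumed tiny queue so -ENOSPC is easy to hit */

/* Hypothetical stand-in for a request token owned by the queue. */
struct demo_token {
	bool done;
	int ret;		/* 0 on success, non-zero on error */
};

/* Hypothetical stand-in for the device-level state. */
struct demo_dev {
	bool broken;
	bool reset;
	struct demo_token *inflight[DEMO_QUEUE_SLOTS];
	size_t waiters;		/* submitters parked on -ENOSPC */
};

/* New submissions fail fast once the device is marked broken. */
static int demo_submit(struct demo_dev *dev, struct demo_token *tok)
{
	if (dev->broken)
		return -EIO;

	for (size_t i = 0; i < DEMO_QUEUE_SLOTS; i++) {
		if (!dev->inflight[i]) {
			dev->inflight[i] = tok;
			return 0;
		}
	}

	dev->waiters++;		/* caller would sleep until a slot frees up */
	return -ENOSPC;
}

/* Mark broken and release every parked waiter; returns how many were woken. */
static size_t demo_mark_broken(struct demo_dev *dev)
{
	size_t woken = dev->waiters;

	dev->broken = true;
	dev->waiters = 0;
	return woken;
}

static void demo_reset(struct demo_dev *dev)
{
	dev->reset = true;	/* after this the device no longer owns buffers */
}

/*
 * Complete every outstanding token with an error. Refuses to run (-1)
 * before reset, mirroring the rule that buffers are not detached from
 * an active queue.
 */
static int demo_drain(struct demo_dev *dev)
{
	int drained = 0;

	if (!dev->reset)
		return -1;

	for (size_t i = 0; i < DEMO_QUEUE_SLOTS; i++) {
		struct demo_token *tok = dev->inflight[i];

		if (!tok)
			continue;
		tok->ret = 1;
		tok->done = true;
		dev->inflight[i] = NULL;
		drained++;
	}
	return drained;
}
```

The model is single-threaded on purpose: it shows only the order of the
steps, not the locking around them.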

* [PATCH v5 6/8] nvdimm: virtio_pmem: refcount requests for token lifetime
From: Li Chen @ 2026-06-17 12:24 UTC (permalink / raw)
  To: Pankaj Gupta, Dan Williams, Vishal Verma, Dave Jiang, Ira Weiny,
	Alison Schofield, virtualization, nvdimm
  Cc: linux-kernel, stable, Li Chen
In-Reply-To: <20260617122442.2118957-1-me@linux.beauty>

KASAN reports slab-use-after-free in __wake_up_common():
BUG: KASAN: slab-use-after-free in __wake_up_common+0x114/0x160
Read of size 8 at addr ffff88810fdcb710 by task swapper/0/0

CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted
6.19.0-next-20260220-00006-g1eae5f204ec3 #4 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux
1.17.0-2-2 04/01/2014
Call Trace:
 <IRQ>
 dump_stack_lvl+0x6d/0xb0
 print_report+0x170/0x4e2
 ? __pfx__raw_spin_lock_irqsave+0x10/0x10
 ? __virt_addr_valid+0x1dc/0x380
 kasan_report+0xbc/0xf0
 ? __wake_up_common+0x114/0x160
 ? __wake_up_common+0x114/0x160
 __wake_up_common+0x114/0x160
 ? __pfx__raw_spin_lock_irqsave+0x10/0x10
 __wake_up+0x36/0x60
 virtio_pmem_host_ack+0x11d/0x3b0
 ? sched_balance_domains+0x29f/0xb00
 ? __pfx_virtio_pmem_host_ack+0x10/0x10
 ? _raw_spin_lock_irqsave+0x98/0x100
 ? __pfx__raw_spin_lock_irqsave+0x10/0x10
 vring_interrupt+0x1c9/0x5e0
 ? __pfx_vp_interrupt+0x10/0x10
 vp_vring_interrupt+0x87/0x100
 ? __pfx_vp_interrupt+0x10/0x10
 __handle_irq_event_percpu+0x17f/0x550
 ? __pfx__raw_spin_lock+0x10/0x10
 handle_irq_event+0xab/0x1c0
 handle_fasteoi_irq+0x276/0xae0
 __common_interrupt+0x65/0x130
 common_interrupt+0x78/0xa0
 </IRQ>

virtio_pmem_host_ack() wakes a request that has already been freed by the
submitter.

This happens when the request token is still reachable via the virtqueue,
but virtio_pmem_flush() returns and frees it.

Fix the token lifetime by refcounting struct virtio_pmem_request.
virtio_pmem_flush() holds a submitter reference, and the virtqueue holds an
extra reference once the request is queued. The completion path drops the
virtqueue reference, and the submitter drops its reference before
returning.

Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
Cc: stable@vger.kernel.org
Signed-off-by: Li Chen <me@linux.beauty>
---
v2->v3:
- Add raw KASAN report to the patch description.
- Drop timestamps from the embedded report.
v3->v4:
- Rebased onto v7.1-rc7 and renumbered after the flush error patches.

 drivers/nvdimm/nd_virtio.c   | 34 +++++++++++++++++++++++++++++-----
 drivers/nvdimm/virtio_pmem.h |  2 ++
 2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
index f8c0604edde51..f5264f6afe44f 100644
--- a/drivers/nvdimm/nd_virtio.c
+++ b/drivers/nvdimm/nd_virtio.c
@@ -9,6 +9,14 @@
 #include "virtio_pmem.h"
 #include "nd.h"
 
+static void virtio_pmem_req_release(struct kref *kref)
+{
+	struct virtio_pmem_request *req;
+
+	req = container_of(kref, struct virtio_pmem_request, kref);
+	kfree(req);
+}
+
 static void virtio_pmem_wake_one_waiter(struct virtio_pmem *vpmem)
 {
 	struct virtio_pmem_request *req_buf;
@@ -36,6 +44,7 @@ void virtio_pmem_host_ack(struct virtqueue *vq)
 		virtio_pmem_wake_one_waiter(vpmem);
 		WRITE_ONCE(req_data->done, true);
 		wake_up(&req_data->host_acked);
+		kref_put(&req_data->kref, virtio_pmem_req_release);
 	}
 	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
 }
@@ -66,6 +75,7 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
 	if (!req_data)
 		return -ENOMEM;
 
+	kref_init(&req_data->kref);
 	WRITE_ONCE(req_data->done, false);
 	init_waitqueue_head(&req_data->host_acked);
 	init_waitqueue_head(&req_data->wq_buf);
@@ -83,10 +93,23 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
 	  * to req_list and wait for host_ack to wake us up when free
 	  * slots are available.
 	  */
-	while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req_data,
-					GFP_ATOMIC)) == -ENOSPC) {
-
-		dev_info(&vdev->dev, "failed to send command to virtio pmem device, no free slots in the virtqueue\n");
+	for (;;) {
+		err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req_data,
+					GFP_ATOMIC);
+		if (!err) {
+			/*
+			 * Take the virtqueue reference while @pmem_lock is
+			 * held so completion cannot run concurrently.
+			 */
+			kref_get(&req_data->kref);
+			break;
+		}
+
+		if (err != -ENOSPC)
+			break;
+
+		dev_info_ratelimited(&vdev->dev,
+				     "failed to send command to virtio pmem device, no free slots in the virtqueue\n");
 		WRITE_ONCE(req_data->wq_buf_avail, false);
 		list_add_tail(&req_data->list, &vpmem->req_list);
 		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
@@ -95,6 +118,7 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
 		wait_event(req_data->wq_buf, READ_ONCE(req_data->wq_buf_avail));
 		spin_lock_irqsave(&vpmem->pmem_lock, flags);
 	}
+
 	err1 = virtqueue_kick(vpmem->req_vq);
 	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
 	/*
@@ -110,7 +134,7 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
 		err = le32_to_cpu(req_data->resp.ret);
 	}
 
-	kfree(req_data);
+	kref_put(&req_data->kref, virtio_pmem_req_release);
 	return err;
 };
 
diff --git a/drivers/nvdimm/virtio_pmem.h b/drivers/nvdimm/virtio_pmem.h
index f72cf17f9518f..1017e498c9b4c 100644
--- a/drivers/nvdimm/virtio_pmem.h
+++ b/drivers/nvdimm/virtio_pmem.h
@@ -12,11 +12,13 @@
 
 #include <linux/module.h>
 #include <uapi/linux/virtio_pmem.h>
+#include <linux/kref.h>
 #include <linux/libnvdimm.h>
 #include <linux/mutex.h>
 #include <linux/spinlock.h>
 
 struct virtio_pmem_request {
+	struct kref kref;
 	struct virtio_pmem_req req;
 	struct virtio_pmem_resp resp;
 
-- 
2.52.0

^ permalink raw reply related
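
Editor's sketch (not part of the patch): the two-reference scheme from 6/8,
one reference for the submitter and one for the queued token, can be modelled
in userspace C. The demo_* names are hypothetical, a plain int replaces
struct kref, and the freed counter is a test hook added for illustration.
The model is single-threaded, so it shows the lifetime rule only, not the
atomicity that kref provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct virtio_pmem_request. */
struct demo_request {
	int refs;		/* stands in for struct kref */
	bool done;
	int *freed_count;	/* test hook: bumped when memory is released */
};

/* Allocate with one reference held by the submitter, like kref_init(). */
static struct demo_request *demo_req_alloc(int *freed_count)
{
	struct demo_request *req = calloc(1, sizeof(*req));

	if (!req)
		return NULL;
	req->refs = 1;
	req->freed_count = freed_count;
	return req;
}

/* Take the extra reference that the queue holds once the token is queued. */
static void demo_req_get(struct demo_request *req)
{
	assert(req->refs > 0);
	req->refs++;
}

/* Drop one reference; returns true if this was the last one. */
static bool demo_req_put(struct demo_request *req)
{
	assert(req->refs > 0);
	if (--req->refs)
		return false;

	(*req->freed_count)++;
	free(req);
	return true;
}

/* Completion path: touch the token first, then drop the queue reference. */
static bool demo_host_ack(struct demo_request *req)
{
	req->done = true;
	return demo_req_put(req);
}
```

With this shape the submitter can return before the completion runs: its
demo_req_put() leaves the object alive until demo_host_ack() drops the last
reference.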

* [PATCH v5 5/8] nvdimm: virtio_pmem: use READ_ONCE()/WRITE_ONCE() for wait flags
From: Li Chen @ 2026-06-17 12:24 UTC (permalink / raw)
  To: Pankaj Gupta, Dan Williams, Vishal Verma, Dave Jiang, Ira Weiny,
	Alison Schofield, virtualization, nvdimm
  Cc: linux-kernel, Li Chen
In-Reply-To: <20260617122442.2118957-1-me@linux.beauty>

Use READ_ONCE()/WRITE_ONCE() for the wait_event() flags (done and
wq_buf_avail). They are observed by waiters without pmem_lock, so make
the accesses explicit single loads/stores and avoid compiler
reordering/caching across the wait/wake paths.

Signed-off-by: Li Chen <me@linux.beauty>
---
v2->v3:
- Split out READ_ONCE()/WRITE_ONCE() updates from patch 3/7 (no functional
  change intended).
v3->v4:
- Rebased onto v7.1-rc7 and renumbered after the flush error patches.

 drivers/nvdimm/nd_virtio.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
index 16ee5a47b9938..f8c0604edde51 100644
--- a/drivers/nvdimm/nd_virtio.c
+++ b/drivers/nvdimm/nd_virtio.c
@@ -18,9 +18,9 @@ static void virtio_pmem_wake_one_waiter(struct virtio_pmem *vpmem)
 
 	req_buf = list_first_entry(&vpmem->req_list,
 				   struct virtio_pmem_request, list);
-	req_buf->wq_buf_avail = true;
+	list_del_init(&req_buf->list);
+	WRITE_ONCE(req_buf->wq_buf_avail, true);
 	wake_up(&req_buf->wq_buf);
-	list_del(&req_buf->list);
 }
 
  /* The interrupt handler */
@@ -34,7 +34,7 @@ void virtio_pmem_host_ack(struct virtqueue *vq)
 	spin_lock_irqsave(&vpmem->pmem_lock, flags);
 	while ((req_data = virtqueue_get_buf(vq, &len)) != NULL) {
 		virtio_pmem_wake_one_waiter(vpmem);
-		req_data->done = true;
+		WRITE_ONCE(req_data->done, true);
 		wake_up(&req_data->host_acked);
 	}
 	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
@@ -66,7 +66,7 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
 	if (!req_data)
 		return -ENOMEM;
 
-	req_data->done = false;
+	WRITE_ONCE(req_data->done, false);
 	init_waitqueue_head(&req_data->host_acked);
 	init_waitqueue_head(&req_data->wq_buf);
 	INIT_LIST_HEAD(&req_data->list);
@@ -87,12 +87,12 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
 					GFP_ATOMIC)) == -ENOSPC) {
 
 		dev_info(&vdev->dev, "failed to send command to virtio pmem device, no free slots in the virtqueue\n");
-		req_data->wq_buf_avail = false;
+		WRITE_ONCE(req_data->wq_buf_avail, false);
 		list_add_tail(&req_data->list, &vpmem->req_list);
 		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
 
 		/* A host response results in "host_ack" getting called */
-		wait_event(req_data->wq_buf, req_data->wq_buf_avail);
+		wait_event(req_data->wq_buf, READ_ONCE(req_data->wq_buf_avail));
 		spin_lock_irqsave(&vpmem->pmem_lock, flags);
 	}
 	err1 = virtqueue_kick(vpmem->req_vq);
@@ -106,7 +106,7 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
 		err = -EIO;
 	} else {
 		/* A host response results in "host_ack" getting called */
-		wait_event(req_data->host_acked, req_data->done);
+		wait_event(req_data->host_acked, READ_ONCE(req_data->done));
 		err = le32_to_cpu(req_data->resp.ret);
 	}
 
-- 
2.52.0

^ permalink raw reply related

* [PATCH v5 4/8] nvdimm: virtio_pmem: always wake -ENOSPC waiters
From: Li Chen @ 2026-06-17 12:24 UTC (permalink / raw)
  To: Pankaj Gupta, Dan Williams, Vishal Verma, Dave Jiang, Ira Weiny,
	Alison Schofield, virtualization, nvdimm
  Cc: linux-kernel, Li Chen
In-Reply-To: <20260617122442.2118957-1-me@linux.beauty>

virtio_pmem_host_ack() reclaims virtqueue descriptors with
virtqueue_get_buf(). The -ENOSPC waiter wakeup is tied to completing the
returned token. If token completion is skipped for any reason, reclaimed
descriptors may not wake a waiter and the submitter may sleep forever
waiting for a free slot. Always wake one -ENOSPC waiter for each virtqueue
completion before touching the returned token.

Signed-off-by: Li Chen <me@linux.beauty>
---
v2->v3:
- Split out the waiter wakeup ordering change from READ_ONCE()/WRITE_ONCE()
  updates (now patch 4/7), per Pankaj's suggestion.
v3->v4:
- Rebased onto v7.1-rc7 and renumbered after the flush error patches.

 drivers/nvdimm/nd_virtio.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
index 081370aac6317..16ee5a47b9938 100644
--- a/drivers/nvdimm/nd_virtio.c
+++ b/drivers/nvdimm/nd_virtio.c
@@ -9,26 +9,33 @@
 #include "virtio_pmem.h"
 #include "nd.h"
 
+static void virtio_pmem_wake_one_waiter(struct virtio_pmem *vpmem)
+{
+	struct virtio_pmem_request *req_buf;
+
+	if (list_empty(&vpmem->req_list))
+		return;
+
+	req_buf = list_first_entry(&vpmem->req_list,
+				   struct virtio_pmem_request, list);
+	req_buf->wq_buf_avail = true;
+	wake_up(&req_buf->wq_buf);
+	list_del(&req_buf->list);
+}
+
  /* The interrupt handler */
 void virtio_pmem_host_ack(struct virtqueue *vq)
 {
 	struct virtio_pmem *vpmem = vq->vdev->priv;
-	struct virtio_pmem_request *req_data, *req_buf;
+	struct virtio_pmem_request *req_data;
 	unsigned long flags;
 	unsigned int len;
 
 	spin_lock_irqsave(&vpmem->pmem_lock, flags);
 	while ((req_data = virtqueue_get_buf(vq, &len)) != NULL) {
+		virtio_pmem_wake_one_waiter(vpmem);
 		req_data->done = true;
 		wake_up(&req_data->host_acked);
-
-		if (!list_empty(&vpmem->req_list)) {
-			req_buf = list_first_entry(&vpmem->req_list,
-					struct virtio_pmem_request, list);
-			req_buf->wq_buf_avail = true;
-			wake_up(&req_buf->wq_buf);
-			list_del(&req_buf->list);
-		}
 	}
 	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
 }
-- 
2.52.0

^ permalink raw reply related
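
Editor's sketch (not part of the patch): the rule from 4/8, wake one -ENOSPC
waiter for every reclaimed descriptor before touching the returned token, can
be modelled in userspace C. The demo_* names, the fixed-size FIFO, and the
NULL token used to represent a skipped completion are assumptions made for
illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_WAITERS 8	/* assumed capacity of the model's FIFO */

/* Hypothetical stand-in for a submitter parked on -ENOSPC. */
struct demo_waiter {
	bool slot_avail;
};

/* Hypothetical FIFO standing in for vpmem->req_list. */
struct demo_waitlist {
	struct demo_waiter *fifo[DEMO_MAX_WAITERS];
	size_t head, tail;
};

/* Park a waiter at the tail; returns false if the model's FIFO is full. */
static bool demo_park(struct demo_waitlist *wl, struct demo_waiter *w)
{
	if (wl->tail - wl->head == DEMO_MAX_WAITERS)
		return false;

	w->slot_avail = false;
	wl->fifo[wl->tail++ % DEMO_MAX_WAITERS] = w;
	return true;
}

/* Unlink and wake the oldest waiter; returns false if nobody is waiting. */
static bool demo_wake_one(struct demo_waitlist *wl)
{
	if (wl->head == wl->tail)
		return false;

	wl->fifo[wl->head++ % DEMO_MAX_WAITERS]->slot_avail = true;
	return true;
}

/*
 * One reclaimed descriptor: wake a waiter first, then complete the token.
 * A NULL token models a completion that is skipped for any reason; the
 * waiter is still woken because the wakeup no longer depends on it.
 */
static void demo_complete(struct demo_waitlist *wl, bool *token_done)
{
	demo_wake_one(wl);
	if (token_done)
		*token_done = true;
}
```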
