[PATCH] nvdimm: virtio_pmem: serialize flush requests

public inbox for virtualization@lists.linux-foundation.org

* [PATCH] nvdimm: virtio_pmem: serialize flush requests
@ 2026-01-13  3:45 Li Chen
  2026-01-30 20:52 ` Ira Weiny
  0 siblings, 1 reply; 7+ messages in thread
From: Li Chen @ 2026-01-13  3:45 UTC
  To: Dan Williams, Vishal Verma, Dave Jiang, Ira Weiny, Pankaj Gupta,
	Michael S. Tsirkin, Cornelia Huck, Jakub Staron, nvdimm,
	virtualization, linux-kernel
  Cc: Li Chen

Under heavy concurrent flush traffic, virtio-pmem can overflow its request
virtqueue (req_vq): virtqueue_add_sgs() starts returning -ENOSPC and the
driver logs "no free slots in the virtqueue". Shortly after that the
device enters VIRTIO_CONFIG_S_NEEDS_RESET and flush requests fail with
"virtio pmem device needs a reset".

Serialize virtio_pmem_flush() with a per-device mutex so only one flush
request is in-flight at a time. This prevents req_vq descriptor overflow
under high concurrency.

Reproducer (guest with virtio-pmem):
  - mkfs.ext4 -F /dev/pmem0
  - mount -t ext4 -o dax,noatime /dev/pmem0 /mnt/bench
  - fio: ioengine=io_uring rw=randwrite bs=4k iodepth=64 numjobs=64
        direct=1 fsync=1 runtime=30s time_based=1
  - dmesg: "no free slots in the virtqueue"
           "virtio pmem device needs a reset"

Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
Signed-off-by: Li Chen <me@linux.beauty>
---
 drivers/nvdimm/nd_virtio.c   | 15 +++++++++++----
 drivers/nvdimm/virtio_pmem.c |  1 +
 drivers/nvdimm/virtio_pmem.h |  4 ++++
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
index c3f07be4aa22..827a17fe7c71 100644
--- a/drivers/nvdimm/nd_virtio.c
+++ b/drivers/nvdimm/nd_virtio.c
@@ -44,19 +44,24 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
 	unsigned long flags;
 	int err, err1;
 
+	might_sleep();
+	mutex_lock(&vpmem->flush_lock);
+
 	/*
 	 * Don't bother to submit the request to the device if the device is
 	 * not activated.
 	 */
 	if (vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_NEEDS_RESET) {
 		dev_info(&vdev->dev, "virtio pmem device needs a reset\n");
-		return -EIO;
+		err = -EIO;
+		goto out_unlock;
 	}
 
-	might_sleep();
 	req_data = kmalloc(sizeof(*req_data), GFP_KERNEL);
-	if (!req_data)
-		return -ENOMEM;
+	if (!req_data) {
+		err = -ENOMEM;
+		goto out_unlock;
+	}
 
 	req_data->done = false;
 	init_waitqueue_head(&req_data->host_acked);
@@ -103,6 +108,8 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
 	}
 
 	kfree(req_data);
+out_unlock:
+	mutex_unlock(&vpmem->flush_lock);
 	return err;
 };
 
diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
index 2396d19ce549..77b196661905 100644
--- a/drivers/nvdimm/virtio_pmem.c
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -64,6 +64,7 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
 		goto out_err;
 	}
 
+	mutex_init(&vpmem->flush_lock);
 	vpmem->vdev = vdev;
 	vdev->priv = vpmem;
 	err = init_vq(vpmem);
diff --git a/drivers/nvdimm/virtio_pmem.h b/drivers/nvdimm/virtio_pmem.h
index 0dddefe594c4..f72cf17f9518 100644
--- a/drivers/nvdimm/virtio_pmem.h
+++ b/drivers/nvdimm/virtio_pmem.h
@@ -13,6 +13,7 @@
 #include <linux/module.h>
 #include <uapi/linux/virtio_pmem.h>
 #include <linux/libnvdimm.h>
+#include <linux/mutex.h>
 #include <linux/spinlock.h>
 
 struct virtio_pmem_request {
@@ -35,6 +36,9 @@ struct virtio_pmem {
 	/* Virtio pmem request queue */
 	struct virtqueue *req_vq;
 
+	/* Serialize flush requests to the device. */
+	struct mutex flush_lock;
+
 	/* nvdimm bus registers virtio pmem device */
 	struct nvdimm_bus *nvdimm_bus;
 	struct nvdimm_bus_descriptor nd_desc;
-- 
2.52.0



* Re: [PATCH] nvdimm: virtio_pmem: serialize flush requests
  2026-01-13  3:45 [PATCH] nvdimm: virtio_pmem: serialize flush requests Li Chen
@ 2026-01-30 20:52 ` Ira Weiny
  2026-01-31 17:46   ` Michael S. Tsirkin
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Ira Weiny @ 2026-01-30 20:52 UTC
  To: Li Chen, Dan Williams, Vishal Verma, Dave Jiang, Ira Weiny,
	Pankaj Gupta, Michael S. Tsirkin, Cornelia Huck, Jakub Staron,
	nvdimm, virtualization, linux-kernel
  Cc: Li Chen

Li Chen wrote:
> Under heavy concurrent flush traffic, virtio-pmem can overflow its request
> virtqueue (req_vq): virtqueue_add_sgs() starts returning -ENOSPC and the
> driver logs "no free slots in the virtqueue". Shortly after that the
> device enters VIRTIO_CONFIG_S_NEEDS_RESET and flush requests fail with
> "virtio pmem device needs a reset".
> 
> Serialize virtio_pmem_flush() with a per-device mutex so only one flush
> request is in-flight at a time. This prevents req_vq descriptor overflow
> under high concurrency.
> 
> Reproducer (guest with virtio-pmem):
>   - mkfs.ext4 -F /dev/pmem0
>   - mount -t ext4 -o dax,noatime /dev/pmem0 /mnt/bench
>   - fio: ioengine=io_uring rw=randwrite bs=4k iodepth=64 numjobs=64
>         direct=1 fsync=1 runtime=30s time_based=1

I don't see this error.

<file>
13:28:50 > cat foo.fio 
# test http://lore.kernel.org/20260113034552.62805-1-me@linux.beauty

[global]
filename=/mnt/bench/foo
ioengine=io_uring
size=1G
bs=4K
iodepth=64
numjobs=64
direct=1
fsync=1
runtime=30s
time_based=1

[rand-write]
rw=randwrite
</file>

It's possible I'm doing something wrong.  Can you share your qemu cmdline
or more details on the bug y'all see?

>   - dmesg: "no free slots in the virtqueue"
>            "virtio pmem device needs a reset"
> 
> Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
> Signed-off-by: Li Chen <me@linux.beauty>
> ---
>  drivers/nvdimm/nd_virtio.c   | 15 +++++++++++----
>  drivers/nvdimm/virtio_pmem.c |  1 +
>  drivers/nvdimm/virtio_pmem.h |  4 ++++
>  3 files changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> index c3f07be4aa22..827a17fe7c71 100644
> --- a/drivers/nvdimm/nd_virtio.c
> +++ b/drivers/nvdimm/nd_virtio.c
> @@ -44,19 +44,24 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
>  	unsigned long flags;
>  	int err, err1;
>  
> +	might_sleep();
> +	mutex_lock(&vpmem->flush_lock);

Assuming this does fix a bug I'd rather use guard here.

	guard(mutex)(&vpmem->flush_lock);

Then skip all the gotos and out_unlock stuff.
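
To illustrate the shape guard() gives the function, here is a minimal
user-space sketch.  It is not kernel code: it assumes GCC or Clang (for the
cleanup attribute), uses a pthread mutex in place of struct mutex, and
fake_pmem, fake_flush() and SCOPED_LOCK() are made-up names for this example
only.

```c
/* User-space sketch only; NOT the kernel's guard(mutex) implementation. */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct virtio_pmem. */
struct fake_pmem {
	pthread_mutex_t flush_lock;
	bool needs_reset;
	int flushes_sent;
};

/* Cleanup handler: the compiler passes a pointer to the annotated variable. */
static void scoped_unlock(pthread_mutex_t **lock)
{
	pthread_mutex_unlock(*lock);
}

/* Take the lock now; drop it whenever the enclosing scope is left. */
#define SCOPED_LOCK(lock)						\
	pthread_mutex_t *scoped_lock_					\
		__attribute__((cleanup(scoped_unlock), unused)) =	\
		(pthread_mutex_lock(lock), (lock))

/* Mirrors the control flow of the flush path with plain returns. */
static int fake_flush(struct fake_pmem *p)
{
	SCOPED_LOCK(&p->flush_lock);
	void *req;

	if (p->needs_reset)
		return -EIO;		/* lock dropped automatically */

	req = malloc(64);		/* placeholder for the request buffer */
	if (!req)
		return -ENOMEM;		/* lock dropped automatically */

	p->flushes_sent++;
	free(req);
	return 0;			/* lock dropped automatically */
}
```

Every return in fake_flush() releases flush_lock, so the -EIO and -ENOMEM
paths need neither a goto nor an out_unlock label.  The kernel's guard()
relies on the same compiler cleanup attribute.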

Also, does this affect performance at all?

Ira

> +
>  	/*
>  	 * Don't bother to submit the request to the device if the device is
>  	 * not activated.
>  	 */
>  	if (vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_NEEDS_RESET) {
>  		dev_info(&vdev->dev, "virtio pmem device needs a reset\n");
> -		return -EIO;
> +		err = -EIO;
> +		goto out_unlock;
>  	}
>  
> -	might_sleep();
>  	req_data = kmalloc(sizeof(*req_data), GFP_KERNEL);
> -	if (!req_data)
> -		return -ENOMEM;
> +	if (!req_data) {
> +		err = -ENOMEM;
> +		goto out_unlock;
> +	}
>  
>  	req_data->done = false;
>  	init_waitqueue_head(&req_data->host_acked);
> @@ -103,6 +108,8 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
>  	}
>  
>  	kfree(req_data);
> +out_unlock:
> +	mutex_unlock(&vpmem->flush_lock);
>  	return err;
>  };

[snip]


* Re: [PATCH] nvdimm: virtio_pmem: serialize flush requests
  2026-01-30 20:52 ` Ira Weiny
@ 2026-01-31 17:46   ` Michael S. Tsirkin
  2026-02-01  4:40     ` Li Chen
  2026-01-31 17:47   ` Michael S. Tsirkin
  2026-02-01  4:21   ` Li Chen
  2 siblings, 1 reply; 7+ messages in thread
From: Michael S. Tsirkin @ 2026-01-31 17:46 UTC
  To: Ira Weiny
  Cc: Li Chen, Dan Williams, Vishal Verma, Dave Jiang, Pankaj Gupta,
	Cornelia Huck, Jakub Staron, nvdimm, virtualization, linux-kernel

On Fri, Jan 30, 2026 at 02:52:12PM -0600, Ira Weiny wrote:
> Li Chen wrote:
> > Under heavy concurrent flush traffic, virtio-pmem can overflow its request
> > virtqueue (req_vq): virtqueue_add_sgs() starts returning -ENOSPC and the
> > driver logs "no free slots in the virtqueue". Shortly after that the
> > device enters VIRTIO_CONFIG_S_NEEDS_RESET and flush requests fail with
> > "virtio pmem device needs a reset".
> > 
> > Serialize virtio_pmem_flush() with a per-device mutex so only one flush
> > request is in-flight at a time. This prevents req_vq descriptor overflow
> > under high concurrency.
> > 
> > Reproducer (guest with virtio-pmem):
> >   - mkfs.ext4 -F /dev/pmem0
> >   - mount -t ext4 -o dax,noatime /dev/pmem0 /mnt/bench
> >   - fio: ioengine=io_uring rw=randwrite bs=4k iodepth=64 numjobs=64
> >         direct=1 fsync=1 runtime=30s time_based=1
> 
> I don't see this error.
> 
> <file>
> 13:28:50 > cat foo.fio 
> # test http://lore.kernel.org/20260113034552.62805-1-me@linux.beauty
> 
> [global]
> filename=/mnt/bench/foo
> ioengine=io_uring
> size=1G
> bs=4K
> iodepth=64
> numjobs=64
> direct=1
> fsync=1
> runtime=30s
> time_based=1
> 
> [rand-write]
> rw=randwrite
> </file>
> 
> It's possible I'm doing something wrong.  Can you share your qemu cmdline
> or more details on the bug y'all see?
> 
> >   - dmesg: "no free slots in the virtqueue"
> >            "virtio pmem device needs a reset"
> > 
> > Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
> > Signed-off-by: Li Chen <me@linux.beauty>
> > ---
> >  drivers/nvdimm/nd_virtio.c   | 15 +++++++++++----
> >  drivers/nvdimm/virtio_pmem.c |  1 +
> >  drivers/nvdimm/virtio_pmem.h |  4 ++++
> >  3 files changed, 16 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> > index c3f07be4aa22..827a17fe7c71 100644
> > --- a/drivers/nvdimm/nd_virtio.c
> > +++ b/drivers/nvdimm/nd_virtio.c
> > @@ -44,19 +44,24 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
> >  	unsigned long flags;
> >  	int err, err1;
> >  
> > +	might_sleep();


For that matter, might_sleep() is not really needed next to mutex_lock().


> > +	mutex_lock(&vpmem->flush_lock);
> 
> Assuming this does fix a bug I'd rather use guard here.
> 
> 	guard(mutex)(&vpmem->flush_lock);
> 
> Then skip all the gotos and out_unlock stuff.
> 
> Also, does this affect performance at all?
> 
> Ira
> 
> > +
> >  	/*
> >  	 * Don't bother to submit the request to the device if the device is
> >  	 * not activated.
> >  	 */
> >  	if (vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_NEEDS_RESET) {
> >  		dev_info(&vdev->dev, "virtio pmem device needs a reset\n");
> > -		return -EIO;
> > +		err = -EIO;
> > +		goto out_unlock;
> >  	}
> >  
> > -	might_sleep();
> >  	req_data = kmalloc(sizeof(*req_data), GFP_KERNEL);
> > -	if (!req_data)
> > -		return -ENOMEM;
> > +	if (!req_data) {
> > +		err = -ENOMEM;
> > +		goto out_unlock;
> > +	}
> >  
> >  	req_data->done = false;
> >  	init_waitqueue_head(&req_data->host_acked);
> > @@ -103,6 +108,8 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
> >  	}
> >  
> >  	kfree(req_data);
> > +out_unlock:
> > +	mutex_unlock(&vpmem->flush_lock);
> >  	return err;
> >  };
> 
> [snip]



* Re: [PATCH] nvdimm: virtio_pmem: serialize flush requests
  2026-01-31 17:46   ` Michael S. Tsirkin
@ 2026-02-01  4:40     ` Li Chen
  0 siblings, 0 replies; 7+ messages in thread
From: Li Chen @ 2026-02-01  4:40 UTC
  To: Michael S. Tsirkin
  Cc: Ira Weiny, Dan Williams, Vishal Verma, Dave Jiang, Pankaj Gupta,
	Cornelia Huck, Jakub Staron, nvdimm, virtualization, linux-kernel

Hi Michael,

On Sun, 01 Feb 2026 01:46:19 +0800,
Michael S. Tsirkin wrote:
> 
> On Fri, Jan 30, 2026 at 02:52:12PM -0600, Ira Weiny wrote:
> > Li Chen wrote:
> > > Under heavy concurrent flush traffic, virtio-pmem can overflow its request
> > > virtqueue (req_vq): virtqueue_add_sgs() starts returning -ENOSPC and the
> > > driver logs "no free slots in the virtqueue". Shortly after that the
> > > device enters VIRTIO_CONFIG_S_NEEDS_RESET and flush requests fail with
> > > "virtio pmem device needs a reset".
> > > 
> > > Serialize virtio_pmem_flush() with a per-device mutex so only one flush
> > > request is in-flight at a time. This prevents req_vq descriptor overflow
> > > under high concurrency.
> > > 
> > > Reproducer (guest with virtio-pmem):
> > >   - mkfs.ext4 -F /dev/pmem0
> > >   - mount -t ext4 -o dax,noatime /dev/pmem0 /mnt/bench
> > >   - fio: ioengine=io_uring rw=randwrite bs=4k iodepth=64 numjobs=64
> > >         direct=1 fsync=1 runtime=30s time_based=1
> > 
> > I don't see this error.
> > 
> > <file>
> > 13:28:50 > cat foo.fio 
> > # test http://lore.kernel.org/20260113034552.62805-1-me@linux.beauty
> > 
> > [global]
> > filename=/mnt/bench/foo
> > ioengine=io_uring
> > size=1G
> > bs=4K
> > iodepth=64
> > numjobs=64
> > direct=1
> > fsync=1
> > runtime=30s
> > time_based=1
> > 
> > [rand-write]
> > rw=randwrite
> > </file>
> > 
> > It's possible I'm doing something wrong.  Can you share your qemu cmdline
> > or more details on the bug y'all see?
> > 
> > >   - dmesg: "no free slots in the virtqueue"
> > >            "virtio pmem device needs a reset"
> > > 
> > > Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
> > > Signed-off-by: Li Chen <me@linux.beauty>
> > > ---
> > >  drivers/nvdimm/nd_virtio.c   | 15 +++++++++++----
> > >  drivers/nvdimm/virtio_pmem.c |  1 +
> > >  drivers/nvdimm/virtio_pmem.h |  4 ++++
> > >  3 files changed, 16 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> > > index c3f07be4aa22..827a17fe7c71 100644
> > > --- a/drivers/nvdimm/nd_virtio.c
> > > +++ b/drivers/nvdimm/nd_virtio.c
> > > @@ -44,19 +44,24 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
> > >  	unsigned long flags;
> > >  	int err, err1;
> > >  
> > > +	might_sleep();
> 
> 
> For that matter, might_sleep() is not really needed next to mutex_lock().
> 
> 
> > > +	mutex_lock(&vpmem->flush_lock);

Good point. mutex_lock() already does might_sleep(), so the explicit
might_sleep() next to the lock is redundant.

I'll drop it in v2 (which also switches to guard(mutex) as Ira suggested).

Regards,
Li


* Re: [PATCH] nvdimm: virtio_pmem: serialize flush requests
  2026-01-30 20:52 ` Ira Weiny
  2026-01-31 17:46   ` Michael S. Tsirkin
@ 2026-01-31 17:47   ` Michael S. Tsirkin
  2026-02-02 17:18     ` Ira Weiny
  2026-02-01  4:21   ` Li Chen
  2 siblings, 1 reply; 7+ messages in thread
From: Michael S. Tsirkin @ 2026-01-31 17:47 UTC
  To: Ira Weiny
  Cc: Li Chen, Dan Williams, Vishal Verma, Dave Jiang, Pankaj Gupta,
	Cornelia Huck, Jakub Staron, nvdimm, virtualization, linux-kernel

On Fri, Jan 30, 2026 at 02:52:12PM -0600, Ira Weiny wrote:
> Li Chen wrote:
> > Under heavy concurrent flush traffic, virtio-pmem can overflow its request
> > virtqueue (req_vq): virtqueue_add_sgs() starts returning -ENOSPC and the
> > driver logs "no free slots in the virtqueue". Shortly after that the
> > device enters VIRTIO_CONFIG_S_NEEDS_RESET and flush requests fail with
> > "virtio pmem device needs a reset".
> > 
> > Serialize virtio_pmem_flush() with a per-device mutex so only one flush
> > request is in-flight at a time. This prevents req_vq descriptor overflow
> > under high concurrency.
> > 
> > Reproducer (guest with virtio-pmem):
> >   - mkfs.ext4 -F /dev/pmem0
> >   - mount -t ext4 -o dax,noatime /dev/pmem0 /mnt/bench
> >   - fio: ioengine=io_uring rw=randwrite bs=4k iodepth=64 numjobs=64
> >         direct=1 fsync=1 runtime=30s time_based=1
> 
> I don't see this error.
> 
> <file>
> 13:28:50 > cat foo.fio 
> # test http://lore.kernel.org/20260113034552.62805-1-me@linux.beauty
> 
> [global]
> filename=/mnt/bench/foo
> ioengine=io_uring
> size=1G
> bs=4K
> iodepth=64
> numjobs=64
> direct=1
> fsync=1
> runtime=30s
> time_based=1
> 
> [rand-write]
> rw=randwrite
> </file>
> 
> It's possible I'm doing something wrong.  Can you share your qemu cmdline
> or more details on the bug y'all see?
> 
> >   - dmesg: "no free slots in the virtqueue"
> >            "virtio pmem device needs a reset"
> > 
> > Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
> > Signed-off-by: Li Chen <me@linux.beauty>
> > ---
> >  drivers/nvdimm/nd_virtio.c   | 15 +++++++++++----
> >  drivers/nvdimm/virtio_pmem.c |  1 +
> >  drivers/nvdimm/virtio_pmem.h |  4 ++++
> >  3 files changed, 16 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> > index c3f07be4aa22..827a17fe7c71 100644
> > --- a/drivers/nvdimm/nd_virtio.c
> > +++ b/drivers/nvdimm/nd_virtio.c
> > @@ -44,19 +44,24 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
> >  	unsigned long flags;
> >  	int err, err1;
> >  
> > +	might_sleep();
> > +	mutex_lock(&vpmem->flush_lock);
> 
> Assuming this does fix a bug I'd rather use guard here.

Do you, from code review, agree with the logic that
it's racy right now?
Whether the bug is reproducible isn't really the question.


> 	guard(mutex)(&vpmem->flush_lock);
> 
> Then skip all the gotos and out_unlock stuff.
> 
> Also, does this affect performance at all?
> 
> Ira
> 
> > +
> >  	/*
> >  	 * Don't bother to submit the request to the device if the device is
> >  	 * not activated.
> >  	 */
> >  	if (vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_NEEDS_RESET) {
> >  		dev_info(&vdev->dev, "virtio pmem device needs a reset\n");
> > -		return -EIO;
> > +		err = -EIO;
> > +		goto out_unlock;
> >  	}
> >  
> > -	might_sleep();
> >  	req_data = kmalloc(sizeof(*req_data), GFP_KERNEL);
> > -	if (!req_data)
> > -		return -ENOMEM;
> > +	if (!req_data) {
> > +		err = -ENOMEM;
> > +		goto out_unlock;
> > +	}
> >  
> >  	req_data->done = false;
> >  	init_waitqueue_head(&req_data->host_acked);
> > @@ -103,6 +108,8 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
> >  	}
> >  
> >  	kfree(req_data);
> > +out_unlock:
> > +	mutex_unlock(&vpmem->flush_lock);
> >  	return err;
> >  };
> 
> [snip]



* Re: [PATCH] nvdimm: virtio_pmem: serialize flush requests
  2026-01-31 17:47   ` Michael S. Tsirkin
@ 2026-02-02 17:18     ` Ira Weiny
  0 siblings, 0 replies; 7+ messages in thread
From: Ira Weiny @ 2026-02-02 17:18 UTC
  To: Michael S. Tsirkin, Ira Weiny
  Cc: Li Chen, Dan Williams, Vishal Verma, Dave Jiang, Pankaj Gupta,
	Cornelia Huck, Jakub Staron, nvdimm, virtualization, linux-kernel

Michael S. Tsirkin wrote:
> On Fri, Jan 30, 2026 at 02:52:12PM -0600, Ira Weiny wrote:
> > Li Chen wrote:

[snip]

> > > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> > > index c3f07be4aa22..827a17fe7c71 100644
> > > --- a/drivers/nvdimm/nd_virtio.c
> > > +++ b/drivers/nvdimm/nd_virtio.c
> > > @@ -44,19 +44,24 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
> > >  	unsigned long flags;
> > >  	int err, err1;
> > >  
> > > +	might_sleep();
> > > +	mutex_lock(&vpmem->flush_lock);
> > 
> > Assuming this does fix a bug I'd rather use guard here.
> 
> Do you, from code review, agree with the logic that
> it's racy right now?

I do now.  I was hoping to understand the test being run.  The additional
detail that it takes multiple runs helps.

> Whether the bug is reproducible isn't really the question.
> 

True.  But we should still use guard().  I'll look for v2.

Ira


* Re: [PATCH] nvdimm: virtio_pmem: serialize flush requests
  2026-01-30 20:52 ` Ira Weiny
  2026-01-31 17:46   ` Michael S. Tsirkin
  2026-01-31 17:47   ` Michael S. Tsirkin
@ 2026-02-01  4:21   ` Li Chen
  2 siblings, 0 replies; 7+ messages in thread
From: Li Chen @ 2026-02-01  4:21 UTC
  To: Ira Weiny
  Cc: Dan Williams, Vishal Verma, Dave Jiang, Pankaj Gupta,
	Michael S. Tsirkin, Cornelia Huck, Jakub Staron, nvdimm,
	virtualization, linux-kernel


Hi Ira,

On Sat, 31 Jan 2026 04:52:12 +0800,
Ira Weiny wrote:
> 
> Li Chen wrote:
> > Under heavy concurrent flush traffic, virtio-pmem can overflow its request
> > virtqueue (req_vq): virtqueue_add_sgs() starts returning -ENOSPC and the
> > driver logs "no free slots in the virtqueue". Shortly after that the
> > device enters VIRTIO_CONFIG_S_NEEDS_RESET and flush requests fail with
> > "virtio pmem device needs a reset".
> > 
> > Serialize virtio_pmem_flush() with a per-device mutex so only one flush
> > request is in-flight at a time. This prevents req_vq descriptor overflow
> > under high concurrency.
> > 
> > Reproducer (guest with virtio-pmem):
> >   - mkfs.ext4 -F /dev/pmem0
> >   - mount -t ext4 -o dax,noatime /dev/pmem0 /mnt/bench
> >   - fio: ioengine=io_uring rw=randwrite bs=4k iodepth=64 numjobs=64
> >         direct=1 fsync=1 runtime=30s time_based=1
> 
> I don't see this error.
> 
> <file>
> 13:28:50 > cat foo.fio 
> # test http://lore.kernel.org/20260113034552.62805-1-me@linux.beauty
> 
> [global]
> filename=/mnt/bench/foo
> ioengine=io_uring
> size=1G
> bs=4K
> iodepth=64
> numjobs=64
> direct=1
> fsync=1
> runtime=30s
> time_based=1
> 
> [rand-write]
> rw=randwrite
> </file>
> 
> It's possible I'm doing something wrong.  Can you share your qemu cmdline
> or more details on the bug y'all see?

Thanks for taking a look.

I can reproduce the issue here, but it is timing dependent. A single fio run
does not always hit it, so I suspect that's why you're not seeing the dmesg
messages.

Environment:
QEMU: 10.1.2
virtio-pmem backend: memory-backend-ram (shared)

The virtio-pmem relevant QEMU bits:
  -object memory-backend-ram,id=pmem0,size=10G,share=on
  -device virtio-pmem-pci,id=virtio-pmem0,memdev=pmem0

For completeness, this is the full QEMU command line I used (paths replaced
with placeholders):
  qemu-system-x86_64 -enable-kvm -cpu host -smp 16 -m 10G,maxmem=20G \
    -netdev user,id=net0,hostfwd=tcp::<ssh_port>-:22 \
    -device virtio-net,netdev=net0 \
    -drive file=<guest.qcow2>,if=none,id=boot0,format=qcow2 \
    -device virtio-blk-pci,drive=boot0,num-queues=4 \
    -object memory-backend-ram,id=pmem0,size=10G,share=on \
    -device virtio-pmem-pci,id=virtio-pmem0,memdev=pmem0 \
    -nographic -kernel <bzImage> -append "<cmdline>"

Kernel under test (baseline, no patch):
  v6.18-764-g7aa104c7e8e9

I used the same fio parameters as in the patch description, which should be
equivalent to the foo.fio you posted. The only difference is that I run fio
in a loop so it has multiple chances to trigger. Each iteration does a fresh
mkfs + mount and clears dmesg before running fio:

  for i in $(seq 1 10); do
    umount -l /mnt/bench 2>/dev/null || true
    mkfs.ext4 -F /dev/pmem0
    mkdir -p /mnt/bench
    dmesg -C
    mount -t ext4 -o dax,noatime /dev/pmem0 /mnt/bench
    fio --name=randwrite_fsync --filename=/mnt/bench/foo --size=1G \
      --ioengine=io_uring --rw=randwrite --bs=4k --iodepth=64 --numjobs=64 \
      --direct=1 --fsync=1 --runtime=30 --time_based=1
    dmesg | egrep -i \
      -e "no free slots in the virtqueue" \
      -e "virtio pmem device needs a reset" && break
  done

If it does not trigger in 10 iterations, reboot the guest and repeat.

On the baseline kernel, I see
"failed to send command to virtio pmem device, no free slots in the virtqueue"
and "virtio pmem device needs a reset",
typically within a few iterations (often on the first one).

With the fix applied, I ran 10 iterations back-to-back and did not see the
above messages.
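
As a rough illustration of the descriptor accounting behind the fix, here is
a toy user-space model. It is only a sketch: the ring size is invented rather
than taken from QEMU, two descriptors per flush (one request buffer, one
response buffer) is an assumption, and toy_vq, toy_submit(), toy_complete()
and toy_run() are made-up names. It counts -ENOSPC events only; it does not
model the driver's wait-for-free-slot path or how the device ends up in
NEEDS_RESET.

```c
/* Toy descriptor accounting; sizes are invented, not taken from QEMU. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_VQ_SIZE		8	/* hypothetical ring size */
#define TOY_DESC_PER_REQ	2	/* assumed: request + response buffer */

struct toy_vq {
	int free_desc;		/* descriptors currently available */
	int in_flight;		/* requests submitted but not completed */
	int max_in_flight;	/* high-water mark of in_flight */
};

/* Reserve descriptors for one flush, or fail when the ring is full. */
static int toy_submit(struct toy_vq *vq)
{
	if (vq->free_desc < TOY_DESC_PER_REQ)
		return -ENOSPC;
	vq->free_desc -= TOY_DESC_PER_REQ;
	vq->in_flight++;
	if (vq->in_flight > vq->max_in_flight)
		vq->max_in_flight = vq->in_flight;
	return 0;
}

/* The host acknowledged one flush; its descriptors become free again. */
static void toy_complete(struct toy_vq *vq)
{
	vq->free_desc += TOY_DESC_PER_REQ;
	vq->in_flight--;
}

/*
 * Let nr_flushers callers submit one flush each and return how many of
 * them hit -ENOSPC.  Serialized: each flush completes before the next
 * caller submits.  Unserialized: worst-case burst, nothing completes
 * until every caller has tried to submit.
 */
static int toy_run(struct toy_vq *vq, int nr_flushers, bool serialized)
{
	int i, pending = 0, enospc = 0;

	vq->free_desc = TOY_VQ_SIZE;
	vq->in_flight = 0;
	vq->max_in_flight = 0;

	for (i = 0; i < nr_flushers; i++) {
		if (toy_submit(vq) < 0) {
			enospc++;
			continue;
		}
		if (serialized)
			toy_complete(vq);
		else
			pending++;
	}
	while (pending-- > 0)
		toy_complete(vq);

	return enospc;
}
```

With these made-up sizes, a burst of 16 unserialized callers fills the ring
after four submissions and the remaining twelve see -ENOSPC, while the
serialized run never has more than one request in flight.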
 
> >   - dmesg: "no free slots in the virtqueue"
> >            "virtio pmem device needs a reset"
> > 
> > Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
> > Signed-off-by: Li Chen <me@linux.beauty>
> > ---
> >  drivers/nvdimm/nd_virtio.c   | 15 +++++++++++----
> >  drivers/nvdimm/virtio_pmem.c |  1 +
> >  drivers/nvdimm/virtio_pmem.h |  4 ++++
> >  3 files changed, 16 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> > index c3f07be4aa22..827a17fe7c71 100644
> > --- a/drivers/nvdimm/nd_virtio.c
> > +++ b/drivers/nvdimm/nd_virtio.c
> > @@ -44,19 +44,24 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
> >  	unsigned long flags;
> >  	int err, err1;
> >  
> > +	might_sleep();
> > +	mutex_lock(&vpmem->flush_lock);
> 
> Assuming this does fix a bug I'd rather use guard here.
> 
> 	guard(mutex)(&vpmem->flush_lock);
> 
> Then skip all the gotos and out_unlock stuff.

Agreed. I'll use guard in v2.
 
> Also, does this affect performance at all?

I did a quick sanity check. With a smaller numjobs value (numjobs=16,
iodepth=64, fsync=1, bs=4k, runtime=30s), I did not see a regression on this
setup. At numjobs=64 the baseline frequently hits NEEDS_RESET, so correctness
is the primary motivation here.

Regards,
Li

