Re: [PATCH] nvdimm: virtio_pmem: serialize flush requests

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Li Chen <me@linux.beauty>
To: Ira Weiny <ira.weiny@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
	"Vishal Verma" <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Pankaj Gupta <pankaj.gupta.linux@gmail.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Jakub Staron <jstaron@google.com>, <nvdimm@lists.linux.dev>,
	<virtualization@lists.linux.dev>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] nvdimm: virtio_pmem: serialize flush requests
Date: Sun, 01 Feb 2026 12:21:42 +0800	[thread overview]
Message-ID: <87a4xtozzd.wl-me@linux.beauty> (raw)
In-Reply-To: <697d19fc772ad_f6311008@iweiny-mobl.notmuch>


Hi Ira,

On Sat, 31 Jan 2026 04:52:12 +0800,
Ira Weiny wrote:
> 
> Li Chen wrote:
> > Under heavy concurrent flush traffic, virtio-pmem can overflow its request
> > virtqueue (req_vq): virtqueue_add_sgs() starts returning -ENOSPC and the
> > driver logs "no free slots in the virtqueue". Shortly after that the
> > device enters VIRTIO_CONFIG_S_NEEDS_RESET and flush requests fail with
> > "virtio pmem device needs a reset".
> > 
> > Serialize virtio_pmem_flush() with a per-device mutex so only one flush
> > request is in-flight at a time. This prevents req_vq descriptor overflow
> > under high concurrency.
> > 
> > Reproducer (guest with virtio-pmem):
> >   - mkfs.ext4 -F /dev/pmem0
> >   - mount -t ext4 -o dax,noatime /dev/pmem0 /mnt/bench
> >   - fio: ioengine=io_uring rw=randwrite bs=4k iodepth=64 numjobs=64
> >         direct=1 fsync=1 runtime=30s time_based=1
> 
> I don't see this error.
> 
> <file>
> 13:28:50 > cat foo.fio 
> # test http://lore.kernel.org/20260113034552.62805-1-me@linux.beauty
> 
> [global]
> filename=/mnt/bench/foo
> ioengine=io_uring
> size=1G
> bs=4K
> iodepth=64
> numjobs=64
> direct=1
> fsync=1
> runtime=30s
> time_based=1
> 
> [rand-write]
> rw=randwrite
> </file>
> 
> It's possible I'm doing something wrong.  Can you share your qemu cmdline
> or more details on the bug yall see.

Thanks for taking a look.

I can reproduce the issue here, but it is timing dependent. A single fio run
does not always hit it, so I suspect that's why you're not seeing the dmesg
messages.

Environment:
QEMU: 10.1.2
virtio-pmem backend: memory-backend-ram (shared)

The virtio-pmem relevant QEMU bits:
  -object memory-backend-ram,id=pmem0,size=10G,share=on
  -device virtio-pmem-pci,id=virtio-pmem0,memdev=pmem0

For completeness, this is the full QEMU command line I used (paths replaced
with placeholders):
  qemu-system-x86_64 -enable-kvm -cpu host -smp 16 -m 10G,maxmem=20G \\
    -netdev user,id=net0,hostfwd=tcp::<ssh_port>-:22 \\
    -device virtio-net,netdev=net0 \\
    -drive file=<guest.qcow2>,if=none,id=boot0,format=qcow2 \\
    -device virtio-blk-pci,drive=boot0,num-queues=4 \\
    -object memory-backend-ram,id=pmem0,size=10G,share=on \\
    -device virtio-pmem-pci,id=virtio-pmem0,memdev=pmem0 \\
    -nographic -kernel <bzImage> -append "<cmdline>"

Kernel under test (baseline, no patch):
  v6.18-764-g7aa104c7e8e9

I used the same fio parameters from the cover letter. The only difference is
that I run it in a loop so it has multiple chances to trigger. Each iteration
does a fresh mkfs + mount and clears dmesg before running fio:
This should be equivalent to the foo.fio you posted.

  for i in $(seq 1 10); do
    umount -l /mnt/bench 2>/dev/null || true
    mkfs.ext4 -F /dev/pmem0
    mkdir -p /mnt/bench
    dmesg -C
    mount -t ext4 -o dax,noatime /dev/pmem0 /mnt/bench
    fio --name=randwrite_fsync --filename=/mnt/bench/foo --size=1G \\
      --ioengine=io_uring --rw=randwrite --bs=4k --iodepth=64 --numjobs=64 \\
      --direct=1 --fsync=1 --runtime=30 --time_based=1
    dmesg | egrep -i \\
      -e "no free slots in the virtqueue" \\
      -e "virtio pmem device needs a reset" && break
  done

If it does not trigger in 10 iterations, reboot the guest and repeat.

On the baseline kernel, I see:
"failed to send command to virtio pmem device, no free slots in the virtqueue"
and "virtio pmem device needs a reset"
Typically within a few iterations (often on the first one).

With the fix applied, I ran 10 iterations back-to-back and did not see the
above messages.
 
> >   - dmesg: "no free slots in the virtqueue"
> >            "virtio pmem device needs a reset"
> > 
> > Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
> > Signed-off-by: Li Chen <me@linux.beauty>
> > ---
> >  drivers/nvdimm/nd_virtio.c   | 15 +++++++++++----
> >  drivers/nvdimm/virtio_pmem.c |  1 +
> >  drivers/nvdimm/virtio_pmem.h |  4 ++++
> >  3 files changed, 16 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> > index c3f07be4aa22..827a17fe7c71 100644
> > --- a/drivers/nvdimm/nd_virtio.c
> > +++ b/drivers/nvdimm/nd_virtio.c
> > @@ -44,19 +44,24 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
> >  	unsigned long flags;
> >  	int err, err1;
> >  
> > +	might_sleep();
> > +	mutex_lock(&vpmem->flush_lock);
> 
> Assuming this does fix a bug I'd rather use guard here.
> 
> 	guard(mutex)(&vpmem->flush_lock);
> 
> Then skip all the gotos and out_unlock stuff.

Agreed. I'll use guard in v2.
 
> Also, does this affect performance at all?

I did a quick sanity check. With a smaller numjobs value (numjobs=16,
iodepth=64, fsync=1, bs=4k, runtime=30s), I did not see a regression on this
setup. At numjobs=64 the baseline frequently hits NEEDS_RESET, so correctness
is the primary motivation here.

Regards,
Li

     prev parent reply	other threads:[~2026-02-01  4:22 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-13  3:45 [PATCH] nvdimm: virtio_pmem: serialize flush requests Li Chen
2026-01-30 20:52 ` Ira Weiny
2026-01-31 17:46   ` Michael S. Tsirkin
2026-02-01  4:40     ` Li Chen
2026-01-31 17:47   ` Michael S. Tsirkin
2026-02-02 17:18     ` Ira Weiny
2026-02-01  4:21   ` Li Chen [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87a4xtozzd.wl-me@linux.beauty \
    --to=me@linux.beauty \
    --cc=cohuck@redhat.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=jstaron@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=nvdimm@lists.linux.dev \
    --cc=pankaj.gupta.linux@gmail.com \
    --cc=virtualization@lists.linux.dev \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.