From: Thomas Glanzmann <thomas@glanzmann.de>
To: Benjamin Coddington <bcodding@redhat.com>,
	Trond Myklebust <trondmy@hammerspace.com>
Cc: kvm@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: I/O stalls when merging qcow2 snapshots on nfs
Date: Mon, 6 May 2024 19:21:32 +0200
Message-ID: <ZjkRnJD7wQRnn1Lf@glanzmann.de>
In-Reply-To: <74f183ca71fbde90678f138077965ffd19bed91b.camel@hammerspace.com> <CC139243-7C48-4416-BE71-3C7B651F00FC@redhat.com>

Hello Ben and Trond,

> On 5 May 2024, at 7:29, Thomas Glanzmann wrote paraphrased:

> When committing 20 - 60 GB snapshots on KVM VMs stored on NFS, I get I/O
> stalls of 20+ seconds.

> When doing backups and migrations with kvm on NFS I get I/O stalls in
> the guest. How to avoid that?

* Benjamin Coddington <bcodding@redhat.com> [2024-05-06 13:25]:
> What NFS version ends up getting mounted here?

NFS 4.2 (the output below already has your and Trond's options added):

172.31.0.1:/nfs on /mnt type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,nconnect=16,timeo=600,retrans=2,sec=sys,clientaddr=172.31.0.6,local_lock=none,write=eager,addr=172.31.0.1)

> You might eliminate some head-of-line blocking issues with the
> "nconnect=16" mount option to open additional TCP connections.

> My view of what could be happening is that the IO from your guest's process
> is congesting with the IO from your 'virsh blockcommit' process, and we
> don't currently have a great way to classify and queue IO from various
> sources in various ways.

Thank you for reminding me of nconnect. I evaluated it with VMware ESX, saw no
benefit when benchmarking it with a single VM, and dismissed it. But of course
it makes sense when there is more than one concurrent I/O stream.

* Trond Myklebust <trondmy@hammerspace.com> [2024-05-06 15:47]:
> Two suggestions:
>    1. Try mounting the NFS partition on which these VMs reside with the
>       "write=eager" mount option. That ensures that the kernel kicks
>       off the write of the block immediately once QEMU has scheduled it
>       for writeback. Note, however that the kernel does not wait for
>       that write to complete (i.e. these writes are all asynchronous).
>    2. Alternatively, try playing with the 'vm.dirty_ratio' or
>       'vm.dirty_bytes' values in order to trigger writeback at an
>       earlier time. With the default value of vm.dirty_ratio=20, you
>       can end up caching up to 20% of your total memory's worth of
>       dirty data before the VM triggers writeback over that 1Gbit link.

Thank you for the write=eager option. I was not aware of it, but I often run
into problems where a 10 Gbit/s network pipe fills up my buffer cache, which
then tries to destage 128 GB * 0.2 = 25.6 GB to the disks. In my case the disks
can't keep up, resulting in long I/O stalls. My disks can usually sustain
between 100 MB/s (synchronously replicated DRBD link over 200 km) and 500 MB/s
(SATA SSDs). I tried to tell the kernel to destage faster by setting
vm.dirty_expire_centisecs=100, which improved some workloads but not all.
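
As a rough sketch of the arithmetic above: the snippet below estimates how long
a full dirty cache takes to drain at the two disk speeds mentioned. It assumes
decimal units (1 GB = 1000 MB) and treats all 128 GB as counting toward
vm.dirty_ratio, as the message does; the helper name drain_seconds is made up
for illustration.

```shell
#!/bin/sh
# drain_seconds RAM_GB DIRTY_RATIO_PERCENT DISK_MB_PER_S
# Prints the whole seconds needed to write out RAM_GB * ratio of dirty data.
# Assumption: 1 GB = 1000 MB, and the whole RAM size counts toward the ratio.
drain_seconds() {
    awk -v ram="$1" -v ratio="$2" -v mbps="$3" \
        'BEGIN { printf "%.0f\n", ram * 1000 * ratio / 100 / mbps }'
}

echo "dirty limit: $(awk 'BEGIN { printf "%.1f", 128 * 20 / 100 }') GB"
echo "drain at 100 MB/s: $(drain_seconds 128 20 100) s"
echo "drain at 500 MB/s: $(drain_seconds 128 20 500) s"
```

With these figures the drain time comes out at roughly 256 s on the DRBD link
and 51 s on the SATA SSDs, which is in line with stalls of 20 seconds or more.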

So, I think I found a solution to my problem by doing the following:

- Increase NFSD threads to 128:

cat > /etc/nfs.conf.d/storage.conf <<'EOF'
[nfsd]
threads = 128

[mountd]
threads = 8
EOF
echo 128 > /proc/fs/nfsd/threads

- Mount the NFS volume with -o nconnect=16,write=eager

- Use iothreads and cache=none.

  <iothreads>2</iothreads>
  <driver name='qemu' type='qcow2' cache='none' discard='unmap' iothread='1'/>
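
To confirm that the mount options from the second step took effect, something
like the following could be used. This is only a sketch: has_opt is a
hypothetical helper, and the option string here is shortened from the mount
output quoted earlier in this message. On a live client it could be read with
"findmnt -no OPTIONS /mnt" instead.

```shell
#!/bin/sh
# has_opt OPTIONS NAME: succeed if the comma-separated OPTIONS contains NAME.
has_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Shortened copy of the options shown in the mount output earlier in this mail.
opts='rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,hard,proto=tcp,nconnect=16,write=eager'

for want in nconnect=16 write=eager; do
    if has_opt "$opts" "$want"; then
        echo "ok: $want"
    else
        echo "missing: $want"
    fi
done
```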

With the above in place I no longer see any I/O stalls longer than one second
(a 2 second time difference in my date loop).
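
The date loop itself is not shown in this message. A minimal sketch of such a
loop, to be run inside the guest, might look like the following. The function
names, the probe file path, and the choice to write and sync on every round are
all assumptions made for illustration.

```shell
#!/bin/sh
# gap_exceeds PREV NOW LIMIT: succeed if NOW - PREV is greater than LIMIT seconds.
gap_exceeds() {
    [ $(( $2 - $1 )) -gt "$3" ]
}

# watch_stalls LIMIT [ITERATIONS] [PROBE_FILE]
# Samples the clock once per second, forces a small write each round, and
# reports any gap between samples that is larger than LIMIT seconds.
watch_stalls() {
    limit=$1
    count=${2:-0}                      # 0 means run until interrupted
    probe=${3:-./stall-probe}          # hypothetical path on the guest disk
    prev=$(date +%s)
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        sleep 1
        echo "$prev" > "$probe" && sync    # blocks while guest I/O is stalled
        now=$(date +%s)
        if gap_exceeds "$prev" "$now" "$limit"; then
            echo "stall: $(( now - prev )) s gap ending at $(date)"
        fi
        prev=$now
        i=$(( i + 1 ))
    done
}

# Example: report every gap longer than one second until interrupted.
# watch_stalls 1
```

Because each round sleeps for one second, a healthy guest shows a gap of about
one second; anything reported above the limit is time the loop spent blocked.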

Thank you two again for helping me out with this.

Cheers,
	Thomas

PS: With cache=writethrough and without I/O threads, I/O stalls for as long as
the blockcommit executes.
