From: Thomas Glanzmann <thomas@glanzmann.de>
To: kvm@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: I/O stalls when merging qcow2 snapshots on nfs
Date: Sun, 5 May 2024 13:29:28 +0200
Message-ID: <ZjdtmFu92mSlaHZ2@glanzmann.de>
Hello,
I often take snapshots in order to move kvm VMs from one nfs share to
another while they're running, or to take backups. Some of my VMs are
very large (1.1 TB) and take a long time (40 minutes to 2 hours) to
back up or move. They also write between 20 and 60 GB of data while
being backed up or moved. Once the backup or move is done, the dirty
snapshot data needs to be merged into the parent disk. While this
happens I often see I/O stalls within the VMs in the range of 1 - 20
seconds, sometimes worse. Some of my VMs are very latency sensitive and
crash or misbehave after I/O stalls of 15 seconds. So I would like to
know if there is any tuning I can do to make these I/O stalls shorter.
- I already tried setting vm.dirty_expire_centisecs=100, which appears
to make it better, but it does not bring the stalls under 15 seconds.
Ideally I/O stalls would last no more than 1 second.
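For reference, vm.dirty_expire_centisecs is given in hundredths of a second, so the values tried in this report (3000, the kernel default; 100; and 10) correspond to 30 s, 1 s and 0.1 s. A small sketch of that conversion; the helper name centisecs_to_seconds is made up for illustration, and the sysctl commands are left as comments because changing the value needs root:

```shell
# Hypothetical helper: convert a vm.dirty_expire_centisecs value
# (hundredths of a second) into seconds.
centisecs_to_seconds() {
    awk -v cs="$1" 'BEGIN { printf "%g\n", cs / 100 }'
}

centisecs_to_seconds 3000   # 30  (kernel default)
centisecs_to_seconds 100    # 1
centisecs_to_seconds 10     # 0.1

# Show the current value, then change it at runtime (as root):
#   sysctl vm.dirty_expire_centisecs
#   sysctl -w vm.dirty_expire_centisecs=100
```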
This is how you can reproduce the issue:
- NFS Server:
mkdir /ssd
apt install -y nfs-kernel-server
echo '/ssd 0.0.0.0/0.0.0.0(rw,no_root_squash,no_subtree_check,sync)' > /etc/exports
exportfs -ra
- NFS Client / KVM Host:
mount server:/ssd /mnt
# Put a VM on /mnt and start it.
# Create a snapshot:
virsh snapshot-create-as --domain testy guest-state1 --diskspec vda,file=/mnt/overlay.qcow2 --disk-only --atomic --no-metadata
- In the VM:
# Write some data (in my case 6 GB of data are written in 60 seconds because
# the nfs client is connected with a 1 Gbit/s link)
fio --ioengine=libaio --filesize=32G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=write --blocksize=1m --iodepth=1 --readwrite=write --unlink=1
# Do some synchronous I/O
while true; do date | tee -a date.log; sync; sleep 1; done
- On the NFS Client / KVM host:
# Merge the snapshot into the parent disk
time virsh blockcommit testy vda --active --pivot --delete
Successfully pivoted
real 1m4.666s
user 0m0.017s
sys 0m0.007s
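The rates implied by this reproduction can be checked quickly. A rough sketch, assuming decimal units (1 GB = 1000 MB, 1 Gbit/s = 125 MB/s) and assuming the overlay holds roughly the 6 GB written by fio when blockcommit runs; rate_mb_s is a hypothetical helper name:

```shell
# Hypothetical helper: average throughput in MB/s for a given amount
# of data (GB, decimal units) moved in a given number of seconds.
rate_mb_s() {
    awk -v gb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / s }'
}

rate_mb_s 6 60        # fio in the VM: 100.0 MB/s
rate_mb_s 6 64.666    # blockcommit, assuming ~6 GB in the overlay: 92.8 MB/s
# For comparison, 1 Gbit/s is 1000 Mbit/s / 8 = 125 MB/s before protocol overhead.
```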
I exported the nfs share with sync on purpose because I often use DRBD
in synchronous mode (protocol C) to replicate the data on the nfs server
to a site 200 km away over a 10 Gbit/s link.
The result is:
(testy) [~] while true; do date | tee -a date.log; sync; sleep 1; done
Sun May 5 12:53:36 CEST 2024
Sun May 5 12:53:37 CEST 2024
Sun May 5 12:53:38 CEST 2024
Sun May 5 12:53:39 CEST 2024
Sun May 5 12:53:40 CEST 2024
Sun May 5 12:53:41 CEST 2024 < here I started virsh blockcommit
Sun May 5 12:53:45 CEST 2024
Sun May 5 12:53:50 CEST 2024
Sun May 5 12:53:59 CEST 2024
Sun May 5 12:54:04 CEST 2024
Sun May 5 12:54:22 CEST 2024
Sun May 5 12:54:23 CEST 2024
Sun May 5 12:54:27 CEST 2024
Sun May 5 12:54:32 CEST 2024
Sun May 5 12:54:40 CEST 2024
Sun May 5 12:54:42 CEST 2024
Sun May 5 12:54:45 CEST 2024
Sun May 5 12:54:46 CEST 2024
Sun May 5 12:54:47 CEST 2024
Sun May 5 12:54:48 CEST 2024
Sun May 5 12:54:49 CEST 2024
This is with 'vm.dirty_expire_centisecs=100'; with the default value
'vm.dirty_expire_centisecs=3000' it is worse.
I/O stalls:
- 4 seconds
- 5 seconds
- 9 seconds
- 5 seconds
- 18 seconds
- 4 seconds
- 5 seconds
- 8 seconds
- 2 seconds
- 3 seconds
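The stall durations above are the gaps between consecutive timestamps in date.log. A sketch that extracts them with awk, assuming the date output format shown here (time of day in the fourth field), gaps shorter than 24 hours, and treating any gap longer than the loop's nominal 1-second period as a stall; 'stalls' is a hypothetical helper name:

```shell
# Hypothetical helper: print every gap between consecutive timestamps
# in a date.log that is longer than the loop's nominal 1-second period.
stalls() {
    awk '{
        split($4, t, ":")                       # "12:53:41" -> 12, 53, 41
        now = t[1] * 3600 + t[2] * 60 + t[3]    # seconds since midnight
        if (NR > 1) {
            gap = now - prev
            if (gap < 0) gap += 86400           # crossed midnight
            if (gap > 1) printf "%d seconds\n", gap
        }
        prev = now
    }' "${1:-date.log}"
}
```

In the VM it can be run as 'stalls date.log'. Because the loop appends with 'tee -a', date.log may contain several runs, and the pause between two runs will also be reported as a stall.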
With the default vm.dirty_expire_centisecs=3000 I get something like this:
(testy) [~] while true; do date | tee -a date.log; sync; sleep 1; done
Sun May 5 11:51:33 CEST 2024
Sun May 5 11:51:34 CEST 2024
Sun May 5 11:51:35 CEST 2024
Sun May 5 11:51:37 CEST 2024
Sun May 5 11:51:38 CEST 2024
Sun May 5 11:51:39 CEST 2024
Sun May 5 11:51:40 CEST 2024 << virsh blockcommit
Sun May 5 11:51:49 CEST 2024
Sun May 5 11:52:07 CEST 2024
Sun May 5 11:52:08 CEST 2024
Sun May 5 11:52:27 CEST 2024
Sun May 5 11:52:45 CEST 2024
Sun May 5 11:52:47 CEST 2024
Sun May 5 11:52:48 CEST 2024
Sun May 5 11:52:49 CEST 2024
I/O stalls:
- 9 seconds
- 18 seconds
- 19 seconds
- 18 seconds
- 2 seconds
I'm open to any suggestions that improve the situation. I often have a
10 Gbit/s network and a lot of dirty buffer cache, but at the same time
I often replicate synchronously to a second site 200 km away, which only
gives me around 100 MB/s of write throughput.
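The mismatch described here can be quantified. Assuming decimal units and a sustained write-back rate of about 100 MB/s, a 10 Gbit/s link can deliver data about 12.5 times faster than it can be written to the replicated storage, and merging the 20 - 60 GB of snapshot data mentioned earlier means writing at that rate for roughly 200 - 600 seconds. A sketch of the arithmetic; gbit_to_mb_s and drain_seconds are hypothetical helper names:

```shell
# Decimal units throughout: 1 Gbit/s = 125 MB/s, 1 GB = 1000 MB.
# Hypothetical helper: convert a link speed in Gbit/s to MB/s.
gbit_to_mb_s() { awk -v g="$1" 'BEGIN { printf "%g\n", g * 1000 / 8 }'; }
# Hypothetical helper: seconds to write a given number of GB at a given MB/s.
drain_seconds() { awk -v gb="$1" -v r="$2" 'BEGIN { printf "%.0f\n", gb * 1000 / r }'; }

gbit_to_mb_s 10        # 1250 MB/s that a 10 Gbit/s link can carry
drain_seconds 20 100   # 200 s to write back 20 GB at ~100 MB/s
drain_seconds 60 100   # 600 s to write back 60 GB at ~100 MB/s
```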
With vm.dirty_expire_centisecs=10 it is even worse:
(testy) [~] while true; do date | tee -a date.log; sync; sleep 1; done
Sun May 5 13:25:31 CEST 2024
Sun May 5 13:25:32 CEST 2024
Sun May 5 13:25:33 CEST 2024
Sun May 5 13:25:34 CEST 2024
Sun May 5 13:25:35 CEST 2024
Sun May 5 13:25:36 CEST 2024
Sun May 5 13:25:37 CEST 2024 < virsh blockcommit
Sun May 5 13:26:00 CEST 2024
Sun May 5 13:26:01 CEST 2024
Sun May 5 13:26:06 CEST 2024
Sun May 5 13:26:11 CEST 2024
Sun May 5 13:26:40 CEST 2024
Sun May 5 13:26:42 CEST 2024
Sun May 5 13:26:43 CEST 2024
Sun May 5 13:26:44 CEST 2024
I/O stalls:
- 23 seconds
- 5 seconds
- 5 seconds
- 29 seconds
- 2 seconds
Cheers,
Thomas