public inbox for linux-xfs@vger.kernel.org
From: "Darrick J. Wong" <djwong@kernel.org>
To: Christoph Hellwig <hch@infradead.org>
Cc: Catherine Hoang <catherine.hoang@oracle.com>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH v4] xfs: allow read IO and FICLONE to run concurrently
Date: Thu, 19 Oct 2023 13:04:11 -0700
Message-ID: <20231019200411.GN3195650@frogsfrogsfrogs>
In-Reply-To: <ZS92TizgnKHdBtDb@infradead.org>

On Tue, Oct 17, 2023 at 11:08:14PM -0700, Christoph Hellwig wrote:
> On Tue, Oct 17, 2023 at 01:12:08PM -0700, Catherine Hoang wrote:
> > One of our VM cluster management products needs to snapshot KVM image
> > files so that they can be restored in case of failure. Snapshotting is
> > done by redirecting VM disk writes to a sidecar file and using reflink
> > on the disk image, specifically the FICLONE ioctl as used by
> > "cp --reflink". Reflink locks the source and destination files while it
> > operates, which means that reads from the main VM disk image are blocked,
> > causing the VM to stall. When an image file is heavily fragmented, the
> > copy process could take several minutes. Some of the VM image files have
> > 50-100 million extent records, and duplicating that much metadata locks
> > the file for 30 minutes or more. Having activities suspended for such
> > a long time in a cluster node could result in node eviction.
> > 
> > Clone operations and read IO do not change any data in the source file,
> > so they should be able to run concurrently. Demote the exclusive locks
> > taken by FICLONE to shared locks to allow reads while cloning. While a
> > clone is in progress, writes will take the IOLOCK_EXCL, so they block
> > until the clone completes.
> 
> Sorry for being pesky, but do you have some rough numbers on how
> much this actually helps with the above workload?

Well... the stupid answer is that I augmented generic/176 to try to race
buffered and direct reads with cloning a million extents and print out
when the racing reads completed.  On an unpatched kernel, the reads
don't complete until the reflink does:

--- /tmp/fstests/tests/generic/176.out  2023-07-11 12:18:21.617971250 -0700
+++ /var/tmp/fstests/generic/176.out.bad        2023-10-19 10:22:04.771017812 -0700
@@ -2,3 +2,8 @@
 Format and mount
 Create a many-block file
 Reflink the big file
+start reflink Thu Oct 19 10:19:19 PDT 2023
+end reflink Thu Oct 19 10:20:06 PDT 2023
+buffered read ioend Thu Oct 19 10:20:06 PDT 2023
+direct read ioend Thu Oct 19 10:20:06 PDT 2023
+finished waiting Thu Oct 19 10:20:06 PDT 2023

Yowza, most of a minute's worth of read latency (47 seconds)!  On a
patched kernel, the reads complete while the clone is running:

--- /tmp/fstests/tests/generic/176.out  2023-07-11 12:18:21.617971250 -0700
+++ /var/tmp/fstests/generic/176.out.bad        2023-10-19 10:22:25.528685643 -0700
@@ -2,3 +2,552 @@
 Format and mount
 Create a many-block file
 Reflink the big file
+start reflink Thu Oct 19 10:19:24 PDT 2023
+buffered read ioend Thu Oct 19 10:19:24 PDT 2023
+direct read ioend Thu Oct 19 10:19:24 PDT 2023
+buffered read ioend Thu Oct 19 10:19:24 PDT 2023
+direct read ioend Thu Oct 19 10:19:24 PDT 2023
+buffered read ioend Thu Oct 19 10:19:24 PDT 2023
+buffered read ioend Thu Oct 19 10:19:24 PDT 2023
+buffered read ioend Thu Oct 19 10:19:25 PDT 2023
+buffered read ioend Thu Oct 19 10:19:25 PDT 2023
+direct read ioend Thu Oct 19 10:19:25 PDT 2023
...
+buffered read ioend Thu Oct 19 10:20:06 PDT 2023
+buffered read ioend Thu Oct 19 10:20:07 PDT 2023
+buffered read ioend Thu Oct 19 10:20:07 PDT 2023
+direct read ioend Thu Oct 19 10:20:07 PDT 2023
+buffered read ioend Thu Oct 19 10:20:07 PDT 2023
+buffered read ioend Thu Oct 19 10:20:07 PDT 2023
+buffered read ioend Thu Oct 19 10:20:07 PDT 2023
+end reflink Thu Oct 19 10:20:07 PDT 2023
+direct read ioend Thu Oct 19 10:20:07 PDT 2023
+finished waiting Thu Oct 19 10:20:07 PDT 2023
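
To put a number on the two runs, here is a rough sketch that reads the
stall off the timestamps in the output above.  The helper names
(parse_events, first_read_delay) are made up for illustration and are not
part of fstests; the sample strings are copied from the two diffs.  It
drops the weekday and the "PDT" zone field, which is safe here only
because every timestamp in a run is in the same zone, and it assumes
English month abbreviations.

```python
from datetime import datetime


def parse_events(output):
    """Return (label, datetime) pairs from lines such as
    '+start reflink Thu Oct 19 10:19:19 PDT 2023'."""
    events = []
    for line in output.splitlines():
        fields = line.lstrip("+").split()
        if len(fields) < 7:
            continue  # golden-output text, diff headers, or the "..." elision
        label = " ".join(fields[:-6])
        # Trailing fields: weekday, month, day, time, zone, year.
        # Weekday and zone are dropped (assumption: one zone per run).
        _, month, day, clock, _, year = fields[-6:]
        try:
            stamp = datetime.strptime(
                f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y"
            )
        except ValueError:
            continue  # not a timestamped line after all
        events.append((label, stamp))
    return events


def first_read_delay(output):
    """Seconds from 'start reflink' to the first read completion."""
    events = parse_events(output)
    start = next(t for label, t in events if label == "start reflink")
    first = next(t for label, t in events if label.endswith("read ioend"))
    return (first - start).total_seconds()


UNPATCHED = """\
+start reflink Thu Oct 19 10:19:19 PDT 2023
+end reflink Thu Oct 19 10:20:06 PDT 2023
+buffered read ioend Thu Oct 19 10:20:06 PDT 2023
+direct read ioend Thu Oct 19 10:20:06 PDT 2023
"""

PATCHED = """\
+start reflink Thu Oct 19 10:19:24 PDT 2023
+buffered read ioend Thu Oct 19 10:19:24 PDT 2023
+direct read ioend Thu Oct 19 10:19:24 PDT 2023
"""

print(first_read_delay(UNPATCHED))  # 47.0
print(first_read_delay(PATCHED))    # 0.0
```

The timestamps only have one-second resolution, so 0.0 for the patched
run means "under a second", not literally zero.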

So as you can see, reads from the reflink source file no longer
experience a giant latency spike.  I also wrote an fstest to check this
behavior; I'll attach it as a separate reply.
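
For anyone who wants to poke at this outside of fstests, a minimal
userspace sketch of the same kind of race might look like the following.
This is not the fstest mentioned above and not the generic/176 change;
race_reads_with_clone is a hypothetical helper, the FICLONE value is the
_IOW(0x94, 9, int) encoding from <linux/fs.h> on architectures that use
the generic ioctl layout, and it only exercises buffered reads.  On a
filesystem without reflink support the ioctl raises OSError.

```python
import fcntl
import os
import threading
import time

# Assumption: generic ioctl encoding (x86, arm64, ...); _IOW(0x94, 9, int).
FICLONE = 0x40049409


def race_reads_with_clone(src_path, dst_path, block=4096):
    """Clone src_path to dst_path with FICLONE while a second thread keeps
    issuing buffered reads on src_path.

    Returns (clone_seconds, offsets), where offsets lists when each read
    finished, in seconds after the clone started, for reads that completed
    before the clone returned.
    """
    stop = threading.Event()
    read_times = []

    def reader():
        fd = os.open(src_path, os.O_RDONLY)
        try:
            while not stop.is_set():
                os.pread(fd, block, 0)
                read_times.append(time.monotonic())
        finally:
            os.close(fd)

    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    thread = threading.Thread(target=reader)
    start = time.monotonic()
    thread.start()
    try:
        # ioctl(dest_fd, FICLONE, src_fd): share all of src's extents.
        fcntl.ioctl(dst, FICLONE, src)
    finally:
        end = time.monotonic()
        stop.set()
        thread.join()
        os.close(src)
        os.close(dst)
    return end - start, [t - start for t in read_times if t <= end]
```

With a heavily fragmented source file on a reflink-enabled XFS, I would
expect the returned list to be empty (or nearly so) on an unpatched
kernel and to have entries spread across the whole clone on a patched
one, matching the two outputs above.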

> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D
