public inbox for linux-xfs@vger.kernel.org
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Catherine Hoang <catherine.hoang@oracle.com>,
	linux-xfs@vger.kernel.org
Subject: Re: [PATCH v4] xfs: allow read IO and FICLONE to run concurrently
Date: Mon, 23 Oct 2023 08:40:09 -0700
Message-ID: <20231023154009.GU3195650@frogsfrogsfrogs>
In-Reply-To: <ZTWlc3R95DPLOjw3@dread.disaster.area>

On Mon, Oct 23, 2023 at 09:42:59AM +1100, Dave Chinner wrote:
> On Fri, Oct 20, 2023 at 08:34:48AM -0700, Darrick J. Wong wrote:
> > On Thu, Oct 19, 2023 at 11:06:42PM -0700, Christoph Hellwig wrote:
> > > On Thu, Oct 19, 2023 at 01:04:11PM -0700, Darrick J. Wong wrote:
> > > > Well... the stupid answer is that I augmented generic/176 to try to race
> > > > buffered and direct reads with cloning a million extents and print out
> > > > when the racing reads completed.  On an unpatched kernel, the reads
> > > > don't complete until the reflink does:
> > > 
> > > > So as you can see, reads from the reflink source file no longer
> > > > experience a giant latency spike.  I also wrote an fstest to check this
> > > > behavior; I'll attach it as a separate reply.
> > > 
> > > Nice.  I guess write latency doesn't really matter for this use
> > > case?
> > 
> > Nope -- they've gotten libvirt to tell qemu to redirect vm disk writes
> > to a new sidecar file.  Then they reflink the original source file to
> > the backup file, but they want qemu to be able to service reads from
> > that original source file while the reflink is ongoing.  When the backup
> > is done, they commit the sidecar contents back into the original image.
> > 
> > It would be kinda neat if we had file range locks.  The reflink
> > could shorten its locked range as it progresses.  If the thread doing the
> > reflink could find out that another thread has blocked on part of the
> > file range, it could even hurry up and clone that part so that neither
> > reads nor writes would see enormous latency spikes.
> > 
> > Even better, we could actually support concurrent reads and writes to
> > the page cache as long as the ranges don't overlap.  But that's all
> > speculative until Dave dumps his old ranged lock patchset on the list.
> 
> The unfortunate reality is that range locks as I was trying to
> implement them didn't scale - it was a failed experiment.
> 
> The issue is the internal tracking structure of a range lock. It has
> to be concurrency safe itself, and even with lockless tree
> structures using per-node seqlocks for internal sequencing, they
> still rely on atomic ops for safe concurrent access and updates.
> 
> Hence the best I could get out of an uncontended range lock (i.e.
> locking different exclusive ranges concurrently) was about 400,000
> lock/unlock operations per second before the internal tracking
> structure broke down under concurrent modification pressure.  That
> was a whole lot better than previous attempts that topped out at
> ~150,000 lock/unlock ops/s, but it's still far short of the ~3
> million concurrent shared lock/unlock ops/s that a rwsem could do on
> that same machine.
> 
> Worse for range locks was that once past peak performance,
> internal contention within the range lock caused performance to fall
> off a cliff and end up much worse than just using pure
> exclusive locking with a mutex.
> 
> Hence without some novel new internal lockless and memory allocation
> free tracking structure and algorithm, range locks will suck for the
> one thing we want them for: high performance, highly concurrent
> access to discrete ranges of a single file.

Ah.  Thanks for the reminder about that.

--D

> -Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com
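
To make the tracking-structure problem from the quoted explanation concrete,
here is a minimal toy sketch in userspace Python.  It is not kernel code and
not the design from Dave's patchset: the names ToyRangeLock, try_lock and
unlock are hypothetical, and the tracking structure is assumed to be a plain
list guarded by one mutex rather than a lockless tree with per-node seqlocks.
It only shows that callers locking disjoint ranges still have to synchronize
on the shared tracking state.

```python
import threading


class ToyRangeLock:
    """Exclusive byte-range lock; the range list is guarded by one mutex."""

    def __init__(self):
        # Hypothetical simplification: every lock/unlock call serializes here,
        # even when the requested ranges never overlap.
        self._guard = threading.Lock()
        self._held = []  # list of (start, end) half-open ranges

    def try_lock(self, start, end):
        """Claim [start, end) if it overlaps no held range; return success."""
        if start >= end:
            raise ValueError("empty range")
        with self._guard:
            for s, e in self._held:
                if start < e and s < end:  # half-open interval overlap test
                    return False
            self._held.append((start, end))
            return True

    def unlock(self, start, end):
        """Release a range previously claimed with try_lock."""
        with self._guard:
            self._held.remove((start, end))


lock = ToyRangeLock()
first = lock.try_lock(0, 4096)       # disjoint from everything held so far
second = lock.try_lock(4096, 8192)   # disjoint from the first range
third = lock.try_lock(1024, 2048)    # overlaps the first range
```

The two disjoint requests both succeed and the overlapping one is refused,
but all three calls had to take the same internal mutex to find that out.
Swapping the mutex for a lockless structure removes the sleeping lock, not
the shared state that every caller has to touch.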

Thread overview: 10+ messages
2023-10-17 20:12 [PATCH v4] xfs: allow read IO and FICLONE to run concurrently Catherine Hoang
2023-10-17 22:59 ` Darrick J. Wong
2023-10-17 23:59 ` Dave Chinner
2023-10-23  6:42   ` Chandan Babu R
2023-10-18  6:08 ` Christoph Hellwig
2023-10-19 20:04   ` Darrick J. Wong
2023-10-20  6:06     ` Christoph Hellwig
2023-10-20 15:34       ` Darrick J. Wong
2023-10-22 22:42         ` Dave Chinner
2023-10-23 15:40           ` Darrick J. Wong [this message]
