From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>,
	Catherine Hoang <catherine.hoang@oracle.com>,
	linux-xfs@vger.kernel.org
Subject: Re: [PATCH v4] xfs: allow read IO and FICLONE to run concurrently
Date: Mon, 23 Oct 2023 09:42:59 +1100
Message-ID: <ZTWlc3R95DPLOjw3@dread.disaster.area>
In-Reply-To: <20231020153448.GR3195650@frogsfrogsfrogs>

On Fri, Oct 20, 2023 at 08:34:48AM -0700, Darrick J. Wong wrote:
> On Thu, Oct 19, 2023 at 11:06:42PM -0700, Christoph Hellwig wrote:
> > On Thu, Oct 19, 2023 at 01:04:11PM -0700, Darrick J. Wong wrote:
> > > Well... the stupid answer is that I augmented generic/176 to try to race
> > > buffered and direct reads with cloning a million extents and print out
> > > when the racing reads completed.  On an unpatched kernel, the reads
> > > don't complete until the reflink does:
> > 
> > > So as you can see, reads from the reflink source file no longer
> > > experience a giant latency spike.  I also wrote an fstest to check this
> > > behavior; I'll attach it as a separate reply.
> > 
> > Nice.  I guess write latency doesn't really matter for this use
> > case?
> 
> Nope -- they've gotten libvirt to tell qemu to redirect vm disk writes
> to a new sidecar file.  Then they reflink the original source file to
> the backup file, but they want qemu to be able to service reads from
> that original source file while the reflink is ongoing.  When the backup
> is done, they commit the sidecar contents back into the original image.
> 
> It would be kinda neat if we had file range locks.  Regular progress
> could shorten the range as it makes progress.  If the thread doing the
> reflink could find out that another thread has blocked on part of the
> file range, it could even hurry up and clone that part so that neither
> reads nor writes would see enormous latency spikes.
> 
> Even better, we could actually support concurrent reads and writes to
> the page cache as long as the ranges don't overlap.  But that's all
> speculative until Dave dumps his old ranged lock patchset on the list.
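In the backup workflow quoted above, the reflink step is a FICLONE ioctl. It is issued on the destination file descriptor, with the source file descriptor as its argument. The following is a minimal user-space sketch. It assumes Linux headers that define FICLONE in <linux/fs.h>. The helpers reflink_file() and backup_image() are hypothetical and are not taken from libvirt, qemu, or the patch under review. On filesystems without reflink support the ioctl fails (typically with EOPNOTSUPP, or EXDEV across filesystems), and the helper just hands back the negated errno.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>     /* FICLONE */
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: make dst_path share all of src_path's extents
 * (a whole-file reflink). Returns 0 on success or a negated errno. */
static int reflink_file(const char *src_path, const char *dst_path)
{
    int src = open(src_path, O_RDONLY);
    if (src < 0)
        return -errno;

    int dst = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dst < 0) {
        int err = -errno;   /* save errno before close() can clobber it */
        close(src);
        return err;
    }

    /* FICLONE is issued on the destination fd; the source fd is the argument. */
    int ret = 0;
    if (ioctl(dst, FICLONE, src) < 0)
        ret = -errno;

    close(dst);
    close(src);
    return ret;
}

/* Hypothetical usage wrapper: report a readable error, e.g. when the
 * filesystem does not support reflink. */
static int backup_image(const char *image, const char *backup)
{
    int ret = reflink_file(image, backup);
    if (ret < 0)
        fprintf(stderr, "reflink %s -> %s failed: %s\n",
                image, backup, strerror(-ret));
    return ret;
}
```

On a reflink-enabled XFS filesystem, backup_image("vm.img", "vm-backup.img") should return 0 without copying any data blocks. Whether reads of the source file can be serviced while that call is in progress is the behaviour this patch changes.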

The unfortunate reality is that range locks as I was trying to
implement them didn't scale - it was a failed experiment.

The issue is the internal tracking structure of a range lock. It has
to be concurrency-safe itself, and even lockless tree structures that
use per-node seqlocks for internal sequencing still rely on atomic
ops for safe concurrent access and updates.
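The following deliberately naive range lock makes the point about the tracking structure concrete. It is a hypothetical user-space C sketch using pthreads and is not the patchset described above, which used lockless trees with per-node seqlocks. Here a single internal mutex protects a linked list of held ranges. Two threads locking disjoint ranges therefore still serialise on that mutex, and every acquisition allocates a tracking node. That is the internal cost a range lock pays before it can grant anything.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* One held byte range [start, end). */
struct held_range {
    uint64_t start, end;
    bool exclusive;
    struct held_range *next;
};

/* Hypothetical naive range lock: all lock/unlock calls, even for
 * non-overlapping ranges, go through one internal mutex. */
struct range_lock {
    pthread_mutex_t lock;
    pthread_cond_t wait;
    struct held_range *held;
};

static void range_lock_init(struct range_lock *rl)
{
    pthread_mutex_init(&rl->lock, NULL);
    pthread_cond_init(&rl->wait, NULL);
    rl->held = NULL;
}

static void range_lock_destroy(struct range_lock *rl)
{
    pthread_cond_destroy(&rl->wait);
    pthread_mutex_destroy(&rl->lock);
}

/* Ranges conflict if they overlap and at least one side is exclusive. */
static bool conflicts(const struct held_range *h, uint64_t start,
                      uint64_t end, bool excl)
{
    return h->start < end && start < h->end && (excl || h->exclusive);
}

static struct held_range *range_lock_acquire(struct range_lock *rl,
                                             uint64_t start, uint64_t end,
                                             bool excl)
{
    /* Tracking node allocated on every lock operation. */
    struct held_range *r = malloc(sizeof(*r));
    if (!r)
        return NULL;
    r->start = start;
    r->end = end;
    r->exclusive = excl;

    pthread_mutex_lock(&rl->lock);
    for (;;) {
        bool blocked = false;
        for (struct held_range *h = rl->held; h; h = h->next)
            if (conflicts(h, start, end, excl)) {
                blocked = true;
                break;
            }
        if (!blocked)
            break;
        pthread_cond_wait(&rl->wait, &rl->lock);  /* wait for a release */
    }
    r->next = rl->held;
    rl->held = r;
    pthread_mutex_unlock(&rl->lock);
    return r;
}

static void range_lock_release(struct range_lock *rl, struct held_range *r)
{
    pthread_mutex_lock(&rl->lock);
    for (struct held_range **pp = &rl->held; *pp; pp = &(*pp)->next)
        if (*pp == r) {
            *pp = r->next;
            break;
        }
    pthread_cond_broadcast(&rl->wait);  /* wake every waiter to re-check */
    pthread_mutex_unlock(&rl->lock);
    free(r);
}
```

The linear list scan and the broadcast wakeup are simplifications, and a waiter on a busy range can be starved. Swapping the mutex and list for a lockless tree removes the single serialisation point, but atomic operations on shared nodes remain.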

Hence the best I could get out of an uncontended range lock (i.e.
locking different exclusive ranges concurrently) was about 400,000
lock/unlock operations per second before the internal tracking
structure broke down under concurrent modification pressure.  That
was a whole lot better than previous attempts that topped out at
~150,000 lock/unlock ops/s, but it's still far short of the ~3
million concurrent shared lock/unlock ops/s that a rwsem could do on
that same machine.
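The following block puts the approximate figures above side by side. The uncontended range lock reached about 13% of the rwsem's shared-lock throughput, even after a roughly 2.7x improvement over earlier attempts. The inverse of each throughput is the average interval between completed operations across the whole machine, not the latency seen by any single thread.

```latex
\begin{gathered}
\frac{3\times10^{6}\ \text{ops/s (rwsem, shared)}}{4\times10^{5}\ \text{ops/s (range lock)}} = 7.5,
\qquad
\frac{4\times10^{5}\ \text{ops/s}}{1.5\times10^{5}\ \text{ops/s}} \approx 2.7 \\
\frac{1}{4\times10^{5}\ \text{ops/s}} = 2.5\ \mu\text{s},
\qquad
\frac{1}{3\times10^{6}\ \text{ops/s}} \approx 0.33\ \mu\text{s}
\end{gathered}
```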

Worse for range locks was that once past peak performance, internal
contention within the range lock caused performance to fall off a
cliff, ending up much worse than just using pure exclusive locking
with a mutex.

Hence without some novel internal tracking structure and algorithm
that is both lockless and free of memory allocation, range locks will
suck for the one thing we want them for: high performance, highly
concurrent access to discrete ranges of a single file.

-Dave.

-- 
Dave Chinner
david@fromorbit.com

Thread overview: 10+ messages
2023-10-17 20:12 [PATCH v4] xfs: allow read IO and FICLONE to run concurrently Catherine Hoang
2023-10-17 22:59 ` Darrick J. Wong
2023-10-17 23:59 ` Dave Chinner
2023-10-23  6:42   ` Chandan Babu R
2023-10-18  6:08 ` Christoph Hellwig
2023-10-19 20:04   ` Darrick J. Wong
2023-10-20  6:06     ` Christoph Hellwig
2023-10-20 15:34       ` Darrick J. Wong
2023-10-22 22:42         ` Dave Chinner [this message]
2023-10-23 15:40           ` Darrick J. Wong
