From: Dave Chinner <david@fromorbit.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: Jan Kara <jack@suse.cz>,
	viro@zeniv.linux.org.uk, brauner@kernel.org,
	linux-fsdevel@vger.kernel.org,
	Amir Goldstein <amir73il@gmail.com>
Subject: Re: [PATCH] fs: Add a new flag RWF_IOWAIT for preadv2(2)
Date: Wed, 7 Aug 2024 07:52:26 +1000
Message-ID: <ZrKbGuKFsZsqnrfg@dread.disaster.area>
In-Reply-To: <CALOAHbASNdPPRXVAxcjVWW7ucLG_DOM+6dpoonqAPpgBS00b7w@mail.gmail.com>

On Tue, Aug 06, 2024 at 10:05:50PM +0800, Yafang Shao wrote:
> On Tue, Aug 6, 2024 at 9:24 PM Jan Kara <jack@suse.cz> wrote:
> > On Tue 06-08-24 19:54:58, Yafang Shao wrote:
> > > Its guarantee is clear:
> > >
> > >   : I/O is intended to be atomic to ordinary files and pipes and FIFOs.
> > >   : Atomic means that all the bytes from a single operation that started
> > >   : out together end up together, without interleaving from other I/O
> > >   : operations.
> >
> > Oh, I understand why XFS does locking this way and I'm well aware this is
> > a requirement in POSIX. However, as you have experienced, it has a
> > significant performance cost for certain workloads (at least with simple
> > locking protocol we have now) and history shows users rather want the extra
> > performance at the cost of being a bit more careful in userspace. So I
> > don't see any filesystem switching to XFS behavior until we have a
> > performant range locking primitive.
> >
> > > What this flag does is avoid waiting for this type of lock if it
> > > exists. Maybe we should consider a more descriptive name like
> > > RWF_NOATOMICWAIT, RWF_NOFSLOCK, or RWF_NOPOSIXWAIT? Naming is always
> > > challenging.
> >
> > Aha, OK. So you want the flag to mean "I don't care about POSIX read-write
> > exclusion". I'm still not convinced the flag is a great idea but
> > RWF_NOWRITEEXCLUSION could perhaps better describe the intent of the flag.
> 
> That's better. Should we proceed with implementing this new flag? It
> provides users with an option to avoid this type of issue.

No. If we are going to add a flag like that, the fix to XFS isn't to
use IOCB_NOWAIT on reads; it's to use shared locking for buffered
writes, just like we do for direct IO.

IOWs, this flag would be needed on -writes-, not reads, and at that
point we may as well just change XFS to do shared buffered writes
for -everyone- so it is consistent with all other Linux filesystems.

Indeed, last time Amir brought this up, I suggested that shared
buffered write locking in XFS was the simplest way forward. Given
that we use large folios now, small IOs get mapped to a single folio
and so will still have the same write vs overlapping write exclusion
behaviour almost all the time.
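A rough way to check the "single folio" claim: the helper below asks
whether a byte range falls inside one folio, presumably the case in
which the folio lock held during a buffered write copy serialises
overlapping small writes. It assumes naturally aligned, power-of-two
folios of one fixed size, which is a simplification (page cache folio
sizes vary), and io_fits_one_folio() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: does [pos, pos + len) lie within a single naturally
 * aligned folio of folio_size bytes? Assumes folio_size is a power of two
 * and len > 0.
 */
static bool io_fits_one_folio(uint64_t pos, uint64_t len, uint64_t folio_size)
{
	assert(folio_size != 0 && (folio_size & (folio_size - 1)) == 0);
	assert(len > 0);

	uint64_t first = pos / folio_size;		/* folio index of first byte */
	uint64_t last = (pos + len - 1) / folio_size;	/* folio index of last byte */
	return first == last;
}
```

For example, a 4 KiB write at offset 4096 fits in one 64 KiB folio,
while a 1000-byte write at offset 65000 straddles two, so writes of
that shape are the exception rather than the rule.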

However, since then we've moved to using shared IO locking for
cloning files. A clone does not modify data, so read IO is allowed
during the clone, but writes must still be excluded. If we move
buffered writes to shared locking, a clone holding the lock shared
no longer keeps them out, and this breaks file cloning. We would
have to move cloning back to using exclusive locking, and that's
going to cause performance and IO latency regressions for
applications using clones with concurrent IO (e.g. VM image
snapshots in cloud infrastructure).
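The trade-off can be written down as a lock-compatibility table. This
is a simplified model, not XFS code: each operation takes the IO lock
either shared or exclusive, and two operations may run concurrently
only if both take it shared. Clone is modelled only in the phase where
it holds the lock shared, and the policy names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not XFS code: operations that take the per-inode IO lock. */
enum io_op { OP_READ, OP_BUFFERED_WRITE, OP_CLONE, OP_COUNT };
enum lock_mode { MODE_SHARED, MODE_EXCL };

struct lock_policy {
	enum lock_mode mode[OP_COUNT];	/* lock mode taken by each io_op */
};

/* Today, as described above: reads and clones shared, buffered writes exclusive. */
static const struct lock_policy policy_today = {
	.mode = { [OP_READ] = MODE_SHARED, [OP_BUFFERED_WRITE] = MODE_EXCL,
		  [OP_CLONE] = MODE_SHARED },
};

/* Shared buffered writes with clone left shared: writes can now race clones. */
static const struct lock_policy policy_naive = {
	.mode = { [OP_READ] = MODE_SHARED, [OP_BUFFERED_WRITE] = MODE_SHARED,
		  [OP_CLONE] = MODE_SHARED },
};

/* Shared buffered writes with clone moved back to exclusive locking. */
static const struct lock_policy policy_shared_writes = {
	.mode = { [OP_READ] = MODE_SHARED, [OP_BUFFERED_WRITE] = MODE_SHARED,
		  [OP_CLONE] = MODE_EXCL },
};

/* Two operations may run concurrently only if both take the lock shared. */
static bool may_overlap(const struct lock_policy *p, enum io_op a, enum io_op b)
{
	assert(a < OP_COUNT && b < OP_COUNT);
	return p->mode[a] == MODE_SHARED && p->mode[b] == MODE_SHARED;
}
```

Under policy_naive a buffered write may overlap a clone (the
breakage); policy_shared_writes prevents that only by serialising
reads against clones (the regression). With a single per-inode lock
there is no row of the table that avoids both.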

Hence the only viable solution to all these different competing "we
need exclusive access to a range of the file whilst allowing other
concurrent IO" issues is to move to range locking for IO
exclusion....
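For comparison, a minimal byte-range lock: held ranges are tracked
with a mode, and a new request conflicts only with overlapping ranges
where at least one side is exclusive. This is a generic sketch, not a
proposed kernel design: the names are hypothetical, it uses a
fixed-size array instead of a tree, and it only offers a try-lock,
where a real implementation would sleep until the conflicting range
is released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_RANGES 16	/* hypothetical fixed capacity for the sketch */

struct range {
	uint64_t start, end;	/* half-open byte range [start, end) */
	bool excl;		/* exclusive (write-like) vs shared (read-like) */
	bool used;		/* slot currently holds a lock */
};

struct range_lock_tree {
	struct range held[MAX_RANGES];
};

/* Conflict = ranges overlap and at least one of them is exclusive. */
static bool ranges_conflict(const struct range *r, uint64_t start, uint64_t end, bool excl)
{
	bool overlap = r->start < end && start < r->end;
	return overlap && (r->excl || excl);
}

/* Try to lock [start, end). Returns a slot index, or -1 on conflict or full table. */
static int range_trylock(struct range_lock_tree *t, uint64_t start, uint64_t end, bool excl)
{
	int free_slot = -1;

	assert(start < end);
	for (int i = 0; i < MAX_RANGES; i++) {
		if (!t->held[i].used) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (ranges_conflict(&t->held[i], start, end, excl))
			return -1;
	}
	if (free_slot >= 0)
		t->held[free_slot] = (struct range){ start, end, excl, true };
	return free_slot;
}

static void range_unlock(struct range_lock_tree *t, int slot)
{
	assert(slot >= 0 && slot < MAX_RANGES && t->held[slot].used);
	t->held[slot].used = false;
}
```

Here a write to [0, 4096) and a write to [8192, 12288) can proceed
together, and a read of [4096, 8192) can run alongside both, while a
read of [1000, 2000) has to wait for the first write to finish.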

-Dave.
-- 
Dave Chinner
david@fromorbit.com
