From: Luis Chamberlain <mcgrof@kernel.org>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm <linux-mm@kvack.org>,
	Daniel Gomez <da.gomez@samsung.com>,
	Pankaj Raghav <p.raghav@samsung.com>,
	Jens Axboe <axboe@kernel.dk>, Dave Chinner <david@fromorbit.com>,
	Christoph Hellwig <hch@lst.de>, Chris Mason <clm@fb.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Matthew Wilcox <willy@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO
Date: Tue, 27 Feb 2024 06:08:57 -0800
Message-ID: <Zd3s-SPx_EnDXJzs@bombadil.infradead.org>
In-Reply-To: <xhymmlbragegxvgykhaddrkkhc7qn7soapca22ogbjlegjri35@ffqmquunkvxw>

On Tue, Feb 27, 2024 at 05:07:30AM -0500, Kent Overstreet wrote:
> On Fri, Feb 23, 2024 at 03:59:58PM -0800, Luis Chamberlain wrote:
> > Part of the testing we have done with LBS was to run performance
> > tests on XFS to ensure things are not regressing. Building Linux is
> > a decent test, and we ran some random cloud instance tests with it
> > and presented those at Plumbers, but it doesn't really cut it if we
> > want to push things to the limit. What are the limits of buffered
> > IO, and how do we test them? Who keeps track of that?
> > 
> > The obvious recurring tension is that for really high performance,
> > folks just recommend using direct IO. But if you are stress testing
> > changes to a filesystem and want to push buffered IO to its limits,
> > it makes sense to stick to buffered IO; otherwise how else do we
> > test it?
> > 
> > It is good to know the limits of buffered IO too, because some
> > workloads cannot use direct IO. For instance PostgreSQL doesn't have
> > direct IO support, and even as late as the end of last year we
> > learned that adding direct IO to PostgreSQL would be difficult.
> > Chris Mason has also noted that direct IO can force writes during
> > reads (?)... Anyway, testing the limits of buffered IO, to ensure
> > you are not creating regressions when doing page cache surgery,
> > seems like a useful and sensible thing to do. The good news is we
> > have not found regressions with LBS, but all this testing begs the
> > question: what are the limits of buffered IO anyway, and how does it
> > scale? Do we know, do we care? Do we keep track of it? How does it
> > compare to direct IO for some workloads? How big is the delta? How
> > do we best test that? How do we automate all of that? Do we want to
> > test this automatically to avoid regressions?
> > 
> > The obvious issue with some workloads for buffered IO is the
> > possible penalty if you are not really re-using the folios added to
> > the page cache. Jens Axboe reported a while ago issues with
> > workloads doing random reads over a data set 10x the size of RAM,
> > and proposed RWF_UNCACHED as a way to help [0]. As Chinner put it,
> > this seemed more like direct IO with kernel pages and a memcpy(),
> > and for writes it would require implementing the same serialization
> > we already do for direct IO. There at least seems to be agreement
> > that if we're going to provide an enhancement or alternative, we
> > should strive not to repeat the mistakes we've made with direct IO.
> > The rationale for some workloads to use buffered IO is that it helps
> > reduce some tail latencies, so that's something to live up to.
> > 
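
For reference, what Jens proposed in [0] was a new
preadv2()/pwritev2() flag. A minimal sketch of how a caller might use
it -- the flag value below is illustrative only, as this was not part
of the mainline ABI at the time:

	#define _GNU_SOURCE
	#include <sys/uio.h>

	/*
	 * Illustrative value only: RWF_UNCACHED comes from the proposed
	 * patches in [0] and was not a mainline ABI at the time.
	 */
	#ifndef RWF_UNCACHED
	#define RWF_UNCACHED	0x00000040
	#endif

	static ssize_t write_uncached(int fd, const void *buf,
				      size_t len, off_t off)
	{
		struct iovec iov = {
			.iov_base = (void *)buf,
			.iov_len  = len,
		};

		/*
		 * Data still goes through the page cache, so it
		 * serializes like normal buffered IO, but the folios
		 * are dropped once written back instead of polluting
		 * the cache for a use-once workload.
		 */
		return pwritev2(fd, &iov, 1, off, RWF_UNCACHED);
	}
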
> > On that same thread Christoph also mentioned the possibility of a direct
> > IO variant which can leverage the cache. Is that something we want to
> > move forward with?
> > 
> > Chris Mason also listed a few other desirables if we do:
> > 
> > - Allowing concurrent writes (xfs DIO does this now)
> 
> AFAIK every filesystem allows concurrent direct writes, not just xfs,
> it's _buffered_ writes that we care about here.

The context above was a possible direct IO variant; that's why direct
IO was mentioned, and why I noted that XFS at least already has that
support.

> I just pushed a patch to my CI for buffered writes without taking the
> inode lock - for bcachefs. It'll be straightforward, but a decent amount
> of work, to lift this to the VFS, if people are interested in
> collaborating.
> 
> https://evilpiepirate.org/git/bcachefs.git/log/?h=bcachefs-buffered-write-locking

Neat, this is exactly the sort of thing I wanted to get a sense for:
whether this topic is worth discussing at LSFMM.

> The approach is: for non-extending, non-appending writes, see if we
> can pin the entire range of the pagecache we're writing to; fall back
> to taking the inode lock if we can't.
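
If I read that right, the fast path is roughly this (a sketch of the
idea only, not the actual bcachefs code; write_extends_inode() and
write_to_pinned_range() are hypothetical helpers):

	static ssize_t buffered_write(struct kiocb *iocb,
				      struct iov_iter *i)
	{
		struct inode *inode = iocb->ki_filp->f_mapping->host;
		ssize_t ret;

		if (!write_extends_inode(inode, iocb, i)) {
			/*
			 * Try to pin every folio in the target range up
			 * front; -EAGAIN means some folio in the range
			 * couldn't be pinned.
			 */
			ret = write_to_pinned_range(iocb, i);
			if (ret != -EAGAIN)
				return ret; /* fast path, no inode lock */
		}

		/* Slow path: classic fully serialized buffered write. */
		inode_lock(inode);
		ret = generic_perform_write(iocb, i);
		inode_unlock(inode);
		return ret;
	}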

Perhaps a silly thought... but my initial reaction is, would it make
sense for the page cache itself to make this easier for us? It is not
clear to me, but my first reaction on seeing some of these deltas was:
what if we had the space split up into groups, as we do with XFS
agcounts, so that each group deals with its own ranges? I considered
this before profiling, and like Matthew I figured the problem might be
lock contention. For my test case it very likely is not, and as Linus
and Dave have clarified, we are both penalized and also have
single-threaded writeback. If we had a group split, we'd have locks
per group, and perhaps a dedicated writeback thread per group.
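
Something like this, purely as a strawman -- all the names here are
hypothetical, including the groups array hung off address_space:

	/*
	 * Strawman: split a mapping's index space into fixed-size
	 * groups, analogous to XFS allocation groups. Each group has
	 * its own lock and its own dedicated writeback thread, so
	 * writers and writeback in different ranges never contend.
	 */
	#define PAGECACHE_GROUP_SHIFT	(30 - PAGE_SHIFT) /* 1 GiB per group */

	struct pagecache_group {
		spinlock_t	   lock;   /* serializes this range only */
		struct xarray	   folios; /* folios in this range */
		struct task_struct *wb;	   /* per-group writeback thread */
	};

	static struct pagecache_group *
	folio_group(struct address_space *mapping, pgoff_t index)
	{
		/* mapping->groups is hypothetical, sized at setup time */
		return &mapping->groups[index >> PAGECACHE_GROUP_SHIFT];
	}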

  Luis
