linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: Bernd Schubert <bernd.schubert@fastmail.fm>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	Jingbo Xu <jefflexu@linux.alibaba.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	lege.wang@jaguarmicro.com,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [HELP] FUSE writeback performance bottleneck
Date: Tue, 4 Jun 2024 12:53:19 -0400	[thread overview]
Message-ID: <20240604165319.GG3413@localhost.localdomain> (raw)
In-Reply-To: <21741978-a604-4054-8af9-793085925c82@fastmail.fm>

On Tue, Jun 04, 2024 at 04:13:25PM +0200, Bernd Schubert wrote:
> 
> 
> On 6/4/24 12:02, Miklos Szeredi wrote:
> > On Tue, 4 Jun 2024 at 11:32, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
> > 
> >> Back to the background for the copy, so it copies pages to avoid
> >> blocking on memory reclaim. With that allocation it in fact increases
> >> memory pressure even more. Isn't the right solution to mark those pages
> >> as not reclaimable and to avoid blocking on it? Which is what the tmp
> >> pages do, just not in beautiful way.
> > 
> > Copying to the tmp page is the same as marking the pages as
> > non-reclaimable and non-syncable.
> > 
> > Conceptually it would be nice to only copy when there's something
> > actually waiting for writeback on the page.
> > 
> > Note: normally the WRITE request would be copied to userspace along
> > with the contents of the pages very soon after starting writeback.
> > After this the contents of the page no longer matter, and we can just
> > clear writeback without doing the copy.
> > 
> > But if the request gets stuck in the input queue before being copied
> > to userspace, then deadlock can still happen if the server blocks on
> > direct reclaim and won't continue with processing the queue.   And
> > sync(2) will also block in that case.>
> > So we'd somehow need to handle stuck WRITE requests.   I don't see an
> > easy way to do this "on demand", when something actually starts
> > waiting on PG_writeback.  Alternatively the page copy could be done
> > after a timeout, which is ugly, but much easier to implement.
> 
> I think the timeout method would only work if we have already allocated
> the pages, under memory pressure page allocation might not work well.
> But then this still seems to be a workaround, because we don't take any
> less memory with these copied pages.
> I'm going to look into mm/ if there isn't a better solution.

I've thought a bit about this, and I still don't have a good solution, so I'm
going to throw out my random thoughts and see if it helps us get to a good spot.

1. Generally we are moving away from GFP_NOFS/GFP_NOIO to instead use
   memalloc_*_save/memalloc_*_restore, so instead the process is marked being in
   these contexts.  We could do something similar for FUSE, tho this gets hairy
   with things that async off request handling to other threads (which is all of
   the FUSE file systems we have internally).  We'd need to have some way to
   apply this to an entire process group, but this could be a workable solution.

2. Per-request timeouts.  This is something we're planning on tackling for other
   reasons, but it could fit nicely here to say "if this fuse fs has a
   per-request timeout, skip the copy".  That way we at least know we're upper
   bound on how long we would be "deadlocked".  I don't love this approach
   because it's still a deadlock until the timeout elapsed, but it's an idea.

3. Since we're limiting writeout per the BDI, we could just say FUSE is special,
   only one memory reclaim related writeout at a time.  We flag when we're doing
   a write via memory reclaim, and then if we try to trigger writeout via memory
   reclaim again we simply reject it to avoid the deadlock.  This has the
   downside of making it so non-fuse related things that may be triggering
   direct reclaim through FUSE means they'll reclaim something else, and if the
   dirty pages from FUSE are the ones causing the problem we could spin a bunch
   evicting pages that we don't care about and thrashing a bit.

As I said all of these have downsides, I think #1 is probably the most workable,
but I haven't thought about it super thoroughly. Thanks,

Josef

  reply	other threads:[~2024-06-04 16:53 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-03  6:17 [HELP] FUSE writeback performance bottleneck Jingbo Xu
2024-06-03 14:43 ` Bernd Schubert
2024-06-03 15:19   ` Miklos Szeredi
2024-06-03 15:32     ` Bernd Schubert
2024-06-03 22:10     ` Dave Chinner
2024-06-04  7:20       ` Miklos Szeredi
2024-06-04  1:57     ` Jingbo Xu
2024-06-04  7:27       ` Miklos Szeredi
2024-06-04  7:36         ` Jingbo Xu
2024-06-04  9:32           ` Bernd Schubert
2024-06-04 10:02             ` Miklos Szeredi
2024-06-04 14:13               ` Bernd Schubert
2024-06-04 16:53                 ` Josef Bacik [this message]
2024-06-04 21:39                   ` Bernd Schubert
2024-06-04 22:16                     ` Josef Bacik
2024-06-05  5:49                       ` Amir Goldstein
2024-06-05 15:35                         ` Josef Bacik
2024-08-22 17:00               ` Joanne Koong
2024-08-22 21:01                 ` Joanne Koong
2024-08-23  3:34               ` Jingbo Xu
2024-09-13  0:00                 ` Joanne Koong
2024-09-13  1:25                   ` Jingbo Xu
2024-06-04 12:24             ` Jingbo Xu
2024-09-11  9:32         ` Jingbo Xu
2024-09-12 23:18           ` Joanne Koong
2024-09-13  3:35             ` Jingbo Xu
2024-09-13 20:55               ` Joanne Koong
2024-10-11 23:08                 ` Joanne Koong
2024-10-14  1:57                   ` Jingbo Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240604165319.GG3413@localhost.localdomain \
    --to=josef@toxicpanda.com \
    --cc=bernd.schubert@fastmail.fm \
    --cc=jefflexu@linux.alibaba.com \
    --cc=lege.wang@jaguarmicro.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=miklos@szeredi.hu \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).