From: Jeff Layton <jlayton@kernel.org>
To: Joanne Koong <joannelkoong@gmail.com>,
miklos@szeredi.hu, akpm@linux-foundation.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Cc: jefflexu@linux.alibaba.com, shakeel.butt@linux.dev,
david@redhat.com, bernd.schubert@fastmail.fm, ziy@nvidia.com,
kernel-team@meta.com
Subject: Re: [PATCH v7 0/3] fuse: remove temp page copies in writeback
Date: Mon, 14 Apr 2025 12:21:39 -0400 [thread overview]
Message-ID: <0e00e8b306620c781868f375a462127d72b26289.camel@kernel.org> (raw)
In-Reply-To: <20250404181443.1363005-1-joannelkoong@gmail.com>
On Fri, 2025-04-04 at 11:14 -0700, Joanne Koong wrote:
> The purpose of this patchset is to help make writeback in FUSE filesystems as
> fast as possible.
>
> In the current FUSE writeback design (see commit 3be5a52b30aa
> ("fuse: support writable mmap"))), a temp page is allocated for every dirty
> page to be written back, the contents of the dirty page are copied over to the
> temp page, and the temp page gets handed to the server to write back. This is
> done so that writeback may be immediately cleared on the dirty page, and this
> in turn is done in order to mitigate the following deadlock scenario that may
> arise if reclaim waits on writeback on the dirty page to complete (more details
> can be found in this thread [1]):
> * single-threaded FUSE server is in the middle of handling a request
> that needs a memory allocation
> * memory allocation triggers direct reclaim
> * direct reclaim waits on a folio under writeback
> * the FUSE server can't write back the folio since it's stuck in
> direct reclaim
>
> Allocating and copying dirty pages to temp pages is the biggest performance
> bottleneck for FUSE writeback. This patchset aims to get rid of the temp page
> altogether (which will also allow us to get rid of the internal FUSE rb tree
> that is needed to keep track of writeback status on the temp pages).
> Benchmarks show approximately a 20% improvement in throughput for 4k
> block-size writes and a 45% improvement for 1M block-size writes.
>
> In the current reclaim code, there is one scenario where writeback is waited
> on, which is the case where the system is running legacy cgroupv1 and reclaim
> encounters a folio that already has the reclaim flag set and the caller did
> not have __GFP_FS (or __GFP_IO if swap) set.
>
> This patchset adds a new mapping flag, AS_WRITEBACK_INDETERMINATE, which
> filesystems may set on its inode mappings to indicate that writeback
> operations may take an indeterminate amount of time to complete. FUSE will set
> this flag on its mappings. Reclaim for the legacy cgroup v1 case described
> above will skip reclaim of folios with that flag set.
>
> With this change, writeback state is now only cleared on the dirty page after
> the server has written it back to disk. If the server is deliberately
> malicious or well-intentioned but buggy, this may stall sync(2) and page
> migration, but for sync(2), a malicious server may already stall this by not
> replying to the FUSE_SYNCFS request and for page migration, there are already
> many easier ways to stall this by having FUSE permanently hold the folio lock.
> A fuller discussion on this can be found in [2]. Long-term, there needs to be
> a more comprehensive solution for addressing migration of FUSE pages that
> handles all scenarios where FUSE may permanently hold the lock, but that is
> outside the scope of this patchset and will be done as future work. Please
> also note that this change also now ensures that when sync(2) returns, FUSE
> filesystems will have persisted writeback changes.
>
> [1] https://lore.kernel.org/linux-kernel/495d2400-1d96-4924-99d3-8b2952e05fc3@linux.alibaba.com/
> [2] https://lore.kernel.org/linux-fsdevel/20241122232359.429647-1-joannelkoong@gmail.com/
>
> Changelog
> ---------
> v6:
> https://lore.kernel.org/linux-fsdevel/20241122232359.429647-1-joannelkoong@gmail.com/
> Changes from v6 -> v7:
> * Drop migration and sync patches, as they are useless if a server is
> determined to be malicious
>
> v5:
> https://lore.kernel.org/linux-fsdevel/20241115224459.427610-1-joannelkoong@gmail.com/
> Changes from v5 -> v6:
> * Add Shakeel and Jingbo's reviewed-bys
> * Move folio_end_writeback() to fuse_writepage_finish() (Jingbo)
> * Embed fuse_writepage_finish_stat() logic inline (Jingbo)
> * Remove node_stat NR_WRITEBACK inc/sub (Jingbo)
>
> v4:
> https://lore.kernel.org/linux-fsdevel/20241107235614.3637221-1-joannelkoong@gmail.com/
> Changes from v4 -> v5:
> * AS_WRITEBACK_MAY_BLOCK -> AS_WRITEBACK_INDETERMINATE (Shakeel)
> * Drop memory hotplug patch (David and Shakeel)
> * Remove some more kunnecessary writeback waits in fuse code (Jingbo)
> * Make commit message for reclaim patch more concise - drop part about
> deadlock and just focus on how it may stall waits
>
> v3:
> https://lore.kernel.org/linux-fsdevel/20241107191618.2011146-1-joannelkoong@gmail.com/
> Changes from v3 -> v4:
> * Use filemap_fdatawait_range() instead of filemap_range_has_writeback() in
> readahead
>
> v2:
> https://lore.kernel.org/linux-fsdevel/20241014182228.1941246-1-joannelkoong@gmail.com/
> Changes from v2 -> v3:
> * Account for sync and page migration cases as well (Miklos)
> * Change AS_NO_WRITEBACK_RECLAIM to the more generic AS_WRITEBACK_MAY_BLOCK
> * For fuse inodes, set mapping_writeback_may_block only if fc->writeback_cache
> is enabled
>
> v1:
> https://lore.kernel.org/linux-fsdevel/20241011223434.1307300-1-joannelkoong@gmail.com/T/#t
> Changes from v1 -> v2:
> * Have flag in "enum mapping_flags" instead of creating asop_flags (Shakeel)
> * Set fuse inodes to use AS_NO_WRITEBACK_RECLAIM (Shakeel)
>
> Joanne Koong (3):
> mm: add AS_WRITEBACK_INDETERMINATE mapping flag
> mm: skip reclaiming folios in legacy memcg writeback indeterminate
> contexts
> fuse: remove tmp folio for writebacks and internal rb tree
>
> fs/fuse/file.c | 360 ++++------------------------------------
> fs/fuse/fuse_i.h | 3 -
> include/linux/pagemap.h | 11 ++
> mm/vmscan.c | 10 +-
> 4 files changed, 46 insertions(+), 338 deletions(-)
>
This looks sane, and I love that diffstat.
I also agree with David about changing the flag name to something more
specific. As a kernel engineer, anything with "INDETERMINATE" in the
name gives me the ick.
Assuming that the only real change in v8 will be the flag name change,
you can add:
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Assuming others are ok with this, how do you see this going in? Maybe
Andrew could pick up the mm bits and Miklos could take the FUSE patch?
next prev parent reply other threads:[~2025-04-14 16:21 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-04 18:14 [PATCH v7 0/3] fuse: remove temp page copies in writeback Joanne Koong
2025-04-04 18:14 ` [PATCH v7 1/3] mm: add AS_WRITEBACK_INDETERMINATE mapping flag Joanne Koong
2025-04-04 19:13 ` David Hildenbrand
2025-04-04 20:09 ` Joanne Koong
2025-04-04 20:13 ` David Hildenbrand
2025-04-09 22:05 ` Shakeel Butt
2025-04-09 23:48 ` Joanne Koong
2025-04-04 18:14 ` [PATCH v7 2/3] mm: skip reclaiming folios in legacy memcg writeback indeterminate contexts Joanne Koong
2025-04-04 18:14 ` [PATCH v7 3/3] fuse: remove tmp folio for writebacks and internal rb tree Joanne Koong
2025-04-09 2:43 ` Jingbo Xu
2025-04-09 23:47 ` Joanne Koong
2025-04-10 2:12 ` Jingbo Xu
2025-04-10 15:07 ` Joanne Koong
2025-04-10 15:11 ` Jingbo Xu
2025-04-10 16:11 ` Joanne Koong
2025-04-14 20:24 ` Joanne Koong
2025-04-15 7:49 ` Jingbo Xu
2025-04-15 15:59 ` Joanne Koong
2025-04-16 1:40 ` Jingbo Xu
2025-04-16 16:43 ` Joanne Koong
2025-04-16 18:05 ` Bernd Schubert
2025-04-14 16:21 ` Jeff Layton [this message]
2025-04-14 20:28 ` [PATCH v7 0/3] fuse: remove temp page copies in writeback Joanne Koong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0e00e8b306620c781868f375a462127d72b26289.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=bernd.schubert@fastmail.fm \
--cc=david@redhat.com \
--cc=jefflexu@linux.alibaba.com \
--cc=joannelkoong@gmail.com \
--cc=kernel-team@meta.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=miklos@szeredi.hu \
--cc=shakeel.butt@linux.dev \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).