From: Joanne Koong <joannelkoong@gmail.com>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
Andrew Morton <akpm@linux-foundation.org>,
linux-fsdevel@vger.kernel.org, jefflexu@linux.alibaba.com,
josef@toxicpanda.com, bernd.schubert@fastmail.fm,
linux-mm@kvack.org, kernel-team@meta.com
Subject: Re: [PATCH v6 0/5] fuse: remove temp page copies in writeback
Date: Wed, 18 Dec 2024 09:37:37 -0800
Message-ID: <CAJnrk1ZHk6BnAWFBhw_rdq1UudgNjBf9r9Eg+VORxuPp48JOPw@mail.gmail.com>
In-Reply-To: <qbbwxtqrlxhdkesrruwgfnu3qyzi6b6jhahxhbvn56kpiw5i4v@dhvdhlslbhcc>
On Fri, Dec 13, 2024 at 8:47 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> +Andrew
>
> On Fri, Dec 13, 2024 at 12:52:44PM +0100, Miklos Szeredi wrote:
> > On Sat, 23 Nov 2024 at 00:24, Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > The purpose of this patchset is to help make writeback-cache write
> > > performance in FUSE filesystems as fast as possible.
> > >
> > > > In the current FUSE writeback design (see commit 3be5a52b30aa
> > > > ("fuse: support writable mmap")), a temp page is allocated for every dirty
> > > page to be written back, the contents of the dirty page are copied over to the
> > > temp page, and the temp page gets handed to the server to write back. This is
> > > done so that writeback may be immediately cleared on the dirty page, and this
> > > in turn is done for two reasons:
> > > a) in order to mitigate the following deadlock scenario that may arise if
> > > reclaim waits on writeback on the dirty page to complete (more details can be
> > > found in this thread [1]):
> > > * single-threaded FUSE server is in the middle of handling a request
> > > that needs a memory allocation
> > > * memory allocation triggers direct reclaim
> > > * direct reclaim waits on a folio under writeback
> > > * the FUSE server can't write back the folio since it's stuck in
> > > direct reclaim
> > > b) in order to unblock internal (eg sync, page compaction) waits on writeback
> > > without needing the server to complete writing back to disk, which may take
> > > an indeterminate amount of time.
> > >
> > > Allocating and copying dirty pages to temp pages is the biggest performance
> > > bottleneck for FUSE writeback. This patchset aims to get rid of the temp page
> > > altogether (which will also allow us to get rid of the internal FUSE rb tree
> > > that is needed to keep track of writeback status on the temp pages).
> > > Benchmarks show approximately a 20% improvement in throughput for 4k
> > > block-size writes and a 45% improvement for 1M block-size writes.
> > >
> > > > With the temp page removed, writeback state is now only cleared on the dirty
> > > > page after the server has written it back to disk. This may take an
> > > > indeterminate amount of time. There is also the possibility of malicious or
> > > > well-intentioned but buggy servers where writeback may, in the worst case,
> > > > never complete. This means that any folio_wait_writeback() on a dirty page
> > > > belonging to a FUSE filesystem needs to be carefully audited.
> > >
> > > In particular, these are the cases that need to be accounted for:
> > > * potentially deadlocking in reclaim, as mentioned above
> > > * potentially stalling sync(2)
> > > * potentially stalling page migration / compaction
> > >
> > > > This patchset adds a new mapping flag, AS_WRITEBACK_INDETERMINATE, which
> > > > a filesystem may set on its inode mappings to indicate that writeback
> > > operations may take an indeterminate amount of time to complete. FUSE will set
> > > this flag on its mappings. This patchset adds checks to the critical parts of
> > > reclaim, sync, and page migration logic where writeback may be waited on.
> > >
> > > Please note the following:
> > > * For sync(2), waiting on writeback will be skipped for FUSE, but this has no
> > > effect on existing behavior. Dirty FUSE pages are already not guaranteed to
> > > be written to disk by the time sync(2) returns (eg writeback is cleared on
> > > the dirty page but the server may not have written out the temp page to disk
> > > yet). If the caller wishes to ensure the data has actually been synced to
> > > disk, they should use fsync(2)/fdatasync(2) instead.
> > > * AS_WRITEBACK_INDETERMINATE does not indicate that the folios should never be
> > > waited on when in writeback. There are some cases where the wait is
> > > desirable. For example, for the sync_file_range() syscall, it is fine to
> > > wait on the writeback since the caller passes in a fd for the operation.
> >
> > Looks good, thanks.
> >
> > Acked-by: Miklos Szeredi <mszeredi@redhat.com>
> >
> > I think this should go via the mm tree.
>
> Andrew, can you please pick this series up, or should Joanne send an updated
> version with all Acks/Review tags collected? Let us know what you prefer.
>
Hi Andrew,
Could you let us know your preference or if there's anything else you
need from us to proceed?
Thanks,
Joanne
> Thanks,
> Shakeel