From: David Sterba <dsterba@suse.cz>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: use FGP_STABLE to wait for folio writeback
Date: Tue, 15 Oct 2024 02:31:49 +0200 [thread overview]
Message-ID: <20241015003148.GG1609@suse.cz> (raw)
In-Reply-To: <8ef15f84-6523-4e47-beda-fa440128df0e@gmx.com>
On Tue, Oct 15, 2024 at 07:25:20AM +1030, Qu Wenruo wrote:
> 在 2024/10/15 00:46, David Sterba 写道:
> > On Mon, Oct 14, 2024 at 03:06:31PM +1030, Qu Wenruo wrote:
> >> __filemap_get_folio() provides the flag FGP_STABLE to wait for
> >> writeback.
> >>
> >> There are two call sites doing __filemap_get_folio() then
> >> folio_wait_writeback():
> >>
> >> - btrfs_truncate_block()
> >> - defrag_prepare_one_folio()
> >>
> >> We can directly utilize that flag instead of manually calling
> >> folio_wait_writeback().
> >
> > We can do that but I'm missing a justification for that. The explicit
> > writeback calls are done at a different points than what FGP_STABLE
> > does. So what's the difference?
> >
>
> TL;DR, it's not safe to read folio before waiting for writeback in theory.
>
> There is a possible race, mostly related to my previous attempt of
> subpage partial uptodate support:
>
> Thread A | Thread B
> -------------------------------+-----------------------------
> extent_writepage_io() |
> |- submit_one_sector() |
> |- folio_set_writeback() |
> The folio is partial dirty|
> and uninvolved sectors are|
> not uptodate |
> | btrfs_truncate_block()
> | |- btrfs_do_readpage()
> | |- submit_one_folio
> | This will read all sectors
> | from disk, but that writeback
> | sector is not yet finished
>
> In this case, we can read out garbage from disk, since the write is not
> yet finished.
>
> This is not yet possible, because we always read out the whole page so
> in that case thread B won't trigger a read.
>
> But this already shows the way we wait for writeback is not safe.
> And that's why no other filesystems manually wait for writeback after
> reading the folio.
>
> Thus I think doing FGP_STABLE is way more safer, and that's why all
> other fses are doing this way instead.
I'm not disputing we need it and I may be missing something, what I
noticed in the patch is maybe a generic pattern, structure read at some
time and then synced/written, but there could be some change in
bettween. One example is one you show (theoretically or not).
The writeback is a kind of synchronization point, but also in parallel
with the data/metadata in page cache. If the state, regarding writeback,
is not safe and we can either get old data or could get partially synced
data (ie. ok in page cache but not regarding writeback) it is a
problematic pattern.
You found two cases, truncate and defrag. Both are different I think,
truncate comes from normal MM operations, while defrag is triggered by
an ioctl (still trying to be in sync with MM).
I'm not sure we can copy what other filesystems do, even if it's just on
the basic principle of COW vs update in place + journaling. We copy the
and do the next update and don't have to care about previous state, so
even a split between read and writeback does no harm.
next prev parent reply other threads:[~2024-10-15 0:31 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-14 4:36 [PATCH] btrfs: use FGP_STABLE to wait for folio writeback Qu Wenruo
2024-10-14 11:19 ` Anand Jain
2024-10-14 14:16 ` David Sterba
2024-10-14 20:55 ` Qu Wenruo
2024-10-14 20:59 ` Qu Wenruo
2024-10-15 0:31 ` David Sterba [this message]
2024-10-15 1:26 ` Qu Wenruo
2024-10-15 4:04 ` Anand Jain
2024-10-15 4:41 ` Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241015003148.GG1609@suse.cz \
--to=dsterba@suse.cz \
--cc=linux-btrfs@vger.kernel.org \
--cc=quwenruo.btrfs@gmx.com \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox