public inbox for linux-btrfs@vger.kernel.org
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: dsterba@suse.cz, Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2 0/8] btrfs: make subpage handling be feature full
Date: Mon, 3 Mar 2025 09:18:41 +1030	[thread overview]
Message-ID: <b067b88a-4611-41ff-862a-a61c73bf543e@gmx.com> (raw)
In-Reply-To: <20250228124202.GF5777@suse.cz>



On 2025/2/28 23:12, David Sterba wrote:
> On Fri, Feb 28, 2025 at 01:44:04PM +1030, Qu Wenruo wrote:
>> For the 2K one, since it's just two small patches I'm also fine pushing
>> them now.
>> Just do not forget that we need progs patches, and a dozen of known
>> failures from fstests, and I'm not yet able to address them all any time
>> soon.
>
> Yeah, the mkfs support can go to the next minor progs release. About the
> status we can print a warning and document it. No need to focus on
> fixing the fstests, I think stress testing will be sufficient for now.
>

It turns out this will not be smooth sailing.

There is a serious conflict between our async extent mechanism and the
subpage writeback handling.

Our async extent mechanism can mark a folio writeback at any time.
(At submission time we keep the range locked and let the compression
work happen in another thread.)

If we have two async extents inside the same folio, we can hit the
following race (64K page size, 4K fs block size):

    0          32K         64K
    |<- AE 1 ->|<- AE 2 ->|

             Thread A (AE 1)           |            Thread B (AE 2)
--------------------------------------+------------------------------
submit_one_async_extent()             |
|- process_one_folio()                |
    |- subpage_set_writeback()         |
                                       |
/* IO finished */                     |
end_compressed_writeback()            |
|- btrfs_folio_clear_writeback()      |
    |  /* this is the last writeback   |
    |     holder, should end the folio |
    |     writeback flag */            |
    |- last = true                     |
    |                                  | submit_one_async_extent()
    |                                  | |- process_one_folio()
    |                                  |    |- subpage_set_writeback()
    |                                  |
    |                                  | /* IO finished */
    |                                  | end_compressed_writeback()
    |                                  | |- btrfs_folio_clear_writeback()
    |                                  |    | /* Again the last holder */
    |                                  |    |- last = true
    |- folio_end_writeback()           |    |- folio_end_writeback()

This leads to two threads calling folio_end_writeback() on the same
folio, which will eventually trigger VM_BUG_ON_FOLIO() or other problems.

Furthermore, we cannot rely on folio->private for anything after the
folio_end_writeback() call, because that call may unmap/invalidate the
folio.

What's worse, iomap's extra writeback accounting won't help as-is.
Iomap holds one extra writeback count before submitting the blocks
inside the folio, then drops that count after all blocks have been
marked writeback (submitted).

That scheme requires all the blocks inside the folio to be submitted
and marked writeback at the same time, but our async extent mechanism
breaks that requirement completely.


So far I have no better option than to disable block-perfect
compression first, then introduce the same extra-count scheme as iomap.

The proper fix is not only that extra count, but also making the async
extent submission path mark the folios writeback itself.
That will be quite some work (part of the iomap migration plan).

Thanks,
Qu

Thread overview: 16+ messages
2025-02-27  5:54 [PATCH v2 0/8] btrfs: make subpage handling be feature full Qu Wenruo
2025-02-27  5:54 ` [PATCH v2 1/8] btrfs: prevent inline data extents read from touching blocks beyond its range Qu Wenruo
2025-02-27  5:54 ` [PATCH v2 2/8] btrfs: subpage: do not hold subpage spin lock when clearing folio writeback Qu Wenruo
2025-02-27 12:42   ` Filipe Manana
2025-02-27  5:54 ` [PATCH v2 3/8] btrfs: fix the qgroup data free range for inline data extents Qu Wenruo
2025-02-27  5:54 ` [PATCH v2 4/8] btrfs: introduce a read path dedicated extent lock helper Qu Wenruo
2025-02-27 12:47   ` Filipe Manana
2025-02-27  5:54 ` [PATCH v2 5/8] btrfs: make btrfs_do_readpage() to do block-by-block read Qu Wenruo
2025-02-27 12:48   ` Filipe Manana
2025-02-27  5:54 ` [PATCH v2 6/8] btrfs: allow buffered write to avoid full page read if it's block aligned Qu Wenruo
2025-02-27  5:54 ` [PATCH v2 7/8] btrfs: allow inline data extents creation if block size < page size Qu Wenruo
2025-02-27  5:54 ` [PATCH v2 8/8] btrfs: remove the subpage related warning message Qu Wenruo
2025-02-27 11:16 ` [PATCH v2 0/8] btrfs: make subpage handling be feature full David Sterba
2025-02-28  3:14   ` Qu Wenruo
2025-02-28 12:42     ` David Sterba
2025-03-02 22:48       ` Qu Wenruo [this message]
