Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2 1/2] btrfs: do not clear page dirty inside extent_write_locked_range()
Date: Fri, 10 May 2024 10:25:25 -0400	[thread overview]
Message-ID: <20240510142525.GA90506@perftesting> (raw)
In-Reply-To: <ebf001731e2ebafd5c2a435a7e0848634a421ed7.1709677986.git.wqu@suse.com>

On Wed, Mar 06, 2024 at 09:05:33AM +1030, Qu Wenruo wrote:
> [BUG]
> For subpage + zoned case, btrfs can easily hang with the following
> workload, even with previous subpage delalloc rework:
> 
>  # mkfs.btrfs -s 4k -f $dev
>  # mount $dev $mnt
>  # xfs_io -f -c "pwrite 32k 128k" $mnt/foobar
>  # umount $mnt
> 
> The system would hang at unmount due to unfinished ordered extents.
> 
> Above $dev is a tcmu-runner emulated zoned HDD, which has a max zone
> append size of 64K, and the system has 64K page size.
> 

This whole explanation confuses me, because in the normal case we clear the
whole page dirty before we enter into __extent_writepage_io(), so what we're
doing here is no different than the regular case, so I don't get how this is a
problem?

> [CAUSE]
> There is a bug involved in extent_write_locked_range() (well, I'm
> already surprised by how many subpage incompatible code are inside that
> function):
> 
> - If @pages_dirty is true, we will clear the page dirty flag for the
>   whole page
> 
>   This means, for above case, since the max zone append size is 64K,
>   we got an ordered extent sized 64K, resulting the following writeback
>   range:
> 
>   0               64K                 128K            192K
>   |       |///////|///////////////////|/////////|
>           32K               96K
>            \       OE      /
> 
>   |///| = subpage dirty range
> 
>   And when we go into the __extent_writepage_io() call to submit [32K,
>   64K), extent_write_locked_range() would find it's the locked page, and
>   not clear its page dirty flag, so the submission go without any
>   problem.
> 
>   But when we go into the [64K, 96K) range for the second half of the
>   ordered extent, extent_write_locked_range() would clear the page dirty
>   flag for the whole page range [64K, 128K), resulting the following
>   layout:
> 
>   0               64K                 128K            192K
>   |       |///////|         |         |/////////|
>           32K               96K
>            \       OE      /

Huh?  We come into extent_write_locked_range(), the first page is the locked
page so we don't clear dirty, because it's already been cleared dirty by the
original start from extent_write_cache_pages() right?  So we would have

Actual page
[0, 64k) !PageDirty()
[64k, 128k) PageDirty()

Subpage
[0, 64k) not dirty
[64k, 160k) dirty

> 
>   Then inside __extent_writepage_io(), since the page is no longer
>   dirty, we skip the submission, causing only half of the ordered extent
>   can be finished, thus hanging the whole system.

Huh? We _always_ call __extent_writepage_io() with the actual page not dirty, so
how are we skipping the submission?  We only clear the subpage range for the
part we actually submitted.

This needs a better explanation, because I'm super confused about how the page
state matters here when the main path does exactly this.  Thanks,

Josef

  reply	other threads:[~2024-05-10 14:25 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-05 22:35 [PATCH v2 0/2] btrfs: fix data corruption/hang/rsv leak in subpage zoned cases Qu Wenruo
2024-03-05 22:35 ` [PATCH v2 1/2] btrfs: do not clear page dirty inside extent_write_locked_range() Qu Wenruo
2024-05-10 14:25   ` Josef Bacik [this message]
2024-05-12  8:56     ` Qu Wenruo
2024-03-05 22:35 ` [PATCH v2 2/2] btrfs: make extent_write_locked_range() to handle subpage writeback correctly Qu Wenruo
2024-05-10 14:25   ` Josef Bacik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240510142525.GA90506@perftesting \
    --to=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox