All of lore.kernel.org
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2 1/2] btrfs: do not clear page dirty inside extent_write_locked_range()
Date: Fri, 10 May 2024 10:25:25 -0400	[thread overview]
Message-ID: <20240510142525.GA90506@perftesting> (raw)
In-Reply-To: <ebf001731e2ebafd5c2a435a7e0848634a421ed7.1709677986.git.wqu@suse.com>

On Wed, Mar 06, 2024 at 09:05:33AM +1030, Qu Wenruo wrote:
> [BUG]
> For subpage + zoned case, btrfs can easily hang with the following
> workload, even with previous subpage delalloc rework:
> 
>  # mkfs.btrfs -s 4k -f $dev
>  # mount $dev $mnt
>  # xfs_io -f -c "pwrite 32k 128k" $mnt/foobar
>  # umount $mnt
> 
> The system would hang at unmount due to unfinished ordered extents.
> 
> Above $dev is a tcmu-runner emulated zoned HDD, which has a max zone
> append size of 64K, and the system has 64K page size.
> 

This whole explanation confuses me, because in the normal case we clear the
whole page dirty before we enter into __extent_writepage_io(), so what we're
doing here is no different than the regular case, so I don't get how this is a
problem?

> [CAUSE]
> There is a bug involved in extent_write_locked_range() (well, I'm
> already surprised by how many subpage incompatible code are inside that
> function):
> 
> - If @pages_dirty is true, we will clear the page dirty flag for the
>   whole page
> 
>   This means, for above case, since the max zone append size is 64K,
>   we got an ordered extent sized 64K, resulting the following writeback
>   range:
> 
>   0               64K                 128K            192K
>   |       |///////|///////////////////|/////////|
>           32K               96K
>            \       OE      /
> 
>   |///| = subpage dirty range
> 
>   And when we go into the __extent_writepage_io() call to submit [32K,
>   64K), extent_write_locked_range() would find it's the locked page, and
>   not clear its page dirty flag, so the submission go without any
>   problem.
> 
>   But when we go into the [64K, 96K) range for the second half of the
>   ordered extent, extent_write_locked_range() would clear the page dirty
>   flag for the whole page range [64K, 128K), resulting the following
>   layout:
> 
>   0               64K                 128K            192K
>   |       |///////|         |         |/////////|
>           32K               96K
>            \       OE      /

Huh?  We come into extent_write_locked_range(), the first page is the locked
page so we don't clear dirty, because it's already been cleared dirty by the
original start from extent_write_cache_pages() right?  So we would have

Actual page
[0, 64k) !PageDirty()
[64k, 128k) PageDirty()

Subpage
[0, 64k) not dirty
[64k, 160k) dirty

> 
>   Then inside __extent_writepage_io(), since the page is no longer
>   dirty, we skip the submission, causing only half of the ordered extent
>   can be finished, thus hanging the whole system.

Huh? We _always_ call __extent_writepage_io() with the actual page not dirty, so
how are we skipping the submission?  We only clear the subpage range for the
part we actually submitted.

This needs a better explanation, because I'm super confused about how the page
state matters here when the main path does exactly this.  Thanks,

Josef

  reply	other threads:[~2024-05-10 14:25 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-05 22:35 [PATCH v2 0/2] btrfs: fix data corruption/hang/rsv leak in subpage zoned cases Qu Wenruo
2024-03-05 22:35 ` [PATCH v2 1/2] btrfs: do not clear page dirty inside extent_write_locked_range() Qu Wenruo
2024-05-10 14:25   ` Josef Bacik [this message]
2024-05-12  8:56     ` Qu Wenruo
2024-03-05 22:35 ` [PATCH v2 2/2] btrfs: make extent_write_locked_range() to handle subpage writeback correctly Qu Wenruo
2024-05-10 14:25   ` Josef Bacik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240510142525.GA90506@perftesting \
    --to=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.