Re: [PATCH] btrfs: check WRITE_ERR when trying to read an extent buffer

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Filipe Manana <fdmanana@kernel.org>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com,
	Filipe Manana <fdmanana@suse.com>
Subject: Re: [PATCH] btrfs: check WRITE_ERR when trying to read an extent buffer
Date: Tue, 14 Dec 2021 10:32:37 +0000	[thread overview]
Message-ID: <YbhyxZPERYm4DYFU@debian9.Home> (raw)
In-Reply-To: <1676bf652be3e37ef3ae55ed784c8f0ab2ff3f8f.1639423346.git.josef@toxicpanda.com>

On Mon, Dec 13, 2021 at 02:22:33PM -0500, Josef Bacik wrote:
> Filipe reported a hang when we have errors on btrfs.  This turned out to
> be a side-effect of my fix 666cc468424b ("btrfs: clear extent buffer
> uptodate when we fail to write it") which made it so we clear

The commit in Linus' tree is c2e39305299f0118298c2201f6d6cc7d3485f29e.

> EXTENT_BUFFER_UPTODATE on an eb when we fail to write it out.
> 
> Below is a paste of Filipe's analysis he got from using drgn to debug
> the hang
> 
> """
> btree readahead code calls read_extent_buffer_pages(), sets ->io_pages to
> a value while writeback of all pages has not yet completed:
>    --> writeback for the first 3 pages finishes, we clear
>        EXTENT_BUFFER_UPTODATE from eb on the first page when we get an
>        error.
>    --> at this point eb->io_pages is 1 and we cleared Uptodate bit from the
>        first 3 pages
>    --> read_extent_buffer_pages() does not see EXTENT_BUFFER_UPTODATE() so
>        it continues, it's able to lock the pages since we obviously don't
>        hold the pages locked during writeback
>    --> read_extent_buffer_pages() then computes 'num_reads' as 3, and sets
>        eb->io_pages to 3, since only the first page does not have Uptodate
>        bit set at this point
>    --> writeback for the remaining page completes, we ended decrementing
>        eb->io_pages by 1, resulting in eb->io_pages == 2, and therefore
>        never calling end_extent_buffer_writeback(), so
>        EXTENT_BUFFER_WRITEBACK remains in the eb's flags
>    --> of course, when the read bio completes, it doesn't and shouldn't
>        call end_extent_buffer_writeback()
>    --> we should clear EXTENT_BUFFER_UPTODATE only after all pages of
>        the eb finished writeback?  or maybe make the read pages code
>        wait for writeback of all pages of the eb to complete before
>        checking which pages need to be read, touch ->io_pages, submit
>        read bio, etc
> 
> writeback bit never cleared means we can hang when aborting a
> transaction, at:
> 
>     btrfs_cleanup_one_transaction()
>        btrfs_destroy_marked_extents()
>          wait_on_extent_buffer_writeback()
> """
> 
> This is a problem because our writes are not synchronized with reads in
> any way.  We clear the UPTODATE flag and then we can easily come in and
> try to read the EB while we're still waiting on other bio's to
> complete.
> 
> We have two options here, we could lock all the pages, and then check to
> see if eb->io_pages != 0 to know if we've already got an outstanding
> write on the eb.
> 
> Or we can simply check to see if we have WRITE_ERR set on this extent
> buffer.  We set this bit _before_ we clear UPTODATE, so if the read gets
> triggered because we aren't UPTODATE because of a write error we're
> guaranteed to have WRITE_ERR set, and in this case we can simply return
> -EIO.  This will fix the reported hang.
> 
> Reported-by: Filipe Manana <fdmanana@suse.com>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

As this is already in Linus' tree, and it was tagged for stable backports, it should
include:

Fixes: c2e39305299f01 ("btrfs: clear extent buffer uptodate when we fail to write it")

David can probably add that and fix the commit id above.

Other than that, it looks good, thanks.

Reviewed-by: Filipe Manana <fdmanana@suse.com>

> ---
>  fs/btrfs/extent_io.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 762100a00978..38c5e9eb9a10 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -6601,6 +6601,14 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
>  	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
>  		return 0;
>  
> +	/*
> +	 * We could have had EXTENT_BUFFER_UPTODATE cleared by the write
> +	 * operation, which could potentially still be in flight.  In this case
> +	 * we simply want to return an error.
> +	 */
> +	if (unlikely(test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)))
> +		return -EIO;
> +
>  	if (eb->fs_info->sectorsize < PAGE_SIZE)
>  		return read_extent_buffer_subpage(eb, wait, mirror_num);
>  
> -- 
> 2.26.3
>

next prev parent reply	other threads:[~2021-12-14 10:32 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-13 19:22 [PATCH] btrfs: check WRITE_ERR when trying to read an extent buffer Josef Bacik
2021-12-14 10:32 ` Filipe Manana [this message]
2021-12-14 14:19   ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YbhyxZPERYm4DYFU@debian9.Home \
    --to=fdmanana@kernel.org \
    --cc=fdmanana@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.