linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nikolay Borisov <nborisov@suse.com>
To: dsterba@suse.cz, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: Correctly free extent buffer in case btree_read_extent_buffer_pages fails
Date: Thu, 14 Mar 2019 09:10:42 +0200	[thread overview]
Message-ID: <6581c72e-7a26-2cbe-2bc6-d3bae313a255@suse.com> (raw)
In-Reply-To: <20190313180314.GH31119@twin.jikos.cz>



On 13.03.19 г. 20:03 ч., David Sterba wrote:
> On Tue, Mar 12, 2019 at 03:18:47PM +0200, Nikolay Borisov wrote:
>> If a an eb fails to be read for whatever reason - it's corrupted on disk
>> and parent transid/key validations fail or IO for eb pages fail then
>> this buffer must be removed from the buffer cache. Currently the code
>> calls free_extent_buffer if an error occurs. Unfortunately this doesn't
>> achieve the desired behavior since btrfs_find_create_tree_block returns
>> with eb->refs == 2. On the other hand free_extent_buffer will only
>> decrement the refs once leavin it added to the buffer cache radix tree.
>> This enables later code to look up the buffer from the cache and utilize
>> it potentially leading to a crash.
>>
>> The correct way to free the buffer is call free_extent_buffer_stale.
>> This function will correctly call atomic_dec explicitly for the buffer
>> and subsequently call release_extent_buffer which will decrement the
>> final reference thus correctly remove the invalid buffer from buffer
>> cache. This change affects only newly allocated buffers since they have
>> eb->refs == 2.
> 
> Following the same pattern, should every use of
> btrfs_find_create_tree_block be followed by free_extent_buffer_stale in
> case of errors?
> 
> There's almost the same sequence in readahead_tree_block as you update
> in reada_tree_block_flagged:
> 
>   btrfs_find_create_tree_block
>   read_extent_buffer_pages
>   free_extent_buffer
> 
> so should be the _stale variant used there too?

You are right it should also be using the _stale variant. I will fix it.

Despite my cleanup of the extent buffer freeing code I'm still not
entirely happy with the special cases we have with the stale or unmapped
logic. That's why we need all these special cases ;\

> 
> free_extent_buffer_stale is normally used in other contexts where the
> 'stale' part means that the eb is known to be disconnected.
> 
> I think that btrfs_find_create_tree_block needs a dedicated freeing
> function that will make sure all invariants (refs == 2, no other
> structures connected) are met and then do effectively the same what
> _stale does now.
Sounds good though I'm a bit reluctant to add yet another
free_extent_buffer variant. IMO the logic of the extent buffer radix
tree need to be revised which could lead to simplifications in the whole
ref counting code.

> 

  reply	other threads:[~2019-03-14  7:10 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-12 13:18 [PATCH] btrfs: Correctly free extent buffer in case btree_read_extent_buffer_pages fails Nikolay Borisov
2019-03-13 18:03 ` David Sterba
2019-03-14  7:10   ` Nikolay Borisov [this message]
2019-03-18 19:01     ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6581c72e-7a26-2cbe-2bc6-d3bae313a255@suse.com \
    --to=nborisov@suse.com \
    --cc=dsterba@suse.cz \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).