Re: [PATCH] Btrfs: fix leaf corruption after __btrfs_drop_extents

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Filipe David Manana <fdmanana@gmail.com>
To: Liu Bo <bo.li.liu@oracle.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>, levin.front@gmail.com
Subject: Re: [PATCH] Btrfs: fix leaf corruption after __btrfs_drop_extents
Date: Fri, 6 Jun 2014 13:11:17 +0100	[thread overview]
Message-ID: <CAL3q7H7pYVLzaoLM6Je6dnDoHiACBAA0r_n1DGAWDuQJzJ7n3Q@mail.gmail.com> (raw)
In-Reply-To: <1402035958-30013-1-git-send-email-bo.li.liu@oracle.com>

On Fri, Jun 6, 2014 at 7:25 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> Several reports about leaf corruption has been floating on the list, one of them
> points to __btrfs_drop_extents(), and we find that the leaf becomes corrupted
> after __btrfs_drop_extents(), it's really a rare case but it does exist.
>
> The problem turns out to be btrfs_next_leaf() called in __btrfs_drop_extents().
>
> So in btrfs_next_leaf(), we release the current path to re-search the last key of
> the leaf for locating next leaf, and we've taken it into account that there might
> be balance operations between leafs during this 'unlock and re-lock' dance, so
> we check the path again and advance it if there are now more items available.
> But things are a bit different if that last key happens to be removed and balance
> gets a bigger key as the last one, and btrfs_search_slot will return it with
> ret > 0,

Hi Liu,

That makes a lot of sense outside the context of btrfs_drop_extents().

If I understand you correctly, you're saying that we have a file
extent item that gets deleted after we release the path in
btrfs_next_leaf and before btrfs_search_slot finishes for doing a
lookup for that item.

But who calls btrfs_drop_extents(), is supposed to have locked the
file range in the inode's io tree, right? Isn't the goal of locking
ranges in the io tree to serialize manipulation of file extent items
within a certain range? Unless there's some caller of _drop_extents()
that isn't locking the range in the io tree, I'm not seeing how we can
get the file extent item deleted or updated by another task.

Thanks

> IOW, nothing change in this leaf except the new last key, then we think
> we're okay because there is no more item balanced in, fine, we thinks we can
> go to the next leaf.
>
> However, we should return that bigger key, otherwise we deserve leaf corruption,
> for example, in endio, skipping that key means that __btrfs_drop_extents() thinks
> it has dropped all extent matched the required range and finish_ordered_io can
> safely insert a new extent, but it actually doesn't and ends up a leaf
> corruption.
>
> This takes the special case into account.
>
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/ctree.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 1bcfcdb..bbb256c 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -5736,6 +5736,24 @@ again:
>                 ret = 0;
>                 goto done;
>         }
> +       /*
> +        * So the above check misses one case:
> +        * - after releasing the path above, someone has removed the item that
> +        *   used to be at the very end of the block, and balance between leafs
> +        *   gets another one with bigger key.offset to replace it.
> +        *
> +        * This one should be returned as well, or we can get leaf corruption
> +        * later(esp. in __btrfs_drop_extents()).
> +        *
> +        * And a bit more explanation about this check,
> +        * with ret > 0, the key isn't found, the path points to the slot
> +        * where it should be inserted, so the path->slots[0] item must be the
> +        * bigger one.
> +        */
> +       if (nritems > 0 && ret > 0 && path->slots[0] == nritems - 1) {
> +               ret = 0;
> +               goto done;
> +       }
>
>         while (level < BTRFS_MAX_LEVEL) {
>                 if (!path->nodes[level]) {
> --
> 1.8.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

next prev parent reply	other threads:[~2014-06-06 12:11 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-06  6:25 [PATCH] Btrfs: fix leaf corruption after __btrfs_drop_extents Liu Bo
2014-06-06 12:11 ` Filipe David Manana [this message]
2014-06-08 10:54   ` Liu Bo
2014-06-08 11:11     ` Filipe David Manana
2014-06-08 11:16       ` Filipe David Manana
2014-06-06 14:02 ` Holger Hoffstätte
2014-06-08 11:07   ` Liu Bo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAL3q7H7pYVLzaoLM6Je6dnDoHiACBAA0r_n1DGAWDuQJzJ7n3Q@mail.gmail.com \
    --to=fdmanana@gmail.com \
    --cc=bo.li.liu@oracle.com \
    --cc=levin.front@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).