From: Josef Bacik <jbacik@fusionio.com>
To: Alex Lyakas <alex.btrfs@zadarastorage.com>
Cc: <dsterba@suse.cz>, linux-btrfs <linux-btrfs@vger.kernel.org>,
Josef Bacik <jbacik@fusionio.com>
Subject: Re: [PATCH v3] btrfs: clean snapshots one by one
Date: Mon, 15 Jul 2013 12:41:29 -0400
Message-ID: <20130715164129.GB3085@localhost.localdomain>
In-Reply-To: <CAOcd+r1eSPRb5ysoOTD4xbK59+JVcN4K7n6PkFGWwdZ7cm3aZg@mail.gmail.com>
On Sun, Jul 14, 2013 at 07:20:04PM +0300, Alex Lyakas wrote:
> Hi,
>
> On Thu, Jul 4, 2013 at 10:52 PM, Alex Lyakas
> <alex.btrfs@zadarastorage.com> wrote:
> > Hi David,
> >
> > On Thu, Jul 4, 2013 at 8:03 PM, David Sterba <dsterba@suse.cz> wrote:
> >> On Thu, Jul 04, 2013 at 06:29:23PM +0300, Alex Lyakas wrote:
> >>> > @@ -7363,6 +7365,12 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
> >>> > wc->reada_count = BTRFS_NODEPTRS_PER_BLOCK(root);
> >>> >
> >>> > while (1) {
> >>> > + if (!for_reloc && btrfs_fs_closing(root->fs_info)) {
> >>> > + pr_debug("btrfs: drop snapshot early exit\n");
> >>> > + err = -EAGAIN;
> >>> > + goto out_end_trans;
> >>> > + }
> >>> Here you exit the loop, but the "drop_progress" in the root item is
> >>> incorrect. When the filesystem is remounted and snapshot deletion
> >>> resumes, it seems that it tries to resume from an EXTENT_ITEM that
> >>> no longer exists, and [1] shows that btrfs_lookup_extent_info()
> >>> simply does not find the needed extent.
> >>> So then I hit panic in walk_down_tree():
> >>> BUG: wc->refs[level - 1] == 0
> >>>
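> >>> For reference, the check that panics is (if I recall correctly) in
> >>> do_walk_down(), on the walk_down_tree() path, and looks roughly like
> >>> this (paraphrased from memory, the argument list may not be exact
> >>> for this tree):
> >>>
> >>>         /* look up how many refs the child block at 'bytenr' has */
> >>>         ret = btrfs_lookup_extent_info(trans, root, bytenr, ...,
> >>>                                        &wc->refs[level - 1],
> >>>                                        &wc->flags[level - 1]);
> >>>         ...
> >>>         /* fires when the extent item is already gone on disk */
> >>>         BUG_ON(wc->refs[level - 1] == 0);
> >>>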
> >>> I fixed it as follows:
> >>> There is a place where btrfs_drop_snapshot() checks whether it needs to
> >>> detach from the transaction and re-attach. So I moved the exit point
> >>> there, and the code looks like this:
> >>>
> >>> if (btrfs_should_end_transaction(trans, tree_root) ||
> >>> (!for_reloc && btrfs_need_cleaner_sleep(root))) {
> >>> ret = btrfs_update_root(trans, tree_root,
> >>> &root->root_key,
> >>> root_item);
> >>> if (ret) {
> >>> btrfs_abort_transaction(trans, tree_root, ret);
> >>> err = ret;
> >>> goto out_end_trans;
> >>> }
> >>>
> >>> btrfs_end_transaction_throttle(trans, tree_root);
> >>> if (!for_reloc && btrfs_need_cleaner_sleep(root)) {
> >>> err = -EAGAIN;
> >>> goto out_free;
> >>> }
> >>> trans = btrfs_start_transaction(tree_root, 0);
> >>> ...
> >>>
> >>> With this fix, I do not hit the panic, and snapshot deletion proceeds
> >>> and completes alright after mount.
> >>>
> >>> Do you agree with my analysis, or am I missing something? It seems that
> >>> Josef's btrfs-next still has this issue (as does Chris's for-linus).
> >>
> >> Sound analysis and I agree with the fix. The clean-by-one patch has been
> >> merged into 3.10 so we need a stable fix for that.
> > Thanks for confirming, David!
> >
> > From more testing, I have two more notes:
> >
> > # After applying the fix, whenever snapshot deletion is resumed after
> > mount, and successfully completes, then I unmount again, and rmmod
> > btrfs, Linux complains about losing a few "struct extent_buffer" objects
> > during kmem_cache_destroy().
> > So somewhere on that path:
> > if (btrfs_disk_key_objectid(&root_item->drop_progress) == 0) {
> > ...
> > } else {
> > ===> HERE
> >
> > and later we perhaps somehow overwrite the contents of "struct
> > btrfs_path" that is used in the whole function. Because at the end of
> > the function we always do btrfs_free_path(), which inside does
> > btrfs_release_path(). I was not able to determine where the leak
> > happens; do you have any hint? No other activity happens in the system
> > except the resumed snap deletion, and this problem only occurs when
> > resuming.
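> >
> > For reference, the resume branch I am pointing at above looks roughly
> > like this (paraphrased from memory, not the exact code in this tree):
> >
> >         } else {
> >                 /* resume from the key recorded in drop_progress */
> >                 btrfs_disk_key_to_cpu(&key, &root_item->drop_progress);
> >                 memcpy(&wc->update_progress, &key, sizeof(key));
> >
> >                 level = root_item->drop_level;
> >                 BUG_ON(level == 0);
> >                 path->lowest_level = level;
> >                 /* re-position the path at the recorded extent */
> >                 ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> >                 path->lowest_level = 0;
> >                 ...
> >         }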
> >
> I found where the memory leak happens. When we abort snapshot deletion
> in the middle, this btrfs_root is basically left hanging in the air.
> It is already out of "dead_roots", so when del_fs_roots() is called
> during unmount, it will not free this root or its root->node (which is
> what triggers the memory-leak warning on kmem_cache_destroy()) and
> perhaps other stuff too. The issue still
> exists in btrfs-next.
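>
> For context, btrfs_add_dead_root() (in fs/btrfs/transaction.c) is,
> roughly and from memory, just this, so putting the root back on that
> list is cheap:
>
>       int btrfs_add_dead_root(struct btrfs_root *root)
>       {
>               /* queue the root so it gets freed like any other dead root */
>               spin_lock(&root->fs_info->trans_lock);
>               list_add_tail(&root->root_list, &root->fs_info->dead_roots);
>               spin_unlock(&root->fs_info->trans_lock);
>               return 0;
>       }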
>
> The simplest fix I came up with was:
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index d275681..52a2c54 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -7468,6 +7468,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
> int err = 0;
> int ret;
> int level;
> + bool root_freed = false;
>
> path = btrfs_alloc_path();
> if (!path) {
> @@ -7641,6 +7642,8 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
> free_extent_buffer(root->commit_root);
> btrfs_put_fs_root(root);
> }
> + root_freed = true;
> +
> out_end_trans:
> btrfs_end_transaction_throttle(trans, tree_root);
> out_free:
> @@ -7649,6 +7652,18 @@ out_free:
> out:
> if (err)
> btrfs_std_error(root->fs_info, err);
> +
> + /*
> + * If the root was not freed for any reason, this means that the FS had
> + * a problem and will probably be unmounted soon.
> + * But we need to put the root back into the 'dead_roots' list,
> + * so that it will be properly freed during unmount.
> + */
> + if (!root_freed) {
> + WARN_ON(err == 0);
> + btrfs_add_dead_root(root);
> + }
> +
> return err;
> }
>
> With this fix, I don't see any memleak warnings (also with LEAK_DEBUG
> enabled) while aborting and resuming snapshot deletion.
>
>
> > # This is for Josef: after I unmount the fs with ongoing snap deletion
> > (after applying my fix) and run the latest btrfsck, it complains a
> > lot about problems in the extent tree :( But after I mount again, snap
> > deletion resumes and completes, then I unmount and btrfsck is happy
> > again. So probably it does not account for orphan roots properly?
> >
> > David, will you provide a fixed patch, if possible?
> >
>
> Josef, David, I feel I am not being as helpful as I could be: I am
> pinpointing the problems and suggesting fixes, but not providing actual
> patches that can be applied. The reason is that it is difficult for me
> to test the fixes thoroughly on the latest upstream kernel (like
> btrfs-next), for reasons I'm sure you understand. So I would appreciate
> it if you could post these two fixes to the upstream kernel; otherwise,
> I will try to work on and test them against the latest kernel myself.
>
This is perfect, you've given great fixes and great analysis. Since you have an
actual patch for this one, please re-send it with a Signed-off-by and such so I
can apply it. Thanks,
Josef
Thread overview: 12+ messages
2013-03-12 15:13 [PATCH v3] btrfs: clean snapshots one by one David Sterba
2013-03-16 19:34 ` Alex Lyakas
2013-05-07 0:41 ` Chris Mason
2013-05-07 11:54 ` David Sterba
2013-05-10 13:04 ` Chris Mason
2013-05-14 6:32 ` Miao Xie
2013-07-04 15:29 ` Alex Lyakas
2013-07-04 17:03 ` David Sterba
2013-07-04 19:52 ` Alex Lyakas
2013-07-05 2:21 ` Josef Bacik
2013-07-14 16:20 ` Alex Lyakas
2013-07-15 16:41 ` Josef Bacik [this message]