linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Filipe Manana <fdmanana@gmail.com>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>,
	kernel-team@fb.com, Josef Bacik <jbacik@fb.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH 2/2] btrfs: run delayed items before dropping the snapshot
Date: Fri, 30 Nov 2018 21:15:27 +0000	[thread overview]
Message-ID: <CAL3q7H6jrtTC7+m8=_DG_GcmajO-bc_1cVhzMu3vthCRXutmQw@mail.gmail.com> (raw)
In-Reply-To: <CAL3q7H7N8ozNzhkTmt=s30SLaH-TptHZNv3b_Wv_Tif_c_nJVg@mail.gmail.com>

On Fri, Nov 30, 2018 at 5:12 PM Filipe Manana <fdmanana@gmail.com> wrote:
>
> On Fri, Nov 30, 2018 at 4:53 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > From: Josef Bacik <jbacik@fb.com>
> >
> > With my delayed refs patches in place we started seeing a large amount
> > of aborts in __btrfs_free_extent
> >
> > BTRFS error (device sdb1): unable to find ref byte nr 91947008 parent 0 root 35964  owner 1 offset 0
> > Call Trace:
> >  ? btrfs_merge_delayed_refs+0xaf/0x340
> >  __btrfs_run_delayed_refs+0x6ea/0xfc0
> >  ? btrfs_set_path_blocking+0x31/0x60
> >  btrfs_run_delayed_refs+0xeb/0x180
> >  btrfs_commit_transaction+0x179/0x7f0
> >  ? btrfs_check_space_for_delayed_refs+0x30/0x50
> >  ? should_end_transaction.isra.19+0xe/0x40
> >  btrfs_drop_snapshot+0x41c/0x7c0
> >  btrfs_clean_one_deleted_snapshot+0xb5/0xd0
> >  cleaner_kthread+0xf6/0x120
> >  kthread+0xf8/0x130
> >  ? btree_invalidatepage+0x90/0x90
> >  ? kthread_bind+0x10/0x10
> >  ret_from_fork+0x35/0x40
> >
> > This was because btrfs_drop_snapshot depends on the root not being modified
> > while it's dropping the snapshot.  It will unlock the root node (and really
> > every node) as it walks down the tree, only to re-lock it when it needs to do
> > something.  This is a problem because if we modify the tree we could cow a block
> > in our path, which free's our reference to that block.  Then once we get back to
> > that shared block we'll free our reference to it again, and get ENOENT when
> > trying to lookup our extent reference to that block in __btrfs_free_extent.
> >
> > This is ultimately happening because we have delayed items left to be processed
> > for our deleted snapshot _after_ all of the inodes are closed for the snapshot.
> > We only run the delayed inode item if we're deleting the inode, and even then we
> > do not run the delayed insertions or delayed removals.  These can be run at any
> > point after our final inode does it's last iput, which is what triggers the
> > snapshot deletion.  We can end up with the snapshot deletion happening and then
> > have the delayed items run on that file system, resulting in the above problem.
> >
> > This problem has existed forever, however my patches made it much easier to hit
> > as I wake up the cleaner much more often to deal with delayed iputs, which made
> > us more likely to start the snapshot dropping work before the transaction
> > commits, which is when the delayed items would generally be run.  Before,
> > generally speaking, we would run the delayed items, commit the transaction, and
> > wakeup the cleaner thread to start deleting snapshots, which means we were less
> > likely to hit this problem.  You could still hit it if you had multiple
> > snapshots to be deleted and ended up with lots of delayed items, but it was
> > definitely harder.
> >
> > Fix for now by simply running all the delayed items before starting to drop the
> > snapshot.  We could make this smarter in the future by making the delayed items
> > per-root, and then simply drop any delayed items for roots that we are going to
> > delete.  But for now just a quick and easy solution is the safest.
> >
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
>
> Great catch!
> I've hit this error from  __btrfs_free_extent() a handful of times
> over the years, but never managed
> to reproduce it on demand or figure out it was related to snapshot deletion.
>
> > ---
> >  fs/btrfs/extent-tree.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index dcb699dd57f3..965702034b22 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -9330,6 +9330,8 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
> >                 goto out_free;
> >         }
> >
> > +       btrfs_run_delayed_items(trans);
> > +

Btw, we should check the return value of this and return it if it's an error?
We can't do nothing with it in the context of the cleaner thread,
which is why, I suppose, you chose to ignore the value (besides that
the error might have been for some other root).
But this can be used in the context of relocation, where we can return
the error back to userspace.

Thanks.

> >         if (block_rsv)
> >                 trans->block_rsv = block_rsv;
> >
> > --
> > 2.14.3
> >
>
>
> --
> Filipe David Manana,
>
> “Whether you think you can, or you think you can't — you're right.”



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

      reply	other threads:[~2018-11-30 21:15 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-30 16:52 [PATCH 0/2] Fix aborts when dropping snapshots Josef Bacik
2018-11-30 16:52 ` [PATCH 1/2] btrfs: catch cow on deleting snapshots Josef Bacik
2018-11-30 17:14   ` Filipe Manana
2018-11-30 17:19     ` Josef Bacik
2018-12-06 18:05       ` David Sterba
2018-12-01  0:19   ` Qu Wenruo
2018-12-01  3:02     ` Hans van Kranenburg
2018-11-30 16:52 ` [PATCH 2/2] btrfs: run delayed items before dropping the snapshot Josef Bacik
2018-11-30 17:12   ` Filipe Manana
2018-11-30 21:15     ` Filipe Manana [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAL3q7H6jrtTC7+m8=_DG_GcmajO-bc_1cVhzMu3vthCRXutmQw@mail.gmail.com' \
    --to=fdmanana@gmail.com \
    --cc=jbacik@fb.com \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).