From: Filipe Manana <fdmanana@kernel.org>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>,
Filipe David Borba Manana <fdmanana@suse.com>
Subject: Re: [PATCH v3] Btrfs: fix deadlock on tree root leaf when finding free extent
Date: Wed, 24 Oct 2018 10:07:02 +0100 [thread overview]
Message-ID: <CAL3q7H43pT85PG3dEvNKP5W1puPxfS3xY4PL0-zJ7Ku3RBUWxQ@mail.gmail.com> (raw)
In-Reply-To: <20181024040228.3gpipdaohezdsrm2@MacBook-Pro-91.local>
On Wed, Oct 24, 2018 at 5:08 AM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Mon, Oct 22, 2018 at 11:05:08PM +0100, Filipe Manana wrote:
> > On Mon, Oct 22, 2018 at 8:18 PM Josef Bacik <josef@toxicpanda.com> wrote:
> > >
> > > On Mon, Oct 22, 2018 at 08:10:37PM +0100, fdmanana@kernel.org wrote:
> > > > From: Filipe Manana <fdmanana@suse.com>
> > > >
> > > > When we are writing out a free space cache, during the transaction commit
> > > > phase, we can end up in a deadlock which results in a stack trace like the
> > > > following:
> > > >
> > > > schedule+0x28/0x80
> > > > btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> > > > ? finish_wait+0x80/0x80
> > > > btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> > > > btrfs_search_slot+0xf6/0x9f0 [btrfs]
> > > > ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> > > > ? inode_insert5+0x119/0x190
> > > > btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > > > ? kmem_cache_alloc+0x166/0x1d0
> > > > btrfs_iget+0x113/0x690 [btrfs]
> > > > __lookup_free_space_inode+0xd8/0x150 [btrfs]
> > > > lookup_free_space_inode+0x5b/0xb0 [btrfs]
> > > > load_free_space_cache+0x7c/0x170 [btrfs]
> > > > ? cache_block_group+0x72/0x3b0 [btrfs]
> > > > cache_block_group+0x1b3/0x3b0 [btrfs]
> > > > ? finish_wait+0x80/0x80
> > > > find_free_extent+0x799/0x1010 [btrfs]
> > > > btrfs_reserve_extent+0x9b/0x180 [btrfs]
> > > > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> > > > __btrfs_cow_block+0x11d/0x500 [btrfs]
> > > > btrfs_cow_block+0xdc/0x180 [btrfs]
> > > > btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> > > > btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > > > ? kmem_cache_alloc+0x166/0x1d0
> > > > btrfs_update_inode_item+0x46/0x100 [btrfs]
> > > > cache_save_setup+0xe4/0x3a0 [btrfs]
> > > > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> > > > btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> > > >
> > > > At cache_save_setup() we need to update the inode item of a block group's
> > > > cache which is located in the tree root (fs_info->tree_root), which means
> > > > that it may result in COWing a leaf from that tree. If that happens we
> > > > need to find a free metadata extent and while looking for one, if we find
> > > > a block group which was not cached yet we attempt to load its cache by
> > > > calling cache_block_group(). However this function will try to load the
> > > > inode of the free space cache, which requires finding the matching inode
> > > > item in the tree root - if that inode item is located in the same leaf as
> > > > the inode item of the space cache we are updating at cache_save_setup(),
> > > > we end up in a deadlock, since we try to obtain a read lock on the same
> > > > extent buffer that we previously write locked.
> > > >
> > > > So fix this by skipping the loading of free space caches of any block
> > > > groups that are not yet cached (rare cases) if we are COWing an extent
> > > > buffer from the root tree and space caching is enabled (-o space_cache
> > > > mount option). This is a rare case and its downside is failure to
> > > > find a free extent (return -ENOSPC) when all the already cached block
> > > > groups have no free extents.
> > > >
> > > > Reported-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> > > > Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
> > > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > > > Tested-by: Andrew Nelson <andrew.s.nelson@gmail.com>
> > > > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > >
> > > Great, thanks,
> > >
> > > Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> >
> > So this makes many fstests occasionally fail with aborted transaction
> > due to ENOSPC.
> > It's late and I haven't verified yet, but I suppose this is because we
> > always skip loading the cache regardless of currently being COWing an
> > existing leaf or allocating a new one (growing the tree).
> > Needs to be fixed.
> >
>
> How about we just use path->search_commit_root? If we're loading the cache we
> just want the last committed version, it's not like we read it after we've
> written it. Then we can go back to business as usual. Thanks,
Yeah, that works. It was an idea before sending v1 but it felt a bit
dirty at the time.
Left fstests running overnight using the commit root approach and
everything seems fine. Sending a v4.
Another alternative, which would solve similar problems for any tree,
would be to allow getting a read lock on an
already (spin) write locked eb, just like we do for blocking write
locks: https://friendpaste.com/6XrGXb5p0RSJGixUFZ8lCt
thanks
>
> Josef
next prev parent reply other threads:[~2018-10-24 9:07 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-10-22 9:09 [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent fdmanana
2018-10-22 9:53 ` Filipe Manana
2018-10-22 9:54 ` Filipe Manana
2018-10-22 18:07 ` Josef Bacik
2018-10-22 18:51 ` Filipe Manana
2018-10-22 18:48 ` [PATCH v2] " fdmanana
2018-10-22 18:55 ` Josef Bacik
2018-10-22 19:10 ` Filipe Manana
2018-10-22 19:10 ` [PATCH v3] " fdmanana
2018-10-22 19:18 ` Josef Bacik
2018-10-22 22:05 ` Filipe Manana
2018-10-24 4:08 ` Josef Bacik
2018-10-24 9:07 ` Filipe Manana [this message]
2018-10-24 9:13 ` [PATCH v4] " fdmanana
2018-10-24 11:37 ` Josef Bacik
2018-10-24 11:53 ` Filipe Manana
2018-10-24 12:40 ` Josef Bacik
2018-10-24 12:48 ` Filipe Manana
2018-10-24 23:58 ` Josef Bacik
2018-11-05 16:28 ` David Sterba
2018-11-05 16:30 ` Filipe Manana
2018-11-05 16:34 ` David Sterba
2018-11-05 16:36 ` Filipe Manana
2018-11-13 14:33 ` David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAL3q7H43pT85PG3dEvNKP5W1puPxfS3xY4PL0-zJ7Ku3RBUWxQ@mail.gmail.com \
--to=fdmanana@kernel.org \
--cc=fdmanana@suse.com \
--cc=josef@toxicpanda.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).