From: Josef Bacik <jbacik@fusionio.com>
To: Jim Schutt <jaschut@sandia.gov>
Cc: Josef Bacik <JBacik@fusionio.com>, Liu Bo <bo.li.liu@oracle.com>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH] Btrfs: fix a deadlock on chunk mutex
Date: Thu, 31 Jan 2013 10:33:17 -0500 [thread overview]
Message-ID: <20130131153317.GM3660@localhost.localdomain> (raw)
In-Reply-To: <510992A4.4090703@sandia.gov>
On Wed, Jan 30, 2013 at 02:37:40PM -0700, Jim Schutt wrote:
> On 01/30/2013 09:38 AM, Josef Bacik wrote:
> > On Tue, Jan 29, 2013 at 04:05:17PM -0700, Jim Schutt wrote:
> >> > On 01/29/2013 01:04 PM, Josef Bacik wrote:
> >>> > > On Tue, Jan 29, 2013 at 11:41:10AM -0700, Jim Schutt wrote:
> >>>>> > >> > On 01/28/2013 02:23 PM, Josef Bacik wrote:
> >>>>>>> > >>> > > On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
> >>>>>>>>> > >>>> > >> Hi Josef,
> >>>>>>>>> > >>>> > >>
> >>>>>>>>> > >>>> > >> Thanks for the patch - sorry for the long delay in testing...
> >>>>>>>>> > >>>> > >>
> >>>>>>> > >>> > >
> >>>>>>> > >>> > > Jim,
> >>>>>>> > >>> > >
> >>>>>>> > >>> > > I've been trying to reason out how this happens, could you do a btrfs fi df on
> >>>>>>> > >>> > > the filesystem thats giving you trouble so I can see if what I think is
> >>>>>>> > >>> > > happening is what's actually happening. Thanks,
> >>>>> > >> >
> >>>>> > >> > Here's an example, using a slightly different kernel than
> >>>>> > >> > my previous report. It's your btrfs-next master branch
> >>>>> > >> > (commit 8f139e59d5 "Btrfs: use bit operation for ->fs_state")
> >>>>> > >> > with ceph 3.8 for-linus (commit 0fa6ebc600 from linus' tree).
> >>>>> > >> >
> >>>>> > >> >
> >>>>> > >> > Here I'm finding the file system in question:
> >>>>> > >> >
> >>>>> > >> > # ls -l /dev/mapper | grep dm-93
> >>>>> > >> > lrwxrwxrwx 1 root root 8 Jan 29 11:13 cs53s19p2 -> ../dm-93
> >>>>> > >> >
> >>>>> > >> > # df -h | grep -A 1 cs53s19p2
> >>>>> > >> > /dev/mapper/cs53s19p2
> >>>>> > >> > 896G 1.1G 896G 1% /ram/mnt/ceph/data.osd.522
> >>>>> > >> >
> >>>>> > >> >
> >>>>> > >> > Here's the info you asked for:
> >>>>> > >> >
> >>>>> > >> > # btrfs fi df /ram/mnt/ceph/data.osd.522
> >>>>> > >> > Data: total=2.01GB, used=1.00GB
> >>>>> > >> > System: total=4.00MB, used=64.00KB
> >>>>> > >> > Metadata: total=8.00MB, used=7.56MB
> >>>>> > >> >
> >>> > > How big is the disk you are using, and what mount options? I have a patch to
> >>> > > keep the panic from happening and hopefully the abort, could you try this? I
> >>> > > still want to keep the underlying error from happening because it shouldn't be,
> >>> > > but no reason I can't fix the error case while you can easily reproduce it :).
> >>> > > Thanks,
> >>> > >
> >>> > > Josef
> >>> > >
> >>> > >>From c50b725c74c7d39064e553ef85ac9753efbd8aec Mon Sep 17 00:00:00 2001
> >>> > > From: Josef Bacik <jbacik@fusionio.com>
> >>> > > Date: Tue, 29 Jan 2013 15:03:37 -0500
> >>> > > Subject: [PATCH] Btrfs: fix chunk allocation error handling
> >>> > >
> >>> > > If we error out allocating a dev extent we will have already created the
> >>> > > block group and such which will cause problems since the allocator may have
> >>> > > tried to allocate out of the block group that no longer exists. This will
> >>> > > cause BUG_ON()'s in the bio submission path. This also makes a failure to
> >>> > > allocate a dev extent a non-abort error, we will just clean up the dev
> >>> > > extents we did allocate and exit. Now if we fail to delete the dev extents
> >>> > > we will abort since we can't have half of the dev extents hanging around,
> >>> > > but this will make us much less likely to abort. Thanks,
> >>> > >
> >>> > > Signed-off-by: Josef Bacik <jbacik@fusionio.com>
> >>> > > ---
> >> >
> >> > Interesting - with your patch applied I triggered the following, just
> >> > bringing up a fresh Ceph filesystem - I didn't even get a chance to
> >> > mount it on my Ceph clients:
> >> >
> > Ok can you give this patch a whirl as well? It seems to fix the problem for me.
>
> With this patch on top of your previous patch, after several trials of
> my test I am also unable to reproduce the issue. Since I had been
> having trouble first time, every time, I think it also seems to fix
> the problem for me.
>
Hey Jim,
Could you test this patch instead? I think it's a little less hamfisted and
should give us a nice balance between not crashing and being good for
performance. Thanks,
Josef
commit 43510c0e5faad8e5e4d8ba13baa1dd5dfb3d39ce
Author: Josef Bacik <jbacik@fusionio.com>
Date: Wed Jan 30 17:02:51 2013 -0500
Btrfs: do not allow overcommit to happen if we are over 80% in use
Because of how little we allocate chunks now we can get really tight on
metadata space before we will allocate a new chunk. This resulted in being
unable to add device extents when allocating a new metadata chunk as we did
not have enough space. This is because we were allowed to overcommit too
much metadata without actually making sure we had enough space to make
allocations. The idea behind overcommit is that we are allowed to say "sure
you can have that reservation" when most of the free space is occupied by
reservations, not actual allocations. But in this case where a majority of
the total space is in use by actual allocations we can screw ourselves by
not being able to make real allocations when it matters. So put this cap in
place for now to keep us from overcommitting so much that we run out of
space. Thanks,
Reported-and-tested-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index dca5679..156341e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3672,13 +3672,30 @@ static int can_overcommit(struct btrfs_root *root,
struct btrfs_space_info *space_info, u64 bytes,
enum btrfs_reserve_flush_enum flush)
{
+ struct btrfs_block_rsv *global_rsv = &root->fs_info->global_block_rsv;
u64 profile = btrfs_get_alloc_profile(root, 0);
+ u64 rsv_size = 0;
u64 avail;
u64 used;
used = space_info->bytes_used + space_info->bytes_reserved +
- space_info->bytes_pinned + space_info->bytes_readonly +
- space_info->bytes_may_use;
+ space_info->bytes_pinned + space_info->bytes_readonly;
+
+ spin_lock(&global_rsv->lock);
+ rsv_size = global_rsv->size;
+ spin_unlock(&global_rsv->lock);
+
+ /*
+ * We only want to allow over committing if we have lots of actual space
+ * free, but if we don't have enough space to handle the global reserve
+ * space then we could end up having a real enospc problem when trying
+ * to allocate a chunk or some other such important allocation.
+ */
+ rsv_size <<= 1;
+ if (used + rsv_size >= space_info->total_bytes)
+ return 0;
+
+ used += space_info->bytes_may_use;
spin_lock(&root->fs_info->free_chunk_lock);
avail = root->fs_info->free_chunk_space;
next prev parent reply other threads:[~2013-01-31 15:33 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-12-13 1:52 [PATCH] Btrfs: fix a deadlock on chunk mutex Liu Bo
2012-12-18 13:52 ` Josef Bacik
2012-12-18 14:47 ` Liu Bo
2012-12-18 15:40 ` Josef Bacik
2013-01-03 18:44 ` Jim Schutt
2013-01-28 21:23 ` Josef Bacik
2013-01-28 21:58 ` Jim Schutt
2013-01-29 2:30 ` Liu Bo
2013-01-29 13:47 ` Josef Bacik
2013-01-29 13:50 ` Josef Bacik
2013-01-29 16:43 ` David Sterba
2013-01-29 16:52 ` David Sterba
2013-01-29 18:41 ` Jim Schutt
2013-01-29 20:04 ` Josef Bacik
2013-01-29 20:37 ` Jim Schutt
2013-01-29 23:05 ` Jim Schutt
2013-01-30 15:06 ` Josef Bacik
2013-01-30 15:16 ` Josef Bacik
2013-01-30 16:38 ` Josef Bacik
2013-01-30 21:37 ` Jim Schutt
2013-01-30 21:55 ` Josef Bacik
2013-01-31 15:33 ` Josef Bacik [this message]
2013-01-31 16:52 ` Jim Schutt
2014-02-18 15:47 ` Alex Lyakas
2014-02-18 16:06 ` Josef Bacik
2014-02-18 16:24 ` Alex Lyakas
2014-02-18 16:26 ` Josef Bacik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130131153317.GM3660@localhost.localdomain \
--to=jbacik@fusionio.com \
--cc=bo.li.liu@oracle.com \
--cc=jaschut@sandia.gov \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).