Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

From: Josef Bacik <jbacik@fusionio.com>
To: Jim Schutt <jaschut@sandia.gov>
Cc: Josef Bacik <JBacik@fusionio.com>, Liu Bo <bo.li.liu@oracle.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH] Btrfs: fix a deadlock on chunk mutex
Date: Wed, 30 Jan 2013 10:06:32 -0500
Message-ID: <20130130150632.GF3660@localhost.localdomain>
In-Reply-To: <510855AD.2020602@sandia.gov>

On Tue, Jan 29, 2013 at 04:05:17PM -0700, Jim Schutt wrote:
> On 01/29/2013 01:04 PM, Josef Bacik wrote:
> > On Tue, Jan 29, 2013 at 11:41:10AM -0700, Jim Schutt wrote:
> >> > On 01/28/2013 02:23 PM, Josef Bacik wrote:
> >>> > > On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
> >>>> > >> Hi Josef,
> >>>> > >>
> >>>> > >> Thanks for the patch - sorry for the long delay in testing...
> >>>> > >>
> >>> > > 
> >>> > > Jim,
> >>> > > 
> >>> > > I've been trying to reason out how this happens.  Could you run btrfs fi df
> >>> > > on the filesystem that's giving you trouble, so I can see whether what I
> >>> > > think is happening is what's actually happening?  Thanks,
> >> > 
> >> > Here's an example, using a slightly different kernel than
> >> > my previous report.  It's your btrfs-next master branch
> >> > (commit 8f139e59d5 "Btrfs: use bit operation for ->fs_state")
> >> > with ceph 3.8 for-linus (commit 0fa6ebc600 from linus' tree).
> >> > 
> >> > 
> >> > Here I'm finding the file system in question:
> >> > 
> >> > # ls -l /dev/mapper | grep dm-93
> >> > lrwxrwxrwx 1 root root       8 Jan 29 11:13 cs53s19p2 -> ../dm-93
> >> > 
> >> > # df -h | grep -A 1 cs53s19p2
> >> > /dev/mapper/cs53s19p2
> >> >                       896G  1.1G  896G   1% /ram/mnt/ceph/data.osd.522
> >> > 
> >> > 
> >> > Here's the info you asked for:
> >> > 
> >> > # btrfs fi df /ram/mnt/ceph/data.osd.522
> >> > Data: total=2.01GB, used=1.00GB
> >> > System: total=4.00MB, used=64.00KB
> >> > Metadata: total=8.00MB, used=7.56MB
> >> > 
> > How big is the disk you are using, and what mount options?  I have a patch to
> > keep the panic from happening, and hopefully the abort as well; could you try
> > it?  I still want to keep the underlying error from happening, because it
> > shouldn't happen, but there's no reason I can't fix the error case while you
> > can easily reproduce it :).
> > Thanks,
> > 
> > Josef
> > 
> > From c50b725c74c7d39064e553ef85ac9753efbd8aec Mon Sep 17 00:00:00 2001
> > From: Josef Bacik <jbacik@fusionio.com>
> > Date: Tue, 29 Jan 2013 15:03:37 -0500
> > Subject: [PATCH] Btrfs: fix chunk allocation error handling
> > 
> > If we error out allocating a dev extent, we will have already created the
> > block group and such, which will cause problems since the allocator may have
> > tried to allocate out of the block group that no longer exists.  This will
> > cause BUG_ON()s in the bio submission path.  This patch also makes a failure
> > to allocate a dev extent a non-abort error: we will just clean up the dev
> > extents we did allocate and exit.  If we fail to delete the dev extents we
> > will still abort, since we can't have half of the dev extents hanging around,
> > but this will make us much less likely to abort.  Thanks,
> > 
> > Signed-off-by: Josef Bacik <jbacik@fusionio.com>
> > ---
> 
> Interesting - with your patch applied I triggered the following, just
> bringing up a fresh Ceph filesystem - I didn't even get a chance to
> mount it on my Ceph clients:
> 

Well, that makes me a sad panda, but hey, it didn't panic this time.  What
workload are you running on this fs/ceph cluster?  Thanks,

Josef
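
For readers following the thread: the error handling described in the quoted
commit message (clean up the dev extents that were already allocated when a
later allocation fails, and abort only if that cleanup itself fails) can be
sketched in userspace C roughly as below.  This is a minimal illustration under
stated assumptions, not the actual patch: struct device, alloc_dev_extent(),
free_dev_extent(), alloc_chunk() and the CHUNK_* codes are all hypothetical
names invented here, and block group creation is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; not a btrfs structure. */
struct device {
    int free_extents;  /* extents still available on this device */
    bool fail_free;    /* simulate a failure while releasing an extent */
};

/* Hypothetical result codes for the sketch. */
enum chunk_result {
    CHUNK_OK = 0,       /* every dev extent was allocated */
    CHUNK_ENOSPC = -1,  /* allocation failed, cleanup succeeded: recoverable */
    CHUNK_ABORT = -2    /* cleanup failed too: state is inconsistent */
};

/* Take one extent from a device; fails when none are left. */
static int alloc_dev_extent(struct device *dev)
{
    if (dev->free_extents == 0)
        return -1;
    dev->free_extents--;
    return 0;
}

/* Give one extent back to a device; can be made to fail for testing. */
static int free_dev_extent(struct device *dev)
{
    if (dev->fail_free)
        return -1;
    dev->free_extents++;
    return 0;
}

/*
 * Allocate one extent on each of the n devices.  If any allocation fails,
 * release the extents taken so far in reverse order.  Only a failure during
 * that release is treated as fatal, because it would leave "half of the dev
 * extents hanging around".
 */
static enum chunk_result alloc_chunk(struct device *devs, size_t n)
{
    size_t i;

    assert(devs != NULL);

    for (i = 0; i < n; i++) {
        if (alloc_dev_extent(&devs[i]) != 0)
            goto rollback;
    }
    return CHUNK_OK;

rollback:
    /* Devices 0..i-1 hold an extent from this attempt; undo them. */
    while (i-- > 0) {
        if (free_dev_extent(&devs[i]) != 0)
            return CHUNK_ABORT;
    }
    return CHUNK_ENOSPC;
}
```

In this sketch a caller would treat CHUNK_ENOSPC as an ordinary, recoverable
error and reserve the abort path for CHUNK_ABORT, which mirrors the "much less
likely to abort" behaviour the commit message aims for.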
