Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Jim Schutt" <jaschut@sandia.gov>
To: "Josef Bacik" <jbacik@fusionio.com>
Cc: "Liu Bo" <bo.li.liu@oracle.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH] Btrfs: fix a deadlock on chunk mutex
Date: Tue, 29 Jan 2013 16:05:17 -0700	[thread overview]
Message-ID: <510855AD.2020602@sandia.gov> (raw)
In-Reply-To: <20130129200415.GE3660@localhost.localdomain>

On 01/29/2013 01:04 PM, Josef Bacik wrote:
> On Tue, Jan 29, 2013 at 11:41:10AM -0700, Jim Schutt wrote:
>> > On 01/28/2013 02:23 PM, Josef Bacik wrote:
>>> > > On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
>>>> > >> Hi Josef,
>>>> > >>
>>>> > >> Thanks for the patch - sorry for the long delay in testing...
>>>> > >>
>>> > > 
>>> > > Jim,
>>> > > 
>>> > > I've been trying to reason out how this happens, could you do a btrfs fi df on
>>> > > the filesystem thats giving you trouble so I can see if what I think is
>>> > > happening is what's actually happening.  Thanks,
>> > 
>> > Here's an example, using a slightly different kernel than
>> > my previous report.  It's your btrfs-next master branch
>> > (commit 8f139e59d5 "Btrfs: use bit operation for ->fs_state")
>> > with ceph 3.8 for-linus (commit 0fa6ebc600 from linus' tree).
>> > 
>> > 
>> > Here I'm finding the file system in question:
>> > 
>> > # ls -l /dev/mapper | grep dm-93
>> > lrwxrwxrwx 1 root root       8 Jan 29 11:13 cs53s19p2 -> ../dm-93
>> > 
>> > # df -h | grep -A 1 cs53s19p2
>> > /dev/mapper/cs53s19p2
>> >                       896G  1.1G  896G   1% /ram/mnt/ceph/data.osd.522
>> > 
>> > 
>> > Here's the info you asked for:
>> > 
>> > # btrfs fi df /ram/mnt/ceph/data.osd.522
>> > Data: total=2.01GB, used=1.00GB
>> > System: total=4.00MB, used=64.00KB
>> > Metadata: total=8.00MB, used=7.56MB
>> > 
> How big is the disk you are using, and what mount options?  I have a patch to
> keep the panic from happening and hopefully the abort, could you try this?  I
> still want to keep the underlying error from happening because it shouldn't be,
> but no reason I can't fix the error case while you can easily reproduce it :).
> Thanks,
> 
> Josef
> 
>>From c50b725c74c7d39064e553ef85ac9753efbd8aec Mon Sep 17 00:00:00 2001
> From: Josef Bacik <jbacik@fusionio.com>
> Date: Tue, 29 Jan 2013 15:03:37 -0500
> Subject: [PATCH] Btrfs: fix chunk allocation error handling
> 
> If we error out allocating a dev extent we will have already created the
> block group and such which will cause problems since the allocator may have
> tried to allocate out of the block group that no longer exists.  This will
> cause BUG_ON()'s in the bio submission path.  This also makes a failure to
> allocate a dev extent a non-abort error, we will just clean up the dev
> extents we did allocate and exit.  Now if we fail to delete the dev extents
> we will abort since we can't have half of the dev extents hanging around,
> but this will make us much less likely to abort.  Thanks,
> 
> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
> ---

Interesting - with your patch applied I triggered the following, just
bringing up a fresh Ceph filesystem - I didn't even get a chance to
mount it on my Ceph clients:

[ 6419.450179] BTRFS error (device dm-73) in btrfs_free_dev_extent:1115: error 28 (Slot search failed)
[ 6419.459223] btrfs is forced readonly
[ 6419.462805] ------------[ cut here ]------------
[ 6419.467440] WARNING: at fs/btrfs/super.c:256 __btrfs_abort_transaction+0x60/0x110 [btrfs]()
[ 6419.475809] Hardware name: X8DTH-i/6/iF/6F
[ 6419.479914] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput sg joydev sd_mod iTCO_wdt iTCO_vendor_support hid_generic coretemp kvm crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul microcode button ata_piix libata mpt2sas scsi_transport_sas raid_class scsi_mod serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core cxgb4 i2c_i801 i2c_core lpc_ich mfd_core uhci_hcd ehci_hcd i7core_edac edac_core ioatdma dm_mod nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 nfs lockd sunrpc fscache broadcom tg3 hwmon bnx2 igb dca e1000
[ 6419.546095] Pid: 107593, comm: ceph-osd Not tainted 3.7.0-00270-g8353482 #494
[ 6419.553227] Call Trace:
[ 6419.555697]  [<ffffffff8103ff04>] warn_slowpath_common+0x94/0xc0
[ 6419.561708]  [<ffffffff8103ffe6>] warn_slowpath_fmt+0x46/0x50
[ 6419.567491]  [<ffffffffa0542730>] __btrfs_abort_transaction+0x60/0x110 [btrfs]
[ 6419.574746]  [<ffffffffa05980c6>] __btrfs_alloc_chunk+0x6e6/0x770 [btrfs]
[ 6419.581553]  [<ffffffffa05981ae>] btrfs_alloc_chunk+0x5e/0x90 [btrfs]
[ 6419.588017]  [<ffffffffa0554db1>] ? check_system_chunk+0x71/0x130 [btrfs]
[ 6419.594824]  [<ffffffffa055515c>] do_chunk_alloc+0x2ec/0x370 [btrfs]
[ 6419.601188]  [<ffffffffa055e06c>] find_free_extent+0xaac/0xbe0 [btrfs]
[ 6419.607733]  [<ffffffffa055e222>] btrfs_reserve_extent+0x82/0x190 [btrfs]
[ 6419.614545]  [<ffffffffa055e3b5>] btrfs_alloc_free_block+0x85/0x230 [btrfs]
[ 6419.621530]  [<ffffffffa0586e55>] ? check_buffer_tree_ref+0x25/0x50 [btrfs]
[ 6419.628512]  [<ffffffffa0549bca>] __btrfs_cow_block+0x14a/0x4b0 [btrfs]
[ 6419.635155]  [<ffffffffa05a261c>] ? btrfs_try_tree_write_lock+0x3c/0xa0 [btrfs]
[ 6419.642475]  [<ffffffffa05a2c43>] ? btrfs_set_lock_blocking_rw+0xe3/0x160 [btrfs]
[ 6419.649970]  [<ffffffffa054a5b1>] btrfs_cow_block+0x161/0x200 [btrfs]
[ 6419.656424]  [<ffffffffa054d679>] btrfs_search_slot+0x399/0x760 [btrfs]
[ 6419.663050]  [<ffffffffa0573f79>] btrfs_truncate_inode_items+0x179/0x710 [btrfs]
[ 6419.670458]  [<ffffffffa0584ad5>] ? btrfs_add_ordered_operation+0x55/0xb0 [btrfs]
[ 6419.677961]  [<ffffffffa0575fcd>] btrfs_truncate+0x16d/0x2c0 [btrfs]
[ 6419.684328]  [<ffffffffa057a441>] btrfs_setsize+0x151/0x190 [btrfs]
[ 6419.690601]  [<ffffffff8117eb4a>] ? notify_change+0xaa/0x2e0
[ 6419.696274]  [<ffffffffa057a4e6>] btrfs_setattr+0x66/0xd0 [btrfs]
[ 6419.702373]  [<ffffffff8117eca2>] notify_change+0x202/0x2e0
[ 6419.707949]  [<ffffffff81161f5f>] do_truncate+0x6f/0x90
[ 6419.713174]  [<ffffffff811620dd>] do_sys_truncate+0x15d/0x170
[ 6419.718919]  [<ffffffff811620fe>] sys_truncate+0xe/0x10
[ 6419.724139]  [<ffffffff814b7882>] system_call_fastpath+0x16/0x1b
[ 6419.730132] ---[ end trace e480283f0ee28284 ]---
[ 6419.734754] BTRFS warning (device dm-73): __btrfs_alloc_chunk:3803: Aborting unused transaction(error 28).

Here's some data on the btrfs filesystem in question:

# ls -l /dev/mapper | grep dm-73
lrwxrwxrwx 1 root root       8 Jan 29 14:27 cs33s16p2 -> ../dm-73

# df -h | grep -A 1 cs33s16p2
/dev/mapper/cs33s16p2
                      896G  7.8M  896G   1% /ram/mnt/ceph/data.osd.39

# btrfs fi df /ram/mnt/ceph/data.osd.39/
Data: total=8.00MB, used=3.61MB
System: total=4.00MB, used=64.00KB
Metadata: total=8.00MB, used=4.12MB

# cat /proc/mounts | grep osd.39
/dev/mapper/cs33s16p2 /ram/mnt/ceph/data.osd.39 btrfs ro,noatime,nospace_cache 0 0


FWIW, in these tests I'm building a fresh Ceph filesystem with 576 OSDs,
hence 576 different btrfs filesystems, but typically I only have an
issue with one of them per test.

-- Jim

next prev parent reply	other threads:[~2013-01-29 23:05 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-13  1:52 [PATCH] Btrfs: fix a deadlock on chunk mutex Liu Bo
2012-12-18 13:52 ` Josef Bacik
2012-12-18 14:47   ` Liu Bo
2012-12-18 15:40     ` Josef Bacik
2013-01-03 18:44   ` Jim Schutt
2013-01-28 21:23     ` Josef Bacik
2013-01-28 21:58       ` Jim Schutt
2013-01-29  2:30       ` Liu Bo
2013-01-29 13:47         ` Josef Bacik
2013-01-29 13:50           ` Josef Bacik
2013-01-29 16:43             ` David Sterba
2013-01-29 16:52               ` David Sterba
2013-01-29 18:41       ` Jim Schutt
2013-01-29 20:04         ` Josef Bacik
2013-01-29 20:37           ` Jim Schutt
2013-01-29 23:05           ` Jim Schutt [this message]
2013-01-30 15:06             ` Josef Bacik
2013-01-30 15:16             ` Josef Bacik
2013-01-30 16:38             ` Josef Bacik
2013-01-30 21:37               ` Jim Schutt
2013-01-30 21:55                 ` Josef Bacik
2013-01-31 15:33                 ` Josef Bacik
2013-01-31 16:52                   ` Jim Schutt
2014-02-18 15:47   ` Alex Lyakas
2014-02-18 16:06     ` Josef Bacik
2014-02-18 16:24       ` Alex Lyakas
2014-02-18 16:26         ` Josef Bacik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=510855AD.2020602@sandia.gov \
    --to=jaschut@sandia.gov \
    --cc=bo.li.liu@oracle.com \
    --cc=jbacik@fusionio.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).