From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:17443 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752754AbaKZP50 (ORCPT ); Wed, 26 Nov 2014 10:57:26 -0500 Message-ID: <5475F85F.7090705@fb.com> Date: Wed, 26 Nov 2014 10:57:19 -0500 From: Josef Bacik MIME-Version: 1.0 To: Filipe Manana , Subject: Re: [PATCH 2/6] Btrfs: fix crash caused by block group removal References: <1417015735-8581-1-git-send-email-fdmanana@suse.com> <1417015735-8581-3-git-send-email-fdmanana@suse.com> In-Reply-To: <1417015735-8581-3-git-send-email-fdmanana@suse.com> Content-Type: text/plain; charset="windows-1252"; format=flowed Sender: linux-btrfs-owner@vger.kernel.org List-ID: On 11/26/2014 10:28 AM, Filipe Manana wrote: > If we remove a block group (because it became empty), we might have left > a caching_ctl structure in fs_info->caching_block_groups that points to > the block group and is accessed at transaction commit time. This results > in accessing an invalid or incorrect block group. This issue became visible > after Josef's patch "Btrfs: remove empty block groups automatically". > > So if the block group is removed make sure we don't leave a dangling > caching_ctl in caching_block_groups. > > Sample crash trace: > > [58380.439449] BUG: unable to handle kernel paging request at ffff8801446eaeb8 > [58380.439707] IP: [] block_group_cache_done.isra.21+0xc/0x1c [btrfs] > [58380.440879] PGD 1acb067 PUD 23f5ff067 PMD 23f5db067 PTE 80000001446ea060 > [58380.441220] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC > [58380.441486] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop psmouse processor i2c_piix4 parport_pc parport pcspkr serio_raw evdev i2ccore thermal_sys microcode button ext4 crc16 jbd2 mbcache sr_mod cdrom ata_generic sg sd_mod crc_t10dif crct10dif_generic crct10dif_common virtio_scsi floppy ata_piix e1000 libata virtio_pci scsi_mod virtio_ring virtio [last unloaded: btrfs] > [58380.443238] CPU: 3 PID: 25728 Comm: btrfs-transacti Tainted: G W 3.17.0-rc5-btrfs-next-1+ #1 > [58380.443238] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 > [58380.443238] task: ffff88013ac82090 ti: ffff88013896c000 task.ti: ffff88013896c000 > [58380.443238] RIP: 0010:[] [] block_group_cache_done.isra.21+0xc/0x1c [btrfs] > [58380.443238] RSP: 0018:ffff88013896fdd8 EFLAGS: 00010283 > [58380.443238] RAX: ffff880222cae850 RBX: ffff880119ba74c0 RCX: 0000000000000000 > [58380.443238] RDX: 0000000000000000 RSI: ffff880185e16800 RDI: ffff8801446eaeb8 > [58380.443238] RBP: ffff88013896fdd8 R08: ffff8801a9ca9fa8 R09: ffff88013896fc60 > [58380.443238] R10: ffff88013896fd28 R11: 0000000000000000 R12: ffff880222cae000 > [58380.443238] R13: ffff880222cae850 R14: ffff880222cae6b0 R15: ffff8801446eae00 > [58380.443238] FS: 0000000000000000(0000) GS:ffff88023ed80000(0000) knlGS:0000000000000000 > [58380.443238] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [58380.443238] CR2: ffff8801446eaeb8 CR3: 0000000001811000 CR4: 00000000000006e0 > [58380.443238] Stack: > [58380.443238] ffff88013896fe18 ffffffffa03fe2d5 ffff880222cae850 ffff880185e16800 > [58380.443238] ffff88000dc41c20 0000000000000000 ffff8801a9ca9f00 0000000000000000 > [58380.443238] ffff88013896fe80 ffffffffa040fbcf ffff88018b0dcdb0 ffff88013ac82090 > [58380.443238] Call Trace: > [58380.443238] [] btrfs_prepare_extent_commit+0x5a/0xd7 [btrfs] > [58380.443238] [] btrfs_commit_transaction+0x45c/0x882 [btrfs] > [58380.443238] [] transaction_kthread+0xf2/0x1a4 [btrfs] > [58380.443238] [] ? btrfs_cleanup_transaction+0x3d8/0x3d8 [btrfs] > [58380.443238] [] kthread+0xb7/0xbf > [58380.443238] [] ? __kthread_parkme+0x67/0x67 > [58380.443238] [] ret_from_fork+0x7c/0xb0 > [58380.443238] [] ? __kthread_parkme+0x67/0x67 > > Signed-off-by: Filipe Manana > --- > fs/btrfs/ctree.h | 1 + > fs/btrfs/extent-tree.c | 27 +++++++++++++++++++++++++++ > 2 files changed, 28 insertions(+) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index d3ccd09..7f40a65 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -1277,6 +1277,7 @@ struct btrfs_block_group_cache { > unsigned int ro:1; > unsigned int dirty:1; > unsigned int iref:1; > + unsigned int has_caching_ctl:1; > Don't do this, just unconditionally call get_caching_control in btrfs_remove_block_group, then if we get something we can do stuff, otherwise we can just continue. Thanks, Josef