From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:30845
	"EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751028AbaKZQCv (ORCPT );
	Wed, 26 Nov 2014 11:02:51 -0500
Message-ID: <5475F9A5.7020309@fb.com>
Date: Wed, 26 Nov 2014 11:02:45 -0500
From: Josef Bacik
MIME-Version: 1.0
To: Filipe Manana ,
Subject: Re: [PATCH 3/6] Btrfs: fix freeing used extents after removing empty block group
References: <1417015735-8581-1-git-send-email-fdmanana@suse.com> <1417015735-8581-4-git-send-email-fdmanana@suse.com>
In-Reply-To: <1417015735-8581-4-git-send-email-fdmanana@suse.com>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On 11/26/2014 10:28 AM, Filipe Manana wrote:
> There's a race between adding a block group to the list of the unused
> block groups and removing an unused block group (cleaner kthread) that
> leads to freeing extents that are in use or a crash during transaction
> commit. Basically the cleaner kthread, when executing
> btrfs_delete_unused_bgs(), might catch the newly added block group to
> the list fs_info->unused_bgs and clear the range representing the whole
> group from fs_info->freed_extents[] before the task that added the block
> group to the list (running update_block_group()) marked the last freed
> extent as dirty in fs_info->freed_extents (pinned_extents).
>
> That is:
>
>      CPU 1                                CPU 2
>
>                                   btrfs_delete_unused_bgs()
> update_block_group()
>    add block group to
>    fs_info->unused_bgs
>                                     got block group from the list
>                                     clear_extent_bits for the whole
>                                     block group range in freed_extents[]
>    set_extent_dirty for the
>    range covering the freed
>    extent in freed_extents[]
>    (fs_info->pinned_extents)
>
>                                   block group deleted, and a new block
>                                   group with the same logical address is
>                                   created
>
>                                   reserve space from the new block group
>                                   for new data or metadata - the reserved
>                                   space overlaps the range specified by
>                                   CPU 1 for set_extent_dirty()
>
>                                   commit transaction
>                                     find all ranges marked as dirty in
>                                     fs_info->pinned_extents, clear them
>                                     and add them to the free space cache
>
> Alternatively, if CPU 2 doesn't create a new block group with the same
> logical address, we get a crash/BUG_ON at transaction commit when unpinning
> extent ranges because we can't find a block group for the range marked as
> dirty by CPU 1. Sample trace:
>
> [ 2163.426462] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
> [ 2163.426640] Modules linked in: btrfs xor raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio crc32c_generic libcrc32c dm_mod nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop psmouse parport_pc parport i2c_piix4 processor thermal_sys i2ccore evdev button pcspkr microcode serio_raw ext4 crc16 jbd2 mbcache sg sr_mod cdrom sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic virtio_scsi floppy ata_piix libata e1000 scsi_mod virtio_pci virtio_ring virtio
> [ 2163.428209] CPU: 0 PID: 11858 Comm: btrfs-transacti Tainted: G W 3.17.0-rc5-btrfs-next-1+ #1
> [ 2163.428519] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
> [ 2163.428875] task: ffff88009f2c0650 ti: ffff8801356bc000 task.ti: ffff8801356bc000
> [ 2163.429157] RIP: 0010:[] [] unpin_extent_range.isra.58+0x62/0x192 [btrfs]
> [ 2163.429562] RSP: 0018:ffff8801356bfda8 EFLAGS: 00010246
> [ 2163.429802] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [ 2163.429990] RDX: 0000000041bfffff RSI: 0000000001c00000 RDI: ffff880024307080
> [ 2163.430042] RBP: ffff8801356bfde8 R08: 0000000000000068 R09: ffff88003734f118
> [ 2163.430042] R10: ffff8801356bfcb8 R11: fffffffffffffb69 R12: ffff8800243070d0
> [ 2163.430042] R13: 0000000083c04000 R14: ffff8800751b0f00 R15: ffff880024307000
> [ 2163.430042] FS: 0000000000000000(0000) GS:ffff88013f400000(0000) knlGS:0000000000000000
> [ 2163.430042] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 2163.430042] CR2: 00007ff10eb43fc0 CR3: 0000000004cb8000 CR4: 00000000000006f0
> [ 2163.430042] Stack:
> [ 2163.430042] ffff8800243070d0 0000000083c08000 0000000083c07fff ffff88012d6bc800
> [ 2163.430042] ffff8800243070d0 ffff8800751b0f18 ffff8800751b0f00 0000000000000000
> [ 2163.430042] ffff8801356bfe18 ffffffffa037a481 0000000083c04000 0000000083c07fff
> [ 2163.430042] Call Trace:
> [ 2163.430042] [] btrfs_finish_extent_commit+0xac/0xbf [btrfs]
> [ 2163.430042] [] btrfs_commit_transaction+0x6ee/0x882 [btrfs]
> [ 2163.430042] [] transaction_kthread+0xf2/0x1a4 [btrfs]
> [ 2163.430042] [] ? btrfs_cleanup_transaction+0x3d8/0x3d8 [btrfs]
> [ 2163.430042] [] kthread+0xb7/0xbf
> [ 2163.430042] [] ? __kthread_parkme+0x67/0x67
> [ 2163.430042] [] ret_from_fork+0x7c/0xb0
> [ 2163.430042] [] ? __kthread_parkme+0x67/0x67
>
> So fix this by making update_block_group() first set the range as dirty
> in pinned_extents before adding the block group to the unused_bgs list.
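
The ordering argument above can be checked with a small single-threaded
model. This is only a sketch in plain userspace C, not btrfs code, and it
makes several assumptions: struct model, pin_range(), publish_unused(),
cleaner_run(), stale_pins() and the two run_*_order() helpers are
hypothetical names invented for illustration, a bool array stands in for
fs_info->pinned_extents, and the cleaner is simply called at the worst
possible point instead of running on another CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ADDR_SPACE 64	/* toy logical address space, in blocks */

/* Hypothetical stand-in for the relevant fs_info / block group state. */
struct model {
	bool pinned[ADDR_SPACE];	/* models fs_info->pinned_extents */
	bool on_unused_list;		/* models membership in fs_info->unused_bgs */
	int bg_start;			/* first block of the block group */
	int bg_len;			/* length of the block group in blocks */
};

/* Models set_extent_dirty() on the pinned extents tree. */
static void pin_range(struct model *m, int start, int len)
{
	assert(start >= 0 && len >= 0 && start + len <= ADDR_SPACE);
	for (int i = start; i < start + len; i++)
		m->pinned[i] = true;
}

/* Models adding the block group to the unused list. */
static void publish_unused(struct model *m)
{
	m->on_unused_list = true;
}

/*
 * Models the cleaner: it only acts on a block group it can see on the
 * list, and then clears the pinned state for the whole group range.
 */
static bool cleaner_run(struct model *m)
{
	if (!m->on_unused_list)
		return false;
	memset(&m->pinned[m->bg_start], 0, (size_t)m->bg_len * sizeof(bool));
	m->on_unused_list = false;
	return true;
}

/* Counts blocks still pinned inside the (now deleted) block group. */
static int stale_pins(const struct model *m)
{
	int n = 0;

	for (int i = m->bg_start; i < m->bg_start + m->bg_len; i++)
		n += m->pinned[i];
	return n;
}

/* Old ordering: publish first, pin afterwards. */
int run_old_order(void)
{
	struct model m = { .bg_start = 16, .bg_len = 16 };

	publish_unused(&m);	/* CPU 1: list insertion first */
	cleaner_run(&m);	/* CPU 2: runs inside the window */
	pin_range(&m, 20, 4);	/* CPU 1: pins after the range was cleared */
	return stale_pins(&m);
}

/* Patched ordering: pin first, publish afterwards. */
int run_new_order(void)
{
	struct model m = { .bg_start = 16, .bg_len = 16 };

	pin_range(&m, 20, 4);	/* CPU 1: pin before the group is visible */
	publish_unused(&m);	/* CPU 1: only now can the cleaner see it */
	cleaner_run(&m);	/* CPU 2: clears a range that is already pinned */
	return stale_pins(&m);
}
```

In this model run_old_order() returns 4, the four blocks left pinned in a
block group that no longer exists, and run_new_order() returns 0, because
the cleaner cannot see the block group until the pin is already in place.
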
>
> Signed-off-by: Filipe Manana
> ---
>  fs/btrfs/extent-tree.c | 21 ++++++++++-----------
>  1 file changed, 10 insertions(+), 11 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index b7e40ef..92f61f2 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -5439,7 +5439,17 @@ static int update_block_group(struct btrfs_root *root,
>  			spin_unlock(&cache->space_info->lock);
>  		} else {
>  			old_val -= num_bytes;
> +			btrfs_set_block_group_used(&cache->item, old_val);
> +			cache->pinned += num_bytes;
> +			cache->space_info->bytes_pinned += num_bytes;
> +			cache->space_info->bytes_used -= num_bytes;
> +			cache->space_info->disk_used -= num_bytes * factor;
> +			spin_unlock(&cache->lock);
> +			spin_unlock(&cache->space_info->lock);
>
> +			set_extent_dirty(info->pinned_extents,
> +					 bytenr, bytenr + num_bytes - 1,
> +					 GFP_NOFS | __GFP_NOFAIL);
>  			/*
>  			 * No longer have used bytes in this block group, queue
>  			 * it for deletion.
> @@ -5453,17 +5463,6 @@ static int update_block_group(struct btrfs_root *root,
>  				}
>  				spin_unlock(&info->unused_bgs_lock);
>  			}
> -			btrfs_set_block_group_used(&cache->item, old_val);
> -			cache->pinned += num_bytes;
> -			cache->space_info->bytes_pinned += num_bytes;
> -			cache->space_info->bytes_used -= num_bytes;
> -			cache->space_info->disk_used -= num_bytes * factor;
> -			spin_unlock(&cache->lock);
> -			spin_unlock(&cache->space_info->lock);
> -
> -			set_extent_dirty(info->pinned_extents,
> -					 bytenr, bytenr + num_bytes - 1,
> -					 GFP_NOFS | __GFP_NOFAIL);
>  		}
>  		btrfs_put_block_group(cache);
>  		total -= num_bytes;
>

Reviewed-by: Josef Bacik

Josef