From: Josef Bacik <jbacik@fb.com>
To: Filipe Manana <fdmanana@suse.com>, <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH 3/6] Btrfs: fix freeing used extents after removing empty block group
Date: Wed, 26 Nov 2014 11:02:45 -0500 [thread overview]
Message-ID: <5475F9A5.7020309@fb.com> (raw)
In-Reply-To: <1417015735-8581-4-git-send-email-fdmanana@suse.com>
On 11/26/2014 10:28 AM, Filipe Manana wrote:
> There's a race between adding a block group to the list of the unused
> block groups and removing an unused block group (cleaner kthread) that
> leads to freeing extents that are in use or a crash during transaction
> commmit. Basically the cleaner kthread, when executing
> btrfs_delete_unused_bgs(), might catch the newly added block group to
> the list fs_info->unused_bgs and clear the range representing the whole
> group from fs_info->freed_extents[] before the task that added the block
> group to the list (running update_block_group()) marked the last freed
> extent as dirty in fs_info->freed_extents (pinned_extents).
>
> That is:
>
> CPU 1 CPU 2
>
> btrfs_delete_unused_bgs()
> update_block_group()
> add block group to
> fs_info->unused_bgs
> got block group from the list
> clear_extent_bits for the whole
> block group range in freed_extents[]
> set_extent_dirty for the
> range covering the freed
> extent in freed_extents[]
> (fs_info->pinned_extents)
>
> block group deleted, and a new block
> group with the same logical address is
> created
>
> reserve space from the new block group
> for new data or metadata - the reserved
> space overlaps the range specified by
> CPU 1 for set_extent_dirty()
>
> commit transaction
> find all ranges marked as dirty in
> fs_info->pinned_extents, clear them
> and add them to the free space cache
>
> Alternatively, if CPU 2 doesn't create a new block group with the same
> logical address, we get a crash/BUG_ON at transaction commit when unpining
> extent ranges because we can't find a block group for the range marked as
> dirty by CPU 1. Sample trace:
>
> [ 2163.426462] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
> [ 2163.426640] Modules linked in: btrfs xor raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio crc32c_generic libcrc32c dm_mod nfsd auth_rpc
> gss oid_registry nfs_acl nfs lockd fscache sunrpc loop psmouse parport_pc parport i2c_piix4 processor thermal_sys i2ccore evdev button pcspkr microcode serio_raw ext4 crc16 jbd2 mbcache
> sg sr_mod cdrom sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_generic virtio_scsi floppy ata_piix libata e1000 scsi_mod virtio_pci virtio_ring virtio
> [ 2163.428209] CPU: 0 PID: 11858 Comm: btrfs-transacti Tainted: G W 3.17.0-rc5-btrfs-next-1+ #1
> [ 2163.428519] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
> [ 2163.428875] task: ffff88009f2c0650 ti: ffff8801356bc000 task.ti: ffff8801356bc000
> [ 2163.429157] RIP: 0010:[<ffffffffa037728e>] [<ffffffffa037728e>] unpin_extent_range.isra.58+0x62/0x192 [btrfs]
> [ 2163.429562] RSP: 0018:ffff8801356bfda8 EFLAGS: 00010246
> [ 2163.429802] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [ 2163.429990] RDX: 0000000041bfffff RSI: 0000000001c00000 RDI: ffff880024307080
> [ 2163.430042] RBP: ffff8801356bfde8 R08: 0000000000000068 R09: ffff88003734f118
> [ 2163.430042] R10: ffff8801356bfcb8 R11: fffffffffffffb69 R12: ffff8800243070d0
> [ 2163.430042] R13: 0000000083c04000 R14: ffff8800751b0f00 R15: ffff880024307000
> [ 2163.430042] FS: 0000000000000000(0000) GS:ffff88013f400000(0000) knlGS:0000000000000000
> [ 2163.430042] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 2163.430042] CR2: 00007ff10eb43fc0 CR3: 0000000004cb8000 CR4: 00000000000006f0
> [ 2163.430042] Stack:
> [ 2163.430042] ffff8800243070d0 0000000083c08000 0000000083c07fff ffff88012d6bc800
> [ 2163.430042] ffff8800243070d0 ffff8800751b0f18 ffff8800751b0f00 0000000000000000
> [ 2163.430042] ffff8801356bfe18 ffffffffa037a481 0000000083c04000 0000000083c07fff
> [ 2163.430042] Call Trace:
> [ 2163.430042] [<ffffffffa037a481>] btrfs_finish_extent_commit+0xac/0xbf [btrfs]
> [ 2163.430042] [<ffffffffa038c06d>] btrfs_commit_transaction+0x6ee/0x882 [btrfs]
> [ 2163.430042] [<ffffffffa03881f1>] transaction_kthread+0xf2/0x1a4 [btrfs]
> [ 2163.430042] [<ffffffffa03880ff>] ? btrfs_cleanup_transaction+0x3d8/0x3d8 [btrfs]
> [ 2163.430042] [<ffffffff8105966b>] kthread+0xb7/0xbf
> [ 2163.430042] [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
> [ 2163.430042] [<ffffffff813ebeac>] ret_from_fork+0x7c/0xb0
> [ 2163.430042] [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
>
> So fix this by making update_block_group() first set the range as dirty
> in pinned_extents before adding the block group to the unused_bgs list.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> fs/btrfs/extent-tree.c | 21 ++++++++++-----------
> 1 file changed, 10 insertions(+), 11 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index b7e40ef..92f61f2 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -5439,7 +5439,17 @@ static int update_block_group(struct btrfs_root *root,
> spin_unlock(&cache->space_info->lock);
> } else {
> old_val -= num_bytes;
> + btrfs_set_block_group_used(&cache->item, old_val);
> + cache->pinned += num_bytes;
> + cache->space_info->bytes_pinned += num_bytes;
> + cache->space_info->bytes_used -= num_bytes;
> + cache->space_info->disk_used -= num_bytes * factor;
> + spin_unlock(&cache->lock);
> + spin_unlock(&cache->space_info->lock);
>
> + set_extent_dirty(info->pinned_extents,
> + bytenr, bytenr + num_bytes - 1,
> + GFP_NOFS | __GFP_NOFAIL);
> /*
> * No longer have used bytes in this block group, queue
> * it for deletion.
> @@ -5453,17 +5463,6 @@ static int update_block_group(struct btrfs_root *root,
> }
> spin_unlock(&info->unused_bgs_lock);
> }
> - btrfs_set_block_group_used(&cache->item, old_val);
> - cache->pinned += num_bytes;
> - cache->space_info->bytes_pinned += num_bytes;
> - cache->space_info->bytes_used -= num_bytes;
> - cache->space_info->disk_used -= num_bytes * factor;
> - spin_unlock(&cache->lock);
> - spin_unlock(&cache->space_info->lock);
> -
> - set_extent_dirty(info->pinned_extents,
> - bytenr, bytenr + num_bytes - 1,
> - GFP_NOFS | __GFP_NOFAIL);
> }
> btrfs_put_block_group(cache);
> total -= num_bytes;
>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Josef
next prev parent reply other threads:[~2014-11-26 16:02 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-26 15:28 [PATCH 0/6] Btrfs: fixes for block group remove/allocation and trim/discard Filipe Manana
2014-11-26 15:28 ` [PATCH 1/6] Btrfs: fix invalid block group rbtree access after bg is removed Filipe Manana
2014-11-26 15:58 ` Josef Bacik
2014-11-26 15:28 ` [PATCH 2/6] Btrfs: fix crash caused by block group removal Filipe Manana
2014-11-26 15:57 ` Josef Bacik
2014-11-26 16:09 ` Filipe David Manana
2014-11-26 16:24 ` Josef Bacik
2014-11-26 16:34 ` Filipe David Manana
2014-11-26 16:41 ` Josef Bacik
2014-11-26 15:28 ` [PATCH 3/6] Btrfs: fix freeing used extents after removing empty block group Filipe Manana
2014-11-26 16:02 ` Josef Bacik [this message]
2014-11-26 15:28 ` [PATCH 4/6] Btrfs: fix race between fs trimming and block group remove/allocation Filipe Manana
2014-11-26 16:15 ` Josef Bacik
2014-11-26 16:25 ` Filipe David Manana
2014-11-26 16:30 ` Josef Bacik
2014-11-26 17:19 ` [PATCH v2 " Filipe Manana
2014-11-27 1:16 ` [PATCH v3 " Filipe Manana
2014-11-27 21:14 ` [PATCH v4 " Filipe Manana
2014-11-26 15:28 ` [PATCH 5/6] Btrfs: fix race between writing free space cache and trimming Filipe Manana
2014-12-01 17:04 ` [PATCH v2 " Filipe Manana
2014-11-26 15:28 ` [PATCH 6/6] Btrfs: make btrfs_abort_transaction consider existence of new block groups Filipe Manana
2014-11-26 16:07 ` Josef Bacik
2014-11-26 16:15 ` Filipe David Manana
2014-11-26 16:19 ` Josef Bacik
2014-11-26 16:29 ` Filipe David Manana
2014-11-26 16:32 ` Josef Bacik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5475F9A5.7020309@fb.com \
--to=jbacik@fb.com \
--cc=fdmanana@suse.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.