* [PATCH 2/6] Btrfs: fix race between device replace and block group removal
@ 2016-05-20 4:44 fdmanana
2016-05-20 15:15 ` Josef Bacik
0 siblings, 1 reply; 2+ messages in thread
From: fdmanana @ 2016-05-20 4:44 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
When it's finishing, the device replace code iterates all extent maps
representing block group and for each one that has a stripe that refers
to the source device, it replaces its device with the target device.
However when it replaces the source device with the target device it,
the target device still has an ID of 0ULL (BTRFS_DEV_REPLACE_DEVID),
only after its ID is changed to match the one from the source device.
This leads to races with the chunk removal code that can temporarly see
a device with an ID of 0ULL and then attempt to use that ID to remove
items from the device tree and fail, causing a transaction abort:
[ 9238.594364] BTRFS info (device sdf): dev_replace from /dev/sdf (devid 3) to /dev/sde finished
[ 9238.594377] ------------[ cut here ]------------
[ 9238.594402] WARNING: CPU: 14 PID: 21566 at fs/btrfs/volumes.c:2771 btrfs_remove_chunk+0x2e5/0x793 [btrfs]
[ 9238.594403] BTRFS: Transaction aborted (error 1)
[ 9238.594416] Modules linked in: btrfs crc32c_generic acpi_cpufreq xor tpm_tis tpm raid6_pq ppdev parport_pc processor psmouse parport i2c_piix4 evdev sg i2c_core se
rio_raw pcspkr button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio e1000 scsi_mod fl
oppy [last unloaded: btrfs]
[ 9238.594418] CPU: 14 PID: 21566 Comm: btrfs-cleaner Not tainted 4.6.0-rc7-btrfs-next-29+ #1
[ 9238.594419] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[ 9238.594421] 0000000000000000 ffff88017f1dbc60 ffffffff8126b42c ffff88017f1dbcb0
[ 9238.594422] 0000000000000000 ffff88017f1dbca0 ffffffff81052b14 00000ad37f1dbd18
[ 9238.594423] 0000000000000001 ffff88018068a558 ffff88005c4b9c00 ffff880233f60db0
[ 9238.594424] Call Trace:
[ 9238.594428] [<ffffffff8126b42c>] dump_stack+0x67/0x90
[ 9238.594430] [<ffffffff81052b14>] __warn+0xc2/0xdd
[ 9238.594432] [<ffffffff81052b7a>] warn_slowpath_fmt+0x4b/0x53
[ 9238.594434] [<ffffffff8116c311>] ? kmem_cache_free+0x128/0x188
[ 9238.594450] [<ffffffffa04d43f5>] btrfs_remove_chunk+0x2e5/0x793 [btrfs]
[ 9238.594452] [<ffffffff8108e456>] ? arch_local_irq_save+0x9/0xc
[ 9238.594464] [<ffffffffa04a26fa>] btrfs_delete_unused_bgs+0x317/0x382 [btrfs]
[ 9238.594476] [<ffffffffa04a961d>] cleaner_kthread+0x1ad/0x1c7 [btrfs]
[ 9238.594489] [<ffffffffa04a9470>] ? btree_invalidatepage+0x8e/0x8e [btrfs]
[ 9238.594490] [<ffffffff8106f403>] kthread+0xd4/0xdc
[ 9238.594494] [<ffffffff8149e242>] ret_from_fork+0x22/0x40
[ 9238.594495] [<ffffffff8106f32f>] ? kthread_stop+0x286/0x286
[ 9238.594496] ---[ end trace 183efbe50275f059 ]---
The sequence of steps leading to this is like the following:
CPU 1 CPU 2
btrfs_dev_replace_finishing()
at this point
dev_replace->tgtdev->devid ==
BTRFS_DEV_REPLACE_DEVID (0ULL)
...
btrfs_start_transaction()
btrfs_commit_transaction()
btrfs_delete_unused_bgs()
btrfs_remove_chunk()
looks up for the extent map
corresponding to the chunk
lock_chunks() (chunk_mutex)
check_system_chunk()
unlock_chunks() (chunk_mutex)
locks fs_info->chunk_mutex
btrfs_dev_replace_update_device_in_mapping_tree()
--> iterates fs_info->mapping_tree and
replaces the device in every extent
map's map->stripes[] with
dev_replace->tgtdev, which still has
an id of 0ULL (BTRFS_DEV_REPLACE_DEVID)
iterates over all stripes from
the extent map
--> calls btrfs_free_dev_extent()
passing it the target device
that still has an ID of 0ULL
--> btrfs_free_dev_extent() fails
--> aborts current transaction
finishes setting up the target device,
namely it sets tgtdev->devid to the value
of srcdev->devid (which is necessarily > 0)
frees the srcdev
unlocks fs_info->chunk_mutex
So fix this by taking the device list mutex while processing the stripes
for the chunk's extent map. This is similar to the race between device
replace and block group creation that was fixed by commit 50460e37186a
("Btrfs: fix race when finishing dev replace leading to transaction abort").
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/volumes.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bd0f45f..683e2bd 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2736,6 +2736,7 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans,
u64 dev_extent_len = 0;
u64 chunk_objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
int i, ret = 0;
+ struct btrfs_fs_devices *fs_devices = root->fs_info->fs_devices;
/* Just in case */
root = root->fs_info->chunk_root;
@@ -2762,12 +2763,19 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans,
check_system_chunk(trans, extent_root, map->type);
unlock_chunks(root->fs_info->chunk_root);
+ /*
+ * Take the device list mutex to prevent races with the final phase of
+ * a device replace operation that replaces the device object associated
+ * with map stripes (dev-replace.c:btrfs_dev_replace_finishing()).
+ */
+ mutex_lock(&fs_devices->device_list_mutex);
for (i = 0; i < map->num_stripes; i++) {
struct btrfs_device *device = map->stripes[i].dev;
ret = btrfs_free_dev_extent(trans, device,
map->stripes[i].physical,
&dev_extent_len);
if (ret) {
+ mutex_unlock(&fs_devices->device_list_mutex);
btrfs_abort_transaction(trans, root, ret);
goto out;
}
@@ -2786,11 +2794,14 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans,
if (map->stripes[i].dev) {
ret = btrfs_update_device(trans, map->stripes[i].dev);
if (ret) {
+ mutex_unlock(&fs_devices->device_list_mutex);
btrfs_abort_transaction(trans, root, ret);
goto out;
}
}
}
+ mutex_unlock(&fs_devices->device_list_mutex);
+
ret = btrfs_free_chunk(trans, root, chunk_objectid, chunk_offset);
if (ret) {
btrfs_abort_transaction(trans, root, ret);
--
2.7.0.rc3
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: [PATCH 2/6] Btrfs: fix race between device replace and block group removal
2016-05-20 4:44 [PATCH 2/6] Btrfs: fix race between device replace and block group removal fdmanana
@ 2016-05-20 15:15 ` Josef Bacik
0 siblings, 0 replies; 2+ messages in thread
From: Josef Bacik @ 2016-05-20 15:15 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs@vger.kernel.org
On Fri, May 20, 2016 at 12:44 AM, <fdmanana@kernel.org> wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> When it's finishing, the device replace code iterates all extent maps
> representing block group and for each one that has a stripe that refers
> to the source device, it replaces its device with the target device.
> However when it replaces the source device with the target device it,
> the target device still has an ID of 0ULL (BTRFS_DEV_REPLACE_DEVID),
> only after its ID is changed to match the one from the source device.
> This leads to races with the chunk removal code that can temporarly see
> a device with an ID of 0ULL and then attempt to use that ID to remove
> items from the device tree and fail, causing a transaction abort:
>
> [ 9238.594364] BTRFS info (device sdf): dev_replace from /dev/sdf (devid 3) to /dev/sde finished
> [ 9238.594377] ------------[ cut here ]------------
> [ 9238.594402] WARNING: CPU: 14 PID: 21566 at fs/btrfs/volumes.c:2771 btrfs_remove_chunk+0x2e5/0x793 [btrfs]
> [ 9238.594403] BTRFS: Transaction aborted (error 1)
> [ 9238.594416] Modules linked in: btrfs crc32c_generic acpi_cpufreq xor tpm_tis tpm raid6_pq ppdev parport_pc processor psmouse parport i2c_piix4 evdev sg i2c_core se
> rio_raw pcspkr button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio e1000 scsi_mod fl
> oppy [last unloaded: btrfs]
> [ 9238.594418] CPU: 14 PID: 21566 Comm: btrfs-cleaner Not tainted 4.6.0-rc7-btrfs-next-29+ #1
> [ 9238.594419] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
> [ 9238.594421] 0000000000000000 ffff88017f1dbc60 ffffffff8126b42c ffff88017f1dbcb0
> [ 9238.594422] 0000000000000000 ffff88017f1dbca0 ffffffff81052b14 00000ad37f1dbd18
> [ 9238.594423] 0000000000000001 ffff88018068a558 ffff88005c4b9c00 ffff880233f60db0
> [ 9238.594424] Call Trace:
> [ 9238.594428] [<ffffffff8126b42c>] dump_stack+0x67/0x90
> [ 9238.594430] [<ffffffff81052b14>] __warn+0xc2/0xdd
> [ 9238.594432] [<ffffffff81052b7a>] warn_slowpath_fmt+0x4b/0x53
> [ 9238.594434] [<ffffffff8116c311>] ? kmem_cache_free+0x128/0x188
> [ 9238.594450] [<ffffffffa04d43f5>] btrfs_remove_chunk+0x2e5/0x793 [btrfs]
> [ 9238.594452] [<ffffffff8108e456>] ? arch_local_irq_save+0x9/0xc
> [ 9238.594464] [<ffffffffa04a26fa>] btrfs_delete_unused_bgs+0x317/0x382 [btrfs]
> [ 9238.594476] [<ffffffffa04a961d>] cleaner_kthread+0x1ad/0x1c7 [btrfs]
> [ 9238.594489] [<ffffffffa04a9470>] ? btree_invalidatepage+0x8e/0x8e [btrfs]
> [ 9238.594490] [<ffffffff8106f403>] kthread+0xd4/0xdc
> [ 9238.594494] [<ffffffff8149e242>] ret_from_fork+0x22/0x40
> [ 9238.594495] [<ffffffff8106f32f>] ? kthread_stop+0x286/0x286
> [ 9238.594496] ---[ end trace 183efbe50275f059 ]---
>
> The sequence of steps leading to this is like the following:
>
> CPU 1 CPU 2
>
> btrfs_dev_replace_finishing()
>
> at this point
> dev_replace->tgtdev->devid ==
> BTRFS_DEV_REPLACE_DEVID (0ULL)
>
> ...
>
> btrfs_start_transaction()
> btrfs_commit_transaction()
>
> btrfs_delete_unused_bgs()
> btrfs_remove_chunk()
>
> looks up for the extent map
> corresponding to the chunk
>
> lock_chunks() (chunk_mutex)
> check_system_chunk()
> unlock_chunks() (chunk_mutex)
>
> locks fs_info->chunk_mutex
>
> btrfs_dev_replace_update_device_in_mapping_tree()
> --> iterates fs_info->mapping_tree and
> replaces the device in every extent
> map's map->stripes[] with
> dev_replace->tgtdev, which still has
> an id of 0ULL (BTRFS_DEV_REPLACE_DEVID)
>
> iterates over all stripes from
> the extent map
>
> --> calls btrfs_free_dev_extent()
> passing it the target device
> that still has an ID of 0ULL
>
> --> btrfs_free_dev_extent() fails
> --> aborts current transaction
>
> finishes setting up the target device,
> namely it sets tgtdev->devid to the value
> of srcdev->devid (which is necessarily > 0)
>
> frees the srcdev
>
> unlocks fs_info->chunk_mutex
>
> So fix this by taking the device list mutex while processing the stripes
> for the chunk's extent map. This is similar to the race between device
> replace and block group creation that was fixed by commit 50460e37186a
> ("Btrfs: fix race when finishing dev replace leading to transaction abort").
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Thanks,
Josef
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2016-05-20 15:15 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-05-20 4:44 [PATCH 2/6] Btrfs: fix race between device replace and block group removal fdmanana
2016-05-20 15:15 ` Josef Bacik
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox