From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mo4-p00-ob.smtp.rzone.de ([81.169.146.163]:42073 "EHLO mo4-p00-ob.smtp.rzone.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751463AbaBGJnI (ORCPT ); Fri, 7 Feb 2014 04:43:08 -0500 Message-ID: <52F4AAAB.7050902@giantdisaster.de> Date: Fri, 07 Feb 2014 10:43:07 +0100 From: Stefan Behrens MIME-Version: 1.0 To: miaox@cn.fujitsu.com, linux-btrfs@vger.kernel.org CC: Wang Shilong Subject: Re: [PATCH 2/2] Btrfs: fix use-after-free in the finishing procedure of the device replace References: <1391071615-6729-1-git-send-email-miaox@cn.fujitsu.com> <1391071615-6729-2-git-send-email-miaox@cn.fujitsu.com> <52F495F2.6020606@giantdisaster.de> <52F4A4B9.20506@cn.fujitsu.com> In-Reply-To: <52F4A4B9.20506@cn.fujitsu.com> Content-Type: text/plain; charset=UTF-8 Sender: linux-btrfs-owner@vger.kernel.org List-ID: On Fri, 07 Feb 2014 17:17:45 +0800, Miao Xie wrote: > On Fri, 07 Feb 2014 09:14:42 +0100, Stefan Behrens wrote: >> On Thu, 30 Jan 2014 16:46:55 +0800, Miao Xie wrote: >>> During device replace test, we hit a null pointer deference (It was very easy >>> to reproduce it by running xfstests' btrfs/011 on the devices with the virtio >>> scsi driver). There were two bugs that caused this problem: >>> - We might allocate new chunks on the replaced device after we updated >>> the mapping tree. And we forgot to replace the source device in those >>> mapping of the new chunks. >>> - We might get the mapping information which including the source device >>> before the mapping information update. And then submit the bio which was >>> based on that mapping information after we freed the source device. >>> >>> For the first bug, we can fix it by doing mapping tree update and source >>> device remove in the same context of the chunk mutex. The chunk mutex is >>> used to protect the allocable device list, the above method can avoid >>> the new chunk allocation, and after we remove the source device, all >>> the new chunks will be allocated on the new device. So it can fix >>> the first bug. >>> >>> For the second bug, we need make sure all flighting bios are finished and >>> no new bios are produced during we are removing the source device. To fix >>> this problem, we introduced a global @bio_counter, we not only inc/dec >>> @bio_counter outsize of map_blocks, but also inc it before submitting bio >>> and dec @bio_counter when ending bios. >>> >>> Since Raid56 is a little different and device replace dosen't support raid56 >>> yet, it is not addressed in the patch and I add comments to make sure we will >>> fix it in the future. >>> >>> Reported-by: Qu Wenruo >>> Signed-off-by: Wang Shilong >>> Signed-off-by: Miao Xie >>> --- >>> fs/btrfs/ctree.h | 9 ++++++ >>> fs/btrfs/dev-replace.c | 74 +++++++++++++++++++++++++++++++++++++++++++++----- >>> fs/btrfs/disk-io.c | 12 +++++++- >>> fs/btrfs/volumes.c | 30 +++++++++++++++----- >>> fs/btrfs/volumes.h | 1 + >>> 5 files changed, 111 insertions(+), 15 deletions(-) [...] >>> static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, >>> int scrub_ret) >>> { >>> @@ -458,12 +487,6 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, >>> src_device = dev_replace->srcdev; >>> btrfs_dev_replace_unlock(dev_replace); >>> >>> - /* replace old device with new one in mapping tree */ >>> - if (!scrub_ret) >>> - btrfs_dev_replace_update_device_in_mapping_tree(fs_info, >>> - src_device, >>> - tgt_device); >>> - >> >> Isn't this change the reason that you need to add code to count bios? > > No, the change is not the reason to introduce a bio counter. > > The above two bugs are independent, the second bug can happen even the first one > doesn't exist, for example: > Task1 Task2 > btrfs_readpages() > btrfs_map_bio() > __btrfs_map_block() > You are right! Thanks. > device replace start > device replace finish > release source device > bio_size_ok() > bdev_get_queue() > oops > >> I mean, code is there to flush everything to disk. It's needed for sync, >> too. When btrfs_dev_replace_finishing() is called, the copy operation is >> finished and the replaced and the replacement drive are logically >> identical except that data is not yet forced to be flushed to disk. The >> only difference between both disks is the bdev block device pointer and >> the btrfs device pointer. >> >> I think this is the correct and lightweight procedure: >> >> 1. After the copy operation finished, from now on, always use the new >> btrfs device and the new block device. >> 2. Flush all outstanding bios and wait for them. >> 3. Close the block dev and free the btrfs device. >> >> You said that the problem was that we might allocate new chunks on the >> replaced device after we updated the mapping tree. Well, can't we just >> make sure the new chunks are allocated using the new device pointers >> after step 1 and all issues are resolved? I think that the first change >> is not a good idea and causes the other issues that you fix with a lot >> of code. > > Step1 can fix the first problem, but need more code to recover the mapping > tree when the flush fails. > > And as I said above, the second bug is not introduced by the fix of the first > bug, we have to add a bio counter. Any good idea? > > BTW, the reason why I mixed two fixes into one patch is that we could not fix > the oops by any one of them. I can split it into two patches if need. > > Thanks > Miao