Date: Fri, 13 Dec 2019 16:48:45 +0100
From: David Sterba
To: Filipe Manana
Cc: Kyle Ambroff-Kao, linux-btrfs
Subject: Re: [PATCH 1/1] btrfs: Allow replacing device with a smaller one if possible
Message-ID: <20191213154845.GY3929@twin.jikos.cz>
Reply-To: dsterba@suse.cz
References: <20191208093045.43433-1-kyle@ambroffkao.com> <20191208093045.43433-2-kyle@ambroffkao.com>

On Mon, Dec 09, 2019 at 10:26:52AM +0000, Filipe Manana wrote:
> 2) A simple solution, but often less efficient: before starting the
> actual replace operation, shrink the source device to the size of the
> target device - just use the existing btrfs_shrink_device(), which
> will relocate chunks beyond the new size, and if there's not enough
> space it just returns -ENOSPC. This means no changes to the actual
> way replace copies data - it does extra IO, due to the relocation but
> keeps things simple, and it should still be significantly more
> efficient than doing a device remove + device add operation, maybe
> except if all or most of the allocated chunks (in the device to be
> replaced) cross or start beyond an offset matching the new device's
> size.
>
> Also, since the shrink can take some time due to relocation of
> chunks, we would need to teach btrfs_shrink_device() to check for
> device replace cancel requests as well. And if such a request is
> detected, restore the device's size to the original value.

The shrinking can be done completely in userspace, calling one more
ioctl before device replace. Handling the error cases will be
simplified (and not necessarily done in kernel at all).

So something like that:

  $ btrfs device replace 2 /dev/sdx /mnt
  (fail because the device is too small, print a message that the
  target device needs to be shrunk manually or there's an option eg.
  --shrink-target that will do that in one go)

  $ btrfs device replace --shrink-target 2 /dev/sdx /mnt
  Shrink device 2 from 12345678 to 123456 (you can cancel that by 'btrfs resize --cancel')
  Done
  Starting device replace

> I think option 2 may actually be acceptable for an initial version.
> Option 1 is complex and increases the risk for data loss.
> Also, for option 2, there's the possible downside of requiring writes
> to the source device - one might be replacing it because the device is
> not healthy, writes into some regions are failing, which can prevent
> the shrink/relocation process from succeeding, in that case only a
> device remove followed by a device add operation would work.

That's a good point, and giving the user more options for how to replace
the device sounds like a better option than implementing all of that in
kernel.
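
For illustration, the manual userspace sequence can be approximated with
commands that already exist, without the proposed --shrink-target option.
The sketch below is a dry run only: plan_replace is a hypothetical helper
that prints the commands and executes nothing. The sizes are the
placeholder byte counts from the example above; in practice they would
have to be looked up first (for instance with 'blockdev --getsize64' for
the target), and the shrink step still writes to the source device, so it
does not help when that device is failing.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for replacing a device with a smaller one.
# Hypothetical helper, not part of btrfs-progs. Sizes are in bytes.
# Usage: plan_replace DEVID SRC_SIZE TARGET_DEV TARGET_SIZE MNT
plan_replace() {
    devid=$1
    src_size=$2
    target_dev=$3
    target_size=$4
    mnt=$5

    if [ "$target_size" -lt "$src_size" ]; then
        # Shrink the source device to the target size first. This relocates
        # chunks beyond the new size and fails with ENOSPC if they do not fit.
        echo "btrfs filesystem resize ${devid}:${target_size} ${mnt}"
    fi

    # The replace itself is unchanged.
    echo "btrfs replace start ${devid} ${target_dev} ${mnt}"
}

plan_replace 2 12345678 /dev/sdx 123456 /mnt
```

When the target is at least as large as the source, only the replace
command is printed.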