Subject: Re: Is it safe to use btrfs on top of different types of devices?
To: Adam Borowski
Cc: Zoltan, linux-btrfs@vger.kernel.org
References: <20171017011443.bupcsskm7joc73wb@angband.pl>
 <81e1136a-a846-9531-b1bf-9ad2aabb785d@gmail.com>
 <20171017170626.amfrohfyqlujdueu@angband.pl>
 <1d5e9875-1c1e-f67e-1f5b-0741555d9517@gmail.com>
 <20171017202135.xdop4eko6utircmz@angband.pl>
 <213a404f-90e6-a3f8-4867-4e9fcf24426c@gmail.com>
 <20171018115905.f5ndvyp5rcu4ykhv@angband.pl>
From: "Austin S. Hemmelgarn"
Message-ID: <18794def-4e82-df32-82d3-27bd22c974d3@gmail.com>
Date: Wed, 18 Oct 2017 10:30:37 -0400
In-Reply-To: <20171018115905.f5ndvyp5rcu4ykhv@angband.pl>

On 2017-10-18 07:59, Adam Borowski wrote:
> On Wed, Oct 18, 2017 at 07:30:55AM -0400, Austin S. Hemmelgarn wrote:
>> On 2017-10-17 16:21, Adam Borowski wrote:
>>>>> It's a single-device filesystem, thus disconnects are obviously fatal. But,
>>>>> they never caused even a single bit of damage (as scrub goes), thus proving
>>>>> btrfs handles this kind of disconnect well. Unlike times past, the kernel
>>>>> doesn't get confused, so no reboot is needed, merely an unmount, "service
>>>>> nbd-client restart", mount, restart the rebuild jobs.
>>>> That's expected behavior though.
>>>> _Single_ device BTRFS has nothing to get out of sync most of the time; the
>>>> only time there's any possibility of an issue is when you die after
>>>> writing the first copy of a block that's in a dup profile chunk, but even
>>>> that is not very likely to cause problems (you'll just lose at most the
>>>> last worth of data).
>>>
>>> How come? In a DUP profile, the writes are: chunk 1, chunk 2, barrier,
>>> superblock. The two prior writes may be arbitrarily reordered -- both
>>> between each other or even individual sectors inside the chunks, but unless
>>> the disk lies about barriers, there's no way to have any corruption, thus
>>> running scrub is not needed.
>> If the device dies after writing chunk 1 but before the barrier, you end up
>> needing scrub. How much of a failure window is present is largely a
>> function of how fast the device is, but there is a failure window there.
>
> CoW is there to ensure there is _no_ failure window. The new content
> doesn't matter until there are live pointers to it -- from the filesystem's
> point of view we merely scribbled something on an unused part of the block
> device. Only after all pieces are in place (as ensured by the barrier) is the
> superblock updated with a reference to the new metadata->data chain.

Even with CoW there _IS_ a failure window. At a bare minimum, when updating
the root of a tree that has multiple copies, you have a failure window. This
window could admittedly be significantly reduced for multi-device setups if
we actually parallelized writes properly, but it would still be there.

>
> Thus, no matter when a disconnect happens, after a crash you get either the
> uncorrupted old version or the uncorrupted new version.
>
> No scrub is ever needed for this reason on a single device or on a RAID1
> that didn't run degraded.

The whole conversation started regarding a RAID1 array that's functionally
guaranteed to run degraded on a regular basis.
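The disagreement above can be made concrete with a toy crash model. This is a
hypothetical sketch, not how btrfs is implemented: writes are grouped into
phases separated by barriers, writes within one phase may land in any order,
and CoW is modeled by assuming new blocks never overwrite the live tree. The
function names and block labels (`crash_states`, `needs_scrub`, `chunk1`,
`super_dev1`, ...) are invented for illustration, and writing every copy of
the root in one phase after the barrier is an assumption. Under these
assumptions a crash in the data phase leaves the old tree live and intact,
while a root stored in more than one copy opens a window where the copies
disagree.

```python
from itertools import combinations

# Hypothetical toy model, not btrfs code. Writes are grouped into phases
# separated by barriers: writes inside a phase may complete in any order,
# but nothing in a later phase is durable before the whole earlier phase is.
# CoW is modeled by assuming new blocks never overwrite the live (old) tree.


def crash_states(phases):
    """Return every set of durable writes a crash could leave behind."""
    states = set()
    done = frozenset()
    for group in phases:
        # Any subset of the current phase may have reached the disk.
        for k in range(len(group) + 1):
            for subset in combinations(group, k):
                states.add(done | frozenset(subset))
        done |= frozenset(group)  # the barrier: whole phase is now durable
    return states


def needs_scrub(state, new_blocks, super_copies):
    """True if the on-disk state is inconsistent after a crash."""
    updated = [s for s in super_copies if s in state]
    if updated and len(updated) != len(super_copies):
        # Some copies of the root point at the new tree, others at the old.
        return True
    if updated:
        # New tree is live: every copy of its blocks must be on disk.
        return not all(b in state for b in new_blocks)
    # No root copy updated: the untouched old tree is still live.
    return False


# Single device, DUP profile: chunk 1, chunk 2, barrier, superblock.
dup_blocks = ["chunk1", "chunk2"]
dup = crash_states([dup_blocks, ["super"]])
print("DUP windows:", sum(needs_scrub(s, dup_blocks, ["super"]) for s in dup))
# prints "DUP windows: 0"

# Root with two copies (e.g. one per RAID1 device), written after the barrier.
r1_blocks = ["chunk_dev1", "chunk_dev2"]
r1_supers = ["super_dev1", "super_dev2"]
r1 = crash_states([r1_blocks, r1_supers])
print("RAID1 windows:", sum(needs_scrub(s, r1_blocks, r1_supers) for s in r1))
# prints "RAID1 windows: 2"
```

The two RAID1 windows are the states where only one root copy was updated. A
device that is missing for a while can be modeled by never applying its
writes, which leaves its copies stale in the same way.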