From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-io0-f174.google.com ([209.85.223.174]:36757 "EHLO
        mail-io0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751638AbcF0RhZ (ORCPT );
        Mon, 27 Jun 2016 13:37:25 -0400
Received: by mail-io0-f174.google.com with SMTP id s63so154660331ioi.3
        for ; Mon, 27 Jun 2016 10:37:25 -0700 (PDT)
Subject: Re: Strange behavior when replacing device on BTRFS RAID 5 array.
To: Chris Murphy , Nick Austin
References:
Cc: Btrfs BTRFS
From: "Austin S. Hemmelgarn"
Message-ID: <016f057b-b7a1-cccf-ca8a-cfe0e1d4341a@gmail.com>
Date: Mon, 27 Jun 2016 13:37:15 -0400
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8; format=flowed
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On 2016-06-27 13:29, Chris Murphy wrote:
> On Sun, Jun 26, 2016 at 10:02 PM, Nick Austin wrote:
>> On Sun, Jun 26, 2016 at 8:57 PM, Nick Austin wrote:
>>> sudo btrfs fi show /mnt/newdata
>>> Label: '/var/data' uuid: e4a2eb77-956e-447a-875e-4f6595a5d3ec
>>> Total devices 4 FS bytes used 8.07TiB
>>> devid 1 size 5.46TiB used 2.70TiB path /dev/sdg
>>> devid 2 size 5.46TiB used 2.70TiB path /dev/sdl
>>> devid 3 size 5.46TiB used 2.70TiB path /dev/sdm
>>> devid 4 size 5.46TiB used 2.70TiB path /dev/sdx
>>
>> It looks like fi show has bad data:
>>
>> When I start heavy IO on the filesystem (running rsync -c to verify the data),
>> I notice zero IO on the bad drive I told btrfs to replace, and lots of IO to the
>> expected replacement.
>>
>> I guess some metadata is messed up somewhere?
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>           25.19    0.00    7.81   28.46    0.00   38.54
>>
>> Device:     tps   kB_read/s   kB_wrtn/s   kB_read   kB_wrtn
>> sdg      437.00    75168.00     1792.00     75168      1792
>> sdl      443.00    76064.00     1792.00     76064      1792
>> sdm      438.00    75232.00     1472.00     75232      1472
>> sdw      443.00    75680.00     1856.00     75680      1856
>> sdx        0.00        0.00        0.00         0         0
>
> There's reported some bugs with 'btrfs replace' and raid56, but I
> don't know the exact nature of those bugs, when or how they manifest.
> It's recommended to fallback to use 'btrfs add' and then 'btrfs
> delete' but you have other issues going on also.

One other thing to mention: if the device is failing, _always_ add '-r'
to the replace command line. This tells it to avoid reading from the
device being replaced. In raid1 or raid10 mode it will pull from the
other mirror; in raid5/6 mode it will recompute the block from parity
and compare it to the stored checksums, which in turn means that this
_will_ be slower on raid5/6 than a regular replace.

Link resets and other issues that cause devices to disappear become more
common the more damaged a disk is, so avoiding reads from it becomes
more important too, because just reading from a disk puts stress on it.
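
For illustration, a minimal sketch of what a replace with '-r' might look
like. The device names and mount point are assumptions read off the quoted
output above (/dev/sdx as the failing disk, /dev/sdw as the replacement,
/mnt/newdata as the mount point), and build_replace_cmd is a hypothetical
helper that only prints the command rather than running it, so adjust
everything to your own setup before using it.

```shell
#!/bin/sh
# Dry-run sketch: prints the replace command instead of executing it.
# All three values below are assumptions taken from the quoted output;
# substitute your own devices and mount point.
failing_dev=/dev/sdx      # device being replaced (assumed)
new_dev=/dev/sdw          # replacement device (assumed)
mountpoint=/mnt/newdata   # mounted filesystem (assumed)

# Hypothetical helper: assembles 'btrfs replace start' with -r, which
# makes btrfs read from the source device only when no other good copy
# of the data is available.
build_replace_cmd() {
    printf 'btrfs replace start -r %s %s %s\n' "$1" "$2" "$3"
}

build_replace_cmd "$failing_dev" "$new_dev" "$mountpoint"

# Once a replace has actually been started, progress can be checked with:
#   btrfs replace status "$mountpoint"
```

Printing the command first gives you a chance to double-check which device
is the source and which is the target before anything touches the array.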