From: Anand Jain <anand.jain@oracle.com>
To: Rene Castberg <rene@castberg.org>, linux-btrfs@vger.kernel.org
Subject: Re: RAID5 Unable to remove Failing HD
Date: Wed, 10 Feb 2016 17:00:05 +0800
Message-ID: <56BAFC15.6080106@oracle.com>
In-Reply-To: <CAKUFzr8gm-yN01CY=H_j6jdEXJL5b0A_vS6gJ5iH1YN1+ZJAvg@mail.gmail.com>



Rene,

Thanks for the report. Fixes are in the following patch sets:

  concern1:
  Btrfs to fail/offline a device for write/flush error:
    [PATCH 00/15] btrfs: Hot spare and Auto replace

  concern2:
  User should be able to delete a device when device has failed:
    [PATCH 0/7] Introduce device delete by devid

  If you are able to try out these patches, please let us know.

Thanks, Anand


On 02/10/2016 03:17 PM, Rene Castberg wrote:
> Hi,
>
> This morning I woke up to a failing disk:
>
> [230743.953079] BTRFS: bdev /dev/sdc errs: wr 1573, rd 45648, flush 503, corrupt 0, gen 0
> [230743.953970] BTRFS: bdev /dev/sdc errs: wr 1573, rd 45649, flush 503, corrupt 0, gen 0
> [230744.106443] BTRFS: lost page write due to I/O error on /dev/sdc
> [230744.180412] BTRFS: lost page write due to I/O error on /dev/sdc
> [230760.116173] btrfs_dev_stat_print_on_error: 5 callbacks suppressed
> [230760.116176] BTRFS: bdev /dev/sdc errs: wr 1577, rd 45651, flush 503, corrupt 0, gen 0
> [230760.726244] BTRFS: bdev /dev/sdc errs: wr 1577, rd 45652, flush 503, corrupt 0, gen 0
> [230761.392939] btrfs_end_buffer_write_sync: 2 callbacks suppressed
> [230761.392947] BTRFS: lost page write due to I/O error on /dev/sdc
> [230761.392953] BTRFS: bdev /dev/sdc errs: wr 1578, rd 45652, flush 503, corrupt 0, gen 0
> [230761.393813] BTRFS: lost page write due to I/O error on /dev/sdc
> [230761.393818] BTRFS: bdev /dev/sdc errs: wr 1579, rd 45652, flush 503, corrupt 0, gen 0
> [230761.394843] BTRFS: lost page write due to I/O error on /dev/sdc
> [230761.394849] BTRFS: bdev /dev/sdc errs: wr 1580, rd 45652, flush 503, corrupt 0, gen 0
> [230802.000425] nfsd: last server has exited, flushing export cache
> [230898.791862] BTRFS: lost page write due to I/O error on /dev/sdc
> [230898.791873] BTRFS: bdev /dev/sdc errs: wr 1581, rd 45652, flush 503, corrupt 0, gen 0
> [230898.792746] BTRFS: lost page write due to I/O error on /dev/sdc
> [230898.792752] BTRFS: bdev /dev/sdc errs: wr 1582, rd 45652, flush 503, corrupt 0, gen 0
> [230898.793723] BTRFS: lost page write due to I/O error on /dev/sdc
> [230898.793728] BTRFS: bdev /dev/sdc errs: wr 1583, rd 45652, flush 503, corrupt 0, gen 0
> [230898.830893] BTRFS info (device sdd): allowing degraded mounts
> [230898.830902] BTRFS info (device sdd): disk space caching is enabled
>
> Eventually I remounted it as degraded, hoping to prevent any loss of data.
>
> It seems that the btrfs filesystem still hasn't noticed that the disk
> has failed:
> $ btrfs fi show
> Label: 'RenesData'  uuid: ee80dae2-7c86-43ea-a253-c8f04589b496
>          Total devices 5 FS bytes used 5.38TiB
>          devid    1 size 2.73TiB used 1.84TiB path /dev/sdb
>          devid    2 size 2.73TiB used 1.84TiB path /dev/sde
>          devid    3 size 3.64TiB used 1.84TiB path /dev/sdf
>          devid    4 size 2.73TiB used 1.84TiB path /dev/sdd
>          devid    5 size 3.64TiB used 1.84TiB path /dev/sdc
>
> I tried deleting the device:
> # btrfs device delete /dev/sdc /mnt2/RenesData/
> ERROR: error removing device '/dev/sdc': Invalid argument
>
> I have been unlucky and already had a failure last Friday, where a
> RAID5 array failed after a disk failure. I rebooted, and the data was
> unrecoverable. Fortunately this was only temp data, so the failure
> wasn't a real issue.
>
> Can somebody advise me on how to delete the failing disk? I plan on
> replacing the disk, but unfortunately the system doesn't have
> hotplug, so I will need to shut down to replace the disk without
> losing any of the data stored on these devices.
>
> Regards
>
> Rene Castberg
>
> # uname -a
> Linux midgard 4.3.3-1.el7.elrepo.x86_64 #1 SMP Tue Dec 15 11:18:19 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
> [root@midgard ~]# btrfs --version
> btrfs-progs v4.3.1
> [root@midgard ~]# btrfs fi df  /mnt2/RenesData/
> Data, RAID6: total=5.52TiB, used=5.37TiB
> System, RAID6: total=96.00MiB, used=480.00KiB
> Metadata, RAID6: total=17.53GiB, used=11.86GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
>
> # btrfs device stats /mnt2/RenesData/
> [/dev/sdb].write_io_errs   0
> [/dev/sdb].read_io_errs    0
> [/dev/sdb].flush_io_errs   0
> [/dev/sdb].corruption_errs 0
> [/dev/sdb].generation_errs 0
> [/dev/sde].write_io_errs   0
> [/dev/sde].read_io_errs    0
> [/dev/sde].flush_io_errs   0
> [/dev/sde].corruption_errs 0
> [/dev/sde].generation_errs 0
> [/dev/sdf].write_io_errs   0
> [/dev/sdf].read_io_errs    0
> [/dev/sdf].flush_io_errs   0
> [/dev/sdf].corruption_errs 0
> [/dev/sdf].generation_errs 0
> [/dev/sdd].write_io_errs   0
> [/dev/sdd].read_io_errs    0
> [/dev/sdd].flush_io_errs   0
> [/dev/sdd].corruption_errs 0
> [/dev/sdd].generation_errs 0
> [/dev/sdc].write_io_errs   1583
> [/dev/sdc].read_io_errs    45652
> [/dev/sdc].flush_io_errs   503
> [/dev/sdc].corruption_errs 0
> [/dev/sdc].generation_errs 0
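
The per-device counters quoted above can also be checked mechanically. The following is only a minimal sketch, not part of btrfs-progs: it assumes the "[device].counter value" text layout shown in the report, and the helper names parse_device_stats and failing_devices are hypothetical, introduced here for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as "[/dev/sdc].write_io_errs   1583".
# Assumption: the layout is exactly what the quoted report shows.
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_device_stats(text):
    """Return {device: {counter: value}} parsed from device-stats text."""
    stats = defaultdict(dict)
    for raw in text.splitlines():
        # Tolerate quoted-email prefixes ("> ") and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        match = STAT_LINE.match(line)
        if match:
            stats[match["dev"]][match["counter"]] = int(match["value"])
    return dict(stats)


def failing_devices(stats):
    """Return the sorted devices that have at least one nonzero counter."""
    return sorted(dev for dev, counters in stats.items() if any(counters.values()))


# A few lines copied from the report above.
SAMPLE = """\
[/dev/sdd].write_io_errs   0
[/dev/sdd].read_io_errs    0
[/dev/sdc].write_io_errs   1583
[/dev/sdc].read_io_errs    45652
[/dev/sdc].flush_io_errs   503
"""

if __name__ == "__main__":
    print(failing_devices(parse_device_stats(SAMPLE)))
```

Fed the full output from the report, this would single out /dev/sdc, the only device with nonzero write, read and flush counters. A nonzero counter only says errors were recorded at some point; it does not by itself tell whether the kernel treats the device as failed.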
