Re: Possible Raid Bug


From: Anand Jain <anand.jain@oracle.com>
To: Alexander Fougner <fougner89@gmail.com>,
	Patrik Lundquist <patrik.lundquist@gmail.com>
Cc: Stephen Williams <stephenw@veryfast.biz>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: Possible Raid Bug
Date: Sat, 26 Mar 2016 11:08:50 +0800
Message-ID: <56F5FD42.5010701@oracle.com>
In-Reply-To: <CAGGqMYR9LdCiqHfO5Rn5zY1G7_YKYjrEzkovTVVaFRXqQ7vpeQ@mail.gmail.com>



On 03/26/2016 04:09 AM, Alexander Fougner wrote:
> 2016-03-25 20:57 GMT+01:00 Patrik Lundquist <patrik.lundquist@gmail.com>:
>> On 25 March 2016 at 18:20, Stephen Williams <stephenw@veryfast.biz> wrote:
>>>
>>> Your information below was very helpful and I was able to recreate the
>>> Raid array. However my initial question still stands - What if the
>>> drives dies completely? I work in a Data center and we see this quite a
>>> lot where a drive is beyond dead - The OS will literally not detect it.
>>
>> That's currently a weakness of Btrfs. I don't know how people deal
>> with it in production. I think Anand Jain is working on improving it.

  We need this issue to be fixed for real production usage.

  The hot spare patch set contains the fix for this. Currently I am
  fixing an issue (#5) which Yauhen reported and that's related to
  auto replace. A refreshed v2 will be out soon.

Thanks, Anand

>>> At this point would the Raid10 array be beyond repair? As you need the
>>> drive present in order to mount the array in degraded mode.
>>
>> Right... let's try it again but a little bit differently.
>>
>> # mount /dev/sdb /mnt
>>
>> Let's drop the disk.
>>
>> # echo 1 >/sys/block/sde/device/delete
>>
>> [ 3669.024256] sd 5:0:0:0: [sde] Synchronizing SCSI cache
>> [ 3669.024934] sd 5:0:0:0: [sde] Stopping disk
>> [ 3669.037028] ata6.00: disabled
>>
>> # touch /mnt/test3
>> # sync
>>
>> [ 3845.960839] BTRFS error (device sdb): bdev /dev/sde errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
>> [ 3845.961525] BTRFS error (device sdb): bdev /dev/sde errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
>> [ 3845.962738] BTRFS error (device sdb): bdev /dev/sde errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
>> [ 3845.963038] BTRFS error (device sdb): bdev /dev/sde errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
>> [ 3845.963422] BTRFS error (device sdb): bdev /dev/sde errs: wr 4, rd 0, flush 1, corrupt 0, gen 0
>> [ 3845.963686] BTRFS warning (device sdb): lost page write due to IO error on /dev/sde
>> [ 3845.963691] BTRFS error (device sdb): bdev /dev/sde errs: wr 5, rd 0, flush 1, corrupt 0, gen 0
>> [ 3845.963932] BTRFS warning (device sdb): lost page write due to IO error on /dev/sde
>> [ 3845.963941] BTRFS error (device sdb): bdev /dev/sde errs: wr 6, rd 0, flush 1, corrupt 0, gen 0
>>
>> # umount /mnt
>>
>> [ 4095.276831] BTRFS error (device sdb): bdev /dev/sde errs: wr 7, rd 0, flush 1, corrupt 0, gen 0
>> [ 4095.278368] BTRFS error (device sdb): bdev /dev/sde errs: wr 8, rd 0, flush 1, corrupt 0, gen 0
>> [ 4095.279152] BTRFS error (device sdb): bdev /dev/sde errs: wr 8, rd 0, flush 2, corrupt 0, gen 0
>> [ 4095.279373] BTRFS warning (device sdb): lost page write due to IO error on /dev/sde
>> [ 4095.279377] BTRFS error (device sdb): bdev /dev/sde errs: wr 9, rd 0, flush 2, corrupt 0, gen 0
>> [ 4095.279609] BTRFS warning (device sdb): lost page write due to IO error on /dev/sde
>> [ 4095.279612] BTRFS error (device sdb): bdev /dev/sde errs: wr 10, rd 0, flush 2, corrupt 0, gen 0
>>
>> # mount -o degraded /dev/sdb /mnt
>>
>> [ 4608.113751] BTRFS info (device sdb): allowing degraded mounts
>> [ 4608.113756] BTRFS info (device sdb): disk space caching is enabled
>> [ 4608.113757] BTRFS: has skinny extents
>> [ 4608.116557] BTRFS info (device sdb): bdev /dev/sde errs: wr 6, rd 0, flush 1, corrupt 0, gen 0
>>
>> # touch /mnt/test4
>> # sync
>>
>> Writing to the filesystem works while the device is missing.
>> No new errors in dmesg after re-mounting degraded. Reboot to get back /dev/sde.
>>
>> [    4.329852] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d devid 4 transid 26 /dev/sde
>> [    4.330157] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d devid 3 transid 31 /dev/sdd
>> [    4.330511] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d devid 2 transid 31 /dev/sdc
>> [    4.330865] BTRFS: device fsid 75737bea-d76c-42f5-b0e6-7d346e38610d devid 1 transid 31 /dev/sdb
>>
>> /dev/sde transid is lagging behind, of course.
>>
>> # wipefs -a /dev/sde
>> # btrfs device scan
>>
>> # mount -o degraded /dev/sdb /mnt
>>
>> [  507.248621] BTRFS info (device sdb): allowing degraded mounts
>> [  507.248626] BTRFS info (device sdb): disk space caching is enabled
>> [  507.248628] BTRFS: has skinny extents
>> [  507.252815] BTRFS info (device sdb): bdev /dev/sde errs: wr 6, rd 0, flush 1, corrupt 0, gen 0
>> [  507.252919] BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed
>
> The single/dup profiles have zero tolerance for missing devices. Only
> a read-only mount is allowed in that case.
>
>> [  507.278277] BTRFS: open_ctree failed
>>
>> Well, that was unexpected! Reboot again.
>>
>> # mount -o degraded /dev/sdb /mnt
>>
>> [   94.368514] BTRFS info (device sdd): allowing degraded mounts
>> [   94.368519] BTRFS info (device sdd): disk space caching is enabled
>> [   94.368521] BTRFS: has skinny extents
>> [   94.370909] BTRFS warning (device sdd): devid 4 uuid 8549a275-f663-4741-b410-79b49a1d465f is missing
>> [   94.372170] BTRFS info (device sdd): bdev (null) errs: wr 6, rd 0, flush 1, corrupt 0, gen 0
>> [   94.372284] BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed
>> [   94.395021] BTRFS: open_ctree failed
>>
>> No go.
>>
>> # mount -o degraded,ro /dev/sdb /mnt
>> # btrfs device stats /mnt
>> [/dev/sdb].write_io_errs   0
>> [/dev/sdb].read_io_errs    0
>> [/dev/sdb].flush_io_errs   0
>> [/dev/sdb].corruption_errs 0
>> [/dev/sdb].generation_errs 0
>> [/dev/sdc].write_io_errs   0
>> [/dev/sdc].read_io_errs    0
>> [/dev/sdc].flush_io_errs   0
>> [/dev/sdc].corruption_errs 0
>> [/dev/sdc].generation_errs 0
>> [/dev/sdd].write_io_errs   0
>> [/dev/sdd].read_io_errs    0
>> [/dev/sdd].flush_io_errs   0
>> [/dev/sdd].corruption_errs 0
>> [/dev/sdd].generation_errs 0
>> [(null)].write_io_errs   6
>> [(null)].read_io_errs    0
>> [(null)].flush_io_errs   1
>> [(null)].corruption_errs 0
>> [(null)].generation_errs 0
>>
>> Only errors on the device formerly known as /dev/sde, so why won't it
>> mount degraded,rw? Now I'm stuck like Stephen.
>>
>
> Because single profile chunks were created during the first degraded
> mount. I believe this is what Anand is working on: checking device
> degradation on a per-block-group basis rather than for the whole FS.
>
>> # btrfs device usage /mnt
>> /dev/sdb, ID: 1
>>     Device size:             2.00GiB
>>     Data,single:           624.00MiB
>>     Data,RAID10:           102.38MiB
>>     Metadata,RAID10:       102.38MiB
>>     System,RAID10:           4.00MiB
>>     Unallocated:             1.19GiB
>>
>> /dev/sdc, ID: 2
>>     Device size:             2.00GiB
>>     Data,RAID10:           102.38MiB
>>     Metadata,RAID10:       102.38MiB
>>     System,single:          32.00MiB
>>     System,RAID10:           4.00MiB
>>     Unallocated:             1.76GiB
>>
>> /dev/sdd, ID: 3
>>     Device size:             2.00GiB
>>     Data,RAID10:           102.38MiB
>>     Metadata,single:       256.00MiB
>>     Metadata,RAID10:       102.38MiB
>>     System,RAID10:           4.00MiB
>>     Unallocated:             1.55GiB
>>
>> missing, ID: 4
>>     Device size:               0.00B
>>     Data,RAID10:           102.38MiB
>>     Metadata,RAID10:       102.38MiB
>>     System,RAID10:           4.00MiB
>>     Unallocated:             1.80GiB
>>
>> The data written while mounted degraded is in profile 'single' and
>> will have to be converted to 'raid10' once the filesystem is whole
>> again.
>>
>> So what do I do now? Why did it degrade further after a reboot?
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
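
The quoted `btrfs device usage` listing shows 'single' chunks sitting
next to the RAID10 ones. The following is a minimal sketch, not part of
the original thread, for listing which devices hold 'single' chunks. The
function name list_single_chunks is hypothetical, and the parsing
assumes the output layout shown in the quoted listing.

```shell
# Hypothetical helper (an assumption, not a btrfs-progs feature):
# reads `btrfs device usage` output on stdin and prints one line per
# 'single' profile allocation: "<device> <type,profile> <size>".
list_single_chunks() {
    awk '
        # Device header line, e.g. "/dev/sdb, ID: 1" or "missing, ID: 4"
        /, ID: /   { dev = $1; sub(/,$/, "", dev) }
        # Allocation line, e.g. "    Data,single:   624.00MiB"
        /,single:/ { sub(/:$/, "", $1); print dev, $1, $2 }
    '
}

# Intended usage on a mounted filesystem (assumes /mnt as in the thread):
#   btrfs device usage /mnt | list_single_chunks
```

Once the filesystem is whole again and mounted read-write, such chunks
are usually converted back with a balance, for example
`btrfs balance start -dconvert=raid10,soft -mconvert=raid10,soft /mnt`.
Whether a writable mount can be obtained at all is the open problem
discussed in this thread.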
