* [btrfs tools] ability to fail a device...
From: Ian Kumlien @ 2015-09-08 19:18 UTC
To: linux-btrfs

Hi,

Currently I have a raid1 configuration on two disks where one of them
is failing.

But since:

btrfs fi df /mnt/disk/
Data, RAID1: total=858.00GiB, used=638.16GiB
Data, single: total=1.00GiB, used=256.00KiB
System, RAID1: total=32.00MiB, used=132.00KiB
Metadata, RAID1: total=4.00GiB, used=1.21GiB
GlobalReserve, single: total=412.00MiB, used=0.00B

There should be no problem in failing one disk... Or so I thought!

btrfs dev delete /dev/sdb2 /mnt/disk/
ERROR: error removing the device '/dev/sdb2' - unable to go below two
devices on raid1

And I can't issue a rebalance either, since it will just report errors
until the failing disk dies.

What's even more interesting is that I can't mount just the working
disk - i.e. if the other disk *has* failed and is inaccessible...
though, I haven't tried physically removing it...

mdadm has fail and remove, I assume for this reason - perhaps it's
something that should be added?

uname -r
4.2.0

btrfs --version
btrfs-progs v4.1.2
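A quick way to confirm which device is actually accumulating errors
before deciding how to act (a minimal sketch; only the mount point is
taken from the report above):

    # per-device error counters: write, read and flush I/O errors,
    # plus corruption and generation errors
    btrfs device stats /mnt/disk/

    # which devices back the filesystem, and how much is allocated on each
    btrfs filesystem show /mnt/disk/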
* Re: [btrfs tools] ability to fail a device...
From: Hugo Mills @ 2015-09-08 19:34 UTC
To: Ian Kumlien; +Cc: linux-btrfs

On Tue, Sep 08, 2015 at 09:18:05PM +0200, Ian Kumlien wrote:
> Hi,
>
> Currently I have a raid1 configuration on two disks where one of them
> is failing.
>
> But since:
>
> btrfs fi df /mnt/disk/
> Data, RAID1: total=858.00GiB, used=638.16GiB
> Data, single: total=1.00GiB, used=256.00KiB
> System, RAID1: total=32.00MiB, used=132.00KiB
> Metadata, RAID1: total=4.00GiB, used=1.21GiB
> GlobalReserve, single: total=412.00MiB, used=0.00B
>
> There should be no problem in failing one disk... Or so I thought!
>
> btrfs dev delete /dev/sdb2 /mnt/disk/
> ERROR: error removing the device '/dev/sdb2' - unable to go below two
> devices on raid1

dev delete is more like a reshaping operation in mdadm: it tries to
remove a device safely whilst retaining all of the redundancy
guarantees. You can't go down to one device with RAID-1 and still keep
the redundancy.

dev delete is really for managed device removal under non-failure
conditions, not for error recovery.

> And I can't issue a rebalance either, since it will just report errors
> until the failing disk dies.
>
> What's even more interesting is that I can't mount just the working
> disk - i.e. if the other disk *has* failed and is inaccessible...
> though, I haven't tried physically removing it...

Physically removing it is the way to go (or disabling it using echo
offline >/sys/block/sda/device/state). Once you've done that, you can
mount the degraded FS with -odegraded, then either add a new device
and balance to restore the RAID-1, or balance with
-{d,m}convert=single to drop the redundancy to single.

> mdadm has fail and remove, I assume for this reason - perhaps it's
> something that should be added?

I think there should be a btrfs dev drop, which is the fail-like
operation: tell the FS that a device is useless and should be dropped
from the array, so the FS doesn't keep trying to write to it. That's
not implemented yet, though.

   Hugo.

-- 
Hugo Mills             | Alert status mauve ocelot: Slight chance of
hugo@... carfax.org.uk | brimstone. Be prepared to make a nice cup of tea.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
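A minimal sketch of the sequence Hugo describes, assuming the failing
device is /dev/sdb2 on disk /dev/sdb, and that /dev/sdc2 is a
hypothetical replacement partition:

    # stop the kernel from issuing any further I/O to the failing disk
    echo offline > /sys/block/sdb/device/state

    # mount the surviving device degraded
    mount -o degraded /dev/sda2 /mnt/disk

    # option A: restore RAID-1 onto a replacement device
    btrfs device add /dev/sdc2 /mnt/disk
    btrfs device delete missing /mnt/disk   # re-replicates the missing copies

    # option B: give up redundancy and convert to single-copy profiles
    btrfs balance start -dconvert=single -mconvert=single /mnt/disk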
* Re: [btrfs tools] ability to fail a device...
From: Ian Kumlien @ 2015-09-08 19:43 UTC
To: Hugo Mills, Ian Kumlien, linux-btrfs

On 8 September 2015 at 21:34, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Tue, Sep 08, 2015 at 09:18:05PM +0200, Ian Kumlien wrote:
>> Hi,
>>
>> Currently I have a raid1 configuration on two disks where one of them
>> is failing.
>>
>> But since:
>>
>> btrfs fi df /mnt/disk/
>> Data, RAID1: total=858.00GiB, used=638.16GiB
>> Data, single: total=1.00GiB, used=256.00KiB
>> System, RAID1: total=32.00MiB, used=132.00KiB
>> Metadata, RAID1: total=4.00GiB, used=1.21GiB
>> GlobalReserve, single: total=412.00MiB, used=0.00B
>>
>> There should be no problem in failing one disk... Or so I thought!
>>
>> btrfs dev delete /dev/sdb2 /mnt/disk/
>> ERROR: error removing the device '/dev/sdb2' - unable to go below two
>> devices on raid1
>
> dev delete is more like a reshaping operation in mdadm: it tries to
> remove a device safely whilst retaining all of the redundancy
> guarantees. You can't go down to one device with RAID-1 and still keep
> the redundancy.
>
> dev delete is really for managed device removal under non-failure
> conditions, not for error recovery.
>
>> And I can't issue a rebalance either, since it will just report errors
>> until the failing disk dies.
>>
>> What's even more interesting is that I can't mount just the working
>> disk - i.e. if the other disk *has* failed and is inaccessible...
>> though, I haven't tried physically removing it...
>
> Physically removing it is the way to go (or disabling it using echo
> offline >/sys/block/sda/device/state). Once you've done that, you can
> mount the degraded FS with -odegraded, then either add a new device
> and balance to restore the RAID-1, or balance with
> -{d,m}convert=single to drop the redundancy to single.

This did not work...

[ 1742.368079] BTRFS info (device sda2): The free space cache file
(280385028096) is invalid. skip it
[ 1789.052403] BTRFS: open /dev/sdb2 failed
[ 1789.064629] BTRFS info (device sda2): allowing degraded mounts
[ 1789.064632] BTRFS info (device sda2): disk space caching is enabled
[ 1789.092286] BTRFS: bdev /dev/sdb2 errs: wr 2036894, rd 2031380,
flush 705, corrupt 0, gen 0
[ 1792.625275] BTRFS: too many missing devices, writeable mount is not allowed
[ 1792.644407] BTRFS: open_ctree failed

>> mdadm has fail and remove, I assume for this reason - perhaps it's
>> something that should be added?
>
> I think there should be a btrfs dev drop, which is the fail-like
> operation: tell the FS that a device is useless and should be dropped
> from the array, so the FS doesn't keep trying to write to it. That's
> not implemented yet, though.

Damn it =)

> Hugo.
>
> -- 
> Hugo Mills             | Alert status mauve ocelot: Slight chance of
> hugo@... carfax.org.uk | brimstone. Be prepared to make a nice cup of tea.
> http://carfax.org.uk/  |
> PGP: E2AB1DE4          |
* Re: [btrfs tools] ability to fail a device...
From: Ian Kumlien @ 2015-09-08 19:55 UTC
To: Hugo Mills, Ian Kumlien, linux-btrfs

On 8 September 2015 at 21:43, Ian Kumlien <ian.kumlien@gmail.com> wrote:
> On 8 September 2015 at 21:34, Hugo Mills <hugo@carfax.org.uk> wrote:
>> On Tue, Sep 08, 2015 at 09:18:05PM +0200, Ian Kumlien wrote:

[--8<--]

>> Physically removing it is the way to go (or disabling it using echo
>> offline >/sys/block/sda/device/state). Once you've done that, you can
>> mount the degraded FS with -odegraded, then either add a new device
>> and balance to restore the RAID-1, or balance with
>> -{d,m}convert=single to drop the redundancy to single.
>
> This did not work...

And removing the physical device is not the answer either... until I
did a read-only mount ;)

Didn't expect it to fail with "unable to open ctree" like that...

[--8<--]
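For the record, the read-only escape hatch that did work here looks
roughly like this (a sketch; the backup destination is hypothetical):

    mount -o degraded,ro /dev/sda2 /mnt/disk
    rsync -aHAX /mnt/disk/ /mnt/backup/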
* Re: [btrfs tools] ability to fail a device...
From: Ian Kumlien @ 2015-09-08 20:00 UTC
To: Hugo Mills, Ian Kumlien, linux-btrfs

On 8 September 2015 at 21:55, Ian Kumlien <ian.kumlien@gmail.com> wrote:
> On 8 September 2015 at 21:43, Ian Kumlien <ian.kumlien@gmail.com> wrote:
>> On 8 September 2015 at 21:34, Hugo Mills <hugo@carfax.org.uk> wrote:
>>> On Tue, Sep 08, 2015 at 09:18:05PM +0200, Ian Kumlien wrote:

[--8<--]

>>> Physically removing it is the way to go (or disabling it using echo
>>> offline >/sys/block/sda/device/state). Once you've done that, you can
>>> mount the degraded FS with -odegraded, then either add a new device
>>> and balance to restore the RAID-1, or balance with
>>> -{d,m}convert=single to drop the redundancy to single.
>>
>> This did not work...
>
> And removing the physical device is not the answer either... until I
> did a read-only mount ;)
>
> Didn't expect it to fail with "unable to open ctree" like that...

Someone thought they were done too early: only one disk => read-only
mount. But read-only mount => no balance.

I think something is wrong...

btrfs balance start -dconvert=single -mconvert=single /mnt/disk/
ERROR: error during balancing '/mnt/disk/' - Read-only file system

btrfs dev delete missing /mnt/disk/
ERROR: error removing the device 'missing' - Read-only file system

Any mount without ro becomes:

[ 507.236652] BTRFS info (device sda2): allowing degraded mounts
[ 507.236655] BTRFS info (device sda2): disk space caching is enabled
[ 507.325365] BTRFS: bdev (null) errs: wr 2036894, rd 2031380, flush
705, corrupt 0, gen 0
[ 510.983321] BTRFS: too many missing devices, writeable mount is not allowed
[ 511.006241] BTRFS: open_ctree failed

And one of them has to give! ;)

> [--8<--]
* Re: [btrfs tools] ability to fail a device...
From: Chris Murphy @ 2015-09-08 20:08 UTC
To: Ian Kumlien; +Cc: Hugo Mills, Btrfs BTRFS

On Tue, Sep 8, 2015 at 2:00 PM, Ian Kumlien <ian.kumlien@gmail.com> wrote:
> On 8 September 2015 at 21:55, Ian Kumlien <ian.kumlien@gmail.com> wrote:
>> On 8 September 2015 at 21:43, Ian Kumlien <ian.kumlien@gmail.com> wrote:
>>> On 8 September 2015 at 21:34, Hugo Mills <hugo@carfax.org.uk> wrote:
>>>> On Tue, Sep 08, 2015 at 09:18:05PM +0200, Ian Kumlien wrote:
>> [--8<--]
>>
>>>> Physically removing it is the way to go (or disabling it using echo
>>>> offline >/sys/block/sda/device/state). Once you've done that, you can
>>>> mount the degraded FS with -odegraded, then either add a new device
>>>> and balance to restore the RAID-1, or balance with
>>>> -{d,m}convert=single to drop the redundancy to single.
>>>
>>> This did not work...
>>
>> And removing the physical device is not the answer either... until I
>> did a read-only mount ;)
>>
>> Didn't expect it to fail with "unable to open ctree" like that...
>
> Someone thought they were done too early: only one disk => read-only
> mount. But read-only mount => no balance.
>
> I think something is wrong...
>
> btrfs balance start -dconvert=single -mconvert=single /mnt/disk/
> ERROR: error during balancing '/mnt/disk/' - Read-only file system
>
> btrfs dev delete missing /mnt/disk/
> ERROR: error removing the device 'missing' - Read-only file system
>
> Any mount without ro becomes:
> [ 507.236652] BTRFS info (device sda2): allowing degraded mounts
> [ 507.236655] BTRFS info (device sda2): disk space caching is enabled
> [ 507.325365] BTRFS: bdev (null) errs: wr 2036894, rd 2031380, flush
> 705, corrupt 0, gen 0
> [ 510.983321] BTRFS: too many missing devices, writeable mount is not allowed
> [ 511.006241] BTRFS: open_ctree failed
>
> And one of them has to give! ;)

You've run into this:
https://bugzilla.kernel.org/show_bug.cgi?id=92641

-- 
Chris Murphy
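For anyone reading along, the catch-22 in that bug report can be
reproduced on kernels of this era roughly as follows (a sketch using
loop devices; sizes and paths are illustrative):

    truncate -s 2G /tmp/d1.img /tmp/d2.img
    losetup /dev/loop0 /tmp/d1.img
    losetup /dev/loop1 /tmp/d2.img
    mkfs.btrfs -d raid1 -m raid1 /dev/loop0 /dev/loop1
    mkdir -p /mnt/test
    losetup -d /dev/loop1                     # simulate the dead device
    mount -o degraded /dev/loop0 /mnt/test    # first degraded mount: rw works
    touch /mnt/test/x                         # new writes land in single chunks
    umount /mnt/test
    mount -o degraded /dev/loop0 /mnt/test    # second attempt is refused rw:
                                              # "too many missing devices"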
* Re: [btrfs tools] ability to fail a device...
From: Ian Kumlien @ 2015-09-08 20:13 UTC
To: Chris Murphy; +Cc: Hugo Mills, Btrfs BTRFS

On 8 September 2015 at 22:08, Chris Murphy <lists@colorremedies.com> wrote:
> On Tue, Sep 8, 2015 at 2:00 PM, Ian Kumlien <ian.kumlien@gmail.com> wrote:

[--8<--]

>> Someone thought they were done too early: only one disk => read-only
>> mount. But read-only mount => no balance.
>>
>> I think something is wrong...
>>
>> btrfs balance start -dconvert=single -mconvert=single /mnt/disk/
>> ERROR: error during balancing '/mnt/disk/' - Read-only file system
>>
>> btrfs dev delete missing /mnt/disk/
>> ERROR: error removing the device 'missing' - Read-only file system
>>
>> Any mount without ro becomes:
>> [ 507.236652] BTRFS info (device sda2): allowing degraded mounts
>> [ 507.236655] BTRFS info (device sda2): disk space caching is enabled
>> [ 507.325365] BTRFS: bdev (null) errs: wr 2036894, rd 2031380, flush
>> 705, corrupt 0, gen 0
>> [ 510.983321] BTRFS: too many missing devices, writeable mount is not allowed
>> [ 511.006241] BTRFS: open_ctree failed
>>
>> And one of them has to give! ;)
>
> You've run into this:
> https://bugzilla.kernel.org/show_bug.cgi?id=92641

Ah, I thought it might not be known - I'm currently copying the files,
since a read-only mount is "good enough" for that.

-o degraded should allow read-write *IF* the data can be written to...
My question is also: would this keep me from "adding devices"?
I mean, it did seem like a catch-22 earlier, but that would really
make a mess of things...

> -- 
> Chris Murphy
* Re: [btrfs tools] ability to fail a device...
From: Chris Murphy @ 2015-09-08 20:17 UTC
To: Ian Kumlien; +Cc: Chris Murphy, Hugo Mills, Btrfs BTRFS

On Tue, Sep 8, 2015 at 2:13 PM, Ian Kumlien <ian.kumlien@gmail.com> wrote:
> On 8 September 2015 at 22:08, Chris Murphy <lists@colorremedies.com> wrote:
>> On Tue, Sep 8, 2015 at 2:00 PM, Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> [--8<--]
>
>>> Someone thought they were done too early: only one disk => read-only
>>> mount. But read-only mount => no balance.
>>>
>>> I think something is wrong...
>>>
>>> btrfs balance start -dconvert=single -mconvert=single /mnt/disk/
>>> ERROR: error during balancing '/mnt/disk/' - Read-only file system
>>>
>>> btrfs dev delete missing /mnt/disk/
>>> ERROR: error removing the device 'missing' - Read-only file system
>>>
>>> Any mount without ro becomes:
>>> [ 507.236652] BTRFS info (device sda2): allowing degraded mounts
>>> [ 507.236655] BTRFS info (device sda2): disk space caching is enabled
>>> [ 507.325365] BTRFS: bdev (null) errs: wr 2036894, rd 2031380, flush
>>> 705, corrupt 0, gen 0
>>> [ 510.983321] BTRFS: too many missing devices, writeable mount is not allowed
>>> [ 511.006241] BTRFS: open_ctree failed
>>>
>>> And one of them has to give! ;)
>>
>> You've run into this:
>> https://bugzilla.kernel.org/show_bug.cgi?id=92641
>
> Ah, I thought it might not be known - I'm currently copying the files,
> since a read-only mount is "good enough" for that.
>
> -o degraded should allow read-write *IF* the data can be written to...
> My question is also: would this keep me from "adding devices"?
> I mean, it did seem like a catch-22 earlier, but that would really
> make a mess of things...

It is not possible to add a device to an ro filesystem, so effectively
the fs read-writeability is broken in this case.

-- 
Chris Murphy
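One practical aside: as long as a filesystem like this can still be
mounted read-write with both devices present, btrfs replace copies
onto a new disk directly, and with -r it avoids reading from the
failing source wherever a good mirror exists (a sketch; /dev/sdc2 is a
hypothetical replacement of at least the same size):

    btrfs replace start -r /dev/sdb2 /dev/sdc2 /mnt/disk
    btrfs replace status /mnt/disk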
* Re: [btrfs tools] ability to fail a device...
From: Ian Kumlien @ 2015-09-08 20:24 UTC
To: Chris Murphy; +Cc: Hugo Mills, Btrfs BTRFS

On 8 September 2015 at 22:17, Chris Murphy <lists@colorremedies.com> wrote:
> On Tue, Sep 8, 2015 at 2:13 PM, Ian Kumlien <ian.kumlien@gmail.com> wrote:
>> On 8 September 2015 at 22:08, Chris Murphy <lists@colorremedies.com> wrote:
>>> On Tue, Sep 8, 2015 at 2:00 PM, Ian Kumlien <ian.kumlien@gmail.com> wrote:
>>
>> [--8<--]
>>
>>>> Someone thought they were done too early: only one disk => read-only
>>>> mount. But read-only mount => no balance.
>>>>
>>>> I think something is wrong...
>>>>
>>>> btrfs balance start -dconvert=single -mconvert=single /mnt/disk/
>>>> ERROR: error during balancing '/mnt/disk/' - Read-only file system
>>>>
>>>> btrfs dev delete missing /mnt/disk/
>>>> ERROR: error removing the device 'missing' - Read-only file system
>>>>
>>>> Any mount without ro becomes:
>>>> [ 507.236652] BTRFS info (device sda2): allowing degraded mounts
>>>> [ 507.236655] BTRFS info (device sda2): disk space caching is enabled
>>>> [ 507.325365] BTRFS: bdev (null) errs: wr 2036894, rd 2031380, flush
>>>> 705, corrupt 0, gen 0
>>>> [ 510.983321] BTRFS: too many missing devices, writeable mount is not allowed
>>>> [ 511.006241] BTRFS: open_ctree failed
>>>>
>>>> And one of them has to give! ;)
>>>
>>> You've run into this:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=92641
>>
>> Ah, I thought it might not be known - I'm currently copying the files,
>> since a read-only mount is "good enough" for that.
>>
>> -o degraded should allow read-write *IF* the data can be written to...
>> My question is also: would this keep me from "adding devices"?
>> I mean, it did seem like a catch-22 earlier, but that would really
>> make a mess of things...
>
> It is not possible to add a device to an ro filesystem, so effectively
> the fs read-writeability is broken in this case.

Wow, now that's quite a bug!

> -- 
> Chris Murphy
* Re: [btrfs tools] ability to fail a device...
From: Hugo Mills @ 2015-09-08 20:28 UTC
To: Chris Murphy; +Cc: Ian Kumlien, Btrfs BTRFS

On Tue, Sep 08, 2015 at 02:17:55PM -0600, Chris Murphy wrote:
> On Tue, Sep 8, 2015 at 2:13 PM, Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > On 8 September 2015 at 22:08, Chris Murphy <lists@colorremedies.com> wrote:
> >> On Tue, Sep 8, 2015 at 2:00 PM, Ian Kumlien <ian.kumlien@gmail.com> wrote:
> >
> > [--8<--]
> >
> >>> Someone thought they were done too early: only one disk => read-only
> >>> mount. But read-only mount => no balance.
> >>>
> >>> I think something is wrong...
> >>>
> >>> btrfs balance start -dconvert=single -mconvert=single /mnt/disk/
> >>> ERROR: error during balancing '/mnt/disk/' - Read-only file system
> >>>
> >>> btrfs dev delete missing /mnt/disk/
> >>> ERROR: error removing the device 'missing' - Read-only file system
> >>>
> >>> Any mount without ro becomes:
> >>> [ 507.236652] BTRFS info (device sda2): allowing degraded mounts
> >>> [ 507.236655] BTRFS info (device sda2): disk space caching is enabled
> >>> [ 507.325365] BTRFS: bdev (null) errs: wr 2036894, rd 2031380, flush
> >>> 705, corrupt 0, gen 0
> >>> [ 510.983321] BTRFS: too many missing devices, writeable mount is not allowed
> >>> [ 511.006241] BTRFS: open_ctree failed
> >>>
> >>> And one of them has to give! ;)
> >>
> >> You've run into this:
> >> https://bugzilla.kernel.org/show_bug.cgi?id=92641
> >
> > Ah, I thought it might not be known - I'm currently copying the files,
> > since a read-only mount is "good enough" for that.
> >
> > -o degraded should allow read-write *IF* the data can be written to...
> > My question is also: would this keep me from "adding devices"?
> > I mean, it did seem like a catch-22 earlier, but that would really
> > make a mess of things...
>
> It is not possible to add a device to an ro filesystem, so effectively
> the fs read-writeability is broken in this case.

I thought this particular issue had already been dealt with in 4.2?
(i.e. you can still mount an FS RW if it's degraded, but there are
still some single chunks on it).

Ian: If you can still mount the FS read/write with both devices in
it, then it might be worth trying to balance away the problematic
single chunks with:

btrfs bal start -dprofiles=single -mprofiles=single /mountpoint

Then unmount, pull the dead drive, and remount -odegraded.

   Hugo.

-- 
Hugo Mills             | The early bird gets the worm, but the second mouse
hugo@... carfax.org.uk | gets the cheese.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
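Spelled out with the device names from this thread, Hugo's suggestion
would look roughly like this (a sketch):

    # remove only the (empty) single chunks; RAID-1 chunks are untouched
    btrfs balance start -dprofiles=single -mprofiles=single /mnt/disk
    btrfs fi df /mnt/disk   # the "Data, single" line should be gone
                            # (GlobalReserve always shows as single)
    umount /mnt/disk
    # physically pull the dead drive, then:
    mount -o degraded /dev/sda2 /mnt/disk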
* Re: [btrfs tools] ability to fail a device...
From: Ian Kumlien @ 2015-09-08 20:33 UTC
To: Hugo Mills, Chris Murphy, Ian Kumlien, Btrfs BTRFS

On 8 September 2015 at 22:28, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Tue, Sep 08, 2015 at 02:17:55PM -0600, Chris Murphy wrote:
>> On Tue, Sep 8, 2015 at 2:13 PM, Ian Kumlien <ian.kumlien@gmail.com> wrote:
>> > On 8 September 2015 at 22:08, Chris Murphy <lists@colorremedies.com> wrote:
>> >> On Tue, Sep 8, 2015 at 2:00 PM, Ian Kumlien <ian.kumlien@gmail.com> wrote:

[--8<--]

>> > -o degraded should allow read-write *IF* the data can be written to...
>> > My question is also: would this keep me from "adding devices"?
>> > I mean, it did seem like a catch-22 earlier, but that would really
>> > make a mess of things...
>>
>> It is not possible to add a device to an ro filesystem, so effectively
>> the fs read-writeability is broken in this case.
>
> I thought this particular issue had already been dealt with in 4.2?
> (i.e. you can still mount an FS RW if it's degraded, but there are
> still some single chunks on it).

Single chunks are only on sda - not on sdb...

There should be no problem...

> Ian: If you can still mount the FS read/write with both devices in
> it, then it might be worth trying to balance away the problematic
> single chunks with:
>
> btrfs bal start -dprofiles=single -mprofiles=single /mountpoint
>
> Then unmount, pull the dead drive, and remount -odegraded.

It never completes: too many errors, and eventually the disk disappears
until the machine is turned off and on again... (a normal disk reset
doesn't work)
* Re: [btrfs tools] ability to fail a device...
From: Hugo Mills @ 2015-09-08 20:40 UTC
To: Ian Kumlien; +Cc: Chris Murphy, Btrfs BTRFS

On Tue, Sep 08, 2015 at 10:33:54PM +0200, Ian Kumlien wrote:
> On 8 September 2015 at 22:28, Hugo Mills <hugo@carfax.org.uk> wrote:
> > On Tue, Sep 08, 2015 at 02:17:55PM -0600, Chris Murphy wrote:
> >> On Tue, Sep 8, 2015 at 2:13 PM, Ian Kumlien <ian.kumlien@gmail.com> wrote:
> >> > On 8 September 2015 at 22:08, Chris Murphy <lists@colorremedies.com> wrote:
> >> >> On Tue, Sep 8, 2015 at 2:00 PM, Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> [--8<--]
>
> >> > -o degraded should allow read-write *IF* the data can be written to...
> >> > My question is also: would this keep me from "adding devices"?
> >> > I mean, it did seem like a catch-22 earlier, but that would really
> >> > make a mess of things...
> >>
> >> It is not possible to add a device to an ro filesystem, so effectively
> >> the fs read-writeability is broken in this case.
> >
> > I thought this particular issue had already been dealt with in 4.2?
> > (i.e. you can still mount an FS RW if it's degraded, but there are
> > still some single chunks on it).
>
> Single chunks are only on sda - not on sdb...
>
> There should be no problem...

The check is more primitive than that at the moment, sadly. It just
checks that the number of missing devices is smaller than or equal to
the acceptable device loss for each RAID profile present on the FS.

> > Ian: If you can still mount the FS read/write with both devices in
> > it, then it might be worth trying to balance away the problematic
> > single chunks with:
> >
> > btrfs bal start -dprofiles=single -mprofiles=single /mountpoint
> >
> > Then unmount, pull the dead drive, and remount -odegraded.
>
> It never completes: too many errors, and eventually the disk disappears
> until the machine is turned off and on again... (a normal disk reset
> doesn't work)

The profiles= parameters should limit the balance to just the three
single chunks, and will remove them (because they're empty). It
shouldn't hit the metadata too hard, even if it's raising lots of
errors.

   Hugo.

-- 
Hugo Mills             | The early bird gets the worm, but the second mouse
hugo@... carfax.org.uk | gets the cheese.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
* Re: [btrfs tools] ability to fail a device...
From: Anand Jain @ 2015-09-09 1:35 UTC
To: Hugo Mills; +Cc: Ian Kumlien, linux-btrfs

On 09/09/2015 03:34 AM, Hugo Mills wrote:
> On Tue, Sep 08, 2015 at 09:18:05PM +0200, Ian Kumlien wrote:
>> Hi,
>>
>> Currently I have a raid1 configuration on two disks where one of them
>> is failing.
>>
>> But since:
>>
>> btrfs fi df /mnt/disk/
>> Data, RAID1: total=858.00GiB, used=638.16GiB
>> Data, single: total=1.00GiB, used=256.00KiB
>> System, RAID1: total=32.00MiB, used=132.00KiB
>> Metadata, RAID1: total=4.00GiB, used=1.21GiB
>> GlobalReserve, single: total=412.00MiB, used=0.00B
>>
>> There should be no problem in failing one disk... Or so I thought!
>>
>> btrfs dev delete /dev/sdb2 /mnt/disk/
>> ERROR: error removing the device '/dev/sdb2' - unable to go below two
>> devices on raid1
>
> dev delete is more like a reshaping operation in mdadm: it tries to
> remove a device safely whilst retaining all of the redundancy
> guarantees. You can't go down to one device with RAID-1 and still keep
> the redundancy.
>
> dev delete is really for managed device removal under non-failure
> conditions, not for error recovery.
>
>> And I can't issue a rebalance either, since it will just report errors
>> until the failing disk dies.
>>
>> What's even more interesting is that I can't mount just the working
>> disk - i.e. if the other disk *has* failed and is inaccessible...
>> though, I haven't tried physically removing it...
>
> Physically removing it is the way to go (or disabling it using echo
> offline >/sys/block/sda/device/state). Once you've done that, you can
> mount the degraded FS with -odegraded, then either add a new device
> and balance to restore the RAID-1, or balance with
> -{d,m}convert=single to drop the redundancy to single.

It's like you _must_ add a disk in this context, otherwise the volume
will be rendered unmountable in the next mount cycle. The patch
mentioned below has more details.

>> mdadm has fail and remove, I assume for this reason - perhaps it's
>> something that should be added?
>
> I think there should be a btrfs dev drop, which is the fail-like
> operation: tell the FS that a device is useless and should be dropped
> from the array, so the FS doesn't keep trying to write to it. That's
> not implemented yet, though.

There is a patch set to handle this:
'Btrfs: introduce function to handle device offline'

Thanks, Anand

> Hugo.
>
* Re: [btrfs tools] ability to fail a device...
From: Ian Kumlien @ 2015-09-09 7:07 UTC
To: Anand Jain; +Cc: Hugo Mills, Btrfs BTRFS

On 9 September 2015 at 03:35, Anand Jain <anand.jain@oracle.com> wrote:
> On 09/09/2015 03:34 AM, Hugo Mills wrote:
>> On Tue, Sep 08, 2015 at 09:18:05PM +0200, Ian Kumlien wrote:
>>> Hi,
>>>
>>> Currently I have a raid1 configuration on two disks where one of them
>>> is failing.
>>>
>>> But since:
>>>
>>> btrfs fi df /mnt/disk/
>>> Data, RAID1: total=858.00GiB, used=638.16GiB
>>> Data, single: total=1.00GiB, used=256.00KiB
>>> System, RAID1: total=32.00MiB, used=132.00KiB
>>> Metadata, RAID1: total=4.00GiB, used=1.21GiB
>>> GlobalReserve, single: total=412.00MiB, used=0.00B
>>>
>>> There should be no problem in failing one disk... Or so I thought!
>>>
>>> btrfs dev delete /dev/sdb2 /mnt/disk/
>>> ERROR: error removing the device '/dev/sdb2' - unable to go below two
>>> devices on raid1
>>
>> dev delete is more like a reshaping operation in mdadm: it tries to
>> remove a device safely whilst retaining all of the redundancy
>> guarantees. You can't go down to one device with RAID-1 and still keep
>> the redundancy.
>>
>> dev delete is really for managed device removal under non-failure
>> conditions, not for error recovery.
>>
>>> And I can't issue a rebalance either, since it will just report errors
>>> until the failing disk dies.
>>>
>>> What's even more interesting is that I can't mount just the working
>>> disk - i.e. if the other disk *has* failed and is inaccessible...
>>> though, I haven't tried physically removing it...
>>
>> Physically removing it is the way to go (or disabling it using echo
>> offline >/sys/block/sda/device/state). Once you've done that, you can
>> mount the degraded FS with -odegraded, then either add a new device
>> and balance to restore the RAID-1, or balance with
>> -{d,m}convert=single to drop the redundancy to single.
>
> It's like you _must_ add a disk in this context, otherwise the volume
> will be rendered unmountable in the next mount cycle. The patch
> mentioned below has more details.

Which would mean that if the disk dies, you have an unusable disk.
(And in my case, adding a disk might not help, since it would try to
read from the broken one until it completely fails again.)

>>> mdadm has fail and remove, I assume for this reason - perhaps it's
>>> something that should be added?
>>
>> I think there should be a btrfs dev drop, which is the fail-like
>> operation: tell the FS that a device is useless and should be dropped
>> from the array, so the FS doesn't keep trying to write to it. That's
>> not implemented yet, though.
>
> There is a patch set to handle this:
> 'Btrfs: introduce function to handle device offline'

I'll have a look.

> Thanks, Anand
>
>> Hugo.
>>
* Re: [btrfs tools] ability to fail a device...
From: Ian Kumlien @ 2015-09-09 8:54 UTC
To: Anand Jain; +Cc: Hugo Mills, Btrfs BTRFS

On 9 September 2015 at 09:07, Ian Kumlien <ian.kumlien@gmail.com> wrote:
> On 9 September 2015 at 03:35, Anand Jain <anand.jain@oracle.com> wrote:
> > There is a patch set to handle this:
> > 'Btrfs: introduce function to handle device offline'
>
> I'll have a look.

So from my very quick look at the code that I could find (I can only
find patch set 3 for some reason), this would not fix it properly ;)

(Completely lost all its formatting, but:)

+	if ((rw_devices > 1) &&
+	    (degrade_option || tolerated_fail > missing)) {
+		btrfs_sysfs_rm_device_link(fs_devices, dev, 0);
+		__btrfs_put_dev_offline(dev);
+		return;
+	}

I think this has to be an evaluation of whether there is a "complete
copy" on the device(s) we have. If there is, then we can populate any
other device and the system should still be viable (this includes
things like 'doing the math' to replace missing disks in raid5 and
raid6, btw).

Do you have the patches somewhere? They don't seem to apply to 4.2
(I've been looking at line numbers).
Thread overview: 15+ messages
2015-09-08 19:18 [btrfs tools] ability to fail a device Ian Kumlien
2015-09-08 19:34 ` Hugo Mills
2015-09-08 19:43 ` Ian Kumlien
2015-09-08 19:55 ` Ian Kumlien
2015-09-08 20:00 ` Ian Kumlien
2015-09-08 20:08 ` Chris Murphy
2015-09-08 20:13 ` Ian Kumlien
2015-09-08 20:17 ` Chris Murphy
2015-09-08 20:24 ` Ian Kumlien
2015-09-08 20:28 ` Hugo Mills
2015-09-08 20:33 ` Ian Kumlien
2015-09-08 20:40 ` Hugo Mills
2015-09-09  1:35 ` Anand Jain
2015-09-09  7:07 ` Ian Kumlien
2015-09-09  8:54 ` Ian Kumlien