* btrfs fail behavior when a device vanishes
@ 2015-12-31 20:11 Chris Murphy
2015-12-31 20:17 ` Chris Murphy
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Chris Murphy @ 2015-12-31 20:11 UTC (permalink / raw)
To: Btrfs BTRFS
This is a torture test, no data is at risk.
Two devices, btrfs raid1 with some stuff on them.
Copy from that array, elsewhere.
During copy, yank the active device.
dmesg shows many of these:
[ 7179.373245] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 652123, rd 697237, flush 0, corrupt 0, gen 0
Why are the write errors nearly as high as the read errors, when there
is only a copy from this device happening?
Is Btrfs trying to write the read error count (for dev stats) of sdc1
onto sdc1, and that causes a write error?
Also, is there a command to make a block device go away? At least in
gnome shell when I eject a USB stick, it isn't just umounted, it no
longer appears with lsblk or blkid, so I'm wondering if there's a way
to vanish a misbehaving device so that Btrfs isn't bogged down with a
flood of retries.
In case anyone is curious, the entire dmesg from device insertion,
formatting, mounting, copying to then from, and device yanking is here
(should be permanent):
http://pastebin.com/raw/Wfe1pY4N
And the copy did successfully complete anyway, and the resulting files
have the same hashes as their originals. So, yay, despite the noisy
messages.
--
Chris Murphy
* Re: btrfs fail behavior when a device vanishes
2015-12-31 20:11 btrfs fail behavior when a device vanishes Chris Murphy
@ 2015-12-31 20:17 ` Chris Murphy
2015-12-31 20:24 ` Hugo Mills
2015-12-31 21:23 ` ronnie sahlberg
2 siblings, 0 replies; 10+ messages in thread
From: Chris Murphy @ 2015-12-31 20:17 UTC (permalink / raw)
To: Btrfs BTRFS
On Thu, Dec 31, 2015 at 1:11 PM, Chris Murphy <lists@colorremedies.com> wrote:
> Also, is there a command to make a block device go away?
Maybe?
echo 1 > /sys/block/device-name/device/delete
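A slightly fuller sketch, assuming a SCSI-backed device such as /dev/sdc (the device and host names below are placeholders):

# Force-detach a SCSI-backed block device so it disappears from lsblk/blkid.
DEV=sdc
sync                                        # flush whatever still can be flushed
echo 1 > /sys/block/$DEV/device/delete      # device is gone until rescan or replug

# To bring it back later without physically replugging, rescan the SCSI host
# it was attached to (host2 here is just an example):
# echo "- - -" > /sys/class/scsi_host/host2/scan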
--
Chris Murphy
* Re: btrfs fail behavior when a device vanishes
2015-12-31 20:11 btrfs fail behavior when a device vanishes Chris Murphy
2015-12-31 20:17 ` Chris Murphy
@ 2015-12-31 20:24 ` Hugo Mills
2015-12-31 20:43 ` Chris Murphy
2015-12-31 21:23 ` ronnie sahlberg
2 siblings, 1 reply; 10+ messages in thread
From: Hugo Mills @ 2015-12-31 20:24 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
On Thu, Dec 31, 2015 at 01:11:25PM -0700, Chris Murphy wrote:
> This is a torture test, no data is at risk.
>
> Two devices, btrfs raid1 with some stuff on them.
> Copy from that array, elsewhere.
> During copy, yank the active device.
>
> dmesg shows many of these:
>
> [ 7179.373245] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr
> 652123, rd 697237, flush 0, corrupt 0, gen 0
>
> Why are the write errors nearly as high as the read errors, when there
> is only a copy from this device happening?
I'm guessing completely here, but maybe it's trying to write
corrected data to sdc1, because the original read failed?
Hugo.
> Is Btrfs trying to write the read error count (for dev stats) of sdc1
> onto sdc1, and that causes a write error?
>
> Also, is there a command to make a block device go away? At least in
> gnome shell when I eject a USB stick, it isn't just umounted, it no
> longer appears with lsblk or blkid, so I'm wondering if there's a way
> to vanish a misbehaving device so that Btrfs isn't bogged down with a
> flood of retries.
>
> In case anyone is curious, the entire dmesg from device insertion,
> formatting, mounting, copying to then from, and device yanking is here
> (should be permanent):
> http://pastebin.com/raw/Wfe1pY4N
>
> And the copy did successfully complete anyway, and the resulting files
> have the same hashes as their originals. So, yay, despite the noisy
> messages.
>
>
--
Hugo Mills | Well, sir, the floor is yours. But remember, the
hugo@... carfax.org.uk | roof is ours!
http://carfax.org.uk/ |
PGP: E2AB1DE4 | The Goons
* Re: btrfs fail behavior when a device vanishes
2015-12-31 20:24 ` Hugo Mills
@ 2015-12-31 20:43 ` Chris Murphy
0 siblings, 0 replies; 10+ messages in thread
From: Chris Murphy @ 2015-12-31 20:43 UTC (permalink / raw)
To: Hugo Mills, Chris Murphy, Btrfs BTRFS
On Thu, Dec 31, 2015 at 1:24 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Thu, Dec 31, 2015 at 01:11:25PM -0700, Chris Murphy wrote:
>> This is a torture test, no data is at risk.
>>
>> Two devices, btrfs raid1 with some stuff on them.
>> Copy from that array, elsewhere.
>> During copy, yank the active device.
>>
>> dmesg shows many of these:
>>
>> [ 7179.373245] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr
>> 652123, rd 697237, flush 0, corrupt 0, gen 0
>>
>> Why are the write errors nearly as high as the read errors, when there
>> is only a copy from this device happening?
>
> I'm guessing completely here, but maybe it's trying to write
> corrected data to sdc1, because the original read failed?
>
Egads. OK that makes sense.
--
Chris Murphy
* Re: btrfs fail behavior when a device vanishes
2015-12-31 20:11 btrfs fail behavior when a device vanishes Chris Murphy
2015-12-31 20:17 ` Chris Murphy
2015-12-31 20:24 ` Hugo Mills
@ 2015-12-31 21:23 ` ronnie sahlberg
2016-01-01 1:09 ` ronnie sahlberg
2 siblings, 1 reply; 10+ messages in thread
From: ronnie sahlberg @ 2015-12-31 21:23 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
On Thu, Dec 31, 2015 at 12:11 PM, Chris Murphy <lists@colorremedies.com> wrote:
> This is a torture test, no data is at risk.
>
> Two devices, btrfs raid1 with some stuff on them.
> Copy from that array, elsewhere.
> During copy, yank the active device.
>
> dmesg shows many of these:
>
> [ 7179.373245] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr
> 652123, rd 697237, flush 0, corrupt 0, gen 0
For automated tests, a good approach could be to build a multi-device
btrfs filesystem on top of iSCSI devices: for example, STGT exporting
N volumes that you then log into over the loopback interface.
Then you can just use tgtadm to add / remove a device in a controlled
fashion, and to any filesystem it will look exactly as if you had
pulled the device physically.
This allows you to run fully automated and scripted "how long before
the filesystem goes into total data-loss mode" tests.
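A rough sketch of that setup (target names, backing files, and IDs below are only examples; it assumes tgtd and open-iscsi are installed and running):

# Export one file-backed LUN via STGT.
truncate -s 10G /var/tmp/disk1.img
tgtadm --lld iscsi --op new  --mode target      --tid 1 -T iqn.2016-01.test:disk1
tgtadm --lld iscsi --op new  --mode logicalunit --tid 1 --lun 1 -b /var/tmp/disk1.img
tgtadm --lld iscsi --op bind --mode target      --tid 1 -I ALL

# Log in over loopback; the LUN appears as an ordinary /dev/sdX.
iscsiadm -m discovery -t sendtargets -p 127.0.0.1
iscsiadm -m node -T iqn.2016-01.test:disk1 -p 127.0.0.1 --login

# "Yank" the device in a controlled, scriptable way (--force is needed while
# an initiator is still logged in) ...
tgtadm --lld iscsi --op delete --force --mode target --tid 1

# ... and later plug it back in again.
tgtadm --lld iscsi --op new  --mode target      --tid 1 -T iqn.2016-01.test:disk1
tgtadm --lld iscsi --op new  --mode logicalunit --tid 1 --lun 1 -b /var/tmp/disk1.img
tgtadm --lld iscsi --op bind --mode target      --tid 1 -I ALL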
If you want finer control than just plug/unplug on a live
filesystem, you can use
https://github.com/rsahlberg/flaky-stgt
Again, this uses iSCSI, but it allows you to script events such as
"this range of blocks now returns uncorrectable read errors", etc.,
to automatically stress-test that the filesystem can deal with it.
I created this STGT fork so that filesystem testers would have a way
to automate testing of their failure paths, in particular for btrfs,
which still seems to be incredibly fragile when devices fail or
disconnect.
Unfortunately I don't think anyone cared very much. :-(
Please BTRFS devs, please use something like this for testing of
failure modes and robustness. Please!
>
> Why are the write errors nearly as high as the read errors, when there
> is only a copy from this device happening?
>
> Is Btrfs trying to write the read error count (for dev stats) of sdc1
> onto sdc1, and that causes a write error?
>
> Also, is there a command to make a block device go away? At least in
> gnome shell when I eject a USB stick, it isn't just umounted, it no
> longer appears with lsblk or blkid, so I'm wondering if there's a way
> to vanish a misbehaving device so that Btrfs isn't bogged down with a
> flood of retries.
>
> In case anyone is curious, the entire dmesg from device insertion,
> formatting, mounting, copying to then from, and device yanking is here
> (should be permanent):
> http://pastebin.com/raw/Wfe1pY4N
>
> And the copy did successfully complete anyway, and the resulting files
> have the same hashes as their originals. So, yay, despite the noisy
> messages.
>
>
> --
> Chris Murphy
* Re: btrfs fail behavior when a device vanishes
2015-12-31 21:23 ` ronnie sahlberg
@ 2016-01-01 1:09 ` ronnie sahlberg
2016-01-01 1:27 ` Chris Murphy
0 siblings, 1 reply; 10+ messages in thread
From: ronnie sahlberg @ 2016-01-01 1:09 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
Here is a kludge I hacked up.
Someone who cares could clean this up and start building a proper
test suite or something.
This test script creates a 3-disk raid1 filesystem and very slowly
writes a large file onto the filesystem while, one by one, each disk
is disconnected and then reconnected in a loop.
It is fairly trivial to trigger data loss when devices are bounced like this.
You have to run the script as root because of the calls to [u]mount
and iscsiadm.
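In outline, the script does roughly the following (this is an illustrative sketch rather than the attached script itself; device paths, target names, and timings are assumptions):

#!/bin/bash
# Three iSCSI-backed disks in btrfs raid1, a slow large write, and each
# disk bounced in turn while the write is in flight.
set -e
DEVS=(/dev/sdb /dev/sdc /dev/sdd)
MNT=/mnt/btrfs-test

mkfs.btrfs -f -d raid1 -m raid1 "${DEVS[@]}"
mkdir -p "$MNT"
mount "${DEVS[0]}" "$MNT"

# Slow writer in the background: append roughly 1 MiB per second.
( for i in $(seq 1 300); do
    dd if=/dev/urandom of="$MNT/big.file" bs=1M count=1 \
       oflag=append conv=notrunc status=none
    sleep 1
  done ) &
WRITER=$!

# Bounce each device: log the iSCSI session out (device vanishes), then back in.
for t in iqn.2016-01.test:disk1 iqn.2016-01.test:disk2 iqn.2016-01.test:disk3; do
    iscsiadm -m node -T "$t" -p 127.0.0.1 --logout
    sleep 30
    iscsiadm -m node -T "$t" -p 127.0.0.1 --login
    sleep 30
done

wait "$WRITER"
btrfs device stats "$MNT"
umount "$MNT"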
On Thu, Dec 31, 2015 at 1:23 PM, ronnie sahlberg
<ronniesahlberg@gmail.com> wrote:
> On Thu, Dec 31, 2015 at 12:11 PM, Chris Murphy <lists@colorremedies.com> wrote:
>> This is a torture test, no data is at risk.
>>
>> Two devices, btrfs raid1 with some stuff on them.
>> Copy from that array, elsewhere.
>> During copy, yank the active device.
>>
>> dmesg shows many of these:
>>
>> [ 7179.373245] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr
>> 652123, rd 697237, flush 0, corrupt 0, gen 0
>
> For automated tests a good way could be to build a multi device btrfs filesystem
> ontop of it.
> For example STGT exporting n# volumes and then mount via the loopback interface.
> Then you could just use tgtadm to add / remove the device in a
> controlled fashion and to any filesystem it will look exactly like if
> you pulled the device physically.
>
> This allows you to run fully automated and scripted "how long before
> the filesystem goes into total dataloss mode" tests.
>
>
>
> If you want more fine control than just plug/unplug on a live
> filesystem , you can use
> https://github.com/rsahlberg/flaky-stgt
> Again, this uses iSCSI but it allows you to script event such as
> "this range of blocks are now Uncorrectable read error" etc.
> To automatically stress test that the filesystem can deal with it.
>
>
> I created this STGT fork so that filesystem testers would have a way
> to automate testing of their failure paths.
> In particular for BTRFS which seems to still be incredible fragile
> when devices fail or disconnect.
>
> Unfortunately I don't think anyone cared very much. :-(
> Please BTRFS devs, please use something like this for testing of
> failure modes and robustness. Please!
>
>
>
>>
>> Why are the write errors nearly as high as the read errors, when there
>> is only a copy from this device happening?
>>
>> Is Btrfs trying to write the read error count (for dev stats) of sdc1
>> onto sdc1, and that causes a write error?
>>
>> Also, is there a command to make a block device go away? At least in
>> gnome shell when I eject a USB stick, it isn't just umounted, it no
>> longer appears with lsblk or blkid, so I'm wondering if there's a way
>> to vanish a misbehaving device so that Btrfs isn't bogged down with a
>> flood of retries.
>>
>> In case anyone is curious, the entire dmesg from device insertion,
>> formatting, mounting, copying to then from, and device yanking is here
>> (should be permanent):
>> http://pastebin.com/raw/Wfe1pY4N
>>
>> And the copy did successfully complete anyway, and the resulting files
>> have the same hashes as their originals. So, yay, despite the noisy
>> messages.
>>
>>
>> --
>> Chris Murphy
[-- Attachment #2: test_0100_write_raid1_unplug.sh --]
[-- Type: application/x-sh, Size: 2302 bytes --]
[-- Attachment #3: functions.sh --]
[-- Type: application/x-sh, Size: 1923 bytes --]
* Re: btrfs fail behavior when a device vanishes
2016-01-01 1:09 ` ronnie sahlberg
@ 2016-01-01 1:27 ` Chris Murphy
2016-01-01 1:37 ` ronnie sahlberg
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Chris Murphy @ 2016-01-01 1:27 UTC (permalink / raw)
To: ronnie sahlberg; +Cc: Chris Murphy, Btrfs BTRFS
On Thu, Dec 31, 2015 at 6:09 PM, ronnie sahlberg
<ronniesahlberg@gmail.com> wrote:
> Here is a kludge I hacked up.
> Someone that cares could clean this up and start building a proper
> test suite or something.
>
> This test script creates a 3 disk raid1 filesystem and very slowly
> writes a large file onto the filesystem while, one by one each disk is
> disconnected then reconnected in a loop.
> It is fairly trivial to trigger dataloss when devices are bounced like this.
Yes, it's quite a torture test. I'd expect this to be a problem for
Btrfs at least until this feature is done:
https://btrfs.wiki.kernel.org/index.php/Project_ideas#Take_device_with_heavy_IO_errors_offline_or_mark_as_.22unreliable.22
And maybe this one too
https://btrfs.wiki.kernel.org/index.php/Project_ideas#False_alarm_on_bad_disk_-_rebuild_mitigation
Already we know that Btrfs tries to write indefinitely to missing
devices. If one reappears, what gets written? Will that device be
consistent? And then another device goes missing and comes back; now
there are possibly two devices with totally different states for
identical generations. It's a mess. We know this trivially causes
major corruption with btrfs raid1 if a user mounts e.g. devid 1
rw,degraded and modifies it; then mounts devid 2 (only) rw,degraded
and modifies it; and then mounts both devids together. Kablewy. Big
mess. And that's with a clean unmount between each of those steps,
not even an abrupt disconnect/reconnect.
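For reference, a minimal sketch of that split-brain sequence (device names are examples, the commands destroy whatever is on them, and each single-device degraded mount assumes the other device has been unplugged or deleted via sysfs first):

mkfs.btrfs -f -d raid1 -m raid1 /dev/sdb /dev/sdc
mount /dev/sdb /mnt && echo one > /mnt/f && umount /mnt

# devid 2 absent: mount only devid 1 degraded and modify it.
mount -o degraded /dev/sdb /mnt && echo two > /mnt/f && umount /mnt

# devid 1 absent: mount only devid 2 degraded and modify it.
mount -o degraded /dev/sdc /mnt && echo three > /mnt/f && umount /mnt

# Reattach both and mount normally: the devices now hold divergent
# contents for the same generations, and things fall apart.
mount /dev/sdb /mnt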
--
Chris Murphy
* Re: btrfs fail behavior when a device vanishes
2016-01-01 1:27 ` Chris Murphy
@ 2016-01-01 1:37 ` ronnie sahlberg
2016-01-01 1:45 ` ronnie sahlberg
2016-01-11 9:46 ` Anand Jain
2 siblings, 0 replies; 10+ messages in thread
From: ronnie sahlberg @ 2016-01-01 1:37 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
On Thu, Dec 31, 2015 at 5:27 PM, Chris Murphy <lists@colorremedies.com> wrote:
> On Thu, Dec 31, 2015 at 6:09 PM, ronnie sahlberg
> <ronniesahlberg@gmail.com> wrote:
>> Here is a kludge I hacked up.
>> Someone that cares could clean this up and start building a proper
>> test suite or something.
>>
>> This test script creates a 3 disk raid1 filesystem and very slowly
>> writes a large file onto the filesystem while, one by one each disk is
>> disconnected then reconnected in a loop.
>> It is fairly trivial to trigger dataloss when devices are bounced like this.
>
> Yes, it's quite a torture test. I'd expect this would be a problem for
> Btrfs until this feature is done at least:
>
> https://btrfs.wiki.kernel.org/index.php/Project_ideas#Take_device_with_heavy_IO_errors_offline_or_mark_as_.22unreliable.22
>
> And maybe this one too
> https://btrfs.wiki.kernel.org/index.php/Project_ideas#False_alarm_on_bad_disk_-_rebuild_mitigation
>
> Already we know that Btrfs tries to write indefinitely to missing
> devices.
Another question is how it handles writes when the mirror set
becomes degraded that way.
I would expect it to:
* immediately emergency-destage any dirty data in the write cache to
the surviving member disks.
* switch any future I/O to that mirror set to use ordered and
synchronous writes to the surviving members.
> If it reappears, what gets written? Will that device be
> consistent? And then another one goes missing, comes back, now
> possibly two devices with totally different states for identical
> generations. It's a mess. We know that trivially causes major
> corruption with btrfs raid1 if a user mounts e.g. devid1 rw,degraded
> modifies that; then mounts devid2 (only) rw,degraded and modifies it;
> and then mounts both devids together. Kablewy. Big mess. And that's
> umounting each one in between those steps; not even the abrupt
> disconnect/reconnect.
>
>
> --
> Chris Murphy
* Re: btrfs fail behavior when a device vanishes
2016-01-01 1:27 ` Chris Murphy
2016-01-01 1:37 ` ronnie sahlberg
@ 2016-01-01 1:45 ` ronnie sahlberg
2016-01-11 9:46 ` Anand Jain
2 siblings, 0 replies; 10+ messages in thread
From: ronnie sahlberg @ 2016-01-01 1:45 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
On Thu, Dec 31, 2015 at 5:27 PM, Chris Murphy <lists@colorremedies.com> wrote:
> On Thu, Dec 31, 2015 at 6:09 PM, ronnie sahlberg
> <ronniesahlberg@gmail.com> wrote:
>> Here is a kludge I hacked up.
>> Someone that cares could clean this up and start building a proper
>> test suite or something.
>>
>> This test script creates a 3 disk raid1 filesystem and very slowly
>> writes a large file onto the filesystem while, one by one each disk is
>> disconnected then reconnected in a loop.
>> It is fairly trivial to trigger dataloss when devices are bounced like this.
>
> Yes, it's quite a torture test. I'd expect this would be a problem for
> Btrfs until this feature is done at least:
>
> https://btrfs.wiki.kernel.org/index.php/Project_ideas#Take_device_with_heavy_IO_errors_offline_or_mark_as_.22unreliable.22
>
> And maybe this one too
> https://btrfs.wiki.kernel.org/index.php/Project_ideas#False_alarm_on_bad_disk_-_rebuild_mitigation
>
> Already we know that Btrfs tries to write indefinitely to missing
> devices. If it reappears, what gets written? Will that device be
> consistent? And then another one goes missing, comes back, now
> possibly two devices with totally different states for identical
> generations. It's a mess. We know that trivially causes major
> corruption with btrfs raid1 if a user mounts e.g. devid1 rw,degraded
> modifies that; then mounts devid2 (only) rw,degraded and modifies it;
> and then mounts both devids together. Kablewy. Big mess. And that's
> umounting each one in between those steps; not even the abrupt
> disconnect/reconnect.
Based on my test_0100... script, one could create a test script for
that scenario too.
Even if btrfs cannot handle it yet, it does not hurt to have tests
for scenarios that MUST work before the filesystem can officially be
called "stable + production".
Having these tests may even make the work of closing the robustness
gap easier, since the devs will have reproducible test scripts to
validate new features against.
* Re: btrfs fail behavior when a device vanishes
2016-01-01 1:27 ` Chris Murphy
2016-01-01 1:37 ` ronnie sahlberg
2016-01-01 1:45 ` ronnie sahlberg
@ 2016-01-11 9:46 ` Anand Jain
2 siblings, 0 replies; 10+ messages in thread
From: Anand Jain @ 2016-01-11 9:46 UTC (permalink / raw)
To: Chris Murphy, ronnie sahlberg; +Cc: Btrfs BTRFS
> Already we know that Btrfs tries to write indefinitely to missing
> devices.
(sorry for the late reply, now back from vacation).
The patch below and its related series will take care of it: when
critical IO fails, it can bring the device to an offline / failed
state, so that further IO to the device is prevented.
[PATCH 07/15] btrfs: introduce device dynamic state transition to
offline or failed
Further, this could provide a btrfs sysfs user interface so that
external device-error-monitoring scripts can bring the device
offline / failed (we still need to settle the sysfs framework and the
patchset that adds that interface).
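Purely as an illustration of the idea (the sysfs attribute below is hypothetical and does not exist; it only shows how an external monitoring script might flag a device once the framework is settled):

# HYPOTHETICAL interface, for illustration only:
FSID=<filesystem-uuid>
echo offline > /sys/fs/btrfs/$FSID/devices/sdc1/state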
> If it reappears, what gets written? Will that device be
> consistent?
Yep, that part of the error handling isn't present. The workaround
is to remount (sorry if remount isn't suitable for certain setups).
However, this kind of user-involved recovery is safe against
intermittently failing devices, which can otherwise lead to the messy
situation you mentioned.
Thanks, Anand
Thread overview: 10+ messages:
2015-12-31 20:11 btrfs fail behavior when a device vanishes Chris Murphy
2015-12-31 20:17 ` Chris Murphy
2015-12-31 20:24 ` Hugo Mills
2015-12-31 20:43 ` Chris Murphy
2015-12-31 21:23 ` ronnie sahlberg
2016-01-01 1:09 ` ronnie sahlberg
2016-01-01 1:27 ` Chris Murphy
2016-01-01 1:37 ` ronnie sahlberg
2016-01-01 1:45 ` ronnie sahlberg
2016-01-11 9:46 ` Anand Jain