degraded raid scribbling upon wrong device

linux-btrfs.vger.kernel.org archive mirror

* degraded raid scribbling upon wrong device
From: Adam Borowski @ 2017-07-13  6:40 UTC
  To: linux-btrfs


Hi!
Here's a set of test cases; two of them sometimes seem to scribble upon
the wrong device:

* deg-mid-missing
* deg-last-replaced (not on the innocent "re")
* but never deg-last-missing

When all goes ok, there are no errors other than wrong generation on the
re-added disk (expected).   When it goes bad, there's a lot of corruption.
In all cases, though, the "Device missing:" field is wrong.
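
A minimal sketch of checking that field mechanically rather than by eye.
It assumes the "Overall:" section of "btrfs fi us" prints a line of the form
"Device missing: <size>"; the function names device_missing and
missing_is_reported are made up for illustration and are not part of
btrfs-progs.

```shell
# Print the value of the "Device missing:" field, reading the output of
# `btrfs fi us` on stdin. Prints nothing if the field is absent.
device_missing() {
    awk -F: '/^[ \t]*Device missing:/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Succeed if the field reports a non-zero size, fail if it is zero or absent.
# "0" covers raw-byte output, "0.00B" the human-readable form.
missing_is_reported() {
    case "$(device_missing)" in
        ''|0|0.00B) return 1 ;;
        *) return 0 ;;
    esac
}
```

In the degraded step of the scripts this could be used as
"btrfs fi us /mnt/vol1 | missing_is_reported || echo 'Device missing looks bogus'".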

I'm not yet sure how to trigger this, perhaps someone would have a clue?

8:30am, hitting the sack, will try again tomorrow.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢠⠒⠀⣿⡁ A dumb species has no way to open a tuna can.
⢿⡄⠘⠷⠚⠋⠀ A smart species invents a can opener.
⠈⠳⣄⠀⠀⠀⠀ A master species delegates.

[-- Attachment #2: deg-mid-missing --]
[-- Type: text/plain, Size: 818 bytes --]

#!/bin/sh
set -e
set -x

umount /mnt/vol1 ||:
losetup -D

dd if=/dev/zero bs=1048576 count=1 seek=4095 of=ra
dd if=/dev/zero bs=1048576 count=1 seek=4095 of=rb
dd if=/dev/zero bs=1048576 count=1 seek=4095 of=rc
dd if=/dev/zero bs=1048576 count=1 seek=4095 of=rd

mkfs.btrfs -draid1 -mraid1 ra rb rc rd

losetup -D
losetup -f ra
losetup -f rb
losetup -f rc
losetup -f rd
sleep 1
mount /dev/loop0 /mnt/vol1
cp -pr /bin /mnt/vol1
btrfs fi sync /mnt/vol1
btrfs fi us /mnt/vol1
umount /mnt/vol1

# Degraded phase: re-attach every image except rc (the middle device).
losetup -D
losetup -f ra
losetup -f rb
losetup -f rd
sleep 1
mount -odegraded /dev/loop0 /mnt/vol1
btrfs fi us /mnt/vol1
dd if=/dev/zero of=/mnt/vol1/foo bs=1048576 count=2222
umount /mnt/vol1

losetup -D
losetup -f ra
losetup -f rb
losetup -f rc
losetup -f rd
sleep 1
mount /dev/loop0 /mnt/vol1
btrfs scrub start -B /mnt/vol1

[-- Attachment #3: deg-last-missing --]
[-- Type: text/plain, Size: 818 bytes --]

#!/bin/sh
set -e
set -x

umount /mnt/vol1 ||:
losetup -D

dd if=/dev/zero bs=1048576 count=1 seek=4095 of=ra
dd if=/dev/zero bs=1048576 count=1 seek=4095 of=rb
dd if=/dev/zero bs=1048576 count=1 seek=4095 of=rc
dd if=/dev/zero bs=1048576 count=1 seek=4095 of=rd

mkfs.btrfs -draid1 -mraid1 ra rb rc rd

losetup -D
losetup -f ra
losetup -f rb
losetup -f rc
losetup -f rd
sleep 1
mount /dev/loop0 /mnt/vol1
cp -pr /bin /mnt/vol1
btrfs fi sync /mnt/vol1
btrfs fi us /mnt/vol1
umount /mnt/vol1

# Degraded phase: re-attach every image except rd (the last device).
losetup -D
losetup -f ra
losetup -f rb
losetup -f rc
sleep 1
mount -odegraded /dev/loop0 /mnt/vol1
btrfs fi us /mnt/vol1
dd if=/dev/zero of=/mnt/vol1/foo bs=1048576 count=2222
umount /mnt/vol1

losetup -D
losetup -f ra
losetup -f rb
losetup -f rc
losetup -f rd
sleep 1
mount /dev/loop0 /mnt/vol1
btrfs scrub start -B /mnt/vol1

[-- Attachment #4: deg-last-replaced --]
[-- Type: text/plain, Size: 883 bytes --]

#!/bin/sh
set -e
set -x

umount /mnt/vol1 ||:
losetup -D

dd if=/dev/zero bs=1048576 count=1 seek=4095 of=ra
dd if=/dev/zero bs=1048576 count=1 seek=4095 of=rb
dd if=/dev/zero bs=1048576 count=1 seek=4095 of=rc
dd if=/dev/zero bs=1048576 count=1 seek=4095 of=rd
dd if=/dev/zero bs=1048576 count=1 seek=4095 of=re

mkfs.btrfs -draid1 -mraid1 ra rb rc rd

losetup -D
losetup -f ra
losetup -f rb
losetup -f rc
losetup -f rd
sleep 1
mount /dev/loop0 /mnt/vol1
cp -pr /bin /mnt/vol1
btrfs fi sync /mnt/vol1
btrfs fi us /mnt/vol1
umount /mnt/vol1

# Degraded phase: attach ra, rb, rc and the blank image re in place of rd.
losetup -D
losetup -f ra
losetup -f rb
losetup -f rc
losetup -f re
sleep 1
mount -odegraded /dev/loop0 /mnt/vol1
btrfs fi us /mnt/vol1
dd if=/dev/zero of=/mnt/vol1/foo bs=1048576 count=2222
umount /mnt/vol1

losetup -D
losetup -f ra
losetup -f rb
losetup -f rc
losetup -f rd
sleep 1
mount /dev/loop0 /mnt/vol1
btrfs scrub start -B /mnt/vol1


* Re: degraded raid scribbling upon wrong device
From: Adam Borowski @ 2017-07-22 20:36 UTC
  To: linux-btrfs

On Thu, Jul 13, 2017 at 08:40:12AM +0200, Adam Borowski wrote:
> Here's a set of test cases; two of them sometimes seem to scribble upon
> the wrong device:
> 
> * deg-mid-missing
> * deg-last-replaced (not on the innocent "re")
> * but never deg-last-missing
> 
> When all goes ok, there are no errors other than wrong generation on the
> re-added disk (expected).   When it goes bad, there's a lot of corruption.
> In all cases, though, the "Device missing:" field is wrong.

I haven't explored this adequately yet, in good part because ENOSPC kept
triggering much of the time for an unrelated reason that Omar just fixed
(thanks!).  So, here's what I know so far:

* copying in, say, 2.2GB of /usr/share is a lot more likely to trigger the bug
  than dd-ing 2.2GB of /dev/zero
* no "real" degrading is needed: in the original scripts, the missing device
  is empty so all blocks are doubled anyway.  The problem is not degraded
  chunks but a bogus device.
* bogus output of "btrfs fi us" is a sure predictor that, with enough tries,
  you'll get corruption -- if it shows something when it should say
  "missing", shit is likely to happen
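
Since the corruption only shows up "with enough tries", a small retry
wrapper helps.  This is only a sketch: first_failure is a made-up helper,
and it assumes the command you hand it exits non-zero on a bad run (whether
a given scrub exits non-zero depends on what it found, so a wrapper that
inspects the output may be needed).

```shell
# Run the given command up to N times.  On the first failing run, print
# its 1-based index and return 1; return 0 if every run succeeded.
# Usage: first_failure N command [args...]
first_failure() {
    tries=$1
    shift
    i=1
    while [ "$i" -le "$tries" ]; do
        if ! "$@"; then
            echo "$i"
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}
```

For example, "first_failure 20 ./deg-mid-missing" would stop at the first
run in which the script itself fails (it uses set -e).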

Meow!
