From: Omar Sandoval <osandov@osandov.com>
To: linux-btrfs@vger.kernel.org
Cc: Miao Xie <miaoxie@huawei.com>, Philip <bugzilla@philip-seeger.de>,
Omar Sandoval <osandov@osandov.com>
Subject: [PATCH 0/4] Btrfs: RAID 5/6 missing device replace
Date: Mon, 11 May 2015 00:58:11 -0700
Message-ID: <cover.1431330020.git.osandov@osandov.com>

A user reported on Bugzilla that they were seeing kernel BUGs when
attempting to replace a missing device on a RAID 6 array. After
identifying the apparent cause of the BUG, I concluded that there
wasn't a quick fix. Maybe Miao Xie can point out something that I
missed, as he originally implemented device replace on RAID 5/6 :)

Patch 4 has the details, but the main problem is that we can't create
bios for a missing device, so the main scrub code path isn't very
useful. On RAID 5/6, we only have one mirror for any piece of data, so
the missing device is the only mirror we could read from. Unless I
missed something, this case needs to be handled differently: the data
has to be rebuilt from the remaining stripes, which is what the new
BTRFS_RBIO_REBUILD_MISSING operation in patch 3 is for.
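
As a toy illustration of why a rebuild is possible at all (this is not
the kernel implementation): in a RAID 5 stripe the parity block is the
XOR of the data blocks, so a block that lived on the missing device can
be recomputed from the surviving blocks. The function names and byte
values below are made up for the example. The real code works on full
stripes in fs/btrfs/raid56.c, and RAID 6 adds a second, non-XOR syndrome
that this sketch ignores.

```sh
#!/bin/sh
# Toy RAID 5 stripe: two data bytes plus one parity byte (P = D0 xor D1).
# All names here are hypothetical; this is not kernel code.

# xor_parity A B: print the parity of two byte values.
xor_parity() {
	echo $(( $1 ^ $2 ))
}

# rebuild_missing SURVIVOR PARITY: recompute the lost stripe member.
# XOR is its own inverse, so this is the same operation as xor_parity.
rebuild_missing() {
	echo $(( $1 ^ $2 ))
}

d0=165	# 0xa5, stored on the device that goes missing
d1=60	# 0x3c, stored on a surviving device
p=$(xor_parity "$d0" "$d1")

# d0 can no longer be read, so reconstruct it from d1 and the parity.
recovered=$(rebuild_missing "$d1" "$p")
echo "parity=$p recovered=$recovered"
```

Running it prints "parity=153 recovered=165", i.e. the lost byte comes
back without ever reading the missing device.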

These patches are on top of v4.1-rc2. I ran the scrub and replace
xfstests and the script below, which also reproduces the original BUG.

Thanks!

Omar Sandoval (4):
  Btrfs: remove misleading handling of missing device scrub
  Btrfs: count devices correctly in readahead during RAID 5/6 replace
  Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation
  Btrfs: fix device replace of a missing RAID 5/6 device

 fs/btrfs/raid56.c |  87 +++++++++++++++++++++++++----
 fs/btrfs/raid56.h |  10 +++-
 fs/btrfs/reada.c  |   4 +-
 fs/btrfs/scrub.c  | 164 +++++++++++++++++++++++++++++++++++++++++++++---------
 4 files changed, 225 insertions(+), 40 deletions(-)

Testing script:
----
#!/bin/bash

USAGE="Usage: $0 [eio|missing] [raid0|raid1|raid5|raid6]"

if [ "$1" = "-h" ]; then
	echo "$USAGE"
	exit
fi

# Failure mode (default: missing) and RAID profile (default: raid5).
MODE="${1:-missing}"
RAID="${2:-raid5}"

case "$MODE" in
	eio|missing)
		;;
	*)
		echo "$USAGE" >&2
		exit 1
		;;
esac

case "$RAID" in
	raid[0156])
		;;
	*)
		echo "$USAGE" >&2
		exit 1
		;;
esac

NUM_DISKS=4		# disk images (and loop devices) to create
NUM_RAID_DISKS=3	# devices in the initial filesystem
SRC_DISK=1		# index of the device to fail
TARGET_DISK=3		# index of the replacement device
NUM_SECTORS=$((1024 * 1024))	# 512-byte sectors, i.e. 512 MiB per disk
LOOP_DEVICES=()
DM_DEVICES=()

cleanup () {
	echo "Done. Press enter to clean up..."
	read -r
	if findmnt /mnt; then
		umount /mnt
	fi
	for DM in "${DM_DEVICES[@]}"; do
		dmsetup remove "$DM"
	done
	for LOOP in "${LOOP_DEVICES[@]}"; do
		losetup --detach "$LOOP"
	done
	for ((i = 0; i < NUM_DISKS; i++)); do
		rm -f "disk${i}.img"
	done
}

trap 'cleanup; exit 1' ERR

echo "Creating disk images and loop devices..."
for ((i = 0; i < NUM_DISKS; i++)); do
	rm -f "disk${i}.img"
	# Sparse file: seek to the end without writing any data.
	dd if=/dev/zero of="disk${i}.img" bs=512 seek="$NUM_SECTORS" count=0
	LOOP_DEVICES+=("$(losetup --find --show "disk${i}.img")")
done

echo "Creating device-mapper devices..."
for LOOP in "${LOOP_DEVICES[@]}"; do
	# /dev/loopN becomes dmN, a linear mapping of the whole loop device.
	DM="${LOOP/\/dev\/loop/dm}"
	dmsetup create "$DM" --table "0 $NUM_SECTORS linear $LOOP 0"
	DM_DEVICES+=("$DM")
done
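
echo "Creating filesystem..."
# The first NUM_RAID_DISKS devices, as /dev/mapper/ paths.
FS_DEVICES=("${DM_DEVICES[@]:0:$NUM_RAID_DISKS}")
FS_DEVICES=("${FS_DEVICES[@]/#/\/dev\/mapper\/}")
# Mount through a device other than the one that is about to fail.
MOUNT_DEVICE="${FS_DEVICES[$(((SRC_DISK + 1) % NUM_RAID_DISKS))]}"
mkfs.btrfs -d "$RAID" -m "$RAID" "${FS_DEVICES[@]}"
mount "$MOUNT_DEVICE" /mnt
# Populate the filesystem with some data.
cp -r ~/xfstests /mnt
sync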

case "$MODE" in
	eio)
		echo "Killing disk..."
		# Swap the linear table for an error target so all I/O fails.
		dmsetup suspend "${DM_DEVICES[$SRC_DISK]}"
		dmsetup reload "${DM_DEVICES[$SRC_DISK]}" --table "0 $NUM_SECTORS error"
		dmsetup resume "${DM_DEVICES[$SRC_DISK]}"
		;;
	missing)
		echo "Removing disk and remounting degraded..."
		umount /mnt
		dmsetup remove "${DM_DEVICES[$SRC_DISK]}"
		# Quoted so the subscript is not subject to pathname expansion.
		unset "DM_DEVICES[$SRC_DISK]"
		mount -o degraded "$MOUNT_DEVICE" /mnt
		;;
esac
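
echo "Replacing disk..."
# The source is given by devid, which starts at 1, hence SRC_DISK + 1.
btrfs replace start -B $((SRC_DISK + 1)) /dev/mapper/"${DM_DEVICES[$TARGET_DISK]}" /mnt

echo "Scrubbing to double-check..."
# -B: stay in the foreground, -r: read-only scrub.
btrfs scrub start -Br /mnt

cleanup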
----
--
2.4.0