* abort device removal?
@ 2014-12-05 17:28 moparisthebest
2014-12-07 17:59 ` Marc MERLIN
From: moparisthebest @ 2014-12-05 17:28 UTC (permalink / raw)
To: linux-btrfs
Hello all,
Last night I added a 4TB device to my 6-device array and then ran the
command to remove one of the existing 4TB devices, which still worked
fine, leaving it to run overnight. Unfortunately, one of the OTHER
devices failed completely while this was happening. It *looks* like
btrfs did the right thing and stopped the move, except that the device
being removed is still shown with a size of 0 in btrfs fi show. The
delete command is still running, though iotop shows it isn't actually
reading or writing anything, and the lack of further move messages in
dmesg/kern.log seems to confirm that.
So what I think I *need* to do is re-add the drive it's currently trying
to remove, so that I can delete the other, now non-functioning drive
without losing any data. My fear is that after a reboot or an
unmount/remount, the drive currently being removed will also fail to
mount, causing me to lose everything.
Here is some relevant info from the system:
# uname -a
Linux mytorrentflux1 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
# btrfs --version
Btrfs v3.17.3
# btrfs fi show
Label: 'completed' uuid: 0d14bb0f-46cc-408e-9245-f06d50ec2da8
Total devices 7 FS bytes used 7.60TiB
devid 1 size 3.64TiB used 3.28TiB path /dev/mapper/fourtb1
devid 2 size 3.64TiB used 3.29TiB path /dev/mapper/fourtb2
devid 3 size 2.73TiB used 2.37TiB path /dev/mapper/threetb1
devid 5 size 1.82TiB used 1.82TiB path /dev/mapper/twotb1
devid 6 size 0.00B used 1.99TiB path /dev/mapper/fourtb3
devid 7 size 2.73TiB used 2.22TiB path /dev/mapper/threetb2
devid 8 size 3.64TiB used 240.29GiB path /dev/mapper/fourtb4
Btrfs v3.17.3
# btrfs fi df /mnt/completed/
Data, RAID10: total=6.26TiB, used=6.26TiB
Data, RAID1: total=1.33TiB, used=1.33TiB
System, RAID10: total=96.00MiB, used=852.00KiB
Metadata, RAID10: total=10.77GiB, used=9.90GiB
Metadata, RAID1: total=5.00GiB, used=4.37GiB
fourtb4 is the new drive I just added; fourtb3 is the working drive I
was trying to remove when threetb1 failed completely (smartctl can't
even read anything from threetb1's underlying device).
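
For anyone wanting to spot this state automatically, here is a minimal
sketch that filters `btrfs fi show` output for devices reported with a
size of zero. The function name `zero_size_devices` is made up for
illustration, and the sample lines are copied from the output above; it
only parses text and does not touch the filesystem.

```shell
#!/bin/sh
# Hypothetical helper: reads `btrfs fi show` output on stdin and prints
# "devid path used" for every device whose reported size is zero.
zero_size_devices() {
    awk '$1 == "devid" && $3 == "size" && $4 ~ /^0(\.0+)?B$/ {
        print $2, $NF, $6
    }'
}

# Sample input: three of the devid lines from the report above.
sample='devid 5 size 1.82TiB used 1.82TiB path /dev/mapper/twotb1
devid 6 size 0.00B used 1.99TiB path /dev/mapper/fourtb3
devid 7 size 2.73TiB used 2.22TiB path /dev/mapper/threetb2'

printf '%s\n' "$sample" | zero_size_devices
```

On a live system the same function could be fed with
`btrfs fi show | zero_size_devices`.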
dmesg/kern.log is too large to attach, so here are some important-looking
excerpts (these 3 lines are repeated often):
Dec 5 09:59:35 mytorrentflux1 kernel: [1549876.646751] btrfs: bdev /dev/mapper/threetb1 errs: wr 17599, rd 973, flush 0, corrupt 0, gen 0
Dec 5 09:59:35 mytorrentflux1 kernel: [1549877.022291] lost page write due to I/O error on /dev/mapper/threetb1
Dec 5 10:07:08 mytorrentflux1 kernel: [1550329.743294] btrfs_dev_stat_print_on_error: 264 callbacks suppressed
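
Since the log is too large to read by hand, a rough sketch like the
following could pull the per-device error counters out of the "errs:"
lines. The function name `btrfs_err_counters` is invented for this
example, and it assumes each log message sits on a single line, as it
does in kern.log before mail wrapping. `btrfs device stats <mountpoint>`
should report the same counters directly.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log lines on stdin and prints the
# btrfs per-device error counters found in "bdev ... errs:" messages.
btrfs_err_counters() {
    sed -n 's/.*bdev \([^ ]*\) errs: wr \([0-9]*\), rd \([0-9]*\), flush \([0-9]*\), corrupt \([0-9]*\), gen \([0-9]*\).*/\1 wr=\2 rd=\3 flush=\4 corrupt=\5 gen=\6/p'
}

# Sample input: the first excerpt above, joined onto one line.
logline='Dec 5 09:59:35 mytorrentflux1 kernel: [1549876.646751] btrfs: bdev /dev/mapper/threetb1 errs: wr 17599, rd 973, flush 0, corrupt 0, gen 0'

printf '%s\n' "$logline" | btrfs_err_counters
```

Piping the output through `tail -n 1` would show only the most recent
counters from a long log.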
I'd appreciate any help or guidance on this issue so that, hopefully, I
don't lose data.
Thanks much!
* Re: abort device removal?
From: Marc MERLIN @ 2014-12-07 17:59 UTC (permalink / raw)
To: moparisthebest; +Cc: linux-btrfs
On Fri, Dec 05, 2014 at 12:28:57PM -0500, moparisthebest wrote:
> Hello all,
>
> I had a 6-device array I added a 4tb device to last night and ran the
> command to remove a previous 4tb device that still worked fine
> overnight. Unfortunately, one of the OTHER devices completely failed
> while this was happening, and it *looks* like btrfs did the right thing
> and stopped the move, except it's still marked as 0 space in btrfs fi
> show. The delete command is still running, though iotop shows it's not
> actually reading or writing anything and no further moving messages in
> dmesg/kern.log seems to indicate that too.
>
> So what I think I *need* to do is re-add the drive it's currently trying
> to remove so I can delete the now non-functioning other drive without
> losing any data. My fear is a reboot or unmount/remount will fail to
> mount the currently-being-removed drive as well causing me to lose
> everything.
So I didn't try this, but my understanding is that remove actually runs
a rebalance to remove all the data from that drive.
If the rebalance didn't finish, the drive is still good and part of the
array.
Obviously, you'd be better off with a full backup, but my guess is that
you could just shut down, remove the failing drive, and leave all the
other drives in place.
Then run a rebalance and it should recreate the data that was on your
failed drive from the remaining copies (your RAID1/RAID10 profiles keep
mirrored copies rather than parity).
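
As an untested sketch of what that could look like on the command line:
the helper below only prints the commands and runs nothing. It assumes
the usual btrfs approach of mounting with the `degraded` option once the
dead drive is gone and then running `btrfs device delete missing`, which
relocates the missing device's data much like a rebalance. The function
name `recovery_plan` is hypothetical, the device and mount point are
taken from the report above, and none of this has been tried on an array
with a removal still in progress.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints (does NOT execute) the commands
# for dropping a missing device from a btrfs array.
#   $1 = any surviving member device, $2 = mount point
recovery_plan() {
    dev=$1
    mnt=$2
    # Mount without the failed drive present.
    printf 'mount -o degraded %s %s\n' "$dev" "$mnt"
    # Rebuild the missing device's data onto the remaining drives.
    printf 'btrfs device delete missing %s\n' "$mnt"
}

recovery_plan /dev/mapper/fourtb1 /mnt/completed
```

Printing the plan first leaves room to review it, and to take that
backup, before anything touches the array.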
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901