* USB upgrade fun
From: Kai Hendry @ 2017-10-08 9:58 UTC
To: linux-btrfs
Hi there,
My /mnt/raid1 suddenly became full (not entirely unexpectedly), so I bought two new 4TB USB hard drives (one WD, one Seagate) to upgrade to.
After adding sde and sdd I started to see errors in dmesg [2].
https://s.natalian.org/2017-10-07/raid1-newdisks.txt
[2] https://s.natalian.org/2017-10-07/btrfs-errors.txt
I assumed it perhaps had to do with the USB bus on my NUC5CPYB being maxed out, so to expedite the sync I tried to remove one of the older 2TB drives, sdc1. However, the load went crazy and my system became completely unstable. I tried to shut the machine down, and after an hour I hard powered it off since it seemed to have hung (it's headless).
Sidenote: I've since learnt that removing a drive actually deletes the
contents of the drive? I don't want that. I was hoping to put that drive
into cold storage. How do I remove a drive without losing data from a
RAID1 configuration?
After a reboot the machine failed to come up, mainly because "nofail" wasn't in my fstab and systemd is pedantic by default. After managing to boot into my system without /mnt/raid1, I faced these "open ctree failed" issues.
After running btrfs check on all the drives and getting nowhere, I decided to unplug the new drives, and I discovered that with the new 4TB WD drive removed I could mount the filesystem with -o degraded.
The dmesg errors with the WD include "My Passport" Wrong diagnostic page; asked for 1 got 8 and "Failed to get diagnostic page 0xffffffea", which raised my suspicions. The model number, by the way, is WDBYFT0040BYI-WESN.
Anyway, I'm back up and running with 2x2TB (one of them didn't finish
removing, I don't know which) & 1x4TB.
[1] https://s.natalian.org/2017-10-08/usb-btrfs.txt
I've decided to send the WD back for a refund and to keep the 2x2TB drives in RAID1 with the new 1x4TB disk, so 4TB total. btrfs now complains of "Some devices missing" [1]. How do I fix this situation?
Any tips on how to identify these individual disks? hdparm -I isn't a lot to go on.
[hendry@nuc ~]$ sudo hdparm -I /dev/sdb1 | grep Model
Model Number: ST4000LM024-2AN17V
[hendry@nuc ~]$ sudo hdparm -I /dev/sdc1 | grep Model
Model Number: ST2000LM003 HN-M201RAD
[hendry@nuc ~]$ sudo hdparm -I /dev/sdd1 | grep Model
Model Number: ST2000LM005 HN-M201AAD
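
For reference, a couple of other standard ways to tie a /dev/sdX name back to a physical drive (the exact columns available depend on the util-linux version, and the mount point is simply the one used above):

$ ls -l /dev/disk/by-id/ | grep -v part     # persistent names that embed model and serial
$ lsblk -d -o NAME,MODEL,SERIAL,SIZE        # one row per whole disk
$ sudo btrfs filesystem show /mnt/raid1     # maps each btrfs devid to its current /dev node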
Ok, thanks. Hope you can guide me,
* Re: USB upgrade fun
From: Satoru Takeuchi @ 2017-10-10 2:06 UTC
To: Kai Hendry; +Cc: linux-btrfs

At Sun, 08 Oct 2017 17:58:10 +0800, Kai Hendry wrote:
> Hi there,
>
> My /mnt/raid1 suddenly became full (not entirely unexpectedly), so I bought two new 4TB USB hard drives (one WD, one Seagate) to upgrade to.
>
> After adding sde and sdd I started to see errors in dmesg [2].
> https://s.natalian.org/2017-10-07/raid1-newdisks.txt
> [2] https://s.natalian.org/2017-10-07/btrfs-errors.txt

These messages are harmless. Qu is tackling this problem.

> Sidenote: I've since learnt that removing a drive actually deletes the contents of the drive? I don't want that. I was hoping to put that drive into cold storage. How do I remove a drive without losing data from a RAID1 configuration?

Please let me clarify what you said. Are you worried about losing filesystem data on the removed device, in this case /dev/sdc1? To be more specific, if /mnt/raid1/file is on /dev/sdc1, will you lose this file by removing that device?

If so, don't worry. When /dev/sdc1 is removed, the filesystem data that exists on it is moved to the other devices: /dev/sdb1, /dev/sdd1, or /dev/sde1.

Just FYI, `btrfs replace /dev/sdc1 /dev/sdd1 /mnt/raid1` is more suitable in your case.

> I assumed it perhaps had to do with the USB bus on my NUC5CPYB being maxed out, so to expedite the sync I tried to remove one of the older 2TB drives, sdc1. However, the load went crazy and my system became completely unstable. I tried to shut the machine down, and after an hour I hard powered it off since it seemed to have hung (it's headless).

That is because all the data on /dev/sdc1, in this case a total of 1.81TiB (data) + 6.00GiB (metadata) + 32MiB (system), has to be moved to the remaining devices.

> I've decided to send the WD back for a refund and to keep the 2x2TB drives in RAID1 with the new 1x4TB disk, so 4TB total. btrfs now complains of "Some devices missing" [1]. How do I fix this situation?

Probably `btrfs device remove missing /mnt/raid1` works.

Thanks,
Satoru
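
As a syntax note, current btrfs-progs spell the replace operation with an explicit "start" subcommand and let you watch its progress; a rough sketch using the device names from Satoru's example (adjust to your own layout):

$ sudo btrfs replace start /dev/sdc1 /dev/sdd1 /mnt/raid1    # copy sdc1's data onto sdd1, then drop sdc1
$ sudo btrfs replace status /mnt/raid1                       # reports percentage complete while it runs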
* Re: USB upgrade fun
From: Kai Hendry @ 2017-10-11 4:18 UTC
To: Satoru Takeuchi; +Cc: linux-btrfs

On Tue, 10 Oct 2017, at 10:06 AM, Satoru Takeuchi wrote:
> Probably `btrfs device remove missing /mnt/raid1` works.

That command worked. It took a really long time, but it worked.

However, when I unmounted /mnt/raid1 and tried mounting it again, it failed! :(
https://s.natalian.org/2017-10-11/btrfs.txt

mount: /mnt/raid1: wrong fs type, bad option, bad superblock on /dev/sdb1, missing codepage or helper program, or other error.

"open_ctree failed" is back... sigh... Any tips?

I'm going to check the disks one by one and I'll reboot the server a little later.
* Re: USB upgrade fun
From: Kai Hendry @ 2017-10-12 2:27 UTC
To: Satoru Takeuchi; +Cc: linux-btrfs

A guy on #btrfs suggests:

15:09 <multicore> hendry: super_total_bytes 8001581707264 mismatch with fs_devices total_rw_bytes 8001581710848 that one is because unaligned partitions, 4.12 - 4.13 kernels are affected (at least some versions)

However, I rebooted into 4.9.54-1-lts and I have the same issue:

super_total_bytes 8001581707264 mismatch with fs_devices total_rw_bytes 8001581710848
https://s.natalian.org/2017-10-12/1507775104_2548x1398.png

Any ideas what I can do now? I am getting rather nervous!

Btw, I checked with sudo btrfs check -p /dev/sd{b,c,d}1 and they are all fine.

I just can't mount my data! :(

Hope you can help me,
* Re: USB upgrade fun
From: Chris Murphy @ 2017-10-12 17:19 UTC
To: Kai Hendry; +Cc: Satoru Takeuchi, Btrfs BTRFS

On Thu, Oct 12, 2017 at 3:27 AM, Kai Hendry <hendry@iki.fi> wrote:
> A guy on #btrfs suggests:
>
> 15:09 <multicore> hendry: super_total_bytes 8001581707264 mismatch with fs_devices total_rw_bytes 8001581710848 that one is because unaligned partitions, 4.12 - 4.13 kernels are affected (at least some versions)
>
> However, I rebooted into 4.9.54-1-lts and I have the same issue:
> super_total_bytes 8001581707264 mismatch with fs_devices total_rw_bytes 8001581710848
>
> https://s.natalian.org/2017-10-12/1507775104_2548x1398.png
>
> Any ideas what I can do now? I am getting rather nervous!
>
> Btw, I checked with sudo btrfs check -p /dev/sd{b,c,d}1 and they are all fine.
>
> I just can't mount my data! :(

I should have read the whole thread. Sad.

OK, this is a new problem I haven't heard of before. So maybe yet another regression. *sigh*

What do you get for

btrfs insp dump-s -f <anydev>
btrfs-debug-tree -t 3 <anydev>

--
Chris Murphy
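
For anyone not fluent in the abbreviations, those two commands should correspond, on reasonably recent btrfs-progs, to the long-form invocations below; tree 3 is the chunk tree, and any member device of the filesystem can be given:

$ sudo btrfs inspect-internal dump-super -f /dev/sdb1 > dump-super.txt
$ sudo btrfs inspect-internal dump-tree -t 3 /dev/sdb1 > chunk-tree.txt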
* Re: USB upgrade fun
From: Chris Murphy @ 2017-10-12 16:57 UTC
To: Kai Hendry; +Cc: Btrfs BTRFS

On Sun, Oct 8, 2017 at 10:58 AM, Kai Hendry <hendry@iki.fi> wrote:
> Hi there,
>
> My /mnt/raid1 suddenly became full (not entirely unexpectedly), so I bought two new 4TB USB hard drives (one WD, one Seagate) to upgrade to.
>
> After adding sde and sdd I started to see errors in dmesg [2].
> https://s.natalian.org/2017-10-07/raid1-newdisks.txt
> [2] https://s.natalian.org/2017-10-07/btrfs-errors.txt

I'm not sure what the call traces mean exactly, but they seem non-fatal. The entire dmesg might be useful to see if there are device or bus related errors.

I have a similarly modeled NUC and I can tell you for sure it does not provide enough USB bus power for 2.5" laptop drives. They must be externally powered, or you need a really good USB hub with an even better power supply that can handle e.g. 4 drives at the same time to bus power them. I had lots of problems before I fixed this, but Btrfs managed to recover gracefully once I solved the power issue.

> I assumed it perhaps had to do with the USB bus on my NUC5CPYB being maxed out, so to expedite the sync I tried to remove one of the older 2TB drives, sdc1. However, the load went crazy and my system became completely unstable. I tried to shut the machine down, and after an hour I hard powered it off since it seemed to have hung (it's headless).

I've noticed recent kernels hanging under trivial scrub and balance with hard drives. The operations do complete, but the system is really laggy and sometimes unresponsive to anything else unless the operation is cancelled. I haven't had time to do regression testing. My assertion about this, and the versions I think it started with, is in the archives.

> Sidenote: I've since learnt that removing a drive actually deletes the contents of the drive? I don't want that. I was hoping to put that drive into cold storage. How do I remove a drive without losing data from a RAID1 configuration?

I'm pretty sure, but not certain, of the following: device delete/remove replicates chunk by chunk, CoW style. The entire operation is not atomic; the chunk operations themselves are atomic. I expect that metadata is updated as each chunk is properly replicated, so I don't think what you want is possible.

Again, pretty sure about this too, but not certain: device replace is an atomic operation, the whole thing succeeds or fails, and at the end merely the Btrfs signature is wiped from the deleted device(s). So you could restore that signature and the device would be valid again; HOWEVER, it's going to have the same volume UUID as the new devices. Even though the device UUIDs are unique and should prevent confusion, maybe confusion is possible.

A better way, which currently doesn't exist, is to make the raid1 a seed device, then add two new devices and remove the seed. That way you get the replication you want: the instant the sprout is mounted rw it can be used in production (all changes go to the sprout), while the chunks from the seed are replicated. The reason this isn't viable right now is that the tools aren't mature enough to handle multiple devices yet. Otherwise, with a single-device seed and a single sprout, this works and would be the way to do what you want.

A better way that does exist is to set up an overlay for the two original devices. Mount the overlay devices, add the new devices, delete the overlays. The overlay devices get the writes that would cause those devices to be invalidated; the original devices aren't really touched. There's a way to do this with dmsetup, like how live boot media work, and there's another way I haven't ever used before that's described here:

https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file

> After a reboot the machine failed to come up, mainly because "nofail" wasn't in my fstab and systemd is pedantic by default. After managing to boot into my system without /mnt/raid1, I faced these "open ctree failed" issues.
>
> Anyway, I'm back up and running with 2x2TB (one of them didn't finish removing, I don't know which) & 1x4TB.

Be aware that you are likely in a very precarious position now. Anytime raid1 volumes are mounted rw,degraded, one or more of the devices will end up with new empty single chunks (there is a patch to prevent this; I'm not sure if it's in 4.13). The consequence of these new empty single chunks is that they will prevent any subsequent degraded rw mount. You get a one-time degraded,rw; any subsequent attempt will require ro,degraded to get it to mount. If you end up snared in this, there are patches in the archives to inhibit the kernel's protection and allow mounting of such volumes. Super annoying. You'll have to build a custom kernel.

My opinion is you should update backups before you do anything else, just in case.

Next, you have to figure out a way to get all the devices used in this volume healthy. Tricky, as you technically have a 4-device raid1 with a missing device. I propose first checking whether you have single chunks with either 'btrfs fi us' or 'btrfs fi df', and if so, getting rid of them with a filtered balance 'btrfs balance start -mconvert=raid1,soft -dconvert=raid1,soft'; then, in theory, you should be able to do 'btrfs device delete missing' to end up with a valid three-device btrfs raid1, which you can use until you get your USB power supply issues sorted.

So, I have a lot of nausea and something of a fever right now as I'm writing this; you should definitely not trust anything I've said at face value. Except the backup-now business. That's probably good advice.

--
Chris Murphy
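
Spelled out, the recovery sequence Chris sketches would look roughly like this, assuming the array is mountable at /mnt/raid1; the "soft" filter only rewrites chunks that are not already raid1, so re-running after an interruption is cheap:

$ sudo btrfs filesystem usage /mnt/raid1    # look for stray "single" chunks
$ sudo btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/raid1
$ sudo btrfs device delete missing /mnt/raid1    # drop the phantom fourth device
$ sudo btrfs filesystem usage /mnt/raid1    # confirm everything is raid1 again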
* Re: USB upgrade fun
From: Austin S. Hemmelgarn @ 2017-10-12 17:19 UTC
To: Chris Murphy, Kai Hendry; +Cc: Btrfs BTRFS

On 2017-10-12 12:57, Chris Murphy wrote:
> I'm not sure what the call traces mean exactly, but they seem non-fatal. The entire dmesg might be useful to see if there are device or bus related errors.
>
> I have a similarly modeled NUC and I can tell you for sure it does not provide enough USB bus power for 2.5" laptop drives. They must be externally powered, or you need a really good USB hub with an even better power supply that can handle e.g. 4 drives at the same time to bus power them. I had lots of problems before I fixed this, but Btrfs managed to recover gracefully once I solved the power issue.

Same here on a pair of 3-year-old NUCs. Based on the traces and the other information, I'd be willing to bet this is probably the root cause of the issues.

> I'm pretty sure, but not certain, of the following: device delete/remove replicates chunk by chunk, CoW style. The entire operation is not atomic; the chunk operations themselves are atomic. I expect that metadata is updated as each chunk is properly replicated, so I don't think what you want is possible.

This is correct. Deleting a device first marks that device as zero size so nothing tries to allocate data there, and then runs a balance operation to force chunks onto other devices (I'm not sure if it only moves chunks that are on the device being removed, though). This results in two particularly important differences from most other RAID systems:

1. The device being removed is functionally wiped (it will appear to be empty), but not physically wiped (most of the data is still there, you just can't get to it through BTRFS).
2. The process as a whole is not atomic, but as a result of how it works, it is generally possible to restart it if it got stopped part way through (and you won't usually lose much progress).

That said, even if it were technically possible to remove the drive without messing things up, it would be of limited utility. You couldn't later reconnect it and expect things to just work (you would have generation mismatches, which would hopefully cause the old disk to effectively be updated to match the new one, _IF_ the old disk even registered properly as part of the filesystem), and it would be non-trivial to get data off of it safely too (you would have to connect it to a different system, and hope that BTRFS doesn't choke on half a filesystem).

> Again, pretty sure about this too, but not certain: device replace is an atomic operation, the whole thing succeeds or fails, and at the end merely the Btrfs signature is wiped from the deleted device(s). So you could restore that signature and the device would be valid again; HOWEVER, it's going to have the same volume UUID as the new devices. Even though the device UUIDs are unique and should prevent confusion, maybe confusion is possible.

Also correct. This is part of why it's preferred to use the replace command instead of deleting and then adding a device to replace it (the other reason being that it's significantly more efficient, especially if the filesystem isn't full).

> A better way, which currently doesn't exist, is to make the raid1 a seed device, then add two new devices and remove the seed. That way you get the replication you want: the instant the sprout is mounted rw it can be used in production (all changes go to the sprout), while the chunks from the seed are replicated. The reason this isn't viable right now is that the tools aren't mature enough to handle multiple devices yet. Otherwise, with a single-device seed and a single sprout, this works and would be the way to do what you want.

Indeed, although it's worth noting that even with a single seed and single sprout, things aren't as well tested as most of the rest of BTRFS.

> A better way that does exist is to set up an overlay for the two original devices. Mount the overlay devices, add the new devices, delete the overlays. The overlay devices get the writes that would cause those devices to be invalidated; the original devices aren't really touched. There's a way to do this with dmsetup, like how live boot media work, and there's another way I haven't ever used before that's described here:
>
> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file

Using block-level overlays with BTRFS is probably a bad idea for the same reasons that block-level copies are a bad idea, even with the dmsetup methods (also, most live boot media do it at the filesystem level, not the block level; it's safer and more efficient that way). Your safest bet is probably seed devices, though that of course is not very well documented.

> Be aware that you are likely in a very precarious position now. Anytime raid1 volumes are mounted rw,degraded, one or more of the devices will end up with new empty single chunks (there is a patch to prevent this; I'm not sure if it's in 4.13).
>
> My opinion is you should update backups before you do anything else, just in case.
>
> Next, you have to figure out a way to get all the devices used in this volume healthy. Tricky, as you technically have a 4-device raid1 with a missing device. I propose first checking whether you have single chunks with either 'btrfs fi us' or 'btrfs fi df', and if so, getting rid of them with a filtered balance 'btrfs balance start -mconvert=raid1,soft -dconvert=raid1,soft'; then, in theory, you should be able to do 'btrfs device delete missing' to end up with a valid three-device btrfs raid1, which you can use until you get your USB power supply issues sorted.

I absolutely concur with Chris here: get your backups updated, and then worry about repairing the filesystem. Or, alternatively, get your backups updated and then nuke the filesystem and rebuild it from scratch (this may be more work, but it's guaranteed to work).
* Re: USB upgrade fun
From: Kai Hendry @ 2017-10-13 1:42 UTC
To: Austin S. Hemmelgarn, Chris Murphy; +Cc: Btrfs BTRFS

Thank you Austin & Chris for your replies!

On Fri, 13 Oct 2017, at 01:19 AM, Austin S. Hemmelgarn wrote:
> Same here on a pair of 3-year-old NUCs. Based on the traces and the other information, I'd be willing to bet this is probably the root cause of the issues.

It probably is... since when I remove my new 4TB USB disk from the front, I am at least able to mount my two 2TB drives in degraded mode and see my data!

So I am not quite sure what to do now. I don't trust USB hubs.

On a different NUC I've noticed I can't charge my iPhone anymore!
https://mail-archive.com/linux-usb@vger.kernel.org/msg95231.html

So... is there any end in sight for the "USB power" problem? Does USB-C / Thunderbolt address this issue? :(

I'll try to return my new 4TB to Amazon and find an externally powered one.

> > merely the Btrfs signature is wiped from the deleted device(s). So you could restore that signature and the device would be valid again;

I wonder how you would do that, in order to have a working snapshot that I can put into cold storage?

Nonetheless, I hope the btrfs developers can make it possible to remove a RAID1 drive for the cold-storage use case without any faffing about.

I ran that debug info: https://s.natalian.org/2017-10-13/btrfs-reply.txt

To summarise:
* sdb - the new 4TB disk that currently makes my raid1 unmountable when connected
* sd{c,d} - the old 2TB disks

Here's the accompanying dmesg: https://s.natalian.org/2017-10-13/dmesg.txt
Sorry, it might be difficult to follow since I was moving the 4TB between the front ports and such.

Kind regards,
* Re: USB upgrade fun
From: Austin S. Hemmelgarn @ 2017-10-13 11:22 UTC
To: Kai Hendry, Chris Murphy; +Cc: Btrfs BTRFS

On 2017-10-12 21:42, Kai Hendry wrote:
> It probably is... since when I remove my new 4TB USB disk from the front, I am at least able to mount my two 2TB drives in degraded mode and see my data!

Given this, I think I know exactly what's wrong (although confirmation from a developer that what I think is going on can actually happen would be nice). Based on what you're saying, the metadata on the two 2TB drives says they're the only ones in the array, while the metadata on the 4TB drive says all three are in the array, but it is missing almost all other data and is out of sync with the 2TB drives.

> So I am not quite sure what to do now. I don't trust USB hubs.

Yeah, I don't trust USB in general for permanently attached storage. Not just because of power problems like this, but because all kinds of things can cause random disconnects, which in turn cause issues with any filesystem (BTRFS just makes it easier to notice them).

> On a different NUC I've noticed I can't charge my iPhone anymore!
> https://mail-archive.com/linux-usb@vger.kernel.org/msg95231.html
> So... is there any end in sight for the "USB power" problem? Does USB-C / Thunderbolt address this issue? :(

In theory, yes, but I'm not sure if the NUCs that include it properly support the USB Power Delivery specification (if not, then they can't safely source more than 500mA, which is too little for a traditional hard drive).

> > > merely the Btrfs signature is wiped from the deleted device(s). So you could restore that signature and the device would be valid again;
>
> I wonder how you would do that, in order to have a working snapshot that I can put into cold storage?

In practice, it's too much work to be practical. It requires rebuilding the metadata tree from scratch, which is pretty much impossible with the current tools (and even then you likely wouldn't have all the data, because some may have been overwritten during the device removal).

> Nonetheless, I hope the btrfs developers can make it possible to remove a RAID1 drive for the cold-storage use case without any faffing about.

From a practical perspective, you're almost certainly better off creating a copy for cold storage without involving BTRFS. As an alternative, I would suggest one of the following approaches:

1. Take a snapshot of the filesystem, and use send/receive to transfer it to another device which you then remove and store somewhere.
2. Use rsync to copy things to another device which you then remove and store somewhere.

Both of these options are safer, less likely to screw up your existing filesystem, and produce copies that can safely be connected to the original system at the same time as the original filesystem.
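
A sketch of option 1 above, assuming the array is mounted at /mnt/raid1 and the cold-storage drive is formatted as btrfs and mounted at /mnt/cold (both paths are placeholders; send needs a read-only snapshot on the source and receive needs a btrfs filesystem on the destination):

$ sudo btrfs subvolume snapshot -r /mnt/raid1 /mnt/raid1/cold-2017-10-13    # read-only snapshot of the top-level subvolume
$ sudo btrfs send /mnt/raid1/cold-2017-10-13 | sudo btrfs receive /mnt/cold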
* Re: USB upgrade fun
From: Chris Murphy @ 2017-10-13 18:28 UTC
To: Austin S. Hemmelgarn; +Cc: Kai Hendry, Chris Murphy, Btrfs BTRFS

On Fri, Oct 13, 2017 at 12:22 PM, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote:
> From a practical perspective, you're almost certainly better off creating a copy for cold storage without involving BTRFS.

Yeah, if you want to hedge your bets and keep it simple, rsync to XFS. With some added risk (double) you can use mdadm linear or LVM to concatenate multiple devices if necessary; in theory you're not going to get a double failure. But really, the more backups you add, the less the chosen strategy matters.

> As an alternative, I would suggest one of the following approaches:
>
> 1. Take a snapshot of the filesystem, and use send/receive to transfer it to another device which you then remove and store somewhere.

This is what I do. Primary, and two separate backups (separate volume UUIDs); all three are Btrfs. The primary uses send/receive to the two backups. My organization plan allows me to have only three subvolumes on the primary being snapshotted and thus sent/received, so it's easy. A fourth copy, a subset of important data, is in the cloud.

> 2. Use rsync to copy things to another device which you then remove and store somewhere.

A fifth copy, also a subset of important data, is done this way to XFS on a big drive kept offsite. It just accumulates; nothing is deleted. I don't consider it a backup, it's an archive, and I treat it as WORM. Maybe one day I'll redo it based on dm-verity so I can add error detection and correction; the cryptographic aspects are a distant second for me.

> Both of these options are safer, less likely to screw up your existing filesystem, and produce copies that can safely be connected to the original system at the same time as the original filesystem.

Keeping it simple is the only way you'll use it with any regularity, and understand it well enough if (really, when) you have to do a restore.

--
Chris Murphy
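
Option 2 in its simplest form might look like the following, again with placeholder mount points; -a preserves permissions and timestamps, -H hard links, -A ACLs, -X xattrs, and --delete keeps the copy an exact mirror of the source (the trailing slashes matter to rsync):

$ sudo rsync -aHAX --delete /mnt/raid1/ /mnt/cold/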
* Re: USB upgrade fun
From: Kai Hendry @ 2017-10-28 7:03 UTC
To: Austin S. Hemmelgarn, Chris Murphy; +Cc: Btrfs BTRFS, Qu Wenruo

On Fri, 13 Oct 2017, at 09:42 AM, Kai Hendry wrote:
> It probably is... since when I remove my new 4TB USB disk from the front, I am at least able to mount my two 2TB drives in degraded mode and see my data!

Just a follow-up. Of late I have not been able to mount my data, even in degraded mode.

However, someone on #btrfs suggested I try an older Linux kernel, and I also found https://www.spinics.net/lists/linux-btrfs/msg69905.html, which reaffirmed my suspicions.

Lo and behold, I can mount with an older kernel (linux-4.4.3-1-x86_64.pkg.tar.xz)!! But if I reboot into 4.13.9-1-ARCH, no worky:

[  489.139903] BTRFS warning (device sdb1): devid 3 uuid e5f03f81-35e7-4a29-9608-bd78864cc0ad is missing
[  489.524334] BTRFS info (device sdb1): bdev (null) errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
[  489.524367] BTRFS info (device sdb1): bdev /dev/sdb1 errs: wr 13361, rd 31990017, flush 155, corrupt 0, gen 0
[  502.934069] BTRFS warning (device sdb1): missing devices (1) exceeds the limit (0), writeable mount is not allowed
[  502.980748] BTRFS error (device sdb1): open_ctree failed

Does anyone know how I can track the progress of "--fix-dev-size"? It doesn't seem to be part of btrfs-progs 4.13-1...
* Re: USB upgrade fun
From: Qu Wenruo @ 2017-10-28 7:58 UTC
To: Kai Hendry, Austin S. Hemmelgarn, Chris Murphy; +Cc: Btrfs BTRFS

On 2017-10-28 15:03, Kai Hendry wrote:
> Lo and behold, I can mount with an older kernel (linux-4.4.3-1-x86_64.pkg.tar.xz)!! But if I reboot into 4.13.9-1-ARCH, no worky:
>
> [  502.934069] BTRFS warning (device sdb1): missing devices (1) exceeds the limit (0), writeable mount is not allowed
> [  502.980748] BTRFS error (device sdb1): open_ctree failed
>
> Does anyone know how I can track the progress of "--fix-dev-size"? It doesn't seem to be part of btrfs-progs 4.13-1...

I didn't follow the whole thread, so I can't say much about the original problem.

But concerning "--fix-dev-size": it isn't merged into mainline yet, so if you really want to try it, you could use the out-of-tree btrfs-progs:
https://github.com/adam900710/btrfs-progs/tree/check_unaligned_dev

Don't be confused by the name; to use "fix-dev-size" you need to run "btrfs rescue fix-dev-size".

However, according to your kernel messages it seems that you're missing one device, which is a case "fix-dev-size" can't handle yet. So I'm afraid it can't help much in your case.

Thanks,
Qu
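
Building that out-of-tree branch presumably follows the usual btrfs-progs build steps, roughly as below (exact build dependencies vary by distro, and the resulting binary is run from the build directory rather than installed; /dev/sdX1 stands in for a member device):

$ git clone -b check_unaligned_dev https://github.com/adam900710/btrfs-progs.git
$ cd btrfs-progs
$ ./autogen.sh && ./configure --disable-documentation && make
$ sudo ./btrfs rescue fix-device-size /dev/sdX1    # run the locally built binary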
* Re: USB upgrade fun
From: Kai Hendry @ 2017-10-29 3:20 UTC
To: Qu Wenruo; +Cc: Btrfs BTRFS

On Sat, 28 Oct 2017, at 03:58 PM, Qu Wenruo wrote:
> Don't be confused by the name; to use "fix-dev-size" you need to run "btrfs rescue fix-dev-size".

[hendry@nuc btrfs-progs]$ sudo ./btrfs rescue fix-device-size /dev/sdc1
warning, device 2 is missing
ERROR: devid 2 is missing or not writeable
ERROR: fixing device size needs all device(s) present and writeable

[hendry@nuc btrfs-progs]$ lsblk -f
NAME   FSTYPE LABEL    UUID                                 MOUNTPOINT
sda
├─sda1 vfat            0C95-8576                            /boot
└─sda2 btrfs           c5f98288-5ab3-4236-b00e-f2cd15c0616d /
sdb
sdc
└─sdc1 btrfs  extraid1 5cab2a4a-e282-4931-b178-bec4c73cdf77

[hendry@nuc btrfs-progs]$ lsblk -f
NAME   FSTYPE LABEL    UUID                                 MOUNTPOINT
sda
├─sda1 vfat            0C95-8576                            /boot
└─sda2 btrfs           c5f98288-5ab3-4236-b00e-f2cd15c0616d /
sdb
└─sdb1 btrfs  extraid1 5cab2a4a-e282-4931-b178-bec4c73cdf77
sdc
└─sdc1 btrfs  extraid1 5cab2a4a-e282-4931-b178-bec4c73cdf77

[hendry@nuc btrfs-progs]$ sudo ./btrfs rescue fix-device-size /dev/sdc1
Couldn't setup extent tree
Couldn't setup device tree
ERROR: could not open btrfs

[hendry@nuc btrfs-progs]$ sudo ./btrfs rescue fix-device-size /dev/sdb1
leaf parent key incorrect 1320477425664
ERROR: could not open btrfs

[hendry@nuc btrfs-progs]$ sudo mount -o degraded /dev/sdb1 /mnt/raid1/
mount: /mnt/raid1: wrong fs type, bad option, bad superblock on /dev/sdb1, missing codepage or helper program, or other error.

Still unable to mount. Damn. Maybe I'm chasing a red herring? Here are the relevant kernel logs:

Oct 29 10:56:45 nuc kernel: sd 2:0:0:0: [sdb] Attached SCSI disk
Oct 29 10:57:32 nuc kernel: BTRFS info (device sdc1): allowing degraded mounts
Oct 29 10:57:32 nuc kernel: BTRFS info (device sdc1): disk space caching is enabled
Oct 29 10:57:32 nuc kernel: BTRFS info (device sdc1): has skinny extents
Oct 29 10:57:33 nuc kernel: BTRFS error (device sdc1): super_total_bytes 4000795746304 mismatch with fs_devices total_rw_bytes 4000795749888
Oct 29 10:57:33 nuc kernel: BTRFS error (device sdc1): failed to read chunk tree: -22
Oct 29 10:57:33 nuc kernel: BTRFS error (device sdc1): open_ctree failed

Nonetheless, if I reboot to 4.4 I can still mount. However, my root is bizarrely out of space or inodes, so journalctl et al. are unusable.

Kind regards,
* Re: USB upgrade fun
From: Qu Wenruo @ 2017-10-29 10:02 UTC
To: Kai Hendry; +Cc: Btrfs BTRFS

On 2017-10-29 11:20, Kai Hendry wrote:
> [hendry@nuc btrfs-progs]$ sudo ./btrfs rescue fix-device-size /dev/sdc1
> Couldn't setup extent tree
> Couldn't setup device tree
> ERROR: could not open btrfs
>
> [hendry@nuc btrfs-progs]$ sudo ./btrfs rescue fix-device-size /dev/sdb1
> leaf parent key incorrect 1320477425664
> ERROR: could not open btrfs

Maybe the superblocks of both devices have something wrong with them.

> [hendry@nuc btrfs-progs]$ sudo mount -o degraded /dev/sdb1 /mnt/raid1/
> mount: /mnt/raid1: wrong fs type, bad option, bad superblock on /dev/sdb1, missing codepage or helper program, or other error.
>
> Still unable to mount. Damn. Maybe I'm chasing a red herring? Here are the relevant kernel logs:
>
> Oct 29 10:57:33 nuc kernel: BTRFS error (device sdc1): super_total_bytes 4000795746304 mismatch with fs_devices total_rw_bytes 4000795749888
> Oct 29 10:57:33 nuc kernel: BTRFS error (device sdc1): failed to read chunk tree: -22
> Oct 29 10:57:33 nuc kernel: BTRFS error (device sdc1): open_ctree failed
>
> Nonetheless, if I reboot to 4.4 I can still mount.

Can 4.4 mount it rw?

If so, mount it, do a minimal write such as creating an empty file to update both superblock copies, and then try fix-device-size.

Thanks,
Qu
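
The sequence Qu suggests would presumably look something like this under the 4.4 kernel, using the locally built btrfs binary from the out-of-tree branch (mount point and device name as used earlier in the thread; the file name is just a placeholder):

$ sudo mount -o degraded /dev/sdb1 /mnt/raid1    # rw, degraded mount under 4.4
$ sudo touch /mnt/raid1/superblock-poke          # any small write, just to force a commit
$ sync
$ sudo umount /mnt/raid1
$ sudo ./btrfs rescue fix-device-size /dev/sdb1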
* Re: USB upgrade fun
From: Kai Hendry @ 2017-10-30 11:12 UTC
To: Qu Wenruo; +Cc: Btrfs BTRFS

On Sun, 29 Oct 2017, at 06:02 PM, Qu Wenruo wrote:
> If so, mount it, do a minimal write such as creating an empty file to update both superblock copies, and then try fix-device-size.

Tried that, and it didn't work. I made a recording: https://youtu.be/SFd3QscNT6w