* FS corruption when mounting non-degraded after mounting degraded
@ 2016-01-19 11:28 Rian Hunter
2016-01-19 13:54 ` Duncan
2016-01-21 16:26 ` Rich Freeman
0 siblings, 2 replies; 8+ messages in thread
From: Rian Hunter @ 2016-01-19 11:28 UTC (permalink / raw)
To: linux-btrfs
In my raid6 setup, a disk was soft-failing on me. I pulled the disk,
inserted a new one, mounted degraded, then did btrfs-replace while
running some RW jobs on the FS.
My jobs were taking too long. It seems like raid6 btrfs-replace without
the source disk is not very fast. So I unmounted the FS, inserted the
soft-failing disk again, remounted normally (non-degraded) and restarted
the (now much faster) btrfs-replace.
I checked on the status sometime later and there were hundreds if not
thousands of "transid verify failure" messages in my dmesg.
Additionally the btrfs-replace operation had outright failed.
After removing the soft-failing disk, and mounting degraded again, it now
seemed that some files in the FS were corrupted and in some instances
accessing certain files would actually cause the kernel to loop
indefinitely while eating up memory.
Nothing was corrupted before I mounted the soft-failing disk in
non-degraded mode. This leads me to believe that btrfs doesn't
intelligently handle remounting normally previously degraded arrays. Can
anyone confirm this?
I would highly recommend some sort of fast failure or DWIM behavior for
this, e.g.:
$ mount -o degraded /dev/sda1 /mnt
$ touch /mnt/newfile.txt
$ umount /mnt
$ # plug in other device, e.g. /dev/sdb
$ mount /dev/sda1 /mnt
STDERR: Ignoring /dev/sdb1, FS has changed since mounting degraded
without it.
Alternatively:
STDERR: Cannot mount file system with device /dev/sdb1, FS has changed
since mounting degraded without it.
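
Until something like that exists, a stale member can at least be spotted
by hand before the non-degraded mount. A rough sketch, assuming
btrfs-progs' btrfs-show-super (newer progs expose the same data via
"btrfs inspect-internal dump-super") and placeholder device names:

# compare superblock generations across all members before mounting
for dev in /dev/sda1 /dev/sdb1; do
    printf '%s ' "$dev"
    btrfs-show-super "$dev" | grep '^generation'
done
# a member whose generation lags the others missed writes while the FS
# was mounted degraded; don't mount it rw together with the newer ones
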
Rian
^ permalink raw reply [flat|nested] 8+ messages in thread

* Re: FS corruption when mounting non-degraded after mounting degraded
2016-01-19 11:28 FS corruption when mounting non-degraded after mounting degraded Rian Hunter
@ 2016-01-19 13:54 ` Duncan
2016-01-21 16:26 ` Rich Freeman
1 sibling, 0 replies; 8+ messages in thread
From: Duncan @ 2016-01-19 13:54 UTC (permalink / raw)
To: linux-btrfs

Rian Hunter posted on Tue, 19 Jan 2016 03:28:53 -0800 as excerpted:

> Nothing was corrupted before I mounted the soft-failing disk in
> non-degraded mode. This leads me to believe that btrfs doesn't
> intelligently handle remounting normally previously degraded arrays.
> Can anyone confirm this?

Two things:

1) Critical: I see absolutely no mention of the kernel version you
were/are using.

Btrfs raid56 mode is reasonably new and was only nominally complete with
kernel 3.19. Both 3.19 and 4.0 still had bugs in raid56 mode that became
known quickly, however, and should be out of use now as neither one was
an LTS kernel, and obviously anything 3.18 and earlier shouldn't be used
for raid56 mode either, as it wasn't even complete at that point.

Beyond that, the recommendation has always been to wait for a while if
you need anything like stable, with my own recommendation being at least
a year, five release cycles, after full-feature introduction, and then
looking at reports on the list, before considering it stable. So LTS 4.1
still isn't stable for raid56 mode, 4.2 is already out of the
last-couple-current-kernel-releases window, and 4.3, while to my
knowledge not having any critical known raid56 bugs, is still within
that year-long stabilization window, which means with 4.4 out you should
be on it, unless there's a specific bug in it preventing that at this
point.

The just-released 4.4 is the first kernel that could meet the
year-minimum-to-stabilize criteria, and it's LTS, which means it has the
/potential/ of being the first kernel on which I'd consider btrfs raid56
mode as stable as the rest of btrfs. But 4.4 is still new enough that we
don't /know/ that yet, and while it's an LTS and hopefully will
eventually be reasonably stable for raid56 mode, there's no reason
beyond the simple first-release-after-1-year timing to suggest that it
actually /is/ at that level yet, as it simply hasn't been out long
enough to know.

So, particularly with btrfs raid56 mode, running the latest kernel is
absolutely critical, and if you're not on kernel 4.4 yet, you definitely
should be if you're running raid56 mode. And even 4.4 shouldn't really
be considered raid56-mode stable yet. It might be, but we simply don't
have the data for it one way or the other yet. And given that you didn't
mention the kernel version at all, where kernel version is so critical,
a reasonable assumption would be that you didn't do due-diligence
research before deploying on raid56 and know nothing about it, so who
knows /what/ old kernel you might be using?

Meanwhile, btrfs itself is, as frequently stated on this list,
"stabilizING, but not entirely stable and mature yet." IOW, even for
single-device btrfs or the more mature raid0/1/10 modes, btrfs isn't
"production ready stable", even if it's stable /enough/ that many both
on this list and in the wider Linux distro world are using it as a
"daily driver", hopefully with backups available and ready, since it
/is/ still stabilizing and maturing.
Of course the sysadmin's rule of backups, even for mature filesystems,
says in its simplest form that if you don't have at least one backup,
then by your actions you're defining the data as worth less than the
time and resources necessary to do the backup, despite any
verbal/written claims to the contrary. With btrfs not yet fully stable
and btrfs raid56 even less so, that means on btrfs raid56 you REALLY
have backups, or you REALLY are declaring the data of trivial value, at
best.

So by all means, use btrfs if you have appropriate backups and are
willing to use them, but don't expect btrfs raid56 mode to be stable
yet, particularly on kernels before the just-released 4.4, because it
simply isn't.

2) Specifically regarding your posted point, I don't personally use
raid56 yet, and don't understand its limitations as well as I do the
raid1 mode I do use, but it wouldn't surprise me if adding an old device
after the filesystem has moved on could indeed cause problems,
particularly in a rebuild-from-degraded environment.

Btrfs raid1 mode has a similar but more limited don't-do-this case -- it
should be OK in the situation you mentioned, but it's strongly
recommended not to separate the two copies, mount each one writable
separately, and then try to use them together, as that's an invitation
to corruption. Rather, if raid1 components must be separated, care
should be taken to mount just one degraded,rw if it's at all planned to
mount them together, undegraded, again (and if that's done, a scrub is
needed, at minimum, to catch the older device's data back up to the
newer one(s)). In the event that both /do/ get separately mounted rw,
the only way to properly use them combined again is to wipe one or the
other and add it again as a new device.

And with parity rebuilds there's a generally known, non-btrfs-specific
rebuild hole where parity rebuilds can corrupt data under the wrong
circumstances. I've never taken the time to fully dig into the technical
details, so I don't claim to fully understand the hole and thus don't
know for sure whether you triggered it, but I know it's there. And
because btrfs does per-chunk raid rather than whole-device raid, it's
possible you triggered it in the btrfs case even if you wouldn't have in
the general case. But I'll defer to the more knowledgeable for the
specifics there.

Meanwhile... these facts are no secret. The non-btrfs-specific parity
rebuild hole is well known, as is the fact that btrfs raid56 mode isn't
yet mature. Anyone doing due-diligence pre-deployment research should
have come across the latter repeatedly on this list as well as on the
btrfs wiki, and the former repeatedly in general discussions of
parity-raid.

So I'd say, restore from the backup you certainly had if the data was of
any value at all, and call it good. And if you didn't have that backup,
well, your actions declared the data of trivial value at best, so be
happy, because you saved what your actions defined as of higher value:
the time and resources you would have put into the backup had you done
it. So you can still call it good! =:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

^ permalink raw reply [flat|nested] 8+ messages in thread
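
(For the raid1 case Duncan describes above, the recovery roughly looks
like the sketch below; device names and mount point are placeholders,
and this is just a restatement of the advice, not an official procedure.)

# Case A: only one copy was ever mounted degraded,rw -- mount both
# together again, then scrub so the stale copy catches up
mount /dev/sda1 /mnt
btrfs scrub start -B /mnt

# Case B: both copies were mounted rw separately -- keep one, wipe the
# other, and re-add it as a brand new device
mount -o degraded /dev/sda1 /mnt
wipefs -a /dev/sdb1
btrfs device add /dev/sdb1 /mnt
btrfs device delete missing /mnt   # drop the stale identity, re-replicate
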
* Re: FS corruption when mounting non-degraded after mounting degraded
2016-01-19 11:28 FS corruption when mounting non-degraded after mounting degraded Rian Hunter
2016-01-19 13:54 ` Duncan
@ 2016-01-21 16:26 ` Rich Freeman
2016-01-21 17:15 ` Chris Murphy
1 sibling, 1 reply; 8+ messages in thread
From: Rich Freeman @ 2016-01-21 16:26 UTC (permalink / raw)
To: Rian Hunter; +Cc: Btrfs BTRFS

On Tue, Jan 19, 2016 at 6:28 AM, Rian Hunter <rian@thelig.ht> wrote:
> In my raid6 setup, a disk was soft-failing on me. I pulled the disk,
> inserted a new one, mounted degraded, then did btrfs-replace while
> running some RW jobs on the FS.
>
> My jobs were taking too long. It seems like raid6 btrfs-replace without
> the source disk is not very fast. So I unmounted the FS, inserted the
> soft-failing disk again, remounted normally (non-degraded) and restarted
> the (now much faster) btrfs-replace.
>
> I checked on the status sometime later and there were hundreds if not
> thousands of "transid verify failure" messages in my dmesg. Additionally
> the btrfs-replace operation had outright failed.

I think the bottom line here is the same thing that everybody runs into
when they have LVM snapshots of btrfs devices.

Btrfs doesn't record any kind of timestamp or generation number or
anything like that when it touches data on a drive, or if it does, it
isn't granular enough or it isn't being used when mounting drives to
ensure consistency. So btrfs sees a bunch of partitions that at one time
were consistent and assumes that they must still be consistent. If they
aren't consistent, almost anything can go wrong.

The bottom line is that any time you mount btrfs in anything other than
read-only mode, any partitions that may have been associated with that
filesystem which aren't mounted should be treated like the plague. Don't
let btrfs ever mount them again, because it will be all too willing to
try, and you'll likely lose data if it happens.

There was a long thread on this not long ago and I don't know what the
final outcome of it was. It would be ideal if btrfs could detect that a
partition is out-of-date and not attempt to mount it, except perhaps as
read-only in some kind of recovery mode.

--
Rich

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: FS corruption when mounting non-degraded after mounting degraded
2016-01-21 16:26 ` Rich Freeman
@ 2016-01-21 17:15 ` Chris Murphy
2016-01-21 22:25 ` Rian Hunter
0 siblings, 1 reply; 8+ messages in thread
From: Chris Murphy @ 2016-01-21 17:15 UTC (permalink / raw)
To: Rich Freeman; +Cc: Rian Hunter, Btrfs BTRFS

On Thu, Jan 21, 2016 at 9:26 AM, Rich Freeman <r-btrfs@thefreemanclan.net> wrote:
> On Tue, Jan 19, 2016 at 6:28 AM, Rian Hunter <rian@thelig.ht> wrote:
>> In my raid6 setup, a disk was soft-failing on me. I pulled the disk,
>> inserted a new one, mounted degraded, then did btrfs-replace while
>> running some RW jobs on the FS.
>>
>> My jobs were taking too long. It seems like raid6 btrfs-replace without
>> the source disk is not very fast. So I unmounted the FS, inserted the
>> soft-failing disk again, remounted normally (non-degraded) and restarted
>> the (now much faster) btrfs-replace.
>>
>> I checked on the status sometime later and there were hundreds if not
>> thousands of "transid verify failure" messages in my dmesg. Additionally
>> the btrfs-replace operation had outright failed.
>
> I think the bottom line here is the same thing that everybody runs
> into when they have LVM snapshots of btrfs devices.

Seems unrelated. The LVM and dd concern is an *identical* member device
appearing (two or more instances of the same volume UUID and dev UUID).
In this case it's a member device with a (truly) unique dev UUID. So
there should be no such confusion.

> Btrfs doesn't record any kind of timestamp or generation number or
> anything like that when it touches data on a drive, or if it does, it
> isn't granular enough or it isn't being used when mounting drives to
> ensure consistency.

Except it does. And the superblock on every device includes the dev UUID
of all other devices. So even with a device missing, or a device missing
and reappearing while that same device is being replaced, the file
system has information available.

While a member device is being replaced, it isn't invalid. It's still
available for reading from for its own replacement. Hence why 'btrfs
replace start' has an -r option. So it makes sense that the rebuild goes
faster once it's reconnected, because now that drive is available for
normal reads, the volume isn't degraded anymore, so data doesn't have to
be reconstructed from parity.

But none of this explains why corruption happened. So I'd say it's a
bug. The question is, is it discretely reproducible? Once there are
concise reproduction steps, it's much more likely a dev can reproduce
and debug what the problem is.

Maybe there's a problem with the reintroduction of a previously missing
device in the middle of its being replaced. The devid is different from
dev.uuid, but they are mapped somehow, and mainly it's the devid being
used by the filesystem chunk tree. So maybe confusion happens where
there are suddenly two of devid 2, even though they have different dev
UUIDs.

--
Chris Murphy

^ permalink raw reply [flat|nested] 8+ messages in thread
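
(For anyone who wants to look at this mapping themselves, a sketch of
how the per-device superblock fields mentioned above can be inspected;
/dev/dm-7 and /mnt are placeholders, and btrfs-show-super is the
btrfs-progs tool of that era -- newer progs expose the same data as
"btrfs inspect-internal dump-super".)

# devid -> block device mapping as the kernel currently sees it
btrfs filesystem show /mnt
# per-device superblock: generation plus dev_item.devid / dev_item.uuid
btrfs-show-super /dev/dm-7 | grep -E '^generation|dev_item\.devid|dev_item\.uuid'
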
* Re: FS corruption when mounting non-degraded after mounting degraded
2016-01-21 17:15 ` Chris Murphy
@ 2016-01-21 22:25 ` Rian Hunter
2016-01-21 23:51 ` Chris Murphy
0 siblings, 1 reply; 8+ messages in thread
From: Rian Hunter @ 2016-01-21 22:25 UTC (permalink / raw)
To: Chris Murphy; +Cc: Rich Freeman, Btrfs BTRFS

On Thu, 21 Jan 2016, Chris Murphy wrote:
> But none of this explains why corruption happened. So I'd say it's a
> bug. The question is, is it discretely reproducible? Once there are
> concise reproduction steps, it's much more likely a dev can reproduce
> and debug what the problem is.
>
> Maybe there's a problem with the reintroduction of a previously
> missing device in the middle of its being replaced. The devid is
> different from dev.uuid, but they are mapped somehow, and mainly it's
> the devid being used by the filesystem chunk tree. So maybe confusion
> happens where there are suddenly two of devid 2, even though they have
> different dev UUIDs.

Let me put the sequence of events into more discrete steps so that any
interested dev may reproduce the bug. For the record, I maintain daily
backups and specifically made a backup before the events described, so
I was not adversely affected by this bug except through lost time.

Kernel uname -a:
Linux nori 4.3.0-1-amd64 #1 SMP Debian 4.3.3-5 (2016-01-04) x86_64 GNU/Linux

Start state: Normally functioning raid6 array. Device FOO intermittently
fails and requires a power cycle to work again. This has happened 25-50
times in the past with no irresolvable data corruption. The decision has
finally been made to replace it.

* Unmount raid6 FS
* Disconnect array.
* Physically remove device FOO from array, add new device BAR to array.
* Connect array
* Mount raid6 array with "-o degraded"
* Run "btrfs replace start 2 /dev/BAR /mnt"
* Start VMs on FS
* Machine freezes (not sure why)
* Restart machine
* Mount raid6 array with "-o degraded"
* Replace job continues automatically
* Start VMs on FS
* After an hour: VMs have not started up yet (seeing hung-task warnings
  in kernel). "btrfs replace status /mnt" shows 0.1% done
* Cancel replace: "btrfs replace cancel /mnt"
* Unmount raid6 FS
* Disconnect array
* Physically add device FOO back to array
* Reconnect array
* Mount raid6 array normally (no "-o degraded")
* Run "btrfs replace start 2 /dev/BAR /mnt"
* Start VMs on FS
* VM startup is still too slow; kill all VMs except for a critical one,
  VMFOO. Let VMFOO attempt to start.
* After an hour: lots of "parent transid verify failed" in dmesg, I get
  nervous. I attribute it to re-adding the disk. Also seeing "hung-task"
  warnings corresponding to the starting of VMFOO.
* Forcibly stop VMFOO.
* Cancel replace: "btrfs replace cancel /mnt"
* Unmount raid6 FS
* Disconnect array
* Physically remove device FOO from array
* Reconnect array
* Mount raid6 array with "-o degraded"
* Run "btrfs replace start 2 /dev/BAR /mnt"
* After an hour: Replace operation was automatically cancelled, lots of
  "parent transid verify failed" in dmesg again.
* Run "btrfs scrub"; "btrfs scrub status" shows millions of
  unrecoverable errors
* Cancel "btrfs scrub"
* At this point I'm convinced this FS is in a very broken state and I
  try to salvage whatever data could have changed since beginning the
  process.
* Tried rsync'ing VMFOO off disk: lots of "parent transid verify failed"
  in dmesg, rsync reporting "stale file handle" and it seems stalled,
  iostat reporting no disk IO, all CPUs burning in system time (not user
  mode), system memory usage steadily rising.
* Unmount raid6 array
* Mount raid6 array with "-o degraded", fails: the FS would not appear
  mounted for some reason (no message in dmesg, but there was nothing
  under the mount point, and no failure from the "mount" command)
* Restart machine
* Mount raid6 array with "-o degraded", success
* Tried rsync'ing VMFOO off disk again: lots of "parent transid verify
  failed" in dmesg, rsync reporting "stale file handle", seems stalled,
  iostat reporting no disk IO, all CPUs burning, system memory usage
  steadily rising.
* Tried rsync'ing other VMs: saw additional "parent transid verify"
  messages in the kernel, but the rsync completed successfully (though
  there was some corruption).
* Tried rsync'ing some other non-VM data: everything proceeded normally
  and invoked no "parent transid verify failed" errors.

After this I completely reformatted the FS and restored it from backup,
so unfortunately I don't have any way to investigate the remnants of
what happened. When I was trying to figure out what happened, some
observations stood out:

* VMFOO's data was in a very bad state
* Yet the other VMs were in a bad, but less bad, state
* The non-VM data was uncorrupted
* The errors first occurred during replacing after mounting the FS
  normally after previously mounting it with "-o degraded"

From a black box perspective, this led me to believe that the corruption
happened during the replace operation after mounting normally after
first mounting with "-o degraded." Of course, knowledge of the internals
could easily verify this.

HTH! I'm not on this mailing list so please reply directly to me.

^ permalink raw reply [flat|nested] 8+ messages in thread
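
(The same sequence condensed into shell form, in case it helps anyone
trying to reproduce; SURVIVOR, FOO, BAR and /mnt are placeholders, and
the VM workload is what actually generates the writes.)

mount -o degraded /dev/SURVIVOR /mnt      # FOO absent, new blank BAR attached
btrfs replace start 2 /dev/BAR /mnt       # rebuild devid 2 onto BAR
btrfs replace status /mnt                 # stalls around 0.1% under VM load
btrfs replace cancel /mnt
umount /mnt
# physically reattach the flaky device FOO, then:
mount /dev/SURVIVOR /mnt                  # normal, non-degraded mount
btrfs replace start 2 /dev/BAR /mnt       # much faster, but "parent transid
                                          #   verify failed" soon follows
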
* Re: FS corruption when mounting non-degraded after mounting degraded
2016-01-21 22:25 ` Rian Hunter
@ 2016-01-21 23:51 ` Chris Murphy
2016-01-22 1:21 ` Rian Hunter
0 siblings, 1 reply; 8+ messages in thread
From: Chris Murphy @ 2016-01-21 23:51 UTC (permalink / raw)
To: Rian Hunter; +Cc: Btrfs BTRFS

On Thu, Jan 21, 2016 at 3:25 PM, Rian Hunter <rian@thelig.ht> wrote:
>
> Start state: Normally functioning raid6 array. Device FOO intermittently
> fails and requires a power cycle to work again. This has happened 25-50
> times in the past with no irresolvable data corruption.

For each drive, what are the results:
# smartctl -l scterc /dev/sdX
# cat /sys/block/sdX/device/timeout

> * Unmount raid6 FS
> * Disconnect array.
> * Physically remove device FOO from array, add new device BAR to array.
> * Connect array
> * Mount raid6 array with "-o degraded"
> * Run "btrfs replace start 2 /dev/BAR /mnt"
> * Start VMs on FS
> * Machine freezes (not sure why)
> * Restart machine
> * Mount raid6 array with "-o degraded"
> * Replace job continues automatically
> * Start VMs on FS
> * After an hour: VMs have not started up yet (seeing hung-task warnings
>   in kernel). "btrfs replace status /mnt" shows 0.1% done

Do you have dmesg output that includes sysrq+w at the time of the hung
task warnings? That's pretty much always requested by devs in these
cases.

> * Cancel replace: "btrfs replace cancel /mnt"
> * Unmount raid6 FS
> * Disconnect array
> * Physically add device FOO back to array
> * Reconnect array
> * Mount raid6 array normally (no "-o degraded")
> * Run "btrfs replace start 2 /dev/BAR /mnt"

Hmm. Open question whether 'btrfs replace cancel' actually marked
/dev/BAR as wiped. If it doesn't, then this 2nd replace start should
have failed unless you had used -f, or you used wipefs -a. If it's not
wiped by any of the above, then I'd expect it's possible things get
messy.

> * Mount raid6 array with "-o degraded"
> * Run "btrfs replace start 2 /dev/BAR /mnt"
> * After an hour: Replace operation was automatically cancelled, lots of
>   "parent transid verify failed" in dmesg again.
> * Run "btrfs scrub"; "btrfs scrub status" shows millions of
>   unrecoverable errors

Some others (no devs, however) have disagreed with me on this, so take
it with a grain of salt, but I don't understand the rationale of running
scrub on degraded arrays. The first order of business is to get it
non-degraded. If that can't be done, scrub is pointless. Of course it's
going to produce millions of unrecoverable errors. There's a device
missing, so I'd expect many unrecoverable errors during a degraded
scrub.

> * Cancel "btrfs scrub"
> * At this point I'm convinced this FS is in a very broken state and I
>   try to salvage whatever data could have changed since beginning the
>   process.

Agreed. Certainly not reassuring.

> From a black box perspective, this led me to believe that the
> corruption happened during the replace operation after mounting
> normally after first mounting with "-o degraded." Of course,
> knowledge of the internals could easily verify this.

Filesystems are really difficult, so even knowledge of the internals
doesn't guarantee the devs will understand where the problem first
started in this case.

--
Chris Murphy

^ permalink raw reply [flat|nested] 8+ messages in thread
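
(On the wiped-or-not question, a quick way to check is sketched below,
assuming standard util-linux wipefs and the placeholder name /dev/BAR.)

wipefs /dev/BAR        # list any filesystem signatures left on the target
wipefs -a /dev/BAR     # clear them explicitly before reusing the device
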
* Re: FS corruption when mounting non-degraded after mounting degraded
2016-01-21 23:51 ` Chris Murphy
@ 2016-01-22 1:21 ` Rian Hunter
2016-01-22 3:38 ` Chris Murphy
0 siblings, 1 reply; 8+ messages in thread
From: Rian Hunter @ 2016-01-22 1:21 UTC (permalink / raw)
To: Chris Murphy; +Cc: Rich Freeman, Btrfs BTRFS

On Thu, 21 Jan 2016, Chris Murphy wrote:
> On Thu, Jan 21, 2016 at 3:25 PM, Rian Hunter <rian@thelig.ht> wrote:
>> Start state: Normally functioning raid6 array. Device FOO intermittently
>> fails and requires a power cycle to work again. This has happened 25-50
>> times in the past with no irresolvable data corruption.
>
> For each drive, what are the results:
> # smartctl -l scterc /dev/sdX
> # cat /sys/block/sdX/device/timeout

I no longer have the failing drive in my array, and it would be a bit
hard to replace it now that all drive bays are full. It was the same
make/model as my other drives. Notably, the SMART attribute
"Command_Timeout" was in the millions when I removed the device, and
that is supposed to be highly correlated with drive failure.

"SCT Error Recovery Control command not supported" is printed for all my
devices for the "smartctl" command.

# cat /sys/block/sd{a,b,c,d,e,f,g,h}/device/timeout
30
30
30
30
30
30
30
30

>> * Unmount raid6 FS
>> * Disconnect array.
>> * Physically remove device FOO from array, add new device BAR to array.
>> * Connect array
>> * Mount raid6 array with "-o degraded"
>> * Run "btrfs replace start 2 /dev/BAR /mnt"
>> * Start VMs on FS
>> * Machine freezes (not sure why)
>> * Restart machine
>> * Mount raid6 array with "-o degraded"
>> * Replace job continues automatically
>> * Start VMs on FS
>> * After an hour: VMs have not started up yet (seeing hung-task warnings
>>   in kernel). "btrfs replace status /mnt" shows 0.1% done
>
> Do you have dmesg output that includes sysrq+w at the time of the hung
> task warnings? That's pretty much always requested by devs in these
> cases.

This is dmesg before the freeze:

Jan 18 15:15:40 nori kernel: [1019147.178187] INFO: task btrfs-transacti:24186 blocked for more than 120 seconds.
Jan 18 15:15:40 nori kernel: [1019147.178196] Tainted: G W 4.3.0-1-amd64 #1
Jan 18 15:15:40 nori kernel: [1019147.178199] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 18 15:15:40 nori kernel: [1019147.178202] btrfs-transacti D ffffffff810a0850 0 24186 2 0x00000000
Jan 18 15:15:40 nori kernel: [1019147.178211] ffff88034726e0c0 0000000000000046 ffff88046c6ab040 ffff88046ab84000
Jan 18 15:15:40 nori kernel: [1019147.178217] ffff8803b844ed08 ffff8803b844ecd0 ffff88046ab83de8 ffff88042488fb70
Jan 18 15:15:40 nori kernel: [1019147.178222] 0000000000000000 ffffffff8157d6af ffff8803b844e000 ffffffffa038362d
Jan 18 15:15:40 nori kernel: [1019147.178228] Call Trace:
Jan 18 15:15:40 nori kernel: [1019147.178240] [<ffffffff8157d6af>] ? schedule+0x2f/0x70
Jan 18 15:15:40 nori kernel: [1019147.178284] [<ffffffffa038362d>] ? btrfs_scrub_pause+0x8d/0x100 [btrfs]
Jan 18 15:15:40 nori kernel: [1019147.178291] [<ffffffff810b1520>] ? wake_atomic_t_function+0x60/0x60
Jan 18 15:15:40 nori kernel: [1019147.178322] [<ffffffffa03270fa>] ? btrfs_commit_transaction+0x47a/0xad0 [btrfs]
Jan 18 15:15:40 nori kernel: [1019147.178350] [<ffffffffa03277e7>] ? start_transaction+0x97/0x590 [btrfs]
Jan 18 15:15:40 nori kernel: [1019147.178376] [<ffffffffa03223b0>] ? transaction_kthread+0x220/0x240 [btrfs]
Jan 18 15:15:40 nori kernel: [1019147.178400] [<ffffffffa0322190>] ?
btrfs_cleanup_transaction+0x540/0x540 [btrfs] Jan 18 15:15:40 nori kernel: [1019147.178407] [<ffffffff8109044d>] ? kthread+0xcd/0xf0 Jan 18 15:15:40 nori kernel: [1019147.178414] [<ffffffff81090380>] ? kthread_create_on_node+0x190/0x190 Jan 18 15:15:40 nori kernel: [1019147.178420] [<ffffffff815816cf>] ? ret_from_fork+0x3f/0x70 Jan 18 15:15:40 nori kernel: [1019147.178426] [<ffffffff81090380>] ? kthread_create_on_node+0x190/0x190 Jan 18 15:17:40 nori kernel: [1019266.951118] INFO: task btrfs-transacti:24186 blocked for more than 120 seconds. Jan 18 15:17:40 nori kernel: [1019266.951126] Tainted: G W 4.3.0-1-amd64 #1 Jan 18 15:17:40 nori kernel: [1019266.951129] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 18 15:17:40 nori kernel: [1019266.951133] btrfs-transacti D ffffffff810a0850 0 24186 2 0x00000000 Jan 18 15:17:40 nori kernel: [1019266.951141] ffff88034726e0c0 0000000000000046 ffff88046c6ab040 ffff88046ab84000 Jan 18 15:17:40 nori kernel: [1019266.951147] ffff8803b844ed08 ffff8803b844ecd0 ffff88046ab83de8 ffff88042488fb70 Jan 18 15:17:40 nori kernel: [1019266.951152] 0000000000000000 ffffffff8157d6af ffff8803b844e000 ffffffffa038362d Jan 18 15:17:40 nori kernel: [1019266.951158] Call Trace: Jan 18 15:17:40 nori kernel: [1019266.951170] [<ffffffff8157d6af>] ? schedule+0x2f/0x70 Jan 18 15:17:40 nori kernel: [1019266.951213] [<ffffffffa038362d>] ? btrfs_scrub_pause+0x8d/0x100 [btrfs] Jan 18 15:17:40 nori kernel: [1019266.951219] [<ffffffff810b1520>] ? wake_atomic_t_function+0x60/0x60 Jan 18 15:17:40 nori kernel: [1019266.951251] [<ffffffffa03270fa>] ? btrfs_commit_transaction+0x47a/0xad0 [btrfs] Jan 18 15:17:40 nori kernel: [1019266.951278] [<ffffffffa03277e7>] ? start_transaction+0x97/0x590 [btrfs] Jan 18 15:17:40 nori kernel: [1019266.951304] [<ffffffffa03223b0>] ? transaction_kthread+0x220/0x240 [btrfs] Jan 18 15:17:40 nori kernel: [1019266.951328] [<ffffffffa0322190>] ? btrfs_cleanup_transaction+0x540/0x540 [btrfs] Jan 18 15:17:40 nori kernel: [1019266.951335] [<ffffffff8109044d>] ? kthread+0xcd/0xf0 Jan 18 15:17:40 nori kernel: [1019266.951341] [<ffffffff81090380>] ? kthread_create_on_node+0x190/0x190 Jan 18 15:17:40 nori kernel: [1019266.951348] [<ffffffff815816cf>] ? ret_from_fork+0x3f/0x70 Jan 18 15:17:40 nori kernel: [1019266.951354] [<ffffffff81090380>] ? kthread_create_on_node+0x190/0x190 Jan 18 15:17:40 nori kernel: [1019266.951368] INFO: task lxc-start:25363 blocked for more than 120 seconds. Jan 18 15:17:40 nori kernel: [1019266.951371] Tainted: G W 4.3.0-1-amd64 #1 Jan 18 15:17:40 nori kernel: [1019266.951374] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 18 15:17:40 nori kernel: [1019266.951376] lxc-start D ffffffff810a0850 0 25363 1 0x00000000 Jan 18 15:17:40 nori kernel: [1019266.951382] ffff8801d8319140 0000000000000082 ffff88046c6f8080 ffff88002f49c000 Jan 18 15:17:40 nori kernel: [1019266.951387] ffff8803dab7e1f0 ffff880237740b98 ffff8803dab7e1f0 0000000000000001 Jan 18 15:17:40 nori kernel: [1019266.951392] ffff8801d8319140 ffffffff8157d6af ffff88042488fb70 ffffffffa032611d Jan 18 15:17:40 nori kernel: [1019266.951397] Call Trace: Jan 18 15:17:40 nori kernel: [1019266.951402] [<ffffffff8157d6af>] ? schedule+0x2f/0x70 Jan 18 15:17:40 nori kernel: [1019266.951424] [<ffffffffa032611d>] ? wait_current_trans.isra.21+0xcd/0x110 [btrfs] Jan 18 15:17:40 nori kernel: [1019266.951429] [<ffffffff810b1520>] ? wake_atomic_t_function+0x60/0x60 Jan 18 15:17:40 nori kernel: [1019266.951453] [<ffffffffa0327b66>] ? 
start_transaction+0x416/0x590 [btrfs] Jan 18 15:17:40 nori kernel: [1019266.951481] [<ffffffffa0336309>] ? btrfs_create+0x49/0x1f0 [btrfs] Jan 18 15:17:40 nori kernel: [1019266.951488] [<ffffffff811d856a>] ? vfs_create+0xba/0x120 Jan 18 15:17:40 nori kernel: [1019266.951494] [<ffffffff811db9d0>] ? path_openat+0x1120/0x1260 Jan 18 15:17:40 nori kernel: [1019266.951498] [<ffffffff811d6d00>] ? terminate_walk+0x40/0xc0 Jan 18 15:17:40 nori kernel: [1019266.951503] [<ffffffff811dc0e1>] ? filename_lookup+0xf1/0x180 Jan 18 15:17:40 nori kernel: [1019266.951509] [<ffffffff811dcca1>] ? do_filp_open+0x91/0x100 Jan 18 15:17:40 nori kernel: [1019266.951516] [<ffffffff811cc4e9>] ? do_sys_open+0x139/0x220 Jan 18 15:17:40 nori kernel: [1019266.951522] [<ffffffff81581332>] ? system_call_fast_compare_end+0xc/0x67 Relavant dmesg after the restart: Jan 18 15:24:06 nori kernel: [ 205.325598] BTRFS: device fsid f3b9a0ff-85f4-40d0-a130-04db96d92836 devid 5 transid 67149 /dev/dm-3 Jan 18 15:24:19 nori kernel: [ 218.385633] BTRFS: device fsid f3b9a0ff-85f4-40d0-a130-04db96d92836 devid 4 transid 67149 /dev/dm-4 Jan 18 15:24:24 nori kernel: [ 223.122442] BTRFS: device fsid f3b9a0ff-85f4-40d0-a130-04db96d92836 devid 1 transid 67149 /dev/dm-5 Jan 18 15:24:30 nori kernel: [ 228.995711] BTRFS: device fsid f3b9a0ff-85f4-40d0-a130-04db96d92836 devid 3 transid 67149 /dev/dm-6 Jan 18 15:24:43 nori kernel: [ 242.305643] BTRFS: device fsid f3b9a0ff-85f4-40d0-a130-04db96d92836 devid 0 transid 67149 /dev/dm-7 Jan 18 15:24:44 nori kernel: [ 243.036998] BTRFS info (device dm-7): allowing degraded mounts Jan 18 15:24:44 nori kernel: [ 243.037009] BTRFS info (device dm-7): disk space caching is enabled Jan 18 15:24:44 nori kernel: [ 243.037013] BTRFS: has skinny extents Jan 18 15:24:44 nori kernel: [ 243.038806] BTRFS warning (device dm-7): devid 2 uuid 4ac0186c-ad43-448c-855b-df265ae31975 is missing Jan 18 15:24:48 nori kernel: [ 247.008168] BTRFS: continuing dev_replace from <missing disk> (devid 2) to /dev/dm-7 @0% Jan 18 15:28:41 nori kernel: [ 479.247257] device vethAG2VFQ entered promiscuous mode Jan 18 15:28:41 nori kernel: [ 479.247302] IPv6: ADDRCONF(NETDEV_UP): vethAG2VFQ: link is not ready Jan 18 15:28:42 nori kernel: [ 480.263321] eth0: renamed from vethGCO23Y Jan 18 15:28:42 nori kernel: [ 480.289101] IPv6: ADDRCONF(NETDEV_CHANGE): vethAG2VFQ: link becomes ready Jan 18 15:28:42 nori kernel: [ 480.289150] lxc-bridge-nat: port 1(vethAG2VFQ) entered forwarding state Jan 18 15:28:42 nori kernel: [ 480.289172] lxc-bridge-nat: port 1(vethAG2VFQ) entered forwarding state Jan 18 15:48:41 nori kernel: [ 1676.618159] INFO: task btrfs-transacti:2351 blocked for more than 120 seconds. Jan 18 15:48:41 nori kernel: [ 1676.618167] Not tainted 4.3.0-1-amd64 #1 Jan 18 15:48:41 nori kernel: [ 1676.618170] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 18 15:48:41 nori kernel: [ 1676.618174] btrfs-transacti D ffffffff810a08c0 0 2351 2 0x00000000 Jan 18 15:48:41 nori kernel: [ 1676.618182] ffff88044c282100 0000000000000046 ffff88046c6a9180 ffff88044c36c000 Jan 18 15:48:41 nori kernel: [ 1676.618188] ffff88044c36ba50 7fffffffffffffff ffffffff815818d0 ffff88044c36bac8 Jan 18 15:48:41 nori kernel: [ 1676.618193] ffffea000fdaa880 ffffffff8158113f 0000000000000000 ffffffff81583ee9 Jan 18 15:48:41 nori kernel: [ 1676.618199] Call Trace: Jan 18 15:48:41 nori kernel: [ 1676.618211] [<ffffffff815818d0>] ? bit_wait+0x40/0x40 Jan 18 15:48:41 nori kernel: [ 1676.618216] [<ffffffff8158113f>] ? 
schedule+0x2f/0x70 Jan 18 15:48:41 nori kernel: [ 1676.618223] [<ffffffff81583ee9>] ? schedule_timeout+0x189/0x250 Jan 18 15:48:41 nori kernel: [ 1676.618262] [<ffffffffa038d930>] ? __raid56_parity_write+0x30/0x40 [btrfs] Jan 18 15:48:41 nori kernel: [ 1676.618295] [<ffffffffa038eb6f>] ? raid56_parity_write+0x15f/0x170 [btrfs] Jan 18 15:48:41 nori kernel: [ 1676.618324] [<ffffffffa038b87e>] ? btrfs_bio_counter_sub+0x1e/0x50 [btrfs] Jan 18 15:48:41 nori kernel: [ 1676.618355] [<ffffffffa0347e0b>] ? btrfs_map_bio+0xfb/0x330 [btrfs] Jan 18 15:48:41 nori kernel: [ 1676.618360] [<ffffffff815818d0>] ? bit_wait+0x40/0x40 Jan 18 15:48:41 nori kernel: [ 1676.618365] [<ffffffff815807cd>] ? io_schedule_timeout+0x9d/0x100 Jan 18 15:48:41 nori kernel: [ 1676.618370] [<ffffffff81581901>] ? bit_wait_io+0x31/0x50 Jan 18 15:48:41 nori kernel: [ 1676.618375] [<ffffffff815814d8>] ? __wait_on_bit+0x58/0x90 Jan 18 15:48:41 nori kernel: [ 1676.618402] [<ffffffffa031e042>] ? btrfs_submit_bio_hook+0xe2/0x190 [btrfs] Jan 18 15:48:41 nori kernel: [ 1676.618408] [<ffffffff8115e646>] ? wait_on_page_bit+0xc6/0xe0 Jan 18 15:48:41 nori kernel: [ 1676.618414] [<ffffffff810b15d0>] ? autoremove_wake_function+0x40/0x40 Jan 18 15:48:41 nori kernel: [ 1676.618445] [<ffffffffa033ce52>] ? extent_write_cache_pages.isra.30.constprop.50+0x252/0x360 [btrfs] Jan 18 15:48:41 nori kernel: [ 1676.618475] [<ffffffffa033e0ad>] ? extent_writepages+0x5d/0x90 [btrfs] Jan 18 15:48:41 nori kernel: [ 1676.618503] [<ffffffffa0320660>] ? btrfs_real_readdir+0x530/0x530 [btrfs] Jan 18 15:48:41 nori kernel: [ 1676.618509] [<ffffffff81160465>] ? __filemap_fdatawrite_range+0xc5/0x100 Jan 18 15:48:41 nori kernel: [ 1676.618537] [<ffffffffa0331dfb>] ? btrfs_fdatawrite_range+0x1b/0x50 [btrfs] Jan 18 15:48:41 nori kernel: [ 1676.618567] [<ffffffffa035e397>] ? __btrfs_write_out_cache.isra.25+0x407/0x450 [btrfs] Jan 18 15:48:41 nori kernel: [ 1676.618596] [<ffffffffa035f891>] ? btrfs_write_out_cache+0x91/0xe0 [btrfs] Jan 18 15:48:41 nori kernel: [ 1676.618619] [<ffffffffa030adea>] ? btrfs_start_dirty_block_groups+0x1ea/0x3e0 [btrfs] Jan 18 15:48:41 nori kernel: [ 1676.618644] [<ffffffffa031be44>] ? btrfs_commit_transaction+0x1c4/0xad0 [btrfs] Jan 18 15:48:41 nori kernel: [ 1676.618668] [<ffffffffa031c7e7>] ? start_transaction+0x97/0x590 [btrfs] Jan 18 15:48:41 nori kernel: [ 1676.618690] [<ffffffffa03173b0>] ? transaction_kthread+0x220/0x240 [btrfs] Jan 18 15:48:41 nori kernel: [ 1676.618711] [<ffffffffa0317190>] ? btrfs_cleanup_transaction+0x540/0x540 [btrfs] Jan 18 15:48:41 nori kernel: [ 1676.618718] [<ffffffff810904bd>] ? kthread+0xcd/0xf0 Jan 18 15:48:41 nori kernel: [ 1676.618724] [<ffffffff810903f0>] ? kthread_create_on_node+0x190/0x190 Jan 18 15:48:41 nori kernel: [ 1676.618730] [<ffffffff8158514f>] ? ret_from_fork+0x3f/0x70 Jan 18 15:48:41 nori kernel: [ 1676.618736] [<ffffffff810903f0>] ? kthread_create_on_node+0x190/0x190 Jan 18 15:50:41 nori kernel: [ 1796.383713] INFO: task btrfs-transacti:2351 blocked for more than 120 seconds. Jan 18 15:50:41 nori kernel: [ 1796.383721] Not tainted 4.3.0-1-amd64 #1 Jan 18 15:50:41 nori kernel: [ 1796.383724] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Jan 18 15:50:41 nori kernel: [ 1796.383727] btrfs-transacti D ffffffff810a08c0 0 2351 2 0x00000000 Jan 18 15:50:41 nori kernel: [ 1796.383736] ffff88044c282100 0000000000000046 ffff88046c6a9180 ffff88044c36c000 Jan 18 15:50:41 nori kernel: [ 1796.383742] ffff88044c36ba50 7fffffffffffffff ffffffff815818d0 ffff88044c36bac8 Jan 18 15:50:41 nori kernel: [ 1796.383747] ffffea000fdaa880 ffffffff8158113f 0000000000000000 ffffffff81583ee9 Jan 18 15:50:41 nori kernel: [ 1796.383752] Call Trace: Jan 18 15:50:41 nori kernel: [ 1796.383764] [<ffffffff815818d0>] ? bit_wait+0x40/0x40 Jan 18 15:50:41 nori kernel: [ 1796.383770] [<ffffffff8158113f>] ? schedule+0x2f/0x70 Jan 18 15:50:41 nori kernel: [ 1796.383776] [<ffffffff81583ee9>] ? schedule_timeout+0x189/0x250 Jan 18 15:50:41 nori kernel: [ 1796.383817] [<ffffffffa038d930>] ? __raid56_parity_write+0x30/0x40 [btrfs] Jan 18 15:50:41 nori kernel: [ 1796.383853] [<ffffffffa038eb6f>] ? raid56_parity_write+0x15f/0x170 [btrfs] Jan 18 15:50:41 nori kernel: [ 1796.383884] [<ffffffffa038b87e>] ? btrfs_bio_counter_sub+0x1e/0x50 [btrfs] Jan 18 15:50:41 nori kernel: [ 1796.383916] [<ffffffffa0347e0b>] ? btrfs_map_bio+0xfb/0x330 [btrfs] Jan 18 15:50:41 nori kernel: [ 1796.383921] [<ffffffff815818d0>] ? bit_wait+0x40/0x40 Jan 18 15:50:41 nori kernel: [ 1796.383926] [<ffffffff815807cd>] ? io_schedule_timeout+0x9d/0x100 Jan 18 15:50:41 nori kernel: [ 1796.383931] [<ffffffff81581901>] ? bit_wait_io+0x31/0x50 Jan 18 15:50:41 nori kernel: [ 1796.383936] [<ffffffff815814d8>] ? __wait_on_bit+0x58/0x90 Jan 18 15:50:41 nori kernel: [ 1796.383963] [<ffffffffa031e042>] ? btrfs_submit_bio_hook+0xe2/0x190 [btrfs] Jan 18 15:50:41 nori kernel: [ 1796.383970] [<ffffffff8115e646>] ? wait_on_page_bit+0xc6/0xe0 Jan 18 15:50:41 nori kernel: [ 1796.383976] [<ffffffff810b15d0>] ? autoremove_wake_function+0x40/0x40 Jan 18 15:50:41 nori kernel: [ 1796.384008] [<ffffffffa033ce52>] ? extent_write_cache_pages.isra.30.constprop.50+0x252/0x360 [btrfs] Jan 18 15:50:41 nori kernel: [ 1796.384040] [<ffffffffa033e0ad>] ? extent_writepages+0x5d/0x90 [btrfs] Jan 18 15:50:41 nori kernel: [ 1796.384068] [<ffffffffa0320660>] ? btrfs_real_readdir+0x530/0x530 [btrfs] Jan 18 15:50:41 nori kernel: [ 1796.384074] [<ffffffff81160465>] ? __filemap_fdatawrite_range+0xc5/0x100 Jan 18 15:50:41 nori kernel: [ 1796.384103] [<ffffffffa0331dfb>] ? btrfs_fdatawrite_range+0x1b/0x50 [btrfs] Jan 18 15:50:41 nori kernel: [ 1796.384133] [<ffffffffa035e397>] ? __btrfs_write_out_cache.isra.25+0x407/0x450 [btrfs] Jan 18 15:50:41 nori kernel: [ 1796.384162] [<ffffffffa035f891>] ? btrfs_write_out_cache+0x91/0xe0 [btrfs] Jan 18 15:50:41 nori kernel: [ 1796.384186] [<ffffffffa030adea>] ? btrfs_start_dirty_block_groups+0x1ea/0x3e0 [btrfs] Jan 18 15:50:41 nori kernel: [ 1796.384212] [<ffffffffa031be44>] ? btrfs_commit_transaction+0x1c4/0xad0 [btrfs] Jan 18 15:50:41 nori kernel: [ 1796.384235] [<ffffffffa031c7e7>] ? start_transaction+0x97/0x590 [btrfs] Jan 18 15:50:41 nori kernel: [ 1796.384258] [<ffffffffa03173b0>] ? transaction_kthread+0x220/0x240 [btrfs] Jan 18 15:50:41 nori kernel: [ 1796.384280] [<ffffffffa0317190>] ? btrfs_cleanup_transaction+0x540/0x540 [btrfs] Jan 18 15:50:41 nori kernel: [ 1796.384287] [<ffffffff810904bd>] ? kthread+0xcd/0xf0 Jan 18 15:50:41 nori kernel: [ 1796.384293] [<ffffffff810903f0>] ? kthread_create_on_node+0x190/0x190 Jan 18 15:50:41 nori kernel: [ 1796.384300] [<ffffffff8158514f>] ? ret_from_fork+0x3f/0x70 Jan 18 15:50:41 nori kernel: [ 1796.384305] [<ffffffff810903f0>] ? 
kthread_create_on_node+0x190/0x190 Jan 18 15:54:41 nori kernel: [ 2035.918749] INFO: task btrfs-transacti:2351 blocked for more than 120 seconds. Jan 18 15:54:41 nori kernel: [ 2035.918757] Not tainted 4.3.0-1-amd64 #1 Jan 18 15:54:41 nori kernel: [ 2035.918760] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 18 15:54:41 nori kernel: [ 2035.918763] btrfs-transacti D ffffffff810a08c0 0 2351 2 0x00000000 Jan 18 15:54:41 nori kernel: [ 2035.918772] ffff88044c282100 0000000000000046 ffff88046c6f6140 ffff88044c36c000 Jan 18 15:54:41 nori kernel: [ 2035.918777] ffff88046a970d08 ffff88046a970cd0 ffff88044c36bde8 ffff880450288d10 Jan 18 15:54:41 nori kernel: [ 2035.918782] 0000000000000000 ffffffff8158113f ffff88046a970000 ffffffffa037862d Jan 18 15:54:41 nori kernel: [ 2035.918788] Call Trace: Jan 18 15:54:41 nori kernel: [ 2035.918800] [<ffffffff8158113f>] ? schedule+0x2f/0x70 Jan 18 15:54:41 nori kernel: [ 2035.918844] [<ffffffffa037862d>] ? btrfs_scrub_pause+0x8d/0x100 [btrfs] Jan 18 15:54:41 nori kernel: [ 2035.918851] [<ffffffff810b1590>] ? wake_atomic_t_function+0x60/0x60 Jan 18 15:54:41 nori kernel: [ 2035.918881] [<ffffffffa031c0fa>] ? btrfs_commit_transaction+0x47a/0xad0 [btrfs] Jan 18 15:54:41 nori kernel: [ 2035.918907] [<ffffffffa031c7e7>] ? start_transaction+0x97/0x590 [btrfs] Jan 18 15:54:41 nori kernel: [ 2035.918932] [<ffffffffa03173b0>] ? transaction_kthread+0x220/0x240 [btrfs] Jan 18 15:54:41 nori kernel: [ 2035.918954] [<ffffffffa0317190>] ? btrfs_cleanup_transaction+0x540/0x540 [btrfs] Jan 18 15:54:41 nori kernel: [ 2035.918961] [<ffffffff810904bd>] ? kthread+0xcd/0xf0 Jan 18 15:54:41 nori kernel: [ 2035.918968] [<ffffffff810903f0>] ? kthread_create_on_node+0x190/0x190 Jan 18 15:54:41 nori kernel: [ 2035.918975] [<ffffffff8158514f>] ? ret_from_fork+0x3f/0x70 Jan 18 15:54:41 nori kernel: [ 2035.918980] [<ffffffff810903f0>] ? kthread_create_on_node+0x190/0x190 Jan 18 15:54:41 nori kernel: [ 2035.918999] INFO: task gzip:5211 blocked for more than 120 seconds. Jan 18 15:54:41 nori kernel: [ 2035.919002] Not tainted 4.3.0-1-amd64 #1 Jan 18 15:54:41 nori kernel: [ 2035.919004] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 18 15:54:41 nori kernel: [ 2035.919007] gzip D ffffffff810a08c0 0 5211 5173 0x00000100 Jan 18 15:54:41 nori kernel: [ 2035.919012] ffff8803f941f0c0 0000000000000086 ffff88046c6a8140 ffff8803de66c000 Jan 18 15:54:41 nori kernel: [ 2035.919018] ffff88044c2991f0 ffff8803f4f3eae0 ffff88044c2991f0 0000000000000001 Jan 18 15:54:41 nori kernel: [ 2035.919023] ffff8803f941f0c0 ffffffff8158113f ffff880450288d10 ffffffffa031b11d Jan 18 15:54:41 nori kernel: [ 2035.919028] Call Trace: Jan 18 15:54:41 nori kernel: [ 2035.919033] [<ffffffff8158113f>] ? schedule+0x2f/0x70 Jan 18 15:54:41 nori kernel: [ 2035.919056] [<ffffffffa031b11d>] ? wait_current_trans.isra.21+0xcd/0x110 [btrfs] Jan 18 15:54:41 nori kernel: [ 2035.919061] [<ffffffff810b1590>] ? wake_atomic_t_function+0x60/0x60 Jan 18 15:54:41 nori kernel: [ 2035.919083] [<ffffffffa031cb66>] ? start_transaction+0x416/0x590 [btrfs] Jan 18 15:54:41 nori kernel: [ 2035.919088] [<ffffffff811e5935>] ? __d_lookup+0x75/0x150 Jan 18 15:54:41 nori kernel: [ 2035.919111] [<ffffffffa031da0f>] ? __unlink_start_trans+0x1f/0xb0 [btrfs] Jan 18 15:54:41 nori kernel: [ 2035.919135] [<ffffffffa0325d23>] ? btrfs_unlink+0x23/0xa0 [btrfs] Jan 18 15:54:41 nori kernel: [ 2035.919141] [<ffffffff811d9667>] ? 
vfs_unlink+0xe7/0x190 Jan 18 15:54:41 nori kernel: [ 2035.919147] [<ffffffff811dcaa9>] ? do_unlinkat+0x299/0x2d0 Jan 18 15:54:41 nori kernel: [ 2035.919154] [<ffffffff81584db2>] ? system_call_fast_compare_end+0xc/0x67 Jan 18 15:56:41 nori kernel: [ 2155.710465] INFO: task btrfs-transacti:2351 blocked for more than 120 seconds. Jan 18 15:56:41 nori kernel: [ 2155.710473] Not tainted 4.3.0-1-amd64 #1 Jan 18 15:56:41 nori kernel: [ 2155.710475] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 18 15:56:41 nori kernel: [ 2155.710478] btrfs-transacti D ffffffff810a08c0 0 2351 2 0x00000000 Jan 18 15:56:41 nori kernel: [ 2155.710487] ffff88044c282100 0000000000000046 ffff88046c6f6140 ffff88044c36c000 Jan 18 15:56:41 nori kernel: [ 2155.710493] ffff88046a970d08 ffff88046a970cd0 ffff88044c36bde8 ffff880450288d10 Jan 18 15:56:41 nori kernel: [ 2155.710498] 0000000000000000 ffffffff8158113f ffff88046a970000 ffffffffa037862d Jan 18 15:56:41 nori kernel: [ 2155.710504] Call Trace: Jan 18 15:56:41 nori kernel: [ 2155.710516] [<ffffffff8158113f>] ? schedule+0x2f/0x70 Jan 18 15:56:41 nori kernel: [ 2155.710559] [<ffffffffa037862d>] ? btrfs_scrub_pause+0x8d/0x100 [btrfs] Jan 18 15:56:41 nori kernel: [ 2155.710566] [<ffffffff810b1590>] ? wake_atomic_t_function+0x60/0x60 Jan 18 15:56:41 nori kernel: [ 2155.710597] [<ffffffffa031c0fa>] ? btrfs_commit_transaction+0x47a/0xad0 [btrfs] Jan 18 15:56:41 nori kernel: [ 2155.710624] [<ffffffffa031c7e7>] ? start_transaction+0x97/0x590 [btrfs] Jan 18 15:56:41 nori kernel: [ 2155.710650] [<ffffffffa03173b0>] ? transaction_kthread+0x220/0x240 [btrfs] Jan 18 15:56:41 nori kernel: [ 2155.710674] [<ffffffffa0317190>] ? btrfs_cleanup_transaction+0x540/0x540 [btrfs] Jan 18 15:56:41 nori kernel: [ 2155.710681] [<ffffffff810904bd>] ? kthread+0xcd/0xf0 Jan 18 15:56:41 nori kernel: [ 2155.710688] [<ffffffff810903f0>] ? kthread_create_on_node+0x190/0x190 Jan 18 15:56:41 nori kernel: [ 2155.710695] [<ffffffff8158514f>] ? ret_from_fork+0x3f/0x70 Jan 18 15:56:41 nori kernel: [ 2155.710700] [<ffffffff810903f0>] ? kthread_create_on_node+0x190/0x190 Jan 18 15:56:41 nori kernel: [ 2155.710719] INFO: task gzip:5211 blocked for more than 120 seconds. Jan 18 15:56:41 nori kernel: [ 2155.710722] Not tainted 4.3.0-1-amd64 #1 Jan 18 15:56:41 nori kernel: [ 2155.710724] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 18 15:56:41 nori kernel: [ 2155.710727] gzip D ffffffff810a08c0 0 5211 5173 0x00000100 Jan 18 15:56:41 nori kernel: [ 2155.710732] ffff8803f941f0c0 0000000000000086 ffff88046c6a8140 ffff8803de66c000 Jan 18 15:56:41 nori kernel: [ 2155.710737] ffff88044c2991f0 ffff8803f4f3eae0 ffff88044c2991f0 0000000000000001 Jan 18 15:56:41 nori kernel: [ 2155.710742] ffff8803f941f0c0 ffffffff8158113f ffff880450288d10 ffffffffa031b11d Jan 18 15:56:41 nori kernel: [ 2155.710747] Call Trace: Jan 18 15:56:41 nori kernel: [ 2155.710753] [<ffffffff8158113f>] ? schedule+0x2f/0x70 Jan 18 15:56:41 nori kernel: [ 2155.710776] [<ffffffffa031b11d>] ? wait_current_trans.isra.21+0xcd/0x110 [btrfs] Jan 18 15:56:41 nori kernel: [ 2155.710781] [<ffffffff810b1590>] ? wake_atomic_t_function+0x60/0x60 Jan 18 15:56:41 nori kernel: [ 2155.710804] [<ffffffffa031cb66>] ? start_transaction+0x416/0x590 [btrfs] Jan 18 15:56:41 nori kernel: [ 2155.710809] [<ffffffff811e5935>] ? __d_lookup+0x75/0x150 Jan 18 15:56:41 nori kernel: [ 2155.710833] [<ffffffffa031da0f>] ? 
__unlink_start_trans+0x1f/0xb0 [btrfs] Jan 18 15:56:41 nori kernel: [ 2155.710859] [<ffffffffa0325d23>] ? btrfs_unlink+0x23/0xa0 [btrfs] Jan 18 15:56:41 nori kernel: [ 2155.710868] [<ffffffff811d9667>] ? vfs_unlink+0xe7/0x190 Jan 18 15:56:41 nori kernel: [ 2155.710874] [<ffffffff811dcaa9>] ? do_unlinkat+0x299/0x2d0 Jan 18 15:56:41 nori kernel: [ 2155.710882] [<ffffffff81584db2>] ? system_call_fast_compare_end+0xc/0x67 Jan 18 15:58:41 nori kernel: [ 2275.507816] INFO: task btrfs-transacti:2351 blocked for more than 120 seconds. Jan 18 15:58:41 nori kernel: [ 2275.507824] Not tainted 4.3.0-1-amd64 #1 Jan 18 15:58:41 nori kernel: [ 2275.507826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 18 15:58:41 nori kernel: [ 2275.507830] btrfs-transacti D ffffffff810a08c0 0 2351 2 0x00000000 Jan 18 15:58:41 nori kernel: [ 2275.507838] ffff88044c282100 0000000000000046 ffff88046c6f6140 ffff88044c36c000 Jan 18 15:58:41 nori kernel: [ 2275.507844] ffff88046a970d08 ffff88046a970cd0 ffff88044c36bde8 ffff880450288d10 Jan 18 15:58:41 nori kernel: [ 2275.507849] 0000000000000000 ffffffff8158113f ffff88046a970000 ffffffffa037862d Jan 18 15:58:41 nori kernel: [ 2275.507855] Call Trace: Jan 18 15:58:41 nori kernel: [ 2275.507866] [<ffffffff8158113f>] ? schedule+0x2f/0x70 Jan 18 15:58:41 nori kernel: [ 2275.507909] [<ffffffffa037862d>] ? btrfs_scrub_pause+0x8d/0x100 [btrfs] Jan 18 15:58:41 nori kernel: [ 2275.507916] [<ffffffff810b1590>] ? wake_atomic_t_function+0x60/0x60 Jan 18 15:58:41 nori kernel: [ 2275.507946] [<ffffffffa031c0fa>] ? btrfs_commit_transaction+0x47a/0xad0 [btrfs] Jan 18 15:58:41 nori kernel: [ 2275.507974] [<ffffffffa031c7e7>] ? start_transaction+0x97/0x590 [btrfs] Jan 18 15:58:41 nori kernel: [ 2275.508000] [<ffffffffa03173b0>] ? transaction_kthread+0x220/0x240 [btrfs] Jan 18 15:58:41 nori kernel: [ 2275.508022] [<ffffffffa0317190>] ? btrfs_cleanup_transaction+0x540/0x540 [btrfs] Jan 18 15:58:41 nori kernel: [ 2275.508030] [<ffffffff810904bd>] ? kthread+0xcd/0xf0 Jan 18 15:58:41 nori kernel: [ 2275.508036] [<ffffffff810903f0>] ? kthread_create_on_node+0x190/0x190 Jan 18 15:58:41 nori kernel: [ 2275.508043] [<ffffffff8158514f>] ? ret_from_fork+0x3f/0x70 Jan 18 15:58:41 nori kernel: [ 2275.508049] [<ffffffff810903f0>] ? kthread_create_on_node+0x190/0x190 Jan 18 15:58:41 nori kernel: [ 2275.508067] INFO: task gzip:5211 blocked for more than 120 seconds. Jan 18 15:58:41 nori kernel: [ 2275.508070] Not tainted 4.3.0-1-amd64 #1 Jan 18 15:58:41 nori kernel: [ 2275.508072] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 18 15:58:41 nori kernel: [ 2275.508075] gzip D ffffffff810a08c0 0 5211 5173 0x00000100 Jan 18 15:58:41 nori kernel: [ 2275.508080] ffff8803f941f0c0 0000000000000086 ffff88046c6a8140 ffff8803de66c000 Jan 18 15:58:41 nori kernel: [ 2275.508086] ffff88044c2991f0 ffff8803f4f3eae0 ffff88044c2991f0 0000000000000001 Jan 18 15:58:41 nori kernel: [ 2275.508091] ffff8803f941f0c0 ffffffff8158113f ffff880450288d10 ffffffffa031b11d Jan 18 15:58:41 nori kernel: [ 2275.508095] Call Trace: Jan 18 15:58:41 nori kernel: [ 2275.508101] [<ffffffff8158113f>] ? schedule+0x2f/0x70 Jan 18 15:58:41 nori kernel: [ 2275.508123] [<ffffffffa031b11d>] ? wait_current_trans.isra.21+0xcd/0x110 [btrfs] Jan 18 15:58:41 nori kernel: [ 2275.508128] [<ffffffff810b1590>] ? wake_atomic_t_function+0x60/0x60 Jan 18 15:58:41 nori kernel: [ 2275.508152] [<ffffffffa031cb66>] ? 
start_transaction+0x416/0x590 [btrfs] Jan 18 15:58:41 nori kernel: [ 2275.508157] [<ffffffff811e5935>] ? __d_lookup+0x75/0x150 Jan 18 15:58:41 nori kernel: [ 2275.508181] [<ffffffffa031da0f>] ? __unlink_start_trans+0x1f/0xb0 [btrfs] Jan 18 15:58:41 nori kernel: [ 2275.508207] [<ffffffffa0325d23>] ? btrfs_unlink+0x23/0xa0 [btrfs] Jan 18 15:58:41 nori kernel: [ 2275.508213] [<ffffffff811d9667>] ? vfs_unlink+0xe7/0x190 Jan 18 15:58:41 nori kernel: [ 2275.508219] [<ffffffff811dcaa9>] ? do_unlinkat+0x299/0x2d0 Jan 18 15:58:41 nori kernel: [ 2275.508226] [<ffffffff81584db2>] ? system_call_fast_compare_end+0xc/0x67 Jan 18 16:00:41 nori kernel: [ 2395.299270] INFO: task btrfs-transacti:2351 blocked for more than 120 seconds. Jan 18 16:00:41 nori kernel: [ 2395.299278] Not tainted 4.3.0-1-amd64 #1 Jan 18 16:00:41 nori kernel: [ 2395.299281] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 18 16:00:41 nori kernel: [ 2395.299284] btrfs-transacti D ffffffff810a08c0 0 2351 2 0x00000000 Jan 18 16:00:41 nori kernel: [ 2395.299293] ffff88044c282100 0000000000000046 ffff88046c6f6140 ffff88044c36c000 Jan 18 16:00:41 nori kernel: [ 2395.299299] ffff88046a970d08 ffff88046a970cd0 ffff88044c36bde8 ffff880450288d10 Jan 18 16:00:41 nori kernel: [ 2395.299304] 0000000000000000 ffffffff8158113f ffff88046a970000 ffffffffa037862d Jan 18 16:00:41 nori kernel: [ 2395.299309] Call Trace: Jan 18 16:00:41 nori kernel: [ 2395.299322] [<ffffffff8158113f>] ? schedule+0x2f/0x70 Jan 18 16:00:41 nori kernel: [ 2395.299365] [<ffffffffa037862d>] ? btrfs_scrub_pause+0x8d/0x100 [btrfs] Jan 18 16:00:41 nori kernel: [ 2395.299371] [<ffffffff810b1590>] ? wake_atomic_t_function+0x60/0x60 Jan 18 16:00:41 nori kernel: [ 2395.299402] [<ffffffffa031c0fa>] ? btrfs_commit_transaction+0x47a/0xad0 [btrfs] Jan 18 16:00:41 nori kernel: [ 2395.299429] [<ffffffffa031c7e7>] ? start_transaction+0x97/0x590 [btrfs] Jan 18 16:00:41 nori kernel: [ 2395.299455] [<ffffffffa03173b0>] ? transaction_kthread+0x220/0x240 [btrfs] Jan 18 16:00:41 nori kernel: [ 2395.299478] [<ffffffffa0317190>] ? btrfs_cleanup_transaction+0x540/0x540 [btrfs] Jan 18 16:00:41 nori kernel: [ 2395.299485] [<ffffffff810904bd>] ? kthread+0xcd/0xf0 Jan 18 16:00:41 nori kernel: [ 2395.299492] [<ffffffff810903f0>] ? kthread_create_on_node+0x190/0x190 Jan 18 16:00:41 nori kernel: [ 2395.299499] [<ffffffff8158514f>] ? ret_from_fork+0x3f/0x70 Jan 18 16:00:41 nori kernel: [ 2395.299505] [<ffffffff810903f0>] ? kthread_create_on_node+0x190/0x190 Jan 18 16:00:41 nori kernel: [ 2395.299522] INFO: task gzip:5211 blocked for more than 120 seconds. Jan 18 16:00:41 nori kernel: [ 2395.299526] Not tainted 4.3.0-1-amd64 #1 Jan 18 16:00:41 nori kernel: [ 2395.299528] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 18 16:00:41 nori kernel: [ 2395.299530] gzip D ffffffff810a08c0 0 5211 5173 0x00000100 Jan 18 16:00:41 nori kernel: [ 2395.299536] ffff8803f941f0c0 0000000000000086 ffff88046c6a8140 ffff8803de66c000 Jan 18 16:00:41 nori kernel: [ 2395.299541] ffff88044c2991f0 ffff8803f4f3eae0 ffff88044c2991f0 0000000000000001 Jan 18 16:00:41 nori kernel: [ 2395.299546] ffff8803f941f0c0 ffffffff8158113f ffff880450288d10 ffffffffa031b11d Jan 18 16:00:41 nori kernel: [ 2395.299551] Call Trace: Jan 18 16:00:41 nori kernel: [ 2395.299557] [<ffffffff8158113f>] ? schedule+0x2f/0x70 Jan 18 16:00:41 nori kernel: [ 2395.299580] [<ffffffffa031b11d>] ? wait_current_trans.isra.21+0xcd/0x110 [btrfs] Jan 18 16:00:41 nori kernel: [ 2395.299585] [<ffffffff810b1590>] ? 
wake_atomic_t_function+0x60/0x60
Jan 18 16:00:41 nori kernel: [ 2395.299608] [<ffffffffa031cb66>] ? start_transaction+0x416/0x590 [btrfs]
Jan 18 16:00:41 nori kernel: [ 2395.299614] [<ffffffff811e5935>] ? __d_lookup+0x75/0x150
Jan 18 16:00:41 nori kernel: [ 2395.299637] [<ffffffffa031da0f>] ? __unlink_start_trans+0x1f/0xb0 [btrfs]
Jan 18 16:00:41 nori kernel: [ 2395.299662] [<ffffffffa0325d23>] ? btrfs_unlink+0x23/0xa0 [btrfs]
Jan 18 16:00:41 nori kernel: [ 2395.299669] [<ffffffff811d9667>] ? vfs_unlink+0xe7/0x190
Jan 18 16:00:41 nori kernel: [ 2395.299674] [<ffffffff811dcaa9>] ? do_unlinkat+0x299/0x2d0
Jan 18 16:00:41 nori kernel: [ 2395.299681] [<ffffffff81584db2>] ? system_call_fast_compare_end+0xc/0x67

After that last line, I cancelled the replace and unmounted the FS, and
then proceeded to start it again with the previously missing drive.

>> * Cancel replace: "btrfs replace cancel /mnt"
>> * Unmount raid6 FS
>> * Disconnect array
>> * Physically add device FOO back to array
>> * Reconnect array
>> * Mount raid6 array normally (no "-o degraded")
>> * Run "btrfs replace start 2 /dev/BAR /mnt"
>
> Hmm. Open question whether 'btrfs replace cancel' actually marked
> /dev/BAR as wiped. If it doesn't, then this 2nd replace start should
> have failed unless you had used -f, or you used wipefs -a. If it's not
> wiped by any of the above, then I'd expect it's possible things get
> messy.

I'm fairly positive that it must have marked it as wiped, because I
don't remember it failing or having to use -f.

> Some others (no devs, however) have disagreed with me on this, so take
> it with a grain of salt, but I don't understand the rationale of
> running scrub on degraded arrays. The first order of business is to
> get it non-degraded. If that can't be done, scrub is pointless. Of
> course it's going to produce millions of unrecoverable errors. There's
> a device missing, so I'd expect many unrecoverable errors during a
> degraded scrub.

Yes, I searched and found your opinion on this. I was fairly confident
that the "unrecoverable errors" in the scrub were due to it being
degraded. I did the scrub because I felt I was running out of options.

My only suggestion, after going through this experience, would be that
btrfs scrub fail fast and return a hard error like "Cannot scrub
degraded mount." That would have saved me some time.

>> From a black box perspective, this led me to believe that the
>> corruption happened during the replace operation after mounting
>> normally after first mounting with "-o degraded." Of course,
>> knowledge of the internals could easily verify this.
>
> Filesystems are really difficult, so even knowledge of the internals
> doesn't guarantee the devs will understand where the problem first
> started in this case.

Fair point. I have experience writing a file system myself and I agree.
Though sometimes it helps when you have a non-zero mental model of the
internals.

^ permalink raw reply [flat|nested] 8+ messages in thread
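
(A tiny userspace guard along the lines Rian proposes, until btrfs-progs
grows the check itself; the degraded-mount detection via findmnt is an
assumption about one workable approach, not an existing btrfs feature.)

#!/bin/sh
# refuse to scrub a filesystem that is mounted with -o degraded
MNT=${1:-/mnt}
if findmnt -no OPTIONS "$MNT" | tr ',' '\n' | grep -qx degraded; then
    echo "Cannot scrub degraded mount: $MNT" >&2
    exit 1
fi
exec btrfs scrub start -B "$MNT"
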
* Re: FS corruption when mounting non-degraded after mounting degraded
2016-01-22 1:21 ` Rian Hunter
@ 2016-01-22 3:38 ` Chris Murphy
0 siblings, 0 replies; 8+ messages in thread
From: Chris Murphy @ 2016-01-22 3:38 UTC (permalink / raw)
To: Rian Hunter; +Cc: Chris Murphy, Btrfs BTRFS

On Thu, Jan 21, 2016 at 6:21 PM, Rian Hunter <rian@thelig.ht> wrote:
>
> "SCT Error Recovery Control command not supported" is printed for all my
> devices for the "smartctl" command.
>
> # cat /sys/block/sd{a,b,c,d,e,f,g,h}/device/timeout
> 30
> 30
> 30
> 30
> 30
> 30
> 30
> 30

This combination is a misconfigured setup. The SCT ERC value must be
lower than the kernel block device timeout value. When SCT ERC is not
supported, that means the value is unknown, but in effect it means it
can be stratospheric; 120+ seconds is not unheard of. So your use case
must be prepared for non-deterministic, rare (hopefully), total stalls
of 2-3 minutes.

If your use case can't ever tolerate such a hang while a drive is in
deep recovery doing a read on a marginal/bad sector, then the drives are
disqualified for the use case: you have to get drives that have
configurable SCT ERC, and do something like 'smartctl -l scterc,70,70'
on every device at startup, since it's not a persistent setting.

In the meantime you need to change the block device timeout for all
drives to something nutty like 160. This also is not a persistent
setting. This will permit a drive with a bad sector to actually report a
read failure rather than the SATA link being reset by the kernel. The
kernel uses this timeout value to know when to give up waiting for a
hung drive, and the drive is hanging because it's in deep read recovery,
so its whole queue just stalls until there's a read error or the read
succeeds.

What will suck is that some of these deep recoveries will work, so
instead of getting a read error, which is a prerequisite for Btrfs (or
md raid or lvm raid or any raid) to fix the slow sector, the drive just
gets slower and slower when it has one or more frequently used bad
blocks *until* it hits the drive's own internal timeout. The workaround
for this (instead of waiting for that sector to get bad enough to
actually fail to read) is to do a full balance. That rewrites all data,
and sectors with recent writes are fast. Any persistently bad sectors at
write time will be remapped by drive firmware in favor of reserve
sectors.

>> Do you have dmesg output that includes sysrq+w at the time of the hung
>> task warnings? That's pretty much always requested by devs in these
>> cases.
>
> This is dmesg before the freeze:

https://www.kernel.org/doc/Documentation/sysrq.txt

If you get a hung task message, enable sysrq, then write w to the
trigger path, then capture dmesg to a file and post that to a bug
report.

> Yes, I searched and found your opinion on this. I was fairly confident
> that the "unrecoverable errors" in the scrub were due to it being
> degraded. I did the scrub because I felt I was running out of options.
>
> My only suggestion, after going through this experience, would be that
> btrfs scrub fail fast and return a hard error like "Cannot scrub
> degraded mount." That would have saved me some time.

Yes, that's my position also, until a dev tells me to shush. It'd
probably be a pretty straightforward patch for btrfs-progs to have scrub
first check for a degraded mount and fail with the error message you
propose if it is; David might accept it! But if he doesn't, he'll
probably suggest a better way to handle such cases if there is one.

Btrfs is always doing a passive on-the-fly scrub anyway. An active read
and write scrub, where every single stripe *will* have a read error that
the scrub code has to do something with, just seems to me like taxing an
array that's already under duress. Maybe a compromise is a -f flag that
forces scrub anyway? Maybe it's a valid stress test, but I still remain
unconvinced it's a good production task.

>>> From a black box perspective, this led me to believe that the
>>> corruption happened during the replace operation after mounting
>>> normally after first mounting with "-o degraded." Of course,
>>> knowledge of the internals could easily verify this.
>>
>> Filesystems are really difficult, so even knowledge of the internals
>> doesn't guarantee the devs will understand where the problem first
>> started in this case.
>
> Fair point. I have experience writing a file system myself and I agree.
> Though sometimes it helps when you have a non-zero mental model of the
> internals.

Well, at least they can look at the call traces, and maybe ask some
intelligent questions. I can't do that yet.

--
Chris Murphy

^ permalink raw reply [flat|nested] 8+ messages in thread
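
(A sketch of the boot-time tuning Chris describes above; the 70/70 and
180 values and the device list are illustrative, neither setting
survives a reboot, and the "not supported" string is matched because
that is what smartctl prints for drives without configurable SCT ERC.)

#!/bin/sh
# run once at boot (e.g. from rc.local); adjust the device list to taste
for dev in sda sdb sdc sdd sde sdf sdg sdh; do
    if smartctl -l scterc /dev/$dev | grep -q 'not supported'; then
        # no configurable ERC: give the kernel a long timeout so a drive
        # in deep recovery reports a read error instead of a link reset
        echo 180 > /sys/block/$dev/device/timeout
    else
        # ERC is configurable: cap recovery at 7.0 seconds, well under
        # the default 30 second kernel block-device timeout
        smartctl -q silent -l scterc,70,70 /dev/$dev
        echo 30 > /sys/block/$dev/device/timeout
    fi
done
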
end of thread, other threads: [~2016-01-22 3:38 UTC | newest]

Thread overview: 8+ messages
2016-01-19 11:28 FS corruption when mounting non-degraded after mounting degraded Rian Hunter
2016-01-19 13:54 ` Duncan
2016-01-21 16:26 ` Rich Freeman
2016-01-21 17:15 ` Chris Murphy
2016-01-21 22:25 ` Rian Hunter
2016-01-21 23:51 ` Chris Murphy
2016-01-22 1:21 ` Rian Hunter
2016-01-22 3:38 ` Chris Murphy