Linux Btrfs filesystem development

* 4.2.6: livelock in recovery (free_reloc_roots)?
@ 2015-11-20  9:04 Lukas Pirl
  2015-11-21  0:37 ` Lukas Pirl
  2016-03-02 12:56 ` Still in 4.4.0: livelock in recovery (free_reloc_roots) Lukas Pirl
  0 siblings, 2 replies; 7+ messages in thread
From: Lukas Pirl @ 2015-11-20  9:04 UTC (permalink / raw)
  To: linux-btrfs

Dear list,

I am (still) trying to recover a RAID1 that can only be mounted
recovery,degraded,ro.

I experienced an issue that might be interesting for you: I tried to
mount the file system rw,recovery and the kernel ended up burning one
core (and only one specific core, never scheduled to another one).

The watchdog printed a stack trace roughly every 20 seconds. Only a few
distinct stack traces showed up, printed in alternation (see below).
After a few hours with the mount command still being blocked and without
visible IO activity, the system was power-cycled.

Summary:

Call Trace:
 [<ffffffffa0309641>] ? free_reloc_roots+0x11/0x30 [btrfs]
 [<ffffffffa030964d>] ? free_reloc_roots+0x1d/0x30 [btrfs]
 [<ffffffffa030f6e5>] ? merge_reloc_roots+0x165/0x220 [btrfs]
 [<ffffffffa0310343>] ? btrfs_recover_relocation+0x293/0x380 [btrfs]
 [<ffffffffa02bcaa2>] ? open_ctree+0x20d2/0x23b0 [btrfs]
 [<ffffffffa02933fb>] ? btrfs_mount+0x87b/0x990 [btrfs]
 [<ffffffff8117417f>] ? pcpu_next_unpop+0x3f/0x50
 [<ffffffff811c3646>] ? mount_fs+0x36/0x170
 [<ffffffff811ddf08>] ? vfs_kern_mount+0x68/0x110
 [<ffffffffa0292d3b>] ? btrfs_mount+0x1bb/0x990 [btrfs]
 …

Call Trace:
 <IRQ>  [<ffffffff810c8240>] ? rcu_dump_cpu_stacks+0x80/0xb0
 [<ffffffff810cb381>] ? rcu_check_callbacks+0x421/0x6e0
 [<ffffffff8101cb95>] ? sched_clock+0x5/0x10
 [<ffffffff8108d2c5>] ? notifier_call_chain+0x45/0x70
 [<ffffffff810d5fc1>] ? timekeeping_update+0xf1/0x150
 [<ffffffff810df2c0>] ? tick_sched_do_timer+0x40/0x40
 [<ffffffff810d0bb6>] ? update_process_times+0x36/0x60
 [<ffffffff810df2c0>] ? tick_sched_do_timer+0x40/0x40
 [<ffffffff810decf4>] ? tick_sched_handle.isra.15+0x24/0x60
 [<ffffffff810df2c0>] ? tick_sched_do_timer+0x40/0x40
 [<ffffffff810df2fb>] ? tick_sched_timer+0x3b/0x70
 [<ffffffff810d16dc>] ? __hrtimer_run_queues+0xdc/0x210
 [<ffffffff8101c645>] ? read_tsc+0x5/0x10
 [<ffffffff8101c645>] ? read_tsc+0x5/0x10
 [<ffffffff810d1afa>] ? hrtimer_interrupt+0x9a/0x190
 [<ffffffff8155b4f9>] ? smp_apic_timer_interrupt+0x39/0x50
 [<ffffffff815596db>] ? apic_timer_interrupt+0x6b/0x70
 <EOI>  [<ffffffff815584a0>] ? _raw_spin_lock+0x10/0x20
 [<ffffffffa030955f>] ? __del_reloc_root+0x2f/0x100 [btrfs]
 [<ffffffffa0309530>] ? __add_reloc_root+0xe0/0xe0 [btrfs]
 [<ffffffffa030964d>] ? free_reloc_roots+0x1d/0x30 [btrfs]
 [<ffffffffa030f6e5>] ? merge_reloc_roots+0x165/0x220 [btrfs]
 [<ffffffffa0310343>] ? btrfs_recover_relocation+0x293/0x380 [btrfs]
 [<ffffffffa02bcaa2>] ? open_ctree+0x20d2/0x23b0 [btrfs]
 [<ffffffffa02933fb>] ? btrfs_mount+0x87b/0x990 [btrfs]
 [<ffffffff8117417f>] ? pcpu_next_unpop+0x3f/0x50
 [<ffffffff811c3646>] ? mount_fs+0x36/0x170
 [<ffffffff811ddf08>] ? vfs_kern_mount+0x68/0x110
 [<ffffffffa0292d3b>] ? btrfs_mount+0x1bb/0x990 [btrfs]
 …

Call Trace:
 [<ffffffffa030955f>] ? __del_reloc_root+0x2f/0x100 [btrfs]
 [<ffffffffa030964d>] ? free_reloc_roots+0x1d/0x30 [btrfs]
 [<ffffffffa030f6e5>] ? merge_reloc_roots+0x165/0x220 [btrfs]
 [<ffffffffa0310343>] ? btrfs_recover_relocation+0x293/0x380 [btrfs]
 [<ffffffffa02bcaa2>] ? open_ctree+0x20d2/0x23b0 [btrfs]
 [<ffffffffa02933fb>] ? btrfs_mount+0x87b/0x990 [btrfs]
 [<ffffffff8117417f>] ? pcpu_next_unpop+0x3f/0x50
 [<ffffffff811c3646>] ? mount_fs+0x36/0x170
 [<ffffffff811ddf08>] ? vfs_kern_mount+0x68/0x110
 [<ffffffffa0292d3b>] ? btrfs_mount+0x1bb/0x990 [btrfs]
 …

A longer excerpt can be found here: http://pastebin.com/NPM0Ckfy
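
For anyone who wants to check which functions are common to all of the dumped traces, here is a minimal sketch in Python. It assumes the frame format shown above (`[<address>] ? function+0xoff/0xsize [module]`). The helper names `split_traces` and `common_functions` are made up for illustration and are not part of any btrfs tooling.

```python
import re

# Matches one frame such as:
#  [<ffffffffa0309641>] ? free_reloc_roots+0x11/0x30 [btrfs]
# The module suffix is optional (core kernel frames have none).
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def split_traces(log_text):
    """Split a log into traces; each trace is a list of (function, module)."""
    traces = []
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            traces.append([])
            continue
        match = FRAME_RE.search(line)
        if match and traces:
            traces[-1].append((match.group("func"), match.group("module")))
    return traces


def common_functions(traces, module="btrfs"):
    """Return the module's functions present in every trace, in first-trace order."""
    if not traces:
        return []
    per_trace = [{func for func, mod in trace if mod == module} for trace in traces]
    shared = set.intersection(*per_trace)
    ordered = []
    for func, _mod in traces[0]:
        if func in shared and func not in ordered:
            ordered.append(func)
    return ordered
```

Feeding it the saved dmesg output (for example `common_functions(split_traces(open("dmesg.txt").read()))`, where `dmesg.txt` is a hypothetical file name) lists the btrfs functions that every trace passes through.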

I am using kernel 4.2.6 (Debian backports) and btrfs-tools 4.3.

btrfs check --readonly gave no errors.
(except the probably false positives mentioned here
http://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg48325.html)

Reading the whole file system also worked.

If you need more information to trace this back, let me know and I'll
try to get it.
If you have suggestions regarding the recovery, please let me know as well.

Best regards,

Lukas


* Re: 4.2.6: livelock in recovery (free_reloc_roots)?
  2015-11-20  9:04 4.2.6: livelock in recovery (free_reloc_roots)? Lukas Pirl
@ 2015-11-21  0:37 ` Lukas Pirl
  2015-11-21  7:16   ` Duncan
  2016-03-02 12:56 ` Still in 4.4.0: livelock in recovery (free_reloc_roots) Lukas Pirl
  1 sibling, 1 reply; 7+ messages in thread
From: Lukas Pirl @ 2015-11-21  0:37 UTC (permalink / raw)
  To: linux-btrfs

A follow-up question:

Can "btrfs_recover_relocation" be prevented from being run? I would not
mind losing a few recent writes (which came from a balance) if I could
instead go rw again, so I can restart a balance.

From what I have read, btrfs-zero-log would not help in this case (?) so
I have not run it so far.

By the way, I can confirm the defect of 'btrfs device remove missing …'
mentioned here: http://www.spinics.net/lists/linux-btrfs/msg48383.html :

$ btrfs device delete missing /mnt/data
ERROR: missing is not a block device
$ btrfs device delete 5 /mnt/data
ERROR: 5 is not a block device

Thanks and best regards,

Lukas


* Re: 4.2.6: livelock in recovery (free_reloc_roots)?
  2015-11-21  0:37 ` Lukas Pirl
@ 2015-11-21  7:16   ` Duncan
  2015-11-21  9:01     ` Alexander Fougner
  2015-11-21 11:18     ` Lukas Pirl
  0 siblings, 2 replies; 7+ messages in thread
From: Duncan @ 2015-11-21  7:16 UTC (permalink / raw)
  To: linux-btrfs

Lukas Pirl posted on Sat, 21 Nov 2015 13:37:37 +1300 as excerpted:

> Can "btrfs_recover_relocation" be prevented from being run? I would not
> mind losing a few recent writes (what was a balance) but instead going
> rw again, so I can restart a balance.

I'm not familiar with that thread name (I run multiple small btrfs on 
ssds, so scrub, balance, etc, take only a few minutes at most), but if 
it's the balance thread, then yes, there's a mount option (skip_balance) 
that keeps an interrupted balance from resuming at mount.  See the wiki 
page covering mount options.

> From what I have read, btrfs-zero-log would not help in this case (?) so
> I did not run it so far.

Correct.  Btrfs is atomic at commit time, so it doesn't need a journal in 
the sense of older filesystems like reiserfs, jfs and ext3/4.

What's this log, then?  Btrfs doesn't fully write normal file writes 
until a commit (every 30 seconds by default, there's a mount option...).  
The commit is atomic (with copy-on-write helping here), so in the event of 
a crash either the before or the after state is returned, not something 
half written.

fsync is different.  It says don't return until the file is written to 
storage.  But if a full commit were done to ensure that, there might be 
far more data to commit that otherwise wouldn't need to be committed yet, 
seriously slowing things down.  That's where this log comes in.  It's 
purely a log of fsynced data (and perhaps a few other similar things, I'm 
not a dev and am not sure) between atomic commits, so the fsync can 
return quickly while still having actually written the data to storage, 
without having to wait up to 30 seconds (by default) for the normal commit 
to complete, or forcing an early commit of everything else that's half 
written.
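
To make the fsync case concrete, here is a minimal sketch in Python of an application asking for durability on a single file. It is ordinary POSIX-style code and nothing in it is btrfs-specific. The helper name `durable_write` is made up for illustration. On btrfs, this is the kind of call the log described above is meant to serve.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path and return only after fsync has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Block until the kernel reports the file's data as written to storage.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.dat")
        durable_write(target, b"important record\n")
        print(os.path.getsize(target), "bytes written and fsynced")
```

A plain `write()` without the `os.fsync()` call would return earlier, with the data reaching the disk only at a later commit.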

There was a bug at one point where this log could be corrupted and thus 
couldn't be replayed properly at mount, but the primary trigger bug for 
that problem is long since fixed.  Various hardware bugs etc. could still, 
by remote chance, cause problems, thus the option to zero the log, but 
it's a very rare occurrence, and the trace when it fails is telltale 
enough that if it's posted, the devs can tell you to run the zero-log 
command then.  Otherwise it generally does no good.  It generally does no 
serious harm either, beyond the loss of a few seconds' worth of fsyncs 
etc., because the commits /are/ atomic and zeroing the log simply returns 
the filesystem to the state of the last such commit.  Still, it's not 
recommended, as it /does/ needlessly kill the log of those last few 
seconds of fsyncs.

> By the way, I can confirm the defect of 'btrfs device remove missing …'
> mentioned here: http://www.spinics.net/lists/linux-btrfs/msg48383.html :
> 
> $ btrfs device delete missing /mnt/data
> ERROR: missing is not a block device
> $ btrfs device delete 5 /mnt/data
> ERROR: 5 is not a block device

That's a known bug, with patches working their way through the system both 
to provide an alternative and to let btrfs device delete missing 
work again, but I don't recall the specific status of those patches.  
Presumably they'll be in 4.4, but I don't know whether they made it into 
4.3, and don't feel like looking it up at the moment when you could do so 
just as easily.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: 4.2.6: livelock in recovery (free_reloc_roots)?
  2015-11-21  7:16   ` Duncan
@ 2015-11-21  9:01     ` Alexander Fougner
  2015-11-21  9:38       ` Lukas Pirl
  2015-11-21 11:18     ` Lukas Pirl
  1 sibling, 1 reply; 7+ messages in thread
From: Alexander Fougner @ 2015-11-21  9:01 UTC (permalink / raw)
  To: Duncan, linux-btrfs



On 2015-11-21 at 08:16, Duncan wrote:
> Lukas Pirl posted on Sat, 21 Nov 2015 13:37:37 +1300 as excerpted:
> 
>> By the way, I can confirm the defect of 'btrfs device remove missing …'
>> mentioned here: http://www.spinics.net/lists/linux-btrfs/msg48383.html :
>>
>> $ btrfs device delete missing /mnt/data
>> ERROR: missing is not a block device
>> $ btrfs device delete 5 /mnt/data
>> ERROR: 5 is not a block device
> 
> That's a known bug, with patches working their way thru the system both 
> to provide a different alternative and to let btrfs device delete missing 
> work again, but IDR the specific status of those patches.  Presumably 
> they'll be in 4.4, but I don't know if they made it into 4.3 or not and 
> don't feel like looking it up ATM when you could do so just as easily.

This is fixed in btrfs-progs 4.3.1, which again allows you to delete a device using the 'missing' keyword.
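
A minimal sketch in Python for checking whether an installed btrfs-progs is at least the 4.3.1 named above. It assumes that `btrfs --version` prints a string of the form `btrfs-progs v4.3.1`. That output format and the helper names are assumptions for illustration, so adjust the pattern if your build prints something else.

```python
import re

# Version named in this thread as containing the fix.
FIXED_IN = (4, 3, 1)


def parse_progs_version(text):
    """Extract a version tuple such as (4, 3, 1) from text, or None if absent."""
    match = re.search(r"v(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def has_missing_fix(version_text):
    """True if the reported version is 4.3.1 or newer."""
    version = parse_progs_version(version_text)
    return version is not None and version >= FIXED_IN
```

For example, the output of `subprocess.run(["btrfs", "--version"], capture_output=True, text=True).stdout` could be passed to `has_missing_fix`.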
 

-- 
Best regards,
Alexander Fougner


* Re: 4.2.6: livelock in recovery (free_reloc_roots)?
  2015-11-21  9:01     ` Alexander Fougner
@ 2015-11-21  9:38       ` Lukas Pirl
  0 siblings, 0 replies; 7+ messages in thread
From: Lukas Pirl @ 2015-11-21  9:38 UTC (permalink / raw)
  To: Alexander Fougner; +Cc: linux-btrfs

On 11/21/2015 10:01 PM, Alexander Fougner wrote as excerpted:
> This is fixed in btrfs-progs 4.3.1, that allows you to delete a
> device again by the 'missing' keyword.

Thanks Alexander! I had only found the thread reporting the bug, not the
patch or the btrfs-tools version it was merged in.

Lukas


* Re: 4.2.6: livelock in recovery (free_reloc_roots)?
  2015-11-21  7:16   ` Duncan
  2015-11-21  9:01     ` Alexander Fougner
@ 2015-11-21 11:18     ` Lukas Pirl
  1 sibling, 0 replies; 7+ messages in thread
From: Lukas Pirl @ 2015-11-21 11:18 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

On 11/21/2015 08:16 PM, Duncan wrote as excerpted:
> Lukas Pirl posted on Sat, 21 Nov 2015 13:37:37 +1300 as excerpted:
> 
>> Can "btrfs_recover_relocation" be prevented from being run? I would not
>> mind losing a few recent writes (what was a balance) but instead going
>> rw again, so I can restart a balance.
> I'm not familiar with that thread name (I run multiple small btrfs on 
> ssds, so scrub, balance, etc, take only a few minutes at most), but if 

First, thank you Duncan for taking the time to type up those detailed
explanations.

I am not sure if this name also corresponds to a thread name, but it is
for sure a function that appears in all the dumped traces when trying to
'mount -o recovery,degraded' the file system in question:

 [<ffffffffa030964d>] ? free_reloc_roots+0x1d/0x30 [btrfs]
 [<ffffffffa030f6e5>] ? merge_reloc_roots+0x165/0x220 [btrfs]
 [<ffffffffa0310343>] ? btrfs_recover_relocation+0x293/0x380 [btrfs]
 [<ffffffffa02bcaa2>] ? open_ctree+0x20d2/0x23b0 [btrfs]
 [<ffffffffa02933fb>] ? btrfs_mount+0x87b/0x990 [btrfs]

> it's the balance thread, then yes, there's a mount option that cancels a 
> running balance.  See the wiki page covering mount options.

Yes, the file system is mounted with '-o skip_balance'.
(Although the '-o recovery' might trigger relocations?!)
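
To double-check which options a mount actually ended up with, the mount table can be inspected. Below is a minimal sketch in Python that parses text in the `/proc/mounts` format (device, mount point, type, comma-separated options). The helper name `mount_options` is made up for illustration, and which options btrfs echoes back there may depend on the kernel version.

```python
def mount_options(mounts_text, mountpoint):
    """Return the option list of the last entry for mountpoint, or None."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == mountpoint:
            # Later entries shadow earlier ones on the same mount point.
            found = fields[3].split(",")
    return found


if __name__ == "__main__":
    with open("/proc/mounts") as handle:
        options = mount_options(handle.read(), "/mnt/data")
    if options is None:
        print("/mnt/data is not mounted")
    else:
        print("skip_balance active:", "skip_balance" in options)
```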

>> From what I have read, btrfs-zero-log would not help in this case (?) so
>> I did not run it so far.
> Correct.  Btrfs is atomic at commit time, so doesn't need a journal in 
> the sense of older filesystems like reiserfs, jfs and ext3/4.
> …
> Otherwise, it generally does no good, and while 
> it generally does no serious harm beyond the loss of a few seconds worth 
> of fsyncs, etc, either, because the commits /are/ atomic and zeroing the 
> log simply returns the system to the state of such a commit, it's not 
> recommended as it /does/ needlessly kill the log of those last few 
> seconds of fsyncs.

So I see that it does no good but no serious harm (generally). Since it
is related to writes (not relocations, I assume) clearing the log is
unlikely to fix the problem with btrfs_recover_relocation or
merge_reloc_roots, respectively.

Maybe a dev can help us and shed some light on the (I assume) impossible
relocation issue.

Best,

Lukas


* Still in 4.4.0: livelock in recovery (free_reloc_roots)
  2015-11-20  9:04 4.2.6: livelock in recovery (free_reloc_roots)? Lukas Pirl
  2015-11-21  0:37 ` Lukas Pirl
@ 2016-03-02 12:56 ` Lukas Pirl
  1 sibling, 0 replies; 7+ messages in thread
From: Lukas Pirl @ 2016-03-02 12:56 UTC (permalink / raw)
  To: linux-btrfs

On 11/20/2015 10:04 AM, Lukas Pirl wrote as excerpted:
> I am (still) trying to recover a RAID1 that can only be mounted
> recovery,degraded,ro.
> 
> I experienced an issue that might be interesting for you: I tried to
> mount the file system rw,recovery and the kernel ended up burning one
> core (and only one specific core, never scheduled to another one).
> 
> The watchdog printed a stack trace roughly every 20 seconds. There were
> only a few stack traces that were printed alternating (see below).
> After a few hours with the mount command still being blocked and without
> visible IO activity, the system was power-cycled.
> 
> A longer excerpt can be found here: http://pastebin.com/NPM0Ckfy
> 
> I am using kernel 4.2.6 (Debian backports) and btrfs-tools 4.3.
> 
> btrfs check --readonly gave no errors.
> (except the probably false positives mentioned here
> http://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg48325.html)
> 
> Reading the whole file system worked also.
> 
> If you need more information to trace this back, let me know and I'll
> try to get it.
> If you have suggestions regarding the recovery, please let me know as well.

