Crashes in extent_io.c after "btrfs bad mapping eb" notice

linux-btrfs.vger.kernel.org archive mirror

* Crashes in extent_io.c after "btrfs bad mapping eb" notice
@ 2012-10-30 23:57 Franke
  2012-10-31  0:47 ` Liu Bo
  2012-10-31  3:46 ` Chris Samuel
  0 siblings, 2 replies; 9+ messages in thread
From: Franke @ 2012-10-30 23:57 UTC (permalink / raw)
  To: linux-btrfs

Hello,

I have been having crashes like the one below. Since I upgraded to 3.6.4 they have become common. The crashes happen at random during normal system usage. After the syslog messages appear the system stays semi-usable for about a minute, but any new program I start hangs. I had to downgrade to 3.6.2 to get a usable system again.

Is there any way I can help find the cause of those crashes? 

thanks,
Tobias Franke

logs here:
https://www.dropbox.com/s/1f6gh4k0zl2t5zq/btrfs_hangs_3-4-6.log

It starts like this:
Oct 30 22:42:36 localhost kernel: btrfs bad mapping eb start 982689189888 len 4096, wanted 4424 17
Oct 30 22:42:36 localhost kernel: ------------[ cut here ]------------
Oct 30 22:42:36 localhost kernel: WARNING: at fs/btrfs/extent_io.c:4661 map_private_extent_buffer+0x83/0xc9()
Oct 30 22:42:36 localhost kernel: Hardware name: A780L3L
Oct 30 22:42:36 localhost kernel: Modules linked in: hwmon_vid snd_hda_codec_hdmi nvidia(PO) snd_hda_codec_realtek snd_hda_intel snd_hda_codec i2c_piix4 k10temp
Oct 30 22:42:36 localhost kernel: Pid: 7143, comm: btrfs-endio-wri Tainted: P           O 3.6.4-gentoo #1
Oct 30 22:42:36 localhost kernel: Call Trace:
Oct 30 22:42:36 localhost kernel: [<ffffffff81030181>] ? warn_slowpath_common+0x78/0x8c
Oct 30 22:42:36 localhost kernel: [<ffffffff8127dc2e>] ? map_private_extent_buffer+0x83/0xc9
Oct 30 22:42:36 localhost kernel: [<ffffffff81249353>] ? generic_bin_search.clone.50+0xaa/0x130
Oct 30 22:42:36 localhost kernel: [<ffffffff8124c11a>] ? btrfs_search_old_slot+0x333/0x6ad
Oct 30 22:42:36 localhost kernel: [<ffffffff812a737d>] ? __resolve_indirect_refs+0x147/0x4b1
Oct 30 22:42:36 localhost kernel: [<ffffffff812a70bf>] ? __add_keyed_refs.clone.6+0x9e/0x215
Oct 30 22:42:36 localhost kernel: [<ffffffff812a6ba6>] ? __add_prelim_ref+0x3a/0xb9
Oct 30 22:42:36 localhost kernel: [<ffffffff812a7d54>] ? find_parent_nodes+0x553/0x7ca
Oct 30 22:42:36 localhost kernel: [<ffffffff812458b8>] ? leaf_space_used+0xaf/0xd7
Oct 30 22:42:36 localhost kernel: [<ffffffff812a8044>] ? btrfs_find_all_roots+0x79/0xd4
Oct 30 22:42:36 localhost kernel: [<ffffffff812aa1d0>] ? btrfs_qgroup_account_ref+0xd2/0x4a8
Oct 30 22:42:36 localhost kernel: [<ffffffff81278482>] ? alloc_extent_state+0x58/0x9d
Oct 30 22:42:36 localhost kernel: [<ffffffff810e8245>] ? kmem_cache_free+0x12/0xbd
Oct 30 22:42:36 localhost kernel: [<ffffffff81253c93>] ? btrfs_delayed_refs_qgroup_accounting+0x9e/0xca
Oct 30 22:42:36 localhost kernel: [<ffffffff81266723>] ? __btrfs_end_transaction+0x4e/0x293
Oct 30 22:42:36 localhost kernel: [<ffffffff8126bb26>] ? btrfs_finish_ordered_io+0x2e1/0x332
Oct 30 22:42:36 localhost kernel: [<ffffffff8103cfae>] ? run_timer_softirq+0x2c2/0x2c2
Oct 30 22:42:36 localhost kernel: [<ffffffff81286398>] ? worker_loop+0x174/0x4c0
Oct 30 22:42:36 localhost kernel: [<ffffffff81286224>] ? btrfs_queue_worker+0x261/0x261
Oct 30 22:42:36 localhost kernel: [<ffffffff81286224>] ? btrfs_queue_worker+0x261/0x261
Oct 30 22:42:36 localhost kernel: [<ffffffff8104b293>] ? kthread+0x81/0x89
Oct 30 22:42:36 localhost kernel: [<ffffffff815de774>] ? kernel_thread_helper+0x4/0x10
Oct 30 22:42:36 localhost kernel: [<ffffffff8104b212>] ? kthread_freezable_should_stop+0x52/0x52
Oct 30 22:42:36 localhost kernel: [<ffffffff815de770>] ? gs_change+0xb/0xb
Oct 30 22:42:36 localhost kernel: ---[ end trace cb2d15ce8c2d83ec ]---
Oct 30 22:42:36 localhost kernel: ------------[ cut here ]------------
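
A note on reading the first log line: assuming the four numbers are the extent buffer's start and length followed by the requested offset and byte count (an interpretation of the message, not something stated in this thread), the request ends past the 4096-byte buffer. A minimal Python sketch of that arithmetic; the function name decode_bad_mapping is made up for illustration:

```python
import re

# Pattern for the message exactly as it appears in the log above.
BAD_MAPPING = re.compile(
    r"btrfs bad mapping eb start (\d+) len (\d+), wanted (\d+) (\d+)"
)


def decode_bad_mapping(line):
    """Return the parsed numbers, or None if the line is not a match.

    Assumption: the fields are (eb start, eb len, requested offset,
    requested byte count). This is a reading of the message text only.
    """
    m = BAD_MAPPING.search(line)
    if m is None:
        return None
    eb_start, eb_len, offset, want = (int(g) for g in m.groups())
    end = offset + want
    return {
        "eb_start": eb_start,
        "eb_len": eb_len,
        "offset": offset,
        "want": want,
        "end": end,
        # Bytes by which the request runs past the end of the buffer.
        "overrun": max(0, end - eb_len),
    }


line = ("Oct 30 22:42:36 localhost kernel: btrfs bad mapping eb start "
        "982689189888 len 4096, wanted 4424 17")
info = decode_bad_mapping(line)
print(info["end"], info["overrun"])  # 4441 345
```

Under that reading, the request covers bytes 4424 to 4441 of a buffer that is only 4096 bytes long, which would explain why the mapping is rejected.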


* Re: Crashes in extent_io.c after "btrfs bad mapping eb" notice
  2012-10-30 23:57 Crashes in extent_io.c after "btrfs bad mapping eb" notice Franke
@ 2012-10-31  0:47 ` Liu Bo
  2012-10-31  0:50   ` Liu Bo
  2012-10-31 20:00   ` Franke
  2012-10-31  3:46 ` Chris Samuel
  1 sibling, 2 replies; 9+ messages in thread
From: Liu Bo @ 2012-10-31  0:47 UTC (permalink / raw)
  To: Franke; +Cc: linux-btrfs, Jan Schmidt

On 10/31/2012 07:57 AM, Franke wrote:
> Hello,
> 
> I have been having some crashes like this. Since I upgraded to 3.6.4 they have become common. The crashes happen pretty randomly during normal system usage. After the syslog messages the system stays semi usable for a minute, but when I run any new program it hangs. I had to downgrade to 3.6.2 to get my system usable again.
> 
> Is there any way I can help find the cause of those crashes? 
> 

Hi Franke,

Jan and I worked together a few days ago to fix the bugs you mention. You can try
Jan's git repo and see if it works in your situation:

          git.jan-o-sch.net for-chris

(It contains most of patches related to backref walking.)

thanks,
liubo




* Re: Crashes in extent_io.c after "btrfs bad mapping eb" notice
  2012-10-31  0:47 ` Liu Bo
@ 2012-10-31  0:50   ` Liu Bo
  2012-10-31 20:00   ` Franke
  1 sibling, 0 replies; 9+ messages in thread
From: Liu Bo @ 2012-10-31  0:50 UTC (permalink / raw)
  To: Franke; +Cc: linux-btrfs, Jan Schmidt

On 10/31/2012 08:47 AM, Liu Bo wrote:
> On 10/31/2012 07:57 AM, Franke wrote:
>> Hello,
>>
>> I have been having some crashes like this. Since I upgraded to 3.6.4 they have become common. The crashes happen pretty randomly during normal system usage. After the syslog messages the system stays semi usable for a minute, but when I run any new program it hangs. I had to downgrade to 3.6.2 to get my system usable again.
>>
>> Is there any way I can help find the cause of those crashes? 
>>
> 
> Hi Franke,
> 
> Jan and I worked together a few days ago to fix the bugs you mention. You can try
> Jan's git repo and see if it works in your situation:
> 
>           git.jan-o-sch.net for-chris

it should be
           git.jan-o-sch.net/btrfs-unstable for-chris

> 
> (It contains most of patches related to backref walking.)
> 
> thanks,
> liubo
> 



* Re: Crashes in extent_io.c after "btrfs bad mapping eb" notice
  2012-10-31  0:47 ` Liu Bo
  2012-10-31  0:50   ` Liu Bo
@ 2012-10-31 20:00   ` Franke
  2012-11-01  2:39     ` Liu Bo
  1 sibling, 1 reply; 9+ messages in thread
From: Franke @ 2012-10-31 20:00 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

Hi,

Since yesterday I have run a balance while I was asleep or at work. Now I have
experimented a bit, and the situation has changed.

I am now getting hard hangs (the system is gone without even writing
anything to syslog) some time (minutes to an hour) into running a
scrub.

Those hangs happen with 3.6.2, 3.6.4 and Jan's unstable version.
It hasn't hung yet without running a scrub.

I have no idea if this is part of the same problem or something else.
Do you have any idea either way?

I will probably destroy my btrfs and create a new one over the weekend,
unless you still need info for debugging.

thanks,
Tobias

On Wed, 31 Oct 2012 08:47:16 +0800
Liu Bo <bo.li.liu@oracle.com> wrote:

> On 10/31/2012 07:57 AM, Franke wrote:
> > Hello,
> > 
> > I have been having some crashes like this. Since I upgraded to
> > 3.6.4 they have become common. The crashes happen pretty randomly
> > during normal system usage. After the syslog messages the system
> > stays semi usable for a minute, but when I run any new program it
> > hangs. I had to downgrade to 3.6.2 to get my system usable again.
> > 
> > Is there any way I can help find the cause of those crashes? 
> > 
> 
> Hi Franke,
> 
> Jan and I worked together a few days ago to fix the bugs you
> mention. You can try Jan's git repo and see if it works in your
> situation:
> 
>           git.jan-o-sch.net for-chris
> 
> (It contains most of patches related to backref walking.)
> 
> thanks,
> liubo
> 


* Re: Crashes in extent_io.c after "btrfs bad mapping eb" notice
  2012-10-31 20:00   ` Franke
@ 2012-11-01  2:39     ` Liu Bo
  2012-11-01 11:57       ` Jan Schmidt
  0 siblings, 1 reply; 9+ messages in thread
From: Liu Bo @ 2012-11-01  2:39 UTC (permalink / raw)
  To: Franke; +Cc: linux-btrfs

On 11/01/2012 04:00 AM, Franke wrote:
> Hi,
> 
> since yesterday I have run a balance while asleep/at work. Now I
> experimented a bit, and the situation has changed.
> 
> I am now getting hard hangs ( system is gone without even writing
> anything to syslog ), some time ( minutes to an hour ) into running a
> scrub.
> 
> Those hangs happen with 3.6.2 , 3.6.4 and Jan's unstable version.
> It hasn't hung yet without running a scrub.
> 
> I have no idea if this is part of the same problem or something else.
> Do you have any idea either way?
> 

Well, thanks for testing.

We may need your sysrq-w output (maybe captured from the screen) to locate where the hard hang occurs.
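
For reference, sysrq-w can also be triggered without the keyboard by writing the character w to /proc/sysrq-trigger as root; the kernel then dumps blocked tasks to the kernel log. A minimal Python sketch, assuming a kernel built with CONFIG_MAGIC_SYSRQ; trigger_sysrq is a hypothetical helper name, not an existing tool:

```python
def trigger_sysrq(key="w", path="/proc/sysrq-trigger"):
    """Write one SysRq command character to the trigger file.

    'w' asks the kernel to dump tasks in uninterruptible (blocked)
    state. The path parameter exists only so the helper can be tried
    against an ordinary file; on a real system it needs root.
    """
    if len(key) != 1:
        raise ValueError("SysRq commands are a single character")
    with open(path, "w") as f:
        f.write(key)


# Typical use, as root, followed by reading the kernel log (dmesg):
#     trigger_sysrq("w")
```

On a machine that is already hard hung this will not run either, so it only helps while the system still responds.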

Besides, I recommend you pick out Jan's patches, apply them on top of the latest btrfs upstream,
and run another round to see if it gets better, since upstream might already contain some fixes
for this very hang.

Right now the latest btrfs upstream's top commit is

commit f46dbe3dee853f8a860f889cb2b7ff4c624f2a7a
Author: Chris Mason <chris.mason@fusionio.com>
Date:   Tue Oct 9 11:17:20 2012 -0400

    btrfs: init ref_index to zero in add_inode_ref
    
    Signed-off-by: Chris Mason <chris.mason@fusionio.com>

thanks,
liubo

> I will probably destroy my btrfs and create a new one over the weekend,
> unless you still need info for debugging.
> 
> thanks,
> Tobias
> 
> On Wed, 31 Oct 2012 08:47:16 +0800
> Liu Bo <bo.li.liu@oracle.com> wrote:
> 
>> On 10/31/2012 07:57 AM, Franke wrote:
>>> Hello,
>>>
>>> I have been having some crashes like this. Since I upgraded to
>>> 3.6.4 they have become common. The crashes happen pretty randomly
>>> during normal system usage. After the syslog messages the system
>>> stays semi usable for a minute, but when I run any new program it
>>> hangs. I had to downgrade to 3.6.2 to get my system usable again.
>>>
>>> Is there any way I can help find the cause of those crashes? 
>>>
>>
>> Hi Franke,
>>
>> Jan and I worked together a few days ago to fix the bugs you
>> mention. You can try Jan's git repo and see if it works in your
>> situation:
>>
>>           git.jan-o-sch.net for-chris
>>
>> (It contains most of patches related to backref walking.)
>>
>> thanks,
>> liubo
>>



* Re: Crashes in extent_io.c after "btrfs bad mapping eb" notice
  2012-11-01  2:39     ` Liu Bo
@ 2012-11-01 11:57       ` Jan Schmidt
  2012-11-01 16:01         ` Jan Schmidt
  2012-11-01 18:13         ` Tobias Franke
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Schmidt @ 2012-11-01 11:57 UTC (permalink / raw)
  To: Liu Bo; +Cc: Franke, linux-btrfs

On Thu, November 01, 2012 at 03:39 (+0100), Liu Bo wrote:
> On 11/01/2012 04:00 AM, Franke wrote:
>> Hi,
>>
>> since yesterday I have run a balance while asleep/at work. Now I
>> experimented a bit, and the situation has changed.
>>
>> I am now getting hard hangs ( system is gone without even writing
>> anything to syslog ), some time ( minutes to an hour ) into running a
>> scrub.
>>
>> Those hangs happen with 3.6.2 , 3.6.4 and Jan's unstable version.
>> It hasn't hung yet without running a scrub.
>>
>> I have no idea if this is part of the same problem or something else.
>> Do you have any idea either way?
>>
> 
> Well, thanks for testing.
> 
> We may need your sysrq-w output (maybe captured from the screen) to locate where the hard hang occurs.
> 
> Besides, I recommend you pick out Jan's patches, apply them on top of the latest btrfs upstream,
> and run another round to see if it gets better, since upstream might already contain some fixes
> for this very hang.
> 
> Right now the latest btrfs upstream's top commit is
> 
> commit f46dbe3dee853f8a860f889cb2b7ff4c624f2a7a
> Author: Chris Mason <chris.mason@fusionio.com>
> Date:   Tue Oct 9 11:17:20 2012 -0400
> 
>     btrfs: init ref_index to zero in add_inode_ref
>     
>     Signed-off-by: Chris Mason <chris.mason@fusionio.com>

This is an old top commit. The current cmason/master state is

commit c37b2b6269ee4637fb7cdb5da0d1e47215d57ce2
Author: Josef Bacik <jbacik@fusionio.com>
Date:   Mon Oct 22 15:51:44 2012 -0400

and includes my recent fixes. I don't really expect them to prevent getting
stuck anywhere. sysrq+w output would be really helpful. I'm trying to reproduce
the problems in the meantime.

-Jan


* Re: Crashes in extent_io.c after "btrfs bad mapping eb" notice
  2012-11-01 11:57       ` Jan Schmidt
@ 2012-11-01 16:01         ` Jan Schmidt
  2012-11-01 18:13         ` Tobias Franke
  1 sibling, 0 replies; 9+ messages in thread
From: Jan Schmidt @ 2012-11-01 16:01 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Liu Bo, Franke

On Thu, November 01, 2012 at 12:57 (+0100), Jan Schmidt wrote:
> I'm trying to reproduce the problems in the meantime.

Looks like it worked :-/ And it also looks like it can either bug or deadlock,
depending on what else is going on in the kernel at the same time.

I ran a parallel fsmark on a qgroup-enabled volume while scrubbing it, and hit
a page fault after four hours of iterations:

<1>[194521.851156] BUG: unable to handle kernel paging request at ffff880137c52a08
<1>[194659.159461] IP: [<ffffffff810e3642>] __lock_acquire+0x62/0x1630
<4>[194659.231741] PGD 1e0c063 PUD be586067 PMD be745067 PTE 8000000137c52160
<4>[194659.311717] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
<4>[194659.375976] Modules linked in: btrfs mpt2sas scsi_transport_sas raid_class
<4>[194659.460230] CPU 6 
<4>[194659.483318] Pid: 20466, comm: btrfs-scrub-3 Tainted: G        W    3.6.0+ #3 Supermicro X8SIL/X8SIL
<4>[194659.595327] RIP: 0010:[<ffffffff810e3642>]  [<ffffffff810e3642>] __lock_acquire+0x62/0x1630
<4>[194659.696829] RSP: 0018:ffff880138ab7c50  EFLAGS: 00010046
<4>[194659.761725] RAX: 0000000000000046 RBX: ffff880137c52a08 RCX: 0000000000000000
<4>[194659.848565] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff880137c52a08
<4>[194659.935405] RBP: ffff880138ab7d20 R08: 0000000000000002 R09: 0000000000000001
<4>[194660.022245] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8802273ba3b0
<4>[194660.108984] R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
<4>[194660.195717] FS:  0000000000000000(0000) GS:ffff880237200000(0000) knlGS:0000000000000000
<4>[194660.293997] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>[194660.363990] CR2: ffff880137c52a08 CR3: 0000000001e0b000 CR4: 00000000000007e0
<4>[194660.450726] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[194660.537564] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[194660.624406] Process btrfs-scrub-3 (pid: 20466, threadinfo ffff880138ab6000, task ffff8802273ba3b0)
<4>[194660.733189] Stack:
<4>[194660.758357]  0000000000000286 ffff8802353a4000 ffff8802273ba3b0 ffffffff00000000
<4>[194660.848733]  ffff880138ab7c90 0000000000000286 ffff880138ab7d20 ffff8802353a4000
<4>[194660.939212]  ffff8802273baa78 ffffffff8245f100 ffff880138ab7cc0 0000000000000286
<4>[194661.029589] Call Trace:
<4>[194661.059969]  [<ffffffff8109811a>] ? del_timer_sync+0x8a/0xc0
<4>[194661.128964]  [<ffffffff81098090>] ? try_to_del_timer_sync+0x70/0x70
<4>[194661.205367]  [<ffffffffa00a306a>] ? worker_loop+0x35a/0x5b0 [btrfs]
<4>[194661.281688]  [<ffffffff810e4ca5>] lock_acquire+0x95/0x140
<4>[194661.347634]  [<ffffffffa00a306a>] ? worker_loop+0x35a/0x5b0 [btrfs]
<4>[194661.423964]  [<ffffffff819380c0>] _raw_spin_lock+0x40/0x80
<4>[194661.490953]  [<ffffffffa00a306a>] ? worker_loop+0x35a/0x5b0 [btrfs]
<4>[194661.567283]  [<ffffffffa00a306a>] worker_loop+0x35a/0x5b0 [btrfs]
<4>[194661.641539]  [<ffffffffa00a2d10>] ? btrfs_queue_worker+0x300/0x300 [btrfs]
<4>[194661.725249]  [<ffffffff810ac3d6>] kthread+0xa6/0xb0
<4>[194661.784961]  [<ffffffff819409a4>] kernel_thread_helper+0x4/0x10
<4>[194661.857120]  [<ffffffff8193901d>] ? retint_restore_args+0xe/0xe
<4>[194661.929294]  [<ffffffff810ac330>] ? __init_kthread_worker+0x70/0x70
<4>[194662.005632]  [<ffffffff819409a0>] ? gs_change+0xb/0xb
<4>[194662.067405] Code: 48 89 5d d8 4c 89 7d f8 45 0f 45 e8 85 c0 48 89 fb 4c 8b 55 10 0f 84 4e 04 00 00 44 8b 3d 2b be 0c 01 45 85 ff 0f 84 56 04 00 00 <48> 81 3b e0 5a 1f 82 b8 01 00 00 00 44 0f 44 e8 83 fe 01 0f 86 
<1>[194662.302652] RIP  [<ffffffff810e3642>] __lock_acquire+0x62/0x1630
<4>[194662.375973]  RSP <ffff880138ab7c50>
<4>[194662.418821] CR2: ffff880137c52a08
<4>[194662.460051] ---[ end trace 85e160ea023efd39 ]---

debug config enabled:
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_SLUB_DEBUG=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_LOCKDEP=y
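
One way to read the oops above: the faulting address in CR2 also sits in RBX and RDI, and the lowest bit of the PTE value (the x86-64 Present flag) is clear while the upper page table levels are present. With CONFIG_DEBUG_PAGEALLOC enabled that pattern would be consistent with touching memory that had already been freed, although nothing in this thread confirms it. A small Python sketch that extracts those facts from lines copied out of the oops (timestamps removed; the variable names are illustrative):

```python
import re

# Lines copied from the oops above, with the log prefixes stripped.
OOPS = """\
BUG: unable to handle kernel paging request at ffff880137c52a08
PGD 1e0c063 PUD be586067 PMD be745067 PTE 8000000137c52160
RAX: 0000000000000046 RBX: ffff880137c52a08 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff880137c52a08
CR2: ffff880137c52a08 CR3: 0000000001e0b000 CR4: 00000000000007e0
"""

# "NAME: 16 hex digits" pairs for general and control registers.
regs = {
    name: int(value, 16)
    for name, value in re.findall(
        r"\b(R[A-Z0-9]{1,2}|CR\d): ([0-9a-f]{16})", OOPS
    )
}

# Page table entries printed on the PGD/PUD/PMD/PTE line.
tables = {
    name: int(value, 16)
    for name, value in re.findall(r"\b(PGD|PUD|PMD|PTE) ([0-9a-f]+)", OOPS)
}

fault = regs["CR2"]
# General-purpose registers holding exactly the faulting address.
pointing = sorted(n for n, v in regs.items() if v == fault and n != "CR2")

# Bit 0 of an x86-64 page table entry is the Present flag.
pte_present = bool(tables["PTE"] & 1)
upper_present = all(tables[n] & 1 for n in ("PGD", "PUD", "PMD"))

print(pointing, pte_present, upper_present)
```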

-Jan


* Re: Crashes in extent_io.c after "btrfs bad mapping eb" notice
  2012-11-01 11:57       ` Jan Schmidt
  2012-11-01 16:01         ` Jan Schmidt
@ 2012-11-01 18:13         ` Tobias Franke
  1 sibling, 0 replies; 9+ messages in thread
From: Tobias Franke @ 2012-11-01 18:13 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I have been trying to get some sysrq output, but I am only getting hard
hangs right now, so the system no longer reacts to sysrq-w.

Here is some sysrq-w output from around 15 minutes before the crash:
https://www.dropbox.com/s/386ifyoxh8eppia/sysrqw_2012-11-1.log
I don't know if this will be helpful.

Tobias

On Thu, 01 Nov 2012 12:57:00 +0100
Jan Schmidt <list.btrfs@jan-o-sch.net> wrote:

> On Thu, November 01, 2012 at 03:39 (+0100), Liu Bo wrote:
> > On 11/01/2012 04:00 AM, Franke wrote:
> >> Hi,
> >>
> >> since yesterday I have run a balance while asleep/at work. Now I
> >> experimented a bit, and the situation has changed.
> >>
> >> I am now getting hard hangs ( system is gone without even writing
> >> anything to syslog ), some time ( minutes to an hour ) into
> >> running a scrub.
> >>
> >> Those hangs happen with 3.6.2 , 3.6.4 and Jan's unstable version.
> >> It hasn't hung yet without running a scrub.
> >>
> >> I have no idea if this is part of the same problem or something
> >> else. Do you have any idea either way?
> >>
> > 
> > Well, thanks for testing.
> > 
> > We may need your sysrq-w output (maybe captured from the screen) to
> > locate where the hard hang occurs.
> > 
> > Besides, I recommend you pick out Jan's patches, apply them on top
> > of the latest btrfs upstream, and run another round to see if it
> > gets better, since upstream might already contain some fixes for
> > this very hang.
> > 
> > Right now the latest btrfs upstream's top commit is
> > 
> > commit f46dbe3dee853f8a860f889cb2b7ff4c624f2a7a
> > Author: Chris Mason <chris.mason@fusionio.com>
> > Date:   Tue Oct 9 11:17:20 2012 -0400
> > 
> >     btrfs: init ref_index to zero in add_inode_ref
> >     
> >     Signed-off-by: Chris Mason <chris.mason@fusionio.com>
> 
> This is an old top commit. The current cmason/master state is
> 
> commit c37b2b6269ee4637fb7cdb5da0d1e47215d57ce2
> Author: Josef Bacik <jbacik@fusionio.com>
> Date:   Mon Oct 22 15:51:44 2012 -0400
> 
> and includes my recent fixes. I don't really expect them to prevent
> getting stuck anywhere. sysrq+w output would be really helpful. I'm
> trying to reproduce the problems in the meantime.
> 
> -Jan


* Re: Crashes in extent_io.c after "btrfs bad mapping eb" notice
  2012-10-30 23:57 Crashes in extent_io.c after "btrfs bad mapping eb" notice Franke
  2012-10-31  0:47 ` Liu Bo
@ 2012-10-31  3:46 ` Chris Samuel
  1 sibling, 0 replies; 9+ messages in thread
From: Chris Samuel @ 2012-10-31  3:46 UTC (permalink / raw)
  To: Franke; +Cc: linux-btrfs

On 31/10/12 10:57, Franke wrote:

> I have been having some crashes like this. Since I upgraded to 3.6.4
> they have become common. The crashes happen pretty randomly during
> normal system usage. After the syslog messages the system stays semi
> usable for a minute, but when I run any new program it hangs. I had
> to downgrade to 3.6.2 to get my system usable again.

There were no btrfs changes in either 3.6.3 or 3.6.4, so it must be an
interaction with another change - you will probably need to bisect
between 3.6.2 and 3.6.3 to see what's causing it to happen.
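
As a rough guide to the effort involved: a bisection halves the candidate range with every kernel that is built and tested, so the number of test boots grows only logarithmically with the number of commits between the good and the bad tag. A small Python sketch of that estimate; the commit counts are made-up examples, not the real size of any stable release, and git's own step estimate may differ slightly:

```python
def bisect_steps(candidates):
    """Worst-case number of build-and-test rounds needed to single out
    the first bad commit among `candidates` commits (binary search)."""
    if candidates < 1:
        raise ValueError("need at least one candidate commit")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
    return (candidates - 1).bit_length()


# The bisection itself is driven with the usual git commands, e.g.:
#     git bisect start
#     git bisect good v3.6.2
#     git bisect bad v3.6.4   # or v3.6.3 if that one already misbehaves
# followed by "git bisect good" or "git bisect bad" after each test boot.

for n in (1, 2, 100, 250):  # hypothetical commit counts
    print(n, bisect_steps(n))
```

Even a few hundred candidate commits would come down to fewer than ten kernels to build and boot.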

Worth noting that Greg K-H is looking for people to help him with the
stable series...

http://www.kroah.com/log/linux/minion-wanted.html

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

