Re: kernel BUG at fs/btrfs/extent-tree.c:1353

linux-fsdevel.vger.kernel.org archive mirror

* Re: kernel BUG at fs/btrfs/extent-tree.c:1353
       [not found] ` <20100708143109.GR15984@think>
@ 2010-07-14 15:25   ` Johannes Hirte
  2010-07-15  0:11     ` Dave Chinner
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Hirte @ 2010-07-14 15:25 UTC (permalink / raw)
  To: Chris Mason
  Cc: linux-kernel, linux-btrfs, zheng.yan, Jens Axboe, linux-fsdevel

On Thursday 08 July 2010, 16:31:09, Chris Mason wrote:
> Neither Yan nor I have been able to reproduce this locally, but a few
> people have now hit it.  Johannes, are you available to try out a
> debugging kernel to try and track this down?
> 
> -chris
> 
> On Thu, Jul 08, 2010 at 04:27:23PM +0200, Johannes Hirte wrote:
> > When doing a 'rm -r /var/tmp/portage/sys-devel' I get the following Oops:
> > 
> > ------------[ cut here ]------------
> > kernel BUG at fs/btrfs/extent-tree.c:1353!
> > invalid opcode: 0000 [#1] PREEMPT SMP
> > last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0A:00/power_supply/BAT0/charge_full
> > Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event
> > snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss nfs lockd nfs_acl
> > auth_rpcgss sunrpc sco rfcomm bnep l2cap crc16 xts gf128mul usb_storage
> > dm_crypt dm_mod coretemp hwmon acpi_cpufreq mperf snd_hda_codec_realtek
> > uvcvideo iwl3945 snd_hda_intel snd_hda_codec iwlcore videodev r8169
> > snd_hwdep btusb snd_pcm v4l1_compat mac80211 snd_timer bluetooth snd mii
> > cfg80211 soundcore sg rfkill ac i2c_i801 snd_page_alloc uhci_hcd battery
> > [last unloaded: microcode]
> > 
> > Pid: 2358, comm: rm Not tainted 2.6.35-rc4 #32 M912/M912
> > EIP: 0060:[<c10c383b>] EFLAGS: 00010202 CPU: 1
> > EIP is at lookup_inline_extent_backref+0xf2/0x406
> > EAX: 00000001 EBX: 00000007 ECX: 00000000 EDX: 00000000
> > ESI: 00000004 EDI: f7268150 EBP: 00000004 ESP: f5aa5d08
> > 
> >  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > 
> > Process rm (pid: 2358, ti=f5aa4000 task=f6f0fa70 task.ti=f5aa4000)
> > 
> > Stack:
> >  f702f8c0 f744e080 f665f380 000000b0 00000000 00000000 ffffffff f6c80f00
> > <0> f744e080 c10ec226 e98acfff f6c98000 00001001 0e987000 00000004 00000000
> > <0> 00000850 040e9870 a8000000 00001000 00000000 00000007 00000000 0e987000
> > 
> > Call Trace:
> >  [<c10ec226>] ? set_extent_dirty+0x19/0x1d
> >  [<c10c5081>] ? __btrfs_free_extent+0xda/0x675
> >  [<c10c88bf>] ? run_clustered_refs+0x699/0x6d7
> >  [<c10d239f>] ? btrfs_mark_buffer_dirty+0xa3/0xef
> >  [<c1101454>] ? btrfs_find_ref_cluster+0xf9/0x13a
> >  [<c10c89bc>] ? btrfs_run_delayed_refs+0xbf/0x155
> >  [<c10d3a73>] ? __btrfs_end_transaction+0x53/0x16c
> >  [<c10db480>] ? btrfs_delete_inode+0x166/0x17e
> >  [<c102280d>] ? get_parent_ip+0x8/0x19
> >  [<c108fe5c>] ? generic_delete_inode+0x6f/0xbd
> >  [<c108f5b3>] ? iput+0x46/0x48
> >  [<c10893a8>] ? do_unlinkat+0xc7/0x109
> >  [<c102280d>] ? get_parent_ip+0x8/0x19
> >  [<c10822e3>] ? fput+0x12/0x15c
> >  [<c10a2f30>] ? dnotify_flush+0x41/0xc2
> >  [<c107fe85>] ? filp_close+0x4c/0x52
> >  [<c107feed>] ? sys_close+0x62/0x9b
> >  [<c1002550>] ? sysenter_do_call+0x12/0x26
> > 
> > Code: 80 4e 68 02 8d 4c 24 43 89 f8 6a 01 ff 74 24 1c ff 74 24 08 8b 54
> > 24 38 e8 01 c2 ff ff 83 c4 0c 83 f8 00 0f 8c e1 02 00 00 74 02 <0f> 0b
> > 8b 04 24 8b 34 24 8b 00 8b 56 20 89 44 24 08 e8 2e fa ff
> > EIP: [<c10c383b>] lookup_inline_extent_backref+0xf2/0x406 SS:ESP 0068:f5aa5d08
> > ---[ end trace d97601f0b455ca72 ]---
> > note: rm[2358] exited with preempt_count 2
> > BUG: scheduling while atomic: rm/2358/0x10000003
> > Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
> > snd_seq_device snd_pcm_oss snd_mixer_oss nfs lockd nfs_acl auth_rpcgss
> > sunrpc sco rfcomm bnep l2cap crc16 xts gf128mul usb_storage dm_crypt
> > dm_mod coretemp hwmon acpi_cpufreq mperf snd_hda_codec_realtek uvcvideo
> > iwl3945 snd_hda_intel snd_hda_codec iwlcore videodev r8169 snd_hwdep
> > btusb snd_pcm v4l1_compat mac80211 snd_timer bluetooth snd mii cfg80211
> > soundcore sg rfkill ac i2c_i801 snd_page_alloc uhci_hcd battery [last
> > unloaded: microcode]
> > Pid: 2358, comm: rm Tainted: G      D     2.6.35-rc4 #32
> > 
> > Call Trace:
> >  [<c12de6b3>] ? schedule+0x88/0x332
> >  [<c10237c1>] ? __cond_resched+0xf/0x19
> >  [<c12de9e2>] ? _cond_resched+0x12/0x18
> >  [<c106ceec>] ? unmap_vmas+0x4e7/0x534
> >  [<c1070c8f>] ? exit_mmap+0x64/0xa4
> >  [<c1026089>] ? mmput+0x21/0x96
> >  [<c102938e>] ? exit_mm+0xe7/0xf0
> >  [<c12dfa28>] ? _raw_spin_unlock_irqrestore+0x1a/0x24
> >  [<c103aaa1>] ? hrtimer_try_to_cancel+0x31/0x3a
> >  [<c102a42e>] ? do_exit+0x17b/0x57d
> >  [<c1028e78>] ? kmsg_dump+0x81/0xf9
> >  [<c1002d06>] ? do_invalid_op+0x0/0x76
> >  [<c1004fa0>] ? oops_end+0x72/0x75
> >  [<c1002d6f>] ? do_invalid_op+0x69/0x76
> >  [<c10c383b>] ? lookup_inline_extent_backref+0xf2/0x406
> >  [<c10bdc9a>] ? generic_bin_search.clone.0+0x145/0x150
> >  [<c10bcf30>] ? btrfs_cow_block+0x106/0x112
> >  [<c10bdcdc>] ? bin_search+0x37/0x3d
> >  [<c10bfe33>] ? btrfs_search_slot+0x405/0x477
> >  [<c12e031a>] ? error_code+0x66/0x6c
> >  [<c1002d06>] ? do_invalid_op+0x0/0x76
> >  [<c10c383b>] ? lookup_inline_extent_backref+0xf2/0x406
> >  [<c10ec226>] ? set_extent_dirty+0x19/0x1d
> >  [<c10c5081>] ? __btrfs_free_extent+0xda/0x675
> >  [<c10c88bf>] ? run_clustered_refs+0x699/0x6d7
> >  [<c10d239f>] ? btrfs_mark_buffer_dirty+0xa3/0xef
> >  [<c1101454>] ? btrfs_find_ref_cluster+0xf9/0x13a
> >  [<c10c89bc>] ? btrfs_run_delayed_refs+0xbf/0x155
> >  [<c10d3a73>] ? __btrfs_end_transaction+0x53/0x16c
> >  [<c10db480>] ? btrfs_delete_inode+0x166/0x17e
> >  [<c102280d>] ? get_parent_ip+0x8/0x19
> >  [<c108fe5c>] ? generic_delete_inode+0x6f/0xbd
> >  [<c108f5b3>] ? iput+0x46/0x48
> >  [<c10893a8>] ? do_unlinkat+0xc7/0x109
> >  [<c102280d>] ? get_parent_ip+0x8/0x19
> >  [<c10822e3>] ? fput+0x12/0x15c
> >  [<c10a2f30>] ? dnotify_flush+0x41/0xc2
> >  [<c107fe85>] ? filp_close+0x4c/0x52
> >  [<c107feed>] ? sys_close+0x62/0x9b
> >  [<c1002550>] ? sysenter_do_call+0x12/0x26
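
A note on reading the traces above: each frame is printed as
[<address>] symbol+offset/size, and a '?' before the symbol marks an entry
that was found on the stack but could not be confirmed as a real call frame.
The following is a minimal Python sketch for splitting such lines into
fields. The parse_frame helper and its regular expression are illustrative
assumptions, not part of any kernel tooling.

```python
import re

# Matches e.g. "[<c10c383b>] ? lookup_inline_extent_backref+0xf2/0x406".
# Hypothetical pattern written for the trace format shown in this thread.
FRAME = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frame(line):
    """Return the fields of one call-trace line, or None if it is not a frame."""
    match = FRAME.search(line)
    if match is None:
        return None
    address, unreliable, symbol, offset, size = match.groups()
    return {
        "address": int(address, 16),    # return address found on the stack
        "reliable": unreliable is None, # False when the frame carries a '?'
        "symbol": symbol,               # function containing the address
        "offset": int(offset, 16),      # bytes from the start of the function
        "size": int(size, 16),          # total size of the function in bytes
    }
```

Subtracting the offset from the address gives the start of the function, which
can help when matching a frame against a disassembly of the same kernel build.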

I'm not sure if btrfs is to blame for this error. After the errors I switched 
to XFS on this system and now got this error:

ls -l .kde4/share/apps/akregator/data/
ls: cannot access .kde4/share/apps/akregator/data/feeds.opml: Structure needs 
cleaning
total 4
?????????? ? ?    ?        ?            ? feeds.opml

xfs_check is showing this:

xfs_check /dev/sda3
link count mismatch for inode 219998792 (name ?), nlink 0, counted 1
disconnected inode 220064328, nlink 1

So this is the second FS on which I've suddenly got errors, so I think the 
problem lies deeper. Adding some CCs for this.


regards,
  Johannes


* Re: kernel BUG at fs/btrfs/extent-tree.c:1353
  2010-07-14 15:25   ` kernel BUG at fs/btrfs/extent-tree.c:1353 Johannes Hirte
@ 2010-07-15  0:11     ` Dave Chinner
  2010-07-15 18:14       ` Johannes Hirte
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2010-07-15  0:11 UTC (permalink / raw)
  To: Johannes Hirte
  Cc: Chris Mason, linux-kernel, linux-btrfs, zheng.yan, Jens Axboe,
	linux-fsdevel

On Wed, Jul 14, 2010 at 05:25:23PM +0200, Johannes Hirte wrote:
> On Thursday 08 July 2010, 16:31:09, Chris Mason wrote:
> I'm not sure if btrfs is to blame for this error. After the errors I switched 
> to XFS on this system and now got this error:
> 
> ls -l .kde4/share/apps/akregator/data/
> ls: cannot access .kde4/share/apps/akregator/data/feeds.opml: Structure needs 
> cleaning
> total 4
> ?????????? ? ?    ?        ?            ? feeds.opml

What is the error reported in dmesg when the XFS filesystem shuts down?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: kernel BUG at fs/btrfs/extent-tree.c:1353
  2010-07-15  0:11     ` Dave Chinner
@ 2010-07-15 18:14       ` Johannes Hirte
  2010-07-16 14:59         ` Johannes Hirte
  2010-07-19  8:01         ` Miao Xie
  0 siblings, 2 replies; 10+ messages in thread
From: Johannes Hirte @ 2010-07-15 18:14 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Chris Mason, linux-kernel, linux-btrfs, zheng.yan, Jens Axboe,
	linux-fsdevel

On Thursday 15 July 2010, 02:11:04, Dave Chinner wrote:
> On Wed, Jul 14, 2010 at 05:25:23PM +0200, Johannes Hirte wrote:
> > On Thursday 08 July 2010, 16:31:09, Chris Mason wrote:
> > I'm not sure if btrfs is to blame for this error. After the errors I
> > switched to XFS on this system and now got this error:
> > 
> > ls -l .kde4/share/apps/akregator/data/
> > ls: cannot access .kde4/share/apps/akregator/data/feeds.opml: Structure
> > needs cleaning
> > total 4
> > ?????????? ? ?    ?        ?            ? feeds.opml
> 
> What is the error reported in dmesg when the XFS filesystem shuts down?

Nothing. I double checked the logs. There are only the messages when mounting 
the filesystem. No other errors are reported than the inaccessible file and the 
output from xfs_check.

regards,
  Johannes


* Re: kernel BUG at fs/btrfs/extent-tree.c:1353
  2010-07-15 18:14       ` Johannes Hirte
@ 2010-07-16 14:59         ` Johannes Hirte
  2010-07-19  8:01         ` Miao Xie
  1 sibling, 0 replies; 10+ messages in thread
From: Johannes Hirte @ 2010-07-16 14:59 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Chris Mason, linux-kernel, linux-btrfs, zheng.yan, Jens Axboe,
	linux-fsdevel

On Thursday 15 July 2010, 20:14:51, Johannes Hirte wrote:
> On Thursday 15 July 2010, 02:11:04, Dave Chinner wrote:
> > On Wed, Jul 14, 2010 at 05:25:23PM +0200, Johannes Hirte wrote:
> > > On Thursday 08 July 2010, 16:31:09, Chris Mason wrote:
> > > I'm not sure if btrfs is to blame for this error. After the errors I
> > > switched to XFS on this system and now got this error:
> > > 
> > > ls -l .kde4/share/apps/akregator/data/
> > > ls: cannot access .kde4/share/apps/akregator/data/feeds.opml: Structure
> > > needs cleaning
> > > total 4
> > > ?????????? ? ?    ?        ?            ? feeds.opml
> > 
> > What is the error reported in dmesg when the XFS filesystem shuts down?
> 
> Nothing. I double checked the logs. There are only the messages when
> mounting the filesystem. No other errors are reported than the
> inaccessible file and the output from xfs_check.

I'm now running a kernel with more debug options enabled and got this:

[ 6794.810935] 
[ 6794.810941] =================================
[ 6794.810955] [ INFO: inconsistent lock state ]
[ 6794.810966] 2.6.35-rc4-btrfs-debug #7
[ 6794.810975] ---------------------------------
[ 6794.810984] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[ 6794.810996] kswapd0/361 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 6794.811006]  (&(&ip->i_iolock)->mr_lock#2){++++?+}, at: [<c10fa82d>] xfs_ilock+0x22/0x67
[ 6794.811039] {RECLAIM_FS-ON-W} state was registered at:
[ 6794.811046]   [<c104ebc1>] mark_held_locks+0x42/0x5e
[ 6794.811046]   [<c104f1f7>] lockdep_trace_alloc+0x99/0xb0
[ 6794.811046]   [<c10740b8>] __alloc_pages_nodemask+0x6a/0x4a1
[ 6794.811046]   [<c106edc2>] __page_cache_alloc+0x11/0x13
[ 6794.811046]   [<c106fb43>] grab_cache_page_write_begin+0x47/0x81
[ 6794.811046]   [<c10b2050>] block_write_begin_newtrunc+0x2e/0x9c
[ 6794.811046]   [<c10b233a>] block_write_begin+0x23/0x5d
[ 6794.811046]   [<c1114a9d>] xfs_vm_write_begin+0x26/0x28
[ 6794.811046]   [<c106f15d>] generic_file_buffered_write+0xb5/0x1bd
[ 6794.811046]   [<c1117e31>] xfs_file_aio_write+0x40e/0x66d
[ 6794.811046]   [<c10950b4>] do_sync_write+0x8b/0xc6
[ 6794.811046]   [<c109568b>] vfs_write+0x77/0xa4
[ 6794.811046]   [<c10957f3>] sys_write+0x3c/0x5e
[ 6794.811046]   [<c1002690>] sysenter_do_call+0x12/0x36
[ 6794.811046] irq event stamp: 141369
[ 6794.811046] hardirqs last  enabled at (141369): [<c13639d2>] _raw_spin_unlock_irqrestore+0x36/0x5b
[ 6794.811046] hardirqs last disabled at (141368): [<c13634c5>] _raw_spin_lock_irqsave+0x14/0x68
[ 6794.811046] softirqs last  enabled at (141300): [<c1032d69>] __do_softirq+0xfe/0x10d
[ 6794.811046] softirqs last disabled at (141295): [<c1032da7>] do_softirq+0x2f/0x47
[ 6794.811046] 
[ 6794.811046] other info that might help us debug this:
[ 6794.811046] 2 locks held by kswapd0/361:
[ 6794.811046]  #0:  (shrinker_rwsem){++++..}, at: [<c10774db>] shrink_slab+0x25/0x13f
[ 6794.811046]  #1:  (&xfs_mount_list_lock){++++.-}, at: [<c111cc78>] xfs_reclaim_inode_shrink+0x2a/0xe8
[ 6794.811046] 
[ 6794.811046] stack backtrace:
[ 6794.811046] Pid: 361, comm: kswapd0 Not tainted 2.6.35-rc4-btrfs-debug #7
[ 6794.811046] Call Trace:
[ 6794.811046]  [<c13616c0>] ? printk+0xf/0x17
[ 6794.811046]  [<c104e988>] valid_state+0x134/0x142
[ 6794.811046]  [<c104ea66>] mark_lock+0xd0/0x1e9
[ 6794.811046]  [<c104e2a7>] ? check_usage_forwards+0x0/0x5f
[ 6794.811046]  [<c105003d>] __lock_acquire+0x374/0xc80
[ 6794.811046]  [<c1044942>] ? sched_clock_local+0x12/0x121
[ 6794.811046]  [<c1044c0b>] ? sched_clock_cpu+0x122/0x133
[ 6794.811046]  [<c1050d4d>] lock_acquire+0x5f/0x76
[ 6794.811046]  [<c10fa82d>] ? xfs_ilock+0x22/0x67
[ 6794.811046]  [<c1043974>] down_write_nested+0x32/0x63
[ 6794.811046]  [<c10fa82d>] ? xfs_ilock+0x22/0x67
[ 6794.811046]  [<c10fa82d>] xfs_ilock+0x22/0x67
[ 6794.811046]  [<c10faa48>] xfs_ireclaim+0x98/0xbb
[ 6794.811046]  [<c1043a1e>] ? up_write+0x16/0x2b
[ 6794.811046]  [<c111c78c>] xfs_reclaim_inode+0x1a7/0x1b1
[ 6794.811046]  [<c111cafe>] xfs_inode_ag_walk+0x77/0xbc
[ 6794.811046]  [<c111c5e5>] ? xfs_reclaim_inode+0x0/0x1b1
[ 6794.811046]  [<c111cc07>] xfs_inode_ag_iterator+0x52/0x99
[ 6794.811046]  [<c111cc78>] ? xfs_reclaim_inode_shrink+0x2a/0xe8
[ 6794.811046]  [<c111c5e5>] ? xfs_reclaim_inode+0x0/0x1b1
[ 6794.811046]  [<c111cc99>] xfs_reclaim_inode_shrink+0x4b/0xe8
[ 6794.811046]  [<c1077588>] shrink_slab+0xd2/0x13f
[ 6794.811046]  [<c1078cef>] kswapd+0x37d/0x4e9
[ 6794.811046]  [<c104028f>] ? autoremove_wake_function+0x0/0x2f
[ 6794.811046]  [<c1078972>] ? kswapd+0x0/0x4e9
[ 6794.811046]  [<c103ffbc>] kthread+0x60/0x65
[ 6794.811046]  [<c103ff5c>] ? kthread+0x0/0x65
[ 6794.811046]  [<c1002bba>] kernel_thread_helper+0x6/0x10

Don't know if this is related to the problem.


regards,
  Johannes


* Re: kernel BUG at fs/btrfs/extent-tree.c:1353
  2010-07-15 18:14       ` Johannes Hirte
  2010-07-16 14:59         ` Johannes Hirte
@ 2010-07-19  8:01         ` Miao Xie
  2010-07-22 18:07           ` Johannes Hirte
  1 sibling, 1 reply; 10+ messages in thread
From: Miao Xie @ 2010-07-19  8:01 UTC (permalink / raw)
  To: Johannes Hirte
  Cc: Dave Chinner, Chris Mason, linux-kernel, linux-btrfs, zheng.yan,
	Jens Axboe, linux-fsdevel

On Thu, 15 Jul 2010 20:14:51 +0200, Johannes Hirte wrote:
> On Thursday 15 July 2010, 02:11:04, Dave Chinner wrote:
>> On Wed, Jul 14, 2010 at 05:25:23PM +0200, Johannes Hirte wrote:
>>> On Thursday 08 July 2010, 16:31:09, Chris Mason wrote:
>>> I'm not sure if btrfs is to blame for this error. After the errors I
>>> switched to XFS on this system and now got this error:
>>>
>>> ls -l .kde4/share/apps/akregator/data/
>>> ls: cannot access .kde4/share/apps/akregator/data/feeds.opml: Structure
>>> needs cleaning
>>> total 4
>>> ?????????? ? ?    ?        ?            ? feeds.opml
>>
>> What is the error reported in dmesg when the XFS filesystem shuts down?
>
> Nothing. I double checked the logs. There are only the messages when mounting
> the filesystem. No other errors are reported than the inaccessible file and the
> output from xfs_check.

Is there anything wrong with your disks or memory?
Sometimes the bad memory can break the filesystem. I have met this kind of problem
some time ago.

If there is no problem with your disks and memory, could you tell us the parameters of
mkfs.btrfs and mount?

Thanks
Miao Xie


* Re: kernel BUG at fs/btrfs/extent-tree.c:1353
  2010-07-19  8:01         ` Miao Xie
@ 2010-07-22 18:07           ` Johannes Hirte
  2010-07-23 11:02             ` Daniel J Blueman
                               ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Johannes Hirte @ 2010-07-22 18:07 UTC (permalink / raw)
  To: miaox
  Cc: Dave Chinner, Chris Mason, linux-kernel, linux-btrfs, zheng.yan,
	Jens Axboe, linux-fsdevel

On Monday 19 July 2010, 10:01:46, Miao Xie wrote:
> On Thu, 15 Jul 2010 20:14:51 +0200, Johannes Hirte wrote:
> > On Thursday 15 July 2010, 02:11:04, Dave Chinner wrote:
> >> On Wed, Jul 14, 2010 at 05:25:23PM +0200, Johannes Hirte wrote:
> >>> On Thursday 08 July 2010, 16:31:09, Chris Mason wrote:
> >>> I'm not sure if btrfs is to blame for this error. After the errors I
> >>> switched to XFS on this system and now got this error:
> >>> 
> >>> ls -l .kde4/share/apps/akregator/data/
> >>> ls: cannot access .kde4/share/apps/akregator/data/feeds.opml: Structure
> >>> needs cleaning
> >>> total 4
> >>> ?????????? ? ?    ?        ?            ? feeds.opml
> >> 
> >> What is the error reported in dmesg when the XFS filesystem shuts down?
> > 
> > Nothing. I double checked the logs. There are only the messages when
> > mounting the filesystem. No other errors are reported than the
> > inaccessible file and the output from xfs_check.
> 
> Is there anything wrong with your disks or memory?
> Sometimes the bad memory can break the filesystem. I have met this kind of
> problem some time ago.

I don't think that's the case. I've checked the RAM with memtest86+ and got no 
errors. I got the errors with two different disks, the first one with btrfs the 
second one now with XFS. Before changing to the second disk, I've run 
badblocks on it to be sure it has no errors.

> 
> If there is no problem with your disks and memory, could you tell us the
> parameters of mkfs.btrfs and mount?

I'm not sure what parameters I used for mkfs.btrfs. It was either none or 
'-m single'. The only mount parameter is noatime. Some time ago I played a 
little with max_inline.

On the current disk with XFS I have now got some more errors on my root fs. A 
similar error on one file:

ls: cannot access /var/tmp/portage/app-office/krita-2.2.1/work/krita-2.2.1/krita/image/tiles3/tests/dm_consistancy_test/dm_consistancy_test.pr: Invalid argument

xfs_check shows on this fs:

localhost ~ # xfs_check /dev/sda1
agi unlinked bucket 10 is 7279754 in ag 0 (inode=7279754)
agi unlinked bucket 11 is 7279755 in ag 0 (inode=7279755)
dir 91466358 entry dm_consistancy_test.pr bad inode number 1862628266
dir 91466358 size is 36, should be 35
agi unlinked bucket 48 is 11677104 in ag 2 (inode=78785968)
agi unlinked bucket 49 is 11677105 in ag 2 (inode=78785969)
agi unlinked bucket 50 is 11677106 in ag 2 (inode=78785970)
agi unlinked bucket 51 is 11677107 in ag 2 (inode=78785971)
agi unlinked bucket 52 is 11677108 in ag 2 (inode=78785972)
agi unlinked bucket 53 is 11677109 in ag 2 (inode=78785973)
agi unlinked bucket 54 is 11677110 in ag 2 (inode=78785974)
agi unlinked bucket 55 is 11677111 in ag 2 (inode=78785975)
agi unlinked bucket 58 is 11677114 in ag 2 (inode=78785978)
agi unlinked bucket 59 is 11677115 in ag 2 (inode=78785979)
agi unlinked bucket 60 is 11677116 in ag 2 (inode=78785980)
agi unlinked bucket 61 is 11677117 in ag 2 (inode=78785981)
allocated inode 7279754 has 0 link count
allocated inode 7279755 has 0 link count
disconnected inode 91466360, nlink 1
allocated inode 78785968 has 0 link count
allocated inode 78785969 has 0 link count
allocated inode 78785970 has 0 link count
allocated inode 78785971 has 0 link count
allocated inode 78785972 has 0 link count
allocated inode 78785973 has 0 link count
allocated inode 78785974 has 0 link count
allocated inode 78785975 has 0 link count
allocated inode 78785978 has 0 link count
allocated inode 78785979 has 0 link count
allocated inode 78785980 has 0 link count
allocated inode 78785981 has 0 link count

And again I can't find any related messages in dmesg.
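
The numbers in the xfs_check output above can be cross-checked by hand. In
this output the absolute inode number equals the AG number shifted left by 25
bits plus the AG-relative number (2 * 2^25 + 11677104 = 78785968), and the
bucket index equals the AG-relative number modulo 64 (11677104 mod 64 = 48).
The 25-bit shift is inferred from this particular output, not read from the
superblock, so treat it as an assumption. The helper names in the Python
sketch below are hypothetical.

```python
import re

# Assumption: the low 25 bits hold the AG-relative inode number. This value is
# inferred from the xfs_check output quoted above, not read from the superblock.
AGINO_BITS = 25
# Bucket index observed in the output: AG-relative inode number modulo 64.
UNLINKED_BUCKETS = 64

UNLINKED = re.compile(
    r"agi unlinked bucket (\d+) is (\d+) in ag (\d+) \(inode=(\d+)\)"
)
ZERO_LINK = re.compile(r"allocated inode (\d+) has 0 link count")


def absolute_inode(agno, agino, agino_bits=AGINO_BITS):
    """Combine an AG number and an AG-relative inode number."""
    return (agno << agino_bits) | agino


def unmatched_unlinked(report):
    """Return unlinked-bucket inodes that have no '0 link count' line."""
    unlinked, zero_link = set(), set()
    for line in report.splitlines():
        match = UNLINKED.search(line)
        if match:
            bucket, agino, agno, inode = (int(g) for g in match.groups())
            # Both checks guard the assumptions stated in the constants above.
            if absolute_inode(agno, agino) != inode:
                raise ValueError("inode does not match assumed AG shift: " + line)
            if agino % UNLINKED_BUCKETS != bucket:
                raise ValueError("unexpected bucket index: " + line)
            unlinked.add(inode)
            continue
        match = ZERO_LINK.search(line)
        if match:
            zero_link.add(int(match.group(1)))
    return unlinked - zero_link
```

For the output above, each of the 14 inodes on an unlinked bucket also has a
matching "0 link count" line, so the two groups of messages describe the same
inodes.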

regards,
  Johannes


* Re: kernel BUG at fs/btrfs/extent-tree.c:1353
  2010-07-22 18:07           ` Johannes Hirte
@ 2010-07-23 11:02             ` Daniel J Blueman
  2010-07-23 11:14             ` Bob Copeland
  2010-07-29 17:09             ` Johannes Hirte
  2 siblings, 0 replies; 10+ messages in thread
From: Daniel J Blueman @ 2010-07-23 11:02 UTC (permalink / raw)
  To: Johannes Hirte
  Cc: miaox, Dave Chinner, Chris Mason, linux-kernel, linux-btrfs,
	zheng.yan, Jens Axboe, linux-fsdevel

On 22 July 2010 19:07, Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de> wrote:
> On Monday 19 July 2010, 10:01:46, Miao Xie wrote:
>> On Thu, 15 Jul 2010 20:14:51 +0200, Johannes Hirte wrote:
>> > On Thursday 15 July 2010, 02:11:04, Dave Chinner wrote:
>> >> On Wed, Jul 14, 2010 at 05:25:23PM +0200, Johannes Hirte wrote:
>> >>> On Thursday 08 July 2010, 16:31:09, Chris Mason wrote:
>> >>> I'm not sure if btrfs is to blame for this error. After the errors I
>> >>> switched to XFS on this system and now got this error:
>> >>>
>> >>> ls -l .kde4/share/apps/akregator/data/
>> >>> ls: cannot access .kde4/share/apps/akregator/data/feeds.opml: Structure
>> >>> needs cleaning
>> >>> total 4
>> >>> ?????????? ? ?    ?        ?            ? feeds.opml
>> >>
>> >> What is the error reported in dmesg when the XFS filesystem shuts down?
>> >
>> > Nothing. I double checked the logs. There are only the messages when
>> > mounting the filesystem. No other errors are reported than the
>> > inaccessible file and the output from xfs_check.
>>
>> Is there anything wrong with your disks or memory?
>> Sometimes the bad memory can break the filesystem. I have met this kind of
>> problem some time ago.
>
> I don't think that's the case. I've checked the RAM with memtest86+ and got no
> errors. I got the errors with two different disks, the first one with btrfs the
> second one now with XFS. Before changing to the second disk, I've run
> badblocks on it to be sure it has no errors.

There are also some known-buggy chipsets. One still around is the
Nvidia CK804/MCP55: under certain patterns of spatially-local pending
reads and writes to the memory controller, a 64-byte request would
occasionally be returned with the wrong offset. I was hitting it with
some 27-Gbit adapters and managed to capture it on a PCI-e protocol
analyser. Rsync between network and local disk would sometimes hit it
too.
-- 
Daniel J Blueman


* Re: kernel BUG at fs/btrfs/extent-tree.c:1353
  2010-07-22 18:07           ` Johannes Hirte
  2010-07-23 11:02             ` Daniel J Blueman
@ 2010-07-23 11:14             ` Bob Copeland
  2010-07-29 17:09             ` Johannes Hirte
  2 siblings, 0 replies; 10+ messages in thread
From: Bob Copeland @ 2010-07-23 11:14 UTC (permalink / raw)
  To: Johannes Hirte
  Cc: miaox, Dave Chinner, Chris Mason, linux-kernel, linux-btrfs,
	zheng.yan, Jens Axboe, linux-fsdevel

On Thu, Jul 22, 2010 at 2:07 PM, Johannes Hirte
<johannes.hirte@fem.tu-ilmenau.de> wrote:
>> Is there anything wrong with your disks or memory?
>> Sometimes the bad memory can break the filesystem. I have met this kind of
>> problem some time ago.
>
> I don't think that's the case. I've checked the RAM with memtest86+ and got no
> errors. I got the errors with two different disks, the first one with btrfs the
> second one now with XFS. Before changing to the second disk, I've run
> badblocks on it to be sure it has no errors.

You might also try kmemcheck.  There's a good chance that a bug
that scribbles random memory shows up as FS corruption.  I have had
ext4 corruption due to an inotify bug, which kmemcheck found on the
first try.

-- 
Bob Copeland %% www.bobcopeland.com


* Re: kernel BUG at fs/btrfs/extent-tree.c:1353
  2010-07-22 18:07           ` Johannes Hirte
  2010-07-23 11:02             ` Daniel J Blueman
  2010-07-23 11:14             ` Bob Copeland
@ 2010-07-29 17:09             ` Johannes Hirte
  2010-07-29 18:54               ` Jens Axboe
  2 siblings, 1 reply; 10+ messages in thread
From: Johannes Hirte @ 2010-07-29 17:09 UTC (permalink / raw)
  To: miaox
  Cc: Dave Chinner, Chris Mason, linux-kernel, linux-btrfs, zheng.yan,
	Jens Axboe, linux-fsdevel

On Thursday 22 July 2010, 20:07:23, Johannes Hirte wrote:
> On Monday 19 July 2010, 10:01:46, Miao Xie wrote:
> > On Thu, 15 Jul 2010 20:14:51 +0200, Johannes Hirte wrote:
> > > On Thursday 15 July 2010, 02:11:04, Dave Chinner wrote:
> > >> On Wed, Jul 14, 2010 at 05:25:23PM +0200, Johannes Hirte wrote:
> > >>> On Thursday 08 July 2010, 16:31:09, Chris Mason wrote:
> > >>> I'm not sure if btrfs is to blame for this error. After the errors I
> > >>> switched to XFS on this system and now got this error:
> > >>> 
> > >>> ls -l .kde4/share/apps/akregator/data/
> > >>> ls: cannot access .kde4/share/apps/akregator/data/feeds.opml:
> > >>> Structure needs cleaning
> > >>> total 4
> > >>> ?????????? ? ?    ?        ?            ? feeds.opml
> > >> 
> > >> What is the error reported in dmesg when the XFS filesystem shuts down?
> > > 
> > > Nothing. I double checked the logs. There are only the messages when
> > > mounting the filesystem. No other errors are reported than the
> > > inaccessible file and the output from xfs_check.
> > 
> > Is there anything wrong with your disks or memory?
> > Sometimes the bad memory can break the filesystem. I have met this kind
> > of problem some time ago.
> 
> I don't think that's the case. I've checked the RAM with memtest86+ and got
> no errors. I got the errors with two different disks, the first one with
> btrfs the second one now with XFS. Before changing to the second disk,
> I've run badblocks on it to be sure it has no errors.

I think I've found it. The bug was introduced by 

commit 7f0e7bed936a0c422641a046551829a01341dd80
Author: Christoph Hellwig <hch@lst.de>
Date:   Tue Jun 8 18:14:34 2010 +0200

    writeback: fix writeback completion notifications
    
    The code dealing with bdi_work->state and completion of a bdi_work is a
    major mess currently.  This patch makes sure we directly use one set of
    flags to deal with it, and use it consistently, which means:
    
     - always notify about completion from the rcu callback.  We only ever
       wait for it from on-stack callers, so this simplification does not
       even cause a theoretical slowdown currently.  It also makes sure we
       don't miss out on the notification if we ever add other callers to
       wait for it.
     - make earlier completion notification depending on the on-stack
       allocation, not the sync mode.  If we introduce new callers that
       want to do WB_SYNC_NONE writeback from on-stack callers this will
       be nessecary.
    
    Also rename bdi_wait_on_work_clear to bdi_wait_on_work_done and inline
    a few small functions into their only caller to make the code
    understandable.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

and seems to be fixed by

commit 83ba7b071f30f7c01f72518ad72d5cd203c27502
Author: Christoph Hellwig <hch@lst.de>
Date:   Tue Jul 6 08:59:53 2010 +0200

    writeback: simplify the write back thread queue
    
    First remove items from work_list as soon as we start working on them.  This
    means we don't have to track any pending or visited state and can get
    rid of all the RCU magic freeing the work items - we can simply free
    them once the operation has finished.  Second use a real completion for
    tracking synchronous requests - if the caller sets the completion pointer
    we complete it, otherwise use it as a boolean indicator that we can free
    the work item directly.  Third unify struct wb_writeback_args and struct
    bdi_work into a single data structure, wb_writeback_work.  Previous we
    set all parameters into a struct wb_writeback_args, copied it into
    struct bdi_work, copied it again on the stack to use it there.  Instead
    of just allocate one structure dynamically or on the stack and use it
    all the way through the stack.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

I was able to reproduce the bug by unpacking a big tar file and deleting these 
files multiple times. Normally with btrfs the kernel crashed within 20 runs. 
After commit 83ba7b071f30f7c01f72518ad72d5cd203c27502 it survived more than 500 
runs.
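
The reproduction described above could be scripted roughly as follows. This
is only a sketch of the unpack/delete cycle, assuming a trusted tarball and a
work directory on the filesystem under test. The function name, its
parameters and the "unpacked" directory name are hypothetical, and this is
not the script that was used for the runs reported here.

```python
import shutil
import tarfile
from pathlib import Path


def unpack_delete_cycles(tarball, workdir, runs):
    """Unpack `tarball` below `workdir` and delete the result, `runs` times.

    Returns the number of completed cycles. On an affected kernel the machine
    may crash or corrupt data before the loop ends, so a normal return only
    means that no failure was triggered during these runs.
    """
    target = Path(workdir) / "unpacked"
    completed = 0
    for _ in range(runs):
        # Fails loudly if a previous cycle left the directory behind.
        target.mkdir(parents=True, exist_ok=False)
        with tarfile.open(tarball) as archive:
            archive.extractall(target)  # only use archives you trust
        shutil.rmtree(target)  # the recursive delete, as with 'rm -r'
        completed += 1
    return completed
```

A call such as unpack_delete_cycles("big.tar", "/mnt/test", 500) would mirror
the 500-run check mentioned above; both paths here are placeholders.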


regards,
  Johannes


* Re: kernel BUG at fs/btrfs/extent-tree.c:1353
  2010-07-29 17:09             ` Johannes Hirte
@ 2010-07-29 18:54               ` Jens Axboe
  0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2010-07-29 18:54 UTC (permalink / raw)
  To: Johannes Hirte
  Cc: miaox, Dave Chinner, Chris Mason, linux-kernel, linux-btrfs,
	zheng.yan, linux-fsdevel

On 07/29/2010 07:09 PM, Johannes Hirte wrote:
> On Thursday 22 July 2010, 20:07:23, Johannes Hirte wrote:
>> On Monday 19 July 2010, 10:01:46, Miao Xie wrote:
>>> On Thu, 15 Jul 2010 20:14:51 +0200, Johannes Hirte wrote:
>>>> On Thursday 15 July 2010, 02:11:04, Dave Chinner wrote:
>>>>> On Wed, Jul 14, 2010 at 05:25:23PM +0200, Johannes Hirte wrote:
>>>>>> On Thursday 08 July 2010, 16:31:09, Chris Mason wrote:
>>>>>> I'm not sure if btrfs is to blame for this error. After the errors I
>>>>>> switched to XFS on this system and now got this error:
>>>>>>
>>>>>> ls -l .kde4/share/apps/akregator/data/
>>>>>> ls: cannot access .kde4/share/apps/akregator/data/feeds.opml:
>>>>>> Structure needs cleaning
>>>>>> total 4
>>>>>> ?????????? ? ?    ?        ?            ? feeds.opml
>>>>>
>>>>> What is the error reported in dmesg when the XFS filesystem shuts down?
>>>>
>>>> Nothing. I double checked the logs. There are only the messages when
>>>> mounting the filesystem. No other errors are reported than the
>>>> inaccessible file and the output from xfs_check.
>>>
>>> Is there anything wrong with your disks or memory?
>>> Sometimes the bad memory can break the filesystem. I have met this kind
>>> of problem some time ago.
>>
>> I don't think that's the case. I've checked the RAM with memtest86+ and got
>> no errors. I got the errors with two different disks, the first one with
>> btrfs the second one now with XFS. Before changing to the second disk,
>> I've run badblocks on it to be sure it has no errors.
> 
> I think I've found it. The bug was introduced by 
> 
> commit 7f0e7bed936a0c422641a046551829a01341dd80
> Author: Christoph Hellwig <hch@lst.de>
> Date:   Tue Jun 8 18:14:34 2010 +0200
> 
>     writeback: fix writeback completion notifications
>     
>     The code dealing with bdi_work->state and completion of a bdi_work is a
>     major mess currently.  This patch makes sure we directly use one set of
>     flags to deal with it, and use it consistently, which means:
>     
>      - always notify about completion from the rcu callback.  We only ever
>        wait for it from on-stack callers, so this simplification does not
>        even cause a theoretical slowdown currently.  It also makes sure we
>        don't miss out on the notification if we ever add other callers to
>        wait for it.
>      - make earlier completion notification depending on the on-stack
>        allocation, not the sync mode.  If we introduce new callers that
>        want to do WB_SYNC_NONE writeback from on-stack callers this will
>        be nessecary.
>     
>     Also rename bdi_wait_on_work_clear to bdi_wait_on_work_done and inline
>     a few small functions into their only caller to make the code
>     understandable.
>     
>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>     Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
> 
> and seems to be fixed by
> 
> commit 83ba7b071f30f7c01f72518ad72d5cd203c27502
> Author: Christoph Hellwig <hch@lst.de>
> Date:   Tue Jul 6 08:59:53 2010 +0200
> 
>     writeback: simplify the write back thread queue
>     
>     First remove items from work_list as soon as we start working on them.  This
>     means we don't have to track any pending or visited state and can get
>     rid of all the RCU magic freeing the work items - we can simply free
>     them once the operation has finished.  Second use a real completion for
>     tracking synchronous requests - if the caller sets the completion pointer
>     we complete it, otherwise use it as a boolean indicator that we can free
>     the work item directly.  Third unify struct wb_writeback_args and struct
>     bdi_work into a single data structure, wb_writeback_work.  Previous we
>     set all parameters into a struct wb_writeback_args, copied it into
>     struct bdi_work, copied it again on the stack to use it there.  Instead
>     of just allocate one structure dynamically or on the stack and use it
>     all the way through the stack.
>     
>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>     Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
> 
> I was able to reproduce the bug by unpacking a big tar-file and
> deleting these files multiple times. Normally with btrfs the kernel
> crashed within 20 runs. After commit
> 83ba7b071f30f7c01f72518ad72d5cd203c27502 it survived more than 500
> runs.

Makes sense, that first commit would potentially pass in stack cruft as
the wbc arg. So I think we can safely consider it fixed now.

-- 
Jens Axboe


