* 2.6.29-rc2 oops and assertion failure...
@ 2011-04-07 7:21 Daniel J Blueman
2011-04-07 16:23 ` Josef Bacik
0 siblings, 1 reply; 4+ messages in thread
From: Daniel J Blueman @ 2011-04-07 7:21 UTC
To: Linux BTRFS, Chris Mason
When running a practical stress-test on 2.6.39-rc2 trying to reproduce
an older (extent refcounting) issue, I am consistently able to hit an
oops [1] and an assertion failure [2].
Here, I'm testing with 8 block ramdisks, configured in the kernel to
256MB each (intentionally testing free-space handling):
for i in `seq 0 7`; do mknod /dev/ram$i b 1 $i; dd if=/dev/zero of=/dev/ram$i bs=1024k count=256; done
mkfs.btrfs -m raid10 -d raid10 /dev/ram0 /dev/ram1 /dev/ram2 /dev/ram3 /dev/ram4 /dev/ram5 /dev/ram6 /dev/ram7
mount /dev/ram0 /mnt -o space_cache,ssd,nobarrier,compress # try without compress also
cp -xa / /mnt
The next steps are executed in parallel:
while :; do cp -xa / /mnt; done &
while :; do btrfs filesystem balance /mnt; done &
while :; do find /mnt -print0 | xargs -0 btrfs filesystem defragment -c; done &
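The eight ramdisk names passed to mkfs.btrfs above can be generated rather than typed out. The following is only a minimal sketch: the helper name ram_devices is hypothetical and not part of the original reproducer, and it only prints names without touching any device.

```shell
# Print a space-separated list of ramdisk device names, /dev/ram0 upwards.
# ram_devices is a hypothetical helper name, not part of btrfs-progs.
ram_devices() {
    count=$1
    i=0
    list=""
    while [ "$i" -lt "$count" ]; do
        # Add a separating space only once the list is non-empty.
        list="$list${list:+ }/dev/ram$i"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}
```

With this in place the filesystem creation step could be written as: mkfs.btrfs -m raid10 -d raid10 $(ram_devices 8). The command substitution is left unquoted on purpose so that the shell splits the list into separate arguments.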
--- [1]
general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/bus/hid/drivers/generic-usb/new_id
CPU 0
Modules linked in: brd loop [last unloaded: brd]
Pid: 28000, comm: btrfs Tainted: G W 2.6.39-rc2-350cd #2
Supermicro X8STi/X8STi
RIP: 0010:[<ffffffff812d40a4>] [<ffffffff812d40a4>]
btrfs_write_out_cache+0x9d4/0xdf0
RSP: 0018:ffff8802af913968 EFLAGS: 00010246
RAX: db73880000000000 RBX: 0000000000000000 RCX: 0000000000000200
RDX: 0000000000001000 RSI: ffff8802ba9b1048 RDI: db73880000000000
RBP: ffff8802af913ae8 R08: 0000000000000001 R09: 0000000000000000
R10: ffffffff810e8130 R11: 0000000000000000 R12: ffff8802510a3be0
R13: ffff8802acf8b948 R14: ffff8802510a3bb0 R15: ffff8802b9f561c8
FS: 00007fabcef8d740(0000) GS:ffff88031fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000003c960c8 CR3: 00000002afa29000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs (pid: 28000, threadinfo ffff8802af912000, task ffff8803090d0000)
Stack:
ffff8802af913988 0000000000000001 ffff8802af913998 ffff8802af913a88
ffff8802af9139a8 0000000000000000 0000000000000040 ffff880215059770
ffff880215059710 0000000000000010 ffff8802acf8b908 000000000000000f
Call Trace:
[<ffffffff810506b1>] ? get_parent_ip+0x11/0x50
[<ffffffff8105584d>] ? sub_preempt_count+0x9d/0xd0
[<ffffffff8128a98b>] btrfs_write_dirty_block_groups+0x2ab/0x300
[<ffffffff81296a35>] commit_cowonly_roots+0x105/0x1e0
[<ffffffff8129782d>] btrfs_commit_transaction+0x37d/0x720
[<ffffffff81080ad0>] ? wake_up_bit+0x40/0x40
[<ffffffff812e0afc>] relocate_block_group+0x4bc/0x600
[<ffffffff812e0de8>] btrfs_relocate_block_group+0x1a8/0x2d0
[<ffffffff812c14ed>] btrfs_relocate_chunk+0x6d/0x3b0
[<ffffffff810506b1>] ? get_parent_ip+0x11/0x50
[<ffffffff8105584d>] ? sub_preempt_count+0x9d/0xd0
[<ffffffff812c20dd>] btrfs_balance+0x20d/0x280
[<ffffffff812c9ec0>] btrfs_ioctl+0x450/0x590
[<ffffffff81152e8d>] do_vfs_ioctl+0x8d/0x330
[<ffffffff81141444>] ? fget_light+0x274/0x3c0
[<ffffffff81106cc0>] ? __do_fault+0x150/0x5d0
[<ffffffff8115317a>] sys_ioctl+0x4a/0x80
[<ffffffff81709ffb>] system_call_fastpath+0x16/0x1b
Code: 89 ad 38 ff ff ff 49 89 c7 4c 8b ad 48 ff ff ff e9 e4 00 00 00
66 90 40 f6 c7 04 0f 85 6e 01 00 00 89 d1 c1 e9 03 f6 c2 04 89 c9 <f3>
48 a5 74 09 8b 0e 89 0f b9 04 00 00 00 f6 c2 02 74 0e 44 0f
RIP [<ffffffff812d40a4>] btrfs_write_out_cache+0x9d4/0xdf0
RSP <ffff8802af913968>
---[ end trace a7919e7f17c0a728 ]---
--- [2]
kernel BUG at fs/btrfs/relocation.c:4282!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/bdi/btrfs-1/uevent
CPU 0
Modules linked in: brd loop
Pid: 7775, comm: flush-btrfs-1 Tainted: G W 2.6.39-rc2-350cd
#2 Supermicro X8STi/X8STi
RIP: 0010:[<ffffffff812da5ab>] [<ffffffff812da5ab>]
btrfs_reloc_cow_block+0x28b/0x2c0
RSP: 0018:ffff8803057817f0 EFLAGS: 00010246
RAX: ffff880305728000 RBX: ffff880305640000 RCX: ffff880235d92e40
RDX: ffff880209c1f5f0 RSI: ffff880308bdd168 RDI: ffff8802ff1fb220
RBP: ffff880305781850 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff812d8630 R11: 0000000000000000 R12: ffff880308bdd168
R13: ffff880209c1f5f0 R14: ffff8802ff1fb220 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88031fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f7355663650 CR3: 00000001f75f7000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process flush-btrfs-1 (pid: 7775, threadinfo ffff880305780000, task
ffff880308fa8000)
Stack:
ffff880305781850 ffffffff81276c5d fffffffffffffff7 ffffea0006cd7e90
0000000000000000 ffff880235d92e40 ffff880305781850 ffff880308bdd168
ffff880235d92e40 ffff880209c1f5f0 ffff8802ff1fb220 0000000000000000
Call Trace:
[<ffffffff81276c5d>] ? update_ref_for_cow+0x26d/0x360
[<ffffffff81277401>] __btrfs_cow_block+0x6b1/0x980
[<ffffffff81277e4b>] btrfs_cow_block+0x11b/0x2c0
[<ffffffff8127b915>] btrfs_search_slot+0x3c5/0x790
[<ffffffff812762d5>] ? btrfs_alloc_path+0x15/0x30
[<ffffffff812a1640>] btrfs_truncate_inode_items+0x110/0x770
[<ffffffff810506b1>] ? get_parent_ip+0x11/0x50
[<ffffffff817094d0>] ? _raw_spin_unlock+0x30/0x60
[<ffffffff812a21fb>] btrfs_evict_inode+0x18b/0x200
[<ffffffff8115b511>] evict+0x81/0x180
[<ffffffff8115b9c6>] iput_final+0xe6/0x1a0
[<ffffffff8115bab6>] iput+0x36/0x50
[<ffffffff811672de>] writeback_sb_inodes+0x12e/0x1d0
[<ffffffff81167e9b>] writeback_inodes_wb+0x7b/0x180
[<ffffffff8116825b>] wb_writeback+0x2bb/0x320
[<ffffffff8115c882>] ? get_nr_inodes+0x62/0xb0
[<ffffffff811684dc>] wb_do_writeback+0x21c/0x230
[<ffffffff81168582>] bdi_writeback_thread+0x92/0x180
[<ffffffff811684f0>] ? wb_do_writeback+0x230/0x230
[<ffffffff81080596>] kthread+0xb6/0xc0
[<ffffffff8109629d>] ? trace_hardirqs_on_caller+0x14d/0x190
[<ffffffff8170b154>] kernel_thread_helper+0x4/0x10
[<ffffffff81055718>] ? finish_task_switch+0x78/0x110
[<ffffffff81709884>] ? retint_restore_args+0xe/0xe
[<ffffffff810804e0>] ? __init_kthread_worker+0x70/0x70
[<ffffffff8170b150>] ? gs_change+0xb/0xb
Code: ff ff e8 79 bf 42 00 e9 ae fe ff ff eb 02 90 90 e8 6b bf 42 00
eb 01 90 e9 33 fe ff ff 48 83 be 47 01 00 00 f7 0f 85 c2 fd ff ff <0f>
0b eb fe 48 3b 50 20 0f 84 04 ff ff ff 0f 0b eb fe 83 7d c4
RIP [<ffffffff812da5ab>] btrfs_reloc_cow_block+0x28b/0x2c0
RSP <ffff8803057817f0>
---[ end trace a7919e7f17c0a728 ]---
--
Daniel J Blueman
* Re: 2.6.29-rc2 oops and assertion failure...
From: Josef Bacik @ 2011-04-07 16:23 UTC
To: Daniel J Blueman; +Cc: Linux BTRFS, Chris Mason
On 04/07/2011 03:21 AM, Daniel J Blueman wrote:
> When running a practical stress-test on 2.6.39-rc2 trying to reproduce
> an older (extent refcounting) issue, I am consistently able to hit an
> oops [1] and an assertion failure [2].
>
Sorry about that, please apply the patch I just sent this morning
[PATCH] Btrfs: deal with the case that we run out of space in the cache
Thanks,
Josef
* Re: 2.6.29-rc2 oops and assertion failure...
From: Daniel J Blueman @ 2011-04-08 2:26 UTC
To: Josef Bacik, Chris Mason; +Cc: Linux BTRFS
Hi Josef, Chris,
On 8 April 2011 00:23, Josef Bacik <josef@redhat.com> wrote:
> On 04/07/2011 03:21 AM, Daniel J Blueman wrote:
>>
>> When running a practical stress-test on 2.6.39-rc2 trying to reproduce
>> an older (extent refcounting) issue, I am consistently able to hit an
>> oops [1] and an assertion failure [2].
>
> Sorry about that, please apply the patch I just sent this morning
>
> [PATCH] Btrfs: deal with the case that we run out of space in the cache
Superb work - the btrfs_write_out_cache oops is addressed, so now we
(separately) hit a few other assertions at: volumes.c:2013 [1],
volumes.c:2063 [2] and volumes.c:2703 [3] with the previous
reproducer.
Let me know if adding any debugging or other testing may be useful.
Thanks,
Daniel
--- [1]
kernel BUG at fs/btrfs/volumes.c:2013!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/block/ram7/removable
CPU 0
Modules linked in: ppp_generic slhc tun brd loop
Pid: 17040, comm: btrfs Tainted: G W 2.6.39-rc2-350cd+ #3
Supermicro X8STi/X8STi
RIP: 0010:[<ffffffff812c214b>] [<ffffffff812c214b>] btrfs_balance+0x27b/0x280
RSP: 0018:ffff88015c923e08 EFLAGS: 00010282
RAX: 00000000fffffffb RBX: ffff880301d6e1b0 RCX: 0000000000000040
RDX: 00000000fffffffb RSI: 0000000000000000 RDI: ffffffff8112e425
RBP: ffff88015c923e88 R08: 0000000000000000 R09: ffff8802f8ee53f0
R10: 0000000000000012 R11: 0000000000000098 R12: ffff8802f909a490
R13: ffff8802f909bc38 R14: 0000000010000000 R15: 00007fffd1599ce0
FS: 00007f3c4b6f4740(0000) GS:ffff88031fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000f00098 CR3: 000000015c921000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs (pid: 17040, threadinfo ffff88015c922000, task ffff88030b898000)
Stack:
ffff880307cd5498 ffff880301d6c120 ffff88015c923e38 ffffffff81085b9e
ffff880308a5d700 0000000000000008 ffff88015c923f48 ffffffff81031d5c
ffffea000a9e7b40 ffff88015c923f58 ffff88030b898000 ffff88015c8aa300
Call Trace:
[<ffffffff81085b9e>] ? up_read+0x1e/0x40
[<ffffffff81031d5c>] ? do_page_fault+0x1cc/0x440
[<ffffffff812c9ec0>] btrfs_ioctl+0x450/0x590
[<ffffffff81152e8d>] do_vfs_ioctl+0x8d/0x330
[<ffffffff81141444>] ? fget_light+0x274/0x3c0
[<ffffffff81106cc0>] ? __do_fault+0x150/0x5d0
[<ffffffff8115317a>] sys_ioctl+0x4a/0x80
[<ffffffff8170a03b>] system_call_fastpath+0x16/0x1b
Code: 81 c7 d8 22 00 00 e8 05 4b 44 00 8b 45 80 e9 e7 fd ff ff 31 c0
eb d2 85 c0 74 a7 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe <0f>
0b eb fe 90 55 48 89 e5 48 83 ec 40 8b 05 e2 62 72 00 4c 89
RIP [<ffffffff812c214b>] btrfs_balance+0x27b/0x280
RSP <ffff88015c923e08>
--- [2]
kernel BUG at fs/btrfs/volumes.c:2063!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/block/ram7/removable
CPU 0
Modules linked in: brd loop
Pid: 13460, comm: btrfs Tainted: G W 2.6.39-rc2-350cd+ #3
Supermicro X8STi/X8STi
RIP: 0010:[<ffffffff812c213b>] [<ffffffff812c213b>] btrfs_balance+0x26b/0x280
RSP: 0018:ffff8800b1827e08 EFLAGS: 00010282
RAX: 00000000fffffffb RBX: ffff88030934d168 RCX: 0000000000000006
RDX: 00000000fffffffb RSI: ffff880308fc06f0 RDI: ffff880308fc0000
RBP: ffff8800b1827e88 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8802ff5455e8
R13: ffff8800b1827e38 R14: 000000010d560000 R15: ffff8800b1827e18
FS: 00007fce737e5740(0000) GS:ffff88031fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000002371688 CR3: 00000000b1ff8000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs (pid: 13460, threadinfo ffff8800b1826000, task ffff880308fc0000)
Stack:
0000000000000100 ffff88030934e1b0 0000000000000100 0000010d560000e4
ffff880308837a00 0000000000000008 0000000000000100 00000113bbffffe4
ffff880308fc0600 ffff8800b1827f58 ffff880308fc0000 ffff8801f8c56c00
Call Trace:
[<ffffffff812c9ec0>] btrfs_ioctl+0x450/0x590
[<ffffffff81152e8d>] do_vfs_ioctl+0x8d/0x330
[<ffffffff8114148f>] ? fget_light+0x2bf/0x3c0
[<ffffffff8109629d>] ? trace_hardirqs_on_caller+0x14d/0x190
[<ffffffff8115317a>] sys_ioctl+0x4a/0x80
[<ffffffff8170a03b>] system_call_fastpath+0x16/0x1b
Code: 7c 90 fb ff 48 8b 55 88 48 8b ba 58 01 00 00 48 81 c7 d8 22 00
00 e8 05 4b 44 00 8b 45 80 e9 e7 fd ff ff 31 c0 eb d2 85 c0 74 a7 <0f>
0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 90
RIP [<ffffffff812c213b>] btrfs_balance+0x26b/0x280
RSP <ffff8800b1827e08>
--- [3]
kernel BUG at fs/btrfs/volumes.c:2703!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/bdi/btrfs-3/uevent
CPU 0
Modules linked in: brd loop
Pid: 14333, comm: btrfs-delalloc- Tainted: G W
2.6.39-rc2-350cd+ #3 Supermicro X8STi/X8STi
RIP: 0010:[<ffffffff812c08c2>] [<ffffffff812c08c2>]
__finish_chunk_alloc+0x212/0x220
RSP: 0018:ffff8803007e7af0 EFLAGS: 00010286
RAX: 00000000ffffffe4 RBX: ffff88024e54e000 RCX: 0000000000000040
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8112e425
RBP: ffff8803007e7b70 R08: 0000000000000000 R09: ffff8803072fe168
R10: 0000000000000012 R11: 0000000000000098 R12: ffff880303c192a8
R13: ffff88020a461e70 R14: ffff8801c2632090 R15: 00000000000000b0
FS: 0000000000000000(0000) GS:ffff88031fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000002953c98 CR3: 00000002fdfd3000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs-delalloc- (pid: 14333, threadinfo ffff8803007e6000, task
ffff880308ccc020)
Stack:
00000002007e7b70 0000000000000003 00000000006e0000 000000007ffc0000
ffff8801c2634120 0000000000370000 0000000000000100 0000007ffc0000e4
0000000000370000 ffff88024e54e000 0000000000000246 ffff8801c2632090
Call Trace:
[<ffffffff812c3d0e>] btrfs_alloc_chunk+0x8e/0xa0
[<ffffffff81281ed6>] do_chunk_alloc+0x1b6/0x280
[<ffffffff812844e4>] btrfs_reserve_extent+0xb4/0x170
[<ffffffff81706c39>] ? mutex_unlock+0x9/0x10
[<ffffffff812980c7>] ? start_transaction+0x247/0x2b0
[<ffffffff8129db9e>] submit_compressed_extents+0xfe/0x460
[<ffffffff810506b1>] ? get_parent_ip+0x11/0x50
[<ffffffff8129df7f>] async_cow_submit+0x7f/0x90
[<ffffffff812c452b>] run_ordered_completions+0x7b/0xc0
[<ffffffff812c4f9c>] worker_loop+0x16c/0x3c0
[<ffffffff812c4e30>] ? check_pending_worker_creates+0xd0/0xd0
[<ffffffff81080596>] kthread+0xb6/0xc0
[<ffffffff8170b194>] kernel_thread_helper+0x4/0x10
[<ffffffff81055718>] ? finish_task_switch+0x78/0x110
[<ffffffff817098c4>] ? retint_restore_args+0xe/0xe
[<ffffffff810804e0>] ? __init_kthread_worker+0x70/0x70
[<ffffffff8170b190>] ? gs_change+0xb/0xb
Code: 1d 07 00 44 89 a3 58 07 00 00 4c 89 ef e8 c7 ef e6 ff 31 c0 48
83 c4 58 5b 41 5c 41 5d 41 5e 41 5f c9 c3 0f 0b eb fe 0f 0b eb fe <0f>
0b eb fe eb 08 90 90 90 90 90 90 90 90 55 49 89 ca 48 89 e5
RIP [<ffffffff812c08c2>] __finish_chunk_alloc+0x212/0x220
RSP <ffff8803007e7af0>
--
Daniel J Blueman
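As a cross-check on the register dumps above: RAX reads 00000000fffffffb in traces [1] and [2], and 00000000ffffffe4 in trace [3]. Assuming RAX still holds the failing call's 32-bit return value when the BUG fires (an assumption, the traces do not state this), those values decode to -5 and -28, which are -EIO and -ENOSPC on Linux. A minimal sketch of the arithmetic follows; the helper name to_errno is hypothetical.

```shell
# Interpret the low 32 bits of a hex register value as a signed integer.
# to_errno is a hypothetical helper name; pass the hex digits without "0x".
to_errno() {
    val=$(( 0x$1 & 0xffffffff ))
    # Values with bit 31 set are negative in two's complement.
    if [ "$val" -ge 2147483648 ]; then
        val=$(( val - 4294967296 ))
    fi
    printf '%s\n' "$val"
}
```

For example, to_errno 00000000fffffffb prints -5 and to_errno 00000000ffffffe4 prints -28. This relies on the shell doing 64-bit arithmetic, which is the case for bash and dash on a 64-bit system.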
* Re: 2.6.29-rc2 oops and assertion failure...
From: Josef Bacik @ 2011-04-08 13:37 UTC
To: Daniel J Blueman; +Cc: Chris Mason, Linux BTRFS
On 04/07/2011 10:26 PM, Daniel J Blueman wrote:
> Hi Josef, Chris,
>
> On 8 April 2011 00:23, Josef Bacik <josef@redhat.com> wrote:
>> On 04/07/2011 03:21 AM, Daniel J Blueman wrote:
>>>
>>> When running a practical stress-test on 2.6.39-rc2 trying to reproduce
>>> an older (extent refcounting) issue, I am consistently able to hit an
>>> oops [1] and an assertion failure [2].
>>
>> Sorry about that, please apply the patch I just sent this morning
>>
>> [PATCH] Btrfs: deal with the case that we run out of space in the cache
>
> Superb work - the btrfs_write_out_cache oops is addressed, so now we
> (separately) hit a few other assertions at: volumes.c:2013 [1],
> volumes.c:2063 [2] and volumes.c:2703 [3] with the previous
> reproducer.
>
> Let me know if adding any debugging or other testing may be useful.
>
> Thanks,
> Daniel
Looks like the first 2 panics are basically the same thing. You are
getting -EIO back from btrfs_shrink_device(), which could either come
from searching or it could come from the stuff in relocation.c. So will
you put printk's at the 2 places in relocation.c where we return -EIO
and figure out which one is getting tripped? Once we know who is
returning EIO we can go from there. As for the last one, that's just a
normal ENOSPC, but it's because we're allocating a chunk in the
submission path, so that's going to be a little trickier to deal with.
Let's fix these first two panics first and then hopefully that last one
will just go away :). Thanks,
Josef
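One way to locate candidate sites for the requested printk's is to list every line in relocation.c that mentions -EIO. This is only a sketch: the helper name find_eio_sites is hypothetical, and the number of matches in any given tree is not something this thread states.

```shell
# List every line in a source file that mentions -EIO, with line numbers.
# find_eio_sites is a hypothetical helper name.
find_eio_sites() {
    # -n prints line numbers; -e stops grep from parsing "-EIO" as options.
    grep -n -e '-EIO' "$1"
}
```

From the top of a kernel tree this would be run as: find_eio_sites fs/btrfs/relocation.c. Each reported line is then a place to consider adding a printk before rebuilding and rerunning the reproducer.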