linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* warning and bug on 3.2-rc4 + for-linus from yesterday
@ 2011-12-09 20:39 Simon Kirby
  2011-12-10  2:21 ` Simon Kirby
  2011-12-12  0:30 ` David Sterba
  0 siblings, 2 replies; 4+ messages in thread
From: Simon Kirby @ 2011-12-09 20:39 UTC (permalink / raw)
  To: linux-btrfs

Hello!

We recently upgraded our backup server kernel (rsync with snapshots and
compression) to Linus git master from yesterday (3.2-rc4+ 09d9673d53005)
that contains the btrfs for-linus as of yesterday. We've been seeing a
few warnings and bugs:

------------[ cut here ]------------
WARNING: at mm/page-writeback.c:1763 __set_page_dirty_nobuffers+0x17b/0x190()
Hardware name: PowerEdge 1950
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
Pid: 14299, comm: btrfs-delalloc- Tainted: G        W    3.2.0-rc4-hw+ #71
Call Trace:
 [<ffffffff810dec2b>] ? __set_page_dirty_nobuffers+0x17b/0x190
 [<ffffffff8105b050>] warn_slowpath_common+0x80/0xc0
 [<ffffffff8105b0a5>] warn_slowpath_null+0x15/0x20
 [<ffffffff810dec2b>] __set_page_dirty_nobuffers+0x17b/0x190
 [<ffffffff81303b95>] compress_file_range+0x535/0x5e0
 [<ffffffff811174ee>] ? kfree+0xee/0x120
 [<ffffffff81303c70>] async_cow_start+0x30/0x50
 [<ffffffff813220a3>] worker_loop+0x173/0x530
 [<ffffffff81321f30>] ? btrfs_queue_worker+0x310/0x310
 [<ffffffff81321f30>] ? btrfs_queue_worker+0x310/0x310
 [<ffffffff8107c7f6>] kthread+0x96/0xb0
 [<ffffffff816e09b4>] kernel_thread_helper+0x4/0x10
 [<ffffffff8107c760>] ? kthread_worker_fn+0x190/0x190
 [<ffffffff816e09b0>] ? gs_change+0x13/0x13
---[ end trace 52453f1ad38744b8 ]---

(several hours later)

------------[ cut here ]------------
kernel BUG at fs/btrfs/inode.c:1587!
invalid opcode: 0000 [#1] SMP
CPU 2
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2

Pid: 4477, comm: btrfs-fixup-0 Tainted: G        W    3.2.0-rc4-hw+ #71 Dell Inc. PowerEdge 1950/0NK937
RIP: 0010:[<ffffffff812fbe10>]  [<ffffffff812fbe10>] btrfs_writepage_fixup_worker+0x160/0x170
RSP: 0018:ffff88040ff1dde0  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000013d6000 RCX: 0000000000000000
RDX: 0000000000000065 RSI: 00000000013d6000 RDI: ffff8800996fe8e0
RBP: ffff88040ff1de30 R08: ffff88040ff1dd34 R09: ffff88040ff1dda0
R10: dead000000200200 R11: 0000000000000000 R12: ffffea000ea54840
R13: 00000000013d6fff R14: ffff8800996fe9b0 R15: ffff8800996fe850
FS:  0000000000000000(0000) GS:ffff88043fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000002051c80 CR3: 0000000427492000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs-fixup-0 (pid: 4477, threadinfo ffff88040ff1c000, task ffff8804135b9630)
Stack:
 ffffffff8106b2c0 ffff880261a9bae0 0000000000000000 0000000000000286
 ffff8801b10103f0 ffff880261a9bae8 ffff880261a9bb10 ffff880412f606c0
 ffff880412f60710 ffff880412f606d8 ffff88040ff1dee0 ffffffff813220a3
Call Trace:
 [<ffffffff8106b2c0>] ? del_timer+0xd0/0xd0
 [<ffffffff813220a3>] worker_loop+0x173/0x530
 [<ffffffff81321f30>] ? btrfs_queue_worker+0x310/0x310
 [<ffffffff81321f30>] ? btrfs_queue_worker+0x310/0x310
 [<ffffffff8107c7f6>] kthread+0x96/0xb0
 [<ffffffff816e09b4>] kernel_thread_helper+0x4/0x10
 [<ffffffff8107c760>] ? kthread_worker_fn+0x190/0x190
 [<ffffffff816e09b0>] ? gs_change+0x13/0x13
Code: 5d 41 5e 41 5f c9 c3 0f 1f 40 00 48 8d 4d d0 41 b8 50 00 00 00 4c 89 ea 48 89 de 4c 89 ff e8 f8 c1 01 00 eb ba 66 0f 1f 44 00 00 <0f> 0b eb fe 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41
RIP  [<ffffffff812fbe10>] btrfs_writepage_fixup_worker+0x160/0x170
 RSP <ffff88040ff1dde0>
---[ end trace 52453f1ad38744b9 ]---

Simon-

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: warning and bug on 3.2-rc4 + for-linus from yesterday
  2011-12-09 20:39 warning and bug on 3.2-rc4 + for-linus from yesterday Simon Kirby
@ 2011-12-10  2:21 ` Simon Kirby
  2011-12-12  0:30 ` David Sterba
  1 sibling, 0 replies; 4+ messages in thread
From: Simon Kirby @ 2011-12-10  2:21 UTC (permalink / raw)
  To: linux-btrfs

On Fri, Dec 09, 2011 at 12:39:48PM -0800, Simon Kirby wrote:

> Hello!
> 
> We recently upgraded our backup server kernel (rsync with snapshots and
> compression) to Linus git master from yesterday (3.2-rc4+ 09d9673d53005)
> that contains the btrfs for-linus as of yesterday. We've been seeing a
> few warnings and bugs:

Then it kept pinging but didn't accept SSH anymore, with this captured
via serial console:

[79214.481458] ------------[ cut here ]------------
[79214.485335] kernel BUG at fs/btrfs/inode.c:2893!
[79214.485335] invalid opcode: 0000 [#2] SMP
[79214.485335] CPU 0
[79214.485335] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
[79214.485335]
[79214.485335] Pid: 24202, comm: btrfsctl Tainted: G      D W    3.2.0-rc4-hw+ #71 Dell Inc. PowerEdge 1950/0NK937
[79214.485335] RIP: 0010:[<ffffffff81304d88>]  [<ffffffff81304d88>] btrfs_unlink_subvol+0x268/0x270
[79214.485335] RSP: 0018:ffff880344babd28  EFLAGS: 00010286
[79214.485335] RAX: 00000000ffffffe4 RBX: 0000000000000c46 RCX: ffff880336fd1588
[79214.485335] RDX: 00000000ffffffe4 RSI: 0000000000000000 RDI: ffff880336fd15a8
[79214.485335] RBP: ffff880344babda8 R08: 0000000000000000 R09: 0000000000000000
[79214.485335] R10: 0000000000000000 R11: 9000000000000001 R12: ffff880405cf5e88
[79214.485335] R13: ffff880428a9ba20 R14: ffff880405158c00 R15: 0000000000000100
[79214.485335] FS:  00007f27ff13d740(0000) GS:ffff88043fc00000(0000) knlGS:0000000000000000
[79214.485335] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[79214.485335] CR2: 00007fffdf79f950 CR3: 00000003f79fe000 CR4: 00000000000006f0
[79214.485335] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[79214.485335] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[79214.485335] Process btrfsctl (pid: 24202, threadinfo ffff880344baa000, task ffff8803dcec0000)
[79214.485335] Stack:
[79214.485335]  ffff8804037d53f8 ffff880300000010 0000001044babd58 ffff8804037d53f8
[79214.485335]  00000000000008a0 ffff8803fd8b43f0 00000000000008a0 ffffffffffffff84
[79214.485335]  00000000000000ff 0000000000000268 ffff880037e73008 0000000000000000
[79214.485335] Call Trace:
[79214.485335]  [<ffffffff81323975>] btrfs_ioctl_snap_destroy+0x3b5/0x480
[79214.485335]  [<ffffffff81326ef2>] btrfs_ioctl+0x3a2/0x10d0
[79214.485335]  [<ffffffff816da594>] ? do_page_fault+0x254/0x4b0
[79214.485335]  [<ffffffff8112cdc0>] do_vfs_ioctl+0xa0/0x520
[79214.485335]  [<ffffffff8112d28a>] sys_ioctl+0x4a/0x80
[79214.485335]  [<ffffffff816de8d2>] system_call_fastpath+0x16/0x1b
[79214.485335] Code: 48 8d 54 92 65 e8 89 f2 00 00 48 8b 5d b9 4c 89 ef e8 4d 2c fd ff 48 89 5d c8 e9 ca fe ff ff 0f 0b eb fe 0f 0b eb fe 0f 1f 40 00 <0f> 0b eb fe 0f 0b eb fe 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c
[79214.485335] RIP  [<ffffffff81304d88>] btrfs_unlink_subvol+0x268/0x270
[79214.485335]  RSP <ffff880344babd28>
[79214.700401] ---[ end trace 52453f1ad38744ba ]---

Simon-

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: warning and bug on 3.2-rc4 + for-linus from yesterday
  2011-12-09 20:39 warning and bug on 3.2-rc4 + for-linus from yesterday Simon Kirby
  2011-12-10  2:21 ` Simon Kirby
@ 2011-12-12  0:30 ` David Sterba
  2011-12-13  0:18   ` Simon Kirby
  1 sibling, 1 reply; 4+ messages in thread
From: David Sterba @ 2011-12-12  0:30 UTC (permalink / raw)
  To: Simon Kirby; +Cc: linux-btrfs

Hi,

On Fri, Dec 09, 2011 at 12:39:48PM -0800, Simon Kirby wrote:
> ------------[ cut here ]------------
> WARNING: at mm/page-writeback.c:1763 __set_page_dirty_nobuffers+0x17b/0x190()
> Hardware name: PowerEdge 1950
> Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
> Pid: 14299, comm: btrfs-delalloc- Tainted: G        W    3.2.0-rc4-hw+ #71
> Call Trace:
>  [<ffffffff810dec2b>] ? __set_page_dirty_nobuffers+0x17b/0x190
>  [<ffffffff8105b050>] warn_slowpath_common+0x80/0xc0
>  [<ffffffff8105b0a5>] warn_slowpath_null+0x15/0x20
>  [<ffffffff810dec2b>] __set_page_dirty_nobuffers+0x17b/0x190
>  [<ffffffff81303b95>] compress_file_range+0x535/0x5e0
>  [<ffffffff811174ee>] ? kfree+0xee/0x120
>  [<ffffffff81303c70>] async_cow_start+0x30/0x50
>  [<ffffffff813220a3>] worker_loop+0x173/0x530
>  [<ffffffff81321f30>] ? btrfs_queue_worker+0x310/0x310
>  [<ffffffff81321f30>] ? btrfs_queue_worker+0x310/0x310
>  [<ffffffff8107c7f6>] kthread+0x96/0xb0
>  [<ffffffff816e09b4>] kernel_thread_helper+0x4/0x10
>  [<ffffffff8107c760>] ? kthread_worker_fn+0x190/0x190
>  [<ffffffff816e09b0>] ? gs_change+0x13/0x13
> ---[ end trace 52453f1ad38744b8 ]---
> 
> (several hours later)

1761                 if (mapping2) { /* Race with truncate? */
1762                         BUG_ON(mapping2 != mapping);
1763                         WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
^^^^
1764                         account_page_dirtied(page, mapping);
1765                         radix_tree_tag_set(&mapping->page_tree,
1766                                 page_index(page), PAGECACHE_TAG_DIRTY);
1767                 }

The warning pops up just the first time, so I think it may happen more
often, would be interesting to verify this.

> ------------[ cut here ]------------
> kernel BUG at fs/btrfs/inode.c:1587!
> invalid opcode: 0000 [#1] SMP
> CPU 2
> Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
> 
> Pid: 4477, comm: btrfs-fixup-0 Tainted: G        W    3.2.0-rc4-hw+ #71 Dell Inc. PowerEdge 1950/0NK937
> RIP: 0010:[<ffffffff812fbe10>]  [<ffffffff812fbe10>] btrfs_writepage_fixup_worker+0x160/0x170

I was chasing this BUG last week with 3.1+cmason/for-linus and I'm able
to trigger it quite reliably with looped xfstests/209 and low writeback
activity like looped make all & clean in kernel tree (reliably means
like from 10 minutes to 1 day).

Chris told me to instrument SetPageDirty and based on a similar tracing
patchset (http://lwn.net/Articles/315511/) I tried but to no avail, the
instrumentation must have significantly changed timing and the bug did not fire
at all.

I skip the details of my debugging results for now, the first
stacktrace contains something I was hoping for :)

async_cow_start -- calls compress_file_range directly,

compress_file_range+0x535
(gdb) l *(compress_file_range+0x535)
0x31605 is in compress_file_range (fs/btrfs/inode.c:540).
535                      * for the async work queue to run cow_file_range to do
536                      * the normal delalloc dance
537                      */
538                     if (page_offset(locked_page) >= start &&
539                         page_offset(locked_page) <= end) {
540                             __set_page_dirty_nobuffers(locked_page);
541                             /* unlocked later on in the async handlers */
542                     }
543                     add_async_extent(async_cow, start, end - start + 1,
544                                      0, NULL, 0, BTRFS_COMPRESS_NONE);

this code is reached from (or in a general case when compression is not done):

 356         /*
 357          * we don't want to send crud past the end of i_size through
 358          * compression, that's just a waste of CPU time.  So, if the
 359          * end of the file is before the start of our current
 360          * requested range of bytes, we bail out to the uncompressed
 361          * cleanup code that can deal with all of this.
 362          *
 363          * It isn't really the fastest way to fix things, but this is a
 364          * very uncommon corner.
 365          */
 366         if (actual_end <= start)
 367                 goto cleanup_and_bail_uncompressed;

... but seems that 'uncommon' happens and could trigger the bug in fixup
worker if there is some race with truncate.

I'll try to put together details and logs from my debugging.


david

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: warning and bug on 3.2-rc4 + for-linus from yesterday
  2011-12-12  0:30 ` David Sterba
@ 2011-12-13  0:18   ` Simon Kirby
  0 siblings, 0 replies; 4+ messages in thread
From: Simon Kirby @ 2011-12-13  0:18 UTC (permalink / raw)
  To: linux-btrfs

Hello!

On Mon, Dec 12, 2011 at 01:30:31AM +0100, David Sterba wrote:

> On Fri, Dec 09, 2011 at 12:39:48PM -0800, Simon Kirby wrote:
> > ------------[ cut here ]------------
> > WARNING: at mm/page-writeback.c:1763 __set_page_dirty_nobuffers+0x17b/0x190()
> > Hardware name: PowerEdge 1950
> > Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2
> > Pid: 14299, comm: btrfs-delalloc- Tainted: G        W    3.2.0-rc4-hw+ #71
> > Call Trace:
> >  [<ffffffff810dec2b>] ? __set_page_dirty_nobuffers+0x17b/0x190
> >  [<ffffffff8105b050>] warn_slowpath_common+0x80/0xc0
> >  [<ffffffff8105b0a5>] warn_slowpath_null+0x15/0x20
> >  [<ffffffff810dec2b>] __set_page_dirty_nobuffers+0x17b/0x190
> >  [<ffffffff81303b95>] compress_file_range+0x535/0x5e0
> >  [<ffffffff811174ee>] ? kfree+0xee/0x120
> >  [<ffffffff81303c70>] async_cow_start+0x30/0x50
> >  [<ffffffff813220a3>] worker_loop+0x173/0x530
> >  [<ffffffff81321f30>] ? btrfs_queue_worker+0x310/0x310
> >  [<ffffffff81321f30>] ? btrfs_queue_worker+0x310/0x310
> >  [<ffffffff8107c7f6>] kthread+0x96/0xb0
> >  [<ffffffff816e09b4>] kernel_thread_helper+0x4/0x10
> >  [<ffffffff8107c760>] ? kthread_worker_fn+0x190/0x190
> >  [<ffffffff816e09b0>] ? gs_change+0x13/0x13
> > ---[ end trace 52453f1ad38744b8 ]---
> > 
> > (several hours later)
> 
> 1761                 if (mapping2) { /* Race with truncate? */
> 1762                         BUG_ON(mapping2 != mapping);
> 1763                         WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
> ^^^^
> 1764                         account_page_dirtied(page, mapping);
> 1765                         radix_tree_tag_set(&mapping->page_tree,
> 1766                                 page_index(page), PAGECACHE_TAG_DIRTY);
> 1767                 }
> 
> The warning pops up just the first time, so I think it may happen more
> often, would be interesting to verify this.

We'll try again with s/WARN_ON_ONCE/WARN_ON/ here.

Meanwhile, other problems have come up on this kernel, like a BUG() on
snapshot delete:

------------[ cut here ]------------
kernel BUG at fs/btrfs/inode.c:2893!
invalid opcode: 0000 [#1] SMP
CPU 1
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2

Pid: 20842, comm: btrfsctl Tainted: G        W    3.2.0-rc4-hw+ #71 Dell Inc. PowerEdge 1950/0NK937
RIP: 0010:[<ffffffff81304d88>]  [<ffffffff81304d88>] btrfs_unlink_subvol+0x268/0x270
RSP: 0018:ffff88008eca7d28  EFLAGS: 00010286
RAX: 00000000ffffffe4 RBX: 0000000000000c46 RCX: ffff8800922a2928
RDX: 00000000ffffffe4 RSI: 0000000000000000 RDI: ffff8800922a2948
RBP: ffff88008eca7da8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 9000000000000001 R12: ffff880075b20548
R13: ffff88042876aab0 R14: ffff88026d2f3400 R15: 0000000000000100
FS:  00007ffc81a2c740(0000) GS:ffff88043fc40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007ffc813e8480 CR3: 00000002e7a92000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfsctl (pid: 20842, threadinfo ffff88008eca6000, task ffff8803d8239630)
Stack:
 ffff8802080323f8 ffff880300000010 000000108eca7d58 ffff8802080323f8
 00000000000008a0 ffff8803fda1f708 00000000000008a0 ffffffffffffff84
 00000000000000ff 0000000000000268 ffff880088788008 0000000000000000
Call Trace:
 [<ffffffff81323975>] btrfs_ioctl_snap_destroy+0x3b5/0x480
 [<ffffffff81326ef2>] btrfs_ioctl+0x3a2/0x10d0
 [<ffffffff816da594>] ? do_page_fault+0x254/0x4b0
 [<ffffffff8112cdc0>] do_vfs_ioctl+0xa0/0x520
 [<ffffffff8112d28a>] sys_ioctl+0x4a/0x80
 [<ffffffff816de8d2>] system_call_fastpath+0x16/0x1b
Code: 48 8d 54 92 65 e8 89 f2 00 00 48 8b 5d b9 4c 89 ef e8 4d 2c fd ff 48 89 5d c8 e9 ca fe ff ff 0f 0b eb fe 0f 0b eb fe 0f 1f 40 00 <0f> 0b eb fe 0f 0b eb fe 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c
RIP  [<ffffffff81304d88>] btrfs_unlink_subvol+0x268/0x270
 RSP <ffff88008eca7d28>
---[ end trace 485334ee1ecd2cc8 ]---

After this, we hit a number of other BUG() and WARN() cases during
mounting on the next boot:

kernel BUG at fs/btrfs/extent-tree.c:6343!
WARNING: at fs/btrfs/extent-tree.c:4754 __btrfs_free_extent+0x627/0x720()
WARNING: at fs/btrfs/extent-tree.c:4801 __btrfs_free_extent+0x6cc/0x720()
kernel BUG at fs/btrfs/extent-tree.c:2287!

I posted them here if they are of any use:

	http://0x.ca/sim/ref/3.2-rc4/

We'll probably go back to 2.6.38 again for now, the latest mostly-working
kernel on this box.

Simon-

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2011-12-13  0:18 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-12-09 20:39 warning and bug on 3.2-rc4 + for-linus from yesterday Simon Kirby
2011-12-10  2:21 ` Simon Kirby
2011-12-12  0:30 ` David Sterba
2011-12-13  0:18   ` Simon Kirby

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).