From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: Re: [bug, 2.6.37-current] Assertion failed: atomic_read(&pag->pag_ref) == 0
Date: Fri, 5 Nov 2010 10:00:37 +1100
Message-ID: <20101104230037.GD13830@dastard>
In-Reply-To: <20101026071356.GY32255@dastard>

On Tue, Oct 26, 2010 at 06:13:56PM +1100, Dave Chinner wrote:
> Folks,
> 
> Since the mainline merge, I've been getting unmount failures during
> shutdown that look like:
> 
> Unmounting local filesystems...done.
> Shutting down LVM Volume Groups[ 7088.820123] Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 259
> [ 7088.821811] ------------[ cut here ]------------
> [ 7088.822594] kernel BUG at fs/xfs/support/debug.c:108!
> [ 7088.823383] invalid opcode: 0000 [#1] SMP 
> [ 7088.824019] last sysfs file: /sys/devices/system/node/node0/cpumap
> [ 7088.824045] CPU 1 
> [ 7088.824045] Modules linked in:
> [ 7088.824045] 
> [ 7088.824045] Pid: 0, comm: kworker/0:0 Not tainted 2.6.36-dgc+ #587 /Bochs
> [ 7088.824045] RIP: 0010:[<ffffffff814b74cf>]  [<ffffffff814b74cf>] assfail+0x1f/0x30
> [ 7088.824045] RSP: 0018:ffff8800df003e50  EFLAGS: 00010286
> [ 7088.824045] RAX: 0000000000000069 RBX: ffff88011760a400 RCX: 0000000000000001
> [ 7088.824045] RDX: ffff88011b7742c0 RSI: 0000000000000001 RDI: 0000000000000246
> [ 7088.824045] RBP: ffff8800df003e50 R08: 0000000000000001 R09: 0000000000000001
> [ 7088.824045] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff81ef8f00
> [ 7088.824045] R13: ffff880117118df8 R14: ffff8800df1cecf0 R15: ffff880116ebf6e8
> [ 7088.824045] FS:  0000000000000000(0000) GS:ffff8800df000000(0000) knlGS:0000000000000000
> [ 7088.824045] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 7088.824045] CR2: 00007ffd8c8b6990 CR3: 0000000001edb000 CR4: 00000000000006e0
> [ 7088.824045] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 7088.824045] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 7088.824045] Process kworker/0:0 (pid: 0, threadinfo ffff88011b776000, task ffff88011b7742c0)
> [ 7088.824045] Stack:
> [ 7088.824045]  ffff8800df003e70 ffffffff81499007 ffff8800df003e70 ffff8800df1cecc0
> [ 7088.824045] <0> ffff8800df003ed0 ffffffff810e900a 0000000000000001 000000000000000a
> [ 7088.824045] <0> ffff880100000006 0000000000000202 0000000000000100 0000000000000048
> [ 7088.824045] Call Trace:
> [ 7088.824045]  <IRQ> 
> [ 7088.824045]  [<ffffffff81499007>] __xfs_free_perag+0x37/0x50
> [ 7088.824045]  [<ffffffff810e900a>] __rcu_process_callbacks+0x13a/0x3e0
> [ 7088.824045]  [<ffffffff810e92d8>] rcu_process_callbacks+0x28/0x50
> [ 7088.824045]  [<ffffffff8108848d>] __do_softirq+0xcd/0x290
> [ 7088.824045]  [<ffffffff810a8808>] ? hrtimer_interrupt+0x138/0x250
> [ 7088.824045]  [<ffffffff81037f5c>] call_softirq+0x1c/0x50
> [ 7088.824045]  [<ffffffff810398dd>] do_softirq+0x9d/0xd0
> [ 7088.824045]  [<ffffffff810881e5>] irq_exit+0x95/0xa0
> [ 7088.824045]  [<ffffffff81b06380>] smp_apic_timer_interrupt+0x70/0x9b
> [ 7088.824045]  [<ffffffff81037a13>] apic_timer_interrupt+0x13/0x20
> [ 7088.824045]  <EOI> 
> [ 7088.824045]  [<ffffffff81060f6b>] ? native_safe_halt+0xb/0x10
> [ 7088.824045]  [<ffffffff810baded>] ? trace_hardirqs_on+0xd/0x10
> [ 7088.824045]  [<ffffffff8103fd70>] default_idle+0x50/0xb0
> [ 7088.824045]  [<ffffffff81035e28>] cpu_idle+0x78/0x100
> [ 7088.824045]  [<ffffffff81af627b>] start_secondary+0x1ac/0x1b1
> [ 7088.824045] Code: 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 31 c0 89 d1 48 89 f2 48 89 fe 48 c7 c7 08 38 df 81 e8 7b 34 64 00 <0f> 0b eb fe 66 66 66 66 2e  
> [ 7088.824045] RIP  [<ffffffff814b74cf>] assfail+0x1f/0x30
> [ 7088.824045]  RSP <ffff8800df003e50>
> [ 7088.863091] ---[ end trace ec76f8135c3adba9 ]---
> 
> I'm not seeing failures during xfstests runs; it seems that dbench may be the
> trigger.  Is anyone else seeing reference counting problems like this on the
> current linus tree?

Ok, found the bug - it's in the reclaim scalability patchset that
was merged into .37-rc1 - when the shrinker skips a locked AG, it
misses an xfs_perag_put() call.  I'll push out a patch soon.
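
The leak pattern described above (take a per-AG reference, fail to get the AG's lock, skip to the next AG without dropping the reference) can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the XFS code or the actual fix: struct perag, perag_get(), perag_put(), perag_trylock(), reclaim_walk() and outstanding_refs_after_walk() are all hypothetical stand-ins, and the per-AG lock is modelled as an atomic flag purely so the trylock path is easy to exercise.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_AGS 4

/* Hypothetical stand-in for struct xfs_perag: a reference count plus a
 * "lock" modelled as an atomic flag so trylock semantics are easy to show. */
struct perag {
    atomic_int  ref;
    atomic_bool locked;
};

static void perag_get(struct perag *pag)
{
    atomic_fetch_add(&pag->ref, 1);
}

static void perag_put(struct perag *pag)
{
    int old = atomic_fetch_sub(&pag->ref, 1);
    assert(old > 0);  /* catches dropping a reference we never took */
}

/* Returns true if we acquired the lock, false if it is already held. */
static bool perag_trylock(struct perag *pag)
{
    return !atomic_exchange(&pag->locked, true);
}

static void perag_unlock(struct perag *pag)
{
    atomic_store(&pag->locked, false);
}

/* Walk every AG, skipping any whose lock is already held.  With
 * fixed == false the skip path forgets to drop the reference taken at
 * the top of the loop, which is the leak pattern described above. */
static void reclaim_walk(struct perag *ags, int nr, bool fixed)
{
    for (int i = 0; i < nr; i++) {
        struct perag *pag = &ags[i];

        perag_get(pag);
        if (!perag_trylock(pag)) {
            if (fixed)
                perag_put(pag);  /* balance the get before skipping */
            continue;
        }
        /* ... per-AG reclaim work would happen here ... */
        perag_unlock(pag);
        perag_put(pag);
    }
}

/* Set up NR_AGS AGs, hold the lock on one of them (as if another
 * reclaimer were busy there), run one walk, and return the total number
 * of references still outstanding.  A teardown check in the spirit of
 * atomic_read(&pag->pag_ref) == 0 expects this to be zero. */
static int outstanding_refs_after_walk(bool fixed)
{
    struct perag ags[NR_AGS];
    int total = 0;

    for (int i = 0; i < NR_AGS; i++) {
        atomic_init(&ags[i].ref, 0);
        atomic_init(&ags[i].locked, false);
    }
    (void)perag_trylock(&ags[2]);  /* simulate a concurrently locked AG */
    reclaim_walk(ags, NR_AGS, fixed);
    perag_unlock(&ags[2]);

    for (int i = 0; i < NR_AGS; i++)
        total += atomic_load(&ags[i].ref);
    return total;
}
```

In this sketch, outstanding_refs_after_walk(false) leaves one reference held on the skipped AG, which is the kind of state an unmount-time "refcount must be zero" assertion trips over; outstanding_refs_after_walk(true) leaves none.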

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


Thread overview: 4+ messages
2010-10-26  7:13 [bug, 2.6.37-current] Assertion failed: atomic_read(&pag->pag_ref) == 0 Dave Chinner
2010-10-28 11:58 ` Christoph Hellwig
2010-10-30 14:38   ` Christoph Hellwig
2010-11-04 23:00 ` Dave Chinner [this message]
