* [PATCH] set freed perag structures to NULL to avoid mount failure oops
@ 2012-04-05 19:28 Eric Sandeen
2012-04-10 7:12 ` Dave Chinner
0 siblings, 1 reply; 3+ messages in thread
From: Eric Sandeen @ 2012-04-05 19:28 UTC (permalink / raw)
To: xfs-oss
We encountered a bug after log replay failed on mount:
[14217.617258] XFS (md5): log mount/recovery failed: error 5
[14217.624037] XFS (md5): log mount failed
[14234.866732] general protection fault: 0000 [#1] SMP
...
[14234.913286] RIP: 0010:[<ffffffff810c2cbe>] [<ffffffff810c2cbe>]
__lock_acquire+0x2be/0xa20
[14234.913293] RSP: 0018:ffff880116c6fa38 EFLAGS: 00010002
[14234.913294] RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: 0000000000000000
[14234.913321] Call Trace:
[14234.913335] [<ffffffff810c3aad>] lock_acquire+0x9d/0x1f0
[14234.913365] [<ffffffff8162f646>] _raw_spin_lock+0x46/0x80
[14234.913372] [<ffffffff8130e60d>] _atomic_dec_and_lock+0x4d/0x70
[14234.913382] [<ffffffffa0392801>] xfs_buf_rele+0x51/0x1e0 [xfs]
[14234.913400] [<ffffffffa039554f>] xfs_flush_buftarg+0x10f/0x130 [xfs]
[14234.913410] [<ffffffffa03955a4>] xfs_free_buftarg+0x34/0x70 [xfs]
[14234.913422] [<ffffffffa03a4634>] xfs_close_devices+0x64/0x70 [xfs]
[14234.913433] [<ffffffffa03a5890>] xfs_fs_fill_super+0x180/0x2a0 [xfs]
[14234.913436] [<ffffffff811b7a71>] mount_bdev+0x1d1/0x220
[14234.913460] [<ffffffffa03a3ad5>] xfs_fs_mount+0x15/0x20 [xfs]
[14234.913462] [<ffffffff811b8603>] mount_fs+0x43/0x1b0
[14234.913468] [<ffffffff811d473a>] vfs_kern_mount+0x6a/0xd0
[14234.913471] [<ffffffff811d6124>] do_kern_mount+0x54/0x110
[14234.913473] [<ffffffff811d7c34>] do_mount+0x1a4/0x260
[14234.913476] [<ffffffff811d8100>] sys_mount+0x90/0xe0
[14234.913478] [<ffffffff81639202>] system_call_fastpath+0x16/0x1b
RAX is freed memory.
After xfs_mountfs() fails to replay the log, it
goes to out_perag_free: which does xfs_free_perag().
Later down the mount failure path, xfs_buf_rele() checks for (!pag):
if (!pag) {
ASSERT(list_empty(&bp->b_lru));
ASSERT(RB_EMPTY_NODE(&bp->b_rbnode));
if (atomic_dec_and_test(&bp->b_hold))
xfs_buf_free(bp);
return;
}
but we did not set the perag to NULL, so this doesn't get hit.
Next we do:
if (atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock)) {
which goes bang if the perag has been freed.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---
Full disclosure: not yet tested, will do soon :)
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 1ffead4..ad830f7 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -252,6 +252,7 @@ __xfs_free_perag(
ASSERT(atomic_read(&pag->pag_ref) == 0);
kmem_free(pag);
+ pag = NULL;
}
/*
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 3+ messages in thread* Re: [PATCH] set freed perag structures to NULL to avoid mount failure oops
2012-04-05 19:28 [PATCH] set freed perag structures to NULL to avoid mount failure oops Eric Sandeen
@ 2012-04-10 7:12 ` Dave Chinner
2012-04-10 15:31 ` Eric Sandeen
0 siblings, 1 reply; 3+ messages in thread
From: Dave Chinner @ 2012-04-10 7:12 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs-oss
On Thu, Apr 05, 2012 at 12:28:35PM -0700, Eric Sandeen wrote:
> We encountered a bug after log replay failed on mount:
>
> [14217.617258] XFS (md5): log mount/recovery failed: error 5
> [14217.624037] XFS (md5): log mount failed
> [14234.866732] general protection fault: 0000 [#1] SMP
> ...
> [14234.913286] RIP: 0010:[<ffffffff810c2cbe>] [<ffffffff810c2cbe>]
> __lock_acquire+0x2be/0xa20
> [14234.913293] RSP: 0018:ffff880116c6fa38 EFLAGS: 00010002
> [14234.913294] RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: 0000000000000000
> [14234.913321] Call Trace:
> [14234.913335] [<ffffffff810c3aad>] lock_acquire+0x9d/0x1f0
> [14234.913365] [<ffffffff8162f646>] _raw_spin_lock+0x46/0x80
> [14234.913372] [<ffffffff8130e60d>] _atomic_dec_and_lock+0x4d/0x70
> [14234.913382] [<ffffffffa0392801>] xfs_buf_rele+0x51/0x1e0 [xfs]
> [14234.913400] [<ffffffffa039554f>] xfs_flush_buftarg+0x10f/0x130 [xfs]
> [14234.913410] [<ffffffffa03955a4>] xfs_free_buftarg+0x34/0x70 [xfs]
> [14234.913422] [<ffffffffa03a4634>] xfs_close_devices+0x64/0x70 [xfs]
> [14234.913433] [<ffffffffa03a5890>] xfs_fs_fill_super+0x180/0x2a0 [xfs]
> [14234.913436] [<ffffffff811b7a71>] mount_bdev+0x1d1/0x220
> [14234.913460] [<ffffffffa03a3ad5>] xfs_fs_mount+0x15/0x20 [xfs]
> [14234.913462] [<ffffffff811b8603>] mount_fs+0x43/0x1b0
> [14234.913468] [<ffffffff811d473a>] vfs_kern_mount+0x6a/0xd0
> [14234.913471] [<ffffffff811d6124>] do_kern_mount+0x54/0x110
> [14234.913473] [<ffffffff811d7c34>] do_mount+0x1a4/0x260
> [14234.913476] [<ffffffff811d8100>] sys_mount+0x90/0xe0
> [14234.913478] [<ffffffff81639202>] system_call_fastpath+0x16/0x1b
>
>
> RAX is freed memory.
>
> After xfs_mountfs() fails to replay the log, it
> goes to out_perag_free: which does xfs_free_perag().
>
> Later down the mount failure path, xfs_buf_rele() checks for (!pag):
>
> if (!pag) {
> ASSERT(list_empty(&bp->b_lru));
> ASSERT(RB_EMPTY_NODE(&bp->b_rbnode));
> if (atomic_dec_and_test(&bp->b_hold))
> xfs_buf_free(bp);
> return;
> }
>
> but we did not set the perag to NULL, so this doesn't get hit.
Sure, bp->b_pag is only set for cached buffers, but it also has a
reference to the perag structure so it is supposed to be guaranteed
that the perag structure is valid until the buffer is freed and the
perag reference is dropped.
> Next we do:
>
> if (atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock)) {
>
> which goes bang if the perag has been freed.
Which means that the unmount has failed to flush all the active
buffers. This woul dhave resulted in an ASSERT() firing in
xfs_free_perag() on a debug kernel....
>
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
>
> Full disclosure: not yet tested, will do soon :)
>
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 1ffead4..ad830f7 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -252,6 +252,7 @@ __xfs_free_perag(
>
> ASSERT(atomic_read(&pag->pag_ref) == 0);
.... right here, or even even before call_rcu() was called
> kmem_free(pag);
> + pag = NULL;
> }
As it is, I don't quite understand what setting a local variable to
NULL is supposed to acheive here - the perag structure has been
removed from the radix tree and zeroing a local variable doesn't do
anything globally visible...
What is more likely is that the log mount failure path is not
flushing the buftarg correctly before tearing down the perag
structures. Indeed, a log mount failure (and a perag data
intialisation failure) will simply tear down the perag structures
without waiting for cached buffers to be flushed in xfs_mountfs(),
so that is going to the root cause of this problem. i.e. we need
xfs_flush_buftarg() calls in the mountfs error handling path....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [PATCH] set freed perag structures to NULL to avoid mount failure oops
2012-04-10 7:12 ` Dave Chinner
@ 2012-04-10 15:31 ` Eric Sandeen
0 siblings, 0 replies; 3+ messages in thread
From: Eric Sandeen @ 2012-04-10 15:31 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs-oss
On 4/10/12 2:12 AM, Dave Chinner wrote:
> On Thu, Apr 05, 2012 at 12:28:35PM -0700, Eric Sandeen wrote:
...
>> kmem_free(pag);
>> + pag = NULL;
>> }
>
> As it is, I don't quite understand what setting a local variable to
> NULL is supposed to acheive here - the perag structure has been
> removed from the radix tree and zeroing a local variable doesn't do
> anything globally visible...
er, right. Ok.
> What is more likely is that the log mount failure path is not
> flushing the buftarg correctly before tearing down the perag
> structures. Indeed, a log mount failure (and a perag data
> intialisation failure) will simply tear down the perag structures
> without waiting for cached buffers to be flushed in xfs_mountfs(),
> so that is going to the root cause of this problem. i.e. we need
> xfs_flush_buftarg() calls in the mountfs error handling path....
Ok, makes sense.
-Eric
> Cheers,
>
> Dave.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2012-04-10 17:17 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-04-05 19:28 [PATCH] set freed perag structures to NULL to avoid mount failure oops Eric Sandeen
2012-04-10 7:12 ` Dave Chinner
2012-04-10 15:31 ` Eric Sandeen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox