From: Dave Chinner <david@fromorbit.com>
To: Torsten Kaiser <just.for.lkml@googlemail.com>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>, xfs@oss.sgi.com
Subject: Re: Hang in XFS reclaim on 3.7.0-rc3
Date: Tue, 20 Nov 2012 10:53:06 +1100
Message-ID: <20121119235306.GX14281@dastard>
In-Reply-To: <CAPVoSvR+Gk6ggSZ6=ZMpyvwhosjd4BSGsUqRT=txkzGDGLMTPw@mail.gmail.com>

On Mon, Nov 19, 2012 at 07:50:06AM +0100, Torsten Kaiser wrote:
> On Mon, Nov 19, 2012 at 12:51 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Sun, Nov 18, 2012 at 04:29:22PM +0100, Torsten Kaiser wrote:
> >> On Sun, Nov 18, 2012 at 11:24 AM, Torsten Kaiser
> >> <just.for.lkml@googlemail.com> wrote:
> >> > On Tue, Oct 30, 2012 at 9:37 PM, Torsten Kaiser
> >> > <just.for.lkml@googlemail.com> wrote:
> >> >> I will keep LOCKDEP enabled on that system, and if there really is
> >> >> another splat, I will report back here. But I rather doubt that this
> >> >> will be needed.
> >> >
> >> > After the patch, I did not see this problem again, but today I found
> >> > another LOCKDEP report that also looks XFS related.
> >> > I found it twice in the logs, and as both were slightly different, I
> >> > will attach both versions.
> >>
> >> > Nov  6 21:57:09 thoregon kernel: [ 9941.104353] 3.7.0-rc4 #1 Not tainted
> >> > Nov  6 21:57:09 thoregon kernel: [ 9941.104355] inconsistent
> >> > {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
> >> > Nov  6 21:57:09 thoregon kernel: [ 9941.104430]        CPU0
> >> > Nov  6 21:57:09 thoregon kernel: [ 9941.104431]        ----
> >> > Nov  6 21:57:09 thoregon kernel: [ 9941.104432]   lock(&(&ip->i_lock)->mr_lock);
> >> > Nov  6 21:57:09 thoregon kernel: [ 9941.104433]   <Interrupt>
> >> > Nov  6 21:57:09 thoregon kernel: [ 9941.104434]
> >> > lock(&(&ip->i_lock)->mr_lock);
> >> > Nov  6 21:57:09 thoregon kernel: [ 9941.104435]
> >> > Nov  6 21:57:09 thoregon kernel: [ 9941.104435]  *** DEADLOCK ***
> >>
> >> Sorry! Copied the wrong report. Your fix only landed in -rc5, so my
> >> vanilla -rc4 did (also) report the old problem again.
> >> And I copy&pasted that report instead of the second appearance of the
> >> new problem.
> >
> > Can you repost it with line wrapping turned off? The output simply
> > becomes unreadable when it wraps....
> >
> > Yeah, I know I can put it back together, but I've got better things
> > to do with my time than stitch a couple of hundred lines of debug
> > back into a readable format....
> 
> Sorry about that, but I can't find any option to turn that off in Gmail.

Seems like you can't, as per Documentation/email-clients.txt

> I have added the reports as attachment, I hope thats OK for you.

Encoded as text, so it does.

So, both lockdep thingies are the same:

> [110926.972482] =========================================================
> [110926.972484] [ INFO: possible irq lock inversion dependency detected ]
> [110926.972486] 3.7.0-rc4 #1 Not tainted
> [110926.972487] ---------------------------------------------------------
> [110926.972489] kswapd0/725 just changed the state of lock:
> [110926.972490]  (sb_internal){.+.+.?}, at: [<ffffffff8122b268>] xfs_trans_alloc+0x28/0x50
> [110926.972499] but this lock took another, RECLAIM_FS-unsafe lock in the past:
> [110926.972500]  (&(&ip->i_lock)->mr_lock/1){+.+.+.}

Ah, what? Since when has the ilock been reclaim unsafe?

> [110926.972500] and interrupts could create inverse lock ordering between them.
> [110926.972500] 
> [110926.972503] 
> [110926.972503] other info that might help us debug this:
> [110926.972504]  Possible interrupt unsafe locking scenario:
> [110926.972504] 
> [110926.972505]        CPU0                    CPU1
> [110926.972506]        ----                    ----
> [110926.972507]   lock(&(&ip->i_lock)->mr_lock/1);
> [110926.972509]                                local_irq_disable();
> [110926.972509]                                lock(sb_internal);
> [110926.972511]                                lock(&(&ip->i_lock)->mr_lock/1);
> [110926.972512]   <Interrupt>
> [110926.972513]     lock(sb_internal);

Um, that's just bizarre. No XFS code runs with interrupts disabled,
so I cannot see how this is possible.

.....


       [<ffffffff8108137e>] mark_held_locks+0x7e/0x130
       [<ffffffff81081a63>] lockdep_trace_alloc+0x63/0xc0
       [<ffffffff810e9dd5>] kmem_cache_alloc+0x35/0xe0
       [<ffffffff810dba31>] vm_map_ram+0x271/0x770
       [<ffffffff811e1316>] _xfs_buf_map_pages+0x46/0xe0
       [<ffffffff811e222a>] xfs_buf_get_map+0x8a/0x130
       [<ffffffff81233ab9>] xfs_trans_get_buf_map+0xa9/0xd0
       [<ffffffff8121bced>] xfs_ialloc_inode_init+0xcd/0x1d0

We shouldn't be mapping buffers there; there's a patch below to fix
this. It's probably the source of this report, even though lockdep
seems to be off with the fairies...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

xfs: inode allocation should use unmapped buffers.

From: Dave Chinner <dchinner@redhat.com>

Inode buffers do not need to be mapped as inodes are read or written
directly from/to the pages underlying the buffer. This fixes a
regression introduced by commit 611c994 ("xfs: make XBF_MAPPED the
default behaviour").

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_ialloc.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
index 2d6495e..a815412 100644
--- a/fs/xfs/xfs_ialloc.c
+++ b/fs/xfs/xfs_ialloc.c
@@ -200,7 +200,8 @@ xfs_ialloc_inode_init(
 		 */
 		d = XFS_AGB_TO_DADDR(mp, agno, agbno + (j * blks_per_cluster));
 		fbuf = xfs_trans_get_buf(tp, mp->m_ddev_targp, d,
-					 mp->m_bsize * blks_per_cluster, 0);
+					 mp->m_bsize * blks_per_cluster,
+					 XBF_UNMAPPED);
 		if (!fbuf)
 			return ENOMEM;
 		/*
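
For readers following the lockdep report above, the rule it applies can
be sketched in user-space C. This is a toy model under stated
assumptions, not lockdep's implementation: the struct, the field names
and every function below are hypothetical. It only encodes the two
facts the report relies on: a lock class that is taken in reclaim
context must never lead, through recorded "held A while taking B"
dependencies, to a class that was held while a reclaim-capable
allocation ran.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CLASSES 8

/* Toy model only: hypothetical names, not lockdep's real data structures. */
struct lock_class {
	const char *name;
	bool used_in_reclaim;    /* acquired from reclaim context (IN-RECLAIM_FS) */
	bool held_over_fs_alloc; /* held across an allocation that may enter
				  * filesystem reclaim (RECLAIM_FS-ON) */
	bool takes[MAX_CLASSES]; /* takes[j]: class j was acquired while this
				  * class was held */
};

static struct lock_class classes[MAX_CLASSES];
static int nr_classes;

/* Register a new lock class and return its index. */
static int add_class(const char *name)
{
	assert(nr_classes < MAX_CLASSES);
	classes[nr_classes] = (struct lock_class){ .name = name };
	return nr_classes++;
}

/* Record that 'inner' was acquired while 'outer' was held. */
static void record_dependency(int outer, int inner)
{
	classes[outer].takes[inner] = true;
}

/*
 * Depth-first walk over the dependency graph. Returns the index of the
 * first reclaim-unsafe class reachable from 'start' (including 'start'
 * itself, which corresponds to the "inconsistent usage" report), or -1.
 */
static int find_unsafe(int start, bool *seen)
{
	if (seen[start])
		return -1;
	seen[start] = true;
	if (classes[start].held_over_fs_alloc)
		return start;
	for (int j = 0; j < nr_classes; j++) {
		if (classes[start].takes[j]) {
			int found = find_unsafe(j, seen);
			if (found >= 0)
				return found;
		}
	}
	return -1;
}

/* A class only needs checking once it has been taken in reclaim. */
static int check_reclaim_inversion(int cls)
{
	bool seen[MAX_CLASSES] = { false };

	if (!classes[cls].used_in_reclaim)
		return -1;
	return find_unsafe(cls, seen);
}

/* Rebuild the situation described in the report quoted above. */
static int model_reported_scenario(void)
{
	nr_classes = 0;
	int sb_internal = add_class("sb_internal");
	int ilock = add_class("&(&ip->i_lock)->mr_lock/1");

	/* A transaction took sb_internal, then the ilock. */
	record_dependency(sb_internal, ilock);
	/* Assumed from the trace: an allocation ran with the ilock held. */
	classes[ilock].held_over_fs_alloc = true;
	/* kswapd0 then took sb_internal via xfs_trans_alloc(). */
	classes[sb_internal].used_in_reclaim = true;

	return check_reclaim_inversion(sb_internal);
}
```

In this model, model_reported_scenario() returns the index of the ilock
class, which is the lock the report names as RECLAIM_FS-unsafe. Clearing
held_over_fs_alloc on that class, which is roughly what avoiding the
allocation in this path is meant to achieve, makes the check return -1.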
