From: Lachlan McIlroy <lachlan@sgi.com>
To: Carlos Carvalho <carlos@fisica.ufpr.br>
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: possible deadlock shown by CONFIG_PROVE_LOCKING
Date: Mon, 04 Feb 2008 16:56:52 +1100 [thread overview]
Message-ID: <47A6A924.5010601@sgi.com> (raw)
In-Reply-To: <18332.63985.756005.37224@fisica.ufpr.br>
Carlos Carvalho wrote:
> Lachlan McIlroy (lachlan@sgi.com) wrote on 25 January 2008 13:00:
> >Carlos Carvalho wrote:
> >> I compiled the kernel with Ingo's CONFIG_PROVE_LOCKING and got the
> >> below at boot. Is it a problem?
> >It was a problem - it has since been fixed in 2.6.23.
> >Patch is attached in case you're interested.
>
> [patch removed]
>
> Thanks. I applied it to 22.16 without rejects but then the machine
> doesn't even mount the partition:
>
> Filesystem "md0": Disabling barriers, not supported by the underlying device
> XFS mounting filesystem md0
> Unable to handle kernel paging request at ffff8110e325e8d0 RIP:
The patch shouldn't have caused that. Did you keep a stack trace from this?
>
> Then I ran 23.14, no errors. However the machine locks hard a few
> seconds after we start the heavy IO tests...
Not good either - again, any stack traces from the stuck threads?
>
> So we used .24. Now the machine stood some 16h of heavy IO load. Good!
> However we got the log below after about 15 minutes of running the test.
> It didn't repeat. Is it still a problem?
If it didn't cause a deadlock then it's probably not an issue. We probably
just need some more lockdep annotations to tell lockdep to ignore this
locking scenario. I can't see where we acquire nested qh_lock's in
xfs_qm_dqget(). It might actually be caused by acquiring the xfs_qm_freelist
lock in xfs_qm_dqget while holding the hash lock because both locks are
initialised with the same stack and lockdep will think they are locks for
the same object type.
>
> Jan 26 19:25:43 mumm kernel: Filesystem "md0": Disabling barriers, not supported by the underlying device
> Jan 26 19:25:43 mumm kernel: XFS mounting filesystem md0
> Jan 26 19:25:43 mumm kernel: Ending clean XFS mount for filesystem: md0
>
> 15min later...
>
> Jan 26 19:41:40 mumm kernel:
> Jan 26 19:41:40 mumm kernel: =============================================
> Jan 26 19:41:40 mumm kernel: [ INFO: possible recursive locking detected ]
> Jan 26 19:41:40 mumm kernel: 2.6.24 #3
> Jan 26 19:41:40 mumm kernel: ---------------------------------------------
> Jan 26 19:41:40 mumm kernel: rm/1589 is trying to acquire lock:
> Jan 26 19:41:40 mumm kernel: (&list->qh_lock){--..}, at: [<ffffffff802c58e2>] xfs_qm_dqget+0x297/0x355
> Jan 26 19:41:40 mumm kernel:
> Jan 26 19:41:40 mumm kernel: but task is already holding lock:
> Jan 26 19:41:40 mumm kernel: (&list->qh_lock){--..}, at: [<ffffffff802c5841>] xfs_qm_dqget+0x1f6/0x355
> Jan 26 19:41:40 mumm kernel:
> Jan 26 19:41:40 mumm kernel: other info that might help us debug this:
> Jan 26 19:41:40 mumm kernel: 4 locks held by rm/1589:
> Jan 26 19:41:40 mumm kernel: #0: (&type->i_mutex_dir_key#2/1){--..}, at: [<ffffffff8027c400>] do_unlinkat+0x6b/0x133
> Jan 26 19:41:40 mumm kernel: #1: (&sb->s_type->i_mutex_key#2){--..}, at: [<ffffffff8027a59d>] vfs_unlink+0x42/0x8c
> Jan 26 19:41:40 mumm kernel: #2: (&(&ip->i_lock)->mr_lock){----}, at: [<ffffffff802f4093>] xfs_ilock+0x63/0x8d
> Jan 26 19:41:40 mumm kernel: #3: (&list->qh_lock){--..}, at: [<ffffffff802c5841>] xfs_qm_dqget+0x1f6/0x355
> Jan 26 19:41:40 mumm kernel:
> Jan 26 19:41:40 mumm kernel: stack backtrace:
> Jan 26 19:41:40 mumm kernel: Pid: 1589, comm: rm Not tainted 2.6.24 #3
> Jan 26 19:41:40 mumm kernel:
> Jan 26 19:41:40 mumm kernel: Call Trace:
> Jan 26 19:41:40 mumm kernel: [<ffffffff802446b2>] __lock_acquire+0x1b7/0xcd0
> Jan 26 19:41:40 mumm kernel: [<ffffffff802c58e2>] xfs_qm_dqget+0x297/0x355
> Jan 26 19:41:40 mumm kernel: [<ffffffff802455f4>] lock_acquire+0x51/0x6c
> Jan 26 19:41:40 mumm kernel: [<ffffffff802c58e2>] xfs_qm_dqget+0x297/0x355
> Jan 26 19:41:40 mumm kernel: [<ffffffff80241ef9>] debug_mutex_lock_common+0x16/0x23
> Jan 26 19:41:40 mumm kernel: other info that might help us debug this:
> Jan 26 19:41:40 mumm kernel: 4 locks held by rm/1589:
> Jan 26 19:41:40 mumm kernel: #0: (&type->i_mutex_dir_key#2/1){--..}, at: [<ffffffff8027c400>] do_unlinkat+0x6b/0x133
> Jan 26 19:41:40 mumm kernel: #1: (&sb->s_type->i_mutex_key#2){--..}, at: [<ffffffff8027a59d>] vfs_unlink+0x42/0x8c
> Jan 26 19:41:40 mumm kernel: #2: (&(&ip->i_lock)->mr_lock){----}, at: [<ffffffff802f4093>] xfs_ilock+0x63/0x8d
> Jan 26 19:41:40 mumm kernel: #3: (&list->qh_lock){--..}, at: [<ffffffff802c5841>] xfs_qm_dqget+0x1f6/0x355
> Jan 26 19:41:40 mumm kernel:
> Jan 26 19:41:40 mumm kernel: stack backtrace:
> Jan 26 19:41:40 mumm kernel: Pid: 1589, comm: rm Not tainted 2.6.24 #3
> Jan 26 19:41:40 mumm kernel:
> Jan 26 19:41:40 mumm kernel: Call Trace:
> Jan 26 19:41:40 mumm kernel: [<ffffffff802446b2>] __lock_acquire+0x1b7/0xcd0
> Jan 26 19:41:40 mumm kernel: [<ffffffff802c58e2>] xfs_qm_dqget+0x297/0x355
> Jan 26 19:41:40 mumm kernel: [<ffffffff802455f4>] lock_acquire+0x51/0x6c
> Jan 26 19:41:40 mumm kernel: [<ffffffff802c58e2>] xfs_qm_dqget+0x297/0x355
> Jan 26 19:41:40 mumm kernel: [<ffffffff80241ef9>] debug_mutex_lock_common+0x16/0x23
> Jan 26 19:41:40 mumm kernel: [<ffffffff803f8c7f>] mutex_lock_nested+0xec/0x286
> Jan 26 19:41:40 mumm kernel: [<ffffffff802c58e2>] xfs_qm_dqget+0x297/0x355
> Jan 26 19:41:40 mumm kernel: [<ffffffff802c8c2c>] xfs_qm_dqattach_one+0xe2/0x150
> Jan 26 19:41:40 mumm kernel: [<ffffffff802c8d51>] xfs_qm_dqattach+0xb7/0x175
> Jan 26 19:41:40 mumm kernel: [<ffffffff80311ac5>] xfs_remove+0x8f/0x364
> Jan 26 19:41:40 mumm kernel: [<ffffffff803f8dff>] mutex_lock_nested+0x26c/0x286
> Jan 26 19:41:40 mumm kernel: [<ffffffff80244082>] trace_hardirqs_on+0x115/0x139
> Jan 26 19:41:40 mumm kernel: [<ffffffff80318e6f>] xfs_vn_unlink+0x17/0x36
> Jan 26 19:41:40 mumm kernel: [<ffffffff8027a5b7>] vfs_unlink+0x5c/0x8c
> Jan 26 19:41:40 mumm kernel: [<ffffffff8027c449>] do_unlinkat+0xb4/0x133
> Jan 26 19:41:40 mumm kernel: [<ffffffff803f997a>] trace_hardirqs_on_thunk+0x35/0x3a
> Jan 26 19:41:40 mumm kernel: [<ffffffff80244082>] trace_hardirqs_on+0x115/0x139
> Jan 26 19:41:40 mumm kernel: [<ffffffff803f997a>] trace_hardirqs_on_thunk+0x35/0x3a
> Jan 26 19:41:40 mumm kernel: [<ffffffff8020b66e>] system_call+0x7e/0x83
>
prev parent reply other threads:[~2008-02-04 5:52 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-01-24 13:33 possible deadlock shown by CONFIG_PROVE_LOCKING Carlos Carvalho
2008-01-25 2:00 ` Lachlan McIlroy
2008-01-27 21:38 ` Carlos Carvalho
2008-02-04 5:56 ` Lachlan McIlroy [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=47A6A924.5010601@sgi.com \
--to=lachlan@sgi.com \
--cc=carlos@fisica.ufpr.br \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.