Re: Linus GIT - INFO: possible circular locking dependency detected

stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: Linus GIT - INFO: possible circular locking dependency detected
       [not found]   ` <20111105093146.GA14338@albatros>
@ 2011-11-08 23:57     ` Andrew Morton
  2011-11-09  0:17       ` Greg KH
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2011-11-08 23:57 UTC (permalink / raw)
  To: Vasiliy Kulikov
  Cc: Luis Henriques, Miles Lane, LKML, Ingo Molnar, Peter Zijlstra,
	Alexey Dobriyan, stable

On Sat, 5 Nov 2011 13:31:46 +0400
Vasiliy Kulikov <segoon@openwall.com> wrote:

> (cc'ed Andrew and Alexey)
> 
> On Thu, Nov 03, 2011 at 20:49 +0000, Luis Henriques wrote:
> > On Thu, Nov 03, 2011 at 11:57:20AM -0400, Miles Lane wrote:
> > ...
> > 
> > I'm hitting the exact same problem, using a minimal .config file (can send
> > it if required), by just running "find /".
> > 
> > I have bisected the problem and found that commit
> > aa6afca5bcaba8101f3ea09d5c3e4100b2b9f0e5 seems to be the cause of it.

Well, let's tell the -stable maintainer(s?) that aa6afca5bca ("proc:
fix races against execve() of /proc/PID/fd**") is known to cause a
regression.

> procfs holds sig->cred_guard_mutex to ensure the target's credentials are
> not changed.  It is held for a little timeslice.  From the stack trace I
> don't understand how sys_execve() can happen with ->cred_guard_mutex
> held:
> 
> static struct dentry *proc_lookupfd_common(struct inode *dir,
> 					   struct dentry *dentry,
> 					   instantiate_t instantiate)
> {
>     ...
> 	if (lock_trace(task))
> 		goto out;
> 
> 	result = instantiate(dir, dentry, task, &fd);
> 	unlock_trace(task);
>     ...
> }
> 
> 
> static int lock_trace(struct task_struct *task)
> {
> 	int err = mutex_lock_killable(&task->signal->cred_guard_mutex);
> 	if (err)
> 		return err;
> 	if (!ptrace_may_access(task, PTRACE_MODE_ATTACH)) {
> 		mutex_unlock(&task->signal->cred_guard_mutex);
> 		return -EPERM;
> 	}
> 	return 0;
> }
> 
> proc_lookupfd_common() always exits without ->cred_guard_mutex held.
> 

Yes, it's a strange trace.  udev is a strange thing and can be
triggered by the kernel at odd times.  I wonder if it's possible that
some other process is now synchronously running udevd while holding
cred_guard_mutex.  That wouldn't show up in the backtrace.  But I doubt
if lockdep would notice it either...

Either way, it would be prudent to revert aa6afca5bca from mainline if
we can't get this fixed up soon.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Linus GIT - INFO: possible circular locking dependency detected
  2011-11-08 23:57     ` Linus GIT - INFO: possible circular locking dependency detected Andrew Morton
@ 2011-11-09  0:17       ` Greg KH
  2011-11-09  0:40         ` Greg KH
  0 siblings, 1 reply; 7+ messages in thread
From: Greg KH @ 2011-11-09  0:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vasiliy Kulikov, Luis Henriques, Miles Lane, LKML, Ingo Molnar,
	Peter Zijlstra, Alexey Dobriyan, stable

On Tue, Nov 08, 2011 at 03:57:16PM -0800, Andrew Morton wrote:
> On Sat, 5 Nov 2011 13:31:46 +0400
> Vasiliy Kulikov <segoon@openwall.com> wrote:
> 
> > (cc'ed Andrew and Alexey)
> > 
> > On Thu, Nov 03, 2011 at 20:49 +0000, Luis Henriques wrote:
> > > On Thu, Nov 03, 2011 at 11:57:20AM -0400, Miles Lane wrote:
> > > ...
> > > 
> > > I'm hitting the exact same problem, using a minimal .config file (can send
> > > it if required), by just running "find /".
> > > 
> > > I have bisected the problem and found that commit
> > > aa6afca5bcaba8101f3ea09d5c3e4100b2b9f0e5 seems to be the cause of it.
> 
> Well, let's tell the -stable maintainer(s?)

It's just me.  And my script-bots, but they are all controlled by me in
the end.  Hopefully....

> that aa6afca5bca ("proc:
> fix races against execve() of /proc/PID/fd**") is known to cause a
> regression.

Ok, I'll go delete it from the stable queues for now.

Vasiliy, please let me know when this gets fixed, what the fixed commit
id is, and what this original commit id was, so I can then queue them up
for the stable releases.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Linus GIT - INFO: possible circular locking dependency detected
  2011-11-09  0:17       ` Greg KH
@ 2011-11-09  0:40         ` Greg KH
  2011-11-09 19:11           ` Luis Henriques
  0 siblings, 1 reply; 7+ messages in thread
From: Greg KH @ 2011-11-09  0:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vasiliy Kulikov, Luis Henriques, Miles Lane, LKML, Ingo Molnar,
	Peter Zijlstra, Alexey Dobriyan, stable

On Tue, Nov 08, 2011 at 04:17:23PM -0800, Greg KH wrote:
> On Tue, Nov 08, 2011 at 03:57:16PM -0800, Andrew Morton wrote:
> > On Sat, 5 Nov 2011 13:31:46 +0400
> > Vasiliy Kulikov <segoon@openwall.com> wrote:
> > 
> > > (cc'ed Andrew and Alexey)
> > > 
> > > On Thu, Nov 03, 2011 at 20:49 +0000, Luis Henriques wrote:
> > > > On Thu, Nov 03, 2011 at 11:57:20AM -0400, Miles Lane wrote:
> > > > ...
> > > > 
> > > > I'm hitting the exact same problem, using a minimal .config file (can send
> > > > it if required), by just running "find /".
> > > > 
> > > > I have bisected the problem and found that commit
> > > > aa6afca5bcaba8101f3ea09d5c3e4100b2b9f0e5 seems to be the cause of it.
> > 
> > Well, let's tell the -stable maintainer(s?)
> 
> It's just me.  And my script-bots, but they are all controlled by me in
> the end.  Hopefully....
> 
> > that aa6afca5bca ("proc:
> > fix races against execve() of /proc/PID/fd**") is known to cause a
> > regression.
> 
> Ok, I'll go delete it from the stable queues for now.

Now removed.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Linus GIT - INFO: possible circular locking dependency detected
  2011-11-09  0:40         ` Greg KH
@ 2011-11-09 19:11           ` Luis Henriques
  2011-11-09 19:52             ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Luis Henriques @ 2011-11-09 19:11 UTC (permalink / raw)
  To: Greg KH
  Cc: Andrew Morton, Vasiliy Kulikov, Miles Lane, LKML, Ingo Molnar,
	Peter Zijlstra, Alexey Dobriyan, stable

Hi,

On Tue, Nov 08, 2011 at 04:40:13PM -0800, Greg KH wrote:
> > It's just me.  And my script-bots, but they are all controlled by me in
> > the end.  Hopefully....
> > 
> > > that aa6afca5bca ("proc:
> > > fix races against execve() of /proc/PID/fd**") is known to cause a
> > > regression.
> > 
> > Ok, I'll go delete it from the stable queues for now.
> 
> Now removed.

I finally took another look at this, and although I'm far from being an
expert on these areas, I believe the trace information from lockdep may
actually be incorrect.  Here's what I'm getting:

[   12.948038] exe/36 is trying to acquire lock:
[   12.948038]  (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff811b301e>] lock_trace+0x2e/0x80
[   12.948038] 
[   12.948038] but task is already holding lock:
[   12.948038]  (&sb->s_type->i_mutex_key#6){+.+.+.}, at: [<ffffffff8115f8b8>] vfs_readdir+0x78/0xd0
[   12.948038] 
[   12.948038] which lock already depends on the new lock.

So, sig->cred_guard_mutex is acquired (in lock_trace) after
sb->s_type->i_mutex_key (in vfs_readdir).  Now, take a look at the traces:

[   12.948038] -> #1 (&sb->s_type->i_mutex_key#6){+.+.+.}:
[   12.948038]        [<ffffffff81092e4f>] lock_acquire+0xaf/0x1f0
[   12.948038]        [<ffffffff8135b2a5>] __mutex_lock_common+0x65/0x4d0
[   12.948038]        [<ffffffff8135b72b>] mutex_lock_nested+0x1b/0x20
[   12.948038]        [<ffffffff81158c0a>] do_lookup+0x28a/0x3b0
[   12.948038]        [<ffffffff8115929f>] link_path_walk+0x12f/0x870
[   12.948038]        [<ffffffff8115b0ab>] path_openat+0xbb/0x380
[   12.948038]        [<ffffffff8115b3b2>] do_filp_open+0x42/0xa0
[   12.948038]        [<ffffffff81152cb2>] open_exec+0x32/0xf0
[   12.948038]        [<ffffffff81153dd7>] do_execve_common.clone.32+0x137/0x330
[   12.948038]        [<ffffffff81153feb>] do_execve+0x1b/0x20
[   12.948038]        [<ffffffff8100c78a>] sys_execve+0x4a/0x80
[   12.948038]        [<ffffffff8135ed1c>] stub_execve+0x6c/0xc0
[   12.948038] 
[   12.948038] -> #0 (&sig->cred_guard_mutex){+.+.+.}:
[   12.948038]        [<ffffffff8108ff9f>] __lock_acquire+0x17bf/0x2020
[   12.948038]        [<ffffffff81092e4f>] lock_acquire+0xaf/0x1f0
[   12.948038]        [<ffffffff8135b2a5>] __mutex_lock_common+0x65/0x4d0
[   12.948038]        [<ffffffff8135b76b>] mutex_lock_killable_nested+0x1b/0x20
[   12.948038]        [<ffffffff811b301e>] lock_trace+0x2e/0x80
[   12.948038]        [<ffffffff811b73ab>] proc_readfd_common+0x5b/0x4b0
[   12.948038]        [<ffffffff811b7835>] proc_readfd+0x15/0x20
[   12.948038]        [<ffffffff8115f8f0>] vfs_readdir+0xb0/0xd0
[   12.948038]        [<ffffffff8115fa09>] sys_getdents+0x89/0x100
[   12.948038]        [<ffffffff8135e8c2>] system_call_fastpath+0x16/0x1b

sb->s_type->i_mutex_key is shown as being acquired in the execve path,
which seems to be wrong -- it was acquired in the vfs_readdir (on the 2nd
trace).

This means that the initial analysis from Vasiliy is incorrect, as he
assumed the execve path.  Or Am I interpreting this log incorrectly?
(Probably I am...).

Anyway, if my analysis is correct, replacing the lock_trace by a simple
ptrace_may_access() should be enough.  Something like:

-       if (lock_trace(p))
+       if (!ptrace_may_access(p, PTRACE_MODE_ATTACH))
                goto out;

Obviously, the unlock_trace() should be removed as well... But I may be
missing other cases where the lock_trace is actually required.

BTW, I get this log simply by running:

# ls /proc/1/fd

Just my 2 cents...

Cheers,
--
Luis Henriques

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Linus GIT - INFO: possible circular locking dependency detected
  2011-11-09 19:11           ` Luis Henriques
@ 2011-11-09 19:52             ` Peter Zijlstra
  2011-11-09 20:05               ` Luis Henriques
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2011-11-09 19:52 UTC (permalink / raw)
  To: Luis Henriques
  Cc: Greg KH, Andrew Morton, Vasiliy Kulikov, Miles Lane, LKML,
	Ingo Molnar, Alexey Dobriyan, stable

On Wed, 2011-11-09 at 19:11 +0000, Luis Henriques wrote:
> [   12.948038] -> #0 (&sig->cred_guard_mutex){+.+.+.}:
> [   12.948038]        [<ffffffff8108ff9f>] __lock_acquire+0x17bf/0x2020
> [   12.948038]        [<ffffffff81092e4f>] lock_acquire+0xaf/0x1f0
> [   12.948038]        [<ffffffff8135b2a5>] __mutex_lock_common+0x65/0x4d0
> [   12.948038]        [<ffffffff8135b76b>] mutex_lock_killable_nested+0x1b/0x20
> [   12.948038]        [<ffffffff811b301e>] lock_trace+0x2e/0x80
> [   12.948038]        [<ffffffff811b73ab>] proc_readfd_common+0x5b/0x4b0
> [   12.948038]        [<ffffffff811b7835>] proc_readfd+0x15/0x20
> [   12.948038]        [<ffffffff8115f8f0>] vfs_readdir+0xb0/0xd0
> [   12.948038]        [<ffffffff8115fa09>] sys_getdents+0x89/0x100
> [   12.948038]        [<ffffffff8135e8c2>] system_call_fastpath+0x16/0x1b
> 
> sb->s_type->i_mutex_key is shown as being acquired in the execve path,
> which seems to be wrong -- it was acquired in the vfs_readdir (on the 2nd
> trace).
> 
> This means that the initial analysis from Vasiliy is incorrect, as he
> assumed the execve path.  Or Am I interpreting this log incorrectly?
> (Probably I am...). 

->#0 shows where sig->cred_guard_mutex was taken, as it was the first
lock there is no nesting yet and therefore the ->i_mutex_key#6 thing
should not be associated with this stacktrace.

->#1 shows where ->i_mutex_key#6 was taken while holding
->cred_guard_mutex (but doesn't explicitly show where that was taken).

Mostly ->#0 information is useless in lockdep reports and can be safely
ignored.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Linus GIT - INFO: possible circular locking dependency detected
  2011-11-09 19:52             ` Peter Zijlstra
@ 2011-11-09 20:05               ` Luis Henriques
  2011-11-14 17:27                 ` Paul E. McKenney
  0 siblings, 1 reply; 7+ messages in thread
From: Luis Henriques @ 2011-11-09 20:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Greg KH, Andrew Morton, Vasiliy Kulikov, Miles Lane, LKML,
	Ingo Molnar, Alexey Dobriyan, stable

On Wed, Nov 09, 2011 at 08:52:12PM +0100, Peter Zijlstra wrote:
> On Wed, 2011-11-09 at 19:11 +0000, Luis Henriques wrote:
> > [   12.948038] -> #0 (&sig->cred_guard_mutex){+.+.+.}:
> > [   12.948038]        [<ffffffff8108ff9f>] __lock_acquire+0x17bf/0x2020
> > [   12.948038]        [<ffffffff81092e4f>] lock_acquire+0xaf/0x1f0
> > [   12.948038]        [<ffffffff8135b2a5>] __mutex_lock_common+0x65/0x4d0
> > [   12.948038]        [<ffffffff8135b76b>] mutex_lock_killable_nested+0x1b/0x20
> > [   12.948038]        [<ffffffff811b301e>] lock_trace+0x2e/0x80
> > [   12.948038]        [<ffffffff811b73ab>] proc_readfd_common+0x5b/0x4b0
> > [   12.948038]        [<ffffffff811b7835>] proc_readfd+0x15/0x20
> > [   12.948038]        [<ffffffff8115f8f0>] vfs_readdir+0xb0/0xd0
> > [   12.948038]        [<ffffffff8115fa09>] sys_getdents+0x89/0x100
> > [   12.948038]        [<ffffffff8135e8c2>] system_call_fastpath+0x16/0x1b
> > 
> > sb->s_type->i_mutex_key is shown as being acquired in the execve path,
> > which seems to be wrong -- it was acquired in the vfs_readdir (on the 2nd
> > trace).
> > 
> > This means that the initial analysis from Vasiliy is incorrect, as he
> > assumed the execve path.  Or Am I interpreting this log incorrectly?
> > (Probably I am...). 
> 
> ->#0 shows where sig->cred_guard_mutex was taken, as it was the first
> lock there is no nesting yet and therefore the ->i_mutex_key#6 thing
> should not be associated with this stacktrace.
> 
> ->#1 shows where ->i_mutex_key#6 was taken while holding
> ->cred_guard_mutex (but doesn't explicitly show where that was taken).
> 
> Mostly ->#0 information is useless in lockdep reports and can be safely
> ignored.

Interesting.  Thank you for the clarification.  I understood it the other
way around, as the log starts with:

[   12.948038] exe/36 is trying to acquire lock:
[   12.948038]  (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff811b301e>] lock_trace+0x2e/0x80
[   12.948038] 
[   12.948038] but task is already holding lock:
[   12.948038]  (&sb->s_type->i_mutex_key#6){+.+.+.}, at: [<ffffffff8115f8b8>] vfs_readdir+0x78/0xd0

This is why I thought the "useless information" was in ->#1 (for the
i_mutex_key#6).

Cheers,
--
Luis Henriques

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Linus GIT - INFO: possible circular locking dependency detected
  2011-11-09 20:05               ` Luis Henriques
@ 2011-11-14 17:27                 ` Paul E. McKenney
  0 siblings, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2011-11-14 17:27 UTC (permalink / raw)
  To: Luis Henriques
  Cc: Peter Zijlstra, Greg KH, Andrew Morton, Vasiliy Kulikov,
	Miles Lane, LKML, Ingo Molnar, Alexey Dobriyan, stable

On Wed, Nov 09, 2011 at 08:05:53PM +0000, Luis Henriques wrote:
> On Wed, Nov 09, 2011 at 08:52:12PM +0100, Peter Zijlstra wrote:
> > On Wed, 2011-11-09 at 19:11 +0000, Luis Henriques wrote:
> > > [   12.948038] -> #0 (&sig->cred_guard_mutex){+.+.+.}:
> > > [   12.948038]        [<ffffffff8108ff9f>] __lock_acquire+0x17bf/0x2020
> > > [   12.948038]        [<ffffffff81092e4f>] lock_acquire+0xaf/0x1f0
> > > [   12.948038]        [<ffffffff8135b2a5>] __mutex_lock_common+0x65/0x4d0
> > > [   12.948038]        [<ffffffff8135b76b>] mutex_lock_killable_nested+0x1b/0x20
> > > [   12.948038]        [<ffffffff811b301e>] lock_trace+0x2e/0x80
> > > [   12.948038]        [<ffffffff811b73ab>] proc_readfd_common+0x5b/0x4b0
> > > [   12.948038]        [<ffffffff811b7835>] proc_readfd+0x15/0x20
> > > [   12.948038]        [<ffffffff8115f8f0>] vfs_readdir+0xb0/0xd0
> > > [   12.948038]        [<ffffffff8115fa09>] sys_getdents+0x89/0x100
> > > [   12.948038]        [<ffffffff8135e8c2>] system_call_fastpath+0x16/0x1b
> > > 
> > > sb->s_type->i_mutex_key is shown as being acquired in the execve path,
> > > which seems to be wrong -- it was acquired in the vfs_readdir (on the 2nd
> > > trace).
> > > 
> > > This means that the initial analysis from Vasiliy is incorrect, as he
> > > assumed the execve path.  Or Am I interpreting this log incorrectly?
> > > (Probably I am...). 
> > 
> > ->#0 shows where sig->cred_guard_mutex was taken, as it was the first
> > lock there is no nesting yet and therefore the ->i_mutex_key#6 thing
> > should not be associated with this stacktrace.
> > 
> > ->#1 shows where ->i_mutex_key#6 was taken while holding
> > ->cred_guard_mutex (but doesn't explicitly show where that was taken).
> > 
> > Mostly ->#0 information is useless in lockdep reports and can be safely
> > ignored.
> 
> Interesting.  Thank you for the clarification.  I understood it the other
> way around, as the log starts with:
> 
> [   12.948038] exe/36 is trying to acquire lock:
> [   12.948038]  (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff811b301e>] lock_trace+0x2e/0x80
> [   12.948038] 
> [   12.948038] but task is already holding lock:
> [   12.948038]  (&sb->s_type->i_mutex_key#6){+.+.+.}, at: [<ffffffff8115f8b8>] vfs_readdir+0x78/0xd0
> 
> This is why I thought the "useless information" was in ->#1 (for the
> i_mutex_key#6).

FWIW, I am seeing this as well now that I have rebased -rcu to 3.2-rc1.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-11-14 17:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CAHFgRy8S0xLfhZxTUOEH5A0PL_Fb79-0-gmbQ=9h2D-xMqt1hA@mail.gmail.com>
     [not found] ` <20111103204930.GA3599@hades>
     [not found]   ` <20111105093146.GA14338@albatros>
2011-11-08 23:57     ` Linus GIT - INFO: possible circular locking dependency detected Andrew Morton
2011-11-09  0:17       ` Greg KH
2011-11-09  0:40         ` Greg KH
2011-11-09 19:11           ` Luis Henriques
2011-11-09 19:52             ` Peter Zijlstra
2011-11-09 20:05               ` Luis Henriques
2011-11-14 17:27                 ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).