public inbox for linux-kernel@vger.kernel.org
* v2.6.26-rc7/cgroups: circular locking dependency
@ 2008-06-21 17:38 Vegard Nossum
  2008-06-22  9:10 ` Cyrill Gorcunov
  2008-06-22 15:34 ` KOSAKI Motohiro
  0 siblings, 2 replies; 11+ messages in thread
From: Vegard Nossum @ 2008-06-21 17:38 UTC (permalink / raw)
  To: Paul Menage, containers; +Cc: linux-kernel

Hi,

I decided to see what cgroups is all about, and followed the instructions
in Documentation/cgroups.txt :-) It happened when I did this:

    [root@damson /dev/cgroup/Vegard 0]
    # echo 1 > cpuset.cpus

I can also provide the kernel config if necessary.


Vegard

 
 =======================================================
 [ INFO: possible circular locking dependency detected ]
 2.6.26-rc7 #25
 -------------------------------------------------------
 bash/10032 is trying to acquire lock:
  (&cpu_hotplug.lock){--..}, at: [<c015efbc>] get_online_cpus+0x2c/0x40
 
 but task is already holding lock:
  (cgroup_mutex){--..}, at: [<c0160e6f>] cgroup_lock+0xf/0x20
 
 which lock already depends on the new lock.
 
 
 the existing dependency chain (in reverse order) is:
 
 -> #1 (cgroup_mutex){--..}:
        [<c015a435>] __lock_acquire+0xf45/0x1040
        [<c015a5c8>] lock_acquire+0x98/0xd0
        [<c05416d1>] mutex_lock_nested+0xb1/0x300
        [<c0160e6f>] cgroup_lock+0xf/0x20
        [<c0164750>] cpuset_handle_cpuhp+0x20/0x180
        [<c014ea77>] notifier_call_chain+0x37/0x70
        [<c014eae9>] __raw_notifier_call_chain+0x19/0x20
        [<c051f8c8>] _cpu_down+0x78/0x240
        [<c051fabb>] cpu_down+0x2b/0x40
        [<c0520cd9>] store_online+0x39/0x80
        [<c02f627b>] sysdev_store+0x2b/0x40
        [<c01d3372>] sysfs_write_file+0xa2/0x100
        [<c0195486>] vfs_write+0x96/0x130
        [<c0195b4d>] sys_write+0x3d/0x70
        [<c010831b>] sysenter_past_esp+0x78/0xd1
        [<ffffffff>] 0xffffffff
 
 -> #0 (&cpu_hotplug.lock){--..}:
        [<c0159fe5>] __lock_acquire+0xaf5/0x1040
        [<c015a5c8>] lock_acquire+0x98/0xd0
        [<c05416d1>] mutex_lock_nested+0xb1/0x300
        [<c015efbc>] get_online_cpus+0x2c/0x40
        [<c0163e6d>] rebuild_sched_domains+0x7d/0x3a0
        [<c01653a4>] cpuset_common_file_write+0x204/0x440
        [<c0162bc7>] cgroup_file_write+0x67/0x130
        [<c0195486>] vfs_write+0x96/0x130
        [<c0195b4d>] sys_write+0x3d/0x70
        [<c010831b>] sysenter_past_esp+0x78/0xd1
        [<ffffffff>] 0xffffffff
 
 other info that might help us debug this:
 
 1 lock held by bash/10032:
  #0:  (cgroup_mutex){--..}, at: [<c0160e6f>] cgroup_lock+0xf/0x20
 
 stack backtrace:
 Pid: 10032, comm: bash Not tainted 2.6.26-rc7 #25
  [<c0157ae7>] print_circular_bug_tail+0x77/0x90
  [<c0159fe5>] __lock_acquire+0xaf5/0x1040
  [<c015a5c8>] lock_acquire+0x98/0xd0
  [<c015efbc>] ? get_online_cpus+0x2c/0x40
  [<c05416d1>] mutex_lock_nested+0xb1/0x300
  [<c015efbc>] ? get_online_cpus+0x2c/0x40
  [<c015efbc>] ? get_online_cpus+0x2c/0x40
  [<c015efbc>] get_online_cpus+0x2c/0x40
  [<c0163e6d>] rebuild_sched_domains+0x7d/0x3a0
  [<c0158e65>] ? mark_held_locks+0x65/0x80
  [<c0190ac5>] ? kfree+0xb5/0x120
  [<c027736a>] ? heap_free+0xa/0x10
  [<c027736a>] ? heap_free+0xa/0x10
  [<c01653a4>] cpuset_common_file_write+0x204/0x440
  [<c0163c30>] ? started_after+0x0/0x70
  [<c010e585>] ? native_sched_clock+0xb5/0x110
  [<c0162e10>] ? started_after+0x0/0x70
  [<c0164c80>] ? cpuset_test_cpumask+0x0/0x20
  [<c0164ca0>] ? cpuset_change_cpumask+0x0/0x20
  [<c0162bc7>] cgroup_file_write+0x67/0x130
  [<c01651a0>] ? cpuset_common_file_write+0x0/0x440
  [<c010e585>] ? native_sched_clock+0xb5/0x110
  [<c0195486>] vfs_write+0x96/0x130
  [<c0162b60>] ? cgroup_file_write+0x0/0x130
  [<c0195b4d>] sys_write+0x3d/0x70
  [<c010831b>] sysenter_past_esp+0x78/0xd1
  =======================


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: v2.6.26-rc7/cgroups: circular locking dependency
  2008-06-21 17:38 v2.6.26-rc7/cgroups: circular locking dependency Vegard Nossum
@ 2008-06-22  9:10 ` Cyrill Gorcunov
  2008-06-22  9:42   ` Vegard Nossum
  2008-06-22 15:34 ` KOSAKI Motohiro
  1 sibling, 1 reply; 11+ messages in thread
From: Cyrill Gorcunov @ 2008-06-22  9:10 UTC (permalink / raw)
  To: Vegard Nossum; +Cc: Paul Menage, containers, linux-kernel

[Vegard Nossum - Sat, Jun 21, 2008 at 07:38:59PM +0200]
| Hi,
| 
| I decided to see what cgroups is all about, and followed the instructions
| in Documentation/cgroups.txt :-) It happened when I did this:
| 
|     [root@damson /dev/cgroup/Vegard 0]
|     # echo 1 > cpuset.cpus
| 
| I can also provide the kernel config if necessary.
| 
| 
| Vegard
| 
|  
|  =======================================================
|  [ INFO: possible circular locking dependency detected ]
|  2.6.26-rc7 #25
|  -------------------------------------------------------
|  bash/10032 is trying to acquire lock:
|   (&cpu_hotplug.lock){--..}, at: [<c015efbc>] get_online_cpus+0x2c/0x40
|  
|  but task is already holding lock:
|   (cgroup_mutex){--..}, at: [<c0160e6f>] cgroup_lock+0xf/0x20
|  
|  which lock already depends on the new lock.
|  
|  
|  the existing dependency chain (in reverse order) is:
|  
|  -> #1 (cgroup_mutex){--..}:
|         [<c015a435>] __lock_acquire+0xf45/0x1040
|         [<c015a5c8>] lock_acquire+0x98/0xd0
|         [<c05416d1>] mutex_lock_nested+0xb1/0x300
|         [<c0160e6f>] cgroup_lock+0xf/0x20
|         [<c0164750>] cpuset_handle_cpuhp+0x20/0x180
|         [<c014ea77>] notifier_call_chain+0x37/0x70
|         [<c014eae9>] __raw_notifier_call_chain+0x19/0x20
|         [<c051f8c8>] _cpu_down+0x78/0x240
|         [<c051fabb>] cpu_down+0x2b/0x40
|         [<c0520cd9>] store_online+0x39/0x80
|         [<c02f627b>] sysdev_store+0x2b/0x40
|         [<c01d3372>] sysfs_write_file+0xa2/0x100
|         [<c0195486>] vfs_write+0x96/0x130
|         [<c0195b4d>] sys_write+0x3d/0x70
|         [<c010831b>] sysenter_past_esp+0x78/0xd1
|         [<ffffffff>] 0xffffffff
|  
|  -> #0 (&cpu_hotplug.lock){--..}:
|         [<c0159fe5>] __lock_acquire+0xaf5/0x1040
|         [<c015a5c8>] lock_acquire+0x98/0xd0
|         [<c05416d1>] mutex_lock_nested+0xb1/0x300
|         [<c015efbc>] get_online_cpus+0x2c/0x40
|         [<c0163e6d>] rebuild_sched_domains+0x7d/0x3a0
|         [<c01653a4>] cpuset_common_file_write+0x204/0x440
|         [<c0162bc7>] cgroup_file_write+0x67/0x130
|         [<c0195486>] vfs_write+0x96/0x130
|         [<c0195b4d>] sys_write+0x3d/0x70
|         [<c010831b>] sysenter_past_esp+0x78/0xd1
|         [<ffffffff>] 0xffffffff
|  
|  other info that might help us debug this:
|  
|  1 lock held by bash/10032:
|   #0:  (cgroup_mutex){--..}, at: [<c0160e6f>] cgroup_lock+0xf/0x20
|  
|  stack backtrace:
|  Pid: 10032, comm: bash Not tainted 2.6.26-rc7 #25
|   [<c0157ae7>] print_circular_bug_tail+0x77/0x90
|   [<c0159fe5>] __lock_acquire+0xaf5/0x1040
|   [<c015a5c8>] lock_acquire+0x98/0xd0
|   [<c015efbc>] ? get_online_cpus+0x2c/0x40
|   [<c05416d1>] mutex_lock_nested+0xb1/0x300
|   [<c015efbc>] ? get_online_cpus+0x2c/0x40
|   [<c015efbc>] ? get_online_cpus+0x2c/0x40
|   [<c015efbc>] get_online_cpus+0x2c/0x40
|   [<c0163e6d>] rebuild_sched_domains+0x7d/0x3a0
|   [<c0158e65>] ? mark_held_locks+0x65/0x80
|   [<c0190ac5>] ? kfree+0xb5/0x120
|   [<c027736a>] ? heap_free+0xa/0x10
|   [<c027736a>] ? heap_free+0xa/0x10
|   [<c01653a4>] cpuset_common_file_write+0x204/0x440
|   [<c0163c30>] ? started_after+0x0/0x70
|   [<c010e585>] ? native_sched_clock+0xb5/0x110
|   [<c0162e10>] ? started_after+0x0/0x70
|   [<c0164c80>] ? cpuset_test_cpumask+0x0/0x20
|   [<c0164ca0>] ? cpuset_change_cpumask+0x0/0x20
|   [<c0162bc7>] cgroup_file_write+0x67/0x130
|   [<c01651a0>] ? cpuset_common_file_write+0x0/0x440
|   [<c010e585>] ? native_sched_clock+0xb5/0x110
|   [<c0195486>] vfs_write+0x96/0x130
|   [<c0162b60>] ? cgroup_file_write+0x0/0x130
|   [<c0195b4d>] sys_write+0x3d/0x70
|   [<c010831b>] sysenter_past_esp+0x78/0xd1
|   =======================
| 

Hi Vegard,

It would be interesting to see your config, please.

		- Cyrill -


* Re: v2.6.26-rc7/cgroups: circular locking dependency
  2008-06-22  9:10 ` Cyrill Gorcunov
@ 2008-06-22  9:42   ` Vegard Nossum
  2008-06-22  9:50     ` Cyrill Gorcunov
  0 siblings, 1 reply; 11+ messages in thread
From: Vegard Nossum @ 2008-06-22  9:42 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Paul Menage, containers, linux-kernel

On Sun, Jun 22, 2008 at 11:10 AM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> [Vegard Nossum - Sat, Jun 21, 2008 at 07:38:59PM +0200]
> |  =======================================================
> |  [ INFO: possible circular locking dependency detected ]
> |  2.6.26-rc7 #25
> |  -------------------------------------------------------
> |  bash/10032 is trying to acquire lock:
> |   (&cpu_hotplug.lock){--..}, at: [<c015efbc>] get_online_cpus+0x2c/0x40
> |
> |  but task is already holding lock:
> |   (cgroup_mutex){--..}, at: [<c0160e6f>] cgroup_lock+0xf/0x20

...

> Hi Vegard,
>
> It would be interesting to see your config, please.

Here is my config and vmlinux:

http://www.kernel.org/pub/linux/kernel/people/vegard/kernels/20080621-v2.6.26-rc7/

By the way, I was doing CPU hotplug/unplug at the time.

Thanks :)


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036


* Re: v2.6.26-rc7/cgroups: circular locking dependency
  2008-06-22  9:42   ` Vegard Nossum
@ 2008-06-22  9:50     ` Cyrill Gorcunov
  0 siblings, 0 replies; 11+ messages in thread
From: Cyrill Gorcunov @ 2008-06-22  9:50 UTC (permalink / raw)
  To: Vegard Nossum; +Cc: Paul Menage, containers, linux-kernel

[Vegard Nossum - Sun, Jun 22, 2008 at 11:42:50AM +0200]
| On Sun, Jun 22, 2008 at 11:10 AM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
| > [Vegard Nossum - Sat, Jun 21, 2008 at 07:38:59PM +0200]
| > |  =======================================================
| > |  [ INFO: possible circular locking dependency detected ]
| > |  2.6.26-rc7 #25
| > |  -------------------------------------------------------
| > |  bash/10032 is trying to acquire lock:
| > |   (&cpu_hotplug.lock){--..}, at: [<c015efbc>] get_online_cpus+0x2c/0x40
| > |
| > |  but task is already holding lock:
| > |   (cgroup_mutex){--..}, at: [<c0160e6f>] cgroup_lock+0xf/0x20
| 
| ...
| 
| > Hi Vegard,
| >
| > It would be interesting to see your config, please.
| 
| Here is my config and vmlinux:
| 
| http://www.kernel.org/pub/linux/kernel/people/vegard/kernels/20080621-v2.6.26-rc7/
| 
| By the way, I was doing CPU hotplug/unplug at the time.
| 
| Thanks :)
| 
| 
| Vegard
| 
| -- 
| "The animistic metaphor of the bug that maliciously sneaked in while
| the programmer was not looking is intellectually dishonest as it
| disguises that the error is the programmer's own creation."
| 	-- E. W. Dijkstra, EWD1036
| 

Thanks Vegard,

actually I'm not much of a specialist in this area ;) but I will take a look

		- Cyrill -


* Re: v2.6.26-rc7/cgroups: circular locking dependency
  2008-06-21 17:38 v2.6.26-rc7/cgroups: circular locking dependency Vegard Nossum
  2008-06-22  9:10 ` Cyrill Gorcunov
@ 2008-06-22 15:34 ` KOSAKI Motohiro
  2008-06-22 15:50   ` Peter Zijlstra
  2008-06-22 16:02   ` Cyrill Gorcunov
  1 sibling, 2 replies; 11+ messages in thread
From: KOSAKI Motohiro @ 2008-06-22 15:34 UTC (permalink / raw)
  To: Vegard Nossum; +Cc: Paul Menage, containers, linux-kernel, Paul Jackson

CC'ed Paul Jackson

It seems to be a typical ABBA deadlock.
I think cpuset uses cgroup_lock() by mistake.

IMHO, cpuset_handle_cpuhp() shouldn't use cgroup_lock() and
shouldn't call rebuild_sched_domains().


 -> #1 (cgroup_mutex){--..}:
       [<c015a435>] __lock_acquire+0xf45/0x1040
       [<c015a5c8>] lock_acquire+0x98/0xd0
       [<c05416d1>] mutex_lock_nested+0xb1/0x300
       [<c0160e6f>] cgroup_lock+0xf/0x20			   cgroup_lock
       [<c0164750>] cpuset_handle_cpuhp+0x20/0x180
       [<c014ea77>] notifier_call_chain+0x37/0x70
       [<c014eae9>] __raw_notifier_call_chain+0x19/0x20
       [<c051f8c8>] _cpu_down+0x78/0x240			cpu_hotplug.lock
       [<c051fabb>] cpu_down+0x2b/0x40				  cpu_add_remove_lock
       [<c0520cd9>] store_online+0x39/0x80
       [<c02f627b>] sysdev_store+0x2b/0x40
       [<c01d3372>] sysfs_write_file+0xa2/0x100
       [<c0195486>] vfs_write+0x96/0x130
       [<c0195b4d>] sys_write+0x3d/0x70
       [<c010831b>] sysenter_past_esp+0x78/0xd1
       [<ffffffff>] 0xffffffff

 -> #0 (&cpu_hotplug.lock){--..}:
       [<c0159fe5>] __lock_acquire+0xaf5/0x1040
       [<c015a5c8>] lock_acquire+0x98/0xd0
       [<c05416d1>] mutex_lock_nested+0xb1/0x300
       [<c015efbc>] get_online_cpus+0x2c/0x40			      cpu_hotplug.lock
       [<c0163e6d>] rebuild_sched_domains+0x7d/0x3a0
       [<c01653a4>] cpuset_common_file_write+0x204/0x440	cgroup_lock
       [<c0162bc7>] cgroup_file_write+0x67/0x130
       [<c0195486>] vfs_write+0x96/0x130
       [<c0195b4d>] sys_write+0x3d/0x70
       [<c010831b>] sysenter_past_esp+0x78/0xd1
       [<ffffffff>] 0xffffffff


> Hi,
>
> I decided to see what cgroups is all about, and followed the instructions
> in Documentation/cgroups.txt :-) It happened when I did this:
>
>    [root@damson /dev/cgroup/Vegard 0]
>    # echo 1 > cpuset.cpus
>
> I can also provide the kernel config if necessary.
>
>
> Vegard
>
>
>  =======================================================
>  [ INFO: possible circular locking dependency detected ]
>  2.6.26-rc7 #25
>  -------------------------------------------------------
>  bash/10032 is trying to acquire lock:
>  (&cpu_hotplug.lock){--..}, at: [<c015efbc>] get_online_cpus+0x2c/0x40
>
>  but task is already holding lock:
>  (cgroup_mutex){--..}, at: [<c0160e6f>] cgroup_lock+0xf/0x20
>
>  which lock already depends on the new lock.


* Re: v2.6.26-rc7/cgroups: circular locking dependency
  2008-06-22 15:34 ` KOSAKI Motohiro
@ 2008-06-22 15:50   ` Peter Zijlstra
  2008-06-23 12:02     ` Paul Jackson
  2008-06-24  6:29     ` Max Krasnyansky
  2008-06-22 16:02   ` Cyrill Gorcunov
  1 sibling, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2008-06-22 15:50 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Vegard Nossum, Paul Menage, containers, linux-kernel,
	Paul Jackson, maxk

On Mon, 2008-06-23 at 00:34 +0900, KOSAKI Motohiro wrote:
> CC'ed Paul Jackson
> 
> It seems to be a typical ABBA deadlock.
> I think cpuset uses cgroup_lock() by mistake.
> 
> IMHO, cpuset_handle_cpuhp() shouldn't use cgroup_lock() and
> shouldn't call rebuild_sched_domains().

Looks like Max forgot to test with lockdep enabled... 

Well, someone should test with lockdep when changing the online map.

Max, Paul, can we handle this in update_sched_domains() instead?

>  -> #1 (cgroup_mutex){--..}:
>        [<c015a435>] __lock_acquire+0xf45/0x1040
>        [<c015a5c8>] lock_acquire+0x98/0xd0
>        [<c05416d1>] mutex_lock_nested+0xb1/0x300
>        [<c0160e6f>] cgroup_lock+0xf/0x20			   cgroup_lock
>        [<c0164750>] cpuset_handle_cpuhp+0x20/0x180
>        [<c014ea77>] notifier_call_chain+0x37/0x70
>        [<c014eae9>] __raw_notifier_call_chain+0x19/0x20
>        [<c051f8c8>] _cpu_down+0x78/0x240			cpu_hotplug.lock
>        [<c051fabb>] cpu_down+0x2b/0x40				  cpu_add_remove_lock
>        [<c0520cd9>] store_online+0x39/0x80
>        [<c02f627b>] sysdev_store+0x2b/0x40
>        [<c01d3372>] sysfs_write_file+0xa2/0x100
>        [<c0195486>] vfs_write+0x96/0x130
>        [<c0195b4d>] sys_write+0x3d/0x70
>        [<c010831b>] sysenter_past_esp+0x78/0xd1
>        [<ffffffff>] 0xffffffff
> 
>  -> #0 (&cpu_hotplug.lock){--..}:
>        [<c0159fe5>] __lock_acquire+0xaf5/0x1040
>        [<c015a5c8>] lock_acquire+0x98/0xd0
>        [<c05416d1>] mutex_lock_nested+0xb1/0x300
>        [<c015efbc>] get_online_cpus+0x2c/0x40			      cpu_hotplug.lock
>        [<c0163e6d>] rebuild_sched_domains+0x7d/0x3a0
>        [<c01653a4>] cpuset_common_file_write+0x204/0x440	cgroup_lock
>        [<c0162bc7>] cgroup_file_write+0x67/0x130
>        [<c0195486>] vfs_write+0x96/0x130
>        [<c0195b4d>] sys_write+0x3d/0x70
>        [<c010831b>] sysenter_past_esp+0x78/0xd1
>        [<ffffffff>] 0xffffffff
> 
> 
> > Hi,
> >
> > I decided to see what cgroups is all about, and followed the instructions
> > in Documentation/cgroups.txt :-) It happened when I did this:
> >
> >    [root@damson /dev/cgroup/Vegard 0]
> >    # echo 1 > cpuset.cpus
> >
> > I can also provide the kernel config if necessary.
> >
> >
> > Vegard
> >
> >
> >  =======================================================
> >  [ INFO: possible circular locking dependency detected ]
> >  2.6.26-rc7 #25
> >  -------------------------------------------------------
> >  bash/10032 is trying to acquire lock:
> >  (&cpu_hotplug.lock){--..}, at: [<c015efbc>] get_online_cpus+0x2c/0x40
> >
> >  but task is already holding lock:
> >  (cgroup_mutex){--..}, at: [<c0160e6f>] cgroup_lock+0xf/0x20
> >
> >  which lock already depends on the new lock.



* Re: v2.6.26-rc7/cgroups: circular locking dependency
  2008-06-22 15:34 ` KOSAKI Motohiro
  2008-06-22 15:50   ` Peter Zijlstra
@ 2008-06-22 16:02   ` Cyrill Gorcunov
  1 sibling, 0 replies; 11+ messages in thread
From: Cyrill Gorcunov @ 2008-06-22 16:02 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Vegard Nossum, Paul Menage, containers, linux-kernel,
	Paul Jackson

[KOSAKI Motohiro - Mon, Jun 23, 2008 at 12:34:04AM +0900]
| CC'ed Paul Jackson
| 
| It seems to be a typical ABBA deadlock.
| I think cpuset uses cgroup_lock() by mistake.
| 
| IMHO, cpuset_handle_cpuhp() shouldn't use cgroup_lock() and
| shouldn't call rebuild_sched_domains().
| 
| 
|  -> #1 (cgroup_mutex){--..}:
|        [<c015a435>] __lock_acquire+0xf45/0x1040
|        [<c015a5c8>] lock_acquire+0x98/0xd0
|        [<c05416d1>] mutex_lock_nested+0xb1/0x300
|        [<c0160e6f>] cgroup_lock+0xf/0x20			   cgroup_lock
|        [<c0164750>] cpuset_handle_cpuhp+0x20/0x180
|        [<c014ea77>] notifier_call_chain+0x37/0x70
|        [<c014eae9>] __raw_notifier_call_chain+0x19/0x20
|        [<c051f8c8>] _cpu_down+0x78/0x240			cpu_hotplug.lock
|        [<c051fabb>] cpu_down+0x2b/0x40				  cpu_add_remove_lock
|        [<c0520cd9>] store_online+0x39/0x80
|        [<c02f627b>] sysdev_store+0x2b/0x40
|        [<c01d3372>] sysfs_write_file+0xa2/0x100
|        [<c0195486>] vfs_write+0x96/0x130
|        [<c0195b4d>] sys_write+0x3d/0x70
|        [<c010831b>] sysenter_past_esp+0x78/0xd1
|        [<ffffffff>] 0xffffffff
| 
|  -> #0 (&cpu_hotplug.lock){--..}:
|        [<c0159fe5>] __lock_acquire+0xaf5/0x1040
|        [<c015a5c8>] lock_acquire+0x98/0xd0
|        [<c05416d1>] mutex_lock_nested+0xb1/0x300
|        [<c015efbc>] get_online_cpus+0x2c/0x40			      cpu_hotplug.lock
|        [<c0163e6d>] rebuild_sched_domains+0x7d/0x3a0
|        [<c01653a4>] cpuset_common_file_write+0x204/0x440	cgroup_lock
|        [<c0162bc7>] cgroup_file_write+0x67/0x130
|        [<c0195486>] vfs_write+0x96/0x130
|        [<c0195b4d>] sys_write+0x3d/0x70
|        [<c010831b>] sysenter_past_esp+0x78/0xd1
|        [<ffffffff>] 0xffffffff
| 
| 
| > Hi,
| >
| > I decided to see what cgroups is all about, and followed the instructions
| > in Documentation/cgroups.txt :-) It happened when I did this:
| >
| >    [root@damson /dev/cgroup/Vegard 0]
| >    # echo 1 > cpuset.cpus
| >
| > I can also provide the kernel config if necessary.
| >
| >
| > Vegard
| >
| >
| >  =======================================================
| >  [ INFO: possible circular locking dependency detected ]
| >  2.6.26-rc7 #25
| >  -------------------------------------------------------
| >  bash/10032 is trying to acquire lock:
| >  (&cpu_hotplug.lock){--..}, at: [<c015efbc>] get_online_cpus+0x2c/0x40
| >
| >  but task is already holding lock:
| >  (cgroup_mutex){--..}, at: [<c0160e6f>] cgroup_lock+0xf/0x20
| >
| >  which lock already depends on the new lock.
| 

Thanks Kosaki!

		- Cyrill -


* Re: v2.6.26-rc7/cgroups: circular locking dependency
  2008-06-22 15:50   ` Peter Zijlstra
@ 2008-06-23 12:02     ` Paul Jackson
  2008-06-24  6:29     ` Max Krasnyansky
  1 sibling, 0 replies; 11+ messages in thread
From: Paul Jackson @ 2008-06-23 12:02 UTC (permalink / raw)
  To: Peter Zijlstra, Gautham R Shenoy
  Cc: kosaki.motohiro, vegard.nossum, menage, containers, linux-kernel,
	maxk

CC'd Gautham R Shenoy <ego@in.ibm.com>.

I believe that we had worked out the locking relation between what had been
cgroup_lock (a global cgroup lock which can be held over large stretches of
non-performance-critical code) and callback_mutex (a global cpuset-specific
lock which is held over shorter stretches of more performance-critical code,
though still not on really hot code paths). One can nest callback_mutex
inside cgroup_lock, but not vice versa.

The callback_mutex guarded some CPU masks and Node masks, which might
be multi-word and hence don't change atomically.  Any low level code
that needs to read these these cpuset CPU and Node masks, needs to
hold callback_mutex briefly, to keep that mask from changing while
being read.

There is even a comment in kernel/cpuset.c, explaining how an ABBA
deadlock must be avoided when calling rebuild_sched_domains():

/*
 * rebuild_sched_domains()
 *
 * ...
 *
 * Call with cgroup_mutex held.  May take callback_mutex during
 * call due to the kfifo_alloc() and kmalloc() calls.  May nest
 * a call to the get_online_cpus()/put_online_cpus() pair.
 * Must not be called holding callback_mutex, because we must not
 * call get_online_cpus() while holding callback_mutex.  Elsewhere
 * the kernel nests callback_mutex inside get_online_cpus() calls.
 * So the reverse nesting would risk an ABBA deadlock.

This went into the kernel sometime around 2.6.18.

Then in October and November of 2007, Gautham R Shenoy submitted
"Refcount Based Cpu Hotplug" (http://lkml.org/lkml/2007/11/15/239)

This added cpu_hotplug.lock, which at first glance seems to fit into
the locking hierarchy about where callback_mutex did before, such as
being invocable from rebuild_sched_domains().

However ... the kernel/cpuset.c comments were not updated to describe
the intended locking hierarchy as it relates to cpu_hotplug.lock, and
it looks as if cpu_hotplug.lock can also be taken while invoking the
hotplug callbacks, such as the one here that is handling a CPU down
event for cpusets.

Gautham ... you there?

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.940.382.4214


* Re: v2.6.26-rc7/cgroups: circular locking dependency
  2008-06-22 15:50   ` Peter Zijlstra
  2008-06-23 12:02     ` Paul Jackson
@ 2008-06-24  6:29     ` Max Krasnyansky
  2008-06-26  7:25       ` Paul Menage
  1 sibling, 1 reply; 11+ messages in thread
From: Max Krasnyansky @ 2008-06-24  6:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: KOSAKI Motohiro, Vegard Nossum, Paul Menage, containers,
	linux-kernel, Paul Jackson

Peter Zijlstra wrote:
> On Mon, 2008-06-23 at 00:34 +0900, KOSAKI Motohiro wrote:
>> CC'ed Paul Jackson
>>
>> It seems to be a typical ABBA deadlock.
>> I think cpuset uses cgroup_lock() by mistake.
>>
>> IMHO, cpuset_handle_cpuhp() shouldn't use cgroup_lock() and
>> shouldn't call rebuild_sched_domains().
> 
> Looks like Max forgot to test with lockdep enabled... 
Hmm, I don't think I actually changed any lock nesting/dependencies. Did I?
Oh, I see: rebuild_sched_domains() is now called from the cpuset hotplug handler.
I just looked at the comment for rebuild_sched_domains() and it says
" * Call with cgroup_mutex held. ..." -- that's why I thought it was safe, and it
worked on the test stations.

Anyway, we definitely need to make rebuild_sched_domains() work from the
hotplug handler.

> Well, someone should when you change the online map.
> 
> Max, Paul, can we handle this in update_sched_domains() instead?

That'd be exactly the same as calling rebuild_sched_domains() outside of the
cgroup_lock(). So I do not think it'll help. Paul has more info in his reply
so I'll reply to his email.

Max


* Re: v2.6.26-rc7/cgroups: circular locking dependency
  2008-06-24  6:29     ` Max Krasnyansky
@ 2008-06-26  7:25       ` Paul Menage
  2008-06-26 17:45         ` Max Krasnyansky
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Menage @ 2008-06-26  7:25 UTC (permalink / raw)
  To: Max Krasnyansky
  Cc: Peter Zijlstra, KOSAKI Motohiro, Vegard Nossum, containers,
	linux-kernel, Paul Jackson

On Mon, Jun 23, 2008 at 11:29 PM, Max Krasnyansky <maxk@qualcomm.com> wrote:
> Peter Zijlstra wrote:
>> On Mon, 2008-06-23 at 00:34 +0900, KOSAKI Motohiro wrote:
>>> CC'ed Paul Jackson
>>>
>>> It seems to be a typical ABBA deadlock.
>>> I think cpuset uses cgroup_lock() by mistake.
>>>
>>> IMHO, cpuset_handle_cpuhp() shouldn't use cgroup_lock() and
>>> shouldn't call rebuild_sched_domains().
>>
>> Looks like Max forgot to test with lockdep enabled...
> Hmm, I don't think I actually changed any lock nesting/dependencies. Did I?
> Oh, I see: rebuild_sched_domains() is now called from the cpuset hotplug handler.
> I just looked at the comment for rebuild_sched_domains() and it says
> " * Call with cgroup_mutex held. ..." -- that's why I thought it was safe, and it
> worked on the test stations.
>
> Anyway, we definitely need to make rebuild_sched_domains() work from the
> hotplug handler.

In that case the obvious solution would be to nest cgroup_lock() inside
cpu_hotplug.lock, i.e. require that update_sched_domains() be called
inside get_online_cpus(), and call get_online_cpus() prior to calling
cgroup_lock() in any code path that might call update_sched_domains().
That's basically:

cpuset_write_u64()
cpuset_write_s64()
cpuset_destroy()
common_cpu_hotplug_unplug()
cpuset_write_resmask()

i.e. almost all the cpuset userspace APIs. A bit ugly, but probably
not a big deal given how infrequently CPU hotplug/hotunplug occurs?

Probably simplest with a wrapper function such as:

static bool cpuset_lock_live_cgroup(struct cgroup *cgrp)
{
  get_online_cpus();
  if (cgroup_lock_live_group(cgrp))
    return true;
  put_online_cpus();
  return false;
}

static void cpuset_unlock(void)
{
  cgroup_unlock();
  put_online_cpus();
}

and use those in the relevant entry points in place of
cgroup_lock_live_group()/cgroup_unlock()

Oh, except that cpuset_destroy() is called firmly inside cgroup_mutex,
and hence can't nest the call to cgroup_lock() inside the call to
get_online_cpus().

Second idea - can we just punt the call to rebuild_sched_domains() to
a workqueue thread if it's due to a flag or cpumask change? Does it
matter if the call doesn't happen synchronously? The work handler
could easily nest the cgroup_lock() call inside get_online_cpus() and
then call rebuild_sched_domains()

Paul


* Re: v2.6.26-rc7/cgroups: circular locking dependency
  2008-06-26  7:25       ` Paul Menage
@ 2008-06-26 17:45         ` Max Krasnyansky
  0 siblings, 0 replies; 11+ messages in thread
From: Max Krasnyansky @ 2008-06-26 17:45 UTC (permalink / raw)
  To: Paul Menage
  Cc: Peter Zijlstra, KOSAKI Motohiro, Vegard Nossum, containers,
	linux-kernel, Paul Jackson

Paul Menage wrote:
> Second idea - can we just punt the call to rebuild_sched_domains() to
> a workqueue thread if it's due to a flag or cpumask change? Does it
> matter if the call doesn't happen synchronously? The work handler
> could easily nest the cgroup_lock() call inside get_online_cpus() and
> then call rebuild_sched_domains()

I was thinking about exactly the same thing. I kind of don't like the async
nature of it. Maybe it's OK, but there might be some interesting races
with async domain updates.

Max


end of thread, other threads:[~2008-06-26 17:45 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-06-21 17:38 v2.6.26-rc7/cgroups: circular locking dependency Vegard Nossum
2008-06-22  9:10 ` Cyrill Gorcunov
2008-06-22  9:42   ` Vegard Nossum
2008-06-22  9:50     ` Cyrill Gorcunov
2008-06-22 15:34 ` KOSAKI Motohiro
2008-06-22 15:50   ` Peter Zijlstra
2008-06-23 12:02     ` Paul Jackson
2008-06-24  6:29     ` Max Krasnyansky
2008-06-26  7:25       ` Paul Menage
2008-06-26 17:45         ` Max Krasnyansky
2008-06-22 16:02   ` Cyrill Gorcunov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox