* [PATCH 0/8] CPU-Hotplug: Fix CPU-Hotplug <--> cpufreq locking dependency
@ 2008-04-29 12:56 Gautham R Shenoy
2008-04-29 12:57 ` [PATCH 1/8] lockdep: fix recursive read lock validation Gautham R Shenoy
` (7 more replies)
0 siblings, 8 replies; 27+ messages in thread
From: Gautham R Shenoy @ 2008-04-29 12:56 UTC (permalink / raw)
To: linux-kernel, Zdenek Kabelac, Peter Zijlstra, Oleg Nesterov,
Heiko Carstens, Rafael J. Wysocki
Cc: Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
Hello everyone,
A couple of months ago, Zdenek Kabelac reported a circular locking
dependency during resume, caused by the locking dependency
between the cpufreq and CPU-Hotplug subsystems.
http://lkml.org/lkml/2008/2/26/479
This patchset aggregates the various patches that have been posted
recently to solve this problem.
There are a few more TODO items in CPU-Hotplug, unrelated to
this problem, that cropped up in the various discussions.
They will be posted in a separate thread.
The patch stack has been lightly tested on a 4-way x86 machine, with
CPU-Hotplug operations running in parallel with kernbench while the
cpufreq governors were being changed simultaneously.
Awaiting your feedback,
gautham
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 1/8] lockdep: fix recursive read lock validation
2008-04-29 12:56 [PATCH 0/8] CPU-Hotplug: Fix CPU-Hotplug <--> cpufreq locking dependency Gautham R Shenoy
@ 2008-04-29 12:57 ` Gautham R Shenoy
2008-04-29 13:16 ` Bart Van Assche
2008-04-29 12:58 ` [PATCH 2/8] lockdep: reader-in-writer recursion Gautham R Shenoy
` (6 subsequent siblings)
7 siblings, 1 reply; 27+ messages in thread
From: Gautham R Shenoy @ 2008-04-29 12:57 UTC (permalink / raw)
To: linux-kernel, Zdenek Kabelac, Peter Zijlstra, Oleg Nesterov,
Heiko Carstens, Rafael J. Wysocki
Cc: Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
Subject: lockdep: fix recursive read lock validation
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
__lock_acquire( .read = 2 )
  hlock->read = read; /* [1] */
  validate_chain()
    ret = check_deadlock(); /* returns 2 when recursive */
    if (ret == 2)
      hlock->read = 2; /* but it was already 2 from [1] */
    check_prevs_add()
      if (hlock->read != 2)
        /* add to dependency chain */
So it will never add a recursive read lock to the dependency chain. Fix this
by setting hlock->read to 1 when it is the first recursive lock instance.
This means that the following sequence is now invalid, whereas previously
it was considered valid:
rlock(a); rlock(b); runlock(b); runlock(a)
rlock(b); rlock(a);
It really is invalid when considered against write locks.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
kernel/lockdep.c | 9 ++++-----
lib/locking-selftest.c | 12 ++++++------
2 files changed, 10 insertions(+), 11 deletions(-)
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 81a4e4a..94b0f4f 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -1556,12 +1556,11 @@ static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
if (!ret)
return 0;
/*
- * Mark recursive read, as we jump over it when
- * building dependencies (just like we jump over
- * trylock entries):
+ * If we are the first recursive read, don't jump over our
+ * dependency.
*/
- if (ret == 2)
- hlock->read = 2;
+ if (hlock->read == 2 && ret != 2)
+ hlock->read = 1;
/*
* Add dependency only if this lock is not the head
* of the chain, and if it's not a secondary read-lock:
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 280332c..c84a689 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -1135,12 +1135,12 @@ void locking_selftest(void)
debug_locks_silent = !debug_locks_verbose;
DO_TESTCASE_6R("A-A deadlock", AA);
- DO_TESTCASE_6R("A-B-B-A deadlock", ABBA);
- DO_TESTCASE_6R("A-B-B-C-C-A deadlock", ABBCCA);
- DO_TESTCASE_6R("A-B-C-A-B-C deadlock", ABCABC);
- DO_TESTCASE_6R("A-B-B-C-C-D-D-A deadlock", ABBCCDDA);
- DO_TESTCASE_6R("A-B-C-D-B-D-D-A deadlock", ABCDBDDA);
- DO_TESTCASE_6R("A-B-C-D-B-C-D-A deadlock", ABCDBCDA);
+ DO_TESTCASE_6("A-B-B-A deadlock", ABBA);
+ DO_TESTCASE_6("A-B-B-C-C-A deadlock", ABBCCA);
+ DO_TESTCASE_6("A-B-C-A-B-C deadlock", ABCABC);
+ DO_TESTCASE_6("A-B-B-C-C-D-D-A deadlock", ABBCCDDA);
+ DO_TESTCASE_6("A-B-C-D-B-D-D-A deadlock", ABCDBDDA);
+ DO_TESTCASE_6("A-B-C-D-B-C-D-A deadlock", ABCDBCDA);
DO_TESTCASE_6("double unlock", double_unlock);
DO_TESTCASE_6("initialize held", init_held);
DO_TESTCASE_6_SUCCESS("bad unlock order", bad_unlock_order);
--
Thanks and Regards
gautham
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 2/8] lockdep: reader-in-writer recursion
2008-04-29 12:56 [PATCH 0/8] CPU-Hotplug: Fix CPU-Hotplug <--> cpufreq locking dependency Gautham R Shenoy
2008-04-29 12:57 ` [PATCH 1/8] lockdep: fix recursive read lock validation Gautham R Shenoy
@ 2008-04-29 12:58 ` Gautham R Shenoy
2008-04-29 13:00 ` [PATCH 3/8] lockdep: fix fib_hash softirq inversion Gautham R Shenoy
` (5 subsequent siblings)
7 siblings, 0 replies; 27+ messages in thread
From: Gautham R Shenoy @ 2008-04-29 12:58 UTC (permalink / raw)
To: linux-kernel, Zdenek Kabelac, Peter Zijlstra, Oleg Nesterov,
Heiko Carstens, Rafael J. Wysocki
Cc: Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
Subject: lockdep: reader-in-writer recursion
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Create a read mode (read == 3) that allows for reader-in-writer recursion,
i.e. taking the read side of a lock whose write side is already held.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
include/linux/lockdep.h | 1 +
kernel/lockdep.c | 9 ++++++++-
2 files changed, 9 insertions(+), 1 deletions(-)
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 4c4d236..36e254f 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -291,6 +291,7 @@ extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
* 0: exclusive (write) acquire
* 1: read-acquire (no recursion allowed)
* 2: read-acquire with same-instance recursion allowed
+ * 3: 2 + reader in writer recursion
*
* Values for check:
*
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 94b0f4f..3859259 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -1280,6 +1280,13 @@ check_deadlock(struct task_struct *curr, struct held_lock *next,
*/
if ((read == 2) && prev->read)
return 2;
+ /*
+ * Allow read-after-write recursion of the same
+ * lock class (i.e. write_lock(lock)+read_lock(lock)):
+ */
+ if (read == 3)
+ return 2;
+
return print_deadlock_bug(curr, prev, next);
}
return 1;
@@ -1559,7 +1566,7 @@ static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
* If we are the first recursive read, don't jump over our
* dependency.
*/
- if (hlock->read == 2 && ret != 2)
+ if (hlock->read >= 2 && ret != 2)
hlock->read = 1;
/*
* Add dependency only if this lock is not the head
--
Thanks and Regards
gautham
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 3/8] lockdep: fix fib_hash softirq inversion
2008-04-29 12:56 [PATCH 0/8] CPU-Hotplug: Fix CPU-Hotplug <--> cpufreq locking dependency Gautham R Shenoy
2008-04-29 12:57 ` [PATCH 1/8] lockdep: fix recursive read lock validation Gautham R Shenoy
2008-04-29 12:58 ` [PATCH 2/8] lockdep: reader-in-writer recursion Gautham R Shenoy
@ 2008-04-29 13:00 ` Gautham R Shenoy
2008-04-29 14:45 ` Peter Zijlstra
2008-04-29 13:01 ` [PATCH 4/8] net: af_netlink: deadlock Gautham R Shenoy
` (4 subsequent siblings)
7 siblings, 1 reply; 27+ messages in thread
From: Gautham R Shenoy @ 2008-04-29 13:00 UTC (permalink / raw)
To: linux-kernel, Zdenek Kabelac, Peter Zijlstra, Oleg Nesterov,
Heiko Carstens, Rafael J. Wysocki
Cc: Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri, Eric Dumazet,
David S. Miller
Subject: lockdep: fix fib_hash softirq inversion
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.25-rc4-sched-devel.git #56
---------------------------------------------------------
swapper/0 just changed the state of lock:
(&rt_hash_locks[i]){-+..}, at: [<ffffffff804721ac>] rt_intern_hash+0x8c/0x3b0
but this lock took another, soft-read-irq-unsafe lock in the past:
(fib_hash_lock){-.-?}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
1 lock held by swapper/0:
#0: (rcu_read_lock){..--}, at: [<ffffffff8045a706>] netif_receive_skb+0x76/0x350
the first lock's dependencies:
-> (&rt_hash_locks[i]){-+..} ops: 0 {
initial-use at:
[<ffffffff80275900>] __lock_acquire+0x150/0x1080
[<ffffffff80276895>] lock_acquire+0x65/0x90
[<ffffffff804e99a6>] _spin_lock_bh+0x36/0x70
[<ffffffff804721ac>] rt_intern_hash+0x8c/0x3b0
[<ffffffff80472acd>] __ip_route_output_key+0x5fd/0xb10
[<ffffffff80472ffc>] ip_route_output_flow+0x1c/0x70
[<ffffffff8049b49d>] udp_sendmsg+0x5dd/0x760
[<ffffffff804a2235>] inet_sendmsg+0x45/0x80
[<ffffffff8044d677>] sock_sendmsg+0x127/0x140
[<ffffffff8044e43a>] sys_sendto+0xea/0x120
[<ffffffff8022044b>] system_call_after_swapgs+0x7b/0x80
[<ffffffffffffffff>] 0xffffffffffffffff
in-softirq-W at:
[<ffffffffffffffff>] 0xffffffffffffffff
hardirq-on-W at:
[<ffffffff80275e72>] __lock_acquire+0x6c2/0x1080
[<ffffffff80276895>] lock_acquire+0x65/0x90
[<ffffffff804e99a6>] _spin_lock_bh+0x36/0x70
[<ffffffff804721ac>] rt_intern_hash+0x8c/0x3b0
[<ffffffff80472acd>] __ip_route_output_key+0x5fd/0xb10
[<ffffffff80472ffc>] ip_route_output_flow+0x1c/0x70
[<ffffffff8049b49d>] udp_sendmsg+0x5dd/0x760
[<ffffffff804a2235>] inet_sendmsg+0x45/0x80
[<ffffffff8044d677>] sock_sendmsg+0x127/0x140
[<ffffffff8044e43a>] sys_sendto+0xea/0x120
[<ffffffff8022044b>] system_call_after_swapgs+0x7b/0x80
[<ffffffffffffffff>] 0xffffffffffffffff
}
... key at: [<ffffffff80b96b14>] __key.37724+0x0/0x8
-> (&tbl->lock){-+-+} ops: 0 {
initial-use at:
[<ffffffffffffffff>] 0xffffffffffffffff
in-softirq-W at:
[<ffffffffffffffff>] 0xffffffffffffffff
hardirq-on-W at:
[<ffffffffffffffff>] 0xffffffffffffffff
in-softirq-R at:
[<ffffffffffffffff>] 0xffffffffffffffff
hardirq-on-R at:
[<ffffffff80275bf0>] __lock_acquire+0x440/0x1080
[<ffffffff80276895>] lock_acquire+0x65/0x90
[<ffffffff804e9b59>] _read_lock_bh+0x39/0x70
[<ffffffff80461a46>] neigh_lookup+0x76/0x150
[<ffffffff8049c5a2>] arp_bind_neighbour+0x72/0xb0
[<ffffffff8047223a>] rt_intern_hash+0x11a/0x3b0
[<ffffffff80472acd>] __ip_route_output_key+0x5fd/0xb10
[<ffffffff80472ffc>] ip_route_output_flow+0x1c/0x70
[<ffffffff8049b49d>] udp_sendmsg+0x5dd/0x760
[<ffffffff804a2235>] inet_sendmsg+0x45/0x80
[<ffffffff8044d677>] sock_sendmsg+0x127/0x140
[<ffffffff8044e43a>] sys_sendto+0xea/0x120
[<ffffffff8022044b>] system_call_after_swapgs+0x7b/0x80
[<ffffffffffffffff>] 0xffffffffffffffff
}
... key at: [<ffffffff80b967f8>] __key.29558+0x0/0x8
-> (base_lock_keys + cpu){++..} ops: 0 {
initial-use at:
[<ffffffff80275900>] __lock_acquire+0x150/0x1080
[<ffffffff80276895>] lock_acquire+0x65/0x90
[<ffffffff804e9d27>] _spin_lock_irqsave+0x47/0x80
[<ffffffff8025a946>] lock_timer_base+0x36/0x70
[<ffffffff8025aaab>] __mod_timer+0x3b/0xf0
[<ffffffff8025adbe>] mod_timer+0x2e/0x50
[<ffffffff806ccc6b>] con_init+0x26b/0x290
[<ffffffff806cc0f2>] console_init+0x32/0x50
[<ffffffff806abbc0>] start_kernel+0x1e0/0x330
[<ffffffff806ab1b2>] _sinittext+0x1b2/0x200
[<ffffffffffffffff>] 0xffffffffffffffff
in-hardirq-W at:
[<ffffffffffffffff>] 0xffffffffffffffff
in-softirq-W at:
[<ffffffffffffffff>] 0xffffffffffffffff
}
... key at: [<ffffffff807a7400>] base_lock_keys+0x0/0x40
... acquired at:
[<ffffffffffffffff>] 0xffffffffffffffff
... acquired at:
[<ffffffff80276734>] __lock_acquire+0xf84/0x1080
[<ffffffff80276895>] lock_acquire+0x65/0x90
[<ffffffff804e9b59>] _read_lock_bh+0x39/0x70
[<ffffffff80461a46>] neigh_lookup+0x76/0x150
[<ffffffff8049c5a2>] arp_bind_neighbour+0x72/0xb0
[<ffffffff8047223a>] rt_intern_hash+0x11a/0x3b0
[<ffffffff80472acd>] __ip_route_output_key+0x5fd/0xb10
[<ffffffff80472ffc>] ip_route_output_flow+0x1c/0x70
[<ffffffff8049b49d>] udp_sendmsg+0x5dd/0x760
[<ffffffff804a2235>] inet_sendmsg+0x45/0x80
[<ffffffff8044d677>] sock_sendmsg+0x127/0x140
[<ffffffff8044e43a>] sys_sendto+0xea/0x120
[<ffffffff8022044b>] system_call_after_swapgs+0x7b/0x80
[<ffffffffffffffff>] 0xffffffffffffffff
-> (fib_hash_lock){-.-?} ops: 0 {
initial-use at:
[<ffffffff80275900>] __lock_acquire+0x150/0x1080
[<ffffffff80276895>] lock_acquire+0x65/0x90
[<ffffffff804e9a76>] _write_lock_bh+0x36/0x70
[<ffffffff804ad443>] fn_hash_insert+0x5a3/0x720
[<ffffffff804a88ef>] fib_magic+0x10f/0x120
[<ffffffff804a8973>] fib_add_ifaddr+0x73/0x170
[<ffffffff804a8b7b>] fib_inetaddr_event+0x4b/0x280
[<ffffffff804ed193>] notifier_call_chain+0x53/0x80
[<ffffffff8026b998>] __blocking_notifier_call_chain+0x58/0x80
[<ffffffff8026b9d1>] blocking_notifier_call_chain+0x11/0x20
[<ffffffff8049fa44>] __inet_insert_ifa+0xd4/0x170
[<ffffffff8049faed>] inet_insert_ifa+0xd/0x10
[<ffffffff804a0eff>] inetdev_event+0x3df/0x480
[<ffffffff804ed193>] notifier_call_chain+0x53/0x80
[<ffffffff8026b739>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff8026b751>] raw_notifier_call_chain+0x11/0x20
[<ffffffff8045b3a6>] call_netdevice_notifiers+0x16/0x20
[<ffffffff8045ce52>] dev_open+0x82/0x90
[<ffffffff8045b529>] dev_change_flags+0x99/0x1b0
[<ffffffff804a171e>] devinet_ioctl+0x5ce/0x780
[<ffffffff804a205c>] inet_ioctl+0x5c/0x80
[<ffffffff8044c8b5>] sock_ioctl+0x65/0x250
[<ffffffff802d5241>] vfs_ioctl+0x31/0x90
[<ffffffff802d5313>] do_vfs_ioctl+0x73/0x2d0
[<ffffffff802d55f2>] sys_ioctl+0x82/0xa0
[<ffffffff8022044b>] system_call_after_swapgs+0x7b/0x80
[<ffffffffffffffff>] 0xffffffffffffffff
hardirq-on-W at:
[<ffffffff80275e72>] __lock_acquire+0x6c2/0x1080
[<ffffffff80276895>] lock_acquire+0x65/0x90
[<ffffffff804e9a76>] _write_lock_bh+0x36/0x70
[<ffffffff804ad443>] fn_hash_insert+0x5a3/0x720
[<ffffffff804a88ef>] fib_magic+0x10f/0x120
[<ffffffff804a8973>] fib_add_ifaddr+0x73/0x170
[<ffffffff804a8b7b>] fib_inetaddr_event+0x4b/0x280
[<ffffffff804ed193>] notifier_call_chain+0x53/0x80
[<ffffffff8026b998>] __blocking_notifier_call_chain+0x58/0x80
[<ffffffff8026b9d1>] blocking_notifier_call_chain+0x11/0x20
[<ffffffff8049fa44>] __inet_insert_ifa+0xd4/0x170
[<ffffffff8049faed>] inet_insert_ifa+0xd/0x10
[<ffffffff804a0eff>] inetdev_event+0x3df/0x480
[<ffffffff804ed193>] notifier_call_chain+0x53/0x80
[<ffffffff8026b739>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff8026b751>] raw_notifier_call_chain+0x11/0x20
[<ffffffff8045b3a6>] call_netdevice_notifiers+0x16/0x20
[<ffffffff8045ce52>] dev_open+0x82/0x90
[<ffffffff8045b529>] dev_change_flags+0x99/0x1b0
[<ffffffff804a171e>] devinet_ioctl+0x5ce/0x780
[<ffffffff804a205c>] inet_ioctl+0x5c/0x80
[<ffffffff8044c8b5>] sock_ioctl+0x65/0x250
[<ffffffff802d5241>] vfs_ioctl+0x31/0x90
[<ffffffff802d5313>] do_vfs_ioctl+0x73/0x2d0
[<ffffffff802d55f2>] sys_ioctl+0x82/0xa0
[<ffffffff8022044b>] system_call_after_swapgs+0x7b/0x80
[<ffffffffffffffff>] 0xffffffffffffffff
in-softirq-R at:
[<ffffffffffffffff>] 0xffffffffffffffff
softirq-on-R at:
[<ffffffff80275c1b>] __lock_acquire+0x46b/0x1080
[<ffffffff80276895>] lock_acquire+0x65/0x90
[<ffffffff804e9bc4>] _read_lock+0x34/0x70
[<ffffffff804acdd7>] fn_hash_lookup+0x27/0xf0
[<ffffffff804a855d>] inet_addr_type+0x6d/0xf0
[<ffffffff804aada2>] fib_create_info+0x5a2/0xbf0
[<ffffffff804acf0b>] fn_hash_insert+0x6b/0x720
[<ffffffff804a88ef>] fib_magic+0x10f/0x120
[<ffffffff804a89c8>] fib_add_ifaddr+0xc8/0x170
[<ffffffff804a8b7b>] fib_inetaddr_event+0x4b/0x280
[<ffffffff804ed193>] notifier_call_chain+0x53/0x80
[<ffffffff8026b998>] __blocking_notifier_call_chain+0x58/0x80
[<ffffffff8026b9d1>] blocking_notifier_call_chain+0x11/0x20
[<ffffffff8049fa44>] __inet_insert_ifa+0xd4/0x170
[<ffffffff8049faed>] inet_insert_ifa+0xd/0x10
[<ffffffff804a0eff>] inetdev_event+0x3df/0x480
[<ffffffff804ed193>] notifier_call_chain+0x53/0x80
[<ffffffff8026b739>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff8026b751>] raw_notifier_call_chain+0x11/0x20
[<ffffffff8045b3a6>] call_netdevice_notifiers+0x16/0x20
[<ffffffff8045ce52>] dev_open+0x82/0x90
[<ffffffff8045b529>] dev_change_flags+0x99/0x1b0
[<ffffffff804a171e>] devinet_ioctl+0x5ce/0x780
[<ffffffff804a205c>] inet_ioctl+0x5c/0x80
[<ffffffff8044c8b5>] sock_ioctl+0x65/0x250
[<ffffffff802d5241>] vfs_ioctl+0x31/0x90
[<ffffffff802d5313>] do_vfs_ioctl+0x73/0x2d0
[<ffffffff802d55f2>] sys_ioctl+0x82/0xa0
[<ffffffff8022044b>] system_call_after_swapgs+0x7b/0x80
[<ffffffffffffffff>] 0xffffffffffffffff
hardirq-on-R at:
[<ffffffff80275bf0>] __lock_acquire+0x440/0x1080
[<ffffffff80276895>] lock_acquire+0x65/0x90
[<ffffffff804e9bc4>] _read_lock+0x34/0x70
[<ffffffff804acdd7>] fn_hash_lookup+0x27/0xf0
[<ffffffff804a855d>] inet_addr_type+0x6d/0xf0
[<ffffffff804aada2>] fib_create_info+0x5a2/0xbf0
[<ffffffff804acf0b>] fn_hash_insert+0x6b/0x720
[<ffffffff804a88ef>] fib_magic+0x10f/0x120
[<ffffffff804a89c8>] fib_add_ifaddr+0xc8/0x170
[<ffffffff804a8b7b>] fib_inetaddr_event+0x4b/0x280
[<ffffffff804ed193>] notifier_call_chain+0x53/0x80
[<ffffffff8026b998>] __blocking_notifier_call_chain+0x58/0x80
[<ffffffff8026b9d1>] blocking_notifier_call_chain+0x11/0x20
[<ffffffff8049fa44>] __inet_insert_ifa+0xd4/0x170
[<ffffffff8049faed>] inet_insert_ifa+0xd/0x10
[<ffffffff804a0eff>] inetdev_event+0x3df/0x480
[<ffffffff804ed193>] notifier_call_chain+0x53/0x80
[<ffffffff8026b739>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff8026b751>] raw_notifier_call_chain+0x11/0x20
[<ffffffff8045b3a6>] call_netdevice_notifiers+0x16/0x20
[<ffffffff8045ce52>] dev_open+0x82/0x90
[<ffffffff8045b529>] dev_change_flags+0x99/0x1b0
[<ffffffff804a171e>] devinet_ioctl+0x5ce/0x780
[<ffffffff804a205c>] inet_ioctl+0x5c/0x80
[<ffffffff8044c8b5>] sock_ioctl+0x65/0x250
[<ffffffff802d5241>] vfs_ioctl+0x31/0x90
[<ffffffff802d5313>] do_vfs_ioctl+0x73/0x2d0
[<ffffffff802d55f2>] sys_ioctl+0x82/0xa0
[<ffffffff8022044b>] system_call_after_swapgs+0x7b/0x80
[<ffffffffffffffff>] 0xffffffffffffffff
}
... key at: [<ffffffff806799b8>] fib_hash_lock+0x18/0x40
... acquired at:
[<ffffffff80276734>] __lock_acquire+0xf84/0x1080
[<ffffffff80276895>] lock_acquire+0x65/0x90
[<ffffffff804e9bc4>] _read_lock+0x34/0x70
[<ffffffff804acdd7>] fn_hash_lookup+0x27/0xf0
[<ffffffff804a855d>] inet_addr_type+0x6d/0xf0
[<ffffffff8049dcc6>] arp_constructor+0x86/0x280
[<ffffffff8046272f>] neigh_create+0x19f/0x5b0
[<ffffffff8049c5cf>] arp_bind_neighbour+0x9f/0xb0
[<ffffffff8047223a>] rt_intern_hash+0x11a/0x3b0
[<ffffffff80472acd>] __ip_route_output_key+0x5fd/0xb10
[<ffffffff804947ec>] tcp_v4_connect+0x10c/0x550
[<ffffffff804a343a>] inet_stream_connect+0x23a/0x2d0
[<ffffffff8044e530>] sys_connect+0xa0/0xc0
[<ffffffff8022044b>] system_call_after_swapgs+0x7b/0x80
[<ffffffffffffffff>] 0xffffffffffffffff
the second lock's dependencies:
-> (fib_hash_lock){-.-?} ops: 0 {
initial-use at:
[<ffffffff80275900>] __lock_acquire+0x150/0x1080
[<ffffffff80276895>] lock_acquire+0x65/0x90
[<ffffffff804e9a76>] _write_lock_bh+0x36/0x70
[<ffffffff804ad443>] fn_hash_insert+0x5a3/0x720
[<ffffffff804a88ef>] fib_magic+0x10f/0x120
[<ffffffff804a8973>] fib_add_ifaddr+0x73/0x170
[<ffffffff804a8b7b>] fib_inetaddr_event+0x4b/0x280
[<ffffffff804ed193>] notifier_call_chain+0x53/0x80
[<ffffffff8026b998>] __blocking_notifier_call_chain+0x58/0x80
[<ffffffff8026b9d1>] blocking_notifier_call_chain+0x11/0x20
[<ffffffff8049fa44>] __inet_insert_ifa+0xd4/0x170
[<ffffffff8049faed>] inet_insert_ifa+0xd/0x10
[<ffffffff804a0eff>] inetdev_event+0x3df/0x480
[<ffffffff804ed193>] notifier_call_chain+0x53/0x80
[<ffffffff8026b739>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff8026b751>] raw_notifier_call_chain+0x11/0x20
[<ffffffff8045b3a6>] call_netdevice_notifiers+0x16/0x20
[<ffffffff8045ce52>] dev_open+0x82/0x90
[<ffffffff8045b529>] dev_change_flags+0x99/0x1b0
[<ffffffff804a171e>] devinet_ioctl+0x5ce/0x780
[<ffffffff804a205c>] inet_ioctl+0x5c/0x80
[<ffffffff8044c8b5>] sock_ioctl+0x65/0x250
[<ffffffff802d5241>] vfs_ioctl+0x31/0x90
[<ffffffff802d5313>] do_vfs_ioctl+0x73/0x2d0
[<ffffffff802d55f2>] sys_ioctl+0x82/0xa0
[<ffffffff8022044b>] system_call_after_swapgs+0x7b/0x80
[<ffffffffffffffff>] 0xffffffffffffffff
hardirq-on-W at:
[<ffffffff80275e72>] __lock_acquire+0x6c2/0x1080
[<ffffffff80276895>] lock_acquire+0x65/0x90
[<ffffffff804e9a76>] _write_lock_bh+0x36/0x70
[<ffffffff804ad443>] fn_hash_insert+0x5a3/0x720
[<ffffffff804a88ef>] fib_magic+0x10f/0x120
[<ffffffff804a8973>] fib_add_ifaddr+0x73/0x170
[<ffffffff804a8b7b>] fib_inetaddr_event+0x4b/0x280
[<ffffffff804ed193>] notifier_call_chain+0x53/0x80
[<ffffffff8026b998>] __blocking_notifier_call_chain+0x58/0x80
[<ffffffff8026b9d1>] blocking_notifier_call_chain+0x11/0x20
[<ffffffff8049fa44>] __inet_insert_ifa+0xd4/0x170
[<ffffffff8049faed>] inet_insert_ifa+0xd/0x10
[<ffffffff804a0eff>] inetdev_event+0x3df/0x480
[<ffffffff804ed193>] notifier_call_chain+0x53/0x80
[<ffffffff8026b739>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff8026b751>] raw_notifier_call_chain+0x11/0x20
[<ffffffff8045b3a6>] call_netdevice_notifiers+0x16/0x20
[<ffffffff8045ce52>] dev_open+0x82/0x90
[<ffffffff8045b529>] dev_change_flags+0x99/0x1b0
[<ffffffff804a171e>] devinet_ioctl+0x5ce/0x780
[<ffffffff804a205c>] inet_ioctl+0x5c/0x80
[<ffffffff8044c8b5>] sock_ioctl+0x65/0x250
[<ffffffff802d5241>] vfs_ioctl+0x31/0x90
[<ffffffff802d5313>] do_vfs_ioctl+0x73/0x2d0
[<ffffffff802d55f2>] sys_ioctl+0x82/0xa0
[<ffffffff8022044b>] system_call_after_swapgs+0x7b/0x80
[<ffffffffffffffff>] 0xffffffffffffffff
in-softirq-R at:
[<ffffffffffffffff>] 0xffffffffffffffff
softirq-on-R at:
[<ffffffff80275c1b>] __lock_acquire+0x46b/0x1080
[<ffffffff80276895>] lock_acquire+0x65/0x90
[<ffffffff804e9bc4>] _read_lock+0x34/0x70
[<ffffffff804acdd7>] fn_hash_lookup+0x27/0xf0
[<ffffffff804a855d>] inet_addr_type+0x6d/0xf0
[<ffffffff804aada2>] fib_create_info+0x5a2/0xbf0
[<ffffffff804acf0b>] fn_hash_insert+0x6b/0x720
[<ffffffff804a88ef>] fib_magic+0x10f/0x120
[<ffffffff804a89c8>] fib_add_ifaddr+0xc8/0x170
[<ffffffff804a8b7b>] fib_inetaddr_event+0x4b/0x280
[<ffffffff804ed193>] notifier_call_chain+0x53/0x80
[<ffffffff8026b998>] __blocking_notifier_call_chain+0x58/0x80
[<ffffffff8026b9d1>] blocking_notifier_call_chain+0x11/0x20
[<ffffffff8049fa44>] __inet_insert_ifa+0xd4/0x170
[<ffffffff8049faed>] inet_insert_ifa+0xd/0x10
[<ffffffff804a0eff>] inetdev_event+0x3df/0x480
[<ffffffff804ed193>] notifier_call_chain+0x53/0x80
[<ffffffff8026b739>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff8026b751>] raw_notifier_call_chain+0x11/0x20
[<ffffffff8045b3a6>] call_netdevice_notifiers+0x16/0x20
[<ffffffff8045ce52>] dev_open+0x82/0x90
[<ffffffff8045b529>] dev_change_flags+0x99/0x1b0
[<ffffffff804a171e>] devinet_ioctl+0x5ce/0x780
[<ffffffff804a205c>] inet_ioctl+0x5c/0x80
[<ffffffff8044c8b5>] sock_ioctl+0x65/0x250
[<ffffffff802d5241>] vfs_ioctl+0x31/0x90
[<ffffffff802d5313>] do_vfs_ioctl+0x73/0x2d0
[<ffffffff802d55f2>] sys_ioctl+0x82/0xa0
[<ffffffff8022044b>] system_call_after_swapgs+0x7b/0x80
[<ffffffffffffffff>] 0xffffffffffffffff
hardirq-on-R at:
[<ffffffff80275bf0>] __lock_acquire+0x440/0x1080
[<ffffffff80276895>] lock_acquire+0x65/0x90
[<ffffffff804e9bc4>] _read_lock+0x34/0x70
[<ffffffff804acdd7>] fn_hash_lookup+0x27/0xf0
[<ffffffff804a855d>] inet_addr_type+0x6d/0xf0
[<ffffffff804aada2>] fib_create_info+0x5a2/0xbf0
[<ffffffff804acf0b>] fn_hash_insert+0x6b/0x720
[<ffffffff804a88ef>] fib_magic+0x10f/0x120
[<ffffffff804a89c8>] fib_add_ifaddr+0xc8/0x170
[<ffffffff804a8b7b>] fib_inetaddr_event+0x4b/0x280
[<ffffffff804ed193>] notifier_call_chain+0x53/0x80
[<ffffffff8026b998>] __blocking_notifier_call_chain+0x58/0x80
[<ffffffff8026b9d1>] blocking_notifier_call_chain+0x11/0x20
[<ffffffff8049fa44>] __inet_insert_ifa+0xd4/0x170
[<ffffffff8049faed>] inet_insert_ifa+0xd/0x10
[<ffffffff804a0eff>] inetdev_event+0x3df/0x480
[<ffffffff804ed193>] notifier_call_chain+0x53/0x80
[<ffffffff8026b739>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff8026b751>] raw_notifier_call_chain+0x11/0x20
[<ffffffff8045b3a6>] call_netdevice_notifiers+0x16/0x20
[<ffffffff8045ce52>] dev_open+0x82/0x90
[<ffffffff8045b529>] dev_change_flags+0x99/0x1b0
[<ffffffff804a171e>] devinet_ioctl+0x5ce/0x780
[<ffffffff804a205c>] inet_ioctl+0x5c/0x80
[<ffffffff8044c8b5>] sock_ioctl+0x65/0x250
[<ffffffff802d5241>] vfs_ioctl+0x31/0x90
[<ffffffff802d5313>] do_vfs_ioctl+0x73/0x2d0
[<ffffffff802d55f2>] sys_ioctl+0x82/0xa0
[<ffffffff8022044b>] system_call_after_swapgs+0x7b/0x80
[<ffffffffffffffff>] 0xffffffffffffffff
}
... key at: [<ffffffff806799b8>] fib_hash_lock+0x18/0x40
stack backtrace:
Pid: 0, comm: swapper Not tainted 2.6.25-rc4-sched-devel.git #56
Call Trace:
<IRQ> [<ffffffff80273a4c>] print_irq_inversion_bug+0x13c/0x160
[<ffffffff80274a9d>] check_usage_forwards+0x4d/0x60
[<ffffffff80274c46>] mark_lock+0x196/0x5e0
[<ffffffff80275d15>] __lock_acquire+0x565/0x1080
[<ffffffff80276895>] lock_acquire+0x65/0x90
[<ffffffff804721ac>] ? rt_intern_hash+0x8c/0x3b0
[<ffffffff804e99a6>] _spin_lock_bh+0x36/0x70
[<ffffffff804721ac>] rt_intern_hash+0x8c/0x3b0
[<ffffffff80473a4e>] ip_route_input+0x8fe/0x1250
[<ffffffff8027530d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff8027530d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff804769a8>] ip_rcv+0x518/0x650
[<ffffffff8045a8f9>] netif_receive_skb+0x269/0x350
[<ffffffff80405529>] e1000_clean_rx_irq+0x1a9/0x5a0
[<ffffffff8040072b>] e1000_clean+0x1fb/0x5b0
[<ffffffff8027268d>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff8045d18f>] net_rx_action+0xbf/0x180
[<ffffffff80255c55>] __do_softirq+0x75/0x100
[<ffffffff8022173c>] call_softirq+0x1c/0x30
[<ffffffff80223505>] do_softirq+0x65/0xa0
[<ffffffff80255b67>] irq_exit+0x97/0xa0
[<ffffffff80223618>] do_IRQ+0xa8/0x130
[<ffffffff80220a06>] ret_from_intr+0x0/0xf
<EOI> [<ffffffff8021ef83>] ? default_idle+0x43/0x80
[<ffffffff8021ef81>] ? default_idle+0x41/0x80
[<ffffffff8021ef40>] ? default_idle+0x0/0x80
[<ffffffff8021f02d>] ? cpu_idle+0x6d/0x120
[<ffffffff804e3edb>] ? start_secondary+0x31b/0x420
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Eric Dumazet <dada1@cosmosbay.com>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
net/ipv4/fib_hash.c | 18 +++++++++---------
1 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c
index 8d58d85..9b48b18 100644
--- a/net/ipv4/fib_hash.c
+++ b/net/ipv4/fib_hash.c
@@ -251,7 +251,7 @@ fn_hash_lookup(struct fib_table *tb, const struct flowi *flp, struct fib_result
struct fn_zone *fz;
struct fn_hash *t = (struct fn_hash*)tb->tb_data;
- read_lock(&fib_hash_lock);
+ read_lock_bh(&fib_hash_lock);
for (fz = t->fn_zone_list; fz; fz = fz->fz_next) {
struct hlist_head *head;
struct hlist_node *node;
@@ -273,7 +273,7 @@ fn_hash_lookup(struct fib_table *tb, const struct flowi *flp, struct fib_result
}
err = 1;
out:
- read_unlock(&fib_hash_lock);
+ read_unlock_bh(&fib_hash_lock);
return err;
}
@@ -295,7 +295,7 @@ fn_hash_select_default(struct fib_table *tb, const struct flowi *flp, struct fib
last_resort = NULL;
order = -1;
- read_lock(&fib_hash_lock);
+ read_lock_bh(&fib_hash_lock);
hlist_for_each_entry(f, node, &fz->fz_hash[0], fn_hash) {
struct fib_alias *fa;
@@ -343,7 +343,7 @@ fn_hash_select_default(struct fib_table *tb, const struct flowi *flp, struct fib
fib_result_assign(res, last_resort);
tb->tb_default = last_idx;
out:
- read_unlock(&fib_hash_lock);
+ read_unlock_bh(&fib_hash_lock);
}
/* Insert node F to FZ. */
@@ -753,18 +753,18 @@ static int fn_hash_dump(struct fib_table *tb, struct sk_buff *skb, struct netlin
struct fn_hash *table = (struct fn_hash*)tb->tb_data;
s_m = cb->args[2];
- read_lock(&fib_hash_lock);
+ read_lock_bh(&fib_hash_lock);
for (fz = table->fn_zone_list, m=0; fz; fz = fz->fz_next, m++) {
if (m < s_m) continue;
if (fn_hash_dump_zone(skb, cb, tb, fz) < 0) {
cb->args[2] = m;
- read_unlock(&fib_hash_lock);
+ read_unlock_bh(&fib_hash_lock);
return -1;
}
memset(&cb->args[3], 0,
sizeof(cb->args) - 3*sizeof(cb->args[0]));
}
- read_unlock(&fib_hash_lock);
+ read_unlock_bh(&fib_hash_lock);
cb->args[2] = m;
return skb->len;
}
@@ -962,7 +962,7 @@ static void *fib_seq_start(struct seq_file *seq, loff_t *pos)
struct fib_iter_state *iter = seq->private;
void *v = NULL;
- read_lock(&fib_hash_lock);
+ read_lock_bh(&fib_hash_lock);
if (fib_get_table(iter->p.net, RT_TABLE_MAIN))
v = *pos ? fib_get_idx(seq, *pos - 1) : SEQ_START_TOKEN;
return v;
@@ -977,7 +977,7 @@ static void *fib_seq_next(struct seq_file *seq, void *v, loff_t *pos)
static void fib_seq_stop(struct seq_file *seq, void *v)
__releases(fib_hash_lock)
{
- read_unlock(&fib_hash_lock);
+ read_unlock_bh(&fib_hash_lock);
}
static unsigned fib_flag_trans(int type, __be32 mask, struct fib_info *fi)
--
Thanks and Regards
gautham
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 4/8] net: af_netlink: deadlock
2008-04-29 12:56 [PATCH 0/8] CPU-Hotplug: Fix CPU-Hotplug <--> cpufreq locking dependency Gautham R Shenoy
` (2 preceding siblings ...)
2008-04-29 13:00 ` [PATCH 3/8] lockdep: fix fib_hash softirq inversion Gautham R Shenoy
@ 2008-04-29 13:01 ` Gautham R Shenoy
2008-04-29 13:19 ` Hans Reiser, reiserfs developer linux-os (Dick Johnson)
2008-04-29 13:02 ` [PATCH 5/8] cpu: cpu-hotplug deadlock Gautham R Shenoy
` (3 subsequent siblings)
7 siblings, 1 reply; 27+ messages in thread
From: Gautham R Shenoy @ 2008-04-29 13:01 UTC (permalink / raw)
To: linux-kernel, Zdenek Kabelac, Peter Zijlstra, Oleg Nesterov,
Heiko Carstens, Rafael J. Wysocki
Cc: Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri, Eric Dumazet,
David S. Miller
Subject: net: af_netlink: deadlock
From: Gautham R Shenoy <ego@in.ibm.com>
=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.25-rc3 #61
---------------------------------------------------------
swapper/0 just changed the state of lock:
(&tb->tb6_lock){-+..}, at: [<c049a5ab>] __ip6_ins_rt+0x1e/0x3c
but this lock took another, soft-read-irq-unsafe lock in the past:
(nl_table_lock){..-?}
and interrupts could create inverse lock ordering between them.
Full lockdep report at
http://gautham.shenoy.googlepages.com/nl_table_lock_irq_inversion.log
This patch fixes the problem by making the read acquisition of nl_table_lock
softirq-safe.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: David S. Miller <davem@davemloft.net>
---
net/netlink/af_netlink.c | 16 ++++++++--------
1 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 1ab0da2..7ade317 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -205,9 +205,9 @@ netlink_lock_table(void)
{
/* read_lock() synchronizes us to netlink_table_grab */
- read_lock(&nl_table_lock);
+ read_lock_bh(&nl_table_lock);
atomic_inc(&nl_table_users);
- read_unlock(&nl_table_lock);
+ read_unlock_bh(&nl_table_lock);
}
static inline void
@@ -225,7 +225,7 @@ static inline struct sock *netlink_lookup(struct net *net, int protocol,
struct sock *sk;
struct hlist_node *node;
- read_lock(&nl_table_lock);
+ read_lock_bh(&nl_table_lock);
head = nl_pid_hashfn(hash, pid);
sk_for_each(sk, node, head) {
if ((sk->sk_net == net) && (nlk_sk(sk)->pid == pid)) {
@@ -235,7 +235,7 @@ static inline struct sock *netlink_lookup(struct net *net, int protocol,
}
sk = NULL;
found:
- read_unlock(&nl_table_lock);
+ read_unlock_bh(&nl_table_lock);
return sk;
}
@@ -1078,12 +1078,12 @@ void netlink_set_err(struct sock *ssk, u32 pid, u32 group, int code)
info.group = group;
info.code = code;
- read_lock(&nl_table_lock);
+ read_lock_bh(&nl_table_lock);
sk_for_each_bound(sk, node, &nl_table[ssk->sk_protocol].mc_list)
do_one_set_err(sk, &info);
- read_unlock(&nl_table_lock);
+ read_unlock_bh(&nl_table_lock);
}
/* must be called with netlink table grabbed */
@@ -1776,7 +1776,7 @@ static struct sock *netlink_seq_socket_idx(struct seq_file *seq, loff_t pos)
static void *netlink_seq_start(struct seq_file *seq, loff_t *pos)
__acquires(nl_table_lock)
{
- read_lock(&nl_table_lock);
+ read_lock_bh(&nl_table_lock);
return *pos ? netlink_seq_socket_idx(seq, *pos - 1) : SEQ_START_TOKEN;
}
@@ -1825,7 +1825,7 @@ static void *netlink_seq_next(struct seq_file *seq, void *v, loff_t *pos)
static void netlink_seq_stop(struct seq_file *seq, void *v)
__releases(nl_table_lock)
{
- read_unlock(&nl_table_lock);
+ read_unlock_bh(&nl_table_lock);
}
--
Thanks and Regards
gautham
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 5/8] cpu: cpu-hotplug deadlock
2008-04-29 12:56 [PATCH 0/8] CPU-Hotplug: Fix CPU-Hotplug <--> cpufreq locking dependency Gautham R Shenoy
` (3 preceding siblings ...)
2008-04-29 13:01 ` [PATCH 4/8] net: af_netlink: deadlock Gautham R Shenoy
@ 2008-04-29 13:02 ` Gautham R Shenoy
2008-04-29 14:33 ` Oleg Nesterov
2008-04-29 13:02 ` [PATCH 6/8] lockdep: annotate cpu_hotplug Gautham R Shenoy
` (2 subsequent siblings)
7 siblings, 1 reply; 27+ messages in thread
From: Gautham R Shenoy @ 2008-04-29 13:02 UTC (permalink / raw)
To: linux-kernel, Zdenek Kabelac, Peter Zijlstra, Oleg Nesterov,
Heiko Carstens, Rafael J. Wysocki
Cc: Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
Subject: cpu: cpu-hotplug deadlock
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
cpu_hotplug.mutex is basically a lock-internal lock; but by keeping it locked
over the 'write' section (cpu_hotplug_begin/done) a lock inversion happens when
some of the write side code calls into code that would otherwise take a
read lock.
And it so happens that read-in-write recursion is expressly permitted.
Fix this by turning cpu_hotplug into a proper stand alone unfair reader/writer
lock that allows reader-in-reader and reader-in-writer recursion.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
kernel/cpu.c | 97 +++++++++++++++++++++++++++++++++-------------------------
1 files changed, 55 insertions(+), 42 deletions(-)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 2eff3f6..e856c22 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -27,12 +27,13 @@ static int cpu_hotplug_disabled;
static struct {
struct task_struct *active_writer;
- struct mutex lock; /* Synchronizes accesses to refcount, */
+ spinlock_t lock; /* Synchronizes accesses to refcount, */
/*
* Also blocks the new readers during
* an ongoing cpu hotplug operation.
*/
int refcount;
+ wait_queue_head_t reader_queue;
wait_queue_head_t writer_queue;
} cpu_hotplug;
@@ -41,8 +42,9 @@ static struct {
void __init cpu_hotplug_init(void)
{
cpu_hotplug.active_writer = NULL;
- mutex_init(&cpu_hotplug.lock);
+ spin_lock_init(&cpu_hotplug.lock);
cpu_hotplug.refcount = 0;
+ init_waitqueue_head(&cpu_hotplug.reader_queue);
init_waitqueue_head(&cpu_hotplug.writer_queue);
}
@@ -51,27 +53,42 @@ void __init cpu_hotplug_init(void)
void get_online_cpus(void)
{
might_sleep();
+
+ spin_lock(&cpu_hotplug.lock);
if (cpu_hotplug.active_writer == current)
- return;
- mutex_lock(&cpu_hotplug.lock);
+ goto unlock;
+
+ if (cpu_hotplug.active_writer) {
+ DEFINE_WAIT(wait);
+
+ for (;;) {
+ prepare_to_wait(&cpu_hotplug.reader_queue, &wait,
+ TASK_UNINTERRUPTIBLE);
+ if (!cpu_hotplug.active_writer)
+ break;
+ spin_unlock(&cpu_hotplug.lock);
+ schedule();
+ spin_lock(&cpu_hotplug.lock);
+ }
+ finish_wait(&cpu_hotplug.reader_queue, &wait);
+ }
cpu_hotplug.refcount++;
- mutex_unlock(&cpu_hotplug.lock);
-
+ unlock:
+ spin_unlock(&cpu_hotplug.lock);
}
EXPORT_SYMBOL_GPL(get_online_cpus);
void put_online_cpus(void)
{
+ spin_lock(&cpu_hotplug.lock);
if (cpu_hotplug.active_writer == current)
- return;
- mutex_lock(&cpu_hotplug.lock);
- cpu_hotplug.refcount--;
+ goto unlock;
- if (unlikely(writer_exists()) && !cpu_hotplug.refcount)
+ cpu_hotplug.refcount--;
+ if (!cpu_hotplug.refcount)
wake_up(&cpu_hotplug.writer_queue);
-
- mutex_unlock(&cpu_hotplug.lock);
-
+ unlock:
+ spin_unlock(&cpu_hotplug.lock);
}
EXPORT_SYMBOL_GPL(put_online_cpus);
@@ -95,45 +112,41 @@ void cpu_maps_update_done(void)
* This ensures that the hotplug operation can begin only when the
* refcount goes to zero.
*
- * Note that during a cpu-hotplug operation, the new readers, if any,
- * will be blocked by the cpu_hotplug.lock
- *
- * Since cpu_maps_update_begin is always called after invoking
- * cpu_maps_update_begin, we can be sure that only one writer is active.
- *
- * Note that theoretically, there is a possibility of a livelock:
- * - Refcount goes to zero, last reader wakes up the sleeping
- * writer.
- * - Last reader unlocks the cpu_hotplug.lock.
- * - A new reader arrives at this moment, bumps up the refcount.
- * - The writer acquires the cpu_hotplug.lock finds the refcount
- * non zero and goes to sleep again.
- *
- * However, this is very difficult to achieve in practice since
- * get_online_cpus() not an api which is called all that often.
- *
+ * cpu_hotplug is basically an unfair recursive reader/writer lock that
+ * allows reader in writer recursion.
*/
static void cpu_hotplug_begin(void)
{
- DECLARE_WAITQUEUE(wait, current);
-
- mutex_lock(&cpu_hotplug.lock);
+ might_sleep();
- cpu_hotplug.active_writer = current;
- add_wait_queue_exclusive(&cpu_hotplug.writer_queue, &wait);
- while (cpu_hotplug.refcount) {
- set_current_state(TASK_UNINTERRUPTIBLE);
- mutex_unlock(&cpu_hotplug.lock);
- schedule();
- mutex_lock(&cpu_hotplug.lock);
+ spin_lock(&cpu_hotplug.lock);
+ if (cpu_hotplug.refcount || cpu_hotplug.active_writer) {
+ DEFINE_WAIT(wait);
+
+ for (;;) {
+ prepare_to_wait(&cpu_hotplug.writer_queue, &wait,
+ TASK_UNINTERRUPTIBLE);
+ if (!cpu_hotplug.refcount && !cpu_hotplug.active_writer)
+ break;
+ spin_unlock(&cpu_hotplug.lock);
+ schedule();
+ spin_lock(&cpu_hotplug.lock);
+ }
+ finish_wait(&cpu_hotplug.writer_queue, &wait);
}
- remove_wait_queue_locked(&cpu_hotplug.writer_queue, &wait);
+ cpu_hotplug.active_writer = current;
+ spin_unlock(&cpu_hotplug.lock);
}
static void cpu_hotplug_done(void)
{
+ spin_lock(&cpu_hotplug.lock);
cpu_hotplug.active_writer = NULL;
- mutex_unlock(&cpu_hotplug.lock);
+ if (!list_empty(&cpu_hotplug.writer_queue.task_list))
+ wake_up(&cpu_hotplug.writer_queue);
+ else
+ wake_up_all(&cpu_hotplug.reader_queue);
+ spin_unlock(&cpu_hotplug.lock);
}
/* Need to know about CPUs going up/down? */
int __cpuinit register_cpu_notifier(struct notifier_block *nb)
--
Thanks and Regards
gautham
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 6/8] lockdep: annotate cpu_hotplug
2008-04-29 12:56 [PATCH 0/8] CPU-Hotplug: Fix CPU-Hotplug <--> cpufreq locking dependency Gautham R Shenoy
` (4 preceding siblings ...)
2008-04-29 13:02 ` [PATCH 5/8] cpu: cpu-hotplug deadlock Gautham R Shenoy
@ 2008-04-29 13:02 ` Gautham R Shenoy
2008-04-29 13:03 ` [PATCH 7/8] cpu_hotplug: Introduce try_get_online_cpus() Gautham R Shenoy
2008-04-29 13:05 ` [PATCH 8/8] cpufreq: Nest down_write/read(cpufreq_rwsem) within get_online_cpus()/put_online_cpus() Gautham R Shenoy
7 siblings, 0 replies; 27+ messages in thread
From: Gautham R Shenoy @ 2008-04-29 13:02 UTC (permalink / raw)
To: linux-kernel, Zdenek Kabelac, Peter Zijlstra, Oleg Nesterov,
Heiko Carstens, Rafael J. Wysocki
Cc: Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
Subject: lockdep: annotate cpu_hotplug
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Annotate the cpu_hotplug lock.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
include/linux/lockdep.h | 15 +++++++++++++++
kernel/cpu.c | 16 ++++++++++++++++
2 files changed, 31 insertions(+), 0 deletions(-)
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 36e254f..424db2f 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -458,4 +458,19 @@ static inline void print_irqtrace_events(struct task_struct *curr)
# define rwsem_release(l, n, i) do { } while (0)
#endif
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+# ifdef CONFIG_PROVE_LOCKING
+# define cpu_hotplug_acquire(l, s, t, i) lock_acquire(l, s, t, 0, 2, i)
+# define cpu_hotplug_acquire_read(l, s, t, i) lock_acquire(l, s, t, 3, 2, i)
+# else
+# define cpu_hotplug_acquire(l, s, t, i) lock_acquire(l, s, t, 0, 1, i)
+# define cpu_hotplug_acquire_read(l, s, t, i) lock_acquire(l, s, t, 3, 1, i)
+# endif
+# define cpu_hotplug_release(l, n, i) lock_release(l, n, i)
+#else
+# define cpu_hotplug_acquire(l, s, t, i) do { } while (0)
+# define cpu_hotplug_acquire_read(l, s, t, i) do { } while (0)
+# define cpu_hotplug_release(l, n, i) do { } while (0)
+#endif
+
#endif /* __LINUX_LOCKDEP_H */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index e856c22..8f1718f 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -35,12 +35,20 @@ static struct {
int refcount;
wait_queue_head_t reader_queue;
wait_queue_head_t writer_queue;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ struct lockdep_map dep_map;
+#endif
} cpu_hotplug;
#define writer_exists() (cpu_hotplug.active_writer != NULL)
void __init cpu_hotplug_init(void)
{
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ static struct lock_class_key __key;
+
+ lockdep_init_map(&cpu_hotplug.dep_map, "cpu_hotplug", &__key, 0);
+#endif
cpu_hotplug.active_writer = NULL;
spin_lock_init(&cpu_hotplug.lock);
cpu_hotplug.refcount = 0;
@@ -54,6 +62,8 @@ void get_online_cpus(void)
{
might_sleep();
+ cpu_hotplug_acquire_read(&cpu_hotplug.dep_map, 0, 0, _THIS_IP_);
+
spin_lock(&cpu_hotplug.lock);
if (cpu_hotplug.active_writer == current)
goto unlock;
@@ -80,6 +90,8 @@ EXPORT_SYMBOL_GPL(get_online_cpus);
void put_online_cpus(void)
{
+ cpu_hotplug_release(&cpu_hotplug.dep_map, 1, _THIS_IP_);
+
spin_lock(&cpu_hotplug.lock);
if (cpu_hotplug.active_writer == current)
goto unlock;
@@ -119,6 +131,8 @@ static void cpu_hotplug_begin(void)
{
might_sleep();
+ cpu_hotplug_acquire(&cpu_hotplug.dep_map, 0, 0, _THIS_IP_);
+
spin_lock(&cpu_hotplug.lock);
if (cpu_hotplug.refcount || cpu_hotplug.active_writer) {
DEFINE_WAIT(wait);
@@ -140,6 +154,8 @@ static void cpu_hotplug_begin(void)
static void cpu_hotplug_done(void)
{
+ cpu_hotplug_release(&cpu_hotplug.dep_map, 1, _THIS_IP_);
+
spin_lock(&cpu_hotplug.lock);
cpu_hotplug.active_writer = NULL;
if (!list_empty(&cpu_hotplug.writer_queue.task_list))
--
Thanks and Regards
gautham
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 7/8] cpu_hotplug: Introduce try_get_online_cpus()
2008-04-29 12:56 [PATCH 0/8] CPU-Hotplug: Fix CPU-Hotplug <--> cpufreq locking dependency Gautham R Shenoy
` (5 preceding siblings ...)
2008-04-29 13:02 ` [PATCH 6/8] lockdep: annotate cpu_hotplug Gautham R Shenoy
@ 2008-04-29 13:03 ` Gautham R Shenoy
2008-04-29 13:05 ` [PATCH 8/8] cpufreq: Nest down_write/read(cpufreq_rwsem) within get_online_cpus()/put_online_cpus() Gautham R Shenoy
7 siblings, 0 replies; 27+ messages in thread
From: Gautham R Shenoy @ 2008-04-29 13:03 UTC (permalink / raw)
To: linux-kernel, Zdenek Kabelac, Peter Zijlstra, Oleg Nesterov,
Heiko Carstens, Rafael J. Wysocki
Cc: Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri,
Venkatesh Pallipadi
cpu_hotplug: Introduce try_get_online_cpus()
From: Gautham R Shenoy <ego@in.ibm.com>
Subsystems such as workqueues currently cannot use get_online_cpus()
since they deadlock with the workqueue CPU-Hotplug callback code.
For such cases, introduce a new API, try_get_online_cpus(), which
returns 0 when a cpu-hotplug operation is in progress; otherwise it
behaves like get_online_cpus() and returns 1.
Based on the status of CPU-Hotplug, the subsystems can take appropriate
action.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
include/linux/cpu.h | 2 ++
kernel/cpu.c | 22 ++++++++++++++++++++++
2 files changed, 24 insertions(+), 0 deletions(-)
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 0be8d65..d0be341 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -107,6 +107,7 @@ static inline void cpuhotplug_mutex_unlock(struct mutex *cpu_hp_mutex)
}
extern void get_online_cpus(void);
+extern int try_get_online_cpus(void);
extern void put_online_cpus(void);
#define hotcpu_notifier(fn, pri) { \
static struct notifier_block fn##_nb = \
@@ -126,6 +127,7 @@ static inline void cpuhotplug_mutex_unlock(struct mutex *cpu_hp_mutex)
#define get_online_cpus() do { } while (0)
#define put_online_cpus() do { } while (0)
+static inline int try_get_online_cpus(void) { return 1; }
#define hotcpu_notifier(fn, pri) do { (void)(fn); } while (0)
/* These aren't inline functions due to a GCC bug. */
#define register_hotcpu_notifier(nb) ({ (void)(nb); 0; })
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 8f1718f..52c1e4e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -57,7 +57,29 @@ void __init cpu_hotplug_init(void)
}
#ifdef CONFIG_HOTPLUG_CPU
+int try_get_online_cpus(void)
+{
+ spin_lock(&cpu_hotplug.lock);
+ if (cpu_hotplug.active_writer == current)
+ goto out_unlock;
+
+ if (likely(!cpu_hotplug.active_writer))
+ goto out_success;
+
+ if (cpu_hotplug.refcount)
+ goto out_success;
+
+ spin_unlock(&cpu_hotplug.lock);
+ return 0;
+
+out_success:
+ cpu_hotplug.refcount++;
+out_unlock:
+ spin_unlock(&cpu_hotplug.lock);
+ cpu_hotplug_acquire_read(&cpu_hotplug.dep_map, 0, 1, _THIS_IP_);
+ return 1;
+}
void get_online_cpus(void)
{
might_sleep();
--
Thanks and Regards
gautham
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 8/8] cpufreq: Nest down_write/read(cpufreq_rwsem) within get_online_cpus()/put_online_cpus()
2008-04-29 12:56 [PATCH 0/8] CPU-Hotplug: Fix CPU-Hotplug <--> cpufreq locking dependency Gautham R Shenoy
` (6 preceding siblings ...)
2008-04-29 13:03 ` [PATCH 7/8] cpu_hotplug: Introduce try_get_online_cpus() Gautham R Shenoy
@ 2008-04-29 13:05 ` Gautham R Shenoy
7 siblings, 0 replies; 27+ messages in thread
From: Gautham R Shenoy @ 2008-04-29 13:05 UTC (permalink / raw)
To: linux-kernel, Zdenek Kabelac, Peter Zijlstra, Oleg Nesterov,
Heiko Carstens, Rafael J. Wysocki
Cc: Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
cpufreq: Nest down_write/read(cpufreq_rwsem) within get_online_cpus()/put_online_cpus()
From: Gautham R Shenoy <ego@in.ibm.com>
On the CPU-Hotplug callback path, the cpufreq_rwsem has a dependency
on the cpu_hotplug lock. However, on the read path this dependency is not
honoured, resulting in a possible deadlock when one tries to change
the cpufreq governor while a CPU-Hotplug operation is in progress.
This patch nests the down_write/read(cpufreq_rwsem) within the
get_online_cpus()/put_online_cpus() pair.
Also, it introduces a try_get_online_cpus() in the do_dbs_timer() work-item
code, since we cannot do get_online_cpus() from within work items.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
drivers/cpufreq/cpufreq.c | 11 +++++++++--
drivers/cpufreq/cpufreq_ondemand.c | 9 +++++++--
2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 35a26a3..087f8fc 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -71,11 +71,12 @@ int lock_policy_rwsem_##mode \
{ \
int policy_cpu = per_cpu(policy_cpu, cpu); \
BUG_ON(policy_cpu == -1); \
- down_##mode(&per_cpu(cpu_policy_rwsem, policy_cpu)); \
+ get_online_cpus(); \
if (unlikely(!cpu_online(cpu))) { \
- up_##mode(&per_cpu(cpu_policy_rwsem, policy_cpu)); \
+ put_online_cpus(); \
return -1; \
} \
+ down_##mode(&per_cpu(cpu_policy_rwsem, policy_cpu)); \
\
return 0; \
}
@@ -91,6 +92,7 @@ void unlock_policy_rwsem_read(int cpu)
int policy_cpu = per_cpu(policy_cpu, cpu);
BUG_ON(policy_cpu == -1);
up_read(&per_cpu(cpu_policy_rwsem, policy_cpu));
+ put_online_cpus();
}
EXPORT_SYMBOL_GPL(unlock_policy_rwsem_read);
@@ -99,6 +101,7 @@ void unlock_policy_rwsem_write(int cpu)
int policy_cpu = per_cpu(policy_cpu, cpu);
BUG_ON(policy_cpu == -1);
up_write(&per_cpu(cpu_policy_rwsem, policy_cpu));
+ put_online_cpus();
}
EXPORT_SYMBOL_GPL(unlock_policy_rwsem_write);
@@ -1818,7 +1821,9 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data)
cpufreq_driver = driver_data;
spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
+ get_online_cpus();
ret = sysdev_driver_register(&cpu_sysdev_class,&cpufreq_sysdev_driver);
+ put_online_cpus();
if ((!ret) && !(cpufreq_driver->flags & CPUFREQ_STICKY)) {
int i;
@@ -1833,8 +1838,10 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data)
if (ret) {
dprintk("no CPU initialized for driver %s\n",
driver_data->name);
+ get_online_cpus();
sysdev_driver_unregister(&cpu_sysdev_class,
&cpufreq_sysdev_driver);
+ put_online_cpus();
spin_lock_irqsave(&cpufreq_driver_lock, flags);
cpufreq_driver = NULL;
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index d2af20d..9428f7e 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -446,12 +446,15 @@ static void do_dbs_timer(struct work_struct *work)
delay -= jiffies % delay;
- if (lock_policy_rwsem_write(cpu) < 0)
+ if (!try_get_online_cpus())
return;
+ if (lock_policy_rwsem_write(cpu) < 0)
+ goto out;
+
if (!dbs_info->enable) {
unlock_policy_rwsem_write(cpu);
- return;
+ goto out;
}
/* Common NORMAL_SAMPLE setup */
@@ -471,6 +474,8 @@ static void do_dbs_timer(struct work_struct *work)
}
queue_delayed_work_on(cpu, kondemand_wq, &dbs_info->work, delay);
unlock_policy_rwsem_write(cpu);
+out:
+ put_online_cpus();
}
static inline void dbs_timer_init(struct cpu_dbs_info_s *dbs_info)
--
Thanks and Regards
gautham
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 1/8] lockdep: fix recursive read lock validation
2008-04-29 12:57 ` [PATCH 1/8] lockdep: fix recursive read lock validation Gautham R Shenoy
@ 2008-04-29 13:16 ` Bart Van Assche
2008-04-29 14:57 ` Peter Zijlstra
0 siblings, 1 reply; 27+ messages in thread
From: Bart Van Assche @ 2008-04-29 13:16 UTC (permalink / raw)
To: ego
Cc: linux-kernel, Zdenek Kabelac, Peter Zijlstra, Oleg Nesterov,
Heiko Carstens, Rafael J. Wysocki, Andrew Morton, Ingo Molnar,
Srivatsa Vaddagiri
On Tue, Apr 29, 2008 at 2:57 PM, Gautham R Shenoy <ego@in.ibm.com> wrote:
> Subject: lockdep: fix recursive read lock validation
> This means that the following sequence is now invalid, whereas previously
> it was considered valid:
>
> rlock(a); rlock(b); runlock(b); runlock(a)
> rlock(b); rlock(a);
Why are you marking this sequence as invalid ? Although it can be
debated whether it is good programming practice to be inconsistent
about the order of read-locking, the above sequence can't be involved
in a deadlock.
Bart.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Hans Reiser, reiserfs developer
2008-04-29 13:01 ` [PATCH 4/8] net: af_netlink: deadlock Gautham R Shenoy
@ 2008-04-29 13:19 ` linux-os (Dick Johnson)
0 siblings, 0 replies; 27+ messages in thread
From: linux-os (Dick Johnson) @ 2008-04-29 13:19 UTC (permalink / raw)
To: Linux kernel
Hello all,
Hans Reiser, the reiserfs developer was found guilty
yesterday of killing his wife.
http://yro.slashdot.org/article.pl?sid=08/04/28/2243232
No body, no crime scene, no physical evidence, and no
witnesses, but they still found him guilty!
Cheers,
Dick Johnson
Penguin : Linux version 2.6.22.1 on an i686 machine (5588.29 BogoMips).
My book : http://www.AbominableFirebug.com/
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 5/8] cpu: cpu-hotplug deadlock
2008-04-29 13:02 ` [PATCH 5/8] cpu: cpu-hotplug deadlock Gautham R Shenoy
@ 2008-04-29 14:33 ` Oleg Nesterov
2008-04-29 15:09 ` Peter Zijlstra
2008-04-30 5:37 ` Gautham R Shenoy
0 siblings, 2 replies; 27+ messages in thread
From: Oleg Nesterov @ 2008-04-29 14:33 UTC (permalink / raw)
To: Gautham R Shenoy
Cc: linux-kernel, Zdenek Kabelac, Peter Zijlstra, Heiko Carstens,
Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
On 04/29, Gautham R Shenoy wrote:
>
> cpu_hotplug.mutex is basically a lock-internal lock; but by keeping it locked
> over the 'write' section (cpu_hotplug_begin/done) a lock inversion happens when
> some of the write side code calls into code that would otherwise take a
> read lock.
>
> And it so happens that read-in-write recursion is expressly permitted.
>
> Fix this by turning cpu_hotplug into a proper stand alone unfair reader/writer
> lock that allows reader-in-reader and reader-in-writer recursion.
While the patch itself is very clean and understandable, I can't understand
the changelog ;)
Could you explain what is the semantics difference? The current code allows
read-in-write recursion too.
The only difference I can see is that now cpu_hotplug_begin() doesn't rely
on cpu_add_remove_lock any longer (currently the caller must hold this lock),
but this (good) change is not documented.
> static void cpu_hotplug_done(void)
> {
> + spin_lock(&cpu_hotplug.lock);
> cpu_hotplug.active_writer = NULL;
> - mutex_unlock(&cpu_hotplug.lock);
> + if (!list_empty(&cpu_hotplug.writer_queue.task_list))
waitqueue_active() ?
> + wake_up(&cpu_hotplug.writer_queue);
> + else
> + wake_up_all(&cpu_hotplug.reader_queue);
Please note that wake_up() and wake_up_all() don't differ here, because
cpu_hotplug_begin() uses prepare_to_wait(), not prepare_to_wait_exclusive().
I'd suggest changing cpu_hotplug_begin(), and using wake_up() for both
cases.
(actually, since write-locks should be very rare, perhaps we don't need
2 wait_queues ?)
Oleg.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/8] lockdep: fix fib_hash softirq inversion
2008-04-29 13:00 ` [PATCH 3/8] lockdep: fix fib_hash softirq inversion Gautham R Shenoy
@ 2008-04-29 14:45 ` Peter Zijlstra
0 siblings, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2008-04-29 14:45 UTC (permalink / raw)
To: ego
Cc: linux-kernel, Zdenek Kabelac, Oleg Nesterov, Heiko Carstens,
Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri,
Eric Dumazet, David S. Miller
On Tue, 2008-04-29 at 18:30 +0530, Gautham R Shenoy wrote:
> Subject: lockdep: fix fib_hash softirq inversion
>
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
This is a false positive, and hence we should not apply this patch.
I think I understand the problem and have an idea on how to fix lockdep,
but I haven't gotten around to actually doing so.
Especially in light of the fact that fixing lockdep isn't a requirement
(just a nice-to-have) for these other cpu hotplug patches, we should drop them.
So that is, 1-4 should not be applied to mainline. I'll work on a new
and improved 2 that'll make 3 and 4 unneeded.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/8] lockdep: fix recursive read lock validation
2008-04-29 13:16 ` Bart Van Assche
@ 2008-04-29 14:57 ` Peter Zijlstra
2008-04-29 15:03 ` Bart Van Assche
0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2008-04-29 14:57 UTC (permalink / raw)
To: Bart Van Assche
Cc: ego, linux-kernel, Zdenek Kabelac, Oleg Nesterov, Heiko Carstens,
Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
On Tue, 2008-04-29 at 15:16 +0200, Bart Van Assche wrote:
> On Tue, Apr 29, 2008 at 2:57 PM, Gautham R Shenoy <ego@in.ibm.com> wrote:
> > Subject: lockdep: fix recursive read lock validation
> > This means that the following sequence is now invalid, whereas previously
> > it was considered valid:
> >
> > rlock(a); rlock(b); runlock(b); runlock(a)
> > rlock(b); rlock(a);
>
> Why are you marking this sequence as invalid ? Although it can be
> debated whether it is good programming practice to be inconsistent
> about the order of read-locking, the above sequence can't be involved
> in a deadlock.
Not for pure read locks, but when you add write locks to it, it does get
deadlocky. Lockdep does not keep separate chains for read and write
locks.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/8] lockdep: fix recursive read lock validation
2008-04-29 14:57 ` Peter Zijlstra
@ 2008-04-29 15:03 ` Bart Van Assche
2008-04-29 15:15 ` Peter Zijlstra
0 siblings, 1 reply; 27+ messages in thread
From: Bart Van Assche @ 2008-04-29 15:03 UTC (permalink / raw)
To: Peter Zijlstra
Cc: ego, linux-kernel, Zdenek Kabelac, Oleg Nesterov, Heiko Carstens,
Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
On Tue, Apr 29, 2008 at 4:57 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> On Tue, 2008-04-29 at 15:16 +0200, Bart Van Assche wrote:
> > On Tue, Apr 29, 2008 at 2:57 PM, Gautham R Shenoy <ego@in.ibm.com> wrote:
> > > Subject: lockdep: fix recursive read lock validation
> > > This means that the following sequence is now invalid, whereas previously
> > > it was considered valid:
> > >
> > > rlock(a); rlock(b); runlock(b); runlock(a)
> > > rlock(b); rlock(a);
> >
> > Why are you marking this sequence as invalid ? Although it can be
> > debated whether it is good programming practice to be inconsistent
> > about the order of read-locking, the above sequence can't be involved
> > in a deadlock.
>
> Not for pure read locks, but when you add write locks to it, it does get
> deadlocky. Lockdep does not keep separate chains for read and write
> locks.
Nesting writer locks inside reader locks is always a bad idea. So
please come up with an example of how varying the reader lock nesting
order can trigger a deadlock (when no writer locks are nested inside
reader locks and nested writer locks are always nested in the same
order).
Bart.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 5/8] cpu: cpu-hotplug deadlock
2008-04-29 14:33 ` Oleg Nesterov
@ 2008-04-29 15:09 ` Peter Zijlstra
2008-04-29 16:45 ` Oleg Nesterov
2008-04-30 5:37 ` Gautham R Shenoy
1 sibling, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2008-04-29 15:09 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Gautham R Shenoy, linux-kernel, Zdenek Kabelac, Heiko Carstens,
Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
On Tue, 2008-04-29 at 18:33 +0400, Oleg Nesterov wrote:
> On 04/29, Gautham R Shenoy wrote:
> >
> > cpu_hotplug.mutex is basically a lock-internal lock; but by keeping it locked
> > over the 'write' section (cpu_hotplug_begin/done) a lock inversion happens when
> > some of the write side code calls into code that would otherwise take a
> > read lock.
> >
> > And it so happens that read-in-write recursion is expressly permitted.
> >
> > Fix this by turning cpu_hotplug into a proper stand alone unfair reader/writer
> > lock that allows reader-in-reader and reader-in-writer recursion.
>
> While the patch itself is very clean and understandable, I can't understand
> the changelog ;)
Yeah, my skillz lie elsewhere,.. :-/
The only thing that changed is that the mutex is not held; so what we
change is:
LOCK
... do the full hotplug thing ...
UNLOCK
into
LOCK
set state
UNLOCK
... do the full hotplug thing ...
LOCK
unset state
UNLOCK
So that the lock isn't held over the hotplug operation.
> Could you explain what is the semantics difference? The current code allows
> read-in-write recursion too.
Yes, that is a requirement.
> The only difference I can see is that now cpu_hotplug_begin() doesn't rely
> on cpu_add_remove_lock any longer (currently the caller must hold this lock),
> but this (good) change is not documented.
>
> > static void cpu_hotplug_done(void)
> > {
> > + spin_lock(&cpu_hotplug.lock);
> > cpu_hotplug.active_writer = NULL;
> > - mutex_unlock(&cpu_hotplug.lock);
> > + if (!list_empty(&cpu_hotplug.writer_queue.task_list))
>
> waitqueue_active() ?
I even looked to see if such a thing existed - must have been blind that
day.
> > + wake_up(&cpu_hotplug.writer_queue);
> > + else
> > + wake_up_all(&cpu_hotplug.reader_queue);
>
> Please note that wake_up() and wake_up_all() don't differ here, because
> cpu_hotplug_begin() uses prepare_to_wait(), not prepare_to_wait_exclusive().
> I'd suggest changing cpu_hotplug_begin(), and using wake_up() for both
> cases.
Ah, I was not aware of that distinction. Thanks!
> (actually, since write-locks should be very rare, perhaps we don't need
> 2 wait_queues ?)
And just let them race the wakeup race, sure that might work. Gautham
even pointed out that it never happens because there is another
exclusive lock on the write path.
But you say you like that it doesn't depend on that anymore - me too ;-)
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/8] lockdep: fix recursive read lock validation
2008-04-29 15:03 ` Bart Van Assche
@ 2008-04-29 15:15 ` Peter Zijlstra
2008-04-29 16:03 ` Bart Van Assche
0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2008-04-29 15:15 UTC (permalink / raw)
To: Bart Van Assche
Cc: ego, linux-kernel, Zdenek Kabelac, Oleg Nesterov, Heiko Carstens,
Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
On Tue, 2008-04-29 at 17:03 +0200, Bart Van Assche wrote:
> On Tue, Apr 29, 2008 at 4:57 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >
> > On Tue, 2008-04-29 at 15:16 +0200, Bart Van Assche wrote:
> > > On Tue, Apr 29, 2008 at 2:57 PM, Gautham R Shenoy <ego@in.ibm.com> wrote:
> > > > Subject: lockdep: fix recursive read lock validation
> > > > This means that the following sequence is now invalid, whereas previously
> > > > it was considered valid:
> > > >
> > > > rlock(a); rlock(b); runlock(b); runlock(a)
> > > > rlock(b); rlock(a);
> > >
> > > Why are you marking this sequence as invalid ? Although it can be
> > > debated whether it is good programming practice to be inconsistent
> > > about the order of read-locking, the above sequence can't be involved
> > > in a deadlock.
> >
> > Not for pure read locks, but when you add write locks to it, it does get
> > deadlocky. Lockdep does not keep separate chains for read and write
> > locks.
>
> Nesting writer locks inside reader locks is always a bad idea. So
> please come up with an example of how varying the reader lock nesting
> order can trigger a deadlock (when no writer locks are nested inside
> reader locks and nested writer locks are always nested in the same
> order).
It can't deadlock when only readers are involved, but lockdep will not
be able to distinguish between the cases where only read locks are
involved and a mix of readers and writers is involved.
Hence disallow both.
But hitting this requires you do a series of rather unfortunate things:
1) use recursive locking
2) don't have strict lock order
3) make it work by using read locks only
Seriously, any code that triggers this might want to have its locking
re-thought.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/8] lockdep: fix recursive read lock validation
2008-04-29 15:15 ` Peter Zijlstra
@ 2008-04-29 16:03 ` Bart Van Assche
2008-04-29 16:15 ` Peter Zijlstra
0 siblings, 1 reply; 27+ messages in thread
From: Bart Van Assche @ 2008-04-29 16:03 UTC (permalink / raw)
To: Peter Zijlstra
Cc: ego, linux-kernel, Zdenek Kabelac, Oleg Nesterov, Heiko Carstens,
Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
On Tue, Apr 29, 2008 at 5:15 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> On Tue, 2008-04-29 at 17:03 +0200, Bart Van Assche wrote:
> > On Tue, Apr 29, 2008 at 4:57 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > >
> > > On Tue, 2008-04-29 at 15:16 +0200, Bart Van Assche wrote:
> > > > On Tue, Apr 29, 2008 at 2:57 PM, Gautham R Shenoy <ego@in.ibm.com> wrote:
> > > > > Subject: lockdep: fix recursive read lock validation
> > > > > This means that the following sequence is now invalid, whereas previously
> > > > > it was considered valid:
> > > > >
> > > > > rlock(a); rlock(b); runlock(b); runlock(a)
> > > > > rlock(b); rlock(a);
> > > >
> > > > Why are you marking this sequence as invalid ? Although it can be
> > > > debated whether it is good programming practice to be inconsistent
> > > > about the order of read-locking, the above sequence can't be involved
> > > > in a deadlock.
> > >
> > > Not for pure read locks, but when you add write locks to it, it does get
> > > deadlocky. Lockdep does not keep separate chains for read and write
> > > locks.
> >
> > Nesting writer locks inside reader locks is always a bad idea. So
> > please come up with an example of how varying the reader lock nesting
> > order can trigger a deadlock (when no writer locks are nested inside
> > reader locks and nested writer locks are always nested in the same
> > order).
>
> It can't deadlock when only readers are involved, but lockdep will not
> be able to distinguish between the cases where only read locks are
> involved and a mix of readers and writers is involved.
>
> Hence disallow both.
>
> But hitting this requires you do a series of rather unfortunate things:
>
> 1) use recursive locking
> 2) don't have strict lock order
> 3) make it work by using read locks only
>
> Seriously, any code that triggers this might want to have its locking
> re-thought.
You did not get my point.
My point is that if you follow the following locking discipline, a
deadlock will never be triggered:
* Always obtain writer locks in a consistent order.
* Never nest writer locks inside reader locks.
* Nesting reader locks inside writer locks is okay, and nesting reader
locks inside other reader locks is also OK.
Again: if you do not agree with the above, please post an example that
proves me wrong.
Or: whether or not to allow a sequence like "rlock(a); rlock(b);
runlock(b); runlock(a); rlock(b); rlock(a);" is something we can
choose. We do not have to forbid this sequence -- we can choose
whether or not we allow this sequence.
Bart.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/8] lockdep: fix recursive read lock validation
2008-04-29 16:03 ` Bart Van Assche
@ 2008-04-29 16:15 ` Peter Zijlstra
2008-04-29 16:29 ` Bart Van Assche
0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2008-04-29 16:15 UTC (permalink / raw)
To: Bart Van Assche
Cc: ego, linux-kernel, Zdenek Kabelac, Oleg Nesterov, Heiko Carstens,
Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
On Tue, 2008-04-29 at 18:03 +0200, Bart Van Assche wrote:
> On Tue, Apr 29, 2008 at 5:15 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >
> > On Tue, 2008-04-29 at 17:03 +0200, Bart Van Assche wrote:
> > > On Tue, Apr 29, 2008 at 4:57 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > > >
> > > > On Tue, 2008-04-29 at 15:16 +0200, Bart Van Assche wrote:
> > > > > On Tue, Apr 29, 2008 at 2:57 PM, Gautham R Shenoy <ego@in.ibm.com> wrote:
> > > > > > Subject: lockdep: fix recursive read lock validation
> > > > > > This means that the following sequence is now invalid, whereas previously
> > > > > > it was considered valid:
> > > > > >
> > > > > > rlock(a); rlock(b); runlock(b); runlock(a)
> > > > > > rlock(b); rlock(a);
> > > > >
> > > > > Why are you marking this sequence as invalid ? Although it can be
> > > > > debated whether it is good programming practice to be inconsistent
> > > > > about the order of read-locking, the above sequence can't be involved
> > > > > in a deadlock.
> > > >
> > > > Not for pure read locks, but when you add write locks to it, it does get
> > > > deadlocky. Lockdep does not keep separate chains for read and write
> > > > locks.
> > >
> > > Nesting writer locks inside reader locks is always a bad idea. So
> > > please come up with an example of how varying the reader lock nesting
> > > order can trigger a deadlock (when no writer locks are nested inside
> > > reader locks and nested writer locks are always nested in the same
> > > order).
> >
> > It can't deadlock when only readers are involved, but lockdep will not
> > be able to distinguish between the cases where only read locks are
> > involved and a mix of readers and writers is involved.
> >
> > Hence disallow both.
> >
> > But hitting this requires you do a series of rather unfortunate things:
> >
> > 1) use recursive locking
> > 2) don't have strict lock order
> > 3) make it work by using read locks only
> >
> > Seriously, any code that triggers this might want to have its locking
> > re-thought.
>
> You did not get my point.
>
> My point is that if you follow the following locking discipline, a
> deadlock will never be triggered:
> * Always obtain writer locks in a consistent order.
> * Never nest writer locks inside reader locks.
> * Nesting reader locks inside writer locks is okay, and nesting reader
> locks inside other reader locks is also OK.
>
> Again: if you do not agree with the above, please post an example that
> proves me wrong.
Using a lock that does not allow reader nesting would be cheating,
right?
> Or: whether or not to allow a sequence like "rlock(a); rlock(b);
> runlock(b); runlock(a); rlock(b); rlock(a);" is something we can
> choose. We do not have to forbid this sequence -- we can choose
> whether or not we allow this sequence.
I'm utterly confused now; I never argued that it would get deadlocks;
and I said I choose to not allow it from a lockdep pov. What else do you
want?
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/8] lockdep: fix recursive read lock validation
2008-04-29 16:15 ` Peter Zijlstra
@ 2008-04-29 16:29 ` Bart Van Assche
2008-04-29 17:04 ` Peter Zijlstra
0 siblings, 1 reply; 27+ messages in thread
From: Bart Van Assche @ 2008-04-29 16:29 UTC (permalink / raw)
To: Peter Zijlstra
Cc: ego, linux-kernel, Zdenek Kabelac, Oleg Nesterov, Heiko Carstens,
Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
On Tue, Apr 29, 2008 at 6:15 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> > Or: whether or not to allow a sequence like "rlock(a); rlock(b);
> > runlock(b); runlock(a); rlock(b); rlock(a);" is something we can
> > choose. We do not have to forbid this sequence -- we can choose
> > whether or not we allow this sequence.
>
> I'm utterly confused now; I never argued that it would get deadlocks;
> and I said I choose to not allow it from a lockdep pov. What else do you
> want?
So we both agree that the statement in the original e-mail (by Gautham
R Shenoy) is wrong ? The original e-mail stated that obtaining reader
locks in an inconsistent order is wrong.
Bart.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 5/8] cpu: cpu-hotplug deadlock
2008-04-29 15:09 ` Peter Zijlstra
@ 2008-04-29 16:45 ` Oleg Nesterov
2008-04-29 17:31 ` Peter Zijlstra
0 siblings, 1 reply; 27+ messages in thread
From: Oleg Nesterov @ 2008-04-29 16:45 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Gautham R Shenoy, linux-kernel, Zdenek Kabelac, Heiko Carstens,
Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
On 04/29, Peter Zijlstra wrote:
>
> The only thing that changed is that the mutex is not held; so what we
> change is:
>
> LOCK
>
> ... do the full hotplug thing ...
>
> UNLOCK
>
> into
>
> LOCK
> set state
> UNLOCK
>
> ... do the full hotplug thing ...
>
> LOCK
> unset state
> UNLOCK
>
> So that the lock isn't held over the hotplug operation.
Well, yes I see, but... Ugh, I have a blind spot here ;)
why this makes any difference from the semantics POV ? why it is bad
to hold the mutex throughout the "full hotplug thing" ?
> > (actually, since write-locks should be very rare, perhaps we don't need
> > 2 wait_queues ?)
>
> And just let them race the wakeup race, sure that might work. Gautham
> even pointed out that it never happens because there is another
> exclusive lock on the write path.
>
> But you say you like that it doesn't depend on that anymore - me too ;-)
Yes, but let's suppose we have the single wait_queue, this doesn't make
any difference from the correctness POV, no?
To clarify: I am not arguing! This makes sense, but I'm asking to be sure
I didn't miss a subtle reason why we "really" need 2 wait_queues.
Also. Let's suppose we have both read- and write- waiters, and cpu_hotplug_done()
does wake_up(writer_queue). It is possible that another reader comes and does
get_online_cpus() and increments .refcount first. After that, cpu_hotplug
is "opened" for the read-lock, but other read-waiters continue to sleep, and
the final put_online_cpus() wakes up write-waiters only. Yes, this all is
correct, but not "symmetrical", and leads to the question "do we really need
2 wait_queues" again.
Oleg.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/8] lockdep: fix recursive read lock validation
2008-04-29 16:29 ` Bart Van Assche
@ 2008-04-29 17:04 ` Peter Zijlstra
2008-04-29 17:45 ` Bart Van Assche
0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2008-04-29 17:04 UTC (permalink / raw)
To: Bart Van Assche
Cc: ego, linux-kernel, Zdenek Kabelac, Oleg Nesterov, Heiko Carstens,
Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
On Tue, 2008-04-29 at 18:29 +0200, Bart Van Assche wrote:
> On Tue, Apr 29, 2008 at 6:15 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >
> > > Or: whether or not to allow a sequence like "rlock(a); rlock(b);
> > > runlock(b); runlock(a); rlock(b); rlock(a);" is something we can
> > > choose. We do not have to forbid this sequence -- we can choose
> > > whether or not we allow this sequence.
> >
> > I'm utterly confused now; I never argued that it would get deadlocks;
> > and I said I choose to not allow it from a lockdep pov. What else do you
> > want?
>
> So we both agree that the statement in the original e-mail (by Gautham
> R Shenoy) is wrong ? The original e-mail stated that obtaining reader
> locks in an inconsistent order is wrong.
I think the critical part is:
> It really is invalid when considered against write locks.
Aside from that it just states that inversion of lock order will be
treated as invalid - even for read locks.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 5/8] cpu: cpu-hotplug deadlock
2008-04-29 16:45 ` Oleg Nesterov
@ 2008-04-29 17:31 ` Peter Zijlstra
0 siblings, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2008-04-29 17:31 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Gautham R Shenoy, linux-kernel, Zdenek Kabelac, Heiko Carstens,
Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
On Tue, 2008-04-29 at 20:45 +0400, Oleg Nesterov wrote:
> On 04/29, Peter Zijlstra wrote:
> >
> > The only thing that changed is that the mutex is not held; so what we
> > change is:
> >
> > LOCK
> >
> > ... do the full hotplug thing ...
> >
> > UNLOCK
> >
> > into
> >
> > LOCK
> > set state
> > UNLOCK
> >
> > ... do the full hotplug thing ...
> >
> > LOCK
> > unset state
> > UNLOCK
> >
> > So that the lock isn't held over the hotplug operation.
>
> Well, yes I see, but... Ugh, I have a blind spot here ;)
>
> why this makes any difference from the semantics POV ? why it is bad
> to hold the mutex throughout the "full hotplug thing" ?
Darn, now you make me think ;-)
Ok, I think I have it; the crux of the matter is that we want
reader-in-writer recursion for the cpu hotplug lock.
So we want:
cpu_hotplug.write_lock()
A.lock()
cpu_hotplug.read_lock()
When - as it was - the write lock is implemented as keeping the lock
internal lock (the lock guarding the lock state) locked over the entire
write section, and the read lock side is, LOCK; change state; UNLOCK,
the above will result in a deadlock like:
C.lock
A.lock
C.lock
By making both the read and write side work like:
LOCK
change state
UNLOCK
the internal lock will not deadlock.
So what I did was promote cpu_hotplug to a full lock that handled
read-in-read and read-in-write recursion and made cpu_hotplug.lock the
lock internal lock.
> > > (actually, since write-locks should be very rare, perhaps we don't need
> > > 2 wait_queues ?)
> >
> > And just let them race the wakeup race, sure that might work. Gautham
> > even pointed out that it never happens because there is another
> > exclusive lock on the write path.
> >
> > But you say you like that it doesn't depend on that anymore - me too ;-)
>
> Yes, but let's suppose we have the single wait_queue, this doesn't make
> any difference from the correctness POV, no?
>
> To clarify: I am not arguing! This makes sense, but I'm asking to be sure
> I didn't miss a subtle reason why we "really" need 2 wait_queues.
>
> Also. Let's suppose we have both read- and write- waiters, and cpu_hotplug_done()
> does wake_up(writer_queue). It is possible that another reader comes and does
> get_online_cpus() and increments .refcount first. After that, cpu_hotplug
> is "opened" for the read-lock, but other read-waiters continue to sleep, and
> the final put_online_cpus() wakes up write-waiters only. Yes, this all is
> correct, but not "symmetrical", and leads to the question "do we really need
> 2 wait_queues" again.
I don't think we do. It just didn't occur to me to pile read-waiters and
write-waiters on the same waitqueue.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/8] lockdep: fix recursive read lock validation
2008-04-29 17:04 ` Peter Zijlstra
@ 2008-04-29 17:45 ` Bart Van Assche
2008-04-29 17:58 ` Peter Zijlstra
0 siblings, 1 reply; 27+ messages in thread
From: Bart Van Assche @ 2008-04-29 17:45 UTC (permalink / raw)
To: Peter Zijlstra
Cc: ego, linux-kernel, Zdenek Kabelac, Oleg Nesterov, Heiko Carstens,
Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
On Tue, Apr 29, 2008 at 7:04 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> I think the critical part is:
>
> > It really is invalid when considered against write locks.
>
> Aside from that it just states that inversion of lock order will be
> treated as invalid - even for read locks.
Inversion of recursive reader locks can't trigger a deadlock, even
when considered against write locks, as long as the rules are followed
I posted earlier. I invite anyone to come up with an example that
proves me wrong.
Let's return to the starting point of this discussion. The patch at
the start of this thread forbids inverting the lock order of reader
locks. Why forbid this if it can't trigger a deadlock? At least in
user space code, it's quite easy to trigger such an inversion. In
sufficiently complex code where functions call other functions while
holding a reader lock, reader lock inversion can be hard to avoid.
Bart.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/8] lockdep: fix recursive read lock validation
2008-04-29 17:45 ` Bart Van Assche
@ 2008-04-29 17:58 ` Peter Zijlstra
0 siblings, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2008-04-29 17:58 UTC (permalink / raw)
To: Bart Van Assche
Cc: ego, linux-kernel, Zdenek Kabelac, Oleg Nesterov, Heiko Carstens,
Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
On Tue, 2008-04-29 at 19:45 +0200, Bart Van Assche wrote:
> Let's return to the starting point of this discussion. The patch at
> the start of this thread forbids inverting the lock order of reader
> locks. Why forbid this if it can't trigger a deadlock? At least in
> user space code, it's quite easy to trigger such an inversion. In
> sufficiently complex code where functions call other functions while
> holding a reader lock, reader lock inversion can be hard to avoid.
Sure, but the whole point is: lockdep cannot track those rules. _that_
is why I'm choosing to forbid them.
Also, on Linux only rwlock_t allows reader recursion, and that lock type
really ought to die, all users should convert to RCU (if possible).
So I'm not thinking placing this restriction on kernel code is going to
be painful.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 5/8] cpu: cpu-hotplug deadlock
2008-04-29 14:33 ` Oleg Nesterov
2008-04-29 15:09 ` Peter Zijlstra
@ 2008-04-30 5:37 ` Gautham R Shenoy
2008-04-30 11:43 ` Oleg Nesterov
1 sibling, 1 reply; 27+ messages in thread
From: Gautham R Shenoy @ 2008-04-30 5:37 UTC (permalink / raw)
To: Oleg Nesterov
Cc: linux-kernel, Zdenek Kabelac, Peter Zijlstra, Heiko Carstens,
Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
On Tue, Apr 29, 2008 at 06:33:50PM +0400, Oleg Nesterov wrote:
> On 04/29, Gautham R Shenoy wrote:
> >
> > cpu_hotplug.mutex is basically a lock-internal lock; but by keeping it locked
> > over the 'write' section (cpu_hotplug_begin/done) a lock inversion happens when
> > some of the write side code calls into code that would otherwise take a
> > read lock.
> >
> > And it so happens that read-in-write recursion is expressly permitted.
> >
> > Fix this by turning cpu_hotplug into a proper stand alone unfair reader/writer
> > lock that allows reader-in-reader and reader-in-writer recursion.
>
> While the patch itself is very clean and understandable, I can't understand
> the changelog ;)
>
> Could you explain what is the semantics difference? The current code allows
> read-in-write recursion too.
How about the following ?
----------------------------------------------------------------------------
cpu_hotplug: split the cpu_hotplug.lock mutex into a spinlock and a waitqueue.
Consider the following callpath:
get_online_cpus()
.
.
.
some_fn()--> takes some_lock; /* lock_acquire(some_lock) [1] */
.
.
.
get_online_cpus(); /* lock_acquire(cpu_hotplug.lock) [2] */
and on the cpu_hotplug callback path, we have
cpu_hotplug.lock /* lock_acquire(cpu_hotplug.lock) [3]*/
.
.
.
some_other_fn() ----> take some_lock /* lock_acquire(some_lock) [4]*/
lockdep will treat this as an ABBA possibility since on the write path we
currently hold the lock so that the readers are blocked on it.
However, the write path won't be activated until the refcount goes
to zero. Which means that lockdep yelling at [2] is a false positive.
Hence we need to split the mutex into a separate spin_lock under which
the refcount can be incremented, and introduce a waitqueue where the readers
can block during an ongoing cpu-hotplug operation, and provide lockdep
annotation only when the reader is about to block.
----------------------------------------------------------------------------
>
> The only difference I can see is that now cpu_hotplug_begin() doesn't rely
> on cpu_add_remove_lock any longer (currently the caller must hold this lock),
> but this (good) change is not documented.
>
> > static void cpu_hotplug_done(void)
> > {
> > + spin_lock(&cpu_hotplug.lock);
> > cpu_hotplug.active_writer = NULL;
> > - mutex_unlock(&cpu_hotplug.lock);
> > + if (!list_empty(&cpu_hotplug.writer_queue.task_list))
>
> waitqueue_active() ?
>
> > + wake_up(&cpu_hotplug.writer_queue);
> > + else
> > + wake_up_all(&cpu_hotplug.reader_queue);
>
> Please note that wake_up() and wake_up_all() doesn't differ here, because
> cpu_hotplug_begin() use prepare_to_wait(), not prepare_to_wait_exclusive().
> I'd suggest to change cpu_hotplug_begin(), and use wake_up() for both
> cases.
>
> (actually, since write-locks should be very rare, perhaps we don't need
> 2 wait_queues ?)
>
> Oleg.
--
Thanks and Regards
gautham
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 5/8] cpu: cpu-hotplug deadlock
2008-04-30 5:37 ` Gautham R Shenoy
@ 2008-04-30 11:43 ` Oleg Nesterov
0 siblings, 0 replies; 27+ messages in thread
From: Oleg Nesterov @ 2008-04-30 11:43 UTC (permalink / raw)
To: Gautham R Shenoy
Cc: linux-kernel, Zdenek Kabelac, Peter Zijlstra, Heiko Carstens,
Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Srivatsa Vaddagiri
On 04/30, Gautham R Shenoy wrote:
>
> On Tue, Apr 29, 2008 at 06:33:50PM +0400, Oleg Nesterov wrote:
> >
> > Could you explain what is the semantics difference? The current code allows
> > read-in-write recursion too.
>
> How about the following ?
> ----------------------------------------------------------------------------
> cpu_hotplug: split the cpu_hotplug.lock mutex into a spinlock and a waitqueue.
>
> Consider the following callpath:
> get_online_cpus()
> .
> .
> .
> some_fn()--> takes some_lock; /* lock_acquire(some_lock) [1] */
> .
> .
> .
> get_online_cpus(); /* lock_acquire(cpu_hotplug.lock) [2] */
>
> and on the cpu_hotplug callback path, we have
> cpu_hotplug.lock /* lock_acquire(cpu_hotplug.lock) [3]*/
> .
> .
> .
> some_other_fn() ----> take some_lock /* lock_acquire(some_lock) [4]*/
>
> lockdep will treat this as an ABBA possibility since on the write path we
> currently hold the lock so that the readers are blocked on it.
>
> However, the write path won't be activated until the refcount goes
> to zero. Which means that lockdep yelling at [2] is a false positive.
Thanks...
So, we need these complications to avoid a false positive from lockdep...
I wonder if it is possible to simply hide this internal mutex from lockdep.
OK, at least get_online_cpus() doesn't suffer from the performance pov,
and lockdep is really important. (I didn't actually understand this until
it found the bug introduced by me ;)
BTW, not that it matters, but get_online_cpus/put_online_cpus can check
"cpu_hotplug.active_writer == current" lockless.
Thanks!
Oleg.
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2008-04-30 11:42 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-04-29 12:56 [PATCH 0/8] CPU-Hotplug: Fix CPU-Hotplug <--> cpufreq locking dependency Gautham R Shenoy
2008-04-29 12:57 ` [PATCH 1/8] lockdep: fix recursive read lock validation Gautham R Shenoy
2008-04-29 13:16 ` Bart Van Assche
2008-04-29 14:57 ` Peter Zijlstra
2008-04-29 15:03 ` Bart Van Assche
2008-04-29 15:15 ` Peter Zijlstra
2008-04-29 16:03 ` Bart Van Assche
2008-04-29 16:15 ` Peter Zijlstra
2008-04-29 16:29 ` Bart Van Assche
2008-04-29 17:04 ` Peter Zijlstra
2008-04-29 17:45 ` Bart Van Assche
2008-04-29 17:58 ` Peter Zijlstra
2008-04-29 12:58 ` [PATCH 2/8] lockdep: reader-in-writer recursion Gautham R Shenoy
2008-04-29 13:00 ` [PATCH 3/8] lockdep: fix fib_hash softirq inversion Gautham R Shenoy
2008-04-29 14:45 ` Peter Zijlstra
2008-04-29 13:01 ` [PATCH 4/8] net: af_netlink: deadlock Gautham R Shenoy
2008-04-29 13:19 ` Hans Reiser, reiserfs developer linux-os (Dick Johnson)
2008-04-29 13:02 ` [PATCH 5/8] cpu: cpu-hotplug deadlock Gautham R Shenoy
2008-04-29 14:33 ` Oleg Nesterov
2008-04-29 15:09 ` Peter Zijlstra
2008-04-29 16:45 ` Oleg Nesterov
2008-04-29 17:31 ` Peter Zijlstra
2008-04-30 5:37 ` Gautham R Shenoy
2008-04-30 11:43 ` Oleg Nesterov
2008-04-29 13:02 ` [PATCH 6/8] lockdep: annotate cpu_hotplug Gautham R Shenoy
2008-04-29 13:03 ` [PATCH 7/8] cpu_hotplug: Introduce try_get_online_cpus() Gautham R Shenoy
2008-04-29 13:05 ` [PATCH 8/8] cpufreq: Nest down_write/read(cpufreq_rwsem) within get_online_cpus()/put_online_cpus() Gautham R Shenoy