linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH tip/core/urgent 0/10] v2: Fix RCU lockdep splats
@ 2010-05-01  0:25 Paul E. McKenney
  2010-05-01  0:26 ` [PATCH tip/core/urgent 01/10] rcu: v2: optionally leave lockdep enabled after RCU lockdep splat Paul E. McKenney
                   ` (9 more replies)
  0 siblings, 10 replies; 14+ messages in thread
From: Paul E. McKenney @ 2010-05-01  0:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet

Hello!

This patchset contains ten fixes for various lockdep splats.  The first
two sets are repostings/revisions, the rest new.  The new patches have
all been posted to LKML, but this is the first time for inclusion.

o	rcu: v2: optionally leave lockdep enabled after RCU lockdep splat
	This is a repost that makes the one-splat-per-boot the default,
	but allows those who want multiple splats to get this behavior
	via a new CONFIG_PROVE_RCU_REPEATEDLY configuration parameter.
	(Original from Lai Jiangshan.)

o	KEYS: Fix an RCU warning
	KEYS: Fix an RCU warning in the reading of user keys
	Fixes for RCU-lockdep splats from David Howells for
	security/keys.  Repost of http://lkml.org/lkml/2010/4/22/411.

o	cgroup: Fix an RCU warning in cgroup_path()
	cgroup: Fix an RCU warning in alloc_css_id()
	sched: Fix an RCU warning in print_task()
	cgroup: Check task_lock in task_subsys_state()
	Fixes for new RCU-lockdep splats in cgroups and sched from
	Li Zefan.

o	memcg: css_id() must be called under rcu_read_lock()
	Fixes for new RCU-lockdep splats in memcg from Kamazawa Hiroyuki.

o	blk-cgroup: Fix RCU correctness warning in cfq_init_queue()
	Fix for new RCU-lockdep splat in I/O scheduler from Vivek Goyal.

o	vfs: fix RCU-lockdep false positive due to /proc access
	Fix for new RCU-lockdep splat from fdtable.h, with much
	debugging assist from Eric Dumazet.


 b/block/cfq-iosched.c          |    2 ++
 b/include/linux/cgroup.h       |    1 +
 b/include/linux/fdtable.h      |    3 ++-
 b/include/linux/rcupdate.h     |   15 +++++++++++----
 b/kernel/cgroup.c              |   12 +++++++++---
 b/kernel/lockdep.c             |    2 ++
 b/kernel/sched_debug.c         |    2 ++
 b/lib/Kconfig.debug            |   12 ++++++++++++
 b/mm/memcontrol.c              |   21 ++++++++++++++++-----
 b/security/keys/request_key.c  |   13 ++++++++-----
 b/security/keys/user_defined.c |    3 ++-
 kernel/cgroup.c                |    4 ++--
 12 files changed, 69 insertions(+), 21 deletions(-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH tip/core/urgent 01/10] rcu: v2: optionally leave lockdep enabled after RCU lockdep splat
  2010-05-01  0:25 [PATCH tip/core/urgent 0/10] v2: Fix RCU lockdep splats Paul E. McKenney
@ 2010-05-01  0:26 ` Paul E. McKenney
  2010-05-01  0:26 ` [PATCH tip/core/urgent 02/10] KEYS: Fix an RCU warning Paul E. McKenney
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2010-05-01  0:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, Paul E. McKenney

From: Lai Jiangshan <laijs@cn.fujitsu.com>

There is no need to disable lockdep after an RCU lockdep splat, so
remove the debug_lockdeps_off() from lockdep_rcu_dereference().
To avoid repeated lockdep splats, use a static variable in the
inlined rcu_dereference_check() and rcu_dereference_protected()
macros so that a given instance splats only once, but so that
multiple instances can be detected per boot.

This is controlled by a new config variable CONFIG_PROVE_RCU_REPEATEDLY,
which is disabled by default.

Requested-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/rcupdate.h |   15 +++++++++++----
 kernel/lockdep.c         |    2 ++
 lib/Kconfig.debug        |   12 ++++++++++++
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 07db2fe..ec9ab49 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -190,6 +190,15 @@ static inline int rcu_read_lock_sched_held(void)
 
 #ifdef CONFIG_PROVE_RCU
 
+#define __do_rcu_dereference_check(c)					\
+	do {								\
+		static bool __warned;					\
+		if (debug_lockdep_rcu_enabled() && !__warned && !(c)) {	\
+			__warned = true;				\
+			lockdep_rcu_dereference(__FILE__, __LINE__);	\
+		}							\
+	} while (0)
+
 /**
  * rcu_dereference_check - rcu_dereference with debug checking
  * @p: The pointer to read, prior to dereferencing
@@ -219,8 +228,7 @@ static inline int rcu_read_lock_sched_held(void)
  */
 #define rcu_dereference_check(p, c) \
 	({ \
-		if (debug_lockdep_rcu_enabled() && !(c)) \
-			lockdep_rcu_dereference(__FILE__, __LINE__); \
+		__do_rcu_dereference_check(c); \
 		rcu_dereference_raw(p); \
 	})
 
@@ -237,8 +245,7 @@ static inline int rcu_read_lock_sched_held(void)
  */
 #define rcu_dereference_protected(p, c) \
 	({ \
-		if (debug_lockdep_rcu_enabled() && !(c)) \
-			lockdep_rcu_dereference(__FILE__, __LINE__); \
+		__do_rcu_dereference_check(c); \
 		(p); \
 	})
 
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 2594e1c..73747b7 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -3801,8 +3801,10 @@ void lockdep_rcu_dereference(const char *file, const int line)
 {
 	struct task_struct *curr = current;
 
+#ifndef CONFIG_PROVE_RCU_REPEATEDLY
 	if (!debug_locks_off())
 		return;
+#endif /* #ifdef CONFIG_PROVE_RCU_REPEATEDLY */
 	printk("\n===================================================\n");
 	printk(  "[ INFO: suspicious rcu_dereference_check() usage. ]\n");
 	printk(  "---------------------------------------------------\n");
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 935248b..94090b4 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -512,6 +512,18 @@ config PROVE_RCU
 
 	 Say N if you are unsure.
 
+config PROVE_RCU_REPEATEDLY
+	bool "RCU debugging: don't disable PROVE_RCU on first splat"
+	depends on PROVE_RCU
+	default n
+	help
+	 By itself, PROVE_RCU will disable checking upon issuing the
+	 first warning (or "splat").  This feature prevents such
+	 disabling, allowing multiple RCU-lockdep warnings to be printed
+	 on a single reboot.
+
+	 Say N if you are unsure.
+
 config LOCKDEP
 	bool
 	depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
-- 
1.7.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH tip/core/urgent 02/10] KEYS: Fix an RCU warning
  2010-05-01  0:25 [PATCH tip/core/urgent 0/10] v2: Fix RCU lockdep splats Paul E. McKenney
  2010-05-01  0:26 ` [PATCH tip/core/urgent 01/10] rcu: v2: optionally leave lockdep enabled after RCU lockdep splat Paul E. McKenney
@ 2010-05-01  0:26 ` Paul E. McKenney
  2010-05-01  0:26 ` [PATCH tip/core/urgent 03/10] KEYS: Fix an RCU warning in the reading of user keys Paul E. McKenney
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2010-05-01  0:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, Paul E. McKenney

From: David Howells <dhowells@redhat.com>

Fix the following RCU warning:

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
security/keys/request_key.c:116 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by keyctl/5372:
 #0:  (key_types_sem){.+.+.+}, at: [<ffffffff811a4e3d>] key_type_lookup+0x1c/0x70

stack backtrace:
Pid: 5372, comm: keyctl Not tainted 2.6.34-rc3-cachefs #150
Call Trace:
 [<ffffffff810515f8>] lockdep_rcu_dereference+0xaa/0xb2
 [<ffffffff811a9220>] call_sbin_request_key+0x156/0x2b6
 [<ffffffff811a4c66>] ? __key_instantiate_and_link+0xb1/0xdc
 [<ffffffff811a4cd3>] ? key_instantiate_and_link+0x42/0x5f
 [<ffffffff811a96b8>] ? request_key_auth_new+0x17b/0x1f3
 [<ffffffff811a8e00>] ? request_key_and_link+0x271/0x400
 [<ffffffff810aba6f>] ? kmem_cache_alloc+0xe1/0x118
 [<ffffffff811a8f1a>] request_key_and_link+0x38b/0x400
 [<ffffffff811a7b72>] sys_request_key+0xf7/0x14a
 [<ffffffff81052227>] ? trace_hardirqs_on_caller+0x10c/0x130
 [<ffffffff81393f5c>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b

This was caused by doing:

	[root@andromeda ~]# keyctl newring fred @s
	539196288
	[root@andromeda ~]# keyctl request2 user a a 539196288
	request_key: Required key not available

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 security/keys/request_key.c |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/security/keys/request_key.c b/security/keys/request_key.c
index 03fe63e..ea97c31 100644
--- a/security/keys/request_key.c
+++ b/security/keys/request_key.c
@@ -68,7 +68,8 @@ static int call_sbin_request_key(struct key_construction *cons,
 {
 	const struct cred *cred = current_cred();
 	key_serial_t prkey, sskey;
-	struct key *key = cons->key, *authkey = cons->authkey, *keyring;
+	struct key *key = cons->key, *authkey = cons->authkey, *keyring,
+		*session;
 	char *argv[9], *envp[3], uid_str[12], gid_str[12];
 	char key_str[12], keyring_str[3][12];
 	char desc[20];
@@ -112,10 +113,12 @@ static int call_sbin_request_key(struct key_construction *cons,
 	if (cred->tgcred->process_keyring)
 		prkey = cred->tgcred->process_keyring->serial;
 
-	if (cred->tgcred->session_keyring)
-		sskey = rcu_dereference(cred->tgcred->session_keyring)->serial;
-	else
-		sskey = cred->user->session_keyring->serial;
+	rcu_read_lock();
+	session = rcu_dereference(cred->tgcred->session_keyring);
+	if (!session)
+		session = cred->user->session_keyring;
+	sskey = session->serial;
+	rcu_read_unlock();
 
 	sprintf(keyring_str[2], "%d", sskey);
 
-- 
1.7.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH tip/core/urgent 03/10] KEYS: Fix an RCU warning in the reading of user keys
  2010-05-01  0:25 [PATCH tip/core/urgent 0/10] v2: Fix RCU lockdep splats Paul E. McKenney
  2010-05-01  0:26 ` [PATCH tip/core/urgent 01/10] rcu: v2: optionally leave lockdep enabled after RCU lockdep splat Paul E. McKenney
  2010-05-01  0:26 ` [PATCH tip/core/urgent 02/10] KEYS: Fix an RCU warning Paul E. McKenney
@ 2010-05-01  0:26 ` Paul E. McKenney
  2010-05-01  0:26 ` [PATCH tip/core/urgent 04/10] cgroup: Fix an RCU warning in cgroup_path() Paul E. McKenney
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2010-05-01  0:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, Paul E. McKenney

From: David Howells <dhowells@redhat.com>

Fix an RCU warning in the reading of user keys:

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
security/keys/user_defined.c:202 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by keyctl/3637:
 #0:  (&key->sem){+++++.}, at: [<ffffffff811a80ae>] keyctl_read_key+0x9c/0xcf

stack backtrace:
Pid: 3637, comm: keyctl Not tainted 2.6.34-rc5-cachefs #18
Call Trace:
 [<ffffffff81051f6c>] lockdep_rcu_dereference+0xaa/0xb2
 [<ffffffff811aa55f>] user_read+0x47/0x91
 [<ffffffff811a80be>] keyctl_read_key+0xac/0xcf
 [<ffffffff811a8a06>] sys_keyctl+0x75/0xb7
 [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 security/keys/user_defined.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/security/keys/user_defined.c b/security/keys/user_defined.c
index 7c687d5..e9aa079 100644
--- a/security/keys/user_defined.c
+++ b/security/keys/user_defined.c
@@ -199,7 +199,8 @@ long user_read(const struct key *key, char __user *buffer, size_t buflen)
 	struct user_key_payload *upayload;
 	long ret;
 
-	upayload = rcu_dereference(key->payload.data);
+	upayload = rcu_dereference_protected(
+		key->payload.data, rwsem_is_locked(&((struct key *)key)->sem));
 	ret = upayload->datalen;
 
 	/* we can return the data as is */
-- 
1.7.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH tip/core/urgent 04/10] cgroup: Fix an RCU warning in cgroup_path()
  2010-05-01  0:25 [PATCH tip/core/urgent 0/10] v2: Fix RCU lockdep splats Paul E. McKenney
                   ` (2 preceding siblings ...)
  2010-05-01  0:26 ` [PATCH tip/core/urgent 03/10] KEYS: Fix an RCU warning in the reading of user keys Paul E. McKenney
@ 2010-05-01  0:26 ` Paul E. McKenney
  2010-05-01  0:26 ` [PATCH tip/core/urgent 05/10] cgroup: Fix an RCU warning in alloc_css_id() Paul E. McKenney
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2010-05-01  0:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, Li Zefan, Paul E. McKenney

From: Li Zefan <lizf@cn.fujitsu.com>

with CONFIG_PROVE_RCU=y, a warning can be triggered:

  # mount -t cgroup -o debug xxx /mnt
  # cat /proc/$$/cgroup

...
kernel/cgroup.c:1649 invoked rcu_dereference_check() without protection!
...

This is a false-positive, because cgroup_path() can be called
with either rcu_read_lock() held or cgroup_mutex held.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/cgroup.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e2769e1..4ca928d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1646,7 +1646,9 @@ static inline struct cftype *__d_cft(struct dentry *dentry)
 int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
 {
 	char *start;
-	struct dentry *dentry = rcu_dereference(cgrp->dentry);
+	struct dentry *dentry = rcu_dereference_check(cgrp->dentry,
+						      rcu_read_lock_held() ||
+						      cgroup_lock_is_held());
 
 	if (!dentry || cgrp == dummytop) {
 		/*
@@ -1662,13 +1664,17 @@ int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
 	*--start = '\0';
 	for (;;) {
 		int len = dentry->d_name.len;
+
 		if ((start -= len) < buf)
 			return -ENAMETOOLONG;
-		memcpy(start, cgrp->dentry->d_name.name, len);
+		memcpy(start, dentry->d_name.name, len);
 		cgrp = cgrp->parent;
 		if (!cgrp)
 			break;
-		dentry = rcu_dereference(cgrp->dentry);
+
+		dentry = rcu_dereference_check(cgrp->dentry,
+					       rcu_read_lock_held() ||
+					       cgroup_lock_is_held());
 		if (!cgrp->parent)
 			continue;
 		if (--start < buf)
-- 
1.7.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH tip/core/urgent 05/10] cgroup: Fix an RCU warning in alloc_css_id()
  2010-05-01  0:25 [PATCH tip/core/urgent 0/10] v2: Fix RCU lockdep splats Paul E. McKenney
                   ` (3 preceding siblings ...)
  2010-05-01  0:26 ` [PATCH tip/core/urgent 04/10] cgroup: Fix an RCU warning in cgroup_path() Paul E. McKenney
@ 2010-05-01  0:26 ` Paul E. McKenney
  2010-05-01  0:26 ` [PATCH tip/core/urgent 06/10] sched: Fix an RCU warning in print_task() Paul E. McKenney
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2010-05-01  0:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, Li Zefan, Paul E. McKenney

From: Li Zefan <lizf@cn.fujitsu.com>

With CONFIG_PROVE_RCU=y, a warning can be triggered:

  # mount -t cgroup -o memory xxx /mnt
  # mkdir /mnt/0

...
kernel/cgroup.c:4442 invoked rcu_dereference_check() without protection!
...

This is a false-positive. It's safe to directly access parent_css->id.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/cgroup.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 4ca928d..3a53c77 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4561,13 +4561,13 @@ static int alloc_css_id(struct cgroup_subsys *ss, struct cgroup *parent,
 {
 	int subsys_id, i, depth = 0;
 	struct cgroup_subsys_state *parent_css, *child_css;
-	struct css_id *child_id, *parent_id = NULL;
+	struct css_id *child_id, *parent_id;
 
 	subsys_id = ss->subsys_id;
 	parent_css = parent->subsys[subsys_id];
 	child_css = child->subsys[subsys_id];
-	depth = css_depth(parent_css) + 1;
 	parent_id = parent_css->id;
+	depth = parent_id->depth;
 
 	child_id = get_new_cssid(ss, depth);
 	if (IS_ERR(child_id))
-- 
1.7.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH tip/core/urgent 06/10] sched: Fix an RCU warning in print_task()
  2010-05-01  0:25 [PATCH tip/core/urgent 0/10] v2: Fix RCU lockdep splats Paul E. McKenney
                   ` (4 preceding siblings ...)
  2010-05-01  0:26 ` [PATCH tip/core/urgent 05/10] cgroup: Fix an RCU warning in alloc_css_id() Paul E. McKenney
@ 2010-05-01  0:26 ` Paul E. McKenney
  2010-05-01  0:26 ` [PATCH tip/core/urgent 07/10] cgroup: Check task_lock in task_subsys_state() Paul E. McKenney
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2010-05-01  0:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, Li Zefan, Paul E. McKenney

From: Li Zefan <lizf@cn.fujitsu.com>

With CONFIG_PROVE_RCU=y, a warning can be triggered:

  $ cat /proc/sched_debug

...
kernel/cgroup.c:1649 invoked rcu_dereference_check() without protection!
...

Both cgroup_path() and task_group() should be called with either
rcu_read_lock or cgroup_mutex held.

The rcu_dereference_check() does include cgroup_lock_is_held(), so we
know that this lock is not held.  Therefore, in a CONFIG_PREEMPT kernel,
to say nothing of a CONFIG_PREEMPT_RT kernel, the original code could
have ended up copying a string out of the freelist.

This patch inserts RCU read-side primitives needed to prevent this
scenario.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/sched_debug.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c
index 9b49db1..19be00b 100644
--- a/kernel/sched_debug.c
+++ b/kernel/sched_debug.c
@@ -114,7 +114,9 @@ print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
 	{
 		char path[64];
 
+		rcu_read_lock();
 		cgroup_path(task_group(p)->css.cgroup, path, sizeof(path));
+		rcu_read_unlock();
 		SEQ_printf(m, " %s", path);
 	}
 #endif
-- 
1.7.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH tip/core/urgent 07/10] cgroup: Check task_lock in task_subsys_state()
  2010-05-01  0:25 [PATCH tip/core/urgent 0/10] v2: Fix RCU lockdep splats Paul E. McKenney
                   ` (5 preceding siblings ...)
  2010-05-01  0:26 ` [PATCH tip/core/urgent 06/10] sched: Fix an RCU warning in print_task() Paul E. McKenney
@ 2010-05-01  0:26 ` Paul E. McKenney
  2010-05-01  0:26 ` [PATCH tip/core/urgent 08/10] memcg: css_id() must be called under rcu_read_lock() Paul E. McKenney
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2010-05-01  0:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, Li Zefan, Paul E. McKenney

From: Li Zefan <lizf@cn.fujitsu.com>

Expand task_subsys_state()'s rcu_dereference_check() to include the full
locking rule as documented in Documentation/cgroups/cgroups.txt by adding
a check for task->alloc_lock being held.

This fixes an RCU false positive when resuming from suspend. The warning
comes from freezer cgroup in cgroup_freezing_or_frozen().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/cgroup.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index b8ad1ea..8f78073 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -530,6 +530,7 @@ static inline struct cgroup_subsys_state *task_subsys_state(
 {
 	return rcu_dereference_check(task->cgroups->subsys[subsys_id],
 				     rcu_read_lock_held() ||
+				     lockdep_is_held(&task->alloc_lock) ||
 				     cgroup_lock_is_held());
 }
 
-- 
1.7.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH tip/core/urgent 08/10] memcg: css_id() must be called under rcu_read_lock()
  2010-05-01  0:25 [PATCH tip/core/urgent 0/10] v2: Fix RCU lockdep splats Paul E. McKenney
                   ` (6 preceding siblings ...)
  2010-05-01  0:26 ` [PATCH tip/core/urgent 07/10] cgroup: Check task_lock in task_subsys_state() Paul E. McKenney
@ 2010-05-01  0:26 ` Paul E. McKenney
  2010-05-01  0:26 ` [PATCH tip/core/urgent 09/10] blk-cgroup: Fix RCU correctness warning in cfq_init_queue() Paul E. McKenney
  2010-05-01  0:26 ` [PATCH tip/core/urgent 10/10] vfs: fix RCU-lockdep false positive due to /proc access Paul E. McKenney
  9 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2010-05-01  0:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, Paul E. McKenney, Daisuke Nishimura, Balbir Singh,
	KAMEZAWA Hiroyuki

This patch fixes task_in_mem_cgroup(), mem_cgroup_uncharge_swapcache(),
mem_cgroup_move_swap_account(), and is_target_pte_for_mc() to protect
calls to css_id().  An additional RCU lockdep splat was reported for
memcg_oom_wake_function(), however, this function is not yet in
mainline as of 2.6.34-rc5.

Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 mm/memcontrol.c |   21 ++++++++++++++++-----
 1 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f4ede99..e06490d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -811,10 +811,12 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
 	 * enabled in "curr" and "curr" is a child of "mem" in *cgroup*
 	 * hierarchy(even if use_hierarchy is disabled in "mem").
 	 */
+	rcu_read_lock();
 	if (mem->use_hierarchy)
 		ret = css_is_ancestor(&curr->css, &mem->css);
 	else
 		ret = (curr == mem);
+	rcu_read_unlock();
 	css_put(&curr->css);
 	return ret;
 }
@@ -2312,7 +2314,9 @@ mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout)
 
 	/* record memcg information */
 	if (do_swap_account && swapout && memcg) {
+		rcu_read_lock();
 		swap_cgroup_record(ent, css_id(&memcg->css));
+		rcu_read_unlock();
 		mem_cgroup_get(memcg);
 	}
 	if (swapout && memcg)
@@ -2369,8 +2373,10 @@ static int mem_cgroup_move_swap_account(swp_entry_t entry,
 {
 	unsigned short old_id, new_id;
 
+	rcu_read_lock();
 	old_id = css_id(&from->css);
 	new_id = css_id(&to->css);
+	rcu_read_unlock();
 
 	if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) {
 		mem_cgroup_swap_statistics(from, false);
@@ -4038,11 +4044,16 @@ static int is_target_pte_for_mc(struct vm_area_struct *vma,
 			put_page(page);
 	}
 	/* throught */
-	if (ent.val && do_swap_account && !ret &&
-			css_id(&mc.from->css) == lookup_swap_cgroup(ent)) {
-		ret = MC_TARGET_SWAP;
-		if (target)
-			target->ent = ent;
+	if (ent.val && do_swap_account && !ret) {
+		unsigned short id;
+		rcu_read_lock();
+		id = css_id(&mc.from->css);
+		rcu_read_unlock();
+		if (id == lookup_swap_cgroup(ent)) {
+			ret = MC_TARGET_SWAP;
+			if (target)
+				target->ent = ent;
+		}
 	}
 	return ret;
 }
-- 
1.7.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH tip/core/urgent 09/10] blk-cgroup: Fix RCU correctness warning in cfq_init_queue()
  2010-05-01  0:25 [PATCH tip/core/urgent 0/10] v2: Fix RCU lockdep splats Paul E. McKenney
                   ` (7 preceding siblings ...)
  2010-05-01  0:26 ` [PATCH tip/core/urgent 08/10] memcg: css_id() must be called under rcu_read_lock() Paul E. McKenney
@ 2010-05-01  0:26 ` Paul E. McKenney
  2010-05-01  0:26 ` [PATCH tip/core/urgent 10/10] vfs: fix RCU-lockdep false positive due to /proc access Paul E. McKenney
  9 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2010-05-01  0:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, Vivek Goyal, Paul E. McKenney

From: Vivek Goyal <vgoyal@redhat.com>

It is necessary to be in an RCU read-side critical section when invoking
css_id(), so this patch adds one to blkiocg_add_blkio_group().  This is
actually a false positive, because this is called at initialization time
and hence always refers to the root cgroup, which cannot go away.

[  103.790505] ===================================================
[  103.790509] [ INFO: suspicious rcu_dereference_check() usage. ]
[  103.790511] ---------------------------------------------------
[  103.790514] kernel/cgroup.c:4432 invoked rcu_dereference_check() without protection!
[  103.790517]
[  103.790517] other info that might help us debug this:
[  103.790519]
[  103.790521]
[  103.790521] rcu_scheduler_active = 1, debug_locks = 1
[  103.790524] 4 locks held by bash/4422:
[  103.790526]  #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff8114befa>] sysfs_write_file+0x3c/0x144
[  103.790537]  #1:  (s_active#102){.+.+.+}, at: [<ffffffff8114bfa5>] sysfs_write_file+0xe7/0x144
[  103.790544]  #2:  (&q->sysfs_lock){+.+.+.}, at: [<ffffffff812263b1>] queue_attr_store+0x49/0x8f
[  103.790552]  #3:  (&(&blkcg->lock)->rlock){......}, at: [<ffffffff8122e4db>] blkiocg_add_blkio_group+0x2b/0xad
[  103.790560]
[  103.790561] stack backtrace:
[  103.790564] Pid: 4422, comm: bash Not tainted 2.6.34-rc4-blkio-second-crash #81
[  103.790567] Call Trace:
[  103.790572]  [<ffffffff81068f57>] lockdep_rcu_dereference+0x9d/0xa5
[  103.790577]  [<ffffffff8107fac1>] css_id+0x44/0x57
[  103.790581]  [<ffffffff8122e503>] blkiocg_add_blkio_group+0x53/0xad
[  103.790586]  [<ffffffff81231936>] cfq_init_queue+0x139/0x32c
[  103.790591]  [<ffffffff8121f2d0>] elv_iosched_store+0xbf/0x1bf
[  103.790595]  [<ffffffff812263d8>] queue_attr_store+0x70/0x8f
[  103.790599]  [<ffffffff8114bfa5>] ? sysfs_write_file+0xe7/0x144
[  103.790603]  [<ffffffff8114bfc6>] sysfs_write_file+0x108/0x144
[  103.790609]  [<ffffffff810f527f>] vfs_write+0xae/0x10b
[  103.790612]  [<ffffffff81069863>] ? trace_hardirqs_on_caller+0x10c/0x130
[  103.790616]  [<ffffffff810f539c>] sys_write+0x4a/0x6e
[  103.790622]  [<ffffffff81002b5b>] system_call_fastpath+0x16/0x1b
[  103.790625]

Located-by: Miles Lane <miles.lane@gmail.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 block/cfq-iosched.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 838834b..5f127cf 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -3694,8 +3694,10 @@ static void *cfq_init_queue(struct request_queue *q)
 	 * to make sure that cfq_put_cfqg() does not try to kfree root group
 	 */
 	atomic_set(&cfqg->ref, 1);
+	rcu_read_lock();
 	blkiocg_add_blkio_group(&blkio_root_cgroup, &cfqg->blkg, (void *)cfqd,
 					0);
+	rcu_read_unlock();
 #endif
 	/*
 	 * Not strictly needed (since RB_ROOT just clears the node and we
-- 
1.7.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH tip/core/urgent 10/10] vfs: fix RCU-lockdep false positive due to /proc access
  2010-05-01  0:25 [PATCH tip/core/urgent 0/10] v2: Fix RCU lockdep splats Paul E. McKenney
                   ` (8 preceding siblings ...)
  2010-05-01  0:26 ` [PATCH tip/core/urgent 09/10] blk-cgroup: Fix RCU correctness warning in cfq_init_queue() Paul E. McKenney
@ 2010-05-01  0:26 ` Paul E. McKenney
  9 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2010-05-01  0:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, Paul E. McKenney

If a single-threaded process does a file-descriptor operation, and
some other process accesses that same file descriptor via /proc,
the current rcu_dereference_check_fdtable() can give a false-positive
RCU-lockdep splat due to the reference count being increased by the
/proc access after the reference-count check in fget_light() but before
the check in rcu_dereference_check_fdtable().

This commit prevents this false positive by checking for a single-threaded
process.

Located-by: Miles Lane <miles.lane@gmail.com>
Located-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 include/linux/fdtable.h |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 013dc52..e4a6d31 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -61,7 +61,8 @@ struct files_struct {
 	(rcu_dereference_check((fdtfd), \
 			       rcu_read_lock_held() || \
 			       lockdep_is_held(&(files)->file_lock) || \
-			       atomic_read(&(files)->count) == 1))
+			       atomic_read(&(files)->count) == 1 || \
+			       thread_group_empty(current)))
 
 #define files_fdtable(files) \
 		(rcu_dereference_check_fdtable((files), (files)->fdt))
-- 
1.7.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH tip/core/urgent 08/10] memcg: css_id() must be called under rcu_read_lock()
  2010-05-03 18:52 [PATCH tip/core/urgent 0/10] v3: Fix RCU lockdep splats Paul E. McKenney
@ 2010-05-03 18:53 ` Paul E. McKenney
  2010-05-07 19:11   ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2010-05-03 18:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, laijs, dipankar, akpm, mathieu.desnoyers, josh, dvhltc,
	niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, Paul E. McKenney, Daisuke Nishimura, Balbir Singh,
	KAMEZAWA Hiroyuki

This patch fixes task_in_mem_cgroup(), mem_cgroup_uncharge_swapcache(),
mem_cgroup_move_swap_account(), and is_target_pte_for_mc() to protect
calls to css_id().  An additional RCU lockdep splat was reported for
memcg_oom_wake_function(), however, this function is not yet in
mainline as of 2.6.34-rc5.

Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 mm/memcontrol.c |   21 ++++++++++++++++-----
 1 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f4ede99..e06490d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -811,10 +811,12 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
 	 * enabled in "curr" and "curr" is a child of "mem" in *cgroup*
 	 * hierarchy(even if use_hierarchy is disabled in "mem").
 	 */
+	rcu_read_lock();
 	if (mem->use_hierarchy)
 		ret = css_is_ancestor(&curr->css, &mem->css);
 	else
 		ret = (curr == mem);
+	rcu_read_unlock();
 	css_put(&curr->css);
 	return ret;
 }
@@ -2312,7 +2314,9 @@ mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout)
 
 	/* record memcg information */
 	if (do_swap_account && swapout && memcg) {
+		rcu_read_lock();
 		swap_cgroup_record(ent, css_id(&memcg->css));
+		rcu_read_unlock();
 		mem_cgroup_get(memcg);
 	}
 	if (swapout && memcg)
@@ -2369,8 +2373,10 @@ static int mem_cgroup_move_swap_account(swp_entry_t entry,
 {
 	unsigned short old_id, new_id;
 
+	rcu_read_lock();
 	old_id = css_id(&from->css);
 	new_id = css_id(&to->css);
+	rcu_read_unlock();
 
 	if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) {
 		mem_cgroup_swap_statistics(from, false);
@@ -4038,11 +4044,16 @@ static int is_target_pte_for_mc(struct vm_area_struct *vma,
 			put_page(page);
 	}
 	/* throught */
-	if (ent.val && do_swap_account && !ret &&
-			css_id(&mc.from->css) == lookup_swap_cgroup(ent)) {
-		ret = MC_TARGET_SWAP;
-		if (target)
-			target->ent = ent;
+	if (ent.val && do_swap_account && !ret) {
+		unsigned short id;
+		rcu_read_lock();
+		id = css_id(&mc.from->css);
+		rcu_read_unlock();
+		if (id == lookup_swap_cgroup(ent)) {
+			ret = MC_TARGET_SWAP;
+			if (target)
+				target->ent = ent;
+		}
 	}
 	return ret;
 }
-- 
1.7.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH tip/core/urgent 08/10] memcg: css_id() must be called under rcu_read_lock()
  2010-05-03 18:53 ` [PATCH tip/core/urgent 08/10] memcg: css_id() must be called under rcu_read_lock() Paul E. McKenney
@ 2010-05-07 19:11   ` Andrew Morton
  2010-05-10  0:17     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2010-05-07 19:11 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, laijs, dipankar, mathieu.desnoyers, josh,
	dvhltc, niv, tglx, peterz, rostedt, Valdis.Kletnieks, dhowells,
	eric.dumazet, Daisuke Nishimura, Balbir Singh, KAMEZAWA Hiroyuki

On Mon,  3 May 2010 11:53:17 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> This patch fixes task_in_mem_cgroup(), mem_cgroup_uncharge_swapcache(),
> mem_cgroup_move_swap_account(), and is_target_pte_for_mc() to protect
> calls to css_id().  An additional RCU lockdep splat was reported for
> memcg_oom_wake_function(), however, this function is not yet in
> mainline as of 2.6.34-rc5.
> 
> ...
>
> index f4ede99..e06490d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -811,10 +811,12 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
>  	 * enabled in "curr" and "curr" is a child of "mem" in *cgroup*
>  	 * hierarchy(even if use_hierarchy is disabled in "mem").
>  	 */
> +	rcu_read_lock();
>  	if (mem->use_hierarchy)
>  		ret = css_is_ancestor(&curr->css, &mem->css);
>  	else
>  		ret = (curr == mem);
> +	rcu_read_unlock();
>  	css_put(&curr->css);
>  	return ret;
>  }

The above hunk seems to be locking around css_is_ancestor(), not
css_id() as the changelog states.

> @@ -2312,7 +2314,9 @@ mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout)
>  
>  	/* record memcg information */
>  	if (do_swap_account && swapout && memcg) {
> +		rcu_read_lock();
>  		swap_cgroup_record(ent, css_id(&memcg->css));
> +		rcu_read_unlock();
>  		mem_cgroup_get(memcg);
>  	}
>  	if (swapout && memcg)

That makes some sense - the lock is held across the call and the use of
the result of the call.


> @@ -2369,8 +2373,10 @@ static int mem_cgroup_move_swap_account(swp_entry_t entry,
>  {
>  	unsigned short old_id, new_id;
>  
> +	rcu_read_lock();
>  	old_id = css_id(&from->css);
>  	new_id = css_id(&to->css);
> +	rcu_read_unlock();
>  
>  	if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) {
>  		mem_cgroup_swap_statistics(from, false);

This doesn't make sense.  We take the lock, read the values, drop the
lock and then use the now-possibly-wrong values.

> @@ -4038,11 +4044,16 @@ static int is_target_pte_for_mc(struct vm_area_struct *vma,
>  			put_page(page);
>  	}
>  	/* throught */
> -	if (ent.val && do_swap_account && !ret &&
> -			css_id(&mc.from->css) == lookup_swap_cgroup(ent)) {
> -		ret = MC_TARGET_SWAP;
> -		if (target)
> -			target->ent = ent;
> +	if (ent.val && do_swap_account && !ret) {
> +		unsigned short id;

Please put a newline between end-of-locals and start-of-code.

> +		rcu_read_lock();
> +		id = css_id(&mc.from->css);
> +		rcu_read_unlock();
> +		if (id == lookup_swap_cgroup(ent)) {
> +			ret = MC_TARGET_SWAP;
> +			if (target)
> +				target->ent = ent;
> +		}

Again, when we use `id', the lock has been dropped.  The value which
css_id() returned might no longer be correct.



The merge of this patch caused rejections in -mm's
memcg-clean-up-move-charge.patch (at least).  It may have caused more,
I haven't checked yet.  The code I have here now does:

	if (ent.val && !ret) {
		unsigned short id;

		rcu_read_lock();
		id = css_id(&mc.from->css);
		rcu_read_unlock();
		if (id == lookup_swap_cgroup(ent)) {
			ret = MC_TARGET_SWAP;
			if (target)
				target->ent = ent;
		}
	}

however I suspect it would be saner to do

	if (ent.val && !ret) {
		rcu_read_lock();
		if (css_id(&mc.from->css) == lookup_swap_cgroup(ent)) {
			ret = MC_TARGET_SWAP;
			if (target)
				target->ent = ent;
		}
		rcu_read_unlock();
	}

However this still doesn't make a lot of sense because three nanoseonds
after we've done rcu_read_unlock(), the value of

	css_id(&mc.from->css) == lookup_swap_cgroup(ent)

might have changed.  So I'd ask the memcg guys to have a more serious
think about all of this please.  I get the feeling that the original
patch just splattered rcu_read_lock() around the place to silence a
runtime warning without digging into what the code is really supposed
to be doing.

The mem_cgroup_move_swap_account() would benefit from some attention
also please.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH tip/core/urgent 08/10] memcg: css_id() must be called under rcu_read_lock()
  2010-05-07 19:11   ` Andrew Morton
@ 2010-05-10  0:17     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 14+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-05-10  0:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Paul E. McKenney, linux-kernel, mingo, laijs, dipankar,
	mathieu.desnoyers, josh, dvhltc, niv, tglx, peterz, rostedt,
	Valdis.Kletnieks, dhowells, eric.dumazet, Daisuke Nishimura,
	Balbir Singh

On Fri, 7 May 2010 12:11:38 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon,  3 May 2010 11:53:17 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > This patch fixes task_in_mem_cgroup(), mem_cgroup_uncharge_swapcache(),
> > mem_cgroup_move_swap_account(), and is_target_pte_for_mc() to protect
> > calls to css_id().  An additional RCU lockdep splat was reported for
> > memcg_oom_wake_function(), however, this function is not yet in
> > mainline as of 2.6.34-rc5.
> > 
> > ...
> >
> > index f4ede99..e06490d 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -811,10 +811,12 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
> >  	 * enabled in "curr" and "curr" is a child of "mem" in *cgroup*
> >  	 * hierarchy(even if use_hierarchy is disabled in "mem").
> >  	 */
> > +	rcu_read_lock();
> >  	if (mem->use_hierarchy)
> >  		ret = css_is_ancestor(&curr->css, &mem->css);
> >  	else
> >  		ret = (curr == mem);
> > +	rcu_read_unlock();
> >  	css_put(&curr->css);
> >  	return ret;
> >  }
> 
> The above hunk seems to be locking around css_is_ancestor(), not
> css_id() as the changelog states.
> 

Hmm. I'll move rcu_xxx to cgroup.c::css_is_ancestor().
(But .....because we have css's reference count, rcu_read_lock isn't
 necessary...lock-check founds it as bug but this was intentional.)



> > @@ -2312,7 +2314,9 @@ mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout)
> >  
> >  	/* record memcg information */
> >  	if (do_swap_account && swapout && memcg) {
> > +		rcu_read_lock();
> >  		swap_cgroup_record(ent, css_id(&memcg->css));
> > +		rcu_read_unlock();
> >  		mem_cgroup_get(memcg);
> >  	}
> >  	if (swapout && memcg)
> 
> That makes some sense - the lock is held across the call and the use of
> the result of the call.
> 
> 
> > @@ -2369,8 +2373,10 @@ static int mem_cgroup_move_swap_account(swp_entry_t entry,
> >  {
> >  	unsigned short old_id, new_id;
> >  
> > +	rcu_read_lock();
> >  	old_id = css_id(&from->css);
> >  	new_id = css_id(&to->css);
> > +	rcu_read_unlock();
> >  
> >  	if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) {
> >  		mem_cgroup_swap_statistics(from, false);
> 
> This doesn't make sense.  We take the lock, read the values, drop the
> lock and then use the now-possibly-wrong values.
> 
will fix.

> > @@ -4038,11 +4044,16 @@ static int is_target_pte_for_mc(struct vm_area_struct *vma,
> >  			put_page(page);
> >  	}
> >  	/* throught */
> > -	if (ent.val && do_swap_account && !ret &&
> > -			css_id(&mc.from->css) == lookup_swap_cgroup(ent)) {
> > -		ret = MC_TARGET_SWAP;
> > -		if (target)
> > -			target->ent = ent;
> > +	if (ent.val && do_swap_account && !ret) {
> > +		unsigned short id;
> 
> Please put a newline between end-of-locals and start-of-code.
> 
will fix.

> > +		rcu_read_lock();
> > +		id = css_id(&mc.from->css);
> > +		rcu_read_unlock();
> > +		if (id == lookup_swap_cgroup(ent)) {
> > +			ret = MC_TARGET_SWAP;
> > +			if (target)
> > +				target->ent = ent;
> > +		}
> 
> Again, when we use `id', the lock has been dropped.  The value which
> css_id() returned might no longer be correct.
> 
> 
will fix. 

> 
> The merge of this patch caused rejections in -mm's
> memcg-clean-up-move-charge.patch (at least).  It may have caused more,
> I haven't checked yet.  The code I have here now does:
> 
> 	if (ent.val && !ret) {
> 		unsigned short id;
> 
> 		rcu_read_lock();
> 		id = css_id(&mc.from->css);
> 		rcu_read_unlock();
> 		if (id == lookup_swap_cgroup(ent)) {
> 			ret = MC_TARGET_SWAP;
> 			if (target)
> 				target->ent = ent;
> 		}
> 	}
> 
> however I suspect it would be saner to do
> 
> 	if (ent.val && !ret) {
> 		rcu_read_lock();
> 		if (css_id(&mc.from->css) == lookup_swap_cgroup(ent)) {
> 			ret = MC_TARGET_SWAP;
> 			if (target)
> 				target->ent = ent;
> 		}
> 		rcu_read_unlock();
> 	}
> 

I'll prepare for -rc6 patch and for -mm patch.


> However this still doesn't make a lot of sense because three nanoseonds
> after we've done rcu_read_unlock(), the value of
> 
> 	css_id(&mc.from->css) == lookup_swap_cgroup(ent)
> 
> might have changed.  So I'd ask the memcg guys to have a more serious
> think about all of this please.  I get the feeling that the original
> patch just splattered rcu_read_lock() around the place to silence a
> runtime warning without digging into what the code is really supposed
> to be doing.
> 
In most case, they are intentional and we have reference count of css.

I can think of

	- css_id_rcu()  .... use rcu_dereference().
	- css_id()	... don't use rcu_dereference().

But this seems crazy.



> The mem_cgroup_move_swap_account() would benefit from some attention
> also please.

ok, I'll rewrite. If I find that I can't avoid rejection to -mm, I'll make
a patch for -rc6 to do minimal fixes. And add a patch for fixining remaining
things to -mm.

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2010-05-10  0:21 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-01  0:25 [PATCH tip/core/urgent 0/10] v2: Fix RCU lockdep splats Paul E. McKenney
2010-05-01  0:26 ` [PATCH tip/core/urgent 01/10] rcu: v2: optionally leave lockdep enabled after RCU lockdep splat Paul E. McKenney
2010-05-01  0:26 ` [PATCH tip/core/urgent 02/10] KEYS: Fix an RCU warning Paul E. McKenney
2010-05-01  0:26 ` [PATCH tip/core/urgent 03/10] KEYS: Fix an RCU warning in the reading of user keys Paul E. McKenney
2010-05-01  0:26 ` [PATCH tip/core/urgent 04/10] cgroup: Fix an RCU warning in cgroup_path() Paul E. McKenney
2010-05-01  0:26 ` [PATCH tip/core/urgent 05/10] cgroup: Fix an RCU warning in alloc_css_id() Paul E. McKenney
2010-05-01  0:26 ` [PATCH tip/core/urgent 06/10] sched: Fix an RCU warning in print_task() Paul E. McKenney
2010-05-01  0:26 ` [PATCH tip/core/urgent 07/10] cgroup: Check task_lock in task_subsys_state() Paul E. McKenney
2010-05-01  0:26 ` [PATCH tip/core/urgent 08/10] memcg: css_id() must be called under rcu_read_lock() Paul E. McKenney
2010-05-01  0:26 ` [PATCH tip/core/urgent 09/10] blk-cgroup: Fix RCU correctness warning in cfq_init_queue() Paul E. McKenney
2010-05-01  0:26 ` [PATCH tip/core/urgent 10/10] vfs: fix RCU-lockdep false positive due to /proc access Paul E. McKenney
  -- strict thread matches above, loose matches on Subject: below --
2010-05-03 18:52 [PATCH tip/core/urgent 0/10] v3: Fix RCU lockdep splats Paul E. McKenney
2010-05-03 18:53 ` [PATCH tip/core/urgent 08/10] memcg: css_id() must be called under rcu_read_lock() Paul E. McKenney
2010-05-07 19:11   ` Andrew Morton
2010-05-10  0:17     ` KAMEZAWA Hiroyuki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).