[PATCH 0/6] Potential rt patches for tip/rt/2.6.33

linux-rt-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/6] Potential rt patches for tip/rt/2.6.33
@ 2010-06-14 22:21 John Kacur
  2010-06-14 22:21 ` [PATCH 1/6] tracing: Update the comm field in the right variable in update_max_tr John Kacur
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: John Kacur @ 2010-06-14 22:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: lkml, rt-users, Clark Williams, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Ingo Molnar

Thomas:

This is a small collection of patches that I have in my -rt tree on-top
of 2.6.33.5-rt23

Please consider them for the next version of tip/rt/2.6.33

They can be pulled from
git://git.kernel.org/pub/scm/linux/kernel/git/jkacur/jk-2.6.git tip-rt-2.6.33-jk

Amit Shah (1):
  hvc_console: Fix race between hvc_close and hvc_remove

Anton Blanchard (1):
  hvc_console: Fix race between hvc_close and hvc_remove

Arnaldo Carvalho de Melo (1):
  tracing: Update the comm field in the right variable in update_max_tr

John Kacur (1):
  lockdep: Make MAX_STACK_TRACE_ENTRIES configurable.

Stephane Eranian (1):
  perf: Fix errors path in perf_output_begin()

Toshiyuki Okajima (1):
  KEYS: find_keyring_by_name() can gain access to a freed keyring

 drivers/char/hvc_console.c |   27 +++++++++++++++++----------
 kernel/lockdep_internals.h |    2 +-
 kernel/perf_event.c        |    2 +-
 kernel/trace/trace.c       |    4 ++--
 lib/Kconfig.debug          |   10 ++++++++++
 security/keys/keyring.c    |   18 +++++++++---------
 6 files changed, 40 insertions(+), 23 deletions(-)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/6] tracing: Update the comm field in the right variable in update_max_tr
  2010-06-14 22:21 [PATCH 0/6] Potential rt patches for tip/rt/2.6.33 John Kacur
@ 2010-06-14 22:21 ` John Kacur
  2010-06-14 22:21 ` [PATCH 2/6: v4] lockdep: Make MAX_STACK_TRACE_ENTRIES configurable John Kacur
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: John Kacur @ 2010-06-14 22:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: lkml, rt-users, Clark Williams, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Ingo Molnar

From: Arnaldo Carvalho de Melo <acme@redhat.com>

The latency output showed:

 #    | task: -3 (uid:0 nice:0 policy:1 rt_prio:99)

The comm is missing in the "task:" and it looks like a minus 3 is
the output. The correct display should be:

 #    | task: migration/0-3 (uid:0 nice:0 policy:1 rt_prio:99)

The problem is that the comm is being stored in the wrong data
structure. The max_tr.data[cpu] is what stores the comm, not the
tr->data[cpu].

Before this patch the max_tr.data[cpu]->comm was zeroed and the /debug/trace
ended up showing just the '-' sign followed by the pid.

Also remove a needless initialization of max_data.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1267824230-23861-1-git-send-email-acme@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Upstream commit: 1acaa1b2d9b5904c9cce06122990a2d71046ce16
Signed-off-by: John Kacur <jkacur@redhat.com>
---
 kernel/trace/trace.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 9b66ee1..a1cf0ad 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -520,7 +520,7 @@ static void
 __update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu)
 {
 	struct trace_array_cpu *data = tr->data[cpu];
-	struct trace_array_cpu *max_data = tr->data[cpu];
+	struct trace_array_cpu *max_data;
 
 	max_tr.cpu = cpu;
 	max_tr.time_start = data->preempt_timestamp;
@@ -530,7 +530,7 @@ __update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu)
 	max_data->critical_start = data->critical_start;
 	max_data->critical_end = data->critical_end;
 
-	memcpy(data->comm, tsk->comm, TASK_COMM_LEN);
+	memcpy(max_data->comm, tsk->comm, TASK_COMM_LEN);
 	max_data->pid = tsk->pid;
 	max_data->uid = task_uid(tsk);
 	max_data->nice = tsk->static_prio - 20 - MAX_RT_PRIO;
-- 
1.6.6.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/6: v4] lockdep: Make MAX_STACK_TRACE_ENTRIES configurable.
  2010-06-14 22:21 [PATCH 0/6] Potential rt patches for tip/rt/2.6.33 John Kacur
  2010-06-14 22:21 ` [PATCH 1/6] tracing: Update the comm field in the right variable in update_max_tr John Kacur
@ 2010-06-14 22:21 ` John Kacur
  2010-06-16 18:49   ` Peter Zijlstra
  2010-06-14 22:21 ` [PATCH 3/6] perf: Fix errors path in perf_output_begin() John Kacur
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: John Kacur @ 2010-06-14 22:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: lkml, rt-users, Clark Williams, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Ingo Molnar

Certain debug configurations that have LOCKDEP turned on, run into the limit
where the MAX_STACK_TRACE_ENTRIES are too small. Rather than simply turning
off the locking correctness validator, let the user configure this value
to something reasonable for their system.

This patch was generated against 2.6.33.5-rt23 but is also intended to be
picked-up for mainline.

Message-ID: <alpine.LFD.2.00.1004161329030.11320@localhost.localdomain>
Signed-off-by: John Kacur <jkacur@redhat.com>
---
 kernel/lockdep_internals.h |    2 +-
 lib/Kconfig.debug          |   10 ++++++++++
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/kernel/lockdep_internals.h b/kernel/lockdep_internals.h
index a2ee95a..d0797bc 100644
--- a/kernel/lockdep_internals.h
+++ b/kernel/lockdep_internals.h
@@ -65,7 +65,7 @@ enum {
  * Stack-trace: tightly packed array of stack backtrace
  * addresses. Protected by the hash_lock.
  */
-#define MAX_STACK_TRACE_ENTRIES	262144UL
+#define MAX_STACK_TRACE_ENTRIES	(CONFIG_MAX_STACK_ENTRIES_KIB*1024UL)
 
 extern struct list_head all_lock_classes;
 extern struct lock_chain lock_chains[];
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index cbf6e02..6087fb0 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -509,6 +509,16 @@ config LOCKDEP
 	select KALLSYMS
 	select KALLSYMS_ALL
 
+config MAX_STACK_ENTRIES_KIB
+	int "MAX_STACK_ENTRIES_KIB for LOCKDEP"
+	depends on LOCKDEP
+	default 256
+	help
+	   This option allows you to change the default MAX_STACK_TRACE_ENTRIES
+	   used by LOCKDEP. The default is 256*1024 = 262144.
+	   Warning: increasing this will increase the size of stack_trace_array
+	   and therefore the kernel size too.
+
 config LOCK_STAT
 	bool "Lock usage statistics"
 	depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
-- 
1.6.6.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/6] perf: Fix errors path in perf_output_begin()
  2010-06-14 22:21 [PATCH 0/6] Potential rt patches for tip/rt/2.6.33 John Kacur
  2010-06-14 22:21 ` [PATCH 1/6] tracing: Update the comm field in the right variable in update_max_tr John Kacur
  2010-06-14 22:21 ` [PATCH 2/6: v4] lockdep: Make MAX_STACK_TRACE_ENTRIES configurable John Kacur
@ 2010-06-14 22:21 ` John Kacur
  2010-06-14 22:21 ` [PATCH 4/6] KEYS: find_keyring_by_name() can gain access to a freed keyring John Kacur
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: John Kacur @ 2010-06-14 22:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: lkml, rt-users, Clark Williams, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Ingo Molnar

From: Stephane Eranian <eranian@google.com>

In case the sampling buffer has no "payload" pages,
nr_pages is 0. The problem is that the error path in
perf_output_begin() skips to a label which assumes
perf_output_lock() has been issued which is not the
case. That triggers a WARN_ON() in
perf_output_unlock().

This patch fixes the problem by skipping
perf_output_unlock() in case data->nr_pages is 0.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4bf13674.014fd80a.6c82.ffffb20c@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Upstream-commit: 00d1d0b095ba4e5c0958cb228b2a9c445d4a339d
Signed-off-by: John Kacur <jkacur@redhat.com>
---
 kernel/perf_event.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 43c1dfb..e353be2 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -2933,7 +2933,7 @@ int perf_output_begin(struct perf_output_handle *handle,
 	handle->sample	= sample;
 
 	if (!data->nr_pages)
-		goto fail;
+		goto out;
 
 	have_lost = atomic_read(&data->lost);
 	if (have_lost)
-- 
1.6.6.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/6] KEYS: find_keyring_by_name() can gain access to a freed keyring
  2010-06-14 22:21 [PATCH 0/6] Potential rt patches for tip/rt/2.6.33 John Kacur
                   ` (2 preceding siblings ...)
  2010-06-14 22:21 ` [PATCH 3/6] perf: Fix errors path in perf_output_begin() John Kacur
@ 2010-06-14 22:21 ` John Kacur
  2010-06-14 22:21 ` [PATCH 5/6] hvc_console: Fix race between hvc_close and hvc_remove John Kacur
  2010-06-14 22:21 ` [PATCH 6/6] " John Kacur
  5 siblings, 0 replies; 15+ messages in thread
From: John Kacur @ 2010-06-14 22:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: lkml, rt-users, Clark Williams, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Ingo Molnar

From: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>

find_keyring_by_name() can gain access to a keyring that has had its reference
count reduced to zero, and is thus ready to be freed.  This then allows the
dead keyring to be brought back into use whilst it is being destroyed.

The following timeline illustrates the process:

|(cleaner)                           (user)
|
| free_user(user)                    sys_keyctl()
|  |                                  |
|  key_put(user->session_keyring)     keyctl_get_keyring_ID()
|  ||	//=> keyring->usage = 0        |
|  |schedule_work(&key_cleanup_task)   lookup_user_key()
|  ||                                   |
|  kmem_cache_free(,user)               |
|  .                                    |[KEY_SPEC_USER_KEYRING]
|  .                                    install_user_keyrings()
|  .                                    ||
| key_cleanup() [<= worker_thread()]    ||
|  |                                    ||
|  [spin_lock(&key_serial_lock)]        |[mutex_lock(&key_user_keyr..mutex)]
|  |                                    ||
|  atomic_read() == 0                   ||
|  |{ rb_ease(&key->serial_node,) }     ||
|  |                                    ||
|  [spin_unlock(&key_serial_lock)]      |find_keyring_by_name()
|  |                                    |||
|  keyring_destroy(keyring)             ||[read_lock(&keyring_name_lock)]
|  ||                                   |||
|  |[write_lock(&keyring_name_lock)]    ||atomic_inc(&keyring->usage)
|  |.                                   ||| *** GET freeing keyring ***
|  |.                                   ||[read_unlock(&keyring_name_lock)]
|  ||                                   ||
|  |list_del()                          |[mutex_unlock(&key_user_k..mutex)]
|  ||                                   |
|  |[write_unlock(&keyring_name_lock)]  ** INVALID keyring is returned **
|  |                                    .
|  kmem_cache_free(,keyring)            .
|                                       .
|                                       atomic_dec(&keyring->usage)
v                                         *** DESTROYED ***
TIME

If CONFIG_SLUB_DEBUG=y then we may see the following message generated:

	=============================================================================
	BUG key_jar: Poison overwritten
	-----------------------------------------------------------------------------

	INFO: 0xffff880197a7e200-0xffff880197a7e200. First byte 0x6a instead of 0x6b
	INFO: Allocated in key_alloc+0x10b/0x35f age=25 cpu=1 pid=5086
	INFO: Freed in key_cleanup+0xd0/0xd5 age=12 cpu=1 pid=10
	INFO: Slab 0xffffea000592cb90 objects=16 used=2 fp=0xffff880197a7e200 flags=0x200000000000c3
	INFO: Object 0xffff880197a7e200 @offset=512 fp=0xffff880197a7e300

	Bytes b4 0xffff880197a7e1f0:  5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
	  Object 0xffff880197a7e200:  6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b jkkkkkkkkkkkkkkk

Alternatively, we may see a system panic happen, such as:

	BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
	IP: [<ffffffff810e61a3>] kmem_cache_alloc+0x5b/0xe9
	PGD 6b2b4067 PUD 6a80d067 PMD 0
	Oops: 0000 [#1] SMP
	last sysfs file: /sys/kernel/kexec_crash_loaded
	CPU 1
	...
	Pid: 31245, comm: su Not tainted 2.6.34-rc5-nofixed-nodebug #2 D2089/PRIMERGY
	RIP: 0010:[<ffffffff810e61a3>]  [<ffffffff810e61a3>] kmem_cache_alloc+0x5b/0xe9
	RSP: 0018:ffff88006af3bd98  EFLAGS: 00010002
	RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88007d19900b
	RDX: 0000000100000000 RSI: 00000000000080d0 RDI: ffffffff81828430
	RBP: ffffffff81828430 R08: ffff88000a293750 R09: 0000000000000000
	R10: 0000000000000001 R11: 0000000000100000 R12: 00000000000080d0
	R13: 00000000000080d0 R14: 0000000000000296 R15: ffffffff810f20ce
	FS:  00007f97116bc700(0000) GS:ffff88000a280000(0000) knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 0000000000000001 CR3: 000000006a91c000 CR4: 00000000000006e0
	DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
	DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
	Process su (pid: 31245, threadinfo ffff88006af3a000, task ffff8800374414c0)
	Stack:
	 0000000512e0958e 0000000000008000 ffff880037f8d180 0000000000000001
	 0000000000000000 0000000000008001 ffff88007d199000 ffffffff810f20ce
	 0000000000008000 ffff88006af3be48 0000000000000024 ffffffff810face3
	Call Trace:
	 [<ffffffff810f20ce>] ? get_empty_filp+0x70/0x12f
	 [<ffffffff810face3>] ? do_filp_open+0x145/0x590
	 [<ffffffff810ce208>] ? tlb_finish_mmu+0x2a/0x33
	 [<ffffffff810ce43c>] ? unmap_region+0xd3/0xe2
	 [<ffffffff810e4393>] ? virt_to_head_page+0x9/0x2d
	 [<ffffffff81103916>] ? alloc_fd+0x69/0x10e
	 [<ffffffff810ef4ed>] ? do_sys_open+0x56/0xfc
	 [<ffffffff81008a02>] ? system_call_fastpath+0x16/0x1b
	Code: 0f 1f 44 00 00 49 89 c6 fa 66 0f 1f 44 00 00 65 4c 8b 04 25 60 e8 00 00 48 8b 45 00 49 01 c0 49 8b 18 48 85 db 74 0d 48 63 45 18 <48> 8b 04 03 49 89 00 eb 14 4c 89 f9 83 ca ff 44 89 e6 48 89 ef
	RIP  [<ffffffff810e61a3>] kmem_cache_alloc+0x5b/0xe9

This problem is that find_keyring_by_name does not confirm that the keyring is
valid before accepting it.

Skipping keyrings that have been reduced to a zero count seems the way to go.
To this end, use atomic_inc_not_zero() to increment the usage count and skip
the candidate keyring if that returns false.

The following script _may_ cause the bug to happen, but there's no guarantee
as the window of opportunity is small:

	#!/bin/sh
	LOOP=100000
	USER=dummy_user
	/bin/su -c "exit;" $USER || { /usr/sbin/adduser -m $USER; add=1; }
	for ((i=0; i<LOOP; i++))
	do
		/bin/su -c "echo '$i' > /dev/null" $USER
	done
	(( add == 1 )) && /usr/sbin/userdel -r $USER
	exit

Note that the nominated user must not be in use.

An alternative way of testing this may be:

	for ((i=0; i<100000; i++))
	do
		keyctl session foo /bin/true || break
	done >&/dev/null

as that uses a keyring named "foo" rather than relying on the user and
user-session named keyrings.

Reported-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Upstream-commit: cea7daa3589d6b550546a8c8963599f7c1a3ae5c
Signed-off-by: John Kacur <jkacur@redhat.com>
---
 security/keys/keyring.c |   18 +++++++++---------
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/security/keys/keyring.c b/security/keys/keyring.c
index 8ec0274..e031952 100644
--- a/security/keys/keyring.c
+++ b/security/keys/keyring.c
@@ -524,9 +524,8 @@ struct key *find_keyring_by_name(const char *name, bool skip_perm_check)
 	struct key *keyring;
 	int bucket;
 
-	keyring = ERR_PTR(-EINVAL);
 	if (!name)
-		goto error;
+		return ERR_PTR(-EINVAL);
 
 	bucket = keyring_hash(name);
 
@@ -553,17 +552,18 @@ struct key *find_keyring_by_name(const char *name, bool skip_perm_check)
 					   KEY_SEARCH) < 0)
 				continue;
 
-			/* we've got a match */
-			atomic_inc(&keyring->usage);
-			read_unlock(&keyring_name_lock);
-			goto error;
+			/* we've got a match but we might end up racing with
+			 * key_cleanup() if the keyring is currently 'dead'
+			 * (ie. it has a zero usage count) */
+			if (!atomic_inc_not_zero(&keyring->usage))
+				continue;
+			goto out;
 		}
 	}
 
-	read_unlock(&keyring_name_lock);
 	keyring = ERR_PTR(-ENOKEY);
-
- error:
+out:
+	read_unlock(&keyring_name_lock);
 	return keyring;
 
 } /* end find_keyring_by_name() */
-- 
1.6.6.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 5/6] hvc_console: Fix race between hvc_close and hvc_remove
  2010-06-14 22:21 [PATCH 0/6] Potential rt patches for tip/rt/2.6.33 John Kacur
                   ` (3 preceding siblings ...)
  2010-06-14 22:21 ` [PATCH 4/6] KEYS: find_keyring_by_name() can gain access to a freed keyring John Kacur
@ 2010-06-14 22:21 ` John Kacur
  2010-06-14 22:21 ` [PATCH 6/6] " John Kacur
  5 siblings, 0 replies; 15+ messages in thread
From: John Kacur @ 2010-06-14 22:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: lkml, rt-users, Clark Williams, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Ingo Molnar

From: Amit Shah <amit.shah@redhat.com>

Alan pointed out a race in the code where hvc_remove is invoked. The
recent virtio_console work is the first user of hvc_remove().

Alan describes it thus:

The hvc_console assumes that a close and remove call can't occur at the
same time.

In addition tty_hangup(tty) is problematic as tty_hangup is asynchronous
itself....

So this can happen

        hvc_close                               hvc_remove
        hung up ? - no
                                                lock
                                                tty = hp->tty
                                                unlock
        lock
        hp->tty = NULL
        unlock
        notify del
        kref_put the hvc struct
        close completes
        tty is destroyed
                                                tty_hangup dead tty
                                                tty->ops will be NULL
                                                NULL->...

This patch adds some tty krefs and also converts to using tty_vhangup().

Reported-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
CC: Alan Cox <alan@lxorguk.ukuu.org.uk>
CC: linuxppc-dev@ozlabs.org
CC: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Upstream-commit: e74d098c66543d0731de62eb747ccd5b636a6f4c
Signed-off-by: John Kacur <jkacur@redhat.com>
---
 drivers/char/hvc_console.c |   31 +++++++++++++++++++++----------
 1 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/drivers/char/hvc_console.c b/drivers/char/hvc_console.c
index 416d342..da70f68 100644
--- a/drivers/char/hvc_console.c
+++ b/drivers/char/hvc_console.c
@@ -312,6 +312,7 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
 	spin_lock_irqsave(&hp->lock, flags);
 	/* Check and then increment for fast path open. */
 	if (hp->count++ > 0) {
+		tty_kref_get(tty);
 		spin_unlock_irqrestore(&hp->lock, flags);
 		hvc_kick();
 		return 0;
@@ -319,7 +320,7 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
 
 	tty->driver_data = hp;
 
-	hp->tty = tty;
+	hp->tty = tty_kref_get(tty);
 
 	spin_unlock_irqrestore(&hp->lock, flags);
 
@@ -336,6 +337,7 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
 		spin_lock_irqsave(&hp->lock, flags);
 		hp->tty = NULL;
 		spin_unlock_irqrestore(&hp->lock, flags);
+		tty_kref_put(tty);
 		tty->driver_data = NULL;
 		kref_put(&hp->kref, destroy_hvc_struct);
 		printk(KERN_ERR "hvc_open: request_irq failed with rc %d.\n", rc);
@@ -363,13 +365,18 @@ static void hvc_close(struct tty_struct *tty, struct file * filp)
 		return;
 
 	hp = tty->driver_data;
+
 	spin_lock_irqsave(&hp->lock, flags);
+	tty_kref_get(tty);
 
 	if (--hp->count == 0) {
 		/* We are done with the tty pointer now. */
 		hp->tty = NULL;
 		spin_unlock_irqrestore(&hp->lock, flags);
 
+		/* Put the ref obtained in hvc_open() */
+		tty_kref_put(tty);
+
 		if (hp->ops->notifier_del)
 			hp->ops->notifier_del(hp, hp->data);
 
@@ -389,6 +396,7 @@ static void hvc_close(struct tty_struct *tty, struct file * filp)
 		spin_unlock_irqrestore(&hp->lock, flags);
 	}
 
+	tty_kref_put(tty);
 	kref_put(&hp->kref, destroy_hvc_struct);
 }
 
@@ -424,10 +432,11 @@ static void hvc_hangup(struct tty_struct *tty)
 	spin_unlock_irqrestore(&hp->lock, flags);
 
 	if (hp->ops->notifier_hangup)
-			hp->ops->notifier_hangup(hp, hp->data);
+		hp->ops->notifier_hangup(hp, hp->data);
 
 	while(temp_open_count) {
 		--temp_open_count;
+		tty_kref_put(tty);
 		kref_put(&hp->kref, destroy_hvc_struct);
 	}
 }
@@ -592,7 +601,7 @@ int hvc_poll(struct hvc_struct *hp)
 	}
 
 	/* No tty attached, just skip */
-	tty = hp->tty;
+	tty = tty_kref_get(hp->tty);
 	if (tty == NULL)
 		goto bail;
 
@@ -672,6 +681,8 @@ int hvc_poll(struct hvc_struct *hp)
 
 		tty_flip_buffer_push(tty);
 	}
+	if (tty)
+		tty_kref_put(tty);
 
 	return poll_mask;
 }
@@ -806,7 +817,7 @@ int hvc_remove(struct hvc_struct *hp)
 	struct tty_struct *tty;
 
 	spin_lock_irqsave(&hp->lock, flags);
-	tty = hp->tty;
+	tty = tty_kref_get(hp->tty);
 
 	if (hp->index < MAX_NR_HVC_CONSOLES)
 		vtermnos[hp->index] = -1;
@@ -818,18 +829,18 @@ int hvc_remove(struct hvc_struct *hp)
 	/*
 	 * We 'put' the instance that was grabbed when the kref instance
 	 * was initialized using kref_init().  Let the last holder of this
-	 * kref cause it to be removed, which will probably be the tty_hangup
+	 * kref cause it to be removed, which will probably be the tty_vhangup
 	 * below.
 	 */
 	kref_put(&hp->kref, destroy_hvc_struct);
 
 	/*
-	 * This function call will auto chain call hvc_hangup.  The tty should
-	 * always be valid at this time unless a simultaneous tty close already
-	 * cleaned up the hvc_struct.
+	 * This function call will auto chain call hvc_hangup.
 	 */
-	if (tty)
-		tty_hangup(tty);
+	if (tty) {
+		tty_vhangup(tty);
+		tty_kref_put(tty);
+	}
 	return 0;
 }
 EXPORT_SYMBOL_GPL(hvc_remove);
-- 
1.6.6.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 6/6] hvc_console: Fix race between hvc_close and hvc_remove
  2010-06-14 22:21 [PATCH 0/6] Potential rt patches for tip/rt/2.6.33 John Kacur
                   ` (4 preceding siblings ...)
  2010-06-14 22:21 ` [PATCH 5/6] hvc_console: Fix race between hvc_close and hvc_remove John Kacur
@ 2010-06-14 22:21 ` John Kacur
  5 siblings, 0 replies; 15+ messages in thread
From: John Kacur @ 2010-06-14 22:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: lkml, rt-users, Clark Williams, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Ingo Molnar

From: Anton Blanchard <anton@samba.org>

I don't claim to understand the tty layer, but it seems like hvc_open and
hvc_close should be balanced in their kref reference counting.

Right now we get a kref every call to hvc_open:

        if (hp->count++ > 0) {
                tty_kref_get(tty); <----- here
                spin_unlock_irqrestore(&hp->lock, flags);
                hvc_kick();
                return 0;
        } /* else count == 0 */

        tty->driver_data = hp;

        hp->tty = tty_kref_get(tty); <------ or here if hp->count was 0

But hvc_close has:

        tty_kref_get(tty);

        if (--hp->count == 0) {
...
                /* Put the ref obtained in hvc_open() */
                tty_kref_put(tty);
...
        }

        tty_kref_put(tty);

Since the outside kref get/put balance we only do a single kref_put when
count reaches 0.

The patch below changes things to call tty_kref_put once for every
hvc_close call, and with that my machine boots fine.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Upstream-commit: 320718ee074acce5ffced6506cb51af1388942aa
Signed-off-by: John Kacur <jkacur@redhat.com>
---
 drivers/char/hvc_console.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/drivers/char/hvc_console.c b/drivers/char/hvc_console.c
index da70f68..2898084 100644
--- a/drivers/char/hvc_console.c
+++ b/drivers/char/hvc_console.c
@@ -367,16 +367,12 @@ static void hvc_close(struct tty_struct *tty, struct file * filp)
 	hp = tty->driver_data;
 
 	spin_lock_irqsave(&hp->lock, flags);
-	tty_kref_get(tty);
 
 	if (--hp->count == 0) {
 		/* We are done with the tty pointer now. */
 		hp->tty = NULL;
 		spin_unlock_irqrestore(&hp->lock, flags);
 
-		/* Put the ref obtained in hvc_open() */
-		tty_kref_put(tty);

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/6: v4] lockdep: Make MAX_STACK_TRACE_ENTRIES configurable.
  2010-06-14 22:21 ` [PATCH 2/6: v4] lockdep: Make MAX_STACK_TRACE_ENTRIES configurable John Kacur
@ 2010-06-16 18:49   ` Peter Zijlstra
  2010-06-16 20:37     ` John Kacur
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2010-06-16 18:49 UTC (permalink / raw)
  To: John Kacur
  Cc: Thomas Gleixner, lkml, rt-users, Clark Williams,
	Arnaldo Carvalho de Melo, Ingo Molnar

On Tue, 2010-06-15 at 00:21 +0200, John Kacur wrote:
> Certain debug configurations that have LOCKDEP turned on, run into the limit
> where the MAX_STACK_TRACE_ENTRIES are too small. Rather than simply turning
> off the locking correctness validator, let the user configure this value
> to something reasonable for their system.
> 
> This patch was generated against 2.6.33.5-rt23 but is also intended to be
> picked-up for mainline.

NACK

patches like 4726f2a617ebd868a4fdeb5679613b897e5f1676 are the way to go.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/6: v4] lockdep: Make MAX_STACK_TRACE_ENTRIES configurable.
  2010-06-16 18:49   ` Peter Zijlstra
@ 2010-06-16 20:37     ` John Kacur
  2010-06-16 21:16       ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: John Kacur @ 2010-06-16 20:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, lkml, rt-users, Clark Williams,
	Arnaldo Carvalho de Melo, Ingo Molnar, Sven-Thorsten Dietrich,
	ghaskins

On Wed, 16 Jun 2010, Peter Zijlstra wrote:

> On Tue, 2010-06-15 at 00:21 +0200, John Kacur wrote:
> > Certain debug configurations that have LOCKDEP turned on, run into the limit
> > where the MAX_STACK_TRACE_ENTRIES are too small. Rather than simply turning
> > off the locking correctness validator, let the user configure this value
> > to something reasonable for their system.
> > 
> > This patch was generated against 2.6.33.5-rt23 but is also intended to be
> > picked-up for mainline.
> 
> NACK
> 
> patches like 4726f2a617ebd868a4fdeb5679613b897e5f1676 are the way to go.

I've been testing 4726f2a617ebd868a4fdeb5679613b897e5f1676 in rt
(Thomas has it in tip/rt/2.6.33 now) and so far it is doing the trick for 
me, at least on my laptop. I still need to test it on larger machines.
However, this problem seems to continuably come up, and I'm not the only 
one who has expessed the wish / need to have this tunable.
See Message-Id	 <4BCEAD7B0200005A0006513E@soto.provo.novell.com>

and here in the same thread is even an argument that some configs
might tune it down. (I needed to increase it for rt-debug configs)
See Message-Id	 <1271861581.29617.13.camel@sven.thebigcorporation.com>

fwiw regarding a nacked patch, v4 is the least intrusive version
(smallest number of changes), based on a bit of trout wacking induced 
hints.  

Thanks

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/6: v4] lockdep: Make MAX_STACK_TRACE_ENTRIES configurable.
  2010-06-16 20:37     ` John Kacur
@ 2010-06-16 21:16       ` Peter Zijlstra
  2010-06-16 21:29         ` Steven Rostedt
  2010-06-16 21:33         ` John Kacur
  0 siblings, 2 replies; 15+ messages in thread
From: Peter Zijlstra @ 2010-06-16 21:16 UTC (permalink / raw)
  To: John Kacur
  Cc: Thomas Gleixner, lkml, rt-users, Clark Williams,
	Arnaldo Carvalho de Melo, Ingo Molnar, Sven-Thorsten Dietrich,
	ghaskins

On Wed, 2010-06-16 at 22:37 +0200, John Kacur wrote:
> 
> On Wed, 16 Jun 2010, Peter Zijlstra wrote:
> 
> > On Tue, 2010-06-15 at 00:21 +0200, John Kacur wrote:
> > > Certain debug configurations that have LOCKDEP turned on, run into the limit
> > > where the MAX_STACK_TRACE_ENTRIES are too small. Rather than simply turning
> > > off the locking correctness validator, let the user configure this value
> > > to something reasonable for their system.
> > > 
> > > This patch was generated against 2.6.33.5-rt23 but is also intended to be
> > > picked-up for mainline.
> > 
> > NACK
> > 
> > patches like 4726f2a617ebd868a4fdeb5679613b897e5f1676 are the way to go.
> 
> I've been testing 4726f2a617ebd868a4fdeb5679613b897e5f1676 in rt
> (Thomas has it in tip/rt/2.6.33 now) and so far it is doing the trick for 
> me, at least on my laptop. I still need to test it on larger machines.
> However, this problem seems to continuably come up, and I'm not the only 
> one who has expessed the wish / need to have this tunable.

And simply increasing the number without thought is the worst approach
ever and I'm simply not going to merge it.

Also, google doesn't seem to index msg-ids, so I've no idea what you're
referring to.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/6: v4] lockdep: Make MAX_STACK_TRACE_ENTRIES configurable.
  2010-06-16 21:16       ` Peter Zijlstra
@ 2010-06-16 21:29         ` Steven Rostedt
  2010-06-17  8:35           ` Peter Zijlstra
  2010-06-16 21:33         ` John Kacur
  1 sibling, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2010-06-16 21:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: John Kacur, Thomas Gleixner, lkml, rt-users, Clark Williams,
	Arnaldo Carvalho de Melo, Ingo Molnar, Sven-Thorsten Dietrich,
	ghaskins

On Wed, 2010-06-16 at 23:16 +0200, Peter Zijlstra wrote:

> Also, google doesn't seem to index msg-ids, so I've no idea what you're
> referring to.

But marc.info does:

http://marc.info/?i=4BCEAD7B0200005A0006513E@soto.provo.novell.com

;-)

-- Steve



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/6: v4] lockdep: Make MAX_STACK_TRACE_ENTRIES configurable.
  2010-06-16 21:16       ` Peter Zijlstra
  2010-06-16 21:29         ` Steven Rostedt
@ 2010-06-16 21:33         ` John Kacur
  1 sibling, 0 replies; 15+ messages in thread
From: John Kacur @ 2010-06-16 21:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, lkml, rt-users, Clark Williams,
	Arnaldo Carvalho de Melo, Ingo Molnar, Sven-Thorsten Dietrich,
	ghaskins



On Wed, 16 Jun 2010, Peter Zijlstra wrote:

> On Wed, 2010-06-16 at 22:37 +0200, John Kacur wrote:
> > 
> > On Wed, 16 Jun 2010, Peter Zijlstra wrote:
> > 
> > > On Tue, 2010-06-15 at 00:21 +0200, John Kacur wrote:
> > > > Certain debug configurations that have LOCKDEP turned on, run into the limit
> > > > where the MAX_STACK_TRACE_ENTRIES are too small. Rather than simply turning
> > > > off the locking correctness validator, let the user configure this value
> > > > to something reasonable for their system.
> > > > 
> > > > This patch was generated against 2.6.33.5-rt23 but is also intended to be
> > > > picked-up for mainline.
> > > 
> > > NACK
> > > 
> > > patches like 4726f2a617ebd868a4fdeb5679613b897e5f1676 are the way to go.
> > 
> > I've been testing 4726f2a617ebd868a4fdeb5679613b897e5f1676 in rt
> > (Thomas has it in tip/rt/2.6.33 now) and so far it is doing the trick for 
> > me, at least on my laptop. I still need to test it on larger machines.
> > However, this problem seems to continuably come up, and I'm not the only 
> > one who has expessed the wish / need to have this tunable.
> 
> And simply increasing the number without thought is the worst approach
> ever and I'm simply not going to merge it.
> 
> Also, google doesn't seem to index msg-ids, so I've no idea what you're
> referring to.
> --

The first mail in which Gregory also expressed desire to make this tunable 
is here
http://lkml.org/lkml/2010/4/21/123

The second in which Sven said in certain configurations they might even 
tune it down (decrease) is here.
http://lkml.org/lkml/2010/4/21/183

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/6: v4] lockdep: Make MAX_STACK_TRACE_ENTRIES configurable.
  2010-06-16 21:29         ` Steven Rostedt
@ 2010-06-17  8:35           ` Peter Zijlstra
  2010-06-17  8:46             ` John Kacur
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2010-06-17  8:35 UTC (permalink / raw)
  To: rostedt
  Cc: John Kacur, Thomas Gleixner, lkml, rt-users, Clark Williams,
	Arnaldo Carvalho de Melo, Ingo Molnar, Sven-Thorsten Dietrich,
	ghaskins

On Wed, 2010-06-16 at 17:29 -0400, Steven Rostedt wrote:
> On Wed, 2010-06-16 at 23:16 +0200, Peter Zijlstra wrote:
> 
> > Also, google doesn't seem to index msg-ids, so I've no idea what you're
> > referring to.
> 
> But marc.info does:
> 
> http://marc.info/?i=4BCEAD7B0200005A0006513E@soto.provo.novell.com

Right, so that was before Yong Zhang's patch, if it still happens we
need to again look at what is causing this. Again, blindly increasing
the limit is not a good option.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/6: v4] lockdep: Make MAX_STACK_TRACE_ENTRIES configurable.
  2010-06-17  8:35           ` Peter Zijlstra
@ 2010-06-17  8:46             ` John Kacur
  2010-06-18  3:56               ` Sven-Thorsten Dietrich
  0 siblings, 1 reply; 15+ messages in thread
From: John Kacur @ 2010-06-17  8:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rostedt, Thomas Gleixner, lkml, rt-users, Clark Williams,
	Arnaldo Carvalho de Melo, Ingo Molnar, Sven-Thorsten Dietrich,
	ghaskins

On Thu, 17 Jun 2010, Peter Zijlstra wrote:

> On Wed, 2010-06-16 at 17:29 -0400, Steven Rostedt wrote:
> > On Wed, 2010-06-16 at 23:16 +0200, Peter Zijlstra wrote:
> > 
> > > Also, google doesn't seem to index msg-ids, so I've no idea what you're
> > > referring to.
> > 
> > But marc.info does:
> > 
> > http://marc.info/?i=4BCEAD7B0200005A0006513E@soto.provo.novell.com
> 
> Right, so that was before Yong Zhang's patch, if it still happens we
> need to again look at what is causing this. Again, blindly increasing
> the limit is not a good option.

Well, as I said, I'm testing with Yong Zhang's patch, and it seems to be 
doing the trick, so I am actually not pushing for my patch right now.

But please stop characterizing this as "blindly increasing the limit", 
because that is not at all what I or others do. We have a debug build with 
tons of options turned on in which case we increased it to the minimum 
that worked for us, and we have a tracing build in which case we left it 
at the default. Also as I pointed out, in Sven's case it sounds like they may 
have had a build where they even wanted to decrease it.

Your objection in the past was that it was another tunable that nobody understands, and I 
have more sympathy for that argument. My counterargument is that if we're 
all putting a version of this patch in our private builds, then it's a tad 
counterproductive. Let's leave things the way they are for now, unless 
this becomes a problem again.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/6: v4] lockdep: Make MAX_STACK_TRACE_ENTRIES configurable.
  2010-06-17  8:46             ` John Kacur
@ 2010-06-18  3:56               ` Sven-Thorsten Dietrich
  0 siblings, 0 replies; 15+ messages in thread
From: Sven-Thorsten Dietrich @ 2010-06-18  3:56 UTC (permalink / raw)
  To: John Kacur, mgalbraith
  Cc: Peter Zijlstra, rostedt, Thomas Gleixner, lkml, rt-users,
	Clark Williams, Arnaldo Carvalho de Melo, Ingo Molnar, ghaskins

On Thu, 2010-06-17 at 10:46 +0200, John Kacur wrote:
>  Also as I pointed out, in Sven's case it sounds like they may 
> have had a build where they even wanted to decrease it.
> 

I hacked the patch below for SLERT 10 a few years ago, and that's
shipping still.

As you can see from the comment in the patch header, I was aware that
this wasn't a closed case, but rather a stop-gap.

I think if I had spent any more time on it, or run into the issue
repeatedly, my first instinct would have been to do what John did, i.e.
make it configurable.

However, I think that fundamentally, if nesting goes that deep, simply
increasing this define would lead to masking real problems.

So while I hacked this up for a shipping product, with intention to
debug this later, I'd probably favor a proper fix of the offending call
chain upstream.

I do not recall ever trying to decrease it, other than to reproduce the
issue a few times. I never did have time to dig into it further, and we
closed the bug as won't fix for this particular product.

Regards,

Sven

Subject: Increase lockdep's MAX_STACK_TRACE_ENTRIES
From: Sven-Thorsten Dietrich <sdietrich@suse.de>

For large SMP systems, MAX_STACK_TRACE_ENTRIES was too low, and
lockdep would complain. This change addresses the issue on systems
we have tested.

It remains to be determined whether other bugs (e.g. scheduler
balancing issues led to unreasonably deep call chains)

Signed-off-by: Sven-Thorsten Dietrich <sdietrich@suse.de>

Index: linux-2.6.22/kernel/lockdep_internals.h
===================================================================
--- linux-2.6.22.orig/kernel/lockdep_internals.h
+++ linux-2.6.22/kernel/lockdep_internals.h
@@ -27,7 +27,7 @@
  * Stack-trace: tightly packed array of stack backtrace
  * addresses. Protected by the hash_lock.
  */
-#define MAX_STACK_TRACE_ENTRIES	262144UL
+#define MAX_STACK_TRACE_ENTRIES	524288UL

 extern struct list_head all_lock_classes;

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2010-06-18  3:56 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-06-14 22:21 [PATCH 0/6] Potential rt patches for tip/rt/2.6.33 John Kacur
2010-06-14 22:21 ` [PATCH 1/6] tracing: Update the comm field in the right variable in update_max_tr John Kacur
2010-06-14 22:21 ` [PATCH 2/6: v4] lockdep: Make MAX_STACK_TRACE_ENTRIES configurable John Kacur
2010-06-16 18:49   ` Peter Zijlstra
2010-06-16 20:37     ` John Kacur
2010-06-16 21:16       ` Peter Zijlstra
2010-06-16 21:29         ` Steven Rostedt
2010-06-17  8:35           ` Peter Zijlstra
2010-06-17  8:46             ` John Kacur
2010-06-18  3:56               ` Sven-Thorsten Dietrich
2010-06-16 21:33         ` John Kacur
2010-06-14 22:21 ` [PATCH 3/6] perf: Fix errors path in perf_output_begin() John Kacur
2010-06-14 22:21 ` [PATCH 4/6] KEYS: find_keyring_by_name() can gain access to a freed keyring John Kacur
2010-06-14 22:21 ` [PATCH 5/6] hvc_console: Fix race between hvc_close and hvc_remove John Kacur
2010-06-14 22:21 ` [PATCH 6/6] " John Kacur

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).