* Re: Fw: 2.5.61 oops running SDET [not found] <20030215172407.1fdd41fd.akpm@digeo.com> @ 2003-02-16 1:35 ` Linus Torvalds 2003-02-16 2:09 ` Martin J. Bligh 0 siblings, 1 reply; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 1:35 UTC (permalink / raw) To: Andrew Morton; +Cc: Martin J. Bligh, Kernel Mailing List On Sat, 15 Feb 2003, Andrew Morton wrote: > > The recent change to fs/proc/array.c:task_sig()? > > buffer += sprintf(buffer, "ShdPnd:\t"); > buffer = render_sigset_t(&p->signal->shared_pending.signal, buffer); Yeah, but I think the bug has existed for much longer. It looks like "proc_pid_status()" doesn't actually lock the task at all, nor even bother to test whether the task has signal state. Which has _always_ been a bug. I don't know why it would start happening now, but it might just be unlucky timing. I think the proper fix is to put a task_lock() .. task_unlock() around the whole proc_pid_status() function, _and_ then verify that "tsk->sighand" is non-NULL. (Oh, careful, that's already what "get_task_mm()" does internally, so look out for deadlocks - you'd need to open-code the get_task_mm() in there too, so the end result is something like task_lock(task) if (task->mm) { .. mm state } if (task->sighand) { .. signal state } .. task_unlock(task); instead). Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 1:35 ` Fw: 2.5.61 oops running SDET Linus Torvalds @ 2003-02-16 2:09 ` Martin J. Bligh 2003-02-16 2:27 ` Linus Torvalds 2003-02-16 2:48 ` Zwane Mwaikambo 0 siblings, 2 replies; 41+ messages in thread From: Martin J. Bligh @ 2003-02-16 2:09 UTC (permalink / raw) To: Linus Torvalds, Andrew Morton; +Cc: Kernel Mailing List > Yeah, but I think the bug has existed for much longer. > > It looks like "proc_pid_status()" doesn't actually lock the task at all, > nor even bother to test whether the task has signal state. Which has > _always_ been a bug. I don't know why it would start happening now, but > it might just be unlucky timing. > > I think the proper fix is to put a > > task_lock() > .. > task_unlock() > > around the whole proc_pid_status() function, _and_ then verify that > "tsk->sighand" is non-NULL. > > (Oh, careful, that's already what "get_task_mm()" does internally, so > look out for deadlocks - you'd need to open-code the get_task_mm() in > there too, so the end result is something like > > task_lock(task) > if (task->mm) { > .. mm state > } > if (task->sighand) { > .. signal state > } > .. > task_unlock(task); > > instead). Something like this? Compiles, but not tested yet ... 
diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c proc/fs/proc/array.c --- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003 +++ proc/fs/proc/array.c Sat Feb 15 18:05:10 2003 @@ -243,8 +243,10 @@ extern char *task_mem(struct mm_struct * int proc_pid_status(struct task_struct *task, char * buffer) { char * orig = buffer; - struct mm_struct *mm = get_task_mm(task); + struct mm_struct *mm; + task_lock(task); + mm = __get_task_mm(task); buffer = task_name(task, buffer); buffer = task_state(task, buffer); @@ -257,6 +259,7 @@ int proc_pid_status(struct task_struct * #if defined(CONFIG_ARCH_S390) buffer = task_show_regs(task, buffer); #endif + task_unlock(task); return buffer - orig; } diff -urpN -X /home/fletch/.diff.exclude virgin/include/linux/sched.h proc/include/linux/sched.h --- virgin/include/linux/sched.h Sat Feb 15 16:11:47 2003 +++ proc/include/linux/sched.h Sat Feb 15 18:04:42 2003 @@ -706,6 +706,18 @@ static inline struct mm_struct * get_tas return mm; } + +/* lockless version of get_task_mm */ +static inline struct mm_struct * __get_task_mm(struct task_struct * task) +{ + struct mm_struct * mm; + + mm = task->mm; + if (mm) + atomic_inc(&mm->mm_users); + + return mm; +} /* set thread flags in other task's structures ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 2:09 ` Martin J. Bligh @ 2003-02-16 2:27 ` Linus Torvalds 2003-02-16 4:00 ` Martin J. Bligh 2003-02-16 2:48 ` Zwane Mwaikambo 1 sibling, 1 reply; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 2:27 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Andrew Morton, Kernel Mailing List On Sat, 15 Feb 2003, Martin J. Bligh wrote: > > Something like this? Compiles, but not tested yet ... Close, but you also need - buffer = task_sig(task, buffer); + if (task->sighand) + buffer = task_sig(task, buffer); to actually check whether signals exist or not. Otherwise you'll just get the same oops anyway (well, keeping the task locked may change timings enough that it doesn't happen, but the bug will continue to be there). I would also say that since you explicitly take the task lock, there's no real reason to use "get_task_mm()" at all any more, so instead of doing that (and then doing the mmput()), just get rid of the mm variable entirely, and do if (task->mm) buffer = task_mem(task->mm, buffer) too. There's really no downside to just holding on to the task lock over the whole operation instead of incrementing and decrementing the mm counts and dropping the lock early. (There are a few valid reasons for using the "get_task_mm()" function: - you need to block and thus drop the task lock - the original code just used "task->mm" directly, and using "get_task_mm()" made the original conversion to mm safe handling easier. Neither reason is really valid any more in this function at least). Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 2:27 ` Linus Torvalds @ 2003-02-16 4:00 ` Martin J. Bligh 2003-02-16 13:05 ` Anton Blanchard 2003-02-16 18:07 ` Fw: 2.5.61 oops running SDET Linus Torvalds 0 siblings, 2 replies; 41+ messages in thread From: Martin J. Bligh @ 2003-02-16 4:00 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrew Morton, Kernel Mailing List > Close, but you also need > > - buffer = task_sig(task, buffer); > + if (task->sighand) > + buffer = task_sig(task, buffer); > > to actually check whether signals exist or not. Otherwise you'll just get > the same oops anyway (well, keeping the task locked may change timings > enough that it doesn't happen, but the bug will continue to be there. > > I would also say that since you explicitly take the task lock, there's no > real reason to use "get_task_mm()" at all any more, so instead of doing > that (and then doing the mmput()), just get rid of the mm variable > entirely, and do > > if (task->mm) > buffer = task_mem(task->mm, buffer) > > too. There's really no downside to just holding on to the task lock over > the whole operation instead of incrementing and decrementing the mm > counts and dropping the lock early. > > (There are a few valid reasons for using the "get_task_mm()" function: > > - you need to block and thus drop the task lock > > - the original code just used "task->mm" directly, and using > "get_task_mm()" made the original conversion to mm safe handling > easier. > > Neither reason is really valid any more in this function at least). OK, I did the following, which is what I think you wanted, plus Zwane's observation that task_state acquires the task_struct lock (we're the only caller, so I just removed it), but I still get the same panic and this time the box hung. No doubt I've just done something stupid in the patch ... (task_state takes the tasklist_lock ... is that safe to do inside task_lock?) 
diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet/fs/proc/array.c --- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003 +++ sdet/fs/proc/array.c Sat Feb 15 19:28:34 2003 @@ -166,12 +166,10 @@ static inline char * task_state(struct t p->uid, p->euid, p->suid, p->fsuid, p->gid, p->egid, p->sgid, p->fsgid); read_unlock(&tasklist_lock); - task_lock(p); buffer += sprintf(buffer, "FDSize:\t%d\n" "Groups:\t", p->files ? p->files->max_fds : 0); - task_unlock(p); for (g = 0; g < p->ngroups; g++) buffer += sprintf(buffer, "%d ", p->groups[g]); @@ -243,20 +241,20 @@ extern char *task_mem(struct mm_struct * int proc_pid_status(struct task_struct *task, char * buffer) { char * orig = buffer; - struct mm_struct *mm = get_task_mm(task); + task_lock(task); buffer = task_name(task, buffer); buffer = task_state(task, buffer); - if (mm) { - buffer = task_mem(mm, buffer); - mmput(mm); - } - buffer = task_sig(task, buffer); + if (task->mm) + buffer = task_mem(task->mm, buffer); + if (task->sighand) + buffer = task_sig(task, buffer); buffer = task_cap(task, buffer); #if defined(CONFIG_ARCH_S390) buffer = task_show_regs(task, buffer); #endif + task_unlock(task); return buffer - orig; } Unable to handle kernel NULL pointer dereference at virtual address 00000014 printing eip: c0118b13 *pde = 2d4bf001 *pte = 00000000 Oops: 0000 CPU: 6 EIP: 0060:[<c0118b13>] Not tainted EFLAGS: 00010207 EIP is at render_sigset_t+0x17/0x7c eax: 00000010 ebx: ed3d909f ecx: ed3d9097 edx: eda60100 esi: 0000003c edi: 00000010 ebp: ed49bf04 esp: ed49bef8 ds: 007b es: 007b ss: 0068 Process ps (pid: 3339, threadinfo=ed49a000 task=ee32cd20) Stack: ed3d907d 00000002 ed3d909f 00000006 c016d85f 00000010 ed3d909f ed3d9097 c0233f38 eda606bc ed3d9086 ed3d907e c0233f2f ee1d7b70 eda60100 eeb4d0c0 ed3d9000 ee1d7b70 00000000 ed3d9000 000000d0 eeb4d0c0 c01300f6 c17119e8 Call Trace: [<c016d85f>] proc_pid_status+0x26f/0x334 [<c01300f6>] __get_free_pages+0x4e/0x54 [<c016b517>] proc_info_read+0x53/0x130 
[<c0145295>] vfs_read+0xa5/0x128 [<c0145522>] sys_read+0x2a/0x3c [<c0108b3f>] syscall_call+0x7/0xb Code: 0f a3 37 19 d2 b8 01 00 00 00 31 c9 85 d2 0f 45 c8 8d 56 01 <1>Unable to handle kernel NULL pointer dereference at virtual address 00000014 printing eip: c0118b13 *pde = 2d512001 *pte = 00000000 Oops: 0000 CPU: 2 EIP: 0060:[<c0118b13>] Not tainted EFLAGS: 00010207 EIP is at render_sigset_t+0x17/0x7c eax: 00000010 ebx: ee34b0a0 ecx: ee34b098 edx: ed3ea7a0 esi: 0000003c edi: 00000010 ebp: ed54ff04 esp: ed54fef8 ds: 007b es: 007b ss: 0068 Process ps (pid: 14710, threadinfo=ed54e000 task=ee430d40) Stack: ee34b07e 00000002 ee34b0a0 00000006 c016d85f 00000010 ee34b0a0 ee34b098 c0233f38 ed3ead5c ee34b087 ee34b07f c0233f2f ee4960d0 ed3ea7a0 eec66ca0 ee34b000 ee4960d0 00000000 ee34b000 000000d0 eec66ca0 c01300f6 c17383b8 Call Trace: [<c016d85f>] proc_pid_status+0x26f/0x334 [<c01300f6>] __get_free_pages+0x4e/0x54 [<c016b517>] proc_info_read+0x53/0x130 [<c0145295>] vfs_read+0xa5/0x128 [<c0145522>] sys_read+0x2a/0x3c [<c0108b3f>] syscall_call+0x7/0xb Code: 0f a3 37 19 d2 b8 01 00 00 00 31 c9 85 d2 0f 45 c8 8d 56 01 ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 4:00 ` Martin J. Bligh @ 2003-02-16 13:05 ` Anton Blanchard 2003-02-16 16:39 ` Martin J. Bligh 2003-02-16 18:07 ` Fw: 2.5.61 oops running SDET Linus Torvalds 1 sibling, 1 reply; 41+ messages in thread From: Anton Blanchard @ 2003-02-16 13:05 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Linus Torvalds, Andrew Morton, Kernel Mailing List > OK, I did the following, which is what I think you wanted, plus Zwane's > observation that task_state acquires the task_struct lock (we're the only > caller, so I just removed it), but I still get the same panic and this time > the box hung. No doubt I've just done something stupid in the patch ... > (task_state takes the tasklist_lock ... is that safe to do inside task_lock?) It looks like you now try and take the tasklist_lock inside a task_lock() region, but task_lock says no-can-do: /* Protects ->fs, ->files, ->mm, and synchronises with wait4(). Nests * inside tasklist_lock */ static inline void task_lock(struct task_struct *p) { ... Anton ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 13:05 ` Anton Blanchard @ 2003-02-16 16:39 ` Martin J. Bligh 2003-02-16 18:21 ` Linus Torvalds 0 siblings, 1 reply; 41+ messages in thread From: Martin J. Bligh @ 2003-02-16 16:39 UTC (permalink / raw) To: Anton Blanchard; +Cc: Linus Torvalds, Andrew Morton, Kernel Mailing List >> OK, I did the following, which is what I think you wanted, plus Zwane's >> observation that task_state acquires the task_struct lock (we're the only >> caller, so I just removed it), but I still get the same panic and this time >> the box hung. No doubt I've just done something stupid in the patch ... >> (task_state takes the tasklist_lock ... is that safe to do inside task_lock?) > > It looks like you now try and take the tasklist_lock inside a task_lock() > region, but task_lock says no-can-do: > > /* Protects ->fs, ->files, ->mm, and synchronises with wait4(). Nests > * inside tasklist_lock */ > static inline void task_lock(struct task_struct *p) > { > ... Right, that's what I expected, and that may well explain the hang, but I don't see how that explains the oops still happening. What exactly is tasklist_lock protecting here anyway? read_lock(&tasklist_lock); buffer += sprintf(buffer, "State:\t%s\n" "Tgid:\t%d\n" "Pid:\t%d\n" "PPid:\t%d\n" "TracerPid:\t%d\n" "Uid:\t%d\t%d\t%d\t%d\n" "Gid:\t%d\t%d\t%d\t%d\n", get_task_state(p), p->tgid, p->pid, p->pid ? p->real_parent->pid : 0, p->pid && p->ptrace ? p->parent->pid : 0, p->uid, p->euid, p->suid, p->fsuid, p->gid, p->egid, p->sgid, p->fsgid); read_unlock(&tasklist_lock); Is it these two accesses: p->real_parent->pid ? p->parent->pid ? Don't see what I can do for this apart from to invert the ordering and take tasklist_lock around the whole function, and nest task_lock inside that, or I suppose I could take the task_lock for each of the parents? I seem to recall Linus reminding people recently that it was only the lock acquisition order that was important, not release ... 
does something like the following look OK? diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet2/fs/proc/array.c --- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003 +++ sdet2/fs/proc/array.c Sun Feb 16 08:37:45 2003 @@ -147,11 +147,11 @@ static inline const char * get_task_stat return *p; } +/* Call me with the tasklist_lock and task_lock for p held already */ static inline char * task_state(struct task_struct *p, char *buffer) { int g; - read_lock(&tasklist_lock); buffer += sprintf(buffer, "State:\t%s\n" "Tgid:\t%d\n" @@ -165,13 +165,10 @@ static inline char * task_state(struct t p->pid && p->ptrace ? p->parent->pid : 0, p->uid, p->euid, p->suid, p->fsuid, p->gid, p->egid, p->sgid, p->fsgid); - read_unlock(&tasklist_lock); - task_lock(p); buffer += sprintf(buffer, "FDSize:\t%d\n" "Groups:\t", p->files ? p->files->max_fds : 0); - task_unlock(p); for (g = 0; g < p->ngroups; g++) buffer += sprintf(buffer, "%d ", p->groups[g]); @@ -243,20 +240,22 @@ extern char *task_mem(struct mm_struct * int proc_pid_status(struct task_struct *task, char * buffer) { char * orig = buffer; - struct mm_struct *mm = get_task_mm(task); + read_lock(&tasklist_lock); + task_lock(task); buffer = task_name(task, buffer); buffer = task_state(task, buffer); + read_unlock(&tasklist_lock); - if (mm) { - buffer = task_mem(mm, buffer); - mmput(mm); - } - buffer = task_sig(task, buffer); + if (task->mm) + buffer = task_mem(task->mm, buffer); + if (task->sighand) + buffer = task_sig(task, buffer); buffer = task_cap(task, buffer); #if defined(CONFIG_ARCH_S390) buffer = task_show_regs(task, buffer); #endif + task_unlock(task); return buffer - orig; } ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 16:39 ` Martin J. Bligh @ 2003-02-16 18:21 ` Linus Torvalds 2003-02-16 19:06 ` Martin J. Bligh 0 siblings, 1 reply; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 18:21 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List On Sun, 16 Feb 2003, Martin J. Bligh wrote: > > Right, that's what I expected, and that may well explain the hang, but I > don't see how that explains the oops still happening. What exactly is > tasklist_lock protecting here anyway? > > read_lock(&tasklist_lock); > buffer += sprintf(buffer, > "State:\t%s\n" > "Tgid:\t%d\n" > "Pid:\t%d\n" > "PPid:\t%d\n" > "TracerPid:\t%d\n" > "Uid:\t%d\t%d\t%d\t%d\n" > "Gid:\t%d\t%d\t%d\t%d\n", > get_task_state(p), p->tgid, > p->pid, p->pid ? p->real_parent->pid : 0, > p->pid && p->ptrace ? p->parent->pid : 0, > p->uid, p->euid, p->suid, p->fsuid, > p->gid, p->egid, p->sgid, p->fsgid); > read_unlock(&tasklist_lock); > > Is it these two accesses: > > p->real_parent->pid ? > p->parent->pid ? Yup. > Don't see what I can do for this apart from to invert the ordering and take > tasklist_lock around the whole function, and nest task_lock inside that, or > I suppose I could take the task_lock for each of the parents? I seem to > recall Linus reminding people recently that it was only the lock > acquisition order that was important, not release ... does something like > the following look OK? This patch looks like it should certainly fix the problem, but that is still some god-awful ugly overkill in locking. I'd rather make the rule be that you have to take the task lock before modifying things like the parent pointers (and all the other fundamental pointers), since that's already the rule for most of it anyway. 
And then the tasklist lock would go away _entirely_ from /proc (except for task lookup in ->readdir/->lookup, of course, where it is fundamentally needed and proper - and will probably some day be replaced by RCU, I suspect). Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 18:21 ` Linus Torvalds @ 2003-02-16 19:06 ` Martin J. Bligh 2003-02-16 19:17 ` Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 41+ messages in thread From: Martin J. Bligh @ 2003-02-16 19:06 UTC (permalink / raw) To: Linus Torvalds Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo, Manfred Spraul >> Don't see what I can do for this apart from to invert the ordering and take >> tasklist_lock around the whole function, and nest task_lock inside that, or >> I suppose I could take the task_lock for each of the parents? I seem to >> recall Linus reminding people recently that it was only the lock >> acquisition order that was important, not release ... does something like >> the following look OK? > > This patch looks like it should certainly fix the problem, but that is > still some god-awful ugly overkill in locking. > > I'd rather make the rule be that you have to take the task lock before > modifying things like the parent pointers (and all the other fundamntal > pointers), since that's already the rule for most of it anyway. > > And then the tasklist lock would go away _entirely_ from /proc (except for > task lookup in ->readdir/->lookup, of course, where it is fundamentally > needed and proper - and will probably some day be replaced by RCU, I > suspect). Well, I did the stupid safe thing, and it hangs the box once we get up to a load of 32 with SDET. Below is what I did, the only other issue I can see in here is that task_mem takes mm->mmap_sem which is now nested inside the task_lock inside tasklist_lock ... 
but I can't see anywhere that's a problem from a quick search diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet2/fs/proc/array.c --- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003 +++ sdet2/fs/proc/array.c Sun Feb 16 09:59:24 2003 @@ -147,11 +147,11 @@ static inline const char * get_task_stat return *p; } +/* Call me with the tasklist_lock and task_lock for p held already */ static inline char * task_state(struct task_struct *p, char *buffer) { int g; - read_lock(&tasklist_lock); buffer += sprintf(buffer, "State:\t%s\n" "Tgid:\t%d\n" @@ -165,13 +165,10 @@ static inline char * task_state(struct t p->pid && p->ptrace ? p->parent->pid : 0, p->uid, p->euid, p->suid, p->fsuid, p->gid, p->egid, p->sgid, p->fsgid); - read_unlock(&tasklist_lock); - task_lock(p); buffer += sprintf(buffer, "FDSize:\t%d\n" "Groups:\t", p->files ? p->files->max_fds : 0); - task_unlock(p); for (g = 0; g < p->ngroups; g++) buffer += sprintf(buffer, "%d ", p->groups[g]); @@ -243,20 +240,22 @@ extern char *task_mem(struct mm_struct * int proc_pid_status(struct task_struct *task, char * buffer) { char * orig = buffer; - struct mm_struct *mm = get_task_mm(task); + read_lock(&tasklist_lock); + task_lock(task); buffer = task_name(task, buffer); buffer = task_state(task, buffer); - if (mm) { - buffer = task_mem(mm, buffer); - mmput(mm); - } - buffer = task_sig(task, buffer); + if (task->mm) + buffer = task_mem(task->mm, buffer); + if (task->sighand) + buffer = task_sig(task, buffer); buffer = task_cap(task, buffer); #if defined(CONFIG_ARCH_S390) buffer = task_show_regs(task, buffer); #endif + task_unlock(task); + read_unlock(&tasklist_lock); return buffer - orig; } ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 19:06 ` Martin J. Bligh @ 2003-02-16 19:17 ` Linus Torvalds 2003-02-16 21:15 ` Martin J. Bligh 2003-02-16 19:18 ` Manfred Spraul 2003-02-16 19:19 ` more signal locking bugs? Martin J. Bligh 2 siblings, 1 reply; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 19:17 UTC (permalink / raw) To: Martin J. Bligh Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo, Manfred Spraul On Sun, 16 Feb 2003, Martin J. Bligh wrote: > > Well, I did the stupid safe thing, and it hangs the box once we get up to > a load of 32 with SDET. Below is what I did, the only other issue I can > see in here is that task_mem takes mm->mmap_sem which is now nested inside > the task_lock inside tasklist_lock ... but I can't see anywhere that's a > problem from a quick search Ho - you can _never_ nest a semaphore inside a spinlock - if the semaphore sleeps, the spinlock will cause a lockup regardless of any reverse nesting issues.. So I guess the old "get_task_mm()" code is required anyway. Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 19:17 ` Linus Torvalds @ 2003-02-16 21:15 ` Martin J. Bligh 2003-02-16 21:21 ` Manfred Spraul 0 siblings, 1 reply; 41+ messages in thread From: Martin J. Bligh @ 2003-02-16 21:15 UTC (permalink / raw) To: Linus Torvalds Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo, Manfred Spraul >> Well, I did the stupid safe thing, and it hangs the box once we get up to >> a load of 32 with SDET. Below is what I did, the only other issue I can >> see in here is that task_mem takes mm->mmap_sem which is now nested inside >> the task_lock inside tasklist_lock ... but I can't see anywhere that's a >> problem from a quick search > > Ho - you can _never_ nest a semaphore inside a spinlock - if the semaphore > sleeps, the spinlock will cause a lockup regardless of any reverse nesting > issues.. So I guess the old "get_task_mm()" code is requried anyway. True ... staring at unfamiliar code made me forget a few things ;-) OK, the below patch works, no oopses, no hangs. If this is acceptable (even if it's not the final solution), could you apply it? Seems to be better than the current situation, at any rate. ----------------------------------------------------- We can encounter a race condition where task->sighand can be NULL in proc_pid_status, resulting in an offset NULL pointer dereference in render_sigset_t. This can be easily demonstrated running SDET with a load of 32 to 64 on a large machine (16x NUMA-Q in this case). This patch makes us take task_lock around the sighand deferences, and fixes the oops in testing on the same machine. 
Thanks to Linus, Andrew, Manfred and Zwane for lots of guidance ;-) diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet3/fs/proc/array.c --- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003 +++ sdet3/fs/proc/array.c Sun Feb 16 11:44:24 2003 @@ -252,8 +252,11 @@ int proc_pid_status(struct task_struct * buffer = task_mem(mm, buffer); mmput(mm); } - buffer = task_sig(task, buffer); + task_lock(task); + if (task->sighand) + buffer = task_sig(task, buffer); buffer = task_cap(task, buffer); + task_unlock(task); #if defined(CONFIG_ARCH_S390) buffer = task_show_regs(task, buffer); #endif ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 21:15 ` Martin J. Bligh @ 2003-02-16 21:21 ` Manfred Spraul 2003-02-16 22:34 ` Linus Torvalds 0 siblings, 1 reply; 41+ messages in thread From: Manfred Spraul @ 2003-02-16 21:21 UTC (permalink / raw) To: Martin J. Bligh Cc: Linus Torvalds, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo Martin J. Bligh wrote: >diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet3/fs/proc/array.c >--- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003 >+++ sdet3/fs/proc/array.c Sun Feb 16 11:44:24 2003 >@@ -252,8 +252,11 @@ int proc_pid_status(struct task_struct * > buffer = task_mem(mm, buffer); > mmput(mm); > } >- buffer = task_sig(task, buffer); >+ task_lock(task); >+ if (task->sighand) >+ buffer = task_sig(task, buffer); > buffer = task_cap(task, buffer); >+ task_unlock(task); > #if defined(CONFIG_ARCH_S390) > buffer = task_show_regs(task, buffer); > #endif > > I think it's needed for 2.4, too. -- Manfred ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 21:21 ` Manfred Spraul @ 2003-02-16 22:34 ` Linus Torvalds 2003-02-16 23:08 ` Martin J. Bligh 0 siblings, 1 reply; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 22:34 UTC (permalink / raw) To: Manfred Spraul Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo On Sun, 16 Feb 2003, Manfred Spraul wrote: > Martin J. Bligh wrote: > > >diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet3/fs/proc/array.c > >--- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003 > >+++ sdet3/fs/proc/array.c Sun Feb 16 11:44:24 2003 > >@@ -252,8 +252,11 @@ int proc_pid_status(struct task_struct * > > buffer = task_mem(mm, buffer); > > mmput(mm); > > } > >- buffer = task_sig(task, buffer); > >+ task_lock(task); > >+ if (task->sighand) > >+ buffer = task_sig(task, buffer); > > buffer = task_cap(task, buffer); > >+ task_unlock(task); > > #if defined(CONFIG_ARCH_S390) > > buffer = task_show_regs(task, buffer); > > #endif > > > > > I think it's needed for 2.4, too. It's not wrong, but it shouldn't help. Simply because "task_lock()" isn't even relevant to "task->sighand" as it stands now. It will be cleared without holding the task lock, as far as I can see. So I suspect it fixes things for Martin only because it changes timings enough not to hit the race. Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 22:34 ` Linus Torvalds @ 2003-02-16 23:08 ` Martin J. Bligh 2003-02-16 23:32 ` Linus Torvalds 0 siblings, 1 reply; 41+ messages in thread From: Martin J. Bligh @ 2003-02-16 23:08 UTC (permalink / raw) To: Linus Torvalds, Manfred Spraul Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo >> > diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet3/fs/proc/array.c >> > --- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003 >> > +++ sdet3/fs/proc/array.c Sun Feb 16 11:44:24 2003 >> > @@ -252,8 +252,11 @@ int proc_pid_status(struct task_struct * >> > buffer = task_mem(mm, buffer); >> > mmput(mm); >> > } >> > - buffer = task_sig(task, buffer); >> > + task_lock(task); >> > + if (task->sighand) >> > + buffer = task_sig(task, buffer); >> > buffer = task_cap(task, buffer); >> > + task_unlock(task); >> > #if defined(CONFIG_ARCH_S390) >> > buffer = task_show_regs(task, buffer); >> > #endif >> > >> > >> I think it's needed for 2.4, too. > > It's not wrong, but it shouldn't help. Simply because "task_lock()" > isn't even relevant to "task->sighand" as it stands now. It will be > cleared without holding the task lock, as far as I can see. > > So I suspect it fixes things for Martin only because it changes timings > enough not to hit the race. Ah, fair enough ... it's probably the if, rather than the task_lock. So what does protect sighand? tasklist_lock? It doesn't seem to as people do things like: spin_unlock_irq(&current->sighand->siglock); all the time ... so is it just protected by good faith and the direction of the wind? M. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 23:08 ` Martin J. Bligh @ 2003-02-16 23:32 ` Linus Torvalds 0 siblings, 0 replies; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 23:32 UTC (permalink / raw) To: Martin J. Bligh Cc: Manfred Spraul, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo, Ingo Molnar On Sun, 16 Feb 2003, Martin J. Bligh wrote: > > So what does protect sighand? tasklist_lock? Yup. At least right now. And we do tend to hold tasklist_lock in most cases simply by virtue of having to get it anyway in order to search for the process. > It doesn't seem to > as people do things like: > > spin_unlock_irq(¤t->sighand->siglock); > > all the time ... so is it just protected by good faith and the direction > of the wind? Agreed. That, and the fact that most of the time the stuff _is_ there: obviously any time the process looks at its own signals it will always be there. So we have two cases: - "normal" signal sending which holds tasklist_lock to find the target. - signals to the process itself, which is usually safe, since we know we're there (the exception would be if/when taking a fault in the exit path, but even that might be ok since signals are torn down not by exit itself but by "release_task") So the signal path may actually be ok. Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 19:06 ` Martin J. Bligh 2003-02-16 19:17 ` Linus Torvalds @ 2003-02-16 19:18 ` Manfred Spraul 2003-02-16 19:19 ` more signal locking bugs? Martin J. Bligh 2 siblings, 0 replies; 41+ messages in thread From: Manfred Spraul @ 2003-02-16 19:18 UTC (permalink / raw) To: Martin J. Bligh Cc: Linus Torvalds, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo Martin J. Bligh wrote: >Well, I did the stupid safe thing, and it hangs the box once we get up to >a load of 32 with SDET. Below is what I did, the only other issue I can >see in here is that task_mem takes mm->mmap_sem which is now nested inside >the task_lock inside tasklist_lock ... but I can't see anywhere that's a >problem from a quick search > > task_lock is actually spin_lock(&tsk->alloc_lock); mmap_sem is a semaphore. Nesting means deadlock. -- Manfred ^ permalink raw reply [flat|nested] 41+ messages in thread
* more signal locking bugs? 2003-02-16 19:06 ` Martin J. Bligh 2003-02-16 19:17 ` Linus Torvalds 2003-02-16 19:18 ` Manfred Spraul @ 2003-02-16 19:19 ` Martin J. Bligh 2003-02-16 19:24 ` Linus Torvalds 2 siblings, 1 reply; 41+ messages in thread From: Martin J. Bligh @ 2003-02-16 19:19 UTC (permalink / raw) To: Linus Torvalds Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo, Manfred Spraul task_lock nests *inside* tasklist_lock but ... : __do_SAK: task_lock ... send_sig (SIGKILL, ...) send_sig_info (SIGKILL, ...) if (T(sig, SIG_KERNEL_BROADCAST_MASK))) read_lock(&tasklist_lock); ... read_unlock(&tasklist_lock); ... task_unlock #define SIG_KERNEL_BROADCAST_MASK (\ M(SIGHUP) | M(SIGINT) | M(SIGQUIT) | M(SIGILL) | \ M(SIGTRAP) | M(SIGABRT) | M(SIGBUS) | M(SIGFPE) | \ M(SIGKILL) | M(SIGUSR1) | M(SIGSEGV) | M(SIGUSR2) | \ M(SIGPIPE) | M(SIGALRM) | M(SIGTERM) | M(SIGXCPU) | \ M(SIGXFSZ) | M(SIGVTALRM) | M(SIGPROF) | M(SIGPOLL) | \ M(SIGSYS) | M_SIGSTKFLT | M(SIGPWR) | M(SIGCONT) | \ M(SIGSTOP) | M(SIGTSTP) | M(SIGTTIN) | M(SIGTTOU) | \ M_SIGEMT ) ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-16 19:19 ` more signal locking bugs? Martin J. Bligh @ 2003-02-16 19:24 ` Linus Torvalds 2003-02-16 19:37 ` Manfred Spraul 0 siblings, 1 reply; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 19:24 UTC (permalink / raw) To: Martin J. Bligh Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo, Manfred Spraul On Sun, 16 Feb 2003, Martin J. Bligh wrote: > > task_lock nests *inside* tasklist_lock but ... : Yeah, this is a problem with any signal, not just SAK. In fact, in general it is always _wrong_ to nest a non-interrupt-safe lock inside an interrupt-safe spinlock, because then you can never take the non-interrupt-safe one on its own (because an interrupt may come in and take the interrupt-safe lock, at which point that CPU has now violated ordering). And if you always nest the interrupt-unsafe one inside the interrupt-safe one, you might as well not have the interrupt-unsafe one in the first place, since the outer lock _always_ protects the code in question anyway. Which implies that either - the task lock should be _outside_ the tasklist_lock or - the task lock should be made interrupt-safe. If we make the tasklock one interrupt-safe, that should fix the signal issue, and we can use the tasklock to protect "task->signal" and "task->sighand". In short, everything really seems to be pointing that way: the current task lock simply _is_ broken, and has apparently always been broken (but the ABBA deadlock is just extremely rare in practice, since you have to get an interrupt at just the right point on one CPU, while you have the AB case on another). Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-16 19:24 ` Linus Torvalds @ 2003-02-16 19:37 ` Manfred Spraul 2003-02-16 19:42 ` Linus Torvalds 0 siblings, 1 reply; 41+ messages in thread From: Manfred Spraul @ 2003-02-16 19:37 UTC (permalink / raw) To: Linus Torvalds Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo Linus Torvalds wrote: >In short, everything really seems to be pointing that way: the current >task lock simply _is_ broken, and has apparently always been broken (but >the ABBA deadlock is just extremely rare in practice, since you have to >get an interrupt at just the right point on one CPU, while you have the AB >case on another). > ABBA is not a deadlock, because linux read_locks permit recursive calls.

	read_lock(tasklist_lock);
	task_lock(tsk);
	read_lock(tasklist_lock);

Does not deadlock, nor any other ordering. The tasklist_lock is never taken for write from bh or irq context. -- Manfred ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-16 19:37 ` Manfred Spraul @ 2003-02-16 19:42 ` Linus Torvalds 2003-02-16 20:01 ` Linus Torvalds 2003-02-16 20:07 ` Manfred Spraul 1 sibling, 2 replies; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 19:42 UTC (permalink / raw) To: Manfred Spraul Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo On Sun, 16 Feb 2003, Manfred Spraul wrote: > > ABBA is not a deadlock, because linux read_locks permit recursive calls. > > read_lock(tasklist_lock); > task_lock(tsk); > read_lock(tasklist_lock); > > Does not deadlock, nor any other ordering. > > The tasklist_lock is never taken for write from bh or irq context. Doesn't matter.

	CPU1 - do_exit()			CPU2 - non-nested task_lock()

						task_lock(tsk);
	write_lock_irq(&tasklist_lock);
						IRQ HAPPENS
	task_lock(tsk);
						read_lock(&tasklist_lock)

BOOM, you're dead. See? ABBA _does_ happen with the task lock, it's just that the magic required to do so is fairly unlikely thanks to the added requirement for the irq to happen at just the right moment (ie there are no static code-paths that can cause it). Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-16 19:42 ` Linus Torvalds @ 2003-02-16 20:01 ` Linus Torvalds 2003-02-16 20:07 ` Manfred Spraul 1 sibling, 0 replies; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 20:01 UTC (permalink / raw) To: Manfred Spraul Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo On Sun, 16 Feb 2003, Linus Torvalds wrote: > > See? ABBA _does_ happen with the task lock, it's just that the magic > required to do so is fairly unlikely thanks to the added requirement for > the irq to happen at just the right moment (ie there are no static > code-paths that can cause it). Note: to clarify, this isn't in any way a new situation. As far as I can tell, this deadlock exists in 2.4.x too. And it's almost certainly pretty much impossible to trigger in practice, and as such we shouldn't need to be deeply worried about it. But it should make us think about the _design_ of locking in this region. Clearly we got it wrong, and clearly we never noticed for several years. The simple fix is to make the task-lock be IRQ-safe. That fixes it, and that's probably the right minimal solution for for 2.4.x (unless the "fix" for 2.4.x is to just ignore it since it's possible to trigger mostly in a theoretical sense). But assuming you accept that making the task-lock be irq-safe is the right solution, then making it solve the signal handling and /proc scalability issues is likely to be the right solution: since it's irq-safe, there's no real reason not to use it to protect task->{sighand | signal | parent} too. Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-16 19:42 ` Linus Torvalds 2003-02-16 20:01 ` Linus Torvalds @ 2003-02-16 20:07 ` Manfred Spraul 2003-02-16 20:10 ` Linus Torvalds 2003-02-17 0:23 ` Linus Torvalds 1 sibling, 2 replies; 41+ messages in thread From: Manfred Spraul @ 2003-02-16 20:07 UTC (permalink / raw) To: Linus Torvalds Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo On Sun, 16 Feb 2003, Linus Torvalds wrote: > > On Sun, 16 Feb 2003, Manfred Spraul wrote: > > > > ABBA is not a deadlock, because linux read_locks permit recursive calls. > > > > read_lock(tasklist_lock); > > task_lock(tsk); > > read_lock(tasklist_lock); > > > > Does not deadlock, nor any other ordering. > > > > The tasklist_lock is never taken for write from bh or irq context. > > Doesn't matter. > > CPU1 - do_exit() ) > write_lock_irq(&tasklist_lock); > task_lock(tsk); > > BOOM, you're dead. > > See? ABBA _does_ happen with the task lock. > But these lines are not in 2.4 or 2.5.61. The current rule to nesting tasklist_lock and task_lock is - read_lock(&tasklist_lock) and task_lock can be mixed in any order. - write_lock_irq(&tasklist_lock) and task_lock are incompatible. What about this minimal patch? The performance critical operation is signal delivery - we should fix the synchronization between signal delivery and exec first. 
-- Manfred << --- 2.5/fs/proc/array.c 2003-02-15 10:29:22.000000000 +0100 +++ build-2.5/fs/proc/array.c 2003-02-16 20:58:37.000000000 +0100 @@ -208,23 +208,27 @@ { sigset_t ign, catch; - buffer += sprintf(buffer, "SigPnd:\t"); - buffer = render_sigset_t(&p->pending.signal, buffer); - *buffer++ = '\n'; - buffer += sprintf(buffer, "ShdPnd:\t"); - buffer = render_sigset_t(&p->signal->shared_pending.signal, buffer); - *buffer++ = '\n'; - buffer += sprintf(buffer, "SigBlk:\t"); - buffer = render_sigset_t(&p->blocked, buffer); - *buffer++ = '\n'; - - collect_sigign_sigcatch(p, &ign, &catch); - buffer += sprintf(buffer, "SigIgn:\t"); - buffer = render_sigset_t(&ign, buffer); - *buffer++ = '\n'; - buffer += sprintf(buffer, "SigCgt:\t"); /* Linux 2.0 uses "SigCgt" */ - buffer = render_sigset_t(&catch, buffer); - *buffer++ = '\n'; + read_lock(&tasklist_lock); + if (p->signal && p->sighand) { + buffer += sprintf(buffer, "SigPnd:\t"); + buffer = render_sigset_t(&p->pending.signal, buffer); + *buffer++ = '\n'; + buffer += sprintf(buffer, "ShdPnd:\t"); + buffer = render_sigset_t(&p->signal->shared_pending.signal, buffer); + *buffer++ = '\n'; + buffer += sprintf(buffer, "SigBlk:\t"); + buffer = render_sigset_t(&p->blocked, buffer); + *buffer++ = '\n'; + + collect_sigign_sigcatch(p, &ign, &catch); + buffer += sprintf(buffer, "SigIgn:\t"); + buffer = render_sigset_t(&ign, buffer); + *buffer++ = '\n'; + buffer += sprintf(buffer, "SigCgt:\t"); /* Linux 2.0 uses "SigCgt" */ + buffer = render_sigset_t(&catch, buffer); + *buffer++ = '\n'; + } + read_unlock(&tasklist_lock); return buffer; } << ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-16 20:07 ` Manfred Spraul @ 2003-02-16 20:10 ` Linus Torvalds 2003-02-16 20:23 ` Manfred Spraul 2003-02-17 0:23 ` Linus Torvalds 1 sibling, 1 reply; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 20:10 UTC (permalink / raw) To: Manfred Spraul Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo On Sun, 16 Feb 2003, Manfred Spraul wrote: > But these lines are not in 2.4 or 2.5.61. > The current rule to nesting tasklist_lock and task_lock is > - read_lock(&tasklist_lock) and task_lock can be mixed in any order. > - write_lock_irq(&tasklist_lock) and task_lock are incompatible. Oh, you're right, and you're right exactly _because_ "task->signal" isn't protected by the task lock right now. Aurgh. I had already mentally done that protection, which is why I thought we already had the bug. So never mind. 2.4.x is obviously also ok. > What about this minimal patch? The performance critical operation is > signal delivery - we should fix the synchronization between signal > delivery and exec first. The patch looks ok, although I'd also remove the locking and testing from collect_sigign_sigcatch() once it is done at a higher level. And yeah, what about signal delivery? Put back the same lock there? Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-16 20:10 ` Linus Torvalds @ 2003-02-16 20:23 ` Manfred Spraul 0 siblings, 0 replies; 41+ messages in thread From: Manfred Spraul @ 2003-02-16 20:23 UTC (permalink / raw) To: Linus Torvalds Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo Linus Torvalds wrote: >>What about this minimal patch? The performance critical operation is >>signal delivery - we should fix the synchronization between signal >>delivery and exec first. >> >> > >The patch looks ok, although I'd also remove the locking and testing from >collect_sigign_sigcatch() once it is done at a higher level. > >And yeah, what about signal delivery? Put back the same lock there? > > I don't know. Taking read_lock(&tasklist_lock) for send_specific_sig_info might hurt scalability. Ingo? If we do not want a global lock, then we have two options: - make task_lock an interrupt spinlock. - add a second spinlock to the task structure, for the signal stuff. Design question - what's worse? Memory bloat or a few additional local_irq_{en,dis}able(). I don't care - no performance critical codepaths. Additionally many task_lock()/task_unlock users could be replaced with spin_unlock_wait(&task->alloc_lock), which would not need the local_irq_disable(). -- Manfred ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-16 20:07 ` Manfred Spraul 2003-02-16 20:10 ` Linus Torvalds @ 2003-02-17 0:23 ` Linus Torvalds 2003-02-17 2:05 ` Martin J. Bligh 2003-02-17 6:36 ` more signal locking bugs? Manfred Spraul 1 sibling, 2 replies; 41+ messages in thread From: Linus Torvalds @ 2003-02-17 0:23 UTC (permalink / raw) To: Manfred Spraul Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo On Sun, 16 Feb 2003, Manfred Spraul wrote: > > What about this minimal patch? The performance critical operation is > signal delivery - we should fix the synchronization between signal > delivery and exec first. Ok, I committed this alternative change, which isn't quite as minimal, but looks a lot cleaner to me. Also, looking at execve() and other paths, we do seem to have sufficient protection from the tasklist_lock that signal delivery should be fine. So despite a long and confused thread, I think in the end the only real bug was the one Martin found which should be thus fixed.. Linus --- # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. 
# This patch includes the following deltas: # ChangeSet 1.1054 -> 1.1055 # include/linux/signal.h 1.8 -> 1.9 # fs/proc/array.c 1.42 -> 1.43 # kernel/sched.c 1.158 -> 1.159 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/02/16 torvalds@home.transmeta.com 1.1055 # Clean up and fix locking around signal rendering # -------------------------------------------- # diff -Nru a/fs/proc/array.c b/fs/proc/array.c --- a/fs/proc/array.c Sun Feb 16 16:22:45 2003 +++ b/fs/proc/array.c Sun Feb 16 16:22:45 2003 @@ -180,51 +180,74 @@ return buffer; } +static char * render_sigset_t(const char *header, sigset_t *set, char *buffer) +{ + int i, len; + + len = strlen(header); + memcpy(buffer, header, len); + buffer += len; + + i = _NSIG; + do { + int x = 0; + + i -= 4; + if (sigismember(set, i+1)) x |= 1; + if (sigismember(set, i+2)) x |= 2; + if (sigismember(set, i+3)) x |= 4; + if (sigismember(set, i+4)) x |= 8; + *buffer++ = (x < 10 ? '0' : 'a' - 10) + x; + } while (i >= 4); + + *buffer++ = '\n'; + *buffer = 0; + return buffer; +} + static void collect_sigign_sigcatch(struct task_struct *p, sigset_t *ign, sigset_t *catch) { struct k_sigaction *k; int i; - sigemptyset(ign); - sigemptyset(catch); + k = p->sighand->action; + for (i = 1; i <= _NSIG; ++i, ++k) { + if (k->sa.sa_handler == SIG_IGN) + sigaddset(ign, i); + else if (k->sa.sa_handler != SIG_DFL) + sigaddset(catch, i); + } +} + +static inline char * task_sig(struct task_struct *p, char *buffer) +{ + sigset_t pending, shpending, blocked, ignored, caught; + + sigemptyset(&pending); + sigemptyset(&shpending); + sigemptyset(&blocked); + sigemptyset(&ignored); + sigemptyset(&caught); + /* Gather all the data with the appropriate locks held */ read_lock(&tasklist_lock); if (p->sighand) { spin_lock_irq(&p->sighand->siglock); - k = p->sighand->action; - for (i = 1; i <= _NSIG; ++i, ++k) { - if (k->sa.sa_handler == SIG_IGN) - sigaddset(ign, i); - else if (k->sa.sa_handler != SIG_DFL) - 
sigaddset(catch, i); - } + pending = p->pending.signal; + shpending = p->signal->shared_pending.signal; + blocked = p->blocked; + collect_sigign_sigcatch(p, &ignored, &caught); spin_unlock_irq(&p->sighand->siglock); } read_unlock(&tasklist_lock); -} - -static inline char * task_sig(struct task_struct *p, char *buffer) -{ - sigset_t ign, catch; - buffer += sprintf(buffer, "SigPnd:\t"); - buffer = render_sigset_t(&p->pending.signal, buffer); - *buffer++ = '\n'; - buffer += sprintf(buffer, "ShdPnd:\t"); - buffer = render_sigset_t(&p->signal->shared_pending.signal, buffer); - *buffer++ = '\n'; - buffer += sprintf(buffer, "SigBlk:\t"); - buffer = render_sigset_t(&p->blocked, buffer); - *buffer++ = '\n'; - - collect_sigign_sigcatch(p, &ign, &catch); - buffer += sprintf(buffer, "SigIgn:\t"); - buffer = render_sigset_t(&ign, buffer); - *buffer++ = '\n'; - buffer += sprintf(buffer, "SigCgt:\t"); /* Linux 2.0 uses "SigCgt" */ - buffer = render_sigset_t(&catch, buffer); - *buffer++ = '\n'; + /* render them all */ + buffer = render_sigset_t("SigPnd:\t", &pending, buffer); + buffer = render_sigset_t("ShdPnd:\t", &shpending, buffer); + buffer = render_sigset_t("SigBlk:\t", &blocked, buffer); + buffer = render_sigset_t("SigIgn:\t", &ignored, buffer); + buffer = render_sigset_t("SigCgt:\t", &caught, buffer); return buffer; } diff -Nru a/include/linux/signal.h b/include/linux/signal.h --- a/include/linux/signal.h Sun Feb 16 16:22:45 2003 +++ b/include/linux/signal.h Sun Feb 16 16:22:45 2003 @@ -151,8 +151,6 @@ } } -extern char * render_sigset_t(sigset_t *set, char *buffer); - /* Some extensions for manipulating the low 32 signals in particular. 
*/ static inline void sigaddsetmask(sigset_t *set, unsigned long mask) diff -Nru a/kernel/sched.c b/kernel/sched.c --- a/kernel/sched.c Sun Feb 16 16:22:45 2003 +++ b/kernel/sched.c Sun Feb 16 16:22:45 2003 @@ -2095,21 +2095,6 @@ } } -char * render_sigset_t(sigset_t *set, char *buffer) -{ - int i = _NSIG, x; - do { - i -= 4, x = 0; - if (sigismember(set, i+1)) x |= 1; - if (sigismember(set, i+2)) x |= 2; - if (sigismember(set, i+3)) x |= 4; - if (sigismember(set, i+4)) x |= 8; - *buffer++ = (x < 10 ? '0' : 'a' - 10) + x; - } while (i >= 4); - *buffer = 0; - return buffer; -} - void show_state(void) { task_t *g, *p; ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-17 0:23 ` Linus Torvalds @ 2003-02-17 2:05 ` Martin J. Bligh 2003-02-17 2:39 ` Martin J. Bligh 2003-02-17 6:36 ` more signal locking bugs? Manfred Spraul 1 sibling, 1 reply; 41+ messages in thread From: Martin J. Bligh @ 2003-02-17 2:05 UTC (permalink / raw) To: Linus Torvalds, Manfred Spraul Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo > Ok, I committed this alternative change, which isn't quite as minimal, but > looks a lot cleaner to me. > > Also, looking at execve() and other paths, we do seem to have sufficient > protection from the tasklist_lock that signal delivery should be fine. So > despite a long and confused thread, I think in the end the only real bug > was the one Martin found which should be thus fixed.. Ran your patch ... but I get plenty of these now: Unable to handle kernel NULL pointer dereference at virtual address 00000004 printing eip: c016d5a8 *pde = 2e10f001 *pte = 00000000 Oops: 0000 CPU: 2 EIP: 0060:[<c016d5a8>] Not tainted EFLAGS: 00010202 EIP is at collect_sigign_sigcatch+0x1c/0x44 eax: ed6d72c0 ebx: ed403f50 ecx: 00000004 edx: 00000001 esi: ed403f48 edi: ed6d72c0 ebp: 00000000 esp: ed403eec ds: 007b es: 007b ss: 0068 Process ps (pid: 19988, threadinfo=ed402000 task=ed9352a0) Stack: c011d115 c02af820 c016dab8 ed6d72c0 ed403f48 ed403f50 ed6d72c0 edb99a50 ed6d72c0 eebf33c0 ee225000 c034e400 ed403f50 ed403f48 00000000 52000000 00008800 00000125 eebf33c0 eebf33e0 00000000 00000000 00000000 000000d0 Call Trace: [<c011d115>] do_exit+0x30d/0x31c [<c016dab8>] proc_pid_stat+0x110/0x324 [<c0130076>] __get_free_pages+0x4e/0x54 [<c016b497>] proc_info_read+0x53/0x130 [<c0145215>] vfs_read+0xa5/0x128 [<c01454a2>] sys_read+0x2a/0x3c [<c0108b3f>] syscall_call+0x7/0xb Code: 8b 01 83 f8 01 75 08 8d 42 ff 0f ab 06 eb 0a 85 c0 74 06 8d ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-17 2:05 ` Martin J. Bligh @ 2003-02-17 2:39 ` Martin J. Bligh 2003-02-17 3:53 ` Linus Torvalds 2003-02-17 3:54 ` [PATCH] fix secondary oops in sighand locking Martin J. Bligh 0 siblings, 2 replies; 41+ messages in thread From: Martin J. Bligh @ 2003-02-17 2:39 UTC (permalink / raw) To: Linus Torvalds, Manfred Spraul Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo >> Ok, I committed this alternative change, which isn't quite as minimal, but >> looks a lot cleaner to me. >> >> Also, looking at execve() and other paths, we do seem to have sufficient >> protection from the tasklist_lock that signal delivery should be fine. So >> despite a long and confused thread, I think in the end the only real bug >> was the one Martin found which should be thus fixed.. > > Ran your patch ... but I get plenty of these now: > > Unable to handle kernel NULL pointer dereference at virtual address 00000004 > printing eip: > c016d5a8 > *pde = 2e10f001 > *pte = 00000000 > Oops: 0000 > CPU: 2 > EIP: 0060:[<c016d5a8>] Not tainted > EFLAGS: 00010202 > EIP is at collect_sigign_sigcatch+0x1c/0x44 > eax: ed6d72c0 ebx: ed403f50 ecx: 00000004 edx: 00000001 > esi: ed403f48 edi: ed6d72c0 ebp: 00000000 esp: ed403eec > ds: 007b es: 007b ss: 0068 > Process ps (pid: 19988, threadinfo=ed402000 task=ed9352a0) > Stack: c011d115 c02af820 c016dab8 ed6d72c0 ed403f48 ed403f50 ed6d72c0 edb99a50 > ed6d72c0 eebf33c0 ee225000 c034e400 ed403f50 ed403f48 00000000 52000000 > 00008800 00000125 eebf33c0 eebf33e0 00000000 00000000 00000000 000000d0 > Call Trace: > [<c011d115>] do_exit+0x30d/0x31c > [<c016dab8>] proc_pid_stat+0x110/0x324 > [<c0130076>] __get_free_pages+0x4e/0x54 > [<c016b497>] proc_info_read+0x53/0x130 > [<c0145215>] vfs_read+0xa5/0x128 > [<c01454a2>] sys_read+0x2a/0x3c > [<c0108b3f>] syscall_call+0x7/0xb > > Code: 8b 01 83 f8 01 75 08 8d 42 ff 0f ab 06 eb 0a 85 c0 74 06 8d Ah, I see what happened, I think .... 
the locking used to be inside collect_sigign_sigcatch, and you moved it out into task_sig ... but there were two callers of collect_sigign_sigcatch, the other one being proc_pid_stat ... which now needs to have appropriate locking added to it as well to match the change ... I'll see if I can get something together that works for that. M. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-17 2:39 ` Martin J. Bligh @ 2003-02-17 3:53 ` Linus Torvalds 2003-02-17 5:07 ` Martin J. Bligh ` (2 more replies) 2003-02-17 3:54 ` [PATCH] fix secondary oops in sighand locking Martin J. Bligh 1 sibling, 3 replies; 41+ messages in thread From: Linus Torvalds @ 2003-02-17 3:53 UTC (permalink / raw) To: Martin J. Bligh Cc: Manfred Spraul, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo On Sun, 16 Feb 2003, Martin J. Bligh wrote: > > Ah, I see what happened, I think .... the locking used to be inside > collect_sigign_sigcatch, and you moved it out into task_sig ... but > there were two callers of collect_sigign_sigcatch, the other one being > proc_pid_stat Doh. This should fix it. Linus --- # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. # This patch includes the following deltas: # ChangeSet 1.1055 -> 1.1056 # fs/proc/array.c 1.43 -> 1.44 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/02/16 torvalds@home.transmeta.com 1.1056 # Do proper signal locking for the old-style /proc/stat too. # -------------------------------------------- # diff -Nru a/fs/proc/array.c b/fs/proc/array.c --- a/fs/proc/array.c Sun Feb 16 19:52:30 2003 +++ b/fs/proc/array.c Sun Feb 16 19:52:30 2003 @@ -316,7 +316,13 @@ wchan = get_wchan(task); - collect_sigign_sigcatch(task, &sigign, &sigcatch); + read_lock(&tasklist_lock); + if (task->sighand) { + spin_lock_irq(&task->sighand->siglock); + collect_sigign_sigcatch(task, &sigign, &sigcatch); + spin_lock_irq(&task->sighand->siglock); + } + read_unlock(&tasklist_lock); /* scale priority and nice values from timeslices to -20..20 */ /* to make it look like a "normal" Unix priority/nice value */ ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-17 3:53 ` Linus Torvalds @ 2003-02-17 5:07 ` Martin J. Bligh 2003-02-17 6:17 ` Martin J. Bligh 2003-02-17 17:09 ` Magnus Naeslund(f) 2 siblings, 0 replies; 41+ messages in thread From: Martin J. Bligh @ 2003-02-17 5:07 UTC (permalink / raw) To: Linus Torvalds Cc: Manfred Spraul, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo >> Ah, I see what happened, I think .... the locking used to be inside >> collect_sigign_sigcatch, and you moved it out into task_sig ... but >> there were two callers of collect_sigign_sigcatch, the other one being >> proc_pid_stat > > Doh. > > This should fix it. Don't you need this bit as well? + sigemptyset(&sigign); + sigemptyset(&sigcatch); to replace this bit from your previous patch. static void collect_sigign_sigcatch(struct task_struct *p, sigset_t *ign, sigset_t *catch) { struct k_sigaction *k; int i; - sigemptyset(ign); - sigemptyset(catch); That was in the patch I sent you ... but I missed task->sighand->siglock ;-) M. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-17 3:53 ` Linus Torvalds 2003-02-17 5:07 ` Martin J. Bligh @ 2003-02-17 6:17 ` Martin J. Bligh 2003-02-17 17:09 ` Magnus Naeslund(f) 2 siblings, 0 replies; 41+ messages in thread From: Martin J. Bligh @ 2003-02-17 6:17 UTC (permalink / raw) To: Linus Torvalds Cc: Manfred Spraul, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo >> Ah, I see what happened, I think .... the locking used to be inside >> collect_sigign_sigcatch, and you moved it out into task_sig ... but >> there were two callers of collect_sigign_sigcatch, the other one being >> proc_pid_stat > > Doh. > > This should fix it. Oooh, not only does SDET work now in 61, it doesn't freeze the whole box when I hit ^C any more (like it's been doing since the dawn of time). Spiffy ;-) M. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-17 3:53 ` Linus Torvalds 2003-02-17 5:07 ` Martin J. Bligh 2003-02-17 6:17 ` Martin J. Bligh @ 2003-02-17 17:09 ` Magnus Naeslund(f) 2 siblings, 0 replies; 41+ messages in thread From: Magnus Naeslund(f) @ 2003-02-17 17:09 UTC (permalink / raw) To: Linus Torvalds, Martin J. Bligh Cc: Manfred Spraul, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo Linus Torvalds <torvalds@transmeta.com> wrote: > > Doh. > > This should fix it. [snip] > + spin_lock_irq(&task->sighand->siglock); > + collect_sigign_sigcatch(task, &sigign, &sigcatch); > + spin_lock_irq(&task->sighand->siglock); [snip] You take the lock twice here? Magnus ^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH] fix secondary oops in sighand locking 2003-02-17 2:39 ` Martin J. Bligh 2003-02-17 3:53 ` Linus Torvalds @ 2003-02-17 3:54 ` Martin J. Bligh 1 sibling, 0 replies; 41+ messages in thread From: Martin J. Bligh @ 2003-02-17 3:54 UTC (permalink / raw) To: Linus Torvalds, Manfred Spraul Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo >> Ran your patch ... but I get plenty of these now: >> >> Unable to handle kernel NULL pointer dereference at virtual address 00000004 >> ... > > Ah, I see what happened, I think .... the locking used to be inside > collect_sigign_sigcatch, and you moved it out into task_sig ... but > there were two callers of collect_sigign_sigcatch, the other one being > proc_pid_stat ... which now needs to have appropriate locking added to > it as well to match the change ... I'll see if I can get something > together that works for that. OK, with this small extra patch, we now survive a beating with SDET without oops or freeze. The patch adds the locking that was removed from collect_sigign_sigcatch to the second caller (proc_pid_stat), as well as task_sig. diff -urpN -X /home/fletch/.diff.exclude linus-sdet/fs/proc/array.c linus-sdet-fix/fs/proc/array.c --- linus-sdet/fs/proc/array.c Sun Feb 16 18:48:04 2003 +++ linus-sdet-fix/fs/proc/array.c Sun Feb 16 18:51:34 2003 @@ -316,7 +316,12 @@ int proc_pid_stat(struct task_struct *ta wchan = get_wchan(task); - collect_sigign_sigcatch(task, &sigign, &sigcatch); + sigemptyset(&sigign); + sigemptyset(&sigcatch); + read_lock(&tasklist_lock); + if (task->sighand) + collect_sigign_sigcatch(task, &sigign, &sigcatch); + read_unlock(&tasklist_lock); /* scale priority and nice values from timeslices to -20..20 */ /* to make it look like a "normal" Unix priority/nice value */ ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-17 0:23 ` Linus Torvalds 2003-02-17 2:05 ` Martin J. Bligh @ 2003-02-17 6:36 ` Manfred Spraul 2003-02-17 19:06 ` Linus Torvalds 1 sibling, 1 reply; 41+ messages in thread From: Manfred Spraul @ 2003-02-17 6:36 UTC (permalink / raw) To: Linus Torvalds Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo Linus Torvalds wrote: >On Sun, 16 Feb 2003, Manfred Spraul wrote: > > >>What about this minimal patch? The performance critical operation is >>signal delivery - we should fix the synchronization between signal >>delivery and exec first. >> >> > >Ok, I committed this alternative change, which isn't quite as minimal, but >looks a lot cleaner to me. > >Also, looking at execve() and other paths, we do seem to have sufficient >protection from the tasklist_lock that signal delivery should be fine. So >despite a long and confused thread, I think in the end the only real bug >was the one Martin found which should be thus fixed.. > > I'm not convinced that exec is correct. app with two threads, cloned sighand and sig structures. thread one does exec(). thread two does exit(). Now we can arrive at no_thread_group in de_thread() and tsk->sig{,hand}->count == 1. >no_thread_group: > > write_lock_irq(&tasklist_lock); > spin_lock(&oldsighand->siglock); > spin_lock(&newsighand->siglock); > > if (current == oldsig->curr_target) > oldsig->curr_target = next_thread(current); > signal sender: in send_sig_info(). reads tsk->signal. blocks on tsk->sighand->siglock. 
> if (newsig) > current->signal = newsig; > current->sighand = newsighand; > init_sigpending(¤t->pending); > recalc_sigpending(); > > spin_unlock(&newsighand->siglock); > spin_unlock(&oldsighand->siglock); > write_unlock_irq(&tasklist_lock); > > if (newsig && atomic_dec_and_test(&oldsig->count)) > kmem_cache_free(signal_cachep, oldsig); > > if (atomic_dec_and_test(&oldsighand->count)) > kmem_cache_free(sighand_cachep, oldsighand); > > And now the signal sender continues. BOOM. sighand structure, sig structure already kfreed, etc. -- Manfred ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-17 6:36 ` more signal locking bugs? Manfred Spraul @ 2003-02-17 19:06 ` Linus Torvalds 0 siblings, 0 replies; 41+ messages in thread From: Linus Torvalds @ 2003-02-17 19:06 UTC (permalink / raw) To: Manfred Spraul Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo On Mon, 17 Feb 2003, Manfred Spraul wrote: > > I'm not convinced that exec is correct. No, I think you're right. And I think just fixing send_sig_info() to take the tasklist lock is the right thing to do. That still leaves force_sig_info() without the tasklist lock, but since that is only used for page faults etc synchronous errors, that's fine (if we get a synchronous error in the kernel, we have bigger problems than signal locking). Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 4:00 ` Martin J. Bligh 2003-02-16 13:05 ` Anton Blanchard @ 2003-02-16 18:07 ` Linus Torvalds 2003-02-16 18:26 ` Martin J. Bligh 1 sibling, 1 reply; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 18:07 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Andrew Morton, Kernel Mailing List On Sat, 15 Feb 2003, Martin J. Bligh wrote: > > OK, I did the following, which is what I think you wanted, plus Zwane's > observation that task_state acquires the task_struct lock (we're the only > caller, so I just removed it), but I still get the same panic and this time > the box hung. Yeah, a closer look shows that the exit path doesn't actually take the task lock at all around any of the signal stuff, so the lock protects "task->mm", "task->files" and "task->fs", but it does NOT protect "task->signal" or "task->sighand" (illogical, but true). Oh, damn. The code that checks for "p->sighand" takes the tasklist lock for this reason (see "collect_sigign_sigcatch()").

So the choice seems to be either:

 - make the exit path hold the task lock over the whole exit path, not just over mm exit.

 - take the "tasklist_lock" over more of "task_sig()" (not just the collect_sigign_sigcatch() thing, but the "&p->signal->shared_pending" rendering too).

The latter is a two-liner. The former is the "right thing" for multiple reasons. The reason I'd _like_ to see the task lock held over _all_ of the fields in the exit() path is:

 - right now we actually take it and drop it multiple times in exit. See __exit_files(), __exit_fs(), __exit_mm(). Which all take it just to serialize setting of "mm/files/fs" to NULL.

 - task_lock is a nice local lock, no global scalability impact.
So it really would be much nicer to do something like this in do_exit():

	struct mm_struct *mm;
	struct files_struct *files;
	struct fs_struct *fs;
	struct signal_struct *signal;
	struct sighand_struct *sighand;

	task_lock(task);
	mm = task->mm;
	files = task->files;
	fs = task->fs;
	signal = task->signal;
	sighand = task->sighand;
	task->mm = NULL;
	task->files = NULL;
	task->fs = NULL;
	task->signal = NULL;
	task->sighand = NULL;
	task_unlock(task);

	.. actually do the "__exit_mm(task, mm)" etc here ..

which would make things a lot more readable, and result in us taking the lock only _once_ instead of three times, and would properly protect "signal" and "sighand" so that the /proc code wouldn't need to take the heavily used "tasklist_lock" just to read the signal state for a single task. But fixing up exit to do the above would require several (trivial) calling convention changes, like changing

	static inline void __exit_mm(struct task_struct * tsk)
	{
		struct mm_struct *mm = tsk->mm;
		...

into

	static inline void __exit_mm(struct task_struct * tsk, struct mm_struct *mm)
	{
		...

instead and updating the callers. Is anybody willing to do that (hopefully fairly trivial) fixup and test it, or should we go with the stupid "take the 'tasklist_lock'" approach? Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 18:07 ` Fw: 2.5.61 oops running SDET Linus Torvalds @ 2003-02-16 18:26 ` Martin J. Bligh 2003-02-16 18:36 ` Martin J. Bligh 0 siblings, 1 reply; 41+ messages in thread From: Martin J. Bligh @ 2003-02-16 18:26 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrew Morton, Kernel Mailing List > So the choice seems to be either: > > - make the exit path hold the task lock over the whole exit path, not > just over mm exit. > > - take the "tasklist_lock" over more of "task_sig()" (not just the > collect_sigign_sigcatch() thing, but the "&p->signal->shared_pending" > rendering too. > > The latter is a two-liner. The former is the "right thing" for multiple > reasons. > > The reason I'd _like_ to see the task lock held over _all_ of the fields > in the exit() path is: > > - right now we actually take it and drop it multiple times in exit. See > __exit_files(), __exit_fs(), __exit_mm(). Which all take it just to > serialize setting ot "mm/files/fs" to NULL. > > - task_lock is a nice local lock, no global scalability impact. Don't we have to take tasklist_lock anyway for when task_state reads p->real_parent->pid and p->parent->pid? Or should we have to grab the task_lock for those too? If so, it sounds like it could get into rather horrible ordering dependencies to me. If we move exit under task_lock ... then tasklist_lock doesn't help us any more there either (presumably we're trying to stop them exiting whilst we look at them) ... I've coded up the stupid approach for now, and will check that works first. Should have results very soon. M. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
  2003-02-16 18:26 ` Martin J. Bligh
@ 2003-02-16 18:36 ` Martin J. Bligh
  0 siblings, 0 replies; 41+ messages in thread
From: Martin J. Bligh @ 2003-02-16 18:36 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, Kernel Mailing List

>> The reason I'd _like_ to see the task lock held over _all_ of the fields
>> in the exit() path is:
>>
>>  - right now we actually take it and drop it multiple times in exit. See
>>    __exit_files(), __exit_fs(), __exit_mm(). Which all take it just to
>>    serialize setting of "mm/files/fs" to NULL.
>>
>>  - task_lock is a nice local lock, no global scalability impact.
>
> Don't we have to take tasklist_lock anyway for when task_state reads
> p->real_parent->pid and p->parent->pid? Or should we have to grab the
> task_lock for those too? If so, it sounds like it could get into rather
> horrible ordering dependencies to me.
>
> If we move exit under task_lock ... then tasklist_lock doesn't help us
> any more there either (presumably we're trying to stop them exiting
> whilst we look at them) ... I've coded up the stupid approach for now,
> and will check that works first. Should have results very soon.

Hmmmm ... I guess we could keep real_parent->pid and parent->pid inside
the child's task_struct ... I can't see those change unless we're
changing real_parent / parent task pointers anyway.

Not sure how much that idea will make people vomit, but it would seem to
make it easier to get rid of the global locking, and it's probably nicer
than trying to order the acquisition of other people's task locks ...

M.

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
  2003-02-16 2:09 ` Martin J. Bligh
  2003-02-16 2:27 ` Linus Torvalds
@ 2003-02-16 2:48 ` Zwane Mwaikambo
  1 sibling, 0 replies; 41+ messages in thread
From: Zwane Mwaikambo @ 2003-02-16 2:48 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Linus Torvalds, Andrew Morton, Kernel Mailing List

On Sat, 15 Feb 2003, Martin J. Bligh wrote:

> -	struct mm_struct *mm = get_task_mm(task);
> +	struct mm_struct *mm;
>
> +	task_lock(task);
> +	mm = __get_task_mm(task);
>  	buffer = task_name(task, buffer);
>  	buffer = task_state(task, buffer);

task_state acquires the task_struct lock.

	Zwane
--
function.linuxpower.ca

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
@ 2003-02-16 17:41 Manfred Spraul
2003-02-16 18:15 ` Linus Torvalds
0 siblings, 1 reply; 41+ messages in thread
From: Manfred Spraul @ 2003-02-16 17:41 UTC (permalink / raw)
To: Martin J. Bligh
Cc: linux-kernel, Linus Torvalds, Anton Blanchard, Zwane Mwaikambo
>
>
>OK, I did the following, which is what I think you wanted, plus Zwane's
>observation that task_state acquires the task_struct lock (we're the only
>caller, so I just removed it), but I still get the same panic and this time
>the box hung.
>
AFAICS both exec and exit rely on write_lock_irq(tasklist_lock) for
synchronization of changes to tsk->sig{,hand}.
I bet an __exit_sighand occurs in the middle of proc_pid_status() -
after the NULL test, before the access in task_sig.
Martin, could you check what happens if you do not release the
tasklist_lock until after the task_sig()?
--
Manfred
^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Fw: 2.5.61 oops running SDET
  2003-02-16 17:41 Manfred Spraul
@ 2003-02-16 18:15 ` Linus Torvalds
  2003-02-16 18:45 ` Manfred Spraul
  0 siblings, 1 reply; 41+ messages in thread
From: Linus Torvalds @ 2003-02-16 18:15 UTC (permalink / raw)
To: Manfred Spraul
Cc: Martin J. Bligh, linux-kernel, Anton Blanchard, Zwane Mwaikambo

On Sun, 16 Feb 2003, Manfred Spraul wrote:
>
> AFAICS both exec and exit rely on write_lock_irq(tasklist_lock) for
> synchronization of changes to tsk->sig{,hand}.

Yeah, as I sent out in my last email this does seem to be true right now,
but it's really not correct. It's disgusting that we use such a
fundamental global lock to protect something so trivially local to the one
process, where the local per-process lock really should be more than
enough.

		Linus

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
  2003-02-16 18:15 ` Linus Torvalds
@ 2003-02-16 18:45 ` Manfred Spraul
  2003-02-16 18:56 ` Linus Torvalds
  0 siblings, 1 reply; 41+ messages in thread
From: Manfred Spraul @ 2003-02-16 18:45 UTC (permalink / raw)
To: Linus Torvalds
Cc: Martin J. Bligh, linux-kernel, Anton Blanchard, Zwane Mwaikambo

Linus Torvalds wrote:
> On Sun, 16 Feb 2003, Manfred Spraul wrote:
>
>> AFAICS both exec and exit rely on write_lock_irq(tasklist_lock) for
>> synchronization of changes to tsk->sig{,hand}.
>
> Yeah, as I sent out in my last email this does seem to be true right now,
> but it's really not correct. It's disgusting that we use such a
> fundamental global lock to protect something so trivially local to the one
> process, where the local per-process lock really should be more than
> enough.

The difference between the tasklist_lock and task_lock is that task_lock
is not an interrupt lock. Think about signal delivery during exec.

Do you want to replace tasklist_lock with task_lock in exit_sighand()
and during exec, or do you want to add task_lock to tasklist_lock?

Hmm. Someone removed the read_lock(tasklist_lock) around
send_specific_sig_info() - which lock synchronizes exec and signal
delivery?

--
Manfred

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
  2003-02-16 18:45 ` Manfred Spraul
@ 2003-02-16 18:56 ` Linus Torvalds
  0 siblings, 0 replies; 41+ messages in thread
From: Linus Torvalds @ 2003-02-16 18:56 UTC (permalink / raw)
To: Manfred Spraul
Cc: Martin J. Bligh, linux-kernel, Anton Blanchard, Zwane Mwaikambo,
	Ingo Molnar

On Sun, 16 Feb 2003, Manfred Spraul wrote:
>
> Do you want to replace tasklist_lock with task_lock in exit_sighand()
> and during exec, or do you want to add task_lock to tasklist_lock?

I'd rather replace it, but your point about irq-safeness is a good one.
Clearly signal sending cannot take the task lock if we want to keep it
irq-unsafe. So the signal stuff probably does need to be protected by
_both_ the tasklist lock and the private task lock.

> Someone removed the read_lock(tasklist_lock) around
> send_specific_sig_info() - which lock synchronizes exec and signal
> delivery?

Looks like none. And I think it was Ingo who removed it. Ingo?

		Linus

^ permalink raw reply	[flat|nested] 41+ messages in thread
end of thread, other threads:[~2003-02-17 19:00 UTC | newest]
Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20030215172407.1fdd41fd.akpm@digeo.com>
2003-02-16 1:35 ` Fw: 2.5.61 oops running SDET Linus Torvalds
2003-02-16 2:09 ` Martin J. Bligh
2003-02-16 2:27 ` Linus Torvalds
2003-02-16 4:00 ` Martin J. Bligh
2003-02-16 13:05 ` Anton Blanchard
2003-02-16 16:39 ` Martin J. Bligh
2003-02-16 18:21 ` Linus Torvalds
2003-02-16 19:06 ` Martin J. Bligh
2003-02-16 19:17 ` Linus Torvalds
2003-02-16 21:15 ` Martin J. Bligh
2003-02-16 21:21 ` Manfred Spraul
2003-02-16 22:34 ` Linus Torvalds
2003-02-16 23:08 ` Martin J. Bligh
2003-02-16 23:32 ` Linus Torvalds
2003-02-16 19:18 ` Manfred Spraul
2003-02-16 19:19 ` more signal locking bugs? Martin J. Bligh
2003-02-16 19:24 ` Linus Torvalds
2003-02-16 19:37 ` Manfred Spraul
2003-02-16 19:42 ` Linus Torvalds
2003-02-16 20:01 ` Linus Torvalds
2003-02-16 20:07 ` Manfred Spraul
2003-02-16 20:10 ` Linus Torvalds
2003-02-16 20:23 ` Manfred Spraul
2003-02-17 0:23 ` Linus Torvalds
2003-02-17 2:05 ` Martin J. Bligh
2003-02-17 2:39 ` Martin J. Bligh
2003-02-17 3:53 ` Linus Torvalds
2003-02-17 5:07 ` Martin J. Bligh
2003-02-17 6:17 ` Martin J. Bligh
2003-02-17 17:09 ` Magnus Naeslund(f)
2003-02-17 3:54 ` [PATCH] fix secondary oops in sighand locking Martin J. Bligh
2003-02-17 6:36 ` more signal locking bugs? Manfred Spraul
2003-02-17 19:06 ` Linus Torvalds
2003-02-16 18:07 ` Fw: 2.5.61 oops running SDET Linus Torvalds
2003-02-16 18:26 ` Martin J. Bligh
2003-02-16 18:36 ` Martin J. Bligh
2003-02-16 2:48 ` Zwane Mwaikambo
2003-02-16 17:41 Manfred Spraul
2003-02-16 18:15 ` Linus Torvalds
2003-02-16 18:45 ` Manfred Spraul
2003-02-16 18:56 ` Linus Torvalds
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox