* Re: Fw: 2.5.61 oops running SDET [not found] <20030215172407.1fdd41fd.akpm@digeo.com> @ 2003-02-16 1:35 ` Linus Torvalds 2003-02-16 2:09 ` Martin J. Bligh 0 siblings, 1 reply; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 1:35 UTC (permalink / raw) To: Andrew Morton; +Cc: Martin J. Bligh, Kernel Mailing List On Sat, 15 Feb 2003, Andrew Morton wrote: > > The recent change to fs/proc/array.c:task_sig()? > > buffer += sprintf(buffer, "ShdPnd:\t"); > buffer = render_sigset_t(&p->signal->shared_pending.signal, buffer); Yeah, but I think the bug has existed for much longer. It looks like "proc_pid_status()" doesn't actually lock the task at all, nor even bother to test whether the task has signal state. Which has _always_ been a bug. I don't know why it would start happening now, but it might just be unlucky timing. I think the proper fix is to put a task_lock() .. task_unlock() around the whole proc_pid_status() function, _and_ then verify that "tsk->sighand" is non-NULL. (Oh, careful, that's already what "get_task_mm()" does internally, so look out for deadlocks - you'd need to open-code the get_task_mm() in there too, so the end result is something like task_lock(task) if (task->mm) { .. mm state } if (task->sighand) { .. signal state } .. task_unlock(task); instead). Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 1:35 ` Fw: 2.5.61 oops running SDET Linus Torvalds @ 2003-02-16 2:09 ` Martin J. Bligh 2003-02-16 2:27 ` Linus Torvalds 2003-02-16 2:48 ` Zwane Mwaikambo 0 siblings, 2 replies; 41+ messages in thread From: Martin J. Bligh @ 2003-02-16 2:09 UTC (permalink / raw) To: Linus Torvalds, Andrew Morton; +Cc: Kernel Mailing List > Yeah, but I think the bug has existed for much longer. > > It looks like "proc_pid_status()" doesn't actually lock the task at all, > nor even bother to test whether the task has signal state. Which has > _always_ been a bug. I don't know why it would start happening now, but > it might just be unlucky timing. > > I think the proper fix is to put a > > task_lock() > .. > task_unlock() > > around the whole proc_pid_status() function, _and_ then verify that > "tsk->sighand" is non-NULL. > > (Oh, careful, that's already what "get_task_mm()" does internally, so > look out for deadlocks - you'd need to open-code the get_task_mm() in > there too, so the end result is something like > > task_lock(task) > if (task->mm) { > .. mm state > } > if (task->sighand) { > .. signal state > } > .. > task_unlock(task); > > instead). Something like this? Compiles, but not tested yet ... 
diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c proc/fs/proc/array.c --- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003 +++ proc/fs/proc/array.c Sat Feb 15 18:05:10 2003 @@ -243,8 +243,10 @@ extern char *task_mem(struct mm_struct * int proc_pid_status(struct task_struct *task, char * buffer) { char * orig = buffer; - struct mm_struct *mm = get_task_mm(task); + struct mm_struct *mm; + task_lock(task); + mm = __get_task_mm(task); buffer = task_name(task, buffer); buffer = task_state(task, buffer); @@ -257,6 +259,7 @@ int proc_pid_status(struct task_struct * #if defined(CONFIG_ARCH_S390) buffer = task_show_regs(task, buffer); #endif + task_unlock(task); return buffer - orig; } diff -urpN -X /home/fletch/.diff.exclude virgin/include/linux/sched.h proc/include/linux/sched.h --- virgin/include/linux/sched.h Sat Feb 15 16:11:47 2003 +++ proc/include/linux/sched.h Sat Feb 15 18:04:42 2003 @@ -706,6 +706,18 @@ static inline struct mm_struct * get_tas return mm; } + +/* lockless version of get_task_mm */ +static inline struct mm_struct * __get_task_mm(struct task_struct * task) +{ + struct mm_struct * mm; + + mm = task->mm; + if (mm) + atomic_inc(&mm->mm_users); + + return mm; +} /* set thread flags in other task's structures ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 2:09 ` Martin J. Bligh @ 2003-02-16 2:27 ` Linus Torvalds 2003-02-16 4:00 ` Martin J. Bligh 2003-02-16 2:48 ` Zwane Mwaikambo 1 sibling, 1 reply; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 2:27 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Andrew Morton, Kernel Mailing List On Sat, 15 Feb 2003, Martin J. Bligh wrote: > > Something like this? Compiles, but not tested yet ... Close, but you also need - buffer = task_sig(task, buffer); + if (task->sighand) + buffer = task_sig(task, buffer); to actually check whether signals exist or not. Otherwise you'll just get the same oops anyway (well, keeping the task locked may change timings enough that it doesn't happen, but the bug will continue to be there). I would also say that since you explicitly take the task lock, there's no real reason to use "get_task_mm()" at all any more, so instead of doing that (and then doing the mmput()), just get rid of the mm variable entirely, and do if (task->mm) buffer = task_mem(task->mm, buffer) too. There's really no downside to just holding on to the task lock over the whole operation instead of incrementing and decrementing the mm counts and dropping the lock early. (There are a few valid reasons for using the "get_task_mm()" function: - you need to block and thus drop the task lock - the original code just used "task->mm" directly, and using "get_task_mm()" made the original conversion to mm safe handling easier. Neither reason is really valid any more in this function at least). Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 2:27 ` Linus Torvalds @ 2003-02-16 4:00 ` Martin J. Bligh 2003-02-16 13:05 ` Anton Blanchard 2003-02-16 18:07 ` Fw: 2.5.61 oops running SDET Linus Torvalds 0 siblings, 2 replies; 41+ messages in thread From: Martin J. Bligh @ 2003-02-16 4:00 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrew Morton, Kernel Mailing List > Close, but you also need > > - buffer = task_sig(task, buffer); > + if (task->sighand) > + buffer = task_sig(task, buffer); > > to actually check whether signals exist or not. Otherwise you'll just get > the same oops anyway (well, keeping the task locked may change timings > enough that it doesn't happen, but the bug will continue to be there. > > I would also say that since you explicitly take the task lock, there's no > real reason to use "get_task_mm()" at all any more, so instead of doing > that (and then doing the mmput()), just get rid of the mm variable > entirely, and do > > if (task->mm) > buffer = task_mem(task->mm, buffer) > > too. There's really no downside to just holding on to the task lock over > the whole operation instead of incrementing and decrementing the mm > counts and dropping the lock early. > > (There are a few valid reasons for using the "get_task_mm()" function: > > - you need to block and thus drop the task lock > > - the original code just used "task->mm" directly, and using > "get_task_mm()" made the original conversion to mm safe handling > easier. > > Neither reason is really valid any more in this function at least). OK, I did the following, which is what I think you wanted, plus Zwane's observation that task_state acquires the task_struct lock (we're the only caller, so I just removed it), but I still get the same panic and this time the box hung. No doubt I've just done something stupid in the patch ... (task_state takes the tasklist_lock ... is that safe to do inside task_lock?) 
diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet/fs/proc/array.c --- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003 +++ sdet/fs/proc/array.c Sat Feb 15 19:28:34 2003 @@ -166,12 +166,10 @@ static inline char * task_state(struct t p->uid, p->euid, p->suid, p->fsuid, p->gid, p->egid, p->sgid, p->fsgid); read_unlock(&tasklist_lock); - task_lock(p); buffer += sprintf(buffer, "FDSize:\t%d\n" "Groups:\t", p->files ? p->files->max_fds : 0); - task_unlock(p); for (g = 0; g < p->ngroups; g++) buffer += sprintf(buffer, "%d ", p->groups[g]); @@ -243,20 +241,20 @@ extern char *task_mem(struct mm_struct * int proc_pid_status(struct task_struct *task, char * buffer) { char * orig = buffer; - struct mm_struct *mm = get_task_mm(task); + task_lock(task); buffer = task_name(task, buffer); buffer = task_state(task, buffer); - if (mm) { - buffer = task_mem(mm, buffer); - mmput(mm); - } - buffer = task_sig(task, buffer); + if (task->mm) + buffer = task_mem(task->mm, buffer); + if (task->sighand) + buffer = task_sig(task, buffer); buffer = task_cap(task, buffer); #if defined(CONFIG_ARCH_S390) buffer = task_show_regs(task, buffer); #endif + task_unlock(task); return buffer - orig; } Unable to handle kernel NULL pointer dereference at virtual address 00000014 printing eip: c0118b13 *pde = 2d4bf001 *pte = 00000000 Oops: 0000 CPU: 6 EIP: 0060:[<c0118b13>] Not tainted EFLAGS: 00010207 EIP is at render_sigset_t+0x17/0x7c eax: 00000010 ebx: ed3d909f ecx: ed3d9097 edx: eda60100 esi: 0000003c edi: 00000010 ebp: ed49bf04 esp: ed49bef8 ds: 007b es: 007b ss: 0068 Process ps (pid: 3339, threadinfo=ed49a000 task=ee32cd20) Stack: ed3d907d 00000002 ed3d909f 00000006 c016d85f 00000010 ed3d909f ed3d9097 c0233f38 eda606bc ed3d9086 ed3d907e c0233f2f ee1d7b70 eda60100 eeb4d0c0 ed3d9000 ee1d7b70 00000000 ed3d9000 000000d0 eeb4d0c0 c01300f6 c17119e8 Call Trace: [<c016d85f>] proc_pid_status+0x26f/0x334 [<c01300f6>] __get_free_pages+0x4e/0x54 [<c016b517>] proc_info_read+0x53/0x130 
[<c0145295>] vfs_read+0xa5/0x128 [<c0145522>] sys_read+0x2a/0x3c [<c0108b3f>] syscall_call+0x7/0xb Code: 0f a3 37 19 d2 b8 01 00 00 00 31 c9 85 d2 0f 45 c8 8d 56 01 <1>Unable to handle kernel NULL pointer dereference at virtual address 00000014 printing eip: c0118b13 *pde = 2d512001 *pte = 00000000 Oops: 0000 CPU: 2 EIP: 0060:[<c0118b13>] Not tainted EFLAGS: 00010207 EIP is at render_sigset_t+0x17/0x7c eax: 00000010 ebx: ee34b0a0 ecx: ee34b098 edx: ed3ea7a0 esi: 0000003c edi: 00000010 ebp: ed54ff04 esp: ed54fef8 ds: 007b es: 007b ss: 0068 Process ps (pid: 14710, threadinfo=ed54e000 task=ee430d40) Stack: ee34b07e 00000002 ee34b0a0 00000006 c016d85f 00000010 ee34b0a0 ee34b098 c0233f38 ed3ead5c ee34b087 ee34b07f c0233f2f ee4960d0 ed3ea7a0 eec66ca0 ee34b000 ee4960d0 00000000 ee34b000 000000d0 eec66ca0 c01300f6 c17383b8 Call Trace: [<c016d85f>] proc_pid_status+0x26f/0x334 [<c01300f6>] __get_free_pages+0x4e/0x54 [<c016b517>] proc_info_read+0x53/0x130 [<c0145295>] vfs_read+0xa5/0x128 [<c0145522>] sys_read+0x2a/0x3c [<c0108b3f>] syscall_call+0x7/0xb Code: 0f a3 37 19 d2 b8 01 00 00 00 31 c9 85 d2 0f 45 c8 8d 56 01 ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 4:00 ` Martin J. Bligh @ 2003-02-16 13:05 ` Anton Blanchard 2003-02-16 16:39 ` Martin J. Bligh 2003-02-16 18:07 ` Fw: 2.5.61 oops running SDET Linus Torvalds 1 sibling, 1 reply; 41+ messages in thread From: Anton Blanchard @ 2003-02-16 13:05 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Linus Torvalds, Andrew Morton, Kernel Mailing List > OK, I did the following, which is what I think you wanted, plus Zwane's > observation that task_state acquires the task_struct lock (we're the only > caller, so I just removed it), but I still get the same panic and this time > the box hung. No doubt I've just done something stupid in the patch ... > (task_state takes the tasklist_lock ... is that safe to do inside task_lock?) It looks like you now try and take the tasklist_lock inside a task_lock() region, but task_lock says no-can-do: /* Protects ->fs, ->files, ->mm, and synchronises with wait4(). Nests * inside tasklist_lock */ static inline void task_lock(struct task_struct *p) { ... Anton ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 13:05 ` Anton Blanchard @ 2003-02-16 16:39 ` Martin J. Bligh 2003-02-16 18:21 ` Linus Torvalds 0 siblings, 1 reply; 41+ messages in thread From: Martin J. Bligh @ 2003-02-16 16:39 UTC (permalink / raw) To: Anton Blanchard; +Cc: Linus Torvalds, Andrew Morton, Kernel Mailing List >> OK, I did the following, which is what I think you wanted, plus Zwane's >> observation that task_state acquires the task_struct lock (we're the only >> caller, so I just removed it), but I still get the same panic and this time >> the box hung. No doubt I've just done something stupid in the patch ... >> (task_state takes the tasklist_lock ... is that safe to do inside task_lock?) > > It looks like you now try and take the tasklist_lock inside a task_lock() > region, but task_lock says no-can-do: > > /* Protects ->fs, ->files, ->mm, and synchronises with wait4(). Nests > * inside tasklist_lock */ > static inline void task_lock(struct task_struct *p) > { > ... Right, that's what I expected, and that may well explain the hang, but I don't see how that explains the oops still happening. What exactly is tasklist_lock protecting here anyway? read_lock(&tasklist_lock); buffer += sprintf(buffer, "State:\t%s\n" "Tgid:\t%d\n" "Pid:\t%d\n" "PPid:\t%d\n" "TracerPid:\t%d\n" "Uid:\t%d\t%d\t%d\t%d\n" "Gid:\t%d\t%d\t%d\t%d\n", get_task_state(p), p->tgid, p->pid, p->pid ? p->real_parent->pid : 0, p->pid && p->ptrace ? p->parent->pid : 0, p->uid, p->euid, p->suid, p->fsuid, p->gid, p->egid, p->sgid, p->fsgid); read_unlock(&tasklist_lock); Is it these two accesses: p->real_parent->pid ? p->parent->pid ? Don't see what I can do for this apart from to invert the ordering and take tasklist_lock around the whole function, and nest task_lock inside that, or I suppose I could take the task_lock for each of the parents? I seem to recall Linus reminding people recently that it was only the lock acquisition order that was important, not release ... 
does something like the following look OK? diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet2/fs/proc/array.c --- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003 +++ sdet2/fs/proc/array.c Sun Feb 16 08:37:45 2003 @@ -147,11 +147,11 @@ static inline const char * get_task_stat return *p; } +/* Call me with the tasklist_lock and task_lock for p held already */ static inline char * task_state(struct task_struct *p, char *buffer) { int g; - read_lock(&tasklist_lock); buffer += sprintf(buffer, "State:\t%s\n" "Tgid:\t%d\n" @@ -165,13 +165,10 @@ static inline char * task_state(struct t p->pid && p->ptrace ? p->parent->pid : 0, p->uid, p->euid, p->suid, p->fsuid, p->gid, p->egid, p->sgid, p->fsgid); - read_unlock(&tasklist_lock); - task_lock(p); buffer += sprintf(buffer, "FDSize:\t%d\n" "Groups:\t", p->files ? p->files->max_fds : 0); - task_unlock(p); for (g = 0; g < p->ngroups; g++) buffer += sprintf(buffer, "%d ", p->groups[g]); @@ -243,20 +240,22 @@ extern char *task_mem(struct mm_struct * int proc_pid_status(struct task_struct *task, char * buffer) { char * orig = buffer; - struct mm_struct *mm = get_task_mm(task); + read_lock(&tasklist_lock); + task_lock(task); buffer = task_name(task, buffer); buffer = task_state(task, buffer); + read_unlock(&tasklist_lock); - if (mm) { - buffer = task_mem(mm, buffer); - mmput(mm); - } - buffer = task_sig(task, buffer); + if (task->mm) + buffer = task_mem(task->mm, buffer); + if (task->sighand) + buffer = task_sig(task, buffer); buffer = task_cap(task, buffer); #if defined(CONFIG_ARCH_S390) buffer = task_show_regs(task, buffer); #endif + task_unlock(task); return buffer - orig; } ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 16:39 ` Martin J. Bligh @ 2003-02-16 18:21 ` Linus Torvalds 2003-02-16 19:06 ` Martin J. Bligh 0 siblings, 1 reply; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 18:21 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List On Sun, 16 Feb 2003, Martin J. Bligh wrote: > > Right, that's what I expected, and that may well explain the hang, but I > don't see how that explains the oops still happening. What exactly is > tasklist_lock protecting here anyway? > > read_lock(&tasklist_lock); > buffer += sprintf(buffer, > "State:\t%s\n" > "Tgid:\t%d\n" > "Pid:\t%d\n" > "PPid:\t%d\n" > "TracerPid:\t%d\n" > "Uid:\t%d\t%d\t%d\t%d\n" > "Gid:\t%d\t%d\t%d\t%d\n", > get_task_state(p), p->tgid, > p->pid, p->pid ? p->real_parent->pid : 0, > p->pid && p->ptrace ? p->parent->pid : 0, > p->uid, p->euid, p->suid, p->fsuid, > p->gid, p->egid, p->sgid, p->fsgid); > read_unlock(&tasklist_lock); > > Is it these two accesses: > > p->real_parent->pid ? > p->parent->pid ? Yup. > Don't see what I can do for this apart from to invert the ordering and take > tasklist_lock around the whole function, and nest task_lock inside that, or > I suppose I could take the task_lock for each of the parents? I seem to > recall Linus reminding people recently that it was only the lock > acquisition order that was important, not release ... does something like > the following look OK? This patch looks like it should certainly fix the problem, but that is still some god-awful ugly overkill in locking. I'd rather make the rule be that you have to take the task lock before modifying things like the parent pointers (and all the other fundamental pointers), since that's already the rule for most of it anyway. 
And then the tasklist lock would go away _entirely_ from /proc (except for task lookup in ->readdir/->lookup, of course, where it is fundamentally needed and proper - and will probably some day be replaced by RCU, I suspect). Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 18:21 ` Linus Torvalds @ 2003-02-16 19:06 ` Martin J. Bligh 2003-02-16 19:17 ` Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 41+ messages in thread From: Martin J. Bligh @ 2003-02-16 19:06 UTC (permalink / raw) To: Linus Torvalds Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo, Manfred Spraul >> Don't see what I can do for this apart from to invert the ordering and take >> tasklist_lock around the whole function, and nest task_lock inside that, or >> I suppose I could take the task_lock for each of the parents? I seem to >> recall Linus reminding people recently that it was only the lock >> acquisition order that was important, not release ... does something like >> the following look OK? > > This patch looks like it should certainly fix the problem, but that is > still some god-awful ugly overkill in locking. > > I'd rather make the rule be that you have to take the task lock before > modifying things like the parent pointers (and all the other fundamntal > pointers), since that's already the rule for most of it anyway. > > And then the tasklist lock would go away _entirely_ from /proc (except for > task lookup in ->readdir/->lookup, of course, where it is fundamentally > needed and proper - and will probably some day be replaced by RCU, I > suspect). Well, I did the stupid safe thing, and it hangs the box once we get up to a load of 32 with SDET. Below is what I did, the only other issue I can see in here is that task_mem takes mm->mmap_sem which is now nested inside the task_lock inside tasklist_lock ... 
but I can't see anywhere that's a problem from a quick search diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet2/fs/proc/array.c --- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003 +++ sdet2/fs/proc/array.c Sun Feb 16 09:59:24 2003 @@ -147,11 +147,11 @@ static inline const char * get_task_stat return *p; } +/* Call me with the tasklist_lock and task_lock for p held already */ static inline char * task_state(struct task_struct *p, char *buffer) { int g; - read_lock(&tasklist_lock); buffer += sprintf(buffer, "State:\t%s\n" "Tgid:\t%d\n" @@ -165,13 +165,10 @@ static inline char * task_state(struct t p->pid && p->ptrace ? p->parent->pid : 0, p->uid, p->euid, p->suid, p->fsuid, p->gid, p->egid, p->sgid, p->fsgid); - read_unlock(&tasklist_lock); - task_lock(p); buffer += sprintf(buffer, "FDSize:\t%d\n" "Groups:\t", p->files ? p->files->max_fds : 0); - task_unlock(p); for (g = 0; g < p->ngroups; g++) buffer += sprintf(buffer, "%d ", p->groups[g]); @@ -243,20 +240,22 @@ extern char *task_mem(struct mm_struct * int proc_pid_status(struct task_struct *task, char * buffer) { char * orig = buffer; - struct mm_struct *mm = get_task_mm(task); + read_lock(&tasklist_lock); + task_lock(task); buffer = task_name(task, buffer); buffer = task_state(task, buffer); - if (mm) { - buffer = task_mem(mm, buffer); - mmput(mm); - } - buffer = task_sig(task, buffer); + if (task->mm) + buffer = task_mem(task->mm, buffer); + if (task->sighand) + buffer = task_sig(task, buffer); buffer = task_cap(task, buffer); #if defined(CONFIG_ARCH_S390) buffer = task_show_regs(task, buffer); #endif + task_unlock(task); + read_unlock(&tasklist_lock); return buffer - orig; } ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 19:06 ` Martin J. Bligh @ 2003-02-16 19:17 ` Linus Torvalds 2003-02-16 21:15 ` Martin J. Bligh 2003-02-16 19:18 ` Manfred Spraul 2003-02-16 19:19 ` more signal locking bugs? Martin J. Bligh 2 siblings, 1 reply; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 19:17 UTC (permalink / raw) To: Martin J. Bligh Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo, Manfred Spraul On Sun, 16 Feb 2003, Martin J. Bligh wrote: > > Well, I did the stupid safe thing, and it hangs the box once we get up to > a load of 32 with SDET. Below is what I did, the only other issue I can > see in here is that task_mem takes mm->mmap_sem which is now nested inside > the task_lock inside tasklist_lock ... but I can't see anywhere that's a > problem from a quick search Ho - you can _never_ nest a semaphore inside a spinlock - if the semaphore sleeps, the spinlock will cause a lockup regardless of any reverse nesting issues.. So I guess the old "get_task_mm()" code is required anyway. Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 19:17 ` Linus Torvalds @ 2003-02-16 21:15 ` Martin J. Bligh 2003-02-16 21:21 ` Manfred Spraul 0 siblings, 1 reply; 41+ messages in thread From: Martin J. Bligh @ 2003-02-16 21:15 UTC (permalink / raw) To: Linus Torvalds Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo, Manfred Spraul >> Well, I did the stupid safe thing, and it hangs the box once we get up to >> a load of 32 with SDET. Below is what I did, the only other issue I can >> see in here is that task_mem takes mm->mmap_sem which is now nested inside >> the task_lock inside tasklist_lock ... but I can't see anywhere that's a >> problem from a quick search > > Ho - you can _never_ nest a semaphore inside a spinlock - if the semaphore > sleeps, the spinlock will cause a lockup regardless of any reverse nesting > issues.. So I guess the old "get_task_mm()" code is requried anyway. True ... staring at unfamiliar code made me forget a few things ;-) OK, the below patch works, no oopses, no hangs. If this is acceptable (even if it's not the final solution), could you apply it? Seems to be better than the current situation, at any rate. ----------------------------------------------------- We can encounter a race condition where task->sighand can be NULL in proc_pid_status, resulting in an offset NULL pointer dereference in render_sigset_t. This can be easily demonstrated running SDET with a load of 32 to 64 on a large machine (16x NUMA-Q in this case). This patch makes us take task_lock around the sighand deferences, and fixes the oops in testing on the same machine. 
Thanks to Linus, Andrew, Manfred and Zwane for lots of guidance ;-) diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet3/fs/proc/array.c --- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003 +++ sdet3/fs/proc/array.c Sun Feb 16 11:44:24 2003 @@ -252,8 +252,11 @@ int proc_pid_status(struct task_struct * buffer = task_mem(mm, buffer); mmput(mm); } - buffer = task_sig(task, buffer); + task_lock(task); + if (task->sighand) + buffer = task_sig(task, buffer); buffer = task_cap(task, buffer); + task_unlock(task); #if defined(CONFIG_ARCH_S390) buffer = task_show_regs(task, buffer); #endif ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 21:15 ` Martin J. Bligh @ 2003-02-16 21:21 ` Manfred Spraul 2003-02-16 22:34 ` Linus Torvalds 0 siblings, 1 reply; 41+ messages in thread From: Manfred Spraul @ 2003-02-16 21:21 UTC (permalink / raw) To: Martin J. Bligh Cc: Linus Torvalds, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo Martin J. Bligh wrote: >diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet3/fs/proc/array.c >--- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003 >+++ sdet3/fs/proc/array.c Sun Feb 16 11:44:24 2003 >@@ -252,8 +252,11 @@ int proc_pid_status(struct task_struct * > buffer = task_mem(mm, buffer); > mmput(mm); > } >- buffer = task_sig(task, buffer); >+ task_lock(task); >+ if (task->sighand) >+ buffer = task_sig(task, buffer); > buffer = task_cap(task, buffer); >+ task_unlock(task); > #if defined(CONFIG_ARCH_S390) > buffer = task_show_regs(task, buffer); > #endif > > I think it's needed for 2.4, too. -- Manfred ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 21:21 ` Manfred Spraul @ 2003-02-16 22:34 ` Linus Torvalds 2003-02-16 23:08 ` Martin J. Bligh 0 siblings, 1 reply; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 22:34 UTC (permalink / raw) To: Manfred Spraul Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo On Sun, 16 Feb 2003, Manfred Spraul wrote: > Martin J. Bligh wrote: > > >diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet3/fs/proc/array.c > >--- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003 > >+++ sdet3/fs/proc/array.c Sun Feb 16 11:44:24 2003 > >@@ -252,8 +252,11 @@ int proc_pid_status(struct task_struct * > > buffer = task_mem(mm, buffer); > > mmput(mm); > > } > >- buffer = task_sig(task, buffer); > >+ task_lock(task); > >+ if (task->sighand) > >+ buffer = task_sig(task, buffer); > > buffer = task_cap(task, buffer); > >+ task_unlock(task); > > #if defined(CONFIG_ARCH_S390) > > buffer = task_show_regs(task, buffer); > > #endif > > > > > I think it's needed for 2.4, too. It's not wrong, but it shouldn't help. Simply because "task_lock()" isn't even relevant to "task->sighand" as it stands now. It will be cleared without holding the task lock, as far as I can see. So I suspect it fixes things for Martin only because it changes timings enough not to hit the race. Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 22:34 ` Linus Torvalds @ 2003-02-16 23:08 ` Martin J. Bligh 2003-02-16 23:32 ` Linus Torvalds 0 siblings, 1 reply; 41+ messages in thread From: Martin J. Bligh @ 2003-02-16 23:08 UTC (permalink / raw) To: Linus Torvalds, Manfred Spraul Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo >> > diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet3/fs/proc/array.c >> > --- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003 >> > +++ sdet3/fs/proc/array.c Sun Feb 16 11:44:24 2003 >> > @@ -252,8 +252,11 @@ int proc_pid_status(struct task_struct * >> > buffer = task_mem(mm, buffer); >> > mmput(mm); >> > } >> > - buffer = task_sig(task, buffer); >> > + task_lock(task); >> > + if (task->sighand) >> > + buffer = task_sig(task, buffer); >> > buffer = task_cap(task, buffer); >> > + task_unlock(task); >> > #if defined(CONFIG_ARCH_S390) >> > buffer = task_show_regs(task, buffer); >> > #endif >> > >> > >> I think it's needed for 2.4, too. > > It's not wrong, but it shouldn't help. Simply because "task_lock()" > isn't even relevant to "task->sighand" as it stands now. It will be > cleared without holding the task lock, as far as I can see. > > So I suspect it fixes things for Martin only because it changes timings > enough not to hit the race. Ah, fair enough ... it's probably the if, rather than the task_lock. So what does protect sighand? tasklist_lock? It doesn't seem to as people do things like: spin_unlock_irq(&current->sighand->siglock); all the time ... so is it just protected by good faith and the direction of the wind? M. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 23:08 ` Martin J. Bligh @ 2003-02-16 23:32 ` Linus Torvalds 0 siblings, 0 replies; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 23:32 UTC (permalink / raw) To: Martin J. Bligh Cc: Manfred Spraul, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo, Ingo Molnar On Sun, 16 Feb 2003, Martin J. Bligh wrote: > > So what does protect sighand? tasklist_lock? Yup. At least right now. And we do tend to hold tasklist_lock in most cases simply by virtue of having to get it anyway in order to search for the process. > It doesn't seem to > as people do things like: > > spin_unlock_irq(¤t->sighand->siglock); > > all the time ... so is it just protected by good faith and the direction > of the wind? Agreed. That, and the fact that most of the time the stuff _is_ there: obviously any time the process looks at its own signals it will always be there. So we have two cases: - "normal" signal sending which holds tasklist_lock to find the target. - signals to the process itself, which is usually safe, since we know we're there (the exception would be if/when taking a fault in the exit path, but even that might be ok since signals are torn down not by exit itself but by "release_task") So the signal path may actually be ok. Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 19:06 ` Martin J. Bligh 2003-02-16 19:17 ` Linus Torvalds @ 2003-02-16 19:18 ` Manfred Spraul 2003-02-16 19:19 ` more signal locking bugs? Martin J. Bligh 2 siblings, 0 replies; 41+ messages in thread From: Manfred Spraul @ 2003-02-16 19:18 UTC (permalink / raw) To: Martin J. Bligh Cc: Linus Torvalds, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo Martin J. Bligh wrote: >Well, I did the stupid safe thing, and it hangs the box once we get up to >a load of 32 with SDET. Below is what I did, the only other issue I can >see in here is that task_mem takes mm->mmap_sem which is now nested inside >the task_lock inside tasklist_lock ... but I can't see anywhere that's a >problem from a quick search > > task_lock is actually spin_lock(&tsk->alloc_lock); mmap_sem is a semaphore. Nesting means deadlock. -- Manfred ^ permalink raw reply [flat|nested] 41+ messages in thread
* more signal locking bugs? 2003-02-16 19:06 ` Martin J. Bligh 2003-02-16 19:17 ` Linus Torvalds 2003-02-16 19:18 ` Manfred Spraul @ 2003-02-16 19:19 ` Martin J. Bligh 2003-02-16 19:24 ` Linus Torvalds 2 siblings, 1 reply; 41+ messages in thread From: Martin J. Bligh @ 2003-02-16 19:19 UTC (permalink / raw) To: Linus Torvalds Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo, Manfred Spraul task_lock nests *inside* tasklist_lock but ... : __do_SAK: task_lock ... send_sig (SIGKILL, ...) send_sig_info (SIGKILL, ...) if (T(sig, SIG_KERNEL_BROADCAST_MASK))) read_lock(&tasklist_lock); ... read_unlock(&tasklist_lock); ... task_unlock #define SIG_KERNEL_BROADCAST_MASK (\ M(SIGHUP) | M(SIGINT) | M(SIGQUIT) | M(SIGILL) | \ M(SIGTRAP) | M(SIGABRT) | M(SIGBUS) | M(SIGFPE) | \ M(SIGKILL) | M(SIGUSR1) | M(SIGSEGV) | M(SIGUSR2) | \ M(SIGPIPE) | M(SIGALRM) | M(SIGTERM) | M(SIGXCPU) | \ M(SIGXFSZ) | M(SIGVTALRM) | M(SIGPROF) | M(SIGPOLL) | \ M(SIGSYS) | M_SIGSTKFLT | M(SIGPWR) | M(SIGCONT) | \ M(SIGSTOP) | M(SIGTSTP) | M(SIGTTIN) | M(SIGTTOU) | \ M_SIGEMT ) ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-16 19:19 ` more signal locking bugs? Martin J. Bligh @ 2003-02-16 19:24 ` Linus Torvalds 2003-02-16 19:37 ` Manfred Spraul 0 siblings, 1 reply; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 19:24 UTC (permalink / raw) To: Martin J. Bligh Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo, Manfred Spraul On Sun, 16 Feb 2003, Martin J. Bligh wrote: > > task_lock nests *inside* tasklist_lock but ... : Yeah, this is a problem with any signal, not just SAK. In fact, in general it is always _wrong_ to nest a non-interrupt-safe lock inside an interrupt-safe spinlock, because then you can never take the non-interrupt-safe one on its own (because an interrupt may come in and take the interrupt-safe lock, at which point that CPU has now violated ordering). And if you always nest the interrupt-unsafe one inside the interrupt-safe one, you might as well not have the interrupt-unsafe one in the first place, since the outer lock _always_ protects the code in question anyway. Which implies that either - the task lock should be _outside_ the tasklist_lock or - the task lock should be made interrupt-safe. If we make the tasklock one interrupt-safe, that should fix the signal issue, and we can use the tasklock to protect "task->signal" and "task->sighand". In short, everything really seems to be pointing that way: the current task lock simply _is_ broken, and has apparently always been broken (but the ABBA deadlock is just extremely rare in practice, since you have to get an interrupt at just the right point on one CPU, while you have the AB case on another). Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-16 19:24 ` Linus Torvalds @ 2003-02-16 19:37 ` Manfred Spraul 2003-02-16 19:42 ` Linus Torvalds 0 siblings, 1 reply; 41+ messages in thread From: Manfred Spraul @ 2003-02-16 19:37 UTC (permalink / raw) To: Linus Torvalds Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo Linus Torvalds wrote: >In short, everything really seems to be pointing that way: the current >task lock simply _is_ broken, and has apparently always been broken (but >the ABBA deadlock is just extremely rare in practice, since you have to >get an interrupt at just the right point on one CPU, while you have the AB >case on another). > ABBA is not a deadlock, because linux read_locks permit recursive calls.

	read_lock(tasklist_lock);
	task_lock(tsk);
	read_lock(tasklist_lock);

Does not deadlock, nor any other ordering. The tasklist_lock is never taken for write from bh or irq context. -- Manfred ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-16 19:37 ` Manfred Spraul @ 2003-02-16 19:42 ` Linus Torvalds 2003-02-16 20:01 ` Linus Torvalds 2003-02-16 20:07 ` Manfred Spraul 1 sibling, 2 replies; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 19:42 UTC (permalink / raw) To: Manfred Spraul Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo On Sun, 16 Feb 2003, Manfred Spraul wrote: > > ABBA is not a deadlock, because linux read_locks permit recursive calls. > > read_lock(tasklist_lock); > task_lock(tsk); > read_lock(tasklist_lock); > > Does not deadlock, nor any other ordering. > > The tasklist_lock is never taken for write from bh or irq context. Doesn't matter.

	CPU1 - do_exit()			CPU2 - non-nested task_lock()

						task_lock(tsk);
	write_lock_irq(&tasklist_lock);
						IRQ HAPPENS
	task_lock(tsk);
						read_lock(&tasklist_lock)

BOOM, you're dead. See? ABBA _does_ happen with the task lock, it's just that the magic required to do so is fairly unlikely thanks to the added requirement for the irq to happen at just the right moment (ie there are no static code-paths that can cause it). Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-16 19:42 ` Linus Torvalds @ 2003-02-16 20:01 ` Linus Torvalds 2003-02-16 20:07 ` Manfred Spraul 1 sibling, 0 replies; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 20:01 UTC (permalink / raw) To: Manfred Spraul Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo On Sun, 16 Feb 2003, Linus Torvalds wrote: > > See? ABBA _does_ happen with the task lock, it's just that the magic > required to do so is fairly unlikely thanks to the added requirement for > the irq to happen at just the right moment (ie there are no static > code-paths that can cause it). Note: to clarify, this isn't in any way a new situation. As far as I can tell, this deadlock exists in 2.4.x too. And it's almost certainly pretty much impossible to trigger in practice, and as such we shouldn't need to be deeply worried about it. But it should make us think about the _design_ of locking in this region. Clearly we got it wrong, and clearly we never noticed for several years. The simple fix is to make the task-lock be IRQ-safe. That fixes it, and that's probably the right minimal solution for for 2.4.x (unless the "fix" for 2.4.x is to just ignore it since it's possible to trigger mostly in a theoretical sense). But assuming you accept that making the task-lock be irq-safe is the right solution, then making it solve the signal handling and /proc scalability issues is likely to be the right solution: since it's irq-safe, there's no real reason not to use it to protect task->{sighand | signal | parent} too. Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-16 19:42 ` Linus Torvalds 2003-02-16 20:01 ` Linus Torvalds @ 2003-02-16 20:07 ` Manfred Spraul 2003-02-16 20:10 ` Linus Torvalds 2003-02-17 0:23 ` Linus Torvalds 1 sibling, 2 replies; 41+ messages in thread From: Manfred Spraul @ 2003-02-16 20:07 UTC (permalink / raw) To: Linus Torvalds Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo On Sun, 16 Feb 2003, Linus Torvalds wrote: > > On Sun, 16 Feb 2003, Manfred Spraul wrote: > > > > ABBA is not a deadlock, because linux read_locks permit recursive calls. > > > > read_lock(tasklist_lock); > > task_lock(tsk); > > read_lock(tasklist_lock); > > > > Does not deadlock, nor any other ordering. > > > > The tasklist_lock is never taken for write from bh or irq context. > > Doesn't matter. > > CPU1 - do_exit() ) > write_lock_irq(&tasklist_lock); > task_lock(tsk); > > BOOM, you're dead. > > See? ABBA _does_ happen with the task lock. > But these lines are not in 2.4 or 2.5.61. The current rule to nesting tasklist_lock and task_lock is - read_lock(&tasklist_lock) and task_lock can be mixed in any order. - write_lock_irq(&tasklist_lock) and task_lock are incompatible. What about this minimal patch? The performance critical operation is signal delivery - we should fix the synchronization between signal delivery and exec first. 
-- Manfred << --- 2.5/fs/proc/array.c 2003-02-15 10:29:22.000000000 +0100 +++ build-2.5/fs/proc/array.c 2003-02-16 20:58:37.000000000 +0100 @@ -208,23 +208,27 @@ { sigset_t ign, catch; - buffer += sprintf(buffer, "SigPnd:\t"); - buffer = render_sigset_t(&p->pending.signal, buffer); - *buffer++ = '\n'; - buffer += sprintf(buffer, "ShdPnd:\t"); - buffer = render_sigset_t(&p->signal->shared_pending.signal, buffer); - *buffer++ = '\n'; - buffer += sprintf(buffer, "SigBlk:\t"); - buffer = render_sigset_t(&p->blocked, buffer); - *buffer++ = '\n'; - - collect_sigign_sigcatch(p, &ign, &catch); - buffer += sprintf(buffer, "SigIgn:\t"); - buffer = render_sigset_t(&ign, buffer); - *buffer++ = '\n'; - buffer += sprintf(buffer, "SigCgt:\t"); /* Linux 2.0 uses "SigCgt" */ - buffer = render_sigset_t(&catch, buffer); - *buffer++ = '\n'; + read_lock(&tasklist_lock); + if (p->signal && p->sighand) { + buffer += sprintf(buffer, "SigPnd:\t"); + buffer = render_sigset_t(&p->pending.signal, buffer); + *buffer++ = '\n'; + buffer += sprintf(buffer, "ShdPnd:\t"); + buffer = render_sigset_t(&p->signal->shared_pending.signal, buffer); + *buffer++ = '\n'; + buffer += sprintf(buffer, "SigBlk:\t"); + buffer = render_sigset_t(&p->blocked, buffer); + *buffer++ = '\n'; + + collect_sigign_sigcatch(p, &ign, &catch); + buffer += sprintf(buffer, "SigIgn:\t"); + buffer = render_sigset_t(&ign, buffer); + *buffer++ = '\n'; + buffer += sprintf(buffer, "SigCgt:\t"); /* Linux 2.0 uses "SigCgt" */ + buffer = render_sigset_t(&catch, buffer); + *buffer++ = '\n'; + } + read_unlock(&tasklist_lock); return buffer; } << ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-16 20:07 ` Manfred Spraul @ 2003-02-16 20:10 ` Linus Torvalds 2003-02-16 20:23 ` Manfred Spraul 2003-02-17 0:23 ` Linus Torvalds 1 sibling, 1 reply; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 20:10 UTC (permalink / raw) To: Manfred Spraul Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo On Sun, 16 Feb 2003, Manfred Spraul wrote: > But these lines are not in 2.4 or 2.5.61. > The current rule to nesting tasklist_lock and task_lock is > - read_lock(&tasklist_lock) and task_lock can be mixed in any order. > - write_lock_irq(&tasklist_lock) and task_lock are incompatible. Oh, you're right, and you're right exactly _because_ "task->signal" isn't protected by the task lock right now. Aurgh. I had already mentally done that protection, which is why I thought we already had the bug. So never mind. 2.4.x is obviously also ok. > What about this minimal patch? The performance critical operation is > signal delivery - we should fix the synchronization between signal > delivery and exec first. The patch looks ok, although I'd also remove the locking and testing from collect_sigign_sigcatch() once it is done at a higher level. And yeah, what about signal delivery? Put back the same lock there? Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-16 20:10 ` Linus Torvalds @ 2003-02-16 20:23 ` Manfred Spraul 0 siblings, 0 replies; 41+ messages in thread From: Manfred Spraul @ 2003-02-16 20:23 UTC (permalink / raw) To: Linus Torvalds Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo Linus Torvalds wrote: >>What about this minimal patch? The performance critical operation is >>signal delivery - we should fix the synchronization between signal >>delivery and exec first. >> >> > >The patch looks ok, although I'd also remove the locking and testing from >collect_sigign_sigcatch() once it is done at a higher level. > >And yeah, what about signal delivery? Put back the same lock there? > > I don't know. Taking read_lock(&tasklist_lock) for send_specific_sig_info might hurt scalability. Ingo? If we do not want a global lock, then we have two options: - make task_lock an interrupt spinlock. - add a second spinlock to the task structure, for the signal stuff. Design question - what's worse? Memory bloat or a few additional local_irq_{en,dis}able(). I don't care - no performance critical codepaths. Additionally many task_lock()/task_unlock users could be replaced with spin_unlock_wait(&task->alloc_lock), which would not need the local_irq_disable(). -- Manfred ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-16 20:07 ` Manfred Spraul 2003-02-16 20:10 ` Linus Torvalds @ 2003-02-17 0:23 ` Linus Torvalds 2003-02-17 2:05 ` Martin J. Bligh 2003-02-17 6:36 ` more signal locking bugs? Manfred Spraul 1 sibling, 2 replies; 41+ messages in thread From: Linus Torvalds @ 2003-02-17 0:23 UTC (permalink / raw) To: Manfred Spraul Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo On Sun, 16 Feb 2003, Manfred Spraul wrote: > > What about this minimal patch? The performance critical operation is > signal delivery - we should fix the synchronization between signal > delivery and exec first. Ok, I committed this alternative change, which isn't quite as minimal, but looks a lot cleaner to me. Also, looking at execve() and other paths, we do seem to have sufficient protection from the tasklist_lock that signal delivery should be fine. So despite a long and confused thread, I think in the end the only real bug was the one Martin found which should be thus fixed.. Linus --- # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. 
# This patch includes the following deltas: # ChangeSet 1.1054 -> 1.1055 # include/linux/signal.h 1.8 -> 1.9 # fs/proc/array.c 1.42 -> 1.43 # kernel/sched.c 1.158 -> 1.159 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/02/16 torvalds@home.transmeta.com 1.1055 # Clean up and fix locking around signal rendering # -------------------------------------------- # diff -Nru a/fs/proc/array.c b/fs/proc/array.c --- a/fs/proc/array.c Sun Feb 16 16:22:45 2003 +++ b/fs/proc/array.c Sun Feb 16 16:22:45 2003 @@ -180,51 +180,74 @@ return buffer; } +static char * render_sigset_t(const char *header, sigset_t *set, char *buffer) +{ + int i, len; + + len = strlen(header); + memcpy(buffer, header, len); + buffer += len; + + i = _NSIG; + do { + int x = 0; + + i -= 4; + if (sigismember(set, i+1)) x |= 1; + if (sigismember(set, i+2)) x |= 2; + if (sigismember(set, i+3)) x |= 4; + if (sigismember(set, i+4)) x |= 8; + *buffer++ = (x < 10 ? '0' : 'a' - 10) + x; + } while (i >= 4); + + *buffer++ = '\n'; + *buffer = 0; + return buffer; +} + static void collect_sigign_sigcatch(struct task_struct *p, sigset_t *ign, sigset_t *catch) { struct k_sigaction *k; int i; - sigemptyset(ign); - sigemptyset(catch); + k = p->sighand->action; + for (i = 1; i <= _NSIG; ++i, ++k) { + if (k->sa.sa_handler == SIG_IGN) + sigaddset(ign, i); + else if (k->sa.sa_handler != SIG_DFL) + sigaddset(catch, i); + } +} + +static inline char * task_sig(struct task_struct *p, char *buffer) +{ + sigset_t pending, shpending, blocked, ignored, caught; + + sigemptyset(&pending); + sigemptyset(&shpending); + sigemptyset(&blocked); + sigemptyset(&ignored); + sigemptyset(&caught); + /* Gather all the data with the appropriate locks held */ read_lock(&tasklist_lock); if (p->sighand) { spin_lock_irq(&p->sighand->siglock); - k = p->sighand->action; - for (i = 1; i <= _NSIG; ++i, ++k) { - if (k->sa.sa_handler == SIG_IGN) - sigaddset(ign, i); - else if (k->sa.sa_handler != SIG_DFL) - 
sigaddset(catch, i); - } + pending = p->pending.signal; + shpending = p->signal->shared_pending.signal; + blocked = p->blocked; + collect_sigign_sigcatch(p, &ignored, &caught); spin_unlock_irq(&p->sighand->siglock); } read_unlock(&tasklist_lock); -} - -static inline char * task_sig(struct task_struct *p, char *buffer) -{ - sigset_t ign, catch; - buffer += sprintf(buffer, "SigPnd:\t"); - buffer = render_sigset_t(&p->pending.signal, buffer); - *buffer++ = '\n'; - buffer += sprintf(buffer, "ShdPnd:\t"); - buffer = render_sigset_t(&p->signal->shared_pending.signal, buffer); - *buffer++ = '\n'; - buffer += sprintf(buffer, "SigBlk:\t"); - buffer = render_sigset_t(&p->blocked, buffer); - *buffer++ = '\n'; - - collect_sigign_sigcatch(p, &ign, &catch); - buffer += sprintf(buffer, "SigIgn:\t"); - buffer = render_sigset_t(&ign, buffer); - *buffer++ = '\n'; - buffer += sprintf(buffer, "SigCgt:\t"); /* Linux 2.0 uses "SigCgt" */ - buffer = render_sigset_t(&catch, buffer); - *buffer++ = '\n'; + /* render them all */ + buffer = render_sigset_t("SigPnd:\t", &pending, buffer); + buffer = render_sigset_t("ShdPnd:\t", &shpending, buffer); + buffer = render_sigset_t("SigBlk:\t", &blocked, buffer); + buffer = render_sigset_t("SigIgn:\t", &ignored, buffer); + buffer = render_sigset_t("SigCgt:\t", &caught, buffer); return buffer; } diff -Nru a/include/linux/signal.h b/include/linux/signal.h --- a/include/linux/signal.h Sun Feb 16 16:22:45 2003 +++ b/include/linux/signal.h Sun Feb 16 16:22:45 2003 @@ -151,8 +151,6 @@ } } -extern char * render_sigset_t(sigset_t *set, char *buffer); - /* Some extensions for manipulating the low 32 signals in particular. 
*/ static inline void sigaddsetmask(sigset_t *set, unsigned long mask) diff -Nru a/kernel/sched.c b/kernel/sched.c --- a/kernel/sched.c Sun Feb 16 16:22:45 2003 +++ b/kernel/sched.c Sun Feb 16 16:22:45 2003 @@ -2095,21 +2095,6 @@ } } -char * render_sigset_t(sigset_t *set, char *buffer) -{ - int i = _NSIG, x; - do { - i -= 4, x = 0; - if (sigismember(set, i+1)) x |= 1; - if (sigismember(set, i+2)) x |= 2; - if (sigismember(set, i+3)) x |= 4; - if (sigismember(set, i+4)) x |= 8; - *buffer++ = (x < 10 ? '0' : 'a' - 10) + x; - } while (i >= 4); - *buffer = 0; - return buffer; -} - void show_state(void) { task_t *g, *p; ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-17 0:23 ` Linus Torvalds @ 2003-02-17 2:05 ` Martin J. Bligh 2003-02-17 2:39 ` Martin J. Bligh 2003-02-17 6:36 ` more signal locking bugs? Manfred Spraul 1 sibling, 1 reply; 41+ messages in thread From: Martin J. Bligh @ 2003-02-17 2:05 UTC (permalink / raw) To: Linus Torvalds, Manfred Spraul Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo > Ok, I committed this alternative change, which isn't quite as minimal, but > looks a lot cleaner to me. > > Also, looking at execve() and other paths, we do seem to have sufficient > protection from the tasklist_lock that signal delivery should be fine. So > despite a long and confused thread, I think in the end the only real bug > was the one Martin found which should be thus fixed.. Ran your patch ... but I get plenty of these now: Unable to handle kernel NULL pointer dereference at virtual address 00000004 printing eip: c016d5a8 *pde = 2e10f001 *pte = 00000000 Oops: 0000 CPU: 2 EIP: 0060:[<c016d5a8>] Not tainted EFLAGS: 00010202 EIP is at collect_sigign_sigcatch+0x1c/0x44 eax: ed6d72c0 ebx: ed403f50 ecx: 00000004 edx: 00000001 esi: ed403f48 edi: ed6d72c0 ebp: 00000000 esp: ed403eec ds: 007b es: 007b ss: 0068 Process ps (pid: 19988, threadinfo=ed402000 task=ed9352a0) Stack: c011d115 c02af820 c016dab8 ed6d72c0 ed403f48 ed403f50 ed6d72c0 edb99a50 ed6d72c0 eebf33c0 ee225000 c034e400 ed403f50 ed403f48 00000000 52000000 00008800 00000125 eebf33c0 eebf33e0 00000000 00000000 00000000 000000d0 Call Trace: [<c011d115>] do_exit+0x30d/0x31c [<c016dab8>] proc_pid_stat+0x110/0x324 [<c0130076>] __get_free_pages+0x4e/0x54 [<c016b497>] proc_info_read+0x53/0x130 [<c0145215>] vfs_read+0xa5/0x128 [<c01454a2>] sys_read+0x2a/0x3c [<c0108b3f>] syscall_call+0x7/0xb Code: 8b 01 83 f8 01 75 08 8d 42 ff 0f ab 06 eb 0a 85 c0 74 06 8d ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-17 2:05 ` Martin J. Bligh @ 2003-02-17 2:39 ` Martin J. Bligh 2003-02-17 3:53 ` Linus Torvalds 2003-02-17 3:54 ` [PATCH] fix secondary oops in sighand locking Martin J. Bligh 0 siblings, 2 replies; 41+ messages in thread From: Martin J. Bligh @ 2003-02-17 2:39 UTC (permalink / raw) To: Linus Torvalds, Manfred Spraul Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo >> Ok, I committed this alternative change, which isn't quite as minimal, but >> looks a lot cleaner to me. >> >> Also, looking at execve() and other paths, we do seem to have sufficient >> protection from the tasklist_lock that signal delivery should be fine. So >> despite a long and confused thread, I think in the end the only real bug >> was the one Martin found which should be thus fixed.. > > Ran your patch ... but I get plenty of these now: > > Unable to handle kernel NULL pointer dereference at virtual address 00000004 > printing eip: > c016d5a8 > *pde = 2e10f001 > *pte = 00000000 > Oops: 0000 > CPU: 2 > EIP: 0060:[<c016d5a8>] Not tainted > EFLAGS: 00010202 > EIP is at collect_sigign_sigcatch+0x1c/0x44 > eax: ed6d72c0 ebx: ed403f50 ecx: 00000004 edx: 00000001 > esi: ed403f48 edi: ed6d72c0 ebp: 00000000 esp: ed403eec > ds: 007b es: 007b ss: 0068 > Process ps (pid: 19988, threadinfo=ed402000 task=ed9352a0) > Stack: c011d115 c02af820 c016dab8 ed6d72c0 ed403f48 ed403f50 ed6d72c0 edb99a50 > ed6d72c0 eebf33c0 ee225000 c034e400 ed403f50 ed403f48 00000000 52000000 > 00008800 00000125 eebf33c0 eebf33e0 00000000 00000000 00000000 000000d0 > Call Trace: > [<c011d115>] do_exit+0x30d/0x31c > [<c016dab8>] proc_pid_stat+0x110/0x324 > [<c0130076>] __get_free_pages+0x4e/0x54 > [<c016b497>] proc_info_read+0x53/0x130 > [<c0145215>] vfs_read+0xa5/0x128 > [<c01454a2>] sys_read+0x2a/0x3c > [<c0108b3f>] syscall_call+0x7/0xb > > Code: 8b 01 83 f8 01 75 08 8d 42 ff 0f ab 06 eb 0a 85 c0 74 06 8d Ah, I see what happened, I think .... 
the locking used to be inside collect_sigign_sigcatch, and you moved it out into task_sig ... but there were two callers of collect_sigign_sigcatch, the other one being proc_pid_stat ... which now needs to have appropriate locking added to it as well to match the change ... I'll see if I can get something together that works for that. M. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-17 2:39 ` Martin J. Bligh @ 2003-02-17 3:53 ` Linus Torvalds 2003-02-17 5:07 ` Martin J. Bligh ` (2 more replies) 2003-02-17 3:54 ` [PATCH] fix secondary oops in sighand locking Martin J. Bligh 1 sibling, 3 replies; 41+ messages in thread From: Linus Torvalds @ 2003-02-17 3:53 UTC (permalink / raw) To: Martin J. Bligh Cc: Manfred Spraul, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo On Sun, 16 Feb 2003, Martin J. Bligh wrote: > > Ah, I see what happened, I think .... the locking used to be inside > collect_sigign_sigcatch, and you moved it out into task_sig ... but > there were two callers of collect_sigign_sigcatch, the other one being > proc_pid_stat Doh. This should fix it. Linus --- # This is a BitKeeper generated patch for the following project: # Project Name: Linux kernel tree # This patch format is intended for GNU patch command version 2.5 or higher. # This patch includes the following deltas: # ChangeSet 1.1055 -> 1.1056 # fs/proc/array.c 1.43 -> 1.44 # # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 03/02/16 torvalds@home.transmeta.com 1.1056 # Do proper signal locking for the old-style /proc/stat too. # -------------------------------------------- # diff -Nru a/fs/proc/array.c b/fs/proc/array.c --- a/fs/proc/array.c Sun Feb 16 19:52:30 2003 +++ b/fs/proc/array.c Sun Feb 16 19:52:30 2003 @@ -316,7 +316,13 @@ wchan = get_wchan(task); - collect_sigign_sigcatch(task, &sigign, &sigcatch); + read_lock(&tasklist_lock); + if (task->sighand) { + spin_lock_irq(&task->sighand->siglock); + collect_sigign_sigcatch(task, &sigign, &sigcatch); + spin_lock_irq(&task->sighand->siglock); + } + read_unlock(&tasklist_lock); /* scale priority and nice values from timeslices to -20..20 */ /* to make it look like a "normal" Unix priority/nice value */ ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-17 3:53 ` Linus Torvalds @ 2003-02-17 5:07 ` Martin J. Bligh 2003-02-17 6:17 ` Martin J. Bligh 2003-02-17 17:09 ` Magnus Naeslund(f) 2 siblings, 0 replies; 41+ messages in thread From: Martin J. Bligh @ 2003-02-17 5:07 UTC (permalink / raw) To: Linus Torvalds Cc: Manfred Spraul, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo >> Ah, I see what happened, I think .... the locking used to be inside >> collect_sigign_sigcatch, and you moved it out into task_sig ... but >> there were two callers of collect_sigign_sigcatch, the other one being >> proc_pid_stat > > Doh. > > This should fix it. Don't you need this bit as well? + sigemptyset(&sigign); + sigemptyset(&sigcatch); to replace this bit from your previous patch. static void collect_sigign_sigcatch(struct task_struct *p, sigset_t *ign, sigset_t *catch) { struct k_sigaction *k; int i; - sigemptyset(ign); - sigemptyset(catch); That was in the patch I sent you ... but I missed task->sighand->siglock ;-) M. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-17 3:53 ` Linus Torvalds 2003-02-17 5:07 ` Martin J. Bligh @ 2003-02-17 6:17 ` Martin J. Bligh 2003-02-17 17:09 ` Magnus Naeslund(f) 2 siblings, 0 replies; 41+ messages in thread From: Martin J. Bligh @ 2003-02-17 6:17 UTC (permalink / raw) To: Linus Torvalds Cc: Manfred Spraul, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo >> Ah, I see what happened, I think .... the locking used to be inside >> collect_sigign_sigcatch, and you moved it out into task_sig ... but >> there were two callers of collect_sigign_sigcatch, the other one being >> proc_pid_stat > > Doh. > > This should fix it. Oooh, not only does SDET work now in 61, it doesn't freeze the whole box when I hit ^C any more (like it's been doing since the dawn of time). Spiffy ;-) M. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-17 3:53 ` Linus Torvalds 2003-02-17 5:07 ` Martin J. Bligh 2003-02-17 6:17 ` Martin J. Bligh @ 2003-02-17 17:09 ` Magnus Naeslund(f) 2 siblings, 0 replies; 41+ messages in thread From: Magnus Naeslund(f) @ 2003-02-17 17:09 UTC (permalink / raw) To: Linus Torvalds, Martin J. Bligh Cc: Manfred Spraul, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo Linus Torvalds <torvalds@transmeta.com> wrote: > > Doh. > > This should fix it. [snip] > + spin_lock_irq(&task->sighand->siglock); > + collect_sigign_sigcatch(task, &sigign, &sigcatch); > + spin_lock_irq(&task->sighand->siglock); [snip] You take the lock twice here? Magnus ^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH] fix secondary oops in sighand locking 2003-02-17 2:39 ` Martin J. Bligh 2003-02-17 3:53 ` Linus Torvalds @ 2003-02-17 3:54 ` Martin J. Bligh 1 sibling, 0 replies; 41+ messages in thread From: Martin J. Bligh @ 2003-02-17 3:54 UTC (permalink / raw) To: Linus Torvalds, Manfred Spraul Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo >> Ran your patch ... but I get plenty of these now: >> >> Unable to handle kernel NULL pointer dereference at virtual address 00000004 >> ... > > Ah, I see what happened, I think .... the locking used to be inside > collect_sigign_sigcatch, and you moved it out into task_sig ... but > there were two callers of collect_sigign_sigcatch, the other one being > proc_pid_stat ... which now needs to have appropriate locking added to > it as well to match the change ... I'll see if I can get something > together that works for that. OK, with this small extra patch, we now survive a beating with SDET without oops or freeze. The patch adds the locking that was removed from collect_sigign_sigcatch to the second caller (proc_pid_stat), as well as task_sig. diff -urpN -X /home/fletch/.diff.exclude linus-sdet/fs/proc/array.c linus-sdet-fix/fs/proc/array.c --- linus-sdet/fs/proc/array.c Sun Feb 16 18:48:04 2003 +++ linus-sdet-fix/fs/proc/array.c Sun Feb 16 18:51:34 2003 @@ -316,7 +316,12 @@ int proc_pid_stat(struct task_struct *ta wchan = get_wchan(task); - collect_sigign_sigcatch(task, &sigign, &sigcatch); + sigemptyset(&sigign); + sigemptyset(&sigcatch); + read_lock(&tasklist_lock); + if (task->sighand) + collect_sigign_sigcatch(task, &sigign, &sigcatch); + read_unlock(&tasklist_lock); /* scale priority and nice values from timeslices to -20..20 */ /* to make it look like a "normal" Unix priority/nice value */ ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-17 0:23 ` Linus Torvalds 2003-02-17 2:05 ` Martin J. Bligh @ 2003-02-17 6:36 ` Manfred Spraul 2003-02-17 19:06 ` Linus Torvalds 1 sibling, 1 reply; 41+ messages in thread From: Manfred Spraul @ 2003-02-17 6:36 UTC (permalink / raw) To: Linus Torvalds Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo Linus Torvalds wrote: >On Sun, 16 Feb 2003, Manfred Spraul wrote: > > >>What about this minimal patch? The performance critical operation is >>signal delivery - we should fix the synchronization between signal >>delivery and exec first. >> >> > >Ok, I committed this alternative change, which isn't quite as minimal, but >looks a lot cleaner to me. > >Also, looking at execve() and other paths, we do seem to have sufficient >protection from the tasklist_lock that signal delivery should be fine. So >despite a long and confused thread, I think in the end the only real bug >was the one Martin found which should be thus fixed.. > > I'm not convinced that exec is correct. app with two threads, cloned sighand and sig structures. thread one does exec(). thread two does exit(). Now we can arrive at no_thread_group in de_thread() and tsk->sig{,hand}->count == 1. >no_thread_group: > > write_lock_irq(&tasklist_lock); > spin_lock(&oldsighand->siglock); > spin_lock(&newsighand->siglock); > > if (current == oldsig->curr_target) > oldsig->curr_target = next_thread(current); > signal sender: in send_sig_info(). reads tsk->signal. blocks on tsk->sighand->siglock. 
> if (newsig) > current->signal = newsig; > current->sighand = newsighand; > init_sigpending(¤t->pending); > recalc_sigpending(); > > spin_unlock(&newsighand->siglock); > spin_unlock(&oldsighand->siglock); > write_unlock_irq(&tasklist_lock); > > if (newsig && atomic_dec_and_test(&oldsig->count)) > kmem_cache_free(signal_cachep, oldsig); > > if (atomic_dec_and_test(&oldsighand->count)) > kmem_cache_free(sighand_cachep, oldsighand); > > And now the signal sender continues. BOOM. sighand structure, sig structure already kfreed, etc. -- Manfred ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: more signal locking bugs? 2003-02-17 6:36 ` more signal locking bugs? Manfred Spraul @ 2003-02-17 19:06 ` Linus Torvalds 0 siblings, 0 replies; 41+ messages in thread From: Linus Torvalds @ 2003-02-17 19:06 UTC (permalink / raw) To: Manfred Spraul Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton, Kernel Mailing List, Zwane Mwaikambo On Mon, 17 Feb 2003, Manfred Spraul wrote: > > I'm not convinced that exec is correct. No, I think you're right. And I think just fixing send_sig_info() to take the tasklist lock is the right thing to do. That still leaves force_sig_info() without the tasklist lock, but since that is only used for page faults etc synchronous errors, that's fine (if we get a synchronous error in the kernel, we have bigger problems than signal locking). Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 4:00 ` Martin J. Bligh 2003-02-16 13:05 ` Anton Blanchard @ 2003-02-16 18:07 ` Linus Torvalds 2003-02-16 18:26 ` Martin J. Bligh 1 sibling, 1 reply; 41+ messages in thread From: Linus Torvalds @ 2003-02-16 18:07 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Andrew Morton, Kernel Mailing List On Sat, 15 Feb 2003, Martin J. Bligh wrote: > > OK, I did the following, which is what I think you wanted, plus Zwane's > observation that task_state acquires the task_struct lock (we're the only > caller, so I just removed it), but I still get the same panic and this time > the box hung. Yeah, a closer look shows that the exit path doesn't actually take the task lock at all around any of the signal stuff, so the lock protects "task->mm", "task->files" and "task->fs", but it does NOT protect "task->signal" or "task->sighand" (illogical, but true). Oh, damn. The code that checks for "p->sighand" takes the tasklist lock for this reason (see "collect_sigign_sigcatch()").

So the choice seems to be either:

 - make the exit path hold the task lock over the whole exit path, not just over mm exit.

 - take the "tasklist_lock" over more of "task_sig()" (not just the collect_sigign_sigcatch() thing, but the "&p->signal->shared_pending" rendering too).

The latter is a two-liner. The former is the "right thing" for multiple reasons. The reason I'd _like_ to see the task lock held over _all_ of the fields in the exit() path is:

 - right now we actually take it and drop it multiple times in exit. See __exit_files(), __exit_fs(), __exit_mm(). Which all take it just to serialize setting of "mm/files/fs" to NULL.

 - task_lock is a nice local lock, no global scalability impact.
So it really would be much nicer to do something like this in do_exit():

	struct mm_struct *mm;
	struct files_struct *files;
	struct fs_struct *fs;
	struct signal_struct *signal;
	struct sighand_struct *sighand;

	task_lock(task);
	mm = task->mm;
	files = task->files;
	fs = task->fs;
	signal = task->signal;
	sighand = task->sighand;
	task->mm = NULL;
	task->files = NULL;
	task->fs = NULL;
	task->signal = NULL;
	task->sighand = NULL;
	task_unlock(task);

	.. actually do the "__exit_mm(task, mm)" etc here ..

which would make things a lot more readable, and result in us taking the lock only _once_ instead of three times, and would properly protect "signal" and "sighand" so that the /proc code wouldn't need to take the heavily used "tasklist_lock" just to read the signal state for a single task. But fixing up exit to do the above would require several (trivial) calling convention changes, like changing

	static inline void __exit_mm(struct task_struct * tsk)
	{
		struct mm_struct *mm = tsk->mm;
		...

into

	static inline void __exit_mm(struct task_struct * tsk, struct mm_struct *mm)
	{
		...

instead and updating the callers. Is anybody willing to do that (hopefully fairly trivial) fixup and test it, or should we go with the stupid "take the 'tasklist_lock'" approach? Linus ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET 2003-02-16 18:07 ` Fw: 2.5.61 oops running SDET Linus Torvalds @ 2003-02-16 18:26 ` Martin J. Bligh 2003-02-16 18:36 ` Martin J. Bligh 0 siblings, 1 reply; 41+ messages in thread From: Martin J. Bligh @ 2003-02-16 18:26 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrew Morton, Kernel Mailing List > So the choice seems to be either: > > - make the exit path hold the task lock over the whole exit path, not > just over mm exit. > > - take the "tasklist_lock" over more of "task_sig()" (not just the > collect_sigign_sigcatch() thing, but the "&p->signal->shared_pending" > rendering too. > > The latter is a two-liner. The former is the "right thing" for multiple > reasons. > > The reason I'd _like_ to see the task lock held over _all_ of the fields > in the exit() path is: > > - right now we actually take it and drop it multiple times in exit. See > __exit_files(), __exit_fs(), __exit_mm(). Which all take it just to > serialize setting ot "mm/files/fs" to NULL. > > - task_lock is a nice local lock, no global scalability impact. Don't we have to take tasklist_lock anyway for when task_state reads p->real_parent->pid and p->parent->pid? Or should we have to grab the task_lock for those too? If so, it sounds like it could get into rather horrible ordering dependencies to me. If we move exit under task_lock ... then tasklist_lock doesn't help us any more there either (presumably we're trying to stop them exiting whilst we look at them) ... I've coded up the stupid approach for now, and will check that works first. Should have results very soon. M. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
  2003-02-16 18:26 ` Martin J. Bligh
@ 2003-02-16 18:36 ` Martin J. Bligh
  0 siblings, 0 replies; 41+ messages in thread
From: Martin J. Bligh @ 2003-02-16 18:36 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, Kernel Mailing List

>> The reason I'd _like_ to see the task lock held over _all_ of the fields
>> in the exit() path is:
>>
>>  - right now we actually take it and drop it multiple times in exit. See
>>    __exit_files(), __exit_fs(), __exit_mm(). Which all take it just to
>>    serialize setting of "mm/files/fs" to NULL.
>>
>>  - task_lock is a nice local lock, no global scalability impact.
>
> Don't we have to take tasklist_lock anyway for when task_state reads
> p->real_parent->pid and p->parent->pid? Or should we have to grab the
> task_lock for those too? If so, it sounds like it could get into rather
> horrible ordering dependencies to me.
>
> If we move exit under task_lock ... then tasklist_lock doesn't help us
> any more there either (presumably we're trying to stop them exiting
> whilst we look at them) ... I've coded up the stupid approach for now,
> and will check that works first. Should have results very soon.

Hmmmm ... I guess we could keep real_parent->pid and parent->pid inside
the child's task_struct ... I can't see those change unless we're
changing real_parent / parent task pointers anyway.

Not sure how much that idea will make people vomit, but it would seem to
make it easier to get rid of the global locking, and it's probably nicer
than trying to order the acquisition of other people's task locks ...

M.

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
  2003-02-16 2:09 ` Martin J. Bligh
  2003-02-16 2:27 ` Linus Torvalds
@ 2003-02-16 2:48 ` Zwane Mwaikambo
  1 sibling, 0 replies; 41+ messages in thread
From: Zwane Mwaikambo @ 2003-02-16 2:48 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Linus Torvalds, Andrew Morton, Kernel Mailing List

On Sat, 15 Feb 2003, Martin J. Bligh wrote:

> -	struct mm_struct *mm = get_task_mm(task);
> +	struct mm_struct *mm;
>
> +	task_lock(task);
> +	mm = __get_task_mm(task);
>  	buffer = task_name(task, buffer);
>  	buffer = task_state(task, buffer);

task_state acquires the task_struct lock.

	Zwane
--
function.linuxpower.ca

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
@ 2003-02-16 17:41 Manfred Spraul
2003-02-16 18:15 ` Linus Torvalds
0 siblings, 1 reply; 41+ messages in thread
From: Manfred Spraul @ 2003-02-16 17:41 UTC (permalink / raw)
To: Martin J. Bligh
Cc: linux-kernel, Linus Torvalds, Anton Blanchard, Zwane Mwaikambo
>
>
>OK, I did the following, which is what I think you wanted, plus Zwane's
>observation that task_state acquires the task_struct lock (we're the only
>caller, so I just removed it), but I still get the same panic and this time
>the box hung.
>
AFAICS both exec and exit rely on write_lock_irq(tasklist_lock) for
synchronization of changes to tsk->sig{,hand}.
I bet an __exit_sighand occurs in the middle of proc_pid_status() -
after the NULL test, before the access in task_sig.
Martin, could you check what happens if you do not release the
tasklist_lock until after the task_sig()?
--
Manfred
^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Fw: 2.5.61 oops running SDET
  2003-02-16 17:41 Manfred Spraul
@ 2003-02-16 18:15 ` Linus Torvalds
  2003-02-16 18:45 ` Manfred Spraul
  0 siblings, 1 reply; 41+ messages in thread
From: Linus Torvalds @ 2003-02-16 18:15 UTC (permalink / raw)
To: Manfred Spraul
Cc: Martin J. Bligh, linux-kernel, Anton Blanchard, Zwane Mwaikambo

On Sun, 16 Feb 2003, Manfred Spraul wrote:
>
> AFAICS both exec and exit rely on write_lock_irq(tasklist_lock) for
> synchronization of changes to tsk->sig{,hand}.

Yeah, as I sent out in my last email this does seem to be true right now,
but it's really not correct. It's disgusting that we use such a
fundamental global lock to protect something so trivially local to the one
process, where the local per-process lock really should be more than
enough.

		Linus

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
  2003-02-16 18:15 ` Linus Torvalds
@ 2003-02-16 18:45 ` Manfred Spraul
  2003-02-16 18:56 ` Linus Torvalds
  0 siblings, 1 reply; 41+ messages in thread
From: Manfred Spraul @ 2003-02-16 18:45 UTC (permalink / raw)
To: Linus Torvalds
Cc: Martin J. Bligh, linux-kernel, Anton Blanchard, Zwane Mwaikambo

Linus Torvalds wrote:
> On Sun, 16 Feb 2003, Manfred Spraul wrote:
>
>> AFAICS both exec and exit rely on write_lock_irq(tasklist_lock) for
>> synchronization of changes to tsk->sig{,hand}.
>
> Yeah, as I sent out in my last email this does seem to be true right now,
> but it's really not correct. It's disgusting that we use such a
> fundamental global lock to protect something so trivially local to the one
> process, where the local per-process lock really should be more than
> enough.

The difference between the tasklist_lock and task_lock is that task_lock
is not an interrupt lock. Think about signal delivery during exec.

Do you want to replace tasklist_lock with task_lock in exit_sighand()
and during exec, or do you want to add task_lock to tasklist_lock?

Hmm. Someone removed the read_lock(tasklist_lock) around
send_specific_sig_info() - which lock synchronizes exec and signal
delivery?

--
Manfred

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
  2003-02-16 18:45 ` Manfred Spraul
@ 2003-02-16 18:56 ` Linus Torvalds
  0 siblings, 0 replies; 41+ messages in thread
From: Linus Torvalds @ 2003-02-16 18:56 UTC (permalink / raw)
To: Manfred Spraul
Cc: Martin J. Bligh, linux-kernel, Anton Blanchard, Zwane Mwaikambo,
	Ingo Molnar

On Sun, 16 Feb 2003, Manfred Spraul wrote:
>
> Do you want to replace tasklist_lock with task_lock in exit_sighand()
> and during exec, or do you want to add task_lock to tasklist_lock?

I'd rather replace it, but your point about irq-safeness is a good one.
Clearly signal sending cannot take the task lock if we want to keep it
irq-unsafe. So the signal stuff probably does need to be protected by
_both_ the tasklist lock and the private task lock.

> Someone removed the read_lock(tasklist_lock) around
> send_specific_sig_info() - which lock synchronizes exec and signal
> delivery?

Looks like none. And I think it was Ingo who removed it. Ingo?

		Linus

^ permalink raw reply	[flat|nested] 41+ messages in thread
end of thread, other threads:[~2003-02-17 19:00 UTC | newest]
Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20030215172407.1fdd41fd.akpm@digeo.com>
2003-02-16 1:35 ` Fw: 2.5.61 oops running SDET Linus Torvalds
2003-02-16 2:09 ` Martin J. Bligh
2003-02-16 2:27 ` Linus Torvalds
2003-02-16 4:00 ` Martin J. Bligh
2003-02-16 13:05 ` Anton Blanchard
2003-02-16 16:39 ` Martin J. Bligh
2003-02-16 18:21 ` Linus Torvalds
2003-02-16 19:06 ` Martin J. Bligh
2003-02-16 19:17 ` Linus Torvalds
2003-02-16 21:15 ` Martin J. Bligh
2003-02-16 21:21 ` Manfred Spraul
2003-02-16 22:34 ` Linus Torvalds
2003-02-16 23:08 ` Martin J. Bligh
2003-02-16 23:32 ` Linus Torvalds
2003-02-16 19:18 ` Manfred Spraul
2003-02-16 19:19 ` more signal locking bugs? Martin J. Bligh
2003-02-16 19:24 ` Linus Torvalds
2003-02-16 19:37 ` Manfred Spraul
2003-02-16 19:42 ` Linus Torvalds
2003-02-16 20:01 ` Linus Torvalds
2003-02-16 20:07 ` Manfred Spraul
2003-02-16 20:10 ` Linus Torvalds
2003-02-16 20:23 ` Manfred Spraul
2003-02-17 0:23 ` Linus Torvalds
2003-02-17 2:05 ` Martin J. Bligh
2003-02-17 2:39 ` Martin J. Bligh
2003-02-17 3:53 ` Linus Torvalds
2003-02-17 5:07 ` Martin J. Bligh
2003-02-17 6:17 ` Martin J. Bligh
2003-02-17 17:09 ` Magnus Naeslund(f)
2003-02-17 3:54 ` [PATCH] fix secondary oops in sighand locking Martin J. Bligh
2003-02-17 6:36 ` more signal locking bugs? Manfred Spraul
2003-02-17 19:06 ` Linus Torvalds
2003-02-16 18:07 ` Fw: 2.5.61 oops running SDET Linus Torvalds
2003-02-16 18:26 ` Martin J. Bligh
2003-02-16 18:36 ` Martin J. Bligh
2003-02-16 2:48 ` Zwane Mwaikambo
2003-02-16 17:41 Manfred Spraul
2003-02-16 18:15 ` Linus Torvalds
2003-02-16 18:45 ` Manfred Spraul
2003-02-16 18:56 ` Linus Torvalds
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox