* Re: Fw: 2.5.61 oops running SDET
[not found] <20030215172407.1fdd41fd.akpm@digeo.com>
@ 2003-02-16 1:35 ` Linus Torvalds
2003-02-16 2:09 ` Martin J. Bligh
0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2003-02-16 1:35 UTC (permalink / raw)
To: Andrew Morton; +Cc: Martin J. Bligh, Kernel Mailing List
On Sat, 15 Feb 2003, Andrew Morton wrote:
>
> The recent change to fs/proc/array.c:task_sig()?
>
> buffer += sprintf(buffer, "ShdPnd:\t");
> buffer = render_sigset_t(&p->signal->shared_pending.signal, buffer);
Yeah, but I think the bug has existed for much longer.
It looks like "proc_pid_status()" doesn't actually lock the task at all,
nor even bother to test whether the task has signal state. Which has
_always_ been a bug. I don't know why it would start happening now, but it
might just be unlucky timing.
I think the proper fix is to put a
task_lock()
..
task_unlock()
around the whole proc_pid_status() function, _and_ then verify that
"tsk->sighand" is non-NULL.
(Oh, careful, that's already what "get_task_mm()" does internally, so look
out for deadlocks - you'd need to open-code the get_task_mm() in there
too, so the end result is something like
task_lock(task)
if (task->mm) {
.. mm state
}
if (task->sighand) {
.. signal state
}
..
task_unlock(task);
instead).
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 1:35 ` Linus Torvalds
@ 2003-02-16 2:09 ` Martin J. Bligh
2003-02-16 2:27 ` Linus Torvalds
2003-02-16 2:48 ` Zwane Mwaikambo
0 siblings, 2 replies; 23+ messages in thread
From: Martin J. Bligh @ 2003-02-16 2:09 UTC (permalink / raw)
To: Linus Torvalds, Andrew Morton; +Cc: Kernel Mailing List
> Yeah, but I think the bug has existed for much longer.
>
> It looks like "proc_pid_status()" doesn't actually lock the task at all,
> nor even bother to test whether the task has signal state. Which has
> _always_ been a bug. I don't know why it would start happening now, but
> it might just be unlucky timing.
>
> I think the proper fix is to put a
>
> task_lock()
> ..
> task_unlock()
>
> around the whole proc_pid_status() function, _and_ then verify that
> "tsk->sighand" is non-NULL.
>
> (Oh, careful, that's already what "get_task_mm()" does internally, so
> look out for deadlocks - you'd need to open-code the get_task_mm() in
> there too, so the end result is something like
>
> task_lock(task)
> if (task->mm) {
> .. mm state
> }
> if (task->sighand) {
> .. signal state
> }
> ..
> task_unlock(task);
>
> instead).
Something like this? Compiles, but not tested yet ...
diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c
proc/fs/proc/array.c
--- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003
+++ proc/fs/proc/array.c Sat Feb 15 18:05:10 2003
@@ -243,8 +243,10 @@ extern char *task_mem(struct mm_struct *
int proc_pid_status(struct task_struct *task, char * buffer)
{
char * orig = buffer;
- struct mm_struct *mm = get_task_mm(task);
+ struct mm_struct *mm;
+ task_lock(task);
+ mm = __get_task_mm(task);
buffer = task_name(task, buffer);
buffer = task_state(task, buffer);
@@ -257,6 +259,7 @@ int proc_pid_status(struct task_struct *
#if defined(CONFIG_ARCH_S390)
buffer = task_show_regs(task, buffer);
#endif
+ task_unlock(task);
return buffer - orig;
}
diff -urpN -X /home/fletch/.diff.exclude virgin/include/linux/sched.h
proc/include/linux/sched.h
--- virgin/include/linux/sched.h Sat Feb 15 16:11:47 2003
+++ proc/include/linux/sched.h Sat Feb 15 18:04:42 2003
@@ -706,6 +706,18 @@ static inline struct mm_struct * get_tas
return mm;
}
+
+/* lockless version of get_task_mm */
+static inline struct mm_struct * __get_task_mm(struct task_struct * task)
+{
+ struct mm_struct * mm;
+
+ mm = task->mm;
+ if (mm)
+ atomic_inc(&mm->mm_users);
+
+ return mm;
+}
/* set thread flags in other task's structures
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 2:09 ` Martin J. Bligh
@ 2003-02-16 2:27 ` Linus Torvalds
2003-02-16 4:00 ` Martin J. Bligh
2003-02-16 2:48 ` Zwane Mwaikambo
1 sibling, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2003-02-16 2:27 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Andrew Morton, Kernel Mailing List
On Sat, 15 Feb 2003, Martin J. Bligh wrote:
>
> Something like this? Compiles, but not tested yet ...
Close, but you also need
- buffer = task_sig(task, buffer);
+ if (task->sighand)
+ buffer = task_sig(task, buffer);
to actually check whether signals exist or not. Otherwise you'll just get
the same oops anyway (well, keeping the task locked may change timings
enough that it doesn't happen, but the bug will continue to be there).
I would also say that since you explicitly take the task lock, there's no
real reason to use "get_task_mm()" at all any more, so instead of doing
that (and then doing the mmput()), just get rid of the mm variable
entirely, and do
if (task->mm)
buffer = task_mem(task->mm, buffer)
too. There's really no downside to just holding on to the task lock over
the whole operation instead of incrementing and decrementing the mm
counts and dropping the lock early.
(There are a few valid reasons for using the "get_task_mm()" function:
- you need to block and thus drop the task lock
- the original code just used "task->mm" directly, and using
"get_task_mm()" made the original conversion to mm safe handling
easier.
Neither reason is really valid any more in this function at least).
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 2:09 ` Martin J. Bligh
2003-02-16 2:27 ` Linus Torvalds
@ 2003-02-16 2:48 ` Zwane Mwaikambo
1 sibling, 0 replies; 23+ messages in thread
From: Zwane Mwaikambo @ 2003-02-16 2:48 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Linus Torvalds, Andrew Morton, Kernel Mailing List
On Sat, 15 Feb 2003, Martin J. Bligh wrote:
> - struct mm_struct *mm = get_task_mm(task);
> + struct mm_struct *mm;
>
> + task_lock(task);
> + mm = __get_task_mm(task);
> buffer = task_name(task, buffer);
> buffer = task_state(task, buffer);
task_state acquires the task_struct lock.
Zwane
--
function.linuxpower.ca
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 2:27 ` Linus Torvalds
@ 2003-02-16 4:00 ` Martin J. Bligh
2003-02-16 13:05 ` Anton Blanchard
2003-02-16 18:07 ` Linus Torvalds
0 siblings, 2 replies; 23+ messages in thread
From: Martin J. Bligh @ 2003-02-16 4:00 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, Kernel Mailing List
> Close, but you also need
>
> - buffer = task_sig(task, buffer);
> + if (task->sighand)
> + buffer = task_sig(task, buffer);
>
> to actually check whether signals exist or not. Otherwise you'll just get
> the same oops anyway (well, keeping the task locked may change timings
> enough that it doesn't happen, but the bug will continue to be there).
>
> I would also say that since you explicitly take the task lock, there's no
> real reason to use "get_task_mm()" at all any more, so instead of doing
> that (and then doing the mmput()), just get rid of the mm variable
> entirely, and do
>
> if (task->mm)
> buffer = task_mem(task->mm, buffer)
>
> too. There's really no downside to just holding on to the task lock over
> the whole operation instead of incrementing and decrementing the mm
> counts and dropping the lock early.
>
> (There are a few valid reasons for using the "get_task_mm()" function:
>
> - you need to block and thus drop the task lock
>
> - the original code just used "task->mm" directly, and using
> "get_task_mm()" made the original conversion to mm safe handling
> easier.
>
> Neither reason is really valid any more in this function at least).
OK, I did the following, which is what I think you wanted, plus Zwane's
observation that task_state acquires the task_struct lock (we're the only
caller, so I just removed it), but I still get the same panic and this time
the box hung. No doubt I've just done something stupid in the patch ...
(task_state takes the tasklist_lock ... is that safe to do inside task_lock?)
diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet/fs/proc/array.c
--- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003
+++ sdet/fs/proc/array.c Sat Feb 15 19:28:34 2003
@@ -166,12 +166,10 @@ static inline char * task_state(struct t
p->uid, p->euid, p->suid, p->fsuid,
p->gid, p->egid, p->sgid, p->fsgid);
read_unlock(&tasklist_lock);
- task_lock(p);
buffer += sprintf(buffer,
"FDSize:\t%d\n"
"Groups:\t",
p->files ? p->files->max_fds : 0);
- task_unlock(p);
for (g = 0; g < p->ngroups; g++)
buffer += sprintf(buffer, "%d ", p->groups[g]);
@@ -243,20 +241,20 @@ extern char *task_mem(struct mm_struct *
int proc_pid_status(struct task_struct *task, char * buffer)
{
char * orig = buffer;
- struct mm_struct *mm = get_task_mm(task);
+ task_lock(task);
buffer = task_name(task, buffer);
buffer = task_state(task, buffer);
- if (mm) {
- buffer = task_mem(mm, buffer);
- mmput(mm);
- }
- buffer = task_sig(task, buffer);
+ if (task->mm)
+ buffer = task_mem(task->mm, buffer);
+ if (task->sighand)
+ buffer = task_sig(task, buffer);
buffer = task_cap(task, buffer);
#if defined(CONFIG_ARCH_S390)
buffer = task_show_regs(task, buffer);
#endif
+ task_unlock(task);
return buffer - orig;
}
Unable to handle kernel NULL pointer dereference at virtual address 00000014
printing eip:
c0118b13
*pde = 2d4bf001
*pte = 00000000
Oops: 0000
CPU: 6
EIP: 0060:[<c0118b13>] Not tainted
EFLAGS: 00010207
EIP is at render_sigset_t+0x17/0x7c
eax: 00000010 ebx: ed3d909f ecx: ed3d9097 edx: eda60100
esi: 0000003c edi: 00000010 ebp: ed49bf04 esp: ed49bef8
ds: 007b es: 007b ss: 0068
Process ps (pid: 3339, threadinfo=ed49a000 task=ee32cd20)
Stack: ed3d907d 00000002 ed3d909f 00000006 c016d85f 00000010 ed3d909f ed3d9097
c0233f38 eda606bc ed3d9086 ed3d907e c0233f2f ee1d7b70 eda60100 eeb4d0c0
ed3d9000 ee1d7b70 00000000 ed3d9000 000000d0 eeb4d0c0 c01300f6 c17119e8
Call Trace:
[<c016d85f>] proc_pid_status+0x26f/0x334
[<c01300f6>] __get_free_pages+0x4e/0x54
[<c016b517>] proc_info_read+0x53/0x130
[<c0145295>] vfs_read+0xa5/0x128
[<c0145522>] sys_read+0x2a/0x3c
[<c0108b3f>] syscall_call+0x7/0xb
Code: 0f a3 37 19 d2 b8 01 00 00 00 31 c9 85 d2 0f 45 c8 8d 56 01
<1>Unable to handle kernel NULL pointer dereference at virtual address 00000014
printing eip:
c0118b13
*pde = 2d512001
*pte = 00000000
Oops: 0000
CPU: 2
EIP: 0060:[<c0118b13>] Not tainted
EFLAGS: 00010207
EIP is at render_sigset_t+0x17/0x7c
eax: 00000010 ebx: ee34b0a0 ecx: ee34b098 edx: ed3ea7a0
esi: 0000003c edi: 00000010 ebp: ed54ff04 esp: ed54fef8
ds: 007b es: 007b ss: 0068
Process ps (pid: 14710, threadinfo=ed54e000 task=ee430d40)
Stack: ee34b07e 00000002 ee34b0a0 00000006 c016d85f 00000010 ee34b0a0 ee34b098
c0233f38 ed3ead5c ee34b087 ee34b07f c0233f2f ee4960d0 ed3ea7a0 eec66ca0
ee34b000 ee4960d0 00000000 ee34b000 000000d0 eec66ca0 c01300f6 c17383b8
Call Trace:
[<c016d85f>] proc_pid_status+0x26f/0x334
[<c01300f6>] __get_free_pages+0x4e/0x54
[<c016b517>] proc_info_read+0x53/0x130
[<c0145295>] vfs_read+0xa5/0x128
[<c0145522>] sys_read+0x2a/0x3c
[<c0108b3f>] syscall_call+0x7/0xb
Code: 0f a3 37 19 d2 b8 01 00 00 00 31 c9 85 d2 0f 45 c8 8d 56 01
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 4:00 ` Martin J. Bligh
@ 2003-02-16 13:05 ` Anton Blanchard
2003-02-16 16:39 ` Martin J. Bligh
2003-02-16 18:07 ` Linus Torvalds
1 sibling, 1 reply; 23+ messages in thread
From: Anton Blanchard @ 2003-02-16 13:05 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Linus Torvalds, Andrew Morton, Kernel Mailing List
> OK, I did the following, which is what I think you wanted, plus Zwane's
> observation that task_state acquires the task_struct lock (we're the only
> caller, so I just removed it), but I still get the same panic and this time
> the box hung. No doubt I've just done something stupid in the patch ...
> (task_state takes the tasklist_lock ... is that safe to do inside task_lock?)
It looks like you now try and take the tasklist_lock inside a task_lock()
region, but task_lock says no-can-do:
/* Protects ->fs, ->files, ->mm, and synchronises with wait4(). Nests
* inside tasklist_lock */
static inline void task_lock(struct task_struct *p)
{
...
Anton
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 13:05 ` Anton Blanchard
@ 2003-02-16 16:39 ` Martin J. Bligh
2003-02-16 18:21 ` Linus Torvalds
0 siblings, 1 reply; 23+ messages in thread
From: Martin J. Bligh @ 2003-02-16 16:39 UTC (permalink / raw)
To: Anton Blanchard; +Cc: Linus Torvalds, Andrew Morton, Kernel Mailing List
>> OK, I did the following, which is what I think you wanted, plus Zwane's
>> observation that task_state acquires the task_struct lock (we're the only
>> caller, so I just removed it), but I still get the same panic and this time
>> the box hung. No doubt I've just done something stupid in the patch ...
>> (task_state takes the tasklist_lock ... is that safe to do inside task_lock?)
>
> It looks like you now try and take the tasklist_lock inside a task_lock()
> region, but task_lock says no-can-do:
>
> /* Protects ->fs, ->files, ->mm, and synchronises with wait4(). Nests
> * inside tasklist_lock */
> static inline void task_lock(struct task_struct *p)
> {
> ...
Right, that's what I expected, and that may well explain the hang, but I
don't see how that explains the oops still happening. What exactly is
tasklist_lock protecting here anyway?
read_lock(&tasklist_lock);
buffer += sprintf(buffer,
"State:\t%s\n"
"Tgid:\t%d\n"
"Pid:\t%d\n"
"PPid:\t%d\n"
"TracerPid:\t%d\n"
"Uid:\t%d\t%d\t%d\t%d\n"
"Gid:\t%d\t%d\t%d\t%d\n",
get_task_state(p), p->tgid,
p->pid, p->pid ? p->real_parent->pid : 0,
p->pid && p->ptrace ? p->parent->pid : 0,
p->uid, p->euid, p->suid, p->fsuid,
p->gid, p->egid, p->sgid, p->fsgid);
read_unlock(&tasklist_lock);
Is it these two accesses:
p->real_parent->pid ?
p->parent->pid ?
Don't see what I can do for this apart from to invert the ordering and take
tasklist_lock around the whole function, and nest task_lock inside that, or
I suppose I could take the task_lock for each of the parents? I seem to
recall Linus reminding people recently that it was only the lock
acquisition order that was important, not release ... does something like
the following look OK?
diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet2/fs/proc/array.c
--- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003
+++ sdet2/fs/proc/array.c Sun Feb 16 08:37:45 2003
@@ -147,11 +147,11 @@ static inline const char * get_task_stat
return *p;
}
+/* Call me with the tasklist_lock and task_lock for p held already */
static inline char * task_state(struct task_struct *p, char *buffer)
{
int g;
- read_lock(&tasklist_lock);
buffer += sprintf(buffer,
"State:\t%s\n"
"Tgid:\t%d\n"
@@ -165,13 +165,10 @@ static inline char * task_state(struct t
p->pid && p->ptrace ? p->parent->pid : 0,
p->uid, p->euid, p->suid, p->fsuid,
p->gid, p->egid, p->sgid, p->fsgid);
- read_unlock(&tasklist_lock);
- task_lock(p);
buffer += sprintf(buffer,
"FDSize:\t%d\n"
"Groups:\t",
p->files ? p->files->max_fds : 0);
- task_unlock(p);
for (g = 0; g < p->ngroups; g++)
buffer += sprintf(buffer, "%d ", p->groups[g]);
@@ -243,20 +240,22 @@ extern char *task_mem(struct mm_struct *
int proc_pid_status(struct task_struct *task, char * buffer)
{
char * orig = buffer;
- struct mm_struct *mm = get_task_mm(task);
+ read_lock(&tasklist_lock);
+ task_lock(task);
buffer = task_name(task, buffer);
buffer = task_state(task, buffer);
+ read_unlock(&tasklist_lock);
- if (mm) {
- buffer = task_mem(mm, buffer);
- mmput(mm);
- }
- buffer = task_sig(task, buffer);
+ if (task->mm)
+ buffer = task_mem(task->mm, buffer);
+ if (task->sighand)
+ buffer = task_sig(task, buffer);
buffer = task_cap(task, buffer);
#if defined(CONFIG_ARCH_S390)
buffer = task_show_regs(task, buffer);
#endif
+ task_unlock(task);
return buffer - orig;
}
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
@ 2003-02-16 17:41 Manfred Spraul
2003-02-16 18:15 ` Linus Torvalds
0 siblings, 1 reply; 23+ messages in thread
From: Manfred Spraul @ 2003-02-16 17:41 UTC (permalink / raw)
To: Martin J. Bligh
Cc: linux-kernel, Linus Torvalds, Anton Blanchard, Zwane Mwaikambo
>
>
>OK, I did the following, which is what I think you wanted, plus Zwane's
>observation that task_state acquires the task_struct lock (we're the only
>caller, so I just removed it), but I still get the same panic and this time
>the box hung.
>
AFAICS both exec and exit rely on write_lock_irq(tasklist_lock) for
synchronization of changes to tsk->sig{,hand}.
I bet an __exit_sighand occurs in the middle of proc_pid_status() -
after the NULL test, before the access in task_sig.
Martin, could you check what happens if you do not release the
tasklist_lock until after the task_sig()?
--
Manfred
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 4:00 ` Martin J. Bligh
2003-02-16 13:05 ` Anton Blanchard
@ 2003-02-16 18:07 ` Linus Torvalds
2003-02-16 18:26 ` Martin J. Bligh
1 sibling, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2003-02-16 18:07 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Andrew Morton, Kernel Mailing List
On Sat, 15 Feb 2003, Martin J. Bligh wrote:
>
> OK, I did the following, which is what I think you wanted, plus Zwane's
> observation that task_state acquires the task_struct lock (we're the only
> caller, so I just removed it), but I still get the same panic and this time
> the box hung.
Yeah, a closer look shows that the exit path doesn't actually take the
task lock at all around any of the signal stuff, so the lock protects
"task->mm", "task->files" and "task->fs", but it does NOT protect
"task->signal" or "task->sighand" (illogical, but true).
Oh, damn.
The code that checks for "p->sighand" takes the tasklist lock for this
reason (see "collect_sigign_sigcatch()").
So the choice seems to be either:
- make the exit path hold the task lock over the whole exit path, not
just over mm exit.
- take the "tasklist_lock" over more of "task_sig()" (not just the
collect_sigign_sigcatch() thing, but the "&p->signal->shared_pending"
rendering too).
The latter is a two-liner. The former is the "right thing" for multiple
reasons.
The reason I'd _like_ to see the task lock held over _all_ of the fields
in the exit() path is:
- right now we actually take it and drop it multiple times in exit. See
__exit_files(), __exit_fs(), __exit_mm(). Which all take it just to
serialize setting ot "mm/files/fs" to NULL.
- task_lock is a nice local lock, no global scalability impact.
So it really would be much nicer to do something like this in do_exit():
struct mm_struct *mm;
struct files_struct *files;
struct fs_struct *fs;
struct signal_struct *signal;
struct sighand_struct *sighand;
task_lock(task);
mm = task->mm;
files = task->files;
fs = task->fs;
signal = task->signal;
sighand = task->sighand;
task->mm = NULL;
task->files = NULL;
task->fs = NULL;
task->signal = NULL;
task->sighand = NULL;
task_unlock(task);
.. actually do the "__exit_mm(task, mm)" etc here ..
which would make things a lot more readable, and result in us taking the
lock only _once_ instead of three times, and would properly protect
"signal" and "sighand" so that the /proc code wouldn't need to take the
heavily used "tasklist_lock" just to read the signal state for a single
task.
But fixing up exit to do the above would require several (trivial) calling
convention changes, like changing
static inline void __exit_mm(struct task_struct * tsk)
{
struct mm_struct *mm = tsk->mm;
...
into
static inline void __exit_mm(struct task_struct * tsk,
struct mm_struct *mm)
{
...
instead and updating the callers.
Is anybody willing to do that (hopefully fairly trivial) fixup and test
it, or should we go with the stupid "take the 'tasklist_lock'" approach?
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 17:41 Fw: 2.5.61 oops running SDET Manfred Spraul
@ 2003-02-16 18:15 ` Linus Torvalds
2003-02-16 18:45 ` Manfred Spraul
0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2003-02-16 18:15 UTC (permalink / raw)
To: Manfred Spraul
Cc: Martin J. Bligh, linux-kernel, Anton Blanchard, Zwane Mwaikambo
On Sun, 16 Feb 2003, Manfred Spraul wrote:
>
> AFAICS both exec and exit rely on write_lock_irq(tasklist_lock) for
> synchronization of changes to tsk->sig{,hand}.
Yeah, as I sent out in my last email this does seem to be true right now,
but it's really not correct. It's disgusting that we use such a
fundamental global lock to protect something so trivially local to the one
process, where the local per-process lock really should be more than
enough.
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 16:39 ` Martin J. Bligh
@ 2003-02-16 18:21 ` Linus Torvalds
2003-02-16 19:06 ` Martin J. Bligh
0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2003-02-16 18:21 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List
On Sun, 16 Feb 2003, Martin J. Bligh wrote:
>
> Right, that's what I expected, and that may well explain the hang, but I
> don't see how that explains the oops still happening. What exactly is
> tasklist_lock protecting here anyway?
>
> read_lock(&tasklist_lock);
> buffer += sprintf(buffer,
> "State:\t%s\n"
> "Tgid:\t%d\n"
> "Pid:\t%d\n"
> "PPid:\t%d\n"
> "TracerPid:\t%d\n"
> "Uid:\t%d\t%d\t%d\t%d\n"
> "Gid:\t%d\t%d\t%d\t%d\n",
> get_task_state(p), p->tgid,
> p->pid, p->pid ? p->real_parent->pid : 0,
> p->pid && p->ptrace ? p->parent->pid : 0,
> p->uid, p->euid, p->suid, p->fsuid,
> p->gid, p->egid, p->sgid, p->fsgid);
> read_unlock(&tasklist_lock);
>
> Is it these two accesses:
>
> p->real_parent->pid ?
> p->parent->pid ?
Yup.
> Don't see what I can do for this apart from to invert the ordering and take
> tasklist_lock around the whole function, and nest task_lock inside that, or
> I suppose I could take the task_lock for each of the parents? I seem to
> recall Linus reminding people recently that it was only the lock
> acquisition order that was important, not release ... does something like
> the following look OK?
This patch looks like it should certainly fix the problem, but that is
still some god-awful ugly overkill in locking.
I'd rather make the rule be that you have to take the task lock before
modifying things like the parent pointers (and all the other fundamental
pointers), since that's already the rule for most of it anyway.
And then the tasklist lock would go away _entirely_ from /proc (except for
task lookup in ->readdir/->lookup, of course, where it is fundamentally
needed and proper - and will probably some day be replaced by RCU, I
suspect).
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 18:07 ` Linus Torvalds
@ 2003-02-16 18:26 ` Martin J. Bligh
2003-02-16 18:36 ` Martin J. Bligh
0 siblings, 1 reply; 23+ messages in thread
From: Martin J. Bligh @ 2003-02-16 18:26 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, Kernel Mailing List
> So the choice seems to be either:
>
> - make the exit path hold the task lock over the whole exit path, not
> just over mm exit.
>
> - take the "tasklist_lock" over more of "task_sig()" (not just the
> collect_sigign_sigcatch() thing, but the "&p->signal->shared_pending"
> rendering too).
>
> The latter is a two-liner. The former is the "right thing" for multiple
> reasons.
>
> The reason I'd _like_ to see the task lock held over _all_ of the fields
> in the exit() path is:
>
> - right now we actually take it and drop it multiple times in exit. See
> __exit_files(), __exit_fs(), __exit_mm(). Which all take it just to
> serialize setting of "mm/files/fs" to NULL.
>
> - task_lock is a nice local lock, no global scalability impact.
Don't we have to take tasklist_lock anyway for when task_state reads
p->real_parent->pid and p->parent->pid? Or should we have to grab
the task_lock for those too? If so, it sounds like it could get into
rather horrible ordering dependencies to me.
If we move exit under task_lock ... then tasklist_lock doesn't help us
any more there either (presumably we're trying to stop them exiting
whilst we look at them) ... I've coded up the stupid approach for now,
and will check that works first. Should have results very soon.
M.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 18:26 ` Martin J. Bligh
@ 2003-02-16 18:36 ` Martin J. Bligh
0 siblings, 0 replies; 23+ messages in thread
From: Martin J. Bligh @ 2003-02-16 18:36 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, Kernel Mailing List
>> The reason I'd _like_ to see the task lock held over _all_ of the fields
>> in the exit() path is:
>>
>> - right now we actually take it and drop it multiple times in exit. See
>> __exit_files(), __exit_fs(), __exit_mm(). Which all take it just to
>> serialize setting of "mm/files/fs" to NULL.
>>
>> - task_lock is a nice local lock, no global scalability impact.
>
> Don't we have to take tasklist_lock anyway for when task_state reads
> p->real_parent->pid and p->parent->pid? Or should we have to grab
> the task_lock for those too? If so, it sounds like it could get into
> rather horrible ordering dependencies to me.
>
> If we move exit under task_lock ... then tasklist_lock doesn't help us
> any more there either (presumably we're trying to stop them exiting
> whilst we look at them) ... I've coded up the stupid approach for now,
> and will check that works first. Should have results very soon.
Hmmmm ... I guess we could keep real_parent->pid and parent->pid inside
the child's task_struct ... I can't see those change unless we're changing
real_parent / parent task pointers anyway. Not sure how much that idea
will make people vomit, but it would seem to make it easier to get rid
of the global locking, and it's probably nicer than trying to order the
acquisition of other people's task locks ...
M.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 18:15 ` Linus Torvalds
@ 2003-02-16 18:45 ` Manfred Spraul
2003-02-16 18:56 ` Linus Torvalds
0 siblings, 1 reply; 23+ messages in thread
From: Manfred Spraul @ 2003-02-16 18:45 UTC (permalink / raw)
To: Linus Torvalds
Cc: Martin J. Bligh, linux-kernel, Anton Blanchard, Zwane Mwaikambo
Linus Torvalds wrote:
>On Sun, 16 Feb 2003, Manfred Spraul wrote:
>
>
>>AFAICS both exec and exit rely on write_lock_irq(tasklist_lock) for
>>synchronization of changes to tsk->sig{,hand}.
>>
>>
>
>Yeah, as I sent out in my last email this does seem to be true right now,
>but it's really not correct. It's disgusting that we use such a
>fundamental global lock to protect something so trivially local to the one
>process, where the local per-process lock really should be more than
>enough.
>
The difference between the tasklist_lock and task_lock is that task_lock
is not an interrupt lock.
Think about signal delivery during exec.
Do you want to replace tasklist_lock with task_lock in exit_sighand()
and during exec, or do you want to add task_lock to tasklist_lock?
Hmm.
Someone removed the read_lock(tasklist_lock) around
send_specific_sig_info() - which lock synchronizes exec and signal delivery?
--
Manfred
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 18:45 ` Manfred Spraul
@ 2003-02-16 18:56 ` Linus Torvalds
0 siblings, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2003-02-16 18:56 UTC (permalink / raw)
To: Manfred Spraul
Cc: Martin J. Bligh, linux-kernel, Anton Blanchard, Zwane Mwaikambo,
Ingo Molnar
On Sun, 16 Feb 2003, Manfred Spraul wrote:
>
> Do you want to replace tasklist_lock with task_lock in exit_sighand()
> and during exec, or do you want to add task_lock to tasklist_lock?
I'd rather replace it, but your point about irq-safeness is a good one.
Clearly signal sending cannot take the task lock if we want to keep it
irq-unsafe.
So the signal stuff probably does need to be protected by _both_ the
tasklist lock and the private task lock.
> Someone removed the read_lock(tasklist_lock) around
> send_specific_sig_info() - which lock synchronizes exec and signal delivery?
Looks like none. And I think it was Ingo who removed it. Ingo?
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 18:21 ` Linus Torvalds
@ 2003-02-16 19:06 ` Martin J. Bligh
2003-02-16 19:17 ` Linus Torvalds
2003-02-16 19:18 ` Manfred Spraul
0 siblings, 2 replies; 23+ messages in thread
From: Martin J. Bligh @ 2003-02-16 19:06 UTC (permalink / raw)
To: Linus Torvalds
Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List,
Zwane Mwaikambo, Manfred Spraul
>> Don't see what I can do for this apart from to invert the ordering and take
>> tasklist_lock around the whole function, and nest task_lock inside that, or
>> I suppose I could take the task_lock for each of the parents? I seem to
>> recall Linus reminding people recently that it was only the lock
>> acquisition order that was important, not release ... does something like
>> the following look OK?
>
> This patch looks like it should certainly fix the problem, but that is
> still some god-awful ugly overkill in locking.
>
> I'd rather make the rule be that you have to take the task lock before
> modifying things like the parent pointers (and all the other fundamental
> pointers), since that's already the rule for most of it anyway.
>
> And then the tasklist lock would go away _entirely_ from /proc (except for
> task lookup in ->readdir/->lookup, of course, where it is fundamentally
> needed and proper - and will probably some day be replaced by RCU, I
> suspect).
Well, I did the stupid safe thing, and it hangs the box once we get up to
a load of 32 with SDET. Below is what I did, the only other issue I can
see in here is that task_mem takes mm->mmap_sem which is now nested inside
the task_lock inside tasklist_lock ... but I can't see anywhere that's a
problem from a quick search
diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet2/fs/proc/array.c
--- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003
+++ sdet2/fs/proc/array.c Sun Feb 16 09:59:24 2003
@@ -147,11 +147,11 @@ static inline const char * get_task_stat
return *p;
}
+/* Call me with the tasklist_lock and task_lock for p held already */
static inline char * task_state(struct task_struct *p, char *buffer)
{
int g;
- read_lock(&tasklist_lock);
buffer += sprintf(buffer,
"State:\t%s\n"
"Tgid:\t%d\n"
@@ -165,13 +165,10 @@ static inline char * task_state(struct t
p->pid && p->ptrace ? p->parent->pid : 0,
p->uid, p->euid, p->suid, p->fsuid,
p->gid, p->egid, p->sgid, p->fsgid);
- read_unlock(&tasklist_lock);
- task_lock(p);
buffer += sprintf(buffer,
"FDSize:\t%d\n"
"Groups:\t",
p->files ? p->files->max_fds : 0);
- task_unlock(p);
for (g = 0; g < p->ngroups; g++)
buffer += sprintf(buffer, "%d ", p->groups[g]);
@@ -243,20 +240,22 @@ extern char *task_mem(struct mm_struct *
int proc_pid_status(struct task_struct *task, char * buffer)
{
char * orig = buffer;
- struct mm_struct *mm = get_task_mm(task);
+ read_lock(&tasklist_lock);
+ task_lock(task);
buffer = task_name(task, buffer);
buffer = task_state(task, buffer);
- if (mm) {
- buffer = task_mem(mm, buffer);
- mmput(mm);
- }
- buffer = task_sig(task, buffer);
+ if (task->mm)
+ buffer = task_mem(task->mm, buffer);
+ if (task->sighand)
+ buffer = task_sig(task, buffer);
buffer = task_cap(task, buffer);
#if defined(CONFIG_ARCH_S390)
buffer = task_show_regs(task, buffer);
#endif
+ task_unlock(task);
+ read_unlock(&tasklist_lock);
return buffer - orig;
}
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 19:06 ` Martin J. Bligh
@ 2003-02-16 19:17 ` Linus Torvalds
2003-02-16 21:15 ` Martin J. Bligh
2003-02-16 19:18 ` Manfred Spraul
1 sibling, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2003-02-16 19:17 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List,
Zwane Mwaikambo, Manfred Spraul
On Sun, 16 Feb 2003, Martin J. Bligh wrote:
>
> Well, I did the stupid safe thing, and it hangs the box once we get up to
> a load of 32 with SDET. Below is what I did, the only other issue I can
> see in here is that task_mem takes mm->mmap_sem which is now nested inside
> the task_lock inside tasklist_lock ... but I can't see anywhere that's a
> problem from a quick search
Ho - you can _never_ nest a semaphore inside a spinlock - if the semaphore
sleeps, the spinlock will cause a lockup regardless of any reverse nesting
issues.. So I guess the old "get_task_mm()" code is required anyway.
Linus
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 19:06 ` Martin J. Bligh
2003-02-16 19:17 ` Linus Torvalds
@ 2003-02-16 19:18 ` Manfred Spraul
1 sibling, 0 replies; 23+ messages in thread
From: Manfred Spraul @ 2003-02-16 19:18 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Linus Torvalds, Anton Blanchard, Andrew Morton,
Kernel Mailing List, Zwane Mwaikambo
Martin J. Bligh wrote:
>Well, I did the stupid safe thing, and it hangs the box once we get up to
>a load of 32 with SDET. Below is what I did, the only other issue I can
>see in here is that task_mem takes mm->mmap_sem which is now nested inside
>the task_lock inside tasklist_lock ... but I can't see anywhere that's a
>problem from a quick search
>
>
task_lock is actually spin_lock(&tsk->alloc_lock); mmap_sem is a
semaphore. Nesting means deadlock.
--
Manfred
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 19:17 ` Linus Torvalds
@ 2003-02-16 21:15 ` Martin J. Bligh
2003-02-16 21:21 ` Manfred Spraul
0 siblings, 1 reply; 23+ messages in thread
From: Martin J. Bligh @ 2003-02-16 21:15 UTC (permalink / raw)
To: Linus Torvalds
Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List,
Zwane Mwaikambo, Manfred Spraul
>> Well, I did the stupid safe thing, and it hangs the box once we get up to
>> a load of 32 with SDET. Below is what I did, the only other issue I can
>> see in here is that task_mem takes mm->mmap_sem which is now nested inside
>> the task_lock inside tasklist_lock ... but I can't see anywhere that's a
>> problem from a quick search
>
> Ho - you can _never_ nest a semaphore inside a spinlock - if the semaphore
> sleeps, the spinlock will cause a lockup regardless of any reverse nesting
> issues.. So I guess the old "get_task_mm()" code is required anyway.
True ... staring at unfamiliar code made me forget a few things ;-)
OK, the below patch works, no oopses, no hangs. If this is acceptable
(even if it's not the final solution), could you apply it? Seems to
be better than the current situation, at any rate.
-----------------------------------------------------
We can encounter a race condition where task->sighand can be NULL in
proc_pid_status, resulting in an offset NULL pointer dereference in
render_sigset_t. This can be easily demonstrated running SDET with a
load of 32 to 64 on a large machine (16x NUMA-Q in this case).
This patch makes us take task_lock around the sighand dereferences,
and fixes the oops in testing on the same machine. Thanks to Linus,
Andrew, Manfred and Zwane for lots of guidance ;-)
diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet3/fs/proc/array.c
--- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003
+++ sdet3/fs/proc/array.c Sun Feb 16 11:44:24 2003
@@ -252,8 +252,11 @@ int proc_pid_status(struct task_struct *
buffer = task_mem(mm, buffer);
mmput(mm);
}
- buffer = task_sig(task, buffer);
+ task_lock(task);
+ if (task->sighand)
+ buffer = task_sig(task, buffer);
buffer = task_cap(task, buffer);
+ task_unlock(task);
#if defined(CONFIG_ARCH_S390)
buffer = task_show_regs(task, buffer);
#endif
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 21:15 ` Martin J. Bligh
@ 2003-02-16 21:21 ` Manfred Spraul
2003-02-16 22:34 ` Linus Torvalds
0 siblings, 1 reply; 23+ messages in thread
From: Manfred Spraul @ 2003-02-16 21:21 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Linus Torvalds, Anton Blanchard, Andrew Morton,
Kernel Mailing List, Zwane Mwaikambo
Martin J. Bligh wrote:
>diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet3/fs/proc/array.c
>--- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003
>+++ sdet3/fs/proc/array.c Sun Feb 16 11:44:24 2003
>@@ -252,8 +252,11 @@ int proc_pid_status(struct task_struct *
> buffer = task_mem(mm, buffer);
> mmput(mm);
> }
>- buffer = task_sig(task, buffer);
>+ task_lock(task);
>+ if (task->sighand)
>+ buffer = task_sig(task, buffer);
> buffer = task_cap(task, buffer);
>+ task_unlock(task);
> #if defined(CONFIG_ARCH_S390)
> buffer = task_show_regs(task, buffer);
> #endif
>
>
I think it's needed for 2.4, too.
--
Manfred
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 21:21 ` Manfred Spraul
@ 2003-02-16 22:34 ` Linus Torvalds
2003-02-16 23:08 ` Martin J. Bligh
0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2003-02-16 22:34 UTC (permalink / raw)
To: Manfred Spraul
Cc: Martin J. Bligh, Anton Blanchard, Andrew Morton,
Kernel Mailing List, Zwane Mwaikambo
On Sun, 16 Feb 2003, Manfred Spraul wrote:
> Martin J. Bligh wrote:
>
> >diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet3/fs/proc/array.c
> >--- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003
> >+++ sdet3/fs/proc/array.c Sun Feb 16 11:44:24 2003
> >@@ -252,8 +252,11 @@ int proc_pid_status(struct task_struct *
> > buffer = task_mem(mm, buffer);
> > mmput(mm);
> > }
> >- buffer = task_sig(task, buffer);
> >+ task_lock(task);
> >+ if (task->sighand)
> >+ buffer = task_sig(task, buffer);
> > buffer = task_cap(task, buffer);
> >+ task_unlock(task);
> > #if defined(CONFIG_ARCH_S390)
> > buffer = task_show_regs(task, buffer);
> > #endif
> >
> >
> I think it's needed for 2.4, too.
It's not wrong, but it shouldn't help. Simply because "task_lock()"
isn't even relevant to "task->sighand" as it stands now. It will be
cleared without holding the task lock, as far as I can see.
So I suspect it fixes things for Martin only because it changes timings
enough not to hit the race.
Linus
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 22:34 ` Linus Torvalds
@ 2003-02-16 23:08 ` Martin J. Bligh
2003-02-16 23:32 ` Linus Torvalds
0 siblings, 1 reply; 23+ messages in thread
From: Martin J. Bligh @ 2003-02-16 23:08 UTC (permalink / raw)
To: Linus Torvalds, Manfred Spraul
Cc: Anton Blanchard, Andrew Morton, Kernel Mailing List,
Zwane Mwaikambo
>> > diff -urpN -X /home/fletch/.diff.exclude virgin/fs/proc/array.c sdet3/fs/proc/array.c
>> > --- virgin/fs/proc/array.c Sat Feb 15 16:11:45 2003
>> > +++ sdet3/fs/proc/array.c Sun Feb 16 11:44:24 2003
>> > @@ -252,8 +252,11 @@ int proc_pid_status(struct task_struct *
>> > buffer = task_mem(mm, buffer);
>> > mmput(mm);
>> > }
>> > - buffer = task_sig(task, buffer);
>> > + task_lock(task);
>> > + if (task->sighand)
>> > + buffer = task_sig(task, buffer);
>> > buffer = task_cap(task, buffer);
>> > + task_unlock(task);
>> > #if defined(CONFIG_ARCH_S390)
>> > buffer = task_show_regs(task, buffer);
>> > #endif
>> >
>> >
>> I think it's needed for 2.4, too.
>
> It's not wrong, but it shouldn't help. Simply because "task_lock()"
> isn't even relevant to "task->sighand" as it stands now. It will be
> cleared without holding the task lock, as far as I can see.
>
> So I suspect it fixes things for Martin only because it changes timings
> enough not to hit the race.
Ah, fair enough ... it's probably the if, rather than the task_lock.
So what does protect sighand? tasklist_lock? It doesn't seem to
as people do things like:
spin_unlock_irq(&current->sighand->siglock);
all the time ... so is it just protected by good faith and the direction
of the wind?
M.
* Re: Fw: 2.5.61 oops running SDET
2003-02-16 23:08 ` Martin J. Bligh
@ 2003-02-16 23:32 ` Linus Torvalds
0 siblings, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2003-02-16 23:32 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Manfred Spraul, Anton Blanchard, Andrew Morton,
Kernel Mailing List, Zwane Mwaikambo, Ingo Molnar
On Sun, 16 Feb 2003, Martin J. Bligh wrote:
>
> So what does protect sighand? tasklist_lock?
Yup. At least right now. And we do tend to hold tasklist_lock in most
cases simply by virtue of having to get it anyway in order to search for
the process.
> It doesn't seem to
> as people do things like:
>
> spin_unlock_irq(&current->sighand->siglock);
>
> all the time ... so is it just protected by good faith and the direction
> of the wind?
Agreed. That, and the fact that most of the time the stuff _is_ there:
obviously any time the process looks at its own signals it will always be
there.
So we have two cases:
- "normal" signal sending which holds tasklist_lock to find the target.
- signals to the process itself, which is usually safe, since we know
we're there (the exception would be if/when taking a fault in the exit
path, but even that might be ok since signals are torn down not by exit
itself but by "release_task")
So the signal path may actually be ok.
Linus
Thread overview: 23+ messages
2003-02-16 17:41 Fw: 2.5.61 oops running SDET Manfred Spraul
2003-02-16 18:15 ` Linus Torvalds
2003-02-16 18:45 ` Manfred Spraul
2003-02-16 18:56 ` Linus Torvalds
[not found] <20030215172407.1fdd41fd.akpm@digeo.com>
2003-02-16 1:35 ` Linus Torvalds
2003-02-16 2:09 ` Martin J. Bligh
2003-02-16 2:27 ` Linus Torvalds
2003-02-16 4:00 ` Martin J. Bligh
2003-02-16 13:05 ` Anton Blanchard
2003-02-16 16:39 ` Martin J. Bligh
2003-02-16 18:21 ` Linus Torvalds
2003-02-16 19:06 ` Martin J. Bligh
2003-02-16 19:17 ` Linus Torvalds
2003-02-16 21:15 ` Martin J. Bligh
2003-02-16 21:21 ` Manfred Spraul
2003-02-16 22:34 ` Linus Torvalds
2003-02-16 23:08 ` Martin J. Bligh
2003-02-16 23:32 ` Linus Torvalds
2003-02-16 19:18 ` Manfred Spraul
2003-02-16 18:07 ` Linus Torvalds
2003-02-16 18:26 ` Martin J. Bligh
2003-02-16 18:36 ` Martin J. Bligh
2003-02-16 2:48 ` Zwane Mwaikambo