* [PATCH rc1-mm] de_thread: fix deadlockable process addition
@ 2006-04-06 22:04 Oleg Nesterov
2006-04-06 22:58 ` Eric W. Biederman
0 siblings, 1 reply; 23+ messages in thread
From: Oleg Nesterov @ 2006-04-06 22:04 UTC (permalink / raw)
To: Andrew Morton, Eric W. Biederman; +Cc: linux-kernel
On top of
task-make-task-list-manipulations-rcu-safe.patch
This patch
pidhash-kill-switch_exec_pids.patch
changed de_thread() so that it doesn't remove 'leader' from it's thread group.
However de_thread() still adds current to init_task.tasks without removing
'leader' from this list. What if another CLONE_VM task starts do_coredump()
after de_thread() drops tasklist_lock but before it calls release_task(leader) ?
do_coredump()->zap_threads() will find this thread group twice on init_task.tasks
list. And it will increment mm->core_waiters twice for the new leader (current in
de_thread). So, exit_mm()->complete(mm->core_startup_done) doesn't happen, and we
have a deadlock.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
--- MM/fs/exec.c~0_DET 2006-04-06 22:37:33.000000000 +0400
+++ MM/fs/exec.c 2006-04-06 22:51:51.000000000 +0400
@@ -713,7 +713,7 @@ static int de_thread(struct task_struct
attach_pid(current, PIDTYPE_PID, current->pid);
attach_pid(current, PIDTYPE_PGID, current->signal->pgrp);
attach_pid(current, PIDTYPE_SID, current->signal->session);
- list_add_tail_rcu(¤t->tasks, &init_task.tasks);
+ list_replace_rcu(&leader->tasks, ¤t->tasks);
current->parent = current->real_parent = leader->real_parent;
leader->parent = leader->real_parent = child_reaper;
--- MM/kernel/exit.c~0_DET 2006-03-23 23:02:53.000000000 +0300
+++ MM/kernel/exit.c 2006-04-06 23:01:37.000000000 +0400
@@ -55,7 +55,9 @@ static void __unhash_process(struct task
detach_pid(p, PIDTYPE_PGID);
detach_pid(p, PIDTYPE_SID);
- list_del_rcu(&p->tasks);
+ /* see de_thread()->list_replace_rcu() */
+ if (likely(p->tasks.prev != LIST_POISON2))
+ list_del_rcu(&p->tasks);
__get_cpu_var(process_counts)--;
}
list_del_rcu(&p->thread_group);
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH rc1-mm] de_thread: fix deadlockable process addition
2006-04-06 22:04 [PATCH rc1-mm] de_thread: fix deadlockable process addition Oleg Nesterov
@ 2006-04-06 22:58 ` Eric W. Biederman
2006-04-07 23:46 ` Oleg Nesterov
0 siblings, 1 reply; 23+ messages in thread
From: Eric W. Biederman @ 2006-04-06 22:58 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Andrew Morton, linux-kernel
Oleg Nesterov <oleg@tv-sign.ru> writes:
> On top of
> task-make-task-list-manipulations-rcu-safe.patch
>
> This patch
> pidhash-kill-switch_exec_pids.patch
>
> changed de_thread() so that it doesn't remove 'leader' from it's thread group.
> However de_thread() still adds current to init_task.tasks without removing
> 'leader' from this list. What if another CLONE_VM task starts do_coredump()
> after de_thread() drops tasklist_lock but before it calls release_task(leader) ?
>
> do_coredump()->zap_threads() will find this thread group twice on
> init_task.tasks
> list. And it will increment mm->core_waiters twice for the new leader (current
> in
> de_thread). So, exit_mm()->complete(mm->core_startup_done) doesn't happen, and
> we
> have a deadlock.
Ack. The evils of de_thread!
We need this to keep from seeing the same task twice in
do_each_thread.
This bug is in 2.6.17-rc1 so this code needs to go to Linus
sometime soon.
Eric
> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
>
> --- MM/fs/exec.c~0_DET 2006-04-06 22:37:33.000000000 +0400
> +++ MM/fs/exec.c 2006-04-06 22:51:51.000000000 +0400
> @@ -713,7 +713,7 @@ static int de_thread(struct task_struct
> attach_pid(current, PIDTYPE_PID, current->pid);
> attach_pid(current, PIDTYPE_PGID, current->signal->pgrp);
> attach_pid(current, PIDTYPE_SID, current->signal->session);
> - list_add_tail_rcu(¤t->tasks, &init_task.tasks);
> + list_replace_rcu(&leader->tasks, ¤t->tasks);
>
> current->parent = current->real_parent = leader->real_parent;
> leader->parent = leader->real_parent = child_reaper;
> --- MM/kernel/exit.c~0_DET 2006-03-23 23:02:53.000000000 +0300
> +++ MM/kernel/exit.c 2006-04-06 23:01:37.000000000 +0400
> @@ -55,7 +55,9 @@ static void __unhash_process(struct task
> detach_pid(p, PIDTYPE_PGID);
> detach_pid(p, PIDTYPE_SID);
>
> - list_del_rcu(&p->tasks);
> + /* see de_thread()->list_replace_rcu() */
> + if (likely(p->tasks.prev != LIST_POISON2))
> + list_del_rcu(&p->tasks);
> __get_cpu_var(process_counts)--;
> }
> list_del_rcu(&p->thread_group);
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH rc1-mm] de_thread: fix deadlockable process addition
2006-04-07 23:46 ` Oleg Nesterov
@ 2006-04-07 22:51 ` Andrew Morton
2006-04-07 22:56 ` Andrew Morton
0 siblings, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2006-04-07 22:51 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: ebiederm, linux-kernel
Oleg Nesterov <oleg@tv-sign.ru> wrote:
>
> - if (likely(p->tasks.prev != LIST_POISON2))
> + if (likely(p->tasks.prev != LIST_POISON2)) {
argh.
c'mon guys, we can't put a dependency on list_head poisoning into generic
code.
We have one in detach_timer() but that's just debugging.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH rc1-mm] de_thread: fix deadlockable process addition
2006-04-07 22:51 ` Andrew Morton
@ 2006-04-07 22:56 ` Andrew Morton
2006-04-08 7:55 ` Eric W. Biederman
0 siblings, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2006-04-07 22:56 UTC (permalink / raw)
To: oleg, ebiederm, linux-kernel
Andrew Morton <akpm@osdl.org> wrote:
>
> Oleg Nesterov <oleg@tv-sign.ru> wrote:
> >
> > - if (likely(p->tasks.prev != LIST_POISON2))
> > + if (likely(p->tasks.prev != LIST_POISON2)) {
>
> argh.
>
> c'mon guys, we can't put a dependency on list_head poisoning into generic
> code.
>
A suitable fix might be to add a new list_del_poison() (or
list_del_rcu_something()?) and use that everywhere.
But it should use a different poisoning pattern, so we know that the kernel
will still work correctly when someone removes the list_head debugging.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH rc1-mm] de_thread: fix deadlockable process addition
2006-04-06 22:58 ` Eric W. Biederman
@ 2006-04-07 23:46 ` Oleg Nesterov
2006-04-07 22:51 ` Andrew Morton
0 siblings, 1 reply; 23+ messages in thread
From: Oleg Nesterov @ 2006-04-07 23:46 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Andrew Morton, linux-kernel
On 04/06, Eric W. Biederman wrote:
>
> Ack. The evils of de_thread!
Yes, just noticed another thing,
[PATCH] task-make-task-list-manipulations-rcu-safe-fix-fix
We shoudn't decrement process_counts if de_thread() unhashed the task.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
--- MM/kernel/exit.c~ 2006-04-06 23:01:37.000000000 +0400
+++ MM/kernel/exit.c 2006-04-08 03:35:24.000000000 +0400
@@ -56,9 +56,10 @@ static void __unhash_process(struct task
detach_pid(p, PIDTYPE_SID);
/* see de_thread()->list_replace_rcu() */
- if (likely(p->tasks.prev != LIST_POISON2))
+ if (likely(p->tasks.prev != LIST_POISON2)) {
list_del_rcu(&p->tasks);
- __get_cpu_var(process_counts)--;
+ __get_cpu_var(process_counts)--;
+ }
}
list_del_rcu(&p->thread_group);
remove_parent(p);
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH rc1-mm] de_thread: fix deadlockable process addition
2006-04-07 22:56 ` Andrew Morton
@ 2006-04-08 7:55 ` Eric W. Biederman
2006-04-08 17:27 ` Oleg Nesterov
0 siblings, 1 reply; 23+ messages in thread
From: Eric W. Biederman @ 2006-04-08 7:55 UTC (permalink / raw)
To: Andrew Morton; +Cc: oleg, ebiederm, linux-kernel
Andrew Morton <akpm@osdl.org> writes:
> Andrew Morton <akpm@osdl.org> wrote:
>>
>> Oleg Nesterov <oleg@tv-sign.ru> wrote:
>> >
>> > - if (likely(p->tasks.prev != LIST_POISON2))
>> > + if (likely(p->tasks.prev != LIST_POISON2)) {
>>
>> argh.
>>
>> c'mon guys, we can't put a dependency on list_head poisoning into generic
>> code.
>>
>
> A suitable fix might be to add a new list_del_poison() (or
> list_del_rcu_something()?) and use that everywhere.
>
> But it should use a different poisoning pattern, so we know that the kernel
> will still work correctly when someone removes the list_head debugging.
Agreed. That is ugly. I would recommend some new functions
list functions but in thinking about this I believe I see
how we can avoid this case completely.
The first step is to optimize thread_group_leader to be
defined in terms of tasks and not process ids. Which
is one less pointer dereference.
The second step is to modify de_thread to reduce the
old thread group leader to a thread. This requires changing the
leaders parents, changing the leaders thread_group leader, unhashing
the leader from the process group and session, and removing
the leader from the task list.
With those two changes exit.c should not need to account for
the de_thread case.
Oleg please take a hard look and see if you can find anything
that will break with the patch below.
I believe that is all that is needed to cleanly keep do_each_thread
from seeing a single thread multiple times.
Eric
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 541f482..2964a2c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1203,7 +1203,7 @@ extern void wait_task_inactive(task_t *
#define while_each_thread(g, t) \
while ((t = next_thread(t)) != g)
-#define thread_group_leader(p) (p->pid == p->tgid)
+#define thread_group_leader(p) (p == p->group_leader)
static inline task_t *next_thread(task_t *p)
{
diff --git a/fs/exec.c b/fs/exec.c
index 0291a68..9b0f9c4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -721,9 +721,14 @@ static int de_thread(struct task_struct
list_add_tail(¤t->tasks, &init_task.tasks);
current->parent = current->real_parent = leader->real_parent;
- leader->parent = leader->real_parent = child_reaper;
+ leader->parent = leader->real_parent = current;
current->group_leader = current;
- leader->group_leader = leader;
+ leader->group_leader = current;
+
+ /* Reduce leader to a thread */
+ detach_pid(leader, PIDTYPE_PGID, current->signal->pgrp);
+ detach_pid(leader, PIDTYPE_SID current->signal->session);
+ list_del_init(&leader->tasks);
add_parent(current);
add_parent(leader);
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH rc1-mm] de_thread: fix deadlockable process addition
2006-04-08 17:27 ` Oleg Nesterov
@ 2006-04-08 16:07 ` Eric W. Biederman
2006-04-08 21:13 ` Oleg Nesterov
2006-04-10 23:07 ` [PATCH] de_thread: Don't confuse users do_each_thread Eric W. Biederman
1 sibling, 1 reply; 23+ messages in thread
From: Eric W. Biederman @ 2006-04-08 16:07 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Andrew Morton, linux-kernel
Oleg Nesterov <oleg@tv-sign.ru> writes:
> On 04/08, Eric W. Biederman wrote:
>>
>> Agreed. That is ugly.
>
> Yes, I agree also.
>
>>
>> -#define thread_group_leader(p) (p->pid == p->tgid)
>> +#define thread_group_leader(p) (p == p->group_leader)
>>
>> ...
>>
>> - leader->group_leader = leader;
>> + leader->group_leader = current;
>
> I thought about similar change too, but I am unsure about
> release_task(old_leader)->proc_flush_task() path (because
> I don't understand this code).
proc_flush_task() is purely an optimization to kill the
proc dcache entries when a process exits. So we don't
plug up the dcache with old proc entries.
All it looks at currently and pid an tgid.
So proc_flush_task() should not be a non-issue.
proc_pid_unhash() in -linus is a little more
serious but it only unhashes things.
> This change can confuse next_tid(), but this is minor.
> I don't see other problems.
next_tid?
> However, I think we can do something better instead of
> attach_pid(current)/detach_pid(leader):
>
> void exec_pid(task_t *old, task_t * new, enum pid_type type)
> {
> new->pids[type].pid = old->pids[type].pid;
> hlist_replace_rcu(&old->pids[type].node, &new->pids[type].node);
> old->pids[type].pid = NULL;
> }
>
> So de_thread() can do
>
> exec_pid(leader, current, PIDTYPE_PGID);
> exec_pid(leader, current, PIDTYPE_SID);
>
> This allows us to iterate over pgrp/session lockless without
> seeing the same task twice, btw. But may be it is just unneeded
> complication.
I think it may be worthwhile. Currently we can't do any process
group or session traversal lockless so it isn't worth looking
at until we get the bug fix handled.
Ultimately I think I would like PGID and SID to live in ->signal
in which case we would not need to touch them at all.
>> This requires changing the leaders parents
>>
>> current->parent = current->real_parent = leader->real_parent;
>> - leader->parent = leader->real_parent = child_reaper;
>> + leader->parent = leader->real_parent = current;
>> current->group_leader = current;
>
> I don't understand why do we need this change.
I was just trying to come as close as I could to normal
thread semantics. The fewer special cases in de_thread,
the fewer problems it is.
> Actually, I think leader doesn't need reparenting at all.
> ptrace_unlink(leader) already restored leader->parent = ->real_parent
> and ->sibling. So I think we can do (for review only, should go in a
> separate patch) this:
Duh. All processes in a thread group share the same real_parent.
It looks like the only practical thing we accomplish
with remove_parent/add_parent is to change our place on
our parents list of children.
Since ptrace_link and ptrace_unlink already does this we don't have a
guaranteed order on that list, so skipping the order change
should definitely be safe.
This means your patch doesn't go far enough. We should be
able to kill all of the parent list manipulation in
de_thread. Doing reduces the places that assign
real_parent to just fork and exit.
Eric
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH rc1-mm] de_thread: fix deadlockable process addition
2006-04-08 7:55 ` Eric W. Biederman
@ 2006-04-08 17:27 ` Oleg Nesterov
2006-04-08 16:07 ` Eric W. Biederman
2006-04-10 23:07 ` [PATCH] de_thread: Don't confuse users do_each_thread Eric W. Biederman
0 siblings, 2 replies; 23+ messages in thread
From: Oleg Nesterov @ 2006-04-08 17:27 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Andrew Morton, linux-kernel
On 04/08, Eric W. Biederman wrote:
>
> Agreed. That is ugly.
Yes, I agree also.
>
> -#define thread_group_leader(p) (p->pid == p->tgid)
> +#define thread_group_leader(p) (p == p->group_leader)
>
> ...
>
> - leader->group_leader = leader;
> + leader->group_leader = current;
I thought about similar change too, but I am unsure about
release_task(old_leader)->proc_flush_task() path (because
I don't understand this code).
This change can confuse next_tid(), but this is minor.
I don't see other problems.
However, I think we can do something better instead of
attach_pid(current)/detach_pid(leader):
void exec_pid(task_t *old, task_t * new, enum pid_type type)
{
new->pids[type].pid = old->pids[type].pid;
hlist_replace_rcu(&old->pids[type].node, &new->pids[type].node);
old->pids[type].pid = NULL;
}
So de_thread() can do
exec_pid(leader, current, PIDTYPE_PGID);
exec_pid(leader, current, PIDTYPE_SID);
This allows us to iterate over pgrp/session lockless without
seeing the same task twice, btw. But may be it is just unneeded
complication.
> This requires changing the leaders parents
>
> current->parent = current->real_parent = leader->real_parent;
> - leader->parent = leader->real_parent = child_reaper;
> + leader->parent = leader->real_parent = current;
> current->group_leader = current;
I don't understand why do we need this change.
Actually, I think leader doesn't need reparenting at all.
ptrace_unlink(leader) already restored leader->parent = ->real_parent
and ->sibling. So I think we can do (for review only, should go in a
separate patch) this:
--- MM/fs/exec.c~ 2006-04-08 02:19:15.000000000 +0400
+++ MM/fs/exec.c 2006-04-08 18:50:35.000000000 +0400
@@ -704,7 +704,6 @@ static int de_thread(struct task_struct
ptrace_unlink(current);
ptrace_unlink(leader);
remove_parent(current);
- remove_parent(leader);
/* Become a process group leader with the old leader's pid.
@@ -718,13 +717,11 @@ static int de_thread(struct task_struct
attach_pid(current, PIDTYPE_SID, current->signal->session);
list_replace_rcu(&leader->tasks, ¤t->tasks);
- current->parent = current->real_parent = leader->real_parent;
- leader->parent = leader->real_parent = child_reaper;
+ current->parent = current->real_parent = leader->parent;
current->group_leader = current;
leader->group_leader = leader;
add_parent(current);
- add_parent(leader);
if (ptrace) {
current->ptrace = ptrace;
__ptrace_link(current, parent);
Oleg.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH rc1-mm] de_thread: fix deadlockable process addition
2006-04-08 16:07 ` Eric W. Biederman
@ 2006-04-08 21:13 ` Oleg Nesterov
2006-04-08 21:23 ` Oleg Nesterov
0 siblings, 1 reply; 23+ messages in thread
From: Oleg Nesterov @ 2006-04-08 21:13 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Andrew Morton, linux-kernel
On 04/08, Eric W. Biederman wrote:
> Oleg Nesterov <oleg@tv-sign.ru> writes:
>
> > This change can confuse next_tid(), but this is minor.
> > I don't see other problems.
>
> next_tid?
proc_task_readdir:
first_tid() returns old_leader
next_tid() returns new_leader
de_thread:
old_leader->group_leader = new_leader;
next_rid() returns old_leader again,
because it is not thread_group_leader()
anymore
> This means your patch doesn't go far enough. We should be
> able to kill all of the parent list manipulation in
> de_thread. Doing reduces the places that assign
> real_parent to just fork and exit.
Yes!
I think I understand why we had the reason to reparent 'leader'
in the past. We used to set leader->exit_state = EXIT_ZOMBIE,
so without reparenting current's parent could have a bogus do_wait()
result if this do_wait() happens before release_task(leader).
Now we set leader->exit_state = EXIT_DEAD, which means this task
is not visible to do_wait().
Oleg.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH rc1-mm] de_thread: fix deadlockable process addition
2006-04-08 21:13 ` Oleg Nesterov
@ 2006-04-08 21:23 ` Oleg Nesterov
2006-04-10 22:11 ` Eric W. Biederman
0 siblings, 1 reply; 23+ messages in thread
From: Oleg Nesterov @ 2006-04-08 21:23 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Andrew Morton, linux-kernel
On 04/09, Oleg Nesterov wrote:
>
> proc_task_readdir:
>
> first_tid() returns old_leader
>
> next_tid() returns new_leader
>
> de_thread:
> old_leader->group_leader = new_leader;
>
>
> next_rid() returns old_leader again,
> because it is not thread_group_leader()
> anymore
I think something like this for next_tid() is sufficient:
- if (thread_group_leader(pos))
+ if (pos->pid == pos->tgid)
We can also do the same change in first_tgid().
Oleg.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH rc1-mm] de_thread: fix deadlockable process addition
2006-04-08 21:23 ` Oleg Nesterov
@ 2006-04-10 22:11 ` Eric W. Biederman
0 siblings, 0 replies; 23+ messages in thread
From: Eric W. Biederman @ 2006-04-10 22:11 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Andrew Morton, linux-kernel
Oleg Nesterov <oleg@tv-sign.ru> writes:
> On 04/09, Oleg Nesterov wrote:
>>
>> proc_task_readdir:
>>
>> first_tid() returns old_leader
>>
>> next_tid() returns new_leader
>>
>> de_thread:
>> old_leader->group_leader = new_leader;
>>
>>
>> next_rid() returns old_leader again,
>> because it is not thread_group_leader()
>> anymore
>
> I think something like this for next_tid() is sufficient:
>
> - if (thread_group_leader(pos))
> + if (pos->pid == pos->tgid)
>
> We can also do the same change in first_tgid().
But the loop will terminate when things settle down,
and we can always use my original test of pos == start->group_leader.
I was worried about the stable kernel case and next_tid was
such a generic name I failed to associate it with /proc.
Short of confusion (and de_thread invariably introduces that)
I don't see anything that the code will actually do wrong.
Eric
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH] de_thread: Don't confuse users do_each_thread.
2006-04-08 17:27 ` Oleg Nesterov
2006-04-08 16:07 ` Eric W. Biederman
@ 2006-04-10 23:07 ` Eric W. Biederman
2006-04-10 23:16 ` Eric W. Biederman
1 sibling, 1 reply; 23+ messages in thread
From: Eric W. Biederman @ 2006-04-10 23:07 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Oleg Nesterov, Andrew Morton, linux-kernel
Oleg Nesterov spotted two interesting bugs with the current de_thread
code. The simplest is a long standing double decrement of
__get_cpu_var(process_counts) in __unhash_process. Caused by
two processes exiting when only one was created.
The other is that since we no longer detach from the thread_group list
it is possible for do_each_thread when run under the tasklist_lock to
see the same task_struct twice. Once on the task list as a
thread_group_leader, and once on the thread list of another
thread.
The double appearance in do_each_thread can cause a double increment
of mm_core_waiters in zap_threads resulting in problems later on in
coredump_wait.
To remedy those two problems this patch takes the simple approach
of changing the old thread group leader into a child thread.
The only routine in release_task that cares is __unhash_process,
and it can be trivially seen that we handle cleaning up a
thread group leader properly.
Since de_thread doesn't change the pid of the exiting leader process
and instead shares it with the new leader process. I change
thread_group_leader to recognize group leadership based on the
group_leader field and not based on pids. This should also be
slightly cheaper then the existing thread_group_leader macro.
I performed a quick audit and I couldn't see any user of
thread_group_leader that cared how cared about the difference.
I believe this is 2.6.17 material as the bug is present in
2.6.17-rc1 and the fix is simple.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
fs/exec.c | 7 ++++++-
include/linux/sched.h | 3 ++-
2 files changed, 8 insertions(+), 2 deletions(-)
1432f138ecaedcc0575c91e9b0ec84ef25f3f11e
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 541f482..a3e4f6b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1203,7 +1203,8 @@ #define do_each_thread(g, t) \
#define while_each_thread(g, t) \
while ((t = next_thread(t)) != g)
-#define thread_group_leader(p) (p->pid == p->tgid)
+/* de_thread depends on thread_group_leader not being a pid based check */
+#define thread_group_leader(p) (p == p->group_leader)
static inline task_t *next_thread(task_t *p)
{
diff --git a/fs/exec.c b/fs/exec.c
index 0291a68..1e1641f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -723,7 +723,12 @@ static int de_thread(struct task_struct
current->parent = current->real_parent = leader->real_parent;
leader->parent = leader->real_parent = child_reaper;
current->group_leader = current;
- leader->group_leader = leader;
+ leader->group_leader = current;
+
+ /* Reduce leader to a thread */
+ detach_pid(leader, PIDTYPE_PGID, current->signal->pgrp);
+ detach_pid(leader, PIDTYPE_SID, current->signal->session);
+ list_del_init(&leader->tasks);
add_parent(current);
add_parent(leader);
--
1.3.0.rc3.g6b3a
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH] de_thread: Don't confuse users do_each_thread.
2006-04-10 23:07 ` [PATCH] de_thread: Don't confuse users do_each_thread Eric W. Biederman
@ 2006-04-10 23:16 ` Eric W. Biederman
2006-04-10 23:52 ` Ingo Oeser
2006-04-11 10:05 ` Oleg Nesterov
0 siblings, 2 replies; 23+ messages in thread
From: Eric W. Biederman @ 2006-04-10 23:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Oleg Nesterov, Andrew Morton, linux-kernel
I can't seem to send out a correct patch out today with out
sending it twice. I accidently grabbed my old version
that sent to many arguments to detach_pid and so would not
compile. Oops.
---
Oleg Nesterov spotted two interesting bugs with the current de_thread
code. The simplest is a long standing double decrement of
__get_cpu_var(process_counts) in __unhash_process. Caused by
two processes exiting when only one was created.
The other is that since we no longer detach from the thread_group list
it is possible for do_each_thread when run under the tasklist_lock to
see the same task_struct twice. Once on the task list as a
thread_group_leader, and once on the thread list of another
thread.
The double appearance in do_each_thread can cause a double increment
of mm_core_waiters in zap_threads resulting in problems later on in
coredump_wait.
To remedy those two problems this patch takes the simple approach
of changing the old thread group leader into a child thread.
The only routine in release_task that cares is __unhash_process,
and it can be trivially seen that we handle cleaning up a
thread group leader properly.
Since de_thread doesn't change the pid of the exiting leader process
and instead shares it with the new leader process. I change
thread_group_leader to recognize group leadership based on the
group_leader field and not based on pids. This should also be
slightly cheaper then the existing thread_group_leader macro.
I performed a quick audit and I couldn't see any user of
thread_group_leader that cared how cared about the difference.
I believe this is 2.6.17 material as the bug is present in
2.6.17-rc1 and the fix is simple.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
fs/exec.c | 7 ++++++-
include/linux/sched.h | 3 ++-
2 files changed, 8 insertions(+), 2 deletions(-)
e621f800b1de6684beb995014190b3ccaad5c0b3
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 541f482..a3e4f6b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1203,7 +1203,8 @@ #define do_each_thread(g, t) \
#define while_each_thread(g, t) \
while ((t = next_thread(t)) != g)
-#define thread_group_leader(p) (p->pid == p->tgid)
+/* de_thread depends on thread_group_leader not being a pid based check */
+#define thread_group_leader(p) (p == p->group_leader)
static inline task_t *next_thread(task_t *p)
{
diff --git a/fs/exec.c b/fs/exec.c
index 0291a68..4d38ad0 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -723,7 +723,12 @@ static int de_thread(struct task_struct
current->parent = current->real_parent = leader->real_parent;
leader->parent = leader->real_parent = child_reaper;
current->group_leader = current;
- leader->group_leader = leader;
+ leader->group_leader = current;
+
+ /* Reduce leader to a thread */
+ detach_pid(leader, PIDTYPE_PGID);
+ detach_pid(leader, PIDTYPE_SID);
+ list_del_init(&leader->tasks);
add_parent(current);
add_parent(leader);
--
1.3.0.rc3.g6b3a
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH] de_thread: Don't confuse users do_each_thread.
2006-04-10 23:16 ` Eric W. Biederman
@ 2006-04-10 23:52 ` Ingo Oeser
2006-04-11 6:18 ` Eric W. Biederman
2006-04-11 10:19 ` Oleg Nesterov
2006-04-11 10:05 ` Oleg Nesterov
1 sibling, 2 replies; 23+ messages in thread
From: Ingo Oeser @ 2006-04-10 23:52 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Linus Torvalds, Oleg Nesterov, Andrew Morton, linux-kernel
Hi Eric,
On Tuesday, 11. April 2006 01:16, you wrote:
> I can't seem to send out a correct patch out today with out
> sending it twice. I accidently grabbed my old version
> that sent to many arguments to detach_pid and so would not
> compile. Oops.
While you are at it: Could you please avoid calculating current over
and over again?
Just calculate it once and use the task_struct pointer.
If I do this for some random occurances of "current" in fs/exec.c I get this
on Linus' latest git tree:
text data bss dec hex filename
9501 84 12 9597 257d old/fs/exec.o
9386 84 12 9482 250a new/fs/exec.o
Do you see any side effects, except reduced size
and increased performance here?
Is a patch doing this for the whole file welcome, if you are too busy?
Regards
Ingo Oeser
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] de_thread: Don't confuse users do_each_thread.
2006-04-11 10:05 ` Oleg Nesterov
@ 2006-04-11 5:23 ` Andrew Morton
2006-04-11 10:47 ` Oleg Nesterov
2006-04-11 6:23 ` Eric W. Biederman
1 sibling, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2006-04-11 5:23 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: ebiederm, torvalds, linux-kernel
Oleg Nesterov <oleg@tv-sign.ru> wrote:
>
> Andrew, could you please drop these ones:
>
> task-make-task-list-manipulations-rcu-safe-fix.patch
> task-make-task-list-manipulations-rcu-safe-fix-fix.patch
plop, plop.
> Then we need this "patch" for de_thread:
>
> - list_add_tail_rcu(¤t->tasks, &init_task.tasks);
> + list_replace_rcu(&leader->tasks, ¤t->tasks);
> ...
> - list_del_init(&leader->tasks);
OK, please send patch.
> Currently I don't know how the code looks in -mm tree,
http://www.zip.com.au/~akpm/linux/patches/stuff/x.bz2 is -mm-as-of-now.
> I lost the plot.
What plot?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] de_thread: Don't confuse users do_each_thread.
2006-04-10 23:52 ` Ingo Oeser
@ 2006-04-11 6:18 ` Eric W. Biederman
2006-04-11 10:19 ` Oleg Nesterov
1 sibling, 0 replies; 23+ messages in thread
From: Eric W. Biederman @ 2006-04-11 6:18 UTC (permalink / raw)
To: Ingo Oeser; +Cc: Linus Torvalds, Oleg Nesterov, Andrew Morton, linux-kernel
Ingo Oeser <ioe-lkml@rameria.de> writes:
> Hi Eric,
>
> On Tuesday, 11. April 2006 01:16, you wrote:
>> I can't seem to send out a correct patch out today with out
>> sending it twice. I accidently grabbed my old version
>> that sent to many arguments to detach_pid and so would not
>> compile. Oops.
>
> While you are at it: Could you please avoid calculating current over
> and over again?
>
> Just calculate it once and use the task_struct pointer.
>
> If I do this for some random occurances of "current" in fs/exec.c I get this
> on Linus' latest git tree:
>
> text data bss dec hex filename
> 9501 84 12 9597 257d old/fs/exec.o
> 9386 84 12 9482 250a new/fs/exec.o
>
> Do you see any side effects, except reduced size
> and increased performance here?
Not that I know of. I was cleaning up pids and signaling and wound
up in fixing de_thread. It is just an ugly beast.
> Is a patch doing this for the whole file welcome, if you are too busy?
I could probably spare a couple of minutes to help review such a patch.
Beyond de_thread I'm to busy to write such a patch.
Eric
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] de_thread: Don't confuse users do_each_thread.
2006-04-11 10:05 ` Oleg Nesterov
2006-04-11 5:23 ` Andrew Morton
@ 2006-04-11 6:23 ` Eric W. Biederman
1 sibling, 0 replies; 23+ messages in thread
From: Eric W. Biederman @ 2006-04-11 6:23 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Andrew Morton, linux-kernel
Oleg Nesterov <oleg@tv-sign.ru> writes:
> On 04/10, Eric W. Biederman wrote:
>>
>> I believe this is 2.6.17 material as the bug is present in
>> 2.6.17-rc1 and the fix is simple.
>>
>> ...
>>
>> + list_del_init(&leader->tasks);
>
> I beleive this is ok for 2.6.17-rc1, but this breaks lockless
> for_each_process/while_each_thread (I am talking about -mm tree).
Agreed.
> Andrew, could you please drop these ones:
>
> task-make-task-list-manipulations-rcu-safe-fix.patch
> task-make-task-list-manipulations-rcu-safe-fix-fix.patch
>
> Then we need this "patch" for de_thread:
>
> - list_add_tail_rcu(¤t->tasks, &init_task.tasks);
> + list_replace_rcu(&leader->tasks, ¤t->tasks);
> ...
> - list_del_init(&leader->tasks);
>
> Currently I don't know how the code looks in -mm tree, I lost the plot.
Since the patches don't conflict on context I bet they are
all in there, at the moment. I am just about to see if I can
sort that out.
Eric
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] de_thread: Don't confuse users do_each_thread.
2006-04-11 10:19 ` Oleg Nesterov
@ 2006-04-11 7:25 ` Ingo Oeser
2006-04-11 7:36 ` Eric W. Biederman
0 siblings, 1 reply; 23+ messages in thread
From: Ingo Oeser @ 2006-04-11 7:25 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, linux-kernel
On Tuesday, 11. April 2006 12:19, Oleg Nesterov wrote:
> On 04/11, Ingo Oeser wrote:
> >
> > While you are at it: Could you please avoid calculating current over
> > and over again?
> >
> > Just calculate it once and use the task_struct pointer.
>
> Ironically, de_thread() has 'tsk' parameter which is equal to 'current'.
That's the thing that made me doubt :-)
But thinking more about it: current cannot change within one thread,
right? So all of this can be cleaned up.
I'll clean them up tonight and sent out a patch against current -mm.
Regards
Ingo Oeser
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] de_thread: Don't confuse users do_each_thread.
2006-04-11 7:25 ` Ingo Oeser
@ 2006-04-11 7:36 ` Eric W. Biederman
2006-04-11 19:50 ` Ingo Oeser
0 siblings, 1 reply; 23+ messages in thread
From: Eric W. Biederman @ 2006-04-11 7:36 UTC (permalink / raw)
To: Ingo Oeser; +Cc: Oleg Nesterov, Linus Torvalds, Andrew Morton, linux-kernel
Ingo Oeser <ioe-lkml@rameria.de> writes:
> On Tuesday, 11. April 2006 12:19, Oleg Nesterov wrote:
>> On 04/11, Ingo Oeser wrote:
>> >
>> > While you are at it: Could you please avoid calculating current over
>> > and over again?
>> >
>> > Just calculate it once and use the task_struct pointer.
>>
>> Ironically, de_thread() has 'tsk' parameter which is equal to 'current'.
>
> That's the thing that made me doubt :-)
>
> But thinking more about it: current cannot change within one thread,
> right? So all of this can be cleaned up.
>
> I'll clean them up tonight and sent out a patch against current -mm.
Please skip de_thread. I will catch that one, in a minute. There is
enough churn on that function that you are likely to patch the wrong
version.
Eric
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] de_thread: Don't confuse users do_each_thread.
2006-04-10 23:16 ` Eric W. Biederman
2006-04-10 23:52 ` Ingo Oeser
@ 2006-04-11 10:05 ` Oleg Nesterov
2006-04-11 5:23 ` Andrew Morton
2006-04-11 6:23 ` Eric W. Biederman
1 sibling, 2 replies; 23+ messages in thread
From: Oleg Nesterov @ 2006-04-11 10:05 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linus Torvalds, Andrew Morton, linux-kernel
On 04/10, Eric W. Biederman wrote:
>
> I believe this is 2.6.17 material as the bug is present in
> 2.6.17-rc1 and the fix is simple.
>
> ...
>
> + list_del_init(&leader->tasks);
I beleive this is ok for 2.6.17-rc1, but this breaks lockless
for_each_process/while_each_thread (I am talking about -mm tree).
Andrew, could you please drop these ones:
task-make-task-list-manipulations-rcu-safe-fix.patch
task-make-task-list-manipulations-rcu-safe-fix-fix.patch
Then we need this "patch" for de_thread:
- list_add_tail_rcu(¤t->tasks, &init_task.tasks);
+ list_replace_rcu(&leader->tasks, ¤t->tasks);
...
- list_del_init(&leader->tasks);
Currently I don't know how the code looks in -mm tree, I lost the plot.
Oleg.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] de_thread: Don't confuse users do_each_thread.
2006-04-10 23:52 ` Ingo Oeser
2006-04-11 6:18 ` Eric W. Biederman
@ 2006-04-11 10:19 ` Oleg Nesterov
2006-04-11 7:25 ` Ingo Oeser
1 sibling, 1 reply; 23+ messages in thread
From: Oleg Nesterov @ 2006-04-11 10:19 UTC (permalink / raw)
To: Ingo Oeser; +Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, linux-kernel
On 04/11, Ingo Oeser wrote:
>
> While you are at it: Could you please avoid calculating current over
> and over again?
>
> Just calculate it once and use the task_struct pointer.
Ironically, de_thread() has 'tsk' parameter which is equal to 'current'.
Oleg.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] de_thread: Don't confuse users do_each_thread.
2006-04-11 5:23 ` Andrew Morton
@ 2006-04-11 10:47 ` Oleg Nesterov
0 siblings, 0 replies; 23+ messages in thread
From: Oleg Nesterov @ 2006-04-11 10:47 UTC (permalink / raw)
To: Andrew Morton; +Cc: ebiederm, torvalds, linux-kernel
On 04/10, Andrew Morton wrote:
>
> Oleg Nesterov <oleg@tv-sign.ru> wrote:
> >
> > Currently I don't know how the code looks in -mm tree,
>
> http://www.zip.com.au/~akpm/linux/patches/stuff/x.bz2 is -mm-as-of-now.
Thanks. Here is the patch:
[PATCH] de_thread: fix lockless do_each_thread
We should keep the value of old_leader->tasks.next in de_thread,
otherwise we can't do for_each_process/do_each_thread without
tasklist_lock held.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
--- mm/fs/exec.c~ 2006-04-11 14:31:18.000000000 +0400
+++ mm/fs/exec.c 2006-04-11 14:35:23.000000000 +0400
@@ -725,7 +725,7 @@ static int de_thread(struct task_struct
attach_pid(current, PIDTYPE_PID, current->pid);
attach_pid(current, PIDTYPE_PGID, current->signal->pgrp);
attach_pid(current, PIDTYPE_SID, current->signal->session);
- list_add_tail_rcu(¤t->tasks, &init_task.tasks);
+ list_replace_rcu(&leader->tasks, ¤t->tasks);
current->parent = current->real_parent = leader->real_parent;
leader->parent = leader->real_parent = child_reaper;
@@ -735,7 +735,6 @@ static int de_thread(struct task_struct
/* Reduce leader to a thread */
detach_pid(leader, PIDTYPE_PGID);
detach_pid(leader, PIDTYPE_SID);
- list_del_init(&leader->tasks);
add_parent(current);
add_parent(leader);
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] de_thread: Don't confuse users do_each_thread.
2006-04-11 7:36 ` Eric W. Biederman
@ 2006-04-11 19:50 ` Ingo Oeser
0 siblings, 0 replies; 23+ messages in thread
From: Ingo Oeser @ 2006-04-11 19:50 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Oleg Nesterov, Linus Torvalds, Andrew Morton, linux-kernel
On Tuesday, 11. April 2006 09:36, Eric W. Biederman wrote:
[cleanup of current usage]
> Please skip de_thread. I will catch that one, in a minute. There is
> enough churn on that function that you are likely to patch the wrong
> version.
Ok, I'll wait for you and Oleg. Maybe I should revisit this cleanup
after 2.6.18 opens up. If you both are done in that area, just tell me
and I'll cook up a patch.
Regards
Ingo Oeser
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2006-04-11 19:52 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-04-06 22:04 [PATCH rc1-mm] de_thread: fix deadlockable process addition Oleg Nesterov
2006-04-06 22:58 ` Eric W. Biederman
2006-04-07 23:46 ` Oleg Nesterov
2006-04-07 22:51 ` Andrew Morton
2006-04-07 22:56 ` Andrew Morton
2006-04-08 7:55 ` Eric W. Biederman
2006-04-08 17:27 ` Oleg Nesterov
2006-04-08 16:07 ` Eric W. Biederman
2006-04-08 21:13 ` Oleg Nesterov
2006-04-08 21:23 ` Oleg Nesterov
2006-04-10 22:11 ` Eric W. Biederman
2006-04-10 23:07 ` [PATCH] de_thread: Don't confuse users do_each_thread Eric W. Biederman
2006-04-10 23:16 ` Eric W. Biederman
2006-04-10 23:52 ` Ingo Oeser
2006-04-11 6:18 ` Eric W. Biederman
2006-04-11 10:19 ` Oleg Nesterov
2006-04-11 7:25 ` Ingo Oeser
2006-04-11 7:36 ` Eric W. Biederman
2006-04-11 19:50 ` Ingo Oeser
2006-04-11 10:05 ` Oleg Nesterov
2006-04-11 5:23 ` Andrew Morton
2006-04-11 10:47 ` Oleg Nesterov
2006-04-11 6:23 ` Eric W. Biederman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox