* [PATCH] kernel: move exit_task_work() past exit_notify()
@ 2013-04-12 19:27 Andrey Vagin
2013-04-13 14:22 ` Oleg Nesterov
0 siblings, 1 reply; 9+ messages in thread
From: Andrey Vagin @ 2013-04-12 19:27 UTC (permalink / raw)
Cc: linux-kernel, Andrey Vagin, Andrew Morton, Eric W. Biederman,
Al Viro, Oleg Nesterov, David Howells
exit_task_work() must be called after exit_notify, because
exit_task_namespaces() may release a file and fput() enqueues a work.
exit_notify
exit_task_namespaces
free_ipc_ns
shm_destroy
fput
task_work_add
so if task works don't run after exit_notify(), a few files may leak.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
kernel/exit.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/exit.c b/kernel/exit.c
index 60bc027..1d1129b 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -795,7 +795,6 @@ void do_exit(long code)
exit_shm(tsk);
exit_files(tsk);
exit_fs(tsk);
- exit_task_work(tsk);
check_stack_usage();
exit_thread();
@@ -822,6 +821,7 @@ void do_exit(long code)
ptrace_put_breakpoints(tsk);
exit_notify(tsk, group_dead);
+ exit_task_work(tsk);
#ifdef CONFIG_NUMA
task_lock(tsk);
mpol_put(tsk->mempolicy);
--
1.8.1.4
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH] kernel: move exit_task_work() past exit_notify() 2013-04-12 19:27 [PATCH] kernel: move exit_task_work() past exit_notify() Andrey Vagin @ 2013-04-13 14:22 ` Oleg Nesterov 2013-04-13 15:54 ` [PATCH 0/1] (Was: kernel: move exit_task_work() past exit_notify()) Oleg Nesterov 0 siblings, 1 reply; 9+ messages in thread From: Oleg Nesterov @ 2013-04-13 14:22 UTC (permalink / raw) To: Andrey Vagin Cc: linux-kernel, Andrew Morton, Eric W. Biederman, Al Viro, David Howells On 04/12, Andrey Vagin wrote: > exit_task_work() must be called after exit_notify, because > exit_task_namespaces() may release a file and fput() enqueues a work. > > exit_notify > exit_task_namespaces > free_ipc_ns > shm_destroy > fput > task_work_add > > so if task works don't run after exit_notify(), a few files may leak. > > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: "Eric W. Biederman" <ebiederm@xmission.com> > Cc: Al Viro <viro@zeniv.linux.org.uk> > Cc: Oleg Nesterov <oleg@redhat.com> > Cc: David Howells <dhowells@redhat.com> > Signed-off-by: Andrey Vagin <avagin@openvz.org> > --- > kernel/exit.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/exit.c b/kernel/exit.c > index 60bc027..1d1129b 100644 > --- a/kernel/exit.c > +++ b/kernel/exit.c > @@ -795,7 +795,6 @@ void do_exit(long code) > exit_shm(tsk); > exit_files(tsk); > exit_fs(tsk); > - exit_task_work(tsk); > check_stack_usage(); > exit_thread(); > > @@ -822,6 +821,7 @@ void do_exit(long code) > ptrace_put_breakpoints(tsk); > > exit_notify(tsk, group_dead); > + exit_task_work(tsk); I am not comfortable with this change... The task is "really dead" after exit_notify(), even release_task(current) can be called. Let me think a bit... It seems that we have the alternative. Oleg. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 0/1] (Was: kernel: move exit_task_work() past exit_notify()) 2013-04-13 14:22 ` Oleg Nesterov @ 2013-04-13 15:54 ` Oleg Nesterov 2013-04-13 15:55 ` [PATCH 1/1] move exit_task_namespaces() outside of exit_notify() Oleg Nesterov 0 siblings, 1 reply; 9+ messages in thread From: Oleg Nesterov @ 2013-04-13 15:54 UTC (permalink / raw) To: Andrey Vagin, Eric W. Biederman Cc: linux-kernel, Andrew Morton, Al Viro, David Howells On 04/13, Oleg Nesterov wrote: > > > --- a/kernel/exit.c > > +++ b/kernel/exit.c > > @@ -795,7 +795,6 @@ void do_exit(long code) > > exit_shm(tsk); > > exit_files(tsk); > > exit_fs(tsk); > > - exit_task_work(tsk); > > check_stack_usage(); > > exit_thread(); > > > > @@ -822,6 +821,7 @@ void do_exit(long code) > > ptrace_put_breakpoints(tsk); > > > > exit_notify(tsk, group_dead); > > + exit_task_work(tsk); > > I am not comfortable with this change... > > The task is "really dead" after exit_notify(), even release_task(current) > can be called. > > Let me think a bit... It seems that we have the alternative. Andrey, Eric, how about this patch? COMPLETELY UNTESTED and I need to recheck, but perhaps you can review? Oleg. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/1] move exit_task_namespaces() outside of exit_notify() 2013-04-13 15:54 ` [PATCH 0/1] (Was: kernel: move exit_task_work() past exit_notify()) Oleg Nesterov @ 2013-04-13 15:55 ` Oleg Nesterov 2013-04-14 1:52 ` Eric W. Biederman ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Oleg Nesterov @ 2013-04-13 15:55 UTC (permalink / raw) To: Andrey Vagin, Eric W. Biederman Cc: linux-kernel, Andrew Morton, Al Viro, David Howells exit_notify() does exit_task_namespaces() after forget_original_parent(). This was needed to ensure that ->nsproxy can't be cleared prematurely, an exiting child we are going to reparent can do do_notify_parent() and use the parent's (ours) pid_ns. However, after 32084504 "pidns: use task_active_pid_ns in do_notify_parent" ->nsproxy != NULL is no longer needed, we rely on task_active_pid_ns(). Move exit_task_namespaces() from exit_notify() to do_exit(), after exit_fs() and before exit_task_work(). This solves the problem reported by Andrey, free_ipc_ns()->shm_destroy() does fput() which needs task_work_add(). And this allows us do simplify exit_notify(), we can avoid unlock/lock(tasklist) and we can change ->exit_state instead of PF_EXITING in forget_original_parent(). Reported-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- x/kernel/exit.c +++ x/kernel/exit.c @@ -649,7 +649,6 @@ static void exit_notify(struct task_stru * jobs, send them a SIGHUP and then a SIGCONT. (POSIX 3.2.2.2) */ forget_original_parent(tsk); - exit_task_namespaces(tsk); write_lock_irq(&tasklist_lock); if (group_dead) @@ -795,6 +794,7 @@ void do_exit(long code) exit_shm(tsk); exit_files(tsk); exit_fs(tsk); + exit_task_namespaces(tsk); exit_task_work(tsk); check_stack_usage(); exit_thread(); ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] move exit_task_namespaces() outside of exit_notify() 2013-04-13 15:55 ` [PATCH 1/1] move exit_task_namespaces() outside of exit_notify() Oleg Nesterov @ 2013-04-14 1:52 ` Eric W. Biederman 2013-04-15 9:57 ` Andrey Wagin 2013-06-13 9:24 ` Andrew Vagin 2 siblings, 0 replies; 9+ messages in thread From: Eric W. Biederman @ 2013-04-14 1:52 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrey Vagin, linux-kernel, Andrew Morton, Al Viro, David Howells Oleg Nesterov <oleg@redhat.com> writes: > exit_notify() does exit_task_namespaces() after > forget_original_parent(). This was needed to ensure that ->nsproxy > can't be cleared prematurely, an exiting child we are going to > reparent can do do_notify_parent() and use the parent's (ours) pid_ns. > > However, after 32084504 "pidns: use task_active_pid_ns in > do_notify_parent" ->nsproxy != NULL is no longer needed, we rely > on task_active_pid_ns(). > > Move exit_task_namespaces() from exit_notify() to do_exit(), after > exit_fs() and before exit_task_work(). > > This solves the problem reported by Andrey, free_ipc_ns()->shm_destroy() > does fput() which needs task_work_add(). And this allows us do simplify > exit_notify(), we can avoid unlock/lock(tasklist) and we can change > ->exit_state instead of PF_EXITING in forget_original_parent(). It feels like this ought to work, certainly the pid namespace should not need this, and the pid namespace was the motivating case for most of the movement. However we haven't called exit_task_namespaces this early since 2006. Ugh. I goofed and used that field in scm.c. Sigh. I will push a patch to rename that field nsproxy->childrens_pid_ns so it is harder to make the mistake I just made. None of the uses of nsproxy->net_ns look like they will be used on the exit path. The /proc/<pid>/ns/{uts,ipc,net,mnt,pid} files are fine as nsproxy itself is what becomes NULL and they test for that. Well except the pid file uses task_active_pid_ns. nsproxy->ipc_ns is isolated to files under ipc so it is probably fine. Likewise the nsproxy->uts_ns uses look like they will be fine. Likewise the nsproxy->mnt_ns uses look like they will be fine. So in a quick skim through the uses no problem cases stick out, nor can I think of anything that would cause trouble. This looks like a good patch. Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> > Reported-by: Andrey Vagin <avagin@openvz.org> > Signed-off-by: Oleg Nesterov <oleg@redhat.com> > > --- x/kernel/exit.c > +++ x/kernel/exit.c > @@ -649,7 +649,6 @@ static void exit_notify(struct task_stru > * jobs, send them a SIGHUP and then a SIGCONT. (POSIX 3.2.2.2) > */ > forget_original_parent(tsk); > - exit_task_namespaces(tsk); > > write_lock_irq(&tasklist_lock); > if (group_dead) > @@ -795,6 +794,7 @@ void do_exit(long code) > exit_shm(tsk); > exit_files(tsk); > exit_fs(tsk); > + exit_task_namespaces(tsk); > exit_task_work(tsk); > check_stack_usage(); > exit_thread(); ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] move exit_task_namespaces() outside of exit_notify() 2013-04-13 15:55 ` [PATCH 1/1] move exit_task_namespaces() outside of exit_notify() Oleg Nesterov 2013-04-14 1:52 ` Eric W. Biederman @ 2013-04-15 9:57 ` Andrey Wagin 2013-04-15 15:33 ` Oleg Nesterov 2013-06-13 9:24 ` Andrew Vagin 2 siblings, 1 reply; 9+ messages in thread From: Andrey Wagin @ 2013-04-15 9:57 UTC (permalink / raw) To: Oleg Nesterov Cc: Eric W. Biederman, LKML, Andrew Morton, Al Viro, David Howells 2013/4/13 Oleg Nesterov <oleg@redhat.com> > > exit_notify() does exit_task_namespaces() after > forget_original_parent(). This was needed to ensure that ->nsproxy > can't be cleared prematurely, an exiting child we are going to > reparent can do do_notify_parent() and use the parent's (ours) pid_ns. > > However, after 32084504 "pidns: use task_active_pid_ns in > do_notify_parent" ->nsproxy != NULL is no longer needed, we rely > on task_active_pid_ns(). > > Move exit_task_namespaces() from exit_notify() to do_exit(), after > exit_fs() and before exit_task_work(). > > This solves the problem reported by Andrey, free_ipc_ns()->shm_destroy() > does fput() which needs task_work_add(). And this allows us do simplify > exit_notify(), we can avoid unlock/lock(tasklist) and we can change > ->exit_state instead of PF_EXITING in forget_original_parent(). > It looks good for me. I have tested it a bit and don't find any problem. Oleg, thank you. Acked-by: Andrew Vagin <avagin@gmail.com> > Reported-by: Andrey Vagin <avagin@openvz.org> > Signed-off-by: Oleg Nesterov <oleg@redhat.com> > > --- x/kernel/exit.c > +++ x/kernel/exit.c > @@ -649,7 +649,6 @@ static void exit_notify(struct task_stru > * jobs, send them a SIGHUP and then a SIGCONT. (POSIX 3.2.2.2) > */ > forget_original_parent(tsk); > - exit_task_namespaces(tsk); > > write_lock_irq(&tasklist_lock); > if (group_dead) > @@ -795,6 +794,7 @@ void do_exit(long code) > exit_shm(tsk); > exit_files(tsk); > exit_fs(tsk); > + exit_task_namespaces(tsk); > exit_task_work(tsk); > check_stack_usage(); > exit_thread(); > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] move exit_task_namespaces() outside of exit_notify() 2013-04-15 9:57 ` Andrey Wagin @ 2013-04-15 15:33 ` Oleg Nesterov 0 siblings, 0 replies; 9+ messages in thread From: Oleg Nesterov @ 2013-04-15 15:33 UTC (permalink / raw) To: Andrey Wagin, Al Viro Cc: Eric W. Biederman, LKML, Andrew Morton, David Howells On 04/15, Andrey Wagin wrote: > > It looks good for me. I have tested it a bit and don't find any problem. > Oleg, thank you. > > Acked-by: Andrew Vagin <avagin@gmail.com> Thanks Andrey and Eric. > > --- x/kernel/exit.c > > +++ x/kernel/exit.c > > @@ -649,7 +649,6 @@ static void exit_notify(struct task_stru > > * jobs, send them a SIGHUP and then a SIGCONT. (POSIX 3.2.2.2) > > */ > > forget_original_parent(tsk); > > - exit_task_namespaces(tsk); > > > > write_lock_irq(&tasklist_lock); > > if (group_dead) > > @@ -795,6 +794,7 @@ void do_exit(long code) > > exit_shm(tsk); > > exit_files(tsk); > > exit_fs(tsk); > > + exit_task_namespaces(tsk); > > exit_task_work(tsk); I do not see any problems with this patch too... but still I am worried. Even if fput() can work correctly after exit_task_namespaces(), this limits the usage of task_work_add(). Probably this is fine, but can't we at least discuss another change? We can change fput() so that it can always work, even after exit_task_work(), void fput(struct file *file) { if (atomic_long_dec_and_test(&file->f_count)) { struct task_struct *task = current; unsigned long flags; file_sb_list_del(file); if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) { init_task_work(&file->f_u.fu_rcuhead, ____fput); if (!task_work_add(task, &file->f_u.fu_rcuhead, true)) return; } spin_lock_irqsave(&delayed_fput_lock, flags); list_add(&file->f_u.fu_list, &delayed_fput_list); schedule_work(&delayed_fput_work); spin_unlock_irqrestore(&delayed_fput_lock, flags); } } Al, what do you think? Untested patch below. Oleg. --- x/fs/file_table.c +++ x/fs/file_table.c @@ -306,17 +306,19 @@ void fput(struct file *file) { if (atomic_long_dec_and_test(&file->f_count)) { struct task_struct *task = current; + unsigned long flags; + file_sb_list_del(file); - if (unlikely(in_interrupt() || task->flags & PF_KTHREAD)) { - unsigned long flags; - spin_lock_irqsave(&delayed_fput_lock, flags); - list_add(&file->f_u.fu_list, &delayed_fput_list); - schedule_work(&delayed_fput_work); - spin_unlock_irqrestore(&delayed_fput_lock, flags); - return; + if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) { + init_task_work(&file->f_u.fu_rcuhead, ____fput); + if (!task_work_add(task, &file->f_u.fu_rcuhead, true)) + return; } - init_task_work(&file->f_u.fu_rcuhead, ____fput); - task_work_add(task, &file->f_u.fu_rcuhead, true); + + spin_lock_irqsave(&delayed_fput_lock, flags); + list_add(&file->f_u.fu_list, &delayed_fput_list); + schedule_work(&delayed_fput_work); + spin_unlock_irqrestore(&delayed_fput_lock, flags); } } ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] move exit_task_namespaces() outside of exit_notify() 2013-04-13 15:55 ` [PATCH 1/1] move exit_task_namespaces() outside of exit_notify() Oleg Nesterov 2013-04-14 1:52 ` Eric W. Biederman 2013-04-15 9:57 ` Andrey Wagin @ 2013-06-13 9:24 ` Andrew Vagin 2013-06-14 13:19 ` Oleg Nesterov 2 siblings, 1 reply; 9+ messages in thread From: Andrew Vagin @ 2013-06-13 9:24 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrey Vagin, Eric W. Biederman, linux-kernel, Andrew Morton, Al Viro, David Howells On Sat, Apr 13, 2013 at 05:55:21PM +0200, Oleg Nesterov wrote: > exit_notify() does exit_task_namespaces() after > forget_original_parent(). This was needed to ensure that ->nsproxy > can't be cleared prematurely, an exiting child we are going to > reparent can do do_notify_parent() and use the parent's (ours) pid_ns. > > However, after 32084504 "pidns: use task_active_pid_ns in > do_notify_parent" ->nsproxy != NULL is no longer needed, we rely > on task_active_pid_ns(). > > Move exit_task_namespaces() from exit_notify() to do_exit(), after > exit_fs() and before exit_task_work(). > > This solves the problem reported by Andrey, free_ipc_ns()->shm_destroy() > does fput() which needs task_work_add(). And this allows us do simplify > exit_notify(), we can avoid unlock/lock(tasklist) and we can change > ->exit_state instead of PF_EXITING in forget_original_parent(). > It looks good for me. kmemleak doesn't report any leaks. CRIU test cases, which use namespaces, work without any errors. Thanks. Acked-by: Andrey Vagin <avagin@openvz.org> > Reported-by: Andrey Vagin <avagin@openvz.org> > Signed-off-by: Oleg Nesterov <oleg@redhat.com> > > --- x/kernel/exit.c > +++ x/kernel/exit.c > @@ -649,7 +649,6 @@ static void exit_notify(struct task_stru > * jobs, send them a SIGHUP and then a SIGCONT. (POSIX 3.2.2.2) > */ > forget_original_parent(tsk); > - exit_task_namespaces(tsk); > > write_lock_irq(&tasklist_lock); > if (group_dead) > @@ -795,6 +794,7 @@ void do_exit(long code) > exit_shm(tsk); > exit_files(tsk); > exit_fs(tsk); > + exit_task_namespaces(tsk); > exit_task_work(tsk); > check_stack_usage(); > exit_thread(); > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/1] move exit_task_namespaces() outside of exit_notify() 2013-06-13 9:24 ` Andrew Vagin @ 2013-06-14 13:19 ` Oleg Nesterov 0 siblings, 0 replies; 9+ messages in thread From: Oleg Nesterov @ 2013-06-14 13:19 UTC (permalink / raw) To: Andrew Vagin Cc: Andrey Vagin, Eric W. Biederman, linux-kernel, Andrew Morton, Al Viro, David Howells On 06/13, Andrew Vagin wrote: > > On Sat, Apr 13, 2013 at 05:55:21PM +0200, Oleg Nesterov wrote: > > exit_notify() does exit_task_namespaces() after > > forget_original_parent(). This was needed to ensure that ->nsproxy > > can't be cleared prematurely, an exiting child we are going to > > reparent can do do_notify_parent() and use the parent's (ours) pid_ns. > > > > However, after 32084504 "pidns: use task_active_pid_ns in > > do_notify_parent" ->nsproxy != NULL is no longer needed, we rely > > on task_active_pid_ns(). > > > > Move exit_task_namespaces() from exit_notify() to do_exit(), after > > exit_fs() and before exit_task_work(). > > > > This solves the problem reported by Andrey, free_ipc_ns()->shm_destroy() > > does fput() which needs task_work_add(). And this allows us do simplify > > exit_notify(), we can avoid unlock/lock(tasklist) and we can change > > ->exit_state instead of PF_EXITING in forget_original_parent(). > > > > It looks good for me. kmemleak doesn't report any leaks. CRIU test > cases, which use namespaces, work without any errors. OK, thanks. I guess I need to resend this patch with your and Eric's acks. But in fact I am going to send the 2nd change as well, it should fix the same problem too and imho it makes sense anyway. And to me the 2nd one looks like 3.10 material, it is much more simple and straightforward. Oleg. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2013-06-14 13:23 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-04-12 19:27 [PATCH] kernel: move exit_task_work() past exit_notify() Andrey Vagin 2013-04-13 14:22 ` Oleg Nesterov 2013-04-13 15:54 ` [PATCH 0/1] (Was: kernel: move exit_task_work() past exit_notify()) Oleg Nesterov 2013-04-13 15:55 ` [PATCH 1/1] move exit_task_namespaces() outside of exit_notify() Oleg Nesterov 2013-04-14 1:52 ` Eric W. Biederman 2013-04-15 9:57 ` Andrey Wagin 2013-04-15 15:33 ` Oleg Nesterov 2013-06-13 9:24 ` Andrew Vagin 2013-06-14 13:19 ` Oleg Nesterov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).