* [PATCH v2 4.9 00/10] fix a race in release_task when flushing the dentry
From: Wen Yang @ 2021-01-07  7:52 UTC
  To: Greg Kroah-Hartman, Sasha Levin; +Cc: Xunlei Pang, linux-kernel, Wen Yang

Dentries such as /proc/<pid>/ns/ have the DCACHE_OP_DELETE flag set, so they
should be deleted when the process exits.

Suppose the following race occurs:

release_task                 dput 
-> proc_flush_task 
                             -> dentry->d_op->d_delete(dentry) 
-> __exit_signal 
                             -> dentry->d_lockref.count--  and return. 

In proc_flush_task(), if another process is still using this dentry, it will
not be deleted. Meanwhile, in dput(), d_op->d_delete() can run before
__exit_signal() (the pid has not been unhashed yet), so d_delete() returns
false and the dentry still cannot be deleted.
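
The interleaving above can be replayed in a small userspace model. This is
only a sketch under simplifying assumptions: every type and function name
below (model_dentry, model_dput_check, and so on) is hypothetical and merely
stands in for the kernel object or step named in its comment. dput() is split
into a "check" step and a "finish" step so that __exit_signal() can be placed
between them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these are NOT the kernel's data structures. */
struct model_pid {
	bool task_attached;	/* true until the modeled __exit_signal() runs */
};

struct model_dentry {
	int count;		/* stands in for d_lockref.count */
	bool hashed;		/* true while the dentry is still cached */
	struct model_pid *pid;
};

/* Models d_op->d_delete(): asks for deletion only once the task is gone. */
static bool model_d_delete(const struct model_dentry *d)
{
	return !d->pid->task_attached;
}

/* Models proc_flush_task(): a dentry that is still in use is skipped. */
static void model_proc_flush_task(struct model_dentry *d)
{
	if (d->count == 0)
		d->hashed = false;
}

/* Models __exit_signal(): detaches the task from the pid. */
static void model_exit_signal(struct model_pid *p)
{
	p->task_attached = false;
}

/* First half of the modeled dput(): sample the d_delete() answer. */
static bool model_dput_check(const struct model_dentry *d)
{
	return model_d_delete(d);
}

/* Second half of the modeled dput(): drop the count, act on the old answer. */
static void model_dput_finish(struct model_dentry *d, bool should_delete)
{
	assert(d->count > 0);
	d->count--;
	if (d->count == 0 && should_delete)
		d->hashed = false;
}

/* Returns true if the dentry ends up unused (count 0) but still cached. */
bool run_racy_interleaving(void)
{
	struct model_pid pid = { .task_attached = true };
	struct model_dentry d = { .count = 1, .hashed = true, .pid = &pid };

	model_proc_flush_task(&d);		/* release_task: dentry busy, skipped */
	bool del = model_dput_check(&d);	/* dput: d_delete() says "keep" */
	model_exit_signal(&pid);		/* release_task: task detached */
	model_dput_finish(&d, del);		/* dput: count-- and return */

	return d.count == 0 && d.hashed;
}

/* Same steps, but dput() completes before release_task() starts. */
bool run_ordered_interleaving(void)
{
	struct model_pid pid = { .task_attached = true };
	struct model_dentry d = { .count = 1, .hashed = true, .pid = &pid };

	bool del = model_dput_check(&d);
	model_dput_finish(&d, del);
	model_proc_flush_task(&d);		/* dentry now unused, so it is dropped */
	model_exit_signal(&pid);

	return d.count == 0 && d.hashed;
}
```

In this model the racy ordering leaves the dentry with a count of 0 while it
is still cached, whereas the ordered variant does not. The model says nothing
about locking or the LRU; it only illustrates why neither path frees the
dentry.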

The dentry then stays cached indefinitely (although its count is 0 and the
DCACHE_OP_DELETE flag is set), and its parent dentry stays cached as well.
These dentries can only be freed when drop_caches is triggered manually.
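
For reference, triggering drop_caches manually means writing to
/proc/sys/vm/drop_caches as root; the value 2 asks the kernel to free
reclaimable slab objects, which includes dentries and inodes. A minimal
sketch follows. The helper name request_drop_caches and its path parameter
are assumptions made for illustration (the path is a parameter only so the
helper can be exercised against an ordinary file).

```c
#include <stdio.h>

/*
 * Hypothetical helper: writes a drop_caches mode (1, 2 or 3) to the given
 * control file. On a real system the path would be
 * "/proc/sys/vm/drop_caches" and the caller needs root privileges.
 * Returns 0 on success, -1 on failure.
 */
int request_drop_caches(const char *path, int mode)
{
	if (mode < 1 || mode > 3)
		return -1;

	FILE *f = fopen(path, "w");
	if (!f)
		return -1;

	int ok = fprintf(f, "%d\n", mode) > 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A call such as request_drop_caches("/proc/sys/vm/drop_caches", 2) would be
the programmatic equivalent of the manual workaround mentioned above; it does
not fix the race, it only reclaims the dentries that the race left behind.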

This wastes memory. What's more troublesome is that these dentries hold a
reference to the pid. Given commit f333c700c610 ("pidns: Add a limit on the
number of pid namespaces"), if the pid cannot be released, it may become
impossible to create a new pid_ns.

This issue was introduced by 60347f6716aa ("pid namespaces: prepare
proc_flust_task() to flush entries from multiple proc trees"), exposed by
f333c700c610 ("pidns: Add a limit on the number of pid namespaces"), and then
fixed by 7bc3e6e55acf ("proc: Use a list of inodes to flush from proc").


Alexey Dobriyan (1):
  proc: use %u for pid printing and slightly less stack

Andreas Gruenbacher (1):
  proc: Pass file mode to proc_pid_make_inode

Christian Brauner (1):
  clone: add CLONE_PIDFD

Eric W. Biederman (6):
  proc: Better ownership of files for non-dumpable tasks in user
    namespaces
  proc: Rename in proc_inode rename sysctl_inodes sibling_inodes
  proc: Generalize proc_sys_prune_dcache into proc_prune_siblings_dcache
  proc: Clear the pieces of proc_inode that proc_evict_inode cares about
  proc: Use d_invalidate in proc_prune_siblings_dcache
  proc: Use a list of inodes to flush from proc

Joel Fernandes (Google) (1):
  pidfd: add polling support

 fs/proc/base.c             | 242 ++++++++++++++++++++-------------------------
 fs/proc/fd.c               |  20 +---
 fs/proc/inode.c            |  67 ++++++++++++-
 fs/proc/internal.h         |  22 ++---
 fs/proc/namespaces.c       |   3 +-
 fs/proc/proc_sysctl.c      |  45 ++-------
 fs/proc/self.c             |   6 +-
 fs/proc/thread_self.c      |   5 +-
 include/linux/pid.h        |   5 +
 include/linux/proc_fs.h    |   4 +-
 include/uapi/linux/sched.h |   1 +
 kernel/exit.c              |   5 +-
 kernel/fork.c              | 131 +++++++++++++++++++++++-
 kernel/pid.c               |   3 +
 kernel/signal.c            |  11 +++
 security/selinux/hooks.c   |   1 +
 16 files changed, 343 insertions(+), 228 deletions(-)

-- 
1.8.3.1



Thread overview: 17+ messages
2021-01-07  7:52 [PATCH v2 4.9 00/10] fix a race in release_task when flushing the dentry Wen Yang
2021-01-07  7:52 ` [PATCH v2 4.9 01/10] clone: add CLONE_PIDFD Wen Yang
2021-01-07  7:52 ` [PATCH v2 4.9 02/10] pidfd: add polling support Wen Yang
2021-01-07  7:52 ` [PATCH v2 4.9 03/10] proc: Pass file mode to proc_pid_make_inode Wen Yang
2021-01-07  7:52 ` [PATCH v2 4.9 04/10] proc: Better ownership of files for non-dumpable tasks in user namespaces Wen Yang
2021-01-07  7:52 ` [PATCH v2 4.9 05/10] proc: use %u for pid printing and slightly less stack Wen Yang
2021-01-07  7:52 ` [PATCH v2 4.9 06/10] proc: Rename in proc_inode rename sysctl_inodes sibling_inodes Wen Yang
2021-01-07  7:52 ` [PATCH v2 4.9 07/10] proc: Generalize proc_sys_prune_dcache into proc_prune_siblings_dcache Wen Yang
2021-01-07  7:52 ` [PATCH v2 4.9 08/10] proc: Clear the pieces of proc_inode that proc_evict_inode cares about Wen Yang
2021-01-07  7:52 ` [PATCH v2 4.9 09/10] proc: Use d_invalidate in proc_prune_siblings_dcache Wen Yang
2021-01-07  7:52 ` [PATCH v2 4.9 10/10] proc: Use a list of inodes to flush from proc Wen Yang
2021-01-07 12:17 ` [PATCH v2 4.9 00/10] fix a race in release_task when flushing the dentry Greg Kroah-Hartman
2021-01-07 16:21   ` Wen Yang
2021-01-07 18:28     ` Greg Kroah-Hartman
2021-01-08  2:42       ` Wen Yang
2021-01-09 12:36         ` Greg Kroah-Hartman
  -- strict thread matches above, loose matches on Subject: below --
2021-01-07  7:34 Wen Yang