From: Brian Foster <bfoster@redhat.com>
To: Zhihao Cheng <chengzhihao1@huawei.com>
Cc: ebiederm@xmission.com, willy@infradead.org, bhe@redhat.com,
akpm@linux-foundation.org, kaleshsingh@google.com,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
yukuai3@huawei.com
Subject: Re: [PATCH v5] proc: Fix a dentry lock race between release_task and lookup
Date: Thu, 14 Jul 2022 08:01:15 -0400 [thread overview]
Message-ID: <YtAFix81eUBjzgGN@bfoster> (raw)
In-Reply-To: <20220713130029.4133533-1-chengzhihao1@huawei.com>
On Wed, Jul 13, 2022 at 09:00:29PM +0800, Zhihao Cheng wrote:
> Commit 7bc3e6e55acf06 ("proc: Use a list of inodes to flush from proc")
> moved proc_flush_task() behind __exit_signal(). Then, process systemd
> can take long period high cpu usage during releasing task in following
> concurrent processes:
>
> systemd ps
> kernel_waitid stat(/proc/tgid)
> do_wait filename_lookup
> wait_consider_task lookup_fast
> release_task
> __exit_signal
> __unhash_process
> detach_pid
> __change_pid // remove task->pid_links
> d_revalidate -> pid_revalidate // 0
> d_invalidate(/proc/tgid)
> shrink_dcache_parent(/proc/tgid)
> d_walk(/proc/tgid)
> spin_lock_nested(/proc/tgid/fd)
> // iterating opened fd
> proc_flush_pid |
> d_invalidate (/proc/tgid/fd) |
> shrink_dcache_parent(/proc/tgid/fd) |
> shrink_dentry_list(subdirs) ↓
> shrink_lock_dentry(/proc/tgid/fd) --> race on dentry lock
>
> Function d_invalidate() will remove dentry from hash firstly, but why does
> proc_flush_pid() process dentry '/proc/tgid/fd' before dentry '/proc/tgid'?
> That's because proc_pid_make_inode() adds proc inode in reverse order by
> invoking hlist_add_head_rcu(). But proc should not add any inodes under
> '/proc/tgid' except '/proc/tgid/task/pid', fix it by adding inode into
> 'pid->inodes' only if the inode is /proc/tgid or /proc/tgid/task/pid.
>
> Performance regression:
> Create 200 tasks, each task open one file for 50,000 times. Kill all
> tasks when opened files exceed 10,000,000 (cat /proc/sys/fs/file-nr).
>
> Before fix:
> $ time killall -wq aa
> real 4m40.946s # During this period, we can see 'ps' and 'systemd'
> taking high cpu usage.
>
> After fix:
> $ time killall -wq aa
> real 1m20.732s # During this period, we can see 'systemd' taking
> high cpu usage.
>
> Fixes: 7bc3e6e55acf06 ("proc: Use a list of inodes to flush from proc")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216054
> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> Suggested-by: Brian Foster <bfoster@redhat.com>
> ---
LGTM, and thanks for the tweaks:
Reviewed-by: Brian Foster <bfoster@redhat.com>
> v1->v2: Add new helper proc_pid_make_base_inode that performs the extra
> work of adding to the pid->list.
> v2->v3: Add performance regression in commit message.
> v3->v4: Make proc_pid_make_base_inode() static
> v4->v5: Add notes to explain what proc_pid_make_base_inode() does
> fs/proc/base.c | 46 ++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 38 insertions(+), 8 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 8dfa36a99c74..93f7e3d971e4 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -1885,7 +1885,7 @@ void proc_pid_evict_inode(struct proc_inode *ei)
> put_pid(pid);
> }
>
> -struct inode *proc_pid_make_inode(struct super_block * sb,
> +struct inode *proc_pid_make_inode(struct super_block *sb,
> struct task_struct *task, umode_t mode)
> {
> struct inode * inode;
> @@ -1914,11 +1914,6 @@ struct inode *proc_pid_make_inode(struct super_block * sb,
>
> /* Let the pid remember us for quick removal */
> ei->pid = pid;
> - if (S_ISDIR(mode)) {
> - spin_lock(&pid->lock);
> - hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes);
> - spin_unlock(&pid->lock);
> - }
>
> task_dump_owner(task, 0, &inode->i_uid, &inode->i_gid);
> security_task_to_inode(task, inode);
> @@ -1931,6 +1926,39 @@ struct inode *proc_pid_make_inode(struct super_block * sb,
> return NULL;
> }
>
> +/*
> + * Generating an inode and adding it into @pid->inodes, so that task will
> + * invalidate inode's dentry before being released.
> + *
> + * This helper is used for creating dir-type entries under '/proc' and
> + * '/proc/<tgid>/task'. Other entries(eg. fd, stat) under '/proc/<tgid>'
> + * can be released by invalidating '/proc/<tgid>' dentry.
> + * In theory, dentries under '/proc/<tgid>/task' can also be released by
> + * invalidating '/proc/<tgid>' dentry, we reserve it to handle single
> + * thread exiting situation: Any one of threads should invalidate its
> + * '/proc/<tgid>/task/<pid>' dentry before released.
> + */
> +static struct inode *proc_pid_make_base_inode(struct super_block *sb,
> + struct task_struct *task, umode_t mode)
> +{
> + struct inode *inode;
> + struct proc_inode *ei;
> + struct pid *pid;
> +
> + inode = proc_pid_make_inode(sb, task, mode);
> + if (!inode)
> + return NULL;
> +
> + /* Let proc_flush_pid find this directory inode */
> + ei = PROC_I(inode);
> + pid = ei->pid;
> + spin_lock(&pid->lock);
> + hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes);
> + spin_unlock(&pid->lock);
> +
> + return inode;
> +}
> +
> int pid_getattr(struct user_namespace *mnt_userns, const struct path *path,
> struct kstat *stat, u32 request_mask, unsigned int query_flags)
> {
> @@ -3369,7 +3397,8 @@ static struct dentry *proc_pid_instantiate(struct dentry * dentry,
> {
> struct inode *inode;
>
> - inode = proc_pid_make_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO);
> + inode = proc_pid_make_base_inode(dentry->d_sb, task,
> + S_IFDIR | S_IRUGO | S_IXUGO);
> if (!inode)
> return ERR_PTR(-ENOENT);
>
> @@ -3671,7 +3700,8 @@ static struct dentry *proc_task_instantiate(struct dentry *dentry,
> struct task_struct *task, const void *ptr)
> {
> struct inode *inode;
> - inode = proc_pid_make_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO);
> + inode = proc_pid_make_base_inode(dentry->d_sb, task,
> + S_IFDIR | S_IRUGO | S_IXUGO);
> if (!inode)
> return ERR_PTR(-ENOENT);
>
> --
> 2.31.1
>
prev parent reply other threads:[~2022-07-14 12:01 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-13 13:00 [PATCH v5] proc: Fix a dentry lock race between release_task and lookup Zhihao Cheng
2022-07-14 12:01 ` Brian Foster [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YtAFix81eUBjzgGN@bfoster \
--to=bfoster@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=bhe@redhat.com \
--cc=chengzhihao1@huawei.com \
--cc=ebiederm@xmission.com \
--cc=kaleshsingh@google.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=willy@infradead.org \
--cc=yukuai3@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.