All of lore.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Jim Newsome <jnewsome@torproject.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Christian Brauner <christian@brauner.io>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
Date: Wed, 10 Mar 2021 16:40:57 -0600	[thread overview]
Message-ID: <m1blbqmy2u.fsf@fess.ebiederm.org> (raw)
In-Reply-To: <20210309203919.15920-1-jnewsome@torproject.org> (Jim Newsome's message of "Tue, 9 Mar 2021 14:39:19 -0600")

Jim Newsome <jnewsome@torproject.org> writes:

> do_wait is an internal function used to implement waitpid, waitid,
> wait4, etc. To handle the general case, it does an O(n) linear scan of
> the thread group's children and tracees.
>
> This patch adds a special-case when waiting on a pid to skip these scans
> and instead do an O(1) lookup. This improves performance when waiting on
> a pid from a thread group with many children and/or tracees.
>
> Signed-off-by: James Newsome <jnewsome@torproject.org>
> ---
>  kernel/exit.c | 53 +++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 43 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 04029e35e69a..c2438d4ba262 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -1439,9 +1439,34 @@ void __wake_up_parent(struct task_struct *p, struct task_struct *parent)
>  			   TASK_INTERRUPTIBLE, p);
>  }
>  
> +// Optimization for waiting on PIDTYPE_PID. No need to iterate through child
> +// and tracee lists to find the target task.

Minor nit:  C++ style comments look very out of place in this file
            which uses old school C /* */ comment delimiters for
            all of it's block comments.
                      
> +static int do_wait_pid(struct wait_opts *wo)
> +{
> +	struct task_struct *target = pid_task(wo->wo_pid, PIDTYPE_PID);
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is subtle change in behavior.

Today on the task->children list we only place thread group leaders.

Which means that your do_wait_pid wait for thread of someone else's
process and that is a change in behavior.

So the code either needs a thread_group_leader filter on target before
the ptrace=0 case or we need to use "pid_task(wo->wo_pid, PIDTYPE_TGID)"
and "pid_task(wo->wo_pid, PIDTYPE_PID)" for the "ptrace=1" case.

I would like to make thread_group_leaders go away so I would favor two
pid_task calls.  But either will work right now.

Eric

                                     
> +	int retval;
> +
> +	if (!target)
> +		return 0;
> +	if (current == target->real_parent ||
> +	    (!(wo->wo_flags & __WNOTHREAD) &&
> +	     same_thread_group(current, target->real_parent))) {
> +		retval = wait_consider_task(wo, /* ptrace= */ 0, target);
> +		if (retval)
> +			return retval;
> +	}
> +	if (target->ptrace && (current == target->parent ||
> +			       (!(wo->wo_flags & __WNOTHREAD) &&
> +				same_thread_group(current, target->parent)))) {
> +		retval = wait_consider_task(wo, /* ptrace= */ 1, target);
> +		if (retval)
> +			return retval;
> +	}
> +	return 0;
> +}
> +
>  static long do_wait(struct wait_opts *wo)
>  {
> -	struct task_struct *tsk;
>  	int retval;
>  
>  	trace_sched_process_wait(wo->wo_pid);
> @@ -1463,19 +1488,27 @@ static long do_wait(struct wait_opts *wo)
>  
>  	set_current_state(TASK_INTERRUPTIBLE);
>  	read_lock(&tasklist_lock);
> -	tsk = current;
> -	do {
> -		retval = do_wait_thread(wo, tsk);
> -		if (retval)
> -			goto end;
>  
> -		retval = ptrace_do_wait(wo, tsk);
> +	if (wo->wo_type == PIDTYPE_PID) {
> +		retval = do_wait_pid(wo);
>  		if (retval)
>  			goto end;
> +	} else {
> +		struct task_struct *tsk = current;
>  
> -		if (wo->wo_flags & __WNOTHREAD)
> -			break;
> -	} while_each_thread(current, tsk);
> +		do {
> +			retval = do_wait_thread(wo, tsk);
> +			if (retval)
> +				goto end;
> +
> +			retval = ptrace_do_wait(wo, tsk);
> +			if (retval)
> +				goto end;
> +
> +			if (wo->wo_flags & __WNOTHREAD)
> +				break;
> +		} while_each_thread(current, tsk);
> +	}
>  	read_unlock(&tasklist_lock);
>  
>  notask:

  parent reply	other threads:[~2021-03-10 22:41 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-09 20:39 [PATCH v3] do_wait: make PIDTYPE_PID case O(1) instead of O(n) Jim Newsome
2021-03-10 17:27 ` Oleg Nesterov
2021-03-10 22:40 ` Eric W. Biederman [this message]
2021-03-11  0:14   ` Jim Newsome
2021-03-11 15:15     ` Oleg Nesterov
2021-03-11 16:26       ` Jim Newsome
2021-03-11 16:30     ` Eric W. Biederman
2021-03-11 15:08   ` Oleg Nesterov
2021-03-11 16:37     ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m1blbqmy2u.fsf@fess.ebiederm.org \
    --to=ebiederm@xmission.com \
    --cc=akpm@linux-foundation.org \
    --cc=christian@brauner.io \
    --cc=jnewsome@torproject.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.