public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
@ 2021-03-11 23:38 Jim Newsome
  2021-03-12 16:41 ` Oleg Nesterov
  0 siblings, 1 reply; 3+ messages in thread
From: Jim Newsome @ 2021-03-11 23:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Oleg Nesterov, Eric W . Biederman, Christian Brauner,
	linux-kernel, Jim Newsome

do_wait is an internal function used to implement waitpid, waitid,
wait4, etc. To handle the general case, it does an O(n) linear scan of
the thread group's children and tracees.

This patch adds a special-case when waiting on a pid to skip these scans
and instead do an O(1) lookup. This improves performance when waiting on
a pid from a thread group with many children and/or tracees.

Signed-off-by: James Newsome <jnewsome@torproject.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
---

v3: https://lkml.org/lkml/2021/3/9/1134

Oleg - I kept your "reviewed-by", but LMK if I should drop it; wasn't
sure whether these changes were enough to have to drop it or not. Also
while making the other requested changes I found the code was cleaner
with a helper function after all, which I named `is_effectively_child`.

 kernel/exit.c | 73 ++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 63 insertions(+), 10 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 04029e35e69a..e0fd782463c5 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1439,9 +1439,54 @@ void __wake_up_parent(struct task_struct *p, struct task_struct *parent)
 			   TASK_INTERRUPTIBLE, p);
 }
 
+static bool is_effectively_child(struct wait_opts *wo, bool ptrace,
+				 struct task_struct *target)
+{
+	struct task_struct *parent =
+		!ptrace ? target->real_parent : target->parent;
+
+	return current == parent || (!(wo->wo_flags & __WNOTHREAD) &&
+				     same_thread_group(current, parent));
+}
+
+/*
+ * Optimization for waiting on PIDTYPE_PID. No need to iterate through child
+ * and tracee lists to find the target task.
+ */
+static int do_wait_pid(struct wait_opts *wo)
+{
+	bool ptrace;
+	struct task_struct *target;
+	int retval;
+
+	ptrace = false;
+
+	/* A non-ptrace wait can only be performed on a thread group leader. */
+	target = pid_task(wo->wo_pid, PIDTYPE_TGID);
+
+	if (target && is_effectively_child(wo, ptrace, target)) {
+		retval = wait_consider_task(wo, ptrace, target);
+		if (retval)
+			return retval;
+	}
+
+	ptrace = true;
+
+	/* A ptrace wait can be done on non-thread-group-leaders. */
+	if (!target)
+		target = pid_task(wo->wo_pid, PIDTYPE_PID);
+
+	if (target && is_effectively_child(wo, ptrace, target)) {
+		retval = wait_consider_task(wo, ptrace, target);
+		if (retval)
+			return retval;
+	}
+
+	return 0;
+}
+
 static long do_wait(struct wait_opts *wo)
 {
-	struct task_struct *tsk;
 	int retval;
 
 	trace_sched_process_wait(wo->wo_pid);
@@ -1463,19 +1508,27 @@ static long do_wait(struct wait_opts *wo)
 
 	set_current_state(TASK_INTERRUPTIBLE);
 	read_lock(&tasklist_lock);
-	tsk = current;
-	do {
-		retval = do_wait_thread(wo, tsk);
-		if (retval)
-			goto end;
 
-		retval = ptrace_do_wait(wo, tsk);
+	if (wo->wo_type == PIDTYPE_PID) {
+		retval = do_wait_pid(wo);
 		if (retval)
 			goto end;
+	} else {
+		struct task_struct *tsk = current;
+
+		do {
+			retval = do_wait_thread(wo, tsk);
+			if (retval)
+				goto end;
 
-		if (wo->wo_flags & __WNOTHREAD)
-			break;
-	} while_each_thread(current, tsk);
+			retval = ptrace_do_wait(wo, tsk);
+			if (retval)
+				goto end;
+
+			if (wo->wo_flags & __WNOTHREAD)
+				break;
+		} while_each_thread(current, tsk);
+	}
 	read_unlock(&tasklist_lock);
 
 notask:
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v4] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
  2021-03-11 23:38 [PATCH v4] do_wait: make PIDTYPE_PID case O(1) instead of O(n) Jim Newsome
@ 2021-03-12 16:41 ` Oleg Nesterov
  2021-03-12 16:51   ` Jim Newsome
  0 siblings, 1 reply; 3+ messages in thread
From: Oleg Nesterov @ 2021-03-12 16:41 UTC (permalink / raw)
  To: Jim Newsome
  Cc: Andrew Morton, Eric W . Biederman, Christian Brauner,
	linux-kernel

On 03/11, Jim Newsome wrote:
>
> +static bool is_effectively_child(struct wait_opts *wo, bool ptrace,
> +				 struct task_struct *target)
> +{
> +	struct task_struct *parent =
> +		!ptrace ? target->real_parent : target->parent;
> +
> +	return current == parent || (!(wo->wo_flags & __WNOTHREAD) &&
> +				     same_thread_group(current, parent));
> +}
> +
> +/*
> + * Optimization for waiting on PIDTYPE_PID. No need to iterate through child
> + * and tracee lists to find the target task.
> + */
> +static int do_wait_pid(struct wait_opts *wo)
> +{
> +	bool ptrace;
> +	struct task_struct *target;
> +	int retval;
> +
> +	ptrace = false;
> +
> +	/* A non-ptrace wait can only be performed on a thread group leader. */
> +	target = pid_task(wo->wo_pid, PIDTYPE_TGID);
> +
> +	if (target && is_effectively_child(wo, ptrace, target)) {
> +		retval = wait_consider_task(wo, ptrace, target);
> +		if (retval)
> +			return retval;
> +	}
> +
> +	ptrace = true;
> +
> +	/* A ptrace wait can be done on non-thread-group-leaders. */
> +	if (!target)
> +		target = pid_task(wo->wo_pid, PIDTYPE_PID);
> +
> +	if (target && is_effectively_child(wo, ptrace, target)) {
> +		retval = wait_consider_task(wo, ptrace, target);

No, this is not right... You need to check target->ptrace != 0.

I know that Eric suggests to not use thread_group_leader() and I won't argue
even if I don't really agree.

Up to you, but to me something like

	do_wait_pid()
	{
		target = pid_task(wo->wo_pid, PIDTYPE_PID);

		if (!target)
			return 0;

		if (thread_group_leader(target) &&
		    is_effectively_child(wo, 0, target) {
			...			
		}

		if (target->ptrace &&
		    is_effectively_child(wo, 1, target) {
			...
		}

		return 0;

	}

looks more simple/clean.

Oleg.


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v4] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
  2021-03-12 16:41 ` Oleg Nesterov
@ 2021-03-12 16:51   ` Jim Newsome
  0 siblings, 0 replies; 3+ messages in thread
From: Jim Newsome @ 2021-03-12 16:51 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Eric W . Biederman, Christian Brauner,
	linux-kernel


On 3/12/21 10:41, Oleg Nesterov wrote:
> On 03/11, Jim Newsome wrote:
>> +
>> +	if (target && is_effectively_child(wo, ptrace, target)) {
>> +		retval = wait_consider_task(wo, ptrace, target);
> No, this is not right... You need to check target->ptrace != 0.

Shoot; got lost in the shuffle. Sorry about that and thanks for catching!

> I know that Eric suggests to not use thread_group_leader() and I won't argue
> even if I don't really agree.
>
> Up to you, but to me something like
>
> 	do_wait_pid()
> 	{
> 		target = pid_task(wo->wo_pid, PIDTYPE_PID);
>
> 		if (!target)
> 			return 0;
>
> 		if (thread_group_leader(target) &&
> 		    is_effectively_child(wo, 0, target) {
> 			...			
> 		}
>
> 		if (target->ptrace &&
> 		    is_effectively_child(wo, 1, target) {
> 			...
> 		}
>
> 		return 0;
>
> 	}
>
> looks more simple/clean.

I like that a little better too. I'll go this way since Eric seemed Ok
with either way.

If we do that then it might make sense to move the `thread_group_leader`
filter into `is_effectively_child`, but maybe that obscures what the
latter is doing too much. It'd at least have to be renamed, and I'm not
sure of a clear name that'd capture exactly what it's doing. Maybe
`is_valid_waitee`?

If I don't hear anything I'll just go with how you've already proposed.

v5 coming in a bit. I'll drop your (Oleg's) reviewed-by since it's
changed substantially since then.


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-03-12 16:52 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-03-11 23:38 [PATCH v4] do_wait: make PIDTYPE_PID case O(1) instead of O(n) Jim Newsome
2021-03-12 16:41 ` Oleg Nesterov
2021-03-12 16:51   ` Jim Newsome

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox