From: ebiederm@xmission.com (Eric W. Biederman)
To: Jim Newsome
Cc: Andrew Morton, Oleg Nesterov, Christian Brauner, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
Date: Wed, 10 Mar 2021 16:40:57 -0600
References: <20210309203919.15920-1-jnewsome@torproject.org>
In-Reply-To: <20210309203919.15920-1-jnewsome@torproject.org> (Jim Newsome's message of "Tue, 9 Mar 2021 14:39:19 -0600")
X-Mailing-List: linux-kernel@vger.kernel.org

Jim Newsome writes:

> do_wait is an internal function used to implement waitpid, waitid,
> wait4, etc. To handle the general case, it does an O(n) linear scan of
> the thread group's children and tracees.
>
> This patch adds a special-case when waiting on a pid to skip these scans
> and instead do an O(1) lookup. This improves performance when waiting on
> a pid from a thread group with many children and/or tracees.
>
> Signed-off-by: James Newsome
> ---
>  kernel/exit.c | 53 +++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 43 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 04029e35e69a..c2438d4ba262 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -1439,9 +1439,34 @@ void __wake_up_parent(struct task_struct *p, struct task_struct *parent)
>  				TASK_INTERRUPTIBLE, p);
>  }
>
> +// Optimization for waiting on PIDTYPE_PID. No need to iterate through child
> +// and tracee lists to find the target task.

Minor nit: C++ style comments look very out of place in this file, which
uses old-school C /* */ comment delimiters for all of its block
comments.

> +static int do_wait_pid(struct wait_opts *wo)
> +{
> +	struct task_struct *target = pid_task(wo->wo_pid, PIDTYPE_PID);
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is a subtle change in behavior.

Today we only place thread group leaders on the task->children list.
That means your do_wait_pid can wait for a thread of someone else's
process, and that is a change in behavior.

So the code either needs a thread_group_leader filter on target before
the ptrace=0 case, or we need to use "pid_task(wo->wo_pid, PIDTYPE_TGID)"
for the "ptrace=0" case and "pid_task(wo->wo_pid, PIDTYPE_PID)" for the
"ptrace=1" case.

I would like to make thread_group_leaders go away, so I would favor two
pid_task calls. But either will work right now.

Eric

> +	int retval;
> +
> +	if (!target)
> +		return 0;
> +	if (current == target->real_parent ||
> +	    (!(wo->wo_flags & __WNOTHREAD) &&
> +	     same_thread_group(current, target->real_parent))) {
> +		retval = wait_consider_task(wo, /* ptrace= */ 0, target);
> +		if (retval)
> +			return retval;
> +	}
> +	if (target->ptrace && (current == target->parent ||
> +			       (!(wo->wo_flags & __WNOTHREAD) &&
> +				same_thread_group(current, target->parent)))) {
> +		retval = wait_consider_task(wo, /* ptrace= */ 1, target);
> +		if (retval)
> +			return retval;
> +	}
> +	return 0;
> +}
> +
>  static long do_wait(struct wait_opts *wo)
>  {
> -	struct task_struct *tsk;
>  	int retval;
>
>  	trace_sched_process_wait(wo->wo_pid);
> @@ -1463,19 +1488,27 @@ static long do_wait(struct wait_opts *wo)
>
>  	set_current_state(TASK_INTERRUPTIBLE);
>  	read_lock(&tasklist_lock);
> -	tsk = current;
> -	do {
> -		retval = do_wait_thread(wo, tsk);
> -		if (retval)
> -			goto end;
>
> -		retval = ptrace_do_wait(wo, tsk);
> +	if (wo->wo_type == PIDTYPE_PID) {
> +		retval = do_wait_pid(wo);
>  		if (retval)
>  			goto end;
> +	} else {
> +		struct task_struct *tsk = current;
>
> -		if (wo->wo_flags & __WNOTHREAD)
> -			break;
> -	} while_each_thread(current, tsk);
> +		do {
> +			retval = do_wait_thread(wo, tsk);
> +			if (retval)
> +				goto end;
> +
> +			retval = ptrace_do_wait(wo, tsk);
> +			if (retval)
> +				goto end;
> +
> +			if (wo->wo_flags & __WNOTHREAD)
> +				break;
> +		} while_each_thread(current, tsk);
> +	}
>  	read_unlock(&tasklist_lock);
>
>  notask:
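
To make the PIDTYPE_PID versus PIDTYPE_TGID point concrete, here is a toy
user-space model, not kernel code. Every name in it (toy_task,
toy_pid_task, wait_single_lookup, wait_two_lookups) is made up for
illustration. It assumes a TGID-style lookup only ever resolves to a thread
group leader, and it deliberately ignores __WNOTHREAD and
same_thread_group(), so it only models the "current == parent" checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for task_struct; illustration only. */
struct toy_task {
	int pid;				/* per-thread id */
	int tgid;				/* pid of the thread group leader */
	bool ptraced;				/* models target->ptrace != 0 */
	const struct toy_task *real_parent;
	const struct toy_task *parent;		/* tracer if ptraced */
};

enum toy_lookup { TOY_PID, TOY_TGID };
enum toy_result { TOY_NONE, TOY_AS_CHILD, TOY_AS_TRACEE };

/* One parent (100) whose child process 200 has two extra threads. */
static const struct toy_task toy_tasks[4] = {
	{ 100, 100, false, NULL, NULL },
	{ 200, 200, false, &toy_tasks[0], &toy_tasks[0] },	/* leader */
	{ 201, 200, false, &toy_tasks[0], &toy_tasks[0] },	/* plain thread */
	{ 202, 200, true,  &toy_tasks[0], &toy_tasks[0] },	/* traced thread */
};

/*
 * Hypothetical analogue of pid_task(). The linear scan is only for brevity.
 * A TOY_TGID lookup refuses to return a non-leader thread.
 */
static const struct toy_task *toy_pid_task(int pid, enum toy_lookup type)
{
	for (size_t i = 0; i < sizeof(toy_tasks) / sizeof(toy_tasks[0]); i++) {
		if (toy_tasks[i].pid != pid)
			continue;
		if (type == TOY_TGID && toy_tasks[i].tgid != pid)
			return NULL;
		return &toy_tasks[i];
	}
	return NULL;
}

/* Shape of the v3 patch: one PID lookup feeds both checks. */
static enum toy_result wait_single_lookup(const struct toy_task *cur, int pid)
{
	const struct toy_task *t = toy_pid_task(pid, TOY_PID);

	if (!t)
		return TOY_NONE;
	if (t->real_parent == cur)
		return TOY_AS_CHILD;
	if (t->ptraced && t->parent == cur)
		return TOY_AS_TRACEE;
	return TOY_NONE;
}

/* Shape of the suggestion: TGID lookup for ptrace=0, PID lookup for ptrace=1. */
static enum toy_result wait_two_lookups(const struct toy_task *cur, int pid)
{
	const struct toy_task *t = toy_pid_task(pid, TOY_TGID);

	if (t && t->real_parent == cur)
		return TOY_AS_CHILD;
	t = toy_pid_task(pid, TOY_PID);
	if (t && t->ptraced && t->parent == cur)
		return TOY_AS_TRACEE;
	return TOY_NONE;
}

/* Usage example: the two variants disagree only on non-leader threads. */
void toy_self_check(void)
{
	const struct toy_task *cur = &toy_tasks[0];

	assert(wait_single_lookup(cur, 200) == TOY_AS_CHILD);
	assert(wait_two_lookups(cur, 200) == TOY_AS_CHILD);
	assert(wait_single_lookup(cur, 201) == TOY_AS_CHILD);	/* behavior change */
	assert(wait_two_lookups(cur, 201) == TOY_NONE);
	assert(wait_two_lookups(cur, 202) == TOY_AS_TRACEE);
}
```

In this model the single-lookup variant treats thread 201 as a waitable
child, which a walk over a leaders-only children list would never do. The
two-lookup variant rejects it and only reaches a non-leader thread through
the tracee path.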