From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-20.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E66A9C43460 for ; Fri, 7 May 2021 01:04:25 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8136561289 for ; Fri, 7 May 2021 01:04:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8136561289 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0F9246B00AD; Thu, 6 May 2021 21:04:25 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0A8286B00AE; Thu, 6 May 2021 21:04:25 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E40516B00AF; Thu, 6 May 2021 21:04:24 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0034.hostedemail.com [216.40.44.34]) by kanga.kvack.org (Postfix) with ESMTP id C161F6B00AD for ; Thu, 6 May 2021 21:04:24 -0400 (EDT) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 8671B8249980 for ; Fri, 7 May 2021 01:04:24 +0000 (UTC) X-FDA: 78112639248.19.C3FCDF0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf13.hostedemail.com (Postfix) with ESMTP id 44237E000118 for ; Fri, 7 May 2021 01:04:11 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id 014A2610F7; Fri, 7 May 2021 01:04:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1620349463; bh=QBPEZG2WgYOW+t5sc5Dew2EQsTta3czEDUc52L3XMMA=; h=Date:From:To:Subject:In-Reply-To:From; b=LscgNs/mIoopr9wOnMBV2u7fK3F8chD+2sjUcwb5a66AR/Ft64jlRFcOJ922t7oTu e8urWn8L8qyGMIc5PqCPjcy48sJ7OKjymSUTS7P+oHVvUJCPi7qpw7dn8p2EQ1gvfg eQK7fvOEYO2webBDRgaaQ/uTU3UwDecTjyAC00jY= Date: Thu, 06 May 2021 18:04:22 -0700 From: Andrew Morton To: akpm@linux-foundation.org, christian@brauner.io, ebiederm@xmission.com, jnewsome@torproject.org, linux-mm@kvack.org, mm-commits@vger.kernel.org, oleg@redhat.com, torvalds@linux-foundation.org Subject: [patch 45/91] do_wait: make PIDTYPE_PID case O(1) instead of O(n) Message-ID: <20210507010422.5op90-oWN%akpm@linux-foundation.org> In-Reply-To: <20210506180126.03e1baee7ca52bedb6cc6003@linux-foundation.org> User-Agent: s-nail v14.8.16 Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b="LscgNs/m"; dmarc=none; spf=pass (imf13.hostedemail.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspamd-Server: rspam03 X-Stat-Signature: nsmfugznxsoatwwndm4u5e6wpnqbsi9x X-Rspamd-Queue-Id: 44237E000118 Received-SPF: none (linux-foundation.org>: No applicable sender policy available) receiver=imf13; identity=mailfrom; envelope-from=""; helo=mail.kernel.org; client-ip=198.145.29.99 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1620349451-120309 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Jim Newsome Subject: do_wait: make PIDTYPE_PID case O(1) instead of O(n) Add a special-case when waiting on a pid (via waitpid, waitid, wait4, etc) to avoid doing an O(n) scan of children and tracees, and instead do an O(1) lookup. This improves performance when waiting on a pid from a thread group with many children and/or tracees. Time to fork and then call waitpid on the child, from a task that already has N children [1]: N | Before | After -----|---------|------ 1 | 74 us | 74 us 20 | 72 us | 75 us 100 | 83 us | 77 us 500 | 99 us | 74 us 1000 | 179 us | 75 us 5000 | 804 us | 79 us 8000 | 1268 us | 78 us [1]: https://lkml.org/lkml/2021/3/12/1567 This can make a substantial performance improvement for applications with a thread that has many children or tracees and frequently needs to wait on them. Tools that use ptrace to intercept syscalls for a large number of processes are likely to fall into this category. In particular this patch was developed while building a ptrace-based second generation of the Shadow emulator [2], for which it allows us to avoid quadratic scaling (without having to use a workaround that introduces a ~40% performance penalty) [3]. Other examples of tools that fall into this category which this patch may help include User Mode Linux [4] and DetTrace [5]. [2]: https://shadow.github.io/ [3]: https://github.com/shadow/shadow/issues/1134#issuecomment-798992292 [4]: https://en.wikipedia.org/wiki/User-mode_Linux [5]: https://github.com/dettrace/dettrace Link: https://lkml.kernel.org/r/20210314231544.9379-1-jnewsome@torproject.org Signed-off-by: James Newsome Reviewed-by: Oleg Nesterov Cc: "Eric W . Biederman" Cc: Christian Brauner Signed-off-by: Andrew Morton --- kernel/exit.c | 67 ++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 57 insertions(+), 10 deletions(-) --- a/kernel/exit.c~do_wait-make-pidtype_pid-case-o1-instead-of-on +++ a/kernel/exit.c @@ -1440,9 +1440,48 @@ void __wake_up_parent(struct task_struct TASK_INTERRUPTIBLE, p); } +static bool is_effectively_child(struct wait_opts *wo, bool ptrace, + struct task_struct *target) +{ + struct task_struct *parent = + !ptrace ? target->real_parent : target->parent; + + return current == parent || (!(wo->wo_flags & __WNOTHREAD) && + same_thread_group(current, parent)); +} + +/* + * Optimization for waiting on PIDTYPE_PID. No need to iterate through child + * and tracee lists to find the target task. + */ +static int do_wait_pid(struct wait_opts *wo) +{ + bool ptrace; + struct task_struct *target; + int retval; + + ptrace = false; + target = pid_task(wo->wo_pid, PIDTYPE_TGID); + if (target && is_effectively_child(wo, ptrace, target)) { + retval = wait_consider_task(wo, ptrace, target); + if (retval) + return retval; + } + + ptrace = true; + target = pid_task(wo->wo_pid, PIDTYPE_PID); + if (target && target->ptrace && + is_effectively_child(wo, ptrace, target)) { + retval = wait_consider_task(wo, ptrace, target); + if (retval) + return retval; + } + + return 0; +} + static long do_wait(struct wait_opts *wo) { - struct task_struct *tsk; int retval; trace_sched_process_wait(wo->wo_pid); @@ -1464,19 +1503,27 @@ repeat: set_current_state(TASK_INTERRUPTIBLE); read_lock(&tasklist_lock); - tsk = current; - do { - retval = do_wait_thread(wo, tsk); - if (retval) - goto end; - retval = ptrace_do_wait(wo, tsk); + if (wo->wo_type == PIDTYPE_PID) { + retval = do_wait_pid(wo); if (retval) goto end; + } else { + struct task_struct *tsk = current; - if (wo->wo_flags & __WNOTHREAD) - break; - } while_each_thread(current, tsk); + do { + retval = do_wait_thread(wo, tsk); + if (retval) + goto end; + + retval = ptrace_do_wait(wo, tsk); + if (retval) + goto end; + + if (wo->wo_flags & __WNOTHREAD) + break; + } while_each_thread(current, tsk); + } read_unlock(&tasklist_lock); notask: _