From: Nicholas Piggin
To: Andrew Morton
Cc: Nicholas Piggin, Andy Lutomirski, Linus Torvalds, linux-arch, linux-mm, linuxppc-dev@lists.ozlabs.org
Subject: [PATCH v6 2/5] lazy tlb: allow lazy tlb mm refcounting to be configurable
Date: Wed, 18 Jan 2023 18:00:08 +1000
Message-Id: <20230118080011.2258375-3-npiggin@gmail.com>
In-Reply-To: <20230118080011.2258375-1-npiggin@gmail.com>
References: <20230118080011.2258375-1-npiggin@gmail.com>

Add CONFIG_MMU_LAZY_TLB_REFCOUNT, which enables refcounting of the lazy
tlb mm when it is context switched. Architectures that do not require
this refcounting can disable it, provided they clean up lazy tlb mms
when the last refcount is dropped. The option is currently always
enabled, which matches what the existing code does, so this patch is
effectively a no-op.

Rename rq->prev_mm to rq->prev_lazy_mm, because that's what it is.
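
For illustration only, and not part of this patch: a minimal user-space
sketch of the idea, under loudly stated assumptions. LAZY_TLB_REFCOUNT is
a hypothetical macro standing in for the Kconfig option, struct toy_mm
and the toy_* helpers are hypothetical stand-ins for mm_struct and the
kernel helpers, and a C11 sequentially consistent fence stands in for
smp_mb(). It shows only the shape of the change: with the option on, the
lazy helpers take and drop a real reference; with it off, they leave the
count alone but the drop side still provides a full barrier.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for CONFIG_MMU_LAZY_TLB_REFCOUNT; define as 0 to model =n. */
#ifndef LAZY_TLB_REFCOUNT
#define LAZY_TLB_REFCOUNT 1
#endif

/* Toy mm: only the reference count that the kernel calls mm_count. */
struct toy_mm {
	atomic_int mm_count;
};

static inline void toy_mmgrab(struct toy_mm *mm)
{
	atomic_fetch_add(&mm->mm_count, 1);
}

/* Returns 1 when the last reference was dropped (where the mm would be freed). */
static inline int toy_mmdrop(struct toy_mm *mm)
{
	int old = atomic_fetch_sub(&mm->mm_count, 1);

	assert(old > 0);	/* dropping a reference that was never taken is a bug */
	return old == 1;
}

/* Lazy variants: real refcounting only when the option is enabled. */
static inline void toy_mmgrab_lazy_tlb(struct toy_mm *mm)
{
	if (LAZY_TLB_REFCOUNT)
		toy_mmgrab(mm);
}

static inline int toy_mmdrop_lazy_tlb(struct toy_mm *mm)
{
	if (LAZY_TLB_REFCOUNT)
		return toy_mmdrop(mm);

	/* No reference to drop, but callers still rely on a full barrier here. */
	atomic_thread_fence(memory_order_seq_cst);
	return 0;
}
```

Building the sketch with -DLAZY_TLB_REFCOUNT=0 models an architecture
that has opted out: a lazy grab/drop pair then never touches mm_count,
which is where the reduced contention on the refcount would come from.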

Signed-off-by: Nicholas Piggin
---
 Documentation/mm/active_mm.rst |  6 ++++++
 arch/Kconfig                   | 17 +++++++++++++++++
 include/linux/sched/mm.h       | 18 +++++++++++++++---
 kernel/sched/core.c            | 22 ++++++++++++++++++----
 kernel/sched/sched.h           |  4 +++-
 5 files changed, 59 insertions(+), 8 deletions(-)

diff --git a/Documentation/mm/active_mm.rst b/Documentation/mm/active_mm.rst
index 6f8269c284ed..2b0d08332400 100644
--- a/Documentation/mm/active_mm.rst
+++ b/Documentation/mm/active_mm.rst
@@ -4,6 +4,12 @@
 Active MM
 =========
 
+Note, the mm_count refcount may no longer include the "lazy" users
+(running tasks with ->active_mm == mm && ->mm == NULL) on kernels
+with CONFIG_MMU_LAZY_TLB_REFCOUNT=n. Taking and releasing these lazy
+references must be done with mmgrab_lazy_tlb() and mmdrop_lazy_tlb()
+helpers, which abstract this config option.
+
 ::
 
  List:       linux-kernel
diff --git a/arch/Kconfig b/arch/Kconfig
index 12e3ddabac9d..b07d36f08fea 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -465,6 +465,23 @@ config ARCH_WANT_IRQS_OFF_ACTIVATE_MM
 	  irqs disabled over activate_mm. Architectures that do IPI based TLB
 	  shootdowns should enable this.
 
+# Use normal mm refcounting for MMU_LAZY_TLB kernel thread references.
+# MMU_LAZY_TLB_REFCOUNT=n can improve the scalability of context switching
+# to/from kernel threads when the same mm is running on a lot of CPUs (a large
+# multi-threaded application), by reducing contention on the mm refcount.
+#
+# This can be disabled if the architecture ensures no CPUs are using an mm as a
+# "lazy tlb" beyond its final refcount (i.e., by the time __mmdrop frees the mm
+# or its kernel page tables). This could be arranged by arch_exit_mmap(), or
+# final exit(2) TLB flush, for example.
+#
+# To implement this, an arch *must*:
+# Ensure the _lazy_tlb variants of mmgrab/mmdrop are used when dropping the
+# lazy reference of a kthread's ->active_mm (non-arch code has been converted
+# already).
+config MMU_LAZY_TLB_REFCOUNT
+	def_bool y
+
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool
 
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 5376caf6fcf3..68bbe8d90c2e 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -82,17 +82,29 @@ static inline void mmdrop_sched(struct mm_struct *mm)
 /* Helpers for lazy TLB mm refcounting */
 static inline void mmgrab_lazy_tlb(struct mm_struct *mm)
 {
-	mmgrab(mm);
+	if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT))
+		mmgrab(mm);
 }
 
 static inline void mmdrop_lazy_tlb(struct mm_struct *mm)
 {
-	mmdrop(mm);
+	if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT)) {
+		mmdrop(mm);
+	} else {
+		/*
+		 * mmdrop_lazy_tlb must provide a full memory barrier, see the
+		 * membarrier comment in finish_task_switch(), which relies on this.
+		 */
+		smp_mb();
+	}
 }
 
 static inline void mmdrop_lazy_tlb_sched(struct mm_struct *mm)
 {
-	mmdrop_sched(mm);
+	if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT))
+		mmdrop_sched(mm);
+	else
+		smp_mb(); // see above
 }
 
 /**
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 26aaa974ee6d..1ea14d849a0d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5081,7 +5081,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	__releases(rq->lock)
 {
 	struct rq *rq = this_rq();
-	struct mm_struct *mm = rq->prev_mm;
+	struct mm_struct *mm = NULL;
 	unsigned int prev_state;
 
 	/*
@@ -5100,7 +5100,10 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 		      current->comm, current->pid, preempt_count()))
 		preempt_count_set(FORK_PREEMPT_COUNT);
 
-	rq->prev_mm = NULL;
+#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT
+	mm = rq->prev_lazy_mm;
+	rq->prev_lazy_mm = NULL;
+#endif
 
 	/*
 	 * A task struct has one reference for the use as "current".
@@ -5231,9 +5234,20 @@ context_switch(struct rq *rq, struct task_struct *prev,
 		lru_gen_use_mm(next->mm);
 
 		if (!prev->mm) {                        // from kernel
-			/* will mmdrop_lazy_tlb() in finish_task_switch(). */
-			rq->prev_mm = prev->active_mm;
+#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT
+			/* Will mmdrop_lazy_tlb() in finish_task_switch(). */
+			rq->prev_lazy_mm = prev->active_mm;
 			prev->active_mm = NULL;
+#else
+			/*
+			 * Without MMU_LAZY_TLB_REFCOUNT there is no lazy
+			 * tracking (because no rq->prev_lazy_mm) in
+			 * finish_task_switch, so no mmdrop_lazy_tlb(), so no
+			 * memory barrier for membarrier (see the membarrier
+			 * comment in finish_task_switch()). Do it here.
+			 */
+			smp_mb();
+#endif
 		}
 	}
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 771f8ddb7053..33da8fa8b5a5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1009,7 +1009,9 @@ struct rq {
 	struct task_struct	*idle;
 	struct task_struct	*stop;
 	unsigned long		next_balance;
-	struct mm_struct	*prev_mm;
+#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT
+	struct mm_struct	*prev_lazy_mm;
+#endif
 
 	unsigned int		clock_update_flags;
 	u64			clock;
-- 
2.37.2