From: Andy Lutomirski
Date: Mon, 13 Jul 2020 08:59:04 -0700
Subject: Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
To: Nicholas Piggin
References: <20200710015646.2020871-1-npiggin@gmail.com>
 <20200710015646.2020871-8-npiggin@gmail.com>
In-Reply-To: <20200710015646.2020871-8-npiggin@gmail.com>
List-Id: Linux on PowerPC Developers Mail List
Cc: linux-arch, Arnd Bergmann, Peter Zijlstra, X86 ML,
 LKML, Linux-MM, Mathieu Desnoyers, linuxppc-dev

On Thu, Jul 9, 2020 at 6:57 PM Nicholas Piggin wrote:
>
> On big systems, the mm refcount can become highly contended when doing
> a lot of context switching with threaded applications (particularly
> switching between the idle thread and an application thread).
>
> Abandoning lazy tlb slows switching down quite a bit in the important
> user->idle->user cases, so instead implement a non-refcounted scheme
> that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
> any remaining lazy ones.
>
> On a 16-socket 192-core POWER8 system, a context switching benchmark
> with as many software threads as CPUs (so each switch will go in and
> out of idle), upstream can achieve a rate of about 1 million context
> switches per second. After this patch it goes up to 118 million.
>

I read the patch a couple of times, and I have a suggestion that could
be nonsense. You are, effectively, using mm_cpumask() as a sort of
refcount. You're saying "hey, this mm has no more references, but it
still has nonempty mm_cpumask(), so let's send an IPI and shoot down
those references too." I'm wondering whether you actually need the IPI.
What if, instead, you actually treated mm_cpumask as a refcount for
real? Roughly, in __mmdrop(), you would only free the page tables if
mm_cpumask() is empty. And, in the code that removes a CPU from
mm_cpumask(), you would check if mm_users == 0 and, if so, check if you
just removed the last bit from mm_cpumask and potentially free the mm.

Getting the locking right here could be a bit tricky -- you need to
avoid two CPUs simultaneously exiting lazy TLB and thinking they should
free the mm, and you also need to avoid an mm with mm_users hitting zero
concurrently with the last remote CPU using it lazily exiting lazy TLB.
Perhaps this could be resolved by having mm_count == 1 mean
"mm_cpumask() might contain bits and, if so, it owns the mm" and
mm_count == 0 mean "now it's dead", and using some careful cmpxchg or
dec_return to make sure that only one CPU frees it. Or maybe you'd need
a lock or RCU for this, but the idea would be to only ever take the lock
after mm_users goes to zero.

--Andy
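
To make the cmpxchg idea above a bit more concrete, here is a rough
userspace sketch using C11 atomics. It is not kernel code: struct
mm_model and every mm_model_*() helper are hypothetical names invented
for illustration, the cpumask is a single unsigned long, and "freeing"
just sets a flag. It assumes that no CPU can newly enter lazy mode once
mm_users has reached zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct mm_struct; all names here are illustrative. */
struct mm_model {
    atomic_int   mm_users;  /* real users of the address space */
    atomic_int   mm_count;  /* 1: cpumask may still own the mm, 0: dead */
    atomic_ulong cpumask;   /* one bit per CPU holding the mm lazily */
    bool         freed;     /* set once by whoever wins the race */
};

static void mm_model_init(struct mm_model *mm, int users)
{
    atomic_init(&mm->mm_users, users);
    atomic_init(&mm->mm_count, 1);
    atomic_init(&mm->cpumask, 0);
    mm->freed = false;
}

/* Free only if nothing references the mm, and only on the 1 -> 0 step. */
static bool mm_model_try_free(struct mm_model *mm)
{
    int expected = 1;

    if (atomic_load(&mm->mm_users) != 0 || atomic_load(&mm->cpumask) != 0)
        return false;
    if (!atomic_compare_exchange_strong(&mm->mm_count, &expected, 0))
        return false;           /* somebody else already won */
    assert(!mm->freed);         /* would catch a double free */
    mm->freed = true;           /* stands in for freeing the page tables */
    return true;
}

/* CPU starts using the mm lazily. */
static void mm_model_enter_lazy(struct mm_model *mm, unsigned int cpu)
{
    atomic_fetch_or(&mm->cpumask, 1UL << cpu);
}

/* Drop one user; the last user attempts the free (the __mmdrop() side). */
static bool mm_model_put_user(struct mm_model *mm)
{
    if (atomic_fetch_sub(&mm->mm_users, 1) != 1)
        return false;
    return mm_model_try_free(mm);
}

/* CPU leaves lazy TLB mode; whoever clears the last bit attempts the free. */
static bool mm_model_exit_lazy(struct mm_model *mm, unsigned int cpu)
{
    unsigned long bit = 1UL << cpu;
    unsigned long old = atomic_fetch_and(&mm->cpumask, ~bit);

    if ((old & ~bit) != 0)
        return false;           /* other CPUs are still lazy */
    return mm_model_try_free(mm);
}
```

Both the last-user path and the last-lazy-CPU path funnel into the same
compare-and-exchange, so at most one caller sees the 1 -> 0 transition.
With the default sequentially consistent ordering, I would expect at
least one of two racing paths to observe the other's update, but that
argument only covers this toy model and says nothing about per-arch
mm_cpumask() semantics in the real kernel.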