From: Andy Lutomirski
Date: Mon, 13 Jul 2020 08:59:04 -0700
Subject: Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
To: Nicholas Piggin
Cc: linux-arch, X86 ML, Mathieu Desnoyers, Arnd Bergmann, Peter Zijlstra,
    LKML, linuxppc-dev, Linux-MM, Anton Blanchard
References: <20200710015646.2020871-1-npiggin@gmail.com>
    <20200710015646.2020871-8-npiggin@gmail.com>
In-Reply-To: <20200710015646.2020871-8-npiggin@gmail.com>

On Thu, Jul 9, 2020 at 6:57 PM Nicholas Piggin wrote:
>
> On big systems, the mm refcount can become highly contended when doing
> a lot of context switching with threaded applications (particularly
> switching between the idle thread and an application thread).
>
> Abandoning lazy tlb slows switching down quite a bit in the important
> user->idle->user cases, so instead implement a non-refcounted scheme
> that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
> any remaining lazy ones.
>
> On a 16-socket 192-core POWER8 system, a context switching benchmark
> with as many software threads as CPUs (so each switch will go in and
> out of idle), upstream can achieve a rate of about 1 million context
> switches per second. After this patch it goes up to 118 million.
>

I read the patch a couple of times, and I have a suggestion that could
be nonsense. You are, effectively, using mm_cpumask() as a sort of
refcount. You're saying "hey, this mm has no more references, but it
still has nonempty mm_cpumask(), so let's send an IPI and shoot down
those references too." I'm wondering whether you actually need the IPI.

What if, instead, you actually treated mm_cpumask() as a refcount for
real? Roughly, in __mmdrop(), you would only free the page tables if
mm_cpumask() is empty. And, in the code that removes a CPU from
mm_cpumask(), you would check if mm_users == 0 and, if so, check if you
just removed the last bit from mm_cpumask() and potentially free the mm.
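
To make the shape of that concrete, here is a rough userspace sketch. It is
not kernel code: struct toy_mm, toy_drop_user() and toy_exit_lazy() are
made-up names, the CPU mask is a plain unsigned long, and toy_drop_user()
merely stands in for the path that would end up in __mmdrop() once the last
user is gone. It is also single-threaded on purpose, so it ignores every
race.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct mm_struct; every name here is hypothetical. */
struct toy_mm {
    int mm_users;           /* user references */
    unsigned long cpumask;  /* bit n set: CPU n may still use the mm lazily */
    bool freed;             /* set once the page tables are "freed" */
};

static void toy_free_mm(struct toy_mm *mm)
{
    assert(!mm->freed);     /* freeing twice would be a bug */
    mm->freed = true;
}

/* Drop one user reference; free only if no CPU is left in the mask. */
static void toy_drop_user(struct toy_mm *mm)
{
    if (--mm->mm_users == 0 && mm->cpumask == 0)
        toy_free_mm(mm);
}

/* A CPU leaves lazy TLB mode: clear its bit, free if it was the last holder. */
static void toy_exit_lazy(struct toy_mm *mm, unsigned int cpu)
{
    mm->cpumask &= ~(1UL << cpu);
    if (mm->mm_users == 0 && mm->cpumask == 0)
        toy_free_mm(mm);
}
```

With mm_users == 1 and two bits set in the mask, toy_drop_user() frees
nothing, and the mm is only freed by whichever toy_exit_lazy() call clears
the last bit.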

Getting the locking right here could be a bit tricky -- you need to
avoid two CPUs simultaneously exiting lazy TLB and thinking they should
free the mm, and you also need to avoid an mm with mm_users hitting zero
concurrently with the last remote CPU using it lazily exiting lazy TLB.
Perhaps this could be resolved by having mm_count == 1 mean
"mm_cpumask() might contain bits and, if so, it owns the mm" and
mm_count == 0 mean "now it's dead", and using some careful cmpxchg or
dec_return to make sure that only one CPU frees it. Or maybe you'd need
a lock or RCU for this, but the idea would be to only ever take the lock
after mm_users goes to zero.

--Andy
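
One way the mm_count == 1 / mm_count == 0 handoff might look, again as a
userspace sketch with C11 atomics rather than kernel primitives. All of the
names (struct atomic_mm, amm_try_free() and friends) are invented for
illustration, and the sketch assumes that no CPU sets its bit in the mask
after mm_users has reached zero. Whether that holds in the real context
switch path is exactly the part that would need checking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model; not the kernel's struct mm_struct. */
struct atomic_mm {
    atomic_int mm_users;
    atomic_int mm_count;     /* 1: the mask may own the mm, 0: dead */
    atomic_ulong cpumask;    /* bit n set: CPU n may use the mm lazily */
    atomic_int times_freed;  /* bookkeeping so double frees are visible */
};

static void amm_init(struct atomic_mm *mm, int users, unsigned long mask)
{
    atomic_init(&mm->mm_users, users);
    atomic_init(&mm->mm_count, 1);
    atomic_init(&mm->cpumask, mask);
    atomic_init(&mm->times_freed, 0);
}

/* Only the caller that moves mm_count from 1 to 0 gets to free. */
static bool amm_try_free(struct atomic_mm *mm)
{
    int expected = 1;

    if (!atomic_compare_exchange_strong(&mm->mm_count, &expected, 0))
        return false;        /* somebody else already won */
    atomic_fetch_add(&mm->times_freed, 1);
    return true;
}

/* Last user going away: free only if no lazy CPU remains. */
static bool amm_drop_user(struct atomic_mm *mm)
{
    if (atomic_fetch_sub(&mm->mm_users, 1) != 1)
        return false;        /* other users remain */
    if (atomic_load(&mm->cpumask) != 0)
        return false;        /* the last lazy CPU will free instead */
    return amm_try_free(mm);
}

/* A CPU leaving lazy TLB: clear its bit, then check for last holder. */
static bool amm_exit_lazy(struct atomic_mm *mm, unsigned int cpu)
{
    unsigned long bit = 1UL << cpu;
    unsigned long old = atomic_fetch_and(&mm->cpumask, ~bit);

    if ((old & ~bit) != 0)
        return false;        /* other lazy CPUs remain */
    if (atomic_load(&mm->mm_users) != 0)
        return false;        /* users remain; amm_drop_user() will check */
    return amm_try_free(mm);
}
```

Both paths write their own state first and read the other side's state
second, using sequentially consistent atomics, so at least one of two racing
callers should see the other's update and reach amm_try_free(). The
compare-and-exchange then lets exactly one of them through.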