From: Michael Ellerman
To: Nicholas Piggin, "Aneesh Kumar K.V", linuxppc-dev@lists.ozlabs.org
Subject: Re: [RFC PATCH] powerpc/book3s64/radix: Upgrade va tlbie to PID tlbie if we cross PMD_SIZE
In-Reply-To: <1628053302.0qclx0xcj9.astroid@bobo.none>
References: <20210803143725.615186-1-aneesh.kumar@linux.ibm.com> <1628053302.0qclx0xcj9.astroid@bobo.none>
Date: Wed, 04 Aug 2021 16:59:25 +1000
Message-ID: <87im0lofua.fsf@mpe.ellerman.id.au>
List-Id: Linux on PowerPC Developers Mail List

Nicholas Piggin writes:
> Excerpts from Aneesh Kumar K.V's message of August 4, 2021 12:37 am:
>> With shared mapping, even though we are unmapping a large range, the kernel
>> will force a TLB flush with ptl lock held to avoid the race mentioned in
>> commit 1cf35d47712d ("mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts")
>> This results in the kernel issuing a high number of TLB flushes even for a large
>> range. This can be improved by making sure the kernel switch to pid based flush if the
>> kernel is unmapping a 2M range.
>
> It would be good to have a bit more description here.
>
> In any patch that changes a heuristic like this, I would like to see
> some justification or reasoning that could be refuted or used as a
> supporting argument if we ever wanted to change the heuristic later.
> Ideally with some of the obvious downsides listed as well.
>
> This "improves" things here, but what if it hurt things elsewhere, how
> would we come in later and decide to change it back?
>
> THP flushes for example, I think now they'll do PID flushes (if they
> have to be broadcast, which they will tend to be when khugepaged does
> them). So now that might increase jitter for THP and cause it to be a
> loss for more workloads.
>
> So where do you notice this? What's the benefit?

Ack. Needs some numbers and supporting evidence.

>> diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
>> index aefc100d79a7..21d0f098e43b 100644
>> --- a/arch/powerpc/mm/book3s64/radix_tlb.c
>> +++ b/arch/powerpc/mm/book3s64/radix_tlb.c
>> @@ -1106,7 +1106,7 @@ EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
>>   * invalidating a full PID, so it has a far lower threshold to change from
>>   * individual page flushes to full-pid flushes.
>>   */
>> -static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
>> +static unsigned long tlb_single_page_flush_ceiling __read_mostly = 32;
>>  static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
>>
>>  static inline void __radix__flush_tlb_range(struct mm_struct *mm,
>> @@ -1133,7 +1133,7 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
>>  	if (fullmm)
>>  		flush_pid = true;
>>  	else if (type == FLUSH_TYPE_GLOBAL)
>> -		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
>> +		flush_pid = nr_pages >= tlb_single_page_flush_ceiling;
>
> Arguably >= is nicer than > here, but this shouldn't be in the same
> patch as the value change.
>
>>  	else
>>  		flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;
>
> And it should change everything to be consistent. Although I'm not sure
> it's worth changing even though I highly doubt any administrator would
> be tweaking this.

This made me look at how an administrator tweaks these thresholds, and
AFAICS the answer is "recompile the kernel"?

It looks like x86 have a debugfs file for tlb_single_page_flush_ceiling,
but we don't. I guess we meant to copy that but never did?

So at the moment both thresholds could just be #defines.

Making them tweakable at runtime would be nice, it would give us an
escape hatch if we ever hit a workload in production that wants a
different value.

cheers