From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sun, 26 Jan 2025 17:31:25 -0800 (PST)
From: Hugh Dickins
To: Roman Gushchin
cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Hugh Dickins,
    Andrew Morton, Jann Horn, Peter Zijlstra, Will Deacon,
    "Aneesh Kumar K.V", Nick Piggin, linux-arch@vger.kernel.org
Subject: Re: [PATCH v3] mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
In-Reply-To: <32128611-60d5-147c-7e82-7b1dfbe8236b@google.com>
Message-ID: <1e6330a1-c671-c8d0-7eab-e6b9fc7a9d2a@google.com>
References: <20250123164358.2384447-1-roman.gushchin@linux.dev> <32128611-60d5-147c-7e82-7b1dfbe8236b@google.com>

On Thu, 23 Jan 2025, Hugh Dickins wrote:
> On Thu, 23 Jan 2025, Roman Gushchin wrote:
> >
> > Commit b67fbebd4cf9 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas")
> > added a forced tlb flush to tlb_end_vma(), which is required to avoid
> > a race between munmap() and unmap_mapping_range(). However it added
> > some overhead to other paths where tlb_end_vma() is used but vmas are
> > not removed, e.g. madvise(MADV_DONTNEED).
> >
> > Fix this by moving the tlb flush out of tlb_end_vma() into
> > free_pgtables(), somewhat similar to the stable version of the
> > original commit: e.g. stable commit 895428ee124a ("mm: Force TLB flush
> > for PFNMAP mappings before unlink_file_vma()").
> >
> > Note that if tlb->fullmm is set, no flush is required, as the whole
> > mm is about to be destroyed.
> >
> > ---
>
> As Liam said, thanks.
>
> > v3:
> > - added initialization of vma_pfn in __tlb_reset_range() (by Hugh D.)
> >
> > v2:
> > - moved vma_pfn flag handling into tlb.h (by Peter Z.)
> > - added comments (by Peter Z.)
> > - fixed the vma_pfn flag setting (by Hugh D.)
> >
> > Suggested-by: Jann Horn
> > Signed-off-by: Roman Gushchin
> > Cc: Peter Zijlstra
> > Cc: Will Deacon
> > Cc: "Aneesh Kumar K.V"
> > Cc: Andrew Morton
> > Cc: Nick Piggin
> > Cc: Hugh Dickins
> > Cc: linux-arch@vger.kernel.org
> > Cc: linux-mm@kvack.org
> > ---
> >  include/asm-generic/tlb.h | 49 ++++++++++++++++++++++++++++-----------
> >  mm/memory.c               |  7 ++++++
> >  2 files changed, 42 insertions(+), 14 deletions(-)

I had quite a wobble on Friday, couldn't be sure of anything at all.
But I've now spent longer, quietly thinking about this (v3) patch, and
the various races; and with Jann's help, I do feel much more confident
about it all today.

> The code looks right to me now, but not the comments (I usually
> prefer no comment to a wrong or difficult-to-get-right comment).

Yes, the code does look right to me now. And although I can still
quibble about the comments, I'd better not waste your time with that.
Let me say

Acked-by: Hugh Dickins

while recognizing that this may not be the patch which goes into the
tree, since Peter has other ideas on the naming and wording.

> Except when I try to write a simple enough correct comment,
> I find the code has to be changed, and that then suggests
> further changes... Sigh.
>
> (We could also go down a path of saying that all of the vma_pfn stuff
> would be better under #ifdef CONFIG_MMU_GATHER_MERGE_VMAS; but I
> think we shall only confuse ourselves that way - it shouldn't be
> enough to matter, so long as it does not add any extra TLB flushes.)
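[Editor's note: before the quoted diff, it may help to see the flush-deferral idea in isolation. The following is a toy userspace C model, not the kernel's mmu_gather: the struct, VM_PFNMAP/VM_MIXEDMAP values, and helpers are simplified stand-ins for the names used in the patch.]

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy userspace model of the logic under review -- NOT the kernel's
 * mmu_gather.  Field and helper names are simplified stand-ins for
 * those used in the patch.
 */
#define VM_PFNMAP   0x1u
#define VM_MIXEDMAP 0x2u

struct mmu_gather {
	bool fullmm;	/* whole mm being torn down */
	bool vma_pfn;	/* a VM_PFNMAP/VM_MIXEDMAP vma was seen */
	int flushes;	/* TLB flushes issued (model bookkeeping) */
};

static void tlb_flush(struct mmu_gather *tlb)
{
	tlb->flushes++;
}

/* vma_pfn can only be set here, as the patch's comment says. */
static void tlb_start_vma(struct mmu_gather *tlb, unsigned vm_flags)
{
	if (vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
		tlb->vma_pfn = true;
}

/*
 * The flush moved out of tlb_end_vma() into free_pgtables() would be
 * issued by a helper along these lines: flush only if a pfnmap vma was
 * seen, and skip the flush entirely when tlb->fullmm is set, since the
 * whole mm is about to be destroyed.
 */
static void tlb_flush_mmu_pfnmap(struct mmu_gather *tlb)
{
	if (tlb->vma_pfn && !tlb->fullmm)
		tlb_flush(tlb);
	tlb->vma_pfn = false;
}
```

The point of the move is visible in the model: paths like madvise(MADV_DONTNEED) never call the pfnmap helper, so they no longer pay for the forced flush.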
> >
> > diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> > index 709830274b75..cdc95b69b91d 100644
> > --- a/include/asm-generic/tlb.h
> > +++ b/include/asm-generic/tlb.h
> > @@ -380,8 +380,16 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb)
> >  	tlb->cleared_pmds = 0;
> >  	tlb->cleared_puds = 0;
> >  	tlb->cleared_p4ds = 0;
> > +
> > +	/*
> > +	 * vma_pfn can only be set in tlb_start_vma(), so let's
> > +	 * initialize it here. Also a tlb flush issued by
> > +	 * tlb_flush_mmu_pfnmap() will cancel the vma_pfn state,
> > +	 * so that unnecessary subsequent flushes are avoided.
>
> No, that misses the point (or misses half of the point): the
> tlb_flush_mmu_pfnmap() itself won't need to flush if for other reasons
> we've done a TLB flush since the last VM_PFNMAP|VM_MIXEDMAP vma.
>
> What I want to write is:
> * vma_pfn can only be set in tlb_start_vma(), so initialize it here.
> * And then any call to tlb_flush_mmu_tlbonly() will reset it again,
> * so that unnecessary subsequent flushes are avoided.
>
> But once I look at tlb_flush_mmu_tlbonly(), I'm reminded that actually
> it does nothing if none of cleared_ptes etc. is set: so it would not
> reset vma_pfn in that case; which is okay-ish, but makes writing the
> comment hard.
>
> So maybe tlb_flush_mmu_tlbonly() should do an explicit "tlb->vma_pfn = 0"
> before returning early; but that then raises the question of whether it
> would not be better just to initialize vma_pfn to 0 in __tlb_gather_mmu(),
> not touch it in __tlb_reset_range(), but reset it to 0 at the start of
> tlb_flush_mmu_tlbonly().
>
> But it also makes me realize that tlb_flush_mmu_tlbonly() avoiding
> __tlb_reset_range() when nothing was cleared is not all that good:
> because if flushing a small range is better than flushing a large range,
> then it would be good to reset the range even when nothing was cleared
> (though it looks stupid to reset all the fields it just found 0 already).
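[Editor's note: the alternative weighed in the quoted text above - clearing vma_pfn at the start of tlb_flush_mmu_tlbonly(), even on the early return - can likewise be sketched as a toy model. This is plain C with simplified stand-ins, not the kernel source; the early-return condition is reduced to the cleared_* fields.]

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the alternative floated in the thread -- NOT the kernel
 * source.  Any call to tlb_flush_mmu_tlbonly() cancels the pending
 * pfnmap flush, even when nothing was cleared and it returns early.
 */
struct mmu_gather {
	bool vma_pfn;
	unsigned cleared_ptes, cleared_pmds, cleared_puds, cleared_p4ds;
	int flushes;	/* model bookkeeping */
};

static void tlb_flush(struct mmu_gather *tlb)
{
	tlb->flushes++;
}

static void __tlb_reset_range(struct mmu_gather *tlb)
{
	tlb->cleared_ptes = tlb->cleared_pmds = 0;
	tlb->cleared_puds = tlb->cleared_p4ds = 0;
	/* in this variant, vma_pfn is deliberately not touched here */
}

static void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
{
	/*
	 * Reset vma_pfn up front, so that a flush done for any other
	 * reason makes a later tlb_flush_mmu_pfnmap() a no-op.
	 */
	tlb->vma_pfn = false;
	if (!(tlb->cleared_ptes || tlb->cleared_pmds ||
	      tlb->cleared_puds || tlb->cleared_p4ds))
		return;		/* the early return discussed above */
	tlb_flush(tlb);
	__tlb_reset_range(tlb);
}
```

In this variant the comment Hugh wants to write becomes true unconditionally: every call resets vma_pfn, whether or not a flush was actually issued.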
My paragraph (about the existing code, independent of your patch) looks
nonsense to me now: if there was nothing to be cleared, then the range
would not have been updated, so it would not benefit from being reset.

It's still true that there would sometimes be an optimization in setting
"tlb->vma_pfn = 0" in every tlb_flush_mmu_tlbonly(); but it's merely an
optimization, for an unusual case, which you may find demands yet more
thought than it deserves; my guess is that you will prefer not to add
that change, and that's fine by me.

So, if you did respin and change the comment (but I don't insist), maybe:

 * vma_pfn can only be set in tlb_start_vma(), so initialize it here.
 * And then it will be reset again after any call to tlb_flush(),
 * so that unnecessary subsequent flushes are avoided.

Hugh