Message-Id: <420fcb06-c3c3-4e8f-a82d-be2fb2ef444d@app.fastmail.com>
In-Reply-To: <20240307133916.3782068-3-yosryahmed@google.com>
References: <20240307133916.3782068-1-yosryahmed@google.com>
 <20240307133916.3782068-3-yosryahmed@google.com>
Date: Thu, 07 Mar 2024 17:34:21 -0800
From: "Andy Lutomirski"
To: "Yosry Ahmed", "Andrew Morton"
Cc: "Thomas Gleixner", "Ingo Molnar", "Borislav Petkov", "Dave Hansen",
 "Peter Zijlstra (Intel)", "Kirill A. Shutemov",
 "the arch/x86 maintainers", linux-mm@kvack.org,
 "Linux Kernel Mailing List"
Subject: Re: [RFC PATCH 2/3] x86/mm: make sure LAM is up-to-date during context switching

Catching up a bit...

On Thu, Mar 7, 2024, at 5:39 AM, Yosry Ahmed wrote:
> During context switching, if we are not switching to new mm and no TLB
> flush is needed, we do not write CR3. However, it is possible that a
> user thread enables LAM while a kthread is running on a different CPU
> with the old LAM CR3 mask. If the kthread context switches into any
> thread of that user process, it may not write CR3 with the new LAM mask,
> which would cause the user thread to run with a misconfigured CR3 that
> disables LAM on the CPU.

So I think (off the top of my head -- haven't thought about it all that
hard) that LAM is logically like PCE and LDT: it's a property of an mm
that is only rarely changed, and it doesn't really belong as part of the
tlb_gen mechanism. And, critically, it's not worth the effort and
complexity to try to optimize LAM changes when we have a lazy CPU (just
like PCE and LDT) (whereas TLB flushes are performance critical and are
absolutely worth optimizing).

So...

>
> Fix this by making sure we write a new CR3 if LAM is not up-to-date. No
> problems were observed in practice, this was found by code inspection.

I think it should be fixed with a much bigger hammer: explicit IPIs.
Just don't ever let it get out of date, like install_ldt().

>
> Not that it is possible that mm->context.lam_cr3_mask changes throughout
> switch_mm_irqs_off(). But since LAM can only be enabled by a
> single-threaded process on its own behalf, in that case we cannot be
> switching to a user thread in that same process, we can only be
> switching to another kthread using the borrowed mm or a different user
> process, which should be fine.

The thought process is even simpler with the IPI: it *can* change while
switching, but it will resynchronize immediately once IRQs turn back on.
And whoever changes it will *synchronize* with us, which would otherwise
require extremely complex logic to get right.

And...

> -		if (!was_lazy)
> -			return;
> +		if (was_lazy) {
> +			/*
> +			 * Read the tlb_gen to check whether a flush is needed.
> +			 * If the TLB is up to date, just use it. The barrier
> +			 * synchronizes with the tlb_gen increment in the TLB
> +			 * shootdown code.
> +			 */
> +			smp_mb();

This is actually rather expensive -- from old memory, we're talking
maybe 20 cycles here, but this path is *very* hot and we try fairly hard
to make it be fast. If we get the happy PCID path, it's maybe 100-200
cycles, so this is like a 10% regression. Ouch.

And you can delete all of this if you accept my suggestion.
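
For illustration only, the explicit-IPI idea can be modeled with a toy
user-space C sketch. This is not kernel code and not the patch under
review: every name here (toy_mm, toy_cpu, toy_enable_lam, toy_lam_ipi,
toy_switch_mm, toy_scenario) is hypothetical, the "IPI" is an ordinary
function call, and TOY_PGD and TOY_LAM_BIT are made-up values. It only
shows why eagerly updating every CPU that has the mm loaded lets the
same-mm fast path in the context switch stay free of extra checks and
barriers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS 2
#define TOY_PGD     0x1000ULL      /* made-up page-table base address */
#define TOY_LAM_BIT (1ULL << 61)   /* illustrative LAM control bit in CR3 */

/* Hypothetical stand-in for an mm: page-table base plus LAM bits for CR3. */
struct toy_mm {
	uint64_t pgd;
	uint64_t lam_cr3_mask;
};

/* Hypothetical per-CPU state: the loaded mm and the CR3 value last written. */
struct toy_cpu {
	const struct toy_mm *loaded_mm;
	uint64_t cr3;
};

static struct toy_cpu toy_cpus[TOY_NR_CPUS];

/* Full switch: load the mm and write CR3 with the current LAM mask. */
static void toy_load_mm(int cpu, const struct toy_mm *mm)
{
	toy_cpus[cpu].loaded_mm = mm;
	toy_cpus[cpu].cr3 = mm->pgd | mm->lam_cr3_mask;
}

/* Context switch: when the mm is already loaded, skip the CR3 write. */
static void toy_switch_mm(int cpu, const struct toy_mm *next)
{
	if (toy_cpus[cpu].loaded_mm == next)
		return;	/* fast path, no barrier and no LAM check */
	toy_load_mm(cpu, next);
}

/* Simulated IPI handler: refresh CR3 if this CPU has the mm loaded. */
static void toy_lam_ipi(int cpu, const struct toy_mm *mm)
{
	if (toy_cpus[cpu].loaded_mm == mm)
		toy_cpus[cpu].cr3 |= mm->lam_cr3_mask;
}

/* Enable LAM on CPU 'self'; optionally notify all other CPUs right away. */
static void toy_enable_lam(int self, struct toy_mm *mm, bool send_ipis)
{
	mm->lam_cr3_mask = TOY_LAM_BIT;
	toy_cpus[self].cr3 |= mm->lam_cr3_mask;

	if (!send_ipis)
		return;
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (cpu != self)
			toy_lam_ipi(cpu, mm);
}

/*
 * CPU 0 runs the user thread; CPU 1 models a kthread that lazily borrows
 * the same mm. Returns CPU 1's CR3 after it switches to a user thread of
 * that mm through the fast path.
 */
uint64_t toy_scenario(bool send_ipis)
{
	static struct toy_mm mm;

	mm.pgd = TOY_PGD;
	mm.lam_cr3_mask = 0;
	toy_load_mm(0, &mm);
	toy_load_mm(1, &mm);

	toy_enable_lam(0, &mm, send_ipis);
	toy_switch_mm(1, &mm);
	return toy_cpus[1].cr3;
}
```

In this model, toy_scenario(false) leaves CPU 1 with a CR3 that lacks
the LAM bit, which is the stale state described in the quoted commit
message, while toy_scenario(true) leaves it current without touching
toy_switch_mm() at all.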