From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A5288C54E49 for ; Fri, 8 Mar 2024 01:47:44 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 27AF46B0313; Thu, 7 Mar 2024 20:47:44 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 22AE56B0316; Thu, 7 Mar 2024 20:47:44 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0F2F06B0317; Thu, 7 Mar 2024 20:47:44 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id F260B6B0313 for ; Thu, 7 Mar 2024 20:47:43 -0500 (EST) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 98FCE1A126B for ; Fri, 8 Mar 2024 01:47:43 +0000 (UTC) X-FDA: 81872185206.20.E3558F7 Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) by imf17.hostedemail.com (Postfix) with ESMTP id C6BBD4000E for ; Fri, 8 Mar 2024 01:47:41 +0000 (UTC) Authentication-Results: imf17.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=jFyAUPD7; spf=pass (imf17.hostedemail.com: domain of 3PG7qZQoKCPcxnrqxZgldcfnnfkd.bnlkhmtw-lljuZbj.nqf@flex--yosryahmed.bounces.google.com designates 209.85.219.202 as permitted sender) smtp.mailfrom=3PG7qZQoKCPcxnrqxZgldcfnnfkd.bnlkhmtw-lljuZbj.nqf@flex--yosryahmed.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1709862461; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=gfpgu33XDv7gN5c8kt8Yl/1FH+hpdm97kb/imNVS4FI=; b=S1fpzVW+ywCOQdrMTl11FgNvPlmi8xqTlj3gjVLAch2Lvq6kd2bPOzrVyc25Wakktxjici nvoFGTET8VcVKpDh6rTRKraiIzE1yi7LuyeYPOhpMm9yWrklLSGWgaZ4lJYNSd/nJ9pE4s vT5LxAhEqqTUtF1fcRCBUj2DcGuxDW8= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1709862461; a=rsa-sha256; cv=none; b=euEJDkE3J8BDSJhNsjeNokuF3ggNiSqii1AgKuDuAAL7Ep3stzZKCNY68hhvaVtk5+oBA8 Al6gbJIqdXcB4jM0Dv03Bc9eggfqAB6qJx6DgXMNEG34KTNrqzb+8+FD/fniiBlynTMhxD Dm9KlmwrrCdtPigPeLazZbyLBPey+F8= ARC-Authentication-Results: i=1; imf17.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=jFyAUPD7; spf=pass (imf17.hostedemail.com: domain of 3PG7qZQoKCPcxnrqxZgldcfnnfkd.bnlkhmtw-lljuZbj.nqf@flex--yosryahmed.bounces.google.com designates 209.85.219.202 as permitted sender) smtp.mailfrom=3PG7qZQoKCPcxnrqxZgldcfnnfkd.bnlkhmtw-lljuZbj.nqf@flex--yosryahmed.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-yb1-f202.google.com with SMTP id 3f1490d57ef6-dc74ac7d015so2158646276.0 for ; Thu, 07 Mar 2024 17:47:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1709862461; x=1710467261; darn=kvack.org; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=gfpgu33XDv7gN5c8kt8Yl/1FH+hpdm97kb/imNVS4FI=; b=jFyAUPD7pxWsAAi9RARvb6i2lv2qCybRwNl5wMoTg/o9c/jU9nYUrC/DSc8gZe67qj uzTsBgKv7pexYt0RLzA3XOAQ6StbACnfTliLR9UYobFgv9GBnHnffluDv7+dBHD71YU+ rJI2JISi6+CsKCLhQa9yS/+JVpX2dxqbx7HKDOXCIPU2LGOsMRoPRV89jCNH3A94MZpC 3fpw0KNUfmbCRxnu32TbI/OVIeJ9jZm6YRBJIVe94qBPpxDqXjj/QtfDxV4Dm0RkEr0l BRUDZP/wpFWiRX1hg6ZeD/Btaut4hQ2QFfmF2MVMvSLk2W+PHjWS/GwfqNEG7iSVVRzp 27Yw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709862461; x=1710467261; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=gfpgu33XDv7gN5c8kt8Yl/1FH+hpdm97kb/imNVS4FI=; b=HVzsO+zLANhxctzOdA0CggtVMf+FFVX6CC7O91Jjp4+Dem15fdaPX1Pr6MU4+2GtBf qP0KlQm/iky7IQm0+sTLAs9ZXjniSCYIZhRjHqiShVqf0RSVfTe+ksZYJ/WV0VVJwmup SGeEgR9ZQQUpRASRKx+GIQnJb+lc7IHQ+iQGbG8EjutufYc89FTlOYShe5N7wW/gUW61 B2tgZx5gxZRGyAmLkP7OEwW65b0XtGdgWZ0twAWDQi8E0jAjB+q/mmZlFW0n91TCovkr NVFUy9vZm3pRFlnrcOIfRuKLimUUZQQIcdV1vVQ9wONpFAkoOCGfzB7wSJ6bXO1dovtL /ItQ== X-Forwarded-Encrypted: i=1; AJvYcCU/efh/DypcLiJUcGNRDk9UkJp5iFrlv1pkTawqt2S+8bsV09spvyaAz0Dr6RHU/giA5revAtU9bytrqBPXZZnFBmA= X-Gm-Message-State: AOJu0Yx2NihgIohYJr4AJkmXm6bFY63elPmog1/VT3qRiGBkeT687VFN 9UyHthECo4yVYtfL7WMYHrb1nE3UanUFkgzn9abXNFNnfvEQS7EHvVO32ql2KC8EBwCZTUhmufi MFiFw8ZmO4xlyZKAdXw== X-Google-Smtp-Source: AGHT+IEm1M+oGVLssHL64fBS/glWkZaxFmMJGGdFsNWWr37E1HSxI9bR/8WTdkuSFSgHSnIMFlAMyRb29kIpm944 X-Received: from yosry.c.googlers.com ([fda3:e722:ac3:cc00:20:ed76:c0a8:29b4]) (user=yosryahmed job=sendgmr) by 2002:a25:6943:0:b0:dcc:4785:b51e with SMTP id e64-20020a256943000000b00dcc4785b51emr893146ybc.12.1709862460910; Thu, 07 Mar 2024 17:47:40 -0800 (PST) Date: Fri, 8 Mar 2024 01:47:39 +0000 In-Reply-To: <420fcb06-c3c3-4e8f-a82d-be2fb2ef444d@app.fastmail.com> Mime-Version: 1.0 References: <20240307133916.3782068-1-yosryahmed@google.com> <20240307133916.3782068-3-yosryahmed@google.com> <420fcb06-c3c3-4e8f-a82d-be2fb2ef444d@app.fastmail.com> Message-ID: Subject: Re: [RFC PATCH 2/3] x86/mm: make sure LAM is up-to-date during context switching From: Yosry Ahmed To: Andy Lutomirski Cc: Andrew Morton , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "Peter Zijlstra (Intel)" , "Kirill A. Shutemov" , "the arch/x86 maintainers" , linux-mm@kvack.org, Linux Kernel Mailing List Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: C6BBD4000E X-Rspam-User: X-Stat-Signature: t816pjsd3kh56axggpkpf9eqczes9pzm X-Rspamd-Server: rspam03 X-HE-Tag: 1709862461-96666 X-HE-Meta: U2FsdGVkX18bX/1AMuDbIDSwRqSwJzyN220mShyMtYGTz0V21zrWYJ1dJiavjDeoiH6pRGQIPRLJTOoxZOnh90oQ/7/g+PjjPinrgO9vslPCD2Lnkpy4boOzf8WMIKsoNjQwXW/Qp/aBlnFRdxl3KArqdehnl1g64sXAfqsHy0WEXKCkKhiQZLJKuBn5R7zr79MXnHv4vn93PxqEz4wQH7oElNmNB75Z2bREp/IwRfzBUMDwnxBfXxlR+ctEqFY+CojU7eEDBEuFEjgbsCN/bL6Ng3c9UvepiHquBgT+JXpsMDLFJjQyR3L3KOG8W5BtPRTtp9ieQyvizPqiVkVTwWHwvzwvzn4bAd98GtGfw4imj2gdAtd9fKv9xCYFNAwi5bQlRD6LTfTG9fH16buZfc7mHZhiyULYMOfhraRDkrpxkAZscHv3yKIfnc20fgo5FV67IL83KrNkvS72gSvGdJxiJiONPjbg6Yjws5gdZGMiNMmIVCGOSm9jXcfyxd5z+cKFadUPTimGP5UhfvX+yytOhV+7b4PmFkgVHst49UruiTH49m9tPhGZaAaIjKfhmNUK/uIIUvD8/dxjenork5u1TXZIzlifCr+JBMXUUJk8iufcguS5hZ0A1fzcOUV1unSgQrOrt6AsYmf0u118dPm4VGc2afAqqq1yhXkIEAB8FLpsB8UrSE+qrJvXj+wKYKUXIOwZjhIoxAaOtsephpYoUSAKX+s1SWnZnmuUQar4S/u+IANVPG4u48oDi5ne8ufq8v0JYl7UZ/g2vlt3ezuW7aPBPf6IYPmo44ut+p4saUHMWVho7sRvqsbnm1MUsgEkasGJTr3PacPE2sUMzTUdO9XKPQgMqIr9oNSRcTWFtHtgwwD9pYqO+iK5FyM4RQTVv475f5YhgmF9LfFWlHxC3s8Fy+FwPwLvX5VVfwQI6U+IK2KIhBSmcL1Evzkbaw678uwB2Rksya75rYS J9CuAXFg fLhwz/NmH72+tB0c8wLEeg3ljpVZHKYjBe6QQXrzFoIVxfHUoG0EAvIJBS00lVjJU6uw+9ng9LI5hA/loT5PzIvss4ray9Xr94NziS0x3bDeJ95MtMEqT7RVCA8uGyMMNJFJlEcGq+1sOvlS9cYH2r8moYVduE5hpg1j895b2J6t9DB8MrJQUkwxOCoHpSVr+pZVimyrq5PAfp9lAQCsxq4QlniyhAJXhYO+LZeb0fKhRRIYvtkhQaG+GgrL2aTZ3Tg+WxHPqwWylitFDpdFd64mlyegktbzRlP8Ur7D4dIOmyGCmKKQ7BZd/hqIlYn6UcjkDAEFul8keQm1jN8n7okeNEOwPD8sYNef3TgfSckrDx82Q6AaR1kxHF2gD/d/bTpLtBEgfdGmNvrfVSL5idLbATgZ8VGqjsKcrNtQncV2I0qG+sx+yq9m2HNVPI1q7yCdaeO7+zsY9Z+UIgfaXMcGzwFx5VT3JaJeBUrCFgSlCFnorB80MIdulsm39HkU4l9/yM59X4iGtq372YQJ0l0wgkR7Bzi+xX9rcFoCU73Yem8NLycCgCZZpA4ytbTgV0wDUOtLBey+QbE6C4KtxS0IWc0SXSZXx8bAcP3ybBiEaLDD+nKGFpYyzag== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Mar 07, 2024 at 05:34:21PM -0800, Andy Lutomirski wrote: > Catching up a bit... >=20 > On Thu, Mar 7, 2024, at 5:39 AM, Yosry Ahmed wrote: > > During context switching, if we are not switching to new mm and no TLB > > flush is needed, we do not write CR3. However, it is possible that a > > user thread enables LAM while a kthread is running on a different CPU > > with the old LAM CR3 mask. If the kthread context switches into any > > thread of that user process, it may not write CR3 with the new LAM mask= , > > which would cause the user thread to run with a misconfigured CR3 that > > disables LAM on the CPU. >=20 > So I think (off the top of my head -- haven't thought about it all that h= ard) that LAM is logically like PCE and LDT: it's a property of an mm that = is only rarely changed, and it doesn't really belong as part of the tlb_gen= mechanism. And, critically, it's not worth the effort and complexity to t= ry to optimize LAM changes when we have a lazy CPU (just like PCE and LDT) = (whereas TLB flushes are performance critical and are absolutely worth opti= mizing). >=20 > So... >=20 > > > > Fix this by making sure we write a new CR3 if LAM is not up-to-date. No > > problems were observed in practice, this was found by code inspection. >=20 > I think it should be fixed with a much bigger hammer: explicit IPIs. Jus= t don't ever let it get out of date, like install_ldt(). I like this, and I think earlier versions of the code did this. I think the code now assumes it's fine to not send an IPI since only single-threaded processes can enable LAM, but this means we have to handle kthreads switching to user threads with outdated LAMs (what this patch is trying to do). I also think there is currently an assumption that it's fine for kthreads to run with an incorrect LAM, which is mostly fine, but the IPI also drops that assumption. >=20 > > > > Not that it is possible that mm->context.lam_cr3_mask changes throughou= t > > switch_mm_irqs_off(). But since LAM can only be enabled by a > > single-threaded process on its own behalf, in that case we cannot be > > switching to a user thread in that same process, we can only be > > switching to another kthread using the borrowed mm or a different user > > process, which should be fine. >=20 > The thought process is even simpler with the IPI: it *can* change while s= witching, but it will resynchronize immediately once IRQs turn back on. An= d whoever changes it will *synchronize* with us, which would otherwise requ= ire extremely complex logic to get right. >=20 > And... >=20 > > - if (!was_lazy) > > - return; > > + if (was_lazy) { > > + /* > > + * Read the tlb_gen to check whether a flush is needed. > > + * If the TLB is up to date, just use it. The barrier > > + * synchronizes with the tlb_gen increment in the TLB > > + * shootdown code. > > + */ > > + smp_mb(); >=20 > This is actually rather expensive -- from old memory, we're talking maybe= 20 cycles here, but this path is *very* hot and we try fairly hard to make= it be fast. If we get the happy PCID path, it's maybe 100-200 cycles, so = this is like a 10% regression. Ouch. This is not newly introduced by this patch. I merely refactored this code (reversed the if conditions). I think if we keep the current approach I should move this refactoring to a separate patch to make things clearer. >=20 > And you can delete all of this if you accept my suggestion. I like it very much. The problem now is, as I told Dave, I realized I cannot do any testing beyond compilation due to lack of hardware. I am happy to send a next version if this is acceptable or if someone else can test.