From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 7318EC54E49
	for <linux-mm@archiver.kernel.org>; Fri,  8 Mar 2024 01:26:12 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id BDB926B030D; Thu,  7 Mar 2024 20:26:11 -0500 (EST)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id B8B9E6B030E; Thu,  7 Mar 2024 20:26:11 -0500 (EST)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id A543E6B030F; Thu,  7 Mar 2024 20:26:11 -0500 (EST)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10])
	by kanga.kvack.org (Postfix) with ESMTP id 8FB956B030D
	for <linux-mm@kvack.org>; Thu,  7 Mar 2024 20:26:11 -0500 (EST)
Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay10.hostedemail.com (Postfix) with ESMTP id 52465C03FB
	for <linux-mm@kvack.org>; Fri,  8 Mar 2024 01:26:11 +0000 (UTC)
X-FDA: 81872130942.21.34DE04A
Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202])
	by imf30.hostedemail.com (Postfix) with ESMTP id B5C548000E
	for <linux-mm@kvack.org>; Fri,  8 Mar 2024 01:26:09 +0000 (UTC)
Authentication-Results: imf30.hostedemail.com;
	dkim=pass header.d=google.com header.s=20230601 header.b=dxT9k5Vn;
	dmarc=pass (policy=reject) header.from=google.com;
	spf=pass (imf30.hostedemail.com: domain of 3MGnqZQoKCOEbRVUbDKPHGJRRJOH.FRPOLQXa-PPNYDFN.RUJ@flex--yosryahmed.bounces.google.com designates 209.85.219.202 as permitted sender) smtp.mailfrom=3MGnqZQoKCOEbRVUbDKPHGJRRJOH.FRPOLQXa-PPNYDFN.RUJ@flex--yosryahmed.bounces.google.com
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com;
	s=arc-20220608; t=1709861169;
	h=from:from:sender:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-type:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references:dkim-signature;
	bh=eXseg/OXOOaXO0QfrFbk0NsY7v7P+AZODbEPqN/H3o8=;
	b=fBWLts9eq52zKj2V2dbK/q1aUEsGARgP3b9hQ5chdGW+odmkj5XJUPkkN91eX/Jmn2euko
	oSucKUtJTBRFKaxCkyZFWPoB6CIZS4j4VmH2C9FL3/mIL6kP9UerYmfvD58TUjRRNv2aaL
	zQXd+Y3gXP4f0APjBw0UibdKG0qKJ3s=
ARC-Authentication-Results: i=1;
	imf30.hostedemail.com;
	dkim=pass header.d=google.com header.s=20230601 header.b=dxT9k5Vn;
	dmarc=pass (policy=reject) header.from=google.com;
	spf=pass (imf30.hostedemail.com: domain of 3MGnqZQoKCOEbRVUbDKPHGJRRJOH.FRPOLQXa-PPNYDFN.RUJ@flex--yosryahmed.bounces.google.com designates 209.85.219.202 as permitted sender) smtp.mailfrom=3MGnqZQoKCOEbRVUbDKPHGJRRJOH.FRPOLQXa-PPNYDFN.RUJ@flex--yosryahmed.bounces.google.com
ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1709861169; a=rsa-sha256;
	cv=none;
	b=iFfIT1TZxB9vO8ZHjO8Cw4lzOYzq09dFhw21bP+PxhRVJwcKae6IlhuOd1unfD8q86fIHt
	aR4MRGtd3/xvPiwhEMQRkBDUjtdhzNhMVheMCxA4pmcof3k2Vs+ZV/10rS+u6VlWuQWu6t
	1nDLiZzjua/YhUdZGOPksZFbJ6vPNEA=
Received: by mail-yb1-f202.google.com with SMTP id 3f1490d57ef6-dc6b26783b4so743200276.0
        for <linux-mm@kvack.org>; Thu, 07 Mar 2024 17:26:09 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20230601; t=1709861169; x=1710465969; darn=kvack.org;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=eXseg/OXOOaXO0QfrFbk0NsY7v7P+AZODbEPqN/H3o8=;
        b=dxT9k5VnIb1aXrvXJZKlUkjWGYa0acSt9ZoSk/rF5OSAJZm/F9U1AJK/7iRQh2/SDI
         AzxLFkvVNghR8Br4DnNVTs3eW2kf7GgbC1NWLr0lbCZbeZHxmBPdSPnahRtUVYZpz7e1
         rzYdAfELGQeF1LJpFq1c/ePgxe+UoKdgUQpb6531GPZGiK/c0XiQtgycTTdH1VG3bb9B
         lRPsEgUZsCYHUxpzaTWA8cy8G/zPLdvOvX3bnBJh1zqY0LfAVeMnkqIUvcdZ1CDeq/ep
         qYAlKpJtghdR8MAS/z8wGTvb++1S067u3kFdxWDLsO0HJultkyIuv+6TQiaU5SPQMyjF
         wETQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1709861169; x=1710465969;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=eXseg/OXOOaXO0QfrFbk0NsY7v7P+AZODbEPqN/H3o8=;
        b=T9BhRUF2bPQyWIizPkNiin+zVG1sKbxgl5/yzKCRwCu9nCqcq0V0Byhj7oVt3Yo0kg
         76OdkRQacRBK43K/duYpiq8jE8UMlNVje/DvfMD4mW21cCsYGlC3uUnWmCiEGIBfMMZR
         w8pUCJ5WpFbGi2jQWqbYcct2jKFYlJfdp+PmjSwjHQOHV2ta8BAiEdX+JZamkCen8q8J
         gHfiKl5VGnr6m6OHyeCe7BtoYOHKScC3lrX6GTP/yqqGEIqiH+lmnscpBEWycSSOdpjF
         5GKb5tWus08+qfC+lFauUawE6JfSRs2vpMd6knvZRwuDRTD33mz2toev1NDjrrd1TRa2
         kzqQ==
X-Forwarded-Encrypted: i=1; AJvYcCVui7R2kgdtt3nsUVyXAoKIh/2v7NDmRFAIoMf9ePhFY6F7VH0a8SHM0JTf56ZcAu8WXwQT0dGAs2z8s1IFRxIO/vc=
X-Gm-Message-State: AOJu0Yz7nxKYhcTrAoTRtJB+sydxPJIZiE4k6hMqjrQRWf5MHGNLmxB7
	ugfumMZyK1/PSSqiz80oz+ghghGAi/NeuEosUwV/zrmH0oh//+hPBJFP393igHGe2L3vYAPIFUE
	xilhqFuZ8HkAgsNsjgA==
X-Google-Smtp-Source: AGHT+IFwilqH/mUypD/jBtdQECzEjD7iyvfhQwzvfhI6NuQeEvEpas/utpcVa7OIlx3iC/kKszoRugEj08/+/mjC
X-Received: from yosry.c.googlers.com ([fda3:e722:ac3:cc00:20:ed76:c0a8:29b4])
 (user=yosryahmed job=sendgmr) by 2002:a05:6902:18d3:b0:dc7:82ba:ba6e with
 SMTP id ck19-20020a05690218d300b00dc782baba6emr666616ybb.7.1709861168825;
 Thu, 07 Mar 2024 17:26:08 -0800 (PST)
Date: Fri, 8 Mar 2024 01:26:06 +0000
In-Reply-To: <Zeo_yUTdGKVed7ff@google.com>
Mime-Version: 1.0
References: <20240307133916.3782068-1-yosryahmed@google.com>
 <20240307133916.3782068-3-yosryahmed@google.com> <83b019e2-1b84-491a-b0b9-beb02e45d80c@intel.com>
 <Zeor4DIGj0u6LNIw@google.com> <cb1aaac1-4800-4cae-8aea-acba6353971c@intel.com>
 <Zeo_yUTdGKVed7ff@google.com>
Message-ID: <ZeppLlDeTro6zpIg@google.com>
Subject: Re: [RFC PATCH 2/3] x86/mm: make sure LAM is up-to-date during
 context switching
From: Yosry Ahmed <yosryahmed@google.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, Thomas Gleixner <tglx@linutronix.de>, 
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Peter Zijlstra <peterz@infradead.org>, 
	Andy Lutomirski <luto@kernel.org>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, x86@kernel.org, 
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="us-ascii"
X-Rspam-User: 
X-Stat-Signature: zt6e7og5gtbynwg1ciro6s99t1t9s391
X-Rspamd-Server: rspam07
X-Rspamd-Queue-Id: B5C548000E
X-HE-Tag: 1709861169-208242
X-HE-Meta: U2FsdGVkX1+uNjYEWSddXCmlU19Rf80hy58vnUQMOPzeKFidbogELYnNMG3l6rREdhd/QgayldpOqNwosVNkBd7vvjsH6pZq1zZk2mPAe2okrdkoaT5VObppEINjwSXDT/y5J6lddA2+r/XSIKqHZeHdOt8qHHdKIJAdAwI+IfBHIbkFjBTHuo/HAVA/M5gGYloVS5+MNWo+8V7UP+VKqmCqwt48K8qSHFIUvslDsmE4GojBj7VkIaVntGUMLc64cppRr7d1KJDpQg6ZsNZPGMht078qnxz/Jj/ZxfWJChPKog5zt7ulGbLE44QpDD34i8Or1oaeOwb9zDYpXlNfzWfVy51vgdxgJFWKOP2quuFUNh++IYEGZ1ACc8Pas8qSk2aBLeTolKwhlOqGtKcD4f5a1Zdx2Yb5K3E13MZQK6c2G6f0uVX5fMWOKiuvaFxXxUVaOYHOy2LVqxr9WMT+/MpdxJH+ovf/wEsnA05WXvj+Ko31yiRY9b20d+CXJMRLX8+Rv9f8ye9S+NYzaguRTrmO0AkbNlqmqBtbasMk4jRcIRZWuimCKCL63PZ30IGb98Biz2iSaJkFZiZt1Ksx+hTQxsjyn6pzG1FNgmeUuItk2EU0kifCsQhHKXggOJ9RTLWVPEMJRYfry6ZwscK6HDwmcHJ+QV6685YEDtke2PPOj2AmKVZQIfk7GVHt2vQ3FuZRscjvEZkFCxY5Zkb/ExyKChzeaWTldchkBGv1bR4TbknPPoEgomKVXHm5f17spY02oGT326U87woKSOPh/p0c2MczabjqHyYCricabQVRHY5sJcx2on9nKIVEdAFrf8q7HvS/RZmoe5w6mMr9tXI9BWGhmBmNbQxrRwYy08zbz2+V4fjcHzw0cuHPR8oqQvsyOI7NmGMgW5zapEf81UwcFYEdvSE+gxOuGCCIX0G13GkTXtH/tzt/donTrfD7e+ZGHpbnqwm6RSY0WWZ
 gS/mhThg
 HlBBFfW5A2Ba8njZAkAxpsb+BX+rTzNwmaGgb/OKh/eYXbxojIW28QRkeZ9Xoqh3Rysnqit0l/0+TC74DZ61oQU/Krurerw3RZQW/uUvPu8zfFrLMJdHv5Ouaei2FW9T/bD2wPdnGJMkNkyd5odMgJ8SzQ4ZE/7p+XOn/yyynr+K0ttbLJXq5i2uuDkNXNqu1ofBNUMwtpLUGKeN0xNtbyB3X8TZOCjckLNKAmh4cGoBWkv14kKwe3L/IVbtmfsdHvd9Jc9wEbZHKA/O6NlVKLmR0lW1OxFHIxrSPs312jyF4Wd3hNI/2dU2eOciBBsnUadIikMHlXuagwftCx9WFAeVmTWC3l+NKnYIn0HFMHjhGjrXgUBzRdsdRQiI4iCCNC/wl
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>

On Thu, Mar 07, 2024 at 10:29:29PM +0000, Yosry Ahmed wrote:
> On Thu, Mar 07, 2024 at 01:39:53PM -0800, Dave Hansen wrote:
> > On 3/7/24 13:04, Yosry Ahmed wrote:
> > > I thought about doing inc_mm_tlb_gen() when LAM is updated, but it felt
> > > hacky and more importantly doesn't make it clear in switch_mm_irqs_off()
> > > that we correctly handle LAM updates. We can certainly add a comment,
> > > but I think an explicit check for CPU LAM vs. mm LAM is much clearer.
> > > 
> > > WDYT?
> > 
> > The mm generations are literally there so that if the mm changes that
> > all the CPUs know they need an update.  Changing LAM enabling is 100%
> > consistent with telling other CPUs that they need an update.
> > 
> > I'd be curious of Andy feels differently though.
> 
> The mm generations are TLB-specific and all the code using them implies
> as such (e.g. look at the checks in switch_mm_irqs_off() when prev ==
> next). We can go around and update comments and/or function names to
> make them more generic, but this seems excessive. If we don't, the code
> becomes less clear imo.
> 
> I agree that the use case here is essentially the same (let other
> CPUs know they need to write CR3), but I still think that since the LAM
> case is just a simple one-time enablement, an explicit check in
> switch_mm_irqs_off() would be clearer.
> 
> Just my 2c, let me know what you prefer :)
> 
> > 
> > >> Considering how fun this code path is, a little effort at an actual
> > >> reproduction would be really appreciated.
> > > 
> > > I tried reproducing it but gave up quickly. We need a certain sequence
> > > of events to happen:
> > > 
> > > CPU 1					CPU 2
> > > kthread_use_mm()
> > > 					/* user thread enables LAM */
> > > 					context_switch()
> > > context_switch() /* to user thread */
> > 
> > First, it would be fine to either create a new kthread for reproduction
> > purposes or to hack an existing one.  For instance, have have the LAM
> > prctl() take an extra ref on the mm and stick it in a global variable:
> > 
> > 	mmgrab(current->mm);
> > 	global_mm = current->mm;
> > 
> > Then in the kthread, grab the mm and use it:
> > 
> > 	while (!global_mm);
> > 	kthread_use_mm(global_mm);
> > 	... check for the race
> > 	mmdrop(global_mm);
> > 
> > You can also hackily wait for thread to move with a stupid spin loop:
> > 
> > 	while (smp_processor_id() != 1);
> > 
> > and then actually move it with sched_setaffinity() from userspace.  That
> > can make it easier to get that series of events to happen in lockstep.
> 
> I will take a stab at doing something similar and let you know, thanks.

I came up with a kernel patch that I *think* may reproduce the problem
with enough iterations. Userspace only needs to enable LAM, so I think
the selftest can be enough to trigger it.

However, there is no hardware with LAM at my disposal, and IIUC I cannot
use QEMU without KVM to run a kernel with LAM. I was planning to do more
testing before sending a non-RFC version, but apparently I cannot do
any testing beyond building at this point (including reproducing) :/

Let me know how you want to proceed. I can send a non-RFC v1 based on
the feedback I got on the RFC, but it will only be build tested.

For the record, here is the diff that I *think* may reproduce the bug:

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 33b268747bb7b..c37a8c26a3c21 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -750,8 +750,25 @@ static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
 
 #define LAM_U57_BITS 6
 
+static int kthread_fn(void *_mm)
+{
+	struct mm_struct *mm = _mm;
+
+	/*
+	 * Wait for LAM to be enabled then schedule. Hopefully we will context
+	 * switch directly into the task that enabled LAM due to CPU pinning.
+	 */
+	kthread_use_mm(mm);
+	while (!test_bit(MM_CONTEXT_LOCK_LAM, &mm->context.flags));
+	schedule();
+	return 0;
+}
+
 static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
 {
+	struct task_struct *kthread_task;
+	int kthread_cpu;
+
 	if (!cpu_feature_enabled(X86_FEATURE_LAM))
 		return -ENODEV;
 
@@ -782,10 +799,22 @@ static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
 		return -EINVAL;
 	}
 
+	/* Pin the task to the current CPU */
+	set_cpus_allowed_ptr(current, cpumask_of(smp_processor_id()));
+
+	/* Run a kthread on another CPU and wait for it to start */
+	kthread_cpu = cpumask_next_wrap(smp_processor_id(), cpu_online_mask, 0, false),
+	kthread_task = kthread_run_on_cpu(kthread_fn, mm, kthread_cpu, "lam_repro_kthread");
+	while (!task_is_running(kthread_task));
+
 	write_cr3(__read_cr3() | mm->context.lam_cr3_mask);
 	set_tlbstate_lam_mode(mm);
 	set_bit(MM_CONTEXT_LOCK_LAM, &mm->context.flags);
 
+	/* Move the task to the kthread CPU */
+	set_cpus_allowed_ptr(current, cpumask_of(kthread_cpu));
+
 	mmap_write_unlock(mm);
 
 	return 0;
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 51f9f56941058..3afb53f1a1901 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -593,7 +593,7 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
 		if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
 				next_tlb_gen)
-			return;
+			BUG_ON(new_lam != tlbstate_lam_cr3_mask());
 
 		/*
 		 * TLB contents went out of date while we were in lazy