Date: Tue, 21 Jan 2025 11:33:33 +0100
From: Peter Zijlstra
To: Rik van Riel
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, bp@alien8.de,
	dave.hansen@linux.intel.com, zhengqi.arch@bytedance.com,
	nadav.amit@gmail.com, thomas.lendacky@amd.com, kernel-team@meta.com,
	linux-mm@kvack.org, akpm@linux-foundation.org, jannh@google.com,
	mhklinux@outlook.com, andrew.cooper3@citrix.com
Subject: Re: [PATCH v6 09/12] x86/mm: enable broadcast TLB invalidation for
 multi-threaded processes
Message-ID: <20250121103333.GA7145@noisy.programming.kicks-ass.net>
References: <20250120024104.1924753-1-riel@surriel.com>
 <20250120024104.1924753-10-riel@surriel.com>
 <20250121095507.GB5388@noisy.programming.kicks-ass.net>
In-Reply-To: <20250121095507.GB5388@noisy.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 21, 2025 at 10:55:07AM +0100, Peter Zijlstra wrote:
> On Sun, Jan 19, 2025 at 09:40:17PM -0500, Rik van Riel wrote:

> > +/*
> > + * Figure out whether to assign a global ASID to a process.
> > + * We vary the threshold by how empty or full global ASID space is.
> > + * 1/4 full: >= 4 active threads
> > + * 1/2 full: >= 8 active threads
> > + * 3/4 full: >= 16 active threads
> > + * 7/8 full: >= 32 active threads
> > + * etc
> > + *
> > + * This way we should never exhaust the global ASID space, even on very
> > + * large systems, and the processes with the largest number of active
> > + * threads should be able to use broadcast TLB invalidation.
> > + */
> > +#define HALFFULL_THRESHOLD 8
> > +static bool meets_global_asid_threshold(struct mm_struct *mm)
> > +{
> > +	int avail = global_asid_available;
> > +	int threshold = HALFFULL_THRESHOLD;
> > +
> > +	if (!avail)
> > +		return false;
> > +
> > +	if (avail > MAX_ASID_AVAILABLE * 3 / 4) {
> > +		threshold = HALFFULL_THRESHOLD / 4;
> > +	} else if (avail > MAX_ASID_AVAILABLE / 2) {
> > +		threshold = HALFFULL_THRESHOLD / 2;
> > +	} else if (avail < MAX_ASID_AVAILABLE / 3) {
> > +		do {
> > +			avail *= 2;
> > +			threshold *= 2;
> > +		} while ((avail + threshold) < MAX_ASID_AVAILABLE / 2);
> > +	}
> > +
> > +	return mm_active_cpus_exceeds(mm, threshold);
> > +}
> 
> I'm still very much disliking this. Why do we need this? Yes, running
> out of ASID space is a pain, but this increasing threshold also makes
> things behave weird.
> 
> Suppose our most-used process starts slow, and ends up not getting an
> ASID because too much irrelevant crap gets started before it spawns
> enough threads and then no longer qualifies.
> 
> Can't we just start with a very simple constant test and poke at things
> if/when it's found to not work?

Something like so perhaps?
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -268,7 +268,7 @@ static inline u16 mm_global_asid(struct
 	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
 		return 0;
 
-	asid = READ_ONCE(mm->context.global_asid);
+	asid = smp_load_acquire(&mm->context.global_asid);
 
 	/* mm->context.global_asid is either 0, or a global ASID */
 	VM_WARN_ON_ONCE(is_dyn_asid(asid));
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -308,13 +308,18 @@ static void reset_global_asid_space(void
 static u16 get_global_asid(void)
 {
 	lockdep_assert_held(&global_asid_lock);
+	bool done_reset = false;
 
 	do {
 		u16 start = last_global_asid;
 		u16 asid = find_next_zero_bit(global_asid_used, MAX_ASID_AVAILABLE, start);
 
-		if (asid >= MAX_ASID_AVAILABLE) {
+		if (asid > MAX_ASID_AVAILABLE) {
+			if (done_reset)
+				return asid;
+
 			reset_global_asid_space();
+			done_reset = true;
 			continue;
 		}
 
@@ -392,6 +398,12 @@ static bool mm_active_cpus_exceeds(struc
  */
 static void use_global_asid(struct mm_struct *mm)
 {
+	u16 asid;
+
+	/* This process is already using broadcast TLB invalidation. */
+	if (mm->context.global_asid)
+		return;
+
 	guard(raw_spinlock_irqsave)(&global_asid_lock);
 
 	/* This process is already using broadcast TLB invalidation. */
@@ -402,58 +414,25 @@ static void use_global_asid(struct mm_st
 	if (!global_asid_available)
 		return;
 
+	asid = get_global_asid();
+	if (asid > MAX_ASID_AVAILABLE)
+		return;
+
 	/*
-	 * The transition from IPI TLB flushing, with a dynamic ASID,
-	 * and broadcast TLB flushing, using a global ASID, uses memory
-	 * ordering for synchronization.
-	 *
-	 * While the process has threads still using a dynamic ASID,
-	 * TLB invalidation IPIs continue to get sent.
-	 *
-	 * This code sets asid_transition first, before assigning the
-	 * global ASID.
-	 *
-	 * The TLB flush code will only verify the ASID transition
-	 * after it has seen the new global ASID for the process.
+	 * Notably flush_tlb_mm_range() -> broadcast_tlb_flush() ->
+	 * finish_asid_transition() needs to observe asid_transition == true
+	 * once it observes global_asid.
 	 */
-	WRITE_ONCE(mm->context.asid_transition, true);
-	WRITE_ONCE(mm->context.global_asid, get_global_asid());
+	mm->context.asid_transition = true;
+	smp_store_release(&mm->context.global_asid, asid);
 }
 
-/*
- * Figure out whether to assign a global ASID to a process.
- * We vary the threshold by how empty or full global ASID space is.
- * 1/4 full: >= 4 active threads
- * 1/2 full: >= 8 active threads
- * 3/4 full: >= 16 active threads
- * 7/8 full: >= 32 active threads
- * etc
- *
- * This way we should never exhaust the global ASID space, even on very
- * large systems, and the processes with the largest number of active
- * threads should be able to use broadcast TLB invalidation.
- */
-#define HALFFULL_THRESHOLD 8
 static bool meets_global_asid_threshold(struct mm_struct *mm)
 {
-	int avail = global_asid_available;
-	int threshold = HALFFULL_THRESHOLD;
-
-	if (!avail)
+	if (!global_asid_available)
 		return false;
 
-	if (avail > MAX_ASID_AVAILABLE * 3 / 4) {
-		threshold = HALFFULL_THRESHOLD / 4;
-	} else if (avail > MAX_ASID_AVAILABLE / 2) {
-		threshold = HALFFULL_THRESHOLD / 2;
-	} else if (avail < MAX_ASID_AVAILABLE / 3) {
-		do {
-			avail *= 2;
-			threshold *= 2;
-		} while ((avail + threshold) < MAX_ASID_AVAILABLE / 2);
-	}
-
-	return mm_active_cpus_exceeds(mm, threshold);
+	return mm_active_cpus_exceeds(mm, 4);
 }
 
 static void consider_global_asid(struct mm_struct *mm)