From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from stravinsky.debian.org (stravinsky.debian.org [82.195.75.108]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5B4022D1913 for ; Tue, 23 Jun 2026 10:30:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=82.195.75.108 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782210640; cv=none; b=PsX5dM+nQYj9l49cOIrxqrAyGYENSKB/42psfZbDLfAZIyu5Ueffd6knzaIj/0Z/UU0fIqNYeE4vIdh3YA7LBADK1T6JTH9J2P9WaAZg5Wm9giM6t1Ah/Rwl1gFcLxVh41i1XA/TcSH4DV65+IH866E6c586OnTZwADELoDHj2M= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782210640; c=relaxed/simple; bh=LEhMhRq9f7QllH6dVZPUg7DAOi51uLT+q6kucmsBKPw=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=YYTFo6GUOpbJiATN2gEMGmtYrwPOojDzwEN+6Ywwt35zGBsvPp0WbuNujj3o5ahxwHZ5lcXGbgFBZH+EIFrKH2lFIDkNDsIzHtmMLbfA3ot0Tq5IzbX5I4aQp8u5Zpt7m8Ct3mCWgfkLQCBSfb4THgP86Rk8B2IQgExue6gGg9U= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org; spf=pass smtp.mailfrom=debian.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b=n3fq68JP; arc=none smtp.client-ip=82.195.75.108 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=debian.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b="n3fq68JP" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debian.org; s=smtpauto.stravinsky; h=X-Debian-User:In-Reply-To:Content-Transfer-Encoding: Content-Type:MIME-Version:References:Message-ID:Subject:Cc:To:From:Date: Reply-To:Content-ID:Content-Description; 
	bh=LEhMhRq9f7QllH6dVZPUg7DAOi51uLT+q6kucmsBKPw=; b=n3fq68JPiAmdxX6wT8IZAhKdVN
	NfKGRzxG/8pcn8JMiPmGU0mVcrkV01tPHkjFEBwhQ9Cp1HbDPQKqV6H8MHrtHlvEqt0gZ7v053iEb
	uJeXwk2IselFbZoH4B3W8NCE4t1wpK6Ei6R1ODLl8a4EFCsWoxgLs5uv1Vm8mtuo6ue8D2pqkKOYk
	zfYkTEQkBFQ9W9bxNGDv9h+mYY/OkjNTHvMzdnwvL4yhphfyMpKgJCwvl5Gr8YmrXrmGioY3is3N5
	Ywp37kgXuvWHHZH5lkzx86wY0fGF/frwmYJK8RPjkXRX6/ngNEKoDzfHe3H/tjH5ztGA3zHDS2qxp
	p+AZEB0Q==;
Received: from authenticated-user by stravinsky.debian.org with esmtpsa
	(TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim 4.96)
	(envelope-from ) id 1wbyOT-001aTy-1Y; Tue, 23 Jun 2026 10:30:13 +0000
Date: Tue, 23 Jun 2026 03:30:07 -0700
From: Breno Leitao
To: Peter Zijlstra
Cc: Thomas Gleixner, Ingo Molnar, Darren Hart, Davidlohr Bueso,
	=?utf-8?B?QW5kcsOp?= Almeida, linux-kernel@vger.kernel.org,
	puranjay@kernel.org, rmikey@meta.com, stuclar@meta.com,
	namhyung@kernel.org, kernel-team@meta.com, dcostantino@meta.com
Subject: Re: [PATCH RFC] futex: avoid false sharing between hb->chain and
	the bucket lock
Message-ID:
References: <20260605-futex-v1-1-4ad4a0d6f265@debian.org>
	<20260609104603.GA48970@noisy.programming.kicks-ass.net>
	<20260609201117.GA187714@noisy.programming.kicks-ass.net>
	<20260609201809.GA1430057@noisy.programming.kicks-ass.net>
	<87h5na3ait.ffs@fw13>
	<20260610112546.GE187714@noisy.programming.kicks-ass.net>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To:
X-Debian-User: leitao

On Wed, Jun 10, 2026 at 06:56:12AM -0700, Breno Leitao wrote:
> .. same machine I used earlier 176-thread AMD EPYC host, 10s perf bench
> futex hash per run, baseline = parent commit (acb7500801e98):

I tested this on a large AI machine (NVIDIA GB200 NVL72), and the results
show the highest gains observed so far.

Test setup:

Each kernel was measured over 5 runs of the default workload (144 threads,
1024 private futexes per thread, 10s per run; the futex hash auto-resized
to 1024 buckets in both cases).

Results:

The optimization shows a clear, repeatable win on this hardware. The
baseline averaged 1,149,586 ops/sec (range 1.14M-1.17M), while the patched
kernel averaged 1,764,233 ops/sec (range 1.75M-1.77M), a ~53% throughput
improvement (1.53x). Run-to-run variance was low (~1%) and the two
distributions did not overlap at all (the baseline maximum sits well below
the patched minimum), which indicates the gain is well outside run-to-run
noise.
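
For anyone who wants to re-derive the headline numbers, here is a minimal sketch of the arithmetic. It uses only the averages and the rounded range endpoints quoted above; the per-run values are not reproduced here, and the variable and function names are purely illustrative.

```python
# Averages as reported above (ops/sec).
baseline_avg = 1_149_586
patched_avg = 1_764_233

# Rounded range endpoints as reported above (assumption: "1.17M" and
# "1.75M" are taken at face value; the exact per-run values are not given).
baseline_max = 1.17e6
patched_min = 1.75e6


def speedup(baseline: float, patched: float) -> float:
    """Ratio of patched to baseline throughput."""
    return patched / baseline


def percent_gain(baseline: float, patched: float) -> float:
    """Relative throughput improvement, in percent."""
    return (patched - baseline) / baseline * 100.0


ratio = speedup(baseline_avg, patched_avg)
gain = percent_gain(baseline_avg, patched_avg)

# The ranges are disjoint if the slowest patched run still beats the
# fastest baseline run.
separated = baseline_max < patched_min

print(f"speedup: {ratio:.2f}x, gain: {gain:.1f}%, ranges disjoint: {separated}")
```

This only checks the ratio and the range separation; it is not a formal significance test, which would need the individual per-run numbers.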