From: Thomas Gleixner
To: Chuyi Zhou, mingo@redhat.com, luto@kernel.org, peterz@infradead.org,
    paulmck@kernel.org, muchun.song@linux.dev, bp@alien8.de,
    dave.hansen@linux.intel.com, pbonzini@redhat.com, bigeasy@linutronix.de,
    clrkwllms@kernel.org, rostedt@goodmis.org, nadav.amit@gmail.com,
    vkuznets@redhat.com
Cc: linux-kernel@vger.kernel.org, Chuyi Zhou
Subject: Re: [PATCH v8 12/14] x86/mm: Move flush_tlb_info back to the stack
In-Reply-To: <20260616111127.966468-13-zhouchuyi@bytedance.com>
References: <20260616111127.966468-1-zhouchuyi@bytedance.com>
    <20260616111127.966468-13-zhouchuyi@bytedance.com>
Date: Fri, 26 Jun 2026 16:57:14 +0200
Message-ID: <87jyrlic39.ffs@fw13>

On Tue, Jun 16 2026 at 19:11, Chuyi Zhou wrote:
> flush_tlb_info benefits from cacheline alignment, but using
> cacheline-aligned stack storage directly can grow stack usage too much on
> configurations with large SMP_CACHE_BYTES values[1]. That problem caused

What's the link for when you can explain it in prose? Right after that you
tell that this caused 515... to be reverted.

> commit 515ab7c41306 ("x86/mm: Align TLB invalidation info") to be
> reverted. Commit 3db6d5a5ecaf ("x86/mm/tlb: Remove 'struct flush_tlb_info'
> from the stack") moved flush_tlb_info to per-CPU storage, which avoided the
>
>                   base      on-stack-aligned   on-stack-not-aligned
>                   ----      ---------          -----------
> avg (usec/op)     2.5278    2.5261             2.5508
> stddev            0.0007    0.0027             0.0023
>
> The benchmark results show that the average latency difference between
> the baseline (base) and the properly aligned stack variable
> (on-stack-aligned) is within the standard deviation (stddev).
> This indicates that the variations are caused by testing noise, and
> reverting to a stack variable with proper alignment causes no performance
> regression compared to the per-CPU implementation. The unaligned version
> (on-stack-not-aligned) shows a minor performance drop. This demonstrates
> that we can shorten the CPU-pinned/preemption-disabled section without the

  ... disabled section can be shortened...
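
For reference, the arithmetic behind the quoted benchmark claim can be
checked with a short sketch. The numbers are copied from the table in the
quoted changelog. The helper names and the "within noise" criterion
(absolute delta no larger than the bigger of the two standard deviations)
are assumptions made for illustration, not something the patch or the
review defines.

```python
# Figures copied from the quoted benchmark table (usec/op).
avg = {
    "base": 2.5278,
    "on-stack-aligned": 2.5261,
    "on-stack-not-aligned": 2.5508,
}
stddev = {
    "base": 0.0007,
    "on-stack-aligned": 0.0027,
    "on-stack-not-aligned": 0.0023,
}


def delta_vs_base(variant):
    """Return (absolute delta in usec/op, relative delta in percent)."""
    d = avg[variant] - avg["base"]
    return d, 100.0 * d / avg["base"]


def within_noise(variant):
    # Assumption: "within noise" means |delta| is no larger than the
    # bigger of the two standard deviations being compared.
    d, _ = delta_vs_base(variant)
    return abs(d) <= max(stddev["base"], stddev[variant])


for v in ("on-stack-aligned", "on-stack-not-aligned"):
    d, pct = delta_vs_base(v)
    print(f"{v}: delta={d:+.4f} usec/op ({pct:+.2f}%), "
          f"within noise: {within_noise(v)}")
```

Under that assumed criterion the aligned variant falls inside the noise
band, while the unaligned variant is roughly 0.9% slower than the baseline,
which matches the "minor performance drop" wording in the changelog.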