From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 2 Apr 2026 12:43:19 +0200
From: Peter Zijlstra
To: "Deng, Pan"
Cc: "mingo@kernel.org", "rostedt@goodmis.org", "linux-kernel@vger.kernel.org",
	"Li, Tianyou", "tim.c.chen@linux.intel.com", "Chen, Yu C"
Subject: Re: [PATCH v2 1/4] sched/rt: Optimize cpupri_vec layout to mitigate cache line contention
Message-ID: <20260402104319.GY3738786@noisy.programming.kicks-ass.net>
References: <24c460fb48d86a5b990acbb42d0d29d91dfc427c.1753076363.git.pan.deng@intel.com>
 <20260320100903.GR3738786@noisy.programming.kicks-ass.net>
 <20260324121146.GC3738010@noisy.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Fri, Mar 27, 2026 at 10:17:13AM +0000, Deng, Pan wrote:
> > On Tue, Mar 24, 2026 at 09:36:14AM +0000, Deng, Pan wrote:
> > > Regarding this patch, yes, using the cacheline-aligned attribute
> > > could increase memory usage.
> > > After internal discussion, we are considering an alternative that
> > > mitigates the wasted memory: use kmalloc() to allocate the count in
> > > a separate memory area rather than placing the count and the cpumask
> > > together in this structure. The rationale is that writes through the
> > > counter pointer and reads of the cpumask would then land in
> > > different memory, which could reduce the rate of cache false
> > > sharing; besides, the slab/slub allocator may place the objects in
> > > different cache lines, reducing cache contention. The drawback of a
> > > dynamically allocated counter is that we have to manage the
> > > counters' life cycle.
> > > Could you please advise whether the current cacheline-aligned
> > > attribute or the kmalloc() approach is preferred?
> >
> > Well, you'd have to allocate a full cacheline anyway. If you allocate
> > N 4-byte (counter) objects, there's a fair chance they end up in the
> > same cacheline (it's a SLAB after all) and then you're back to having
> > a ton of false sharing.
> >
> > Anyway, for your specific workload, why isn't partitioning a viable
> > solution? It would not need any kernel modifications and would get
> > rid of the contention entirely.
>
> Thank you very much for pointing this out.
>
> We understand cpuset partitioning would eliminate the contention.
> However, in managed container platforms (e.g., Kubernetes), users can
> obtain RT capabilities for their workloads via CAP_SYS_NICE, but they
> don't have host-level privileges to create cpuset partitions.

So because Kubernetes is shit, you're going to patch the kernel? Isn't
that backwards? Should you not instead try and fix this kubernetes
thing?