Date: Fri, 13 Mar 2026 22:55:47 +0100
From: Frederic Weisbecker
To: Marcelo Tosatti
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Vlastimil Babka, Hyeonggon Yoo <42.hyeyoo@gmail.com>, Leonardo Bras, Thomas Gleixner, Waiman Long, Boqun Feng
Subject: Re: [PATCH v2 2/5] Introducing qpw_lock() and per-cpu queue & flush work
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20260302154945.143996316@redhat.com> <20260302155105.214878062@redhat.com>
In-Reply-To: <20260302155105.214878062@redhat.com>

On Mon, Mar 02, 2026 at 12:49:47PM -0300, Marcelo Tosatti wrote:
> Some places in the kernel implement a parallel programming strategy
> consisting of local_lock() for most of the work, while the few rare remote
> operations are scheduled on the target cpu. This keeps cache bouncing low,
> since the cacheline tends to stay local, and avoids the cost of locks in
> non-RT kernels, even though the very few remote operations will be
> expensive due to scheduling overhead.
>
> On the other hand, for RT workloads this can represent a problem:
> scheduling work on a remote cpu that is executing low latency tasks
> is undesired and can introduce unexpected deadline misses.
>
> It's interesting, though, that local_lock()s in RT kernels become
> spinlock()s. We can make use of those to avoid scheduling work on a remote
> cpu by directly updating another cpu's per_cpu structure, while holding
> its spinlock().
>
> In order to do that, it's necessary to introduce a new set of functions to
> make it possible to get another cpu's per-cpu "local" lock (qpw_{un,}lock*),
> and also the corresponding queue_percpu_work_on() and flush_percpu_work()
> helpers to run the remote work.
>
> Users of non-RT kernels but with low latency requirements can get
> similar functionality by using the CONFIG_QPW compile time option.
>
> On CONFIG_QPW-disabled kernels, no changes are expected, as each of the
> introduced helpers works exactly the same as the current implementation:
> qpw_{un,}lock*() -> local_{un,}lock*() (ignores cpu parameter)

I find this part of the semantics a bit weird. If we eventually queue the
work, why do we care about doing a local_lock() locally?

> queue_percpu_work_on() -> queue_work_on()
> flush_percpu_work() -> flush_work()
>
> @@ -2840,6 +2840,16 @@ Kernel parameters
>
> 	The format of is described above.
>
> +	qpw=	[KNL,SMP] Select a behavior on per-CPU resource sharing
> +		and remote interference mechanism on a kernel built with
> +		CONFIG_QPW.
> +		Format: { "0" | "1" }
> +		0 - local_lock() + queue_work_on(remote_cpu)
> +		1 - spin_lock() for both local and remote operations
> +
> +		Selecting 1 may be interesting for systems that want
> +		to avoid interruptions & context switches from IPIs.

Like Vlastimil suggested, it would be better to just have it off by default
and turn it on only if nohz_full= is passed. Then we can consider
introducing the parameter later if the need arises.

> +#define qpw_lock_init(lock) \
> +	local_lock_init(lock)
> +
> +#define qpw_trylock_init(lock) \
> +	local_trylock_init(lock)
> +
> +#define qpw_lock(lock, cpu) \
> +	local_lock(lock)
> +
> +#define local_qpw_lock(lock) \
> +	local_lock(lock)

It would be easier to grep if all the APIs started with the qpw_* prefix.
qpw_local_lock()?
> +
> +#define qpw_lock_irqsave(lock, flags, cpu) \
> +	local_lock_irqsave(lock, flags)
> +
> +#define local_qpw_lock_irqsave(lock, flags) \
> +	local_lock_irqsave(lock, flags)

ditto?

> +
> +#define qpw_trylock(lock, cpu) \
> +	local_trylock(lock)
> +
> +#define local_qpw_trylock(lock) \
> +	local_trylock(lock)

...

> +
> +#define qpw_trylock_irqsave(lock, flags, cpu) \
> +	local_trylock_irqsave(lock, flags)
> +
> +#define qpw_unlock(lock, cpu) \
> +	local_unlock(lock)
> +
> +#define local_qpw_unlock(lock) \
> +	local_unlock(lock)

...

> +
> +#define qpw_unlock_irqrestore(lock, flags, cpu) \
> +	local_unlock_irqrestore(lock, flags)
> +
> +#define local_qpw_unlock_irqrestore(lock, flags) \
> +	local_unlock_irqrestore(lock, flags)

...

> +
> +#define qpw_lockdep_assert_held(lock) \
> +	lockdep_assert_held(lock)
> +
> +#define queue_percpu_work_on(c, wq, qpw) \
> +	queue_work_on(c, wq, &(qpw)->work)

qpw_queue_work_on()? Perhaps even better would be qpw_queue_work_for(),
leaving some room for mystery about where/how the work will be executed :-)

> +
> +#define flush_percpu_work(qpw) \
> +	flush_work(&(qpw)->work)

qpw_flush_work()?

> +
> +#define qpw_get_cpu(qpw) smp_processor_id()
> +
> +#define qpw_is_cpu_remote(cpu) (false)
> +
> +#define INIT_QPW(qpw, func, c) \
> +	INIT_WORK(&(qpw)->work, (func))
> +
> @@ -762,6 +762,41 @@ config CPU_ISOLATION
>
>  	  Say Y if unsure.
>
> +config QPW
> +	bool "Queue per-CPU Work"
> +	depends on SMP || COMPILE_TEST
> +	default n
> +	help
> +	  Allow changing the behavior of per-CPU resource sharing, from the
> +	  regular local_lock() + queue_work_on(remote_cpu) scheme to using
> +	  per-CPU spinlocks for both local and remote operations.
> +
> +	  This is useful to give the user the option of reducing IPIs to
> +	  CPUs, and thus reduce interruptions and context switches. On the
> +	  other hand, it increases the generated code size and will use
> +	  atomic operations if spinlocks are selected.
> +
> +	  If set, the default behavior set in QPW_DEFAULT will be used,
> +	  unless the boot parameter qpw= is passed with a different behavior.
> +
> +	  If unset, the local_lock() + queue_work_on() strategy will be used,
> +	  regardless of the boot parameter or QPW_DEFAULT.
> +
> +	  Say N if unsure.

Perhaps that too should just be selected automatically by CONFIG_NO_HZ_FULL
and, if the need arises in the future, be made visible to the user?

Thanks.

-- 
Frederic Weisbecker
SUSE Labs