From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail-wr1-f45.google.com (mail-wr1-f45.google.com [209.85.221.45])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id EF70A3368B7
	for <linux-kernel@vger.kernel.org>; Sun, 15 Mar 2026 18:10:51 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.45
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1773598253; cv=none; b=RIcgntAiz53VJFXEfQUv3ja4yvGkW1NboXCqcmPZXM/uJ0BD3E6RGrYR/H8EjItI0Q1H2OeIUSNNr0a/EncfGn6t4W2XTg/D6/sYzOeRwZa6I0lWO5Xt2AogwDIOwIFGXcfDxcfb0BmS5r0vp8mKjS1WusOAYP2XiL/xRQAm4bQ=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1773598253; c=relaxed/simple;
	bh=Yf1zhy+OXZAdPVqXNT03xRmvOap3OrDHM+BYRCTHNJA=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type:Content-Disposition; b=W+xnjrZHOhkbmeg3uBHMtMMdvikQlFYCXDhwjtor6Q3USDvH026RSCYm/WV4M2vIgMrokyH+uyOWz4zlomqKRuKC+kwmjWFKPsx2kpKE1t9dt9Zz2j7P0CBubq/oCoJkEWJqHhRQK0br3y8TKaz5bi32hbWVLLvAhQnlYj4hicM=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=G2Plmwle; arc=none smtp.client-ip=209.85.221.45
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="G2Plmwle"
Received: by mail-wr1-f45.google.com with SMTP id ffacd0b85a97d-439b6d9c981so2693307f8f.1
        for <linux-kernel@vger.kernel.org>; Sun, 15 Mar 2026 11:10:51 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20230601; t=1773598250; x=1774203050; darn=vger.kernel.org;
        h=content-transfer-encoding:content-disposition:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from:from:to
         :cc:subject:date:message-id:reply-to;
        bh=wFfWUXvwHoJE4xSBTFJR2gnf8f2b6YS0PT0BDPveSGA=;
        b=G2PlmwlePK7lnlv9lV0TDiZEiT0bD9s0HyyV8fFGukMuF3ZQe/lkvCHRL6yUxRhBt1
         QtoWrOvT/9BsLMH1hh/A9sR3z15d4Orp4Bk62yaLi4lFY6CQGzMp92as6EgxXdtd6gtQ
         lTzYpvffxVWIL6+EkyN52AM8QQ43Oat3hKINt/TKXfbAFUbrLteWXgFYwRK3/Cfzmfut
         FAMm+3wRM6QSDX+M1lFqQUgUIlmWqXOWC9vtOc5LTq3JVLPqN+NcrGX2B1v59t2khnma
         0pG2Q2BAgGIxxl765QdY2YNqONoMtAMTmdhhwyQWy+Oj1CGSaMAWz4m/q+vVaLV0GV9K
         a+sw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20251104; t=1773598250; x=1774203050;
        h=content-transfer-encoding:content-disposition:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from:x-gm-gg
         :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=wFfWUXvwHoJE4xSBTFJR2gnf8f2b6YS0PT0BDPveSGA=;
        b=U+rsznXEBImgRXhxEaFb3dHBKde4Ua8EuDM7u00m1Ha8HZXy3l7wVIVVUHqwuNn+ab
         vsRtBVO0lblwrl91WJFXyxwEzSy9G4qhbuhjUESJumwQy+sW2SAPsBVRKpOd0Tiqs4St
         aRMDYbzDyS2tv/dFghmEtxr6McT0cLVw3ri7bSAJSDVfwAjX9rrqrnT/KQPAzUSfUMNV
         Bk4h8fVKbqSgUkfam/0KfZ6VIQVszpdzciI6OtIuqOfmSn+Yd8IPJJmfT8c32FLYPuDD
         Hd6Mx+x2eb4T9PxmNIt8cDQsUhB7VC5Sm1yM+pYt9LCWqZUe+GOw/aqRZ68cNZbip6+x
         wBLQ==
X-Forwarded-Encrypted: i=1; AJvYcCWW8JQuWqMN9rYDpL3m0IMnMa51QdXhFxhg5tfyslzgssRFY8UrL9GmuyGaXtVXuo2NeVhHBD3Gtz8NPgk=@vger.kernel.org
X-Gm-Message-State: AOJu0YycCUaQl/28FuzlcryAmw2HDSUAGYZfe6FpAtTuWtxpXdQ1OCwL
	zwLktVi0uT1EGEC9wfSBzZPF+6uMACjDu9B00CqxpXmaUJ2Q5MrnEba7
X-Gm-Gg: ATEYQzz7xauCM20T1/P1ZopJaAMI1oyoie3ECXi6A09U2VhyOFIH/tVCEtVQxw9uAXB
	3MxWbcETnxNmw7vYyHAWb+O0IndbkwS7/1CncTwG+axIZNdCzCBFlcRNmlsa70wfh7OXSSb3t1j
	K9bdPXK0wqj1Z9tMNPiwkeh+N145ZPjjnjsKIGDLRgy3UWn1pQNgE60n5NQ/2/1o2IcWrEvVzfW
	NPp0T4vPwLFHRVbVoi9rPplQ1zQCcbVfdgFVDerucOKsy7JNcP2Y4SY3PnqHm5B2PcoIK75YT5e
	QE89RQJdHxYXvsk9EAXk2uVkzAifaRYGOf017gpterqMPikAT5CaafDfd4RWuQf/RZ+csFTVJKS
	KxD4RQHazIIS1qgFo0HUUPffm0jPr82E9RWwPpqEUqREVzYTOslCGkc4Jlk9jkKuOHAis43cZpw
	HRaP3v3uaLf8tJu7CQzQUNpr64gnnwDOjiBSA=
X-Received: by 2002:a05:6000:40db:b0:439:b3bc:4608 with SMTP id ffacd0b85a97d-43a04d80c26mr19640290f8f.7.1773598250045;
        Sun, 15 Mar 2026 11:10:50 -0700 (PDT)
Received: from WindFlash.powerhub ([2a0a:ef40:1b2a:fa01:9944:6a8c:dc37:eba5])
        by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-439fe1a76e5sm34632151f8f.12.2026.03.15.11.10.48
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Sun, 15 Mar 2026 11:10:49 -0700 (PDT)
From: Leonardo Bras <leobras.c@gmail.com>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: Leonardo Bras <leobras.c@gmail.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	linux-kernel@vger.kernel.org,
	linux-mm@kvack.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Waiman Long <longman@redhat.com>,
	Boqun Feun <boqun.feng@gmail.com>
Subject: Re: [PATCH v2 2/5] Introducing qpw_lock() and per-cpu queue & flush work
Date: Sun, 15 Mar 2026 15:10:27 -0300
Message-ID: <abb2E3QW7t5Rhxrt@WindFlash>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <abSH40oW9qiVDXZS@pavilion.home>
References: <20260302154945.143996316@redhat.com> <20260302155105.214878062@redhat.com> <abSH40oW9qiVDXZS@pavilion.home>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

On Fri, Mar 13, 2026 at 10:55:47PM +0100, Frederic Weisbecker wrote:
> Le Mon, Mar 02, 2026 at 12:49:47PM -0300, Marcelo Tosatti a écrit :
> > Some places in the kernel implement a parallel programming strategy
> > consisting on local_locks() for most of the work, and some rare remote
> > operations are scheduled on target cpu. This keeps cache bouncing low since
> > cacheline tends to be mostly local, and avoids the cost of locks in non-RT
> > kernels, even though the very few remote operations will be expensive due
> > to scheduling overhead.
> > 
> > On the other hand, for RT workloads this can represent a problem:
> > scheduling work on remote cpu that are executing low latency tasks
> > is undesired and can introduce unexpected deadline misses.
> > 
> > It's interesting, though, that local_lock()s in RT kernels become
> > spinlock(). We can make use of those to avoid scheduling work on a remote
> > cpu by directly updating another cpu's per_cpu structure, while holding
> > it's spinlock().
> > 
> > In order to do that, it's necessary to introduce a new set of functions to
> > make it possible to get another cpu's per-cpu "local" lock (qpw_{un,}lock*)
> > and also the corresponding queue_percpu_work_on() and flush_percpu_work()
> > helpers to run the remote work.
> > 
> > Users of non-RT kernels but with low latency requirements can select
> > similar functionality by using the CONFIG_QPW compile time option.
> > 
> > On CONFIG_QPW disabled kernels, no changes are expected, as every
> > one of the introduced helpers work the exactly same as the current
> > implementation:
> > qpw_{un,}lock*()        ->  local_{un,}lock*() (ignores cpu parameter)
> 
> I find this part of the semantic a bit weird. If we eventually queue
> the work, why do we care about doing a local_lock() locally ?

(Sorry, not sure if I was able to understand the question.)

Local locks make sure a per-cpu procedure runs on the same CPU from
start to end: on RT they do migrate_disable() and take a per-cpu
spinlock, and on non-RT they do preempt_disable().

In most cases the work is done on the local cpu, and only a few
procedures, such as remote cache draining, end up being queued remotely.

Even with the new 'local_qpw_lock()', which is faster for the cases where
we are sure the usage is local, on qpw=0 we have to make qpw_lock() a
local_lock as well, since the cpu receiving the scheduled work needs to
run all of it without moving to a different cpu.
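
To make that concrete, here is a rough userspace model of the two modes.
It is only a sketch, not the code from the patch: every model_* name is
made up for illustration, and CPUs are plain integers. It shows that, in
both modes, the lock that ends up taken is the one belonging to the CPU
whose data is being touched:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: CPUs are just small integers. */
#define MODEL_NR_CPUS 4

/*
 * On which CPU does work queued for target_cpu actually execute?
 * qpw=0: queue_work_on(target_cpu), so a worker on the target runs it.
 * qpw=1: the caller updates the remote per-cpu data directly.
 */
int model_qpw_exec_cpu(bool qpw_enabled, int this_cpu, int target_cpu)
{
	return qpw_enabled ? this_cpu : target_cpu;
}

/*
 * Whose lock does qpw_lock(lock, target_cpu) take when called on exec_cpu?
 * qpw=0: it is a local_lock(), the cpu argument is ignored.
 * qpw=1: it is a spin_lock() on the target CPU's lock.
 */
int model_qpw_lock_cpu(bool qpw_enabled, int exec_cpu, int target_cpu)
{
	return qpw_enabled ? target_cpu : exec_cpu;
}

/* Check every (mode, caller, target) combination of the model. */
void model_qpw_lock_selftest(void)
{
	for (int mode = 0; mode <= 1; mode++) {
		for (int cur = 0; cur < MODEL_NR_CPUS; cur++) {
			for (int tgt = 0; tgt < MODEL_NR_CPUS; tgt++) {
				int exec = model_qpw_exec_cpu(mode, cur, tgt);
				int locked = model_qpw_lock_cpu(mode, exec, tgt);

				/* The target's lock is the one held. */
				assert(locked == tgt);
			}
		}
	}
}
```

With qpw=0 the property only holds because the worker stays on the target
cpu while it runs, which is exactly what the local_lock provides.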

> 
> > queue_percpu_work_on()  ->  queue_work_on()
> > flush_percpu_work()     ->  flush_work()

btw Marcelo, I think we need to add local_qpw_lock here as well, or
change the first line to '{local_,}qpw_{un,}lock*()'

> > 
> > @@ -2840,6 +2840,16 @@ Kernel parameters
> >  
> >  			The format of <cpu-list> is described above.
> >  
> > +	qpw=		[KNL,SMP] Select a behavior on per-CPU resource sharing
> > +			and remote interference mechanism on a kernel built with
> > +			CONFIG_QPW.
> > +			Format: { "0" | "1" }
> > +			0 - local_lock() + queue_work_on(remote_cpu)
> > +			1 - spin_lock() for both local and remote operations
> > +
> > +			Selecting 1 may be interesting for systems that want
> > +			to avoid interruption & context switches from IPIs.
> 
> Like Vlastimil suggested, it would be better to just have it off by default
> and turn it on only if nohz_full= is passed. Then we can consider introducing
> the parameter later if the need arise.

I agree with having it enabled with isolcpus/nohz_full, but I would
recommend keeping this option anyway, so the user can disable qpw if
wanted, or enable it outside isolcpus scenarios for any reason.
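
Something along these lines is what I have in mind. This is a userspace
sketch only: the model_* names and the tri-state are assumptions made for
illustration, not what the patch implements today. The idea is that an
explicit qpw= always wins, and the isolation setup only picks the default:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state for the qpw= boot parameter (not in the patch). */
enum model_qpw_param {
	MODEL_QPW_UNSET = -1,	/* qpw= not passed on the command line */
	MODEL_QPW_OFF = 0,	/* qpw=0 */
	MODEL_QPW_ON = 1,	/* qpw=1 */
};

/*
 * Decide whether the spinlock-based mode is used.
 * isolation_requested stands for "isolcpus= or nohz_full= was passed".
 */
bool model_qpw_effective(enum model_qpw_param param, bool isolation_requested)
{
	if (param == MODEL_QPW_UNSET)
		return isolation_requested;	/* default follows isolation */

	return param == MODEL_QPW_ON;		/* explicit choice wins */
}

/* Walk through the cases mentioned in this thread. */
void model_qpw_param_selftest(void)
{
	/* No qpw=: on only when isolation was requested. */
	assert(model_qpw_effective(MODEL_QPW_UNSET, true));
	assert(!model_qpw_effective(MODEL_QPW_UNSET, false));

	/* User disables qpw even with nohz_full/isolcpus. */
	assert(!model_qpw_effective(MODEL_QPW_OFF, true));

	/* User enables qpw outside isolation scenarios. */
	assert(model_qpw_effective(MODEL_QPW_ON, false));
}
```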

> 
> > +#define qpw_lock_init(lock)				\
> > +	local_lock_init(lock)
> > +
> > +#define qpw_trylock_init(lock)				\
> > +	local_trylock_init(lock)
> > +
> > +#define qpw_lock(lock, cpu)				\
> > +	local_lock(lock)
> > +
> > +#define local_qpw_lock(lock)				\
> > +	local_lock(lock)
> 
> It would be easier to grep if all the APIs start with qpw_* prefix.
> 
> qpw_local_lock() ?

Sure, not against the change.
We would then need to rename all the variants starting with local_.

> 
> > +
> > +#define qpw_lock_irqsave(lock, flags, cpu)		\
> > +	local_lock_irqsave(lock, flags)
> > +
> > +#define local_qpw_lock_irqsave(lock, flags)		\
> > +	local_lock_irqsave(lock, flags)
> 
> ditto?
> 
> > +
> > +#define qpw_trylock(lock, cpu)				\
> > +	local_trylock(lock)
> > +
> > +#define local_qpw_trylock(lock)				\
> > +	local_trylock(lock)
> 
> ...
> 
> > +
> > +#define qpw_trylock_irqsave(lock, flags, cpu)		\
> > +	local_trylock_irqsave(lock, flags)
> > +
> > +#define qpw_unlock(lock, cpu)				\
> > +	local_unlock(lock)
> > +
> > +#define local_qpw_unlock(lock)				\
> > +	local_unlock(lock)
> 
> ...
> 
> > +
> > +#define qpw_unlock_irqrestore(lock, flags, cpu)		\
> > +	local_unlock_irqrestore(lock, flags)
> > +
> > +#define local_qpw_unlock_irqrestore(lock, flags)	\
> > +	local_unlock_irqrestore(lock, flags)
> 
> ...
> 
> > +
> > +#define qpw_lockdep_assert_held(lock)			\
> > +	lockdep_assert_held(lock)
> > +
> > +#define queue_percpu_work_on(c, wq, qpw)		\
> > +	queue_work_on(c, wq, &(qpw)->work)
> 
> qpw_queue_work_on() ?
> 
> Perhaps even better would be qpw_queue_work_for(), leaving some room for
> mystery about where/how the work will be executed :-)
> 

QPW comes from Queue PerCPU Work, so calling it
qpw_queue_work_{on,for}() would be repetitive.
But having qpw_on() or qpw_for() would be misleading :)

That's why I went with queue_percpu_work_on(), based on the name of the
original function (queue_work_on).

> > +
> > +#define flush_percpu_work(qpw)				\
> > +	flush_work(&(qpw)->work)
> 
> qpw_flush_work() ?

Same as above,
qpw_flush() ?

> 
> > +
> > +#define qpw_get_cpu(qpw)	smp_processor_id()
> > +
> > +#define qpw_is_cpu_remote(cpu)		(false)
> > +
> > +#define INIT_QPW(qpw, func, c)				\
> > +	INIT_WORK(&(qpw)->work, (func))
> > +
> > @@ -762,6 +762,41 @@ config CPU_ISOLATION
> >  
> >  	  Say Y if unsure.
> >  
> > +config QPW
> > +	bool "Queue per-CPU Work"
> > +	depends on SMP || COMPILE_TEST
> > +	default n
> > +	help
> > +	  Allow changing the behavior on per-CPU resource sharing with cache,
> > +	  from the regular local_locks() + queue_work_on(remote_cpu) to using
> > +	  per-CPU spinlocks on both local and remote operations.
> > +
> > +	  This is useful to give user the option on reducing IPIs to CPUs, and
> > +	  thus reduce interruptions and context switches. On the other hand, it
> > +	  increases generated code and will use atomic operations if spinlocks
> > +	  are selected.
> > +
> > +	  If set, will use the default behavior set in QPW_DEFAULT unless boot
> > +	  parameter qpw is passed with a different behavior.
> > +
> > +	  If unset, will use the local_lock() + queue_work_on() strategy,
> > +	  regardless of the boot parameter or QPW_DEFAULT.
> > +
> > +	  Say N if unsure.
> 
> Perhaps that too should just be selected automatically by CONFIG_NO_HZ_FULL and if
> the need arise in the future, make it visible to the user?
> 

I think it would be good to have this, and let whoever is building the
kernel have the chance to disable QPW if it doesn't work well for their
machines or workload, without having to add a new boot parameter to keep
their stuff working as always after a kernel update.

But that is open to discussion :)

Thanks!
Leo