public inbox for linux-mm@kvack.org
From: Leonardo Bras <leobras.c@gmail.com>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: Leonardo Bras <leobras.c@gmail.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Waiman Long <longman@redhat.com>,
	Boqun Feng <boqun.feng@gmail.com>
Subject: Re: [PATCH v2 2/5] Introducing qpw_lock() and per-cpu queue & flush work
Date: Sun, 15 Mar 2026 15:10:27 -0300
Message-ID: <abb2E3QW7t5Rhxrt@WindFlash>
In-Reply-To: <abSH40oW9qiVDXZS@pavilion.home>

On Fri, Mar 13, 2026 at 10:55:47PM +0100, Frederic Weisbecker wrote:
> On Mon, Mar 02, 2026 at 12:49:47PM -0300, Marcelo Tosatti wrote:
> > Some places in the kernel implement a parallel programming strategy
> > consisting of local_locks() for most of the work, and some rare remote
> > operations are scheduled on target cpu. This keeps cache bouncing low since
> > cacheline tends to be mostly local, and avoids the cost of locks in non-RT
> > kernels, even though the very few remote operations will be expensive due
> > to scheduling overhead.
> > 
> > On the other hand, for RT workloads this can represent a problem:
> > scheduling work on remote CPUs that are executing low latency tasks
> > is undesired and can introduce unexpected deadline misses.
> > 
> > It's interesting, though, that local_lock()s in RT kernels become
> > spinlock(). We can make use of those to avoid scheduling work on a remote
> > cpu by directly updating another cpu's per_cpu structure, while holding
> > its spinlock().
> > 
> > In order to do that, it's necessary to introduce a new set of functions to
> > make it possible to get another cpu's per-cpu "local" lock (qpw_{un,}lock*)
> > and also the corresponding queue_percpu_work_on() and flush_percpu_work()
> > helpers to run the remote work.
> > 
> > Users of non-RT kernels but with low latency requirements can select
> > similar functionality by using the CONFIG_QPW compile time option.
> > 
> > On CONFIG_QPW disabled kernels, no changes are expected, as every
> > one of the introduced helpers works exactly the same as the current
> > implementation:
> > qpw_{un,}lock*()        ->  local_{un,}lock*() (ignores cpu parameter)
> 
> I find this part of the semantic a bit weird. If we eventually queue
> the work, why do we care about doing a local_lock() locally ?

(Sorry, I'm not sure I understood the question.)

Local locks make sure a per-CPU procedure runs on the same CPU from start
to end. On RT kernels they use migrate_disable() plus a per-CPU spinlock,
and on non-RT kernels they use preempt_disable().

In most cases the work is done on the local CPU. Only a few procedures,
such as remote cache draining, are queued to run on a remote CPU.

Even with the new local_qpw_lock(), which is faster when we are sure the
access is local, qpw_lock() still has to be a local_lock() with qpw=0.
The CPU that receives the queued work must run all of it without
migrating to a different CPU.
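The single-threaded userspace model below illustrates the two qpw= modes. It is only a sketch with several simplifications. The "CPUs" are array slots. The per-CPU lock is a toy flag rather than a real local_lock_t or spinlock. "Queueing work" only marks the target CPU as having pending work, which later runs in that CPU's own context. The names struct pcp_cache, drain_remote() and run_pending_work() are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/*
 * Toy lock: the model is single-threaded, so it only records whether the
 * lock is held. In the kernel this would be the per-CPU local_lock_t,
 * which is a spinlock on PREEMPT_RT (or with qpw=1).
 */
struct toy_lock {
	bool held;
};

static void toy_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical per-CPU cache, e.g. a batch of objects that can be drained. */
struct pcp_cache {
	struct toy_lock lock;
	int nr_cached;		/* objects cached on this CPU */
	bool work_pending;	/* a drain work item is queued on this CPU */
};

static struct pcp_cache caches[NR_CPUS];
static int qpw_mode;		/* models the qpw= boot parameter (0 or 1) */
static int nr_queued_works;	/* works queued on (interruptions of) remote CPUs */

/* Must be called with c->lock held. */
static void drain_locked(struct pcp_cache *c)
{
	assert(c->lock.held);
	c->nr_cached = 0;
}

/* Called from some other CPU to drain @cpu's cache. */
static void drain_remote(int cpu)
{
	struct pcp_cache *c;

	assert(cpu >= 0 && cpu < NR_CPUS);
	c = &caches[cpu];

	if (qpw_mode == 1) {
		/* qpw=1: take the target CPU's lock and drain it right here. */
		toy_acquire(&c->lock);
		drain_locked(c);
		toy_release(&c->lock);
	} else {
		/* qpw=0: queue work on the target CPU (queue_work_on() analogue). */
		c->work_pending = true;
		nr_queued_works++;
	}
}

/* Runs in @cpu's own context, like a worker bound to that CPU. */
static void run_pending_work(int cpu)
{
	struct pcp_cache *c;

	assert(cpu >= 0 && cpu < NR_CPUS);
	c = &caches[cpu];
	if (!c->work_pending)
		return;

	/* Local lock: the whole drain must complete on this CPU. */
	toy_acquire(&c->lock);
	drain_locked(c);
	toy_release(&c->lock);
	c->work_pending = false;
}
```

With qpw_mode = 0, drain_remote(1) does not touch CPU 1's cache. It counts one queued work, and the cache is drained only when run_pending_work(1) runs on that CPU. With qpw_mode = 1, drain_remote(2) drains CPU 2 immediately and queues nothing. In the qpw=0 path the queued work still takes the local lock, so qpw_lock() cannot be a no-op there.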

> 
> > queue_percpu_work_on()  ->  queue_work_on()
> > flush_percpu_work()     ->  flush_work()

BTW Marcelo, I think we need to add local_qpw_lock() here as well, or
change the first line to '{local_,}qpw_{un,}lock*()'.

> > 
> > @@ -2840,6 +2840,16 @@ Kernel parameters
> >  
> >  			The format of <cpu-list> is described above.
> >  
> > +	qpw=		[KNL,SMP] Select a behavior on per-CPU resource sharing
> > +			and remote interference mechanism on a kernel built with
> > +			CONFIG_QPW.
> > +			Format: { "0" | "1" }
> > +			0 - local_lock() + queue_work_on(remote_cpu)
> > +			1 - spin_lock() for both local and remote operations
> > +
> > +			Selecting 1 may be interesting for systems that want
> > +			to avoid interruption & context switches from IPIs.
> 
> Like Vlastimil suggested, it would be better to just have it off by default
> and turn it on only if nohz_full= is passed. Then we can consider introducing
> the parameter later if the need arises.

I agree with enabling it by default with isolcpus/nohz_full, but I would
still recommend keeping this option. That way the user can disable QPW if
desired, or enable it outside isolcpus scenarios for any reason.
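The default policy discussed in this thread could be expressed roughly as follows. CONFIG_QPW acts as a compile-time gate, QPW is on by default only when nohz_full= is in use, and qpw= can still override the default in either direction. This is a hypothetical sketch: qpw_select_mode() and QPW_PARAM_UNSET are illustrative names and do not come from this patch set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: the value stored when qpw= was not passed on the command line. */
enum { QPW_PARAM_UNSET = -1 };

/*
 * Returns 1 for per-CPU spinlocks on both local and remote operations,
 * or 0 for local_lock() + queue_work_on(remote_cpu).
 */
static int qpw_select_mode(bool config_qpw, int qpw_param, bool nohz_full_active)
{
	/* Assumes the boot parameter parser only ever stores -1, 0 or 1. */
	assert(qpw_param >= QPW_PARAM_UNSET && qpw_param <= 1);

	/* Without CONFIG_QPW the helpers always map to local_lock()/queue_work_on(). */
	if (!config_qpw)
		return 0;

	/* An explicit qpw=0 or qpw=1 always wins. */
	if (qpw_param != QPW_PARAM_UNSET)
		return qpw_param;

	/* No parameter: enable only on nohz_full (isolated CPU) setups. */
	return nohz_full_active ? 1 : 0;
}
```

Dropping the qpw= parameter, as suggested in the quoted review, would only remove the explicit-override branch. The nohz_full default would stay the same.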

> 
> > +#define qpw_lock_init(lock)				\
> > +	local_lock_init(lock)
> > +
> > +#define qpw_trylock_init(lock)				\
> > +	local_trylock_init(lock)
> > +
> > +#define qpw_lock(lock, cpu)				\
> > +	local_lock(lock)
> > +
> > +#define local_qpw_lock(lock)				\
> > +	local_lock(lock)
> 
> It would be easier to grep if all the APIs start with qpw_* prefix.
> 
> qpw_local_lock() ?

Sure, I'm not against the change.
We would also need to rename all the variants starting with local_.

> 
> > +
> > +#define qpw_lock_irqsave(lock, flags, cpu)		\
> > +	local_lock_irqsave(lock, flags)
> > +
> > +#define local_qpw_lock_irqsave(lock, flags)		\
> > +	local_lock_irqsave(lock, flags)
> 
> ditto?
> 
> > +
> > +#define qpw_trylock(lock, cpu)				\
> > +	local_trylock(lock)
> > +
> > +#define local_qpw_trylock(lock)				\
> > +	local_trylock(lock)
> 
> ...
> 
> > +
> > +#define qpw_trylock_irqsave(lock, flags, cpu)		\
> > +	local_trylock_irqsave(lock, flags)
> > +
> > +#define qpw_unlock(lock, cpu)				\
> > +	local_unlock(lock)
> > +
> > +#define local_qpw_unlock(lock)				\
> > +	local_unlock(lock)
> 
> ...
> 
> > +
> > +#define qpw_unlock_irqrestore(lock, flags, cpu)		\
> > +	local_unlock_irqrestore(lock, flags)
> > +
> > +#define local_qpw_unlock_irqrestore(lock, flags)	\
> > +	local_unlock_irqrestore(lock, flags)
> 
> ...
> 
> > +
> > +#define qpw_lockdep_assert_held(lock)			\
> > +	lockdep_assert_held(lock)
> > +
> > +#define queue_percpu_work_on(c, wq, qpw)		\
> > +	queue_work_on(c, wq, &(qpw)->work)
> 
> qpw_queue_work_on() ?
> 
> Perhaps even better would be qpw_queue_work_for(), leaving some room for
> mystery about where/how the work will be executed :-)
> 

QPW stands for Queue PerCPU Work, so calling it qpw_queue_work_{on,for}()
would be repetitive. On the other hand, qpw_on() or qpw_for() would be
misleading :)

That's why I went with queue_percpu_work_on(), following the name of the
original function, queue_work_on().

> > +
> > +#define flush_percpu_work(qpw)				\
> > +	flush_work(&(qpw)->work)
> 
> qpw_flush_work() ?

Same as above.
Maybe qpw_flush()?

> 
> > +
> > +#define qpw_get_cpu(qpw)	smp_processor_id()
> > +
> > +#define qpw_is_cpu_remote(cpu)		(false)
> > +
> > +#define INIT_QPW(qpw, func, c)				\
> > +	INIT_WORK(&(qpw)->work, (func))
> > +
> > @@ -762,6 +762,41 @@ config CPU_ISOLATION
> >  
> >  	  Say Y if unsure.
> >  
> > +config QPW
> > +	bool "Queue per-CPU Work"
> > +	depends on SMP || COMPILE_TEST
> > +	default n
> > +	help
> > +	  Allow changing the behavior on per-CPU resource sharing with cache,
> > +	  from the regular local_locks() + queue_work_on(remote_cpu) to using
> > +	  per-CPU spinlocks on both local and remote operations.
> > +
> > +	  This is useful to give user the option on reducing IPIs to CPUs, and
> > +	  thus reduce interruptions and context switches. On the other hand, it
> > +	  increases generated code and will use atomic operations if spinlocks
> > +	  are selected.
> > +
> > +	  If set, will use the default behavior set in QPW_DEFAULT unless boot
> > +	  parameter qpw is passed with a different behavior.
> > +
> > +	  If unset, will use the local_lock() + queue_work_on() strategy,
> > +	  regardless of the boot parameter or QPW_DEFAULT.
> > +
> > +	  Say N if unsure.
> 
> Perhaps that too should just be selected automatically by CONFIG_NO_HZ_FULL and if
> the need arises in the future, make it visible to the user?
> 

I think it would be good to keep this option. Whoever builds the kernel
could then disable QPW if it doesn't work well for their machines or
workload, and keep things working as before after a kernel update
without having to add a new boot parameter.

But that is open to discussion :)

Thanks!
Leo
