From: Frederic Weisbecker <frederic@kernel.org>
To: Leonardo Bras <leobras.c@gmail.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Lameter <cl@linux.com>,
Pekka Enberg <penberg@kernel.org>,
David Rientjes <rientjes@google.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Vlastimil Babka <vbabka@suse.cz>,
Hyeonggon Yoo <42.hyeyoo@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Waiman Long <longman@redhat.com>,
Boqun Feun <boqun.feng@gmail.com>
Subject: Re: [PATCH v2 2/5] Introducing qpw_lock() and per-cpu queue & flush work
Date: Tue, 17 Mar 2026 14:33:50 +0100 [thread overview]
Message-ID: <ablYPt92Y63GcIu2@localhost.localdomain> (raw)
In-Reply-To: <abb2E3QW7t5Rhxrt@WindFlash>
Le Sun, Mar 15, 2026 at 03:10:27PM -0300, Leonardo Bras a écrit :
> On Fri, Mar 13, 2026 at 10:55:47PM +0100, Frederic Weisbecker wrote:
> > I find this part of the semantic a bit weird. If we eventually queue
> > the work, why do we care about doing a local_lock() locally ?
>
> (Sorry, not sure if I was able to understand the question.)
>
> Local locks make sure a per-cpu procedure happens on the same CPU from
> start to end. Using migrate_disable & using per-cpu spinlocks on RT and
> doing preempt_disable in non_RT.
>
> Most of the cases happen to have the work done in the local cpu, and just
> a few procedures happen to be queued remotely, such as remote cache
> draining.
>
> Even with the new 'local_qpw_lock()' which is faster for cases we are sure
> to have local usages, on qpw=0 we have to make qpw_lock() a local_lock as
> well, as the cpu receiving the scheduled work needs to make sure to run it
> all without moving to a different cpu.
But queue_work_on() already makes sure the work doesn't move to a different CPU
(provided hotplug is correctly handled for the work).
Looks like we are both confused, so let's take a practical example. Suppose
CPU 0 queues a work to CPU 1 which sets a per-cpu variable named A to the value
"1". We want to guarantee that further reads of that per-cpu value by CPU 1
see the new value. With qpw=1, it looks like this:
CPU 0 CPU 1
----- -----
qpw_lock(CPU 1)
spin_lock(&QPW_CPU1)
qpw_queue_for(write_A, 1)
write_A()
A1 = per_cpu_ptr(&A, 1)
*A1 = 1
qpw_unlock(CPU 1)
spin_unlock(&QPW_CPU1)
read_A()
qpw_lock(CPU 1)
spin_lock(&QPW_CPU1)
r0 = __this_cpu_read(&A)
qpw_unlock(CPU 1)
spin_unlock(&QPW_CPU1)
CPU 0 took the spinlock while writing to A, so CPU 1 is guaranteed to further
observe the new value because it takes the same spinlock (r0 == 1)
Now look at the qpw=0 case:
CPU 0 CPU 1
----- -----
qpw_lock(CPU 1)
local_lock(&QPW_CPU0)
qpw_queue_for(write_A, 1)
queue_work_on(write_A, CPU 1)
qpw_unlock(CPU 1)
local_unlock(&QPW_CPU0)
// workqueue
write_A()
qpw_lock(CPU 1)
local_lock(&QPW_CPU1)
A1 = per_cpu_ptr(&A, 1)
*A1 = 1
qpw_unlock(CPU 1)
local_unlock(&QPW_CPU1)
read_A()
qpw_lock(CPU 1)
local_lock(&QPW_CPU1)
r0 = __this_cpu_read(&A)
qpw_unlock(CPU 1)
local_unlock(&QPW_CPU1)
Here CPU 0 queues the work on CPU 1 which writes and reads the new value
(r0 == 1). local_lock() / preempt_disable() makes sure the CPU doesn't change.
But what is the point in doing local_lock(&QPW_CPU0) on CPU 0 ?
> > >
> > > @@ -2840,6 +2840,16 @@ Kernel parameters
> > >
> > > The format of <cpu-list> is described above.
> > >
> > > + qpw= [KNL,SMP] Select a behavior on per-CPU resource sharing
> > > + and remote interference mechanism on a kernel built with
> > > + CONFIG_QPW.
> > > + Format: { "0" | "1" }
> > > + 0 - local_lock() + queue_work_on(remote_cpu)
> > > + 1 - spin_lock() for both local and remote operations
> > > +
> > > + Selecting 1 may be interesting for systems that want
> > > + to avoid interruption & context switches from IPIs.
> >
> > Like Vlastimil suggested, it would be better to just have it off by default
> > and turn it on only if nohz_full= is passed. Then we can consider introducing
> > the parameter later if the need arise.
>
> I agree with having it enabled with isolcpus/nohz_full, but I would
> recommend having this option anyway, as the user could disable qpw if
> wanted, or enable outside isolcpu scenarios for any reason.
Do you know any such users? Or suspect a potential usecase? If not we can still
add that option later. It's probably better than sticking with a useless
parameter that we'll have to maintain forever.
> > > +#define qpw_lockdep_assert_held(lock) \
> > > + lockdep_assert_held(lock)
> > > +
> > > +#define queue_percpu_work_on(c, wq, qpw) \
> > > + queue_work_on(c, wq, &(qpw)->work)
> >
> > qpw_queue_work_on() ?
> >
> > Perhaps even better would be qpw_queue_work_for(), leaving some room for
> > mystery about where/how the work will be executed :-)
> >
>
> QPW comes from Queue PerCPU Work
> Having it called qpw_queue_work_{on,for}() would be repetitve
Well, qpw_ just becomes the name of the subsystem and its prefix for APIs.
For example qpw_lock() doesn't mean that we queue and lock, it only means we lock.
> But having qpw_on() or qpw_for() would be misleading :)
>
> That's why I went with queue_percpu_work_on() based on how we have the
> original function (queue_work_on) being called.
That's much more misleading at it doesn't refer to qpw at all and it only
suggest that it's a queueing a per-cpu workqueue.
> > Perhaps that too should just be selected automatically by CONFIG_NO_HZ_FULL and if
> > the need arise in the future, make it visible to the user?
> >
>
> I think it would be good to have this, and let whoever is building have the
> chance to disable QPW if it doesn't work well for their machines or
> workload, without having to add a new boot parameter to continue have
> their stuff working as always after a kernel update.
>
> But that is open to discussion :)
Ok I guess we can stick with the Kconfig at least in the beginning.
Thanks.
--
Frederic Weisbecker
SUSE Labs
next prev parent reply other threads:[~2026-03-17 13:33 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-02 15:49 [PATCH v2 0/5] Introduce QPW for per-cpu operations (v2) Marcelo Tosatti
2026-03-02 15:49 ` [PATCH v2 1/5] slab: distinguish lock and trylock for sheaf_flush_main() Marcelo Tosatti
2026-03-02 15:49 ` [PATCH v2 2/5] Introducing qpw_lock() and per-cpu queue & flush work Marcelo Tosatti
2026-03-03 12:03 ` Vlastimil Babka (SUSE)
2026-03-03 16:02 ` Marcelo Tosatti
2026-03-08 18:00 ` Leonardo Bras
2026-03-09 10:14 ` Vlastimil Babka (SUSE)
2026-03-11 0:16 ` Leonardo Bras
2026-03-11 7:58 ` Vlastimil Babka (SUSE)
2026-03-15 17:37 ` Leonardo Bras
2026-03-16 10:55 ` Vlastimil Babka (SUSE)
2026-03-23 0:51 ` Leonardo Bras
2026-03-13 21:55 ` Frederic Weisbecker
2026-03-15 18:10 ` Leonardo Bras
2026-03-17 13:33 ` Frederic Weisbecker [this message]
2026-03-23 1:38 ` Leonardo Bras
2026-03-24 11:54 ` Frederic Weisbecker
2026-03-24 22:06 ` Leonardo Bras
2026-03-23 14:36 ` Marcelo Tosatti
2026-03-02 15:49 ` [PATCH v2 3/5] mm/swap: move bh draining into a separate workqueue Marcelo Tosatti
2026-03-02 15:49 ` [PATCH v2 4/5] swap: apply new queue_percpu_work_on() interface Marcelo Tosatti
2026-03-02 15:49 ` [PATCH v2 5/5] slub: " Marcelo Tosatti
2026-03-03 11:15 ` [PATCH v2 0/5] Introduce QPW for per-cpu operations (v2) Frederic Weisbecker
2026-03-08 18:02 ` Leonardo Bras
2026-03-03 12:07 ` Vlastimil Babka (SUSE)
2026-03-05 16:55 ` Frederic Weisbecker
2026-03-06 1:47 ` Marcelo Tosatti
2026-03-10 21:34 ` Frederic Weisbecker
2026-03-10 17:12 ` Marcelo Tosatti
2026-03-10 22:14 ` Frederic Weisbecker
2026-03-11 1:18 ` Hillf Danton
2026-03-11 7:54 ` Vlastimil Babka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ablYPt92Y63GcIu2@localhost.localdomain \
--to=frederic@kernel.org \
--cc=42.hyeyoo@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=boqun.feng@gmail.com \
--cc=cl@linux.com \
--cc=hannes@cmpxchg.org \
--cc=iamjoonsoo.kim@lge.com \
--cc=leobras.c@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=longman@redhat.com \
--cc=mhocko@kernel.org \
--cc=mtosatti@redhat.com \
--cc=muchun.song@linux.dev \
--cc=penberg@kernel.org \
--cc=rientjes@google.com \
--cc=roman.gushchin@linux.dev \
--cc=shakeel.butt@linux.dev \
--cc=tglx@linutronix.de \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox