public inbox for linux-kernel@vger.kernel.org
From: Frederic Weisbecker <frederic@kernel.org>
To: Leonardo Bras <leobras.c@gmail.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Waiman Long <longman@redhat.com>,
	Boqun Feng <boqun.feng@gmail.com>
Subject: Re: [PATCH v2 2/5] Introducing qpw_lock() and per-cpu queue & flush work
Date: Tue, 17 Mar 2026 14:33:50 +0100
Message-ID: <ablYPt92Y63GcIu2@localhost.localdomain>
In-Reply-To: <abb2E3QW7t5Rhxrt@WindFlash>

On Sun, Mar 15, 2026 at 03:10:27PM -0300, Leonardo Bras wrote:
> On Fri, Mar 13, 2026 at 10:55:47PM +0100, Frederic Weisbecker wrote:
> > I find this part of the semantic a bit weird. If we eventually queue
> > the work, why do we care about doing a local_lock() locally ?
> 
> (Sorry, not sure if I was able to understand the question.)
> 
> Local locks make sure a per-cpu procedure happens on the same CPU from
> start to end: using migrate_disable() and per-cpu spinlocks on RT, and
> doing preempt_disable() on non-RT.
> 
> Most of the cases happen to have the work done in the local cpu, and just 
> a few procedures happen to be queued remotely, such as remote cache 
> draining. 
> 
> Even with the new 'local_qpw_lock()', which is faster for cases we are sure
> to be local, on qpw=0 we have to make qpw_lock() a local_lock as well,
> as the CPU receiving the scheduled work needs to make sure to run it
> all without moving to a different CPU.

But queue_work_on() already makes sure the work doesn't move to a different CPU
(provided hotplug is correctly handled for the work).

Looks like we are both confused, so let's take a practical example. Suppose
CPU 0 queues a work to CPU 1 which sets a per-cpu variable named A to the value
"1". We want to guarantee that further reads of that per-cpu value by CPU 1
see the new value. With qpw=1, it looks like this:

CPU 0                                               CPU 1
-----                                               -----

qpw_lock(CPU 1)
   spin_lock(&QPW_CPU1)
qpw_queue_for(write_A, 1)
    write_A()
       A1 = per_cpu_ptr(&A, 1)
       *A1 = 1
qpw_unlock(CPU 1)
    spin_unlock(&QPW_CPU1)
                                                   read_A()
                                                       qpw_lock(CPU 1)
                                                           spin_lock(&QPW_CPU1)
                                                       r0 = __this_cpu_read(&A)
                                                       qpw_unlock(CPU 1)
                                                           spin_unlock(&QPW_CPU1)
                                                   

CPU 0 took the spinlock while writing to A, so CPU 1 is guaranteed to further
observe the new value because it takes the same spinlock (r0 == 1).

Now look at the qpw=0 case:
                                  
CPU 0                                               CPU 1
-----                                               -----

qpw_lock(CPU 1)
   local_lock(&QPW_CPU0)
qpw_queue_for(write_A, 1)
    queue_work_on(write_A, CPU 1)
qpw_unlock(CPU 1)
    local_unlock(&QPW_CPU0)
                                                   // workqueue
                                                   write_A()
                                                       qpw_lock(CPU 1)
                                                           local_lock(&QPW_CPU1)
                                                       A1 = per_cpu_ptr(&A, 1)
                                                       *A1 = 1
                                                       qpw_unlock(CPU 1)
                                                           local_unlock(&QPW_CPU1)

                                                   read_A()
                                                       qpw_lock(CPU 1)
                                                           local_lock(&QPW_CPU1)
                                                       r0 = __this_cpu_read(&A)
                                                       qpw_unlock(CPU 1)
                                                           local_unlock(&QPW_CPU1)

Here CPU 0 queues the work on CPU 1 which writes and reads the new value
(r0 == 1). local_lock() / preempt_disable() makes sure the CPU doesn't change.

But what is the point in doing local_lock(&QPW_CPU0) on CPU 0 ?


> > > 
> > > @@ -2840,6 +2840,16 @@ Kernel parameters
> > >  
> > >  			The format of <cpu-list> is described above.
> > >  
> > > +	qpw=		[KNL,SMP] Select a behavior on per-CPU resource sharing
> > > +			and remote interference mechanism on a kernel built with
> > > +			CONFIG_QPW.
> > > +			Format: { "0" | "1" }
> > > +			0 - local_lock() + queue_work_on(remote_cpu)
> > > +			1 - spin_lock() for both local and remote operations
> > > +
> > > +			Selecting 1 may be interesting for systems that want
> > > +			to avoid interruption & context switches from IPIs.
> > 
> > Like Vlastimil suggested, it would be better to just have it off by default
> > and turn it on only if nohz_full= is passed. Then we can consider introducing
> > the parameter later if the need arises.
> 
> I agree with having it enabled with isolcpus/nohz_full, but I would 
> recommend having this option anyway, as the user could disable qpw if 
> wanted, or enable outside isolcpu scenarios for any reason.

Do you know of any such users, or suspect a potential use case? If not, we can
still add that option later. It's probably better than sticking with a useless
parameter that we'll have to maintain forever.

> > > +#define qpw_lockdep_assert_held(lock)			\
> > > +	lockdep_assert_held(lock)
> > > +
> > > +#define queue_percpu_work_on(c, wq, qpw)		\
> > > +	queue_work_on(c, wq, &(qpw)->work)
> > 
> > qpw_queue_work_on() ?
> > 
> > Perhaps even better would be qpw_queue_work_for(), leaving some room for
> > mystery about where/how the work will be executed :-)
> > 
> 
> QPW comes from Queue PerCPU Work
> Having it called qpw_queue_work_{on,for}() would be repetitive

Well, qpw_ just becomes the name of the subsystem and its prefix for APIs.
For example qpw_lock() doesn't mean that we queue and lock, it only means we lock.

> But having qpw_on() or qpw_for() would be misleading :) 
> 
> That's why I went with queue_percpu_work_on() based on how we have the 
> original function (queue_work_on) being called.

That's much more misleading, as it doesn't refer to qpw at all and only
suggests that it's queueing a per-cpu workqueue.

> > Perhaps that too should just be selected automatically by CONFIG_NO_HZ_FULL and if
> > the need arises in the future, make it visible to the user?
> > 
> 
> I think it would be good to have this, and let whoever is building have the 
> chance to disable QPW if it doesn't work well for their machines or 
> workload, without having to add a new boot parameter to keep
> their stuff working as always after a kernel update.
> 
> But that is open to discussion :)

Ok I guess we can stick with the Kconfig at least in the beginning.

Thanks.

-- 
Frederic Weisbecker
SUSE Labs

