Date: Tue, 17 Mar 2026 14:33:50 +0100
From: Frederic Weisbecker
To: Leonardo Bras
Cc: Marcelo Tosatti, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Vlastimil Babka,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>, Thomas Gleixner, Waiman Long,
	Boqun Feng
Subject: Re: [PATCH v2 2/5] Introducing qpw_lock() and per-cpu queue & flush work
References: <20260302154945.143996316@redhat.com>
	<20260302155105.214878062@redhat.com>

On Sun, Mar 15, 2026 at 03:10:27PM -0300, Leonardo Bras wrote:
> On Fri, Mar 13, 2026 at 10:55:47PM +0100, Frederic Weisbecker wrote:
> > I find this part of the semantics a bit weird. If we eventually queue
> > the work, why do we care about doing a local_lock() locally?
> 
> (Sorry, not sure if I was able to understand the question.)
> 
> Local locks make sure a per-cpu procedure happens on the same CPU from
> start to end: using migrate_disable() & per-cpu spinlocks on RT, and
> doing preempt_disable() on non-RT.
> 
> Most of the cases happen to have the work done on the local cpu, and just
> a few procedures happen to be queued remotely, such as remote cache
> draining.
> 
> Even with the new 'local_qpw_lock()', which is faster for cases we are sure
> to have local usages, on qpw=0 we have to make qpw_lock() a local_lock as
> well, as the cpu receiving the scheduled work needs to make sure to run it
> all without moving to a different cpu.
But queue_work_on() already makes sure the work doesn't move to a different
CPU (provided hotplug is correctly handled for the work).

Looks like we are both confused, so let's take a practical example. Suppose
CPU 0 queues a work to CPU 1 which sets a per-cpu variable named A to the
value "1". We want to guarantee that further reads of that per-cpu value by
CPU 1 see the new value.

With qpw=1, it looks like this:

    CPU 0                                 CPU 1
    -----                                 -----
    qpw_lock(CPU 1)
        spin_lock(&QPW_CPU1)
    qpw_queue_for(write_A, 1)
        write_A()
            A1 = per_cpu_ptr(&A, 1)
            *A1 = 1
    qpw_unlock(CPU 1)
        spin_unlock(&QPW_CPU1)
                                          read_A()
                                              qpw_lock(CPU 1)
                                                  spin_lock(&QPW_CPU1)
                                              r0 = __this_cpu_read(&A)
                                              qpw_unlock(CPU 1)
                                                  spin_unlock(&QPW_CPU1)

CPU 0 took the spinlock while writing to A, so CPU 1 is guaranteed to
further observe the new value because it takes the same spinlock (r0 == 1).

Now look at the qpw=0 case:

    CPU 0                                 CPU 1
    -----                                 -----
    qpw_lock(CPU 1)
        local_lock(&QPW_CPU0)
    qpw_queue_for(write_A, 1)
        queue_work_on(write_A, CPU 1)
    qpw_unlock(CPU 1)
        local_unlock(&QPW_CPU0)
                                          // workqueue
                                          write_A()
                                              qpw_lock(CPU 1)
                                                  local_lock(&QPW_CPU1)
                                              A1 = per_cpu_ptr(&A, 1)
                                              *A1 = 1
                                              qpw_unlock(CPU 1)
                                                  local_unlock(&QPW_CPU1)
                                          read_A()
                                              qpw_lock(CPU 1)
                                                  local_lock(&QPW_CPU1)
                                              r0 = __this_cpu_read(&A)
                                              qpw_unlock(CPU 1)
                                                  local_unlock(&QPW_CPU1)

Here CPU 0 queues the work on CPU 1, which writes and reads the new value
(r0 == 1). local_lock() / preempt_disable() makes sure the CPU doesn't
change. But what is the point in doing local_lock(&QPW_CPU0) on CPU 0?

> > > @@ -2840,6 +2840,16 @@ Kernel parameters
> > >
> > > 	The format of is described above.
> > >
> > > +	qpw=	[KNL,SMP] Select a behavior on per-CPU resource sharing
> > > +		and remote interference mechanism on a kernel built with
> > > +		CONFIG_QPW.
> > > +	Format: { "0" | "1" }
> > > +	0 - local_lock() + queue_work_on(remote_cpu)
> > > +	1 - spin_lock() for both local and remote operations
> > > +
> > > +	Selecting 1 may be interesting for systems that want
> > > +	to avoid interruption & context switches from IPIs.
> >
> > Like Vlastimil suggested, it would be better to just have it off by default
> > and turn it on only if nohz_full= is passed. Then we can consider introducing
> > the parameter later if the need arises.
>
> I agree with having it enabled with isolcpus/nohz_full, but I would
> recommend having this option anyway, as the user could disable qpw if
> wanted, or enable it outside isolcpu scenarios for any reason.

Do you know any such users? Or suspect a potential usecase? If not we can
still add that option later. It's probably better than sticking with a
useless parameter that we'll have to maintain forever.

> > > +#define qpw_lockdep_assert_held(lock) \
> > > +	lockdep_assert_held(lock)
> > > +
> > > +#define queue_percpu_work_on(c, wq, qpw) \
> > > +	queue_work_on(c, wq, &(qpw)->work)
> >
> > qpw_queue_work_on() ?
> >
> > Perhaps even better would be qpw_queue_work_for(), leaving some room for
> > mystery about where/how the work will be executed :-)
>
> QPW comes from Queue PerCPU Work
> Having it called qpw_queue_work_{on,for}() would be repetitive

Well, qpw_ just becomes the name of the subsystem and its prefix for APIs.
For example qpw_lock() doesn't mean that we queue and lock, it only means
we lock.

> But having qpw_on() or qpw_for() would be misleading :)
>
> That's why I went with queue_percpu_work_on() based on how we have the
> original function (queue_work_on) being called.

That's much more misleading, as it doesn't refer to qpw at all and only
suggests that it's queueing on a per-cpu workqueue.

> > Perhaps that too should just be selected automatically by CONFIG_NO_HZ_FULL
> > and if the need arises in the future, make it visible to the user?
>
> I think it would be good to have this, and let whoever is building have the
> chance to disable QPW if it doesn't work well for their machines or
> workload, without having to add a new boot parameter to continue to have
> their stuff working as always after a kernel update.
>
> But that is open to discussion :)

Ok, I guess we can stick with the Kconfig, at least in the beginning.

Thanks.

-- 
Frederic Weisbecker
SUSE Labs