Date: Tue, 17 Mar 2026 14:33:50 +0100
From: Frederic Weisbecker
To: Leonardo Bras
Cc: Marcelo Tosatti, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Vlastimil Babka,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>, Thomas Gleixner, Waiman Long,
	Boqun Feng
Subject: Re: [PATCH v2 2/5] Introducing qpw_lock() and per-cpu queue & flush work
References: <20260302154945.143996316@redhat.com>
	<20260302155105.214878062@redhat.com>

On Sun, Mar 15, 2026 at 03:10:27PM -0300, Leonardo Bras wrote:
> On Fri, Mar 13, 2026 at 10:55:47PM +0100, Frederic Weisbecker wrote:
> > I find this part of the semantics a bit weird. If we eventually queue
> > the work, why do we care about doing a local_lock() locally?
> 
> (Sorry, not sure if I was able to understand the question.)
> 
> Local locks make sure a per-cpu procedure happens on the same CPU from
> start to end: using migrate_disable() & per-cpu spinlocks on RT, and
> doing preempt_disable() on non-RT.
> 
> Most of the cases happen to have the work done on the local cpu, and just
> a few procedures happen to be queued remotely, such as remote cache
> draining.
> 
> Even with the new 'local_qpw_lock()', which is faster for cases we are sure
> to have local usages, on qpw=0 we have to make qpw_lock() a local_lock as
> well, as the cpu receiving the scheduled work needs to make sure to run it
> all without moving to a different cpu.
But queue_work_on() already makes sure the work doesn't move to a different
CPU (provided hotplug is correctly handled for the work).

Looks like we are both confused, so let's take a practical example. Suppose
CPU 0 queues a work to CPU 1 which sets a per-cpu variable named A to the
value "1". We want to guarantee that further reads of that per-cpu value by
CPU 1 see the new value.

With qpw=1, it looks like this:

    CPU 0                                 CPU 1
    -----                                 -----
    qpw_lock(CPU 1)
        spin_lock(&QPW_CPU1)
    qpw_queue_for(write_A, 1)
        write_A()
            A1 = per_cpu_ptr(&A, 1)
            *A1 = 1
    qpw_unlock(CPU 1)
        spin_unlock(&QPW_CPU1)
                                          read_A()
                                              qpw_lock(CPU 1)
                                                  spin_lock(&QPW_CPU1)
                                              r0 = __this_cpu_read(&A)
                                              qpw_unlock(CPU 1)
                                                  spin_unlock(&QPW_CPU1)

CPU 0 took the spinlock while writing to A, so CPU 1 is guaranteed to
further observe the new value because it takes the same spinlock (r0 == 1).

Now look at the qpw=0 case:

    CPU 0                                 CPU 1
    -----                                 -----
    qpw_lock(CPU 1)
        local_lock(&QPW_CPU0)
    qpw_queue_for(write_A, 1)
        queue_work_on(write_A, CPU 1)
    qpw_unlock(CPU 1)
        local_unlock(&QPW_CPU0)
                                          // workqueue
                                          write_A()
                                              qpw_lock(CPU 1)
                                                  local_lock(&QPW_CPU1)
                                              A1 = per_cpu_ptr(&A, 1)
                                              *A1 = 1
                                              qpw_unlock(CPU 1)
                                                  local_unlock(&QPW_CPU1)
                                          read_A()
                                              qpw_lock(CPU 1)
                                                  local_lock(&QPW_CPU1)
                                              r0 = __this_cpu_read(&A)
                                              qpw_unlock(CPU 1)
                                                  local_unlock(&QPW_CPU1)

Here CPU 0 queues the work on CPU 1, which writes and reads the new value
(r0 == 1). local_lock() / preempt_disable() makes sure the CPU doesn't
change. But what is the point in doing local_lock(&QPW_CPU0) on CPU 0?

> > > @@ -2840,6 +2840,16 @@ Kernel parameters
> > >
> > > 	The format of is described above.
> > >
> > > +	qpw=	[KNL,SMP] Select a behavior on per-CPU resource sharing
> > > +		and remote interference mechanism on a kernel built with
> > > +		CONFIG_QPW.
> > > +	Format: { "0" | "1" }
> > > +	0 - local_lock() + queue_work_on(remote_cpu)
> > > +	1 - spin_lock() for both local and remote operations
> > > +
> > > +	Selecting 1 may be interesting for systems that want
> > > +	to avoid interruption & context switches from IPIs.
> >
> > Like Vlastimil suggested, it would be better to just have it off by default
> > and turn it on only if nohz_full= is passed. Then we can consider introducing
> > the parameter later if the need arises.
>
> I agree with having it enabled with isolcpus/nohz_full, but I would
> recommend having this option anyway, as the user could disable qpw if
> wanted, or enable it outside isolcpu scenarios for any reason.

Do you know any such users? Or suspect a potential usecase? If not we can
still add that option later. It's probably better than sticking with a
useless parameter that we'll have to maintain forever.

> > > +#define qpw_lockdep_assert_held(lock) \
> > > +	lockdep_assert_held(lock)
> > > +
> > > +#define queue_percpu_work_on(c, wq, qpw) \
> > > +	queue_work_on(c, wq, &(qpw)->work)
> >
> > qpw_queue_work_on() ?
> >
> > Perhaps even better would be qpw_queue_work_for(), leaving some room for
> > mystery about where/how the work will be executed :-)
>
> QPW comes from Queue PerCPU Work
> Having it called qpw_queue_work_{on,for}() would be repetitive

Well, qpw_ just becomes the name of the subsystem and its prefix for APIs.
For example qpw_lock() doesn't mean that we queue and lock, it only means
we lock.

> But having qpw_on() or qpw_for() would be misleading :)
>
> That's why I went with queue_percpu_work_on() based on how we have the
> original function (queue_work_on) being called.

That's much more misleading, as it doesn't refer to qpw at all and only
suggests that it's queueing on a per-cpu workqueue.

> > Perhaps that too should just be selected automatically by CONFIG_NO_HZ_FULL
> > and if the need arises in the future, make it visible to the user?
>
> I think it would be good to have this, and let whoever is building have the
> chance to disable QPW if it doesn't work well for their machines or
> workload, without having to add a new boot parameter to continue to have
> their stuff working as always after a kernel update.
>
> But that is open to discussion :)

Ok, I guess we can stick with the Kconfig, at least in the beginning.

Thanks.

-- 
Frederic Weisbecker
SUSE Labs