From: Leonardo Bras
To: Frederic Weisbecker
Cc: Leonardo Bras, Marcelo Tosatti, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, Johannes Weiner, Michal Hocko, Roman Gushchin,
    Shakeel Butt, Muchun Song, Andrew Morton, Christoph Lameter,
    Pekka Enberg, David Rientjes, Joonsoo Kim, Vlastimil Babka,
    Hyeonggon Yoo <42.hyeyoo@gmail.com>, Thomas Gleixner, Waiman Long,
    Boqun Feun
Subject: Re: [PATCH v2 2/5] Introducing qpw_lock() and per-cpu queue & flush work
Date: Sun, 15 Mar 2026 15:10:27 -0300
X-Mailer: git-send-email 2.53.0
References: <20260302154945.143996316@redhat.com> <20260302155105.214878062@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Mar 13, 2026 at 10:55:47PM +0100, Frederic Weisbecker wrote:
> On Mon, Mar 02, 2026 at 12:49:47PM -0300, Marcelo Tosatti wrote:
> > Some places in the kernel implement a parallel programming strategy
> > consisting on local_locks() for most of the work, and some rare remote
> > operations are scheduled on target cpu. This keeps cache bouncing low since
> > cacheline tends to be mostly local, and avoids the cost of locks in non-RT
> > kernels, even though the very few remote operations will be expensive due
> > to scheduling overhead.
> >
> > On the other hand, for RT workloads this can represent a problem:
> > scheduling work on remote cpu that are executing low latency tasks
> > is undesired and can introduce unexpected deadline misses.
> >
> > It's interesting, though, that local_lock()s in RT kernels become
> > spinlock(). We can make use of those to avoid scheduling work on a remote
> > cpu by directly updating another cpu's per_cpu structure, while holding
> > it's spinlock().
> >
> > In order to do that, it's necessary to introduce a new set of functions to
> > make it possible to get another cpu's per-cpu "local" lock (qpw_{un,}lock*)
> > and also the corresponding queue_percpu_work_on() and flush_percpu_work()
> > helpers to run the remote work.
> >
> > Users of non-RT kernels but with low latency requirements can select
> > similar functionality by using the CONFIG_QPW compile time option.
> >
> > On CONFIG_QPW disabled kernels, no changes are expected, as every
> > one of the introduced helpers work the exactly same as the current
> > implementation:
> > qpw_{un,}lock*()        ->  local_{un,}lock*() (ignores cpu parameter)
>
> I find this part of the semantic a bit weird. If we eventually queue
> the work, why do we care about doing a local_lock() locally ?

(Sorry, I am not sure I understood the question.)

Local locks make sure a per-cpu procedure runs on the same CPU from start
to end: on RT by using migrate_disable() plus per-cpu spinlocks, and on
non-RT by doing preempt_disable().

In most cases the work is done on the local cpu, and only a few
procedures, such as remote cache draining, get queued remotely.

Even with the new 'local_qpw_lock()', which is faster for cases where we
are sure the usage is local, on qpw=0 we have to make qpw_lock() a
local_lock as well, since the cpu receiving the scheduled work needs to
run all of it without moving to a different cpu.
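
As a rough illustration of the two behaviors described above, here is a
small userspace model in C. It is only a sketch under assumptions, not the
code from the patch: struct pcp_cache, model_lock()/model_unlock(),
caches_init(), request_drain() and run_pending() are hypothetical names, a
C11 atomic_flag stands in for the per-cpu lock, and a 'pending' flag stands
in for a queued work item. With qpw=0 the requester only marks the work and
the target cpu has to run it later; with qpw=1 the requester takes the
target's lock and drains the cache directly.

```c
/* Userspace model only; hypothetical names, not the kernel API. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4

struct pcp_cache {
	atomic_flag lock;	/* stands in for the per-cpu qpw lock */
	int cached;		/* objects held in this cpu's cache */
	bool pending;		/* models a work item queued on this cpu */
};

static struct pcp_cache caches[NR_CPUS];
static bool qpw_enabled;	/* models the qpw= boot parameter */

static void model_lock(struct pcp_cache *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;	/* spin until the current holder releases the lock */
}

static void model_unlock(struct pcp_cache *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

static void caches_init(bool qpw)
{
	for (int i = 0; i < NR_CPUS; i++) {
		atomic_flag_clear(&caches[i].lock);
		caches[i].cached = 0;
		caches[i].pending = false;
	}
	qpw_enabled = qpw;
}

/*
 * Ask for the cache of 'cpu' to be drained.
 * Returns true if work was left behind for the target cpu to run.
 * Model simplification: the flag is set under the same lock.
 */
static bool request_drain(int cpu)
{
	struct pcp_cache *c;

	assert(cpu >= 0 && cpu < NR_CPUS);
	c = &caches[cpu];

	model_lock(c);
	if (qpw_enabled)
		c->cached = 0;		/* qpw=1: drain from the calling cpu */
	else
		c->pending = true;	/* qpw=0: target must run it later */
	model_unlock(c);

	return !qpw_enabled;
}

/* Models the target cpu's worker running the queued item, if any. */
static void run_pending(int cpu)
{
	struct pcp_cache *c;

	assert(cpu >= 0 && cpu < NR_CPUS);
	c = &caches[cpu];

	model_lock(c);
	if (c->pending) {
		c->pending = false;
		c->cached = 0;
	}
	model_unlock(c);
}
```

In this model, after caches_init(false) a request_drain(2) leaves the
cache of cpu 2 untouched until run_pending(2) is called, which is the
interruption the isolated cpu would see. After caches_init(true) the same
request empties the cache right away, at the cost of an atomic operation
on every lock and unlock.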
>
> > queue_percpu_work_on()  ->  queue_work_on()
> > flush_percpu_work()     ->  flush_work()

btw Marcelo, I think we need to add local_qpw_lock here as well, or
change the first line to '{local_,}qpw_{un,}lock*()'

> >
> > @@ -2840,6 +2840,16 @@ Kernel parameters
> >
> >                      The format of is described above.
> >
> > +    qpw=            [KNL,SMP] Select a behavior on per-CPU resource sharing
> > +                    and remote interference mechanism on a kernel built with
> > +                    CONFIG_QPW.
> > +                    Format: { "0" | "1" }
> > +                    0 - local_lock() + queue_work_on(remote_cpu)
> > +                    1 - spin_lock() for both local and remote operations
> > +
> > +                    Selecting 1 may be interesting for systems that want
> > +                    to avoid interruption & context switches from IPIs.
>
> Like Vlastimil suggested, it would be better to just have it off by default
> and turn it on only if nohz_full= is passed. Then we can consider introducing
> the parameter later if the need arise.

I agree with having it enabled with isolcpus/nohz_full, but I would
recommend keeping this option anyway, so the user can disable qpw if
wanted, or enable it outside isolcpus scenarios for any reason.

>
> > +#define qpw_lock_init(lock)                            \
> > +    local_lock_init(lock)
> > +
> > +#define qpw_trylock_init(lock)                         \
> > +    local_trylock_init(lock)
> > +
> > +#define qpw_lock(lock, cpu)                            \
> > +    local_lock(lock)
> > +
> > +#define local_qpw_lock(lock)                           \
> > +    local_lock(lock)
>
> It would be easier to grep if all the APIs start with qpw_* prefix.
>
> qpw_local_lock() ?

Sure, I am not against the change.
It would mean renaming all the variants starting with local_ as well.

>
> > +
> > +#define qpw_lock_irqsave(lock, flags, cpu)             \
> > +    local_lock_irqsave(lock, flags)
> > +
> > +#define local_qpw_lock_irqsave(lock, flags)            \
> > +    local_lock_irqsave(lock, flags)
>
> ditto?
>
> > +
> > +#define qpw_trylock(lock, cpu)                         \
> > +    local_trylock(lock)
> > +
> > +#define local_qpw_trylock(lock)                        \
> > +    local_trylock(lock)
>
> ...
>
> > +
> > +#define qpw_trylock_irqsave(lock, flags, cpu)          \
> > +    local_trylock_irqsave(lock, flags)
> > +
> > +#define qpw_unlock(lock, cpu)                          \
> > +    local_unlock(lock)
> > +
> > +#define local_qpw_unlock(lock)                         \
> > +    local_unlock(lock)
>
> ...
>
> > +
> > +#define qpw_unlock_irqrestore(lock, flags, cpu)        \
> > +    local_unlock_irqrestore(lock, flags)
> > +
> > +#define local_qpw_unlock_irqrestore(lock, flags)       \
> > +    local_unlock_irqrestore(lock, flags)
>
> ...
>
> > +
> > +#define qpw_lockdep_assert_held(lock)                  \
> > +    lockdep_assert_held(lock)
> > +
> > +#define queue_percpu_work_on(c, wq, qpw)               \
> > +    queue_work_on(c, wq, &(qpw)->work)
>
> qpw_queue_work_on() ?
>
> Perhaps even better would be qpw_queue_work_for(), leaving some room for
> mystery about where/how the work will be executed :-)
>

QPW comes from Queue PerCPU Work.
Calling it qpw_queue_work_{on,for}() would be repetitive,
but calling it qpw_on() or qpw_for() would be misleading :)

That's why I went with queue_percpu_work_on(), following the name of the
original function (queue_work_on).

> > +
> > +#define flush_percpu_work(qpw)                         \
> > +    flush_work(&(qpw)->work)
>
> qpw_flush_work() ?

Same as above,
qpw_flush() ?

>
> > +
> > +#define qpw_get_cpu(qpw)        smp_processor_id()
> > +
> > +#define qpw_is_cpu_remote(cpu)  (false)
> > +
> > +#define INIT_QPW(qpw, func, c)                         \
> > +    INIT_WORK(&(qpw)->work, (func))
> > +
> > @@ -762,6 +762,41 @@ config CPU_ISOLATION
> >
> >        Say Y if unsure.
> >
> > +config QPW
> > +    bool "Queue per-CPU Work"
> > +    depends on SMP || COMPILE_TEST
> > +    default n
> > +    help
> > +      Allow changing the behavior on per-CPU resource sharing with cache,
> > +      from the regular local_locks() + queue_work_on(remote_cpu) to using
> > +      per-CPU spinlocks on both local and remote operations.
> > +
> > +      This is useful to give user the option on reducing IPIs to CPUs, and
> > +      thus reduce interruptions and context switches. On the other hand, it
> > +      increases generated code and will use atomic operations if spinlocks
> > +      are selected.
> > +
> > +      If set, will use the default behavior set in QPW_DEFAULT unless boot
> > +      parameter qpw is passed with a different behavior.
> > +
> > +      If unset, will use the local_lock() + queue_work_on() strategy,
> > +      regardless of the boot parameter or QPW_DEFAULT.
> > +
> > +      Say N if unsure.
>
> Perhaps that too should just be selected automatically by CONFIG_NO_HZ_FULL and if
> the need arise in the future, make it visible to the user?
>

I think it would be good to keep this, so that whoever builds the kernel
can disable QPW if it doesn't work well for their machines or workload,
without having to add a new boot parameter just to keep their setup
working as before after a kernel update.

But that is open to discussion :)

Thanks!
Leo