linux-mm.kvack.org archive mirror
* [RFC PATCH v1 0/4] Introduce QPW for per-cpu operations
@ 2024-06-22  3:58 Leonardo Bras
  2024-06-22  3:58 ` [RFC PATCH v1 1/4] Introducing qpw_lock() and per-cpu queue & flush work Leonardo Bras
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Leonardo Bras @ 2024-06-22  3:58 UTC
  To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Hyeonggon Yoo,
	Leonardo Bras, Thomas Gleixner, Marcelo Tosatti
  Cc: linux-kernel, cgroups, linux-mm

The problem:
Some places in the kernel implement a parallel programming strategy
consisting of local_locks() for most of the work, with the few rare remote
operations scheduled on the target cpu. This keeps cache bouncing low, since
the cachelines tend to stay mostly local, and avoids the cost of locks in
non-RT kernels, even though the very few remote operations will be expensive
due to scheduling overhead.

On the other hand, for RT workloads this can represent a problem: getting
an important workload scheduled out to deal with remote requests is
sure to introduce unexpected deadline misses.

The idea:
Currently with PREEMPT_RT=y, local_locks() become per-cpu spinlocks.
In this case, instead of scheduling work on a remote cpu, it should
be safe to grab that remote cpu's per-cpu spinlock and run the required
work locally. The major cost, which is locking/unlocking in every local
function, already happens in PREEMPT_RT.

Also, there is no need to worry about extra cache bouncing:
The cacheline invalidation already happens due to schedule_work_on().

This will avoid schedule_work_on(), and thus avoid scheduling out an
RT workload.

For patches 2, 3 & 4, I noticed that just grabbing the lock and executing
the function locally is much faster than scheduling it on a
remote cpu.
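
To make the idea above concrete, here is a minimal userspace sketch. It is
only an analogy: array slots stand in for cpus, and pthread mutexes stand in
for the per-cpu spinlocks that local_locks() become with PREEMPT_RT=y. All
names (pcp_cache, cache_add(), cache_drain_remote(), cache_drain_all()) are
hypothetical and are not taken from the patches.

```c
/*
 * Userspace analogy only, not kernel code. Every name below is
 * hypothetical and chosen for illustration.
 */
#include <assert.h>
#include <pthread.h>

#define NR_CPUS 4

struct pcp_cache {
	pthread_mutex_t lock;	/* stands in for the per-cpu spinlock */
	int cached;		/* objects cached by this "cpu" */
};

static struct pcp_cache caches[NR_CPUS];

static void caches_init(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		pthread_mutex_init(&caches[cpu].lock, NULL);
		caches[cpu].cached = 0;
	}
}

/* Common path: a "cpu" only touches its own structure. */
static void cache_add(int cpu, int n)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	pthread_mutex_lock(&caches[cpu].lock);
	caches[cpu].cached += n;
	pthread_mutex_unlock(&caches[cpu].lock);
}

/*
 * Rare remote path: instead of asking the target "cpu" to run the
 * drain, the caller takes the target's lock and drains it itself.
 */
static int cache_drain_remote(int cpu)
{
	int drained;

	assert(cpu >= 0 && cpu < NR_CPUS);
	pthread_mutex_lock(&caches[cpu].lock);
	drained = caches[cpu].cached;
	caches[cpu].cached = 0;
	pthread_mutex_unlock(&caches[cpu].lock);
	return drained;
}

/* Drain every "cpu" from the caller's context; nothing is scheduled. */
static int cache_drain_all(void)
{
	int total = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		total += cache_drain_remote(cpu);
	return total;
}
```

In this sketch the common path already pays for the lock on every call, so
the remote drain adds no new cost there, and the target never has to stop
what it is doing to service the request.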

Proposed solution:
A new interface called Queue PerCPU Work (QPW), which should replace
Work Queue in the above-mentioned use case.

If PREEMPT_RT=n, this interface just wraps the current
local_locks + WorkQueue behavior, so no change in runtime behavior is expected.

If PREEMPT_RT=y, queue_percpu_work_on(cpu,...) will lock that cpu's
per-cpu structure and perform the work on it locally. This is possible
because, in functions that can be used to perform work on remote
per-cpu structures, the local_lock (which is already a this_cpu
spinlock()) is replaced by a qpw_spinlock(), which can take the
per_cpu spinlock() of the cpu passed as a parameter.
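
The two build-time behaviors described above can be modelled in userspace as
follows. This is a rough sketch under stated assumptions, not the code in
include/linux/qpw.h: MODEL_PREEMPT_RT stands in for CONFIG_PREEMPT_RT, a
single pending slot per "cpu" stands in for the workqueue, and every
model_* name is hypothetical.

```c
/*
 * Userspace model only. MODEL_PREEMPT_RT and all model_* names are
 * hypothetical and do not come from the patches.
 */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#ifndef MODEL_PREEMPT_RT
#define MODEL_PREEMPT_RT 1	/* 1: run remotely under lock, 0: defer */
#endif

#define MODEL_NR_CPUS 4

typedef void (*model_work_fn)(int cpu);

struct model_pcp {
	pthread_mutex_t lock;	/* stands in for the per-cpu spinlock */
	int counter;		/* per-cpu data the work operates on */
	model_work_fn pending;	/* deferred work, non-RT model only */
};

static struct model_pcp model_pcps[MODEL_NR_CPUS];
static int model_drained_total;

static void model_init(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		pthread_mutex_init(&model_pcps[cpu].lock, NULL);
		model_pcps[cpu].counter = 0;
		model_pcps[cpu].pending = NULL;
	}
	model_drained_total = 0;
}

/* Example work item; expects the lock of @cpu to be held by the caller. */
static void model_drain(int cpu)
{
	model_drained_total += model_pcps[cpu].counter;
	model_pcps[cpu].counter = 0;
}

static void model_run_locked(int cpu, model_work_fn fn)
{
	pthread_mutex_lock(&model_pcps[cpu].lock);
	fn(cpu);
	pthread_mutex_unlock(&model_pcps[cpu].lock);
}

static void model_queue_percpu_work_on(int cpu, model_work_fn fn)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
#if MODEL_PREEMPT_RT
	/* RT model: take the target's lock and run the work right here. */
	model_run_locked(cpu, fn);
#else
	/* Non-RT model: only record the work; the target runs it later. */
	model_pcps[cpu].pending = fn;
#endif
}

static void model_flush_percpu_work(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
#if !MODEL_PREEMPT_RT
	/* Non-RT model: stand in for the target cpu running its work. */
	if (model_pcps[cpu].pending) {
		model_work_fn fn = model_pcps[cpu].pending;

		model_pcps[cpu].pending = NULL;
		model_run_locked(cpu, fn);
	}
#endif
	/* RT model: the work already ran at queue time, nothing to wait for. */
}
```

With MODEL_PREEMPT_RT set to 1, the work has completed by the time
model_queue_percpu_work_on() returns and the flush is a no-op; with 0, the
work only runs once the flush stands in for the target cpu.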

Patch 1 implements the QPW interface, and patches 2, 3 & 4 replace the
current local_lock + WorkQueue interface with the QPW interface in
swap, memcontrol & slub.

Please let me know what you think of this, and please suggest
improvements.

Thanks a lot!
Leo

Leonardo Bras (4):
  Introducing qpw_lock() and per-cpu queue & flush work
  swap: apply new queue_percpu_work_on() interface
  memcontrol: apply new queue_percpu_work_on() interface
  slub: apply new queue_percpu_work_on() interface

 include/linux/qpw.h | 88 +++++++++++++++++++++++++++++++++++++++++++++
 mm/memcontrol.c     | 20 ++++++-----
 mm/slub.c           | 26 ++++++++------
 mm/swap.c           | 26 +++++++-------
 4 files changed, 127 insertions(+), 33 deletions(-)
 create mode 100644 include/linux/qpw.h


base-commit: 50736169ecc8387247fe6a00932852ce7b057083
-- 
2.45.2




end of thread, other threads:[~2024-09-15  0:31 UTC | newest]

Thread overview: 23+ messages
2024-06-22  3:58 [RFC PATCH v1 0/4] Introduce QPW for per-cpu operations Leonardo Bras
2024-06-22  3:58 ` [RFC PATCH v1 1/4] Introducing qpw_lock() and per-cpu queue & flush work Leonardo Bras
2024-09-04 21:39   ` Waiman Long
2024-09-05  0:08     ` Waiman Long
2024-09-11  7:18       ` Leonardo Bras
2024-09-11  7:17     ` Leonardo Bras
2024-09-11 13:39       ` Waiman Long
2024-06-22  3:58 ` [RFC PATCH v1 2/4] swap: apply new queue_percpu_work_on() interface Leonardo Bras
2024-06-22  3:58 ` [RFC PATCH v1 3/4] memcontrol: " Leonardo Bras
2024-06-22  3:58 ` [RFC PATCH v1 4/4] slub: " Leonardo Bras
2024-06-24  7:31 ` [RFC PATCH v1 0/4] Introduce QPW for per-cpu operations Vlastimil Babka
2024-06-24 22:54   ` Boqun Feng
2024-06-25  2:57     ` Leonardo Bras
2024-06-25 17:51       ` Boqun Feng
2024-06-26 16:40         ` Leonardo Bras
2024-06-28 18:47       ` Marcelo Tosatti
2024-06-25  2:36   ` Leonardo Bras
2024-07-15 18:38   ` Marcelo Tosatti
2024-07-23 17:14 ` Marcelo Tosatti
2024-09-05 22:19   ` Hillf Danton
2024-09-11  3:04     ` Marcelo Tosatti
2024-09-15  0:30       ` Hillf Danton
2024-09-11  6:42     ` Leonardo Bras
