From: Leonardo Bras <leobras@redhat.com>
To: Boqun Feng <boqun.feng@gmail.com>
Cc: Leonardo Bras <leobras@redhat.com>,
Vlastimil Babka <vbabka@suse.cz>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Lameter <cl@linux.com>,
Pekka Enberg <penberg@kernel.org>,
David Rientjes <rientjes@google.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Hyeonggon Yoo <42.hyeyoo@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Marcelo Tosatti <mtosatti@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
Waiman Long <longman@redhat.com>,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [RFC PATCH v1 0/4] Introduce QPW for per-cpu operations
Date: Mon, 24 Jun 2024 23:57:57 -0300 [thread overview]
Message-ID: <ZnoyNQLQdyAcMxjP@LeoBras> (raw)
In-Reply-To: <Znn5FgqoCAUAfQhu@boqun-archlinux>
On Mon, Jun 24, 2024 at 03:54:14PM -0700, Boqun Feng wrote:
> On Mon, Jun 24, 2024 at 09:31:51AM +0200, Vlastimil Babka wrote:
> > Hi,
> >
> > you've included tglx, which is great, but there's also LOCKING PRIMITIVES
> > section in MAINTAINERS so I've added folks from there in my reply.
>
> Thanks!
>
> > Link to full series:
> > https://lore.kernel.org/all/20240622035815.569665-1-leobras@redhat.com/
> >
>
> And apologies to Leonardo... I think this is a follow-up of:
>
> https://lpc.events/event/17/contributions/1484/
>
> and I did remember we had a quick chat after that which I suggested it's
> better to change to a different name, sorry that I never found time to
> write a proper rely to your previous seriese [1] as promised.
>
> [1]: https://lore.kernel.org/lkml/20230729083737.38699-2-leobras@redhat.com/
That's correct, I commented about this in the end of above presentation.
Don't worry, and thanks for suggesting the per-cpu naming, it was very
helpful on designing this solution.
>
> > On 6/22/24 5:58 AM, Leonardo Bras wrote:
> > > The problem:
> > > Some places in the kernel implement a parallel programming strategy
> > > consisting on local_locks() for most of the work, and some rare remote
> > > operations are scheduled on target cpu. This keeps cache bouncing low since
> > > cacheline tends to be mostly local, and avoids the cost of locks in non-RT
> > > kernels, even though the very few remote operations will be expensive due
> > > to scheduling overhead.
> > >
> > > On the other hand, for RT workloads this can represent a problem: getting
> > > an important workload scheduled out to deal with remote requests is
> > > sure to introduce unexpected deadline misses.
> > >
> > > The idea:
> > > Currently with PREEMPT_RT=y, local_locks() become per-cpu spinlocks.
> > > In this case, instead of scheduling work on a remote cpu, it should
> > > be safe to grab that remote cpu's per-cpu spinlock and run the required
> > > work locally. Tha major cost, which is un/locking in every local function,
> > > already happens in PREEMPT_RT.
> >
> > I've also noticed this a while ago (likely in the context of rewriting SLUB
> > to use local_lock) and asked about it on IRC, and IIRC tglx wasn't fond of
> > the idea. But I forgot the details about why, so I'll let the the locking
> > experts reply...
> >
>
> I think it's a good idea, especially the new name is less confusing ;-)
> So I wonder Thomas' thoughts as well.
Thanks!
>
> And I think a few (micro-)benchmark numbers will help.
Last year I got some numbers on how replacing local_locks with
spinlocks would impact memcontrol.c cache operations:
https://lore.kernel.org/all/20230125073502.743446-1-leobras@redhat.com/
tl;dr: It increased clocks spent in the most common this_cpu operations,
while reducing clocks spent in remote operations (drain_all_stock).
In RT case, since local locks are already spinlocks, this cost is
already paid, so we can get results like these:
drain_all_stock
cpus Upstream Patched Diff (cycles) Diff(%)
1 44331.10831 38978.03581 -5353.072507 -12.07520567
8 43992.96512 39026.76654 -4966.198572 -11.2886198
128 156274.6634 58053.87421 -98220.78915 -62.85138425
Upstream: Clocks to schedule work on remote CPU (performing not accounted)
Patched: Clocks to grab remote cpu's spinlock and perform the needed work
locally.
Do you have other suggestions to use as (micro-) benchmarking?
Thanks!
Leo
>
> Regards,
> Boqun
>
> > > Also, there is no need to worry about extra cache bouncing:
> > > The cacheline invalidation already happens due to schedule_work_on().
> > >
> > > This will avoid schedule_work_on(), and thus avoid scheduling-out an
> > > RT workload.
> > >
> > > For patches 2, 3 & 4, I noticed just grabing the lock and executing
> > > the function locally is much faster than just scheduling it on a
> > > remote cpu.
> > >
> > > Proposed solution:
> > > A new interface called Queue PerCPU Work (QPW), which should replace
> > > Work Queue in the above mentioned use case.
> > >
> > > If PREEMPT_RT=n, this interfaces just wraps the current
> > > local_locks + WorkQueue behavior, so no expected change in runtime.
> > >
> > > If PREEMPT_RT=y, queue_percpu_work_on(cpu,...) will lock that cpu's
> > > per-cpu structure and perform work on it locally. This is possible
> > > because on functions that can be used for performing remote work on
> > > remote per-cpu structures, the local_lock (which is already
> > > a this_cpu spinlock()), will be replaced by a qpw_spinlock(), which
> > > is able to get the per_cpu spinlock() for the cpu passed as parameter.
> > >
> > > Patch 1 implements QPW interface, and patches 2, 3 & 4 replaces the
> > > current local_lock + WorkQueue interface by the QPW interface in
> > > swap, memcontrol & slub interface.
> > >
> > > Please let me know what you think on that, and please suggest
> > > improvements.
> > >
> > > Thanks a lot!
> > > Leo
> > >
> > > Leonardo Bras (4):
> > > Introducing qpw_lock() and per-cpu queue & flush work
> > > swap: apply new queue_percpu_work_on() interface
> > > memcontrol: apply new queue_percpu_work_on() interface
> > > slub: apply new queue_percpu_work_on() interface
> > >
> > > include/linux/qpw.h | 88 +++++++++++++++++++++++++++++++++++++++++++++
> > > mm/memcontrol.c | 20 ++++++-----
> > > mm/slub.c | 26 ++++++++------
> > > mm/swap.c | 26 +++++++-------
> > > 4 files changed, 127 insertions(+), 33 deletions(-)
> > > create mode 100644 include/linux/qpw.h
> > >
> > >
> > > base-commit: 50736169ecc8387247fe6a00932852ce7b057083
> >
>
next prev parent reply other threads:[~2024-06-25 2:58 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-22 3:58 [RFC PATCH v1 0/4] Introduce QPW for per-cpu operations Leonardo Bras
2024-06-22 3:58 ` [RFC PATCH v1 1/4] Introducing qpw_lock() and per-cpu queue & flush work Leonardo Bras
2024-09-04 21:39 ` Waiman Long
2024-09-05 0:08 ` Waiman Long
2024-09-11 7:18 ` Leonardo Bras
2024-09-11 7:17 ` Leonardo Bras
2024-09-11 13:39 ` Waiman Long
2024-06-22 3:58 ` [RFC PATCH v1 2/4] swap: apply new queue_percpu_work_on() interface Leonardo Bras
2024-06-22 3:58 ` [RFC PATCH v1 3/4] memcontrol: " Leonardo Bras
2024-06-22 3:58 ` [RFC PATCH v1 4/4] slub: " Leonardo Bras
2024-06-24 7:31 ` [RFC PATCH v1 0/4] Introduce QPW for per-cpu operations Vlastimil Babka
2024-06-24 22:54 ` Boqun Feng
2024-06-25 2:57 ` Leonardo Bras [this message]
2024-06-25 17:51 ` Boqun Feng
2024-06-26 16:40 ` Leonardo Bras
2024-06-28 18:47 ` Marcelo Tosatti
2024-06-25 2:36 ` Leonardo Bras
2024-07-15 18:38 ` Marcelo Tosatti
2024-07-23 17:14 ` Marcelo Tosatti
2024-09-05 22:19 ` Hillf Danton
2024-09-11 3:04 ` Marcelo Tosatti
2024-09-15 0:30 ` Hillf Danton
2024-09-11 6:42 ` Leonardo Bras
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZnoyNQLQdyAcMxjP@LeoBras \
--to=leobras@redhat.com \
--cc=42.hyeyoo@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=boqun.feng@gmail.com \
--cc=cgroups@vger.kernel.org \
--cc=cl@linux.com \
--cc=hannes@cmpxchg.org \
--cc=iamjoonsoo.kim@lge.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=longman@redhat.com \
--cc=mhocko@kernel.org \
--cc=mingo@redhat.com \
--cc=mtosatti@redhat.com \
--cc=muchun.song@linux.dev \
--cc=penberg@kernel.org \
--cc=peterz@infradead.org \
--cc=rientjes@google.com \
--cc=roman.gushchin@linux.dev \
--cc=shakeel.butt@linux.dev \
--cc=tglx@linutronix.de \
--cc=vbabka@suse.cz \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.