public inbox for linux-kernel@vger.kernel.org
From: Uladzislau Rezki <urezki@gmail.com>
To: Joel Fernandes <joel@joelfernandes.org>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	RCU <rcu@vger.kernel.org>,
	Neeraj upadhyay <Neeraj.Upadhyay@amd.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Hillf Danton <hdanton@sina.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>,
	Frederic Weisbecker <frederic@kernel.org>
Subject: Re: [PATCH v5 2/4] rcu: Reduce synchronize_rcu() latency
Date: Wed, 28 Feb 2024 10:28:11 +0100
Message-ID: <Zd78qzAGeriKUxBi@pc636>
In-Reply-To: <c8182a5a-e804-4fcc-a6a5-bb121260e6a6@joelfernandes.org>

On Tue, Feb 27, 2024 at 03:51:03PM -0500, Joel Fernandes wrote:
> 
> 
> On 2/20/2024 1:31 PM, Uladzislau Rezki (Sony) wrote:
> > A call to synchronize_rcu() can be optimized from a latency
> > point of view. Workloads which depend on this can benefit from it.
> > 
> > The delay of wakeme_after_rcu() callback, which unblocks a waiter,
> > depends on several factors:
> > 
> > - how fast a process of offloading is started. Combination of:
> >     - !CONFIG_RCU_NOCB_CPU/CONFIG_RCU_NOCB_CPU;
> >     - !CONFIG_RCU_LAZY/CONFIG_RCU_LAZY;
> >     - other.
> > - when started, invoking path is interrupted due to:
> >     - time limit;
> >     - need_resched();
> >     - if limit is reached.
> > - where in a nocb list it is located;
> > - how fast previous callbacks completed;
> > 
> > Example:
> > 
> > 1. On our embedded devices I can easily trigger a scenario in which
> > it is the last in the list out of ~3600 callbacks:
> > 
> > <snip>
> >   <...>-29      [001] d..1. 21950.145313: rcu_batch_start: rcu_preempt CBs=3613 bl=28
> > ...
> >   <...>-29      [001] ..... 21950.152578: rcu_invoke_callback: rcu_preempt rhp=00000000b2d6dee8 func=__free_vm_area_struct.cfi_jt
> >   <...>-29      [001] ..... 21950.152579: rcu_invoke_callback: rcu_preempt rhp=00000000a446f607 func=__free_vm_area_struct.cfi_jt
> >   <...>-29      [001] ..... 21950.152580: rcu_invoke_callback: rcu_preempt rhp=00000000a5cab03b func=__free_vm_area_struct.cfi_jt
> >   <...>-29      [001] ..... 21950.152581: rcu_invoke_callback: rcu_preempt rhp=0000000013b7e5ee func=__free_vm_area_struct.cfi_jt
> >   <...>-29      [001] ..... 21950.152582: rcu_invoke_callback: rcu_preempt rhp=000000000a8ca6f9 func=__free_vm_area_struct.cfi_jt
> >   <...>-29      [001] ..... 21950.152583: rcu_invoke_callback: rcu_preempt rhp=000000008f162ca8 func=wakeme_after_rcu.cfi_jt
> >   <...>-29      [001] d..1. 21950.152625: rcu_batch_end: rcu_preempt CBs-invoked=3612 idle=....
> > <snip>
> > 
> > 2. We use cpuset/cgroup to classify tasks and assign them into
> > different cgroups. For example, the "background" group binds tasks
> > only to little CPUs, while "foreground" makes use of all CPUs.
> > Tasks can be migrated between groups on request if acceleration
> > is needed.
> > 
> > Below is an example of how the "surfaceflinger" task gets migrated.
> > Initially it is located in the "system-background" cgroup, which
> > allows it to run only on little cores. To speed it up, it can be
> > temporarily moved into the "foreground" cgroup, which allows it
> > to use big/all CPUs:
> > 
> > cgroup_attach_task():
> >  -> cgroup_migrate_execute()
> >    -> cpuset_can_attach()
> >      -> percpu_down_write()
> >        -> rcu_sync_enter()
> >          -> synchronize_rcu()
> 
> We should do this patch but I wonder also if cgroup_attach_task() usage of
> synchronize_rcu() should actually be using the _expedited() variant (via some
> possible flag to the percpu rwsem / rcu_sync).
> 
> If the user assumes it a slow path, then usage of _expedited() should probably
> be OK. If it is assumed a fast path, then it is probably hurting latency anyway
> without the enablement of this patch's rcu_normal_wake_from_gp.
> 
> Thoughts?
> 
As I see it, rcu_normal_wake_from_gp is disabled by default so far. We
need to work on this further before it can be on by default, but we will
move toward that.
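
For reference, the delay visible in the trace quoted above can be worked
out from its timestamps. The following is only a minimal userspace
sketch: the struct, the helper and the constant names are made up for
illustration and are not part of the patch or of any kernel API. The
numbers are copied from the rcu_batch_start, rcu_invoke_callback
(wakeme_after_rcu) and rcu_batch_end lines of the trace.

```c
/* Trace timestamps have the form "seconds.microseconds". They are kept
 * as two integers here to avoid floating-point rounding. */
struct trace_ts {
	long sec;
	long usec;
};

/* Elapsed time between two trace timestamps, in microseconds. */
static long ts_delta_us(struct trace_ts start, struct trace_ts end)
{
	return (end.sec - start.sec) * 1000000L + (end.usec - start.usec);
}

/* Values taken from the quoted trace (hypothetical variable names). */
static const struct trace_ts batch_start    = { 21950, 145313 };
static const struct trace_ts wakeme_invoked = { 21950, 152583 };
static const struct trace_ts batch_end      = { 21950, 152625 };
```

With these values, ts_delta_us(batch_start, wakeme_invoked) is 7270, so
in that trace the waiter was unblocked about 7.3 ms after the batch
started, because wakeme_after_rcu() was the last of the callbacks
invoked in the batch. ts_delta_us(batch_start, batch_end) is 7312.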


> Then it becomes a matter of how to plumb the expeditedness down the stack.
> 
> Also, speaking of percpu rwsem, I noticed that percpu refcounts don't use
> rcu_sync. I haven't looked closely why, but something I hope to get time to look
> into is if it can be converted over and what benefits would that entail if any.
> 
> Also will continue reviewing the patch. Thanks.
> 
Thanks.

--
Uladzislau Rezki
