All of lore.kernel.org
 help / color / mirror / Atom feed
From: Uladzislau Rezki <urezki@gmail.com>
To: Kalesh Singh <kaleshsingh@google.com>
Cc: Uladzislau Rezki <urezki@gmail.com>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	RCU <rcu@vger.kernel.org>,
	Neeraj upadhyay <Neeraj.Upadhyay@amd.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Hillf Danton <hdanton@sina.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>,
	Frederic Weisbecker <frederic@kernel.org>
Subject: Re: [PATCH v4 1/4] rcu: Reduce synchronize_rcu() latency
Date: Thu, 11 Jan 2024 18:35:30 +0100	[thread overview]
Message-ID: <ZaAm4mVEf8bXnLEp@pc636> (raw)
In-Reply-To: <CAC_TJvfe52GRrjHMfvAorC+PtTBRK0hxL9OPoi0SMqDV3XkFZw@mail.gmail.com>

Hello, Kalesh!
> >
> > >
> > > Hi Uladzislau,
> > >
> > > I've tried your patches (v3) on Android with 6.1.43 kernel.
> > >
> > > The test cycles 10 apps (including camera) sequentially for 100
> > > iterations.
> > >
> > > I've set rcu_normal to override the rcu_expedited in the boot
> > > parameters:
> > >
> > > adb shell cat /proc/cmdline | tr ' ' '\n' | grep rcu
> > >
> > > rcupdate.rcu_normal=1
> > > rcupdate.rcu_expedited=1
> > > rcu_nocbs=0-7
> > >
> > >
> > > The configurations are:
> > >
> > > A - echo 0 >/sys/module/rcutree/parameters/rcu_normal_wake_from_gp
> > > B - echo 1 >/sys/module/rcutree/parameters/rcu_normal_wake_from_gp
> > >
> > > Results:
> > >
> > > = APP LAUNCH TIME =
> > >                                 delta (B-A)    ratio(%)
> > >    overall_app_launch_time(ms)    -11399.00       -6.65
> > >
> > >
> > > == camera_launch_time
> > >       type  delta(B-A %)  A_count  B_count
> > >        HOT         -7.05       99       99
> > >       COLD         -6.33        1        1
> > >
> > >
> 
> Hi Uladzislau,
> 
> > If i interpret it correctly you also see that this series reduces
> > a launch time by 6/7% on your app set. Is that correct?
> 
> Yes your understanding is correct.
> 
> >
> > > === Function Latencies ===
> > >
> > > Tracing synchronize_rcu_expedited.  Hit Ctrl-C to exit                              Tracing synchronize_rcu_expedited.  Hit Ctrl-C to exit
> > >
> > >      nsec                : count    distribution                                         nsec                : count    distribution
> > >          0 -> 1          : 0        |                                        |               0 -> 1          : 0        |                                        |
> > >          2 -> 3          : 0        |                                        |               2 -> 3          : 0        |                                        |
> > >          4 -> 7          : 0        |                                        |               4 -> 7          : 0        |                                        |
> > >          8 -> 15         : 0        |                                        |               8 -> 15         : 0        |                                        |
> > >         16 -> 31         : 0        |                                        |              16 -> 31         : 0        |                                        |
> > >         32 -> 63         : 0        |                                        |              32 -> 63         : 0        |                                        |
> > >         64 -> 127        : 0        |                                        |              64 -> 127        : 0        |                                        |
> > >        128 -> 255        : 0        |                                        |             128 -> 255        : 0        |                                        |
> > >        256 -> 511        : 0        |                                        |             256 -> 511        : 0        |                                        |
> > >        512 -> 1023       : 0        |                                        |             512 -> 1023       : 0        |                                        |
> > >       1024 -> 2047       : 0        |                                        |            1024 -> 2047       : 0        |                                        |
> > >       2048 -> 4095       : 0        |                                        |            2048 -> 4095       : 0        |                                        |
> > >       4096 -> 8191       : 0        |                                        |            4096 -> 8191       : 0        |                                        |
> > >       8192 -> 16383      : 0        |                                        |            8192 -> 16383      : 0        |                                        |
> > >      16384 -> 32767      : 0        |                                        |           16384 -> 32767      : 0        |                                        |
> > >      32768 -> 65535      : 0        |                                        |           32768 -> 65535      : 0        |                                        |
> > >      65536 -> 131071     : 0        |                                        |           65536 -> 131071     : 0        |                                        |
> > >     131072 -> 262143     : 0        |                                        |          131072 -> 262143     : 0        |                                        |
> > >     262144 -> 524287     : 0        |                                        |          262144 -> 524287     : 0        |                                        |
> > >     524288 -> 1048575    : 0        |                                        |          524288 -> 1048575    : 0        |                                        |
> > >    1048576 -> 2097151    : 0        |                                        |         1048576 -> 2097151    : 0        |                                        |
> > >    2097152 -> 4194303    : 0        |                                        |         2097152 -> 4194303    : 0        |                                        |
> > >    4194304 -> 8388607    : 871      |**                                      |         4194304 -> 8388607    : 1180     |****                                    |
> > >    8388608 -> 16777215   : 3204     |********                                |         8388608 -> 16777215   : 7020     |*************************               |
> > >   16777216 -> 33554431   : 15013    |****************************************|        16777216 -> 33554431   : 10952    |****************************************|
> > > Exiting trace of synchronize_rcu_expedited                                          Exiting trace of synchronize_rcu_expedited
> > >
> > >
> > > Tracing synchronize_rcu.  Hit Ctrl-C to exit                                        Tracing synchronize_rcu.  Hit Ctrl-C to exit
> > >
> > >      nsec                : count    distribution                                         nsec                : count    distribution
> > >          0 -> 1          : 0        |                                        |               0 -> 1          : 0        |                                        |
> > >          2 -> 3          : 0        |                                        |               2 -> 3          : 0        |                                        |
> > >          4 -> 7          : 0        |                                        |               4 -> 7          : 0        |                                        |
> > >          8 -> 15         : 0        |                                        |               8 -> 15         : 0        |                                        |
> > >         16 -> 31         : 0        |                                        |              16 -> 31         : 0        |                                        |
> > >         32 -> 63         : 0        |                                        |              32 -> 63         : 0        |                                        |
> > >         64 -> 127        : 0        |                                        |              64 -> 127        : 0        |                                        |
> > >        128 -> 255        : 0        |                                        |             128 -> 255        : 0        |                                        |
> > >        256 -> 511        : 0        |                                        |             256 -> 511        : 0        |                                        |
> > >        512 -> 1023       : 0        |                                        |             512 -> 1023       : 0        |                                        |
> > >       1024 -> 2047       : 0        |                                        |            1024 -> 2047       : 0        |                                        |
> > >       2048 -> 4095       : 0        |                                        |            2048 -> 4095       : 0        |                                        |
> > >       4096 -> 8191       : 0        |                                        |            4096 -> 8191       : 0        |                                        |
> > >       8192 -> 16383      : 0        |                                        |            8192 -> 16383      : 0        |                                        |
> > >      16384 -> 32767      : 0        |                                        |           16384 -> 32767      : 0        |                                        |
> > >      32768 -> 65535      : 0        |                                        |           32768 -> 65535      : 0        |                                        |
> > >      65536 -> 131071     : 0        |                                        |           65536 -> 131071     : 0        |                                        |
> > >     131072 -> 262143     : 0        |                                        |          131072 -> 262143     : 0        |                                        |
> > >     262144 -> 524287     : 0        |                                        |          262144 -> 524287     : 0        |                                        |
> > >     524288 -> 1048575    : 0        |                                        |          524288 -> 1048575    : 0        |                                        |
> > >    1048576 -> 2097151    : 0        |                                        |         1048576 -> 2097151    : 0        |                                        |
> > >    2097152 -> 4194303    : 0        |                                        |         2097152 -> 4194303    : 0        |                                        |
> > >    4194304 -> 8388607    : 861      |**                                      |         4194304 -> 8388607    : 1136     |****                                    |
> > >    8388608 -> 16777215   : 3111     |********                                |         8388608 -> 16777215   : 6320     |************************                |
> > >   16777216 -> 33554431   : 13901    |****************************************|        16777216 -> 33554431   : 10484    |****************************************|
> > > Exiting trace of synchronize_rcu                                                    Exiting trace of synchronize_rcu
> > >
> > Who is B and who is A?
> 
> Left is A (rcu_normal_wake_from_gp=0)  and right is B
> (rcu_normal_wake_from_gp=1)
> >
> > >
> > > Interestingly I tried the same experiment without rcu_normal=1 (leaving rcu_expedited=1):
> > >
> > > adb shell cat /proc/cmdline | tr ' ' '\n' | grep rcu
> > > rcupdate.rcu_expedited=1
> > > rcu_nocbs=0-7
> > >
> > > In this case I also saw the -6 to -7% decrease in the app launch times
> > > but I don't have a good explanation why that would be? (The fucntion
> > > latency histograms in this case didn't show any significant difference).
> > > Do you have any insight why this may happen?
> > >
> > When rcu_expedited=1 is set and rcu_normal=0 is disabled. The
> > synchronize_rcu() call is converted into synchronize_rcu_expidited():
> >
> > <snip>
> > void synchronize_rcu(void)
> > {
> >         unsigned long flags;
> >         struct rcu_node *rnp;
> >
> >         RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
> >                          lock_is_held(&rcu_lock_map) ||
> >                          lock_is_held(&rcu_sched_lock_map),
> >                          "Illegal synchronize_rcu() in RCU read-side critical section");
> >         if (!rcu_blocking_is_gp()) {
> >                 if (rcu_gp_is_expedited())
> >                         synchronize_rcu_expedited();
> >                 else
> >                         synchronize_rcu_normal();
> >                 return;
> >         }
> > ...
> > <snip>
> >
> > rcu_gp_is_expidited() is true, so invoke "expedited" version.
> >
> > I see some concerns in preferring an expedited version as a global
> > replacement. First of all it is related to latency sensitive workloads
> > because in order to expedite a grace period it sends out IPIs on all
> > online CPUs to force them to report a quiescent-state asap. I have not
> > investigated yet how it affects such workloads.
> >
> > Therefore, in your case, you also see a performance boost of your app sets.
> 
> IIUC the patch shouldn't affect the case? The only difference in A vs
> B is rcu_normal_wake_from_gp (both have rcu_expedited=1).
> 
Right. This patch does not touch "expedited" version at all.

Appreciate for test results and looking at!

--
Uladzislau Rezki

  reply	other threads:[~2024-01-11 17:35 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-04 16:25 [PATCH v4 0/4] Reduce synchronize_rcu() latency(v4) Uladzislau Rezki (Sony)
2024-01-04 16:25 ` [PATCH v4 1/4] rcu: Reduce synchronize_rcu() latency Uladzislau Rezki (Sony)
2024-01-09 19:16   ` Kalesh Singh
2024-01-10  9:21     ` Uladzislau Rezki
2024-01-11 16:37       ` Kalesh Singh
2024-01-11 17:35         ` Uladzislau Rezki [this message]
2024-01-12 23:09   ` Frederic Weisbecker
2024-01-18 10:37     ` Uladzislau Rezki
2024-01-16 16:18   ` Paul E. McKenney
2024-01-17 12:26     ` Uladzislau Rezki
2024-01-19 15:24       ` Paul E. McKenney
2024-01-22 17:35         ` Uladzislau Rezki
2024-01-23 11:21           ` Paul E. McKenney
2024-01-04 16:25 ` [PATCH v4 2/4] rcu: Add a trace event for synchronize_rcu_normal() Uladzislau Rezki (Sony)
2024-01-12 23:20   ` Frederic Weisbecker
2024-01-15 12:14     ` Uladzislau Rezki
2024-01-04 16:25 ` [PATCH v4 3/4] rcu: Improve handling of synchronize_rcu() users Uladzislau Rezki (Sony)
2024-01-16 16:32   ` Paul E. McKenney
2024-01-04 16:25 ` [PATCH v4 4/4] rcu: Support direct wake-up " Uladzislau Rezki (Sony)
2024-01-13  9:19   ` Z qiang
2024-01-15 10:46     ` Uladzislau Rezki
2024-01-15 10:57       ` Uladzislau Rezki
2024-01-16  6:19         ` Z qiang
2024-01-27  7:07 ` [PATCH v4 0/4] Reduce synchronize_rcu() latency(v4) Paul E. McKenney
2024-01-29 16:23   ` Uladzislau Rezki
2024-01-29 19:43     ` Paul E. McKenney
2024-01-29 20:36       ` Uladzislau Rezki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZaAm4mVEf8bXnLEp@pc636 \
    --to=urezki@gmail.com \
    --cc=Neeraj.Upadhyay@amd.com \
    --cc=boqun.feng@gmail.com \
    --cc=frederic@kernel.org \
    --cc=hdanton@sina.com \
    --cc=joel@joelfernandes.org \
    --cc=kaleshsingh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleksiy.avramchenko@sony.com \
    --cc=paulmck@kernel.org \
    --cc=rcu@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.