Re: [RFC 5/6] sched/fair: Take into account latency nice at wakeup

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Tao Zhou <tao.zhou@linux.dev>
To: Vincent Guittot <vincent.guittot@linaro.org>,
	Tao Zhou <tao.zhou@linux.dev>
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	dietmar.eggemann@arm.com, rostedt@goodmis.org,
	bsegall@google.com, mgorman@suse.de,
	linux-kernel@vger.kernel.org, parth@linux.ibm.com,
	qais.yousef@arm.com, chris.hyser@oracle.com, vschneid@redhat.com,
	patrick.bellasi@matbug.net, David.Laight@aculab.com,
	pjt@google.com, pavel@ucw.cz, tj@kernel.org, qperret@google.com,
	tim.c.chen@linux.intel.com
Subject: Re: [RFC 5/6] sched/fair: Take into account latency nice at wakeup
Date: Mon, 2 May 2022 23:26:14 +0800	[thread overview]
Message-ID: <Ym/4Fs9GEctSNFM3@geo.homenetwork> (raw)
In-Reply-To: <Ym/z39Qk/QDJckKE@geo.homenetwork>

On Mon, May 02, 2022 at 11:08:15PM +0800, Tao Zhou wrote:
> On Mon, May 02, 2022 at 02:30:55PM +0200, Vincent Guittot wrote:
> 
> > On Mon, 2 May 2022 at 11:54, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> > >
> > > Hi Tao,
> > >
> > > On Sun, 1 May 2022 at 17:58, Tao Zhou <tao.zhou@linux.dev> wrote:
> > > >
> > > > Hi Vincent,
> > > >
> > > > Change to Valentin Schneider's now using mail address.
> > >
> > > Thanks
> > >
> > > >
> > > > On Fri, Mar 11, 2022 at 05:14:05PM +0100, Vincent Guittot wrote:
> > > >
> > > > > Take into account the nice latency priority of a thread when deciding to
> > > > > preempt the current running thread. We don't want to provide more CPU
> > > > > bandwidth to a thread but reorder the scheduling to run latency sensitive
> > > > > task first whenever possible.
> > > > >
> > > > > As long as a thread didn't use its bandwidth, it will be able to preempt
> > > > > the current thread.
> > > > >
> > > > > At the opposite, a thread with a low latency priority will preempt current
> > > > > thread at wakeup only to keep fair CPU bandwidth sharing. Otherwise it will
> > > > > wait for the tick to get its sched slice.
> > > > >
> > > > >                                    curr vruntime
> > > > >                                        |
> > > > >                       sysctl_sched_wakeup_granularity
> > > > >                                    <-->
> > > > > ----------------------------------|----|-----------------------|---------------
> > > > >                                   |    |<--------------------->
> > > > >                                   |    .  sysctl_sched_latency
> > > > >                                   |    .
> > > > > default/current latency entity    |    .
> > > > >                                   |    .
> > > > > 1111111111111111111111111111111111|0000|-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-
> > > > > se preempts curr at wakeup ------>|<- se doesn't preempt curr -----------------
> > > > >                                   |    .
> > > > >                                   |    .
> > > > >                                   |    .
> > > > > low latency entity                |    .
> > > > >                                    ---------------------->|
> > > > >                                % of sysctl_sched_latency  |
> > > > > 1111111111111111111111111111111111111111111111111111111111|0000|-1-1-1-1-1-1-1-
> > > > > preempt ------------------------------------------------->|<- do not preempt --
> > > > >                                   |    .
> > > > >                                   |    .
> > > > >                                   |    .
> > > > > high latency entity               |    .
> > > > >          |<-----------------------|    .
> > > > >          | % of sysctl_sched_latency   .
> > > > > 111111111|0000|-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1
> > > > > preempt->|<- se doesn't preempt curr ------------------------------------------
> > > > >
> > > > > Tests results of nice latency impact on heavy load like hackbench:
> > > > >
> > > > > hackbench -l (2560 / group) -g group
> > > > > group        latency 0             latency 19
> > > > > 1            1.450(+/- 0.60%)      1.376(+/- 0.54%) + 5%
> > > > > 4            1.537(+/- 1.75%)      1.335(+/- 1.81%) +13%
> > > > > 8            1.414(+/- 1.91%)      1.348(+/- 1.88%) + 5%
> > > > > 16           1.423(+/- 1.65%)      1.374(+/- 0.58%) + 3%
> > > > >
> > > > > hackbench -p -l (2560 / group) -g group
> > > > > group
> > > > > 1            1.416(+/- 3.45%)      0.886(+/- 0.54%) +37%
> > > > > 4            1.634(+/- 7.14%)      0.888(+/- 5.40%) +45%
> > > > > 8            1.449(+/- 2.14%)      0.804(+/- 4.55%) +44%
> > > > > 16           0.917(+/- 4.12%)      0.777(+/- 1.41%) +15%
> > > > >
> > > > > By deacreasing the latency prio, we reduce the number of preemption at
> > > >
> > > > s/deacreasing/decreasing/
> > >
> > > yes
> > >
> > > > s/reduce/increase/
> > >
> > > not in the case of hackbench tests above. By decreasing the latency
> > > prio of all hackbench threads, we make sure that they will not preempt
> > > the current thread and let it move forward so we reduce the number of
> > > preemption.
> > >
> > > >
> > > > > wakeup and help hackbench making progress.
> > > > >
> > > > > Test results of nice latency impact on short live load like cyclictest
> > > > > while competing with heavy load like hackbench:
> > > > >
> > > > > hackbench -l 10000 -g group &
> > > > > cyclictest --policy other -D 5 -q -n
> > > > >         latency 0           latency -20
> > > > > group   min  avg    max     min  avg    max
> > > > > 0       16    17     28      15   17     27
> > > > > 1       61   382  10603      63   89   4628
> > > > > 4       52   437  15455      54   98  16238
> > > > > 8       56   728  38499      61  125  28983
> > > > > 16      53  1215  52207      61  183  80751
> > > > >
> > > > > group = 0 means that hackbench is not running.
> > > > >
> > > > > The avg is significantly improved with nice latency -20 especially with
> > > > > large number of groups but min and max remain quite similar. If we add the
> > > > > histogram parameters to get details of latency, we have :
> > > > >
> > > > > hackbench -l 10000 -g 16 &
> > > > > cyclictest --policy other -D 5 -q -n  -H 20000 --histfile data.txt
> > > > >               latency 0    latency -20
> > > > > Min Latencies:    63           62
> > > > > Avg Latencies:  1397          218
> > > > > Max Latencies: 44926        42291
> > > > > 50% latencies:   100           98
> > > > > 75% latencies:   762          116
> > > > > 85% latencies:  1118          126
> > > > > 90% latencies:  1540          130
> > > > > 95% latencies:  5610          138
> > > > > 99% latencies: 13738          266
> > > > >
> > > > > With percentile details, we see the benefit of nice latency -20 as
> > > > > 1% of the latencies stays above 266us whereas the default latency has
> > > > > got 25% are above 762us and 10% over the 1ms.
> > > > >
> > >
> > > [..]
> > >
> > > > > +static long wakeup_latency_gran(int latency_weight)
> > > > > +{
> > > > > +     long thresh = sysctl_sched_latency;
> > > > > +
> > > > > +     if (!latency_weight)
> > > > > +             return 0;
> > > > > +
> > > > > +     if (sched_feat(GENTLE_FAIR_SLEEPERS))
> > > > > +             thresh >>= 1;
> > > > > +
> > > > > +     /*
> > > > > +      * Clamp the delta to stay in the scheduler period range
> > > > > +      * [-sysctl_sched_latency:sysctl_sched_latency]
> > > > > +      */
> > > > > +     latency_weight = clamp_t(long, latency_weight,
> > > > > +                             -1 * NICE_LATENCY_WEIGHT_MAX,
> > > > > +                             NICE_LATENCY_WEIGHT_MAX);
> > > > > +
> > > > > +     return (thresh * latency_weight) >> NICE_LATENCY_SHIFT;
> > > > > +}
> > > > > +
> > > > >  static unsigned long wakeup_gran(struct sched_entity *se)
> > > > >  {
> > > > >       unsigned long gran = sysctl_sched_wakeup_granularity;
> > > > > @@ -7008,6 +7059,10 @@ static int
> > > > >  wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se)
> > > > >  {
> > > > >       s64 gran, vdiff = curr->vruntime - se->vruntime;
> > > > > +     int latency_weight = se->latency_weight - curr->latency_weight;
> > > > > +
> > > > > +     latency_weight = min(latency_weight, se->latency_weight);
> > > >
> > > > Let lable A=se->latency_weight, B=curr->latency_weight, C=A-B.
> > > >
> > > > 1 A>0 B>0
> > > >     ::C=A-B<0, min(C,A)=C, latency decrease, C is the real diff value no matter
> > > >       what A is. That means it can be very 'long'(-sched_latency) and vdiff is
> > > >       more possible to be in <= 0 case and return -1.
> > > >     ::C=A-B>0, min(C,A)=A, latency increase, but it is conservative. Limit to
> > >
> > > A>0 and B>0  then min(C=A-B, A) = C
> 
> It's my mistake. And in this case the calculating of value added to vdiff
> for latency increase and deleted to vdiff for latency decrease is the same.
> 
> > >
> > > >       A/1024*sched_latency.
> > > >     When latecy is decreased, the negtive value added to vdiff is larger than the
> > > >     positive value added to vdiff when latency is increased.
> > >
> > > When the weight > 0, the tasks have some latency requirements so we
> > > use their relative priority to decide if we can preempt current which
> > > also has some latency requirement
> > >
> > > >
> > > > 2 A>0 B<0
> > > >     ::C=A-B>0 and C>A, min(C,A)=A, latency increase and it is conservative.
> > 
> > For this one we want to use delta like for UC 1 above
> > 
> > > >
> > > > 3 A<0 B<0
> > > >     ::C=A-B>0, min(C,A)=A, latency increase but min(C,A)<0, I think if latency
> > > >     increase means the value added to vdiff will be a positive value to increase
> > > >     the chance to return 1. I would say it is max(C,A)=C
> > > >     ::C=A-B<0, min(C,A)=A, latency decrease and the real negtive value.
> > >
> > > When the weight < 0, the tasks haven't latency requirement and even
> > > don't care of being scheduled "quickly". In this case, we don't care
> > > about the relative priority but we want to minimize the preemption of
> > > current so we use the weight
> > >
> > > >
> > > > 4 A<0,B>0
> > > >     ::C=A-B<0, min(C,A)=C, latency decrease and the real negtive value.
> > 
> > And for this one we probably want to use A like for other UC with A < 0
> > 
> > I'm going to update the way the weight is computed to match this
> 
> According to your explanations, the min(C,A) is used for A and B both in
> the negtive region or in the postive region, the max(C,A) is use for A and
> B both not in the same region.
> 
> if ((A>>31)^(B>>31))
>   max(C,A)
> else
>   min(C,A)

Not right. 

  if ((A&(1<<31))^(B(1<<31)))
      max(C,A)
  else
      min(C,A)

> 
> wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se)
> {
>     s64 gran, vdiff = curr->vruntime - se->vruntime;
>     int latency_weight = se->latency_weight - curr->latency_weight;
> 
>     if ((se->latency_weight>>(WMULT_SHIFT-1)) ^
>          curr->latency_weight>>(WMULT_SHIFT-1))
>         latency_weight = max(latency_weight, se->latency_weight);
>     else
>         latency_weight = min(latency_weight, se->latency_weight);
> 
>     vdiff += wakeup_latency_gran(latency_weight);
>     ...
> }

Not right.

wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se)
{
    s64 gran, vdiff = curr->vruntime - se->vruntime;
    int latency_weight = se->latency_weight - curr->latency_weight;

    if ((se->latency_weight & (1 << WMULT_SHIFT-1)) ^
         curr->latency_weight & (1 << WMULT_SHIFT-1))
        latency_weight = max(latency_weight, se->latency_weight);
    else
        latency_weight = min(latency_weight, se->latency_weight);

    vdiff += wakeup_latency_gran(latency_weight);
    ...
}

> > > >
> > > > Is there a reason that the value when latency increase and latency decrease
> > > > be treated differently. Latency increase value is limited to se's latency_weight
> > >
> > > I have tried to explain why I treat differently if weight is > 0 or < 0.
> > >
> > > > but latency decrease value can extend to -sched_latency or treat them the same.
> > > > Penalty latency decrease and conserve latency increase.
> > > >
> > > >
> > > > There is any value that this latency_weight can be considered to be a average.
> > > >
> > > > The delta value choose is ~%5 to 1024. %5*sched_latency=0.05*6ms=0.3ms.(no scale)
> > > > I do not think over that vdiff equation  and others though.
> > > >
> > > > Thanks,
> > > > Tao
> > > > > +     vdiff += wakeup_latency_gran(latency_weight);
> > > > >
> > > > >       if (vdiff <= 0)
> > > > >               return -1;
> > > > > @@ -7117,6 +7172,7 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
> > > > >               return;
> > > > >
> > > > >       update_curr(cfs_rq_of(se));
> > > > > +
> > > > >       if (wakeup_preempt_entity(se, pse) == 1) {
> > > > >               /*
> > > > >                * Bias pick_next to pick the sched entity that is
> > > > > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > > > > index 456ad2159eb1..dd92aa9c36f9 100644
> > > > > --- a/kernel/sched/sched.h
> > > > > +++ b/kernel/sched/sched.h
> > > > > @@ -122,6 +122,17 @@ extern void call_trace_sched_update_nr_running(struct rq *rq, int count);
> > > > >   * Default tasks should be treated as a task with latency_nice = 0.
> > > > >   */
> > > > >  #define DEFAULT_LATENCY_NICE 0
> > > > > +#define DEFAULT_LATENCY_PRIO (DEFAULT_LATENCY_NICE + LATENCY_NICE_WIDTH/2)
> > > > > +
> > > > > +/*
> > > > > + * Convert user-nice values [ -20 ... 0 ... 19 ]
> > > > > + * to static latency [ 0..39 ],
> > > > > + * and back.
> > > > > + */
> > > > > +#define NICE_TO_LATENCY(nice)        ((nice) + DEFAULT_LATENCY_PRIO)
> > > > > +#define LATENCY_TO_NICE(prio)        ((prio) - DEFAULT_LATENCY_PRIO)
> > > > > +#define NICE_LATENCY_SHIFT   (SCHED_FIXEDPOINT_SHIFT)
> > > > > +#define NICE_LATENCY_WEIGHT_MAX      (1L << NICE_LATENCY_SHIFT)
> > > > >
> > > > >  /*
> > > > >   * Increase resolution of nice-level calculations for 64-bit architectures.
> > > > > @@ -2098,6 +2109,7 @@ static_assert(WF_TTWU == SD_BALANCE_WAKE);
> > > > >
> > > > >  extern const int             sched_prio_to_weight[40];
> > > > >  extern const u32             sched_prio_to_wmult[40];
> > > > > +extern const int             sched_latency_to_weight[40];
> > > > >
> > > > >  /*
> > > > >   * {de,en}queue flags:
> > > > > --
> > > > > 2.17.1
> > > > >

next prev parent reply	other threads:[~2022-05-02 15:25 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-11 16:14 [PATCH 0/6] Add latency_nice priority Vincent Guittot
2022-03-11 16:14 ` [PATCH 1/6] sched: Introduce latency-nice as a per-task attribute Vincent Guittot
2022-03-11 16:14 ` [PATCH 2/6] sched/core: Propagate parent task's latency requirements to the child task Vincent Guittot
2022-03-22  0:22   ` Tim Chen
2022-03-22 14:57     ` Vincent Guittot
2022-03-11 16:14 ` [PATCH 3/6] sched: Allow sched_{get,set}attr to change latency_nice of the task Vincent Guittot
2022-03-22  0:22   ` Tim Chen
2022-03-22 14:55     ` Vincent Guittot
2022-03-28  9:23   ` Dietmar Eggemann
2022-03-28 12:41     ` Vincent Guittot
2022-03-11 16:14 ` [PATCH 4/6] sched/core: Add permission checks for setting the latency_nice value Vincent Guittot
2022-03-11 16:14 ` [RFC 5/6] sched/fair: Take into account latency nice at wakeup Vincent Guittot
2022-03-15  0:53   ` Josh Don
2022-03-15 13:54     ` Vincent Guittot
2022-03-28  9:24   ` Dietmar Eggemann
2022-03-28 12:51     ` Vincent Guittot
2022-05-01 15:58   ` Tao Zhou
2022-05-02  9:54     ` Vincent Guittot
2022-05-02 12:30       ` Vincent Guittot
2022-05-02 15:08         ` Tao Zhou
2022-05-02 15:26           ` Tao Zhou [this message]
2022-05-02 15:47             ` Tao Zhou
2022-05-02 16:21               ` Vincent Guittot
2022-05-02 23:09                 ` Tao Zhou
2022-05-03  2:30     ` Tao Zhou
2022-05-03 12:40       ` Vincent Guittot
2022-05-04 11:14   ` Chen Yu
2022-05-04 12:39     ` Vincent Guittot
2022-03-11 16:14 ` [RFC 6/6] sched/fair: Add sched group latency support Vincent Guittot
2022-03-15  0:58   ` Josh Don
2022-03-15 17:07     ` Vincent Guittot
2022-03-21 17:24   ` Tejun Heo
2022-03-22 16:10     ` Vincent Guittot
2022-03-22 16:40       ` Tejun Heo
2022-03-23 15:04         ` Vincent Guittot
2022-03-23 18:20           ` Chris Hyser
2022-03-22 16:41     ` Tim Chen
2022-03-23 15:23       ` Vincent Guittot
2022-03-22 16:39 ` [PATCH 0/6] Add latency_nice priority Qais Yousef
2022-03-23 15:32   ` Vincent Guittot
2022-03-24 17:25     ` Qais Yousef
2022-03-25 13:27       ` Vincent Guittot
2022-03-28 16:27         ` Qais Yousef
2022-03-30  7:30           ` Vincent Guittot
2022-03-28  9:24 ` Dietmar Eggemann
2022-03-28 12:56   ` Vincent Guittot
2022-04-01 12:15     ` Qais Yousef
2022-04-02  8:46       ` Vincent Guittot
2022-04-09 17:08         ` Qais Yousef
2022-04-09 17:28           ` Steven Rostedt
2022-04-09 18:10             ` Qais Yousef
2022-04-11  7:26               ` Vincent Guittot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Ym/4Fs9GEctSNFM3@geo.homenetwork \
    --to=tao.zhou@linux.dev \
    --cc=David.Laight@aculab.com \
    --cc=bsegall@google.com \
    --cc=chris.hyser@oracle.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=parth@linux.ibm.com \
    --cc=patrick.bellasi@matbug.net \
    --cc=pavel@ucw.cz \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=qais.yousef@arm.com \
    --cc=qperret@google.com \
    --cc=rostedt@goodmis.org \
    --cc=tim.c.chen@linux.intel.com \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox