Re: [RFC PATCH] sched/fair: Interleave cfs bandwidth timers for improved single thread performance at low utilization

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Benjamin Segall <bsegall@google.com>
To: shrikanth hegde <sshegde@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	mingo@redhat.com, Vincent Guittot <vincent.guittot@linaro.org>,
	dietmar.eggemann@arm.com, Thomas Gleixner <tglx@linutronix.de>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Arjan van de Ven <arjan@linux.intel.com>,
	svaidy@linux.ibm.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] sched/fair: Interleave cfs bandwidth timers for improved single thread performance at low utilization
Date: Tue, 21 Feb 2023 13:43:27 -0800	[thread overview]
Message-ID: <xm267cwa4ruo.fsf@google.com> (raw)
In-Reply-To: <0c4d7bbb-3fef-031e-e9a1-a678ab68ade7@linux.vnet.ibm.com> (shrikanth hegde's message of "Wed, 22 Feb 2023 00:23:05 +0530")

shrikanth hegde <sshegde@linux.vnet.ibm.com> writes:

> On 2/20/23 11:08 PM, Peter Zijlstra wrote:
>> On Tue, Feb 14, 2023 at 08:54:09PM +0530, shrikanth hegde wrote:
>> 
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index ff4dbbae3b10..7b69c329e05d 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -5939,14 +5939,25 @@ static void init_cfs_rq_runtime(struct cfs_rq *cfs_rq)
>>>
>>>  void start_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
>>>  {
>>> -	lockdep_assert_held(&cfs_b->lock);
>>> +	struct hrtimer *period_timer = &cfs_b->period_timer;
>>> +	s64 incr = ktime_to_ns(cfs_b->period) / 10;
>>> +	ktime_t delta;
>>> +	u64 orun = 1;
>>>
>>> +	lockdep_assert_held(&cfs_b->lock);
>>>  	if (cfs_b->period_active)
>>>  		return;
>>>
>>>  	cfs_b->period_active = 1;
>>> -	hrtimer_forward_now(&cfs_b->period_timer, cfs_b->period);
>>> -	hrtimer_start_expires(&cfs_b->period_timer, HRTIMER_MODE_ABS_PINNED);
>>> +	delta = ktime_sub(period_timer->base->get_time(),
>>> +			hrtimer_get_expires(period_timer));
>>> +	if (unlikely(delta >= cfs_b->period)) {
>>> +		orun = ktime_divns(delta, incr);
>>> +		hrtimer_add_expires_ns(period_timer, incr * orun);
>>> +	}
>>> +
>>> +	hrtimer_forward_now(period_timer, cfs_b->period);
>>> +	hrtimer_start_expires(period_timer, HRTIMER_MODE_ABS_PINNED);
>>>  }
>> 
>> What kind of mad hackery is this? Why can't you do the sane thing and
>> initialize the timer at !0 in init_cfs_bandwidth(), then any of the
>> forwards will stay in period -- as they should.
>> 
>> Please, go re-read Thomas's email.
>
> Thank you Peter for taking a look and review.
> we can scrap this implementation and switch to the one you suggested.
> All we need is to initialize the offset. 
>
> Only reason was the way i had implemented. This implementation couldn't be
> fit into init_cfs_bandwidth as timers would align if the cgroups are 
> created together and continue to align forever. 
>
>> 
>> *completely* untested...
>> 
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 7c46485d65d7..4d6ea76096dc 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5915,6 +5915,7 @@ void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
>> 
>>  	INIT_LIST_HEAD(&cfs_b->throttled_cfs_rq);
>>  	hrtimer_init(&cfs_b->period_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
>> +	cfs_b->period_timer.node.expires = get_random_u32_below(cfs_b->period);
>
> This approach/implementation is better as the random function provides uniform  
> distribution. Had to modify this a bit to make it work.  Along with setting     
> setting node.expires, we need to set _softexpires as well. Which is what        
> hrtimer_set_expires does.
>
> Here are the similar numbers again.
> 8 Core system with SMT=8. Total of 64 CPU                                       
> Workload: stress-ng --cpu=32 --cpu-ops=50000                                    
>                                                                                 
>            6.2-rc6                     |   with patch                           
> 8Core   1CG    power    2CG     power  |  1CG    power  2CG    power           
>         27.5    80.6    40      90     |  27.3    82    32.3    104             
>         27.5    81      40.2    91     |  27.5    81    38.7     96             
>         27.7    80      40.1    89     |  27.6    80    29.7    115             
>         27.7    80.1    40.3    94     |  27.6    80    31.5    105   
>
> Will collect some more benchmarks numbers w.r.t to performance.
>
>
>>  	cfs_b->period_timer.function = sched_cfs_period_timer;
>>  	hrtimer_init(&cfs_b->slack_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>>  	cfs_b->slack_timer.function = sched_cfs_slack_timer;
>
> This below patch worked. 
> Does the below patch look okay? shall i send the [PATCH V1] with this
> change?

Yeah, this design makes way more sense.

>
> Question. 
> Should we skip this adding the offset for root_task_group?

The value should never come up, so it's just a question of if it's fine
to call get_random_* in early contexts, which I don't know offhand.

>
>
> ---
>  kernel/sched/fair.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ff4dbbae3b10..6448533178af 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5923,6 +5923,9 @@ void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
>  	INIT_LIST_HEAD(&cfs_b->throttled_cfs_rq);
>  	hrtimer_init(&cfs_b->period_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
>  	cfs_b->period_timer.function = sched_cfs_period_timer;
> +	/* Add a random offset so that timers interleave */
> +	hrtimer_set_expires(&cfs_b->period_timer, get_random_u32_below(cfs_b->period));
> +
>  	hrtimer_init(&cfs_b->slack_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>  	cfs_b->slack_timer.function = sched_cfs_slack_timer;
>  	cfs_b->slack_started = false;

next prev parent reply	other threads:[~2023-02-21 21:43 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-14 15:24 [RFC PATCH] sched/fair: Interleave cfs bandwidth timers for improved single thread performance at low utilization shrikanth hegde
2023-02-20 17:38 ` Peter Zijlstra
2023-02-21 18:53   ` shrikanth hegde
2023-02-21 21:43     ` Benjamin Segall [this message]
2023-02-22  9:36       ` Peter Zijlstra
     [not found] <20230214120502.934324-1-sshegde@linux.vnet.ibm.com>
2023-02-14 21:37 ` Benjamin Segall
2023-02-15 11:01   ` shrikanth hegde
2023-02-15 21:32     ` Benjamin Segall
2023-02-16 19:57       ` shrikanth hegde

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xm267cwa4ruo.fsf@google.com \
    --to=bsegall@google.com \
    --cc=arjan@linux.intel.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=sshegde@linux.vnet.ibm.com \
    --cc=svaidy@linux.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.