From: Chengming Zhou <chengming.zhou@linux.dev>
To: Vishal Chourasia <vishalc@linux.ibm.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Chuyi Zhou <zhouchuyi@bytedance.com>
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, rostedt@goodmis.org,
bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] sched/fair: Sync se's load_avg with cfs_rq in reweight_task
Date: Wed, 24 Jul 2024 10:12:16 +0800 [thread overview]
Message-ID: <bc8e5905-7729-4263-84e2-ca86a710eccd@linux.dev> (raw)
In-Reply-To: <d758614b-73e8-42a3-92d1-5d2424ee4e89@linux.ibm.com>
On 2024/7/24 07:00, Vishal Chourasia wrote:
> On 24/07/24 2:40 am, Dietmar Eggemann wrote:
>> On 23/07/2024 17:48, Vishal Chourasia wrote:
>>> On Tue, Jul 23, 2024 at 07:42:47PM +0800, Chuyi Zhou wrote:
>>>> In reweight_task(), there are two situations:
>>>>
>>>> 1. The task was on_rq, then the task's load_avg is accurate because we
>>>> synchronized it with cfs_rq through update_load_avg() in dequeue_task().
>>>>
>>>> 2. The task is sleeping, its load_avg might not have been updated for some
>>>> time, which can result in inaccurate dequeue_load_avg() in
>>>> reweight_entity().
>>>>
>>>> This patch solves this by using sync_entity_load_avg() to synchronize the
>>>> load_avg of se with cfs_rq before dequeue_load_avg() in reweight_entity().
>>>> For tasks were on_rq, since we already update load_avg to accurate values
>>>> in dequeue_task(), this change will not have other effects due to the short
>>>> time interval between the two updates.
>>>>
>>>> Suggested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
>>>> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
>>>> ---
>>>> Changes in v3:
>>>> - use sync_entity_load_avg() rather than update_load_avg() to sync the
>>>> sleeping task with its cfs_rq suggested by Dietmar.
>>>> - Link t0 v2: https://lore.kernel.org/lkml/20240720051248.59608-1-zhouchuyi@bytedance.com/
>>>> Changes in v2:
>>>> - change the description in commit log.
>>>> - use update_load_avg() in reweight_task() rather than in reweight_entity
>>>> suggested by chengming.
>>>> - Link to v1: https://lore.kernel.org/lkml/20240716150840.23061-1-zhouchuyi@bytedance.com/
>>>> ---
>>>> kernel/sched/fair.c | 43 ++++++++++++++++++++++++-------------------
>>>> 1 file changed, 24 insertions(+), 19 deletions(-)
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 9057584ec06d..da3cdd86ab2e 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -3669,11 +3669,32 @@ dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>>>> cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum,
>>>> cfs_rq->avg.load_avg * PELT_MIN_DIVIDER);
>>>> }
>>>> +
>>>> +static inline u64 cfs_rq_last_update_time(struct cfs_rq *cfs_rq)
>>>> +{
>>>> + return u64_u32_load_copy(cfs_rq->avg.last_update_time,
>>>> + cfs_rq->last_update_time_copy);
>>>> +}
>>>> +
>>>> +/*
>>>> + * Synchronize entity load avg of dequeued entity without locking
>>>> + * the previous rq.
>>>> + */
>>>> +static void sync_entity_load_avg(struct sched_entity *se)
>>>> +{
>>>> + struct cfs_rq *cfs_rq = cfs_rq_of(se);
>>>> + u64 last_update_time;
>>>> +
>>>> + last_update_time = cfs_rq_last_update_time(cfs_rq);
>>>> + __update_load_avg_blocked_se(last_update_time, se);
>>>> +}
>>>> +
>>> The difference between using update_load_avg() and
>>> sync_entity_load_avg() is:
>>> 1. update_load_avg() uses the updated PELT clock value from the rq
>>> structure.
>>> 2. sync_entity_load_avg() uses the last updated time of
>>> the cfs_rq where the scheduling entity (se) is attached.
>>>
>>> Won't this affect the entity load sync?
>>
>> Not sure what you mean exactly by entity load sync here.
> load avg sync for the wakee
>>
>> The task has been sleeping for a long time, i.e. its PELT values haven't
>> been updated or a long time (its last_update_time (lut) value is pretty
>> old).
>>
>> In this meantime the task's cfs_rq has potentially seen other PELT
>> updates due to PELT updates of other non-sleeping tasks related to this
>> cfs_rq. I.e. the cfs_rq lut is much more recent.
>>
>> What we want to do here is to sync the sleeping task with its cfs_rq. If
>> the task was sleeping for more than 1us (1024ns) and we cross a 1ms PELT
>> period (1024us) when we use cfs_rq's lut as the 'now' value for
>> __update_load_avg_blocked_se() then we will see the task PELT values decay.
>>
>> We rely on sync_entity_load_avg() for instance in EAS wakeup where the
>> task's util_avg influences on which CPU type the task will run next. So
>> we sync the wakee with its cfs_rq to be able to work with a current task
>> util_avg.
> I was talking about the case where all the tasks on a cfs_rq are sleeping.
> In this case, lut of the cfs_rq will be same as, at the time of last dequeue.
In this case, cfs_rq is not on_rq, its load_sum/avg will be updated when
enqueue next time. (Or periodically updated from load balance)
>
> And, wakee is been woken up (suppose) after 1us window
>
>
> I guess, in this case pelt metric would not have changed for the cfs_rq
IMHO, so long as task_se load is synced with cfs_rq there should be ok.
Thanks.
>
> Thanks
> -- vishal.c
>
next prev parent reply other threads:[~2024-07-24 2:12 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-23 11:42 [PATCH v3] sched/fair: Sync se's load_avg with cfs_rq in reweight_task Chuyi Zhou
2024-07-23 15:48 ` Vishal Chourasia
2024-07-23 21:10 ` Dietmar Eggemann
2024-07-23 23:00 ` Vishal Chourasia
2024-07-24 2:12 ` Chengming Zhou [this message]
2024-07-24 9:06 ` Dietmar Eggemann
2024-07-24 2:08 ` Chengming Zhou
2024-07-24 6:20 ` Vishal Chourasia
2024-07-29 7:56 ` Dietmar Eggemann
2024-07-29 8:20 ` Chuyi Zhou
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bc8e5905-7729-4263-84e2-ca86a710eccd@linux.dev \
--to=chengming.zhou@linux.dev \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=vincent.guittot@linaro.org \
--cc=vishalc@linux.ibm.com \
--cc=vschneid@redhat.com \
--cc=zhouchuyi@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.