public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Tianchen Ding <dtcccc@linux.alibaba.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Josh Don <joshdon@google.com>,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: Re: [PATCH v2] sched/fair: Make SCHED_IDLE entity be preempted in strict hierarchy
Date: Tue, 9 Jul 2024 21:42:23 +0800	[thread overview]
Message-ID: <eab4db35-9cd2-4348-b41c-ea6176583ef4@linux.alibaba.com> (raw)
In-Reply-To: <20240708142832.GB27299@noisy.programming.kicks-ass.net>

On 2024/7/8 22:28, Peter Zijlstra wrote:
> On Mon, Jul 08, 2024 at 02:47:34PM +0200, Vincent Guittot wrote:
>> On Mon, 8 Jul 2024 at 14:02, Peter Zijlstra <peterz@infradead.org> wrote:
> 
>>>> @@ -8409,6 +8400,15 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>>>        if (cse_is_idle != pse_is_idle)
>>>>                return;
>>>>
>>>> +     /*
>>>> +      * Batch tasks do not preempt non-idle tasks (their preemption
>>>> +      * is driven by the tick).
>>>> +      * We've done the check about "only one of the entities is idle",
>>>> +      * so cse must be non-idle if p is a batch task.
>>>> +      */
>>>> +     if (unlikely(entity_is_task(pse) && p->policy == SCHED_BATCH))
>>>> +             return;
>>>
>>> I'm not convinced this condition is right. The current behaviour of
>>> SCHED_BATCH doesn't care about pse, only p.
>>>
>>> That is, if p is SCHED_BATCH it will not preempt -- except an
>>> SCHED_IDLE.
>>>
>>> So I'm tempted to delete this first part of your condition and have it
>>> be:
>>>
>>>          if (p->policy == SCHED_BATCH)
>>>                  return;
>>>
>>> That is, suppose you have:
>>>
>>>                          root
>>>                           |
>>>                ------------------------
>>>                |                      |
>>>          normal_cgroup          normal_cgroup
>>>                |                      |
>>>          SCHED_BATCH task_A     SCHED_BATCH task_B
>>>
>>> Then the preemption crud will end up comparing the groups to one another
>>> and still allow A to preempt B -- except we explicitly do not want this.
>>>
>>> The 'problem' is that the whole BATCH thing isn't cgroup aware ofcourse,

Agree.

>>> but I'm not sure we want to go fix that -- esp. not in this patch.
>>>
>>> Hmm?
>>
>> Good question, but do we want to make SCHED_BATCH tasks behave
>> differently than SCHED_IDLE tasks in a group in this case ?
> 
> I suspect we'll have to. People added the idle-cgroup thing, but never
> did the same for batch. With the result that they're now fundamentally
> different.
> 
>> With this patch, we don't want task_B to preempt task_A for the case
>> but task_A will preempt task_B whereas task A is SCHED_IDLE
>>
>>                           root
>>                            |
>>                 ------------------------
>>                 |                      |
>>           normal_cgroup          idle_cgroup
>>                 |                      |
>>           SCHED_IDLE task_A     SCHED_NORMAL task_B
>>
>> As we only look at the common level of hierarchy between the tasks,
>> the below will make A to preempt B whereas both are SCHED_IDLE
>>
>>                           root
>>                            |
>>                 ------------------------
>>                 |                      |
>>           normal_cgroup          normal_cgroup
>>                 |                      |
>>           SCHED_IDLE task_A     SCHED_IDLE task_B
> 
> So we can make the last test be:
> 
> 	if (unlikely(p->policy != SCHED_NORMAL))
> 		return;
> 
> Much like the original condition removed here:
> 
> -       if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
> +       if (!sched_feat(WAKEUP_PREEMPTION))
> 
> Except now after all that cgroup nonsense. Then the OP case will preempt
> because normal_cgroup vs idle_cgroup, my BATCH example will not preempt,
> because BATCH != NORMAL, your IDLE example will not preempt either,
> because IDLE != NORMAL.
> 

So there may be 2 interesting issues. Let's take the example of:

                            root
                             |
                  ------------------------
                  |                      |
            normal_cgroup          normal_cgroup
                  |                      |
            SCHED_IDLE task_A     SCHED_IDLE task_B


The first issue is the scope of task policy. My original proposal is to only 
compare the common level of hierarchy (pse and cse), and ignore all their 
children. So that the scope of task policy will be limited within its own 
cgroup, and A may preempt B.

However, If we simply check

         if (p->policy == SCHED_BATCH)
                 return;

or

  	if (p->policy != SCHED_NORMAL)
  		return;

Then the scope of task policy will "overflow" and take effect "across" the 
cgroups. So A will not preempt B.

I don't know which one is better, and both are reasonable and acceptable for me.


The second issue is about SCHED_BATCH. Now let's make task_A be SCHED_BATCH.

In vanilla kernel, A will preempt B because "Idle tasks are by definition 
preempted by non-idle tasks."

However, with my original patch, A may preempt B. (depending on pse and cse)

With your modification, A will not preempt B.

Again, which one should be preferred?

Thanks.

  reply	other threads:[~2024-07-09 13:42 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-26  2:35 [PATCH v2] sched/fair: Make SCHED_IDLE entity be preempted in strict hierarchy Tianchen Ding
2024-07-04  8:15 ` Tianchen Ding
2024-07-08 12:02 ` Peter Zijlstra
2024-07-08 12:47   ` Vincent Guittot
2024-07-08 14:28     ` Peter Zijlstra
2024-07-09 13:42       ` Tianchen Ding [this message]
2024-07-09 18:28       ` Josh Don
2024-07-29 10:34 ` [tip: sched/core] " tip-bot2 for Tianchen Ding

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=eab4db35-9cd2-4348-b41c-ea6176583ef4@linux.alibaba.com \
    --to=dtcccc@linux.alibaba.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=joshdon@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox