All of lore.kernel.org
 help / color / mirror / Atom feed
From: K Prateek Nayak <kprateek.nayak@amd.com>
To: Adam Li <adamli@os.amperecomputing.com>, <peterz@infradead.org>,
	<mingo@redhat.com>, <juri.lelli@redhat.com>,
	<vincent.guittot@linaro.org>
Cc: <dietmar.eggemann@arm.com>, <rostedt@goodmis.org>,
	<bsegall@google.com>, <mgorman@suse.de>, <vschneid@redhat.com>,
	<linux-kernel@vger.kernel.org>, <patches@amperecomputing.com>,
	<cl@linux.com>, <christian.loehle@arm.com>,
	<vineethr@linux.ibm.com>
Subject: Re: [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled
Date: Fri, 29 Nov 2024 09:58:17 +0530	[thread overview]
Message-ID: <d16cc372-b4ae-473f-bf86-83469fbead99@amd.com> (raw)
In-Reply-To: <3deb3671-64df-4dd9-a539-3d41009f9875@os.amperecomputing.com>

Hello Adam,

On 11/29/2024 8:51 AM, Adam Li wrote:
> On 11/28/2024 3:29 PM, K Prateek Nayak wrote:
>> Hello Adam,
>>
> Hi Prateek,
> Thanks for comments.
> 
>> On 11/27/2024 11:26 AM, Adam Li wrote:
>>> Enabling NEXT_BUDDY triggers warning, and rcu stall:
>>>
>>> [  124.977300] cfs_rq->next->sched_delayed
>>
>> I could reproduce this with a run of "perf bench sched messaging" but
>> given that we hit this warning, it also means that either
>> set_next_buddy() has incorrectly set a delayed entity as next buddy, or
>> clear_next_buddy() did not clear a delayed entity.
>>
> Yes. The logic of this patch is a delayed entity should not be set as next buddy.
> 
>> I also see PSI splats like:
>>
>>      psi: inconsistent task state! task=2524:kworker/u1028:2 cpu=154 psi_flags=10 clear=14 set=0
>>
>> but the PSI flags it has set "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL)" and
>> the flags it is trying to clear
>> "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL | TSK_RUNNING)" seem to be only
>> possible if you have picked a dequeued entity for running before its
>> wakeup, which is also perhaps why the "nr_running" computation goes awry
>> and pick_eevdf() returns NULL (which it should never since
>> pick_next_entity() is only called when rq->cfs.nr_running is > 0)
> IIUC, one path for pick_eevdf() to return NULL is:
> pick_eevdf():
> <snip>
> 	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
> 		curr = NULL; <--- curr is set to NULL

"on_rq" is only cleared when the entity is dequeued so "curr" is in fact
going to sleep (proper sleep) and we've reached at pick_eevdf(),
otherwise, if "curr" is not eligible, there is at least one more tasks
on the cfs_rq which implies best has be found and will be non-null.

> <snip>
> found:
> 	if (!best || (curr && entity_before(curr, best)))
> 		best = curr; <--- curr and best are both NULL

Say "curr" is going to sleep, and there is no "best", in which case
"curr" is already blocked and "cfs_rq->nr_running" should be 0 and it
should have not reached pick_eevdf() in the first place since
pick_next_entity() is only called by pick_task_fair() if
"cfs_rq->nr_running" is non-zero.

So as long as "cfs_rq->nr_running" is non-zero, pick_eevdf() should
return a valid runnable entity. Failure to do so perhaps points to
"entity_eligible()" check going sideways somewhere or a bug in
"nr_running" accounting.

Chenyu had proposed a similar fix long back in
https://lore.kernel.org/lkml/20240226082349.302363-1-yu.c.chen@intel.com/
but the consensus was it was covering up a larger problem which
then boiled down to avg_vruntime being computed incorrectly
https://lore.kernel.org/lkml/ZiAWTU5xb%2FJMn%2FHs@chenyu5-mobl2/

> 
> 	return best;  <--- return NULL
> 
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index fbdca89c677f..cd1188b7f3df 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -8748,6 +8748,8 @@ static void set_next_buddy(struct sched_entity *se)
>>>                return;
>>>            if (se_is_idle(se))
>>>                return;
>>> +        if (se->sched_delayed)
>>> +            return;
>>
>> I tried to put a SCHED_WARN_ON() here to track where this comes from and
>> seems like it is usually from attach_task() in the load balancing path
>> pulling a delayed task which is set as the next buddy in
>> check_preempt_wakeup_fair()
>>
>> Can you please try the following diff instead of the first two patches
>> and see if you still hit these warnings, stalls, and pick_eevdf()
>> returning NULL?
>>
> Tested. Run specjbb with NEXT_BUDDY enabled, warnings, stalls and panic disappear.

Thank you for testing. I'll let Peter come back on which approach he
prefers :)

> 
> Regards,
> -adam

-- 
Thanks and Regards,
Prateek


  reply	other threads:[~2024-11-29  4:28 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-27  5:56 [PATCH v2 0/3] sched/fair: Fix NEXT_BUDDY panic and warning Adam Li
2024-11-27  5:56 ` [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled Adam Li
2024-11-28  7:29   ` K Prateek Nayak
2024-11-29  3:21     ` Adam Li
2024-11-29  4:28       ` K Prateek Nayak [this message]
2024-11-29  7:40         ` Adam Li
2024-11-29  8:00           ` K Prateek Nayak
2024-11-29  9:55     ` Peter Zijlstra
2024-11-29 10:15       ` [PATCH] sched/fair: Untangle NEXT_BUDDY and pick_next_task() Peter Zijlstra
2024-11-29 10:18         ` Peter Zijlstra
2024-11-29 10:37           ` Adam Li
2024-11-29 11:45             ` Peter Zijlstra
2024-12-06  6:47         ` Adam Li
2024-12-09 11:00         ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2024-11-29 17:46       ` [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled K Prateek Nayak
2024-11-29 17:53         ` K Prateek Nayak
2024-12-03 16:05     ` Peter Zijlstra
2024-12-03 16:48       ` K Prateek Nayak
2024-12-09 11:00     ` [tip: sched/core] sched/fair: Fix NEXT_BUDDY tip-bot2 for K Prateek Nayak
2024-11-27  5:56 ` [PATCH v2 2/3] sched/fair: Fix panic if pick_eevdf() returns NULL Adam Li
2024-11-29  9:18   ` Peter Zijlstra
2024-11-27  5:56 ` [PATCH v2 3/3] sched/fair: Update comments regarding last and skip buddy Adam Li
2025-03-13  8:30   ` Madadi Vineeth Reddy
2025-03-14  2:53     ` Adam Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d16cc372-b4ae-473f-bf86-83469fbead99@amd.com \
    --to=kprateek.nayak@amd.com \
    --cc=adamli@os.amperecomputing.com \
    --cc=bsegall@google.com \
    --cc=christian.loehle@arm.com \
    --cc=cl@linux.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=patches@amperecomputing.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vineethr@linux.ibm.com \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.