From: K Prateek Nayak <kprateek.nayak@amd.com>
To: Adam Li <adamli@os.amperecomputing.com>, <peterz@infradead.org>,
<mingo@redhat.com>, <juri.lelli@redhat.com>,
<vincent.guittot@linaro.org>
Cc: <dietmar.eggemann@arm.com>, <rostedt@goodmis.org>,
<bsegall@google.com>, <mgorman@suse.de>, <vschneid@redhat.com>,
<linux-kernel@vger.kernel.org>, <patches@amperecomputing.com>,
<cl@linux.com>, <christian.loehle@arm.com>,
<vineethr@linux.ibm.com>
Subject: Re: [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled
Date: Fri, 29 Nov 2024 09:58:17 +0530 [thread overview]
Message-ID: <d16cc372-b4ae-473f-bf86-83469fbead99@amd.com> (raw)
In-Reply-To: <3deb3671-64df-4dd9-a539-3d41009f9875@os.amperecomputing.com>
Hello Adam,
On 11/29/2024 8:51 AM, Adam Li wrote:
> On 11/28/2024 3:29 PM, K Prateek Nayak wrote:
>> Hello Adam,
>>
> Hi Prateek,
> Thanks for comments.
>
>> On 11/27/2024 11:26 AM, Adam Li wrote:
>>> Enabling NEXT_BUDDY triggers warning, and rcu stall:
>>>
>>> [ 124.977300] cfs_rq->next->sched_delayed
>>
>> I could reproduce this with a run of "perf bench sched messaging" but
>> given that we hit this warning, it also means that either
>> set_next_buddy() has incorrectly set a delayed entity as next buddy, or
>> clear_next_buddy() did not clear a delayed entity.
>>
> Yes. The logic of this patch is a delayed entity should not be set as next buddy.
>
>> I also see PSI splats like:
>>
>> psi: inconsistent task state! task=2524:kworker/u1028:2 cpu=154 psi_flags=10 clear=14 set=0
>>
>> but the PSI flags it has set "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL)" and
>> the flags it is trying to clear
>> "(TSK_MEMSTALL_RUNNING | TSK_MEMSTALL | TSK_RUNNING)" seem to be only
>> possible if you have picked a dequeued entity for running before its
>> wakeup, which is also perhaps why the "nr_running" computation goes awry
>> and pick_eevdf() returns NULL (which it should never since
>> pick_next_entity() is only called when rq->cfs.nr_running is > 0)
> IIUC, one path for pick_eevdf() to return NULL is:
> pick_eevdf():
> <snip>
> if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
> curr = NULL; <--- curr is set to NULL
"on_rq" is only cleared when the entity is dequeued so "curr" is in fact
going to sleep (proper sleep) and we've reached at pick_eevdf(),
otherwise, if "curr" is not eligible, there is at least one more tasks
on the cfs_rq which implies best has be found and will be non-null.
> <snip>
> found:
> if (!best || (curr && entity_before(curr, best)))
> best = curr; <--- curr and best are both NULL
Say "curr" is going to sleep, and there is no "best", in which case
"curr" is already blocked and "cfs_rq->nr_running" should be 0 and it
should have not reached pick_eevdf() in the first place since
pick_next_entity() is only called by pick_task_fair() if
"cfs_rq->nr_running" is non-zero.
So as long as "cfs_rq->nr_running" is non-zero, pick_eevdf() should
return a valid runnable entity. Failure to do so perhaps points to
"entity_eligible()" check going sideways somewhere or a bug in
"nr_running" accounting.
Chenyu had proposed a similar fix long back in
https://lore.kernel.org/lkml/20240226082349.302363-1-yu.c.chen@intel.com/
but the consensus was it was covering up a larger problem which
then boiled down to avg_vruntime being computed incorrectly
https://lore.kernel.org/lkml/ZiAWTU5xb%2FJMn%2FHs@chenyu5-mobl2/
>
> return best; <--- return NULL
>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index fbdca89c677f..cd1188b7f3df 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -8748,6 +8748,8 @@ static void set_next_buddy(struct sched_entity *se)
>>> return;
>>> if (se_is_idle(se))
>>> return;
>>> + if (se->sched_delayed)
>>> + return;
>>
>> I tried to put a SCHED_WARN_ON() here to track where this comes from and
>> seems like it is usually from attach_task() in the load balancing path
>> pulling a delayed task which is set as the next buddy in
>> check_preempt_wakeup_fair()
>>
>> Can you please try the following diff instead of the first two patches
>> and see if you still hit these warnings, stalls, and pick_eevdf()
>> returning NULL?
>>
> Tested. Run specjbb with NEXT_BUDDY enabled, warnings, stalls and panic disappear.
Thank you for testing. I'll let Peter come back on which approach he
prefers :)
>
> Regards,
> -adam
--
Thanks and Regards,
Prateek
next prev parent reply other threads:[~2024-11-29 4:28 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-27 5:56 [PATCH v2 0/3] sched/fair: Fix NEXT_BUDDY panic and warning Adam Li
2024-11-27 5:56 ` [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled Adam Li
2024-11-28 7:29 ` K Prateek Nayak
2024-11-29 3:21 ` Adam Li
2024-11-29 4:28 ` K Prateek Nayak [this message]
2024-11-29 7:40 ` Adam Li
2024-11-29 8:00 ` K Prateek Nayak
2024-11-29 9:55 ` Peter Zijlstra
2024-11-29 10:15 ` [PATCH] sched/fair: Untangle NEXT_BUDDY and pick_next_task() Peter Zijlstra
2024-11-29 10:18 ` Peter Zijlstra
2024-11-29 10:37 ` Adam Li
2024-11-29 11:45 ` Peter Zijlstra
2024-12-06 6:47 ` Adam Li
2024-12-09 11:00 ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2024-11-29 17:46 ` [PATCH v2 1/3] sched/fair: Fix warning if NEXT_BUDDY enabled K Prateek Nayak
2024-11-29 17:53 ` K Prateek Nayak
2024-12-03 16:05 ` Peter Zijlstra
2024-12-03 16:48 ` K Prateek Nayak
2024-12-09 11:00 ` [tip: sched/core] sched/fair: Fix NEXT_BUDDY tip-bot2 for K Prateek Nayak
2024-11-27 5:56 ` [PATCH v2 2/3] sched/fair: Fix panic if pick_eevdf() returns NULL Adam Li
2024-11-29 9:18 ` Peter Zijlstra
2024-11-27 5:56 ` [PATCH v2 3/3] sched/fair: Update comments regarding last and skip buddy Adam Li
2025-03-13 8:30 ` Madadi Vineeth Reddy
2025-03-14 2:53 ` Adam Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d16cc372-b4ae-473f-bf86-83469fbead99@amd.com \
--to=kprateek.nayak@amd.com \
--cc=adamli@os.amperecomputing.com \
--cc=bsegall@google.com \
--cc=christian.loehle@arm.com \
--cc=cl@linux.com \
--cc=dietmar.eggemann@arm.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=patches@amperecomputing.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=vincent.guittot@linaro.org \
--cc=vineethr@linux.ibm.com \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox