From: Chen Yu <yu.c.chen@intel.com>
To: Valentin Schneider <vschneid@redhat.com>
Cc: <paulmck@kernel.org>, Peter Zijlstra <peterz@infradead.org>,
<linux-kernel@vger.kernel.org>, <sfr@canb.auug.org.au>,
<linux-next@vger.kernel.org>, <kernel-team@meta.com>
Subject: Re: [BUG almost bisected] Splat in dequeue_rt_stack() and build error
Date: Wed, 28 Aug 2024 21:44:08 +0800
Message-ID: <Zs8pqJjIYOFuPDiH@chenyu5-mobl2>
In-Reply-To: <xhsmha5gwome6.mognet@vschneid-thinkpadt14sgen2i.remote.csb>

Hi,

On 2024-08-28 at 14:35:45 +0200, Valentin Schneider wrote:
> On 27/08/24 13:36, Paul E. McKenney wrote:
> > On Tue, Aug 27, 2024 at 10:30:24PM +0200, Valentin Schneider wrote:
> >> On 27/08/24 11:35, Paul E. McKenney wrote:
> >> > On Tue, Aug 27, 2024 at 10:33:13AM -0700, Paul E. McKenney wrote:
> >> >> On Tue, Aug 27, 2024 at 05:41:52PM +0200, Valentin Schneider wrote:
> > >> >> > I've taken tip/sched/core and shuffled hunks around; I didn't re-order any
> > >> >> > commits. I've also taken out the dequeue from switched_from_fair() and put
> > >> >> > it at the very top of the branch, which should hopefully help bisection.
> > >> >> >
> > >> >> > The final delta between that branch and tip/sched/core is empty, so it
> > >> >> > really is just shuffling in between commits.
> >> >> >
> >> >> > Please find the branch at:
> >> >> >
> >> >> > https://gitlab.com/vschneid/linux.git -b mainline/sched/eevdf-complete-builderr
> >> >> >
> >> >> > I'll go stare at the BUG itself now.
> >> >>
> >> >> Thank you!
> >> >>
> >> >> I have fired up tests on the "BROKEN?" commit. If that fails, I will
> > >> >> try its predecessor, and if that fails, I will bisect from e28b5f8bda01
> >> >> ("sched/fair: Assert {set_next,put_prev}_entity() are properly balanced"),
> >> >> which has stood up to heavy hammering in earlier testing.
> >> >
> >> > And of 50 runs of TREE03 on the "BROKEN?" commit resulted in 32 failures.
> >> > Of these, 29 were the dequeue_rt_stack() failure. Two more were RCU
> >> > CPU stall warnings, and the last one was an oddball "kernel BUG at
> >> > kernel/sched/rt.c:1714" followed by an equally oddball "Oops: invalid
> >> > opcode: 0000 [#1] PREEMPT SMP PTI".
> >> >
> >> > Just to be specific, this is commit:
> >> >
> >> > df8fe34bfa36 ("BROKEN? sched/fair: Dequeue sched_delayed tasks when switching from fair")
> >> >
> >> > This commit's predecessor is this commit:
> >> >
> >> > 2f888533d073 ("sched/eevdf: Propagate min_slice up the cgroup hierarchy")
> >> >
> >> > This predecessor commit passes 50 runs of TREE03 with no failures.
> >> >
> >> > So that addition of that dequeue_task() call to the switched_from_fair()
> >> > function is looking quite suspicious to me. ;-)
> >> >
> >> > Thanx, Paul
> >>
> >> Thanks for the testing!
> >>
> >> The WARN_ON_ONCE(!rt_se->on_list); hit in __dequeue_rt_entity() feels like
> >> a put_prev/set_next kind of issue...
> >>
> >> So far I'd assumed a ->sched_delayed task can't be current during
> >> switched_from_fair(), I got confused because it's Mond^CCC Tuesday, but I
> >> think that still holds: we can't get a balance_dl() or balance_rt() to drop
> >> the RQ lock because prev would be fair, and we can't get a
> >> newidle_balance() with a ->sched_delayed task because we'd have
> >> sched_fair_runnable() := true.
> >>
> >> I'll pick this back up tomorrow, this is a task that requires either
> >> caffeine or booze and it's too late for either.
> >
> > Thank you for chasing this, and get some sleep! This one is of course
> > annoying, but it is not (yet) an emergency. I look forward to seeing
> > what you come up with.
> >
> > Also, I would of course be happy to apply debug patches.
> >
> > Thanx, Paul
>
> Chen Yu made me realize [1] that dequeue_task() really isn't enough; the
> dequeue_task() in e.g. __sched_setscheduler() won't have DEQUEUE_DELAYED,
> so stuff will just be left on the CFS tree.
>
One question: although there is no DEQUEUE_DELAYED flag, isn't it still
possible for the delayed task to be dequeued from the CFS tree? The dequeue
in __sched_setscheduler() does not pass DEQUEUE_SLEEP, and in dequeue_entity():

	bool sleep = flags & DEQUEUE_SLEEP;

	if (flags & DEQUEUE_DELAYED) {
		/* not taken: no DEQUEUE_DELAYED here */
	} else {
		bool delay = sleep;	/* false: no DEQUEUE_SLEEP */

		if (sched_feat(DELAY_DEQUEUE) && delay &&
		    !entity_eligible(cfs_rq, se)) {
			/* do not dequeue (not reached, delay is false) */
		}
	}
	/* dequeue the task  <---- shouldn't we reach here? */
thanks,
Chenyu
> Worse, what we need here is the __block_task() like we have at the end of
> dequeue_entities(), otherwise p stays ->on_rq and that's borked - AFAICT
> that explains the splat you're getting, because affine_move_task() ends up
> doing a move_queued_task() for what really is a dequeued task.
>
> I unfortunately couldn't reproduce the issue locally using your TREE03
> invocation. I've pushed a new patch on top of my branch, would you mind
> giving it a spin? It's a bit sketchy but should at least be going in the
> right direction...
>
> [1]: http://lore.kernel.org/r/Zs2d2aaC/zSyR94v@chenyu5-mobl2
>