public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* AW: Crash in fair scheduler
  2019-12-03 10:30 ` Peter Zijlstra
@ 2019-12-03 10:51   ` Schmid, Carsten
  2019-12-03 14:01     ` Peter Zijlstra
  0 siblings, 1 reply; 4+ messages in thread
From: Schmid, Carsten @ 2019-12-03 10:51 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo@redhat.com, linux-kernel@vger.kernel.org

> > we had a crash in the fair scheduler and analysis shows that this could
> happen again.
> > Happened on 4.14.86 (LTS series) but failing code path still exists in 5.4-rc2
> (and 4.14.147 too).
> 
> Please, do try if you can reproduce with Linus' latest git. I've no idea
> what is, or is not, in those stable trees.
> 
unfortunately a once issue so far ...


--- snip ---

> > include/linux/rbtree.h:91:#define rb_first_cached(root) (root)-
> >rb_leftmost
> 
> > struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
> > {
> > 	struct rb_node *left = rb_first_cached(&cfs_rq->tasks_timeline);
> >
> > 	if (!left)
> > 		return NULL; <<<<<<<<<< the case
> >
> > 	return rb_entry(left, struct sched_entity, run_node);
> > }
> 
> This the problem, for some reason the rbtree code got that rb_leftmost
> thing wrecked.
> 
Any known issue on rbtree code regarding this?

> > Is this a corner case nobody thought of or do we have cfs_rq data that is
> unexpected in it's content?
> 
> No, the rbtree is corrupt. Your tree has a single node (which matches
> with nr_running), but for some reason it thinks rb_leftmost is NULL.
> This is wrong, if the tree is non-empty, it must have a leftmost
> element.
Is there a chance to find the left-most element in the core dump?
Maybe i can dig deeper to find the root c ause then.
Does any of the structs/data in this context point to some memory
where i can continue to search?
Where should rb_leftmost point to if only one node is in the tree?
To the node itself?

> 
> Can you reproduce at will? If so, can you please try the latest kernel,
> and or share the reproducer?
Unfortunately this was a "once" issue so far; i haven't a reproducer yet.

Thanks,
Carsten

^ permalink raw reply	[flat|nested] 4+ messages in thread

* AW: Crash in fair scheduler
  2019-12-03 15:08       ` Dietmar Eggemann
@ 2019-12-03 15:57         ` Schmid, Carsten
  0 siblings, 0 replies; 4+ messages in thread
From: Schmid, Carsten @ 2019-12-03 15:57 UTC (permalink / raw)
  To: Dietmar Eggemann, Valentin Schneider, mingo@redhat.com,
	peterz@infradead.org, linux-kernel@vger.kernel.org

> On 03/12/2019 12:09, Valentin Schneider wrote:
> > On 03/12/2019 10:40, Dietmar Eggemann wrote:
> >> On 03/12/2019 11:30, Valentin Schneider wrote:
> >>> On 03/12/2019 09:11, Schmid, Carsten wrote:
> >>
> >> [...]
> 
> I can't reproduce it on Arm64 Juno running 4.14.86. I suppose that there
> is no extra reproducer testcase since the issue happened with
> prev->sched_class eq. &idle_sched_class [prev eq. swapper/X 0] in the
> simple path of pick_next_task_fair().
> 
> I'm running with CONFIG_SCHED_AUTOGROUP=y and
> CONFIG_FAIR_GROUP_SCHED=y
> some taskgroup related tests for hours now. So the sched_entity (se) can
> be a task, an autogroup or a taskgroup in the simple path. pref is
> either swapper/X or migration/X.

We have the same kernel config settings.
However, as i stated in the analysis, we had
prev->sched_class ne. &fair_sched_class
and, unfortunately, no reproducer.

Looks like we need to find out why rb_leftmost is 0x0/NULL.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* AW: Crash in fair scheduler
  2019-12-03 14:01     ` Peter Zijlstra
@ 2019-12-05 10:56       ` Schmid, Carsten
  0 siblings, 0 replies; 4+ messages in thread
From: Schmid, Carsten @ 2019-12-05 10:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo@redhat.com, linux-kernel@vger.kernel.org, walken@google.com,
	dave@stgolabs.net

> Von: Peter Zijlstra [mailto:peterz@infradead.org]

> 
> Exatly.
> 
> 
> I suppose one approach is to add code to both __enqueue_entity() and
> __dequeue_entity() that compares ->rb_leftmost to the result of
> rb_first(). That'd incur some overhead but it'd double check the logic.

As this is a ONCE without reproducer, i would prefer to use an approach
to exactly check for this case in the code path where it crashed.
Something like this (with pseudo-code):

simple:
....

do {
  se = pick_next_entity(..)
  if (unlikely(!se)) { /* here we check for the issue */
     write warning and some useful data to dmesg
     if (cur_rq->rb_leftmost == NULL) { /* our case */
       set cur_rq->rb_leftmost to itself as mentioned in the discussion
       se = pick_next_entity(..)       /* should now return a valid pointer */
     } else { /* another case happened, unknown */
        write warning to dmesg UNKNOWN
        panic() /* not known what to do here, would crash anyway. */
     }
  set_next_entity(se, ..)
  cfs_rq = group_cfs_rq(...)
} while (cfs_rq);

This will definitely not fix the rb_leftmost being NULL, but we can't tell
where this happened at all, so it's digging in the dark.
Maybe the data written to dmesg will help to diagnose further, if the issue
will happen again.
And, this will not affect performance much, as i have to take care of this too.

Thanks for all your suggestions.
Carsten

^ permalink raw reply	[flat|nested] 4+ messages in thread

* AW: Crash in fair scheduler
@ 2019-12-06 10:11 Schmid, Carsten
  0 siblings, 0 replies; 4+ messages in thread
From: Schmid, Carsten @ 2019-12-06 10:11 UTC (permalink / raw)
  To: Davidlohr Bueso, Peter Zijlstra
  Cc: mingo@redhat.com, linux-kernel@vger.kernel.org, walken@google.com

> Von: Davidlohr Bueso [mailto:dave@stgolabs.net]
> Gesendet: Donnerstag, 5. Dezember 2019 18:41
> 
> Yeah I had never seen this either, and would expect the world to fall
> appart if leftmost is buggy (much less a one time occurance), but the
> following certainly raises a red flag:
> 
>     &cfs_rq->tasks_timeline->rb_leftmost
>   tasks_timeline = {
>     rb_root = {
>       rb_node = 0xffff99a9502e0d10
>     },
>     rb_leftmost = 0x0
>   },
> 
Meanwhile i am diving a bit deeper into the kernel dump.
I can see that for this rb_root we have a node structure with 2 nodes:
crash> p -x *(struct rb_node *)0xffff99a9502e0d10
$7 = {
  __rb_parent_color = 0xffff99a9502e0d10, <- points to SELF
  rb_right = 0xffff99a9502e0d10, <- points to self
  rb_left = 0xffff99a9502e1990 <- and we have a node left
}

The rb_left node:
crash> p -x *(struct rb_node *)0xffff99a9502e1990
$6 = {
  __rb_parent_color = 0xffff99a9502e0d11, <- points to the rb_root node (bit 0 is color)
  rb_right = 0x0, <- no leaf
  rb_left = 0x0 <- no leaf
}

I'm currently trying to extract the information what se (scheduling entity)
covers these nodes.
Anyway, the cfs_rq->tasks_timeline.rb_leftmost should point to 0xffff99a9502e1990
as far as i understand the rb_tree, right?

> >
> >I suppose one approach is to add code to both __enqueue_entity() and
> >__dequeue_entity() that compares ->rb_leftmost to the result of
> >rb_first(). That'd incur some overhead but it'd double check the logic.
> 
> We could benefit from improved debugging in rbtrees, not only the cached
> flavor. Perhaps we can start with the following -- this would at least
> let us know if the case where the tree is non-empty and leftmost is nil
> was hit, whether in the scheduler or another user...
> 
> Thanks,
> Davidlohr
> 
That's what i will do too, add some debugging stuff.
Add that to the project i'm on here, not upstream; and try
to log as much debug data as possible if a similar case occurs again.
But as rb_tree is excessively used i need to be careful where
to add debug code due to performance impact.

The approach you do with a configurable rb_tree debug
might help me here, yes; i would have taken a similar approach.

Thanks,
Carsten


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-12-06 10:11 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-12-06 10:11 AW: Crash in fair scheduler Schmid, Carsten
  -- strict thread matches above, loose matches on Subject: below --
2019-12-03  9:11 Schmid, Carsten
2019-12-03 10:30 ` Valentin Schneider
2019-12-03 10:40   ` Dietmar Eggemann
2019-12-03 11:09     ` Valentin Schneider
2019-12-03 15:08       ` Dietmar Eggemann
2019-12-03 15:57         ` AW: " Schmid, Carsten
2019-12-03 10:30 ` Peter Zijlstra
2019-12-03 10:51   ` AW: " Schmid, Carsten
2019-12-03 14:01     ` Peter Zijlstra
2019-12-05 10:56       ` AW: " Schmid, Carsten

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox