All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Qais Yousef <qais.yousef@arm.com>
Cc: John Stultz <john.stultz@linaro.org>,
	Quentin Perret <qperret@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Patrick Bellasi <Patrick.Bellasi@arm.com>,
	Ingo Molnar <mingo@kernel.org>,
	lkml <linux-kernel@vger.kernel.org>
Subject: Re: Null pointer crash at find_idlest_group on db845c w/ linus/master
Date: Wed, 4 Dec 2019 11:09:25 +0100	[thread overview]
Message-ID: <20191204100925.GA15727@linaro.org> (raw)
In-Reply-To: <20191204094216.u7yld5r3zelp22lf@e107158-lin.cambridge.arm.com>

Le Wednesday 04 Dec 2019 à 09:42:17 (+0000), Qais Yousef a écrit :
> On 12/04/19 09:06, Vincent Guittot wrote:
> > Hi John,
> > 
> > On Tue, 3 Dec 2019 at 20:15, John Stultz <john.stultz@linaro.org> wrote:
> > >
> > > With today's linus/master on db845c running android, I'm able to
> > > fairly easily reproduce the following crash. I've not had a chance to
> > > bisect it yet, but I'm suspecting its connected to Vincent's recent
> > > rework.
> > 
> > Does the crash happen randomly or after a specific action ?
> > I have a db845 so I can try to reproduce it locally.
> 
> Isn't there a chance we use local_sgs without initializing it in that function?

Normally not because the cpu belongs to its sched_domain

Now, we test that a group has at least one allowed CPU for the task so we
could skip the local group with the correct/wrong p->cpus_ptr

The path is used for fork/exec ibut also for wakeup path for b.L when the task doesn't fit in the CPUs

So we can probably imagine a scenario where we change task affinity while
sleeping. If the wakeup happens on a CPU that belongs to the group that is not
allowed, we can imagine that we skip the local_group

John,

Could you try the fix below ?

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 08a233e..bcd216d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8417,6 +8417,10 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
        if (!idlest)
                return NULL;
 
+       /* The local group has been skipped because of cpu affinity */
+       if (!local)
+               return idlest;
+
        /*
         * If the local group is idler than the selected idlest group
         * don't try and push the task.


>
> AFAICS we define local_sgs on the stack but not always could be populated with
> the right values. I can't see tmp_sgs being used in the function too. Could
> this cause the/a problem?
> 
>  8377         struct sg_lb_stats local_sgs, tmp_sgs;
> .
> .
> .
>  8399                 if (local_group) {
>  8400                         sgs = &local_sgs;
>  8401                         local = group;
>  8402                 } else {
>  8403                         sgs = &tmp_sgs;
>  8404                 }
>  8405
>  8406                 update_sg_wakeup_stats(sd, group, sgs, p);
> 
> Cheers
> 
> --
> Qais Youef

  parent reply	other threads:[~2019-12-04 10:09 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-03 19:15 Null pointer crash at find_idlest_group on db845c w/ linus/master John Stultz
2019-12-03 23:20 ` Valentin Schneider
2019-12-03 23:49   ` Valentin Schneider
2019-12-04  0:13     ` Valentin Schneider
2019-12-04  3:46       ` John Stultz
2019-12-04 10:05         ` Valentin Schneider
2019-12-04  8:06 ` Vincent Guittot
2019-12-04  8:22   ` Vincent Guittot
2019-12-04  9:59     ` Valentin Schneider
2019-12-04  9:42   ` Qais Yousef
2019-12-04 10:09     ` Valentin Schneider
2019-12-04 10:09     ` Vincent Guittot [this message]
2019-12-04 10:41       ` Valentin Schneider
2019-12-04 12:08         ` Vincent Guittot
2019-12-04 13:32           ` Qais Yousef
2019-12-04 13:48             ` Vincent Guittot
2019-12-04 13:55               ` Qais Yousef
2019-12-04 14:06           ` Valentin Schneider
2019-12-04 18:16       ` John Stultz
2019-12-04 18:19         ` Vincent Guittot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191204100925.GA15727@linaro.org \
    --to=vincent.guittot@linaro.org \
    --cc=Patrick.Bellasi@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=john.stultz@linaro.org \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=qais.yousef@arm.com \
    --cc=qperret@google.com \
    --cc=valentin.schneider@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.