From: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
To: Balbir Singh <bsingharora@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org, Michael Ellerman <mpe@ellerman.id.au>
Subject: Re: [RFC] powerpc/pseries: Increase busy loop in pseries_cpu_die
Date: Tue, 07 Feb 2017 13:32:22 -0200
Message-ID: <11711559.cvjM6Evb1C@morokweng>
In-Reply-To: <20170207025645.GB22303@localhost.localdomain>

On Tuesday, 7 February 2017, 08:26:45 BRST, Balbir Singh wrote:
> On Mon, Feb 06, 2017 at 04:58:16PM -0200, Thiago Jung Bauermann wrote:
> > [  447.714064] Querying DEAD? cpu 134 (134) shows 2
> > cpu 0x86: Vector: 300 (Data Access) at [c000000007b0fd40]
> >     pc: 000000001ec3072c
> >     lr: 000000001ec2fee0
> >     sp: 1faf6bd0
> >    msr: 8000000102801000
> >    dar: 212d6c1a2a20c
> 
> This looks like we accessed a bad address, but why?

On Tuesday, 7 February 2017, 13:10:22 BRST, Michael Ellerman wrote:
> We shouldn't be crashing.
> 
> So we need to fix that.
> 
> We may also need to increase the timeout, though it's pretty gross TBH.
> 
> But step one is make sure we don't crash.

I didn't analyze exactly what makes the CPU crash, because I took the root
cause to be the inconsistency between what the kernel thinks the CPU state is
and reality. But if we need to be able to handle that inconsistency, I will
keep digging and try to fix the crash.

> > --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > @@ -206,7 +206,7 @@ static void pseries_cpu_die(unsigned int cpu)
> >  		}
> >  	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
> >  
> > -		for (tries = 0; tries < 25; tries++) {
> > +		for (tries = 0; tries < 5000; tries++) {
> 
> This fixes some of the asymmetry between handling of CPU_STATE_INACTIVE
> and CPU_STATE_OFFLINE, but I think we can probably move the cpu_relax()
> to msleep(1).

I didn't change it to msleep() because I thought it would introduce a
regression. Commit b906cfa397fd ("powerpc/pseries: Fix cpu hotplug") replaced
the msleep(200) that was there with a cpu_relax(), giving this explanation:

    Currently, pseries_cpu_die() calls msleep() while polling RTAS for 
    the status of the dying cpu.

    However, if the cpu that is going down also happens to be the one 
    doing the tick then we're hosed as the tick_do_timer_cpu 'baton' is
    only passed later on in tick_shutdown() when _cpu_down() does the 
    CPU_DEAD notification.  Therefore jiffies won't be updated anymore.

    This replaces that msleep() with a cpu_relax() to make sure we're not 
    going to schedule at that point.

    With this patch my test box survives a 100k iterations hotplug stress
    test on _all_ cpus, whereas without it, it quickly dies after ~50 
    iterations.

I can try to add it back and see what happens. Perhaps that situation won't 
happen anymore with today's kernel.
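
To make the trade-off concrete, here is a minimal userspace sketch of a
try-bounded poll loop that never sleeps. It is not the kernel code: struct
sim_cpu, sim_query_cpu_stopped(), sim_cpu_relax(), poll_until_stopped() and
the SIM_* status values are all hypothetical stand-ins, and the "CPU" simply
reports stopped after a fixed number of polls. The inline asm barrier assumes
GCC or Clang.

```c
/* Hypothetical status codes, standing in for what the status query reports. */
enum { SIM_STOPPED = 0, SIM_NOT_STOPPED = 2 };

/*
 * Hypothetical simulated CPU: it reports "stopped" only once it has
 * already been polled ready_after times.
 */
struct sim_cpu {
	int polls;
	int ready_after;
};

/* Stand-in for the status query; counts how often it was called. */
static int sim_query_cpu_stopped(struct sim_cpu *c)
{
	return (c->polls++ >= c->ready_after) ? SIM_STOPPED : SIM_NOT_STOPPED;
}

/* Userspace stand-in for cpu_relax(): a compiler barrier, never sleeps. */
static inline void sim_cpu_relax(void)
{
	__asm__ __volatile__("" ::: "memory");
}

/* Poll at most max_tries times without sleeping; return the last status. */
static int poll_until_stopped(struct sim_cpu *c, int max_tries)
{
	int status = SIM_NOT_STOPPED;
	int tries;

	for (tries = 0; tries < max_tries; tries++) {
		status = sim_query_cpu_stopped(c);
		if (status == SIM_STOPPED)
			break;
		sim_cpu_relax();
	}
	return status;
}
```

With a simulated CPU that needs 100 polls before it reports stopped, a limit
of 25 tries gives up with SIM_NOT_STOPPED, while a limit of 5000 returns
SIM_STOPPED after 101 queries. Since the loop never schedules, its wall-clock
length depends on how long each query takes, not just on the try count.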

> Please also see
> 940ce42 powerpc/pseries: Increase cpu die timeout

Yes, that is the commit that I mentioned in the patch description.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center
