linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Michael Ellerman <mpe@ellerman.id.au>
To: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>,
	"Gautham R. Shenoy" <ego@linux.vnet.ibm.com>,
	Nathan Lynch <nathanl@linux.ibm.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	Tyrel Datwyler <tyreld@linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	linuxppc-dev@lists.ozlabs.org,
	Kamalesh Babulal <kamaleshb@in.ibm.com>,
	linux-kernel@vger.kernel.org,
	Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Subject: Re: [PATCH 0/2] pseries/hotplug: Change the default behaviour of cede_offline
Date: Wed, 18 Sep 2019 21:31:35 +1000	[thread overview]
Message-ID: <877e65x2lk.fsf@mpe.ellerman.id.au> (raw)
In-Reply-To: <1568788924.kxcnnog4r7.naveen@linux.ibm.com>

"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:
> Michael Ellerman wrote:
>> "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> writes:
>>> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
>>>
>>> Currently on Pseries Linux Guests, the offlined CPU can be put to one
>>> of the following two states:
>>>    - Long term processor cede (also called extended cede)
>>>    - Returned to the Hypervisor via RTAS "stop-self" call.
>>>
>>> This is controlled by the kernel boot parameter "cede_offline=on/off".
>>>
>>> By default the offlined CPUs enter extended cede.
>> 
>> Since commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state") (Nov 2009)
>> 
>> Which you wrote :)
>> 
>> Why was that wrong?
>> 
>>> The PHYP hypervisor considers CPUs in extended cede to be "active"
>>> since the CPUs are still under the control fo the Linux Guests. Hence, when we change the
>>> SMT modes by offlining the secondary CPUs, the PURR and the RWMR SPRs
>>> will continue to count the values for offlined CPUs in extended cede
>>> as if they are online.
>>>
>>> One of the expectations with PURR is that the for an interval of time,
>>> the sum of the PURR increments across the online CPUs of a core should
>>> equal the number of timebase ticks for that interval.
>>>
>>> This is currently not the case.
>> 
>> But why does that matter? It's just some accounting stuff, does it
>> actually break something meaningful?
>
> Yes, this broke lparstat at the very least (though its quite unfortunate 
> we took so long to notice).

By "so long" you mean 10 years?

Also I've never heard of lparstat, but I assume it's one of these tools
that's meant to behave like the AIX equivalent?

If it's been "broken" for 10 years and no one noticed, I'd argue the
current behaviour is now "correct" and fixing it would actually be a
breakage :)

> With SMT disabled, and under load:
>   $ sudo lparstat 1 10
>
>   System Configuration
>   type=Shared mode=Uncapped smt=Off lcpu=2 mem=7759616 kB cpus=6 ent=1.00 
>
>   %user  %sys %wait    %idle    physc %entc lbusy  vcsw phint
>   ----- ----- -----    -----    ----- ----- ----- ----- -----
>   100.00  0.00  0.00     0.00     1.10 110.00 100.00 128784460     0
>   100.00  0.00  0.00     0.00     1.07 107.00 100.00 128784860     0
>   100.00  0.00  0.00     0.00     1.07 107.00 100.00 128785260     0
>   100.00  0.00  0.00     0.00     1.07 107.00 100.00 128785662     0
>   100.00  0.00  0.00     0.00     1.07 107.00 100.00 128786062     0
>   100.00  0.00  0.00     0.00     1.07 107.00 100.00 128786462     0
>   100.00  0.00  0.00     0.00     1.07 107.00 100.00 128786862     0
>   100.00  0.00  0.00     0.00     1.07 107.00 100.00 128787262     0
>   100.00  0.00  0.00     0.00     1.07 107.00 100.00 128787664     0
>   100.00  0.00  0.00     0.00     1.07 107.00 100.00 128788064     0

What about that is wrong?

> With cede_offline=off:
>   $ sudo lparstat 1 10
>
>   System Configuration
>   type=Shared mode=Uncapped smt=Off lcpu=2 mem=7759616 kB cpus=6 ent=1.00 
>
>   %user  %sys %wait    %idle    physc %entc lbusy  vcsw phint
>   ----- ----- -----    -----    ----- ----- ----- ----- -----
>   100.00  0.00  0.00     0.00     1.94 194.00 100.00 128961588     0
>   100.00  0.00  0.00     0.00     1.91 191.00 100.00 128961988     0
>   100.00  0.00  0.00     0.00      inf   inf 100.00 128962392     0
>   100.00  0.00  0.00     0.00     1.91 191.00 100.00 128962792     0
>   100.00  0.00  0.00     0.00     1.91 191.00 100.00 128963192     0
>   100.00  0.00  0.00     0.00     1.91 191.00 100.00 128963592     0
>   100.00  0.00  0.00     0.00     1.91 191.00 100.00 128963992     0
>   100.00  0.00  0.00     0.00     1.91 191.00 100.00 128964392     0
>   100.00  0.00  0.00     0.00     1.91 191.00 100.00 128964792     0
>   100.00  0.00  0.00     0.00     1.91 191.00 100.00 128965194     0
>
> [The 'inf' values there show a different bug]
>
> Also, since we expose [S]PURR through sysfs, any tools that make use of 
> that directly are also affected due to this.

But again if we've had the current behaviour for 10 years then arguably
that's now the correct behaviour.

cheers

  reply	other threads:[~2019-09-18 11:33 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-12 10:35 [PATCH 0/2] pseries/hotplug: Change the default behaviour of cede_offline Gautham R. Shenoy
2019-09-12 10:35 ` [PATCH 1/2] pseries/hotplug-cpu: Change default behaviour of cede_offline to "off" Gautham R. Shenoy
2019-09-12 10:35 ` [PATCH 2/2] pseries/hotplug-cpu: Add sysfs attribute for cede_offline Gautham R. Shenoy
2019-09-12 15:39 ` [PATCH 0/2] pseries/hotplug: Change the default behaviour of cede_offline Nathan Lynch
2019-09-15  7:42   ` Gautham R Shenoy
2019-09-17 17:36     ` Nathan Lynch
2019-09-18  5:17       ` Michael Ellerman
2019-09-18 12:30       ` Gautham R Shenoy
2019-09-18 17:08         ` Nathan Lynch
2019-09-18  5:14 ` Michael Ellerman
2019-09-18  6:52   ` Naveen N. Rao
2019-09-18 11:31     ` Michael Ellerman [this message]
2019-09-18 13:38       ` Aneesh Kumar K.V
2019-09-18 16:24       ` Naveen N. Rao
2019-09-18 12:51   ` Gautham R Shenoy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877e65x2lk.fsf@mpe.ellerman.id.au \
    --to=mpe@ellerman.id.au \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=ego@linux.vnet.ibm.com \
    --cc=kamaleshb@in.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=nathanl@linux.ibm.com \
    --cc=naveen.n.rao@linux.vnet.ibm.com \
    --cc=npiggin@gmail.com \
    --cc=svaidy@linux.vnet.ibm.com \
    --cc=tyreld@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).