From: Prarit Bhargava <prarit@redhat.com>
To: Jacob Tanenbaum <jtanenba@redhat.com>, linux-pm@vger.kernel.org
Cc: trenn@suse.com
Subject: Re: cpupower reports uninitialized values for offline cpus
Date: Tue, 29 Sep 2015 16:25:47 -0400
Message-ID: <560AF3CB.1090401@redhat.com>
In-Reply-To: <560ADF55.6090901@redhat.com>
On 09/29/2015 02:58 PM, Jacob Tanenbaum wrote:
> Hi guys,
>
> I have found a bug in the cpupower tool. In the most recent pull of Linus's tree,
> cpupower reports bogus information for offlined cpus.
>
> [root@hp-dl980g7-02 linux]# uname -a
> Linux hp-dl980g7-02.rhts.eng.bos.redhat.com 4.3.0-rc3+ #1 SMP Tue Sep 29
> 12:03:15 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
> [root@hp-dl980g7-02 linux]# echo 0 > /sys/devices/system/cpu/cpu150/online
> [root@hp-dl980g7-02 linux]# echo 0 > /sys/devices/system/cpu/cpu1/online
> [root@hp-dl980g7-02 linux]# echo 0 > /sys/devices/system/cpu/cpu140/online
> [root@hp-dl980g7-02 linux]# cpupower monitor
> |Nehalem || Mperf || Idle_Stats
> PKG |CORE|CPU | C3 | C6 | PC3 | PC6 || C0 | Cx | Freq || POLL | C1-N |
> C1E- | C3-N | C6-N
> 0| 0| 0| 0.00| 99.24| 0.00| 0.00|| 0.38| 99.62| 1596|| 0.00| 0.00|
> 0.00| 0.00| 99.92
> 0| 0| 80| 0.00| 99.25| 0.00| 0.00|| 0.28| 99.72| 1681|| 0.00| 0.00|
> 0.00| 0.00| 99.97
> 0| 1| 81| 0.00| 99.47| 0.00| 0.00|| 0.27| 99.73| 1711|| 0.00| 0.00|
> 0.00| 0.00| 99.94
> ...
> 7| 7| 157| 0.00| 99.47| 0.00| 0.00|| 0.25| 99.75| 1735|| 0.00| 0.00|
> 0.00| 0.00| 99.98
> 7| 8| 78| 0.00| 99.48| 0.00| 0.00|| 0.26| 99.74| 1741|| 0.00| 0.00|
> 0.00| 0.00| 99.98
> 7| 8| 158| 0.00| 99.47| 0.00| 0.00|| 0.25| 99.75| 1726|| 0.00| 0.00|
> 0.00| 0.00| 99.98
> 7| 9| 79| 0.00| 99.57| 0.00| 0.00|| 0.23| 99.77| 1745|| 0.00| 0.00|
> 0.00| 0.00| 99.96
> 5472| 0| 1|******|******|******|******||******|******|******|| 0.00| 0.00|
> 0.00| 0.00| 0.00 *is offline
> 10567| 0| 159|******|******|******|******||******|******|******|| 0.00|
> 0.00| 0.00| 0.00| 0.00 *is offline
> 1661206560|859272560| 150|******|******|******|******||******|******|******||
> 0.00| 0.00| 0.00| 0.00| 0.00 *is offline
> 1661206560|943093104| 140|******|******|******|******||******|******|******||
> 0.00| 0.00| 0.00| 0.00| 0.00 *is offline
>
>
> Also the number of sockets in cpu->pkgs from get_cpu_topology will be wrong.
>
> This is because when a cpu is downed, the topology directory under
> /sys/devices/system/cpu/cpu[num]/ is removed, and get_cpu_topology relies on
> the files there to get the correct information. Patch 404c2db6... has the loop
> skip acquiring the data for the offline cpus, but that causes the values to
> remain uninitialized. Because of the way cpu_top->pkgs is calculated, each
> uninitialized and unique cpu_top->core_info[cpu].core will erroneously
> increase the cpu_top->pkgs value.
>
> I want to fix this bug, but I am not sure how to proceed. I see three options...
>
> 1. Change the code to not report any information on offline cpus
Thomas,
I'm not sure I see any benefit in reporting the offline cpu status here. We
could, I suppose, print a message at the beginning of the output to indicate that
some cpus have been hotplugged, but I'm not sure even that is necessary.
I would prefer changing cpupower to report only online CPUs, but there's
probably some situation I'm unaware of in which we need to report offline cpu status.
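
To make the "online CPUs only" idea concrete, here is a minimal sketch in Python
(for brevity; cpupower itself is C, and this is not its code). It assumes the
set of online CPUs is taken from /sys/devices/system/cpu/online, which holds a
cpulist string such as "0,2-139". The helper names parse_cpu_list and
read_online_cpus are hypothetical.

```python
def parse_cpu_list(text):
    """Expand a kernel cpulist string such as "0,2-4" into a sorted list."""
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # an empty string means no CPUs in this set
        first, _, last = chunk.partition("-")
        # A single number like "7" has no "-", so last is empty: use first.
        cpus.update(range(int(first), int(last or first) + 1))
    return sorted(cpus)


def read_online_cpus(path="/sys/devices/system/cpu/online"):
    """Return the CPU numbers the kernel currently reports as online."""
    with open(path) as f:
        return parse_cpu_list(f.read())
```

A monitor loop that iterates over read_online_cpus() instead of 0..N-1 would
never touch the missing topology files for offline CPUs.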
> 2. When printing the results, check whether the cpu is offline and, if it is,
> obscure the incorrect information and change the cpu->pkg value to reflect
> all sockets with at least one online cpu.
This is papering over the issue IMO, but still an option.
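
For reference, option 2 could look roughly like the sketch below (Python for
brevity, not cpupower's C code). It assumes every topology field is initialized
to a sentinel before any sysfs read, so an offline CPU can never contribute a
garbage package id. CpuInfo, UNKNOWN, count_online_packages and format_ids are
hypothetical names.

```python
from dataclasses import dataclass

UNKNOWN = -1  # sentinel for topology fields that could not be read


@dataclass
class CpuInfo:
    cpu: int
    is_online: bool
    pkg: int = UNKNOWN   # stays UNKNOWN when the topology files are missing
    core: int = UNKNOWN


def count_online_packages(infos):
    """Count sockets that have at least one online CPU with a known package id."""
    return len({i.pkg for i in infos if i.is_online and i.pkg != UNKNOWN})


def format_ids(info):
    """Format the PKG|CORE|CPU columns, masking ids for offline CPUs."""
    if not info.is_online:
        return f"{'*':>4}|{'*':>4}|{info.cpu:>4}"
    return f"{info.pkg:>4}|{info.core:>4}|{info.cpu:>4}"
```

With this, an offline cpu 1 would print "*" in the PKG and CORE columns instead
of an uninitialized number, and the package count ignores it.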
> 3. Change the sysfs to retain topology data for offlined cpus. This may require
> adding an additional state to reflect the difference
> between a processor that has been brought down via software and a processor
> that has been removed from the machine.
>
Yeah, there's a problem here in that the topology files come and go when a CPU
is onlined or offlined. The kernel's CPU hotplug mechanism makes no distinction
between a CPU being physically hot-added and one being soft-onlined, which makes
the "lifetime" of the files in the topology directory difficult to determine.
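
Since userspace cannot rely on those files being present, any reader has to
treat a missing topology directory as "unknown" instead of leaving values
unset. A minimal sketch in Python (not cpupower's C code), assuming the usual
physical_package_id and core_id files under cpuN/topology/; read_topology and
its sysfs_root parameter are hypothetical:

```python
from pathlib import Path


def read_topology(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Return (package_id, core_id) for a CPU, or None if it cannot be read.

    None covers the case discussed above: the topology directory is gone
    because the CPU is currently offline.
    """
    topo = Path(sysfs_root) / f"cpu{cpu}" / "topology"
    try:
        pkg = int((topo / "physical_package_id").read_text())
        core = int((topo / "core_id").read_text())
    except (OSError, ValueError):
        return None  # missing or unparsable files: report "unknown"
    return pkg, core
```

The caller can then decide whether to skip the CPU or print it with masked
columns, but it never sees an uninitialized value.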
> I would like to submit a patch to fix this bug and not have one presented to me
> to test.
Yep -- I think that's a good idea :) The more the merrier and all that ...
P.