public inbox for linux-kernel@vger.kernel.org
From: srinivas pandruvada <srinivas.pandruvada@linux.intel.com>
To: "Bhatnagar, Rishabh" <risbhat@amazon.com>,
	linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org,
	lenb@kernel.org
Cc: surajjs@amazon.com
Subject: Re: Using nr_cpus breaks no_turbo setting of pstate driver
Date: Wed, 30 Nov 2022 08:40:35 -0800
Message-ID: <2b3c7457d34f05ab42c7577bbda10033147d5f4b.camel@linux.intel.com>
In-Reply-To: <9b23e9d4-7c7a-a74f-3a6b-939a4a469c2b@amazon.com>

Hi Rishabh,

I think this is caused by the HWP request of the SMT siblings, because the
following CPUID flag is set to false:
# cpuid -1 -l 6 | grep ignoring
ignoring idle logical processor HWP req = false


With the way the CPUs are numbered here, the SMT siblings start at CPU 24,
so nr_cpus=24 didn't bring up any SMT sibling for the driver to control.
The power-up default of MSR_HWP_REQ.max is 0xFF, and this also plays a
role in resolving the max frequency. Here the driver disables turbo by
controlling MSR_HWP_REQ.max.
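
As an illustration of the max field discussed above, here is a minimal
Python sketch that decodes the raw register values from the turbostat dump
quoted below. The one-byte-per-field layout matches the decoded values
turbostat prints next to each raw value; the function names are made up
for this example and are not part of the driver or of any tool.

```python
def decode_hwp_request(value):
    """Split the low 32 bits of a raw HWP request value into its fields."""
    return {
        "min": value & 0xFF,          # bits 7:0
        "max": (value >> 8) & 0xFF,   # bits 15:8, the field capped for no_turbo
        "des": (value >> 16) & 0xFF,  # bits 23:16
        "epp": (value >> 24) & 0xFF,  # bits 31:24
    }


def decode_hwp_capabilities(value):
    """Split a raw HWP capabilities value into its four performance levels."""
    return {
        "high": value & 0xFF,          # bits 7:0
        "guar": (value >> 8) & 0xFF,   # bits 15:8
        "eff": (value >> 16) & 0xFF,   # bits 23:16
        "low": (value >> 24) & 0xFF,   # bits 31:24
    }


if __name__ == "__main__":
    # Raw values copied from the cpu0 lines of the quoted turbostat output.
    print(decode_hwp_capabilities(0x070C262D))
    print(decode_hwp_request(0x00002626))   # MSR_HWP_REQUEST
    print(decode_hwp_request(0x8000FF00))   # MSR_HWP_REQUEST_PKG
```

For cpu0 the request decodes to max 38, which equals the guaranteed level
in the capabilities value, while MSR_HWP_REQUEST_PKG in the same dump
still decodes to max 255 (0xFF).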

If you want to disable turbo here with a reduced nr_cpus, you can set bit
38 of MSR 0x1a0 using msr-tools.
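
A rough Python sketch of that read-modify-write, in case it helps. It
assumes the msr kernel module is loaded so that /dev/cpu/<N>/msr exists,
and that it runs as root. The helper names and the example register value
are hypothetical, chosen only for illustration; only the MSR number and
the bit number come from the text above.

```python
import os
import struct

MSR_ADDR = 0x1A0        # the MSR named above
TURBO_DISABLE_BIT = 38  # the bit named above


def with_turbo_disabled(value):
    """Return the register value with bit 38 set and all other bits kept."""
    return value | (1 << TURBO_DISABLE_BIT)


def read_msr(cpu, reg):
    """Read a 64-bit MSR through the msr character device of one CPU."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def write_msr(cpu, reg, value):
    """Write a 64-bit MSR through the msr character device of one CPU."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), reg)
    finally:
        os.close(fd)


def disable_turbo(cpus):
    """Set bit 38 on each listed CPU. Defined only; not called below."""
    for cpu in cpus:
        write_msr(cpu, MSR_ADDR, with_turbo_disabled(read_msr(cpu, MSR_ADDR)))


if __name__ == "__main__":
    example = 0x850089  # hypothetical register contents, not from this system
    print(hex(with_turbo_disabled(example)))  # 0x4000850089
```

With msr-tools the equivalent would be reading the current value with
"rdmsr -p 0 0x1a0" and writing it back with bit 38 set using
"wrmsr -a 0x1a0 <value>".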

Thanks,
Srinivas


On Wed, 2022-11-30 at 08:10 -0800, Bhatnagar, Rishabh wrote:
> "Having some issue with my mails failing to deliver. So trying once 
> again after disabling HTML."
> 
> 
> Hi Srinivas/Len,
> 
> 
> I have been doing some testing on m5zn.metal AWS instances, which
> provide 48 vcpus and 192GB memory.
> 
> The test loads all cpus to maximum capacity (using stress) and
> observes the frequency after enabling and disabling turbo
> frequencies. intel_pstate is enabled with hwp.
> 
> 
> lscpu output for reference:
> 
> # lscpu
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              48
> On-line CPU(s) list: 0-47
> Thread(s) per core:  2
> Core(s) per socket:  12
> Socket(s):           2
> NUMA node(s):        2
> Vendor ID:           GenuineIntel
> CPU family:          6
> Model:               85
> Model name:          Intel(R) Xeon(R) Platinum 8252C CPU @ 3.80GHz
> Stepping:            7
> CPU MHz:             3800.000
> CPU max MHz:         4500.0000
> CPU min MHz:         1200.0000
> BogoMIPS:            7600.00
> Virtualization:      VT-x
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            1024K
> L3 cache:            25344K
> NUMA node0 CPU(s):   0-11,24-35
> NUMA node1 CPU(s):   12-23,36-47
> Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>                      cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
>                      syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon
>                      pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf
>                      pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3
>                      sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe
>                      popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm
>                      abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3
>                      invpcid_single intel_ppin ssbd mba ibrs ibpb stibp
>                      ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad
>                      fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx
>                      rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb
>                      intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
>                      xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local
>                      dtherm ida arat pln pts hwp hwp_act_window hwp_epp
>                      hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d
>                      arch_capabilities
> 
> When testing with all cpus online I don't see any issue. CPUs reach
> turbo frequency when intel_pstate/no_turbo is disabled and stay below
> turbo frequencies when no_turbo is enabled, as expected.
> 
> However, the issue occurs when I boot with an nr_cpus argument
> specifying fewer than 48 cpus, e.g. cmdline:
> 
> 
> # cat /proc/cmdline
> mmio_stale_data=off mds=off tsx=off tsx_async_abort=off
> intel_idle.max_cstate=0
> BOOT_IMAGE=/boot/vmlinuz-5.10.154-119.671.amzn2.x86_64
> root=UUID=d8605abb-d6cd-4a46-a657-b6bd206da2ab ro console=tty0
> console=ttyS0,115200n8 net.ifnames=0 biosdevname=0
> nvme_core.io_timeout=4294967295 rd.emergency=poweroff
> rd.shell=0 nr_cpus=24 intel_idle.max_cstate=1 processor.max_cstate=1
> 
> # echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
> # turbostat
> 
> 
> Package  Core  CPU  Avg_MHz  Busy%   Bzy_MHz  TSC_MHz  IRQ   SMI  POLL  C1  POLL%  C1%   CPU%c1  CPU%c6  CoreTmp  PkgTmp  Pkg%pc2  Pkg%pc6  PkgWatt  RAMWatt  PKG_%  RAM_%
> -        -     -    4370     99.76   4381     3800     6374  0    0     0   0.00   0.00  0.24    0.00    96       96      0.00     0.00     479.79   69.20    0.00   0.00
> 0        0     0    4292     99.93   4295     3800     266   0    0     0   0.00   0.00  0.07    0.00    88       96      0.00     0.00     239.89   33.62    0.00   0.00
> 0        2     1    4292     99.92   4295     3800     270   0    0     0   0.00   0.00  0.08    0.00    92
> 0        4     2    4292     99.92   4295     3800     264   0    0     0   0.00   0.00  0.08    0.00    92
> 0        8     3    4292     99.93   4295     3800     268   0    0     0   0.00   0.00  0.07    0.00    90
> 0        9     4    4292     99.93   4295     3800     264   0    0     0   0.00   0.00  0.07    0.00    91
> 0        10    5    4292     99.92   4295     3800     264   0    0     0   0.00   0.00  0.08    0.00    92
> 0        11    6    4292     99.92   4295     3800     265   0    0     0   0.00   0.00  0.08    0.00    96
> 0        17    7    4292     99.93   4295     3800     266   0    0     0   0.00   0.00  0.07    0.00    91
> 0        19    8    4292     99.93   4295     3800     266   0    0     0   0.00   0.00  0.07    0.00    91
> 0        25    9    4292     99.92   4295     3800     264   0    0     0   0.00   0.00  0.08    0.00    91
> 0        26    10   4292     99.92   4295     3800     264   0    0     0   0.00   0.00  0.08    0.00    93
> 0        27    11   4292     99.92   4295     3800     264   0    0     0   0.00   0.00  0.08    0.00    94
> 1        1     12   4449     99.61   4466     3800     264   0    0     0   0.00   0.00  0.39    0.00    80       88      0.00     0.00     239.90   35.58    0.00   0.00
> 1        2     13   4448     99.60   4466     3800     264   0    0     0   0.00   0.00  0.40    0.00    86
> 1        3     14   4449     99.61   4466     3800     267   0    0     0   0.00   0.00  0.39    0.00    81
> 1        8     15   4449     99.61   4466     3800     264   0    0     0   0.00   0.00  0.39    0.00    79
> 1        9     16   4448     99.60   4466     3800     264   0    0     0   0.00   0.00  0.40    0.00    79
> 1        10    17   4449     99.61   4466     3800     264   0    0     0   0.00   0.00  0.39    0.00    83
> 1        17    18   4448     99.60   4466     3800     264   0    0     0   0.00   0.00  0.40    0.00    82
> 1        18    19   4449     99.61   4466     3800     281   0    0     0   0.00   0.00  0.39    0.00    80
> 1        19    20   4449     99.61   4466     3800     264   0    0     0   0.00   0.00  0.39    0.00    83
> 1        20    21   4448     99.59   4466     3800     264   0    0     0   0.00   0.00  0.41    0.00    83
> 1        25    22   4448     99.60   4466     3800     264   0    0     0   0.00   0.00  0.40    0.00    81
> 1        27    23   4448     99.59   4466     3800     265   0    0     0   0.00   0.00  0.41    0.00    88
> 
> As part of the turbostat output I could see that the pstate driver is
> setting the hints correctly in MSR_HWP_REQUEST, but somehow these
> limits are not being honored by the HWP logic.
> 
> 
> cpu0: MSR_PM_ENABLE: 0x00000001 (HWP)
> cpu0: MSR_HWP_CAPABILITIES: 0x070c262d (high 45 guar 38 eff 12 low 7)
> cpu0: MSR_HWP_REQUEST: 0x00002626 (min 38 max 38 des 0 epp 0x0 window 0x0 pkg 0x0)
> cpu0: MSR_HWP_REQUEST_PKG: 0x8000ff00 (min 0 max 255 des 0 epp 0x80 window 0x0)
> cpu0: MSR_HWP_STATUS: 0x00000004 (No-Guaranteed_Perf_Change, No-Excursion_Min)
> 
> I tried the same experiment after disabling hwp by specifying
> intel_pstate=no_hwp on the kernel command line. In this case the pstate
> driver seems to be selecting pstates correctly.
> 
> Package  Core  CPU  Avg_MHz  Busy%   Bzy_MHz  TSC_MHz  IRQ   SMI  POLL  C1  POLL%  C1%   CPU%c1  CPU%c6  CoreTmp  PkgTmp  Pkg%pc2  Pkg%pc6  PkgWatt  RAMWatt  PKG_%  RAM_%
> -        -     -    3800     100.00  3800     3800     1646  0    0     0   0.00   0.00  0.00    0.00    85       85      0.00     0.00     328.38   62.14    0.00   0.00
> 0        0     0    3800     100.00  3800     3800     66    0    0     0   0.00   0.00  0.00    0.00    85       85      0.00     0.00     175.42   36.96    0.00   0.00
> 0        0     24   3800     100.00  3800     3800     70    0    0     0   0.00   0.00  0.00
> 0        1     1    3800     100.00  3800     3800     65    0    0     0   0.00   0.00  0.00    0.00    82
> 0        2     2    3800     100.00  3800     3800     67    0    0     0   0.00   0.00  0.00    0.00    84
> 0        3     3    3800     100.00  3800     3800     67    0    0     0   0.00   0.00  0.00    0.00    83
> 0        4     4    3800     100.00  3800     3800     68    0    0     0   0.00   0.00  0.00    0.00    85
> 0        8     5    3800     100.00  3800     3800     68    0    0     0   0.00   0.00  0.00    0.00    83
> 0        9     6    3800     100.00  3800     3800     65    0    0     0   0.00   0.00  0.00    0.00    85
> 0        10    7    3800     100.00  3800     3800     65    0    0     0   0.00   0.00  0.00    0.00    82
> 0        11    8    3800     100.00  3800     3800     65    0    0     0   0.00   0.00  0.00    0.00    84
> 0        17    9    3800     100.00  3800     3800     65    0    0     0   0.00   0.00  0.00    0.00    83
> 0        18    10   3800     100.00  3800     3800     65    0    0     0   0.00   0.00  0.00    0.00    83
> 0        27    11   3800     100.00  3800     3800     65    0    0     0   0.00   0.00  0.00    0.00    85
> 1        0     12   3800     100.00  3800     3800     65    0    0     0   0.00   0.00  0.00    0.00    66       68      0.00     0.00     152.95   25.18    0.00   0.00
> 1        2     13   3800     100.00  3800     3800     65    0    0     0   0.00   0.00  0.00    0.00    65
> 1        4     14   3800     100.00  3800     3800     65    0    0     0   0.00   0.00  0.00    0.00    68
> 1        8     15   3800     100.00  3800     3800     65    0    0     0   0.00   0.00  0.00    0.00    63
> 1        9     16   3800     100.00  3800     3800     65    0    0     0   0.00   0.00  0.00    0.00    65
> 1        10    17   3800     100.00  3800     3800     65    0    0     0   0.00   0.00  0.00    0.00    64
> 1        17    18   3800     100.00  3800     3800     66    0    0     0   0.00   0.00  0.00    0.00    63
> 1        18    19   3800     100.00  3800     3800     66    0    0     0   0.00   0.00  0.00    0.00    63
> 1        19    20   3800     100.00  3800     3800     65    0    0     0   0.00   0.00  0.00    0.00    63
> 1        20    21   3800     100.00  3800     3800     66    0    0     0   0.00   0.00  0.00    0.00    66
> 1        25    22   3800     100.00  3800     3800     65    0    0     0   0.00   0.00  0.00    0.00    64
> 1        27    23   3800     100.00  3800     3800     67    0    0     0   0.00   0.00  0.00    0.00    67
> 
> [root@ip-172-31-36-94 ec2-user]# cat /sys/devices/system/cpu/intel_pstate/no_turbo
> 1
> 
> 
> Thanks,
> Rishabh
> 


Thread overview: 5+ messages
2022-11-30 16:10 Using nr_cpus breaks no_turbo setting of pstate driver Bhatnagar, Rishabh
2022-11-30 16:40 ` srinivas pandruvada [this message]
2022-12-01  1:03   ` Bhatnagar, Rishabh
     [not found]     ` <84472a6904fa12ec1c51879eac11d4735c80936a.camel@linux.intel.com>
2022-12-01 18:32       ` Bhatnagar, Rishabh
2022-12-01 19:49         ` srinivas pandruvada
