LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Shrikanth Hegde <sshegde@linux.ibm.com>
To: Nathan Lynch <nathanl@linux.ibm.com>, mpe@ellerman.id.au
Cc: naveen.n.rao@linux.ibm.com, tyreld@linux.ibm.com,
	linuxppc-dev@lists.ozlabs.org, mahesh@linux.ibm.com,
	npiggin@gmail.com
Subject: Re: [PATCH] powerpc/pseries: Add pool idle time at LPAR boot
Date: Fri, 5 Apr 2024 22:32:24 +0530	[thread overview]
Message-ID: <4b1d3241-2caa-4bd1-b9cc-871d176c409e@linux.ibm.com> (raw)
In-Reply-To: <87msq86kju.fsf@li-e15d104c-2135-11b2-a85c-d7ef17e56be6.ibm.com>



On 4/5/24 6:19 PM, Nathan Lynch wrote:
> Shrikanth Hegde <sshegde@linux.ibm.com> writes:

Hi Nathan, Thanks for reviewing this.

>> When there are no options specified for lparstat, it is expected to
>> give reports since LPAR(Logical Partition) boot. App is an indicator
>> for available processor pool in an Shared Processor LPAR(SPLPAR). App is
>> derived using pool_idle_time which is obtained using H_PIC call.
> 
> If "App" is short for "Available Procesoor Pool" then it should be
> written "APP" and the it should be introduced and defined more clearly
> than this.
> 

Ok.I reworded it for v2. 

yes APP is Available Processor Pool. 

> 
>> The interval based reports show correct App value while since boot
>> report shows very high App values. This happens because in that case app
>> is obtained by dividing pool idle time by LPAR uptime. Since pool idle
>> time is reported by the PowerVM hypervisor since its boot, it need not
>> align with LPAR boot. This leads to large App values.
>>
>> To fix that export boot pool idle time in lparcfg and powerpc-utils will
>> use this info to derive App as below for since boot reports.
>>
>> App = (pool idle time - boot pool idle time) / (uptime * timebase)
>>
>> Results:: Observe app values.
>> ====================== Shared LPAR ================================
>> lparstat
>> System Configuration
>> type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00
>>
>> reboot
>> stress-ng --cpu=$(nproc) -t 600
>> sleep 600
>> So in this case app is expected to close to 37-6=31.
>>
>> ====== 6.9-rc1 and lparstat 1.3.10  =============
>> %user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
>> ----- ----- -----    -----    ----- ----- ----- ----- ----- -----
>> 47.48  0.01  0.00    52.51     0.00  0.00 47.49 69099.72 541547    21
>>
>> === With this patch and powerpc-utils patch to do the above equation ===
>> %user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
>> ----- ----- -----    -----    ----- ----- ----- ----- ----- -----
>> 47.48  0.01  0.00    52.51     5.73 47.75 47.49 31.21 541753    21
>> =====================================================================
>>
>> Note: physc, purr/idle purr being inaccurate is being handled in a
>> separate patch in powerpc-utils tree.
>>
>> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
>> ---
>> Note:
>>
>> This patch needs to merged first in the kernel for the powerpc-utils
>> patches to work. powerpc-utils patches will be posted to its mailing
>> list and link would be found in the reply to this patch if available.
>>
>> arch/powerpc/platforms/pseries/lparcfg.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
>> index f73c4d1c26af..8df4e7c529d7 100644
>> --- a/arch/powerpc/platforms/pseries/lparcfg.c
>> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
>> @@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time,
>>  	return rc;
>>  }
>>
>> +unsigned long boot_pool_idle_time;
> 
> Should be static, and u64. Better to use explicitly sized types for data
> at the kernel-hypervisor boundary.

Current usage of h_pic doesn't follow this either. Are you suggesting we change that 
as well? Or is this applicable to only boot_pool_idle_time?

For example in parse_ppp_data: 

        if (lppaca_shared_proc()) {
                unsigned long pool_idle_time, pool_procs;

		h_pic(&pool_idle_time, &pool_procs);
                seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
                seq_printf(m, "pool_num_procs=%ld\n", pool_procs);

> 
>> +
>>  /*
>>   * parse_ppp_data
>>   * Parse out the data returned from h_get_ppp and h_pic
>> @@ -218,6 +220,7 @@ static void parse_ppp_data(struct seq_file *m)
>>  		h_pic(&pool_idle_time, &pool_procs);
>>  		seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
>>  		seq_printf(m, "pool_num_procs=%ld\n", pool_procs);
>> +		seq_printf(m, "boot_pool_idle_time=%ld\n", boot_pool_idle_time);
> 
> If boot_pool_idle_time is unsigned then the format string should be %ul
> or similar, not %ld.
> 
>>  	}
>>
>>  	seq_printf(m, "unallocated_capacity_weight=%d\n",
>> @@ -792,6 +795,7 @@ static const struct proc_ops lparcfg_proc_ops = {
>>  static int __init lparcfg_init(void)
>>  {
>>  	umode_t mode = 0444;
>> +	unsigned long num_procs;
>>
>>  	/* Allow writing if we have FW_FEATURE_SPLPAR */
>>  	if (firmware_has_feature(FW_FEATURE_SPLPAR))
>> @@ -801,6 +805,9 @@ static int __init lparcfg_init(void)
>>  		printk(KERN_ERR "Failed to create powerpc/lparcfg\n");
>>  		return -EIO;
>>  	}
>> +
>> +	h_pic(&boot_pool_idle_time, &num_procs);
> 
> h_pic() can fail, leaving the out parameters uninitialized.

Naveen pointed to me this a while ago, but I forgot. 

Currently h_pic return value is not checked at all, either at boor or at runtime. 
When it fails, should we re-try or just print a kernel debug? What would be expected 
behavior? because if it fails, it would anyway result in wrong values of app even 
if the variables are initialized to 0.

> 
>> +
>>  	return 0;
>>  }
>>  machine_device_initcall(pseries, lparcfg_init);
>> --
>> 2.39.3

  reply	other threads:[~2024-04-05 17:03 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-05 10:13 [PATCH] powerpc/pseries: Add pool idle time at LPAR boot Shrikanth Hegde
2024-04-05 11:14 ` Shrikanth Hegde
2024-04-05 12:49 ` Nathan Lynch
2024-04-05 17:02   ` Shrikanth Hegde [this message]
2024-04-05 20:03     ` Nathan Lynch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4b1d3241-2caa-4bd1-b9cc-871d176c409e@linux.ibm.com \
    --to=sshegde@linux.ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mahesh@linux.ibm.com \
    --cc=mpe@ellerman.id.au \
    --cc=nathanl@linux.ibm.com \
    --cc=naveen.n.rao@linux.ibm.com \
    --cc=npiggin@gmail.com \
    --cc=tyreld@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox