LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] powerpc/pseries: Add pool idle time at LPAR boot
@ 2024-04-05 10:13 Shrikanth Hegde
  2024-04-05 11:14 ` Shrikanth Hegde
  2024-04-05 12:49 ` Nathan Lynch
  0 siblings, 2 replies; 5+ messages in thread
From: Shrikanth Hegde @ 2024-04-05 10:13 UTC (permalink / raw)
  To: mpe; +Cc: tyreld, sshegde, npiggin, mahesh, naveen.n.rao, linuxppc-dev

When there are no options specified for lparstat, it is expected to
give reports since LPAR(Logical Partition) boot. App is an indicator
for available processor pool in an Shared Processor LPAR(SPLPAR). App is
derived using pool_idle_time which is obtained using H_PIC call.

The interval based reports show correct App value while since boot
report shows very high App values. This happens because in that case app
is obtained by dividing pool idle time by LPAR uptime. Since pool idle
time is reported by the PowerVM hypervisor since its boot, it need not
align with LPAR boot. This leads to large App values.

To fix that export boot pool idle time in lparcfg and powerpc-utils will
use this info to derive App as below for since boot reports.

App = (pool idle time - boot pool idle time) / (uptime * timebase)

Results:: Observe app values.
====================== Shared LPAR ================================
lparstat
System Configuration
type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00

reboot
stress-ng --cpu=$(nproc) -t 600
sleep 600
So in this case app is expected to close to 37-6=31.

====== 6.9-rc1 and lparstat 1.3.10  =============
%user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
----- ----- -----    -----    ----- ----- ----- ----- ----- -----
47.48  0.01  0.00    52.51     0.00  0.00 47.49 69099.72 541547    21

=== With this patch and powerpc-utils patch to do the above equation ===
%user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
----- ----- -----    -----    ----- ----- ----- ----- ----- -----
47.48  0.01  0.00    52.51     5.73 47.75 47.49 31.21 541753    21
=====================================================================

Note: physc, purr/idle purr being inaccurate is being handled in a
separate patch in powerpc-utils tree.

Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
Note:

This patch needs to merged first in the kernel for the powerpc-utils
patches to work. powerpc-utils patches will be posted to its mailing
list and link would be found in the reply to this patch if available.

arch/powerpc/platforms/pseries/lparcfg.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
index f73c4d1c26af..8df4e7c529d7 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time,
 	return rc;
 }

+unsigned long boot_pool_idle_time;
+
 /*
  * parse_ppp_data
  * Parse out the data returned from h_get_ppp and h_pic
@@ -218,6 +220,7 @@ static void parse_ppp_data(struct seq_file *m)
 		h_pic(&pool_idle_time, &pool_procs);
 		seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
 		seq_printf(m, "pool_num_procs=%ld\n", pool_procs);
+		seq_printf(m, "boot_pool_idle_time=%ld\n", boot_pool_idle_time);
 	}

 	seq_printf(m, "unallocated_capacity_weight=%d\n",
@@ -792,6 +795,7 @@ static const struct proc_ops lparcfg_proc_ops = {
 static int __init lparcfg_init(void)
 {
 	umode_t mode = 0444;
+	unsigned long num_procs;

 	/* Allow writing if we have FW_FEATURE_SPLPAR */
 	if (firmware_has_feature(FW_FEATURE_SPLPAR))
@@ -801,6 +805,9 @@ static int __init lparcfg_init(void)
 		printk(KERN_ERR "Failed to create powerpc/lparcfg\n");
 		return -EIO;
 	}
+
+	h_pic(&boot_pool_idle_time, &num_procs);
+
 	return 0;
 }
 machine_device_initcall(pseries, lparcfg_init);
--
2.39.3


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] powerpc/pseries: Add pool idle time at LPAR boot
  2024-04-05 10:13 [PATCH] powerpc/pseries: Add pool idle time at LPAR boot Shrikanth Hegde
@ 2024-04-05 11:14 ` Shrikanth Hegde
  2024-04-05 12:49 ` Nathan Lynch
  1 sibling, 0 replies; 5+ messages in thread
From: Shrikanth Hegde @ 2024-04-05 11:14 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: naveen.n.rao, tyreld, mahesh, npiggin



On 4/5/24 3:43 PM, Shrikanth Hegde wrote:
> When there are no options specified for lparstat, it is expected to
> give reports since LPAR(Logical Partition) boot. App is an indicator
> for available processor pool in an Shared Processor LPAR(SPLPAR). App is
> derived using pool_idle_time which is obtained using H_PIC call.
> 
powerpc-utils link: https://groups.google.com/g/powerpc-utils-devel/c/WZFxrm2aCzU 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] powerpc/pseries: Add pool idle time at LPAR boot
  2024-04-05 10:13 [PATCH] powerpc/pseries: Add pool idle time at LPAR boot Shrikanth Hegde
  2024-04-05 11:14 ` Shrikanth Hegde
@ 2024-04-05 12:49 ` Nathan Lynch
  2024-04-05 17:02   ` Shrikanth Hegde
  1 sibling, 1 reply; 5+ messages in thread
From: Nathan Lynch @ 2024-04-05 12:49 UTC (permalink / raw)
  To: Shrikanth Hegde, mpe
  Cc: tyreld, sshegde, npiggin, mahesh, naveen.n.rao, linuxppc-dev

Shrikanth Hegde <sshegde@linux.ibm.com> writes:
> When there are no options specified for lparstat, it is expected to
> give reports since LPAR(Logical Partition) boot. App is an indicator
> for available processor pool in an Shared Processor LPAR(SPLPAR). App is
> derived using pool_idle_time which is obtained using H_PIC call.

If "App" is short for "Available Procesoor Pool" then it should be
written "APP" and the it should be introduced and defined more clearly
than this.


> The interval based reports show correct App value while since boot
> report shows very high App values. This happens because in that case app
> is obtained by dividing pool idle time by LPAR uptime. Since pool idle
> time is reported by the PowerVM hypervisor since its boot, it need not
> align with LPAR boot. This leads to large App values.
>
> To fix that export boot pool idle time in lparcfg and powerpc-utils will
> use this info to derive App as below for since boot reports.
>
> App = (pool idle time - boot pool idle time) / (uptime * timebase)
>
> Results:: Observe app values.
> ====================== Shared LPAR ================================
> lparstat
> System Configuration
> type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00
>
> reboot
> stress-ng --cpu=$(nproc) -t 600
> sleep 600
> So in this case app is expected to close to 37-6=31.
>
> ====== 6.9-rc1 and lparstat 1.3.10  =============
> %user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
> ----- ----- -----    -----    ----- ----- ----- ----- ----- -----
> 47.48  0.01  0.00    52.51     0.00  0.00 47.49 69099.72 541547    21
>
> === With this patch and powerpc-utils patch to do the above equation ===
> %user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
> ----- ----- -----    -----    ----- ----- ----- ----- ----- -----
> 47.48  0.01  0.00    52.51     5.73 47.75 47.49 31.21 541753    21
> =====================================================================
>
> Note: physc, purr/idle purr being inaccurate is being handled in a
> separate patch in powerpc-utils tree.
>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> ---
> Note:
>
> This patch needs to merged first in the kernel for the powerpc-utils
> patches to work. powerpc-utils patches will be posted to its mailing
> list and link would be found in the reply to this patch if available.
>
> arch/powerpc/platforms/pseries/lparcfg.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
> index f73c4d1c26af..8df4e7c529d7 100644
> --- a/arch/powerpc/platforms/pseries/lparcfg.c
> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
> @@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time,
>  	return rc;
>  }
>
> +unsigned long boot_pool_idle_time;

Should be static, and u64. Better to use explicitly sized types for data
at the kernel-hypervisor boundary.

> +
>  /*
>   * parse_ppp_data
>   * Parse out the data returned from h_get_ppp and h_pic
> @@ -218,6 +220,7 @@ static void parse_ppp_data(struct seq_file *m)
>  		h_pic(&pool_idle_time, &pool_procs);
>  		seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
>  		seq_printf(m, "pool_num_procs=%ld\n", pool_procs);
> +		seq_printf(m, "boot_pool_idle_time=%ld\n", boot_pool_idle_time);

If boot_pool_idle_time is unsigned then the format string should be %ul
or similar, not %ld.

>  	}
>
>  	seq_printf(m, "unallocated_capacity_weight=%d\n",
> @@ -792,6 +795,7 @@ static const struct proc_ops lparcfg_proc_ops = {
>  static int __init lparcfg_init(void)
>  {
>  	umode_t mode = 0444;
> +	unsigned long num_procs;
>
>  	/* Allow writing if we have FW_FEATURE_SPLPAR */
>  	if (firmware_has_feature(FW_FEATURE_SPLPAR))
> @@ -801,6 +805,9 @@ static int __init lparcfg_init(void)
>  		printk(KERN_ERR "Failed to create powerpc/lparcfg\n");
>  		return -EIO;
>  	}
> +
> +	h_pic(&boot_pool_idle_time, &num_procs);

h_pic() can fail, leaving the out parameters uninitialized.

> +
>  	return 0;
>  }
>  machine_device_initcall(pseries, lparcfg_init);
> --
> 2.39.3

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] powerpc/pseries: Add pool idle time at LPAR boot
  2024-04-05 12:49 ` Nathan Lynch
@ 2024-04-05 17:02   ` Shrikanth Hegde
  2024-04-05 20:03     ` Nathan Lynch
  0 siblings, 1 reply; 5+ messages in thread
From: Shrikanth Hegde @ 2024-04-05 17:02 UTC (permalink / raw)
  To: Nathan Lynch, mpe; +Cc: naveen.n.rao, tyreld, linuxppc-dev, mahesh, npiggin



On 4/5/24 6:19 PM, Nathan Lynch wrote:
> Shrikanth Hegde <sshegde@linux.ibm.com> writes:

Hi Nathan, Thanks for reviewing this.

>> When there are no options specified for lparstat, it is expected to
>> give reports since LPAR(Logical Partition) boot. App is an indicator
>> for available processor pool in an Shared Processor LPAR(SPLPAR). App is
>> derived using pool_idle_time which is obtained using H_PIC call.
> 
> If "App" is short for "Available Procesoor Pool" then it should be
> written "APP" and the it should be introduced and defined more clearly
> than this.
> 

Ok.I reworded it for v2. 

yes APP is Available Processor Pool. 

> 
>> The interval based reports show correct App value while since boot
>> report shows very high App values. This happens because in that case app
>> is obtained by dividing pool idle time by LPAR uptime. Since pool idle
>> time is reported by the PowerVM hypervisor since its boot, it need not
>> align with LPAR boot. This leads to large App values.
>>
>> To fix that export boot pool idle time in lparcfg and powerpc-utils will
>> use this info to derive App as below for since boot reports.
>>
>> App = (pool idle time - boot pool idle time) / (uptime * timebase)
>>
>> Results:: Observe app values.
>> ====================== Shared LPAR ================================
>> lparstat
>> System Configuration
>> type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00
>>
>> reboot
>> stress-ng --cpu=$(nproc) -t 600
>> sleep 600
>> So in this case app is expected to close to 37-6=31.
>>
>> ====== 6.9-rc1 and lparstat 1.3.10  =============
>> %user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
>> ----- ----- -----    -----    ----- ----- ----- ----- ----- -----
>> 47.48  0.01  0.00    52.51     0.00  0.00 47.49 69099.72 541547    21
>>
>> === With this patch and powerpc-utils patch to do the above equation ===
>> %user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
>> ----- ----- -----    -----    ----- ----- ----- ----- ----- -----
>> 47.48  0.01  0.00    52.51     5.73 47.75 47.49 31.21 541753    21
>> =====================================================================
>>
>> Note: physc, purr/idle purr being inaccurate is being handled in a
>> separate patch in powerpc-utils tree.
>>
>> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
>> ---
>> Note:
>>
>> This patch needs to merged first in the kernel for the powerpc-utils
>> patches to work. powerpc-utils patches will be posted to its mailing
>> list and link would be found in the reply to this patch if available.
>>
>> arch/powerpc/platforms/pseries/lparcfg.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
>> index f73c4d1c26af..8df4e7c529d7 100644
>> --- a/arch/powerpc/platforms/pseries/lparcfg.c
>> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
>> @@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time,
>>  	return rc;
>>  }
>>
>> +unsigned long boot_pool_idle_time;
> 
> Should be static, and u64. Better to use explicitly sized types for data
> at the kernel-hypervisor boundary.

Current usage of h_pic doesn't follow this either. Are you suggesting we change that 
as well? Or is this applicable to only boot_pool_idle_time?

For example in parse_ppp_data: 

        if (lppaca_shared_proc()) {
                unsigned long pool_idle_time, pool_procs;

		h_pic(&pool_idle_time, &pool_procs);
                seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
                seq_printf(m, "pool_num_procs=%ld\n", pool_procs);

> 
>> +
>>  /*
>>   * parse_ppp_data
>>   * Parse out the data returned from h_get_ppp and h_pic
>> @@ -218,6 +220,7 @@ static void parse_ppp_data(struct seq_file *m)
>>  		h_pic(&pool_idle_time, &pool_procs);
>>  		seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
>>  		seq_printf(m, "pool_num_procs=%ld\n", pool_procs);
>> +		seq_printf(m, "boot_pool_idle_time=%ld\n", boot_pool_idle_time);
> 
> If boot_pool_idle_time is unsigned then the format string should be %ul
> or similar, not %ld.
> 
>>  	}
>>
>>  	seq_printf(m, "unallocated_capacity_weight=%d\n",
>> @@ -792,6 +795,7 @@ static const struct proc_ops lparcfg_proc_ops = {
>>  static int __init lparcfg_init(void)
>>  {
>>  	umode_t mode = 0444;
>> +	unsigned long num_procs;
>>
>>  	/* Allow writing if we have FW_FEATURE_SPLPAR */
>>  	if (firmware_has_feature(FW_FEATURE_SPLPAR))
>> @@ -801,6 +805,9 @@ static int __init lparcfg_init(void)
>>  		printk(KERN_ERR "Failed to create powerpc/lparcfg\n");
>>  		return -EIO;
>>  	}
>> +
>> +	h_pic(&boot_pool_idle_time, &num_procs);
> 
> h_pic() can fail, leaving the out parameters uninitialized.

Naveen pointed to me this a while ago, but I forgot. 

Currently h_pic return value is not checked at all, either at boor or at runtime. 
When it fails, should we re-try or just print a kernel debug? What would be expected 
behavior? because if it fails, it would anyway result in wrong values of app even 
if the variables are initialized to 0.

> 
>> +
>>  	return 0;
>>  }
>>  machine_device_initcall(pseries, lparcfg_init);
>> --
>> 2.39.3

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] powerpc/pseries: Add pool idle time at LPAR boot
  2024-04-05 17:02   ` Shrikanth Hegde
@ 2024-04-05 20:03     ` Nathan Lynch
  0 siblings, 0 replies; 5+ messages in thread
From: Nathan Lynch @ 2024-04-05 20:03 UTC (permalink / raw)
  To: Shrikanth Hegde, mpe; +Cc: naveen.n.rao, tyreld, linuxppc-dev, mahesh, npiggin

Shrikanth Hegde <sshegde@linux.ibm.com> writes:
> On 4/5/24 6:19 PM, Nathan Lynch wrote:
>> Shrikanth Hegde <sshegde@linux.ibm.com> writes:
>
> Hi Nathan, Thanks for reviewing this.
>
>>> When there are no options specified for lparstat, it is expected to
>>> give reports since LPAR(Logical Partition) boot. App is an indicator
>>> for available processor pool in an Shared Processor LPAR(SPLPAR). App is
>>> derived using pool_idle_time which is obtained using H_PIC call.
>> 
>> If "App" is short for "Available Procesoor Pool" then it should be
>> written "APP" and the it should be introduced and defined more clearly
>> than this.
>> 
>
> Ok.I reworded it for v2. 
>
> yes APP is Available Processor Pool. 
>
>> 
>>> The interval based reports show correct App value while since boot
>>> report shows very high App values. This happens because in that case app
>>> is obtained by dividing pool idle time by LPAR uptime. Since pool idle
>>> time is reported by the PowerVM hypervisor since its boot, it need not
>>> align with LPAR boot. This leads to large App values.
>>>
>>> To fix that export boot pool idle time in lparcfg and powerpc-utils will
>>> use this info to derive App as below for since boot reports.
>>>
>>> App = (pool idle time - boot pool idle time) / (uptime * timebase)
>>>
>>> Results:: Observe app values.
>>> ====================== Shared LPAR ================================
>>> lparstat
>>> System Configuration
>>> type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00
>>>
>>> reboot
>>> stress-ng --cpu=$(nproc) -t 600
>>> sleep 600
>>> So in this case app is expected to close to 37-6=31.
>>>
>>> ====== 6.9-rc1 and lparstat 1.3.10  =============
>>> %user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
>>> ----- ----- -----    -----    ----- ----- ----- ----- ----- -----
>>> 47.48  0.01  0.00    52.51     0.00  0.00 47.49 69099.72 541547    21
>>>
>>> === With this patch and powerpc-utils patch to do the above equation ===
>>> %user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
>>> ----- ----- -----    -----    ----- ----- ----- ----- ----- -----
>>> 47.48  0.01  0.00    52.51     5.73 47.75 47.49 31.21 541753    21
>>> =====================================================================
>>>
>>> Note: physc, purr/idle purr being inaccurate is being handled in a
>>> separate patch in powerpc-utils tree.
>>>
>>> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
>>> ---
>>> Note:
>>>
>>> This patch needs to merged first in the kernel for the powerpc-utils
>>> patches to work. powerpc-utils patches will be posted to its mailing
>>> list and link would be found in the reply to this patch if available.
>>>
>>> arch/powerpc/platforms/pseries/lparcfg.c | 7 +++++++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
>>> index f73c4d1c26af..8df4e7c529d7 100644
>>> --- a/arch/powerpc/platforms/pseries/lparcfg.c
>>> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
>>> @@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time,
>>>  	return rc;
>>>  }
>>>
>>> +unsigned long boot_pool_idle_time;
>> 
>> Should be static, and u64. Better to use explicitly sized types for data
>> at the kernel-hypervisor boundary.
>
> Current usage of h_pic doesn't follow this either. Are you suggesting we change that 
> as well?

Yes pretty much. h_pic() as currently written and used has some
problems:

  static unsigned h_pic(unsigned long *pool_idle_time,
                        unsigned long *num_procs)
  {
          unsigned long rc;
          unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
  
          rc = plpar_hcall(H_PIC, retbuf);
  
          *pool_idle_time = retbuf[0];
          *num_procs = retbuf[1];
  
          return rc;
  }

* Coerces the return value of plpar_hcall() to unsigned -- hcall
  errors are negative.
* Assigns *pool_idle_time and *num_procs using uninitialized
  data when H_PIC is unsuccessful.
* Assigns the outparams unconditionally; would be nicer if it allowed
  callers to pass NULL so they don't have to provide dummy inputs that
  aren't even used, as in your change.
* Should follow Linux -errno return value convention in the absence
  of a need for the specific hcall status in its callers.

>>> @@ -801,6 +805,9 @@ static int __init lparcfg_init(void)
>>>  		printk(KERN_ERR "Failed to create powerpc/lparcfg\n");
>>>  		return -EIO;
>>>  	}
>>> +
>>> +	h_pic(&boot_pool_idle_time, &num_procs);
>> 
>> h_pic() can fail, leaving the out parameters uninitialized.
>
> Naveen pointed to me this a while ago, but I forgot. 
>
> Currently h_pic return value is not checked at all, either at boor or at runtime. 
> When it fails, should we re-try or just print a kernel debug? What would be expected 
> behavior? because if it fails, it would anyway result in wrong values of app even 
> if the variables are initialized to 0.

There's nothing in the spec for H_PIC that suggests retrying on failure.
I'm not really in favor of printing a message about it either; that
practice tends to result in log noise in non-PowerVM guests.

When H_PIC doesn't work the corresponding lines should not be emitted in
/proc/powerpc/lparcfg IMO.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-04-05 20:04 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-05 10:13 [PATCH] powerpc/pseries: Add pool idle time at LPAR boot Shrikanth Hegde
2024-04-05 11:14 ` Shrikanth Hegde
2024-04-05 12:49 ` Nathan Lynch
2024-04-05 17:02   ` Shrikanth Hegde
2024-04-05 20:03     ` Nathan Lynch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox