From: Nathan Lynch <nathanl@linux.ibm.com>
To: Tyrel Datwyler <tyreld@linux.ibm.com>,
Scott Cheloha <cheloha@linux.ibm.com>,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] powerpc/pseries: new lparcfg key/value pair: partition_affinity_score
Date: Fri, 19 Jun 2020 19:32:03 -0500
Message-ID: <87d05ufugs.fsf@linux.ibm.com>
In-Reply-To: <10febdd6-4672-24ca-03e9-553468c8852c@linux.ibm.com>
Hi Tyrel,
Tyrel Datwyler <tyreld@linux.ibm.com> writes:
> On 6/19/20 8:34 AM, Scott Cheloha wrote:
>> The H_GetPerformanceCounterInfo PHYP hypercall has a subcall,
>> Affinity_Domain_Info_By_Partition, which returns, among other things,
>> a "partition affinity score" for a given LPAR. This score, a value on
>> [0-100], represents the processor-memory affinity for the LPAR in
>> question. A score of 0 indicates the worst possible affinity while a
>> score of 100 indicates perfect affinity. The score can be used to
>> reason about performance.
>>
>> This patch adds the score for the local LPAR to the lparcfg procfile
>> under a new 'partition_affinity_score' key.
>
> I expect that you will probably get a NACK from Michael on this. The overall
> desire is to move away from these dated /proc interfaces. While it's true that
> I did add a new value recently, it was strictly to facilitate and correct the
> calculation of a derived value that was already dependent on a couple of other
> existing values in lparcfg.
>
> With that said, I would expect that you would likely be advised to expose this
> as a sysfs attribute. The question is where? We should probably put some
> thought into this, as I would like to port each lparcfg value over to sysfs so
> that we can move toward deprecating lparcfg. Putting everything under something
> like /sys/kernel/lparcfg/*, maybe. Michael may have a better suggestion.
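For concreteness, I take that to mean a kobject under /sys/kernel with one
read-only attribute file per value, i.e. roughly the sketch below. This is
illustrative only: get_partition_affinity_score() is a hypothetical helper
wrapping the hcall, and all of the naming is made up.

#include <linux/init.h>
#include <linux/kobject.h>
#include <linux/sysfs.h>

/* Hypothetical helper (not shown) that wraps the H_GET_PERF_COUNTER_INFO hcall. */
extern int get_partition_affinity_score(unsigned int *score);

static ssize_t partition_affinity_score_show(struct kobject *kobj,
                                             struct kobj_attribute *attr,
                                             char *buf)
{
        unsigned int score;

        if (get_partition_affinity_score(&score))
                return -EIO;
        return sprintf(buf, "%u\n", score);
}

static struct kobj_attribute score_attr = __ATTR_RO(partition_affinity_score);

/* Would be called from the existing pseries init path (wiring not shown). */
static int __init lparcfg_sysfs_init(void)
{
        struct kobject *kobj = kobject_create_and_add("lparcfg", kernel_kobj);

        if (!kobj)
                return -ENOMEM;
        return sysfs_create_file(kobj, &score_attr.attr);
}

That is one show() function, one attribute, and one directory entry per value,
multiplied by everything currently in lparcfg.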
I think this score fits pretty naturally in lparcfg: it's a simple
metric that is specific to the pseries/papr platform, like everything
else in there.
A few dozen key=value pairs contained in a single file is simple and
efficient, unlike sysfs with its rather inconsistently applied
one-value-per-file convention. Surely it's OK if lparcfg gains a line
every few years?
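For what it's worth, picking any one key out of the file from userspace is a
handful of lines of C. A rough sketch (error handling trimmed, names made up):

#include <stdio.h>
#include <string.h>

/* Sketch: fetch the value for one lparcfg key, e.g. "partition_affinity_score". */
static int lparcfg_lookup(const char *key, char *val, size_t len)
{
        FILE *f = fopen("/proc/powerpc/lparcfg", "r");
        char line[256];
        size_t klen = strlen(key);
        int rc = -1;

        if (!f)
                return -1;
        while (fgets(line, sizeof(line), f)) {
                if (strncmp(line, key, klen) == 0 && line[klen] == '=') {
                        snprintf(val, len, "%s", line + klen + 1);
                        val[strcspn(val, "\n")] = '\0';
                        rc = 0;
                        break;
                }
        }
        fclose(f);
        return rc;
}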
>> The H_GetPerformanceCounterInfo hypercall is already used elsewhere in
>> the kernel, in powerpc/perf/hv-gpci.c. Refactoring that code and this
>> code into a more general API might be worthwhile if additional modules
>> require the hypercall in the future.
>
> If you are duplicating code, it's likely you should already be doing this. See
> the rest of my comments below.
>
>>
>> Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
>> ---
>> arch/powerpc/platforms/pseries/lparcfg.c | 53 ++++++++++++++++++++++++
>> 1 file changed, 53 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
>> index b8d28ab88178..b75151eee0f0 100644
>> --- a/arch/powerpc/platforms/pseries/lparcfg.c
>> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
>> @@ -136,6 +136,57 @@ static unsigned int h_get_ppp(struct hvcall_ppp_data *ppp_data)
>>  	return rc;
>>  }
>>
>> +/*
>> + * Based on H_GetPerformanceCounterInfo v1.10.
>> + */
>> +static void show_gpci_data(struct seq_file *m)
>> +{
>> +        struct perf_counter_info_params {
>> +                __be32 counter_request;
>> +                __be32 starting_index;
>> +                __be16 secondary_index;
>> +                __be16 returned_values;
>> +                __be32 detail_rc;
>> +                __be16 counter_value_element_size;
>> +                u8 counter_info_version_in;
>> +                u8 counter_info_version_out;
>> +                u8 reserved[0xC];
>> +        } __packed;
>
> This looks to duplicate the hv_get_perf_counter_info_params struct from
> arch/powerpc/perf/hv-gpci.h. Maybe this include file needs to move to
> arch/powerpc/include/asm so you don't have to redefine this struct.
>
>> +        struct hv_gpci_request_buffer {
>> +                struct perf_counter_info_params params;
>> +                u8 output[4096 - sizeof(struct perf_counter_info_params)];
>> +        } __packed;
>
> This struct is code duplication of the one defined in
> arch/powerpc/perf/hv-gpci.c and could be moved into hv-gpci.h along with
> HGPCI_MAX_DATA_BYTES so that you can use those versions here.
I tend to agree with these comments.
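Concretely, I would picture lifting the existing definitions into a header that
both arch/powerpc/perf/hv-gpci.c and lparcfg.c can include, roughly like the
sketch below. The header path is only a suggestion, and the field and macro
names here follow this patch's copy; the real hv-gpci.h spellings and sizes
should be taken from that file rather than from this sketch.

/* Hypothetical shared header, e.g. arch/powerpc/include/asm/hv-gpci.h */
#ifndef _ASM_POWERPC_HV_GPCI_H
#define _ASM_POWERPC_HV_GPCI_H

#include <linux/types.h>

struct hv_get_perf_counter_info_params {
        __be32 counter_request;
        __be32 starting_index;
        __be16 secondary_index;
        __be16 returned_values;
        __be32 detail_rc;
        __be16 counter_value_element_size;
        u8 counter_info_version_in;
        u8 counter_info_version_out;
        u8 reserved[0xC];
} __packed;

#define HGPCI_REQ_BUFFER_SIZE   4096
#define HGPCI_MAX_DATA_BYTES \
        (HGPCI_REQ_BUFFER_SIZE - sizeof(struct hv_get_perf_counter_info_params))

struct hv_gpci_request_buffer {
        struct hv_get_perf_counter_info_params params;
        u8 bytes[HGPCI_MAX_DATA_BYTES];
} __packed;

#endif /* _ASM_POWERPC_HV_GPCI_H */

Then both the perf code and show_gpci_data() could drop their local copies and
use the shared definitions.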
>> +        struct hv_gpci_request_buffer *buf;
>> +        long ret;
>> +        unsigned int affinity_score;
>> +
>> +        buf = kmalloc(sizeof(*buf), GFP_KERNEL);
>> +        if (buf == NULL)
>> +                return;
>> +
>> +        /*
>> +         * Show the local LPAR's affinity score.
>> +         *
>> +         * 0xB1 selects the Affinity_Domain_Info_By_Partition subcall.
>> +         * The score is at byte 0xB in the output buffer.
>> +         */
>> +        memset(&buf->params, 0, sizeof(buf->params));
>> +        buf->params.counter_request = cpu_to_be32(0xB1);
>> +        buf->params.starting_index = cpu_to_be32(-1);   /* local LPAR */
>> +        buf->params.counter_info_version_in = 0x5;      /* v5+ for score */
>> +        ret = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO, virt_to_phys(buf),
>> +                                 sizeof(*buf));
>> +        if (ret != H_SUCCESS) {
>> +                pr_debug("hcall failed: H_GET_PERF_COUNTER_INFO: %ld, %x\n",
>> +                         ret, be32_to_cpu(buf->params.detail_rc));
>> +                goto out;
>> +        }
>> +        affinity_score = buf->output[0xB];
>> +        seq_printf(m, "partition_affinity_score=%u\n", affinity_score);
>> +out:
>> +        kfree(buf);
>> +}
>> +
>
> IIUC we should already be able to get this value from userspace using the perf
> tool, right? If that's the case, can't we also programmatically retrieve it via
> the perf_event interface in userspace as well?
No... I had the same thought when I first looked at this, but perf is
for sampling counters, and the partition affinity score is not a
counter.