* Re: [PATCH 2/2] selftests/powerpc: Add prefixed loads/stores to alignment_handler test
From: Michael Ellerman @ 2020-06-22 4:19 UTC
To: Alistair Popple, linuxppc-dev; +Cc: Jordan Niethe
In-Reply-To: <2070842.8SDOZEvoPg@townsend>
Alistair Popple <alistair@popple.id.au> writes:
> On Wednesday, 20 May 2020 12:11:03 PM AEST Jordan Niethe wrote:
>> +/* POWER10 feature */
>> +#ifndef PPC_FEATURE2_ARCH_3_10
>> +#define PPC_FEATURE2_ARCH_3_10 0x00040000
>> +#endif
>
> One minor nit pick, this needs to be updated to PPC_FEATURE2_ARCH_3_1 to
> reflect the changes made in response to feedback on the patch series that
> introduced this feature.
Done, thanks for noticing.
cheers
* [PATCH 0/2] powerpc/papr_scm: add support for reporting NVDIMM 'life_used_percentage' metric
From: Vaibhav Jain @ 2020-06-22 4:24 UTC
To: linuxppc-dev, linux-nvdimm
Cc: Santosh Sivaraj, Oliver O'Halloran, Aneesh Kumar K . V,
Vaibhav Jain, Dan Williams
This small patchset implements kernel-side support for reporting the
'life_used_percentage' metric in the NDCTL dimm health output for
papr-scm NVDIMMs. With the corresponding NDCTL-side changes [1], the
output should look like:
$ sudo ndctl list -DH
[
{
"dev":"nmem0",
"health":{
"health_state":"ok",
"life_used_percentage":0,
"shutdown_state":"clean"
}
}
]
PHYP supports the H_SCM_PERFORMANCE_STATS hcall, through which an LPAR can
fetch various performance stats, including the 'fuel_gauge' percentage for
an NVDIMM. The 'fuel_gauge' metric indicates the usable life remaining of
an NVDIMM, expressed as a percentage, so 'life_used_percentage' can be
calculated as 'life_used_percentage = 100 - fuel_gauge'.
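As a worked example of that relationship, the conversion can be sketched in C. The helper name life_used_percentage() and the clamping of values above 100 are assumptions made for illustration; nothing in this series says PHYP can return such values.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a PHYP 'fuel_gauge' value (percentage of
 * usable life remaining) into ndctl's 'life_used_percentage'.
 * Inputs above 100 are clamped (an assumption, for illustration only).
 */
static unsigned int life_used_percentage(uint64_t fuel_gauge)
{
	if (fuel_gauge > 100)
		fuel_gauge = 100;
	return (unsigned int)(100 - fuel_gauge);
}
```

A brand-new DIMM reporting a fuel gauge of 100 therefore shows "life_used_percentage":0, as in the sample ndctl output above.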
Structure of the patchset
=========================
The first patch implements the scaffolding needed to issue the
H_SCM_PERFORMANCE_STATS hcall and fetch the performance stats
catalogue. It also implements the 'perf_stats' sysfs attribute, which
reports the full catalogue of performance stats supported by
PHYP.
The second and final patch sends this value to libndctl by extending
the PAPR_PDSM_HEALTH pdsm payload with a new field named
'dimm_fuel_gauge'.
References
==========
[1]
https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v13_run_guage
Vaibhav Jain (2):
powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric
Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 +++
arch/powerpc/include/uapi/asm/papr_pdsm.h | 9 +
arch/powerpc/platforms/pseries/papr_scm.c | 186 ++++++++++++++++++
3 files changed, 222 insertions(+)
--
2.26.2
* [PATCH 1/2] powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
From: Vaibhav Jain @ 2020-06-22 4:24 UTC
To: linuxppc-dev, linux-nvdimm
Cc: Santosh Sivaraj, Oliver O'Halloran, Aneesh Kumar K . V,
Vaibhav Jain, Dan Williams
In-Reply-To: <20200622042451.22448-1-vaibhav@linux.ibm.com>
Update papr_scm.c to query dimm performance statistics from PHYP via the
H_SCM_PERFORMANCE_STATS hcall and export them to user-space as the PAPR
specific NVDIMM attribute 'perf_stats' in sysfs. The patch also
provides sysfs ABI documentation for the stats being reported and
their meanings.
During NVDIMM probe in papr_scm_nvdimm_init(), a special variant of
the H_SCM_PERFORMANCE_STATS hcall (with no output buffer) is issued to
check whether collection of performance statistics is supported. If
successful, PHYP returns the maximum buffer length needed to read all
performance stats. This returned value is stored in the per-nvdimm
attribute 'len_stat_buffer'.
The layout of the request buffer for reading NVDIMM performance stats
from PHYP is defined in 'struct papr_scm_perf_stats' and 'struct
papr_scm_perf_stat'. These structs are used in the newly introduced
drc_pmem_query_stats(), which issues the H_SCM_PERFORMANCE_STATS hcall.
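The request buffer layout described above can be sketched in user-space C. The struct names and the helpers to_be32() and perf_stats_request() are hypothetical mirrors for illustration, not the kernel definitions. The header fields are stored big-endian, as the patch does with cpu_to_be32(); that a zero num_statistics asks for every stat is an assumption drawn from perf_stats_show() passing 0 together with a full-size buffer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space mirror of struct papr_scm_perf_stat. */
struct perf_stat {
	uint8_t statistic_id[8];	/* e.g. "MemLife ", not NUL-terminated */
	uint64_t statistic_value;	/* big-endian as exchanged with PHYP */
};

/* Hypothetical user-space mirror of struct papr_scm_perf_stats. */
struct perf_stats {
	uint8_t eye_catcher[8];		/* "SCMSTATS", not NUL-terminated */
	uint32_t stats_version;		/* big-endian, 0x1 */
	uint32_t num_statistics;	/* big-endian count of entries below */
	struct perf_stat scm_statistic[];
} __attribute__((packed));

/* Return 'v' with its in-memory byte order made big-endian on any host. */
static uint32_t to_be32(uint32_t v)
{
	uint8_t b[4] = { (uint8_t)(v >> 24), (uint8_t)(v >> 16),
			 (uint8_t)(v >> 8), (uint8_t)v };
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Allocate a zeroed request with room for 'n' stats and fill the header. */
static struct perf_stats *perf_stats_request(uint32_t n, size_t *size)
{
	struct perf_stats *req;

	*size = sizeof(*req) + (size_t)n * sizeof(struct perf_stat);
	req = calloc(1, *size);
	if (!req)
		return NULL;
	memcpy(req->eye_catcher, "SCMSTATS", sizeof(req->eye_catcher));
	req->stats_version = to_be32(1);
	req->num_statistics = to_be32(n);
	return req;
}
```

For a single stat the request is 32 bytes: a 16-byte header followed by one 16-byte id/value pair.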
The sysfs access function perf_stats_show() uses the value of
'len_stat_buffer' to allocate a buffer large enough to hold all
possible NVDIMM performance stats and passes it to
drc_pmem_query_stats() to populate. Finally, the statistics reported in
the buffer are formatted into the sysfs output buffer.
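The per-stat formatting step can be sketched in user-space C. from_be64() and format_stat() are hypothetical helpers. In the patch, values are converted with be64_to_cpu() inside drc_pmem_query_stats() and printed with seq_buf_printf(); here snprintf() stands in for the latter, using the same "%.8s = 0x%016llX\n" format string.

```c
#include <stdint.h>
#include <stdio.h>

/* Decode an 8-byte big-endian value, independent of host byte order. */
static uint64_t from_be64(const uint8_t b[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | b[i];
	return v;
}

/*
 * Format one returned stat the way perf_stats_show() does:
 * "<8-char id> = 0x<16 hex digits>\n". The id need not be NUL-terminated
 * because "%.8s" reads at most 8 bytes. Returns snprintf()'s result.
 */
static int format_stat(char *out, size_t len, const uint8_t id[8],
		       const uint8_t be_value[8])
{
	return snprintf(out, len, "%.8s = 0x%016llX\n", (const char *)id,
			(unsigned long long)from_be64(be_value));
}
```

A "MemLife " stat with value 0x5A would thus appear as the line "MemLife  = 0x000000000000005A" in the perf_stats file.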
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 ++++
arch/powerpc/platforms/pseries/papr_scm.c | 139 ++++++++++++++++++
2 files changed, 166 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem
index 5b10d036a8d4..c1a67275c43f 100644
--- a/Documentation/ABI/testing/sysfs-bus-papr-pmem
+++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
@@ -25,3 +25,30 @@ Description:
NVDIMM have been scrubbed.
* "locked" : Indicating that NVDIMM contents cant
be modified until next power cycle.
+
+What: /sys/bus/nd/devices/nmemX/papr/perf_stats
+Date: May, 2020
+KernelVersion: v5.9
+Contact: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, linux-nvdimm@lists.01.org,
+Description:
+ (RO) Report various performance stats related to papr-scm NVDIMM
+ device. Each stat is reported on a new line with each line
+ composed of a stat-identifier followed by its value. Below are
+ currently known dimm performance stats which are reported:
+
+ * "CtlResCt" : Controller Reset Count
+ * "CtlResTm" : Controller Reset Elapsed Time
+ * "PonSecs " : Power-on Seconds
+ * "MemLife " : Life Remaining
+ * "CritRscU" : Critical Resource Utilization
+ * "HostLCnt" : Host Load Count
+ * "HostSCnt" : Host Store Count
+ * "HostSDur" : Host Store Duration
+ * "HostLDur" : Host Load Duration
+ * "MedRCnt " : Media Read Count
+ * "MedWCnt " : Media Write Count
+ * "MedRDur " : Media Read Duration
+ * "MedWDur " : Media Write Duration
+ * "CchRHCnt" : Cache Read Hit Count
+ * "CchWHCnt" : Cache Write Hit Count
+ * "FastWCnt" : Fast Write Count
\ No newline at end of file
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 9c569078a09f..cb3f9acc325b 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -62,6 +62,24 @@
PAPR_PMEM_HEALTH_FATAL | \
PAPR_PMEM_HEALTH_UNHEALTHY)
+#define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
+#define PAPR_SCM_PERF_STATS_VERSION 0x1
+
+/* Struct holding a single performance metric */
+struct papr_scm_perf_stat {
+ u8 statistic_id[8];
+ u64 statistic_value;
+};
+
+/* Struct exchanged between kernel and PHYP for fetching drc perf stats */
+struct papr_scm_perf_stats {
+ u8 eye_catcher[8];
+ u32 stats_version; /* Should be 0x01 */
+ u32 num_statistics; /* Number of stats following */
+ /* zero or more performance metrics */
+ struct papr_scm_perf_stat scm_statistic[];
+} __packed;
+
/* private struct associated with each region */
struct papr_scm_priv {
struct platform_device *pdev;
@@ -89,6 +107,9 @@ struct papr_scm_priv {
/* Health information for the dimm */
u64 health_bitmap;
+
+ /* length of the stat buffer as expected by phyp */
+ size_t len_stat_buffer;
};
static int drc_pmem_bind(struct papr_scm_priv *p)
@@ -194,6 +215,75 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
return drc_pmem_bind(p);
}
+/*
+ * Query the Dimm performance stats from PHYP and copy them (if returned) to
+ * provided struct papr_scm_perf_stats instance 'stats' of 'size' in bytes.
+ * The value of R4 is copied to 'out' if the pointer is provided.
+ */
+static int drc_pmem_query_stats(struct papr_scm_priv *p,
+ struct papr_scm_perf_stats *buff_stats,
+ size_t size, unsigned int num_stats,
+ uint64_t *out)
+{
+ unsigned long ret[PLPAR_HCALL_BUFSIZE];
+ struct papr_scm_perf_stat *stats;
+ s64 rc, i;
+
+ /* Setup the out buffer */
+ if (buff_stats) {
+ memcpy(buff_stats->eye_catcher,
+ PAPR_SCM_PERF_STATS_EYECATCHER, 8);
+ buff_stats->stats_version =
+ cpu_to_be32(PAPR_SCM_PERF_STATS_VERSION);
+ buff_stats->num_statistics =
+ cpu_to_be32(num_stats);
+ } else {
+ /* In case of no out buffer ignore the size */
+ size = 0;
+ }
+
+ /*
+ * Do the HCALL asking PHYP for info and if R4 was requested
+ * return its value in 'out' variable.
+ */
+ rc = plpar_hcall(H_SCM_PERFORMANCE_STATS, ret, p->drc_index,
+ virt_to_phys(buff_stats), size);
+ if (out)
+ *out = ret[0];
+
+ if (rc == H_PARTIAL) {
+ dev_err(&p->pdev->dev,
+ "Unknown performance stats, Err:0x%016lX\n", ret[0]);
+ return -ENOENT;
+ } else if (rc != H_SUCCESS) {
+ dev_err(&p->pdev->dev,
+ "Failed to query performance stats, Err:%lld\n", rc);
+ return -ENXIO;
+ }
+
+ /* Successfully fetched the requested stats from phyp */
+ if (size != 0) {
+ buff_stats->num_statistics =
+ be32_to_cpu(buff_stats->num_statistics);
+
+ /* Transform the stats buffer values from BE to cpu native */
+ for (i = 0, stats = buff_stats->scm_statistic;
+ i < buff_stats->num_statistics; ++i) {
+ stats[i].statistic_value =
+ be64_to_cpu(stats[i].statistic_value);
+ }
+ dev_dbg(&p->pdev->dev,
+ "Performance stats returned %d stats\n",
+ buff_stats->num_statistics);
+ } else {
+ /* Handle case where stat buffer size was requested */
+ dev_dbg(&p->pdev->dev,
+ "Performance stats size %ld\n", ret[0]);
+ }
+
+ return 0;
+}
+
/*
* Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
* health information.
@@ -631,6 +721,45 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
return 0;
}
+static ssize_t perf_stats_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ int index, rc;
+ struct seq_buf s;
+ struct papr_scm_perf_stat *stat;
+ struct papr_scm_perf_stats *stats;
+ struct nvdimm *dimm = to_nvdimm(dev);
+ struct papr_scm_priv *p = nvdimm_provider_data(dimm);
+
+ if (!p->len_stat_buffer)
+ return -ENOENT;
+
+ /* Allocate the buffer for phyp where stats are written */
+ stats = kzalloc(p->len_stat_buffer, GFP_KERNEL);
+ if (!stats)
+ return -ENOMEM;
+
+ /* Ask phyp to return all dimm perf stats */
+ rc = drc_pmem_query_stats(p, stats, p->len_stat_buffer, 0, NULL);
+ if (!rc) {
+ /*
+ * Go through the returned output buffer and print stats and
+ * values. Since statistic_id is essentially a char string of
+ * 8 bytes, simply use the string format specifier to print it.
+ */
+ seq_buf_init(&s, buf, PAGE_SIZE);
+ for (index = 0, stat = stats->scm_statistic;
+ index < stats->num_statistics; ++index, ++stat) {
+ seq_buf_printf(&s, "%.8s = 0x%016llX\n",
+ stat->statistic_id, stat->statistic_value);
+ }
+ }
+
+ kfree(stats);
+ return rc ? rc : seq_buf_used(&s);
+}
+DEVICE_ATTR_RO(perf_stats);
+
static ssize_t flags_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -676,6 +805,7 @@ DEVICE_ATTR_RO(flags);
/* papr_scm specific dimm attributes */
static struct attribute *papr_nd_attributes[] = {
&dev_attr_flags.attr,
+ &dev_attr_perf_stats.attr,
NULL,
};
@@ -696,6 +826,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
struct nd_region_desc ndr_desc;
unsigned long dimm_flags;
int target_nid, online_nid;
+ u64 stat_size;
p->bus_desc.ndctl = papr_scm_ndctl;
p->bus_desc.module = THIS_MODULE;
@@ -759,6 +890,14 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
dev_info(dev, "Region registered with target node %d and online node %d",
target_nid, online_nid);
+ /* Try retrieving the stat buffer size and see if stats are supported */
+ if (!drc_pmem_query_stats(p, NULL, 0, 0, &stat_size)) {
+ p->len_stat_buffer = (size_t)stat_size;
+ dev_dbg(&p->pdev->dev, "Max perf-stat size %lu-bytes\n",
+ p->len_stat_buffer);
+ } else {
+ dev_info(&p->pdev->dev, "Limited dimm stat info available\n");
+ }
return 0;
err: nvdimm_bus_unregister(p->bus);
--
2.26.2
* [PATCH 2/2] powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric
From: Vaibhav Jain @ 2020-06-22 4:24 UTC
To: linuxppc-dev, linux-nvdimm
Cc: Santosh Sivaraj, Oliver O'Halloran, Aneesh Kumar K . V,
Vaibhav Jain, Dan Williams
In-Reply-To: <20200622042451.22448-1-vaibhav@linux.ibm.com>
We add support for reporting the 'fuel-gauge' NVDIMM metric via the
PAPR_PDSM_HEALTH pdsm payload. The 'fuel-gauge' metric indicates the
usable life remaining of a papr-scm compatible NVDIMM. PHYP exposes this
metric via the H_SCM_PERFORMANCE_STATS hcall.
The metric value is returned from the pdsm by extending the return
payload 'struct nd_papr_pdsm_health' without breaking the ABI. A new
field 'dimm_fuel_gauge' holding the metric value is introduced at the
end of the payload struct, and its presence is indicated by the
extension flag PDSM_DIMM_HEALTH_RUN_GAUGE_VALID.
The patch introduces a new function papr_pdsm_fuel_gauge() that is
called from papr_pdsm_health(). If fetching NVDIMM performance stats
is supported, papr_pdsm_fuel_gauge() allocates an output buffer
large enough to hold the single 'MemLife ' performance stat and passes
it to drc_pmem_query_stats(), which issues the HCALL to PHYP. The
returned value of the stat is then stored in the 'struct
nd_papr_pdsm_health.dimm_fuel_gauge' field, and the extension flag
'PDSM_DIMM_HEALTH_RUN_GAUGE_VALID' is set in 'struct
nd_papr_pdsm_health.extension_flags'.
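On the consumer side, the extension flag is what tells a reader whether 'dimm_fuel_gauge' can be trusted. A minimal sketch, assuming a hypothetical trimmed-down struct health_view that holds only the two relevant fields; it does not reproduce the real uapi layout of struct nd_papr_pdsm_health and must not be used to decode raw payloads. The flag value is the one defined in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value taken from the papr_pdsm.h hunk of this patch. */
#define PDSM_DIMM_HEALTH_RUN_GAUGE_VALID 1

/* Hypothetical view of the two fields a consumer needs (not the uapi layout). */
struct health_view {
	uint32_t extension_flags;
	uint16_t dimm_fuel_gauge;
};

/* Fill *life_used only when the kernel marked the fuel gauge as valid. */
static bool health_life_used(const struct health_view *h,
			     unsigned int *life_used)
{
	if (!(h->extension_flags & PDSM_DIMM_HEALTH_RUN_GAUGE_VALID))
		return false;	/* older kernel, or perf stats unsupported */
	*life_used = h->dimm_fuel_gauge > 100 ? 0 : 100u - h->dimm_fuel_gauge;
	return true;
}
```

Checking the flag first keeps the extension backward compatible: a payload from a kernel without this patch leaves the flag clear, so the zeroed field is never misread as "100% life used".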
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
arch/powerpc/include/uapi/asm/papr_pdsm.h | 9 +++++
arch/powerpc/platforms/pseries/papr_scm.c | 47 +++++++++++++++++++++++
2 files changed, 56 insertions(+)
diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
index 9ccecc1d6840..50ef95e2f5b1 100644
--- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -72,6 +72,11 @@
#define PAPR_PDSM_DIMM_CRITICAL 2
#define PAPR_PDSM_DIMM_FATAL 3
+/* struct nd_papr_pdsm_health.extension_flags field flags */
+
+/* Indicate that the 'dimm_fuel_gauge' field is valid */
+#define PDSM_DIMM_HEALTH_RUN_GAUGE_VALID 1
+
/*
* Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH
* Various flags indicate the health status of the dimm.
@@ -84,6 +89,7 @@
* dimm_locked : Contents of the dimm cant be modified until CEC reboot
* dimm_encrypted : Contents of dimm are encrypted.
* dimm_health : Dimm health indicator. One of PAPR_PDSM_DIMM_XXXX
+ * dimm_fuel_gauge : Life remaining of DIMM as a percentage from 0-100
*/
struct nd_papr_pdsm_health {
union {
@@ -96,6 +102,9 @@ struct nd_papr_pdsm_health {
__u8 dimm_locked;
__u8 dimm_encrypted;
__u16 dimm_health;
+
+ /* Extension flag PDSM_DIMM_HEALTH_RUN_GAUGE_VALID */
+ __u16 dimm_fuel_gauge;
};
__u8 buf[ND_PDSM_PAYLOAD_MAX_SIZE];
};
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index cb3f9acc325b..39527cd38d9c 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -506,6 +506,45 @@ static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
return 0;
}
+static int papr_pdsm_fuel_gauge(struct papr_scm_priv *p,
+ union nd_pdsm_payload *payload)
+{
+ int rc, size;
+ struct papr_scm_perf_stat *stat;
+ struct papr_scm_perf_stats *stats;
+
+ /* Silently fail if fetching performance metrics isn't supported */
+ if (!p->len_stat_buffer)
+ return 0;
+
+ /* Allocate a request buffer large enough to hold a single performance stat */
+ size = sizeof(struct papr_scm_perf_stats) +
+ sizeof(struct papr_scm_perf_stat);
+
+ stats = kzalloc(size, GFP_KERNEL);
+ if (!stats)
+ return -ENOMEM;
+
+ stat = &stats->scm_statistic[0];
+ memcpy(&stat->statistic_id, "MemLife ", sizeof(stat->statistic_id));
+ stat->statistic_value = 0;
+
+ /* Fetch the fuel gauge and populate it in payload */
+ rc = drc_pmem_query_stats(p, stats, size, 1, NULL);
+ if (!rc) {
+ dev_dbg(&p->pdev->dev,
+ "Fetched fuel-gauge %llu\n", stat->statistic_value);
+ payload->health.extension_flags |=
+ PDSM_DIMM_HEALTH_RUN_GAUGE_VALID;
+ payload->health.dimm_fuel_gauge = stat->statistic_value;
+
+ rc = sizeof(struct nd_papr_pdsm_health);
+ }
+
+ kfree(stats);
+ return rc;
+}
+
/* Fetch the DIMM health info and populate it in provided package. */
static int papr_pdsm_health(struct papr_scm_priv *p,
union nd_pdsm_payload *payload)
@@ -546,6 +585,14 @@ static int papr_pdsm_health(struct papr_scm_priv *p,
/* struct populated hence can release the mutex now */
mutex_unlock(&p->health_mutex);
+
+ /* Populate the fuel gauge meter in the payload */
+ rc = papr_pdsm_fuel_gauge(p, payload);
+
+ /* Error fetching fuel gauge is not fatal */
+ if (rc < 0)
+ dev_dbg(&p->pdev->dev, "Err(%d) fetching fuel gauge\n", rc);
+
rc = sizeof(struct nd_papr_pdsm_health);
out:
--
2.26.2
* Re: [PATCH 1/2] powerpc/perf/hv-24x7: Add cpu hotplug support
From: kajoljain @ 2020-06-22 5:48 UTC
To: ego; +Cc: nathanl, maddy, suka, anju, linuxppc-dev
In-Reply-To: <20200619045801.GA13981@in.ibm.com>
On 6/19/20 10:28 AM, Gautham R Shenoy wrote:
> Hello Kajol,
>
> On Thu, Jun 18, 2020 at 05:57:12PM +0530, Kajol Jain wrote:
>> This patch adds cpu hotplug functions to the hv_24x7 pmu.
>> A new cpuhp_state "CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE" enum
>> is added.
>>
>> The online function updates the cpumask only if it is empty,
>> as the primary intention of adding hotplug support
>> is to designate a CPU to make the HCALL that collects the
>> count data.
>>
>> The offline function tests and clears the corresponding cpu in the
>> cpumask and updates the cpumask to any other active cpu.
>>
>> With this patchset, the perf tool side no longer needs "-C <cpu>"
>> to be added.
>>
>> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
>> ---
>> arch/powerpc/perf/hv-24x7.c | 45 +++++++++++++++++++++++++++++++++++++
>> include/linux/cpuhotplug.h | 1 +
>> 2 files changed, 46 insertions(+)
>>
>> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
>> index db213eb7cb02..fdc4ae155d60 100644
>> --- a/arch/powerpc/perf/hv-24x7.c
>> +++ b/arch/powerpc/perf/hv-24x7.c
>> @@ -31,6 +31,8 @@ static int interface_version;
>> /* Whether we have to aggregate result data for some domains. */
>> static bool aggregate_result_elements;
>>
>> +static cpumask_t hv_24x7_cpumask;
>> +
>> static bool domain_is_valid(unsigned domain)
>> {
>> switch (domain) {
>> @@ -1641,6 +1643,44 @@ static struct pmu h_24x7_pmu = {
>> .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
>> };
>>
>> +static int ppc_hv_24x7_cpu_online(unsigned int cpu)
>> +{
>> + /* Make this CPU the designated target for counter collection */
>> + if (cpumask_empty(&hv_24x7_cpumask))
>> + cpumask_set_cpu(cpu, &hv_24x7_cpumask);
>> +
>> + return 0;
>> +}
>> +
>> +static int ppc_hv_24x7_cpu_offline(unsigned int cpu)
>> +{
>> + int target = -1;
>> +
>> + /* Check if exiting cpu is used for collecting 24x7 events */
>> + if (!cpumask_test_and_clear_cpu(cpu, &hv_24x7_cpumask))
>> + return 0;
>> +
>> + /* Find a new cpu to collect 24x7 events */
>> + target = cpumask_any_but(cpu_active_mask, cpu);
>
> cpumask_any_but() typically picks the first CPU in cpu_active_mask
> that is not @cpu.
>
>
>> +
>> + if (target < 0 || target >= nr_cpu_ids)
>> + return -1;
>> +
>> + /* Migrate 24x7 events to the new target */
>> + cpumask_set_cpu(target, &hv_24x7_cpumask);
>> + perf_pmu_migrate_context(&h_24x7_pmu, cpu, target);
>
>
> On a system with N CPUs numbered [0..N-1], can you please verify whether
> the time required to sequentially offline CPUs [0..N-2], in that
> order, increases with this patch?
>
> I am asking this because we have encountered this problem once before
> at a customer site and the commit 9c9f8fb71fee ("powerpc/perf: Use
> cpumask_last() to determine the designated cpu for nest/core units.")
> was introduced to fix that problem.
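Gautham's concern can be made concrete with a small user-space simulation (not kernel code). It assumes CPU 0 starts as the designated CPU, as in the test output in the reply below, and counts one migration (one perf_pmu_migrate_context() call in the patch) each time the designated CPU goes offline. It compares a first-active-CPU policy, like cpumask_any_but(), with a last-active-CPU policy, like the cpumask_last() approach of commit 9c9f8fb71fee. NCPUS is a hypothetical system size.

```c
#include <stdbool.h>

#define NCPUS 8	/* hypothetical system size for the simulation */

/* Pick the first (pick_last == false) or last active CPU other than 'skip'. */
static int pick_target(const bool active[NCPUS], int skip, bool pick_last)
{
	int found = -1;

	for (int cpu = 0; cpu < NCPUS; cpu++) {
		if (!active[cpu] || cpu == skip)
			continue;
		if (!pick_last)
			return cpu;
		found = cpu;
	}
	return found;
}

/*
 * Offline CPUs 0..NCPUS-2 in order, starting with CPU 0 as the designated
 * CPU, and count how many times the designated CPU has to be migrated.
 */
static int count_migrations(bool pick_last)
{
	bool active[NCPUS];
	int target = 0, migrations = 0;

	for (int cpu = 0; cpu < NCPUS; cpu++)
		active[cpu] = true;

	for (int cpu = 0; cpu < NCPUS - 1; cpu++) {
		active[cpu] = false;
		if (cpu == target) {
			target = pick_target(active, cpu, pick_last);
			migrations++;
		}
	}
	return migrations;
}
```

With 8 CPUs the first-active policy migrates 7 times (once per offlined CPU), while the last-active policy migrates once, so under the first policy the migration cost of a sequential offline grows linearly with the number of CPUs. Kajol's non-sequential test below does not exercise this case.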
>
Hi Gautham,
Thanks for reviewing the patch. cpu_active_mask has the bit for a cpu set
only if that cpu is available. Even if we offline cpus non-sequentially,
"cpu_active_mask" is updated accordingly.
These are some of the tests I tried:
command:# cat /sys/devices/hv_24x7/cpumask
0
command:# echo 0 > /sys/devices/system/cpu/cpu0/online
command:# echo 0 > /sys/devices/system/cpu/cpu2/online
command:# cat /sys/devices/hv_24x7/cpumask
1
command:# echo 0 > /sys/devices/system/cpu/cpu1/online
command:# cat /sys/devices/hv_24x7/cpumask
3
Please let me know if my understanding is fine.
Thanks,
Kajol Jain
>> +
>> + return 0;
>> +}
>> +
>> +static int hv_24x7_cpu_hotplug_init(void)
>> +{
>> + return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE,
>> + "perf/powerpc/hv_24x7:online",
>> + ppc_hv_24x7_cpu_online,
>> + ppc_hv_24x7_cpu_offline);
>> +}
>> +
>> static int hv_24x7_init(void)
>> {
>> int r;
>> @@ -1685,6 +1725,11 @@ static int hv_24x7_init(void)
>> if (r)
>> return r;
>>
>> + /* init cpuhotplug */
>> + r = hv_24x7_cpu_hotplug_init();
>> + if (r)
>> + pr_err("hv_24x7: CPU hotplug init failed\n");
>> +
>> r = perf_pmu_register(&h_24x7_pmu, h_24x7_pmu.name, -1);
>> if (r)
>> return r;
>> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
>> index 8377afef8806..16ed8f6f8774 100644
>> --- a/include/linux/cpuhotplug.h
>> +++ b/include/linux/cpuhotplug.h
>> @@ -180,6 +180,7 @@ enum cpuhp_state {
>> CPUHP_AP_PERF_POWERPC_CORE_IMC_ONLINE,
>> CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE,
>> CPUHP_AP_PERF_POWERPC_TRACE_IMC_ONLINE,
>> + CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE,
>> CPUHP_AP_WATCHDOG_ONLINE,
>> CPUHP_AP_WORKQUEUE_ONLINE,
>> CPUHP_AP_RCUTREE_ONLINE,
>> --
>> 2.18.2
>>
* Re: [PATCH 2/2] powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show cpumask
From: kajoljain @ 2020-06-22 5:49 UTC
To: ego; +Cc: nathanl, maddy, suka, anju, linuxppc-dev
In-Reply-To: <20200619050501.GB13981@in.ibm.com>
On 6/19/20 10:35 AM, Gautham R Shenoy wrote:
> On Thu, Jun 18, 2020 at 05:57:13PM +0530, Kajol Jain wrote:
>> This patch adds a cpumask attr to the hv_24x7 pmu along with ABI documentation.
>>
>> command:# cat /sys/devices/hv_24x7/cpumask
>> 0
>>
>> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
>> ---
>> .../sysfs-bus-event_source-devices-hv_24x7 | 6 ++++
>> arch/powerpc/perf/hv-24x7.c | 31 ++++++++++++++++++-
>> 2 files changed, 36 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7 b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
>> index e8698afcd952..281e7b367733 100644
>> --- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
>> +++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
>> @@ -43,6 +43,12 @@ Description: read only
>> This sysfs interface exposes the number of cores per chip
>> present in the system.
>>
>> +What: /sys/devices/hv_24x7/cpumask
>> +Date: June 2020
>> +Contact: Linux on PowerPC Developer List <linuxppc-dev@lists.ozlabs.org>
>> +Description: read only
>> + This sysfs file exposes cpumask.
>
> Could you please describe in a little more detail what the
> cpumask is?
>
Hi Gautham,
Sure, I will add more detail.
>> +
>> What: /sys/bus/event_source/devices/hv_24x7/event_descs/<event-name>
>> Date: February 2014
>> Contact: Linux on PowerPC Developer List <linuxppc-dev@lists.ozlabs.org>
>> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
>> index fdc4ae155d60..03d870a9fc36 100644
>> --- a/arch/powerpc/perf/hv-24x7.c
>> +++ b/arch/powerpc/perf/hv-24x7.c
>> @@ -448,6 +448,12 @@ static ssize_t device_show_string(struct device *dev,
>> return sprintf(buf, "%s\n", (char *)d->var);
>> }
>>
>> +static ssize_t cpumask_get_attr(struct device *dev,
>> + struct device_attribute *attr, char *buf)
>> +{
>> + return cpumap_print_to_pagebuf(true, buf, &hv_24x7_cpumask);
>> +}
>> +
>> static ssize_t sockets_show(struct device *dev,
>> struct device_attribute *attr, char *buf)
>> {
>> @@ -1116,6 +1122,17 @@ static DEVICE_ATTR_RO(sockets);
>> static DEVICE_ATTR_RO(chipspersocket);
>> static DEVICE_ATTR_RO(coresperchip);
>>
>> +static DEVICE_ATTR(cpumask, S_IRUGO, cpumask_get_attr, NULL);
>> +
>> +static struct attribute *cpumask_attrs[] = {
>> + &dev_attr_cpumask.attr,
>> + NULL,
>> +};
>> +
>> +static struct attribute_group cpumask_attr_group = {
>> + .attrs = cpumask_attrs,
>> +};
>> +
>> static struct bin_attribute *if_bin_attrs[] = {
>> &bin_attr_catalog,
>> NULL,
>> @@ -1143,6 +1160,11 @@ static const struct attribute_group *attr_groups[] = {
>> &event_desc_group,
>> &event_long_desc_group,
>> &if_group,
>> + /*
>> + * This NULL is a placeholder for the cpumask attr, which is filled in
>> + * only if cpuhotplug registration is successful
>> + */
>> + NULL,
>> NULL,
>> };
>>
>> @@ -1727,8 +1749,15 @@ static int hv_24x7_init(void)
>>
>> /* init cpuhotplug */
>> r = hv_24x7_cpu_hotplug_init();
>> - if (r)
>> + if (r) {
>> pr_err("hv_24x7: CPU hotplug init failed\n");
>> + } else {
>> + /*
>> + * Cpu hotplug init is successful, add the
>> + * cpumask file as part of pmu attr group
>> + */
>> + attr_groups[5] = &cpumask_attr_group;
>
> Since this is only a one-time initialization, wouldn't it be safer to
> iterate through attr_groups[] and assign cpumask_attr_group to the
> first NULL location?
Yes, that's right. I will update that part.
Thanks,
Kajol Jain
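The approach Gautham suggests, filling the first free slot instead of hard-coding attr_groups[5], can be sketched as a generic helper. add_to_first_free_slot() is a hypothetical name used only for illustration; the key detail is that the final NULL entry must never be overwritten, because the array is NULL-terminated.

```c
#include <stddef.h>

/*
 * Store 'group' in the first empty slot of a NULL-terminated array of
 * 'n' entries, never touching the final terminator. Returns the index
 * used, or -1 if no spare slot is left.
 */
static int add_to_first_free_slot(const void **groups, size_t n,
				  const void *group)
{
	for (size_t i = 0; i + 1 < n; i++) {
		if (!groups[i]) {
			groups[i] = group;
			return (int)i;
		}
	}
	return -1;
}
```

Applied to an array shaped like attr_groups[] (populated groups, one placeholder NULL, one terminating NULL), the helper fills the placeholder and keeps working if groups are later added or removed ahead of it.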
>
>> + }
>>
>> r = perf_pmu_register(&h_24x7_pmu, h_24x7_pmu.name, -1);
>> if (r)
>> --
>> 2.18.2
>>
* [PATCH] powerpc/mm/book3s64: Skip 16G page reservation with radix
From: Aneesh Kumar K.V @ 2020-06-22 6:40 UTC
To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V
With hash translation, the hypervisor can hint the LPAR about 16GB contiguous
ranges via ibm,expected#pages. The kernel marks the ranges specified in the
device tree as reserved. Avoid doing this when using radix translation. Radix
translation supports only 1G gigantic hugepages, and the kernel can do the 1G
gigantic hugepage allocation via early memblock reservation. This works because
with radix translation, pages are not required to be contiguous on the host.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 0124003e60d0..65ab00566233 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -596,7 +596,7 @@ static void __init htab_scan_page_sizes(void)
}
#ifdef CONFIG_HUGETLB_PAGE
- if (!hugetlb_disabled) {
+ if (!hugetlb_disabled && !early_radix_enabled()) {
/* Reserve 16G huge page memory sections for huge pages */
of_scan_flat_dt(htab_dt_scan_hugepage_blocks, NULL);
}
--
2.26.2
* [PATCH 0/6] Prefixed instruction tests to cover negative cases
From: Balamuruhan S @ 2020-06-22 7:09 UTC
To: mpe
Cc: ravi.bangoria, jniethe5, Balamuruhan S, paulus, sandipan,
naveen.n.rao, linuxppc-dev
This patchset adds support for testing negative scenarios and adds a
testcase for paddi, along with a few fixes. It is based on powerpc/next
and on top of Jordan's patchset of tests for prefixed instructions,
https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-May/211394.html
Balamuruhan S (6):
powerpc test_emulate_step: update nip with patched instruction address
powerpc test_emulate_step: fix pr_info() to print 8-byte for prefixed
instruction
powerpc test_emulate_step: enhancement to test negative scenarios
powerpc test_emulate_step: add negative tests for prefixed addi
powerpc sstep: introduce macros to retrieve Prefix instruction
operands
powerpc test_emulate_step: move extern declaration to sstep.h
arch/powerpc/include/asm/sstep.h | 6 +++
arch/powerpc/lib/sstep.c | 12 ++---
arch/powerpc/lib/test_emulate_step.c | 78 +++++++++++++++++++++++-----
3 files changed, 77 insertions(+), 19 deletions(-)
--
2.24.1
* [PATCH 1/6] powerpc test_emulate_step: update nip with patched instruction address
From: Balamuruhan S @ 2020-06-22 7:09 UTC
To: mpe
Cc: ravi.bangoria, jniethe5, Balamuruhan S, paulus, sandipan,
naveen.n.rao, linuxppc-dev
In-Reply-To: <20200622070941.759307-1-bala24@linux.ibm.com>
pt_regs are initialized to zero in the test infrastructure. The R bit
in the prefixed instruction form specifies whether the effective
address of the storage operand is computed relative to the address
of the instruction.
If R = 1 and RA = R0|0, the sum of the address of the instruction
and the value SI is placed into register RT. So, to compare the emulated
instruction against the executed instruction, update the nip of the
emulated pt_regs.
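The check this patch adds to emulate_compute_instr() can be isolated as a small user-space function. is_pc_relative() is a hypothetical name; it applies the same masks as the patch: the R bit is (1 << 20) of the prefix word, and RA is bits 16..20 (counting from the least significant bit) of the suffix word. The example words in the test are assumed hand-assembled encodings chosen only so that R and RA take the values being checked.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirror of the patch's check: true when a prefixed instruction has
 * R = 1 and RA = 0, i.e. its result depends on the instruction address,
 * so the emulated pt_regs need the real nip.
 */
static bool is_pc_relative(uint32_t prefix, uint32_t suffix)
{
	bool r = prefix & (1UL << 20);
	unsigned int ra = (suffix >> 16) & 0x1f;

	return r && ra == 0;
}
```

Only when this returns true does the emulated result depend on nip, which is why the patch sets regs->nip to the patch site address in exactly that case.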
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
---
arch/powerpc/lib/test_emulate_step.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/lib/test_emulate_step.c b/arch/powerpc/lib/test_emulate_step.c
index 33a72b7d2764..d5902b7b4e5c 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -1204,13 +1204,24 @@ static struct compute_test compute_tests[] = {
static int __init emulate_compute_instr(struct pt_regs *regs,
struct ppc_inst instr)
{
+ int prefix_r, ra;
extern s32 patch__exec_instr;
struct instruction_op op;
if (!regs || !ppc_inst_val(instr))
return -EINVAL;
- regs->nip = patch_site_addr(&patch__exec_instr);
+ /*
+ * If R=1 and RA=0 in the prefixed instruction form, calculate the address
+ * of the instruction and update nip so the result can be compared with the
+ * executed instruction
+ */
+ if (ppc_inst_prefixed(instr)) {
+ prefix_r = ppc_inst_val(instr) & (1UL << 20);
+ ra = (ppc_inst_suffix(instr) >> 16) & 0x1f;
+ if (prefix_r && !ra)
+ regs->nip = patch_site_addr(&patch__exec_instr);
+ }
if (analyse_instr(&op, regs, instr) != 1 ||
GETTYPE(op.type) != COMPUTE) {
--
2.24.1
* [PATCH 2/6] powerpc test_emulate_step: fix pr_info() to print 8-byte for prefixed instruction
From: Balamuruhan S @ 2020-06-22 7:09 UTC
To: mpe
Cc: ravi.bangoria, jniethe5, Balamuruhan S, paulus, sandipan,
naveen.n.rao, linuxppc-dev
In-Reply-To: <20200622070941.759307-1-bala24@linux.ibm.com>
On test failure, `pr_info()` prints only a 4-byte instruction,
irrespective of whether it is a word or a prefixed instruction. Fix it
by printing both words for prefixed instructions.
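Since the same word/prefixed if/else ends up at three call sites once patch 3/6 is applied, it could be factored into one helper. A hypothetical user-space sketch: format_instr() is an illustrative name, and its flag and word arguments stand in for ppc_inst_prefixed(), ppc_inst_val() and ppc_inst_suffix().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a word or prefixed instruction for a log message, printing the
 * suffix word only for prefixed instructions. Returns snprintf()'s result.
 */
static int format_instr(char *buf, size_t len, bool prefixed,
			uint32_t word, uint32_t suffix)
{
	if (!prefixed)
		return snprintf(buf, len, "0x%08x", (unsigned int)word);
	return snprintf(buf, len, "0x%08x 0x%08x", (unsigned int)word,
			(unsigned int)suffix);
}
```

Each failure message could then print the formatted string with a single "%s", keeping the word and prefixed cases consistent.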
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
---
arch/powerpc/lib/test_emulate_step.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/lib/test_emulate_step.c b/arch/powerpc/lib/test_emulate_step.c
index d5902b7b4e5c..e3b1797adfae 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -1225,7 +1225,14 @@ static int __init emulate_compute_instr(struct pt_regs *regs,
if (analyse_instr(&op, regs, instr) != 1 ||
GETTYPE(op.type) != COMPUTE) {
- pr_info("emulation failed, instruction = 0x%08x\n", ppc_inst_val(instr));
+ if (!ppc_inst_prefixed(instr)) {
+ pr_info("emulation failed, instruction = 0x%08x\n",
+ ppc_inst_val(instr));
+ } else {
+ pr_info("emulation failed, instruction = 0x%08x 0x%08x\n",
+ ppc_inst_val(instr),
+ ppc_inst_suffix(instr));
+ }
return -EFAULT;
}
--
2.24.1
* [PATCH 3/6] powerpc test_emulate_step: enhancement to test negative scenarios
From: Balamuruhan S @ 2020-06-22 7:09 UTC
To: mpe
Cc: ravi.bangoria, jniethe5, Balamuruhan S, paulus, sandipan,
naveen.n.rao, linuxppc-dev
In-Reply-To: <20200622070941.759307-1-bala24@linux.ibm.com>
Add a provision to declare that a test is a negative scenario, verify
that emulation fails for it, and avoid executing it.
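The intended rule for the emulation stage can be summarised in a few lines. This is a hypothetical sketch of the expectation described above, not code from the patch: a positive subtest must be analysed successfully (and is later compared against the executed result), while a negative subtest is expected to be rejected by analyse_instr() and is never executed.

```c
#include <stdbool.h>

/*
 * Hypothetical summary: the emulation outcome is as expected when a
 * positive subtest emulates successfully or a negative subtest fails.
 * Positive subtests must additionally match the executed result.
 */
static bool emulation_outcome_expected(bool emulation_ok, bool negative)
{
	return emulation_ok != negative;
}
```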
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
---
arch/powerpc/lib/test_emulate_step.c | 46 ++++++++++++++++++++++------
1 file changed, 36 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/lib/test_emulate_step.c b/arch/powerpc/lib/test_emulate_step.c
index e3b1797adfae..79acc899a618 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -703,6 +703,7 @@ struct compute_test {
unsigned long flags;
struct ppc_inst instr;
struct pt_regs regs;
+ bool negative;
} subtests[MAX_SUBTESTS + 1];
};
@@ -1202,9 +1203,10 @@ static struct compute_test compute_tests[] = {
};
static int __init emulate_compute_instr(struct pt_regs *regs,
- struct ppc_inst instr)
+ struct ppc_inst instr,
+ bool negative)
{
- int prefix_r, ra;
+ int prefix_r, ra, analysed;
extern s32 patch__exec_instr;
struct instruction_op op;
@@ -1223,8 +1225,10 @@ static int __init emulate_compute_instr(struct pt_regs *regs,
regs->nip = patch_site_addr(&patch__exec_instr);
}
- if (analyse_instr(&op, regs, instr) != 1 ||
- GETTYPE(op.type) != COMPUTE) {
+ analysed = analyse_instr(&op, regs, instr);
+ if (analysed != 1 || GETTYPE(op.type) != COMPUTE) {
+ if (negative)
+ return -EFAULT;
if (!ppc_inst_prefixed(instr)) {
pr_info("emulation failed, instruction = 0x%08x\n",
ppc_inst_val(instr));
@@ -1235,8 +1239,18 @@ static int __init emulate_compute_instr(struct pt_regs *regs,
}
return -EFAULT;
}
-
- emulate_update_regs(regs, &op);
+ if (analysed == 1 && negative) {
+ if (!ppc_inst_prefixed(instr)) {
+ pr_info("negative test failed, instruction = 0x%08x\n",
+ ppc_inst_val(instr));
+ } else {
+ pr_info("negative test failed, instruction = 0x%08x 0x%08x\n",
+ ppc_inst_val(instr),
+ ppc_inst_suffix(instr));
+ }
+ }
+ if (!negative)
+ emulate_update_regs(regs, &op);
return 0;
}
@@ -1252,7 +1266,14 @@ static int __init execute_compute_instr(struct pt_regs *regs,
/* Patch the NOP with the actual instruction */
patch_instruction_site(&patch__exec_instr, instr);
if (exec_instr(regs)) {
- pr_info("execution failed, instruction = 0x%08x\n", ppc_inst_val(instr));
+ if (!ppc_inst_prefixed(instr)) {
+ pr_info("execution failed, instruction = 0x%08x\n",
+ ppc_inst_val(instr));
+ } else {
+ pr_info("execution failed, instruction = 0x%08x 0x%08x\n",
+ ppc_inst_val(instr),
+ ppc_inst_suffix(instr));
+ }
return -EFAULT;
}
@@ -1274,7 +1295,7 @@ static void __init run_tests_compute(void)
struct pt_regs *regs, exp, got;
unsigned int i, j, k;
struct ppc_inst instr;
- bool ignore_gpr, ignore_xer, ignore_ccr, passed;
+ bool ignore_gpr, ignore_xer, ignore_ccr, passed, rc, negative;
for (i = 0; i < ARRAY_SIZE(compute_tests); i++) {
test = &compute_tests[i];
@@ -1288,6 +1309,7 @@ static void __init run_tests_compute(void)
instr = test->subtests[j].instr;
flags = test->subtests[j].flags;
regs = &test->subtests[j].regs;
+ negative = test->subtests[j].negative;
ignore_xer = flags & IGNORE_XER;
ignore_ccr = flags & IGNORE_CCR;
passed = true;
@@ -1302,8 +1324,12 @@ static void __init run_tests_compute(void)
exp.msr = MSR_KERNEL;
got.msr = MSR_KERNEL;
- if (emulate_compute_instr(&got, instr) ||
- execute_compute_instr(&exp, instr)) {
+ rc = emulate_compute_instr(&got, instr, negative) != 0;
+ if (negative) {
+ /* skip executing instruction */
+ passed = rc;
+ goto print;
+ } else if (rc || execute_compute_instr(&exp, instr)) {
passed = false;
goto print;
}
--
2.24.1
* [PATCH 4/6] powerpc test_emulate_step: add negative tests for prefixed addi
From: Balamuruhan S @ 2020-06-22 7:09 UTC
To: mpe
Cc: ravi.bangoria, jniethe5, Balamuruhan S, paulus, sandipan,
naveen.n.rao, linuxppc-dev
In-Reply-To: <20200622070941.759307-1-bala24@linux.ibm.com>
Add test cases for the `paddi` instruction that cover the negative
case: if R is equal to 1 and RA is not equal to 0, the instruction
form is invalid.
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
---
arch/powerpc/lib/test_emulate_step.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/powerpc/lib/test_emulate_step.c b/arch/powerpc/lib/test_emulate_step.c
index 79acc899a618..f9825c275c31 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -1197,6 +1197,16 @@ static struct compute_test compute_tests[] = {
.regs = {
.gpr[21] = 0,
}
+ },
+ /* Invalid instruction form with R = 1 and RA != 0 */
+ {
+ .descr = "RA = R22(0), SI = 0, R = 1",
+ .instr = TEST_PADDI(21, 22, 0, 1),
+ .negative = true,
+ .regs = {
+ .gpr[21] = 0,
+ .gpr[22] = 0,
+ }
}
}
}
--
2.24.1
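The invalid-form rule in this patch depends on the RA *field* (register number 22 in the new subtest), not on the value held in r22, which the test deliberately sets to 0. A minimal sketch of that check, assuming the Power ISA v3.1 MLS:D-form layout (prefix opcode 1, type 2, R at bit 20; suffix opcode 14); encode_paddi() and paddi_form_valid() are hypothetical helpers written for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same operand extractors as the GET_PREFIX_* macros added in patch 5/6. */
#define GET_PREFIX_RA(i) (((i) >> 16) & 0x1f)
#define GET_PREFIX_R(i)  ((i) & (1ul << 20))

/*
 * Hypothetical encoder for "paddi RT,RA,SI,R": the high 18 bits of the
 * 34-bit SI go in the prefix, the low 16 bits in the addi-style suffix.
 */
static void encode_paddi(unsigned int rt, unsigned int ra, int64_t si,
			 unsigned int r, uint32_t *prefix, uint32_t *suffix)
{
	assert(r <= 1);
	*prefix = (1u << 26) | (2u << 24) | (r << 20) |
		  ((uint32_t)((uint64_t)si >> 16) & 0x3ffff);
	*suffix = (14u << 26) | ((rt & 0x1f) << 21) | ((ra & 0x1f) << 16) |
		  ((uint32_t)si & 0xffff);
}

/* R = 1 (PC-relative) is only a valid form when the RA field is 0. */
static bool paddi_form_valid(uint32_t prefix, uint32_t suffix)
{
	return !(GET_PREFIX_R(prefix) && GET_PREFIX_RA(suffix) != 0);
}
```

Under these assumptions, encode_paddi(21, 22, 0, 1, ...) should correspond to TEST_PADDI(21, 22, 0, 1) from the patch and be rejected, while the same instruction with RA = 0 or R = 0 is a valid form.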
* [PATCH 5/6] powerpc sstep: introduce macros to retrieve Prefix instruction operands
From: Balamuruhan S @ 2020-06-22 7:09 UTC
To: mpe
Cc: ravi.bangoria, jniethe5, Balamuruhan S, paulus, sandipan,
naveen.n.rao, linuxppc-dev
In-Reply-To: <20200622070941.759307-1-bala24@linux.ibm.com>
Retrieve the prefixed instruction operand RA and the PC-relative bit R
using macros, and adopt them in sstep.c and test_emulate_step.c.
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
---
arch/powerpc/include/asm/sstep.h | 4 ++++
arch/powerpc/lib/sstep.c | 12 ++++++------
arch/powerpc/lib/test_emulate_step.c | 4 ++--
3 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h
index 3b01c69a44aa..325975b4ef30 100644
--- a/arch/powerpc/include/asm/sstep.h
+++ b/arch/powerpc/include/asm/sstep.h
@@ -104,6 +104,10 @@ enum instruction_type {
#define MKOP(t, f, s) ((t) | (f) | SIZE(s))
+/* Prefix instruction operands */
+#define GET_PREFIX_RA(i) (((i) >> 16) & 0x1f)
+#define GET_PREFIX_R(i) ((i) & (1ul << 20))
+
struct instruction_op {
int type;
int reg;
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 5abe98216dc2..fb4c5767663d 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -200,8 +200,8 @@ static nokprobe_inline unsigned long mlsd_8lsd_ea(unsigned int instr,
unsigned int dd;
unsigned long ea, d0, d1, d;
- prefix_r = instr & (1ul << 20);
- ra = (suffix >> 16) & 0x1f;
+ prefix_r = GET_PREFIX_R(instr);
+ ra = GET_PREFIX_RA(suffix);
d0 = instr & 0x3ffff;
d1 = suffix & 0xffff;
@@ -1339,8 +1339,8 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
switch (opcode) {
#ifdef __powerpc64__
case 1:
- prefix_r = word & (1ul << 20);
- ra = (suffix >> 16) & 0x1f;
+ prefix_r = GET_PREFIX_R(word);
+ ra = GET_PREFIX_RA(suffix);
rd = (suffix >> 21) & 0x1f;
op->reg = rd;
op->val = regs->gpr[rd];
@@ -2715,8 +2715,8 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
}
break;
case 1: /* Prefixed instructions */
- prefix_r = word & (1ul << 20);
- ra = (suffix >> 16) & 0x1f;
+ prefix_r = GET_PREFIX_R(word);
+ ra = GET_PREFIX_RA(suffix);
op->update_reg = ra;
rd = (suffix >> 21) & 0x1f;
op->reg = rd;
diff --git a/arch/powerpc/lib/test_emulate_step.c b/arch/powerpc/lib/test_emulate_step.c
index f9825c275c31..f1a447026b6e 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -1229,8 +1229,8 @@ static int __init emulate_compute_instr(struct pt_regs *regs,
* instruction
*/
if (ppc_inst_prefixed(instr)) {
- prefix_r = ppc_inst_val(instr) & (1UL << 20);
- ra = (ppc_inst_suffix(instr) >> 16) & 0x1f;
+ prefix_r = GET_PREFIX_R(ppc_inst_val(instr));
+ ra = GET_PREFIX_RA(ppc_inst_suffix(instr));
if (prefix_r && !ra)
regs->nip = patch_site_addr(&patch__exec_instr);
}
--
2.24.1
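For context, the two macros feed the MLS/8LS D-form effective-address calculation: the 18-bit d0 from the prefix and the 16-bit d1 from the suffix form a 34-bit signed displacement, which is added to the NIP when R = 1 or to (RA|0) when R = 0. A standalone sketch, assuming the Power ISA v3.1 field positions; prefixed_d_ea() is a hypothetical helper, not the kernel's mlsd_8lsd_ea().

```c
#include <assert.h>
#include <stdint.h>

#define GET_PREFIX_RA(i) (((i) >> 16) & 0x1f)
#define GET_PREFIX_R(i)  ((i) & (1ul << 20))

/*
 * Hypothetical helper: EA of an MLS/8LS D-form prefixed load/store.
 * Returns 0 and stores the EA in *ea, or -1 for the invalid
 * R = 1, RA != 0 form.
 */
static int prefixed_d_ea(uint32_t prefix, uint32_t suffix,
			 const uint64_t gpr[32], uint64_t nip, uint64_t *ea)
{
	unsigned int ra = GET_PREFIX_RA(suffix);
	/* d0 (18 bits, prefix) : d1 (16 bits, suffix) = 34-bit displacement */
	uint64_t d = ((uint64_t)(prefix & 0x3ffff) << 16) | (suffix & 0xffff);

	assert(gpr && ea);

	/* Sign-extend from bit 33 using unsigned arithmetic only. */
	if (d & (UINT64_C(1) << 33))
		d |= ~UINT64_C(0) << 34;

	if (GET_PREFIX_R(prefix)) {
		if (ra != 0)
			return -1;
		*ea = nip + d;			/* PC-relative */
	} else {
		/* (RA|0): RA = 0 means "no base register", not r0 */
		*ea = (ra ? gpr[ra] : 0) + d;
	}
	return 0;
}
```

Note that GET_PREFIX_R() yields 0 or 0x100000 rather than 0 or 1, so callers should test it for truth, as the converted call sites do, and never compare it with 1.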
* [PATCH 6/6] powerpc test_emulate_step: move extern declaration to sstep.h
From: Balamuruhan S @ 2020-06-22 7:09 UTC
To: mpe
Cc: ravi.bangoria, jniethe5, Balamuruhan S, paulus, sandipan,
naveen.n.rao, linuxppc-dev
In-Reply-To: <20200622070941.759307-1-bala24@linux.ibm.com>
Fix checkpatch.pl warnings by moving the extern declaration from the
source file to the header file.
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
---
arch/powerpc/include/asm/sstep.h | 2 ++
arch/powerpc/lib/test_emulate_step.c | 2 --
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h
index 325975b4ef30..c8e37ef060c1 100644
--- a/arch/powerpc/include/asm/sstep.h
+++ b/arch/powerpc/include/asm/sstep.h
@@ -108,6 +108,8 @@ enum instruction_type {
#define GET_PREFIX_RA(i) (((i) >> 16) & 0x1f)
#define GET_PREFIX_R(i) ((i) & (1ul << 20))
+extern s32 patch__exec_instr;
+
struct instruction_op {
int type;
int reg;
diff --git a/arch/powerpc/lib/test_emulate_step.c b/arch/powerpc/lib/test_emulate_step.c
index f1a447026b6e..386245607568 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -1217,7 +1217,6 @@ static int __init emulate_compute_instr(struct pt_regs *regs,
bool negative)
{
int prefix_r, ra, analysed;
- extern s32 patch__exec_instr;
struct instruction_op op;
if (!regs || !ppc_inst_val(instr))
@@ -1268,7 +1267,6 @@ static int __init execute_compute_instr(struct pt_regs *regs,
struct ppc_inst instr)
{
extern int exec_instr(struct pt_regs *regs);
- extern s32 patch__exec_instr;
if (!regs || !ppc_inst_val(instr))
return -EINVAL;
--
2.24.1
* [PATCH] ASoC: fsl_mqs: Fix unchecked return value for clk_prepare_enable
From: Shengjiu Wang @ 2020-06-22 8:48 UTC
To: timur, nicoleotsuka, Xiubo.Lee, festevam, broonie, perex, tiwai,
alsa-devel
Cc: linuxppc-dev, linux-kernel
Fix the unchecked return value of clk_prepare_enable().
Because clk_prepare_enable() and clk_disable_unprepare() already
handle a NULL clock, there is no need to check for NULL before
calling them.
Fixes: 9e28f6532c61 ("ASoC: fsl_mqs: Add MQS component driver")
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
---
sound/soc/fsl/fsl_mqs.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/sound/soc/fsl/fsl_mqs.c b/sound/soc/fsl/fsl_mqs.c
index 0c813a45bba7..69aeb0e71844 100644
--- a/sound/soc/fsl/fsl_mqs.c
+++ b/sound/soc/fsl/fsl_mqs.c
@@ -265,12 +265,20 @@ static int fsl_mqs_remove(struct platform_device *pdev)
static int fsl_mqs_runtime_resume(struct device *dev)
{
struct fsl_mqs *mqs_priv = dev_get_drvdata(dev);
+ int ret;
- if (mqs_priv->ipg)
- clk_prepare_enable(mqs_priv->ipg);
+ ret = clk_prepare_enable(mqs_priv->ipg);
+ if (ret) {
+ dev_err(dev, "failed to enable ipg clock\n");
+ return ret;
+ }
- if (mqs_priv->mclk)
- clk_prepare_enable(mqs_priv->mclk);
+ ret = clk_prepare_enable(mqs_priv->mclk);
+ if (ret) {
+ dev_err(dev, "failed to enable mclk clock\n");
+ clk_disable_unprepare(mqs_priv->ipg);
+ return ret;
+ }
if (mqs_priv->use_gpr)
regmap_write(mqs_priv->regmap, IOMUXC_GPR2,
@@ -292,11 +300,8 @@ static int fsl_mqs_runtime_suspend(struct device *dev)
regmap_read(mqs_priv->regmap, REG_MQS_CTRL,
&mqs_priv->reg_mqs_ctrl);
- if (mqs_priv->mclk)
- clk_disable_unprepare(mqs_priv->mclk);
-
- if (mqs_priv->ipg)
- clk_disable_unprepare(mqs_priv->ipg);
+ clk_disable_unprepare(mqs_priv->mclk);
+ clk_disable_unprepare(mqs_priv->ipg);
return 0;
}
--
2.21.0
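The unwind order in the patched fsl_mqs_runtime_resume() can be exercised outside the kernel with a mock clock. A minimal sketch: struct mock_clk and the mock_clk_* functions are hypothetical stand-ins that only imitate two behaviours of the common clock framework (a NULL clock is a successful no-op, and disabling an already-disabled clock is a bug); they are not the real clk API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MOCK_EIO 5

/* Hypothetical stand-in for struct clk; fail_enable simulates an error. */
struct mock_clk {
	int enabled;
	int fail_enable;
};

/* A NULL clock is treated as "nothing to do" and reports success. */
static int mock_clk_prepare_enable(struct mock_clk *clk)
{
	if (!clk)
		return 0;
	if (clk->fail_enable)
		return -MOCK_EIO;
	clk->enabled = 1;
	return 0;
}

static void mock_clk_disable_unprepare(struct mock_clk *clk)
{
	if (!clk)
		return;
	/* The real framework warns on an unbalanced disable; trap it here. */
	assert(clk->enabled);
	clk->enabled = 0;
}

/* Same shape as the patched resume path: undo only what was enabled. */
static int resume_clocks(struct mock_clk *ipg, struct mock_clk *mclk)
{
	int ret;

	ret = mock_clk_prepare_enable(ipg);
	if (ret) {
		fprintf(stderr, "failed to enable ipg clock\n");
		return ret;
	}

	ret = mock_clk_prepare_enable(mclk);
	if (ret) {
		fprintf(stderr, "failed to enable mclk clock\n");
		mock_clk_disable_unprepare(ipg);
		return ret;
	}

	return 0;
}
```

If mclk fails, ipg is disabled again, so the runtime PM core sees the device as still suspended with balanced clock counts.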
* [PATCH] ASoC: fsl_easrc: Fix uninitialized scalar variable in fsl_easrc_set_ctx_format
From: Shengjiu Wang @ 2020-06-22 9:03 UTC
To: timur, nicoleotsuka, Xiubo.Lee, festevam, broonie, alsa-devel,
lgirdwood, perex, tiwai
Cc: linuxppc-dev, linux-kernel
The "ret" variable in fsl_easrc_set_ctx_format() is not initialized,
so the function may return an undefined value.
Fixes: 955ac624058f ("ASoC: fsl_easrc: Add EASRC ASoC CPU DAI drivers")
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
---
sound/soc/fsl/fsl_easrc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/fsl/fsl_easrc.c b/sound/soc/fsl/fsl_easrc.c
index 2f6b3d8bfcfc..03b3aef41d34 100644
--- a/sound/soc/fsl/fsl_easrc.c
+++ b/sound/soc/fsl/fsl_easrc.c
@@ -1132,7 +1132,7 @@ static int fsl_easrc_set_ctx_format(struct fsl_asrc_pair *ctx,
struct fsl_easrc_ctx_priv *ctx_priv = ctx->private;
struct fsl_easrc_data_fmt *in_fmt = &ctx_priv->in_params.fmt;
struct fsl_easrc_data_fmt *out_fmt = &ctx_priv->out_params.fmt;
- int ret;
+ int ret = 0;
/* Get the bitfield values for input data format */
if (in_raw_format && out_raw_format) {
--
2.21.0
* Re: [PATCH 3/6] powerpc test_emulate_step: enhancement to test negative scenarios
From: Sandipan Das @ 2020-06-22 9:34 UTC
To: Balamuruhan S; +Cc: ravi.bangoria, paulus, jniethe5, naveen.n.rao, linuxppc-dev
In-Reply-To: <20200622070941.759307-4-bala24@linux.ibm.com>
Hi Bala,
On 22/06/20 12:39 pm, Balamuruhan S wrote:
> add provision to declare test is a negative scenario, verify
> whether emulation fails and avoid executing it.
>
> Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
> ---
> arch/powerpc/lib/test_emulate_step.c | 46 ++++++++++++++++++++++------
> 1 file changed, 36 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/lib/test_emulate_step.c b/arch/powerpc/lib/test_emulate_step.c
> index e3b1797adfae..79acc899a618 100644
> --- a/arch/powerpc/lib/test_emulate_step.c
> +++ b/arch/powerpc/lib/test_emulate_step.c
> @@ -703,6 +703,7 @@ struct compute_test {
> unsigned long flags;
> struct ppc_inst instr;
> struct pt_regs regs;
> + bool negative;
> } subtests[MAX_SUBTESTS + 1];
> };
>
Bits of 'flags' are currently used to specify whether parts of the resulting
pt_regs are to be ignored. Instead of adding a new member to the struct, can we
not do this using a bit in 'flags'?
- Sandipan
* Re: [PATCH 1/4] powerpc/pseries/iommu: Update call to ibm,query-pe-dma-windows
From: Alexey Kardashevskiy @ 2020-06-22 10:02 UTC
To: Leonardo Bras
Cc: Ram Pai, linux-kernel, Paul Mackerras, linuxppc-dev,
Thiago Jung Bauermann
In-Reply-To: <20200619050619.266888-2-leobras.c@gmail.com>
On 19/06/2020 15:06, Leonardo Bras wrote:
> From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can make the number of
> outputs from "ibm,query-pe-dma-windows" go from 5 to 6.
>
> This change of output size is meant to expand the address size of
> largest_available_block PE TCE from 32-bit to 64-bit, which ends up
> shifting page_size and migration_capable.
>
> This ends up requiring the update of
> ddw_query_response->largest_available_block from u32 to u64, and manually
> assigning the values from the buffer into this struct, according to
> output size.
>
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
> arch/powerpc/platforms/pseries/iommu.c | 57 +++++++++++++++++++++-----
> 1 file changed, 46 insertions(+), 11 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 6d47b4a3ce39..e5a617738c8b 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -334,7 +334,7 @@ struct direct_window {
> /* Dynamic DMA Window support */
> struct ddw_query_response {
> u32 windows_available;
> - u32 largest_available_block;
> + u64 largest_available_block;
> u32 page_size;
> u32 migration_capable;
> };
> @@ -869,14 +869,32 @@ static int find_existing_ddw_windows(void)
> }
> machine_arch_initcall(pseries, find_existing_ddw_windows);
>
> +/*
> + * From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can rule how many output
> + * parameters ibm,query-pe-dma-windows will have, ranging from 5 to 6.
> + */
> +
> +static int query_ddw_out_sz(struct device_node *par_dn)
Can easily be folded into query_ddw().
> +{
> + int ret;
> + u32 ddw_ext[3];
> +
> + ret = of_property_read_u32_array(par_dn, "ibm,ddw-extensions",
> + &ddw_ext[0], 3);
> + if (ret || ddw_ext[0] < 2 || ddw_ext[2] != 1)
Oh that PAPR thing again :-/
===
The “ibm,ddw-extensions” property value is a list of integers the first
integer indicates the number of extensions implemented and subsequent
integers, one per extension, provide a value associated with that
extension.
===
So ddw_ext[0] is the length.
Listindex==2 is for "reset", says PAPR, and
Listindex==3 is for this new 64-bit "largest_available_block".
So I'd expect ddw_ext[2] to have the "reset" token and ddw_ext[3] to
have "1" for this new feature, but the indexes used here are smaller. I am confused.
Either way, these "2" and "3" need to be defined as macros, and "0" probably
too.
Please post 'lsprop "ibm,ddw-extensions"' here. Thanks,
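One way to read the PAPR text is with 1-based list indexes, where list index 1 is the count cell. Under that assumption (not confirmed in this thread), list index 2 (the reset token) lands in ddw_ext[1] and list index 3 (the 64-bit largest_available_block flag) in ddw_ext[2], which is how 2/4 reads the reset token. A minimal sketch of the named indexes asked for above; DDW_EXT_*, ddw_read_ext() and ddw_query_nret() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 0-based cell indexes into "ibm,ddw-extensions" (assumed mapping). */
#define DDW_EXT_SIZE		0	/* number of extensions that follow */
#define DDW_EXT_RESET_DMA_WIN	1	/* ibm,reset-pe-dma-windows token */
#define DDW_EXT_QUERY_OUT_SIZE	2	/* 1: 64-bit largest_available_block */

/*
 * Fetch extension @idx. Fails if the property has fewer cells than needed
 * or if the count cell says the platform implements fewer extensions.
 */
static int ddw_read_ext(const uint32_t *ext, size_t ncells, unsigned int idx,
			uint32_t *value)
{
	assert(ext && value);
	if (ncells <= idx)
		return -1;
	if (idx != DDW_EXT_SIZE && ext[DDW_EXT_SIZE] < idx)
		return -1;
	*value = ext[idx];
	return 0;
}

/* nret for ibm,query-pe-dma-windows: 6 if the 64-bit extension is set, else 5. */
static int ddw_query_nret(const uint32_t *ext, size_t ncells)
{
	uint32_t v;

	if (ddw_read_ext(ext, ncells, DDW_EXT_QUERY_OUT_SIZE, &v) || v != 1)
		return 5;
	return 6;
}
```

With names like these, the `ddw_ext[0] < 2` check in query_ddw_out_sz() becomes a bounds check on DDW_EXT_QUERY_OUT_SIZE, and the reset patch can read DDW_EXT_RESET_DMA_WIN without requiring the third cell.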
> + return 5;
> + return 6;
> +}
> +
> static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
> - struct ddw_query_response *query)
> + struct ddw_query_response *query,
> + struct device_node *par_dn)
> {
> struct device_node *dn;
> struct pci_dn *pdn;
> - u32 cfg_addr;
> + u32 cfg_addr, query_out[5];
> u64 buid;
> - int ret;
> + int ret, out_sz;
>
> /*
> * Get the config address and phb buid of the PE window.
> @@ -888,12 +906,29 @@ static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
> pdn = PCI_DN(dn);
> buid = pdn->phb->buid;
> cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
> + out_sz = query_ddw_out_sz(par_dn);
> +
> + ret = rtas_call(ddw_avail[0], 3, out_sz, query_out,
> + cfg_addr, BUID_HI(buid), BUID_LO(buid));
> + dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x returned %d\n",
> + ddw_avail[0], cfg_addr, BUID_HI(buid), BUID_LO(buid), ret);
> +
> + switch (out_sz) {
> + case 5:
> + query->windows_available = query_out[0];
> + query->largest_available_block = query_out[1];
> + query->page_size = query_out[2];
> + query->migration_capable = query_out[3];
> + break;
> + case 6:
> + query->windows_available = query_out[0];
> + query->largest_available_block = ((u64)query_out[1] << 32) |
> + query_out[2];
> + query->page_size = query_out[3];
> + query->migration_capable = query_out[4];
> + break;
> + }
>
> - ret = rtas_call(ddw_avail[0], 3, 5, (u32 *)query,
> - cfg_addr, BUID_HI(buid), BUID_LO(buid));
> - dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
> - " returned %d\n", ddw_avail[0], cfg_addr, BUID_HI(buid),
> - BUID_LO(buid), ret);
> return ret;
> }
>
> @@ -1040,7 +1075,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> * of page sizes: supported and supported for migrate-dma.
> */
> dn = pci_device_to_OF_node(dev);
> - ret = query_ddw(dev, ddw_avail, &query);
> + ret = query_ddw(dev, ddw_avail, &query, pdn);
> if (ret != 0)
> goto out_failed;
>
> @@ -1068,7 +1103,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> /* check largest block * page size > max memory hotplug addr */
> max_addr = ddw_memory_hotplug_max();
> if (query.largest_available_block < (max_addr >> page_shift)) {
> - dev_dbg(&dev->dev, "can't map partition max 0x%llx with %u "
> + dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu "
> "%llu-sized pages\n", max_addr, query.largest_available_block,
> 1ULL << page_shift);
> goto out_failed;
>
--
Alexey
* Re: [PATCH 2/4] powerpc/pseries/iommu: Implement ibm,reset-pe-dma-windows rtas call
From: Alexey Kardashevskiy @ 2020-06-22 10:02 UTC
To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
Paul Mackerras, Thiago Jung Bauermann, Ram Pai
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200619050619.266888-3-leobras.c@gmail.com>
On 19/06/2020 15:06, Leonardo Bras wrote:
> Platforms supporting the DDW option starting with LoPAR level 2.7 implement
> ibm,ddw-extensions. The first extension available (index 2) carries the
> token for ibm,reset-pe-dma-windows rtas call, which is used to restore
> the default DMA window for a device, if it has been deleted.
>
> It does so by resetting the TCE table allocation for the PE to it's
> boot time value, available in "ibm,dma-window" device tree node.
>
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
> arch/powerpc/platforms/pseries/iommu.c | 33 ++++++++++++++++++++++++++
> 1 file changed, 33 insertions(+)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index e5a617738c8b..5e1fbc176a37 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -1012,6 +1012,39 @@ static phys_addr_t ddw_memory_hotplug_max(void)
> return max_addr;
> }
>
> +/*
> + * Platforms supporting the DDW option starting with LoPAR level 2.7 implement
> + * ibm,ddw-extensions, which carries the rtas token for
> + * ibm,reset-pe-dma-windows.
> + * That rtas-call can be used to restore the default DMA window for the device.
> + */
> +static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
> +{
> + int ret;
> + u32 cfg_addr, ddw_ext[3];
> + u64 buid;
> + struct device_node *dn;
> + struct pci_dn *pdn;
> +
> + ret = of_property_read_u32_array(par_dn, "ibm,ddw-extensions",
> + &ddw_ext[0], 3);
s/3/2/ as for the reset extension you do not need the "64bit largest
block" extension.
> + if (ret)
> + return;
> +
> + dn = pci_device_to_OF_node(dev);
> + pdn = PCI_DN(dn);
> + buid = pdn->phb->buid;
> + cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
> +
> + ret = rtas_call(ddw_ext[1], 3, 1, NULL, cfg_addr,
Here the "reset" extension is in ddw_ext[1]. Hm. 1/4 has a bug then.
And I am pretty sure it won't compile, as reset_dma_window() is not used
and it is static, so fold it into one of the next patches. Thanks,
> + BUID_HI(buid), BUID_LO(buid));
> + if (ret)
> + dev_info(&dev->dev,
> + "ibm,reset-pe-dma-windows(%x) %x %x %x returned %d ",
> + ddw_ext[1], cfg_addr, BUID_HI(buid), BUID_LO(buid),
> + ret);
> +}
> +
> /*
> * If the PE supports dynamic dma windows, and there is space for a table
> * that can map all pages in a linear offset, then setup such a table,
>
--
Alexey
* Re: [PATCH 3/4] powerpc/pseries/iommu: Move window-removing part of remove_ddw into remove_dma_window
From: Alexey Kardashevskiy @ 2020-06-22 10:02 UTC
To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
Paul Mackerras, Thiago Jung Bauermann, Ram Pai
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200619050619.266888-4-leobras.c@gmail.com>
On 19/06/2020 15:06, Leonardo Bras wrote:
> Move the window-removing part of remove_ddw into a new function
> (remove_dma_window), so it can be used to remove other DMA windows.
>
> It's useful for removing DMA windows that don't create DIRECT64_PROPNAME
> property, like the default DMA window from the device, which uses
> "ibm,dma-window".
>
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
> arch/powerpc/platforms/pseries/iommu.c | 53 +++++++++++++++-----------
> 1 file changed, 31 insertions(+), 22 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 5e1fbc176a37..de633f6ae093 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -767,25 +767,14 @@ static int __init disable_ddw_setup(char *str)
>
> early_param("disable_ddw", disable_ddw_setup);
>
> -static void remove_ddw(struct device_node *np, bool remove_prop)
> +static void remove_dma_window(struct device_node *pdn, u32 *ddw_avail,
You do not need the entire ddw_avail here; pass just the token you need.
Also, despite this particular file, the "pdn" name is usually used for
struct pci_dn (not device_node); let's keep it that way.
> + struct property *win)
> {
> struct dynamic_dma_window_prop *dwp;
> - struct property *win64;
> - u32 ddw_avail[3];
> u64 liobn;
> - int ret = 0;
> -
> - ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> - &ddw_avail[0], 3);
> -
> - win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
> - if (!win64)
> - return;
> -
> - if (ret || win64->length < sizeof(*dwp))
> - goto delprop;
> + int ret;
>
> - dwp = win64->value;
> + dwp = win->value;
> liobn = (u64)be32_to_cpu(dwp->liobn);
>
> /* clear the whole window, note the arg is in kernel pages */
> @@ -793,24 +782,44 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
> 1ULL << (be32_to_cpu(dwp->window_shift) - PAGE_SHIFT), dwp);
> if (ret)
> pr_warn("%pOF failed to clear tces in window.\n",
> - np);
> + pdn);
> else
> pr_debug("%pOF successfully cleared tces in window.\n",
> - np);
> + pdn);
>
> ret = rtas_call(ddw_avail[2], 1, 1, NULL, liobn);
> if (ret)
> pr_warn("%pOF: failed to remove direct window: rtas returned "
> "%d to ibm,remove-pe-dma-window(%x) %llx\n",
> - np, ret, ddw_avail[2], liobn);
> + pdn, ret, ddw_avail[2], liobn);
> else
> pr_debug("%pOF: successfully removed direct window: rtas returned "
> "%d to ibm,remove-pe-dma-window(%x) %llx\n",
> - np, ret, ddw_avail[2], liobn);
> + pdn, ret, ddw_avail[2], liobn);
> +}
> +
> +static void remove_ddw(struct device_node *np, bool remove_prop)
> +{
> + struct property *win;
> + u32 ddw_avail[3];
> + int ret = 0;
> +
> + ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> + &ddw_avail[0], 3);
> + if (ret)
> + return;
> +
> + win = of_find_property(np, DIRECT64_PROPNAME, NULL);
> + if (!win)
> + return;
> +
> + if (win->length >= sizeof(struct dynamic_dma_window_prop))
Any good reason not to make it "=="? Is there something optional, or do we
expect an extension (which may not grow from the end but may add cells in
between)? Thanks,
> + remove_dma_window(np, ddw_avail, win);
> +
> + if (!remove_prop)
> + return;
>
> -delprop:
> - if (remove_prop)
> - ret = of_remove_property(np, win64);
> + ret = of_remove_property(np, win);
> if (ret)
> pr_warn("%pOF: failed to remove direct window property: %d\n",
> np, ret);
>
--
Alexey
* Re: [PATCH 4/4] powerpc/pseries/iommu: Remove default DMA window before creating DDW
From: Alexey Kardashevskiy @ 2020-06-22 10:02 UTC
To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
Paul Mackerras, Thiago Jung Bauermann, Ram Pai
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200619050619.266888-5-leobras.c@gmail.com>
On 19/06/2020 15:06, Leonardo Bras wrote:
> On LoPAR "DMA Window Manipulation Calls", it's recommended to remove the
> default DMA window for the device, before attempting to configure a DDW,
> in order to make the maximum resources available for the next DDW to be
> created.
>
> This is a requirement for some devices to use DDW, given they only
> allow one DMA window.
>
> If setting up a new DDW fails anywhere after the removal of this
> default DMA window, restore it using reset_dma_window.
Nah... If we do it like this, then under pHyp we lose 32-bit DMA for good,
as pHyp can only create a single window and it has to map at
0x800.0000.0000.0000. They probably do not care, though.
Under KVM, this will fail: VFIO allows creating 2 windows and it
starts from 0, but the existing iommu_bypass_supported_pSeriesLP() treats
a window address of 0 as a failure. And we want to keep both DMA
windows for PCI adapters with both 64-bit and 32-bit PCI functions (I
heard AMD GPU video + audio are like this), or someone could hotplug a
32-bit DMA device on a vPHB with a 64-bit DMA window already present, so we
do not remove the default window.
The last thing I remember being discussed was that there was supposed to be a
new bit in "ibm,architecture-vec-5" (I forget the details); we could use
that to decide whether to keep the default window or not, like this.
>
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
> arch/powerpc/platforms/pseries/iommu.c | 20 +++++++++++++++++---
> 1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index de633f6ae093..68d1ea957ac7 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -1074,8 +1074,9 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> u64 dma_addr, max_addr;
> struct device_node *dn;
> u32 ddw_avail[3];
> +
> struct direct_window *window;
> - struct property *win64;
> + struct property *win64, *dfl_win;
Make it "default_win" or "def_win", "dfl" hurts to read :)
> struct dynamic_dma_window_prop *ddwprop;
> struct failed_ddw_pdn *fpdn;
>
> @@ -1110,8 +1111,19 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> if (ret)
> goto out_failed;
>
> - /*
> - * Query if there is a second window of size to map the
> + /*
> + * First step of setting up DDW is removing the default DMA window,
> + * if it's present. It will make all the resources available to the
> + * new DDW window.
> + * If anything fails after this, we need to restore it.
> + */
> +
> + dfl_win = of_find_property(pdn, "ibm,dma-window", NULL);
> + if (dfl_win)
> + remove_dma_window(pdn, ddw_avail, dfl_win);
Before doing so, you want to make sure that the "reset" is actually
supported. Thanks,
> +
> + /*
> + * Query if there is a window of size to map the
> * whole partition. Query returns number of windows, largest
> * block assigned to PE (partition endpoint), and two bitmasks
> * of page sizes: supported and supported for migrate-dma.
> @@ -1219,6 +1231,8 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> kfree(win64);
>
> out_failed:
> + if (dfl_win)
> + reset_dma_window(dev, pdn);
>
> fpdn = kzalloc(sizeof(*fpdn), GFP_KERNEL);
> if (!fpdn)
>
--
Alexey
* [next-20200621] LTP tests af_alg02/05 failure on POWER9 PowerVM LPAR
From: Sachin Sant @ 2020-06-22 12:25 UTC
To: herbert, linux-crypto; +Cc: Linux Next Mailing List, linuxppc-dev
With recent linux-next (next-20200621), the af_alg02/05 tests fail when run on a
POWER9 PowerVM LPAR.
Results from 5.8.0-rc1-next-20200622
# ./af_alg02
tst_test.c:1096: INFO: Timeout per run is 0h 00m 20s
af_alg02.c:52: BROK: Timed out while reading from request socket.
#
5.8.0-rc1-next-20200618 was good. The test case ran fine.
Root cause analysis points to the following commit:
commit f3c802a1f30013f8f723b62d7fa49eb9e991da23
crypto: algif_aead - Only wake up when ctx->more is zero
Reverting this commit allows the test to PASS.
Results after reverting the mentioned commit:
# uname -r
5.8.0-rc1-next-20200622-dirty
# ./af_alg02
tst_test.c:1096: INFO: Timeout per run is 0h 00m 20s
af_alg02.c:33: PASS: Successfully "encrypted" an empty message
#
# ./af_alg05
tst_test.c:1096: INFO: Timeout per run is 0h 05m 00s
af_alg05.c:34: PASS: read() expectedly failed with EINVAL
#
Thanks
-Sachin
* Re: [RFC PATCH v0 2/5] powerpc/mm/radix: Create separate mappings for hot-plugged memory
From: Aneesh Kumar K.V @ 2020-06-22 12:46 UTC
To: Bharata B Rao, linuxppc-dev
Cc: leonardo, aneesh.kumar, npiggin, Bharata B Rao
In-Reply-To: <20200406034925.22586-3-bharata@linux.ibm.com>
Bharata B Rao <bharata@linux.ibm.com> writes:
> Memory that gets hot-plugged _during_ boot (and not the memory
> that gets plugged in after boot), is mapped with 1G mappings
> and will undergo splitting when it is unplugged. The splitting
> code has a few issues:
>
> 1. Recursive locking
> --------------------
> Memory unplug path takes cpu_hotplug_lock and calls stop_machine()
> for splitting the mappings. However stop_machine() takes
> cpu_hotplug_lock again causing deadlock.
>
> 2. BUG: sleeping function called from in_atomic() context
> ---------------------------------------------------------
> Memory unplug path (remove_pagetable) takes init_mm.page_table_lock
> spinlock and later calls stop_machine() which does wait_for_completion()
>
> 3. Bad unlock unbalance
> -----------------------
> Memory unplug path takes init_mm.page_table_lock spinlock and calls
> stop_machine(). The stop_machine thread function runs in a different
> thread context (migration thread) which tries to release and reaquire
> ptl. Releasing ptl from a different thread than which acquired it
> causes bad unlock unbalance.
>
> These problems can be avoided if we avoid mapping hot-plugged memory
> with 1G mapping, thereby removing the need for splitting them during
> unplug. During radix init, identify(*) the hot-plugged memory region
> and create separate mappings for each LMB so that they don't get mapped
> with 1G mappings.
>
> To create separate mappings for every LMB in the hot-plugged
> region, we need lmb-size. I am currently using memory_block_size_bytes()
> API to get the lmb-size. Since this is early init time code, the
> machine type isn't probed yet and hence memory_block_size_bytes()
> would return the default LMB size as 16MB. Hence we end up creating
> separate mappings at much lower granularity than what we can ideally
> do for pseries machine.
>
> (*) Identifying and differentiating hot-plugged memory from the
> boot time memory is now possible with PAPR extension to LMB flags.
> (Ref: https://lore.kernel.org/linuxppc-dev/f55a7b65a43cc9dc7b22385cf9960f8b11d5ce2e.camel@linux.ibm.com/T/#t)
>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
> ---
> arch/powerpc/mm/book3s64/radix_pgtable.c | 15 ++++++++++++---
> 1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index dd1bea45325c..4a4fb30f6c3d 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -16,6 +16,7 @@
> #include <linux/hugetlb.h>
> #include <linux/string_helpers.h>
> #include <linux/stop_machine.h>
> +#include <linux/memory.h>
>
> #include <asm/pgtable.h>
> #include <asm/pgalloc.h>
> @@ -313,6 +314,8 @@ static void __init radix_init_pgtable(void)
> {
> unsigned long rts_field;
> struct memblock_region *reg;
> + phys_addr_t addr;
> + u64 lmb_size = memory_block_size_bytes();
>
> /* We don't support slb for radix */
> mmu_slb_size = 0;
> @@ -331,9 +334,15 @@ static void __init radix_init_pgtable(void)
> continue;
> }
>
> - WARN_ON(create_physical_mapping(reg->base,
> - reg->base + reg->size,
> - -1));
> + if (memblock_is_hotpluggable(reg)) {
> + for (addr = reg->base; addr < (reg->base + reg->size);
> + addr += lmb_size)
> + WARN_ON(create_physical_mapping(addr,
> + addr + lmb_size, -1));
Is that indentation correct?
> + } else
> + WARN_ON(create_physical_mapping(reg->base,
> + reg->base + reg->size,
> + -1));
> }
>
> /* Find out how many PID bits are supported */
> --
> 2.21.0
* Re: [RFC PATCH v0 3/5] powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings
From: Aneesh Kumar K.V @ 2020-06-22 12:53 UTC (permalink / raw)
To: Bharata B Rao, linuxppc-dev
Cc: leonardo, aneesh.kumar, npiggin, Bharata B Rao
In-Reply-To: <20200406034925.22586-4-bharata@linux.ibm.com>
Bharata B Rao <bharata@linux.ibm.com> writes:
> We can hit the following BUG_ON during memory unplug
>
> kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:344!
> Oops: Exception in kernel mode, sig: 5 [#1]
> LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> NIP [c000000000097d48] pmd_fragment_free+0x48/0xd0
> LR [c0000000016aaefc] remove_pagetable+0x494/0x530
> Call Trace:
> _raw_spin_lock+0x54/0x80 (unreliable)
> remove_pagetable+0x2b0/0x530
> radix__remove_section_mapping+0x18/0x2c
> remove_section_mapping+0x38/0x5c
> arch_remove_memory+0x124/0x190
> try_remove_memory+0xd0/0x1c0
> __remove_memory+0x20/0x40
> dlpar_remove_lmb+0xbc/0x110
> dlpar_memory+0xa90/0xd40
> handle_dlpar_errorlog+0xa8/0x160
> pseries_hp_work_fn+0x2c/0x60
> process_one_work+0x47c/0x870
> worker_thread+0x364/0x5e0
> kthread+0x1b4/0x1c0
> ret_from_kernel_thread+0x5c/0x74
>
> This occurs when unplug is attempted for memory that was
> mapped using memblock-allocated pages during early kernel page
> table setup. We wouldn't have initialized the PMD or PTE fragment
> count for those PMD or PTE pages.
>
> Fixing this includes 3 parts:
>
> - Re-walk the init_mm page tables from mem_init() and initialize
> the PMD and PTE fragment count to 1.
> - When freeing PUD, PMD and PTE page table pages, check explicitly
> if they come from memblock and if so free them appropriately.
> - When we do early memblock based allocation of PMD and PUD pages,
> allocate in PAGE_SIZE granularity so that we are sure the
> complete page is used as pagetable page.
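The failure mode and the fix can be modelled in userspace as a sketch, assuming a simplified page that carries only a fragment refcount and a reserved flag. struct model_page, model_fixup() and model_fragment_free() are hypothetical names: they mirror the patch's pmd_fragment_free() logic and the atomic_inc() done by the fixup walk, and model_fragment_free() returns false where the kernel would hit the BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the struct page fields that matter here. */
struct model_page {
	int pt_frag_refcount;	/* models page->pt_frag_refcount */
	bool reserved;		/* models PageReserved(): memblock-allocated */
	bool freed;		/* set once the page would be released */
	bool freed_as_reserved;	/* true: free_reserved_page(), false: __free_page() */
};

/* Models the fixup walk from mem_init(): one reference per live table page. */
static void model_fixup(struct model_page *p)
{
	p->pt_frag_refcount++;
}

/*
 * Models pmd_fragment_free() after the patch. Returns false where the
 * kernel would trigger BUG_ON(refcount <= 0).
 */
static bool model_fragment_free(struct model_page *p)
{
	if (p->pt_frag_refcount <= 0)
		return false;
	if (--p->pt_frag_refcount == 0) {
		p->freed = true;
		/* Early memblock pages must go back via free_reserved_page(). */
		p->freed_as_reserved = p->reserved;
	}
	return true;
}
```

Without model_fixup(), an early (reserved) page starts with a zero count and the first free takes the BUG_ON() path; after the fixup, the same free releases it through the reserved-page path.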
>
> Since we now do PAGE_SIZE allocations for both PUD table and
> PMD table (Note that PTE table allocation is already of PAGE_SIZE),
> we end up allocating more memory for the same amount of system RAM.
> Here is a comparison of how much more memory we need for a 64T and a 2G
> system after this patch:
>
> 1. 64T system
> -------------
> 64T RAM would need 64G for vmemmap with struct page size being 64B.
>
> 128 PUD tables for 64T memory (1G mappings)
> 1 PUD table and 64 PMD tables for 64G vmemmap (2M mappings)
>
> With default PUD[PMD]_TABLE_SIZE(4K), (128+1+64)*4K=772K
> With PAGE_SIZE(64K) table allocations, (128+1+64)*64K=12352K
>
> 2. 2G system
> ------------
> 2G RAM would need 2M for vmemmap with struct page size being 64B.
>
> 1 PUD table for 2G memory (1G mapping)
> 1 PUD table and 1 PMD table for 2M vmemmap (2M mappings)
>
> With default PUD[PMD]_TABLE_SIZE(4K), (1+1+1)*4K=12K
> With new PAGE_SIZE(64K) table allocations, (1+1+1)*64K=192K
>
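The table-size arithmetic above can be rechecked with a short C sketch. It assumes, as the text does, 64K base pages, a 64-byte struct page, 512 entries per PUD and PMD table (1G per PUD entry, 2M per PMD entry), and a separate PUD table for vmemmap. early_table_count() and early_table_bytes() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K  (4ULL << 10)
#define SZ_64K (64ULL << 10)
#define SZ_2M  (2ULL << 20)
#define SZ_1G  (1ULL << 30)

/* Assumptions taken from the text above. */
#define ENTRIES_PER_TABLE 512ULL	/* PUD and PMD tables */
#define STRUCT_PAGE_SIZE  64ULL		/* bytes per struct page */

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/* PUD + PMD table count for the 1G linear map plus the 2M vmemmap. */
static uint64_t early_table_count(uint64_t ram)
{
	uint64_t vmemmap = ram / SZ_64K * STRUCT_PAGE_SIZE;
	uint64_t linear_puds = div_round_up(ram, ENTRIES_PER_TABLE * SZ_1G);
	uint64_t vmemmap_puds = div_round_up(vmemmap, ENTRIES_PER_TABLE * SZ_1G);
	uint64_t vmemmap_pmds = div_round_up(vmemmap, ENTRIES_PER_TABLE * SZ_2M);

	return linear_puds + vmemmap_puds + vmemmap_pmds;
}

/* Total bytes when every table is allocated with table_size bytes. */
static uint64_t early_table_bytes(uint64_t ram, uint64_t table_size)
{
	return early_table_count(ram) * table_size;
}
```

For 64T of RAM this gives 128 + 1 + 64 = 193 tables, i.e. 772K with 4K tables and 12352K with 64K tables; for 2G it gives 3 tables, i.e. 12K and 192K.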
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
> ---
> arch/powerpc/include/asm/book3s/64/pgalloc.h | 11 ++-
> arch/powerpc/include/asm/book3s/64/radix.h | 1 +
> arch/powerpc/include/asm/sparsemem.h | 1 +
> arch/powerpc/mm/book3s64/pgtable.c | 31 ++++++++-
> arch/powerpc/mm/book3s64/radix_pgtable.c | 72 ++++++++++++++++++--
> arch/powerpc/mm/mem.c | 5 ++
> arch/powerpc/mm/pgtable-frag.c | 9 ++-
> 7 files changed, 121 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h b/arch/powerpc/include/asm/book3s/64/pgalloc.h
> index a41e91bd0580..e96572fb2871 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
> @@ -109,7 +109,16 @@ static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
>
> static inline void pud_free(struct mm_struct *mm, pud_t *pud)
> {
> - kmem_cache_free(PGT_CACHE(PUD_CACHE_INDEX), pud);
> + struct page *page = virt_to_page(pud);
> +
> + /*
> + * Early pud pages allocated via memblock allocator
> + * can't be directly freed to slab
> + */
> + if (PageReserved(page))
> + free_reserved_page(page);
> + else
> + kmem_cache_free(PGT_CACHE(PUD_CACHE_INDEX), pud);
> }
>
> static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index d97db3ad9aae..0aff8750181a 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -291,6 +291,7 @@ static inline unsigned long radix__get_tree_size(void)
> #ifdef CONFIG_MEMORY_HOTPLUG
> int radix__create_section_mapping(unsigned long start, unsigned long end, int nid);
> int radix__remove_section_mapping(unsigned long start, unsigned long end);
> +void radix__fixup_pgtable_fragments(void);
> #endif /* CONFIG_MEMORY_HOTPLUG */
> #endif /* __ASSEMBLY__ */
> #endif
> diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
> index 3192d454a733..e662f9232d35 100644
> --- a/arch/powerpc/include/asm/sparsemem.h
> +++ b/arch/powerpc/include/asm/sparsemem.h
> @@ -15,6 +15,7 @@
> #ifdef CONFIG_MEMORY_HOTPLUG
> extern int create_section_mapping(unsigned long start, unsigned long end, int nid);
> extern int remove_section_mapping(unsigned long start, unsigned long end);
> +void fixup_pgtable_fragments(void);
>
> #ifdef CONFIG_PPC_BOOK3S_64
> extern int resize_hpt_for_hotplug(unsigned long new_mem_size);
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
> index 2bf7e1b4fd82..be7aa8786747 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -186,6 +186,13 @@ int __meminit remove_section_mapping(unsigned long start, unsigned long end)
>
> return hash__remove_section_mapping(start, end);
> }
> +
> +void fixup_pgtable_fragments(void)
> +{
> + if (radix_enabled())
> + radix__fixup_pgtable_fragments();
> +}
> +
> #endif /* CONFIG_MEMORY_HOTPLUG */
>
> void __init mmu_partition_table_init(void)
> @@ -343,13 +350,23 @@ void pmd_fragment_free(unsigned long *pmd)
>
> BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
> if (atomic_dec_and_test(&page->pt_frag_refcount)) {
> - pgtable_pmd_page_dtor(page);
> - __free_page(page);
> + /*
> + * Early pmd pages allocated via memblock
> + * allocator wouldn't have called _ctor
> + */
> + if (PageReserved(page))
> + free_reserved_page(page);
> + else {
> + pgtable_pmd_page_dtor(page);
> + __free_page(page);
> + }
> }
> }
>
> static inline void pgtable_free(void *table, int index)
> {
> + struct page *page;
> +
> switch (index) {
> case PTE_INDEX:
> pte_fragment_free(table, 0);
> @@ -358,7 +375,15 @@ static inline void pgtable_free(void *table, int index)
> pmd_fragment_free(table);
> break;
> case PUD_INDEX:
> - kmem_cache_free(PGT_CACHE(PUD_CACHE_INDEX), table);
> + page = virt_to_page(table);
> + /*
> + * Early pud pages allocated via memblock
> + * allocator need to be freed differently
> + */
> + if (PageReserved(page))
> + free_reserved_page(page);
> + else
> + kmem_cache_free(PGT_CACHE(PUD_CACHE_INDEX), table);
> break;
> #if defined(CONFIG_PPC_4K_PAGES) && defined(CONFIG_HUGETLB_PAGE)
> /* 16M hugepd directory at pud level */
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index 4a4fb30f6c3d..e675c0bbf9a4 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -36,6 +36,70 @@
> unsigned int mmu_pid_bits;
> unsigned int mmu_base_pid;
>
> +static void fixup_pte_fragments(pmd_t *pmd)
> +{
> + int i;
> +
> + for (i = 0; i < PTRS_PER_PMD; i++, pmd++) {
> + pte_t *pte;
> + struct page *page;
> +
> + if (pmd_none(*pmd))
> + continue;
> + if (pmd_is_leaf(*pmd))
> + continue;
> +
> + pte = pte_offset_kernel(pmd, 0);
> + page = virt_to_page(pte);
> + atomic_inc(&page->pt_frag_refcount);
> + }
> +}
> +
> +static void fixup_pmd_fragments(pud_t *pud)
> +{
> + int i;
> +
> + for (i = 0; i < PTRS_PER_PUD; i++, pud++) {
> + pmd_t *pmd;
> + struct page *page;
> +
> + if (pud_none(*pud))
> + continue;
> + if (pud_is_leaf(*pud))
> + continue;
> +
> + pmd = pmd_offset(pud, 0);
> + page = virt_to_page(pmd);
> + atomic_inc(&page->pt_frag_refcount);
> + fixup_pte_fragments(pmd);
> + }
> +}
> +
> +/*
> + * Walk the init_mm page tables and fixup the PMD and PTE fragment
> + * counts. This allows the PUD, PMD and PTE pages to be freed
> + * back to buddy allocator properly during memory unplug.
> + */
> +void radix__fixup_pgtable_fragments(void)
> +{
> + int i;
> + pgd_t *pgd = pgd_offset_k(0UL);
> +
> + spin_lock(&init_mm.page_table_lock);
> + for (i = 0; i < PTRS_PER_PGD; i++, pgd++) {
> + pud_t *pud;
> +
> + if (pgd_none(*pgd))
> + continue;
> + if (pgd_is_leaf(*pgd))
> + continue;
> +
> + pud = pud_offset(pgd, 0);
> + fixup_pmd_fragments(pud);
> + }
> + spin_unlock(&init_mm.page_table_lock);
> +}
> +
> static __ref void *early_alloc_pgtable(unsigned long size, int nid,
> unsigned long region_start, unsigned long region_end)
> {
> @@ -71,8 +135,8 @@ static int early_map_kernel_page(unsigned long ea, unsigned long pa,
>
> pgdp = pgd_offset_k(ea);
> if (pgd_none(*pgdp)) {
> - pudp = early_alloc_pgtable(PUD_TABLE_SIZE, nid,
> - region_start, region_end);
> + pudp = early_alloc_pgtable(PAGE_SIZE, nid, region_start,
> + region_end);
> pgd_populate(&init_mm, pgdp, pudp);
> }
> pudp = pud_offset(pgdp, ea);
> @@ -81,8 +145,8 @@ static int early_map_kernel_page(unsigned long ea, unsigned long pa,
> goto set_the_pte;
> }
> if (pud_none(*pudp)) {
> - pmdp = early_alloc_pgtable(PMD_TABLE_SIZE, nid,
> - region_start, region_end);
> + pmdp = early_alloc_pgtable(PAGE_SIZE, nid, region_start,
> + region_end);
> pud_populate(&init_mm, pudp, pmdp);
> }
> pmdp = pmd_offset(pudp, ea);
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index 1c07d5a3f543..d43ad701f693 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -53,6 +53,10 @@
>
> #include <mm/mmu_decl.h>
>
> +void __weak fixup_pgtable_fragments(void)
> +{
> +}
> +
> #ifndef CPU_FTR_COHERENT_ICACHE
> #define CPU_FTR_COHERENT_ICACHE 0 /* XXX for now */
> #define CPU_FTR_NOEXECUTE 0
> @@ -307,6 +311,7 @@ void __init mem_init(void)
>
> memblock_free_all();
>
> + fixup_pgtable_fragments();
> #ifdef CONFIG_HIGHMEM
> {
> unsigned long pfn, highmem_mapnr;
> diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
> index ee4bd6d38602..16213c09896a 100644
> --- a/arch/powerpc/mm/pgtable-frag.c
> +++ b/arch/powerpc/mm/pgtable-frag.c
> @@ -114,6 +114,13 @@ void pte_fragment_free(unsigned long *table, int kernel)
> if (atomic_dec_and_test(&page->pt_frag_refcount)) {
> if (!kernel)
> pgtable_pte_page_dtor(page);
> - __free_page(page);
> + /*
> + * Early pte pages allocated via memblock
> + * allocator need to be freed differently
> + */
> + if (PageReserved(page))
> + free_reserved_page(page);
> + else
> + __free_page(page);
> }
> }
> --
> 2.21.0
* Re: [RFC PATCH v0 4/5] powerpc/mm/radix: Free PUD table when freeing pagetable
From: Aneesh Kumar K.V @ 2020-06-22 13:07 UTC (permalink / raw)
To: Bharata B Rao, linuxppc-dev
Cc: leonardo, aneesh.kumar, npiggin, Bharata B Rao
In-Reply-To: <20200406034925.22586-5-bharata@linux.ibm.com>
Bharata B Rao <bharata@linux.ibm.com> writes:
> remove_pagetable() doesn't free the PUD table. This causes a memory
> leak during memory unplug. Fix this.
>
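The rule free_pud_table() follows (release a PUD table only once every entry in it is empty, then clear the PGD entry that points at it) can be sketched in userspace. This is a hypothetical model: entries are plain unsigned longs with 0 meaning "none", MODEL_PTRS_PER_PUD is assumed to be 512, and the real pud_free()/pgd_clear() calls are only indicated in comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PTRS_PER_PUD 512	/* assumed entries per PUD table */

/* Models the loop over pud_none(): true only if no entry is populated. */
static bool pud_table_is_empty(const unsigned long *pud_start, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (pud_start[i] != 0)
			return false;
	return true;
}

/* Returns true if the table would be freed and *pgd_entry cleared. */
static bool model_free_pud_table(unsigned long *pud_start,
				 unsigned long *pgd_entry)
{
	if (!pud_table_is_empty(pud_start, MODEL_PTRS_PER_PUD))
		return false;	/* still in use: keep the table */
	/* kernel: pud_free(&init_mm, pud_start); pgd_clear(pgd); */
	*pgd_entry = 0;
	return true;
}
```

Checking emptiness before freeing means a PUD table shared with still-mapped memory is left in place, while the last unplug in its range releases it instead of leaking it.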
We have had changes w.r.t. p4d (the folded 5-level page table). You may want
to rebase this onto a recent kernel.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
> ---
> arch/powerpc/mm/book3s64/radix_pgtable.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index e675c0bbf9a4..0d9ef3277579 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -767,6 +767,21 @@ static void free_pmd_table(pmd_t *pmd_start, pud_t *pud)
> pud_clear(pud);
> }
>
> +static void free_pud_table(pud_t *pud_start, pgd_t *pgd)
> +{
> + pud_t *pud;
> + int i;
> +
> + for (i = 0; i < PTRS_PER_PUD; i++) {
> + pud = pud_start + i;
> + if (!pud_none(*pud))
> + return;
> + }
> +
> + pud_free(&init_mm, pud_start);
> + pgd_clear(pgd);
> +}
> +
> struct change_mapping_params {
> pte_t *pte;
> unsigned long start;
> @@ -937,6 +952,7 @@ static void __meminit remove_pagetable(unsigned long start, unsigned long end)
>
> pud_base = (pud_t *)pgd_page_vaddr(*pgd);
> remove_pud_table(pud_base, addr, next);
> + free_pud_table(pud_base, pgd);
> }
>
> spin_unlock(&init_mm.page_table_lock);
> --
> 2.21.0