* [PATCH v2 10/28] powerpc/pseries/mobility: use rtas_activate_firmware() on resume
From: Nathan Lynch @ 2020-12-07 21:51 UTC (permalink / raw)
To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking
In-Reply-To: <20201207215200.1785968-1-nathanl@linux.ibm.com>
It's incorrect to abort post-suspend processing if
ibm,activate-firmware isn't available. Use rtas_activate_firmware(),
which logs this condition appropriately and allows us to proceed.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
arch/powerpc/platforms/pseries/mobility.c | 15 +--------------
1 file changed, 1 insertion(+), 14 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 31d81b7da961..01ac7c03558e 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -312,21 +312,8 @@ int pseries_devicetree_update(s32 scope)
void post_mobility_fixup(void)
{
int rc;
- int activate_fw_token;
- activate_fw_token = rtas_token("ibm,activate-firmware");
- if (activate_fw_token == RTAS_UNKNOWN_SERVICE) {
- printk(KERN_ERR "Could not make post-mobility "
- "activate-fw call.\n");
- return;
- }
-
- do {
- rc = rtas_call(activate_fw_token, 0, 1, NULL);
- } while (rtas_busy_delay(rc));
-
- if (rc)
- printk(KERN_ERR "Post-mobility activate-fw failed: %d\n", rc);
+ rtas_activate_firmware();
/*
* We don't want CPUs to go online/offline while the device
--
2.28.0
^ permalink raw reply related
* [PATCH v2 08/28] powerpc/pseries/mobility: add missing break to default case
From: Nathan Lynch @ 2020-12-07 21:51 UTC (permalink / raw)
To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking
In-Reply-To: <20201207215200.1785968-1-nathanl@linux.ibm.com>
update_dt_node() has a switch statement where the default case lacks a
break statement.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
arch/powerpc/platforms/pseries/mobility.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index e66359b00297..527a64e2d89f 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -213,6 +213,7 @@ static int update_dt_node(__be32 phandle, s32 scope)
}
prop_data += vd;
+ break;
}
cond_resched();
--
2.28.0
^ permalink raw reply related
* [PATCH v2 06/28] powerpc/hvcall: add token and codes for H_VASI_SIGNAL
From: Nathan Lynch @ 2020-12-07 21:51 UTC (permalink / raw)
To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking
In-Reply-To: <20201207215200.1785968-1-nathanl@linux.ibm.com>
H_VASI_SIGNAL can be used by a partition to request cancellation of
its migration. To be used in future changes.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
arch/powerpc/include/asm/hvcall.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index c1fbccb04390..c98f5141e3fc 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -155,6 +155,14 @@
#define H_VASI_RESUMED 5
#define H_VASI_COMPLETED 6
+/* VASI signal codes. Only the Cancel code is valid for H_VASI_SIGNAL. */
+#define H_VASI_SIGNAL_CANCEL 1
+#define H_VASI_SIGNAL_ABORT 2
+#define H_VASI_SIGNAL_SUSPEND 3
+#define H_VASI_SIGNAL_COMPLETE 4
+#define H_VASI_SIGNAL_ENABLE 5
+#define H_VASI_SIGNAL_FAILOVER 6
+
/* Each control block has to be on a 4K boundary */
#define H_CB_ALIGNMENT 4096
@@ -261,6 +269,7 @@
#define H_ADD_CONN 0x284
#define H_DEL_CONN 0x288
#define H_JOIN 0x298
+#define H_VASI_SIGNAL 0x2A0
#define H_VASI_STATE 0x2A4
#define H_VIOCTL 0x2A8
#define H_ENABLE_CRQ 0x2B0
--
2.28.0
^ permalink raw reply related
* [PATCH v2 07/28] powerpc/pseries/mobility: don't error on absence of ibm, update-nodes
From: Nathan Lynch @ 2020-12-07 21:51 UTC (permalink / raw)
To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking
In-Reply-To: <20201207215200.1785968-1-nathanl@linux.ibm.com>
Treat the absence of the ibm,update-nodes function as benign instead
of reporting an error. If the platform does not provide that facility,
it's not a problem for Linux.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
arch/powerpc/platforms/pseries/mobility.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 6ff642e84c6a..e66359b00297 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -261,7 +261,7 @@ int pseries_devicetree_update(s32 scope)
update_nodes_token = rtas_token("ibm,update-nodes");
if (update_nodes_token == RTAS_UNKNOWN_SERVICE)
- return -EINVAL;
+ return 0;
rtas_buf = kzalloc(RTAS_DATA_BUF_SIZE, GFP_KERNEL);
if (!rtas_buf)
--
2.28.0
^ permalink raw reply related
* [PATCH v2 05/28] powerpc/rtas: add rtas_activate_firmware()
From: Nathan Lynch @ 2020-12-07 21:51 UTC (permalink / raw)
To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking
In-Reply-To: <20201207215200.1785968-1-nathanl@linux.ibm.com>
Provide a documented wrapper function for the ibm,activate-firmware
service, which must be called after a partition migration or
hibernation.
If the function is absent or the call fails, the OS will continue to
run normally with the current firmware, so there is no need to perform
any recovery. Just log it and continue.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
arch/powerpc/include/asm/rtas.h | 1 +
arch/powerpc/kernel/rtas.c | 30 ++++++++++++++++++++++++++++++
2 files changed, 31 insertions(+)
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index b43165fc6c2a..fdefe6a974eb 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -247,6 +247,7 @@ extern void __noreturn rtas_restart(char *cmd);
extern void rtas_power_off(void);
extern void __noreturn rtas_halt(void);
extern void rtas_os_term(char *str);
+void rtas_activate_firmware(void);
extern int rtas_get_sensor(int sensor, int index, int *state);
extern int rtas_get_sensor_fast(int sensor, int index, int *state);
extern int rtas_get_power_level(int powerdomain, int *level);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 8a618a3c4beb..3a740ae933f8 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -798,6 +798,36 @@ void rtas_os_term(char *str)
printk(KERN_EMERG "ibm,os-term call failed %d\n", status);
}
+/**
+ * rtas_activate_firmware() - Activate a new version of firmware.
+ *
+ * Activate a new version of partition firmware. The OS must call this
+ * after resuming from a partition hibernation or migration in order
+ * to maintain the ability to perform live firmware updates. It's not
+ * catastrophic for this method to be absent or to fail; just log the
+ * condition in that case.
+ *
+ * Context: This function may sleep.
+ */
+void rtas_activate_firmware(void)
+{
+ int token;
+ int fwrc;
+
+ token = rtas_token("ibm,activate-firmware");
+ if (token == RTAS_UNKNOWN_SERVICE) {
+ pr_notice("ibm,activate-firmware method unavailable\n");
+ return;
+ }
+
+ do {
+ fwrc = rtas_call(token, 0, 1, NULL);
+ } while (rtas_busy_delay(fwrc));
+
+ if (fwrc)
+ pr_err("ibm,activate-firmware failed (%i)\n", fwrc);
+}
+
static int ibm_suspend_me_token = RTAS_UNKNOWN_SERVICE;
#ifdef CONFIG_PPC_PSERIES
static int __rtas_suspend_last_cpu(struct rtas_suspend_me_data *data, int wake_when_done)
--
2.28.0
^ permalink raw reply related
* [PATCH v2 02/28] powerpc/rtas: complete ibm,suspend-me status codes
From: Nathan Lynch @ 2020-12-07 21:51 UTC (permalink / raw)
To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking
In-Reply-To: <20201207215200.1785968-1-nathanl@linux.ibm.com>
We don't completely account for the possible return codes for
ibm,suspend-me. Add definitions for these.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
arch/powerpc/include/asm/rtas.h | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 55f9a154c95d..f060181a0d32 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -23,11 +23,16 @@
#define RTAS_RMOBUF_MAX (64 * 1024)
/* RTAS return status codes */
-#define RTAS_NOT_SUSPENDABLE -9004
#define RTAS_BUSY -2 /* RTAS Busy */
#define RTAS_EXTENDED_DELAY_MIN 9900
#define RTAS_EXTENDED_DELAY_MAX 9905
+/* statuses specific to ibm,suspend-me */
+#define RTAS_SUSPEND_ABORTED 9000 /* Suspension aborted */
+#define RTAS_NOT_SUSPENDABLE -9004 /* Partition not suspendable */
+#define RTAS_THREADS_ACTIVE -9005 /* Multiple processor threads active */
+#define RTAS_OUTSTANDING_COPROC -9006 /* Outstanding coprocessor operations */
+
/*
* In general to call RTAS use rtas_token("string") to lookup
* an RTAS token for the given string (e.g. "event-scan").
--
2.28.0
^ permalink raw reply related
* [PATCH v2 04/28] powerpc/rtas: add rtas_ibm_suspend_me()
From: Nathan Lynch @ 2020-12-07 21:51 UTC (permalink / raw)
To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking
In-Reply-To: <20201207215200.1785968-1-nathanl@linux.ibm.com>
Now that the name is available, provide a simple wrapper for
ibm,suspend-me which returns both a Linux errno and optionally the
actual RTAS status to the caller.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
arch/powerpc/include/asm/rtas.h | 1 +
arch/powerpc/kernel/rtas.c | 57 +++++++++++++++++++++++++++++++++
2 files changed, 58 insertions(+)
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 8436ed01567b..b43165fc6c2a 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -258,6 +258,7 @@ extern void rtas_progress(char *s, unsigned short hex);
extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
int rtas_ibm_suspend_me_unsafe(u64 handle);
+int rtas_ibm_suspend_me(int *fw_status);
struct rtc_time;
extern time64_t rtas_get_boot_time(void);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 0a8e5dc2c108..8a618a3c4beb 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -684,6 +684,63 @@ int rtas_set_indicator_fast(int indicator, int index, int new_value)
return rc;
}
+/**
+ * rtas_ibm_suspend_me() - Call ibm,suspend-me to suspend the LPAR.
+ *
+ * @fw_status: RTAS call status will be placed here if not NULL.
+ *
+ * rtas_ibm_suspend_me() should be called only on a CPU which has
+ * received H_CONTINUE from the H_JOIN hcall. All other active CPUs
+ * should be waiting to return from H_JOIN.
+ *
+ * rtas_ibm_suspend_me() may suspend execution of the OS
+ * indefinitely. Callers should take appropriate measures upon return, such as
+ * resetting watchdog facilities.
+ *
+ * Callers may choose to retry this call if @fw_status is
+ * %RTAS_THREADS_ACTIVE.
+ *
+ * Return:
+ * 0 - The partition has resumed from suspend, possibly after
+ * migration to a different host.
+ * -ECANCELED - The operation was aborted.
+ * -EAGAIN - There were other CPUs not in H_JOIN at the time of the call.
+ * -EBUSY - Some other condition prevented the suspend from succeeding.
+ * -EIO - Hardware/platform error.
+ */
+int rtas_ibm_suspend_me(int *fw_status)
+{
+ int fwrc;
+ int ret;
+
+ fwrc = rtas_call(rtas_token("ibm,suspend-me"), 0, 1, NULL);
+
+ switch (fwrc) {
+ case 0:
+ ret = 0;
+ break;
+ case RTAS_SUSPEND_ABORTED:
+ ret = -ECANCELED;
+ break;
+ case RTAS_THREADS_ACTIVE:
+ ret = -EAGAIN;
+ break;
+ case RTAS_NOT_SUSPENDABLE:
+ case RTAS_OUTSTANDING_COPROC:
+ ret = -EBUSY;
+ break;
+ case -1:
+ default:
+ ret = -EIO;
+ break;
+ }
+
+ if (fw_status)
+ *fw_status = fwrc;
+
+ return ret;
+}
+
void __noreturn rtas_restart(char *cmd)
{
if (rtas_flash_term_hook)
--
2.28.0
^ permalink raw reply related
* [PATCH v2 00/28] partition suspend updates
From: Nathan Lynch @ 2020-12-07 21:51 UTC (permalink / raw)
To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking
This series aims to improve the pseries-specific partition migration
and hibernation implementation, part of which has been living in
kernel/rtas.c. Most of that code is eliminated or moved to
platforms/pseries, and the following major functional changes are
made:
- Use stop_machine() instead of on_each_cpu() to avoid deadlock in the
join/suspend sequence.
- Retry the join/suspend sequence on errors that are likely to be
transient. This is a mitigation for the fact that drivers currently
have no way to prepare for an impending partition suspension,
sometimes resulting in a virtual adapter being in a state which
causes the platform to fail the suspend call.
- Request cancellation of the migration via H_VASI_SIGNAL if Linux is
going to error out of the suspend attempt. This allows the
management console and other entities to promptly clean up their
operations instead of relying on long timeouts to fail the
migration.
- Little-endian users of ibm,suspend-me, ibm,update-nodes and
ibm,update-properties via sys_rtas are blocked when
CONFIG_PPC_RTAS_FILTERS is enabled.
- Legacy user space code (drmgr) historically has driven the migration
process by using sys_rtas to separately call ibm,suspend-me,
ibm,activate-firmware, and ibm,update-nodes/properties, in that
order. With these changes, when sys_rtas() dispatches
ibm,suspend-me, the kernel performs the device tree update and
firmware activation before returning. This is more reliable, and
drmgr does not seem bothered by it.
- If the H_VASI_STATE hcall is absent, the implementation proceeds
with the suspend instead of erroring out. This allows us to exercise
these code paths in QEMU.
Changes since v1:
- Drop "powerpc/rtas: move rtas_call_reentrant() out of pseries
guards". rtas_call_reentrant() actually is pseries-specific and this
broke builds without CONFIG_PPC_PSERIES set.
- Simplify polling logic in wait_for_vasi_session_suspending().
("powerpc/pseries/mobility: extract VASI session polling logic")
- Use direct return instead of goto in pseries_migrate_partition().
("powerpc/pseries/mobility: use stop_machine for join/suspend")
- Change dispatch of ibm,suspend-me in rtas syscall path to use
conventional config symbol guards instead of a weak function.
("powerpc/rtas: dispatch partition migration requests to pseries")
- Fix refcount imbalance in add_dt_node() error path.
("powerpc/pseries/mobility: refactor node lookup during DT update")
Nathan Lynch (28):
powerpc/rtas: prevent suspend-related sys_rtas use on LE
powerpc/rtas: complete ibm,suspend-me status codes
powerpc/rtas: rtas_ibm_suspend_me -> rtas_ibm_suspend_me_unsafe
powerpc/rtas: add rtas_ibm_suspend_me()
powerpc/rtas: add rtas_activate_firmware()
powerpc/hvcall: add token and codes for H_VASI_SIGNAL
powerpc/pseries/mobility: don't error on absence of ibm,update-nodes
powerpc/pseries/mobility: add missing break to default case
powerpc/pseries/mobility: error message improvements
powerpc/pseries/mobility: use rtas_activate_firmware() on resume
powerpc/pseries/mobility: extract VASI session polling logic
powerpc/pseries/mobility: use stop_machine for join/suspend
powerpc/pseries/mobility: signal suspend cancellation to platform
powerpc/pseries/mobility: retry partition suspend after error
powerpc/rtas: dispatch partition migration requests to pseries
powerpc/rtas: remove rtas_ibm_suspend_me_unsafe()
powerpc/pseries/hibernation: drop pseries_suspend_begin() from suspend
ops
powerpc/pseries/hibernation: pass stream id via function arguments
powerpc/pseries/hibernation: remove pseries_suspend_cpu()
powerpc/machdep: remove suspend_disable_cpu()
powerpc/rtas: remove rtas_suspend_cpu()
powerpc/pseries/hibernation: switch to rtas_ibm_suspend_me()
powerpc/rtas: remove unused rtas_suspend_last_cpu()
powerpc/pseries/hibernation: remove redundant cacheinfo update
powerpc/pseries/hibernation: perform post-suspend fixups later
powerpc/pseries/hibernation: remove prepare_late() callback
powerpc/rtas: remove unused rtas_suspend_me_data
powerpc/pseries/mobility: refactor node lookup during DT update
arch/powerpc/include/asm/hvcall.h | 9 +
arch/powerpc/include/asm/machdep.h | 1 -
arch/powerpc/include/asm/rtas-types.h | 8 -
arch/powerpc/include/asm/rtas.h | 17 +-
arch/powerpc/kernel/rtas.c | 243 ++++++---------
arch/powerpc/platforms/pseries/mobility.c | 358 ++++++++++++++++++----
arch/powerpc/platforms/pseries/suspend.c | 79 +----
7 files changed, 415 insertions(+), 300 deletions(-)
--
2.28.0
^ permalink raw reply
* [PATCH v2 03/28] powerpc/rtas: rtas_ibm_suspend_me -> rtas_ibm_suspend_me_unsafe
From: Nathan Lynch @ 2020-12-07 21:51 UTC (permalink / raw)
To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking
In-Reply-To: <20201207215200.1785968-1-nathanl@linux.ibm.com>
The pseries partition suspend sequence requires that all active CPUs
call H_JOIN, which suspends all but one of them with interrupts
disabled. The "chosen" CPU is then to call ibm,suspend-me to complete
the suspend. Upon returning from ibm,suspend-me, the chosen CPU is to
use H_PROD to wake the joined CPUs.
Using on_each_cpu() for this, as rtas_ibm_suspend_me() does to
implement partition migration, is susceptible to deadlock with other
users of on_each_cpu() and with users of stop_machine APIs. The
callback passed to on_each_cpu() is not allowed to synchronize with
other CPUs in the way it is used here.
Complicating the fix is the fact that rtas_ibm_suspend_me() also
occupies the function name that should be used to provide a more
conventional wrapper for ibm,suspend-me. Rename rtas_ibm_suspend_me()
to rtas_ibm_suspend_me_unsafe() to free up the name and indicate that
it should not gain users.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
arch/powerpc/include/asm/rtas.h | 2 +-
arch/powerpc/kernel/rtas.c | 6 +++---
arch/powerpc/platforms/pseries/mobility.c | 2 +-
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index f060181a0d32..8436ed01567b 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -257,7 +257,7 @@ extern int rtas_set_indicator_fast(int indicator, int index, int new_value);
extern void rtas_progress(char *s, unsigned short hex);
extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
-extern int rtas_ibm_suspend_me(u64 handle);
+int rtas_ibm_suspend_me_unsafe(u64 handle);
struct rtc_time;
extern time64_t rtas_get_boot_time(void);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 4ed64aba37d6..0a8e5dc2c108 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -843,7 +843,7 @@ static void rtas_percpu_suspend_me(void *info)
__rtas_suspend_cpu((struct rtas_suspend_me_data *)info, 1);
}
-int rtas_ibm_suspend_me(u64 handle)
+int rtas_ibm_suspend_me_unsafe(u64 handle)
{
long state;
long rc;
@@ -949,7 +949,7 @@ int rtas_call_reentrant(int token, int nargs, int nret, int *outputs, ...)
}
#else /* CONFIG_PPC_PSERIES */
-int rtas_ibm_suspend_me(u64 handle)
+int rtas_ibm_suspend_me_unsafe(u64 handle)
{
return -ENOSYS;
}
@@ -1185,7 +1185,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
int rc = 0;
u64 handle = ((u64)be32_to_cpu(args.args[0]) << 32)
| be32_to_cpu(args.args[1]);
- rc = rtas_ibm_suspend_me(handle);
+ rc = rtas_ibm_suspend_me_unsafe(handle);
if (rc == -EAGAIN)
args.rets[0] = cpu_to_be32(RTAS_NOT_SUSPENDABLE);
else if (rc == -EIO)
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 2f73cb5bf12d..6ff642e84c6a 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -370,7 +370,7 @@ static ssize_t migration_store(struct class *class,
return rc;
do {
- rc = rtas_ibm_suspend_me(streamid);
+ rc = rtas_ibm_suspend_me_unsafe(streamid);
if (rc == -EAGAIN)
ssleep(1);
} while (rc == -EAGAIN);
--
2.28.0
^ permalink raw reply related
* [PATCH v2 01/28] powerpc/rtas: prevent suspend-related sys_rtas use on LE
From: Nathan Lynch @ 2020-12-07 21:51 UTC (permalink / raw)
To: linuxppc-dev; +Cc: tyreld, ajd, mmc, cforno12, drt, brking
In-Reply-To: <20201207215200.1785968-1-nathanl@linux.ibm.com>
While drmgr has had work in some areas to make its RTAS syscall
interactions endian-neutral, its code for performing partition
migration via the syscall has never worked on LE. While it is able to
complete ibm,suspend-me successfully, it crashes when attempting the
subsequent ibm,update-nodes call.
drmgr is the only known (or plausible) user of ibm,suspend-me,
ibm,update-nodes, and ibm,update-properties, so allow them only in
big-endian configurations.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
arch/powerpc/kernel/rtas.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 954f41676f69..4ed64aba37d6 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1050,9 +1050,11 @@ static struct rtas_filter rtas_filters[] __ro_after_init = {
{ "set-time-for-power-on", -1, -1, -1, -1, -1 },
{ "ibm,set-system-parameter", -1, 1, -1, -1, -1 },
{ "set-time-of-day", -1, -1, -1, -1, -1 },
+#ifdef CONFIG_CPU_BIG_ENDIAN
{ "ibm,suspend-me", -1, -1, -1, -1, -1 },
{ "ibm,update-nodes", -1, 0, -1, -1, -1, 4096 },
{ "ibm,update-properties", -1, 0, -1, -1, -1, 4096 },
+#endif
{ "ibm,physical-attestation", -1, 0, 1, -1, -1 },
};
--
2.28.0
^ permalink raw reply related
* Re: [PATCH v6 1/5] PCI: Unify ECAM constants in native PCI Express drivers
From: Jim Quinlan @ 2020-12-07 20:29 UTC (permalink / raw)
To: Florian Fainelli
Cc: Krzysztof Wilczyński, Heiko Stuebner, Shawn Lin,
Paul Mackerras, Thomas Petazzoni, Jonathan Chocron, Toan Le,
Will Deacon, Rob Herring, Lorenzo Pieralisi, Michal Simek,
linux-rockchip, maintainer:BROADCOM BCM7XXX ARM ARCHITECTURE,
moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
open list:PCI NATIVE HOST BRIDGE AND ENDPOINT DRIVERS, Ray Jui,
moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
Jonathan Cameron, Bjorn Helgaas, Jonathan Derrick, Scott Branden,
Zhou Wang, Robert Richter, linuxppc-dev, Nicolas Saenz Julienne
In-Reply-To: <094b314f-7f61-d0fd-fd63-c9c4da9e84a8@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 3939 bytes --]
On Sun, Dec 6, 2020 at 10:25 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> +JimQ,
>
> On 12/6/2020 12:16 PM, Krzysztof Wilczyński wrote:
> > Hello Nicolas, Florian and Florian,
> >
> > [...]
> >> -/* Configuration space read/write support */
> >> -static inline int brcm_pcie_cfg_index(int busnr, int devfn, int reg)
> >> -{
> >> - return ((PCI_SLOT(devfn) & 0x1f) << PCIE_EXT_SLOT_SHIFT)
> >> - | ((PCI_FUNC(devfn) & 0x07) << PCIE_EXT_FUNC_SHIFT)
> >> - | (busnr << PCIE_EXT_BUSNUM_SHIFT)
> >> - | (reg & ~3);
> >> -}
> >> -
> >> static void __iomem *brcm_pcie_map_conf(struct pci_bus *bus, unsigned int devfn,
> >> int where)
> >> {
> >> @@ -716,7 +704,7 @@ static void __iomem *brcm_pcie_map_conf(struct pci_bus *bus, unsigned int devfn,
> >> return PCI_SLOT(devfn) ? NULL : base + where;
> >>
> >> /* For devices, write to the config space index register */
> >> - idx = brcm_pcie_cfg_index(bus->number, devfn, 0);
> >> + idx = PCIE_ECAM_OFFSET(bus->number, devfn, 0);
> >> writel(idx, pcie->base + PCIE_EXT_CFG_INDEX);
> >> return base + PCIE_EXT_CFG_DATA + where;
> >> }
> > [...]
> >
> > Passing the hard-coded 0 as the "reg" argument here never actually did
> > anything, thus the 32 bit alignment was never correctly enforced.
> >
> > My question would be: should this be 32 bit aligned? It seems like the
> > intention was to perhaps make the alignment? I am sadly not intimately
> > familiar with his hardware, so I am not sure if there is something to
> > fix here or not.
Hello Krzystzof,
The value gets assigned to our config-space index register, which has
the lower two bits marked "unused". We're making sure that we are
putting zeroes there but it is most likely not necessary.
> >
> > Also, I wonder whether it would be safe to pass the offset (the "where"
> > variable) rather than hard-coded 0?
The answer is "no" for this code but "maybe" in the future -- allow me
to explain. We have two methods to access the config space:
(1) Set a designated index register to map to the base of a device's
config-space. From then we can access a 4k register set. This is the
method you see in the code and is why we set reg=0 for the index value
and then add "where" to the return address.
(2) Set our index register to the bus/slot/func/reg value, and then we
access a single data register. In this case we do set the "reg" to
the register value to set the index and then only add "where & 0x3" to
the return address.
As it turns out, (1) is not compatible with some MIPs SOCs that we
still support as they do not have the 4k data register set. So I may
be changing to (1) in a future pullreq, and if so, I will invoke
PCIE_ECAM_OFFSET(bus->number, devfn, where & ~3);
Regards,
Jim Quinlan
Broadcom STB
> >
> > Thank you for help in advance!
> >
> > Bjorn also asked the same question:
> > https://lore.kernel.org/linux-pci/20201120203428.GA272511@bjorn-Precision-5520/
> >
> > Krzysztof
> >
>
> --
> Florian
--
This electronic communication and the information and any files transmitted
with it, or attached to it, are confidential and are intended solely for
the use of the individual or entity to whom it is addressed and may contain
information that is confidential, legally privileged, protected by privacy
laws, or otherwise restricted from disclosure to anyone else. If you are
not the intended recipient or the person responsible for delivering the
e-mail to the intended recipient, you are hereby notified that any use,
copying, distributing, dissemination, forwarding, printing, or copying of
this e-mail is strictly prohibited. If you received this e-mail in error,
please return the e-mail to the sender, delete it from your computer, and
destroy any printed copy of it.
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4167 bytes --]
^ permalink raw reply
* Re: [powerpc:next-test 54/220] arch/powerpc/kernel/vdso32/vgettimeofday.c:13:5: warning: no previous prototype for function '__c_kernel_clock_gettime64'
From: Segher Boessenkool @ 2020-12-07 18:20 UTC (permalink / raw)
To: Nick Desaulniers
Cc: kernel test robot, linuxppc-dev, kbuild-all, clang-built-linux
In-Reply-To: <CAKwvOd=4vu_o3Sr14JwDO6s+tqZWp-DQKWs9So8g2_-zTt+8KA@mail.gmail.com>
On Mon, Dec 07, 2020 at 09:56:56AM -0800, Nick Desaulniers wrote:
> On Mon, Dec 7, 2020 at 4:23 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
> > So is clang defining __powerpc64__ even for 32-bit code?
> >
> > And the answer appears to be yes:
> >
> > $ clang --version
> > Ubuntu clang version 11.0.0-2
> > Target: powerpc64le-unknown-linux-gnu
> >
> > $ clang -m32 -dM -E - < /dev/null | grep powerpc
> > #define __powerpc64__ 1
> > #define __powerpc__ 1
> >
> > Compare to gcc:
> >
> > $ gcc --version
> > gcc (Ubuntu 10.2.0-13ubuntu1) 10.2.0
> >
> > $ gcc -m32 -dM -E - < /dev/null | grep powerpc
> > #define __powerpc__ 1
> > #define powerpc 1
> > #define __powerpc 1
> >
> >
> > Which is fairly problematic, because we use the presence/absence of
> > __powerpc64__ to determine if we're building 64-bit/32-bit code in
> > several places.
> >
> > Not sure what the best approach for fixing that is.
>
> Thanks for the triage; we should fix our preprocessor:
> https://bugs.llvm.org/show_bug.cgi?id=48427
Not only is that a compatibility problem (as the bug report says): it is
a straight up violation of the ABI! (For ELFv2, which you have here;
older ABIs did not mention the preprocessor predefines, but this was
exactly the same on all compilers afaik.)
Segher
^ permalink raw reply
* Re: [powerpc:next-test 54/220] arch/powerpc/kernel/vdso32/vgettimeofday.c:13:5: warning: no previous prototype for function '__c_kernel_clock_gettime64'
From: Nick Desaulniers @ 2020-12-07 17:56 UTC (permalink / raw)
To: Michael Ellerman
Cc: clang-built-linux, linuxppc-dev, kbuild-all, kernel test robot
In-Reply-To: <87czzlu7n4.fsf@mpe.ellerman.id.au>
On Mon, Dec 7, 2020 at 4:23 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> So is clang defining __powerpc64__ even for 32-bit code?
>
> And the answer appears to be yes:
>
> $ clang --version
> Ubuntu clang version 11.0.0-2
> Target: powerpc64le-unknown-linux-gnu
>
> $ clang -m32 -dM -E - < /dev/null | grep powerpc
> #define __powerpc64__ 1
> #define __powerpc__ 1
>
> Compare to gcc:
>
> $ gcc --version
> gcc (Ubuntu 10.2.0-13ubuntu1) 10.2.0
>
> $ gcc -m32 -dM -E - < /dev/null | grep powerpc
> #define __powerpc__ 1
> #define powerpc 1
> #define __powerpc 1
>
>
> Which is fairly problematic, because we use the presence/absence of
> __powerpc64__ to determine if we're building 64-bit/32-bit code in
> several places.
>
> Not sure what the best approach for fixing that is.
Thanks for the triage; we should fix our preprocessor:
https://bugs.llvm.org/show_bug.cgi?id=48427
--
Thanks,
~Nick Desaulniers
^ permalink raw reply
* Re: [PATCH] powerpc/mm: Fix KUAP warning by providing copy_from_kernel_nofault_allowed()
From: Christophe Leroy @ 2020-12-07 17:02 UTC (permalink / raw)
To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras, hch,
viro, akpm
Cc: linux-mm, linuxppc-dev, linux-kernel
In-Reply-To: <87ft4itqdw.fsf@mpe.ellerman.id.au>
Le 07/12/2020 à 01:24, Michael Ellerman a écrit :
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> Since commit c33165253492 ("powerpc: use non-set_fs based maccess
>> routines"), userspace access is not granted anymore when using
>> copy_from_kernel_nofault()
>>
>> However, kthread_probe_data() uses copy_from_kernel_nofault()
>> to check validity of pointers. When the pointer is NULL,
>> it points to userspace, leading to a KUAP fault and triggering
>> the following big hammer warning many times when you request
>> a sysrq "show task":
>>
>> [ 1117.202054] ------------[ cut here ]------------
>> [ 1117.202102] Bug: fault blocked by AP register !
>> [ 1117.202261] WARNING: CPU: 0 PID: 377 at arch/powerpc/include/asm/nohash/32/kup-8xx.h:66 do_page_fault+0x4a8/0x5ec
>> [ 1117.202310] Modules linked in:
>> [ 1117.202428] CPU: 0 PID: 377 Comm: sh Tainted: G W 5.10.0-rc5-01340-g83f53be2de31-dirty #4175
>> [ 1117.202499] NIP: c0012048 LR: c0012048 CTR: 00000000
>> [ 1117.202573] REGS: cacdbb88 TRAP: 0700 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty)
>> [ 1117.202625] MSR: 00021032 <ME,IR,DR,RI> CR: 24082222 XER: 20000000
>> [ 1117.202899]
>> [ 1117.202899] GPR00: c0012048 cacdbc40 c2929290 00000023 c092e554 00000001 c09865e8 c092e640
>> [ 1117.202899] GPR08: 00001032 00000000 00000000 00014efc 28082224 100d166a 100a0920 00000000
>> [ 1117.202899] GPR16: 100cac0c 100b0000 1080c3fc 1080d685 100d0000 100d0000 00000000 100a0900
>> [ 1117.202899] GPR24: 100d0000 c07892ec 00000000 c0921510 c21f4440 0000005c c0000000 cacdbc80
>> [ 1117.204362] NIP [c0012048] do_page_fault+0x4a8/0x5ec
>> [ 1117.204461] LR [c0012048] do_page_fault+0x4a8/0x5ec
>> [ 1117.204509] Call Trace:
>> [ 1117.204609] [cacdbc40] [c0012048] do_page_fault+0x4a8/0x5ec (unreliable)
>> [ 1117.204771] [cacdbc70] [c00112f0] handle_page_fault+0x8/0x34
>> [ 1117.204911] --- interrupt: 301 at copy_from_kernel_nofault+0x70/0x1c0
>> [ 1117.204979] NIP: c010dbec LR: c010dbac CTR: 00000001
>> [ 1117.205053] REGS: cacdbc80 TRAP: 0301 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty)
>> [ 1117.205104] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 28082224 XER: 00000000
>> [ 1117.205416] DAR: 0000005c DSISR: c0000000
>> [ 1117.205416] GPR00: c0045948 cacdbd38 c2929290 00000001 00000017 00000017 00000027 0000000f
>> [ 1117.205416] GPR08: c09926ec 00000000 00000000 3ffff000 24082224
>> [ 1117.206106] NIP [c010dbec] copy_from_kernel_nofault+0x70/0x1c0
>> [ 1117.206202] LR [c010dbac] copy_from_kernel_nofault+0x30/0x1c0
>> [ 1117.206258] --- interrupt: 301
>> [ 1117.206372] [cacdbd38] [c004bbb0] kthread_probe_data+0x44/0x70 (unreliable)
>> [ 1117.206561] [cacdbd58] [c0045948] print_worker_info+0xe0/0x194
>> [ 1117.206717] [cacdbdb8] [c00548ac] sched_show_task+0x134/0x168
>> [ 1117.206851] [cacdbdd8] [c005a268] show_state_filter+0x70/0x100
>> [ 1117.206989] [cacdbe08] [c039baa0] sysrq_handle_showstate+0x14/0x24
>> [ 1117.207122] [cacdbe18] [c039bf18] __handle_sysrq+0xac/0x1d0
>> [ 1117.207257] [cacdbe48] [c039c0c0] write_sysrq_trigger+0x4c/0x74
>> [ 1117.207407] [cacdbe68] [c01fba48] proc_reg_write+0xb4/0x114
>> [ 1117.207550] [cacdbe88] [c0179968] vfs_write+0x12c/0x478
>> [ 1117.207686] [cacdbf08] [c0179e60] ksys_write+0x78/0x128
>> [ 1117.207826] [cacdbf38] [c00110d0] ret_from_syscall+0x0/0x34
>> [ 1117.207938] --- interrupt: c01 at 0xfd4e784
>> [ 1117.208008] NIP: 0fd4e784 LR: 0fe0f244 CTR: 10048d38
>> [ 1117.208083] REGS: cacdbf48 TRAP: 0c01 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty)
>> [ 1117.208134] MSR: 0000d032 <EE,PR,ME,IR,DR,RI> CR: 44002222 XER: 00000000
>> [ 1117.208470]
>> [ 1117.208470] GPR00: 00000004 7fc34090 77bfb4e0 00000001 1080fa40 00000002 7400000f fefefeff
>> [ 1117.208470] GPR08: 7f7f7f7f 10048d38 1080c414 7fc343c0 00000000
>> [ 1117.209104] NIP [0fd4e784] 0xfd4e784
>> [ 1117.209180] LR [0fe0f244] 0xfe0f244
>> [ 1117.209236] --- interrupt: c01
>> [ 1117.209274] Instruction dump:
>> [ 1117.209353] 714a4000 418200f0 73ca0001 40820084 73ca0032 408200f8 73c90040 4082ff60
>> [ 1117.209727] 0fe00000 3c60c082 386399f4 48013b65 <0fe00000> 80010034 3860000b 7c0803a6
>> [ 1117.210102] ---[ end trace 1927c0323393af3e ]---
>>
>> To avoid that, copy_from_kernel_nofault_allowed() is used to check
>> whether the address is a valid kernel address. But the default
>> version of it returns true for any address.
>>
>> Provide a powerpc version of copy_from_kernel_nofault_allowed()
>> that returns false when the address is below TASK_USER_MAX,
>> so that copy_from_kernel_nofault() will return -ERANGE.
>>
>> Reported-by: Qian Cai <qcai@redhat.com>
>> Fixes: c33165253492 ("powerpc: use non-set_fs based maccess routines")
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Al Viro <viro@zeniv.linux.org.uk>
>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>> ---
>> This issue was introduced in 5.10. I didn't mark it for stable, hopping it will go into 5.10-rc7
>> ---
>> arch/powerpc/mm/Makefile | 2 +-
>> arch/powerpc/mm/maccess.c | 9 +++++++++
>> 2 files changed, 10 insertions(+), 1 deletion(-)
>> create mode 100644 arch/powerpc/mm/maccess.c
>>
>> diff --git a/arch/powerpc/mm/maccess.c b/arch/powerpc/mm/maccess.c
>> new file mode 100644
>> index 000000000000..56e97c0fb233
>> --- /dev/null
>> +++ b/arch/powerpc/mm/maccess.c
>> @@ -0,0 +1,9 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +
>> +#include <linux/uaccess.h>
>> +#include <linux/kernel.h>
>> +
>> +bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
>> +{
>> + return (unsigned long)unsafe_src >= TASK_SIZE_MAX;
>> +}
>
> Is there a reason we're using TASK_SIZE_MAX?
No special reason, that's just copied from x86.
>
> It's copy from *kernel* (nofault) allowed, so shouldn't we be checking
> that the address plausibly points at kernel memory? Not at no-man's land
> above TASK_SIZE_MAX but below the start of kernel memory?
Yes, on PPC64 that's right. On PPC32 the kernel memory starts where the userland stops.
>
> We have is_kernel_addr() which already encapsulates some platform quirks
> around that logic, it seems like it would be a better fit?
Yes probably, I send v2. For PPC32 that's a comparison with TASK_SIZE thought.
>
> ie:
>
> bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
> {
> return is_kernel_addr((unsigned long)unsafe_src);
> }
>
> cheers
>
Christophe
^ permalink raw reply
* [PATCH v2] powerpc/mm: Fix KUAP warning by providing copy_from_kernel_nofault_allowed()
From: Christophe Leroy @ 2020-12-07 16:58 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, hch,
viro, akpm
Cc: linux-mm, linuxppc-dev, linux-kernel
Since commit c33165253492 ("powerpc: use non-set_fs based maccess
routines"), userspace access is not granted anymore when using
copy_from_kernel_nofault()
However, kthread_probe_data() uses copy_from_kernel_nofault()
to check validity of pointers. When the pointer is NULL,
it points to userspace, leading to a KUAP fault and triggering
the following big hammer warning many times when you request
a sysrq "show task":
[ 1117.202054] ------------[ cut here ]------------
[ 1117.202102] Bug: fault blocked by AP register !
[ 1117.202261] WARNING: CPU: 0 PID: 377 at arch/powerpc/include/asm/nohash/32/kup-8xx.h:66 do_page_fault+0x4a8/0x5ec
[ 1117.202310] Modules linked in:
[ 1117.202428] CPU: 0 PID: 377 Comm: sh Tainted: G W 5.10.0-rc5-01340-g83f53be2de31-dirty #4175
[ 1117.202499] NIP: c0012048 LR: c0012048 CTR: 00000000
[ 1117.202573] REGS: cacdbb88 TRAP: 0700 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty)
[ 1117.202625] MSR: 00021032 <ME,IR,DR,RI> CR: 24082222 XER: 20000000
[ 1117.202899]
[ 1117.202899] GPR00: c0012048 cacdbc40 c2929290 00000023 c092e554 00000001 c09865e8 c092e640
[ 1117.202899] GPR08: 00001032 00000000 00000000 00014efc 28082224 100d166a 100a0920 00000000
[ 1117.202899] GPR16: 100cac0c 100b0000 1080c3fc 1080d685 100d0000 100d0000 00000000 100a0900
[ 1117.202899] GPR24: 100d0000 c07892ec 00000000 c0921510 c21f4440 0000005c c0000000 cacdbc80
[ 1117.204362] NIP [c0012048] do_page_fault+0x4a8/0x5ec
[ 1117.204461] LR [c0012048] do_page_fault+0x4a8/0x5ec
[ 1117.204509] Call Trace:
[ 1117.204609] [cacdbc40] [c0012048] do_page_fault+0x4a8/0x5ec (unreliable)
[ 1117.204771] [cacdbc70] [c00112f0] handle_page_fault+0x8/0x34
[ 1117.204911] --- interrupt: 301 at copy_from_kernel_nofault+0x70/0x1c0
[ 1117.204979] NIP: c010dbec LR: c010dbac CTR: 00000001
[ 1117.205053] REGS: cacdbc80 TRAP: 0301 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty)
[ 1117.205104] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 28082224 XER: 00000000
[ 1117.205416] DAR: 0000005c DSISR: c0000000
[ 1117.205416] GPR00: c0045948 cacdbd38 c2929290 00000001 00000017 00000017 00000027 0000000f
[ 1117.205416] GPR08: c09926ec 00000000 00000000 3ffff000 24082224
[ 1117.206106] NIP [c010dbec] copy_from_kernel_nofault+0x70/0x1c0
[ 1117.206202] LR [c010dbac] copy_from_kernel_nofault+0x30/0x1c0
[ 1117.206258] --- interrupt: 301
[ 1117.206372] [cacdbd38] [c004bbb0] kthread_probe_data+0x44/0x70 (unreliable)
[ 1117.206561] [cacdbd58] [c0045948] print_worker_info+0xe0/0x194
[ 1117.206717] [cacdbdb8] [c00548ac] sched_show_task+0x134/0x168
[ 1117.206851] [cacdbdd8] [c005a268] show_state_filter+0x70/0x100
[ 1117.206989] [cacdbe08] [c039baa0] sysrq_handle_showstate+0x14/0x24
[ 1117.207122] [cacdbe18] [c039bf18] __handle_sysrq+0xac/0x1d0
[ 1117.207257] [cacdbe48] [c039c0c0] write_sysrq_trigger+0x4c/0x74
[ 1117.207407] [cacdbe68] [c01fba48] proc_reg_write+0xb4/0x114
[ 1117.207550] [cacdbe88] [c0179968] vfs_write+0x12c/0x478
[ 1117.207686] [cacdbf08] [c0179e60] ksys_write+0x78/0x128
[ 1117.207826] [cacdbf38] [c00110d0] ret_from_syscall+0x0/0x34
[ 1117.207938] --- interrupt: c01 at 0xfd4e784
[ 1117.208008] NIP: 0fd4e784 LR: 0fe0f244 CTR: 10048d38
[ 1117.208083] REGS: cacdbf48 TRAP: 0c01 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty)
[ 1117.208134] MSR: 0000d032 <EE,PR,ME,IR,DR,RI> CR: 44002222 XER: 00000000
[ 1117.208470]
[ 1117.208470] GPR00: 00000004 7fc34090 77bfb4e0 00000001 1080fa40 00000002 7400000f fefefeff
[ 1117.208470] GPR08: 7f7f7f7f 10048d38 1080c414 7fc343c0 00000000
[ 1117.209104] NIP [0fd4e784] 0xfd4e784
[ 1117.209180] LR [0fe0f244] 0xfe0f244
[ 1117.209236] --- interrupt: c01
[ 1117.209274] Instruction dump:
[ 1117.209353] 714a4000 418200f0 73ca0001 40820084 73ca0032 408200f8 73c90040 4082ff60
[ 1117.209727] 0fe00000 3c60c082 386399f4 48013b65 <0fe00000> 80010034 3860000b 7c0803a6
[ 1117.210102] ---[ end trace 1927c0323393af3e ]---
To avoid that, copy_from_kernel_nofault_allowed() is used to check
whether the address is a valid kernel address. But the default
version of it returns true for any address.
Provide a powerpc version of copy_from_kernel_nofault_allowed()
that returns false when the address is below TASK_USER_MAX,
so that copy_from_kernel_nofault() will return -ERANGE.
Reported-by: Qian Cai <qcai@redhat.com>
Fixes: c33165253492 ("powerpc: use non-set_fs based maccess routines")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
This issue was introduced in 5.10-rc1. I didn't mark it for stable, hopping it will go into 5.10
v2: Using is_kernel_addr() instead of comparison to TASK_SIZE_MAX.
---
arch/powerpc/mm/Makefile | 2 +-
arch/powerpc/mm/maccess.c | 9 +++++++++
2 files changed, 10 insertions(+), 1 deletion(-)
create mode 100644 arch/powerpc/mm/maccess.c
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 5e147986400d..55b4a8bd408a 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -5,7 +5,7 @@
ccflags-$(CONFIG_PPC64) := $(NO_MINIMAL_TOC)
-obj-y := fault.o mem.o pgtable.o mmap.o \
+obj-y := fault.o mem.o pgtable.o mmap.o maccess.o \
init_$(BITS).o pgtable_$(BITS).o \
pgtable-frag.o ioremap.o ioremap_$(BITS).o \
init-common.o mmu_context.o drmem.o
diff --git a/arch/powerpc/mm/maccess.c b/arch/powerpc/mm/maccess.c
new file mode 100644
index 000000000000..fa9a7a718fc6
--- /dev/null
+++ b/arch/powerpc/mm/maccess.c
@@ -0,0 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/uaccess.h>
+#include <linux/kernel.h>
+
+bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
+{
+ return is_kernel_addr((unsigned long)unsafe_src);
+}
--
2.25.0
^ permalink raw reply related
* Re: [PATCH v5 19/19] dt-bindings: usb: intel, keembay-dwc3: Validate DWC3 sub-node
From: Rob Herring @ 2020-12-07 16:53 UTC (permalink / raw)
To: Serge Semin
Cc: Neil Armstrong, Bjorn Andersson, Pavel Parkhomenko, Kevin Hilman,
Krzysztof Kozlowski, Ahmad Zainie, Andy Gross, Chunfeng Yun,
linux-snps-arc, devicetree, Mathias Nyman, Martin Blumenstingl,
Lad Prabhakar, Alexey Malahov, Rob Herring, linux-arm-kernel,
Roger Quadros, Felipe Balbi, Greg Kroah-Hartman,
Yoshihiro Shimoda, linux-usb, linux-mips, Serge Semin,
linux-kernel, Manu Gautam, linuxppc-dev
In-Reply-To: <20201205152427.29537-20-Sergey.Semin@baikalelectronics.ru>
On Sat, 05 Dec 2020 18:24:26 +0300, Serge Semin wrote:
> Intel Keem Bay DWC3 compatible DT nodes are supposed to have a DWC USB3
> compatible sub-node to describe a fully functioning USB interface. Let's
> use the available DWC USB3 DT schema to validate the Qualcomm DWC3
> sub-nodes.
>
> Note since the generic DWC USB3 DT node is supposed to be named as generic
> USB HCD ("^usb(@.*)?") one we have to accordingly fix the sub-nodes name
> regexp and fix the DT node example.
>
> Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
>
> ---
>
> Changelog v5:
> - This is a new patch created for the new Intel Keem Bay bindings file,
> which has been added just recently.
> ---
> .../devicetree/bindings/usb/intel,keembay-dwc3.yaml | 9 +++------
> 1 file changed, 3 insertions(+), 6 deletions(-)
>
Reviewed-by: Rob Herring <robh@kernel.org>
^ permalink raw reply
* Re: [PATCH v5 01/19] dt-bindings: usb: usb-hcd: Detach generic USB controller properties
From: Rob Herring @ 2020-12-07 16:50 UTC (permalink / raw)
To: Serge Semin
Cc: Neil Armstrong, linux-mips, Pavel Parkhomenko, Kevin Hilman,
Krzysztof Kozlowski, Ahmad Zainie, Andy Gross, Chunfeng Yun,
linux-snps-arc, devicetree, Mathias Nyman, Martin Blumenstingl,
Lad Prabhakar, Alexey Malahov, Rob Herring, Bjorn Andersson,
linux-arm-kernel, Roger Quadros, Felipe Balbi, Greg Kroah-Hartman,
Yoshihiro Shimoda, linux-usb, linux-kernel, Serge Semin,
Manu Gautam, linuxppc-dev
In-Reply-To: <20201205152427.29537-2-Sergey.Semin@baikalelectronics.ru>
On Sat, 05 Dec 2020 18:24:08 +0300, Serge Semin wrote:
> There can be three distinctive types of the USB controllers: USB hosts,
> USB peripherals/gadgets and USB OTG, which can switch from one role to
> another. In order to have that hierarchy handled in the DT binding files,
> we need to collect common properties in a common DT schema and specific
> properties in dedicated schemas. Seeing the usb-hcd.yaml DT schema is
> dedicated for the USB host controllers only, let's move some common
> properties from there into the usb.yaml schema. So the later would be
> available to evaluate all currently supported types of the USB
> controllers.
>
> While at it add an explicit "additionalProperties: true" into the
> usb-hcd.yaml as setting the additionalProperties/unevaluateProperties
> properties is going to be get mandatory soon.
>
> Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
>
> ---
>
> Changelog v4:
> - This is a new patch created as a result of the comment left
> by Chunfeng Yun in v3
>
> Changelog v5:
> - Discard duplicated additionalProperties property definition.
> ---
> .../devicetree/bindings/usb/usb-hcd.yaml | 14 ++-------
> .../devicetree/bindings/usb/usb.yaml | 29 +++++++++++++++++++
> 2 files changed, 31 insertions(+), 12 deletions(-)
> create mode 100644 Documentation/devicetree/bindings/usb/usb.yaml
>
My bot found errors running 'make dt_binding_check' on your patch:
yamllint warnings/errors:
dtschema/dtc warnings/errors:
schemas/usb/usb-hcd.yaml: ignoring, error in schema:
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/usb/usb-hcd.yaml: 'anyOf' conditional failed, one must be fixed:
'properties' is a required property
'patternProperties' is a required property
schemas/usb/usb-hcd.yaml: ignoring, error in schema:
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/usb/usb-hcd.yaml: ignoring, error in schema:
warning: no schema found in file: ./Documentation/devicetree/bindings/usb/usb-hcd.yaml
schemas/usb/usb-hcd.yaml: ignoring, error in schema:
dt-validate: recursion error: Check for prior errors in a referenced schema
schemas/usb/usb-hcd.yaml: ignoring, error in schema:
dt-validate: recursion error: Check for prior errors in a referenced schema
schemas/usb/usb-hcd.yaml: ignoring, error in schema:
dt-validate: recursion error: Check for prior errors in a referenced schema
schemas/usb/usb-hcd.yaml: ignoring, error in schema:
dt-validate: recursion error: Check for prior errors in a referenced schema
schemas/usb/usb-hcd.yaml: ignoring, error in schema:
dt-validate: recursion error: Check for prior errors in a referenced schema
dt-validate: recursion error: Check for prior errors in a referenced schema
See https://patchwork.ozlabs.org/patch/1411574
The base for the patch is generally the last rc1. Any dependencies
should be noted.
If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:
pip3 install dtschema --upgrade
Please check and re-submit.
^ permalink raw reply
* Re: [PATCH v5 10/19] dt-bindings: usb: Convert DWC USB3 bindings to DT schema
From: Rob Herring @ 2020-12-07 16:49 UTC (permalink / raw)
To: Serge Semin
Cc: Neil Armstrong, linux-mips, Pavel Parkhomenko, Kevin Hilman,
Krzysztof Kozlowski, Ahmad Zainie, Andy Gross, Chunfeng Yun,
linux-snps-arc, devicetree, Mathias Nyman, Martin Blumenstingl,
Lad Prabhakar, Alexey Malahov, Rob Herring, Bjorn Andersson,
linux-arm-kernel, Roger Quadros, Felipe Balbi, Greg Kroah-Hartman,
Yoshihiro Shimoda, linux-usb, linux-kernel, Serge Semin,
Manu Gautam, linuxppc-dev
In-Reply-To: <20201205152427.29537-11-Sergey.Semin@baikalelectronics.ru>
On Sat, 05 Dec 2020 18:24:17 +0300, Serge Semin wrote:
> DWC USB3 DT node is supposed to be compliant with the Generic xHCI
> Controller schema, but with additional vendor-specific properties, the
> controller-specific reference clocks and PHYs. So let's convert the
> currently available legacy text-based DWC USB3 bindings to the DT schema
> and make sure the DWC USB3 nodes are also validated against the
> usb-xhci.yaml schema.
>
> Note 1. we have to discard the nodename restriction of being prefixed with
> "dwc3@" string, since in accordance with the usb-hcd.yaml schema USB nodes
> are supposed to be named as "^usb(@.*)".
>
> Note 2. The clock-related properties are marked as optional to match the
> DWC USB3 driver expectation and to improve the bindings mainainability
> so in case if there is a glue-node it would the responsible for the
> clocks initialization.
>
> Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
>
> ---
>
> Changelog v2:
> - Discard '|' from the descriptions, since we don't need to preserve
> the text formatting in any of them.
> - Drop quotes from around the string constants.
> - Fix the "clock-names" prop description to be referring the enumerated
> clock-names instead of the ones from the Databook.
>
> Changelog v3:
> - Apply usb-xhci.yaml# schema only if the controller is supposed to work
> as either host or otg.
>
> Changelog v4:
> - Apply usb-drd.yaml schema first. If the controller is configured
> to work in a gadget mode only, then apply the usb.yaml schema too,
> otherwise apply the usb-xhci.yaml schema.
> - Discard the Rob'es Reviewed-by tag. Please review the patch one more
> time.
>
> Changelog v5:
> - Add "snps,dis-split-quirk" property to the DWC USB3 DT schema.
> - Add a commit log text about the clock-related property changes.
> - Make sure dr_mode exist to apply the USB-gadget-only schema.
> ---
> .../devicetree/bindings/usb/dwc3.txt | 128 -------
> .../devicetree/bindings/usb/snps,dwc3.yaml | 312 ++++++++++++++++++
> 2 files changed, 312 insertions(+), 128 deletions(-)
> delete mode 100644 Documentation/devicetree/bindings/usb/dwc3.txt
> create mode 100644 Documentation/devicetree/bindings/usb/snps,dwc3.yaml
>
My bot found errors running 'make dt_binding_check' on your patch:
yamllint warnings/errors:
./Documentation/devicetree/bindings/usb/snps,dwc3.yaml:55:4: [warning] wrong indentation: expected 4 but found 3 (indentation)
dtschema/dtc warnings/errors:
Unknown file referenced: [Errno 2] No such file or directory: '/usr/local/lib/python3.8/dist-packages/dtschema/schemas/usb/usb-drd.yaml'
xargs: dt-doc-validate: exited with status 255; aborting
make[1]: *** [Documentation/devicetree/bindings/Makefile:59: Documentation/devicetree/bindings/processed-schema-examples.json] Error 124
make: *** [Makefile:1364: dt_binding_check] Error 2
See https://patchwork.ozlabs.org/patch/1411582
The base for the patch is generally the last rc1. Any dependencies
should be noted.
If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:
pip3 install dtschema --upgrade
Please check and re-submit.
^ permalink raw reply
* [PATCH] powerpc: fix spelling mistake in Kconfig "seleted" -> "selected"
From: Colin King @ 2020-12-07 15:54 UTC (permalink / raw)
To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
linuxppc-dev
Cc: kernel-janitors, linux-kernel
From: Colin Ian King <colin.king@canonical.com>
There is a spelling mistake in the help text of the Kconfig. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
arch/powerpc/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8fb61a285c76..4010bae52351 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -87,7 +87,7 @@ config PPC_WATCHDOG
help
This is a placeholder when the powerpc hardlockup detector
watchdog is selected (arch/powerpc/kernel/watchdog.c). It is
- seleted via the generic lockup detector menu which is why we
+ selected via the generic lockup detector menu which is why we
have no standalone config option for it here.
config STACKTRACE_SUPPORT
--
2.29.2
^ permalink raw reply related
* [PATCH] arch: fix 'unexpected IRQ trap at vector' warnings
From: Enrico Weigelt, metux IT consult @ 2020-12-07 14:31 UTC (permalink / raw)
To: linux-kernel
Cc: linux-s390, hpa, linux-parisc, deller, x86, linux-um,
James.Bottomley, mingo, paulus, richard, bp, tglx, linuxppc-dev,
jdike, anton.ivanov
All archs, except Alpha, print out the irq number in hex, but the message
looks like it was a decimal number, which is quite confusing. Fixing this
by adding "0x" prefix.
Signed-off-by: Enrico Weigelt, metux IT consult <info@metux.net>
---
arch/arm/include/asm/hw_irq.h | 2 +-
arch/parisc/include/asm/hardirq.h | 2 +-
arch/powerpc/include/asm/hardirq.h | 2 +-
arch/s390/include/asm/hardirq.h | 2 +-
arch/um/include/asm/hardirq.h | 2 +-
arch/x86/kernel/irq.c | 2 +-
6 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/arm/include/asm/hw_irq.h b/arch/arm/include/asm/hw_irq.h
index cecc13214ef1..2749f19271d9 100644
--- a/arch/arm/include/asm/hw_irq.h
+++ b/arch/arm/include/asm/hw_irq.h
@@ -9,7 +9,7 @@ static inline void ack_bad_irq(int irq)
{
extern unsigned long irq_err_count;
irq_err_count++;
- pr_crit("unexpected IRQ trap at vector %02x\n", irq);
+ pr_crit("unexpected IRQ trap at vector 0x%02x\n", irq);
}
#define ARCH_IRQ_INIT_FLAGS (IRQ_NOREQUEST | IRQ_NOPROBE)
diff --git a/arch/parisc/include/asm/hardirq.h b/arch/parisc/include/asm/hardirq.h
index 7f7039516e53..c3348af88d3f 100644
--- a/arch/parisc/include/asm/hardirq.h
+++ b/arch/parisc/include/asm/hardirq.h
@@ -35,6 +35,6 @@ DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
#define __IRQ_STAT(cpu, member) (irq_stat[cpu].member)
#define inc_irq_stat(member) this_cpu_inc(irq_stat.member)
#define __inc_irq_stat(member) __this_cpu_inc(irq_stat.member)
-#define ack_bad_irq(irq) WARN(1, "unexpected IRQ trap at vector %02x\n", irq)
+#define ack_bad_irq(irq) WARN(1, "unexpected IRQ trap at vector 0x%02x\n", irq)
#endif /* _PARISC_HARDIRQ_H */
diff --git a/arch/powerpc/include/asm/hardirq.h b/arch/powerpc/include/asm/hardirq.h
index f133b5930ae1..ec8cf3cf6e49 100644
--- a/arch/powerpc/include/asm/hardirq.h
+++ b/arch/powerpc/include/asm/hardirq.h
@@ -29,7 +29,7 @@ DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
static inline void ack_bad_irq(unsigned int irq)
{
- printk(KERN_CRIT "unexpected IRQ trap at vector %02x\n", irq);
+ printk(KERN_CRIT "unexpected IRQ trap at vector 0x%02x\n", irq);
}
extern u64 arch_irq_stat_cpu(unsigned int cpu);
diff --git a/arch/s390/include/asm/hardirq.h b/arch/s390/include/asm/hardirq.h
index dfbc3c6c0674..aaaec5cdd4fe 100644
--- a/arch/s390/include/asm/hardirq.h
+++ b/arch/s390/include/asm/hardirq.h
@@ -23,7 +23,7 @@
static inline void ack_bad_irq(unsigned int irq)
{
- printk(KERN_CRIT "unexpected IRQ trap at vector %02x\n", irq);
+ printk(KERN_CRIT "unexpected IRQ trap at vector 0x%02x\n", irq);
}
#endif /* __ASM_HARDIRQ_H */
diff --git a/arch/um/include/asm/hardirq.h b/arch/um/include/asm/hardirq.h
index b426796d26fd..2a2e6eae034b 100644
--- a/arch/um/include/asm/hardirq.h
+++ b/arch/um/include/asm/hardirq.h
@@ -15,7 +15,7 @@ typedef struct {
#ifndef ack_bad_irq
static inline void ack_bad_irq(unsigned int irq)
{
- printk(KERN_CRIT "unexpected IRQ trap at vector %02x\n", irq);
+ printk(KERN_CRIT "unexpected IRQ trap at vector 0x%02x\n", irq);
}
#endif
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index c5dd50369e2f..957c716f2df7 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -37,7 +37,7 @@ atomic_t irq_err_count;
void ack_bad_irq(unsigned int irq)
{
if (printk_ratelimit())
- pr_err("unexpected IRQ trap at vector %02x\n", irq);
+ pr_err("unexpected IRQ trap at vector 0x%02x\n", irq);
/*
* Currently unexpected vectors happen only on SMP and APIC.
--
2.11.0
^ permalink raw reply related
* Re: [PATCH 3/3] powerpc/cacheinfo: Print correct cache-sibling map/list for L2 cache
From: Srikar Dronamraju @ 2020-12-07 13:11 UTC (permalink / raw)
To: Gautham R. Shenoy
Cc: Nathan Lynch, Michael Neuling, Vaidyanathan Srinivasan,
Peter Zijlstra, linux-kernel, Nicholas Piggin, linuxppc-dev,
Valentin Schneider
In-Reply-To: <1607057327-29822-4-git-send-email-ego@linux.vnet.ibm.com>
* Gautham R. Shenoy <ego@linux.vnet.ibm.com> [2020-12-04 10:18:47]:
> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
>
>
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
>
> +extern bool thread_group_shares_l2;
> /*
> * On big-core systems, each core has two groups of CPUs each of which
> * has its own L1-cache. The thread-siblings which share l1-cache with
> * @cpu can be obtained via cpu_smallcore_mask().
> + *
> + * On some big-core systems, the L2 cache is shared only between some
> + * groups of siblings. This is already parsed and encoded in
> + * cpu_l2_cache_mask().
> */
> static const struct cpumask *get_big_core_shared_cpu_map(int cpu, struct cache *cache)
> {
> if (cache->level == 1)
> return cpu_smallcore_mask(cpu);
> + if (cache->level == 2 && thread_group_shares_l2)
> + return cpu_l2_cache_mask(cpu);
>
> return &cache->shared_cpu_map;
As pointed with lkp@intel.org, we need to do this only with #CONFIG_SMP,
even for cache->level = 1 too.
I agree that we are displaying shared_cpu_map correctly. Should we have also
update /clear shared_cpu_map in the first place. For example:- If for a P9
core with CPUs 0-7, the cache->shared_cpu_map for L1 would have 0-7 but
would display 0,2,4,6.
The drawback of this is even if cpus 0,2,4,6 are released L1 cache will not
be released. Is this as expected?
--
Thanks and Regards
Srikar Dronamraju
^ permalink raw reply
* Re: [PATCH 2/3] powerpc/smp: Add support detecting thread-groups sharing L2 cache
From: Srikar Dronamraju @ 2020-12-07 12:40 UTC (permalink / raw)
To: Gautham R. Shenoy
Cc: Nathan Lynch, Michael Neuling, Vaidyanathan Srinivasan,
Peter Zijlstra, linux-kernel, Nicholas Piggin, linuxppc-dev,
Valentin Schneider
In-Reply-To: <1607057327-29822-3-git-send-email-ego@linux.vnet.ibm.com>
* Gautham R. Shenoy <ego@linux.vnet.ibm.com> [2020-12-04 10:18:46]:
> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
>
> On POWER systems, groups of threads within a core sharing the L2-cache
> can be indicated by the "ibm,thread-groups" property array with the
> identifier "2".
>
> This patch adds support for detecting this, and when present, populate
> the populating the cpu_l2_cache_mask of every CPU to the core-siblings
> which share L2 with the CPU as specified in the by the
> "ibm,thread-groups" property array.
>
> On a platform with the following "ibm,thread-group" configuration
> 00000001 00000002 00000004 00000000
> 00000002 00000004 00000006 00000001
> 00000003 00000005 00000007 00000002
> 00000002 00000004 00000000 00000002
> 00000004 00000006 00000001 00000003
> 00000005 00000007
>
> Without this patch, the sched-domain hierarchy for CPUs 0,1 would be
> CPU0 attaching sched-domain(s):
> domain-0: span=0,2,4,6 level=SMT
> domain-1: span=0-7 level=CACHE
> domain-2: span=0-15,24-39,48-55 level=MC
> domain-3: span=0-55 level=DIE
>
> CPU1 attaching sched-domain(s):
> domain-0: span=1,3,5,7 level=SMT
> domain-1: span=0-7 level=CACHE
> domain-2: span=0-15,24-39,48-55 level=MC
> domain-3: span=0-55 level=DIE
>
> The CACHE domain at 0-7 is incorrect since the ibm,thread-groups
> sub-array
> [00000002 00000002 00000004
> 00000000 00000002 00000004 00000006
> 00000001 00000003 00000005 00000007]
> indicates that L2 (Property "2") is shared only between the threads of a single
> group. There are "2" groups of threads where each group contains "4"
> threads each. The groups being {0,2,4,6} and {1,3,5,7}.
>
> With this patch, the sched-domain hierarchy for CPUs 0,1 would be
> CPU0 attaching sched-domain(s):
> domain-0: span=0,2,4,6 level=SMT
> domain-1: span=0-15,24-39,48-55 level=MC
> domain-2: span=0-55 level=DIE
>
> CPU1 attaching sched-domain(s):
> domain-0: span=1,3,5,7 level=SMT
> domain-1: span=0-15,24-39,48-55 level=MC
> domain-2: span=0-55 level=DIE
>
> The CACHE domain with span=0,2,4,6 for CPU 0 (span=1,3,5,7 for CPU 1
> resp.) gets degenerated into the SMT domain. Furthermore, the
> last-level-cache domain gets correctly set to the SMT sched-domain.
>
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
> arch/powerpc/kernel/smp.c | 66 +++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 58 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 6a242a3..a116d2d 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -76,6 +76,7 @@
> struct task_struct *secondary_current;
> bool has_big_cores;
> bool coregroup_enabled;
> +bool thread_group_shares_l2;
Either keep this as static in this patch or add its declaration
>
> DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
> DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
> @@ -99,6 +100,7 @@ enum {
>
> #define MAX_THREAD_LIST_SIZE 8
> #define THREAD_GROUP_SHARE_L1 1
> +#define THREAD_GROUP_SHARE_L2 2
> struct thread_groups {
> unsigned int property;
> unsigned int nr_groups;
> @@ -107,7 +109,7 @@ struct thread_groups {
> };
>
> /* Maximum number of properties that groups of threads within a core can share */
> -#define MAX_THREAD_GROUP_PROPERTIES 1
> +#define MAX_THREAD_GROUP_PROPERTIES 2
>
> struct thread_groups_list {
> unsigned int nr_properties;
> @@ -121,6 +123,13 @@ struct thread_groups_list {
> */
> DEFINE_PER_CPU(cpumask_var_t, cpu_l1_cache_map);
>
> +/*
> + * On some big-cores system, thread_group_l2_cache_map for each CPU
> + * corresponds to the set its siblings within the core that share the
> + * L2-cache.
> + */
> +DEFINE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
> +
NIT:
We are trying to confuse ourselves with the names.
For L1 we have cpu_l2_cache_map to store the tasks from the thread group.
but cpu_smallcore_map for keeping track of tasks.
For L2 we have thread_group_l2_cache_map to store the tasks from the thread
group. but cpu_l2_cache_map for keeping track of tasks.
I think we should do some renaming to keep the names consistent.
I would say probably say move the current cpu_l2_cache_map to
cpu_llc_cache_map and move the new aka thread_group_l2_cache_map as
cpu_l2_cache_map to be somewhat consistent.
> /* SMP operations for this machine */
> struct smp_ops_t *smp_ops;
>
> @@ -840,7 +851,8 @@ static int init_cpu_cache_map(int cpu, unsigned int cache_property)
> if (!dn)
> return -ENODATA;
>
> - if (!(cache_property == THREAD_GROUP_SHARE_L1))
> + if (!(cache_property == THREAD_GROUP_SHARE_L1 ||
> + cache_property == THREAD_GROUP_SHARE_L2))
> return -EINVAL;
>
> if (!cpu_tgl->nr_properties) {
> @@ -867,7 +879,10 @@ static int init_cpu_cache_map(int cpu, unsigned int cache_property)
> goto out;
> }
>
> - mask = &per_cpu(cpu_l1_cache_map, cpu);
> + if (cache_property == THREAD_GROUP_SHARE_L1)
> + mask = &per_cpu(cpu_l1_cache_map, cpu);
> + else if (cache_property == THREAD_GROUP_SHARE_L2)
> + mask = &per_cpu(thread_group_l2_cache_map, cpu);
>
> zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));
>
> @@ -973,6 +988,16 @@ static int init_big_cores(void)
> }
>
> has_big_cores = true;
> +
> + for_each_possible_cpu(cpu) {
> + int err = init_cpu_cache_map(cpu, THREAD_GROUP_SHARE_L2);
> +
> + if (err)
> + return err;
> + }
> +
> + thread_group_shares_l2 = true;
Why do we need a separate loop. Why cant we merge this in the above loop
itself?
> + pr_info("Thread-groups in a core share L2-cache\n");
Can this be moved to a pr_debug? Does it help any regular user/admins to
know if thread-groups shared l2 cache. Infact it may confuse users on what
thread groups are and which thread groups dont share cache.
I would prefer some other name than thread_group_shares_l2 but dont know any
better alternatives and may be my choices are even worse.
> return 0;
> }
>
> @@ -1287,6 +1312,31 @@ static bool update_mask_by_l2(int cpu, cpumask_var_t *mask)
> if (has_big_cores)
> submask_fn = cpu_smallcore_mask;
>
> +
NIT: extra blank line?
> + /*
> + * If the threads in a thread-group share L2 cache, then then
> + * the L2-mask can be obtained from thread_group_l2_cache_map.
> + */
> + if (thread_group_shares_l2) {
> + /* Siblings that share L1 is a subset of siblings that share L2.*/
> + or_cpumasks_related(cpu, cpu, submask_fn, cpu_l2_cache_mask);
> + if (*mask) {
> + cpumask_andnot(*mask,
> + per_cpu(thread_group_l2_cache_map, cpu),
> + cpu_l2_cache_mask(cpu));
> + } else {
> + mask = &per_cpu(thread_group_l2_cache_map, cpu);
> + }
> +
> + for_each_cpu(i, *mask) {
> + if (!cpu_online(i))
> + continue;
> + set_cpus_related(i, cpu, cpu_l2_cache_mask);
> + }
> +
> + return true;
> + }
> +
Ah this can be simplified to:
if (thread_group_shares_l2) {
cpumask_set_cpu(cpu, cpu_l2_cache_mask(cpu));
for_each_cpu(i, per_cpu(thread_group_l2_cache_map, cpu)) {
if (cpu_online(i))
set_cpus_related(i, cpu, cpu_l2_cache_mask);
}
}
No?
> l2_cache = cpu_to_l2cache(cpu);
> if (!l2_cache || !*mask) {
> /* Assume only core siblings share cache with this CPU */
--
Thanks and Regards
Srikar Dronamraju
^ permalink raw reply
* Re: [powerpc:next-test 54/220] arch/powerpc/kernel/vdso32/vgettimeofday.c:13:5: warning: no previous prototype for function '__c_kernel_clock_gettime64'
From: Michael Ellerman @ 2020-12-07 12:23 UTC (permalink / raw)
To: kernel test robot, Christophe Leroy
Cc: clang-built-linux, kbuild-all, linuxppc-dev
In-Reply-To: <202012042220.zO7hSFT2-lkp@intel.com>
kernel test robot <lkp@intel.com> writes:
> tree: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
> head: 4e4ed87981c764498942c52004c620bb8f104eac
> commit: d0e3fc69d00d1f50d22d6b6acfc555ccda80ad1e [54/220] powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32
> config: powerpc64-randconfig-r011-20201204 (attached as .config)
> compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 32c501dd88b62787d3a5ffda7aabcf4650dbe3cd)
> reproduce (this is a W=1 build):
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # install powerpc64 cross compiling tool for clang build
> # apt-get install binutils-powerpc64-linux-gnu
> # https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=d0e3fc69d00d1f50d22d6b6acfc555ccda80ad1e
> git remote add powerpc https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
> git fetch --no-tags powerpc next-test
> git checkout d0e3fc69d00d1f50d22d6b6acfc555ccda80ad1e
> # save the attached .config to linux build tree
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
>
> All warnings (new ones prefixed by >>):
>
> arch/powerpc/kernel/vdso32/vgettimeofday.c:7:5: error: conflicting types for '__c_kernel_clock_gettime'
> int __c_kernel_clock_gettime(clockid_t clock, struct old_timespec32 *ts,
> ^
We're building vdso32, which is 32-bit code, we pass -m32:
clang -Wp,-MMD,arch/powerpc/kernel/vdso32/.vgettimeofday.o.d -nostdinc -isystem /usr/lib/llvm-11/lib/clang/11.0.0/include -I/linux/arch/powerpc/include -I./arch/powerpc/include/generated -I/linux/include -I./include -I/linux/arch/powerpc/include/uapi -I./arch/powerpc/include/generated/uapi -I/linux/include/uapi -I./include/generated/uapi -include /linux/include/linux/kconfig.h -include /linux/include/linux/compiler_types.h -D__KERNEL__ -I /linux/arch/powerpc -DHAVE_AS_ATHIGH=1 -Qunused-arguments -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration -Werror=implicit-int -Werror=return-type -Wno-format-security -std=gnu89 --target=powerpc64le-linux-gnu --prefix=/usr/bin/powerpc64le-linux-gnu- --gcc-toolchain=/usr -no-integrated-as -Werror=unknown-warning-option -mlittle-endian -m64 -msoft-float -pipe -mcpu=power8 -mtune=power9 -mno-altivec -mno-vsx -mno-spe -fno-asynchronous-unwind-tables -Wa,-mpower4 -Wa,-many -mlittle-endian -fno-delete-null-pointer-checks -Wno-frame-address -Wno-address-of-packed-member -Os -Wframe-larger-than=2048 -fno-stack-protector -Wno-format-invalid-specifier -Wno-gnu -mno-global-merge -Wno-unused-const-variable -fomit-frame-pointer -Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wno-array-bounds -fno-strict-overflow -fno-stack-check -Werror=date-time -Werror=incompatible-pointer-types -fmacro-prefix-map=/linux/= -Wno-initializer-overrides -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-pointer-to-enum-cast -Wno-tautological-constant-out-of-range-compare -D_TASK_CPU=304 -shared -fno-common -fno-builtin -nostdlib -Wl,-soname=linux-vdso32.so.1 -Wl,--hash-style=both -include /linux/lib/vdso/gettimeofday.c -fno-stack-protector -DDISABLE_BRANCH_PROFILING -ffreestanding -fasynchronous-unwind-tables -I /linux/arch/powerpc/kernel/vdso32 -I ./arch/powerpc/kernel/vdso32 -DKBUILD_MODFILE='"arch/powerpc/kernel/vdso32/vgettimeofday"' -DKBUILD_BASENAME='"vgettimeofday"' -DKBUILD_MODNAME='"vgettimeofday"' -m32 -c -o arch/powerpc/kernel/vdso32/vgettimeofday.o /linux/arch/powerpc/kernel/vdso32/vgettimeofday.c
> arch/powerpc/include/asm/vdso/gettimeofday.h:183:5: note: previous declaration is here
> int __c_kernel_clock_gettime(clockid_t clock, struct __kernel_timespec *ts,
> ^
But this is inside an #ifdef __powerpc64__ block:
182 #ifdef __powerpc64__
183 int __c_kernel_clock_gettime(clockid_t clock, struct __kernel_timespec *ts,
184 const struct vdso_data *vd);
So is clang defining __powerpc64__ even for 32-bit code?
And the answer appears to be yes:
$ clang --version
Ubuntu clang version 11.0.0-2
Target: powerpc64le-unknown-linux-gnu
$ clang -m32 -dM -E - < /dev/null | grep powerpc
#define __powerpc64__ 1
#define __powerpc__ 1
Compare to gcc:
$ gcc --version
gcc (Ubuntu 10.2.0-13ubuntu1) 10.2.0
$ gcc -m32 -dM -E - < /dev/null | grep powerpc
#define __powerpc__ 1
#define powerpc 1
#define __powerpc 1
Which is fairly problematic, because we use the presence/absence of
__powerpc64__ to determine if we're building 64-bit/32-bit code in
several places.
Not sure what the best approach for fixing that is.
cheers
^ permalink raw reply
* Re: [PATCH 1/3] powerpc/smp: Parse ibm,thread-groups with multiple properties
From: Srikar Dronamraju @ 2020-12-07 12:10 UTC (permalink / raw)
To: Gautham R. Shenoy
Cc: Nathan Lynch, Michael Neuling, Vaidyanathan Srinivasan,
Peter Zijlstra, linux-kernel, Nicholas Piggin, linuxppc-dev,
Valentin Schneider
In-Reply-To: <1607057327-29822-2-git-send-email-ego@linux.vnet.ibm.com>
* Gautham R. Shenoy <ego@linux.vnet.ibm.com> [2020-12-04 10:18:45]:
> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
<snipped>
>
> static int parse_thread_groups(struct device_node *dn,
> - struct thread_groups *tg,
> - unsigned int property)
> + struct thread_groups_list *tglp)
> {
> - int i;
> - u32 thread_group_array[3 + MAX_THREAD_LIST_SIZE];
> + int i = 0;
> + u32 *thread_group_array;
> u32 *thread_list;
> size_t total_threads;
> - int ret;
> + int ret = 0, count;
> + unsigned int property_idx = 0;
NIT:
tglx mentions in one of his recent comments to try keep a reverse fir tree
ordering of variables where possible.
>
> + count = of_property_count_u32_elems(dn, "ibm,thread-groups");
> + thread_group_array = kcalloc(count, sizeof(u32), GFP_KERNEL);
> ret = of_property_read_u32_array(dn, "ibm,thread-groups",
> - thread_group_array, 3);
> + thread_group_array, count);
> if (ret)
> - return ret;
> -
> - tg->property = thread_group_array[0];
> - tg->nr_groups = thread_group_array[1];
> - tg->threads_per_group = thread_group_array[2];
> - if (tg->property != property ||
> - tg->nr_groups < 1 ||
> - tg->threads_per_group < 1)
> - return -ENODATA;
> + goto out_free;
>
> - total_threads = tg->nr_groups * tg->threads_per_group;
> + while (i < count && property_idx < MAX_THREAD_GROUP_PROPERTIES) {
> + int j;
> + struct thread_groups *tg = &tglp->property_tgs[property_idx++];
NIT: same as above.
>
> - ret = of_property_read_u32_array(dn, "ibm,thread-groups",
> - thread_group_array,
> - 3 + total_threads);
> - if (ret)
> - return ret;
> + tg->property = thread_group_array[i];
> + tg->nr_groups = thread_group_array[i + 1];
> + tg->threads_per_group = thread_group_array[i + 2];
> + total_threads = tg->nr_groups * tg->threads_per_group;
> +
> + thread_list = &thread_group_array[i + 3];
>
> - thread_list = &thread_group_array[3];
> + for (j = 0; j < total_threads; j++)
> + tg->thread_list[j] = thread_list[j];
> + i = i + 3 + total_threads;
Can't we simply use memcpy instead?
> + }
>
> - for (i = 0 ; i < total_threads; i++)
> - tg->thread_list[i] = thread_list[i];
> + tglp->nr_properties = property_idx;
>
> - return 0;
> +out_free:
> + kfree(thread_group_array);
> + return ret;
> }
>
> /*
> @@ -805,24 +827,39 @@ static int get_cpu_thread_group_start(int cpu, struct thread_groups *tg)
> return -1;
> }
>
> -static int init_cpu_l1_cache_map(int cpu)
> +static int init_cpu_cache_map(int cpu, unsigned int cache_property)
>
> {
> struct device_node *dn = of_get_cpu_node(cpu, NULL);
> - struct thread_groups tg = {.property = 0,
> - .nr_groups = 0,
> - .threads_per_group = 0};
> + struct thread_groups *tg = NULL;
> int first_thread = cpu_first_thread_sibling(cpu);
> int i, cpu_group_start = -1, err = 0;
> + cpumask_var_t *mask;
> + struct thread_groups_list *cpu_tgl = &tgl[cpu];
NIT: same as 1st comment.
>
> if (!dn)
> return -ENODATA;
>
> - err = parse_thread_groups(dn, &tg, THREAD_GROUP_SHARE_L1);
> - if (err)
> - goto out;
> + if (!(cache_property == THREAD_GROUP_SHARE_L1))
> + return -EINVAL;
>
> - cpu_group_start = get_cpu_thread_group_start(cpu, &tg);
> + if (!cpu_tgl->nr_properties) {
> + err = parse_thread_groups(dn, cpu_tgl);
> + if (err)
> + goto out;
> + }
> +
> + for (i = 0; i < cpu_tgl->nr_properties; i++) {
> + if (cpu_tgl->property_tgs[i].property == cache_property) {
> + tg = &cpu_tgl->property_tgs[i];
> + break;
> + }
> + }
> +
> + if (!tg)
> + return -EINVAL;
> +
> + cpu_group_start = get_cpu_thread_group_start(cpu, tg);
This whole hunk should be moved to a new function and called before
init_cpu_cache_map. It will simplify the logic to great extent.
>
> if (unlikely(cpu_group_start == -1)) {
> WARN_ON_ONCE(1);
> @@ -830,11 +867,12 @@ static int init_cpu_l1_cache_map(int cpu)
> goto out;
> }
>
> - zalloc_cpumask_var_node(&per_cpu(cpu_l1_cache_map, cpu),
> - GFP_KERNEL, cpu_to_node(cpu));
> + mask = &per_cpu(cpu_l1_cache_map, cpu);
> +
> + zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));
>
This hunk (and the next hunk) should be moved to next patch.
> for (i = first_thread; i < first_thread + threads_per_core; i++) {
> - int i_group_start = get_cpu_thread_group_start(i, &tg);
> + int i_group_start = get_cpu_thread_group_start(i, tg);
>
> if (unlikely(i_group_start == -1)) {
> WARN_ON_ONCE(1);
> @@ -843,7 +881,7 @@ static int init_cpu_l1_cache_map(int cpu)
> }
>
> if (i_group_start == cpu_group_start)
> - cpumask_set_cpu(i, per_cpu(cpu_l1_cache_map, cpu));
> + cpumask_set_cpu(i, *mask);
> }
>
> out:
> @@ -924,7 +962,7 @@ static int init_big_cores(void)
> int cpu;
>
> for_each_possible_cpu(cpu) {
> - int err = init_cpu_l1_cache_map(cpu);
> + int err = init_cpu_cache_map(cpu, THREAD_GROUP_SHARE_L1);
>
> if (err)
> return err;
> --
> 1.9.4
>
--
Thanks and Regards
Srikar Dronamraju
^ permalink raw reply
* Re: Build regressions/improvements in v5.10-rc7
From: Geert Uytterhoeven @ 2020-12-07 12:09 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: linuxppc-dev, Nicholas Piggin
In-Reply-To: <20201207120611.1315807-1-geert@linux-m68k.org>
On Mon, Dec 7, 2020 at 1:08 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> JFYI, when comparing v5.10-rc7[1] to v5.10-rc6[3], the summaries are:
> - build errors: +1/-0
+ /kisskb/src/arch/powerpc/platforms/powermac/smp.c: error: implicit
declaration of function 'cleanup_cpu_mmu_context'
[-Werror=implicit-function-declaration]: => 914:2
v5.10-rc7/powerpc-gcc4.9/pmac32_defconfig+SMP
> [1] http://kisskb.ellerman.id.au/kisskb/branch/linus/head/0477e92881850d44910a7e94fc2c46f96faa131f/ (all 192 configs)
> [3] http://kisskb.ellerman.id.au/kisskb/branch/linus/head/b65054597872ce3aefbc6a666385eabdf9e288da/ (all 192 configs)
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox