* [PATCH v2 1/8] bpf powerpc: Remove unused SEEN_STACK
From: Hari Bathini @ 2021-09-17 15:30 UTC (permalink / raw)
To: naveen.n.rao, mpe, ast, daniel
Cc: Ravi Bangoria, songliubraving, netdev, john.fastabend, andrii,
kpsingh, paulus, yhs, bpf, linuxppc-dev, kafai
In-Reply-To: <20210917153047.177141-1-hbathini@linux.ibm.com>
From: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
SEEN_STACK is unused on PowerPC. Remove it. Also, have
SEEN_TAILCALL use 0x40000000.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
* No changes in v2.
arch/powerpc/net/bpf_jit.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 99fad093f43e..d6267e93027a 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -116,8 +116,7 @@ static inline bool is_nearbranch(int offset)
#define COND_LE (CR0_GT | COND_CMP_FALSE)
#define SEEN_FUNC 0x20000000 /* might call external helpers */
-#define SEEN_STACK 0x40000000 /* uses BPF stack */
-#define SEEN_TAILCALL 0x80000000 /* uses tail calls */
+#define SEEN_TAILCALL 0x40000000 /* uses tail calls */
#define SEEN_VREG_MASK 0x1ff80000 /* Volatile registers r3-r12 */
#define SEEN_NVREG_MASK 0x0003ffff /* Non volatile registers r14-r31 */
--
2.31.1
^ permalink raw reply related
* [PATCH v2 0/8] bpf powerpc: Add BPF_PROBE_MEM support in powerpc JIT compiler
From: Hari Bathini @ 2021-09-17 15:30 UTC (permalink / raw)
To: naveen.n.rao, mpe, ast, daniel
Cc: songliubraving, netdev, john.fastabend, andrii, kpsingh, paulus,
yhs, bpf, linuxppc-dev, kafai, Hari Bathini
Patch #1 & #2 are simple cleanup patches. Patch #3 refactors JIT
compiler code with the aim to simplify adding BPF_PROBE_MEM support.
Patch #4 introduces PPC_RAW_BRANCH() macro instead of open coding
branch instruction. Patch #5 & #7 add BPF_PROBE_MEM support for PPC64
& PPC32 JIT compilers respectively. Patch #6 & #8 add explicit
addr > TASK_SIZE_MAX check to handle bad userspace pointers for
PPC64 & PPC32 cases respectively.
Link to v1 posted by Ravi:
https://lore.kernel.org/all/20210706073211.349889-1-ravi.bangoria@linux.ibm.com/
("[PATCH 0/4] bpf powerpc: Add BPF_PROBE_MEM support for 64bit JIT")
Hari Bathini (4):
bpf powerpc: refactor JIT compiler code
powerpc/ppc-opcode: introduce PPC_RAW_BRANCH() macro
bpf ppc32: Add BPF_PROBE_MEM support for JIT
bpf ppc32: Add addr > TASK_SIZE_MAX explicit check
Ravi Bangoria (4):
bpf powerpc: Remove unused SEEN_STACK
bpf powerpc: Remove extra_pass from bpf_jit_build_body()
bpf ppc64: Add BPF_PROBE_MEM support for JIT
bpf ppc64: Add addr > TASK_SIZE_MAX explicit check
arch/powerpc/include/asm/ppc-opcode.h | 2 +
arch/powerpc/net/bpf_jit.h | 19 ++--
arch/powerpc/net/bpf_jit_comp.c | 75 ++++++++++++++--
arch/powerpc/net/bpf_jit_comp32.c | 123 +++++++++++++++++++++-----
arch/powerpc/net/bpf_jit_comp64.c | 114 ++++++++++++++++--------
5 files changed, 260 insertions(+), 73 deletions(-)
--
2.31.1
^ permalink raw reply
* Re: [PATCH v5 5/6] sched/fair: Carve out logic to mark a group for asymmetric packing
From: Vincent Guittot @ 2021-09-17 15:27 UTC (permalink / raw)
To: Ricardo Neri
Cc: Juri Lelli, Aubrey Li, Srikar Dronamraju, Ravi V. Shankar,
Peter Zijlstra (Intel), Ricardo Neri, Ben Segall,
Srinivas Pandruvada, Joel Fernandes (Google), Ingo Molnar,
Rafael J . Wysocki, Steven Rostedt, Mel Gorman, Len Brown,
Nicholas Piggin, Aubrey Li, Dietmar Eggemann, Tim Chen,
Quentin Perret, Daniel Bristot de Oliveira, linux-kernel,
linuxppc-dev
In-Reply-To: <20210911011819.12184-6-ricardo.neri-calderon@linux.intel.com>
On Sat, 11 Sept 2021 at 03:19, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> Create a separate function, sched_asym(). A subsequent changeset will
> introduce logic to deal with SMT in conjunction with asmymmetric
> packing. Such logic will need the statistics of the scheduling
> group provided as argument. Update them before calling sched_asym().
>
> Cc: Aubrey Li <aubrey.li@intel.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Quentin Perret <qperret@google.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
> Changes since v4:
> * None
>
> Changes since v3:
> * Remove a redundant check for the local group in sched_asym().
> (Dietmar)
> * Reworded commit message for clarity. (Len)
>
> Changes since v2:
> * Introduced this patch.
>
> Changes since v1:
> * N/A
> ---
> kernel/sched/fair.c | 20 +++++++++++++-------
> 1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c5851260b4d8..26db017c14a3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8597,6 +8597,13 @@ group_type group_classify(unsigned int imbalance_pct,
> return group_has_spare;
> }
>
> +static inline bool
> +sched_asym(struct lb_env *env, struct sd_lb_stats *sds, struct sg_lb_stats *sgs,
> + struct sched_group *group)
> +{
> + return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
> +}
> +
> /**
> * update_sg_lb_stats - Update sched_group's statistics for load balancing.
> * @env: The load balancing environment.
> @@ -8657,18 +8664,17 @@ static inline void update_sg_lb_stats(struct lb_env *env,
> }
> }
>
> + sgs->group_capacity = group->sgc->capacity;
> +
> + sgs->group_weight = group->group_weight;
> +
> /* Check if dst CPU is idle and preferred to this group */
> if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
> - env->idle != CPU_NOT_IDLE &&
> - sgs->sum_h_nr_running &&
> - sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) {
> + env->idle != CPU_NOT_IDLE && sgs->sum_h_nr_running &&
> + sched_asym(env, sds, sgs, group)) {
> sgs->group_asym_packing = 1;
> }
>
> - sgs->group_capacity = group->sgc->capacity;
> -
> - sgs->group_weight = group->group_weight;
> -
> sgs->group_type = group_classify(env->sd->imbalance_pct, group, sgs);
>
> /* Computing avg_load makes sense only when group is overloaded */
> --
> 2.17.1
>
^ permalink raw reply
* Re: [PATCH v5 4/6] sched/fair: Provide update_sg_lb_stats() with sched domain statistics
From: Vincent Guittot @ 2021-09-17 15:27 UTC (permalink / raw)
To: Ricardo Neri
Cc: Juri Lelli, Aubrey Li, Srikar Dronamraju, Ravi V. Shankar,
Peter Zijlstra (Intel), Ricardo Neri, Ben Segall,
Srinivas Pandruvada, Joel Fernandes (Google), Ingo Molnar,
Rafael J . Wysocki, Steven Rostedt, Mel Gorman, Len Brown,
Nicholas Piggin, Aubrey Li, Dietmar Eggemann, Tim Chen,
Quentin Perret, Daniel Bristot de Oliveira, linux-kernel,
linuxppc-dev
In-Reply-To: <20210911011819.12184-5-ricardo.neri-calderon@linux.intel.com>
On Sat, 11 Sept 2021 at 03:19, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> Before deciding to pull tasks when using asymmetric packing of tasks,
> on some architectures (e.g., x86) it is necessary to know not only the
> state of dst_cpu but also of its SMT siblings. The decision to classify
> a candidate busiest group as group_asym_packing is done in
> update_sg_lb_stats(). Give this function access to the scheduling domain
> statistics, which contains the statistics of the local group.
>
> Cc: Aubrey Li <aubrey.li@intel.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Quentin Perret <qperret@google.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
> Changes since v4:
> * None
>
> Changes since v3:
> * None
>
> Changes since v2:
> * Introduced this patch.
>
> Changes since v1:
> * N/A
> ---
> kernel/sched/fair.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7a054f528bcc..c5851260b4d8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8605,6 +8605,7 @@ group_type group_classify(unsigned int imbalance_pct,
> * @sg_status: Holds flag indicating the status of the sched_group
> */
> static inline void update_sg_lb_stats(struct lb_env *env,
> + struct sd_lb_stats *sds,
> struct sched_group *group,
> struct sg_lb_stats *sgs,
> int *sg_status)
> @@ -8613,7 +8614,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>
> memset(sgs, 0, sizeof(*sgs));
>
> - local_group = cpumask_test_cpu(env->dst_cpu, sched_group_span(group));
> + local_group = group == sds->local;
>
> for_each_cpu_and(i, sched_group_span(group), env->cpus) {
> struct rq *rq = cpu_rq(i);
> @@ -9176,7 +9177,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
> update_group_capacity(env->sd, env->dst_cpu);
> }
>
> - update_sg_lb_stats(env, sg, sgs, &sg_status);
> + update_sg_lb_stats(env, sds, sg, sgs, &sg_status);
>
> if (local_group)
> goto next_group;
> --
> 2.17.1
>
^ permalink raw reply
* Re: [PATCH v5 3/6] sched/fair: Optimize checking for group_asym_packing
From: Vincent Guittot @ 2021-09-17 15:26 UTC (permalink / raw)
To: Ricardo Neri
Cc: Juri Lelli, Aubrey Li, Srikar Dronamraju, Ravi V. Shankar,
Peter Zijlstra (Intel), Ricardo Neri, Ben Segall,
Srinivas Pandruvada, Joel Fernandes (Google), Ingo Molnar,
Rafael J . Wysocki, Steven Rostedt, Mel Gorman, Len Brown,
Nicholas Piggin, Aubrey Li, Dietmar Eggemann, Tim Chen,
Quentin Perret, Daniel Bristot de Oliveira, linux-kernel,
linuxppc-dev
In-Reply-To: <20210911011819.12184-4-ricardo.neri-calderon@linux.intel.com>
On Sat, 11 Sept 2021 at 03:19, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> sched_asmy_prefer() always returns false when called on the local group. By
> checking local_group, we can avoid additional checks and invoking
> sched_asmy_prefer() when it is not needed. No functional changes are
> introduced.
>
> Cc: Aubrey Li <aubrey.li@intel.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Quentin Perret <qperret@google.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
> Changes since v4:
> * None
>
> Changes since v3:
> * Further rewording of the commit message. (Len)
>
> Changes since v2:
> * Reworded the commit message for clarity. (Peter Z)
>
> Changes since v1:
> * None
> ---
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ff69f245b939..7a054f528bcc 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8657,7 +8657,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
> }
>
> /* Check if dst CPU is idle and preferred to this group */
> - if (env->sd->flags & SD_ASYM_PACKING &&
> + if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
> env->idle != CPU_NOT_IDLE &&
> sgs->sum_h_nr_running &&
> sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) {
> --
> 2.17.1
>
^ permalink raw reply
* Re: [PATCH v5 2/6] sched/topology: Introduce sched_group::flags
From: Vincent Guittot @ 2021-09-17 15:26 UTC (permalink / raw)
To: Ricardo Neri
Cc: Juri Lelli, Aubrey Li, Srikar Dronamraju, Ravi V. Shankar,
Peter Zijlstra (Intel), Ricardo Neri, Ben Segall,
Srinivas Pandruvada, Joel Fernandes (Google), Ingo Molnar,
Rafael J . Wysocki, Steven Rostedt, Mel Gorman, Len Brown,
Nicholas Piggin, Aubrey Li, Dietmar Eggemann, Tim Chen,
Quentin Perret, Daniel Bristot de Oliveira, linux-kernel,
linuxppc-dev
In-Reply-To: <20210911011819.12184-3-ricardo.neri-calderon@linux.intel.com>
On Sat, 11 Sept 2021 at 03:19, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> There exist situations in which the load balance needs to know the
> properties of the CPUs in a scheduling group. When using asymmetric
> packing, for instance, the load balancer needs to know not only the
> state of dst_cpu but also of its SMT siblings, if any.
>
> Use the flags of the child scheduling domains to initialize scheduling
> group flags. This will reflect the properties of the CPUs in the
> group.
>
> A subsequent changeset will make use of these new flags. No functional
> changes are introduced.
>
> Cc: Aubrey Li <aubrey.li@intel.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Quentin Perret <qperret@google.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
> Changes since v4:
> * None
>
> Changes since v3:
> * Clear the flags of the scheduling groups of a domain if its child is
> destroyed.
> * Minor rewording of the commit message.
>
> Changes since v2:
> * Introduced this patch.
>
> Changes since v1:
> * N/A
> ---
> kernel/sched/sched.h | 1 +
> kernel/sched/topology.c | 21 ++++++++++++++++++---
> 2 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 3d3e5793e117..86ab33ce529d 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1809,6 +1809,7 @@ struct sched_group {
> unsigned int group_weight;
> struct sched_group_capacity *sgc;
> int asym_prefer_cpu; /* CPU of highest priority in group */
> + int flags;
>
> /*
> * The CPUs this group covers.
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 4e8698e62f07..c56faae461d9 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -716,8 +716,20 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
> tmp = sd;
> sd = sd->parent;
> destroy_sched_domain(tmp);
> - if (sd)
> + if (sd) {
> + struct sched_group *sg = sd->groups;
> +
> + /*
> + * sched groups hold the flags of the child sched
> + * domain for convenience. Clear such flags since
> + * the child is being destroyed.
> + */
> + do {
> + sg->flags = 0;
> + } while (sg != sd->groups);
> +
> sd->child = NULL;
> + }
> }
>
> for (tmp = sd; tmp; tmp = tmp->parent)
> @@ -916,10 +928,12 @@ build_group_from_child_sched_domain(struct sched_domain *sd, int cpu)
> return NULL;
>
> sg_span = sched_group_span(sg);
> - if (sd->child)
> + if (sd->child) {
> cpumask_copy(sg_span, sched_domain_span(sd->child));
> - else
> + sg->flags = sd->child->flags;
> + } else {
> cpumask_copy(sg_span, sched_domain_span(sd));
> + }
>
> atomic_inc(&sg->ref);
> return sg;
> @@ -1169,6 +1183,7 @@ static struct sched_group *get_group(int cpu, struct sd_data *sdd)
> if (child) {
> cpumask_copy(sched_group_span(sg), sched_domain_span(child));
> cpumask_copy(group_balance_mask(sg), sched_group_span(sg));
> + sg->flags = child->flags;
> } else {
> cpumask_set_cpu(cpu, sched_group_span(sg));
> cpumask_set_cpu(cpu, group_balance_mask(sg));
> --
> 2.17.1
>
^ permalink raw reply
* Re: [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
From: Vincent Guittot @ 2021-09-17 15:25 UTC (permalink / raw)
To: Ricardo Neri
Cc: Juri Lelli, Aubrey Li, Srikar Dronamraju, Ravi V. Shankar,
Peter Zijlstra (Intel), Ricardo Neri, Ben Segall,
Srinivas Pandruvada, Joel Fernandes (Google), Ingo Molnar,
Rafael J . Wysocki, Steven Rostedt, Mel Gorman, Len Brown,
Nicholas Piggin, Aubrey Li, Dietmar Eggemann, Tim Chen,
Quentin Perret, Daniel Bristot de Oliveira, linux-kernel,
linuxppc-dev
In-Reply-To: <20210917010044.GA23727@ranerica-svr.sc.intel.com>
On Fri, 17 Sept 2021 at 03:01, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> On Wed, Sep 15, 2021 at 05:43:44PM +0200, Vincent Guittot wrote:
> > On Sat, 11 Sept 2021 at 03:19, Ricardo Neri
> > <ricardo.neri-calderon@linux.intel.com> wrote:
> > >
> > > When deciding to pull tasks in ASYM_PACKING, it is necessary not only to
> > > check for the idle state of the destination CPU, dst_cpu, but also of
> > > its SMT siblings.
> > >
> > > If dst_cpu is idle but its SMT siblings are busy, performance suffers
> > > if it pulls tasks from a medium priority CPU that does not have SMT
> > > siblings.
> > >
> > > Implement asym_smt_can_pull_tasks() to inspect the state of the SMT
> > > siblings of both dst_cpu and the CPUs in the candidate busiest group.
> > >
> > > Cc: Aubrey Li <aubrey.li@intel.com>
> > > Cc: Ben Segall <bsegall@google.com>
> > > Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> > > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > > Cc: Mel Gorman <mgorman@suse.de>
> > > Cc: Quentin Perret <qperret@google.com>
> > > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Cc: Tim Chen <tim.c.chen@linux.intel.com>
> > > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > Reviewed-by: Len Brown <len.brown@intel.com>
> > > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > > ---
> > > Changes since v4:
> > > * Use sg_lb_stats::sum_nr_running the idle state of a scheduling group.
> > > (Vincent, Peter)
> > > * Do not even idle CPUs in asym_smt_can_pull_tasks(). (Vincent)
> > > * Updated function documentation and corrected a typo.
> > >
> > > Changes since v3:
> > > * Removed the arch_asym_check_smt_siblings() hook. Discussions with the
> > > powerpc folks showed that this patch should not impact them. Also, more
> > > recent powerpc processor no longer use asym_packing. (PeterZ)
> > > * Removed unnecessary local variable in asym_can_pull_tasks(). (Dietmar)
> > > * Removed unnecessary check for local CPUs when the local group has zero
> > > utilization. (Joel)
> > > * Renamed asym_can_pull_tasks() as asym_smt_can_pull_tasks() to reflect
> > > the fact that it deals with SMT cases.
> > > * Made asym_smt_can_pull_tasks() return false for !CONFIG_SCHED_SMT so
> > > that callers can deal with non-SMT cases.
> > >
> > > Changes since v2:
> > > * Reworded the commit message to reflect updates in code.
> > > * Corrected misrepresentation of dst_cpu as the CPU doing the load
> > > balancing. (PeterZ)
> > > * Removed call to arch_asym_check_smt_siblings() as it is now called in
> > > sched_asym().
> > >
> > > Changes since v1:
> > > * Don't bailout in update_sd_pick_busiest() if dst_cpu cannot pull
> > > tasks. Instead, reclassify the candidate busiest group, as it
> > > may still be selected. (PeterZ)
> > > * Avoid an expensive and unnecessary call to cpumask_weight() when
> > > determining if a sched_group is comprised of SMT siblings.
> > > (PeterZ).
> > > ---
> > > kernel/sched/fair.c | 94 +++++++++++++++++++++++++++++++++++++++++++++
> > > 1 file changed, 94 insertions(+)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 26db017c14a3..8d763dd0174b 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8597,10 +8597,98 @@ group_type group_classify(unsigned int imbalance_pct,
> > > return group_has_spare;
> > > }
> > >
> > > +/**
> > > + * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
> > > + * @dst_cpu: Destination CPU of the load balancing
> > > + * @sds: Load-balancing data with statistics of the local group
> > > + * @sgs: Load-balancing statistics of the candidate busiest group
> > > + * @sg: The candidate busiest group
> > > + *
> > > + * Check the state of the SMT siblings of both @sds::local and @sg and decide
> > > + * if @dst_cpu can pull tasks.
> > > + *
> > > + * If @dst_cpu does not have SMT siblings, it can pull tasks if two or more of
> > > + * the SMT siblings of @sg are busy. If only one CPU in @sg is busy, pull tasks
> > > + * only if @dst_cpu has higher priority.
> > > + *
> > > + * If both @dst_cpu and @sg have SMT siblings, and @sg has exactly one more
> > > + * busy CPU than @sds::local, let @dst_cpu pull tasks if it has higher priority.
> > > + * Bigger imbalances in the number of busy CPUs will be dealt with in
> > > + * update_sd_pick_busiest().
> > > + *
> > > + * If @sg does not have SMT siblings, only pull tasks if all of the SMT siblings
> > > + * of @dst_cpu are idle and @sg has lower priority.
> > > + */
> > > +static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> > > + struct sg_lb_stats *sgs,
> > > + struct sched_group *sg)
> > > +{
> > > +#ifdef CONFIG_SCHED_SMT
> > > + bool local_is_smt, sg_is_smt;
> > > + int sg_busy_cpus;
> > > +
> > > + local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
> > > + sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
> > > +
> > > + sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
> > > +
> > > + if (!local_is_smt) {
> > > + /*
> > > + * If we are here, @dst_cpu is idle and does not have SMT
> > > + * siblings. Pull tasks if candidate group has two or more
> > > + * busy CPUs.
> > > + */
> > > + if (sg_is_smt && sg_busy_cpus >= 2)
> >
> > Do you really need to test sg_is_smt ? if sg_busy_cpus >= 2 then
> > sd_is_smt must be true ?
>
> Thank you very much for your feedback Vincent!
>
> Yes, it is true that sg_busy_cpus >=2 is only true if @sg is SMT. I will
> remove this check.
>
> >
> > Also, This is the default behavior where we want to even the number of
> > busy cpu. Shouldn't you return false and fall back to the default
> > behavior ?
>
> This is also true.
>
> >
> > That being said, the default behavior tries to even the number of idle
> > cpus which is easier to compute and is equal to even the number of
> > busy cpus in "normal" system with the same number of cpus in groups
> > but this is not the case here. It could be good to change the default
> > behavior to even the number of busy cpus and that you use the default
> > behavior here. Additional condition will be used to select the busiest
> > group like more busy cpu or more number of running tasks
>
> That is a very good observation. Checking the number of idle CPUs
> assumes that both groups have the same number of CPUs. I'll look into
> modifying the default behavior.
Because this change will impact default smt/smp system, we might
prefer to do that in a separate step
With the removal of the condition !sds->local_stat.sum_nr_running
which seems useless because dst_cpu is idle and not SMT, this patch
looks good to me
>
> >
> > > + return true;
> > > +
> > > + /*
> > > + * @dst_cpu does not have SMT siblings. @sg may have SMT
> > > + * siblings and only one is busy. In such case, @dst_cpu
> > > + * can help if it has higher priority and is idle (i.e.,
> > > + * it has no running tasks).
> >
> > The previous comment above assume that "@dst_cpu is idle" but now you
> > need to check that sds->local_stat.sum_nr_running == 0
>
> But we already know that, right? We are here because in
> update_sg_lb_stats() we determine that dst CPU is idle (env->idle !=
> CPU_NOT_IDLE).
>
> Thanks and BR,
> Ricardo
^ permalink raw reply
* RE: [PATCH v2] powerpc/32: Don't use a struct based type for pte_t
From: David Laight @ 2021-09-17 14:32 UTC (permalink / raw)
To: 'Christophe Leroy', Benjamin Herrenschmidt,
Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
In-Reply-To: <c904599f33aaf6bb7ee2836a9ff8368509e0d78d.1631887042.git.christophe.leroy@csgroup.eu>
From: Christophe Leroy
> Sent: 17 September 2021 14:58
>
> Long time ago we had a config item called STRICT_MM_TYPECHECKS
> to build the kernel with pte_t defined as a structure in order
> to perform additional build checks or build it with pte_t
> defined as a simple type in order to get simpler generated code.
>
...
> diff --git a/arch/powerpc/include/asm/pgtable-types.h b/arch/powerpc/include/asm/pgtable-types.h
> index d11b4c61d686..c60199fc6fa6 100644
> --- a/arch/powerpc/include/asm/pgtable-types.h
> +++ b/arch/powerpc/include/asm/pgtable-types.h
> @@ -5,14 +5,26 @@
> /* PTE level */
> #if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
> typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
> -#else
> +#elif defined(__CHECKER__) || !defined(CONFIG_PPC32)
> typedef struct { pte_basic_t pte; } pte_t;
> +#else
> +typedef pte_basic_t pte_t;
> #endif
> +
> +#if defined(__CHECKER__) || !defined(CONFIG_PPC32) || \
> + (defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES))
> #define __pte(x) ((pte_t) { (x) })
> static inline pte_basic_t pte_val(pte_t x)
> {
> return x.pte;
> }
> +#else
> +#define __pte(x) ((pte_t)(x))
> +static inline pte_basic_t pte_val(pte_t x)
> +{
> + return x;
> +}
> +#endif
Would it be better to define:
static inline pte_basic_*pte_basic(pte_t *x)
{
#if xxx
return x;
#else
return &x->pte;
#endif
}
Then pte_val(x) is always *pt_basic(x)
and the casts like:
> - pte_basic_t *entry = &ptep->pte;
> + pte_basic_t *entry = (pte_basic_t *)ptep;
can go away.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply
* [PATCH v2] powerpc/32: Don't use a struct based type for pte_t
From: Christophe Leroy @ 2021-09-17 13:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
Long time ago we had a config item called STRICT_MM_TYPECHECKS
to build the kernel with pte_t defined as a structure in order
to perform additional build checks or build it with pte_t
defined as a simple type in order to get simpler generated code.
Commit 670eea924198 ("powerpc/mm: Always use STRICT_MM_TYPECHECKS")
made the struct based definition the only one, considering that the
generated code was similar in both cases.
That's right on ppc64 because the ABI is such that the content of a
struct having a single simple type element is passed as register,
but on ppc32 such a structure is passed via the stack like any
structure.
Simple test function:
pte_t test(pte_t pte)
{
return pte;
}
Before this patch we get
c00108ec <test>:
c00108ec: 81 24 00 00 lwz r9,0(r4)
c00108f0: 91 23 00 00 stw r9,0(r3)
c00108f4: 4e 80 00 20 blr
So, for PPC32, restore the simple type behaviour we got before
commit 670eea924198, but instead of adding a config option to
activate type check, do it when __CHECKER__ is set so that type
checking is performed by 'sparse' and provides feedback like:
arch/powerpc/mm/pgtable.c:466:16: warning: incorrect type in return expression (different base types)
arch/powerpc/mm/pgtable.c:466:16: expected unsigned long
arch/powerpc/mm/pgtable.c:466:16: got struct pte_t [usertype] x
With this patch we now get
c0010890 <test>:
c0010890: 4e 80 00 20 blr
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v2: Properly handle 8xx 16k pages
---
arch/powerpc/include/asm/nohash/32/pgtable.h | 2 +-
arch/powerpc/include/asm/pgtable-types.h | 14 +++++++++++++-
arch/powerpc/mm/pgtable.c | 2 +-
3 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index f06ae00f2a65..34ce50da1850 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -245,7 +245,7 @@ static int number_of_cells_per_pte(pmd_t *pmd, pte_basic_t val, int huge)
static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, pte_t *p,
unsigned long clr, unsigned long set, int huge)
{
- pte_basic_t *entry = &p->pte;
+ pte_basic_t *entry = (pte_basic_t *)p;
pte_basic_t old = pte_val(*p);
pte_basic_t new = (old & ~(pte_basic_t)clr) | set;
int num, i;
diff --git a/arch/powerpc/include/asm/pgtable-types.h b/arch/powerpc/include/asm/pgtable-types.h
index d11b4c61d686..c60199fc6fa6 100644
--- a/arch/powerpc/include/asm/pgtable-types.h
+++ b/arch/powerpc/include/asm/pgtable-types.h
@@ -5,14 +5,26 @@
/* PTE level */
#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
-#else
+#elif defined(__CHECKER__) || !defined(CONFIG_PPC32)
typedef struct { pte_basic_t pte; } pte_t;
+#else
+typedef pte_basic_t pte_t;
#endif
+
+#if defined(__CHECKER__) || !defined(CONFIG_PPC32) || \
+ (defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES))
#define __pte(x) ((pte_t) { (x) })
static inline pte_basic_t pte_val(pte_t x)
{
return x.pte;
}
+#else
+#define __pte(x) ((pte_t)(x))
+static inline pte_basic_t pte_val(pte_t x)
+{
+ return x;
+}
+#endif
/* PMD level */
#ifdef CONFIG_PPC64
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index cd16b407f47e..ce9482383144 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -271,7 +271,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_
{
pmd_t *pmd = pmd_off(mm, addr);
pte_basic_t val;
- pte_basic_t *entry = &ptep->pte;
+ pte_basic_t *entry = (pte_basic_t *)ptep;
int num, i;
/*
--
2.31.1
^ permalink raw reply related
* [PATCH v2] powerpc/8xx: Simplify TLB handling
From: Christophe Leroy @ 2021-09-17 13:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In the old days, TLB handling for 8xx was using tlbie and tlbia
instructions directly as much as possible.
But commit f048aace29e0 ("powerpc/mm: Add SMP support to no-hash
TLB handling") broke that by introducing out-of-line unnecessary
complex functions for booke/smp which don't have tlbie/tlbia
instructions and require more complex handling.
Restore direct use of tlbie and tlbia for 8xx which is never SMP.
With this patch we now get
c00ecc68 <ptep_clear_flush>:
c00ecc68: 39 00 00 00 li r8,0
c00ecc6c: 81 46 00 00 lwz r10,0(r6)
c00ecc70: 91 06 00 00 stw r8,0(r6)
c00ecc74: 7c 00 2a 64 tlbie r5,r0
c00ecc78: 7c 00 04 ac hwsync
c00ecc7c: 91 43 00 00 stw r10,0(r3)
c00ecc80: 4e 80 00 20 blr
Before it was
c0012880 <local_flush_tlb_page>:
c0012880: 2c 03 00 00 cmpwi r3,0
c0012884: 41 82 00 54 beq c00128d8 <local_flush_tlb_page+0x58>
c0012888: 81 22 00 00 lwz r9,0(r2)
c001288c: 81 43 00 20 lwz r10,32(r3)
c0012890: 39 29 00 01 addi r9,r9,1
c0012894: 91 22 00 00 stw r9,0(r2)
c0012898: 2c 0a 00 00 cmpwi r10,0
c001289c: 41 82 00 10 beq c00128ac <local_flush_tlb_page+0x2c>
c00128a0: 81 2a 01 dc lwz r9,476(r10)
c00128a4: 2c 09 ff ff cmpwi r9,-1
c00128a8: 41 82 00 0c beq c00128b4 <local_flush_tlb_page+0x34>
c00128ac: 7c 00 22 64 tlbie r4,r0
c00128b0: 7c 00 04 ac hwsync
c00128b4: 81 22 00 00 lwz r9,0(r2)
c00128b8: 39 29 ff ff addi r9,r9,-1
c00128bc: 2c 09 00 00 cmpwi r9,0
c00128c0: 91 22 00 00 stw r9,0(r2)
c00128c4: 4c a2 00 20 bclr+ 4,eq
c00128c8: 81 22 00 70 lwz r9,112(r2)
c00128cc: 71 29 00 04 andi. r9,r9,4
c00128d0: 4d 82 00 20 beqlr
c00128d4: 48 65 76 74 b c0669f48 <preempt_schedule>
c00128d8: 81 22 00 00 lwz r9,0(r2)
c00128dc: 39 29 00 01 addi r9,r9,1
c00128e0: 91 22 00 00 stw r9,0(r2)
c00128e4: 4b ff ff c8 b c00128ac <local_flush_tlb_page+0x2c>
...
c00ecdc8 <ptep_clear_flush>:
c00ecdc8: 94 21 ff f0 stwu r1,-16(r1)
c00ecdcc: 39 20 00 00 li r9,0
c00ecdd0: 93 c1 00 08 stw r30,8(r1)
c00ecdd4: 83 c6 00 00 lwz r30,0(r6)
c00ecdd8: 91 26 00 00 stw r9,0(r6)
c00ecddc: 93 e1 00 0c stw r31,12(r1)
c00ecde0: 7c 08 02 a6 mflr r0
c00ecde4: 7c 7f 1b 78 mr r31,r3
c00ecde8: 7c 83 23 78 mr r3,r4
c00ecdec: 7c a4 2b 78 mr r4,r5
c00ecdf0: 90 01 00 14 stw r0,20(r1)
c00ecdf4: 4b f2 5a 8d bl c0012880 <local_flush_tlb_page>
c00ecdf8: 93 df 00 00 stw r30,0(r31)
c00ecdfc: 7f e3 fb 78 mr r3,r31
c00ece00: 80 01 00 14 lwz r0,20(r1)
c00ece04: 83 c1 00 08 lwz r30,8(r1)
c00ece08: 83 e1 00 0c lwz r31,12(r1)
c00ece0c: 7c 08 03 a6 mtlr r0
c00ece10: 38 21 00 10 addi r1,r1,16
c00ece14: 4e 80 00 20 blr
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v2: Remove tracing of tlbie and tlbia as it generates circular dependency problems. Will see later if it can be added back. Not a big issue though.
---
arch/powerpc/include/asm/nohash/tlbflush.h | 15 +++++++++++++++
arch/powerpc/mm/nohash/tlb.c | 2 ++
2 files changed, 17 insertions(+)
diff --git a/arch/powerpc/include/asm/nohash/tlbflush.h b/arch/powerpc/include/asm/nohash/tlbflush.h
index 1edb7243e515..c08d25e3e626 100644
--- a/arch/powerpc/include/asm/nohash/tlbflush.h
+++ b/arch/powerpc/include/asm/nohash/tlbflush.h
@@ -32,11 +32,26 @@ extern void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
unsigned long end);
extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
+#ifdef CONFIG_PPC_8xx
+static inline void local_flush_tlb_mm(struct mm_struct *mm)
+{
+ unsigned int pid = READ_ONCE(mm->context.id);
+
+ if (pid != MMU_NO_CONTEXT)
+ asm volatile ("sync; tlbia; isync" : : : "memory");
+}
+
+static inline void local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
+{
+ asm volatile ("tlbie %0; sync" : : "r" (vmaddr) : "memory");
+}
+#else
extern void local_flush_tlb_mm(struct mm_struct *mm);
extern void local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
extern void __local_flush_tlb_page(struct mm_struct *mm, unsigned long vmaddr,
int tsize, int ind);
+#endif
#ifdef CONFIG_SMP
extern void flush_tlb_mm(struct mm_struct *mm);
diff --git a/arch/powerpc/mm/nohash/tlb.c b/arch/powerpc/mm/nohash/tlb.c
index 5872f69141d5..6d8cb68f6efb 100644
--- a/arch/powerpc/mm/nohash/tlb.c
+++ b/arch/powerpc/mm/nohash/tlb.c
@@ -185,6 +185,7 @@ EXPORT_PER_CPU_SYMBOL(next_tlbcam_idx);
* processor
*/
+#ifndef CONFIG_PPC_8xx
/*
* These are the base non-SMP variants of page and mm flushing
*/
@@ -218,6 +219,7 @@ void local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
mmu_get_tsize(mmu_virtual_psize), 0);
}
EXPORT_SYMBOL(local_flush_tlb_page);
+#endif
/*
* And here are the SMP non-local implementations
--
2.31.1
^ permalink raw reply related
* Re: [PATCH] powerpc: warn on emulation of dcbz instruction
From: Benjamin Herrenschmidt @ 2021-09-17 12:33 UTC (permalink / raw)
To: David Laight, 'Christophe Leroy', Paul Mackerras,
Michael Ellerman
Cc: Finn Thain, linuxppc-dev@lists.ozlabs.org,
linux-kernel@vger.kernel.org, Stan Johnson
In-Reply-To: <2fc6953808854149965e60f340eca94b@AcuMS.aculab.com>
On Thu, 2021-09-16 at 14:36 +0000, David Laight wrote:
> > Does userspace accesses non-cached memory directly ?
>
>
> It probably can if a driver mmaps PCI space directly into user space.
>
> That certainly works on x86-64.
The posterchild for that is Xorg
Cheers,
Ben.
^ permalink raw reply
* [PATCH] powerpc/32: Don't use a struct based type for pte_t
From: Christophe Leroy @ 2021-09-17 10:54 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
Long time ago we had a config item called STRICT_MM_TYPECHECKS
to build the kernel with pte_t defined as a structure in order
to perform additional build checks or build it with pte_t
defined as a simple type in order to get simpler generated code.
Commit 670eea924198 ("powerpc/mm: Always use STRICT_MM_TYPECHECKS")
made the struct based definition the only one, considering that the
generated code was similar in both cases.
That's right on ppc64 because the ABI is such that the content of a
struct having a single simple type element is passed as register,
but on ppc32 such a structure is passed via the stack like any
structure.
Simple test function:
pte_t test(pte_t pte)
{
return pte;
}
Before this patch we get
c00108ec <test>:
c00108ec: 81 24 00 00 lwz r9,0(r4)
c00108f0: 91 23 00 00 stw r9,0(r3)
c00108f4: 4e 80 00 20 blr
So, for PPC32, restore the simple type behaviour we got before
commit 670eea924198, but instead of adding a config option to
activate type check, do it when __CHECKER__ is set so that type
checking is performed by 'sparse' and provides feedback like:
arch/powerpc/mm/pgtable.c:466:16: warning: incorrect type in return expression (different base types)
arch/powerpc/mm/pgtable.c:466:16: expected unsigned long
arch/powerpc/mm/pgtable.c:466:16: got struct pte_t [usertype] x
With this patch we now get
c0010890 <test>:
c0010890: 4e 80 00 20 blr
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/nohash/32/pgtable.h | 2 +-
arch/powerpc/include/asm/pgtable-types.h | 13 ++++++++++++-
arch/powerpc/mm/pgtable.c | 2 +-
3 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index f06ae00f2a65..34ce50da1850 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -245,7 +245,7 @@ static int number_of_cells_per_pte(pmd_t *pmd, pte_basic_t val, int huge)
static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, pte_t *p,
unsigned long clr, unsigned long set, int huge)
{
- pte_basic_t *entry = &p->pte;
+ pte_basic_t *entry = (pte_basic_t *)p;
pte_basic_t old = pte_val(*p);
pte_basic_t new = (old & ~(pte_basic_t)clr) | set;
int num, i;
diff --git a/arch/powerpc/include/asm/pgtable-types.h b/arch/powerpc/include/asm/pgtable-types.h
index d11b4c61d686..e5ed4580a62c 100644
--- a/arch/powerpc/include/asm/pgtable-types.h
+++ b/arch/powerpc/include/asm/pgtable-types.h
@@ -5,14 +5,25 @@
/* PTE level */
#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
-#else
+#elif defined(__CHECKER__) || !defined(CONFIG_PPC32)
typedef struct { pte_basic_t pte; } pte_t;
+#else
+typedef pte_basic_t pte_t;
#endif
+
+#if defined(__CHECKER__) || !defined(CONFIG_PPC32)
#define __pte(x) ((pte_t) { (x) })
static inline pte_basic_t pte_val(pte_t x)
{
return x.pte;
}
+#else
+#define __pte(x) ((pte_t)(x))
+static inline pte_basic_t pte_val(pte_t x)
+{
+ return x;
+}
+#endif
/* PMD level */
#ifdef CONFIG_PPC64
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index cd16b407f47e..ce9482383144 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -271,7 +271,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_
{
pmd_t *pmd = pmd_off(mm, addr);
pte_basic_t val;
- pte_basic_t *entry = &ptep->pte;
+ pte_basic_t *entry = (pte_basic_t *)ptep;
int num, i;
/*
--
2.31.1
^ permalink raw reply related
* Re: [PATCH] swiotlb: set IO TLB segment size via cmdline
From: Roman Skakun @ 2021-09-17 10:53 UTC (permalink / raw)
To: Jan Beulich
Cc: linux-doc, Peter Zijlstra, Viresh Kumar,
Linux Kernel Mailing List, Paul Mackerras, Will Deacon,
Boris Ostrovsky, Marek Szyprowski, Stefano Stabellini,
Jonathan Corbet, Christoph Hellwig, xen-devel, Paul E. McKenney,
Konrad Rzeszutek Wilk, Muchun Song, Thomas Gleixner,
Juergen Gross, Thomas Bogendoerfer, Andrii Anisov, linuxppc-dev,
Randy Dunlap, linux-mips, iommu, Roman Skakun, Andrew Morton,
Lu Baolu, Robin Murphy, Mike Rapoport, Maciej W. Rozycki
In-Reply-To: <84ef7ff7-2c9c-113a-4a2c-cef54a6ded51@suse.com>
Hi Jan,
>>> In order to be sure to catch all uses like this one (including ones
>>> which make it upstream in parallel to yours), I think you will want
>>> to rename the original IO_TLB_SEGSIZE to e.g. IO_TLB_DEFAULT_SEGSIZE.
>>
>> I don't understand your point. Can you clarify this?
>
> There's a concrete present example: I have a patch pending adding
> another use of IO_TLB_SEGSIZE. This use would need to be replaced
> like you do here in several places. The need for the additional
> replacement would be quite obvious (from a build failure) if you
> renamed the manifest constant. Without renaming, it'll take
> someone running into an issue on a live system, which I consider
> far worse. This is because a simple re-basing of one of the
> patches on top of the other will not point out the need for the
> extra replacement, nor would a test build (with both patches in
> place).
It's reasonable.
I will change the original IO_TLB_SEGSIZE to IO_TLB_DEFAULT_SEGSIZE in the
next patch series.
Thanks.
ср, 15 сент. 2021 г. в 16:50, Jan Beulich <jbeulich@suse.com>:
>
> On 15.09.2021 15:37, Roman Skakun wrote:
> >>> From: Roman Skakun <roman_skakun@epam.com>
> >>>
> >>> It is possible when default IO TLB size is not
> >>> enough to fit a long buffers as described here [1].
> >>>
> >>> This patch makes a way to set this parameter
> >>> using cmdline instead of recompiling a kernel.
> >>>
> >>> [1] https://www.xilinx.com/support/answers/72694.html
> >>
> >> I'm not convinced the swiotlb use describe there falls under "intended
> >> use" - mapping a 1280x720 framebuffer in a single chunk?
> >
> > I had the same issue while mapping DMA chuck ~4MB for gem fb when
> > using xen vdispl.
> > I got the next log:
> > [ 142.030421] rcar-fcp fea2f000.fcp: swiotlb buffer is full (sz:
> > 3686400 bytes), total 32768 (slots), used 32 (slots)
> >
> > It happened when I tried to map bounce buffer, which has a large size.
> > The default size if 128(IO_TLB_SEGSIZE) * 2048(IO_TLB_SHIFT) = 262144
> > bytes, but we requested 3686400 bytes.
> > When I change IO_TLB_SEGSIZE to 2048. (2048(IO_TLB_SEGSIZE) *
> > 2048(IO_TLB_SHIFT) = 4194304bytes).
> > It makes possible to retrieve a bounce buffer for requested size.
> > After changing this value, the problem is gone.
>
> But the question remains: Why does the framebuffer need to be mapped
> in a single giant chunk?
>
> >> In order to be sure to catch all uses like this one (including ones
> >> which make it upstream in parallel to yours), I think you will want
> >> to rename the original IO_TLB_SEGSIZE to e.g. IO_TLB_DEFAULT_SEGSIZE.
> >
> > I don't understand your point. Can you clarify this?
>
> There's a concrete present example: I have a patch pending adding
> another use of IO_TLB_SEGSIZE. This use would need to be replaced
> like you do here in several places. The need for the additional
> replacement would be quite obvious (from a build failure) if you
> renamed the manifest constant. Without renaming, it'll take
> someone running into an issue on a live system, which I consider
> far worse. This is because a simple re-basing of one of the
> patches on top of the other will not point out the need for the
> extra replacement, nor would a test build (with both patches in
> place).
>
> Jan
>
--
Best Regards, Roman.
^ permalink raw reply
* [powerpc:merge] BUILD SUCCESS a0175bcd45a4dd0de301dca8fd17d74fcb3ce240
From: kernel test robot @ 2021-09-17 10:30 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: a0175bcd45a4dd0de301dca8fd17d74fcb3ce240 powerpc/ci: Switch GCC 4.9.4 builds to 5.5.0
elapsed time: 1069m
configs tested: 105
configs skipped: 3
The following configs have been built successfully.
More configs may be tested in the coming days.
gcc tested configs:
arm defconfig
arm64 allyesconfig
arm64 defconfig
arm allyesconfig
arm allmodconfig
i386 randconfig-c001-20210916
sh rsk7269_defconfig
powerpc sequoia_defconfig
mips bmips_be_defconfig
riscv allnoconfig
powerpc mpc85xx_cds_defconfig
s390 zfcpdump_defconfig
powerpc arches_defconfig
m68k m5275evb_defconfig
powerpc powernv_defconfig
mips rm200_defconfig
mips cu1000-neo_defconfig
arm ep93xx_defconfig
xtensa audio_kc705_defconfig
xtensa iss_defconfig
powerpc linkstation_defconfig
sh ecovec24-romimage_defconfig
mips malta_defconfig
arm am200epdkit_defconfig
x86_64 randconfig-c001-20210916
arm randconfig-c002-20210916
ia64 allmodconfig
ia64 defconfig
ia64 allyesconfig
m68k allmodconfig
m68k defconfig
m68k allyesconfig
nios2 defconfig
nds32 allnoconfig
nds32 defconfig
nios2 allyesconfig
csky defconfig
alpha defconfig
alpha allyesconfig
xtensa allyesconfig
h8300 allyesconfig
arc defconfig
sh allmodconfig
parisc defconfig
s390 allyesconfig
s390 allmodconfig
parisc allyesconfig
s390 defconfig
i386 allyesconfig
sparc allyesconfig
sparc defconfig
i386 defconfig
arc allyesconfig
mips allyesconfig
mips allmodconfig
powerpc allyesconfig
powerpc allmodconfig
powerpc allnoconfig
x86_64 randconfig-a016-20210916
x86_64 randconfig-a013-20210916
x86_64 randconfig-a012-20210916
x86_64 randconfig-a011-20210916
x86_64 randconfig-a014-20210916
x86_64 randconfig-a015-20210916
i386 randconfig-a016-20210916
i386 randconfig-a015-20210916
i386 randconfig-a011-20210916
i386 randconfig-a012-20210916
i386 randconfig-a013-20210916
i386 randconfig-a014-20210916
riscv randconfig-r042-20210916
s390 randconfig-r044-20210916
arc randconfig-r043-20210916
riscv nommu_k210_defconfig
riscv allyesconfig
riscv nommu_virt_defconfig
riscv defconfig
riscv rv32_defconfig
riscv allmodconfig
x86_64 rhel-8.3-kselftests
um x86_64_defconfig
um i386_defconfig
x86_64 allyesconfig
x86_64 defconfig
x86_64 rhel-8.3
x86_64 kexec
clang tested configs:
riscv randconfig-c006-20210916
x86_64 randconfig-c007-20210916
mips randconfig-c004-20210916
powerpc randconfig-c003-20210916
arm randconfig-c002-20210916
i386 randconfig-c001-20210916
s390 randconfig-c005-20210916
i386 randconfig-a004-20210916
i386 randconfig-a005-20210916
i386 randconfig-a006-20210916
i386 randconfig-a002-20210916
i386 randconfig-a003-20210916
i386 randconfig-a001-20210916
x86_64 randconfig-a002-20210916
x86_64 randconfig-a003-20210916
x86_64 randconfig-a006-20210916
x86_64 randconfig-a004-20210916
x86_64 randconfig-a005-20210916
x86_64 randconfig-a001-20210916
hexagon randconfig-r045-20210916
hexagon randconfig-r041-20210916
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
^ permalink raw reply
* Re: [PATCH] powerpc/mce: check if event info is valid
From: Ganesh @ 2021-09-17 8:54 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: mahesh, npiggin
In-Reply-To: <20210806132307.367688-1-ganeshgr@linux.ibm.com>
[-- Attachment #1: Type: text/plain, Size: 851 bytes --]
On 8/6/21 6:53 PM, Ganesh Goudar wrote:
> Check if the event info is valid before printing the
> event information. When a fwnmi enabled nested kvm guest
> hits a machine check exception L0 and L2 would generate
> machine check event info, But L1 would not generate any
> machine check event info as it won't go through 0x200
> vector and prints some unwanted message.
>
> To fix this, 'in_use' variable in machine check event info is
> no more in use, rename it to 'valid' and check if the event
> information is valid before logging the event information.
>
> without this patch L1 would print following message for
> exceptions encountered in L2, as event structure will be
> empty in L1.
>
> "Machine Check Exception, Unknown event version 0".
>
> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
> ---
Hi mpe, Any comments on this patch.
[-- Attachment #2: Type: text/html, Size: 1255 bytes --]
^ permalink raw reply
* Re: [PATCH] swiotlb: set IO TLB segment size via cmdline
From: Jan Beulich @ 2021-09-17 9:52 UTC (permalink / raw)
To: Roman Skakun
Cc: linux-doc, Peter Zijlstra, Viresh Kumar,
Linux Kernel Mailing List, Paul Mackerras, Will Deacon,
Boris Ostrovsky, Marek Szyprowski, Stefano Stabellini,
Jonathan Corbet, Christoph Hellwig, xen-devel, Paul E. McKenney,
Konrad Rzeszutek Wilk, Muchun Song, Thomas Gleixner,
Juergen Gross, Thomas Bogendoerfer, Andrii Anisov, linuxppc-dev,
Randy Dunlap, linux-mips, iommu, Roman Skakun, Andrew Morton,
Maciej W. Rozycki, Robin Murphy, Mike Rapoport, Lu Baolu
In-Reply-To: <CADu_u-OZzgVj+z=iD6kUQOZxUufF5QSMR6-MmpN_hLZ9PyQJhQ@mail.gmail.com>
On 17.09.2021 11:36, Roman Skakun wrote:
> I use Xen PV display. In my case, PV display backend(Dom0) allocates
> contiguous buffer via DMA-API to
> to implement zero-copy between Dom0 and DomU.
Why does the buffer need to be allocated by Dom0? If it was allocated
by DomU, it could use grants to give the backend direct (zero-copy)
access.
Jan
^ permalink raw reply
* Re: [PATCH] swiotlb: set IO TLB segment size via cmdline
From: Robin Murphy @ 2021-09-17 9:44 UTC (permalink / raw)
To: Roman Skakun, Christoph Hellwig
Cc: linux-doc, Peter Zijlstra, Viresh Kumar,
Linux Kernel Mailing List, Paul Mackerras, Jan Beulich,
Will Deacon, Boris Ostrovsky, Marek Szyprowski,
Stefano Stabellini, Jonathan Corbet, xen-devel, Paul E. McKenney,
Konrad Rzeszutek Wilk, Muchun Song, Thomas Gleixner,
Juergen Gross, Thomas Bogendoerfer, Andrii Anisov, Randy Dunlap,
linux-mips, iommu, Roman Skakun, Andrew Morton, Maciej W. Rozycki,
linuxppc-dev, Mike Rapoport, Lu Baolu
In-Reply-To: <CADu_u-OZzgVj+z=iD6kUQOZxUufF5QSMR6-MmpN_hLZ9PyQJhQ@mail.gmail.com>
On 2021-09-17 10:36, Roman Skakun wrote:
> Hi, Christoph
>
> I use Xen PV display. In my case, PV display backend(Dom0) allocates
> contiguous buffer via DMA-API to
> to implement zero-copy between Dom0 and DomU.
Well, something's gone badly wrong there - if you have to shadow the
entire thing in a bounce buffer to import it then it's hardly zero-copy,
is it? If you want to do buffer sharing the buffer really needs to be
allocated appropriately to begin with, such that all relevant devices
can access it directly. That might be something which needs fixing in Xen.
Robin.
> When I start Weston under DomU, I got the next log in Dom0:
> ```
> [ 112.554471] CPU: 0 PID: 367 Comm: weston Tainted: G O
> 5.10.0-yocto-standard+ #312
> [ 112.575149] Call trace:
> [ 112.577666] dump_backtrace+0x0/0x1b0
> [ 112.581373] show_stack+0x18/0x70
> [ 112.584746] dump_stack+0xd0/0x12c
> [ 112.588200] swiotlb_tbl_map_single+0x234/0x360
> [ 112.592781] xen_swiotlb_map_page+0xe4/0x4c0
> [ 112.597095] xen_swiotlb_map_sg+0x84/0x12c
> [ 112.601249] dma_map_sg_attrs+0x54/0x60
> [ 112.605138] vsp1_du_map_sg+0x30/0x60
> [ 112.608851] rcar_du_vsp_map_fb+0x134/0x170
> [ 112.613082] rcar_du_vsp_plane_prepare_fb+0x44/0x64
> [ 112.618007] drm_atomic_helper_prepare_planes+0xac/0x160
> [ 112.623362] drm_atomic_helper_commit+0x88/0x390
> [ 112.628029] drm_atomic_nonblocking_commit+0x4c/0x60
> [ 112.633043] drm_mode_atomic_ioctl+0x9a8/0xb0c
> [ 112.637532] drm_ioctl_kernel+0xc4/0x11c
> [ 112.641506] drm_ioctl+0x21c/0x460
> [ 112.644967] __arm64_sys_ioctl+0xa8/0xf0
> [ 112.648939] el0_svc_common.constprop.0+0x78/0x1a0
> [ 112.653775] do_el0_svc+0x24/0x90
> [ 112.657148] el0_svc+0x14/0x20
> [ 112.660254] el0_sync_handler+0x1a4/0x1b0
> [ 112.664315] el0_sync+0x174/0x180
> [ 112.668145] rcar-fcp fea2f000.fcp: swiotlb buffer is full (sz:
> 3686400 bytes), total 65536 (slots), used 112 (slots)
> ```
> The problem is happened here:
> https://elixir.bootlin.com/linux/v5.14.4/source/drivers/gpu/drm/rcar-du/rcar_du_vsp.c#L202
>
> Sgt was created in dma_get_sgtable() by dma_common_get_sgtable() and
> includes a single page chunk
> as shown here:
> https://elixir.bootlin.com/linux/v5.14.5/source/kernel/dma/ops_helpers.c#L18
>
> After creating a new sgt, we tried to map this sgt through vsp1_du_map_sg().
> Internally, vsp1_du_map_sg() using ops->map_sg (e.g
> xen_swiotlb_map_sg) to perform
> mapping.
>
> I realized that required segment is too big to be fitted to default
> swiotlb segment and condition
> https://elixir.bootlin.com/linux/latest/source/kernel/dma/swiotlb.c#L474
> is always false.
>
> I know that I use a large buffer, but why can't I map this buffer in one chunk?
>
> Thanks!
>
> ср, 15 сент. 2021 г. в 16:53, Christoph Hellwig <hch@lst.de>:
>>
>> On Wed, Sep 15, 2021 at 03:49:52PM +0200, Jan Beulich wrote:
>>> But the question remains: Why does the framebuffer need to be mapped
>>> in a single giant chunk?
>>
>> More importantly: if you use dynamic dma mappings for your framebuffer
>> you're doing something wrong.
>
>
>
^ permalink raw reply
* Re: [PATCH] swiotlb: set IO TLB segment size via cmdline
From: Roman Skakun @ 2021-09-17 9:36 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Roman Skakun, linux-doc, Peter Zijlstra, Viresh Kumar,
Linux Kernel Mailing List, Paul Mackerras, Jan Beulich,
Will Deacon, Boris Ostrovsky, Marek Szyprowski,
Stefano Stabellini, Jonathan Corbet, xen-devel, Paul E. McKenney,
Konrad Rzeszutek Wilk, Muchun Song, Thomas Gleixner,
Juergen Gross, Thomas Bogendoerfer, Andrii Anisov, linuxppc-dev,
Randy Dunlap, linux-mips, iommu, Roman Skakun, Andrew Morton,
Lu Baolu, Robin Murphy, Mike Rapoport, Maciej W. Rozycki
In-Reply-To: <20210915135321.GA15216@lst.de>
Hi, Christoph
I use Xen PV display. In my case, PV display backend(Dom0) allocates
contiguous buffer via DMA-API to
to implement zero-copy between Dom0 and DomU.
When I start Weston under DomU, I got the next log in Dom0:
```
[ 112.554471] CPU: 0 PID: 367 Comm: weston Tainted: G O
5.10.0-yocto-standard+ #312
[ 112.575149] Call trace:
[ 112.577666] dump_backtrace+0x0/0x1b0
[ 112.581373] show_stack+0x18/0x70
[ 112.584746] dump_stack+0xd0/0x12c
[ 112.588200] swiotlb_tbl_map_single+0x234/0x360
[ 112.592781] xen_swiotlb_map_page+0xe4/0x4c0
[ 112.597095] xen_swiotlb_map_sg+0x84/0x12c
[ 112.601249] dma_map_sg_attrs+0x54/0x60
[ 112.605138] vsp1_du_map_sg+0x30/0x60
[ 112.608851] rcar_du_vsp_map_fb+0x134/0x170
[ 112.613082] rcar_du_vsp_plane_prepare_fb+0x44/0x64
[ 112.618007] drm_atomic_helper_prepare_planes+0xac/0x160
[ 112.623362] drm_atomic_helper_commit+0x88/0x390
[ 112.628029] drm_atomic_nonblocking_commit+0x4c/0x60
[ 112.633043] drm_mode_atomic_ioctl+0x9a8/0xb0c
[ 112.637532] drm_ioctl_kernel+0xc4/0x11c
[ 112.641506] drm_ioctl+0x21c/0x460
[ 112.644967] __arm64_sys_ioctl+0xa8/0xf0
[ 112.648939] el0_svc_common.constprop.0+0x78/0x1a0
[ 112.653775] do_el0_svc+0x24/0x90
[ 112.657148] el0_svc+0x14/0x20
[ 112.660254] el0_sync_handler+0x1a4/0x1b0
[ 112.664315] el0_sync+0x174/0x180
[ 112.668145] rcar-fcp fea2f000.fcp: swiotlb buffer is full (sz:
3686400 bytes), total 65536 (slots), used 112 (slots)
```
The problem is happened here:
https://elixir.bootlin.com/linux/v5.14.4/source/drivers/gpu/drm/rcar-du/rcar_du_vsp.c#L202
Sgt was created in dma_get_sgtable() by dma_common_get_sgtable() and
includes a single page chunk
as shown here:
https://elixir.bootlin.com/linux/v5.14.5/source/kernel/dma/ops_helpers.c#L18
After creating a new sgt, we tried to map this sgt through vsp1_du_map_sg().
Internally, vsp1_du_map_sg() using ops->map_sg (e.g
xen_swiotlb_map_sg) to perform
mapping.
I realized that required segment is too big to be fitted to default
swiotlb segment and condition
https://elixir.bootlin.com/linux/latest/source/kernel/dma/swiotlb.c#L474
is always false.
I know that I use a large buffer, but why can't I map this buffer in one chunk?
Thanks!
ср, 15 сент. 2021 г. в 16:53, Christoph Hellwig <hch@lst.de>:
>
> On Wed, Sep 15, 2021 at 03:49:52PM +0200, Jan Beulich wrote:
> > But the question remains: Why does the framebuffer need to be mapped
> > in a single giant chunk?
>
> More importantly: if you use dynamic dma mappings for your framebuffer
> you're doing something wrong.
--
Best Regards, Roman.
^ permalink raw reply
* [PATCH] powerpc/8xx: Simplify TLB handling
From: Christophe Leroy @ 2021-09-17 9:16 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
In the old days, TLB handling for 8xx was using tlbie and tlbia
instructions directly as much as possible.
But commit f048aace29e0 ("powerpc/mm: Add SMP support to no-hash
TLB handling") broke that by introducing out-of-line unnecessary
complex functions for booke/smp which don't have tlbie/tlbia
instructions and require more complex handling.
Restore direct use of tlbie and tlbia for 8xx which is never SMP.
With this patch we now get
c00ecc68 <ptep_clear_flush>:
c00ecc68: 39 00 00 00 li r8,0
c00ecc6c: 81 46 00 00 lwz r10,0(r6)
c00ecc70: 91 06 00 00 stw r8,0(r6)
c00ecc74: 7c 00 2a 64 tlbie r5,r0
c00ecc78: 7c 00 04 ac hwsync
c00ecc7c: 91 43 00 00 stw r10,0(r3)
c00ecc80: 4e 80 00 20 blr
Before it was
c0012880 <local_flush_tlb_page>:
c0012880: 2c 03 00 00 cmpwi r3,0
c0012884: 41 82 00 54 beq c00128d8 <local_flush_tlb_page+0x58>
c0012888: 81 22 00 00 lwz r9,0(r2)
c001288c: 81 43 00 20 lwz r10,32(r3)
c0012890: 39 29 00 01 addi r9,r9,1
c0012894: 91 22 00 00 stw r9,0(r2)
c0012898: 2c 0a 00 00 cmpwi r10,0
c001289c: 41 82 00 10 beq c00128ac <local_flush_tlb_page+0x2c>
c00128a0: 81 2a 01 dc lwz r9,476(r10)
c00128a4: 2c 09 ff ff cmpwi r9,-1
c00128a8: 41 82 00 0c beq c00128b4 <local_flush_tlb_page+0x34>
c00128ac: 7c 00 22 64 tlbie r4,r0
c00128b0: 7c 00 04 ac hwsync
c00128b4: 81 22 00 00 lwz r9,0(r2)
c00128b8: 39 29 ff ff addi r9,r9,-1
c00128bc: 2c 09 00 00 cmpwi r9,0
c00128c0: 91 22 00 00 stw r9,0(r2)
c00128c4: 4c a2 00 20 bclr+ 4,eq
c00128c8: 81 22 00 70 lwz r9,112(r2)
c00128cc: 71 29 00 04 andi. r9,r9,4
c00128d0: 4d 82 00 20 beqlr
c00128d4: 48 65 76 74 b c0669f48 <preempt_schedule>
c00128d8: 81 22 00 00 lwz r9,0(r2)
c00128dc: 39 29 00 01 addi r9,r9,1
c00128e0: 91 22 00 00 stw r9,0(r2)
c00128e4: 4b ff ff c8 b c00128ac <local_flush_tlb_page+0x2c>
...
c00ecdc8 <ptep_clear_flush>:
c00ecdc8: 94 21 ff f0 stwu r1,-16(r1)
c00ecdcc: 39 20 00 00 li r9,0
c00ecdd0: 93 c1 00 08 stw r30,8(r1)
c00ecdd4: 83 c6 00 00 lwz r30,0(r6)
c00ecdd8: 91 26 00 00 stw r9,0(r6)
c00ecddc: 93 e1 00 0c stw r31,12(r1)
c00ecde0: 7c 08 02 a6 mflr r0
c00ecde4: 7c 7f 1b 78 mr r31,r3
c00ecde8: 7c 83 23 78 mr r3,r4
c00ecdec: 7c a4 2b 78 mr r4,r5
c00ecdf0: 90 01 00 14 stw r0,20(r1)
c00ecdf4: 4b f2 5a 8d bl c0012880 <local_flush_tlb_page>
c00ecdf8: 93 df 00 00 stw r30,0(r31)
c00ecdfc: 7f e3 fb 78 mr r3,r31
c00ece00: 80 01 00 14 lwz r0,20(r1)
c00ece04: 83 c1 00 08 lwz r30,8(r1)
c00ece08: 83 e1 00 0c lwz r31,12(r1)
c00ece0c: 7c 08 03 a6 mtlr r0
c00ece10: 38 21 00 10 addi r1,r1,16
c00ece14: 4e 80 00 20 blr
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/include/asm/nohash/tlbflush.h | 20 ++++++++++++++++++++
arch/powerpc/mm/nohash/tlb.c | 2 ++
2 files changed, 22 insertions(+)
diff --git a/arch/powerpc/include/asm/nohash/tlbflush.h b/arch/powerpc/include/asm/nohash/tlbflush.h
index 1edb7243e515..a7ec171d79a6 100644
--- a/arch/powerpc/include/asm/nohash/tlbflush.h
+++ b/arch/powerpc/include/asm/nohash/tlbflush.h
@@ -32,11 +32,31 @@ extern void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
unsigned long end);
extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
+#ifdef CONFIG_PPC_8xx
+#include <asm/trace.h>
+
+static inline void local_flush_tlb_mm(struct mm_struct *mm)
+{
+ unsigned int pid = READ_ONCE(mm->context.id);
+
+ if (pid != MMU_NO_CONTEXT) {
+ asm volatile ("sync; tlbia; isync" : : : "memory");
+ trace_tlbia(pid);
+ }
+}
+
+static inline void local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
+{
+ asm volatile ("tlbie %0; sync" : : "r" (vmaddr) : "memory");
+ trace_tlbie(0, 0, vmaddr, vma ? vma->vm_mm->context.id : 0, 0, 0, 0);
+}
+#else
extern void local_flush_tlb_mm(struct mm_struct *mm);
extern void local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
extern void __local_flush_tlb_page(struct mm_struct *mm, unsigned long vmaddr,
int tsize, int ind);
+#endif
#ifdef CONFIG_SMP
extern void flush_tlb_mm(struct mm_struct *mm);
diff --git a/arch/powerpc/mm/nohash/tlb.c b/arch/powerpc/mm/nohash/tlb.c
index 5872f69141d5..6d8cb68f6efb 100644
--- a/arch/powerpc/mm/nohash/tlb.c
+++ b/arch/powerpc/mm/nohash/tlb.c
@@ -185,6 +185,7 @@ EXPORT_PER_CPU_SYMBOL(next_tlbcam_idx);
* processor
*/
+#ifndef CONFIG_PPC_8xx
/*
* These are the base non-SMP variants of page and mm flushing
*/
@@ -218,6 +219,7 @@ void local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
mmu_get_tsize(mmu_virtual_psize), 0);
}
EXPORT_SYMBOL(local_flush_tlb_page);
+#endif
/*
* And here are the SMP non-local implementations
--
2.31.1
^ permalink raw reply related
* Re: [PATCH v2] powerpc/mce: Fix access error in mce handler
From: Ganesh @ 2021-09-17 8:40 UTC (permalink / raw)
To: Daniel Axtens, linuxppc-dev, mpe; +Cc: mahesh, npiggin
In-Reply-To: <87lf3v903y.fsf@linkitivity.dja.id.au>
[-- Attachment #1: Type: text/plain, Size: 2655 bytes --]
On 9/17/21 12:09 PM, Daniel Axtens wrote:
> Hi Ganesh,
>
>> We queue an irq work for deferred processing of mce event
>> in realmode mce handler, where translation is disabled.
>> Queuing of the work may result in accessing memory outside
>> RMO region, such access needs the translation to be enabled
>> for an LPAR running with hash mmu else the kernel crashes.
>>
>> After enabling translation in mce_handle_error() we used to
>> leave it enabled to avoid crashing here, but now with the
>> commit 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before
>> returning from handler") we are restoring the MSR to disable
>> translation.
>>
>> Hence to fix this enable the translation before queuing the work.
> [snip]
>
>> Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from handler")
> That patch changes arch/powerpc/powerpc/platforms/pseries/ras.c just
> below this comment:
>
> /*
> * Enable translation as we will be accessing per-cpu variables
> * in save_mce_event() which may fall outside RMO region, also
> * leave it enabled because subsequently we will be queuing work
> * to workqueues where again per-cpu variables accessed, besides
> * fwnmi_release_errinfo() crashes when called in realmode on
> * pseries.
> * Note: All the realmode handling like flushing SLB entries for
> * SLB multihit is done by now.
> */
>
> That suggests per-cpu variables need protection. In your patch, you
> enable translations just around irq_work_queue:
The comment is bit old, most of it doesn't make any sense now, yes per-cpu
variables cannot be accessed in realmode, but with commit 923b3cf00b3f
("powerpc/mce: Remove per cpu variables from MCE handlers") we moved all of
them to paca.
>> + /* Queue irq work to process this event later. Before
>> + * queuing the work enable translation for non radix LPAR,
>> + * as irq_work_queue may try to access memory outside RMO
>> + * region.
>> + */
>> + if (!radix_enabled() && firmware_has_feature(FW_FEATURE_LPAR)) {
>> + msr = mfmsr();
>> + mtmsr(msr | MSR_IR | MSR_DR);
>> + irq_work_queue(&mce_event_process_work);
>> + mtmsr(msr);
>> + } else {
>> + irq_work_queue(&mce_event_process_work);
>> + }
> However, just before that in the function, there are a few things that
> access per-cpu variables via the local_paca, e.g.:
>
> memcpy(&local_paca->mce_info->mce_event_queue[index],
> &evt, sizeof(evt));
>
> Do we need to widen the window where translations are enabled in order
> to protect accesses to local_paca?
paca will be within Real Mode Area, so it can be accessed with translate off.
[-- Attachment #2: Type: text/html, Size: 3620 bytes --]
^ permalink raw reply
* Re: [PATCH v1 1/2] powerpc/64s: system call rfscv workaround for TM bugs
From: Daniel Axtens @ 2021-09-17 8:02 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Eirik Fuller, Nicholas Piggin, kvm-ppc
In-Reply-To: <20210908101718.118522-1-npiggin@gmail.com>
Nicholas Piggin <npiggin@gmail.com> writes:
> The rfscv instruction does not work correctly with the fake-suspend mode
> in POWER9, which can end up with the hypervisor restoring an incorrect
> checkpoint.
If I understand correctly from commit 4bb3c7a0208f ("KVM: PPC: Book3S
HV: Work around transactional memory bugs in POWER9"), this is because
rfscv does not cause a soft-patch interrupt in the way that rfid etc do.
So we need to avoid calling rfscv if we are in fake-suspend state -
instead we must call something that does indeed get soft-patched - like
rfid.
> Work around this by setting the _TIF_RESTOREALL flag if a system call
> returns to a transaction active state, causing rfid to be used instead
> of rfscv to return, which will do the right thing. The contents of the
> registers are irrelevant because they will be overwritten in this case
> anyway.
I can follow that this will indeed cause syscall_exit_prepare to return
non-zero and therefore we should take the
syscall_vectored_*_restore_regs path which does an RFID_TO_USER rather
than a RFSCV_TO_USER. My only question/concern is:
.Lsyscall_vectored_\name\()_exit:
addi r4,r1,STACK_FRAME_OVERHEAD
li r5,1 /* scv */
bl syscall_exit_prepare <-------- we get r3 != 0 here
std r1,PACA_EXIT_SAVE_R1(r13) /* save r1 for restart */
.Lsyscall_vectored_\name\()_rst_start:
lbz r11,PACAIRQHAPPENED(r13)
andi. r11,r11,(~PACA_IRQ_HARD_DIS)@l
bne- syscall_vectored_\name\()_restart <-- can we end up taking
this branch?
Are there any circumstances that would take us down the _restart path,
and if so, will we still go through the correct RFID_TO_USER branch
rather than the RFSCV_TO_USER branch?
Apart from that this looks good to me, although with the heavy
disclaimer that I only learned about fake suspend for the first time
while reviewing the patch.
Kind regards,
Daniel
>
> Reported-by: Eirik Fuller <efuller@redhat.com>
> Fixes: 7fa95f9adaee7 ("powerpc/64s: system call support for scv/rfscv instructions")
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> arch/powerpc/kernel/interrupt.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index c77c80214ad3..917a2ac4def6 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -139,6 +139,19 @@ notrace long system_call_exception(long r3, long r4, long r5,
> */
> irq_soft_mask_regs_set_state(regs, IRQS_ENABLED);
>
> + /*
> + * If system call is called with TM active, set _TIF_RESTOREALL to
> + * prevent RFSCV being used to return to userspace, because POWER9
> + * TM implementation has problems with this instruction returning to
> + * transactional state. Final register values are not relevant because
> + * the transaction will be aborted upon return anyway. Or in the case
> + * of unsupported_scv SIGILL fault, the return state does not much
> + * matter because it's an edge case.
> + */
> + if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
> + unlikely(MSR_TM_TRANSACTIONAL(regs->msr)))
> + current_thread_info()->flags |= _TIF_RESTOREALL;
> +
> /*
> * If the system call was made with a transaction active, doom it and
> * return without performing the system call. Unless it was an
> --
> 2.23.0
^ permalink raw reply
* Re: [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
From: Vincent Guittot @ 2021-09-17 7:41 UTC (permalink / raw)
To: Ricardo Neri
Cc: Juri Lelli, Aubrey Li, Srikar Dronamraju, Ravi V. Shankar,
Peter Zijlstra (Intel), Ricardo Neri, Ben Segall,
Srinivas Pandruvada, Joel Fernandes (Google), Ingo Molnar,
Rafael J . Wysocki, Steven Rostedt, Mel Gorman, Len Brown,
Nicholas Piggin, Aubrey Li, Dietmar Eggemann, Tim Chen,
Quentin Perret, Daniel Bristot de Oliveira, linux-kernel,
linuxppc-dev
In-Reply-To: <20210917010044.GA23727@ranerica-svr.sc.intel.com>
On Fri, 17 Sept 2021 at 03:01, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> On Wed, Sep 15, 2021 at 05:43:44PM +0200, Vincent Guittot wrote:
> > On Sat, 11 Sept 2021 at 03:19, Ricardo Neri
> > <ricardo.neri-calderon@linux.intel.com> wrote:
> > >
> > > When deciding to pull tasks in ASYM_PACKING, it is necessary not only to
> > > check for the idle state of the destination CPU, dst_cpu, but also of
> > > its SMT siblings.
> > >
> > > If dst_cpu is idle but its SMT siblings are busy, performance suffers
> > > if it pulls tasks from a medium priority CPU that does not have SMT
> > > siblings.
> > >
> > > Implement asym_smt_can_pull_tasks() to inspect the state of the SMT
> > > siblings of both dst_cpu and the CPUs in the candidate busiest group.
> > >
> > > Cc: Aubrey Li <aubrey.li@intel.com>
> > > Cc: Ben Segall <bsegall@google.com>
> > > Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> > > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > > Cc: Mel Gorman <mgorman@suse.de>
> > > Cc: Quentin Perret <qperret@google.com>
> > > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Cc: Tim Chen <tim.c.chen@linux.intel.com>
> > > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > Reviewed-by: Len Brown <len.brown@intel.com>
> > > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > > ---
> > > Changes since v4:
> > > * Use sg_lb_stats::sum_nr_running the idle state of a scheduling group.
> > > (Vincent, Peter)
> > > * Do not even idle CPUs in asym_smt_can_pull_tasks(). (Vincent)
> > > * Updated function documentation and corrected a typo.
> > >
> > > Changes since v3:
> > > * Removed the arch_asym_check_smt_siblings() hook. Discussions with the
> > > powerpc folks showed that this patch should not impact them. Also, more
> > > recent powerpc processor no longer use asym_packing. (PeterZ)
> > > * Removed unnecessary local variable in asym_can_pull_tasks(). (Dietmar)
> > > * Removed unnecessary check for local CPUs when the local group has zero
> > > utilization. (Joel)
> > > * Renamed asym_can_pull_tasks() as asym_smt_can_pull_tasks() to reflect
> > > the fact that it deals with SMT cases.
> > > * Made asym_smt_can_pull_tasks() return false for !CONFIG_SCHED_SMT so
> > > that callers can deal with non-SMT cases.
> > >
> > > Changes since v2:
> > > * Reworded the commit message to reflect updates in code.
> > > * Corrected misrepresentation of dst_cpu as the CPU doing the load
> > > balancing. (PeterZ)
> > > * Removed call to arch_asym_check_smt_siblings() as it is now called in
> > > sched_asym().
> > >
> > > Changes since v1:
> > > * Don't bailout in update_sd_pick_busiest() if dst_cpu cannot pull
> > > tasks. Instead, reclassify the candidate busiest group, as it
> > > may still be selected. (PeterZ)
> > > * Avoid an expensive and unnecessary call to cpumask_weight() when
> > > determining if a sched_group is comprised of SMT siblings.
> > > (PeterZ).
> > > ---
> > > kernel/sched/fair.c | 94 +++++++++++++++++++++++++++++++++++++++++++++
> > > 1 file changed, 94 insertions(+)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 26db017c14a3..8d763dd0174b 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8597,10 +8597,98 @@ group_type group_classify(unsigned int imbalance_pct,
> > > return group_has_spare;
> > > }
> > >
> > > +/**
> > > + * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
> > > + * @dst_cpu: Destination CPU of the load balancing
> > > + * @sds: Load-balancing data with statistics of the local group
> > > + * @sgs: Load-balancing statistics of the candidate busiest group
> > > + * @sg: The candidate busiest group
> > > + *
> > > + * Check the state of the SMT siblings of both @sds::local and @sg and decide
> > > + * if @dst_cpu can pull tasks.
> > > + *
> > > + * If @dst_cpu does not have SMT siblings, it can pull tasks if two or more of
> > > + * the SMT siblings of @sg are busy. If only one CPU in @sg is busy, pull tasks
> > > + * only if @dst_cpu has higher priority.
> > > + *
> > > + * If both @dst_cpu and @sg have SMT siblings, and @sg has exactly one more
> > > + * busy CPU than @sds::local, let @dst_cpu pull tasks if it has higher priority.
> > > + * Bigger imbalances in the number of busy CPUs will be dealt with in
> > > + * update_sd_pick_busiest().
> > > + *
> > > + * If @sg does not have SMT siblings, only pull tasks if all of the SMT siblings
> > > + * of @dst_cpu are idle and @sg has lower priority.
> > > + */
> > > +static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> > > + struct sg_lb_stats *sgs,
> > > + struct sched_group *sg)
> > > +{
> > > +#ifdef CONFIG_SCHED_SMT
> > > + bool local_is_smt, sg_is_smt;
> > > + int sg_busy_cpus;
> > > +
> > > + local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
> > > + sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
> > > +
> > > + sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
> > > +
> > > + if (!local_is_smt) {
> > > + /*
> > > + * If we are here, @dst_cpu is idle and does not have SMT
> > > + * siblings. Pull tasks if candidate group has two or more
> > > + * busy CPUs.
> > > + */
> > > + if (sg_is_smt && sg_busy_cpus >= 2)
> >
> > Do you really need to test sg_is_smt ? if sg_busy_cpus >= 2 then
> > sd_is_smt must be true ?
>
> Thank you very much for your feedback Vincent!
>
> Yes, it is true that sg_busy_cpus >=2 is only true if @sg is SMT. I will
> remove this check.
>
> >
> > Also, This is the default behavior where we want to even the number of
> > busy cpu. Shouldn't you return false and fall back to the default
> > behavior ?
>
> This is also true.
>
> >
> > That being said, the default behavior tries to even the number of idle
> > cpus which is easier to compute and is equal to even the number of
> > busy cpus in "normal" system with the same number of cpus in groups
> > but this is not the case here. It could be good to change the default
> > behavior to even the number of busy cpus and that you use the default
> > behavior here. Additional condition will be used to select the busiest
> > group like more busy cpu or more number of running tasks
>
> That is a very good observation. Checking the number of idle CPUs
> assumes that both groups have the same number of CPUs. I'll look into
> modifying the default behavior.
>
> >
> > > + return true;
> > > +
> > > + /*
> > > + * @dst_cpu does not have SMT siblings. @sg may have SMT
> > > + * siblings and only one is busy. In such case, @dst_cpu
> > > + * can help if it has higher priority and is idle (i.e.,
> > > + * it has no running tasks).
> >
> > The previous comment above assume that "@dst_cpu is idle" but now you
> > need to check that sds->local_stat.sum_nr_running == 0
>
> But we already know that, right? We are here because in
> update_sg_lb_stats() we determine that dst CPU is idle (env->idle !=
> CPU_NOT_IDLE).
That's my point:
Why do you add the condition !sds->local_stat.sum_nr_running below ? I
assume that it's to check that the cpu is idle, isn't it ?
> > > + */
> > > + return !sds->local_stat.sum_nr_running &&
> > > + sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> > > + }
>
> Thanks and BR,
> Ricardo
^ permalink raw reply
* [PATCH v5 3/5] signal: Add unsafe_copy_siginfo_to_user()
From: Christophe Leroy @ 2021-09-17 6:59 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
ebiederm, hch
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1718f38859d5366f82d5bef531f255cedf537b5d.1631861883.git.christophe.leroy@csgroup.eu>
In the same spirit as commit fb05121fd6a2 ("signal: Add
unsafe_get_compat_sigset()"), implement an 'unsafe' version of
copy_siginfo_to_user() in order to use it within user access blocks.
For that, also add an 'unsafe' version of clear_user().
This commit adds the generic fallback for unsafe_clear_user().
Architectures wanting to use unsafe_copy_siginfo_to_user() within a
user_access_begin() section have to make sure they have their
own unsafe_clear_user().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v3: Added precision about unsafe_clear_user() in commit message.
---
include/linux/signal.h | 15 +++++++++++++++
include/linux/uaccess.h | 1 +
kernel/signal.c | 5 -----
3 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 3f96a6374e4f..70ea7e741427 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -35,6 +35,21 @@ static inline void copy_siginfo_to_external(siginfo_t *to,
int copy_siginfo_to_user(siginfo_t __user *to, const kernel_siginfo_t *from);
int copy_siginfo_from_user(kernel_siginfo_t *to, const siginfo_t __user *from);
+static __always_inline char __user *si_expansion(const siginfo_t __user *info)
+{
+ return ((char __user *)info) + sizeof(struct kernel_siginfo);
+}
+
+#define unsafe_copy_siginfo_to_user(to, from, label) do { \
+ siginfo_t __user *__ucs_to = to; \
+ const kernel_siginfo_t *__ucs_from = from; \
+ char __user *__ucs_expansion = si_expansion(__ucs_to); \
+ \
+ unsafe_copy_to_user(__ucs_to, __ucs_from, \
+ sizeof(struct kernel_siginfo), label); \
+ unsafe_clear_user(__ucs_expansion, SI_EXPANSION_SIZE, label); \
+} while (0)
+
enum siginfo_layout {
SIL_KILL,
SIL_TIMER,
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index c05e903cef02..37073caac474 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -398,6 +398,7 @@ long strnlen_user_nofault(const void __user *unsafe_addr, long count);
#define unsafe_put_user(x,p,e) unsafe_op_wrap(__put_user(x,p),e)
#define unsafe_copy_to_user(d,s,l,e) unsafe_op_wrap(__copy_to_user(d,s,l),e)
#define unsafe_copy_from_user(d,s,l,e) unsafe_op_wrap(__copy_from_user(d,s,l),e)
+#define unsafe_clear_user(d, l, e) unsafe_op_wrap(__clear_user(d, l), e)
static inline unsigned long user_access_save(void) { return 0UL; }
static inline void user_access_restore(unsigned long flags) { }
#endif
diff --git a/kernel/signal.c b/kernel/signal.c
index 952741f6d0f9..23f168730b7e 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -3324,11 +3324,6 @@ enum siginfo_layout siginfo_layout(unsigned sig, int si_code)
return layout;
}
-static inline char __user *si_expansion(const siginfo_t __user *info)
-{
- return ((char __user *)info) + sizeof(struct kernel_siginfo);
-}
-
int copy_siginfo_to_user(siginfo_t __user *to, const kernel_siginfo_t *from)
{
char __user *expansion = si_expansion(to);
--
2.31.1
^ permalink raw reply related
* [PATCH v5 5/5] powerpc/signal: Use unsafe_copy_siginfo_to_user()
From: Christophe Leroy @ 2021-09-17 6:59 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
ebiederm, hch
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1718f38859d5366f82d5bef531f255cedf537b5d.1631861883.git.christophe.leroy@csgroup.eu>
Use unsafe_copy_siginfo_to_user() in order to do the copy
within the user access block.
On an mpc 8321 (book3s/32) the improvment is about 5% on a process
sending a signal to itself.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v5: Added missing __user flag when calling unsafe_copy_siginfo_to_user()
v4: Use another approach for compat: drop the unsafe_copy_siginfo_to_user32(), instead directly call copy_siginfo_to_external32() before user_access_begin()
v3: Don't leave compat aside, use the new unsafe_copy_siginfo_to_user32()
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/kernel/signal_32.c | 17 ++++++++---------
arch/powerpc/kernel/signal_64.c | 5 +----
2 files changed, 9 insertions(+), 13 deletions(-)
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index ff101e2b3bab..0baf3c10b6c0 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -710,12 +710,6 @@ static long restore_tm_user_regs(struct pt_regs *regs, struct mcontext __user *s
}
#endif
-#ifdef CONFIG_PPC64
-
-#define copy_siginfo_to_user copy_siginfo_to_user32
-
-#endif /* CONFIG_PPC64 */
-
/*
* Set up a signal frame for a "real-time" signal handler
* (one which gets siginfo).
@@ -731,6 +725,7 @@ int handle_rt_signal32(struct ksignal *ksig, sigset_t *oldset,
struct pt_regs *regs = tsk->thread.regs;
/* Save the thread's msr before get_tm_stackpointer() changes it */
unsigned long msr = regs->msr;
+ compat_siginfo_t uinfo;
/* Set up Signal Frame */
frame = get_sigframe(ksig, tsk, sizeof(*frame), 1);
@@ -744,6 +739,9 @@ int handle_rt_signal32(struct ksignal *ksig, sigset_t *oldset,
else
prepare_save_user_regs(1);
+ if (IS_ENABLED(CONFIG_COMPAT))
+ copy_siginfo_to_external32(&uinfo, &ksig->info);
+
if (!user_access_begin(newsp, __SIGNAL_FRAMESIZE + 16 + sizeof(*frame)))
goto badframe;
@@ -779,15 +777,16 @@ int handle_rt_signal32(struct ksignal *ksig, sigset_t *oldset,
asm("dcbst %y0; sync; icbi %y0; sync" :: "Z" (mctx->mc_pad[0]));
}
unsafe_put_sigset_t(&frame->uc.uc_sigmask, oldset, failed);
+ if (IS_ENABLED(CONFIG_COMPAT))
+ unsafe_copy_to_user(&frame->info, &uinfo, sizeof(frame->info), failed);
+ else
+ unsafe_copy_siginfo_to_user((void __user *)&frame->info, &ksig->info, failed);
/* create a stack frame for the caller of the handler */
unsafe_put_user(regs->gpr[1], newsp, failed);
user_access_end();
- if (copy_siginfo_to_user(&frame->info, &ksig->info))
- goto badframe;
-
regs->link = tramp;
#ifdef CONFIG_PPC_FPU_REGS
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index d80ff83cacb9..56c0c74aa28c 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -901,15 +901,12 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
}
unsafe_copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set), badframe_block);
+ unsafe_copy_siginfo_to_user(&frame->info, &ksig->info, badframe_block);
/* Allocate a dummy caller frame for the signal handler. */
unsafe_put_user(regs->gpr[1], newsp, badframe_block);
user_write_access_end();
- /* Save the siginfo outside of the unsafe block. */
- if (copy_siginfo_to_user(&frame->info, &ksig->info))
- goto badframe;
-
/* Make sure signal handler doesn't get spurious FP exceptions */
tsk->thread.fp_state.fpscr = 0;
--
2.31.1
^ permalink raw reply related
* [PATCH v5 2/5] powerpc/signal: Include the new stack frame inside the user access block
From: Christophe Leroy @ 2021-09-17 6:59 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
ebiederm, hch
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1718f38859d5366f82d5bef531f255cedf537b5d.1631861883.git.christophe.leroy@csgroup.eu>
Include the new stack frame inside the user access block and set it up
using unsafe_put_user().
On an mpc 8321 (book3s/32) the improvment is about 4% on a process
sending a signal to itself.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/kernel/signal_32.c | 29 +++++++++++++----------------
arch/powerpc/kernel/signal_64.c | 14 +++++++-------
2 files changed, 20 insertions(+), 23 deletions(-)
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 0608581967f0..ff101e2b3bab 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -726,7 +726,7 @@ int handle_rt_signal32(struct ksignal *ksig, sigset_t *oldset,
struct rt_sigframe __user *frame;
struct mcontext __user *mctx;
struct mcontext __user *tm_mctx = NULL;
- unsigned long newsp = 0;
+ unsigned long __user *newsp;
unsigned long tramp;
struct pt_regs *regs = tsk->thread.regs;
/* Save the thread's msr before get_tm_stackpointer() changes it */
@@ -734,6 +734,7 @@ int handle_rt_signal32(struct ksignal *ksig, sigset_t *oldset,
/* Set up Signal Frame */
frame = get_sigframe(ksig, tsk, sizeof(*frame), 1);
+ newsp = (unsigned long __user *)((unsigned long)frame - (__SIGNAL_FRAMESIZE + 16));
mctx = &frame->uc.uc_mcontext;
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
tm_mctx = &frame->uc_transact.uc_mcontext;
@@ -743,7 +744,7 @@ int handle_rt_signal32(struct ksignal *ksig, sigset_t *oldset,
else
prepare_save_user_regs(1);
- if (!user_access_begin(frame, sizeof(*frame)))
+ if (!user_access_begin(newsp, __SIGNAL_FRAMESIZE + 16 + sizeof(*frame)))
goto badframe;
/* Put the siginfo & fill in most of the ucontext */
@@ -779,6 +780,9 @@ int handle_rt_signal32(struct ksignal *ksig, sigset_t *oldset,
}
unsafe_put_sigset_t(&frame->uc.uc_sigmask, oldset, failed);
+ /* create a stack frame for the caller of the handler */
+ unsafe_put_user(regs->gpr[1], newsp, failed);
+
user_access_end();
if (copy_siginfo_to_user(&frame->info, &ksig->info))
@@ -790,13 +794,8 @@ int handle_rt_signal32(struct ksignal *ksig, sigset_t *oldset,
tsk->thread.fp_state.fpscr = 0; /* turn off all fp exceptions */
#endif
- /* create a stack frame for the caller of the handler */
- newsp = ((unsigned long)frame) - (__SIGNAL_FRAMESIZE + 16);
- if (put_user(regs->gpr[1], (u32 __user *)newsp))
- goto badframe;
-
/* Fill registers for signal handler */
- regs->gpr[1] = newsp;
+ regs->gpr[1] = (unsigned long)newsp;
regs->gpr[3] = ksig->sig;
regs->gpr[4] = (unsigned long)&frame->info;
regs->gpr[5] = (unsigned long)&frame->uc;
@@ -826,7 +825,7 @@ int handle_signal32(struct ksignal *ksig, sigset_t *oldset,
struct sigframe __user *frame;
struct mcontext __user *mctx;
struct mcontext __user *tm_mctx = NULL;
- unsigned long newsp = 0;
+ unsigned long __user *newsp;
unsigned long tramp;
struct pt_regs *regs = tsk->thread.regs;
/* Save the thread's msr before get_tm_stackpointer() changes it */
@@ -834,6 +833,7 @@ int handle_signal32(struct ksignal *ksig, sigset_t *oldset,
/* Set up Signal Frame */
frame = get_sigframe(ksig, tsk, sizeof(*frame), 1);
+ newsp = (unsigned long __user *)((unsigned long)frame - __SIGNAL_FRAMESIZE);
mctx = &frame->mctx;
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
tm_mctx = &frame->mctx_transact;
@@ -843,7 +843,7 @@ int handle_signal32(struct ksignal *ksig, sigset_t *oldset,
else
prepare_save_user_regs(1);
- if (!user_access_begin(frame, sizeof(*frame)))
+ if (!user_access_begin(newsp, __SIGNAL_FRAMESIZE + sizeof(*frame)))
goto badframe;
sc = (struct sigcontext __user *) &frame->sctx;
@@ -873,6 +873,8 @@ int handle_signal32(struct ksignal *ksig, sigset_t *oldset,
unsafe_put_user(PPC_RAW_SC(), &mctx->mc_pad[1], failed);
asm("dcbst %y0; sync; icbi %y0; sync" :: "Z" (mctx->mc_pad[0]));
}
+ /* create a stack frame for the caller of the handler */
+ unsafe_put_user(regs->gpr[1], newsp, failed);
user_access_end();
regs->link = tramp;
@@ -881,12 +883,7 @@ int handle_signal32(struct ksignal *ksig, sigset_t *oldset,
tsk->thread.fp_state.fpscr = 0; /* turn off all fp exceptions */
#endif
- /* create a stack frame for the caller of the handler */
- newsp = ((unsigned long)frame) - __SIGNAL_FRAMESIZE;
- if (put_user(regs->gpr[1], (u32 __user *)newsp))
- goto badframe;
-
- regs->gpr[1] = newsp;
+ regs->gpr[1] = (unsigned long)newsp;
regs->gpr[3] = ksig->sig;
regs->gpr[4] = (unsigned long) sc;
regs_set_return_ip(regs, (unsigned long) ksig->ka.sa.sa_handler);
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 7b1cd50bc4fb..d80ff83cacb9 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -847,13 +847,14 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
struct task_struct *tsk)
{
struct rt_sigframe __user *frame;
- unsigned long newsp = 0;
+ unsigned long __user *newsp;
long err = 0;
struct pt_regs *regs = tsk->thread.regs;
/* Save the thread's msr before get_tm_stackpointer() changes it */
unsigned long msr = regs->msr;
frame = get_sigframe(ksig, tsk, sizeof(*frame), 0);
+ newsp = (unsigned long __user *)((unsigned long)frame - __SIGNAL_FRAMESIZE);
/*
* This only applies when calling unsafe_setup_sigcontext() and must be
@@ -862,7 +863,7 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
if (!MSR_TM_ACTIVE(msr))
prepare_setup_sigcontext(tsk);
- if (!user_write_access_begin(frame, sizeof(*frame)))
+ if (!user_write_access_begin(newsp, __SIGNAL_FRAMESIZE + sizeof(*frame)))
goto badframe;
unsafe_put_user(&frame->info, &frame->pinfo, badframe_block);
@@ -900,6 +901,9 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
}
unsafe_copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set), badframe_block);
+ /* Allocate a dummy caller frame for the signal handler. */
+ unsafe_put_user(regs->gpr[1], newsp, badframe_block);
+
user_write_access_end();
/* Save the siginfo outside of the unsafe block. */
@@ -919,10 +923,6 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
regs_set_return_ip(regs, (unsigned long) &frame->tramp[0]);
}
- /* Allocate a dummy caller frame for the signal handler. */
- newsp = ((unsigned long)frame) - __SIGNAL_FRAMESIZE;
- err |= put_user(regs->gpr[1], (unsigned long __user *)newsp);
-
/* Set up "regs" so we "return" to the signal handler. */
if (is_elf2_task()) {
regs->ctr = (unsigned long) ksig->ka.sa.sa_handler;
@@ -947,7 +947,7 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
/* enter the signal handler in native-endian mode */
regs_set_return_msr(regs, (regs->msr & ~MSR_LE) | (MSR_KERNEL & MSR_LE));
- regs->gpr[1] = newsp;
+ regs->gpr[1] = (unsigned long)newsp;
regs->gpr[3] = ksig->sig;
regs->result = 0;
if (ksig->ka.sa.sa_flags & SA_SIGINFO) {
--
2.31.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox