* [PATCH v6 1/2] ftrace: disable preemption when recursion locked
From: 王贇 @ 2021-10-27 2:34 UTC (permalink / raw)
To: Guo Ren, Steven Rostedt, Ingo Molnar, James E.J. Bottomley,
Helge Deller, Michael Ellerman, Benjamin Herrenschmidt,
Paul Mackerras, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Thomas Gleixner, Borislav Petkov, x86, H. Peter Anvin,
Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
Joe Lawrence, Masami Hiramatsu, Peter Zijlstra (Intel),
Nicholas Piggin, Jisheng Zhang, linux-csky, linux-kernel,
linux-parisc, linuxppc-dev, linux-riscv, live-patching
In-Reply-To: <df8e9b3e-504c-635d-4e92-99cdf9f05479@linux.alibaba.com>
As the documentation explained, ftrace_test_recursion_trylock()
and ftrace_test_recursion_unlock() were supposed to disable and
enable preemption properly, however currently this work is done
outside of the function, which could be missing by mistake.
And since the internal using of trace_test_and_set_recursion()
and trace_clear_recursion() also require preemption disabled, we
can just merge the logical.
This patch will make sure the preemption has been disabled when
trace_test_and_set_recursion() return bit >= 0, and
trace_clear_recursion() will enable the preemption if previously
enabled.
CC: Petr Mladek <pmladek@suse.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Miroslav Benes <mbenes@suse.cz>
Reported-by: Abaci <abaci@linux.alibaba.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
---
arch/csky/kernel/probes/ftrace.c | 2 --
arch/parisc/kernel/ftrace.c | 2 --
arch/powerpc/kernel/kprobes-ftrace.c | 2 --
arch/riscv/kernel/probes/ftrace.c | 2 --
arch/x86/kernel/kprobes/ftrace.c | 2 --
include/linux/trace_recursion.h | 13 ++++++++++++-
kernel/livepatch/patch.c | 13 +++++++------
kernel/trace/ftrace.c | 15 +++++----------
kernel/trace/trace_functions.c | 5 -----
9 files changed, 24 insertions(+), 32 deletions(-)
diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
index b388228..834cffc 100644
--- a/arch/csky/kernel/probes/ftrace.c
+++ b/arch/csky/kernel/probes/ftrace.c
@@ -17,7 +17,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
return;
regs = ftrace_get_regs(fregs);
- preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (!p) {
p = get_kprobe((kprobe_opcode_t *)(ip - MCOUNT_INSN_SIZE));
@@ -57,7 +56,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
__this_cpu_write(current_kprobe, NULL);
}
out:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 7d14242..90c4345 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -210,7 +210,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
return;
regs = ftrace_get_regs(fregs);
- preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (unlikely(!p) || kprobe_disabled(p))
goto out;
@@ -239,7 +238,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
}
__this_cpu_write(current_kprobe, NULL);
out:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
index 7154d58..072ebe7 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -26,7 +26,6 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
return;
regs = ftrace_get_regs(fregs);
- preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)nip);
if (unlikely(!p) || kprobe_disabled(p))
goto out;
@@ -61,7 +60,6 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
__this_cpu_write(current_kprobe, NULL);
}
out:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/riscv/kernel/probes/ftrace.c b/arch/riscv/kernel/probes/ftrace.c
index aab85a8..7142ec4 100644
--- a/arch/riscv/kernel/probes/ftrace.c
+++ b/arch/riscv/kernel/probes/ftrace.c
@@ -15,7 +15,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
if (bit < 0)
return;
- preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (unlikely(!p) || kprobe_disabled(p))
goto out;
@@ -52,7 +51,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
__this_cpu_write(current_kprobe, NULL);
}
out:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
index 596de2f..dd2ec14 100644
--- a/arch/x86/kernel/kprobes/ftrace.c
+++ b/arch/x86/kernel/kprobes/ftrace.c
@@ -25,7 +25,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
if (bit < 0)
return;
- preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (unlikely(!p) || kprobe_disabled(p))
goto out;
@@ -59,7 +58,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
__this_cpu_write(current_kprobe, NULL);
}
out:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index abe1a50..64c03ee 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -135,6 +135,9 @@ static __always_inline int trace_get_context_bit(void)
# define do_ftrace_record_recursion(ip, pip) do { } while (0)
#endif
+/*
+ * Preemption is promised to be disabled when return bit >= 0.
+ */
static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
int start)
{
@@ -162,11 +165,19 @@ static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsign
current->trace_recursion = val;
barrier();
+ preempt_disable_notrace();
+
return bit;
}
+/*
+ * Preemption will be enabled (if it was previously enabled).
+ */
static __always_inline void trace_clear_recursion(int bit)
{
+ WARN_ON_ONCE(bit < 0);
+
+ preempt_enable_notrace();
barrier();
trace_recursion_clear(bit);
}
@@ -178,7 +189,7 @@ static __always_inline void trace_clear_recursion(int bit)
* tracing recursed in the same context (normal vs interrupt),
*
* Returns: -1 if a recursion happened.
- * >= 0 if no recursion
+ * >= 0 if no recursion.
*/
static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
unsigned long parent_ip)
diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
index e8029ae..b8d75fb 100644
--- a/kernel/livepatch/patch.c
+++ b/kernel/livepatch/patch.c
@@ -49,14 +49,16 @@ static void notrace klp_ftrace_handler(unsigned long ip,
ops = container_of(fops, struct klp_ops, fops);
+ /*
+ *
+ * The ftrace_test_recursion_trylock() will disable preemption,
+ * which is required for the variant of synchronize_rcu() that is
+ * used to allow patching functions where RCU is not watching.
+ * See klp_synchronize_transition() for more details.
+ */
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (WARN_ON_ONCE(bit < 0))
return;
- /*
- * A variant of synchronize_rcu() is used to allow patching functions
- * where RCU is not watching, see klp_synchronize_transition().
- */
- preempt_disable_notrace();
func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
stack_node);
@@ -120,7 +122,6 @@ static void notrace klp_ftrace_handler(unsigned long ip,
klp_arch_set_pc(fregs, (unsigned long)func->new_func);
unlock:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index b7be1df..7392bc7 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -7198,16 +7198,15 @@ void ftrace_reset_array_ops(struct trace_array *tr)
struct ftrace_ops *op;
int bit;
+ /*
+ * The ftrace_test_and_set_recursion() will disable preemption,
+ * which is required since some of the ops may be dynamically
+ * allocated, they must be freed after a synchronize_rcu().
+ */
bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_LIST_START);
if (bit < 0)
return;
- /*
- * Some of the ops may be dynamically allocated,
- * they must be freed after a synchronize_rcu().
- */
- preempt_disable_notrace();
-
do_for_each_ftrace_op(op, ftrace_ops_list) {
/* Stub functions don't need to be called nor tested */
if (op->flags & FTRACE_OPS_FL_STUB)
@@ -7231,7 +7230,6 @@ void ftrace_reset_array_ops(struct trace_array *tr)
}
} while_for_each_ftrace_op(op);
out:
- preempt_enable_notrace();
trace_clear_recursion(bit);
}
@@ -7279,12 +7277,9 @@ static void ftrace_ops_assist_func(unsigned long ip, unsigned long parent_ip,
if (bit < 0)
return;
- preempt_disable_notrace();
-
if (!(op->flags & FTRACE_OPS_FL_RCU) || rcu_is_watching())
op->func(ip, parent_ip, op, fregs);
- preempt_enable_notrace();
trace_clear_recursion(bit);
}
NOKPROBE_SYMBOL(ftrace_ops_assist_func);
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 1f0e63f..9f1bfbe 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -186,7 +186,6 @@ static void function_trace_start(struct trace_array *tr)
return;
trace_ctx = tracing_gen_ctx();
- preempt_disable_notrace();
cpu = smp_processor_id();
data = per_cpu_ptr(tr->array_buffer.data, cpu);
@@ -194,7 +193,6 @@ static void function_trace_start(struct trace_array *tr)
trace_function(tr, ip, parent_ip, trace_ctx);
ftrace_test_recursion_unlock(bit);
- preempt_enable_notrace();
}
#ifdef CONFIG_UNWINDER_ORC
@@ -298,8 +296,6 @@ static inline void process_repeats(struct trace_array *tr,
if (bit < 0)
return;
- preempt_disable_notrace();
-
cpu = smp_processor_id();
data = per_cpu_ptr(tr->array_buffer.data, cpu);
if (atomic_read(&data->disabled))
@@ -324,7 +320,6 @@ static inline void process_repeats(struct trace_array *tr,
out:
ftrace_test_recursion_unlock(bit);
- preempt_enable_notrace();
}
static void
--
1.8.3.1
^ permalink raw reply related
* [PATCH v6 0/2] fix & prevent the missing preemption disabling
From: 王贇 @ 2021-10-27 2:32 UTC (permalink / raw)
To: Guo Ren, Steven Rostedt, Ingo Molnar, James E.J. Bottomley,
Helge Deller, Michael Ellerman, Benjamin Herrenschmidt,
Paul Mackerras, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Thomas Gleixner, Borislav Petkov, x86, H. Peter Anvin,
Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
Joe Lawrence, Masami Hiramatsu, Peter Zijlstra (Intel),
Nicholas Piggin, Jisheng Zhang, linux-csky, linux-kernel,
linux-parisc, linuxppc-dev, linux-riscv, live-patching
The testing show that perf_ftrace_function_call() are using smp_processor_id()
with preemption enabled, all the checking on CPU could be wrong after preemption.
As Peter point out, the section between ftrace_test_recursion_trylock/unlock()
pair require the preemption to be disabled as 'Documentation/trace/ftrace-uses.rst'
explained, but currently the work is done outside of the helpers.
And since the internal using of trace_test_and_set_recursion()
and trace_clear_recursion() also require preemption to be disabled, we
can just merge the logical together.
Patch 1/2 will make sure preemption disabled when recursion lock succeed,
patch 2/2 will do smp_processor_id() checking after trylock() to address the
issue.
v1: https://lore.kernel.org/all/8c7de46d-9869-aa5e-2bb9-5dbc2eda395e@linux.alibaba.com/
v2: https://lore.kernel.org/all/b1d7fe43-ce84-0ed7-32f7-ea1d12d0b716@linux.alibaba.com/
v3: https://lore.kernel.org/all/609b565a-ed6e-a1da-f025-166691b5d994@linux.alibaba.com/
V4: https://lore.kernel.org/all/32a36348-69ee-6464-390c-3a8d6e9d2b53@linux.alibaba.com/
V5: https://lore.kernel.org/all/3ca92dc9-ea04-ddc2-71cd-524bfa5a5721@linux.alibaba.com/
Michael Wang (2):
ftrace: disable preemption when recursion locked
ftrace: do CPU checking after preemption disabled
arch/csky/kernel/probes/ftrace.c | 2 --
arch/parisc/kernel/ftrace.c | 2 --
arch/powerpc/kernel/kprobes-ftrace.c | 2 --
arch/riscv/kernel/probes/ftrace.c | 2 --
arch/x86/kernel/kprobes/ftrace.c | 2 --
include/linux/trace_recursion.h | 13 ++++++++++++-
kernel/livepatch/patch.c | 13 +++++++------
kernel/trace/ftrace.c | 15 +++++----------
kernel/trace/trace_event_perf.c | 6 +++---
kernel/trace/trace_functions.c | 5 -----
10 files changed, 27 insertions(+), 35 deletions(-)
--
1.8.3.1
^ permalink raw reply
* Re: [PATCH v5 1/2] ftrace: disable preemption when recursion locked
From: 王贇 @ 2021-10-27 2:28 UTC (permalink / raw)
To: Steven Rostedt
Cc: Peter Zijlstra (Intel), Paul Walmsley, James E.J. Bottomley,
Guo Ren, Jisheng Zhang, H. Peter Anvin, live-patching,
linux-riscv, Miroslav Benes, Joe Lawrence, Helge Deller, x86,
linux-csky, Ingo Molnar, Petr Mladek, Albert Ou, Jiri Kosina,
Nicholas Piggin, Borislav Petkov, Josh Poimboeuf, Thomas Gleixner,
linux-parisc, linux-kernel, Palmer Dabbelt, Masami Hiramatsu,
Paul Mackerras, linuxppc-dev
In-Reply-To: <20211026222600.7899126f@rorschach.local.home>
On 2021/10/27 上午10:26, Steven Rostedt wrote:
> On Wed, 27 Oct 2021 09:54:13 +0800
> 王贇 <yun.wang@linux.alibaba.com> wrote:
>
>> My apologize for the stupid comments... I'll send a v6 for this patch
>> only to fix that, please let me know if this is not a good way to fix
>> few lines of comments.
>
> Actually, please resend both patches, as a new patch set, on its own thread.
>
> Just replying here won't trigger my patchwork scripts.
>
> And also, if you don't include the other patch, the scripts will drop
> it.
Got it~
Regards,
Michael Wang
>
> Thanks,
>
> -- Steve
>
^ permalink raw reply
* Re: [PATCH v5 1/2] ftrace: disable preemption when recursion locked
From: Steven Rostedt @ 2021-10-27 2:26 UTC (permalink / raw)
To: 王贇
Cc: Peter Zijlstra (Intel), Paul Walmsley, James E.J. Bottomley,
Guo Ren, Jisheng Zhang, H. Peter Anvin, live-patching,
linux-riscv, Miroslav Benes, Joe Lawrence, Helge Deller, x86,
linux-csky, Ingo Molnar, Petr Mladek, Albert Ou, Jiri Kosina,
Nicholas Piggin, Borislav Petkov, Josh Poimboeuf, Thomas Gleixner,
linux-parisc, linux-kernel, Palmer Dabbelt, Masami Hiramatsu,
Paul Mackerras, linuxppc-dev
In-Reply-To: <3d897161-7b74-944a-f2a0-07311436fbd9@linux.alibaba.com>
On Wed, 27 Oct 2021 09:54:13 +0800
王贇 <yun.wang@linux.alibaba.com> wrote:
> My apologize for the stupid comments... I'll send a v6 for this patch
> only to fix that, please let me know if this is not a good way to fix
> few lines of comments.
Actually, please resend both patches, as a new patch set, on its own thread.
Just replying here won't trigger my patchwork scripts.
And also, if you don't include the other patch, the scripts will drop
it.
Thanks,
-- Steve
^ permalink raw reply
* Re: [PATCH v6] ftrace: disable preemption when recursion locked
From: 王贇 @ 2021-10-27 2:24 UTC (permalink / raw)
To: Guo Ren, Steven Rostedt, Ingo Molnar, James E.J. Bottomley,
Helge Deller, Michael Ellerman, Benjamin Herrenschmidt,
Paul Mackerras, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Thomas Gleixner, Borislav Petkov, x86, H. Peter Anvin,
Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
Joe Lawrence, Masami Hiramatsu, Peter Zijlstra (Intel),
Nicholas Piggin, Jisheng Zhang, linux-csky, linux-kernel,
linux-parisc, linuxppc-dev, linux-riscv, live-patching
In-Reply-To: <1d876d3f-b844-4e99-6043-af0b062dc315@linux.alibaba.com>
Hi, Steven, Miroslav
Should have fixed the comments about bit value, besides, add
a warn in trace_clear_recursion() to make sure the bit < 0
abusing case will get notified.
Please let me know if there are any other issues :-)
Regards,
Michael Wang
On 2021/10/27 上午10:11, 王贇 wrote:
> As the documentation explained, ftrace_test_recursion_trylock()
> and ftrace_test_recursion_unlock() were supposed to disable and
> enable preemption properly, however currently this work is done
> outside of the function, which could be missing by mistake.
>
> And since the internal using of trace_test_and_set_recursion()
> and trace_clear_recursion() also require preemption disabled, we
> can just merge the logical.
>
> This patch will make sure the preemption has been disabled when
> trace_test_and_set_recursion() return bit >= 0, and
> trace_clear_recursion() will enable the preemption if previously
> enabled.
>
> CC: Petr Mladek <pmladek@suse.com>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: Miroslav Benes <mbenes@suse.cz>
> Reported-by: Abaci <abaci@linux.alibaba.com>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
> ---
> arch/csky/kernel/probes/ftrace.c | 2 --
> arch/parisc/kernel/ftrace.c | 2 --
> arch/powerpc/kernel/kprobes-ftrace.c | 2 --
> arch/riscv/kernel/probes/ftrace.c | 2 --
> arch/x86/kernel/kprobes/ftrace.c | 2 --
> include/linux/trace_recursion.h | 13 ++++++++++++-
> kernel/livepatch/patch.c | 13 +++++++------
> kernel/trace/ftrace.c | 15 +++++----------
> kernel/trace/trace_functions.c | 5 -----
> 9 files changed, 24 insertions(+), 32 deletions(-)
>
> diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
> index b388228..834cffc 100644
> --- a/arch/csky/kernel/probes/ftrace.c
> +++ b/arch/csky/kernel/probes/ftrace.c
> @@ -17,7 +17,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> return;
>
> regs = ftrace_get_regs(fregs);
> - preempt_disable_notrace();
> p = get_kprobe((kprobe_opcode_t *)ip);
> if (!p) {
> p = get_kprobe((kprobe_opcode_t *)(ip - MCOUNT_INSN_SIZE));
> @@ -57,7 +56,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> __this_cpu_write(current_kprobe, NULL);
> }
> out:
> - preempt_enable_notrace();
> ftrace_test_recursion_unlock(bit);
> }
> NOKPROBE_SYMBOL(kprobe_ftrace_handler);
> diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
> index 7d14242..90c4345 100644
> --- a/arch/parisc/kernel/ftrace.c
> +++ b/arch/parisc/kernel/ftrace.c
> @@ -210,7 +210,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> return;
>
> regs = ftrace_get_regs(fregs);
> - preempt_disable_notrace();
> p = get_kprobe((kprobe_opcode_t *)ip);
> if (unlikely(!p) || kprobe_disabled(p))
> goto out;
> @@ -239,7 +238,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> }
> __this_cpu_write(current_kprobe, NULL);
> out:
> - preempt_enable_notrace();
> ftrace_test_recursion_unlock(bit);
> }
> NOKPROBE_SYMBOL(kprobe_ftrace_handler);
> diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
> index 7154d58..072ebe7 100644
> --- a/arch/powerpc/kernel/kprobes-ftrace.c
> +++ b/arch/powerpc/kernel/kprobes-ftrace.c
> @@ -26,7 +26,6 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
> return;
>
> regs = ftrace_get_regs(fregs);
> - preempt_disable_notrace();
> p = get_kprobe((kprobe_opcode_t *)nip);
> if (unlikely(!p) || kprobe_disabled(p))
> goto out;
> @@ -61,7 +60,6 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
> __this_cpu_write(current_kprobe, NULL);
> }
> out:
> - preempt_enable_notrace();
> ftrace_test_recursion_unlock(bit);
> }
> NOKPROBE_SYMBOL(kprobe_ftrace_handler);
> diff --git a/arch/riscv/kernel/probes/ftrace.c b/arch/riscv/kernel/probes/ftrace.c
> index aab85a8..7142ec4 100644
> --- a/arch/riscv/kernel/probes/ftrace.c
> +++ b/arch/riscv/kernel/probes/ftrace.c
> @@ -15,7 +15,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> if (bit < 0)
> return;
>
> - preempt_disable_notrace();
> p = get_kprobe((kprobe_opcode_t *)ip);
> if (unlikely(!p) || kprobe_disabled(p))
> goto out;
> @@ -52,7 +51,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> __this_cpu_write(current_kprobe, NULL);
> }
> out:
> - preempt_enable_notrace();
> ftrace_test_recursion_unlock(bit);
> }
> NOKPROBE_SYMBOL(kprobe_ftrace_handler);
> diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
> index 596de2f..dd2ec14 100644
> --- a/arch/x86/kernel/kprobes/ftrace.c
> +++ b/arch/x86/kernel/kprobes/ftrace.c
> @@ -25,7 +25,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> if (bit < 0)
> return;
>
> - preempt_disable_notrace();
> p = get_kprobe((kprobe_opcode_t *)ip);
> if (unlikely(!p) || kprobe_disabled(p))
> goto out;
> @@ -59,7 +58,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> __this_cpu_write(current_kprobe, NULL);
> }
> out:
> - preempt_enable_notrace();
> ftrace_test_recursion_unlock(bit);
> }
> NOKPROBE_SYMBOL(kprobe_ftrace_handler);
> diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
> index abe1a50..64c03ee 100644
> --- a/include/linux/trace_recursion.h
> +++ b/include/linux/trace_recursion.h
> @@ -135,6 +135,9 @@ static __always_inline int trace_get_context_bit(void)
> # define do_ftrace_record_recursion(ip, pip) do { } while (0)
> #endif
>
> +/*
> + * Preemption is promised to be disabled when return bit >= 0.
> + */
> static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
> int start)
> {
> @@ -162,11 +165,19 @@ static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsign
> current->trace_recursion = val;
> barrier();
>
> + preempt_disable_notrace();
> +
> return bit;
> }
>
> +/*
> + * Preemption will be enabled (if it was previously enabled).
> + */
> static __always_inline void trace_clear_recursion(int bit)
> {
> + WARN_ON_ONCE(bit < 0);
> +
> + preempt_enable_notrace();
> barrier();
> trace_recursion_clear(bit);
> }
> @@ -178,7 +189,7 @@ static __always_inline void trace_clear_recursion(int bit)
> * tracing recursed in the same context (normal vs interrupt),
> *
> * Returns: -1 if a recursion happened.
> - * >= 0 if no recursion
> + * >= 0 if no recursion.
> */
> static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
> unsigned long parent_ip)
> diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
> index e8029ae..b8d75fb 100644
> --- a/kernel/livepatch/patch.c
> +++ b/kernel/livepatch/patch.c
> @@ -49,14 +49,16 @@ static void notrace klp_ftrace_handler(unsigned long ip,
>
> ops = container_of(fops, struct klp_ops, fops);
>
> + /*
> + *
> + * The ftrace_test_recursion_trylock() will disable preemption,
> + * which is required for the variant of synchronize_rcu() that is
> + * used to allow patching functions where RCU is not watching.
> + * See klp_synchronize_transition() for more details.
> + */
> bit = ftrace_test_recursion_trylock(ip, parent_ip);
> if (WARN_ON_ONCE(bit < 0))
> return;
> - /*
> - * A variant of synchronize_rcu() is used to allow patching functions
> - * where RCU is not watching, see klp_synchronize_transition().
> - */
> - preempt_disable_notrace();
>
> func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
> stack_node);
> @@ -120,7 +122,6 @@ static void notrace klp_ftrace_handler(unsigned long ip,
> klp_arch_set_pc(fregs, (unsigned long)func->new_func);
>
> unlock:
> - preempt_enable_notrace();
> ftrace_test_recursion_unlock(bit);
> }
>
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index b7be1df..7392bc7 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -7198,16 +7198,15 @@ void ftrace_reset_array_ops(struct trace_array *tr)
> struct ftrace_ops *op;
> int bit;
>
> + /*
> + * The ftrace_test_and_set_recursion() will disable preemption,
> + * which is required since some of the ops may be dynamically
> + * allocated, they must be freed after a synchronize_rcu().
> + */
> bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_LIST_START);
> if (bit < 0)
> return;
>
> - /*
> - * Some of the ops may be dynamically allocated,
> - * they must be freed after a synchronize_rcu().
> - */
> - preempt_disable_notrace();
> -
> do_for_each_ftrace_op(op, ftrace_ops_list) {
> /* Stub functions don't need to be called nor tested */
> if (op->flags & FTRACE_OPS_FL_STUB)
> @@ -7231,7 +7230,6 @@ void ftrace_reset_array_ops(struct trace_array *tr)
> }
> } while_for_each_ftrace_op(op);
> out:
> - preempt_enable_notrace();
> trace_clear_recursion(bit);
> }
>
> @@ -7279,12 +7277,9 @@ static void ftrace_ops_assist_func(unsigned long ip, unsigned long parent_ip,
> if (bit < 0)
> return;
>
> - preempt_disable_notrace();
> -
> if (!(op->flags & FTRACE_OPS_FL_RCU) || rcu_is_watching())
> op->func(ip, parent_ip, op, fregs);
>
> - preempt_enable_notrace();
> trace_clear_recursion(bit);
> }
> NOKPROBE_SYMBOL(ftrace_ops_assist_func);
> diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
> index 1f0e63f..9f1bfbe 100644
> --- a/kernel/trace/trace_functions.c
> +++ b/kernel/trace/trace_functions.c
> @@ -186,7 +186,6 @@ static void function_trace_start(struct trace_array *tr)
> return;
>
> trace_ctx = tracing_gen_ctx();
> - preempt_disable_notrace();
>
> cpu = smp_processor_id();
> data = per_cpu_ptr(tr->array_buffer.data, cpu);
> @@ -194,7 +193,6 @@ static void function_trace_start(struct trace_array *tr)
> trace_function(tr, ip, parent_ip, trace_ctx);
>
> ftrace_test_recursion_unlock(bit);
> - preempt_enable_notrace();
> }
>
> #ifdef CONFIG_UNWINDER_ORC
> @@ -298,8 +296,6 @@ static inline void process_repeats(struct trace_array *tr,
> if (bit < 0)
> return;
>
> - preempt_disable_notrace();
> -
> cpu = smp_processor_id();
> data = per_cpu_ptr(tr->array_buffer.data, cpu);
> if (atomic_read(&data->disabled))
> @@ -324,7 +320,6 @@ static inline void process_repeats(struct trace_array *tr,
>
> out:
> ftrace_test_recursion_unlock(bit);
> - preempt_enable_notrace();
> }
>
> static void
>
^ permalink raw reply
* [PATCH v6] ftrace: disable preemption when recursion locked
From: 王贇 @ 2021-10-27 2:11 UTC (permalink / raw)
To: Guo Ren, Steven Rostedt, Ingo Molnar, James E.J. Bottomley,
Helge Deller, Michael Ellerman, Benjamin Herrenschmidt,
Paul Mackerras, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Thomas Gleixner, Borislav Petkov, x86, H. Peter Anvin,
Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
Joe Lawrence, Masami Hiramatsu, Peter Zijlstra (Intel),
Nicholas Piggin, Jisheng Zhang, linux-csky, linux-kernel,
linux-parisc, linuxppc-dev, linux-riscv, live-patching
In-Reply-To: <333cecfe-3045-8e0a-0c08-64ff590845ab@linux.alibaba.com>
As the documentation explained, ftrace_test_recursion_trylock()
and ftrace_test_recursion_unlock() were supposed to disable and
enable preemption properly, however currently this work is done
outside of the function, which could be missing by mistake.
And since the internal using of trace_test_and_set_recursion()
and trace_clear_recursion() also require preemption disabled, we
can just merge the logical.
This patch will make sure the preemption has been disabled when
trace_test_and_set_recursion() return bit >= 0, and
trace_clear_recursion() will enable the preemption if previously
enabled.
CC: Petr Mladek <pmladek@suse.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Miroslav Benes <mbenes@suse.cz>
Reported-by: Abaci <abaci@linux.alibaba.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
---
arch/csky/kernel/probes/ftrace.c | 2 --
arch/parisc/kernel/ftrace.c | 2 --
arch/powerpc/kernel/kprobes-ftrace.c | 2 --
arch/riscv/kernel/probes/ftrace.c | 2 --
arch/x86/kernel/kprobes/ftrace.c | 2 --
include/linux/trace_recursion.h | 13 ++++++++++++-
kernel/livepatch/patch.c | 13 +++++++------
kernel/trace/ftrace.c | 15 +++++----------
kernel/trace/trace_functions.c | 5 -----
9 files changed, 24 insertions(+), 32 deletions(-)
diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
index b388228..834cffc 100644
--- a/arch/csky/kernel/probes/ftrace.c
+++ b/arch/csky/kernel/probes/ftrace.c
@@ -17,7 +17,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
return;
regs = ftrace_get_regs(fregs);
- preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (!p) {
p = get_kprobe((kprobe_opcode_t *)(ip - MCOUNT_INSN_SIZE));
@@ -57,7 +56,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
__this_cpu_write(current_kprobe, NULL);
}
out:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 7d14242..90c4345 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -210,7 +210,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
return;
regs = ftrace_get_regs(fregs);
- preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (unlikely(!p) || kprobe_disabled(p))
goto out;
@@ -239,7 +238,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
}
__this_cpu_write(current_kprobe, NULL);
out:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
index 7154d58..072ebe7 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -26,7 +26,6 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
return;
regs = ftrace_get_regs(fregs);
- preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)nip);
if (unlikely(!p) || kprobe_disabled(p))
goto out;
@@ -61,7 +60,6 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
__this_cpu_write(current_kprobe, NULL);
}
out:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/riscv/kernel/probes/ftrace.c b/arch/riscv/kernel/probes/ftrace.c
index aab85a8..7142ec4 100644
--- a/arch/riscv/kernel/probes/ftrace.c
+++ b/arch/riscv/kernel/probes/ftrace.c
@@ -15,7 +15,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
if (bit < 0)
return;
- preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (unlikely(!p) || kprobe_disabled(p))
goto out;
@@ -52,7 +51,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
__this_cpu_write(current_kprobe, NULL);
}
out:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
index 596de2f..dd2ec14 100644
--- a/arch/x86/kernel/kprobes/ftrace.c
+++ b/arch/x86/kernel/kprobes/ftrace.c
@@ -25,7 +25,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
if (bit < 0)
return;
- preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (unlikely(!p) || kprobe_disabled(p))
goto out;
@@ -59,7 +58,6 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
__this_cpu_write(current_kprobe, NULL);
}
out:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index abe1a50..64c03ee 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -135,6 +135,9 @@ static __always_inline int trace_get_context_bit(void)
# define do_ftrace_record_recursion(ip, pip) do { } while (0)
#endif
+/*
+ * Preemption is promised to be disabled when return bit >= 0.
+ */
static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
int start)
{
@@ -162,11 +165,19 @@ static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsign
current->trace_recursion = val;
barrier();
+ preempt_disable_notrace();
+
return bit;
}
+/*
+ * Preemption will be enabled (if it was previously enabled).
+ */
static __always_inline void trace_clear_recursion(int bit)
{
+ WARN_ON_ONCE(bit < 0);
+
+ preempt_enable_notrace();
barrier();
trace_recursion_clear(bit);
}
@@ -178,7 +189,7 @@ static __always_inline void trace_clear_recursion(int bit)
* tracing recursed in the same context (normal vs interrupt),
*
* Returns: -1 if a recursion happened.
- * >= 0 if no recursion
+ * >= 0 if no recursion.
*/
static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
unsigned long parent_ip)
diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
index e8029ae..b8d75fb 100644
--- a/kernel/livepatch/patch.c
+++ b/kernel/livepatch/patch.c
@@ -49,14 +49,16 @@ static void notrace klp_ftrace_handler(unsigned long ip,
ops = container_of(fops, struct klp_ops, fops);
+ /*
+ *
+ * The ftrace_test_recursion_trylock() will disable preemption,
+ * which is required for the variant of synchronize_rcu() that is
+ * used to allow patching functions where RCU is not watching.
+ * See klp_synchronize_transition() for more details.
+ */
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (WARN_ON_ONCE(bit < 0))
return;
- /*
- * A variant of synchronize_rcu() is used to allow patching functions
- * where RCU is not watching, see klp_synchronize_transition().
- */
- preempt_disable_notrace();
func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
stack_node);
@@ -120,7 +122,6 @@ static void notrace klp_ftrace_handler(unsigned long ip,
klp_arch_set_pc(fregs, (unsigned long)func->new_func);
unlock:
- preempt_enable_notrace();
ftrace_test_recursion_unlock(bit);
}
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index b7be1df..7392bc7 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -7198,16 +7198,15 @@ void ftrace_reset_array_ops(struct trace_array *tr)
struct ftrace_ops *op;
int bit;
+ /*
+ * The ftrace_test_and_set_recursion() will disable preemption,
+ * which is required since some of the ops may be dynamically
+ * allocated, they must be freed after a synchronize_rcu().
+ */
bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_LIST_START);
if (bit < 0)
return;
- /*
- * Some of the ops may be dynamically allocated,
- * they must be freed after a synchronize_rcu().
- */
- preempt_disable_notrace();
-
do_for_each_ftrace_op(op, ftrace_ops_list) {
/* Stub functions don't need to be called nor tested */
if (op->flags & FTRACE_OPS_FL_STUB)
@@ -7231,7 +7230,6 @@ void ftrace_reset_array_ops(struct trace_array *tr)
}
} while_for_each_ftrace_op(op);
out:
- preempt_enable_notrace();
trace_clear_recursion(bit);
}
@@ -7279,12 +7277,9 @@ static void ftrace_ops_assist_func(unsigned long ip, unsigned long parent_ip,
if (bit < 0)
return;
- preempt_disable_notrace();
-
if (!(op->flags & FTRACE_OPS_FL_RCU) || rcu_is_watching())
op->func(ip, parent_ip, op, fregs);
- preempt_enable_notrace();
trace_clear_recursion(bit);
}
NOKPROBE_SYMBOL(ftrace_ops_assist_func);
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 1f0e63f..9f1bfbe 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -186,7 +186,6 @@ static void function_trace_start(struct trace_array *tr)
return;
trace_ctx = tracing_gen_ctx();
- preempt_disable_notrace();
cpu = smp_processor_id();
data = per_cpu_ptr(tr->array_buffer.data, cpu);
@@ -194,7 +193,6 @@ static void function_trace_start(struct trace_array *tr)
trace_function(tr, ip, parent_ip, trace_ctx);
ftrace_test_recursion_unlock(bit);
- preempt_enable_notrace();
}
#ifdef CONFIG_UNWINDER_ORC
@@ -298,8 +296,6 @@ static inline void process_repeats(struct trace_array *tr,
if (bit < 0)
return;
- preempt_disable_notrace();
-
cpu = smp_processor_id();
data = per_cpu_ptr(tr->array_buffer.data, cpu);
if (atomic_read(&data->disabled))
@@ -324,7 +320,6 @@ static inline void process_repeats(struct trace_array *tr,
out:
ftrace_test_recursion_unlock(bit);
- preempt_enable_notrace();
}
static void
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH v5 1/2] ftrace: disable preemption when recursion locked
From: 王贇 @ 2021-10-27 1:54 UTC (permalink / raw)
To: Steven Rostedt
Cc: Peter Zijlstra (Intel), Paul Walmsley, James E.J. Bottomley,
Guo Ren, Jisheng Zhang, H. Peter Anvin, live-patching,
linux-riscv, Miroslav Benes, Joe Lawrence, Helge Deller, x86,
linux-csky, Ingo Molnar, Petr Mladek, Albert Ou, Jiri Kosina,
Nicholas Piggin, Borislav Petkov, Josh Poimboeuf, Thomas Gleixner,
linux-parisc, linux-kernel, Palmer Dabbelt, Masami Hiramatsu,
Paul Mackerras, linuxppc-dev
In-Reply-To: <20211026080117.366137a5@gandalf.local.home>
On 2021/10/26 下午8:01, Steven Rostedt wrote:
> On Tue, 26 Oct 2021 17:48:10 +0800
> 王贇 <yun.wang@linux.alibaba.com> wrote:
>
>>> The two comments should be updated too since Steven removed the "bit == 0"
>>> trick.
>>
>> Could you please give more hint on how will it be correct?
>>
>> I get the point that bit will no longer be 0, there are only -1 or > 0 now
>> so trace_test_and_set_recursion() will disable preemption on bit > 0 and
>> trace_clear_recursion() will enabled it since it should only be called when
>> bit > 0 (I remember we could use a WARN_ON here now :-P).
>>
>>>
>>>> @@ -178,7 +187,7 @@ static __always_inline void trace_clear_recursion(int bit)
>>>> * tracing recursed in the same context (normal vs interrupt),
>>>> *
>>>> * Returns: -1 if a recursion happened.
>>>> - * >= 0 if no recursion
>>>> + * > 0 if no recursion.
>>>> */
>>>> static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
>>>> unsigned long parent_ip)
>>>
>>> And this change would not be correct now.
>>
>> I thought it will no longer return 0 so I change it to > 0, isn't that correct?
>
> No it is not. I removed the bit + 1 return value, which means it returns the
> actual bit now. Which is 0 or more.
Ah, the return is bit not val, I must be drunk...
My apologize for the stupid comments... I'll send a v6 for this patch
only to fix that, please let me know if this is not a good way to fix
few lines of comments.
Regards,
Michael Wang
>
> -- Steve
>
^ permalink raw reply
* Re: [PATCH] powerpc/xmon: fix task state output
From: Michael Ellerman @ 2021-10-27 1:11 UTC (permalink / raw)
To: Denis Kirjanov, linuxppc-dev
In-Reply-To: <20211026133108.7113-1-kda@linux-powerpc.org>
Denis Kirjanov <kda@linux-powerpc.org> writes:
> p_state is unsigned since the commit 2f064a59a11f
>
> The patch also uses TASK_RUNNING instead of null.
>
> Fixes: 2f064a59a11f ("sched: Change task_struct::state")
> Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
> ---
> arch/powerpc/xmon/xmon.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> index dd8241c009e5..8b28ff9d98d1 100644
> --- a/arch/powerpc/xmon/xmon.c
> +++ b/arch/powerpc/xmon/xmon.c
> @@ -3264,8 +3264,7 @@ static void show_task(struct task_struct *volatile tsk)
> * appropriate for calling from xmon. This could be moved
> * to a common, generic, routine used by both.
> */
> - state = (p_state == 0) ? 'R' :
> - (p_state < 0) ? 'U' :
I guess 'U' meant 'unknown'? I always thought it meant uninterruptible,
but obviously that is 'D'.
> + state = (p_state == TASK_RUNNING) ? 'R' :
> (p_state & TASK_UNINTERRUPTIBLE) ? 'D' :
> (p_state & TASK_STOPPED) ? 'T' :
> (p_state & TASK_TRACED) ? 'C' :
I think a better cleanup would be to use task_is_running(),
task_is_traced(), task_is_stopped(). That way we're insulated somewhat
from any future changes.
That would add additional READ_ONCE()s of the state, but I don't think
we care, the task should not be running if the system is in xmon.
cheers
^ permalink raw reply
* Re: [PATCH v2] macintosh/via-pmu-led: make disk activity usage a parameter.
From: Michael Ellerman @ 2021-10-27 0:33 UTC (permalink / raw)
To: Hill Ma, Nathan Lynch; +Cc: linuxppc-dev, linux-kernel, linux-doc
In-Reply-To: <CABpQrUMcCKbgSTnTB4BeUUVwq5jkOw7pGbUC53SGe-4DEVnUag@mail.gmail.com>
Hill Ma <maahiuzeon@gmail.com> writes:
> Thanks for the review.
>
> On Tue, Oct 26, 2021 at 6:08 AM Nathan Lynch <nathanl@linux.ibm.com> wrote:
>>
>> Hello,
>>
>> Hill Ma <maahiuzeon@gmail.com> writes:
>> > Whether to use the LED as a disk activity is a user preference.
>> > Some like this usage while others find the LED too bright. So it
>> > might be a good idea to make this choice a runtime parameter rather
>> > than compile-time config.
>>
>> Users already have the ability to change the LED behavior at runtime
>> already, correct? I.e. they can do:
>>
>> echo none > /sys/class/leds/pmu-led::front/trigger
>>
>> in their boot scripts. Granted, a kernel built with ADB_PMU_LED_DISK=y
>> will blink the LED on disk activity until user space is running. Is this
>> unsatisfactory?
>
> Yes, indeed. As someone who does not like this behavior on iBooks.
>
>> > The default is set to disabled as OS X does not use the LED as a
>> > disk activity indicator.
>>
>> This is long-standing behavior in Linux and OS X has been EOL on this
>> architecture for a decade, so this isn't much of a consideration at this
>> point. Seems more important to avoid surprising existing users and
>> distributions with a behavior change that makes additional work for
>> them. See below.
>>
>> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> > index 43dc35fe5bc0..a656a51ba0a8 100644
>> > --- a/Documentation/admin-guide/kernel-parameters.txt
>> > +++ b/Documentation/admin-guide/kernel-parameters.txt
>> > @@ -250,6 +250,12 @@
>> > Use timer override. For some broken Nvidia NF5 boards
>> > that require a timer override, but don't have HPET
>> >
>> > + adb_pmu_led_disk [PPC]
>> > + Use front LED as disk LED by default. Only applies to
>> > + PowerBook, iBook, PowerMac 7,2/7,3.
>> > + Format: <bool> (1/Y/y=enable, 0/N/n=disable)
>> > + Default: disabled
>> > +
>> > add_efi_memmap [EFI; X86] Include EFI memory map in
>> > kernel's map of available physical RAM.
>> >
>> > diff --git a/drivers/macintosh/Kconfig b/drivers/macintosh/Kconfig
>> > index 5cdc361da37c..243215de563c 100644
>> > --- a/drivers/macintosh/Kconfig
>> > +++ b/drivers/macintosh/Kconfig
>> > @@ -78,16 +78,6 @@ config ADB_PMU_LED
>> > behaviour of the old CONFIG_BLK_DEV_IDE_PMAC_BLINK, select this
>> > and the disk LED trigger and configure appropriately through sysfs.
>> >
>> > -config ADB_PMU_LED_DISK
>> > - bool "Use front LED as DISK LED by default"
>> > - depends on ADB_PMU_LED
>> > - depends on LEDS_CLASS
>> > - select LEDS_TRIGGERS
>> > - select LEDS_TRIGGER_DISK
>> > - help
>> > - This option makes the front LED default to the disk trigger
>> > - so that it blinks on disk activity.
>> > -
>>
>> So, if I've been relying on CONFIG_ADB_PMU_LED_DISK=y and I upgrade to a
>> newer kernel with the proposed change, from my point of view the disk
>> activity LED has stopped working and I need to alter the bootloader
>> config or init scripts to restore the expected behavior. That seems
>> undesirable to me.
>>
>> I don't think we rigidly enforce Kconfig backward compatibility, but
>> when it comes to a user-visible function on a legacy platform where
>> users and distros likely have their configurations figured out already,
>> it's probably best to avoid such changes.
>
> I actually asked some distributions that still ship PowerPC BE
> architectures to unset it.
> https://github.com/void-ppc/void-packages/pull/48
> https://github.com/void-linux/void-packages/pull/33275
> https://git.adelielinux.org/adelie/packages/-/merge_requests/607
>
> And Debian, which still has PowerPC BE architectures as ports, does
> not turn it on.
> https://salsa.debian.org/kernel-team/linux/-/blob/master/debian/config/kernelarch-powerpc/config
Looks like all three distros have it disabled.
So let's drop the config option, make it disabled by default, and anyone
who wants to turn it on can do so in their initramfs or init scripts.
cheers
^ permalink raw reply
* instruction storage exception handling
From: Jacques de Laval @ 2021-10-26 22:44 UTC (permalink / raw)
To: linuxppc-dev@lists.ozlabs.org
Hi,
We are trying to upgrade kernel from 5.10 to 5.14.11. We have a Freescale/NXP
T1023 SOC with two e5500 cores, and are running in 32-bit mode:
CONFIG_PPC32=y
# CONFIG_PPC64 is not set
#
# Processor support
#
# CONFIG_PPC_BOOK3S_32 is not set
CONFIG_PPC_85xx=y
# CONFIG_PPC_8xx is not set
# CONFIG_40x is not set
# CONFIG_44x is not set
CONFIG_GENERIC_CPU=y
# CONFIG_E5500_CPU is not set
# CONFIG_E6500_CPU is not set
CONFIG_E500=y
CONFIG_PPC_E500MC=y
CONFIG_PPC_FPU_REGS=y
CONFIG_PPC_FPU=y
CONFIG_FSL_EMB_PERFMON=y
CONFIG_FSL_EMB_PERF_EVENT=y
CONFIG_FSL_EMB_PERF_EVENT_E500=y
CONFIG_BOOKE=y
CONFIG_FSL_BOOKE=y
CONFIG_PPC_FSL_BOOK3E=y
CONFIG_PTE_64BIT=y
CONFIG_PHYS_64BIT=y
CONFIG_PPC_MMU_NOHASH=y
CONFIG_PPC_BOOK3E_MMU=y
# CONFIG_PMU_SYSFS is not set
CONFIG_SMP=y
CONFIG_NR_CPUS=2
CONFIG_PPC_DOORBELL=y
# end of Processor support
We compile using 32-bit Bootlin PPC toolchain:
powerpc-e500mc glibc bleeding-edge 2020.08-1.
When booting, and starting PID 1 we sometimes get a hang. Nothing but our init
process is running, and for debugging purposes our init currently looks like
this:
int main(int argc, char *argv[]){
for (int i = 0; ; i++) {
FILE *fp = fopen("/dev/kmsg", "w");
if (fp) {
fprintf(fp, "%d\n", i);
fclose(fp);
}
sleep(1);
}
return 0;
}
When the hangup occur we don't get any output at all from our PID 1.
The last output is from the kernel:
Run /sbin/init as init process
with arguments:
/sbin/init
with environment:
HOME=/
TERM=linux
kgdboc=ttyS0,115200
When issuing a backtrace on all active cpus we can see that the kernel is
handling an instruction storage exception:
sysrq: Show backtrace of all active CPUs
sysrq: CPU0:
CPU: 0 PID: 1 Comm: init Not tainted 5.14.11 #1
NIP: c02aac78 LR: c02aac2c CTR: 00000000
REGS: c1907d40 TRAP: 0500 Not tainted (5.14.11)
MSR: 00029002 <CE,EE,ME> CR: 82244284 XER: 20000000
GPR00: 0000000f c1907e20 c1910000 0065a000 000001d0 01100cca c1907e84 0000000c
GPR08: d39a8000 000001d3 0000000c c1907f10 42244284 00000000 00740514 bfb71670
GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
GPR24: 00708558 00701698 d3994040 00029002 c1907f20 0065a238 00000355 d39a2790
NIP [c02aac78] handle_mm_fault+0xf8/0x11f0
LR [c02aac2c] handle_mm_fault+0xac/0x11f0
Call Trace:
[c1907e20] [c02aac10] handle_mm_fault+0x90/0x11f0 (unreliable)
[c1907ec0] [c003078c] ___do_page_fault+0x26c/0x780
[c1907ef0] [c0030cd4] do_page_fault+0x34/0x100
[c1907f10] [c0000988] InstructionStorage+0x108/0x120
--- interrupt: 400 at 0x65a238
NIP: 0065a238 LR: 0052f26c CTR: 0052f260
REGS: c1907f20 TRAP: 0400 Not tainted (5.14.11)
MSR: 0002d002 <CE,EE,PR,ME> CR: 42242284 XER: 00000000
GPR00: b7be9914 bfb71620 b7c203a0 8c008000 0070400d b7c182a0 000b8260 0052f260
GPR08: 0047d448 0052f260 0000000a 00000003 42242284 00000000 00740514 bfb71670
GPR16: 007040e6 00701418 b7c1a5f0 00702f18 00000000 bfb71690 0000fff1 b7c1c478
GPR24: 00708558 00701698 00700000 00000015 b7c1c2b0 00707e20 b7c1b8a8 bfb71660
NIP [0065a238] 0x65a238
LR [0052f26c] 0x52f26c
--- interrupt: 400
Instruction dump:
60a500c0 811f0020 57aa6cfa 813f0000 57a30026 809f004c 81080024 7d29e850
90a1002c 5529a33e 93c10038 7d244a14 <90610034> 7d485215 91210030 41c203dc
We have also observed that the CPU is continuously servicing the same interrupt
(north of 140k times per sec), it is not deadlocked.
We have not yet been able to reproduce this behavior under QEMU system
emulation.
When bisecting between 5.10 and 5.14.11 we can see that this behavior started
with commit a01a3f2ddbcda83e8572787c0ec1dcbeba86915a:
powerpc: remove arguments from fault handler functions
Our best guess that the instruction storage exception is not properly handled
and the kernel is never able to recover from the page fault, but we don't
really know how to proceed. Does anyone have any suggestions or insights?
Regards,
Jacques de Laval
^ permalink raw reply
* Re: [PATCH 0/2] ftrace: make sure preemption disabled on recursion testing
From: Palmer Dabbelt @ 2021-10-26 23:48 UTC (permalink / raw)
To: yun.wang
Cc: peterz, Paul Walmsley, James.Bottomley, guoren, jszhang, hpa,
live-patching, linux-riscv, mbenes, paulus, joe.lawrence, deller,
x86, linux-csky, mingo, pmladek, aou, jikos, rostedt, bp, npiggin,
jpoimboe, tglx, linux-parisc, linux-kernel, mhiramat, colin.king,
linuxppc-dev
In-Reply-To: <8c7de46d-9869-aa5e-2bb9-5dbc2eda395e@linux.alibaba.com>
On Mon, 11 Oct 2021 22:39:16 PDT (-0700), yun.wang@linux.alibaba.com wrote:
> The testing show that perf_ftrace_function_call() are using
> smp_processor_id() with preemption enabled, all the checking
> on CPU could be wrong after preemption, PATCH 1/2 will fix
> that.
>
> Besides, as Peter point out, the testing of recursion within
> the section between ftrace_test_recursion_trylock()/_unlock()
> pair also need the preemption disabled as the documentation
> explained, PATCH 2/2 will make sure on that.
>
> Michael Wang (2):
> ftrace: disable preemption on the testing of recursion
> ftrace: prevent preemption in perf_ftrace_function_call()
>
> arch/csky/kernel/probes/ftrace.c | 2 --
> arch/parisc/kernel/ftrace.c | 2 --
> arch/powerpc/kernel/kprobes-ftrace.c | 2 --
> arch/riscv/kernel/probes/ftrace.c | 2 --
> arch/x86/kernel/kprobes/ftrace.c | 2 --
> include/linux/trace_recursion.h | 10 +++++++++-
> kernel/livepatch/patch.c | 6 ------
> kernel/trace/trace_event_perf.c | 17 +++++++++++++----
> kernel/trace/trace_functions.c | 5 -----
> 9 files changed, 22 insertions(+), 26 deletions(-)
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> # RISC-V
^ permalink raw reply
* Re: [PATCH][next] powerpc/vas: Fix potential NULL pointer dereference
From: Gustavo A. R. Silva @ 2021-10-26 22:40 UTC (permalink / raw)
To: Michael Ellerman
Cc: Tyrel Datwyler, Haren Myneni, linux-kernel, Nicholas Piggin,
Paul Mackerras, linux-hardening, linuxppc-dev
In-Reply-To: <87h7d3beqq.fsf@mpe.ellerman.id.au>
On Wed, Oct 27, 2021 at 09:30:53AM +1100, Michael Ellerman wrote:
[..]
> > I think I'll take this in my tree.
>
> I've already put it in powerpc/next:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=next&id=61cb9ac66b30374c7fd8a8b2a3c4f8f432c72e36
Oh, great. :)
> If you need to pick it up as well for some reason that's fine.
I just didn't want it to get lost somehow. I'll drop it from tree now.
Thanks
--
Gustavo
^ permalink raw reply
* Re: [PATCH][next] powerpc/vas: Fix potential NULL pointer dereference
From: Michael Ellerman @ 2021-10-26 22:30 UTC (permalink / raw)
To: Gustavo A. R. Silva, Tyrel Datwyler
Cc: Haren Myneni, linux-kernel, Nicholas Piggin, Paul Mackerras,
linux-hardening, linuxppc-dev
In-Reply-To: <20211026184201.GB1457721@embeddedor>
"Gustavo A. R. Silva" <gustavoars@kernel.org> writes:
> On Mon, Oct 18, 2021 at 02:09:31PM -0700, Tyrel Datwyler wrote:
>> On 10/14/21 10:03 PM, Gustavo A. R. Silva wrote:
>> > (!ptr && !ptr->foo) strikes again. :)
>> >
>> > The expression (!ptr && !ptr->foo) is bogus and in case ptr is NULL,
>> > it leads to a NULL pointer dereference: ptr->foo.
>> >
>> > Fix this by converting && to ||
>> >
>> > This issue was detected with the help of Coccinelle, and audited and
>> > fixed manually.
>> >
>> > Fixes: 1a0d0d5ed5e3 ("powerpc/vas: Add platform specific user window operations")
>> > Cc: stable@vger.kernel.org
>> > Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
>> Looking at the usage pattern it is obvious that if we determine !ptr attempting
>> to also confirm !ptr->ops is going to blow up.
>>
>> LGTM.
>>
>> Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
>
> I think I'll take this in my tree.
I've already put it in powerpc/next:
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=next&id=61cb9ac66b30374c7fd8a8b2a3c4f8f432c72e36
If you need to pick it up as well for some reason that's fine.
cheers
^ permalink raw reply
* Re: [PATCH v7 1/3] riscv: Introduce CONFIG_RELOCATABLE
From: Palmer Dabbelt @ 2021-10-26 21:29 UTC (permalink / raw)
To: alex
Cc: aou, linux-kernel, paulus, Paul Walmsley, linux-riscv,
alexandre.ghiti, linuxppc-dev
In-Reply-To: <a6223864-7263-5450-0890-0f05a137d8c2@ghiti.fr>
On Sat, 09 Oct 2021 10:20:20 PDT (-0700), alex@ghiti.fr wrote:
> Arf, I have sent this patchset with the wrong email address. @Palmer
> tell me if you want me to resend it correctly.
Sorry for being kind of slow here. It's fine: there's a "From:" in the
patch, and git picks those up so it'll match the signed-off-by line. I
send pretty much all my patches that way, as I never managed to get my
Google address working correctly.
>
> Thanks,
>
> Alex
>
> On 10/9/21 7:12 PM, Alexandre Ghiti wrote:
>> From: Alexandre Ghiti <alex@ghiti.fr>
>>
>> This config allows to compile 64b kernel as PIE and to relocate it at
>> any virtual address at runtime: this paves the way to KASLR.
>> Runtime relocation is possible since relocation metadata are embedded into
>> the kernel.
IMO this should really be user selectable, at a bare minimum so it's testable.
I just sent along a patch to do that (my power's off at home, so email is a bit
wacky right now).
I haven't put this on for-next yet as I'm not sure if you had a fix for the
kasan issue (which IIUC would conflict with this).
>> Note that relocating at runtime introduces an overhead even if the
>> kernel is loaded at the same address it was linked at and that the compiler
>> options are those used in arm64 which uses the same RELA relocation
>> format.
>>
>> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
>> ---
>> arch/riscv/Kconfig | 12 ++++++++
>> arch/riscv/Makefile | 7 +++--
>> arch/riscv/kernel/vmlinux.lds.S | 6 ++++
>> arch/riscv/mm/Makefile | 4 +++
>> arch/riscv/mm/init.c | 54 ++++++++++++++++++++++++++++++++-
>> 5 files changed, 80 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>> index ea16fa2dd768..043ba92559fa 100644
>> --- a/arch/riscv/Kconfig
>> +++ b/arch/riscv/Kconfig
>> @@ -213,6 +213,18 @@ config PGTABLE_LEVELS
>> config LOCKDEP_SUPPORT
>> def_bool y
>>
>> +config RELOCATABLE
>> + bool
>> + depends on MMU && 64BIT && !XIP_KERNEL
>> + help
>> + This builds a kernel as a Position Independent Executable (PIE),
>> + which retains all relocation metadata required to relocate the
>> + kernel binary at runtime to a different virtual address than the
>> + address it was linked at.
>> + Since RISCV uses the RELA relocation format, this requires a
>> + relocation pass at runtime even if the kernel is loaded at the
>> + same address it was linked at.
>> +
>> source "arch/riscv/Kconfig.socs"
>> source "arch/riscv/Kconfig.erratas"
>>
>> diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
>> index 0eb4568fbd29..2f509915f246 100644
>> --- a/arch/riscv/Makefile
>> +++ b/arch/riscv/Makefile
>> @@ -9,9 +9,12 @@
>> #
>>
>> OBJCOPYFLAGS := -O binary
>> -LDFLAGS_vmlinux :=
>> +ifeq ($(CONFIG_RELOCATABLE),y)
>> + LDFLAGS_vmlinux += -shared -Bsymbolic -z notext -z norelro
>> + KBUILD_CFLAGS += -fPIE
>> +endif
>> ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
>> - LDFLAGS_vmlinux := --no-relax
>> + LDFLAGS_vmlinux += --no-relax
>> KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
>> CC_FLAGS_FTRACE := -fpatchable-function-entry=8
>> endif
>> diff --git a/arch/riscv/kernel/vmlinux.lds.S b/arch/riscv/kernel/vmlinux.lds.S
>> index 5104f3a871e3..862a8c09723c 100644
>> --- a/arch/riscv/kernel/vmlinux.lds.S
>> +++ b/arch/riscv/kernel/vmlinux.lds.S
>> @@ -133,6 +133,12 @@ SECTIONS
>>
>> BSS_SECTION(PAGE_SIZE, PAGE_SIZE, 0)
>>
>> + .rela.dyn : ALIGN(8) {
>> + __rela_dyn_start = .;
>> + *(.rela .rela*)
>> + __rela_dyn_end = .;
>> + }
>> +
>> #ifdef CONFIG_EFI
>> . = ALIGN(PECOFF_SECTION_ALIGNMENT);
>> __pecoff_data_virt_size = ABSOLUTE(. - __pecoff_text_end);
>> diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
>> index 7ebaef10ea1b..2d33ec574bbb 100644
>> --- a/arch/riscv/mm/Makefile
>> +++ b/arch/riscv/mm/Makefile
>> @@ -1,6 +1,10 @@
>> # SPDX-License-Identifier: GPL-2.0-only
>>
>> CFLAGS_init.o := -mcmodel=medany
>> +ifdef CONFIG_RELOCATABLE
>> +CFLAGS_init.o += -fno-pie
>> +endif
>> +
>> ifdef CONFIG_FTRACE
>> CFLAGS_REMOVE_init.o = $(CC_FLAGS_FTRACE)
>> CFLAGS_REMOVE_cacheflush.o = $(CC_FLAGS_FTRACE)
>> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
>> index c0cddf0fc22d..42041c12d496 100644
>> --- a/arch/riscv/mm/init.c
>> +++ b/arch/riscv/mm/init.c
>> @@ -20,6 +20,9 @@
>> #include <linux/dma-map-ops.h>
>> #include <linux/crash_dump.h>
>> #include <linux/hugetlb.h>
>> +#ifdef CONFIG_RELOCATABLE
>> +#include <linux/elf.h>
>> +#endif
>>
>> #include <asm/fixmap.h>
>> #include <asm/tlbflush.h>
>> @@ -103,7 +106,7 @@ static void __init print_vm_layout(void)
>> print_mlm("lowmem", (unsigned long)PAGE_OFFSET,
>> (unsigned long)high_memory);
>> #ifdef CONFIG_64BIT
>> - print_mlm("kernel", (unsigned long)KERNEL_LINK_ADDR,
>> + print_mlm("kernel", (unsigned long)kernel_map.virt_addr,
>> (unsigned long)ADDRESS_SPACE_END);
>> #endif
>> }
>> @@ -518,6 +521,44 @@ static __init pgprot_t pgprot_from_va(uintptr_t va)
>> #error "setup_vm() is called from head.S before relocate so it should not use absolute addressing."
>> #endif
>>
>> +#ifdef CONFIG_RELOCATABLE
>> +extern unsigned long __rela_dyn_start, __rela_dyn_end;
>> +
>> +static void __init relocate_kernel(void)
>> +{
>> + Elf64_Rela *rela = (Elf64_Rela *)&__rela_dyn_start;
>> + /*
>> + * This holds the offset between the linked virtual address and the
>> + * relocated virtual address.
>> + */
>> + uintptr_t reloc_offset = kernel_map.virt_addr - KERNEL_LINK_ADDR;
>> + /*
>> + * This holds the offset between kernel linked virtual address and
>> + * physical address.
>> + */
>> + uintptr_t va_kernel_link_pa_offset = KERNEL_LINK_ADDR - kernel_map.phys_addr;
>> +
>> + for ( ; rela < (Elf64_Rela *)&__rela_dyn_end; rela++) {
>> + Elf64_Addr addr = (rela->r_offset - va_kernel_link_pa_offset);
>> + Elf64_Addr relocated_addr = rela->r_addend;
>> +
>> + if (rela->r_info != R_RISCV_RELATIVE)
>> + continue;
>> +
>> + /*
>> + * Make sure to not relocate vdso symbols like rt_sigreturn
>> + * which are linked from the address 0 in vmlinux since
>> + * vdso symbol addresses are actually used as an offset from
>> + * mm->context.vdso in VDSO_OFFSET macro.
>> + */
>> + if (relocated_addr >= KERNEL_LINK_ADDR)
>> + relocated_addr += reloc_offset;
>> +
>> + *(Elf64_Addr *)addr = relocated_addr;
>> + }
>> +}
>> +#endif /* CONFIG_RELOCATABLE */
>> +
>> #ifdef CONFIG_XIP_KERNEL
>> static void __init create_kernel_page_table(pgd_t *pgdir,
>> __always_unused bool early)
>> @@ -625,6 +666,17 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>> BUG_ON((kernel_map.virt_addr + kernel_map.size) > ADDRESS_SPACE_END - SZ_4K);
>> #endif
>>
>> +#ifdef CONFIG_RELOCATABLE
>> + /*
>> + * Early page table uses only one PGDIR, which makes it possible
>> + * to map PGDIR_SIZE aligned on PGDIR_SIZE: if the relocation offset
>> + * makes the kernel cross over a PGDIR_SIZE boundary, raise a bug
>> + * since a part of the kernel would not get mapped.
>> + */
>> + BUG_ON(PGDIR_SIZE - (kernel_map.virt_addr & (PGDIR_SIZE - 1)) < kernel_map.size);
>> + relocate_kernel();
>> +#endif
>> +
>> pt_ops.alloc_pte = alloc_pte_early;
>> pt_ops.get_pte_virt = get_pte_virt_early;
>> #ifndef __PAGETABLE_PMD_FOLDED
^ permalink raw reply
* [powerpc:next] BUILD SUCCESS 319fa1a52e438a6e028329187783a25ad498c4e6
From: kernel test robot @ 2021-10-26 20:37 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
branch HEAD: 319fa1a52e438a6e028329187783a25ad498c4e6 powerpc/pseries/mobility: ignore ibm, platform-facilities updates
elapsed time: 1963m
configs tested: 64
configs skipped: 3
The following configs have been built successfully.
More configs may be tested in the coming days.
gcc tested configs:
arm defconfig
arm64 allyesconfig
arm64 defconfig
arm allyesconfig
arm allmodconfig
ia64 allmodconfig
ia64 defconfig
ia64 allyesconfig
m68k allmodconfig
m68k defconfig
m68k allyesconfig
nios2 defconfig
arc allyesconfig
nds32 allnoconfig
nds32 defconfig
nios2 allyesconfig
csky defconfig
alpha defconfig
alpha allyesconfig
xtensa allyesconfig
h8300 allyesconfig
arc defconfig
sh allmodconfig
parisc defconfig
s390 allyesconfig
s390 allmodconfig
parisc allyesconfig
s390 defconfig
i386 allyesconfig
sparc allyesconfig
i386 defconfig
i386 debian-10.3
sparc defconfig
mips allyesconfig
mips allmodconfig
powerpc allyesconfig
powerpc allmodconfig
powerpc allnoconfig
x86_64 randconfig-a002-20211024
x86_64 randconfig-a004-20211024
x86_64 randconfig-a005-20211024
x86_64 randconfig-a006-20211024
x86_64 randconfig-a001-20211024
x86_64 randconfig-a003-20211024
riscv allyesconfig
riscv allnoconfig
riscv defconfig
riscv allmodconfig
riscv nommu_k210_defconfig
riscv nommu_virt_defconfig
riscv rv32_defconfig
um x86_64_defconfig
um i386_defconfig
x86_64 rhel-8.3-kselftests
x86_64 allyesconfig
x86_64 defconfig
x86_64 kexec
x86_64 rhel-8.3
clang tested configs:
x86_64 randconfig-a002-20211025
x86_64 randconfig-a004-20211025
x86_64 randconfig-a005-20211025
x86_64 randconfig-a006-20211025
x86_64 randconfig-a001-20211025
x86_64 randconfig-a003-20211025
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
^ permalink raw reply
* Re: [PATCH][next] powerpc/vas: Fix potential NULL pointer dereference
From: Gustavo A. R. Silva @ 2021-10-26 18:42 UTC (permalink / raw)
To: Tyrel Datwyler
Cc: Haren Myneni, linux-kernel, Nicholas Piggin, Paul Mackerras,
linux-hardening, linuxppc-dev
In-Reply-To: <97c42e43-15b9-5db6-d460-dbb16f31954d@linux.ibm.com>
On Mon, Oct 18, 2021 at 02:09:31PM -0700, Tyrel Datwyler wrote:
> On 10/14/21 10:03 PM, Gustavo A. R. Silva wrote:
> > (!ptr && !ptr->foo) strikes again. :)
> >
> > The expression (!ptr && !ptr->foo) is bogus and in case ptr is NULL,
> > it leads to a NULL pointer dereference: ptr->foo.
> >
> > Fix this by converting && to ||
> >
> > This issue was detected with the help of Coccinelle, and audited and
> > fixed manually.
> >
> > Fixes: 1a0d0d5ed5e3 ("powerpc/vas: Add platform specific user window operations")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
> Looking at the usage pattern it is obvious that if we determine !ptr attempting
> to also confirm !ptr->ops is going to blow up.
>
> LGTM.
>
> Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
I think I'll take this in my tree.
Thanks, Tyrel.
--
Gustavo
^ permalink raw reply
* [PATCH 0/2] powerpc prevents deadlock in the watchdog path
From: Laurent Dufour @ 2021-10-26 16:27 UTC (permalink / raw)
To: mpe, benh, paulus; +Cc: linuxppc-dev, linux-kernel, npiggin
While doing LPM on large system (for instance a Brazos system with 1024
CPUs and 12TB of memory) with an heavy load (I ran 'stress-ng --futex 500
-vm 5'), watchdog hard lockup are seen when the hypervisor is taking
too much time handling the page tables to track page's changes.
When this happens, the system may hung with a deadlock between the watchdog
lock and the console owner lock.
The first patch of this series prevents that deadlock by not calling printk
while holding the watchdog lock, and also not sending IPI (and waiting for
CPU's answer during 1s) while holding the watchdog lock.
The second patch ensures that the watchdog's data are accessed under the
protection of the watchdog lock.
Laurent Dufour (2):
powerpc/watchdog: prevent printk and send IPI while holding the wd
lock
powerpc/watchdog: ensure watchdog data accesses are protected
arch/powerpc/kernel/watchdog.c | 45 +++++++++++++++++++---------------
1 file changed, 25 insertions(+), 20 deletions(-)
--
2.33.1
^ permalink raw reply
* [PATCH 2/2] powerpc/watchdog: ensure watchdog data accesses are protected
From: Laurent Dufour @ 2021-10-26 16:27 UTC (permalink / raw)
To: mpe, benh, paulus; +Cc: linuxppc-dev, linux-kernel, npiggin
In-Reply-To: <20211026162740.16283-1-ldufour@linux.ibm.com>
The wd_smp_cpus_pending CPU mask should be accessed under the protection of
the __wd_smp_lock.
This prevents false alarm to be raised when the system is under an heavy
stress. This has been seen while doing LPM on large system with a big
workload.
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
arch/powerpc/kernel/watchdog.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index bc7411327066..8d7a1a86187e 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -203,12 +203,13 @@ static void watchdog_smp_panic(int cpu, u64 tb)
static void wd_smp_clear_cpu_pending(int cpu, u64 tb)
{
+ unsigned long flags;
+
+ wd_smp_lock(&flags);
if (!cpumask_test_cpu(cpu, &wd_smp_cpus_pending)) {
if (unlikely(cpumask_test_cpu(cpu, &wd_smp_cpus_stuck))) {
struct pt_regs *regs = get_irq_regs();
- unsigned long flags;
- wd_smp_lock(&flags);
cpumask_clear_cpu(cpu, &wd_smp_cpus_stuck);
wd_smp_unlock(&flags);
@@ -219,22 +220,23 @@ static void wd_smp_clear_cpu_pending(int cpu, u64 tb)
show_regs(regs);
else
dump_stack();
+ return;
}
+
+ wd_smp_unlock(&flags);
return;
}
+
cpumask_clear_cpu(cpu, &wd_smp_cpus_pending);
if (cpumask_empty(&wd_smp_cpus_pending)) {
- unsigned long flags;
-
- wd_smp_lock(&flags);
if (cpumask_empty(&wd_smp_cpus_pending)) {
wd_smp_last_reset_tb = tb;
cpumask_andnot(&wd_smp_cpus_pending,
&wd_cpus_enabled,
&wd_smp_cpus_stuck);
}
- wd_smp_unlock(&flags);
}
+ wd_smp_unlock(&flags);
}
static void watchdog_timer_interrupt(int cpu)
--
2.33.1
^ permalink raw reply related
* [PATCH 1/2] powerpc/watchdog: prevent printk and send IPI while holding the wd lock
From: Laurent Dufour @ 2021-10-26 16:27 UTC (permalink / raw)
To: mpe, benh, paulus; +Cc: linuxppc-dev, linux-kernel, npiggin
In-Reply-To: <20211026162740.16283-1-ldufour@linux.ibm.com>
When handling the Watchdog interrupt, long processing should not be done
while holding the __wd_smp_lock. This prevents the other CPUs to grab it
and to process Watchdog timer interrupts. Furhtermore, this could lead to
the following situation:
CPU x detect lockup on CPU y and grab the __wd_smp_lock
in watchdog_smp_panic()
CPU y caught the watchdog interrupt and try to grab the __wd_smp_lock
in soft_nmi_interrupt()
CPU x wait for CPU y to catch the IPI for 1s in __smp_send_nmi_ipi()
CPU x will timeout and so has spent 1s waiting while holding the
__wd_smp_lock.
A deadlock may also happen between the __wd_smp_lock and the console_owner
'lock' this way:
CPU x grab the console_owner
CPU y grab the __wd_smp_lock
CPU x catch the watchdog timer interrupt and needs to grab __wd_smp_lock
CPU y wants to print something and wait for console_owner
-> deadlock
Doing all the long processing without holding the _wd_smp_lock prevents
these situations.
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
arch/powerpc/kernel/watchdog.c | 31 +++++++++++++++++--------------
1 file changed, 17 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index f9ea0e5357f9..bc7411327066 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -149,6 +149,8 @@ static void set_cpu_stuck(int cpu, u64 tb)
static void watchdog_smp_panic(int cpu, u64 tb)
{
+ cpumask_t cpus_pending_copy;
+ u64 last_reset_tb_copy;
unsigned long flags;
int c;
@@ -161,29 +163,32 @@ static void watchdog_smp_panic(int cpu, u64 tb)
if (cpumask_weight(&wd_smp_cpus_pending) == 0)
goto out;
+ cpumask_copy(&cpus_pending_copy, &wd_smp_cpus_pending);
+ last_reset_tb_copy = wd_smp_last_reset_tb;
+
+ /* Take the stuck CPUs out of the watch group */
+ set_cpumask_stuck(&wd_smp_cpus_pending, tb);
+
+ wd_smp_unlock(&flags);
+
pr_emerg("CPU %d detected hard LOCKUP on other CPUs %*pbl\n",
- cpu, cpumask_pr_args(&wd_smp_cpus_pending));
+ cpu, cpumask_pr_args(&cpus_pending_copy));
pr_emerg("CPU %d TB:%lld, last SMP heartbeat TB:%lld (%lldms ago)\n",
- cpu, tb, wd_smp_last_reset_tb,
- tb_to_ns(tb - wd_smp_last_reset_tb) / 1000000);
+ cpu, tb, last_reset_tb_copy,
+ tb_to_ns(tb - last_reset_tb_copy) / 1000000);
if (!sysctl_hardlockup_all_cpu_backtrace) {
/*
* Try to trigger the stuck CPUs, unless we are going to
* get a backtrace on all of them anyway.
*/
- for_each_cpu(c, &wd_smp_cpus_pending) {
+ for_each_cpu(c, &cpus_pending_copy) {
if (c == cpu)
continue;
smp_send_nmi_ipi(c, wd_lockup_ipi, 1000000);
}
}
- /* Take the stuck CPUs out of the watch group */
- set_cpumask_stuck(&wd_smp_cpus_pending, tb);
-
- wd_smp_unlock(&flags);
-
if (sysctl_hardlockup_all_cpu_backtrace)
trigger_allbutself_cpu_backtrace();
@@ -204,6 +209,8 @@ static void wd_smp_clear_cpu_pending(int cpu, u64 tb)
unsigned long flags;
wd_smp_lock(&flags);
+ cpumask_clear_cpu(cpu, &wd_smp_cpus_stuck);
+ wd_smp_unlock(&flags);
pr_emerg("CPU %d became unstuck TB:%lld\n",
cpu, tb);
@@ -212,9 +219,6 @@ static void wd_smp_clear_cpu_pending(int cpu, u64 tb)
show_regs(regs);
else
dump_stack();
-
- cpumask_clear_cpu(cpu, &wd_smp_cpus_stuck);
- wd_smp_unlock(&flags);
}
return;
}
@@ -267,6 +271,7 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
return 0;
}
set_cpu_stuck(cpu, tb);
+ wd_smp_unlock(&flags);
pr_emerg("CPU %d self-detected hard LOCKUP @ %pS\n",
cpu, (void *)regs->nip);
@@ -277,8 +282,6 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
print_irqtrace_events(current);
show_regs(regs);
- wd_smp_unlock(&flags);
-
if (sysctl_hardlockup_all_cpu_backtrace)
trigger_allbutself_cpu_backtrace();
--
2.33.1
^ permalink raw reply related
* Re: [PATCH] mm/migrate.c: Remove MIGRATE_PFN_LOCKED
From: Felix Kuehling @ 2021-10-26 16:09 UTC (permalink / raw)
To: Alistair Popple, akpm
Cc: rcampbell, amd-gfx, nouveau, linux-kernel, dri-devel, linux-mm,
jglisse, kvm-ppc, ziy, jhubbard, alexander.deucher, linuxppc-dev,
hch, bskeggs
In-Reply-To: <20211025041608.289017-1-apopple@nvidia.com>
Am 2021-10-25 um 12:16 a.m. schrieb Alistair Popple:
> MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a
> source page was already locked during migrate_vma_collect(). If it
> wasn't then the a second attempt is made to lock the page. However if
> the first attempt failed it's unlikely a second attempt will succeed,
> and the retry adds complexity. So clean this up by removing the retry
> and MIGRATE_PFN_LOCKED flag.
>
> Destination pages are also meant to have the MIGRATE_PFN_LOCKED flag
> set, but nothing actually checks that.
>
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
It makes sense to me. Do you have any empirical data on how much more
likely migrations are going to fail with this change due to contested
page locks?
Either way, the patch is
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
> Documentation/vm/hmm.rst | 2 +-
> arch/powerpc/kvm/book3s_hv_uvmem.c | 4 +-
> drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 -
> drivers/gpu/drm/nouveau/nouveau_dmem.c | 4 +-
> include/linux/migrate.h | 1 -
> lib/test_hmm.c | 5 +-
> mm/migrate.c | 145 +++++------------------
> 7 files changed, 35 insertions(+), 128 deletions(-)
>
> diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
> index a14c2938e7af..f2a59ed82ed3 100644
> --- a/Documentation/vm/hmm.rst
> +++ b/Documentation/vm/hmm.rst
> @@ -360,7 +360,7 @@ between device driver specific code and shared common code:
> system memory page, locks the page with ``lock_page()``, and fills in the
> ``dst`` array entry with::
>
> - dst[i] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> + dst[i] = migrate_pfn(page_to_pfn(dpage));
>
> Now that the driver knows that this page is being migrated, it can
> invalidate device private MMU mappings and copy device private memory
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index a7061ee3b157..28c436df9935 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -560,7 +560,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
> gpa, 0, page_shift);
>
> if (ret == U_SUCCESS)
> - *mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED;
> + *mig.dst = migrate_pfn(pfn);
> else {
> unlock_page(dpage);
> __free_page(dpage);
> @@ -774,7 +774,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
> }
> }
>
> - *mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> + *mig.dst = migrate_pfn(page_to_pfn(dpage));
> migrate_vma_pages(&mig);
> out_finalize:
> migrate_vma_finalize(&mig);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> index 4a16e3c257b9..41d9417f182b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> @@ -300,7 +300,6 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
> migrate->dst[i] = svm_migrate_addr_to_pfn(adev, dst[i]);
> svm_migrate_get_vram_page(prange, migrate->dst[i]);
> migrate->dst[i] = migrate_pfn(migrate->dst[i]);
> - migrate->dst[i] |= MIGRATE_PFN_LOCKED;
> src[i] = dma_map_page(dev, spage, 0, PAGE_SIZE,
> DMA_TO_DEVICE);
> r = dma_mapping_error(dev, src[i]);
> @@ -580,7 +579,6 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
> dst[i] >> PAGE_SHIFT, page_to_pfn(dpage));
>
> migrate->dst[i] = migrate_pfn(page_to_pfn(dpage));
> - migrate->dst[i] |= MIGRATE_PFN_LOCKED;
> j++;
> }
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> index 92987daa5e17..3828aafd3ac4 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> @@ -166,7 +166,7 @@ static vm_fault_t nouveau_dmem_fault_copy_one(struct nouveau_drm *drm,
> goto error_dma_unmap;
> mutex_unlock(&svmm->mutex);
>
> - args->dst[0] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> + args->dst[0] = migrate_pfn(page_to_pfn(dpage));
> return 0;
>
> error_dma_unmap:
> @@ -602,7 +602,7 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,
> ((paddr >> PAGE_SHIFT) << NVIF_VMM_PFNMAP_V0_ADDR_SHIFT);
> if (src & MIGRATE_PFN_WRITE)
> *pfn |= NVIF_VMM_PFNMAP_V0_W;
> - return migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> + return migrate_pfn(page_to_pfn(dpage));
>
> out_dma_unmap:
> dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index c8077e936691..479b861ae490 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -119,7 +119,6 @@ static inline int migrate_misplaced_page(struct page *page,
> */
> #define MIGRATE_PFN_VALID (1UL << 0)
> #define MIGRATE_PFN_MIGRATE (1UL << 1)
> -#define MIGRATE_PFN_LOCKED (1UL << 2)
> #define MIGRATE_PFN_WRITE (1UL << 3)
> #define MIGRATE_PFN_SHIFT 6
>
> diff --git a/lib/test_hmm.c b/lib/test_hmm.c
> index c259842f6d44..e2ce8f9b7605 100644
> --- a/lib/test_hmm.c
> +++ b/lib/test_hmm.c
> @@ -613,8 +613,7 @@ static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args,
> */
> rpage->zone_device_data = dmirror;
>
> - *dst = migrate_pfn(page_to_pfn(dpage)) |
> - MIGRATE_PFN_LOCKED;
> + *dst = migrate_pfn(page_to_pfn(dpage));
> if ((*src & MIGRATE_PFN_WRITE) ||
> (!spage && args->vma->vm_flags & VM_WRITE))
> *dst |= MIGRATE_PFN_WRITE;
> @@ -1137,7 +1136,7 @@ static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma *args,
> lock_page(dpage);
> xa_erase(&dmirror->pt, addr >> PAGE_SHIFT);
> copy_highpage(dpage, spage);
> - *dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> + *dst = migrate_pfn(page_to_pfn(dpage));
> if (*src & MIGRATE_PFN_WRITE)
> *dst |= MIGRATE_PFN_WRITE;
> }
> diff --git a/mm/migrate.c b/mm/migrate.c
> index a6a7743ee98f..915e969811d0 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2369,7 +2369,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> * can't be dropped from it).
> */
> get_page(page);
> - migrate->cpages++;
>
> /*
> * Optimize for the common case where page is only mapped once
> @@ -2379,7 +2378,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> if (trylock_page(page)) {
> pte_t swp_pte;
>
> - mpfn |= MIGRATE_PFN_LOCKED;
> + migrate->cpages++;
> ptep_get_and_clear(mm, addr, ptep);
>
> /* Setup special migration page table entry */
> @@ -2413,6 +2412,9 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>
> if (pte_present(pte))
> unmapped++;
> + } else {
> + put_page(page);
> + mpfn = 0;
> }
>
> next:
> @@ -2517,15 +2519,17 @@ static bool migrate_vma_check_page(struct page *page)
> }
>
> /*
> - * migrate_vma_prepare() - lock pages and isolate them from the lru
> + * migrate_vma_unmap() - replace page mapping with special migration pte entry
> * @migrate: migrate struct containing all migration information
> *
> - * This locks pages that have been collected by migrate_vma_collect(). Once each
> - * page is locked it is isolated from the lru (for non-device pages). Finally,
> - * the ref taken by migrate_vma_collect() is dropped, as locked pages cannot be
> - * migrated by concurrent kernel threads.
> + * Isolate pages from the LRU and replace mappings (CPU page table pte) with a
> + * special migration pte entry and check if it has been pinned. Pinned pages are
> + * restored because we cannot migrate them.
> + *
> + * This is the last step before we call the device driver callback to allocate
> + * destination memory and copy contents of original page over to new page.
> */
> -static void migrate_vma_prepare(struct migrate_vma *migrate)
> +static void migrate_vma_unmap(struct migrate_vma *migrate)
> {
> const unsigned long npages = migrate->npages;
> const unsigned long start = migrate->start;
> @@ -2534,32 +2538,12 @@ static void migrate_vma_prepare(struct migrate_vma *migrate)
>
> lru_add_drain();
>
> - for (i = 0; (i < npages) && migrate->cpages; i++) {
> + for (i = 0; i < npages; i++) {
> struct page *page = migrate_pfn_to_page(migrate->src[i]);
> - bool remap = true;
>
> if (!page)
> continue;
>
> - if (!(migrate->src[i] & MIGRATE_PFN_LOCKED)) {
> - /*
> - * Because we are migrating several pages there can be
> - * a deadlock between 2 concurrent migration where each
> - * are waiting on each other page lock.
> - *
> - * Make migrate_vma() a best effort thing and backoff
> - * for any page we can not lock right away.
> - */
> - if (!trylock_page(page)) {
> - migrate->src[i] = 0;
> - migrate->cpages--;
> - put_page(page);
> - continue;
> - }
> - remap = false;
> - migrate->src[i] |= MIGRATE_PFN_LOCKED;
> - }
> -
> /* ZONE_DEVICE pages are not on LRU */
> if (!is_zone_device_page(page)) {
> if (!PageLRU(page) && allow_drain) {
> @@ -2569,16 +2553,9 @@ static void migrate_vma_prepare(struct migrate_vma *migrate)
> }
>
> if (isolate_lru_page(page)) {
> - if (remap) {
> - migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
> - migrate->cpages--;
> - restore++;
> - } else {
> - migrate->src[i] = 0;
> - unlock_page(page);
> - migrate->cpages--;
> - put_page(page);
> - }
> + migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
> + migrate->cpages--;
> + restore++;
> continue;
> }
>
> @@ -2586,80 +2563,20 @@ static void migrate_vma_prepare(struct migrate_vma *migrate)
> put_page(page);
> }
>
> - if (!migrate_vma_check_page(page)) {
> - if (remap) {
> - migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
> - migrate->cpages--;
> - restore++;
> -
> - if (!is_zone_device_page(page)) {
> - get_page(page);
> - putback_lru_page(page);
> - }
> - } else {
> - migrate->src[i] = 0;
> - unlock_page(page);
> - migrate->cpages--;
> + if (page_mapped(page))
> + try_to_migrate(page, 0);
>
> - if (!is_zone_device_page(page))
> - putback_lru_page(page);
> - else
> - put_page(page);
> + if (page_mapped(page) || !migrate_vma_check_page(page)) {
> + if (!is_zone_device_page(page)) {
> + get_page(page);
> + putback_lru_page(page);
> }
> - }
> - }
> -
> - for (i = 0, addr = start; i < npages && restore; i++, addr += PAGE_SIZE) {
> - struct page *page = migrate_pfn_to_page(migrate->src[i]);
> -
> - if (!page || (migrate->src[i] & MIGRATE_PFN_MIGRATE))
> - continue;
>
> - remove_migration_pte(page, migrate->vma, addr, page);
> -
> - migrate->src[i] = 0;
> - unlock_page(page);
> - put_page(page);
> - restore--;
> - }
> -}
> -
> -/*
> - * migrate_vma_unmap() - replace page mapping with special migration pte entry
> - * @migrate: migrate struct containing all migration information
> - *
> - * Replace page mapping (CPU page table pte) with a special migration pte entry
> - * and check again if it has been pinned. Pinned pages are restored because we
> - * cannot migrate them.
> - *
> - * This is the last step before we call the device driver callback to allocate
> - * destination memory and copy contents of original page over to new page.
> - */
> -static void migrate_vma_unmap(struct migrate_vma *migrate)
> -{
> - const unsigned long npages = migrate->npages;
> - const unsigned long start = migrate->start;
> - unsigned long addr, i, restore = 0;
> -
> - for (i = 0; i < npages; i++) {
> - struct page *page = migrate_pfn_to_page(migrate->src[i]);
> -
> - if (!page || !(migrate->src[i] & MIGRATE_PFN_MIGRATE))
> + migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
> + migrate->cpages--;
> + restore++;
> continue;
> -
> - if (page_mapped(page)) {
> - try_to_migrate(page, 0);
> - if (page_mapped(page))
> - goto restore;
> }
> -
> - if (migrate_vma_check_page(page))
> - continue;
> -
> -restore:
> - migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
> - migrate->cpages--;
> - restore++;
> }
>
> for (addr = start, i = 0; i < npages && restore; addr += PAGE_SIZE, i++) {
> @@ -2672,12 +2589,8 @@ static void migrate_vma_unmap(struct migrate_vma *migrate)
>
> migrate->src[i] = 0;
> unlock_page(page);
> + put_page(page);
> restore--;
> -
> - if (is_zone_device_page(page))
> - put_page(page);
> - else
> - putback_lru_page(page);
> }
> }
>
> @@ -2700,8 +2613,8 @@ static void migrate_vma_unmap(struct migrate_vma *migrate)
> * it for all those entries (ie with MIGRATE_PFN_VALID and MIGRATE_PFN_MIGRATE
> * flag set). Once these are allocated and copied, the caller must update each
> * corresponding entry in the dst array with the pfn value of the destination
> - * page and with the MIGRATE_PFN_VALID and MIGRATE_PFN_LOCKED flags set
> - * (destination pages must have their struct pages locked, via lock_page()).
> + * page and with MIGRATE_PFN_VALID. Destination pages must be locked via
> + * lock_page().
> *
> * Note that the caller does not have to migrate all the pages that are marked
> * with MIGRATE_PFN_MIGRATE flag in src array unless this is a migration from
> @@ -2770,8 +2683,6 @@ int migrate_vma_setup(struct migrate_vma *args)
>
> migrate_vma_collect(args);
>
> - if (args->cpages)
> - migrate_vma_prepare(args);
> if (args->cpages)
> migrate_vma_unmap(args);
>
^ permalink raw reply
* Re: [PATCH v3] powerpc/boot: Set LC_ALL=C in wrapper script
From: Segher Boessenkool @ 2021-10-26 15:59 UTC (permalink / raw)
To: Christophe Leroy; +Cc: Paul Mackerras, linuxppc-dev, linux-kernel
In-Reply-To: <a9ff3bc98035f63b122c051f02dc47c7aed10430.1635256089.git.christophe.leroy@csgroup.eu>
On Tue, Oct 26, 2021 at 03:48:29PM +0200, Christophe Leroy wrote:
> While trying to build a simple Image for ACADIA platform, I got the
> following error:
>
> WRAP arch/powerpc/boot/simpleImage.acadia
> INFO: Uncompressed kernel (size 0x6ae7d0) overlaps the address of the wrapper(0x400000)
> INFO: Fixing the link_address of wrapper to (0x700000)
> powerpc64-linux-gnu-ld : mode d'émulation non reconnu : -T
> Émulations prises en charge : elf64ppc elf32ppc elf32ppclinux elf32ppcsim elf64lppc elf32lppc elf32lppclinux elf32lppcsim
> make[1]: *** [arch/powerpc/boot/Makefile:424 : arch/powerpc/boot/simpleImage.acadia] Erreur 1
> make: *** [arch/powerpc/Makefile:285 : simpleImage.acadia] Erreur 2
>
> Trying again with V=1 shows the following command
>
> powerpc64-linux-gnu-ld -m -T arch/powerpc/boot/zImage.lds -Ttext 0x700000 --no-dynamic-linker -o arch/powerpc/boot/simpleImage.acadia -Map wrapper.map arch/powerpc/boot/fixed-head.o arch/powerpc/boot/simpleboot.o ./zImage.3278022.o arch/powerpc/boot/wrapper.a
>
> The argument of '-m' is missing.
>
> This is due to the wrapper script calling 'objdump -p vmlinux' and
> looking for 'file format', whereas the output of objdump is:
>
> vmlinux: format de fichier elf32-powerpc
>
> En-tête de programme:
> LOAD off 0x00010000 vaddr 0xc0000000 paddr 0x00000000 align 2**16
> filesz 0x0069e1d4 memsz 0x006c128c flags rwx
> NOTE off 0x0064591c vaddr 0xc063591c paddr 0x0063591c align 2**2
> filesz 0x00000054 memsz 0x00000054 flags ---
>
> Add LC_ALL=C at the beginning of the wrapper script in order to get the
> output expected by the script:
>
> vmlinux: file format elf32-powerpc
>
> Program Header:
> LOAD off 0x00010000 vaddr 0xc0000000 paddr 0x00000000 align 2**16
> filesz 0x0069e1d4 memsz 0x006c128c flags rwx
> NOTE off 0x0064591c vaddr 0xc063591c paddr 0x0063591c align 2**2
> filesz 0x00000054 memsz 0x00000054 flags ---
>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Thanks,
Segher
^ permalink raw reply
* Re: [PATCH v2] macintosh/via-pmu-led: make disk activity usage a parameter.
From: Hill Ma @ 2021-10-26 14:50 UTC (permalink / raw)
To: Nathan Lynch; +Cc: linuxppc-dev, linux-kernel, linux-doc
In-Reply-To: <87fssox7ah.fsf@linux.ibm.com>
Thanks for the review.
On Tue, Oct 26, 2021 at 6:08 AM Nathan Lynch <nathanl@linux.ibm.com> wrote:
>
> Hello,
>
> Hill Ma <maahiuzeon@gmail.com> writes:
> > Whether to use the LED as a disk activity is a user preference.
> > Some like this usage while others find the LED too bright. So it
> > might be a good idea to make this choice a runtime parameter rather
> > than compile-time config.
>
> Users already have the ability to change the LED behavior at runtime
> already, correct? I.e. they can do:
>
> echo none > /sys/class/leds/pmu-led::front/trigger
>
> in their boot scripts. Granted, a kernel built with ADB_PMU_LED_DISK=y
> will blink the LED on disk activity until user space is running. Is this
> unsatisfactory?
Yes, indeed. As someone who does not like this behavior on iBooks.
> > The default is set to disabled as OS X does not use the LED as a
> > disk activity indicator.
>
> This is long-standing behavior in Linux and OS X has been EOL on this
> architecture for a decade, so this isn't much of a consideration at this
> point. Seems more important to avoid surprising existing users and
> distributions with a behavior change that makes additional work for
> them. See below.
>
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index 43dc35fe5bc0..a656a51ba0a8 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -250,6 +250,12 @@
> > Use timer override. For some broken Nvidia NF5 boards
> > that require a timer override, but don't have HPET
> >
> > + adb_pmu_led_disk [PPC]
> > + Use front LED as disk LED by default. Only applies to
> > + PowerBook, iBook, PowerMac 7,2/7,3.
> > + Format: <bool> (1/Y/y=enable, 0/N/n=disable)
> > + Default: disabled
> > +
> > add_efi_memmap [EFI; X86] Include EFI memory map in
> > kernel's map of available physical RAM.
> >
> > diff --git a/drivers/macintosh/Kconfig b/drivers/macintosh/Kconfig
> > index 5cdc361da37c..243215de563c 100644
> > --- a/drivers/macintosh/Kconfig
> > +++ b/drivers/macintosh/Kconfig
> > @@ -78,16 +78,6 @@ config ADB_PMU_LED
> > behaviour of the old CONFIG_BLK_DEV_IDE_PMAC_BLINK, select this
> > and the disk LED trigger and configure appropriately through sysfs.
> >
> > -config ADB_PMU_LED_DISK
> > - bool "Use front LED as DISK LED by default"
> > - depends on ADB_PMU_LED
> > - depends on LEDS_CLASS
> > - select LEDS_TRIGGERS
> > - select LEDS_TRIGGER_DISK
> > - help
> > - This option makes the front LED default to the disk trigger
> > - so that it blinks on disk activity.
> > -
>
> So, if I've been relying on CONFIG_ADB_PMU_LED_DISK=y and I upgrade to a
> newer kernel with the proposed change, from my point of view the disk
> activity LED has stopped working and I need to alter the bootloader
> config or init scripts to restore the expected behavior. That seems
> undesirable to me.
>
> I don't think we rigidly enforce Kconfig backward compatibility, but
> when it comes to a user-visible function on a legacy platform where
> users and distros likely have their configurations figured out already,
> it's probably best to avoid such changes.
I actually asked some distributions that still ship PowerPC BE
architectures to unset it.
https://github.com/void-ppc/void-packages/pull/48
https://github.com/void-linux/void-packages/pull/33275
https://git.adelielinux.org/adelie/packages/-/merge_requests/607
And Debian, which still has PowerPC BE architectures as ports, does
not turn it on.
https://salsa.debian.org/kernel-team/linux/-/blob/master/debian/config/kernelarch-powerpc/config
The problem I see is the following:
- A distribution might accidentally turn it back on, which happened
with Void already.
- For people like the disk activity behavior, they need to recompile
the kernel to regain the exact previous behavior.
I think we could retain backward compatibility by adding back the
Kconfig and setting the initial value of adb_pmu_led_disk based on the
config. I am not sure if we need two mechanisms for this single
preference though.
^ permalink raw reply
* Re: linux-next: manual merge of the audit tree with the powerpc tree
From: Paul Moore @ 2021-10-26 14:27 UTC (permalink / raw)
To: Michael Ellerman
Cc: Stephen Rothwell, Richard Guy Briggs, Linux Kernel Mailing List,
Linux Next Mailing List, PowerPC
In-Reply-To: <87k0i0awdl.fsf@mpe.ellerman.id.au>
On Tue, Oct 26, 2021 at 6:55 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Stephen Rothwell <sfr@canb.auug.org.au> writes:
> > Hi all,
> >
> > Today's linux-next merge of the audit tree got conflicts in:
> >
> > arch/powerpc/kernel/audit.c
> > arch/powerpc/kernel/compat_audit.c
> >
> > between commit:
> >
> > 566af8cda399 ("powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC")
> >
> > from the powerpc tree and commits:
> >
> > 42f355ef59a2 ("audit: replace magic audit syscall class numbers with macros")
> > 1c30e3af8a79 ("audit: add support for the openat2 syscall")
> >
> > from the audit tree.
>
> Thanks.
>
> I guess this is OK, unless the audit folks disagree. I could revert the
> powerpc commit and try it again later.
>
> If I don't hear anything I'll leave it as-is.
Hi Michael,
Last I recall from the powerpc/audit thread there were still some
issues with audit working properly in your testing, has that been
resolved? If nothing else, -rc7 seems a bit late for this to hit
-next for me to feel comfortable about this.
--
paul moore
www.paul-moore.com
^ permalink raw reply
* [PATCH v3] powerpc/boot: Set LC_ALL=C in wrapper script
From: Christophe Leroy @ 2021-10-26 13:48 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
While trying to build a simple Image for ACADIA platform, I got the
following error:
WRAP arch/powerpc/boot/simpleImage.acadia
INFO: Uncompressed kernel (size 0x6ae7d0) overlaps the address of the wrapper(0x400000)
INFO: Fixing the link_address of wrapper to (0x700000)
powerpc64-linux-gnu-ld : mode d'émulation non reconnu : -T
Émulations prises en charge : elf64ppc elf32ppc elf32ppclinux elf32ppcsim elf64lppc elf32lppc elf32lppclinux elf32lppcsim
make[1]: *** [arch/powerpc/boot/Makefile:424 : arch/powerpc/boot/simpleImage.acadia] Erreur 1
make: *** [arch/powerpc/Makefile:285 : simpleImage.acadia] Erreur 2
Trying again with V=1 shows the following command
powerpc64-linux-gnu-ld -m -T arch/powerpc/boot/zImage.lds -Ttext 0x700000 --no-dynamic-linker -o arch/powerpc/boot/simpleImage.acadia -Map wrapper.map arch/powerpc/boot/fixed-head.o arch/powerpc/boot/simpleboot.o ./zImage.3278022.o arch/powerpc/boot/wrapper.a
The argument of '-m' is missing.
This is due to the wrapper script calling 'objdump -p vmlinux' and
looking for 'file format', whereas the output of objdump is:
vmlinux: format de fichier elf32-powerpc
En-tête de programme:
LOAD off 0x00010000 vaddr 0xc0000000 paddr 0x00000000 align 2**16
filesz 0x0069e1d4 memsz 0x006c128c flags rwx
NOTE off 0x0064591c vaddr 0xc063591c paddr 0x0063591c align 2**2
filesz 0x00000054 memsz 0x00000054 flags ---
Add LC_ALL=C at the beginning of the wrapper script in order to get the
output expected by the script:
vmlinux: file format elf32-powerpc
Program Header:
LOAD off 0x00010000 vaddr 0xc0000000 paddr 0x00000000 align 2**16
filesz 0x0069e1d4 memsz 0x006c128c flags rwx
NOTE off 0x0064591c vaddr 0xc063591c paddr 0x0063591c align 2**2
filesz 0x00000054 memsz 0x00000054 flags ---
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v3: Also change patch's subject:
v2: Use LC_ALL=C per Segher
---
arch/powerpc/boot/wrapper | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
index 1cd82564c996..9184eda780fd 100755
--- a/arch/powerpc/boot/wrapper
+++ b/arch/powerpc/boot/wrapper
@@ -26,6 +26,8 @@
# Stop execution if any command fails
set -e
+export LC_ALL=C
+
# Allow for verbose output
if [ "$V" = 1 ]; then
set -x
--
2.31.1
^ permalink raw reply related
* [PATCH v2] powerpc/boot: Set LANG=C in wrapper script
From: Christophe Leroy @ 2021-10-26 13:45 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linuxppc-dev, linux-kernel
While trying to build a simple Image for ACADIA platform, I got the
following error:
WRAP arch/powerpc/boot/simpleImage.acadia
INFO: Uncompressed kernel (size 0x6ae7d0) overlaps the address of the wrapper(0x400000)
INFO: Fixing the link_address of wrapper to (0x700000)
powerpc64-linux-gnu-ld : mode d'émulation non reconnu : -T
Émulations prises en charge : elf64ppc elf32ppc elf32ppclinux elf32ppcsim elf64lppc elf32lppc elf32lppclinux elf32lppcsim
make[1]: *** [arch/powerpc/boot/Makefile:424 : arch/powerpc/boot/simpleImage.acadia] Erreur 1
make: *** [arch/powerpc/Makefile:285 : simpleImage.acadia] Erreur 2
Trying again with V=1 shows the following command
powerpc64-linux-gnu-ld -m -T arch/powerpc/boot/zImage.lds -Ttext 0x700000 --no-dynamic-linker -o arch/powerpc/boot/simpleImage.acadia -Map wrapper.map arch/powerpc/boot/fixed-head.o arch/powerpc/boot/simpleboot.o ./zImage.3278022.o arch/powerpc/boot/wrapper.a
The argument of '-m' is missing.
This is due to the wrapper script calling 'objdump -p vmlinux' and
looking for 'file format', whereas the output of objdump is:
vmlinux: format de fichier elf32-powerpc
En-tête de programme:
LOAD off 0x00010000 vaddr 0xc0000000 paddr 0x00000000 align 2**16
filesz 0x0069e1d4 memsz 0x006c128c flags rwx
NOTE off 0x0064591c vaddr 0xc063591c paddr 0x0063591c align 2**2
filesz 0x00000054 memsz 0x00000054 flags ---
Add LC_ALL=C at the beginning of the wrapper script in order to get the
output expected by the script:
vmlinux: file format elf32-powerpc
Program Header:
LOAD off 0x00010000 vaddr 0xc0000000 paddr 0x00000000 align 2**16
filesz 0x0069e1d4 memsz 0x006c128c flags rwx
NOTE off 0x0064591c vaddr 0xc063591c paddr 0x0063591c align 2**2
filesz 0x00000054 memsz 0x00000054 flags ---
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v2: Use LC_ALL=C per Segher
---
arch/powerpc/boot/wrapper | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
index 1cd82564c996..9184eda780fd 100755
--- a/arch/powerpc/boot/wrapper
+++ b/arch/powerpc/boot/wrapper
@@ -26,6 +26,8 @@
# Stop execution if any command fails
set -e
+export LC_ALL=C
+
# Allow for verbose output
if [ "$V" = 1 ]; then
set -x
--
2.31.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox