Re: [RFC][PATCH] printk: Fixup the nmi printk mess

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Petr Mladek <pmladek@suse.cz>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	linux-kernel@vger.kernel.org, jkosina@suse.cz,
	paulmck@linux.vnet.ibm.com, Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [RFC][PATCH] printk: Fixup the nmi printk mess
Date: Wed, 10 Jun 2015 16:57:37 +0200	[thread overview]
Message-ID: <20150610145737.GE9409@pathway.suse.cz> (raw)
In-Reply-To: <20150610143155.GD9409@pathway.suse.cz>

On Wed 2015-06-10 16:31:55, Petr Mladek wrote:
> On Wed 2015-06-10 14:55:09, Peter Zijlstra wrote:
> > Hi,
> > 
> > I just stumbled upon:
> > 
> > 	a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all CPUs")
> > 
> > Which is not dissimilar to what I've proposed in the past. Except its
> > squirreled away in some far and dark corner of one arch.
> > 
> > Lets fix that.
> 
> It is very interesting approach. But I see two possible races, see below.
> 
> > diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
> > index dfd59d6bc6f0..a477e0766d2e 100644
> > --- a/include/linux/hardirq.h
> > +++ b/include/linux/hardirq.h
> > @@ -67,10 +67,12 @@ extern void irq_exit(void);
> >  		preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
> >  		rcu_nmi_enter();				\
> >  		trace_hardirq_enter();				\
> > +		printk_nmi_enter();				\
> >  	} while (0)
> >  
> >  #define nmi_exit()						\
> >  	do {							\
> > +		printk_nmi_exit();				\
> >  		trace_hardirq_exit();				\
> >  		rcu_nmi_exit();					\
> >  		BUG_ON(!in_nmi());				\
> > diff --git a/include/linux/percpu.h b/include/linux/percpu.h
> > index caebf2a758dc..04c68b9f56f8 100644
> > --- a/include/linux/percpu.h
> > +++ b/include/linux/percpu.h
> > @@ -135,7 +135,4 @@ extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
> >  	(typeof(type) __percpu *)__alloc_percpu(sizeof(type),		\
> >  						__alignof__(type))
> >  
> > -/* To avoid include hell, as printk can not declare this, we declare it here */
> > -DECLARE_PER_CPU(printk_func_t, printk_func);
> > -
> >  #endif /* __LINUX_PERCPU_H */
> > diff --git a/init/main.c b/init/main.c
> > index 2115055faeac..4a28accaaa98 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -586,6 +586,7 @@ asmlinkage __visible void __init start_kernel(void)
> >  	timekeeping_init();
> >  	time_init();
> >  	sched_clock_postinit();
> > +	printk_init();
> >  	perf_event_init();
> >  	profile_init();
> >  	call_function_init();
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index c099b082cd02..0149c41be95b 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -1821,13 +1821,125 @@ int vprintk_default(const char *fmt, va_list args)
> >  }
> >  EXPORT_SYMBOL_GPL(vprintk_default);
> >  
> > +#ifdef CONFIG_PRINTK_NMI
> > +
> > +typedef int(*printk_func_t)(const char *fmt, va_list args);
> >  /*
> >   * This allows printk to be diverted to another function per cpu.
> >   * This is useful for calling printk functions from within NMI
> >   * without worrying about race conditions that can lock up the
> >   * box.
> >   */
> > -DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
> > +static DEFINE_PER_CPU(printk_func_t, printk_func) = vprintk_default;
> > +
> > +#include <linux/seq_buf.h>
> > +
> > +struct nmi_seq_buf {
> > +	struct seq_buf		seq;
> > +	struct irq_work		work;
> > +	unsigned char		buffer[PAGE_SIZE -
> > +					sizeof(struct seq_buf) -
> > +					sizeof(struct irq_work)];
> > +};
> > +
> > +/* Safe printing in NMI context */
> > +static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
> > +
> > +static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
> > +{
> > +	const char *buf = s->buffer + start;
> > +
> > +	printk("%.*s", (end - start) + 1, buf);
> > +}
> > +
> > +static void __printk_nmi_flush(struct irq_work *work)
> > +{
> > +	struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
> > +	int len, last_i = 0, i = 0;
> > +
> > +again:
> > +	len = seq_buf_used(&s->seq);
> > +	if (!len)
> > +		return;
> > +
> > +	/* Print line by line. */
> > +	for (; i < len; i++) {
> > +		if (s->buffer[i] == '\n') {
> > +			print_seq_line(s, last_i, i);
> > +			last_i = i + 1;
> > +		}
> > +	}
> > +	/* Check if there was a partial line. */
> > +	if (last_i < len) {
> > +		print_seq_line(s, last_i, len - 1);
> > +		pr_cont("\n");
> > +	}
> > +
> > +	/*
> > +	 * Another NMI could have come in while we were printing
> > +	 * check if nothing has been added to the buffer.
> > +	 */
> > +	if (cmpxchg_local(&s->seq.len, len, 0) != len)
> > +		goto again;
> 
> If another NMI comes at this point, it will start filling the buffer
> from the beginning. If it is fast enough, it might override the text
> that we print above.
> 
> Note that it is easy to be faster because the normal printk() is more
> complex and might wait for console.

Also note that show_regs() calls many separate printk()s, the irqwork
is scheduled by the first one => it is quite likely that some
backtrace will get messed.


Anothrer problem is that __printk_nmi_flush() is per-CPU => more
instances might be running in parallel. I haven't tested this but
I quess that it will mix backtraces from different CPUs in
the main ring buffer.

Note that the backtraces used to be serialized via
static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED
in the past. See the commit
a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all CPUs")


A solution would be to block __printk_nmi_flush() until
printk_nmi_exit() is called via some bit and cmpxchg.

Also we could add one more bit and vprintk_nmi() could drop messages
until the buffer is printed to avoid any mess.

Huh, it is getting more and more complicated.


Best Regards,
Petr

> > +}
> > +
> > +void printk_init(void)
> > +{
> > +	int cpu;
> > +
> > +	for_each_possible_cpu(cpu) {
> > +		struct nmi_seq_buf *s = &per_cpu(nmi_print_seq, cpu);
> > +
> > +		init_irq_work(&s->work, __printk_nmi_flush);
> > +		seq_buf_init(&s->seq, s->buffer, sizeof(s->buffer));
> > +	}
> > +}
> > +
> > +/*
> > + * It is not safe to call printk() directly from NMI handlers.
> > + * It may be fine if the NMI detected a lock up and we have no choice
> > + * but to do so, but doing a NMI on all other CPUs to get a back trace
> > + * can be done with a sysrq-l. We don't want that to lock up, which
> > + * can happen if the NMI interrupts a printk in progress.
> > + *
> > + * Instead, we redirect the vprintk() to this nmi_vprintk() that writes
> > + * the content into a per cpu seq_buf buffer. Then when the NMIs are
> > + * all done, we can safely dump the contents of the seq_buf to a printk()
> > + * from a non NMI context.
> > + */
> > +static int vprintk_nmi(const char *fmt, va_list args)
> > +{
> > +	struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
> > +	unsigned int len = seq_buf_used(&s->seq);
> > +
> > +	irq_work_queue(&s->work);
> > +	seq_buf_vprintf(&s->seq, fmt, args);
> 
> This is more critical. seq_buf_vprintf() has the following code:
> 
> 	len = vsnprintf(s->buffer + s->len, s->size - s->len, fmt, args);
> 
> The race might look like:
> 
> CPU0					CPU1
> 
> 					__printk_nmi_flush()
> 					...
> 
> seq_buf_vprintf()
> 
>    s->buffer + s->len
> 
> 					cmpxchg_local(&s->seq.len,
> 					len, 0)
> 
>    s->size - s->len
> 
> => that seq_buf_vprint() will start writing to the end of the seq_buf
>   but it will think that the entire writer is available and it might
>   cause buffer overflow
> 
> 
> Well, it is rather theoretical. s->len will be most likely read only
> once (compiler optimization). Anyway, we must make make sure that
> all the used seq_buf operations are save against such races.
> It is not obvious and might be prone to regressions.
> 
> To be honest, I am not familiar with cmpxchg_local(). But I think that
> it can't do much difference. The value has to be synced to the other
> CPUs one day.
> 
> 
> Note that Steven's solution is safe because the prints are serialized via
> @backtrace_flag in arch_trigger_all_cpu_backtrace().
> 
> 
> Best Regards,
> Petr
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

next prev parent reply	other threads:[~2015-06-10 14:57 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-10 12:55 [RFC][PATCH] printk: Fixup the nmi printk mess Peter Zijlstra
2015-06-10 14:31 ` Petr Mladek
2015-06-10 14:57   ` Petr Mladek [this message]
2015-06-10 15:32     ` Peter Zijlstra
2015-06-10 15:54       ` Steven Rostedt
2015-06-10 15:29   ` Peter Zijlstra
2015-06-10 19:23     ` Peter Zijlstra
2015-06-10 19:36       ` Peter Zijlstra
2015-06-10 19:36       ` Steven Rostedt
2015-06-11 14:55       ` Petr Mladek
2015-06-12 12:00         ` Peter Zijlstra
2015-06-12 14:28           ` Petr Mladek
2015-06-12 15:35             ` Petr Mladek
2015-06-25 18:14               ` Peter Zijlstra
2015-06-26  8:06                 ` Petr Mladek
2015-06-11  9:57     ` Petr Mladek
2015-06-10 15:53 ` Steven Rostedt
2015-06-10 16:02 ` Steven Rostedt
2015-06-10 16:19   ` Joe Perches
2015-06-10 19:25     ` Peter Zijlstra
2015-06-10 19:35       ` Steven Rostedt
2015-06-10 19:39       ` Joe Perches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150610145737.GE9409@pathway.suse.cz \
    --to=pmladek@suse.cz \
    --cc=jkosina@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.