All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
To: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Konstantin Khlebnikov <koct9i@gmail.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Peter Zijlstra <peterz@infradead.org>,
	x86@kernel.org, John Ogness <john.ogness@linutronix.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Petr Tesarik <ptesarik@suse.cz>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] printk/panic: Access the main printk log in panic() only when safe
Date: Wed, 31 Jul 2019 15:08:39 +0900	[thread overview]
Message-ID: <20190731060839.GA1756@jagdpanzerIV> (raw)
In-Reply-To: <20190724122711.qquevkcuge24bhdd@pathway.suse.cz>

Sorry for the delayed response.

On (07/24/19 14:27), Petr Mladek wrote:
[..]
> > And this is where the idea of "disconnecting" those CPUs from main
> > logbuf come from.
> > 
> > So what we can do:
> > - smp_send_stop()
> > - disconnect all-but-self from logbuf (via printk-safe)
> 
> printk_safe is not really necessary. As you wrote, nobody
> is interested into messages from CPUs that are supposed
> to be stopped.

OK.
Doing it in printk.c makes it easier to handle console_owner/waiter,
which are not exported.

> It might be enough to set some global variable, for example,
> with the CPU number that is calling panic() and is the only
> one allowed to print messages from this point on.

Sounds good.

> It might even be used to force console lock owner to leave
> the cycle immediately.

Yes, makes sense.

[..]
> > So, shall we try one more time with the "disconnect" misbehaving CPUs
> > approach? I can send an RFC patch.
> 
> IMHO, it will be acceptable only when it is reasonably simple and
> straightforward. The panic() code is full of special hacks and
> it is already complicated to follow all the twists.

When you have a chance, mind to take a look at the patch below?
Doesn't look very difficult (half of it are white-spaces and
comments, I believe).

> Especially because this scenario came up from a theoretical
> discussion. Note that my original, real bug report, was
> with logbuf_lock NMI, enabled kdump notifiers, ...

I understand. The idea is that this patch should handle your real
scenario + theoretical scenario.

---
 include/linux/printk.h |  2 ++
 kernel/panic.c         |  9 ++++++-
 kernel/printk/printk.c | 53 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 57c9473f4a81..016f5ba06e94 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -322,6 +322,8 @@ static inline void printk_safe_flush_on_panic(void)
 }
 #endif
 
+void printk_enter_panic_mode(int cpu);
+
 extern int kptr_restrict;
 
 #ifndef pr_fmt
diff --git a/kernel/panic.c b/kernel/panic.c
index d1ece4c363b9..85fac975a90f 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -254,13 +254,20 @@ void panic(const char *fmt, ...)
 		crash_smp_send_stop();
 	}
 
+	/* Misbehaving secondary CPUs cannot printk() to the main logbuf now */
+	printk_enter_panic_mode(this_cpu);
+
 	/*
 	 * Run any panic handlers, including those that might need to
 	 * add information to the kmsg dump output.
 	 */
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
-	/* Call flush even twice. It tries harder with a single online CPU */
+	/*
+	 * Call flush even twice. It tries harder with a single online CPU.
+	 * Even if we failed to stop some of secondary CPUs we have printk
+	 * locks re-initialized and keep secondary CPUs off printk().
+	 */
 	printk_safe_flush_on_panic();
 	kmsg_dump(KMSG_DUMP_PANIC);
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index f0bc37a511a7..750f83c3b589 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -87,6 +87,8 @@ static DEFINE_SEMAPHORE(console_sem);
 struct console *console_drivers;
 EXPORT_SYMBOL_GPL(console_drivers);
 
+static atomic_t __read_mostly printk_panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID);
+
 /*
  * System may need to suppress printk message under certain
  * circumstances, like after kernel panic happens.
@@ -1644,6 +1646,49 @@ static void console_lock_spinning_enable(void)
 	spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
 }
 
+static int panic_in_progress_on_other_cpu(void)
+{
+	int cpu = atomic_read(&printk_panic_cpu);
+
+	return cpu != PANIC_CPU_INVALID && cpu != smp_processor_id();
+}
+
+void printk_enter_panic_mode(int cpu)
+{
+	unsigned long timeout;
+
+	cpu = atomic_cmpxchg(&printk_panic_cpu, PANIC_CPU_INVALID, cpu);
+	/* printk can enter panic mode only once */
+	if (cpu != PANIC_CPU_INVALID)
+		return;
+	/*
+	 * Wait for active secondary CPUs (if there are any) to leave
+	 * console_unlock() printing loop (for up to one second).
+	 */
+	if (num_online_cpus() > 1) {
+		timeout = USEC_PER_SEC;
+		while (num_online_cpus() > 1 && timeout--)
+			udelay(1);
+	}
+
+	debug_locks_off();
+	/*
+	 * On some platforms crash_smp_send_stop() can kill CPUs via NMI
+	 * vector. Re-init printk() locks just in case if any of those killed
+	 * CPUs held any of printk() locks. On platforms which don't support
+	 * NMI stop, misbehaving secondary CPUs will be handled by
+	 * panic_in_progress_on_other_cpu() test.
+	 *
+	 * We re-init only printk() locks here. oops_in_progress is expected
+	 * to be set by now, so good console drivers are in lockless mode,
+	 * bad console drivers, however, can deadlock.
+	 */
+	raw_spin_lock_init(&logbuf_lock);
+	sema_init(&console_sem, 1);
+	WRITE_ONCE(console_waiter, false);
+	raw_spin_lock_init(&console_owner_lock);
+}
+
 /**
  * console_lock_spinning_disable_and_check - mark end of code where another
  *	thread was able to busy wait and check if there is a waiter
@@ -1900,6 +1945,9 @@ int vprintk_store(int facility, int level,
 	size_t text_len;
 	enum log_flags lflags = 0;
 
+	if (panic_in_progress_on_other_cpu())
+		return 0;
+
 	/*
 	 * The printf needs to come first; we need the syslog
 	 * prefix which might be passed-in as a parameter.
@@ -2468,6 +2516,11 @@ void console_unlock(void)
 			return;
 		}
 
+		if (panic_in_progress_on_other_cpu()) {
+			printk_safe_exit_irqrestore(flags);
+			return;
+		}
+
 		printk_safe_exit_irqrestore(flags);
 
 		if (do_cond_resched)
-- 
2.22.0


  reply	other threads:[~2019-07-31  6:08 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-16  7:28 [PATCH 0/2] panic/printk/x86: Prevent some more printk-related deadlocks in panic() Petr Mladek
2019-07-16  7:28 ` [PATCH 1/2] printk/panic: Access the main printk log in panic() only when safe Petr Mladek
2019-07-17  9:56   ` Sergey Senozhatsky
2019-07-18  8:36     ` Petr Mladek
2019-07-18  9:49       ` Sergey Senozhatsky
2019-07-19 12:57         ` Petr Mladek
2019-07-23  3:13           ` Sergey Senozhatsky
2019-07-24 12:27             ` Petr Mladek
2019-07-31  6:08               ` Sergey Senozhatsky [this message]
2019-07-31  6:59                 ` Sergey Senozhatsky
2019-07-18 10:11   ` Sergey Senozhatsky
2019-07-16  7:28 ` [PATCH 2/2] printk/panic/x86: Allow to access printk log buffer after crash_smp_send_stop() Petr Mladek
2019-07-18 10:47   ` Sergey Senozhatsky
2019-07-18 11:07     ` Konstantin Khlebnikov
2019-07-18 11:29       ` Sergey Senozhatsky
2019-07-19 12:19         ` Petr Mladek
2019-07-18  9:59 ` [PATCH 0/2] panic/printk/x86: Prevent some more printk-related deadlocks in panic() Konstantin Khlebnikov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190731060839.GA1756@jagdpanzerIV \
    --to=sergey.senozhatsky.work@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=john.ogness@linutronix.de \
    --cc=koct9i@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=ptesarik@suse.cz \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.