public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
To: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>, Calvin Owens <calvinowens@fb.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH 6/7] printk: use alternative printk buffers
Date: Thu, 6 Oct 2016 13:22:48 +0900	[thread overview]
Message-ID: <20161006042248.GA5458@swordfish> (raw)
In-Reply-To: <20161005095013.GB23809@pathway.suse.cz>

On (10/05/16 11:50), Petr Mladek wrote:
[..]
> > well, it solves a number of problems that the existing implementation
> > cannot handle.
> 
> Please, provide a summary. I wonder if these are real life problems.

anything that starts from printk().
I'm trying to address printk() recursion only, by handling the top-most
recursion - printk -> printk, which causes all of the dependent recursions:
	printk -> spin_lock(logbuf) -> printk() -> spin_lock(logbuf)
	printk -> sem->lock() -> printk() -> sem->lock()
and so on.

I'm not building a lock dependency graph/etc.


the existing recursion detection logic is racy and quite limited in
scope: not only it protects only a tiny bit of printk, but it doesn't
even protect it fully. the very moment we do `logbuf_cpu = UINT_MAX;'
it's over - we still own the lock, but we don't remember it anymore.
a spin_dump() from raw_spin_unlock(&logbuf_lock) will kill us.


** a very quick list **
// a mixed list of theoretical and real problems can also be found
// in the patch set cover letter.


public reports:

1) some pathces/reports from Byungchul Park

the first one in my mailbox (seems to be)
https://marc.info/?l=linux-kernel&m=145406977822244
20160129121545.GH31266@X58A-UD3R

or here
https://marc.info/?l=linux-kernel&m=145570036513749

in his case, I believe, the lock was fine (not corrupted). it just he had
a spin_lock lockup on console semaphore spin_lock coming from:

static void __spin_lock_debug(raw_spinlock_t *lock)
{
	u64 i;
	u64 loops = loops_per_jiffy * HZ;

	for (i = 0; i < loops; i++) {
		if (arch_spin_trylock(&lock->raw_lock))
			return;
		__delay(1);
	}
	/* lockup suspected: */
	spin_dump(lock, "lockup suspected");


which ended up in an infinite recursion.

https://marc.info/?l=linux-kernel&m=145449014308161



2) a report from Viresh Kumar.

frankly, we were lucky to have Viresh on this: a less experienced developer
would probably give up. So would probably do a developer with no appropriate
hardware: jtag debugger/serial console/etc. and I don't know how much would
we spend on meditations to figure out it was a WARN from timekeeping. we
better be 'more prepared'.


=======
reports (unique occurrences only) that I have in internal bugzilla

4) sleeping function called from inside logbuf lock
   which resulted in spin_dump() call from spin_unlock(&logbuf_lock)
   when:
	a) we still owned the logbuf_lock
	b) yet logbuf_cpu was already reset, so printk recursion
	detection was helpless

   'a + b' leave us no chances to survive.

5) ARM specific
   an imprecise abort (http://infocenter.arm.com/help/topic/com.arm.doc.faqs/14809.html)
   hit the CPU while it was holding the printk-related spin-lock.
   that deadlocked the system, because abort handler attempted to
   printk() a message.


6) logbuf_lock corruption
   well, no cookies for us. un-fixable at the moment. we can probably
   do something about it. have a spinlock-debug bool function that would
   tell us whether the lock is corrupted, so we can re-init logbuf_lock,
   perhaps.


> Note that we need to put aside all problems that are solvable
> with printk_deferred(). It seems that printk_deferred() will
> need to stay because it avoids the deadlock caused by
> scheduler/timekeeping code locks.

agree. printk_deferred() takes only one lock and avoids console_unlock()
loop. as long as logbuf_lock is not on it's way printk_deferred() may be
helpful.

> By other words, if there is a missing printk_deferred() we
> need to put it there anyway because the same code might get
> first called outside printk().

right. and I'm not addressing this. there are just too many locks
that can be acquired out of order. not only timekeeping and sched
locks, but any of serial console locks adn so on. we need something
like lockdep locks graph here that would not report the issues (any
printk() can result in deadlock when we detect that at least one of
printk related locks was acquired out of order), but instead would
somehow selectively fix/workaround them.


can we somehow transparently for the rest of the system (just in
printk()) detect that we are in a potentially risky situation? hmm,
I don't know...

something *very* radical?

	vprintk_func()
	{
		if (this_cpu_read(alt_printk_ctx) & ALT_PRINTK_NMI_CONTEXT_MASK)
		return vprintk_nmi(fmt, args);

		if (in_atomic() ||
		/* ^^^^^^^^^^^^ */
			this_cpu_read(alt_printk_ctx) & ALT_PRINTK_CONTEXT_MASK)
			return vprintk_alt(fmt, args);

		return vprintk_default(fmt, args);
	}

	-ss

  reply	other threads:[~2016-10-06  4:22 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-27 14:22 [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls Sergey Senozhatsky
2016-09-27 14:22 ` [RFC][PATCH 1/7] printk: use vprintk_func in vprintk() Sergey Senozhatsky
2016-09-27 14:22 ` [RFC][PATCH 2/7] printk: rename nmi.c and exported api Sergey Senozhatsky
2016-09-27 14:22 ` [RFC][PATCH 3/7] printk: introduce per-cpu alt_print seq buffer Sergey Senozhatsky
2016-09-29 12:26   ` Petr Mladek
2016-09-30  1:05     ` Sergey Senozhatsky
2016-09-30 11:35       ` Petr Mladek
2016-09-27 14:22 ` [RFC][PATCH 4/7] printk: make alt_printk available when config printk set Sergey Senozhatsky
2016-09-27 14:22 ` [RFC][PATCH 5/7] printk: drop vprintk_func function Sergey Senozhatsky
2016-09-27 14:22 ` [RFC][PATCH 6/7] printk: use alternative printk buffers Sergey Senozhatsky
2016-09-29 13:00   ` Petr Mladek
2016-09-30  1:15     ` Sergey Senozhatsky
2016-09-30 11:15       ` Petr Mladek
2016-10-01  2:48         ` Sergey Senozhatsky
2016-10-04 12:22           ` Petr Mladek
2016-10-05  1:36             ` Sergey Senozhatsky
2016-10-05 10:18               ` Petr Mladek
2016-10-03  7:53         ` Sergey Senozhatsky
2016-10-04 14:52         ` Petr Mladek
2016-10-05  1:27           ` Sergey Senozhatsky
2016-10-05  9:50             ` Petr Mladek
2016-10-06  4:22               ` Sergey Senozhatsky [this message]
2016-10-06 11:32                 ` Petr Mladek
2016-10-10  4:09                   ` Sergey Senozhatsky
2016-10-10 11:17                     ` Petr Mladek
2016-10-11  7:35                       ` Sergey Senozhatsky
2016-10-11  9:30                         ` Petr Mladek
2016-09-27 14:22 ` [RFC][PATCH 7/7] printk: new printk() recursion detection Sergey Senozhatsky
2016-09-29 13:19   ` Petr Mladek
2016-09-30  2:00     ` Sergey Senozhatsky
2016-09-29 13:25 ` [RFC][PATCH 0/7] printk: use alt_printk to handle printk() recursive calls Petr Mladek
2016-09-30  2:43   ` Sergey Senozhatsky
2016-09-30 11:27     ` Petr Mladek
2016-10-01  3:02       ` Sergey Senozhatsky
2016-10-04 11:35         ` Petr Mladek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161006042248.GA5458@swordfish \
    --to=sergey.senozhatsky.work@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=calvinowens@fb.com \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pmladek@suse.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox