public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Don Zickus <dzickus@redhat.com>
To: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Luck, Tony" <tony.luck@intel.com>,
	Matthew Garrett <mjg@redhat.com>, Vivek Goyal <vgoyal@redhat.com>,
	"Chen, Gong" <gong.chen@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"len.brown@intel.com" <len.brown@intel.com>,
	"ying.huang@intel.com" <ying.huang@intel.com>,
	"ak@linux.intel.com" <ak@linux.intel.com>,
	"hughd@chromium.org" <hughd@chromium.org>,
	"mingo@elte.hu" <mingo@elte.hu>,
	"jmorris@namei.org" <jmorris@namei.org>,
	"a.p.zijlstra@chello.nl" <a.p.zijlstra@chello.nl>,
	"namhyung@gmail.com" <namhyung@gmail.com>,
	"dle-develop@lists.sourceforge.net" 
	<dle-develop@lists.sourceforge.net>,
	Satoru Moriya <satoru.moriya@hds.com>
Subject: Re: [RFC][PATCH -next] make pstore/kmsg_dump run after stopping other cpus in panic path
Date: Mon, 17 Oct 2011 12:09:04 -0400	[thread overview]
Message-ID: <20111017160904.GG5795@redhat.com> (raw)
In-Reply-To: <5C4C569E8A4B9B42A84A977CF070A35B2C5747DC7B@USINDEVS01.corp.hds.com>

On Fri, Oct 14, 2011 at 04:53:05PM -0400, Seiji Aguchi wrote:
> Hi,
>  
>  As Don mentioned in following thread, it would be nice for pstore/kmsg_dump to serialize 
>  panic path and have one cpu running because they can log messages reliably. 
>  
>  https://lkml.org/lkml/2011/10/13/427
>  
>  For realizing this idea, we have to move kmsg_dump below smp_send_stop() and bust some locks 
>  of kmsg_dump/pstore in panic path.
> 
>  This patch does followings.
> 
>   - moving kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop.
>   - busting logbuf_lock of kmsg_dump() in panic path for avoiding deadlock.
>   - busting psinfo->buf_lock of pstore_dump() in panic path for avoiding deadlock.
> 
> Any comments are welcome.

I think I am ok with this, just some comments below.

> 
>  Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
> 
> ---
>  fs/pstore/platform.c |   22 ++++++++++------------
>  kernel/panic.c       |    4 ++--
>  kernel/printk.c      |    7 +++++++
>  3 files changed, 19 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
> index 2bd620f..e73d940 100644
> --- a/fs/pstore/platform.c
> +++ b/fs/pstore/platform.c
> @@ -90,19 +90,21 @@ static void pstore_dump(struct kmsg_dumper *dumper,
>  	int		hsize, ret;
>  	unsigned int	part = 1;
>  	unsigned long	flags = 0;
> -	int		is_locked = 0;
>  
>  	if (reason < ARRAY_SIZE(reason_str))
>  		why = reason_str[reason];
>  	else
>  		why = "Unknown";
>  
> -	if (in_nmi()) {
> -		is_locked = spin_trylock(&psinfo->buf_lock);
> -		if (!is_locked)
> -			pr_err("pstore dump routine blocked in NMI, may corrupt error record\n");
> -	} else
> -		spin_lock_irqsave(&psinfo->buf_lock, flags);
> +	/*
> +	 * pstore_dump() is called after smp_send_stop() in panic path.
> +	 * So, spin_lock should be bust for avoiding deadlock.
> +	 */
> +	if (reason == KMSG_DUMP_PANIC)
> +		spin_lock_init(&psinfo->buf_lock);
> +
> +	spin_lock_irqsave(&psinfo->buf_lock, flags);
> +
>  	oopscount++;
>  	while (total < kmsg_bytes) {
>  		dst = psinfo->buf;
> @@ -131,11 +133,7 @@ static void pstore_dump(struct kmsg_dumper *dumper,
>  		total += l1_cpy + l2_cpy;
>  		part++;
>  	}
> -	if (in_nmi()) {
> -		if (is_locked)
> -			spin_unlock(&psinfo->buf_lock);
> -	} else
> -		spin_unlock_irqrestore(&psinfo->buf_lock, flags);
> +	spin_unlock_irqrestore(&psinfo->buf_lock, flags);
>  }

My original problem was the NMI path was calling panic and thus causing
these locking problems.  With the change below, which allows the change
above, I believe the NMI case is covered (because this path is only called
by NMI during panic).  This makes my previous patch (accepted by Tony)
probably unnecessary if we go with this patch.

As for the need for using spin_lock_init vs just skipping the locking, I
don't have a preference, nor know what the correct usage is for these
things.  I suppose this patch does the proper thing.

>  
>  static struct kmsg_dumper pstore_dumper = {
> diff --git a/kernel/panic.c b/kernel/panic.c
> index d7bb697..41bf6ad 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -88,8 +88,6 @@ NORET_TYPE void panic(const char * fmt, ...)
>  	 */
>  	crash_kexec(NULL);
>  
> -	kmsg_dump(KMSG_DUMP_PANIC);
> -
>  	/*
>  	 * Note smp_send_stop is the usual smp shutdown function, which
>  	 * unfortunately means it may not be hardened to work in a panic
> @@ -97,6 +95,8 @@ NORET_TYPE void panic(const char * fmt, ...)
>  	 */
>  	smp_send_stop();
>  
> +	kmsg_dump(KMSG_DUMP_PANIC);
> +
>  	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>  
>  	bust_spinlocks(0);

Yes, this is what Seiji and I talked about previously based on Vivek's
suggestion.  This can only work on x86 if we pick up my 'using NMI in stop
cpu path' patch from last week.

https://lkml.org/lkml/2011/10/13/426

Otherwise, there is no guarantee all the cpus stop and we can still end up
in data corruption depending on the backend being used.

> diff --git a/kernel/printk.c b/kernel/printk.c
> index 1455a0d..e1e57db 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -1732,6 +1732,13 @@ void kmsg_dump(enum kmsg_dump_reason reason)
>  	unsigned long l1, l2;
>  	unsigned long flags;
>  
> +	/*
> +	 *  kmsg_dump() is called after smp_send_stop() in panic path.
> +	 *  So, spin_lock should be bust for avoiding deadlock.
> +	 */
> +	if (reason == KMSG_DUMP_PANIC)
> +		raw_spin_lock_init(&logbuf_lock);
> +
>  	/* Theoretically, the log could move on after we do this, but
>  	   there's not a lot we can do about that. The new messages
>  	   will overwrite the start of what we dump. */
> -- 

The just adds more lock busting based on the movement of kmsg_dump() in
the panic path.

There is still more lock busting to do in the various backends, but this
patch provides the necessary first steps.

All this lock busting probably isn't pretty and causes one to reflect what
is going on here.  But as long as we are going to keep the kmsg_dump
design, changes like this seem necessary to make sure the locking stays
sane(r) for now.

Acked-by: Don Zickus <dzickus@redhat.com>

  parent reply	other threads:[~2011-10-17 16:09 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-10-14 20:53 [RFC][PATCH -next] make pstore/kmsg_dump run after stopping other cpus in panic path Seiji Aguchi
2011-10-17  6:21 ` Chen Gong
2011-10-17 14:10   ` Seiji Aguchi
2011-10-18  7:28     ` Chen Gong
2011-10-17 16:09 ` Don Zickus [this message]
2011-10-17 16:56   ` Luck, Tony
2011-10-17 17:22     ` Don Zickus
2011-10-17 17:34       ` Seiji Aguchi
2011-10-17 23:47 ` Andrew Morton
2011-10-18 12:49   ` Don Zickus
2011-10-18 14:52     ` Seiji Aguchi
2011-10-18 15:10       ` Don Zickus
2011-10-20 15:13         ` Seiji Aguchi
2011-10-20 17:48           ` Don Zickus
2011-10-20 18:39             ` Seiji Aguchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111017160904.GG5795@redhat.com \
    --to=dzickus@redhat.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=dle-develop@lists.sourceforge.net \
    --cc=gong.chen@intel.com \
    --cc=hughd@chromium.org \
    --cc=jmorris@namei.org \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mjg@redhat.com \
    --cc=namhyung@gmail.com \
    --cc=satoru.moriya@hds.com \
    --cc=seiji.aguchi@hds.com \
    --cc=tony.luck@intel.com \
    --cc=vgoyal@redhat.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox