From: "Verma, Vishal L" <vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
To: "Luck, Tony" <tony.luck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	"bp-l3A5Bk7waGM@public.gmane.org"
	<bp-l3A5Bk7waGM@public.gmane.org>
Cc: "linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org"
	<linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org>,
	"x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org"
	<x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	"linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org"
	<tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'
Date: Fri, 21 Apr 2017 21:39:45 +0000	[thread overview]
Message-ID: <1492810703.2738.27.camel@intel.com> (raw)
In-Reply-To: <20170413113159.rc32ebiswn64nzrr-fF5Pk5pvG8Y@public.gmane.org>

On Thu, 2017-04-13 at 13:31 +0200, Borislav Petkov wrote:
> On Thu, Apr 13, 2017 at 12:29:25AM +0200, Borislav Petkov wrote:
> > On Wed, Apr 12, 2017 at 03:26:19PM -0700, Luck, Tony wrote:
> > > We can futz with that and have them specify which chain (or both)
> > > that they want to be added to.
> > 
> > Well, I didn't want the atomic chain to be a notifier because we can
> > keep it simple and non-blocking. Only the process context one will
> > be.
> > 
> > So the question is, do we even have a use case for outside consumers
> > hanging on the atomic chain? Because if not, we're good to go.
> 
> Ok, new day, new patch.
> 
> Below is what we could do: we don't call the notifier at all on the
> atomic path but only print the MCEs. We do log them and if the machine
> survives, we process them accordingly. This is only a fix for upstream
> so that the current issue at hand is addressed.
> 
> For later, we'd need to split the paths in:
> 
> critical_print_mce()
> 
> or somesuch which immediately dumps the MCE to dmesg, and
> 
> mce_log()
> 
> which does the slow path of logging MCEs and calling the blocking
> notifier.
> 
> Now, I'd want to have decoding of the MCE on the critical path too so
> I have to think about how to do that nicely. Maybe move the decoding
> bits which are the same between Intel and AMD in mce.c and have some
> vendor-specific, fast calls. We'll see. Btw, this is something Ingo has
> been mentioning for a while.
> 
> Anyway, here's just the urgent fix for now.
> 
> Thanks.
> 
> ---
> From: Vishal Verma <vishal.l.verma@intel.com>
> Date: Tue, 11 Apr 2017 16:44:57 -0600
> Subject: [PATCH] x86/mce: Make the MCE notifier a blocking one
> 
> The NFIT MCE handler callback (for handling media errors on NVDIMMs)
> takes a mutex to add the location of a memory error to a list. But since
> the notifier call chain for machine checks (x86_mce_decoder_chain) is
> atomic, we get a lockdep splat like:
> 
>   BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
>   in_atomic(): 1, irqs_disabled(): 0, pid: 4, name: kworker/0:0
>   [..]
>   Call Trace:
>    dump_stack
>    ___might_sleep
>    __might_sleep
>    mutex_lock_nested
>    ? __lock_acquire
>    nfit_handle_mce
>    notifier_call_chain
>    atomic_notifier_call_chain
>    ? atomic_notifier_call_chain
>    mce_gen_pool_process
> 
> Convert the notifier to a blocking one which gets to run only in process
> context.
> 
> Boris: remove the notifier call in atomic context in print_mce(). For
> now, let's print the MCE on the atomic path so that we can make sure it
> goes out. We still log it for process context later.
> 
> Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-edac <linux-edac@vger.kernel.org>
> Cc: x86-ml <x86@kernel.org>
> Cc: <stable@vger.kernel.org>
> Link: http://lkml.kernel.org/r/20170411224457.24777-1-vishal.l.verma@intel.com
> Fixes: 6839a6d96f4e ("nfit: do an ARS scrub on hitting a latent media error")
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>  arch/x86/kernel/cpu/mcheck/mce-genpool.c  |  2 +-
>  arch/x86/kernel/cpu/mcheck/mce-internal.h |  2 +-
>  arch/x86/kernel/cpu/mcheck/mce.c          | 18 ++++--------------
>  3 files changed, 6 insertions(+), 16 deletions(-)
> 

I noticed this patch was picked up in tip, in ras/urgent, but didn't see
a pull request for 4.11 - was this the intention? Or will it just be
added for 4.12?

	-Vishal
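
For readers following along, here is a rough userspace sketch of the
problem and the fix described in the quoted commit message. It is not
kernel code: every name prefixed with model_ is made up for illustration,
in_atomic_ctx merely stands in for in_atomic(), and the fixed-size queue
is an assumed stand-in for the MCE gen pool. A callback that may sleep
trips the check when it is called from the atomic-style chain, and does
not when the atomic path only prints and queues the event and the chain
is called later from process context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace model only; none of these names exist in the kernel. */
struct model_notifier {
	int (*call)(struct model_notifier *nb, unsigned long event, void *data);
	struct model_notifier *next;
};

static bool in_atomic_ctx;   /* stands in for in_atomic() */
static int sleep_violations; /* counts "sleeping function called" hits */

#define MODEL_QUEUE_LEN 8
static unsigned long pending[MODEL_QUEUE_LEN]; /* assumed stand-in for the gen pool */
static size_t npending;

/* Stand-in for might_sleep(): record a violation instead of printing a splat. */
static void model_might_sleep(void)
{
	if (in_atomic_ctx)
		sleep_violations++;
}

/* Stand-in for a callback that takes a mutex, which may sleep. */
static int model_nfit_handler(struct model_notifier *nb, unsigned long event,
			      void *data)
{
	(void)nb; (void)event; (void)data;
	model_might_sleep();
	return 0;
}

static void call_chain(struct model_notifier *head, unsigned long event)
{
	for (struct model_notifier *nb = head; nb; nb = nb->next)
		nb->call(nb, event, NULL);
}

/* Atomic-style chain: callbacks run while sleeping is forbidden. */
static void model_atomic_call_chain(struct model_notifier *head,
				    unsigned long event)
{
	bool saved = in_atomic_ctx;

	in_atomic_ctx = true;
	call_chain(head, event);
	in_atomic_ctx = saved;
}

/* Blocking-style chain: may only be entered where sleeping is allowed. */
static void model_blocking_call_chain(struct model_notifier *head,
				      unsigned long event)
{
	assert(!in_atomic_ctx);
	call_chain(head, event);
}

/* Atomic path after the fix: print and queue the event, call no notifier. */
static void model_report_atomic(unsigned long event)
{
	bool saved = in_atomic_ctx;

	in_atomic_ctx = true;
	fprintf(stderr, "model MCE: event %lu\n", event);
	if (npending < MODEL_QUEUE_LEN)
		pending[npending++] = event;
	in_atomic_ctx = saved;
}

/* Process-context path: drain the queue through the blocking-style chain. */
static size_t model_process_pending(struct model_notifier *head)
{
	size_t handled = npending;

	for (size_t i = 0; i < npending; i++)
		model_blocking_call_chain(head, pending[i]);
	npending = 0;
	return handled;
}

/* Returns the number of sleep-in-atomic violations seen for one event. */
static int run_model(bool fixed)
{
	struct model_notifier nb = { .call = model_nfit_handler, .next = NULL };

	sleep_violations = 0;
	if (fixed) {
		model_report_atomic(1);
		model_process_pending(&nb);
	} else {
		model_atomic_call_chain(&nb, 1);
	}
	return sleep_violations;
}
```

Calling run_model(false) models the old behaviour (the sleeping callback
runs on the atomic-style chain), and run_model(true) models the deferred,
process-context call. The real patch touches mce.c, mce-genpool.c and
mce-internal.h as listed in the diffstat above; this sketch only mirrors
the idea, not those changes.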
