From: Vishal Verma <vishal.l.verma@intel.com>
To: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>,
linux-nvdimm@lists.01.org, x86@kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'
Date: Wed, 12 Apr 2017 13:59:03 -0600
Message-ID: <20170412195903.GA29506@omniknight.lm.intel.com>
In-Reply-To: <20170412091442.dwonfr4dwyta7nvx@pd.tnic>

On 04/12, Borislav Petkov wrote:
> On Tue, Apr 11, 2017 at 04:44:57PM -0600, Vishal Verma wrote:
> > The NFIT MCE handler callback (for handling media errors on NVDIMMs)
> > takes a mutex to add the location of a memory error to a list. But since
> > the notifier call chain for machine checks (x86_mce_decoder_chain) is
> > atomic, we get a lockdep splat like:
> >
> > BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
> > in_atomic(): 1, irqs_disabled(): 0, pid: 4, name: kworker/0:0
> > [..]
> > Call Trace:
> > dump_stack+0x86/0xc3
> > ___might_sleep+0x178/0x240
> > __might_sleep+0x4a/0x80
> > mutex_lock_nested+0x43/0x3f0
> > ? __lock_acquire+0xcbc/0x1290
> > nfit_handle_mce+0x33/0x180 [nfit]
> > notifier_call_chain+0x4a/0x70
> > atomic_notifier_call_chain+0x6e/0x110
> > ? atomic_notifier_call_chain+0x5/0x110
> > mce_gen_pool_process+0x41/0x70
> >
> > Commit 648ed94038c030245a06e4be59744fd5cdc18c40
> > x86/mce: Provide a lockless memory pool to save error records
> > Changes the mce notifier callbacks to be run in a process context, and
> > this can allow us to use the 'blocking' type notifier, where we can take
> > mutexes etc. in the call chain functions.
> >
> > Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> > ---
> > arch/x86/kernel/cpu/mcheck/mce-genpool.c | 2 +-
> > arch/x86/kernel/cpu/mcheck/mce-internal.h | 2 +-
> > arch/x86/kernel/cpu/mcheck/mce.c | 8 ++++----
> > 3 files changed, 6 insertions(+), 6 deletions(-)
> >
> > While this patch almost solves the problem, I think it is not quite right.
> > The x86_mce_decoder_chain is also called from print_mce for fatal machine
> > checks, and that is, afaict, still from an atomic context. One thing Tony
> > suggested was splitting the notifier chain into two distinct chains, one
> > for regular logging and recoverable actions that allows blocking, the
> > other from the panic path.
>
> Well, if Mohammad won't come to the mountain...
>
> So the NFIT handler has:
>
> /* We only care about memory errors */
> if (!(mce->status & MCACOD))
> return NOTIFY_DONE;
>
> what severity are we talking here? Errors which can be reported on the
> panic path, i.e., in atomic context or only AO/AR ones which don't raise
> an #MC exception?
I don't think we can do anything about the panic path errors. The NFIT
handler only acts on recoverable machine checks and essentially just
adds the error location to a list.
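
To make that concrete, here is a rough userspace sketch of the shape of
such a handler. It is not the actual nfit code: every sketch_* name is
made up for illustration, the 0xefff mask is an assumed stand-in for
MCACOD, and pthread_mutex_lock() stands in for the kernel's
mutex_lock(), which is the call that may sleep and so cannot run from
an atomic notifier chain.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are assumptions, not kernel definitions. */
#define SKETCH_MCACOD      0xefffULL  /* assumed mask for the MCA error code bits */
#define SKETCH_NOTIFY_DONE 0          /* "not interested in this event" */
#define SKETCH_NOTIFY_OK   1          /* "event handled" */

/* Minimal model of a machine check record. */
struct sketch_mce {
	uint64_t status;
	uint64_t addr;
};

/* One recorded error location. */
struct sketch_badrange {
	uint64_t addr;
	struct sketch_badrange *next;
};

static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;
static struct sketch_badrange *sketch_list;
static size_t sketch_count;

/*
 * Model of a notifier callback that may block: it filters on the error
 * code bits, then takes a mutex to add the error address to a list.
 */
static int sketch_handle_mce(const struct sketch_mce *m)
{
	struct sketch_badrange *entry;

	assert(m != NULL);

	/* Same filter as the quoted snippet: skip records with no error code. */
	if (!(m->status & SKETCH_MCACOD))
		return SKETCH_NOTIFY_DONE;

	entry = malloc(sizeof(*entry));
	if (!entry)
		return SKETCH_NOTIFY_DONE;
	entry->addr = m->addr;

	/* May block; this is why the caller must be in a sleepable context. */
	pthread_mutex_lock(&sketch_lock);
	entry->next = sketch_list;
	sketch_list = entry;
	sketch_count++;
	pthread_mutex_unlock(&sketch_lock);

	return SKETCH_NOTIFY_OK;
}
```

A caller would pass each record to sketch_handle_mce() from a context
that is allowed to sleep, which is what running the chain from the
genpool worker is meant to provide.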
>
> --
> Regards/Gruss,
> Boris.
>
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
> --