public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* pstore: change mutex locking to spin_locks
@ 2011-08-12 17:54 Luck, Tony
  2011-08-12 17:59 ` Matthew Garrett
  2011-08-17 21:22 ` Andrew Morton
  0 siblings, 2 replies; 10+ messages in thread
From: Luck, Tony @ 2011-08-12 17:54 UTC (permalink / raw)
  To: linux-kernel; +Cc: Don Zickus, Matthew Garrett

From: Don Zickus <dzickus@redhat.com>

pstore was using mutex locking to protect read/write access to the
backend plug-ins.  This causes problems when pstore is executed in
an NMI context through panic() -> kmsg_dump().

This patch changes the mutex to a spin_lock_irqsave then also checks to
see if we are in an NMI context.  If we are in an NMI and can't get the
lock, just print a message stating that and blow by the locking.

All this is probably a hack around the bigger locking problem but it
solves my current situation of trying to sleep in an NMI context.

Tested by loading the lkdtm module and executing a HARDLOCKUP which
will cause the machine to panic inside the nmi handler.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

---

Essentially the same patch that Don posted on July 20th. I just added the
efivars.c bit, and did some minor formatting changes to make it fit on
top of the "defer inserting OOPs entries into pstore" patch that I posted
yesterday.  I plan to add both patches to linux-next soon, and will send
to Linus in the 3.2 merge window.

diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c
index 2ca59dc..5e820ea 100644
--- a/drivers/acpi/apei/erst.c
+++ b/drivers/acpi/apei/erst.c
@@ -1165,7 +1165,7 @@ static int __init erst_init(void)
 		goto err_release_erange;
 
 	buf = kmalloc(erst_erange.size, GFP_KERNEL);
-	mutex_init(&erst_info.buf_mutex);
+	spin_lock_init(&erst_info.buf_lock);
 	if (buf) {
 		erst_info.buf = buf + sizeof(struct cper_pstore_record);
 		erst_info.bufsize = erst_erange.size -
diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index eb80b54..be8bcb0 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -978,7 +978,7 @@ int register_efivars(struct efivars *efivars,
 	if (efivars->efi_pstore_info.buf) {
 		efivars->efi_pstore_info.bufsize = 1024;
 		efivars->efi_pstore_info.data = efivars;
-		mutex_init(&efivars->efi_pstore_info.buf_mutex);
+		spin_lock_init(&efivars->efi_pstore_info.buf_lock);
 		pstore_register(&efivars->efi_pstore_info);
 	}
 
diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index ca60ebc..0472924 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -28,6 +28,7 @@
 #include <linux/timer.h>
 #include <linux/slab.h>
 #include <linux/uaccess.h>
+#include <linux/hardirq.h>
 #include <linux/workqueue.h>
 
 #include "internal.h"
@@ -88,13 +89,20 @@ static void pstore_dump(struct kmsg_dumper *dumper,
 	u64		id;
 	int		hsize;
 	unsigned int	part = 1;
+	unsigned long	flags = 0;
+	int		is_locked = 0;
 
 	if (reason < ARRAY_SIZE(reason_str))
 		why = reason_str[reason];
 	else
 		why = "Unknown";
 
-	mutex_lock(&psinfo->buf_mutex);
+	if (in_nmi()) {
+		is_locked = spin_trylock(&psinfo->buf_lock);
+		if (!is_locked)
+			pr_err("pstore dump routine blocked in NMI, may corrupt error record\n");
+	} else
+		spin_lock_irqsave(&psinfo->buf_lock, flags);
 	oopscount++;
 	while (total < kmsg_bytes) {
 		dst = psinfo->buf;
@@ -123,7 +131,11 @@ static void pstore_dump(struct kmsg_dumper *dumper,
 		total += l1_cpy + l2_cpy;
 		part++;
 	}
-	mutex_unlock(&psinfo->buf_mutex);
+	if (in_nmi()) {
+		if (is_locked)
+			spin_unlock(&psinfo->buf_lock);
+	} else
+		spin_unlock_irqrestore(&psinfo->buf_lock, flags);
 }
 
 static struct kmsg_dumper pstore_dumper = {
@@ -188,11 +200,12 @@ void pstore_get_records(int quiet)
 	enum pstore_type_id	type;
 	struct timespec		time;
 	int			failed = 0, rc;
+	unsigned long		flags;
 
 	if (!psi)
 		return;
 
-	mutex_lock(&psinfo->buf_mutex);
+	spin_lock_irqsave(&psinfo->buf_lock, flags);
 	rc = psi->open(psi);
 	if (rc)
 		goto out;
@@ -205,7 +218,7 @@ void pstore_get_records(int quiet)
 	}
 	psi->close(psi);
 out:
-	mutex_unlock(&psinfo->buf_mutex);
+	spin_unlock_irqrestore(&psinfo->buf_lock, flags);
 
 	if (failed)
 		printk(KERN_WARNING "pstore: failed to load %d record(s) from '%s'\n",
@@ -233,7 +246,8 @@ static void pstore_timefunc(unsigned long dummy)
  */
 int pstore_write(enum pstore_type_id type, char *buf, size_t size)
 {
-	u64	id;
+	u64		id;
+	unsigned long	flags;
 
 	if (!psinfo)
 		return -ENODEV;
@@ -241,13 +255,13 @@ int pstore_write(enum pstore_type_id type, char *buf, size_t size)
 	if (size > psinfo->bufsize)
 		return -EFBIG;
 
-	mutex_lock(&psinfo->buf_mutex);
+	spin_lock_irqsave(&psinfo->buf_lock, flags);
 	memcpy(psinfo->buf, buf, size);
 	id = psinfo->write(type, 0, size, psinfo);
 	if (pstore_is_mounted())
 		pstore_mkfile(PSTORE_TYPE_DMESG, psinfo->name, id, psinfo->buf,
 			      size, CURRENT_TIME, psinfo);
-	mutex_unlock(&psinfo->buf_mutex);
+	spin_unlock_irqrestore(&psinfo->buf_lock, flags);
 
 	return 0;
 }
diff --git a/include/linux/pstore.h b/include/linux/pstore.h
index cc03bbf..b91440e 100644
--- a/include/linux/pstore.h
+++ b/include/linux/pstore.h
@@ -32,7 +32,7 @@ enum pstore_type_id {
 struct pstore_info {
 	struct module	*owner;
 	char		*name;
-	struct mutex	buf_mutex;	/* serialize access to 'buf' */
+	spinlock_t	buf_lock;	/* serialize access to 'buf' */
 	char		*buf;
 	size_t		bufsize;
 	int		(*open)(struct pstore_info *psi);

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: pstore: change mutex locking to spin_locks
  2011-08-12 17:54 pstore: change mutex locking to spin_locks Luck, Tony
@ 2011-08-12 17:59 ` Matthew Garrett
  2011-08-17 21:22 ` Andrew Morton
  1 sibling, 0 replies; 10+ messages in thread
From: Matthew Garrett @ 2011-08-12 17:59 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-kernel, Don Zickus

On Fri, Aug 12, 2011 at 10:54:51AM -0700, Luck, Tony wrote:
> From: Don Zickus <dzickus@redhat.com>
> 
> pstore was using mutex locking to protect read/write access to the
> backend plug-ins.  This causes problems when pstore is executed in
> an NMI context through panic() -> kmsg_dump().
> 
> This patch changes the mutex to a spin_lock_irqsave then also checks to
> see if we are in an NMI context.  If we are in an NMI and can't get the
> lock, just print a message stating that and blow by the locking.
> 
> All this is probably a hack around the bigger locking problem but it
> solves my current situation of trying to sleep in an NMI context.
> 
> Tested by loading the lkdtm module and executing a HARDLOCKUP which
> will cause the machine to panic inside the nmi handler.
> 
> Signed-off-by: Don Zickus <dzickus@redhat.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Matthew Garrett <mjg@redhat.com>

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: pstore: change mutex locking to spin_locks
  2011-08-12 17:54 pstore: change mutex locking to spin_locks Luck, Tony
  2011-08-12 17:59 ` Matthew Garrett
@ 2011-08-17 21:22 ` Andrew Morton
  2011-08-17 21:47   ` Peter Zijlstra
  2011-08-18 12:58   ` Don Zickus
  1 sibling, 2 replies; 10+ messages in thread
From: Andrew Morton @ 2011-08-17 21:22 UTC (permalink / raw)
  To: Luck, Tony
  Cc: linux-kernel, Don Zickus, Matthew Garrett, Ingo Molnar,
	Thomas Gleixner, Peter Zijlstra

On Fri, 12 Aug 2011 10:54:51 -0700
"Luck, Tony" <tony.luck@intel.com> wrote:

> From: Don Zickus <dzickus@redhat.com>
> 
> pstore was using mutex locking to protect read/write access to the
> backend plug-ins.  This causes problems when pstore is executed in
> an NMI context through panic() -> kmsg_dump().
> 
> This patch changes the mutex to a spin_lock_irqsave then also checks to
> see if we are in an NMI context.  If we are in an NMI and can't get the
> lock, just print a message stating that and blow by the locking.
> 
> All this is probably a hack around the bigger locking problem but it
> solves my current situation of trying to sleep in an NMI context.
> 
> Tested by loading the lkdtm module and executing a HARDLOCKUP which
> will cause the machine to panic inside the nmi handler.
> 
> ...
>
> +	if (in_nmi()) {
> +		is_locked = spin_trylock(&psinfo->buf_lock);
> +		if (!is_locked)
> +			pr_err("pstore dump routine blocked in NMI, may corrupt error record\n");
> +	} else
> +		spin_lock_irqsave(&psinfo->buf_lock, flags);
>  	oopscount++;
>  	while (total < kmsg_bytes) {
>  		dst = psinfo->buf;
> @@ -123,7 +131,11 @@ static void pstore_dump(struct kmsg_dumper *dumper,
>  		total += l1_cpy + l2_cpy;
>  		part++;
>  	}
> -	mutex_unlock(&psinfo->buf_mutex);
> +	if (in_nmi()) {
> +		if (is_locked)
> +			spin_unlock(&psinfo->buf_lock);
> +	} else
> +		spin_unlock_irqrestore(&psinfo->buf_lock, flags);
>  }

It's still bad if lockdep is enabled.  See
kernel/lockdep.c:lock_acquire() and lock_release().  They aren't
NMI-safe.

One approach would be to switch to bit_spin_lock().  Which will break
if/when bit spinlocks get lockdep-enabled, so don't do that.

A better approach would be to use the underlying spinlock functions
which bypass lockdep, but I cannot immediately locate those amongst
the misama of spinlock interface mess.

This problem of locking-vs-NMIs has been "solved" several times before
but I don't recall any standardized approach being developed.  Does
anyone have a favorite implementation to look at?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: pstore: change mutex locking to spin_locks
  2011-08-17 21:22 ` Andrew Morton
@ 2011-08-17 21:47   ` Peter Zijlstra
  2011-08-18 13:04     ` Don Zickus
  2011-08-18 12:58   ` Don Zickus
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2011-08-17 21:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Luck, Tony, linux-kernel, Don Zickus, Matthew Garrett,
	Ingo Molnar, Thomas Gleixner

On Wed, 2011-08-17 at 14:22 -0700, Andrew Morton wrote:
> It's still bad if lockdep is enabled.  See
> kernel/lockdep.c:lock_acquire() and lock_release().  They aren't
> NMI-safe. 

Good thing nmi_enter() does lockdep_disable() and makes lock_acquire()
bail on if (current->lockdep_recursion).

Still the fact that pstore needs locks from NMI context (let alone tried
to use a mutex) makes one think one should avoid it like the plague.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: pstore: change mutex locking to spin_locks
  2011-08-17 21:22 ` Andrew Morton
  2011-08-17 21:47   ` Peter Zijlstra
@ 2011-08-18 12:58   ` Don Zickus
  2011-08-18 14:37     ` Peter Zijlstra
  2011-08-18 16:33     ` Luck, Tony
  1 sibling, 2 replies; 10+ messages in thread
From: Don Zickus @ 2011-08-18 12:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Luck, Tony, linux-kernel, Matthew Garrett, Ingo Molnar,
	Thomas Gleixner, Peter Zijlstra

On Wed, Aug 17, 2011 at 02:22:25PM -0700, Andrew Morton wrote:
> On Fri, 12 Aug 2011 10:54:51 -0700
> "Luck, Tony" <tony.luck@intel.com> wrote:
> 
> > From: Don Zickus <dzickus@redhat.com>
> > 
> > pstore was using mutex locking to protect read/write access to the
> > backend plug-ins.  This causes problems when pstore is executed in
> > an NMI context through panic() -> kmsg_dump().
> > 
> > This patch changes the mutex to a spin_lock_irqsave then also checks to
> > see if we are in an NMI context.  If we are in an NMI and can't get the
> > lock, just print a message stating that and blow by the locking.
> > 
> > All this is probably a hack around the bigger locking problem but it
> > solves my current situation of trying to sleep in an NMI context.
> > 
> > Tested by loading the lkdtm module and executing a HARDLOCKUP which
> > will cause the machine to panic inside the nmi handler.
> > 
> > ...
> >
> > +	if (in_nmi()) {
> > +		is_locked = spin_trylock(&psinfo->buf_lock);
> > +		if (!is_locked)
> > +			pr_err("pstore dump routine blocked in NMI, may corrupt error record\n");
> > +	} else
> > +		spin_lock_irqsave(&psinfo->buf_lock, flags);
> >  	oopscount++;
> >  	while (total < kmsg_bytes) {
> >  		dst = psinfo->buf;
> > @@ -123,7 +131,11 @@ static void pstore_dump(struct kmsg_dumper *dumper,
> >  		total += l1_cpy + l2_cpy;
> >  		part++;
> >  	}
> > -	mutex_unlock(&psinfo->buf_mutex);
> > +	if (in_nmi()) {
> > +		if (is_locked)
> > +			spin_unlock(&psinfo->buf_lock);
> > +	} else
> > +		spin_unlock_irqrestore(&psinfo->buf_lock, flags);
> >  }
> 
> It's still bad if lockdep is enabled.  See
> kernel/lockdep.c:lock_acquire() and lock_release().  They aren't
> NMI-safe.
> 
> One approach would be to switch to bit_spin_lock().  Which will break
> if/when bit spinlocks get lockdep-enabled, so don't do that.
> 
> A better approach would be to use the underlying spinlock functions
> which bypass lockdep, but I cannot immediately locate those amongst
> the misama of spinlock interface mess.

Probably the raw_spin_* stuff.  I think those purposely avoid the lockdep
mechanisms.

> 
> This problem of locking-vs-NMIs has been "solved" several times before
> but I don't recall any standardized approach being developed.  Does
> anyone have a favorite implementation to look at?

The ones I have looked at perf and apei/ghes, had issues of receiving data
in an NMI context and trying to pass it to userspace.  This was solved
with irq_work_queue.  Sometimes one can sprinkle some cmpxchg commands in
there to quickly write to registers from a normal context which seems to
play nicely with a process in an NMI context wants to write to the same
register.

But in this case we have a filesystem that can be read/written to from a
normal context and also written to from an NMI context.  irq_work_queue
doesn't apply here and I don't think you can just cmpxchg a PAGE full of
data into a firmware storage area.  This doesn't even get into the state
machine the kernel needs to walk through to store the data (which can
easily be interrupted by an NMI).

I would be excited in there was a solution that we can copy, but I didn't
see any nor would I really expect one in this unusual case.

The patch that I provided that Tony reposted is just a lesser of two evils
approach.  It is still flawed, just not as much as before.  The idea was
to buy time until we could think of a better approach to solving this.

Cheers,
Don


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: pstore: change mutex locking to spin_locks
  2011-08-17 21:47   ` Peter Zijlstra
@ 2011-08-18 13:04     ` Don Zickus
  2011-08-18 15:26       ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Don Zickus @ 2011-08-18 13:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Luck, Tony, linux-kernel, Matthew Garrett,
	Ingo Molnar, Thomas Gleixner

On Wed, Aug 17, 2011 at 11:47:59PM +0200, Peter Zijlstra wrote:
> On Wed, 2011-08-17 at 14:22 -0700, Andrew Morton wrote:
> > It's still bad if lockdep is enabled.  See
> > kernel/lockdep.c:lock_acquire() and lock_release().  They aren't
> > NMI-safe. 
> 
> Good thing nmi_enter() does lockdep_disable() and makes lock_acquire()
> bail on if (current->lockdep_recursion).
> 
> Still the fact that pstore needs locks from NMI context (let alone tried
> to use a mutex) makes one think one should avoid it like the plague.

Unfortunately, this plague is called ACPI4 (well APEI).  It's an attempt
to provide a coherent strategy for platform errors.  It needs a lot of
love to make it work, so we can't just avoid it (well us distro folks have
customers who want this). :-(

I have been having conversations with Matthew and Vivek Goyal about how to
simplify kmsg_dump, which might allow us to make some of the pstore paths
lockless, simplifying the solution.  We'll see how that goes.

Cheers,
Don

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: pstore: change mutex locking to spin_locks
  2011-08-18 12:58   ` Don Zickus
@ 2011-08-18 14:37     ` Peter Zijlstra
  2011-08-18 16:33     ` Luck, Tony
  1 sibling, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2011-08-18 14:37 UTC (permalink / raw)
  To: Don Zickus
  Cc: Andrew Morton, Luck, Tony, linux-kernel, Matthew Garrett,
	Ingo Molnar, Thomas Gleixner

On Thu, 2011-08-18 at 08:58 -0400, Don Zickus wrote:
> Probably the raw_spin_* stuff.  I think those purposely avoid the lockdep
> mechanisms. 

Nope, raw_spin_* is covered by lockdep just fine. Also anything
purposely avoiding lockdep is bound to be a piece of crap :-)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: pstore: change mutex locking to spin_locks
  2011-08-18 13:04     ` Don Zickus
@ 2011-08-18 15:26       ` Peter Zijlstra
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2011-08-18 15:26 UTC (permalink / raw)
  To: Don Zickus
  Cc: Andrew Morton, Luck, Tony, linux-kernel, Matthew Garrett,
	Ingo Molnar, Thomas Gleixner

On Thu, 2011-08-18 at 09:04 -0400, Don Zickus wrote:
> 
> Unfortunately, this plague is called ACPI4 (well APEI) 

Bah, ACPI4 is a running trainwreck, we should just boycot that horror..
OS vendors refusing to implement their drug induced crap should
hopefully give a signal.. then again you can never tell with those
chaps.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: pstore: change mutex locking to spin_locks
  2011-08-18 12:58   ` Don Zickus
  2011-08-18 14:37     ` Peter Zijlstra
@ 2011-08-18 16:33     ` Luck, Tony
  2011-08-18 17:25       ` Don Zickus
  1 sibling, 1 reply; 10+ messages in thread
From: Luck, Tony @ 2011-08-18 16:33 UTC (permalink / raw)
  To: Don Zickus, Andrew Morton
  Cc: linux-kernel@vger.kernel.org, Matthew Garrett, Ingo Molnar,
	Thomas Gleixner, Peter Zijlstra

> But in this case we have a filesystem that can be read/written to from a
> normal context and also written to from an NMI context

I fixed (avoided) the problem of writing to the pstore filesystem in NMI
context in my other pstore patch in linux-next ("defer inserting OOPS entries in pstore").

So the remaining locking requirement in pstore is just to protect
the non-volatile backing store from simultaneous access.

-Tony



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: pstore: change mutex locking to spin_locks
  2011-08-18 16:33     ` Luck, Tony
@ 2011-08-18 17:25       ` Don Zickus
  0 siblings, 0 replies; 10+ messages in thread
From: Don Zickus @ 2011-08-18 17:25 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Andrew Morton, linux-kernel@vger.kernel.org, Matthew Garrett,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra

On Thu, Aug 18, 2011 at 09:33:17AM -0700, Luck, Tony wrote:
> > But in this case we have a filesystem that can be read/written to from a
> > normal context and also written to from an NMI context
> 
> I fixed (avoided) the problem of writing to the pstore filesystem in NMI
> context in my other pstore patch in linux-next ("defer inserting OOPS entries in pstore").

Ok, just found that patch.  That fixes one of the issues.  :-)

> 
> So the remaining locking requirement in pstore is just to protect
> the non-volatile backing store from simultaneous access.

Right.  It is still awkward to have userspace write to the pstore and then
be interrupted by an NMI who wants to write to the pstore without causing
deadlocks.  But I think we will get there.

Cheers,
Don

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2011-08-18 17:25 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-08-12 17:54 pstore: change mutex locking to spin_locks Luck, Tony
2011-08-12 17:59 ` Matthew Garrett
2011-08-17 21:22 ` Andrew Morton
2011-08-17 21:47   ` Peter Zijlstra
2011-08-18 13:04     ` Don Zickus
2011-08-18 15:26       ` Peter Zijlstra
2011-08-18 12:58   ` Don Zickus
2011-08-18 14:37     ` Peter Zijlstra
2011-08-18 16:33     ` Luck, Tony
2011-08-18 17:25       ` Don Zickus

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox