From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755743Ab1HRM7r (ORCPT ); Thu, 18 Aug 2011 08:59:47 -0400 Received: from mx1.redhat.com ([209.132.183.28]:15945 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755495Ab1HRM7q (ORCPT ); Thu, 18 Aug 2011 08:59:46 -0400 Date: Thu, 18 Aug 2011 08:58:49 -0400 From: Don Zickus To: Andrew Morton Cc: "Luck, Tony" , linux-kernel@vger.kernel.org, Matthew Garrett , Ingo Molnar , Thomas Gleixner , Peter Zijlstra Subject: Re: pstore: change mutex locking to spin_locks Message-ID: <20110818125849.GZ1972@redhat.com> References: <4e4568eb10165cbab6@agluck-desktop.sc.intel.com> <20110817142225.8645fff7.akpm@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110817142225.8645fff7.akpm@linux-foundation.org> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Aug 17, 2011 at 02:22:25PM -0700, Andrew Morton wrote: > On Fri, 12 Aug 2011 10:54:51 -0700 > "Luck, Tony" wrote: > > > From: Don Zickus > > > > pstore was using mutex locking to protect read/write access to the > > backend plug-ins. This causes problems when pstore is executed in > > an NMI context through panic() -> kmsg_dump(). > > > > This patch changes the mutex to a spin_lock_irqsave then also checks to > > see if we are in an NMI context. If we are in an NMI and can't get the > > lock, just print a message stating that and blow by the locking. > > > > All this is probably a hack around the bigger locking problem but it > > solves my current situation of trying to sleep in an NMI context. > > > > Tested by loading the lkdtm module and executing a HARDLOCKUP which > > will cause the machine to panic inside the nmi handler. > > > > ... > > > > + if (in_nmi()) { > > + is_locked = spin_trylock(&psinfo->buf_lock); > > + if (!is_locked) > > + pr_err("pstore dump routine blocked in NMI, may corrupt error record\n"); > > + } else > > + spin_lock_irqsave(&psinfo->buf_lock, flags); > > oopscount++; > > while (total < kmsg_bytes) { > > dst = psinfo->buf; > > @@ -123,7 +131,11 @@ static void pstore_dump(struct kmsg_dumper *dumper, > > total += l1_cpy + l2_cpy; > > part++; > > } > > - mutex_unlock(&psinfo->buf_mutex); > > + if (in_nmi()) { > > + if (is_locked) > > + spin_unlock(&psinfo->buf_lock); > > + } else > > + spin_unlock_irqrestore(&psinfo->buf_lock, flags); > > } > > It's still bad if lockdep is enabled. See > kernel/lockdep.c:lock_acquire() and lock_release(). They aren't > NMI-safe. > > One approach would be to switch to bit_spin_lock(). Which will break > if/when bit spinlocks get lockdep-enabled, so don't do that. > > A better approach would be to use the underlying spinlock functions > which bypass lockdep, but I cannot immediately locate those amongst > the misama of spinlock interface mess. Probably the raw_spin_* stuff. I think those purposely avoid the lockdep mechanisms. > > This problem of locking-vs-NMIs has been "solved" several times before > but I don't recall any standardized approach being developed. Does > anyone have a favorite implementation to look at? The ones I have looked at perf and apei/ghes, had issues of receiving data in an NMI context and trying to pass it to userspace. This was solved with irq_work_queue. Sometimes one can sprinkle some cmpxchg commands in there to quickly write to registers from a normal context which seems to play nicely with a process in an NMI context wants to write to the same register. But in this case we have a filesystem that can be read/written to from a normal context and also written to from an NMI context. irq_work_queue doesn't apply here and I don't think you can just cmpxchg a PAGE full of data into a firmware storage area. This doesn't even get into the state machine the kernel needs to walk through to store the data (which can easily be interrupted by an NMI). I would be excited in there was a solution that we can copy, but I didn't see any nor would I really expect one in this unusual case. The patch that I provided that Tony reposted is just a lesser of two evils approach. It is still flawed, just not as much as before. The idea was to buy time until we could think of a better approach to solving this. Cheers, Don