* [PATCH -tip 2/2 resend] x86, traps: Drop nmi_reason_lock until it is really needed
@ 2011-03-02 15:32 Cyrill Gorcunov
2011-03-02 15:46 ` Ingo Molnar
0 siblings, 1 reply; 8+ messages in thread
From: Cyrill Gorcunov @ 2011-03-02 15:32 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Don Zickus, Huang, Ying, Maciej W. Rozycki, lkml
At moment we have only BSP apic configured to listen
for external NMIs. So there is no reason for additional
spinlock since only BSP will receive them.
Though we still have UV chips which do enable external NMIs
on all cpus, but since an approach to allow retrieving
NMI reason on BSP only was working pretty fine before --
I assume it still remains valid.
Also it's worth to mention that an initial idea of all this
NMI code-path changes was to make BSP hot-unpluggable but
until all other parts of kernel is prepared for it (which
might consume quite a time to implement) I believe we should
not lock/unlock for nothing.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
arch/x86/kernel/traps.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
Index: linux-2.6.git/arch/x86/kernel/traps.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/traps.c
+++ linux-2.6.git/arch/x86/kernel/traps.c
@@ -84,11 +84,6 @@ EXPORT_SYMBOL_GPL(used_vectors);
static int ignore_nmis;
int unknown_nmi_panic;
-/*
- * Prevent NMI reason port (0x61) being accessed simultaneously, can
- * only be used in NMI handler.
- */
-static DEFINE_RAW_SPINLOCK(nmi_reason_lock);
static inline void conditional_sti(struct pt_regs *regs)
{
@@ -406,9 +401,12 @@ static notrace __kprobes void default_do
if (notify_die(DIE_NMI, "nmi", regs, 0, 2, SIGINT) == NOTIFY_STOP)
return;
- /* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
- raw_spin_lock(&nmi_reason_lock);
- reason = get_nmi_reason();
+ /*
+ * Only BSP is configured to listen and handle external NMIs.
+ * Note this implicitly orders a call to the get_nmi_reason.
+ */
+ if (!smp_processor_id())
+ reason = get_nmi_reason();
if (reason & NMI_REASON_MASK) {
if (reason & NMI_REASON_SERR)
@@ -422,10 +420,8 @@ static notrace __kprobes void default_do
*/
reassert_nmi();
#endif
- raw_spin_unlock(&nmi_reason_lock);
return;
}
- raw_spin_unlock(&nmi_reason_lock);
unknown_nmi_error(reason, regs);
}
--
Cyrill
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH -tip 2/2 resend] x86, traps: Drop nmi_reason_lock until it is really needed
2011-03-02 15:32 [PATCH -tip 2/2 resend] x86, traps: Drop nmi_reason_lock until it is really needed Cyrill Gorcunov
@ 2011-03-02 15:46 ` Ingo Molnar
2011-03-02 15:55 ` Cyrill Gorcunov
0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2011-03-02 15:46 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: Don Zickus, Huang, Ying, Maciej W. Rozycki, lkml
* Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> At moment we have only BSP apic configured to listen
> for external NMIs. So there is no reason for additional
> spinlock since only BSP will receive them.
>
> Though we still have UV chips which do enable external NMIs
> on all cpus, but since an approach to allow retrieving
> NMI reason on BSP only was working pretty fine before --
> I assume it still remains valid.
I'm not sure I get the point here: we might get NMIs on non-BSP on UV
systems ... so we want to remove the spinlock?
If UV systems can get NMIs on any CPU then the lock is needed.
It might have worked before - but UV systems are rare and relatively
new - plus the race window is small, so it might not have been triggered
in practice.
> Also it's worth to mention that an initial idea of all this
> NMI code-path changes was to make BSP hot-unpluggable but
> until all other parts of kernel is prepared for it (which
> might consume quite a time to implement) I believe we should
> not lock/unlock for nothing.
That would be another argument in favor of keeping the lock, right?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH -tip 2/2 resend] x86, traps: Drop nmi_reason_lock until it is really needed
2011-03-02 15:46 ` Ingo Molnar
@ 2011-03-02 15:55 ` Cyrill Gorcunov
2011-03-02 16:03 ` Ingo Molnar
0 siblings, 1 reply; 8+ messages in thread
From: Cyrill Gorcunov @ 2011-03-02 15:55 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Don Zickus, Huang, Ying, Maciej W. Rozycki, lkml
On 03/02/2011 06:46 PM, Ingo Molnar wrote:
>
> * Cyrill Gorcunov <gorcunov@openvz.org> wrote:
>
>> At moment we have only BSP apic configured to listen
>> for external NMIs. So there is no reason for additional
>> spinlock since only BSP will receive them.
>>
>> Though we still have UV chips which do enable external NMIs
>> on all cpus, but since an approach to allow retrieving
>> NMI reason on BSP only was working pretty fine before --
>> I assume it still remains valid.
>
> I'm not sure I get the point here: we might get NMIs on non-BSP on UV
> systems ... so we want to remove the spinlock?
>
> If UV systems can get NMIs on any CPU then the lock is needed.
>
> It might have worked before - but UV systems are rare and relatively
> new - plus the race window is small, so it might not have been triggered
> in practice.
Well, it is incomplete anyway. As far as I can tell even ordering such
NMIs with spinlock would not make situation better 'cause other cpu might
obtain unknown nmi (ie two or more cpu's gets NMI then handing started on
first found that it was say MCE error, handle it, unlock spinlock and then
the second cpu gets this nmi (the reason for which was already handled by
first cpu) and sees unknown NMI. So this lock might simply hiding a bug.
Of course I might be missing something.
>
>> Also it's worth to mention that an initial idea of all this
>> NMI code-path changes was to make BSP hot-unpluggable but
>> until all other parts of kernel is prepared for it (which
>> might consume quite a time to implement) I believe we should
>> not lock/unlock for nothing.
>
> That would be another argument in favor of keeping the lock, right?
Yes, but I think this lock should be the last thing which is introduced,
after all other parts of kernel are ready for bsp unplug.
>
> Thanks,
>
> Ingo
--
Cyrill
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH -tip 2/2 resend] x86, traps: Drop nmi_reason_lock until it is really needed
2011-03-02 15:55 ` Cyrill Gorcunov
@ 2011-03-02 16:03 ` Ingo Molnar
2011-03-02 16:13 ` Cyrill Gorcunov
0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2011-03-02 16:03 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: Don Zickus, Huang, Ying, Maciej W. Rozycki, lkml
* Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> On 03/02/2011 06:46 PM, Ingo Molnar wrote:
> >
> > * Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> >
> >> At moment we have only BSP apic configured to listen
> >> for external NMIs. So there is no reason for additional
> >> spinlock since only BSP will receive them.
> >>
> >> Though we still have UV chips which do enable external NMIs
> >> on all cpus, but since an approach to allow retrieving
> >> NMI reason on BSP only was working pretty fine before --
> >> I assume it still remains valid.
> >
> > I'm not sure I get the point here: we might get NMIs on non-BSP on UV
> > systems ... so we want to remove the spinlock?
> >
> > If UV systems can get NMIs on any CPU then the lock is needed.
> >
> > It might have worked before - but UV systems are rare and relatively
> > new - plus the race window is small, so it might not have been triggered
> > in practice.
>
> Well, it is incomplete anyway. As far as I can tell even ordering such
> NMIs with spinlock would not make situation better 'cause other cpu might
> obtain unknown nmi (ie two or more cpu's gets NMI then handing started on
> first found that it was say MCE error, handle it, unlock spinlock and then
> the second cpu gets this nmi (the reason for which was already handled by
> first cpu) and sees unknown NMI. So this lock might simply hiding a bug.
Well, the lock serializes the read-out of the 'NMI reason' port, the handling of
whatever known reason and then the reassertion of the NMI (on 32-bit).
EDAC has a callback in pci_serr_error() - and this lock serializes that. So we
cannot just remove a lock like that, if there's any chance of parallel execution on
multiple CPUs.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH -tip 2/2 resend] x86, traps: Drop nmi_reason_lock until it is really needed
2011-03-02 16:03 ` Ingo Molnar
@ 2011-03-02 16:13 ` Cyrill Gorcunov
2011-03-02 18:40 ` Don Zickus
0 siblings, 1 reply; 8+ messages in thread
From: Cyrill Gorcunov @ 2011-03-02 16:13 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Don Zickus, Huang, Ying, Maciej W. Rozycki, lkml
On 03/02/2011 07:03 PM, Ingo Molnar wrote:
...
>
> Well, the lock serializes the read-out of the 'NMI reason' port, the handling of
> whatever known reason and then the reassertion of the NMI (on 32-bit).
>
> EDAC has a callback in pci_serr_error() - and this lock serializes that. So we
> cannot just remove a lock like that, if there's any chance of parallel execution on
> multiple CPUs.
>
> Thanks,
>
> Ingo
OK, probably we need some UV person CC'ed (not sure whom) just to explain the
reason for such nmi-listening model. Meanwhile -- lets drop my patch.
--
Cyrill
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH -tip 2/2 resend] x86, traps: Drop nmi_reason_lock until it is really needed
2011-03-02 16:13 ` Cyrill Gorcunov
@ 2011-03-02 18:40 ` Don Zickus
2011-03-02 19:14 ` Cyrill Gorcunov
0 siblings, 1 reply; 8+ messages in thread
From: Don Zickus @ 2011-03-02 18:40 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: Ingo Molnar, Huang, Ying, Maciej W. Rozycki, lkml
On Wed, Mar 02, 2011 at 07:13:42PM +0300, Cyrill Gorcunov wrote:
> On 03/02/2011 07:03 PM, Ingo Molnar wrote:
> ...
> >
> > Well, the lock serializes the read-out of the 'NMI reason' port, the handling of
> > whatever known reason and then the reassertion of the NMI (on 32-bit).
> >
> > EDAC has a callback in pci_serr_error() - and this lock serializes that. So we
> > cannot just remove a lock like that, if there's any chance of parallel execution on
> > multiple CPUs.
> >
> > Thanks,
> >
> > Ingo
>
> OK, probably we need some UV person CC'ed (not sure whom) just to explain the
> reason for such nmi-listening model. Meanwhile -- lets drop my patch.
It's for debugging reasons. When their huge machine deadlocks, they
wanted an easy mechanism to dump the cpu stacks. That mechanism was an
nmi button. The problem was the button would only dump the first cpu. By
opening up the other cpus to accept external nmis, they could dump all the
cpus.
Now this spinlock doesn't affect them, because they registered an nmi
handler to catch it and dump their stack (I modified the code to use
DIE_NMIUNKNOWN instead of DIE_NMI to avoid conflict with the
nmi_watchdog). But I don't know what the affect is, if that spinlock is
not there (I sent a private email to SGI inquiring, their guy wasn't
around this week).
Personally I am indifferent to this patch. I don't have any problems with
the code the way it is now, but can understand what you mean having stuff
lying around as 'dead code'. I had thought Intel would have pushed more
patches upstream to remove the BSP lock-in by now.
Cheers,
Don
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH -tip 2/2 resend] x86, traps: Drop nmi_reason_lock until it is really needed
2011-03-02 18:40 ` Don Zickus
@ 2011-03-02 19:14 ` Cyrill Gorcunov
2011-03-02 19:46 ` Don Zickus
0 siblings, 1 reply; 8+ messages in thread
From: Cyrill Gorcunov @ 2011-03-02 19:14 UTC (permalink / raw)
To: Don Zickus; +Cc: Ingo Molnar, Huang, Ying, Maciej W. Rozycki, lkml
On 03/02/2011 09:40 PM, Don Zickus wrote:
> On Wed, Mar 02, 2011 at 07:13:42PM +0300, Cyrill Gorcunov wrote:
>> On 03/02/2011 07:03 PM, Ingo Molnar wrote:
>> ...
>>>
>>> Well, the lock serializes the read-out of the 'NMI reason' port, the handling of
>>> whatever known reason and then the reassertion of the NMI (on 32-bit).
>>>
>>> EDAC has a callback in pci_serr_error() - and this lock serializes that. So we
>>> cannot just remove a lock like that, if there's any chance of parallel execution on
>>> multiple CPUs.
>>>
>>> Thanks,
>>>
>>> Ingo
>>
>> OK, probably we need some UV person CC'ed (not sure whom) just to explain the
>> reason for such nmi-listening model. Meanwhile -- lets drop my patch.
>
> It's for debugging reasons. When their huge machine deadlocks, they
> wanted an easy mechanism to dump the cpu stacks. That mechanism was an
> nmi button. The problem was the button would only dump the first cpu. By
> opening up the other cpus to accept external nmis, they could dump all the
> cpus.
Yeah, thanks Don, just noted that (actually the former commit log
/78c06176466cbd1b3f0f67709d3023c40dbebcbd/ didn't mention that x86
masks only LVT1).
>
> Now this spinlock doesn't affect them, because they registered an nmi
> handler to catch it and dump their stack (I modified the code to use
> DIE_NMIUNKNOWN instead of DIE_NMI to avoid conflict with the
> nmi_watchdog). But I don't know what the affect is, if that spinlock is
> not there (I sent a private email to SGI inquiring, their guy wasn't
> around this week).
Don, do you know -- was new nmi-watchdog system tested with UV machine
somewhere?
>
> Personally I am indifferent to this patch. I don't have any problems with
> the code the way it is now, but can understand what you mean having stuff
> lying around as 'dead code'. I had thought Intel would have pushed more
> patches upstream to remove the BSP lock-in by now.
>
> Cheers,
> Don
>
--
Cyrill
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH -tip 2/2 resend] x86, traps: Drop nmi_reason_lock until it is really needed
2011-03-02 19:14 ` Cyrill Gorcunov
@ 2011-03-02 19:46 ` Don Zickus
0 siblings, 0 replies; 8+ messages in thread
From: Don Zickus @ 2011-03-02 19:46 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: Ingo Molnar, Huang, Ying, Maciej W. Rozycki, lkml
On Wed, Mar 02, 2011 at 10:14:08PM +0300, Cyrill Gorcunov wrote:
> >
> > Now this spinlock doesn't affect them, because they registered an nmi
> > handler to catch it and dump their stack (I modified the code to use
> > DIE_NMIUNKNOWN instead of DIE_NMI to avoid conflict with the
> > nmi_watchdog). But I don't know what the affect is, if that spinlock is
> > not there (I sent a private email to SGI inquiring, their guy wasn't
> > around this week).
>
> Don, do you know -- was new nmi-watchdog system tested with UV machine
> somewhere?
They are testing it now, I am working through the issues with them.
Cheers,
Don
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2011-03-02 19:46 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-03-02 15:32 [PATCH -tip 2/2 resend] x86, traps: Drop nmi_reason_lock until it is really needed Cyrill Gorcunov
2011-03-02 15:46 ` Ingo Molnar
2011-03-02 15:55 ` Cyrill Gorcunov
2011-03-02 16:03 ` Ingo Molnar
2011-03-02 16:13 ` Cyrill Gorcunov
2011-03-02 18:40 ` Don Zickus
2011-03-02 19:14 ` Cyrill Gorcunov
2011-03-02 19:46 ` Don Zickus
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox