From: ebiederm@xmission.com (Eric W. Biederman)
To: Vivek Goyal <vgoyal@redhat.com>
Cc: "Don Zickus" <dzickus@redhat.com>,
	akpm@linux-foundation.org, linux-tip-commits@vger.kernel.org,
	"Yinghai Lu" <yinghai@kernel.org>,
	"Fernando Luis Vázquez Cao" <fernando@oss.ntt.co.jp>,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
	mingo@redhat.com, "H. Peter Anvin" <hpa@zytor.com>,
	tglx@linutronix.de, torvalds@linux-foundation.org, mingo@elte.hu
Subject: Re: [PATCH 1/2] boot: ignore early NMIs
Date: Mon, 12 Mar 2012 12:02:06 -0700
Message-ID: <m1ipi9prrl.fsf@fess.ebiederm.org>
In-Reply-To: <20120312133619.GB17288@redhat.com> (Vivek Goyal's message of "Mon, 12 Mar 2012 09:36:19 -0400")

Vivek Goyal <vgoyal@redhat.com> writes:

> On Mon, Mar 12, 2012 at 03:14:20PM +0900, Fernando Luis Vázquez Cao wrote:
>
> [..]
>> The thing is that we want to avoid playing with hardware in the kdump
>> reboot path when we can avoid it, the premise being that it cannot
>> be accessed without risking a lockup or worse (as the deadlock accessing
>> the I/O APIC showed).
>
> I think there needs to be a limit to being paranoid. On one hand people
> want to run panic notifiers and all the kmsg_dump() hooks in the panic
> path, and on the other hand we are afraid of even disabling the LAPIC.

And the kmsg_dump code and the panic notifiers aren't being run on the
kdump path.  Having seen some of their failure modes being patched up
recently (adding and removing sysfs files!!!!), I'm very comfortable
with this level of paranoia.

It has been proven time and time again that the more you do in the
failing kernel, the greater the likelihood that you never get your
failure information out.

> I personally think that disabling the LAPIC is a reasonably practical
> solution to the problem until and unless somebody shows that it
> deadlocks easily.

Disabling NMI generation in the LAPIC is fine, and for the short term
I don't even have a problem with disabling the entire LAPIC, as all of
our platforms seem to have code for completely reprogramming it.

At the same time, there have been cases, such as an i8259 routed
through the ExtInt pin of the LAPIC, for which we have never been given
programming information; if we want those to keep working, we should
avoid touching them.

Furthermore, we have two reported cases of people experiencing real
NMIs on the kdump path.  So if we are going to disable NMIs
unequivocally, we also have to assume the presence of the CMOS NMI
disable bit.

Given the variety of x86 hardware today, and the growing variety of
x86 hardware tomorrow, we are going to keep fixing this until we can
actually handle the NMIs.  Hardware designers are unfortunately
creative enough that we aren't going to think of everything.  Given
that it has taken us almost a decade to realize that there actually is
a real-world problem, I'm not too keen on a solution that is just good
enough to fix a small problem.

I would love it if x86 had an architectural NMI off switch, but with
Intel pushing EFI and the removal of the CMOS clock, x86 no longer has
an always-available NMI off switch.

Furthermore, handling NMIs is not hard; it is just a little tricky to
test.

Eric

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
