From: Vivek Goyal <vgoyal@redhat.com>
To: Alok Kataria <akataria@vmware.com>
Cc: jeremy@xensource.com,
"kexec@lists.infradead.org" <kexec@lists.infradead.org>,
the arch/x86 maintainers <x86@kernel.org>,
Daniel Hecht <dhecht@vmware.com>,
LKML <linux-kernel@vger.kernel.org>,
Haren Myneni <hbabu@us.ibm.com>,
"Eric W. Biederman" <ebiederm@xmission.com>
Subject: Re: [RFC PATCH] Bug during kexec...not all cpus are stopped
Date: Mon, 11 Oct 2010 17:39:01 -0400 [thread overview]
Message-ID: <20101011213901.GR12743@redhat.com> (raw)
In-Reply-To: <1286826083.1372.15.camel@ank32.eng.vmware.com>
On Mon, Oct 11, 2010 at 12:41:23PM -0700, Alok Kataria wrote:
> Hi Eric,
>
> Thanks for taking a look.
>
> On Mon, 2010-10-11 at 11:07 -0700, Eric W. Biederman wrote:
> > Alok Kataria <akataria@vmware.com> writes:
> >
> > > Copying a few more maintainers on the thread...
> > > On Fri, 2010-10-08 at 13:34 -0700, Alok Kataria wrote:
> > >> Before starting the new kernel kexec calls machine_shutdown to stop all
> > >> the cpus, which internally calls native_smp_send_stop. AFAIU, kexec
> > >> expects that all the cpus are now halted after that call returns.
> > >> Now, looking at the code for native_smp_send_stop, it assumes that all
> > >> the processors have processed the REBOOT ipi in 1 second after the IPI
> > >> was sent.
> > >> native_smp_send_stop()
> > >> ---------------------------------------------------------
> > >> apic->send_IPI_allbutself(REBOOT_VECTOR);
> > >>
> > >> /* Don't wait longer than a second */
> > >> wait = USEC_PER_SEC;
> > >> while (num_online_cpus() > 1 && wait--)
> > >> udelay(1);
> > >> ---------------------------------------------------------
> > >>
> > >> It just returns after that 1 second irrespective of whether all cpus
> > >> were halted or not. This brings up a issue in the kexec case, since we
> > >> can have the BSP starting the new kernel and AP's still processing the
> > >> REBOOT IPI simultaneously.
> > >>
> > >> Many distribution kernels use kexec to load the newly installed kernel
> > >> during the installation phase, in virtualized environment with the host
> > >> heavily overcommitted, we have seen some instances when vcpu fails to
> > >> process the IPI in the allotted 1 sec and as a result the AP's end up
> > >> accessing uninitialized state (the BSP has already gone ahead with
> > >> setting up the new state) and causing GPF's.
> > >>
> > >> IMO, kexec expects machine_shutdown to return only after all cpus are
> > >> stopped.
> > >>
> > >> The patch below should fix the issue, comments ??
> > >>
> > >> --
> > >> machine_shutdown now takes a parameter "wait", if it is true, it waits
> > >> until all the cpus are halted. All the callers except kexec still
> > >> fallback to the earlier version of the shutdown call, where it just
> > >> waited for max 1 sec before returning.
> >
> > The approach seems reasonable. However on every path except for the
> > panic path I would wait for all of the cpus to stop, as that is what
> > those code paths are expecting. Until last year smp_send_stop always
> > waited until all of the cpus stopped. So I think changing
> > machine_shutdown should not need to happen.
> >
> > This patch cannot be used as is because it changes the parameters to
> > smp_send_stop() and there are more than just x86 implementations out
> > there.
> >
> > Let me propose a slightly different tactic. We need to separate
> > the panic path from the normal path to avoid confusion. At the
> > generic level smp_send_stop is exclusively used for the panic
> > path. So we should keep that, and rename the x86 non-panic path
> > function something else.
> >
> > Can you rename the x86 smp_send_stop function say stop_all_other_cpus().
> >
> > At which point we could implement smp_send_stop as:
> > stop_all_other_cpus(0);
> >
> > And everywhere else would call stop_all_other_cpus(1) waiting for
> > the cpus to actually stop.
>
> Done, I have added a stop_other_cpus function which calls
> smp_ops.stop_other_cpus(1)
>
> >
> > I really think we want to wait for the cpus to stop in all cases except
> > for panic/kdump where we expect things to be broken and we are doing
> > our best to make things work anyway.
> Now it does, except in the panic case.
>
> Jeremy, I have modified the xen_reboot bits too so that it now waits
> until all cpus are actually stopped, please take a look.
>
> --
>
> x86 smp_ops now has a new op, stop_other_cpus which takes a parameter "wait"
> this allows the caller to specify if it wants to stop untill all the cpus
> have processed the stop IPI. This is required specifically for the kexec case
> where we should wait for all the cpus to be stopped before starting the new
> kernel.
> We now wait for the cpus to stop in all cases except for panic/kdump where
> we expect things to be broken and we are doing our best to make things
> work anyway.
I don't think that kdump path uses smp_send_stop().
IIUC, on x86, we directly send NMI to other cpus.
native_machine_crash_shutdown()
kdump_nmi_shootdown_cpus()
nmi_shootdown_cpus()
smp_send_nmi_allbutself
apic->send_IPI_allbutself(NMI_VECTOR);
So above description should be limited to only panic() path.
On a side note, I am wondering why panic() and kdump path can't share the
shutdown routine.
Vivek
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
WARNING: multiple messages have this Message-ID (diff)
From: Vivek Goyal <vgoyal@redhat.com>
To: Alok Kataria <akataria@vmware.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
"kexec@lists.infradead.org" <kexec@lists.infradead.org>,
Haren Myneni <hbabu@us.ibm.com>,
the arch/x86 maintainers <x86@kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Daniel Hecht <dhecht@vmware.com>,
jeremy@xensource.com
Subject: Re: [RFC PATCH] Bug during kexec...not all cpus are stopped
Date: Mon, 11 Oct 2010 17:39:01 -0400 [thread overview]
Message-ID: <20101011213901.GR12743@redhat.com> (raw)
In-Reply-To: <1286826083.1372.15.camel@ank32.eng.vmware.com>
On Mon, Oct 11, 2010 at 12:41:23PM -0700, Alok Kataria wrote:
> Hi Eric,
>
> Thanks for taking a look.
>
> On Mon, 2010-10-11 at 11:07 -0700, Eric W. Biederman wrote:
> > Alok Kataria <akataria@vmware.com> writes:
> >
> > > Copying a few more maintainers on the thread...
> > > On Fri, 2010-10-08 at 13:34 -0700, Alok Kataria wrote:
> > >> Before starting the new kernel kexec calls machine_shutdown to stop all
> > >> the cpus, which internally calls native_smp_send_stop. AFAIU, kexec
> > >> expects that all the cpus are now halted after that call returns.
> > >> Now, looking at the code for native_smp_send_stop, it assumes that all
> > >> the processors have processed the REBOOT ipi in 1 second after the IPI
> > >> was sent.
> > >> native_smp_send_stop()
> > >> ---------------------------------------------------------
> > >> apic->send_IPI_allbutself(REBOOT_VECTOR);
> > >>
> > >> /* Don't wait longer than a second */
> > >> wait = USEC_PER_SEC;
> > >> while (num_online_cpus() > 1 && wait--)
> > >> udelay(1);
> > >> ---------------------------------------------------------
> > >>
> > >> It just returns after that 1 second irrespective of whether all cpus
> > >> were halted or not. This brings up a issue in the kexec case, since we
> > >> can have the BSP starting the new kernel and AP's still processing the
> > >> REBOOT IPI simultaneously.
> > >>
> > >> Many distribution kernels use kexec to load the newly installed kernel
> > >> during the installation phase, in virtualized environment with the host
> > >> heavily overcommitted, we have seen some instances when vcpu fails to
> > >> process the IPI in the allotted 1 sec and as a result the AP's end up
> > >> accessing uninitialized state (the BSP has already gone ahead with
> > >> setting up the new state) and causing GPF's.
> > >>
> > >> IMO, kexec expects machine_shutdown to return only after all cpus are
> > >> stopped.
> > >>
> > >> The patch below should fix the issue, comments ??
> > >>
> > >> --
> > >> machine_shutdown now takes a parameter "wait", if it is true, it waits
> > >> until all the cpus are halted. All the callers except kexec still
> > >> fallback to the earlier version of the shutdown call, where it just
> > >> waited for max 1 sec before returning.
> >
> > The approach seems reasonable. However on every path except for the
> > panic path I would wait for all of the cpus to stop, as that is what
> > those code paths are expecting. Until last year smp_send_stop always
> > waited until all of the cpus stopped. So I think changing
> > machine_shutdown should not need to happen.
> >
> > This patch cannot be used as is because it changes the parameters to
> > smp_send_stop() and there are more than just x86 implementations out
> > there.
> >
> > Let me propose a slightly different tactic. We need to separate
> > the panic path from the normal path to avoid confusion. At the
> > generic level smp_send_stop is exclusively used for the panic
> > path. So we should keep that, and rename the x86 non-panic path
> > function something else.
> >
> > Can you rename the x86 smp_send_stop function say stop_all_other_cpus().
> >
> > At which point we could implement smp_send_stop as:
> > stop_all_other_cpus(0);
> >
> > And everywhere else would call stop_all_other_cpus(1) waiting for
> > the cpus to actually stop.
>
> Done, I have added a stop_other_cpus function which calls
> smp_ops.stop_other_cpus(1)
>
> >
> > I really think we want to wait for the cpus to stop in all cases except
> > for panic/kdump where we expect things to be broken and we are doing
> > our best to make things work anyway.
> Now it does, except in the panic case.
>
> Jeremy, I have modified the xen_reboot bits too so that it now waits
> until all cpus are actually stopped, please take a look.
>
> --
>
> x86 smp_ops now has a new op, stop_other_cpus which takes a parameter "wait"
> this allows the caller to specify if it wants to stop untill all the cpus
> have processed the stop IPI. This is required specifically for the kexec case
> where we should wait for all the cpus to be stopped before starting the new
> kernel.
> We now wait for the cpus to stop in all cases except for panic/kdump where
> we expect things to be broken and we are doing our best to make things
> work anyway.
I don't think that kdump path uses smp_send_stop().
IIUC, on x86, we directly send NMI to other cpus.
native_machine_crash_shutdown()
kdump_nmi_shootdown_cpus()
nmi_shootdown_cpus()
smp_send_nmi_allbutself
apic->send_IPI_allbutself(NMI_VECTOR);
So above description should be limited to only panic() path.
On a side note, I am wondering why panic() and kdump path can't share the
shutdown routine.
Vivek
next prev parent reply other threads:[~2010-10-11 21:39 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-10-08 20:34 [RFC PATCH] Bug during kexec...not all cpus are stopped Alok Kataria
2010-10-08 20:34 ` Alok Kataria
2010-10-11 17:09 ` Alok Kataria
2010-10-11 17:09 ` Alok Kataria
2010-10-11 18:07 ` Eric W. Biederman
2010-10-11 18:07 ` Eric W. Biederman
2010-10-11 19:41 ` Alok Kataria
2010-10-11 19:41 ` Alok Kataria
2010-10-11 21:17 ` Eric W. Biederman
2010-10-11 21:17 ` Eric W. Biederman
2010-10-11 21:37 ` Alok Kataria
2010-10-11 21:37 ` Alok Kataria
2010-10-21 21:40 ` [tip:x86/urgent] x86, kexec: Make sure to stop all CPUs before exiting the kernel tip-bot for Alok Kataria
2010-10-11 21:39 ` Vivek Goyal [this message]
2010-10-11 21:39 ` [RFC PATCH] Bug during kexec...not all cpus are stopped Vivek Goyal
2010-10-11 21:47 ` Alok Kataria
2010-10-11 21:47 ` Alok Kataria
2010-10-11 22:10 ` Eric W. Biederman
2010-10-11 22:10 ` Eric W. Biederman
2010-10-12 22:17 ` Vivek Goyal
2010-10-12 22:17 ` Vivek Goyal
2010-10-13 0:23 ` Alok Kataria
2010-10-13 0:23 ` Alok Kataria
2010-10-21 19:09 ` Alok Kataria
2010-10-21 19:09 ` Alok Kataria
2010-10-21 20:26 ` H. Peter Anvin
2010-10-21 20:26 ` H. Peter Anvin
2010-10-21 21:10 ` Alok Kataria
2010-10-21 21:10 ` Alok Kataria
2010-10-21 21:24 ` H. Peter Anvin
2010-10-21 21:24 ` H. Peter Anvin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101011213901.GR12743@redhat.com \
--to=vgoyal@redhat.com \
--cc=akataria@vmware.com \
--cc=dhecht@vmware.com \
--cc=ebiederm@xmission.com \
--cc=hbabu@us.ibm.com \
--cc=jeremy@xensource.com \
--cc=kexec@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.