* [PATCH] ia64: reset console_loglevel so INIT output always goes to console
@ 2005-01-07 23:34 Bjorn Helgaas
2005-01-08 1:32 ` [PATCH] ia64: reset console_loglevel so INIT output always goes to console Keith Owens
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: Bjorn Helgaas @ 2005-01-07 23:34 UTC
To: linux-ia64
Reset console_loglevel early in INIT handler. Otherwise, if
it has been turned down (e.g., with "dmesg -n1"), the user may
see no effect at all from issuing an INIT. We're never going
to run any more user code, so there won't be any opportunity for
anything to collect the output from the dmesg buffer.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
===== arch/ia64/kernel/mca.c 1.71 vs edited =====
--- 1.71/arch/ia64/kernel/mca.c 2004-11-11 11:04:30 -07:00
+++ edited/arch/ia64/kernel/mca.c 2005-01-07 16:22:54 -07:00
@@ -1133,6 +1133,7 @@
 	pal_min_state_area_t *ms;
 
 	oops_in_progress = 1;	/* avoid deadlock in printk, but it makes recovery dodgy */
+	console_loglevel = 15;	/* make sure printks make it to console */
 
 	printk(KERN_INFO "Entered OS INIT handler. PSP=%lx\n",
 		ia64_sal_to_os_handoff_state.proc_state_param);
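To illustrate why the one-line change matters, here is a small userspace model of the console filter: a message reaches the console only when its level is numerically lower than console_loglevel. This is a sketch, not the kernel's printk code; the helper reaches_console() and the LOGLEVEL_* macros are names assumed for the example.

```c
/* Userspace model only; the real variable lives in the kernel's printk code. */
static int console_loglevel = 7;

#define LOGLEVEL_EMERG 0	/* numeric value of KERN_EMERG */
#define LOGLEVEL_INFO  6	/* numeric value of KERN_INFO */

/*
 * Hypothetical helper: returns 1 if a message at `level` would be
 * written to the console, 0 if it would only land in the log buffer.
 */
static int reaches_console(int level)
{
	return level < console_loglevel;
}
```

With console_loglevel set to 1 (the effect of "dmesg -n1"), reaches_console(LOGLEVEL_INFO) is 0, so the KERN_INFO message in the INIT handler stays in the buffer. With console_loglevel set to 15, every level passes the check.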
* Re: [PATCH] ia64: reset console_loglevel so INIT output always goes to console
From: Keith Owens @ 2005-01-08 1:32 UTC
To: linux-ia64
On Fri, 07 Jan 2005 16:34:31 -0700,
Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
>Reset console_loglevel early in INIT handler. Otherwise, if
>it has been turned down (e.g., with "dmesg -n1"), the user may
>see no effect at all from issuing an INIT. We're never going
>to run any more user code, so there won't be any opportunity for
>anything to collect the output from the dmesg buffer.
That is fine for the current code base. It will have to be revisited
when we get per cpu INIT stacks and the slave INIT handler actually
does something.
We are slowly but steadily moving to recovery from some MCA events. If
one of the cpus is spinning disabled when an MCA occurs then the
disabled cpu will get a slave INIT event as part of the MCA rendezvous.
If the MCA is recoverable then the slave INIT event will also be
recoverable and will eventually return to user space.
That change is still some way off, but bear it in mind when changing
the existing code.
* Re: [PATCH] ia64: reset console_loglevel so INIT output always goes to console
From: Russ Anderson @ 2005-01-09 16:24 UTC
To: linux-ia64
Bjorn Helgaas wrote:
>
> Reset console_loglevel early in INIT handler. Otherwise, if
> it has been turned down (e.g., with "dmesg -n1"), the user may
> see no effect at all from issuing an INIT. We're never going
> to run any more user code, so there won't be any opportunity for
> anything to collect the output from the dmesg buffer.
As Keith pointed out, we are working on returning from an INIT.
Would it be reasonable to save the current console_loglevel,
set console_loglevel up to 15, call init_handler_platform(), and
then set console_loglevel back to the original level?
Right now init_handler_platform() never returns. When the code
is improved to return, the console_loglevel will get restored
to the correct level.
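A minimal sketch of the save/restore idea, written as userspace C so it can be compiled on its own. The stand-in init_handler_platform() below takes no arguments and simply returns, which is an assumption for illustration; the real function takes arguments and currently never returns. The names init_handler_sketch and loglevel_seen_by_handler are also hypothetical.

```c
/* Userspace stand-ins so the sketch compiles outside the kernel. */
static int console_loglevel = 1;	/* e.g. after "dmesg -n1" */
static int loglevel_seen_by_handler;	/* records what the handler observed */

/* Simplified stand-in: assumes a future version that returns. */
static void init_handler_platform(void)
{
	loglevel_seen_by_handler = console_loglevel;
}

/* Raise the loglevel for the handler, then restore the caller's setting. */
static void init_handler_sketch(void)
{
	int saved_loglevel = console_loglevel;

	console_loglevel = 15;		/* make sure printks make it to console */
	init_handler_platform();
	console_loglevel = saved_loglevel;	/* reached only if the handler returns */
}
```

While the platform handler never returns, the restore line is dead code and the behavior matches the original patch; it only takes effect once INIT becomes recoverable.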
> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
>
> ===== arch/ia64/kernel/mca.c 1.71 vs edited =====
> --- 1.71/arch/ia64/kernel/mca.c 2004-11-11 11:04:30 -07:00
> +++ edited/arch/ia64/kernel/mca.c 2005-01-07 16:22:54 -07:00
> @@ -1133,6 +1133,7 @@
>  	pal_min_state_area_t *ms;
>  
>  	oops_in_progress = 1;	/* avoid deadlock in printk, but it makes recovery dodgy */
> +	console_loglevel = 15;	/* make sure printks make it to console */
>  
>  	printk(KERN_INFO "Entered OS INIT handler. PSP=%lx\n",
>  		ia64_sal_to_os_handoff_state.proc_state_param);
--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc rja@sgi.com
* Re: [PATCH] ia64: reset console_loglevel so INIT output always goes to console
From: Russ Anderson @ 2005-01-09 17:06 UTC
To: linux-ia64
Keith Owens wrote:
>
> We are slowly but steadily moving to recovery from some MCA events. If
> one of the cpus is spinning disabled when an MCA occurs then the
> disabled cpu will get a slave INIT event as part of the MCA rendezvous.
> If the MCA is recoverable then the slave INIT event will also be
> recoverable and will eventually return to user space.
>
> That change is still some way off, but bear it in mind when changing
> the existing code.
Good points, Keith.
There are a number of changes that will be needed now that MCAs and INITs
are becoming recoverable. A disabled CPU should not receive an INIT
as part of MCA rendezvous. Some of the changes will require changes
in the MCA and SAL specs.
For example, having SAL rendezvous all the CPUs before calling OS_MCA
may have been reasonable when linux lacked the ability to recover from
an MCA. But now that this is changing, the decision to rendezvous CPUs
should get made later, in linux, if it cannot recover from the MCA.
Does it really make sense to rendezvous 512 CPUs just because one
CPU happened to hit a memory uncorrectable in a user application
(and recovers by killing the application and discarding the page)?
Does it still make sense to have only one call into OS_MCA at
a time? Or is it more reasonable to support multiple OS_MCAs
and let the linux MCA code coordinate processing of the OS_MCA,
when needed? As the code progresses, it should be reasonable to move
more of the decision & coordination code further into the
recovery code (or at least not prevent that from happening) so
that, for example, multiple independent MCAs can be recovered
in parallel.
As I said, this will require changes in the MCA & SAL specs.
Some are simply clearing up ambiguities in the specs, as Keith found
in MCA logging of recovered errors. Some will be more fundamental
changes to support better recovery. The code has reached the point
where we need to start making enhancements to those specs.
Thanks,
--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc rja@sgi.com
* Re: [PATCH] ia64: reset console_loglevel so INIT output always goes to console
From: Keith Owens @ 2005-01-09 23:26 UTC
To: linux-ia64
On Sun, 9 Jan 2005 11:06:37 -0600 (CST),
Russ Anderson <rja@sgi.com> wrote:
>For example, having SAL rendezvous all the CPUs before calling OS_MCA
>may have been reasonable when linux lacked the ability to recover from
>an MCA. But now that this is changing, the decision to rendezvous CPUs
>should get made later, in linux, if it cannot recover from the MCA.
>Does it really make sense to rendezvous 512 CPUs just because one
>CPU happened to hit a memory uncorrectable in a user application
>(and recovers by killing the application and discarding the page)?
I do not see any alternative. SAL has no idea if the OS can recover
from a memory MCA or not, that decision has to be made by the OS.
Leaving the rendezvous decision to the OS would significantly
complicate the OS/SAL interface, it requires another SAL call by the
OS, changes to every SAL version and code in the OS to work out if the
current prom supports the SAL change or not. If memory is failing, we
want the other cpus to keep off that physical memory while we work
around the problem and decide if we can recover, so we need to stop the
other cpus anyway.
MCA/INIT events are very rare and, in most cases, the rendezvous is a
standard interrupt which is reasonably fast. I will take simplicity of
OS coding over speed every time for this case. The simplest method is
for SAL to rendezvous all the cpus unless SAL knows unequivocally that
the error will be recovered. If there is any uncertainty about whether
the OS will recover or not, then do the full rendezvous.
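The policy in the paragraph above can be written down as a one-line predicate. This is only a sketch of the rule as stated; the enum, its values, and sal_should_rendezvous() are hypothetical names and do not correspond to anything in the SAL specification or the Linux source.

```c
/* Hypothetical classification of what SAL knows about an error. */
enum recovery_outlook {
	RECOVERY_CERTAIN,	/* SAL knows unequivocally the error will be recovered */
	RECOVERY_UNKNOWN,	/* SAL cannot tell; the OS has to decide */
	RECOVERY_IMPOSSIBLE,	/* the error is known to be fatal */
};

/*
 * Returns 1 if all cpus should be rendezvoused before calling OS_MCA.
 * Anything short of certainty results in the full rendezvous.
 */
static int sal_should_rendezvous(enum recovery_outlook outlook)
{
	return outlook != RECOVERY_CERTAIN;
}
```

The design choice is that the uncertain case falls on the safe side: the rendezvous costs some time on a rare event, whereas skipping it leaves other cpus free to touch failing memory.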
>Does it still make sense to have only one call into OS_MCA at
>a time? Or is it more reasonable to support multiple OS_MCAs
>and let the linux MCA code coordinate processing of the OS_MCA,
>when needed?
I believe that option is already allowed by the SAL specification.
Linux has never supported it because it never had the per cpu
infrastructure.
* Re: [PATCH] ia64: reset console_loglevel so INIT output always goes to console
From: Bjorn Helgaas @ 2005-01-10 4:27 UTC
To: linux-ia64
> As Keith pointed out, we are working on returning from an INIT.
>
> Would it be reasonable to save the current console_loglevel,
> set console_loglevel up to 15, call init_handler_platform(), and
> then set console_loglevel back to the original level?
>
> Right now init_handler_platform() never returns. When the code
> is improved to return, the console_loglevel will get restored
> to the correct level.
Sure. I expect that we will one day be able to continue after
an INIT, and I considered doing the save/restore in the current
patch. But I preferred the simplest possible patch, partly because
this is an issue for distributions, and I'd like to see them
pick this up quickly.
* Re: [PATCH] ia64: reset console_loglevel so INIT output always goes to console
From: Matthias Fouquet-Lapar @ 2005-01-10 7:54 UTC
To: linux-ia64
> Russ Anderson <rja@sgi.com> wrote:
> >For example, having SAL rendezvous all the CPUs before calling OS_MCA
> >may have been reasonable when linux lacked the ability to recover from
> >an MCA. But now that this is changing, the decision to rendezvous CPUs
> >should get made later, in linux, if it cannot recover from the MCA.
> >Does it really make sense to rendezvous 512 CPUs just because one
> >CPU happened to hit a memory uncorrectable in a user application
> >(and recovers by killing the application and discarding the page)?
>
> I do not see any alternative. SAL has no idea if the OS can recover
> from a memory MCA or not, that decision has to be made by the OS.
> Leaving the rendezvous decision to the OS would significantly
> complicate the OS/SAL interface, it requires another SAL call by the
> OS, changes to every SAL version and code in the OS to work out if the
> current prom supports the SAL change or not. If memory is failing, we
> want the other cpus to keep off that physical memory while we work
> around the problem and decide if we can recover, so we need to stop the
> other cpus anyway.
I agree with Keith. Although it might seem like overkill to
rendezvous all CPUs on large systems, I think it greatly enhances the
chances of a clean recovery. Based on my error handling experience
on MIPS based systems with similar CPU counts, you still might have external
interventions etc.
Getting the system into a known state for the rare recovery case is
certainly a big advantage and avoids a lot of corner cases which will
be extremely hard to test.
Thanks
Matthias Fouquet-Lapar Core Platform Software mfl@sgi.com VNET 521-8213
Principal Engineer Silicon Graphics Home Office (+33) 1 3047 4127