[PATCH] [2.6] PPC64: log firmware errors during boot.

* [PATCH] [2.6] PPC64: log firmware errors during boot.
@ 2004-06-30  0:10 linas
  2004-06-30 10:55 ` Paul Mackerras
  0 siblings, 1 reply; 16+ messages in thread
From: linas @ 2004-06-30  0:10 UTC (permalink / raw)
  To: paulus, paulus; +Cc: linuxppc64-dev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 846 bytes --]


Paul,

Here's the fourth patch ... please apply.

Firmware can report errors at any time, and not atypically during boot.
However, these reports were being discarded until the rtasd comes up,
which occurs fairly late in the boot cycle.  As a result, firmware
errors during boot were being silently ignored. 

This patch at least gets them printk'ed so that at least they show 
up in boot.msg/syslog.  There are two other logging mechanisms,
nvram and rtas, that I didn't touch because I don't understand 
the repercussions.  In particular, nvram logging isn't enabled
until late in the boot ... but what's the point of nvram logging
if not to catch messages that occurred very early in boot ?? 

Please apply the patch, and discussion welcome on how nvram
logging works/should work ... 

Signed-off-by: Linas Vepstas <linas@linas.org>

--linas




[-- Attachment #2: rtas-log-boot-msgs.patch --]
[-- Type: text/plain, Size: 1421 bytes --]

--- arch/ppc64/kernel/rtasd.c.orig	2004-06-28 15:33:12.000000000 -0500
+++ arch/ppc64/kernel/rtasd.c	2004-06-29 18:51:31.000000000 -0500
@@ -57,6 +57,8 @@ volatile int error_log_cnt = 0;
  */
 static unsigned char logdata[RTAS_ERROR_LOG_MAX];
 
+static int get_eventscan_parms(void);
+		  
 /* To see this info, grep RTAS /var/log/messages and each entry
  * will be collected together with obvious begin/end.
  * There will be a unique identifier on the begin and end lines.
@@ -121,6 +123,9 @@ static int log_rtas_len(char * buf)
 		len += err->extended_log_length;
 	}
 
+	if (rtas_error_log_max == 0) {
+		get_eventscan_parms();
+	}
 	if (len > rtas_error_log_max)
 		len = rtas_error_log_max;
 
@@ -148,7 +153,6 @@ void pSeries_log_error(char *buf, unsign
 	int len = 0;
 
 	DEBUG("logging event\n");
-
 	if (buf == NULL)
 		return;
 
@@ -171,6 +175,13 @@ void pSeries_log_error(char *buf, unsign
 	if (!no_more_logging && !(err_type & ERR_FLAG_BOOT))
 		nvram_write_error_log(buf, len, err_type);
 
+	/* rtas errors can occur during boot, and we do want to capture
+	 * those somewhere, even if nvram isn't ready (why not?), and even 
+	 * if rtasd isn't ready. Put them into the boot log, at least.  */
+	if ((err_type & ERR_TYPE_MASK) == ERR_TYPE_RTAS_LOG) {
+		printk_log_rtas(buf, len);
+	}
+	
 	/* Check to see if we need to or have stopped logging */
 	if (fatal || no_more_logging) {
 		no_more_logging = 1;

* Re: [PATCH] [2.6] PPC64: log firmware errors during boot.
  2004-06-30  0:10 [PATCH] [2.6] PPC64: log firmware errors during boot linas
@ 2004-06-30 10:55 ` Paul Mackerras
  2004-07-01 21:06   ` linas
  2004-07-06 13:41   ` Jake Moilanen
  0 siblings, 2 replies; 16+ messages in thread
From: Paul Mackerras @ 2004-06-30 10:55 UTC (permalink / raw)
  To: linas; +Cc: linuxppc64-dev, linux-kernel

Linas,

> Firmware can report errors at any time, and not atypically during boot.
> However, these reports were being discarded until the rtasd comes up,
> which occurs fairly late in the boot cycle.  As a result, firmware
> errors during boot were being silently ignored. 

As far as I can see the main change is in log_rtas_len, which is
called from pSeries_log_error, which is called from do_event_scan and
rtasd(), and do_event_scan is only called from rtasd().  And
get_eventscan_parms() is already called at the beginning of rtasd().
So I don't see the point of the get_eventscan_parms call in
log_rtas_len.  The other change is also in pSeries_log_error.

What am I missing?

> This patch at least gets them printk'ed so that at least they show 
> up in boot.msg/syslog.  There are two other logging mechanisms,
> nvram and rtas, that I didn't touch because I don't understand 
> the repercussions.  In particular, nvram logging isn't enabled
> until late in the boot ... but what's the point of nvram logging
> if not to catch messages that occurred very early in boot ?? 

Indeed.

As for printk'ing the errors, it is annoying and it seems of somewhat
dubious benefit to me, given that it is just incomprehensible hex
numbers that can go on and on.  There has to be a better way.  Putting
it in nvram seems like a better option to me.  I don't know of any
reason why we can't use nvram quite early on.

Paul.

* Re: [PATCH] [2.6] PPC64: log firmware errors during boot.
  2004-06-30 10:55 ` Paul Mackerras
@ 2004-07-01 21:06   ` linas
  2004-07-02  5:36     ` Greg KH
  2004-07-02 10:44     ` Paul Mackerras
  2004-07-06 13:41   ` Jake Moilanen
  1 sibling, 2 replies; 16+ messages in thread
From: linas @ 2004-07-01 21:06 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc64-dev, linux-kernel

On Wed, Jun 30, 2004 at 08:55:15PM +1000, Paul Mackerras wrote:
> Linas,
> 
> > Firmware can report errors at any time, and not atypically during boot.
> > However, these reports were being discarded until the rtasd comes up,
> > which occurs fairly late in the boot cycle.  As a result, firmware
> > errors during boot were being silently ignored. 
> 
> As far as I can see the main change is in log_rtas_len, which is
> called from pSeries_log_error, which is called from do_event_scan and
> rtasd(), and do_event_scan is only called from rtasd().  And
> get_eventscan_parms() is already called at the beginning of rtasd().

Yes, but rtasd starts up late in the boot process.  Most of the 
"interesting" manipulations with firmware are old history by then,
and thus, any firmware errors encountered during the boot were never 
logged.

> So I don't see the point of the get_eventscan_parms call in
> log_rtas_len.  

If the parms aren't set up, then the rtas_error_log_max is zero,
and, as a result, the message is never logged.  By initializing
rtas_error_log_max to the correct non-zero value, the errors can 
get logged.
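
A minimal, self-contained sketch of the argument above (not the kernel
code): with the limit still at zero, the clamp turns every length into 0
and nothing is logged, while a lazy setup call avoids that.  The names
get_eventscan_parms_stub(), clamp_len_no_init() and clamp_len_lazy_init(),
and the value 1024, are made-up stand-ins for the real
get_eventscan_parms() and the firmware-supplied limit.

```c
#include <assert.h>
#include <stddef.h>

/* 0 means "not yet read from firmware", as in the discussion above. */
static size_t rtas_error_log_max;

/* Hypothetical stand-in for get_eventscan_parms(); the real routine asks
 * firmware for the limit, this one just installs an assumed value. */
static void get_eventscan_parms_stub(void)
{
	rtas_error_log_max = 1024;
}

/* Clamp without lazy setup: before the limit is known, every length
 * collapses to 0, so the caller has nothing left to log. */
static size_t clamp_len_no_init(size_t len)
{
	if (len > rtas_error_log_max)
		len = rtas_error_log_max;
	return len;
}

/* Clamp with lazy setup, mirroring the check the patch adds. */
static size_t clamp_len_lazy_init(size_t len)
{
	if (rtas_error_log_max == 0)
		get_eventscan_parms_stub();
	assert(rtas_error_log_max > 0);
	if (len > rtas_error_log_max)
		len = rtas_error_log_max;
	return len;
}
```

Calling clamp_len_no_init() before any setup returns 0 for any input,
which is the "message is never logged" case; clamp_len_lazy_init()
returns the length unchanged up to the assumed limit.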

> > This patch at least gets them printk'ed so that at least they show 
> > up in boot.msg/syslog.  There are two other logging mechanisms,
> > nvram and rtas, that I didn't touch because I don't understand 
> > the repercussions.  In particular, nvram logging isn't enabled
> > until late in the boot ... but what's the point of nvram logging
> > if not to catch messages that occurred very early in boot ?? 
> 
> Indeed.
> 
> As for printk'ing the errors, it is annoying and it seems of somewhat
> dubious benefit to me, given that it is just incomprehensible hex
> numbers that can go on and on.  There has to be a better way.  

Yes, well, you'll be hard-pressed to find a lover of the hex format 
anywhere.  Let's review the history of the design decisions that
got us to this point.  I think a better solution might then become
evident.

-- Originally, these binary messages from firmware were decoded
in the kernel, and printed out in 'plain english'.  However, there
were problems: 1) the format of the binary kept evolving; I think 
we are now up to version 6.  2) the need for supporting version 6 
and *all* of the earlier versions led to dreaded kernel bloat.
For the current user-space decoder:
# wc *.c *.h
   2207    7056   67959 total

-- So the decision was wisely made to move this all to user-space. 
But what shall the communications link between user-space and kernel be? 
Somebody, somewhere,  I know not who or why, decided that they should 
go into syslog.  And so here we are.

How else could we do this?  I have never had to architect a kernel-to-user
data communications interface, so I don't know what the alternatives 
are.  We could queue them up to some file in /proc, which user-space 
reads.  Or maybe /sys instead ?? Maybe a stunt with sockets? Some 
new device in /dev/ that can be opened, read, closed?  How should 
the user space daemon indicate that it's picked up the message and 
doesn't need it any more?  Write a msg number to a /proc file?  
Maybe each individual message should go in its own file, and user 
space just rm's that file after it's fetched/saved the message.  
I dunno, I think any one of these could be whipped up in a jiffy.
Convincing the user-space to use the interface might be harder.

Pick one. If it can be coded in under a day, I can volunteer to
do that.

> Putting
> it in nvram seems like a better option to me.  I don't know of any
> reason why we can't use nvram quite early on.

Me neither, Jake knows. I thought the whole point of nvram was to not
lose the messages during a crash; the messages are promptly copied out
of nvram once the system is up and stable; nvram is a staging area, 
not a permanent repository.

--linas

* Re: [PATCH] [2.6] PPC64: log firmware errors during boot.
  2004-07-01 21:06   ` linas
@ 2004-07-02  5:36     ` Greg KH
  2004-07-02 10:44     ` Paul Mackerras
  1 sibling, 0 replies; 16+ messages in thread
From: Greg KH @ 2004-07-02  5:36 UTC (permalink / raw)
  To: linas; +Cc: Paul Mackerras, linuxppc64-dev, linux-kernel

On Thu, Jul 01, 2004 at 04:06:14PM -0500, linas@austin.ibm.com wrote:
> How else could we do this?  I have never had to architect a kernel-to-user
> data communications interface, so I don't know what the alternatives
> are.  We could queue them up to some file in /proc, which user-space
> reads. 

No.

> Or maybe /sys instead ??

No.

> Maybe a stunt with sockets?

Yes, use netlink.

> Some new device in /dev/ that can be opened, read, closed?

No.

> How should the user space daemon indicate that it's picked up the
> message and doesn't need it any more?

The kernel doesn't care.

> Write a msg number to a /proc file?

No way.

> Maybe each individual message should go in its own file, and user
> space just rm's that file after it's fetched/saved the message.

Hm, that's a neat idea I don't think I've seen before.  But no :)

> I dunno, I think any one of these could be whipped up in a jiffy.
> Convincing the user-space to use the interface might be harder.

In summary, use syslog or netlink like the whole rest of the kernel
does.  Don't reinvent the wheel again, please.

thanks,

greg k-h

* Re: [PATCH] [2.6] PPC64: log firmware errors during boot.
  2004-07-01 21:06   ` linas
  2004-07-02  5:36     ` Greg KH
@ 2004-07-02 10:44     ` Paul Mackerras
  2004-07-02 14:15       ` Hollis Blanchard
  1 sibling, 1 reply; 16+ messages in thread
From: Paul Mackerras @ 2004-07-02 10:44 UTC (permalink / raw)
  To: linas; +Cc: linuxppc64-dev, linux-kernel

linas@austin.ibm.com writes:

> Yes, but rtasd starts up late in the boot process.  Most of the 
> "interesting" manipulations with firmware are old history by then,
> and thus, any firmware errors encountered during the boot were never 
> logged.

It all makes a lot more sense with the change to set ppc_md.log_error
to pSeries_log_error.  I do wonder why we need a ppc_md function
pointer for that though, given how pSeries-specific the error log
format is.

> If the parms aren't set up, then the rtas_error_log_max is zero,
> and, as a result, the message is never logged.  By initializing
> rtas_error_log_max to the correct non-zero value, the errors can 
> get logged.

This looks to me like the setting of rtas_error_log_max should be done
much earlier, in pSeries_init_early, say.  Shouldn't we be using the
rtas_error_log_max variable in __fetch_rtas_last_error, too, rather
than the constant RTAS_ERROR_LOG_MAX?

> -- So the decision was wisely made to move this all to user-space. 
> But what shall the communications link between user-space and kernel be? 
> Somebody, somewhere,  I know not who or why, decided that they should 
> go into syslog.  And so here we are.

Netlink is the usual solution to this sort of problem.  I think it
would be reasonable to printk RTAS error events with a severity of
fatal and maybe even of error.  Warnings and events should just get
sent to rtasd.

Oh, and it would be useful to have a comment in the code that calls
__fetch_rtas_last_error that says that we are only calling it if the
RTAS call could not perform its function due to a hardware error.  In
other words the -1 return isn't a generic "didn't work" code but more
specifically a "hardware error" code.

Paul.

* Re: [PATCH] [2.6] PPC64: log firmware errors during boot.
  2004-07-02 10:44     ` Paul Mackerras
@ 2004-07-02 14:15       ` Hollis Blanchard
  2004-07-02 16:18         ` Nathan Fontenot
  0 siblings, 1 reply; 16+ messages in thread
From: Hollis Blanchard @ 2004-07-02 14:15 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linas, linuxppc64-dev, linux-kernel

On Jul 2, 2004, at 5:44 AM, Paul Mackerras wrote:
>
> Netlink is the usual solution to this sort of problem.  I think it
> would be reasonable to printk RTAS error events with a severity of
> fatal and maybe even of error.  Warnings and events should just get
> sent to rtasd.

I asked about this before, and was told that there is no way to 
determine the severity of an event without doing full parsing of the 
binary data. I'd be thrilled to be wrong...

-- 
Hollis Blanchard
IBM Linux Technology Center


* Re: [PATCH] [2.6] PPC64: log firmware errors during boot.
  2004-07-02 14:15       ` Hollis Blanchard
@ 2004-07-02 16:18         ` Nathan Fontenot
  2004-07-02 17:29           ` Hollis Blanchard
  0 siblings, 1 reply; 16+ messages in thread
From: Nathan Fontenot @ 2004-07-02 16:18 UTC (permalink / raw)
  To: Hollis Blanchard; +Cc: Paul Mackerras, linas, linuxppc64-dev, linux-kernel

> I asked about this before, and was told that there is no way to
> determine the severity of an event without doing full parsing of the
> binary data. I'd be thrilled to be wrong...
> 

Getting the severity of an RTAS event is possible, and not too 
difficult.  Check out asm-ppc64/rtas.h for a definition of the
RTAS event header (struct rtas_error_log).  All RTAS events have the 
same initial header containing the severity of the event.

Decoding RTAS events beyond the initial header, that gets ugly quick and 
will hopefully never need to be done in the kernel.
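
A rough sketch of reading only the fixed header, under stated
assumptions: it takes the layout of struct rtas_error_log to be an 8-bit
version followed by a 3-bit severity in the top bits of the second byte,
and it uses the severity codes listed in the enum.  Both are assumptions
that should be checked against asm-ppc64/rtas.h; the function and enum
names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed severity codes; verify against asm-ppc64/rtas.h. */
enum sketch_rtas_severity {
	SKETCH_SEV_NO_ERROR   = 0,
	SKETCH_SEV_EVENT      = 1,
	SKETCH_SEV_WARNING    = 2,
	SKETCH_SEV_ERROR_SYNC = 3,
	SKETCH_SEV_ERROR      = 4,
	SKETCH_SEV_FATAL      = 5,
};

/* Extract the severity from the raw event buffer without decoding
 * anything past the first two header bytes.
 * Assumed layout: byte 0 = version, byte 1 bits 7..5 = severity. */
static unsigned int sketch_rtas_severity_of(const uint8_t *buf, size_t len)
{
	assert(buf != NULL && len >= 2);
	return (buf[1] >> 5) & 0x7u;
}
```

For example, a buffer starting with the bytes 0x06 0xA0 would yield 5
(fatal under the assumed codes), and 0x06 0x40 would yield 2 (warning).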

-- 
Nathan Fontenot

* Re: [PATCH] [2.6] PPC64: log firmware errors during boot.
  2004-07-02 16:18         ` Nathan Fontenot
@ 2004-07-02 17:29           ` Hollis Blanchard
  2004-07-02 18:13             ` linas
  2004-07-06 13:24             ` Jake Moilanen
  0 siblings, 2 replies; 16+ messages in thread
From: Hollis Blanchard @ 2004-07-02 17:29 UTC (permalink / raw)
  To: nfont; +Cc: Paul Mackerras, linas, linuxppc64-dev, linux-kernel

On Fri, 2004-07-02 at 11:18, Nathan Fontenot wrote:
> > I asked about this before, and was told that there is no way to
> > determine the severity of an event without doing full parsing of the
> > binary data. I'd be thrilled to be wrong...
> 
> Getting the severity of an RTAS event is possible, and not too 
> difficult.  Check out asm-ppc64/rtas.h for a definition of the
> RTAS event header (struct rtas_error_log).  All RTAS events have the 
> same initial header containing the severity of the event.

Great! Of course that won't help much if we get repeating "important"
events that aren't even interesting much less important, but it's worth
trying to printk only the important ones and leave the rest to netlink.
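
The split suggested here can be sketched as a small policy function:
events of error or fatal severity go to printk as well as to the
user-space daemon, and everything else goes to the daemon only.  The
severity codes and all names below are assumptions made for
illustration, not existing kernel interfaces.

```c
#include <assert.h>

/* Assumed severity codes; verify against asm-ppc64/rtas.h. */
enum sketch_severity {
	SKETCH_SEV_EVENT   = 1,
	SKETCH_SEV_WARNING = 2,
	SKETCH_SEV_ERROR   = 4,
	SKETCH_SEV_FATAL   = 5,
};

/* Where an event should be delivered; values are bit flags. */
enum sketch_sink {
	SKETCH_SINK_DAEMON = 1 << 0,	/* hand to user space (rtasd, netlink, ...) */
	SKETCH_SINK_PRINTK = 1 << 1,	/* also put it in the kernel log */
};

/* Every event reaches the daemon; only error and fatal events are
 * additionally printk'ed. */
static unsigned int sketch_route(int severity)
{
	unsigned int sinks = SKETCH_SINK_DAEMON;

	assert(severity >= 0 && severity <= 7);	/* severity is a 3-bit field */
	if (severity == SKETCH_SEV_ERROR || severity == SKETCH_SEV_FATAL)
		sinks |= SKETCH_SINK_PRINTK;
	return sinks;
}
```

This does nothing about repeated "important" events; rate limiting
would have to be layered on top of such a policy.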

Note that currently we printk them all as KERN_DEBUG messages. Although
they aren't spewed to console, they still take up (lots of) space in the
printk buffer, and dmesg is still afflicted too...

-- 
Hollis Blanchard
IBM Linux Technology Center


* Re: [PATCH] [2.6] PPC64: log firmware errors during boot.
  2004-07-02 17:29           ` Hollis Blanchard
@ 2004-07-02 18:13             ` linas
  2004-07-02 18:27               ` Greg KH
  2004-07-06 13:24             ` Jake Moilanen
  1 sibling, 1 reply; 16+ messages in thread
From: linas @ 2004-07-02 18:13 UTC (permalink / raw)
  To: Hollis Blanchard; +Cc: nfont, Paul Mackerras, linuxppc64-dev, linux-kernel

On Fri, Jul 02, 2004 at 12:29:08PM -0500, Hollis Blanchard wrote:
> On Fri, 2004-07-02 at 11:18, Nathan Fontenot wrote:
> > > I asked about this before, and was told that there is no way to
> > > determine the severity of an event without doing full parsing of the
> > > binary data. I'd be thrilled to be wrong...
> > 
> > Getting the severity of an RTAS event is possible, and not too 
> > difficult.  Check out asm-ppc64/rtas.h for a definition of the
> > RTAS event header (struct rtas_error_log).  All RTAS events have the 
> > same initial header containing the severity of the event.
> 
> Great! Of course that won't help much if we get repeating "important"
> events that aren't even interesting much less important, but it's worth
> trying to printk only the important ones and leave the rest to netlink.

OK,

I'd like to wait until some of the current patches get in, so as to 
avoid a case of patch-versionitis.

I mis-spoke earlier about who the intended consumers of the printk'ed 
messages are; rtasd already implements its own kernel-to-user interface
via the /proc interface.  Yes, everything in /proc/ppc64 is prolly 
deprecated, but let's put this off till later.

--linas

* Re: [PATCH] [2.6] PPC64: log firmware errors during boot.
  2004-07-02 18:13             ` linas
@ 2004-07-02 18:27               ` Greg KH
  2004-07-02 18:55                 ` Dave Hansen
  0 siblings, 1 reply; 16+ messages in thread
From: Greg KH @ 2004-07-02 18:27 UTC (permalink / raw)
  To: linas; +Cc: Hollis Blanchard, nfont, Paul Mackerras, linuxppc64-dev,
	linux-kernel

On Fri, Jul 02, 2004 at 01:13:47PM -0500, linas@austin.ibm.com wrote:
> 
> On Fri, Jul 02, 2004 at 12:29:08PM -0500, Hollis Blanchard wrote:
> > On Fri, 2004-07-02 at 11:18, Nathan Fontenot wrote:
> > > > I asked about this before, and was told that there is no way to
> > > > determine the severity of an event without doing full parsing of the
> > > > binary data. I'd be thrilled to be wrong...
> > >
> > > Getting the severity of an RTAS event is possible, and not too
> > > difficult.  Check out asm-ppc64/rtas.h for a definition of the
> > > RTAS event header (struct rtas_error_log).  All RTAS events have the
> > > same initial header containing the severity of the event.
> >
> > Great! Of course that won't help much if we get repeating "important"
> > events that aren't even interesting much less important, but it's worth
> > trying to printk only the important ones and leave the rest to netlink.
> 
> OK,
> 
> I'd like to wait until some of the current patches get in, so as to
> avoid a case of patch-versionitis.
> 
> I mis-spoke earlier about who the intended consumers of the printk'ed
> messages are; rtasd already implements its own kernel-to-user interface
> via the /proc interface.  Yes, everything in /proc/ppc64 is prolly
> deprecated, but let's put this off till later.

Later when?

thanks,

greg k-h

* Re: [PATCH] [2.6] PPC64: log firmware errors during boot.
  2004-07-02 18:27               ` Greg KH
@ 2004-07-02 18:55                 ` Dave Hansen
  2004-07-02 19:44                   ` Greg KH
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Hansen @ 2004-07-02 18:55 UTC (permalink / raw)
  To: Greg KH
  Cc: linas, Hollis Blanchard, nfont, Paul Mackerras,
	PPC64 External List, Linux Kernel Mailing List

On Fri, 2004-07-02 at 11:27, Greg KH wrote:
> On Fri, Jul 02, 2004 at 01:13:47PM -0500, linas@austin.ibm.com wrote:
> > I mis-spoke earlier about who the intended consumers of the printk'ed
> > messages are; rtasd already implements its own kernel-to-user interface
> > via the /proc interface.  Yes, everything in /proc/ppc64 is prolly
> > deprecated, but let's put this off till later.
> 
> Later when?

2.7.0, anyone?

I think it would be nice to put printk()s in /proc/ppc64 handler
functions in early 2.7 and print out the task names along with a message
asking the user to report them.  That way, we can more easily track down
all of the users.  

The code would come back out before the next stable kernel.  

-- Dave


* Re: [PATCH] [2.6] PPC64: log firmware errors during boot.
  2004-07-02 18:55                 ` Dave Hansen
@ 2004-07-02 19:44                   ` Greg KH
  0 siblings, 0 replies; 16+ messages in thread
From: Greg KH @ 2004-07-02 19:44 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linas, Hollis Blanchard, nfont, Paul Mackerras,
	PPC64 External List, Linux Kernel Mailing List

On Fri, Jul 02, 2004 at 11:55:48AM -0700, Dave Hansen wrote:
> On Fri, 2004-07-02 at 11:27, Greg KH wrote:
> > On Fri, Jul 02, 2004 at 01:13:47PM -0500, linas@austin.ibm.com wrote:
> > > I mis-spoke earlier about who the intended consumers of the printk'ed
> > > messages are; rtasd already implements its own kernel-to-user interface
> > > via the /proc interface.  Yes, everything in /proc/ppc64 is prolly
> > > deprecated, but let's put this off till later.
> > 
> > Later when?
> 
> 2.7.0, anyone?

Fine with me, Linas?


* Re: [PATCH] [2.6] PPC64: log firmware errors during boot.
  2004-07-02 17:29           ` Hollis Blanchard
  2004-07-02 18:13             ` linas
@ 2004-07-06 13:24             ` Jake Moilanen
  1 sibling, 0 replies; 16+ messages in thread
From: Jake Moilanen @ 2004-07-06 13:24 UTC (permalink / raw)
  To: Hollis Blanchard
  Cc: nfont, paulus, linas, linuxppc64-dev, linux-kernel, strosake

> 
> On Fri, 2004-07-02 at 11:18, Nathan Fontenot wrote:
> > > I asked about this before, and was told that there is no way to
> > > determine the severity of an event without doing full parsing of the
> > > binary data. I'd be thrilled to be wrong...
> >
> > Getting the severity of an RTAS event is possible, and not too
> > difficult.  Check out asm-ppc64/rtas.h for a definition of the
> > RTAS event header (struct rtas_error_log).  All RTAS events have the
> > same initial header containing the severity of the event.
> 
> Great! Of course that won't help much if we get repeating "important"
> events that aren't even interesting much less important, but it's worth
> trying to printk only the important ones and leave the rest to netlink.
> 
> Note that currently we printk them all as KERN_DEBUG messages. Although
> they aren't spewed to console, they still take up (lots of) space in the
> printk buffer, and dmesg is still afflicted too...
> 

The original "plan" for error logging was to eventually take out the
printk's altogether once we could get ela (the userspace daemon
responsible for parsing error messages and routing them appropriately)
into all distros. We didn't want the possibility of a customer losing a
vital message by not having ela installed.  

I would propose making the printk's of the messages a kernel config
option.  Then the distros could turn it on or off depending on if they
are packaging ela.  All messages should still go to userspace though.
This will alleviate the spamming of the printk buffer.

I have no problems in moving communication between kernel and userspace
to netlink.  Whoever makes the change needs to keep Mike Strosaker and
Nathan Fontenot informed since they are maintaining the user space
counterpart. 

Thanks,
Jake

* Re: [PATCH] [2.6] PPC64: log firmware errors during boot.
  2004-06-30 10:55 ` Paul Mackerras
  2004-07-01 21:06   ` linas
@ 2004-07-06 13:41   ` Jake Moilanen
  2004-07-08 16:03     ` linas
  1 sibling, 1 reply; 16+ messages in thread
From: Jake Moilanen @ 2004-07-06 13:41 UTC (permalink / raw)
  To: Paul Mackerras, linas; +Cc: linuxppc64-dev, linux-kernel

> > Firmware can report errors at any time, and not atypically during boot.
> > However, these reports were being discarded until the rtasd comes up,
> > which occurs fairly late in the boot cycle.  As a result, firmware
> > errors during boot were being silently ignored.

Linas, the main consumer of error-log is events coming in from
event-scan.  We don't call event-scan until rtasd is up (i.e. they are
queued in FW until we call event-scan).  The only events I see us
missing are epow events, eeh? and anything coming from check-exception. 
epow is set up pretty late as well, and I don't think we even support
check-exception on 2.6.  eeh might be an issue.

> 
> > This patch at least gets them printk'ed so that at least they show
> > up in boot.msg/syslog.  There are two other logging mechanisms,
> > nvram and rtas, that I didn't touch because I don't understand
> > the repercussions.  In particular, nvram logging isn't enabled
> > until late in the boot ... but what's the point of nvram logging
> > if not to catch messages that occurred very early in boot ??
> 
> Indeed.
> 
> As for printk'ing the errors, it is annoying and it seems of somewhat
> dubious benefit to me, given that it is just incomprehensible hex
> numbers that can go on and on.  There has to be a better way.  Putting
> it in nvram seems like a better option to me.  I don't know of any
> reason why we can't use nvram quite early on.
> 

Paul,

We can initialize nvram very early, but we shouldn't discard an event
stored in nvram until rtasd is up and can pull the event out as it might
have been the error that took the system down on the previous boot.

We could probably start rtasd up a little earlier, but I'm not sure it
buys us that much.

Thanks,
Jake

* Re: [PATCH] [2.6] PPC64: log firmware errors during boot.
  2004-07-06 13:41   ` Jake Moilanen
@ 2004-07-08 16:03     ` linas
  2004-07-08 17:55       ` Jake Moilanen
  0 siblings, 1 reply; 16+ messages in thread
From: linas @ 2004-07-08 16:03 UTC (permalink / raw)
  To: Jake Moilanen; +Cc: Paul Mackerras, linuxppc64-dev, linux-kernel

On Tue, Jul 06, 2004 at 08:41:16AM -0500, Jake Moilanen wrote:
> 
> > > Firmware can report errors at any time, and not atypically during boot.
> > > However, these reports were being discarded until the rtasd comes up,
> > > which occurs fairly late in the boot cycle.  As a result, firmware
> > > errors during boot were being silently ignored.
> 
> Linas, the main consumer of error-log is events coming in from
> event-scan.  We don't call event-scan until rtasd is up (i.e. they are
> queued in FW until we call event-scan).  

Actually, they don't seem to be queued at all; when I turned on 
logging earlier, a whole pile of messages popped out that weren't 
visible before.

> The only events I see us
> missing are epow events, 

Depends on what you are doing.  In my case, the fact that the 
early-boot messages were discarded was hiding a bug (that was causing
those messages, that I've sent in a patch for).

--linas

* Re: [PATCH] [2.6] PPC64: log firmware errors during boot.
  2004-07-08 16:03     ` linas
@ 2004-07-08 17:55       ` Jake Moilanen
  0 siblings, 0 replies; 16+ messages in thread
From: Jake Moilanen @ 2004-07-08 17:55 UTC (permalink / raw)
  To: linas; +Cc: paulus, linuxppc64-dev, linux-kernel

On Thu, 8 Jul 2004 11:03:37 -0500
linas@austin.ibm.com wrote:

> On Tue, Jul 06, 2004 at 08:41:16AM -0500, Jake Moilanen wrote:
> > 
> > > > Firmware can report errors at any time, and not atypically during boot.
> > > > However, these reports were being discarded until the rtasd comes up,
> > > > which occurs fairly late in the boot cycle.  As a result, firmware
> > > > errors during boot were being silently ignored.
> > 
> > Linas, the main consumer of error-log is events coming in from
> > event-scan.  We don't call event-scan until rtasd is up (i.e. they are
> > queued in FW until we call event-scan).  
> 
> Actually, they don't seem to be queued at all; when I turned on 
> logging earlier, a whole pile of messages popped out that weren't 
> visible before.

event-scan is called every 30 seconds.  FW has to queue them.

If you are seeing a different pile of messages, I would imagine the
messages that popped out are not coming from event-scan then.  Might be
last_error, whose messages do not come in from event-scan.  I can see
them not being logged in early boot.  

One problem I could see is if we make an rtas call before the VM
is up.  The kmalloc for last_error won't like that.

Jake
