From: linas@austin.ibm.com
To: Paul Mackerras <paulus@samba.org>
Cc: linuxppc64-dev@lists.linuxppc.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] [2.6] PPC64: log firmware errors during boot.
Date: Thu, 1 Jul 2004 16:06:14 -0500
Message-ID: <20040701160614.I21634@forte.austin.ibm.com>
In-Reply-To: <16610.39955.554139.858593@cargo.ozlabs.ibm.com>; from paulus@samba.org on Wed, Jun 30, 2004 at 08:55:15PM +1000

On Wed, Jun 30, 2004 at 08:55:15PM +1000, Paul Mackerras wrote:
> Linas,
> 
> > Firmware can report errors at any time, and not atypically during boot.
> > However, these reports were being discarded until the rtasd comes up,
> > which occurs fairly late in the boot cycle.  As a result, firmware
> > errors during boot were being silently ignored. 
> 
> As far as I can see the main change is in log_rtas_len, which is
> called from pSeries_log_error, which is called from do_event_scan and
> rtasd(), and do_event_scan is only called from rtasd().  And
> get_eventscan_parms() is already called at the beginning of rtasd().

Yes, but rtasd starts up late in the boot process.  Most of the 
"interesting" manipulations with firmware are old history by then,
and thus, any firmware errors encountered during the boot were never 
logged.

> So I don't see the point of the get_eventscan_parms call in
> log_rtas_len.  

If the parms aren't set up, then the rtas_error_log_max is zero,
and, as a result, the message is never logged.  By initializing
rtas_error_log_max to the correct non-zero value, the errors can 
get logged.
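
To make the failure mode concrete, here is a minimal user-space sketch,
not the kernel code itself.  It assumes the logged length is clamped to
the configured maximum, so a maximum that is still zero yields a
zero-length (empty) log entry.  The names error_log_max, fetch_parms and
clamp_len, and the value 2048, are hypothetical stand-ins for
rtas_error_log_max, get_eventscan_parms() and log_rtas_len().

```c
#include <assert.h>

#define FW_LOG_MAX 2048      /* hypothetical limit reported by firmware */

static int error_log_max;    /* stays 0 until the parameters are fetched */

/* Stand-in for get_eventscan_parms(): fetch the limit once. */
static void fetch_parms(void)
{
    if (error_log_max == 0)
        error_log_max = FW_LOG_MAX;
}

/*
 * Return how many bytes of a firmware report would be logged.
 * With lazy_init == 0 this mimics the old behaviour: before the
 * daemon has run, the maximum is 0 and nothing gets logged.
 */
static int clamp_len(int reported_len, int lazy_init)
{
    assert(reported_len >= 0);
    if (lazy_init)
        fetch_parms();
    return reported_len < error_log_max ? reported_len : error_log_max;
}
```

Calling clamp_len(100, 0) before anything has fetched the parameters
returns 0, while clamp_len(100, 1) returns 100, which is the whole point
of doing the initialization on first use.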


> > This patch at least gets them printk'ed so that at least they show 
> > up in boot.msg/syslog.  There are two other logging mechanisms,
> > nvram and rtas, that I didn't touch because I don't understand 
> > the repercussions.  In particular, nvram logging isn't enabled
> > until late in the boot ... but what's the point of nvram logging
> > if not to catch messages that occurred very early in boot ?? 
> 
> Indeed.
> 
> As for printk'ing the errors, it is annoying and it seems of somewhat
> dubious benefit to me, given that it is just incomprehensible hex
> numbers that can go on and on.  There has to be a better way.  

Yes, well, you'll be hard-pressed to find a lover of the hex format 
anywhere.  Let's review the history of the design decisions that
got us to this point.  I think a better solution might then become
evident.

-- Originally, these binary messages from firmware were decoded
in the kernel, and printed out in 'plain English'.  However, there
were problems: 1) the format of the binary kept evolving; I think 
we are now up to version 6.  2) the need for supporting version 6 
and *all* of the earlier versions led to dreaded kernel bloat.
For the current user-space decoder:
# wc *.c *.h
   2207    7056   67959 total

-- So the decision was wisely made to move this all to user-space. 
But what shall the communications link between user-space and kernel be? 
Somebody, somewhere,  I know not who or why, decided that they should 
go into syslog.  And so here we are.

How else could we do this?  I have never had to architect a kernel-to-user
data communications interface, so I don't know what the alternatives 
are.  We could queue them up to some file in /proc, which user-space 
reads.  Or maybe /sys instead ?? Maybe a stunt with sockets? Some 
new device in /dev/ that can be opened, read, closed?  How should 
the user space daemon indicate that it's picked up the message and 
doesn't need it any more?  Write a msg number to a /proc file?  
Maybe each individual message should go in its own file, and user 
space just rm's that file after it's fetched/saved the message.  
I dunno, I think any one of these could be whipped up in a jiffy.
Convincing the user-space to use the interface might be harder.

Pick one. If it can be coded in under a day, I can volunteer to
do that.
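
As a rough illustration of the "queue them up, acknowledge by message
number" idea, here is a user-space sketch of the bookkeeping only.  It
is not an existing kernel interface: struct fw_queue, fw_queue_put,
fw_queue_peek, fw_queue_ack and both size constants are hypothetical,
and the /proc, /sys or /dev plumbing around it is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_SLOTS 8     /* hypothetical capacity (power of two) */
#define MSG_BYTES   64    /* hypothetical per-message size limit */

struct fw_msg {
    unsigned int seq;               /* message number seen by the reader */
    size_t len;
    unsigned char data[MSG_BYTES];
};

struct fw_queue {
    struct fw_msg slot[QUEUE_SLOTS];
    unsigned int head;    /* number of the oldest unacknowledged message */
    unsigned int tail;    /* number the next queued message will get */
};

/* Producer: queue a report.  Returns 0, or -1 if full or too large. */
static int fw_queue_put(struct fw_queue *q, const void *buf, size_t len)
{
    struct fw_msg *m;

    assert(q != NULL && buf != NULL);
    if (len > MSG_BYTES || q->tail - q->head == QUEUE_SLOTS)
        return -1;
    m = &q->slot[q->tail % QUEUE_SLOTS];
    m->seq = q->tail++;
    m->len = len;
    memcpy(m->data, buf, len);
    return 0;
}

/* Consumer: look at the oldest message without removing it. */
static const struct fw_msg *fw_queue_peek(const struct fw_queue *q)
{
    if (q->head == q->tail)
        return NULL;
    return &q->slot[q->head % QUEUE_SLOTS];
}

/* Consumer: acknowledge by number; frees every slot up to and
 * including seq.  Returns -1 if seq is not an outstanding message. */
static int fw_queue_ack(struct fw_queue *q, unsigned int seq)
{
    if (seq - q->head >= q->tail - q->head)
        return -1;
    q->head = seq + 1;
    return 0;
}
```

A message stays queued until the reader acknowledges its number, so a
daemon that dies between reading and saving simply sees the same
message again on restart.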

> Putting
> it in nvram seems like a better option to me.  I don't know of any
> reason why we can't use nvram quite early on.

Me neither; Jake knows. I thought the whole point of nvram was to not
lose the messages during a crash; the messages are promptly copied out
of nvram once the system is up and stable; nvram is a staging area, 
not a permanent repository.

--linas
