From: Jim Keniston <jkenisto@linux.vnet.ibm.com>
To: linux-kernel@vger.kernel.org
Subject: Re: [RFC] persistent store
Date: Mon, 22 Nov 2010 16:06:03 -0800
Message-ID: <1290470763.3008.252.camel@localhost>

> Here's a patch based on some discussions I had with Thomas
> Gleixner at plumbers conference that implements a generic
> layer for persistent storage usable to pass tens or hundreds
> of kilobytes of data from the dying breath of a crashing
> kernel to its successor.
> 
> The usage model I'm envisioning is that a platform driver
> will register with this code to provide the actual storage.
> I've tried to make this interface general, but I'm working
> from a sample of one (the ACPI ERST code), so if anyone else
> has some persistent store that can't be handled by this code,
> speak up and we can put in the necessary tweaks.

I recently posted a patch set for powerpc to capture the most recent
oops or panic message in NVRAM:
http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-November/087032.html
It covers a lot of the same ground, and could be adapted to use your
framework.  See below for concerns and suggestions.  I'd also be
interested in feedback about the design decisions I mention below.

On powerpc, the amount of NVRAM available for this may be as little
as 1-2 Kbytes.  The minimal oops report (with essentially no backtrace)
is about 1800 bytes.  See below for implications.

We currently read our NVRAM contents via /dev/nvram and the nvram
command.  NVRAM is divided up into several "partitions" -- only
one of which is used for the oops/panic report -- so the user-space
code needs to know about how the partitions are laid out.  It also
needs to know how much text we actually wrote to the partition, and
whether or not it's compressed.  Since the kernel already knows how
to determine all this, it would probably be more convenient to get
at the oops/panic partition through your /sys interface.

> ...
> 2) "Why do you read in all the data from the device when it
> registers and save it in memory? Couldn't you just get the
> list of records and pick up the data from the device when
> the user reads the file?"
> I don't think this is going to be very much data, just a few hundred
> kilobytes (i.e. less than $0.01 worth of memory, even expensive server
> memory). The memory is freed when the record is erased ... which is
> likely to be soon after boot.

Since the amount of text we capture is so tiny, this is unlikely to be
an issue in my case.

> ...
> 6) "Is this widely useful? How many systems have persistent storage?"
> Although ERST was only added to the ACPI spec earlier this year, it
> merely documents existing functionality required for WHEA (Windows
> Hardware Error Architecture). So most modern server systems should
> have it (my test system has it, and it has a BIOS written in mid 2008).
> Sorry desktops & laptops - no love for you here.
> 

Powerpc pSeries does, obviously, and we're looking to exploit it in
just this way.

> ...
> +static void
> +pstore_dump(struct kmsg_dumper *dumper, enum kmsg_dump_reason reason,
> + const char *s1, unsigned long l1,
> + const char *s2, unsigned long l2)
> +{
> + unsigned long s1_start, s2_start;
> + unsigned long l1_cpy, l2_cpy;
> + char *dst = pstore_buf + psinfo->header_size;
> +
> + /* Don't dump oopses to persistent store */

Why not?  In our case, we capture every oops and panic report, but keep
only the most recent.  Seems like catching the last oops could be useful
if your system hangs thereafter and can't be made to panic.  I suggest
you pass along the reason (KMSG_DUMP_OOPS or whatever) and let the
callback decide.
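
A rough userspace sketch of what "let the callback decide" could look
like.  Everything here is hypothetical: struct store_backend, its wants
hook, and the enum dump_reason stand-ins are made up for illustration
and are not part of the RFC or of my patch set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel's kmsg dump reasons (illustration only). */
enum dump_reason { DUMP_OOPS, DUMP_PANIC };

/* Hypothetical backend descriptor; field names are assumptions. */
struct store_backend {
	size_t data_size;
	/* Return true if this backend wants a record for this reason. */
	bool (*wants)(enum dump_reason reason);
};

/* A backend that only records panics. */
static bool panic_only(enum dump_reason reason)
{
	return reason == DUMP_PANIC;
}

/* A backend that records every oops and panic. */
static bool oops_and_panic(enum dump_reason reason)
{
	(void)reason;
	return true;
}

/* The generic layer passes the reason along instead of filtering. */
static bool should_dump(const struct store_backend *be,
			enum dump_reason reason)
{
	assert(be != NULL);
	/* No hook registered: keep the RFC's behavior and skip oopses. */
	if (!be->wants)
		return reason != DUMP_OOPS;
	return be->wants(reason);
}
```

With something along these lines, a backend with plenty of space can
stay panic-only while a small one that overwrites a single slot can
take every oops.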

You'd have to serialize the oops handling, I guess, in case multiple
CPUs oops simultaneously.  (Gotta fix that in my code.)
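
One possible shape for that serialization, sketched in userspace C11
rather than kernel code: a try-lock, so that a CPU that loses the race
skips the capture instead of waiting.  The names oops_capture_begin and
oops_capture_end are invented for this sketch; in the kernel a spinlock
try-lock would presumably play the same role.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Set while one CPU is writing its report to the store. */
static atomic_flag oops_capture_busy = ATOMIC_FLAG_INIT;

/* Returns true if the caller won the right to write the report. */
static bool oops_capture_begin(void)
{
	/* test_and_set returns the previous value: false means we won. */
	return !atomic_flag_test_and_set(&oops_capture_busy);
}

/* Release the store so that a later oops can be captured. */
static void oops_capture_end(void)
{
	atomic_flag_clear(&oops_capture_busy);
}
```

Skipping rather than spinning means a simultaneous second oops is lost,
which seems acceptable when only one report is kept anyway.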

> + if (reason == KMSG_DUMP_OOPS)
> + return;
> +
> + l2_cpy = min(l2, psinfo->data_size);
> + l1_cpy = min(l1, psinfo->data_size - l2_cpy);
> +
> + s2_start = l2 - l2_cpy;
> + s1_start = l1 - l1_cpy;
> +
> + memcpy(dst, s1 + s1_start, l1_cpy);
> + memcpy(dst + l1_cpy, s2 + s2_start, l2_cpy);
> +
> + psinfo->writer(PSTORE_DMESG, pstore_buf, l1_cpy + l2_cpy);

This assumes that you always want to capture the last psinfo->data_size
bytes of the printk buffer.  Given the small capacity of our NVRAM
partition, I handle the case where the whole oops report doesn't fit.
In that case, I sacrifice the end of the oops report to capture the
beginning.  Patch #3 in my set is about this.
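
To make the difference concrete, here is a small userspace sketch of
the two truncation policies.  It is not code from either patch set:
capture_report and enum capture_policy are made-up names, and it treats
the report as one contiguous buffer, whereas the dumper above is handed
two segments (s1 and s2).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy names, for illustration only. */
enum capture_policy {
	CAPTURE_TAIL,	/* keep the newest bytes, as pstore_dump() does */
	CAPTURE_HEAD	/* keep the start of the report, drop the end */
};

/*
 * Copy at most cap bytes of a len-byte report into dst.
 * Returns the number of bytes copied.
 */
static size_t capture_report(char *dst, size_t cap,
			     const char *report, size_t len,
			     enum capture_policy policy)
{
	size_t n = len < cap ? len : cap;
	/* For TAIL, skip whatever does not fit; for HEAD, skip nothing. */
	size_t skip = (policy == CAPTURE_TAIL) ? len - n : 0;

	assert(dst != NULL && report != NULL);
	memcpy(dst, report + skip, n);
	return n;
}
```

When the report fits, both policies copy the same bytes; they only
differ once len exceeds the capacity of the store.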

> ...
> +static int
> +pstore_create_sysfs_entry(struct pstore_entry *new_pstore)
> +{
> ...
> + new_pstore->attr.attr.mode = 0444;

/var/log/messages is typically not readable by everybody.  This
appears to circumvent that.

> ...

Thanks.

Jim Keniston
IBM Linux Technology Center
Beaverton, OR


