public inbox for linux-kernel@vger.kernel.org
From: Aaron Durbin <adurbin@google.com>
To: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"simon.kagstrom@netinsight.net" <simon.kagstrom@netinsight.net>,
	"David.Woodhouse@intel.com" <David.Woodhouse@intel.com>,
	"anders.grafstrom@netinsight.net"
	<anders.grafstrom@netinsight.net>,
	"Artem.Bityutskiy@nokia.com" <Artem.Bityutskiy@nokia.com>,
	"kosaki.motohiro@jp.fujitsu.com" <kosaki.motohiro@jp.fujitsu.com>,
	"jason.wessel@windriver.com" <jason.wessel@windriver.com>,
	"jslaby@suse.cz" <jslaby@suse.cz>,
	"jmorris@namei.org" <jmorris@namei.org>,
	"eparis@redhat.com" <eparis@redhat.com>,
	"hch@lst.de" <hch@lst.de>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"dle-develop@lists.sourceforge.net" 
	<dle-develop@lists.sourceforge.net>,
	"Satoru Moriya"@google.com
Subject: Re: [RFC][Patch] Adding kmsg_dump() to reboot/halt/poweroff/emergency_restart path
Date: Wed, 03 Nov 2010 14:50:17 -0700
Message-ID: <4CD1D919.5000209@google.com>
In-Reply-To: <5C4C569E8A4B9B42A84A977CF070A35B2C1276CEC7@USINDEVS01.corp.hds.com>

On 10/27/10 12:44, Seiji Aguchi wrote:
> Hi,
>
>> What actual problem are we solving here?  Why is the current code
>> inadequate?  It would help to demonstrate some use-case and to explain
>> how the situation improved with this patch.
>
> [Purpose]
>   My purpose is to develop a highly reliable logging facility for
>   enterprise use.
>
>   I'm planning to add the following triggers for kmsg_dump().
>      - reboot/poweroff/halt/emergency_restart (this patch)
>      - Machine check
>
>   I'm also planning to add a feature that outputs kernel messages to
>   NVRAM, because enterprise servers are equipped with NVRAM.
>   Outputting kernel messages to NVRAM gives us a highly reliable
>   logging facility.
>   (NVRAM is commonly used on mainframes and commercial Unix as well.)
>
> [Use case of reboot/poweroff/halt/emergency_restart]
>
>   My company has often experienced the following in our support service:
>   - A customer's system suddenly reboots.
>   - The customer asks us to investigate the reason for the reboot.
>
>   We know the reboot happened, because the boot messages remain in
>   /var/log/messages. However, we can't investigate why the system
>   rebooted, because the last messages before the reboot don't remain.
>   And of course we can't explain the reason.
>
>
>   We can solve the above problem with this patch as follows.
>   Case1: reboot with command
>     - We can see "Restarting system with command:" or "Restarting system.".
>
>   Case2: halt with command
>     - We can see "System halted.".
>
>   Case3: poweroff with command
>     - We can see "Power down.".
>
>   Case4: emergency_restart with sysrq
>     - We can see "Sysrq:" printed by __handle_sysrq().
>
>   Case5: emergency_restart with softdog
>     - We can see "Initiating system reboot" printed by watchdog_fire().
>
>   So we can distinguish the reasons for reboot, poweroff, halt and
>   emergency_restart.
>
>   If the customer executed the reboot command, you may think the
>   customer should know that. However, customers often claim they
>   didn't execute the command when they rebooted the system by mistake.
>
>   No evidential message remains with the current Linux kernel, so we
>   can't show proof to the customer.
>   This patch improves this situation.
>
> Seiji
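
As a rough illustration of Case1 to Case5, a tool reading the dumped tail of the kernel log could classify the cause by the last marker message it finds. This is a minimal userspace sketch, not part of the patch: `classify_dump()` and the `reboot_cause` values are hypothetical, and the marker strings are copied from the message above, so they should be checked against the printk text of the kernel actually in use.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical causes corresponding to Case1..Case5 above. */
enum reboot_cause {
    CAUSE_UNKNOWN,
    CAUSE_REBOOT_CMD,
    CAUSE_HALT_CMD,
    CAUSE_POWEROFF_CMD,
    CAUSE_SYSRQ,
    CAUSE_SOFTDOG,
};

struct marker {
    const char *text;
    enum reboot_cause cause;
};

/* Marker strings as quoted in the message; verify against your kernel. */
static const struct marker markers[] = {
    { "Restarting system", CAUSE_REBOOT_CMD },   /* with or without command */
    { "System halted.", CAUSE_HALT_CMD },
    { "Power down.", CAUSE_POWEROFF_CMD },
    { "Sysrq:", CAUSE_SYSRQ },
    { "Initiating system reboot", CAUSE_SOFTDOG },
};

/* Return a pointer to the last occurrence of needle in hay, or NULL. */
static const char *last_strstr(const char *hay, const char *needle)
{
    const char *last = NULL;
    for (const char *p = strstr(hay, needle); p != NULL; p = strstr(p + 1, needle))
        last = p;
    return last;
}

/* Classify a dumped log tail by whichever marker appears latest in it. */
enum reboot_cause classify_dump(const char *dump)
{
    enum reboot_cause cause = CAUSE_UNKNOWN;
    const char *best = NULL;

    assert(dump != NULL);
    for (size_t i = 0; i < sizeof(markers) / sizeof(markers[0]); i++) {
        const char *p = last_strstr(dump, markers[i].text);
        if (p != NULL && (best == NULL || p > best)) {
            best = p;
            cause = markers[i].cause;
        }
    }
    return cause;
}
```

Picking the latest marker assumes the message closest to the end of the dump describes the final action; a dump with no known marker maps to `CAUSE_UNKNOWN`.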

We carry patches in our kernels that do very similar things. The reason 
is essentially the same as what you have cited. On our platforms we have 
two different ways of storing events to an event log. One communicates 
with the BIOS itself; the other writes bit flags to a known area of 
non-volatile storage. That way, when the machine comes back up, we have a
clear, timestamped event log of what happened and when. Piecing these
events together has proven to be invaluable for finding issues.

Both of the drivers that log these events use a shared interface that
collects various events in the kernel and presents them through a single
notifier chain for the drivers' consumption.

The things we currently track and log are the following:
- clean reboot/shutdown
- panic
- oops
- die
- NMI watchdog
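
A userspace sketch of the layout described above: each hook point publishes one event on a single chain, and every registered consumer receives it. All names here (`event_notifier`, `event_notify()`, the `EV_*` codes and the two example consumers) are hypothetical; the `EV_*` codes simply mirror the list of tracked events. Inside the kernel one would more naturally build on the existing notifier API, e.g. `atomic_notifier_chain_register()` together with hooks such as `register_reboot_notifier()` and `register_die_notifier()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes mirroring the list of tracked events. */
enum event_kind {
    EV_CLEAN_SHUTDOWN,
    EV_PANIC,
    EV_OOPS,
    EV_DIE,
    EV_NMI_WATCHDOG,
};

/* One consumer on the shared chain, e.g. a BIOS event-log driver. */
struct event_notifier {
    int (*call)(struct event_notifier *nb, enum event_kind ev);
    struct event_notifier *next;
};

/* Head of the single chain that every hook point publishes to. */
static struct event_notifier *event_chain;

/* Add a consumer at the head of the chain. */
void event_notifier_register(struct event_notifier *nb)
{
    assert(nb != NULL && nb->call != NULL);
    nb->next = event_chain;
    event_chain = nb;
}

/* Called from each hook point (shutdown path, panic, oops, die, NMI
 * watchdog). Every registered consumer sees the same event; returns
 * the number of consumers notified. */
int event_notify(enum event_kind ev)
{
    int n = 0;
    for (struct event_notifier *nb = event_chain; nb != NULL; nb = nb->next) {
        nb->call(nb, ev);
        n++;
    }
    return n;
}

/* Two illustrative consumers: one counts events as a BIOS log would
 * record them, the other sets one bit per event kind, like flags kept
 * in a known area of non-volatile storage. */
static int bios_log_count;
static unsigned nvram_flags;

static int bios_log_event(struct event_notifier *nb, enum event_kind ev)
{
    (void)nb;
    (void)ev;
    bios_log_count++;
    return 0;
}

static int nvram_flag_event(struct event_notifier *nb, enum event_kind ev)
{
    (void)nb;
    nvram_flags |= 1u << ev;
    return 0;
}

static struct event_notifier bios_nb = { .call = bios_log_event };
static struct event_notifier nvram_nb = { .call = nvram_flag_event };
```

Registering both consumers and raising `EV_OOPS` followed by `EV_PANIC` leaves the counting consumer at 2 and sets bits 1 and 2 in `nvram_flags`. Unlike the kernel's notifier chains, this sketch ignores return values, so a consumer cannot stop the chain.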

An example eventlog produced by our systems looks like the following 
(63-67 are the boot numbers of the system in question):

2010-10-14 10:26:06 | System Reset | 63
2010-10-14 10:26:19 | System boot | 63
2010-10-14 11:36:43 | Kernel Shutdown | 63 | Unknown Shutdown Reason
2010-10-14 11:36:43 | System Reset | 64
2010-10-14 11:36:56 | System boot | 64
2010-10-18 14:51:54 | Kernel Shutdown | 64 | Clean
2010-10-18 14:52:38 | System Reset | 65
2010-10-18 14:52:51 | System boot | 65
2010-10-26 02:44:48 | Kernel Shutdown | 65 | Oops
2010-10-26 02:44:48 | Kernel Shutdown | 65 | Die
2010-10-26 02:44:49 | Kernel Shutdown | 65 | Panic
2010-10-26 02:45:43 | System Reset | 66
2010-10-26 02:45:56 | System boot | 66
2010-10-26 02:49:22 | Kernel Shutdown | 66 | Clean
2010-10-26 02:50:05 | System Reset | 67
2010-10-26 02:50:18 | System boot | 67
2010-10-26 11:39:20 | Kernel Shutdown | 67 | Clean
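
Because every line in the log above has the same `time | event | boot | reason` shape (the reason column appears only on shutdown lines), post-processing it is simple. A minimal C sketch of a line parser follows; `struct eventlog_entry`, `parse_eventlog_line()` and the field widths are hypothetical choices, not part of the patches mentioned here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of the event log; field sizes are arbitrary here. */
struct eventlog_entry {
    char when[20];      /* "YYYY-MM-DD HH:MM:SS" */
    char event[32];     /* e.g. "Kernel Shutdown" */
    unsigned boot;      /* boot number */
    char reason[64];    /* e.g. "Clean", "Oops"; empty if absent */
};

/* Strip trailing spaces left by the "[^|]" scanset conversions. */
static void rtrim(char *s)
{
    size_t n = strlen(s);
    while (n > 0 && s[n - 1] == ' ')
        s[--n] = '\0';
}

/* Returns 1 on success, 0 if the line does not match the format. */
int parse_eventlog_line(const char *line, struct eventlog_entry *e)
{
    assert(line != NULL && e != NULL);
    e->reason[0] = '\0';
    int n = sscanf(line, "%19[^|] | %31[^|] | %u | %63[^\n]",
                   e->when, e->event, &e->boot, e->reason);
    if (n < 3)
        return 0;   /* need at least time, event and boot number */
    rtrim(e->when);
    rtrim(e->event);
    return 1;
}
```

Grouping parsed entries by `boot` then shows, for example, that boot 65 ended with an Oops, Die and Panic rather than a clean shutdown.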

Hopefully that shows others that we think such a mechanism is vital. I
can post the patches for the common infrastructure if people are interested.

-Aaron


Thread overview: 9+ messages
2010-10-18 22:24 [RFC][Patch] Adding kmsg_dump() to reboot/halt/poweroff/emergency_restart path Seiji Aguchi
2010-10-18 22:33 ` Andrew Morton
2010-10-27 19:44   ` Seiji Aguchi
2010-11-03 21:50     ` Aaron Durbin [this message]
2010-11-04  7:20       ` Artem Bityutskiy
2010-10-19  8:52 ` KOSAKI Motohiro
2010-10-19  8:51   ` Artem Bityutskiy
2010-10-27 23:35     ` Andrew Morton
2010-10-28 19:58       ` Artem Bityutskiy
