From: Borislav Petkov <bp@alien8.de>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>,
Mauro Carvalho Chehab <m.chehab@samsung.com>,
"bhelgaas@google.com" <bhelgaas@google.com>,
"rostedt@goodmis.org" <rostedt@goodmis.org>,
"rjw@sisk.pl" <rjw@sisk.pl>,
"lance.ortiz@hp.com" <lance.ortiz@hp.com>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
Date: Thu, 15 Aug 2013 12:14:32 +0200
Message-ID: <20130815101432.GE27616@pd.tnic>
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F31CBAAFA@ORSMSX106.amr.corp.intel.com>
On Wed, Aug 14, 2013 at 06:38:09PM +0000, Luck, Tony wrote:
> We've wandered around different strategies here. We definitely
> want the panic log. Some people want all other "kernel exit" logs
> (shutdown, reboot, kexec). When there is enough space in the pstore
> backend we might also want the "oops" that preceded the panic. (Of
> course when the oops happens we don't know the future, so have to
> save it just in case ... then if more "oops" happen we have to decide
> whether to keep the old oops log, or save the newer one).
Ok, dmesg over serial and *only* oops+panic in pstore. Right.
> Yes - longer logs are better. Sad that the pstore backend devices are
> measured in kilobytes :-)
Right, so good ole serial again to the rescue! There's no room for full
dmesg in nvram because it needs space for the UEFI GUI and some other
crap :-)
> No - write speed for the persistent storage backing pstore (flash)
> means we don't log as we go. We wait for a panic and then our
> registered function gets called so we can snapshot what is in the
> console log at that point. We also don't want to wear out the flash
> which may be soldered to the motherboard.
I suspected as much. So we can forget about using *only* pstore for hw
error logging. It would be cool to do so but the technology simply
doesn't allow it.
> Agreed - we shouldn't clutter logs with details of corrected errors.
> At most we should have a rate-limited log showing the count of
> corrected errors so that someone who just watches dmesg knows they
> should go dig deeper if they see some big number of corrected errors.
/me nods.
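As a rough illustration of the rate-limited count Tony describes, here is a
minimal userspace sketch in Python. It is a model of the idea only, not
kernel code: the class name, the interval, and the message text are all
hypothetical. Every corrected error is counted, but at most one summary
line is emitted per interval.

```python
import time


class CorrectedErrorCounter:
    """Hypothetical model: count corrected errors, emit at most one
    summary line per interval."""

    def __init__(self, interval_s=60.0, clock=time.monotonic):
        self.interval_s = interval_s
        self.clock = clock        # injectable so the logic can be tested
        self.pending = 0          # errors counted since the last summary
        self.last_emit = None     # time of the last summary, None before the first

    def record(self):
        """Count one corrected error. Return a summary line, or None
        if a summary was already emitted within the interval."""
        self.pending += 1
        now = self.clock()
        if self.last_emit is not None and now - self.last_emit < self.interval_s:
            return None
        line = f"{self.pending} corrected errors since last report"
        self.pending = 0
        self.last_emit = now
        return line
```

Someone watching the log would then see one line with a count per interval,
and a big number is the hint to go dig deeper.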
> Yes. There are people looking at various "flight recorder" modes for
> tracing that keep logs of normal events in a circular buffer in RAM
> ... if these exist they should be saved at crash time (and they are in
> the kexec/kdump path, but I don’t know if anyone does anything in
> the non-kdump case).
Right, the cheapest solution is serial. Simply log everything to serial
because we can. But this is the key thing I wanted to emphasize:
For severe hardware errors we don't want to use any tracepoint -
actually it is even a bad thing to do so because they would get lost in
some side channels which, during a critical situation, might not get
written to anything/survive the crash, etc.
So what I'm saying is, we basically want severe hardware errors to land
in good old dmesg and on all consoles. No fancy TP stuff for them.
> Tracepoints for errors that are going to lead to system crash would
> only be useful together with a flight recorder to make sure they get
> saved. I think tracepoints for corrected errors are better than dmesg
> logs.
Yes, exactly.
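The split agreed on here can be sketched as a small userspace model in
Python. This is an assumption-laden illustration, not the kernel's
implementation: `ErrorRouter`, `report`, and the two sinks are hypothetical
names. The list stands in for dmesg plus all consoles, and the bounded
deque stands in for an in-RAM trace ring buffer ("flight recorder") that
keeps only the most recent events.

```python
from collections import deque


class ErrorRouter:
    """Hypothetical model of the policy: fatal errors go to the console
    log, corrected errors go to a bounded trace buffer."""

    def __init__(self, recorder_size=4):
        # Stands in for dmesg and all registered consoles.
        self.console = []
        # Stands in for a trace ring buffer; old entries are dropped
        # automatically once recorder_size is reached.
        self.trace = deque(maxlen=recorder_size)

    def report(self, msg, fatal):
        """Route one error message according to its severity."""
        if fatal:
            self.console.append(msg)
        else:
            self.trace.append(msg)


router = ErrorRouter(recorder_size=2)
for msg in ("ce1", "ce2", "ce3"):
    router.report(msg, fatal=False)
router.report("ue", fatal=True)
```

In this model the oldest corrected event ("ce1") has already been
overwritten, which is why anything that must survive a crash goes to the
console path instead.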
> In a perfect world yes - I don't know that we can achieve perfection
> - but we can iterate through good, better, even better. The really
> hard part of this is figuring out what is *relevant* to save before a
> particular crash happens.
Well, if I have serial connected to the box, it will contain basically
everything the machine said, no?
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--