linux-acpi.vger.kernel.org archive mirror
From: Borislav Petkov <bp@alien8.de>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>,
	Mauro Carvalho Chehab <m.chehab@samsung.com>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"rostedt@goodmis.org" <rostedt@goodmis.org>,
	"rjw@sisk.pl" <rjw@sisk.pl>,
	"lance.ortiz@hp.com" <lance.ortiz@hp.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
Date: Wed, 14 Aug 2013 07:43:22 +0200
Message-ID: <20130814054322.GA9158@pd.tnic>
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F31CB9150@ORSMSX106.amr.corp.intel.com>

On Tue, Aug 13, 2013 at 08:13:56PM +0000, Luck, Tony wrote:
> Generic tracepoints are architected to be able to fire at very high
> rates and log huge amounts of information. So we'd need something
> special to say just log these special tracepoints to network/serial.
>
> > Which reminds me, pstore could also be a good thing to use, in addition.
> > Only put error info there as it is limited anyway.
> 
> Yes - space is very limited.  I don't know how to assign priority for logging
> the dmesg data vs. some error logs.

Didn't we say at some point, "log only the panic message which kills
the machine"?

However, we could probably use more of the messages before that
catastrophic event, because they could give us hints about what led to
the panic. In that case, though, maybe a limited pstore is the wrong
logging medium.

Actually, I can imagine the full serial/network logs of "special"
tracepoints + dmesg to be the optimal thing.

> If we just "printk()" the most important parts - then that data will
> automatically flow to the serial console and to pstore.

Actually, does the pstore act like a circular buffer? Because if it
contains the last N relevant messages (for an arbitrary definition of
relevant) before the system dies, then that could be more helpful than
only the error messages.
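
To make the "last N messages" idea concrete, here is a minimal sketch of
circular-buffer semantics. It is an illustration only and does not claim
to show how pstore or any of its backends actually store data; the class
name LastNLog, the capacity and the sample messages are all hypothetical.

```python
from collections import deque


class LastNLog:
    """Hypothetical log that keeps only the most recent `capacity` messages."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when full,
        # which is the circular-buffer behavior asked about above.
        self._buf = deque(maxlen=capacity)

    def log(self, message):
        self._buf.append(message)

    def dump(self):
        # Oldest surviving message first, newest last.
        return list(self._buf)


log = LastNLog(capacity=3)
for msg in [
    "boot ok",
    "corrected memory error",
    "second corrected error",
    "panic: fatal machine check",
]:
    log.log(msg)

survivors = log.dump()
```

With a capacity of 3, the oldest message is overwritten and the panic
message survives together with the two messages that preceded it, which
is the context the paragraph above argues for.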

And with the advent of UEFI, pretty much every system has a pstore. Too
bad that we have to limit it to 50% of size so that the boxes don't
brick. :-P

> Then we have multiple paths for the critical bits of the error log
> - and the tracepoints give us more details for the cases where the
> machine doesn't spontaneously explode.

Ok, let's sort:

* First we have the not-so-critical hw error messages. We want to carry
those out-of-band, i.e. not in dmesg, so that people don't have to parse
and collect dmesg but have a specialized solution which gives them
structured logs, and so that tools can analyze, collect and ... those
errors.

* When a critical error happens, the above usage is not necessarily
advantageous anymore: in order to debug what caused the machine to
crash, we don't necessarily want only the crash message but also the
whole system activity that led to it.

In which case, we probably actually want to turn off/ignore the error
logging tracepoints and write *only* to dmesg which goes out over serial
and to pstore. Right?

Because in such cases I want to have *all* *relevant* messages that led
to the explosion + the explosion message itself.
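
The split proposed above can be sketched as a small routing function.
This is a toy model under stated assumptions, not kernel code: the
Severity enum, the sink names and the route() helper are hypothetical and
only encode the two cases from the list (non-critical errors go out-of-band
through tracepoints, critical ones go only to dmesg, which in turn reaches
serial and pstore).

```python
from enum import Enum


class Severity(Enum):
    NON_CRITICAL = 1  # e.g. a corrected hardware error
    CRITICAL = 2      # an error that takes the machine down


# Assumption for this sketch: whatever lands in dmesg also flows to
# the serial console and to pstore.
DMESG_FANOUT = frozenset({"dmesg", "serial", "pstore"})


def route(severity):
    """Return the set of sinks an error record would reach."""
    if severity is Severity.CRITICAL:
        # Critical path: skip the tracepoint and write only to dmesg.
        return set(DMESG_FANOUT)
    # Non-critical path: structured, out-of-band, not in dmesg.
    return {"tracepoint"}


critical_sinks = route(Severity.CRITICAL)
non_critical_sinks = route(Severity.NON_CRITICAL)
```

The two sets are disjoint in this model, which mirrors the question asked
above: in the critical case the error tracepoints are ignored entirely.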

Makes sense? Yes, no? Aspects I've missed?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.