From: Mauro Carvalho Chehab <mchehab@redhat.com>
To: linux-edac@vger.kernel.org
Cc: LKML <linux-kernel@vger.kernel.org>,
Steven Rostedt <srostedt@redhat.com>,
Tony Luck <tony.luck@intel.com>, Borislav Petkov <bp@alien8.de>,
"Mark A. Grondona" <mgrondona@llnl.gov>
Subject: [ANNOUNCE] rasdaemon userspace tool v.0.1
Date: Thu, 14 Mar 2013 15:38:44 -0300
Message-ID: <20130314153844.158d817c@redhat.com>
In Kernel 3.5 we've started to address the long-discussed need of having a
better way to handle platform Reliability, Availability and Serviceability
(RAS).
Basically, a tracepoint event called ras:mc_event, which reports memory
errors, was added there, together with the HERM/EDAC version 3.0 patches.
In Kernel 3.8, a new event was added, to handle PCIe AER events (ras:aer_event)
[1].
In Kernel 3.9, a new driver was added to report hardware memory errors that
come from the BIOS via ras:mc_event (the new ghes_edac driver).
It is still on my TODO list to add a RAS trace event for non-memory related
errors that come via the MCA machine check handler (mcelog).
While progress was made on the Kernel infrastructure, the needed userspace
tools were still lacking. So, I decided to start materializing the userspace
counterpart, which was informally named rasdaemon in some discussions.
The rasdaemon tool is available at:
http://git.infradead.org/users/mchehab/rasdaemon.git
The current version is at a very early stage, and it carries a copy of
the library that Steven Rostedt is writing for the trace-cmd tool. The plan
is to use the trace-cmd library once it is packaged as a separate
library. I'd like to thank Steven for the help he gave me in writing this
initial version.
The current version of the tool enables the ras:mc_event log, and reads
it via the raw trace debugfs node:
/sys/kernel/debug/tracing/per_cpu/cpu*/trace_pipe_raw
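As a rough illustration of those two steps (this is not rasdaemon's own code, which is written in C on top of the trace-cmd library copy), a minimal Python sketch could look like the following. The function names and the tracing_root parameter are hypothetical; the sketch assumes debugfs is mounted at /sys/kernel/debug and that the event is switched on through the usual events/ras/mc_event/enable file. It only locates the per-CPU nodes; parsing the binary ring-buffer pages read from them is left out.

```python
import glob
import os

# Default debugfs mount point for tracing, matching the path quoted above.
TRACING_ROOT = "/sys/kernel/debug/tracing"


def enable_mc_event(tracing_root=TRACING_ROOT):
    """Turn on ras:mc_event by writing "1" to its enable file."""
    enable_path = os.path.join(
        tracing_root, "events", "ras", "mc_event", "enable")
    with open(enable_path, "w") as f:
        f.write("1")
    return enable_path


def per_cpu_raw_pipes(tracing_root=TRACING_ROOT):
    """Return the per-CPU trace_pipe_raw nodes, sorted by CPU number."""
    pattern = os.path.join(tracing_root, "per_cpu", "cpu*", "trace_pipe_raw")

    def cpu_number(path):
        # .../per_cpu/cpu12/trace_pipe_raw -> 12
        cpu_dir = os.path.basename(os.path.dirname(path))
        return int(cpu_dir[len("cpu"):])

    return sorted(glob.glob(pattern), key=cpu_number)
```

On a real system both functions need root privileges; passing a different tracing_root makes it possible to try them against a scratch directory.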
It also has code that records the errors in an sqlite3 database.
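To give an idea of what such recording might involve, here is a small sketch using Python's standard sqlite3 module. The table layout, column names and helper functions are assumptions made for illustration and are not the schema rasdaemon uses; the columns loosely mirror a few fields carried by ras:mc_event.

```python
import sqlite3

# Hypothetical schema: the columns loosely mirror some ras:mc_event
# fields, but this layout is an assumption, not rasdaemon's schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS mc_event (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT    NOT NULL,
    err_count INTEGER NOT NULL,
    err_type  TEXT    NOT NULL,
    err_msg   TEXT,
    label     TEXT,
    mc        INTEGER,
    address   INTEGER
)
"""

INSERT = """
INSERT INTO mc_event
    (timestamp, err_count, err_type, err_msg, label, mc, address)
VALUES
    (:timestamp, :err_count, :err_type, :err_msg, :label, :mc, :address)
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_mc_event(conn, event):
    """Store one decoded event (a dict with all seven keys) and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(INSERT, event)
    return cur.lastrowid
```

Using named placeholders keeps the event values out of the SQL text, so labels or messages containing quotes are stored safely.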
The long term plan is to provide a tool that will catch and handle all
ras:* error events that come from the Kernel tracing infrastructure,
logging them and providing tools to report them. It should be able to
distinguish burst errors (like the ones a solar storm can cause in memory)
from sparse errors, in a way that gives users a clue about the root cause
of the error.
Of course, there is much to do there.
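One possible way to tell burst errors from sparse ones is a sliding time window over the event timestamps. The sketch below is only an assumption about how this could be done; the class name, the 60-second window and the threshold of 5 events are hypothetical values, not anything the tool defines.

```python
from collections import deque


class BurstDetector:
    """Label each error as 'burst' or 'sparse' using a sliding time window."""

    def __init__(self, window_seconds=60.0, burst_threshold=5):
        # Both defaults are illustrative guesses, not tuned values.
        self.window_seconds = window_seconds
        self.burst_threshold = burst_threshold
        self._recent = deque()

    def observe(self, timestamp):
        """Record an error seen at `timestamp` (seconds) and classify it."""
        self._recent.append(timestamp)
        # Drop events that fell out of the window ending at `timestamp`.
        while timestamp - self._recent[0] > self.window_seconds:
            self._recent.popleft()
        if len(self._recent) >= self.burst_threshold:
            return "burst"
        return "sparse"
```

Timestamps are expected in non-decreasing order, for example seconds of uptime taken from the trace records.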
It is a natural evolution of the tool to add support there for the
ras:aer_event traces that can come from PCIe AER.
While it currently works with Kernels 3.5 and later, there are a number of
interesting tracing changes that are planned to be merged for
Kernel 3.10:
- poll() support for per_cpu trace_pipe_raw;
- a timestamp that can be more easily associated with the machine's
uptime;
- support for a separate ringbuffer for RAS.
So, the plan is that the minimal requirement for the final version (v1.0)
will be kernel 3.10.
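To illustrate why the poll() item in the list above matters: once it is available, a reader can sleep until any per-CPU pipe has data instead of dedicating a blocking read to each one. The following sketch assumes the planned kernel support is present; wait_for_data is a hypothetical helper name, and it works on any pollable file descriptors.

```python
import select


def wait_for_data(fds, timeout_ms=1000):
    """Wait until at least one descriptor is readable; return the ready ones."""
    poller = select.poll()
    for fd in fds:
        poller.register(fd, select.POLLIN)
    # poll() returns (fd, event mask) pairs; an empty list means timeout.
    return [fd for fd, events in poller.poll(timeout_ms)
            if events & select.POLLIN]
```

A caller would open each trace_pipe_raw node, pass the descriptors to this helper in a loop, and read only from the ones it returns.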
This is currently at a very early stage. Help is needed ;)
So, please send us suggestions, patches etc to the EDAC mailing list:
linux-edac@vger.kernel.org
Thanks!
Mauro
-
[1] Currently, ras:mc_event is at include/ras/. It is on my todo list
to move it to be together with ras:aer_event, at include/trace/events/ras.h.