From: Mauro Carvalho Chehab <mchehab@redhat.com>
To: linux-edac@vger.kernel.org
Cc: LKML <linux-kernel@vger.kernel.org>,
Steven Rostedt <srostedt@redhat.com>,
Tony Luck <tony.luck@intel.com>, Borislav Petkov <bp@alien8.de>,
"Mark A. Grondona" <mgrondona@llnl.gov>
Subject: [ANNOUNCE] rasdaemon userspace tool v.0.1
Date: Thu, 14 Mar 2013 15:38:44 -0300 [thread overview]
Message-ID: <20130314153844.158d817c@redhat.com> (raw)
In Kernel 3.5 we've started to address the long-discussed need of having a
better way to handle platform Reliability, Availability and Serviceability
(RAS).
Basically, a tracepoint event that handles memory errors called ras:mc_event
was added there, together with HERM/EDAC version 3.0 patches.
In Kernel 3.8, a new event was added, to handle PCIe AER events (ras:aer_event)
[1].
On kernel 3.9, a new driver was added to report hardware memory errors that
comes from the BIOS via ras:mc_event (the new ghes_edac driver).
It is still on my TODO list to add a RAS trace event for non-memory related
errors that come via the MCA machine check handler (mcelog).
While progress made was made at Kernel infrastructure, the needed userspace
tools were still lacking. So, I decided to start materializing the userspace
counterpart for what it was informally named as rasdaemon on some discussions.
The rasdaemon tool is available at:
http://git.infradead.org/users/mchehab/rasdaemon.git
The current version is on very early stages, and it has a copy on it of
the library that Steven Rostedt's is writing for trace-cmd tool. The plan
is to use the trace-cmd library, when it starts to packaged as a separate
library. I'd like to thanks Steven for the help he gave me to write this
initial version.
The current version of the tool enables the ras:mc_event log, and reads
it via the raw trace debugfs node:
/sys/kernel/debug/tracing/per_cpu/cpu*/trace_pipe_raw
It also has a code that allows recording the errors via an sqlite3 database.
The long term plan is to provide a tool that will catch and handle all
ras:* error events that comes from the Kernel tracing infrastructure,
logging them and providing tools to report it, being able to detect burst
errors (like the ones caused by a solar storm at memories) or sparsed errors,
in a way that would provide a glue to the users about the root cause of the
error.
Of course, there are much to do there.
It is a natural evolution of the tool to add support there for the
ras:aer_event traces that can come from PCIe AER.
While it currently works with current Kernels since kernel 3.5, there are a
number of interesting changes at tracing that are planned to be merged for
Kernel 3.10:
- poll() support for per_cpu trace_pipe_raw;
- a timestamp that could more easily associated with machine's
uptime;
- support for a separate ringbuffer for RAS.
So, it is planned the minimal requirement for the final version (v1.0)
would be kernel 3.10.
This is currently on very early staging. Help is needed ;)
So, please send us suggestions, patches etc to the EDAC mailing list:
linux-edac@vger.kernel.org
Thanks!
Mauro
-
[1] Currently, ras:mc_event is at include/ras/. It is on my todo list
to move it to be together with ras:aer_event, at include/trace/events/ras.h.
next reply other threads:[~2013-03-14 18:39 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-03-14 18:38 Mauro Carvalho Chehab [this message]
2013-03-15 7:18 ` [ANNOUNCE] rasdaemon userspace tool v.0.1 Borislav Petkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130314153844.158d817c@redhat.com \
--to=mchehab@redhat.com \
--cc=bp@alien8.de \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mgrondona@llnl.gov \
--cc=srostedt@redhat.com \
--cc=tony.luck@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox