* How to identify the NMI source in ghes.c? For example, knowing that an NMI error is caused by a PCIe error
From: Lin-Bao Zhang @ 2012-01-17 3:17 UTC
To: Bjorn Helgaas, ying.huang, yanmin.zhang
Cc: linux-kernel, linux-acpi, linux-kernel
I am sorry again: my last email could not be delivered to LKML and the
linux-acpi mailing list, probably because it was not in plain-text
format, so I am resending it in plain text.
Thanks very much!
I just want to know: in the upstream Linux kernel, can we determine the
source of an NMI when it occurs?
------------------------------------------------
Hi Huang and Bjorn,
> In firmware first mode (BIOS hold AER service control), AER will be reported via
> APEI HEST Generic Hardware Error Source, AER will be logged by kernel there.
> AER recovery can be triggered there too, but the code has not been merged by
> Linux kernel upstream yet.
>
> Best Regards,
> Huang Ying

In ghes.c, I saw this:

    static struct notifier_block ghes_notifier_nmi = {
            .notifier_call = ghes_notify_nmi,
    };
    ...
            /* here a handler is registered specifically for NMI */
            case ACPI_HEST_NOTIFY_NMI:
                    mutex_lock(&ghes_list_mutex);
                    if (list_empty(&ghes_nmi))
                            register_die_notifier(&ghes_notifier_nmi);
                    ...
a) Can ghes.c distinguish the source of an NMI? Several sources can
trigger an NMI, so how do we know which one it was, for example memory
corruption or a PCIe error? For PCIe errors in particular, we want to
do more work.
If we know the NMI source, we can take different actions for different
NMI errors, although they all share the same final step: rebooting the
machine.

b) Is your code still under development? When do you plan to submit it?

c) Could we add code to ghes_notify_nmi() to distinguish the NMI
source? Or is distinguishing the NMI source a vendor-specific issue?
For HP ProLiant servers before Gen8, we provide a driver, "hpwdt.c",
that checks the NMI source through the CRU interface.
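A minimal user-space sketch of one possible answer to a) and c), assuming the firmware reports through a GHES using the UEFI Common Platform Error Record (CPER) section format: each Generic Error Data Entry in the error status block starts with a Section Type GUID, and comparing it against the PCI Express and platform memory section GUIDs tells those two kinds apart. The helper classify_section() and the err_kind enum are hypothetical names; the GUID byte order follows the kernel's UUID_LE() convention.

```c
#include <stdint.h>
#include <string.h>

/* CPER section type GUIDs in on-the-wire byte order: the first three GUID
 * fields are little-endian, as with the kernel's UUID_LE(). */
static const uint8_t SEC_PCIE[16] = {         /* D995E954-BBC1-430F-AD91-B44DCB3C6F35 */
    0x54, 0xE9, 0x95, 0xD9, 0xC1, 0xBB, 0x0F, 0x43,
    0xAD, 0x91, 0xB4, 0x4D, 0xCB, 0x3C, 0x6F, 0x35 };
static const uint8_t SEC_PLATFORM_MEM[16] = { /* A5BC1114-6F64-4EDE-B863-3E83ED7C83B1 */
    0x14, 0x11, 0xBC, 0xA5, 0x64, 0x6F, 0xDE, 0x4E,
    0xB8, 0x63, 0x3E, 0x83, 0xED, 0x7C, 0x83, 0xB1 };

enum err_kind { ERR_PCIE, ERR_MEMORY, ERR_OTHER };

/* Hypothetical helper: map a Section Type GUID to a coarse error kind. */
enum err_kind classify_section(const uint8_t section_type[16])
{
    if (memcmp(section_type, SEC_PCIE, 16) == 0)
        return ERR_PCIE;
    if (memcmp(section_type, SEC_PLATFORM_MEM, 16) == 0)
        return ERR_MEMORY;
    return ERR_OTHER;  /* processor, firmware-specific, unknown, ... */
}
```

Other CPER section types (for example processor errors) could be added to the same comparison chain.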
What is the relationship between GHES and the HEST table? My feeling is
that HEST is just a table and GHES is just a method: firmware stores all
error information in the HEST table, and GHES is the firmware interface
exposed to OSPM for parsing this table. What does "generic" mean in
"GHES"? I guess it denotes common code, and each vendor needs to
implement its own method hooked after the generic code? For example,
for HP machines, must we implement special code to get the error
source?
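For reference, as I read ACPI 5.0 section 18.3.2, HEST itself lists error sources rather than holding the error data; a GHES is one kind of HEST entry (type 9) whose Error Status Address leads to an Error Status Block that firmware fills in when an error occurs. The sketch below selects the GHES entries whose notification type is NMI, which roughly corresponds to what the case ACPI_HEST_NOTIFY_NMI branch in ghes.c sets up. struct hest_entry and collect_nmi_ghes() are hypothetical, simplified stand-ins, not the real ACPICA structures.

```c
#include <stddef.h>
#include <stdint.h>

/* A few HEST error source structure types from ACPI 5.0. */
enum { HEST_TYPE_IA32_NMI = 2, HEST_TYPE_AER_ROOT_PORT = 6,
       HEST_TYPE_GENERIC_ERROR = 9 };

/* GHES notification types (the kernel's ACPI_HEST_NOTIFY_* values). */
enum { NOTIFY_POLLED = 0, NOTIFY_EXTERNAL = 1, NOTIFY_LOCAL = 2,
       NOTIFY_SCI = 3, NOTIFY_NMI = 4 };

/* Hypothetical, already-parsed view of one HEST entry. */
struct hest_entry {
    uint16_t type;                 /* error source structure type */
    uint16_t source_id;            /* unique ID of this error source */
    uint8_t notify_type;           /* only meaningful when type == 9 */
    uint64_t error_status_address; /* real table: a Generic Address Structure
                                      for a register holding the physical
                                      address of the Error Status Block */
};

/* Store the Source IDs of GHES entries notified via NMI (at most max of
 * them) and return how many were stored. */
size_t collect_nmi_ghes(const struct hest_entry *hest, size_t n,
                        uint16_t *out, size_t max)
{
    size_t count = 0;
    for (size_t i = 0; i < n && count < max; i++) {
        if (hest[i].type == HEST_TYPE_GENERIC_ERROR &&
            hest[i].notify_type == NOTIFY_NMI)
            out[count++] = hest[i].source_id;
    }
    return count;
}
```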
d) In http://lwn.net/Articles/368119/ you said: "APEI stands for ACPI
Platform Error Interface, which allows to report errors (for example
from chipset) to the operating system. This improves NMI handling
especially. In addition it supports error serialization and error
injection." Why did you say "This improves NMI handling especially"?
How do HEST and GHES improve NMI handling? Could you share your
comments? Thanks very much!
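One plausible reading of "improves NMI handling", sketched under assumptions: a bare NMI carries no information about who raised it, whereas each NMI-notified GHES has a Block Status word the handler can inspect. If a valid bit is set, the NMI is explained by a firmware-reported hardware error; if every source reads zero, the NMI came from elsewhere. The sketch decodes the Block Status bit fields as laid out in the ACPI 5.0 Generic Error Status Block; struct block_status, decode_block_status() and block_has_error() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Block Status field of an ACPI 5.0 Generic Error Status Block. */
struct block_status {
    bool uncorrectable;        /* bit 0: Uncorrectable Error Valid */
    bool correctable;          /* bit 1: Correctable Error Valid */
    bool multi_uncorrectable;  /* bit 2: Multiple Uncorrectable Errors */
    bool multi_correctable;    /* bit 3: Multiple Correctable Errors */
    unsigned entry_count;      /* bits 4-13: Error Data Entry Count */
};

struct block_status decode_block_status(uint32_t raw)
{
    struct block_status s = {
        .uncorrectable       = raw & 0x1,
        .correctable         = (raw >> 1) & 0x1,
        .multi_uncorrectable = (raw >> 2) & 0x1,
        .multi_correctable   = (raw >> 3) & 0x1,
        .entry_count         = (raw >> 4) & 0x3FF,
    };
    return s;
}

/* True if this source reports a valid error, i.e. it can explain the NMI. */
bool block_has_error(uint32_t raw)
{
    struct block_status s = decode_block_status(raw);
    return s.uncorrectable || s.correctable;
}
```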
e) About the Source ID and NMI errors: here are some of my thoughts on
how to identify the NMI source.
From ACPI 5.0, section 18.3.2.6 (Generic Hardware Error Source), it
seems that the NMI handler should read the error status block to learn
the error source. From section 18.4 (Firmware First Error Handling), it
seems that the NMI handler can learn the original Source ID. But can we
tell from this Source ID whether, for example, the error is a PCI error
or some other error? Is the Source ID really the only thing we can use
to identify the NMI source?
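On the Source ID question: as I understand the spec, the Source ID only identifies which GHES reported, while the kind of error is carried by the Section Type GUID of each Generic Error Data Entry inside that source's error status block. A sketch, assuming the ACPI 5.0 layouts (a 20-byte status block header followed by entries with 64-byte headers) and little-endian fields; list_entries() is a hypothetical helper that collects a pointer to each entry's Section Type.

```c
#include <stddef.h>
#include <stdint.h>

#define ESTATUS_HDR_LEN 20 /* Block Status, Raw Data Offset, Raw Data Length,
                              Data Length, Error Severity: 5 x u32 */
#define GDATA_HDR_LEN   64 /* ACPI 5.0 Generic Error Data Entry header */

/* Read a little-endian u32 from a possibly unaligned buffer. */
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Walk the data entries of one status block, storing a pointer to each
 * entry's 16-byte Section Type (first max entries only). Returns the number
 * of entries found, or -1 if a length field would overrun the buffer. */
int list_entries(const uint8_t *blk, size_t blk_len,
                 const uint8_t **types, size_t max)
{
    if (blk_len < ESTATUS_HDR_LEN)
        return -1;
    uint32_t data_len = rd32(blk + 12);         /* Data Length */
    if (data_len > blk_len - ESTATUS_HDR_LEN)
        return -1;

    const uint8_t *p = blk + ESTATUS_HDR_LEN;
    size_t left = data_len;
    int n = 0;
    while (left >= GDATA_HDR_LEN) {
        uint32_t err_len = rd32(p + 24);        /* Error Data Length */
        if (err_len > left - GDATA_HDR_LEN)
            return -1;
        if ((size_t)n < max)
            types[n] = p;                       /* Section Type at offset 0 */
        n++;
        p += GDATA_HDR_LEN + err_len;
        left -= GDATA_HDR_LEN + err_len;
    }
    return n;
}
```

Each returned pointer can then be compared against known section GUIDs (PCIe, memory, ...) to decide what kind of error the source reported.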
From section 18.4.1, "Example: Firmware First Handling Using NMI
Notification", I feel that our ghes_notify_nmi() should do similar work:
"OSPM NMI handler scans the list of generic error sources to find the
error source that reported the error and processes the error report".
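The 18.4.1 flow quoted above could be modeled like this in user space. It is only a sketch: struct ghes_src and scan_sources() are hypothetical stand-ins for mapped error sources, and acknowledging a block by writing zero to its Block Status is an assumption about the handshake that should be checked against the spec and the platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one NMI-notified GHES: its Source ID and the Block
 * Status word of the status block firmware writes (the rest is omitted). */
struct ghes_src {
    uint16_t source_id;
    volatile uint32_t *block_status;
};

enum nmi_result { NMI_NOT_OURS, NMI_HANDLED };

/* Scan every source, record the Source IDs of those with a pending error
 * (up to max of them), and acknowledge each by zeroing its Block Status.
 * If no source reported anything, the NMI came from somewhere else. */
enum nmi_result scan_sources(struct ghes_src *srcs, size_t n,
                             uint16_t *reporting, size_t max, size_t *found)
{
    *found = 0;
    for (size_t i = 0; i < n; i++) {
        if (*srcs[i].block_status == 0)
            continue;                    /* nothing pending for this source */
        if (*found < max)
            reporting[*found] = srcs[i].source_id;
        (*found)++;
        /* here the error record would be parsed and logged */
        *srcs[i].block_status = 0;       /* assumed acknowledgement */
    }
    return *found ? NMI_HANDLED : NMI_NOT_OURS;
}
```

Returning NMI_NOT_OURS leaves the NMI for other handlers (a watchdog driver such as hpwdt, for instance); as far as I can tell this mirrors a die notifier returning NOTIFY_DONE instead of NOTIFY_STOP.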
Thanks very much for your reply, and I am sorry for my poor English.
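--
Bob
"The Master said: Do not worry that others do not know you; worry that you do not know others." (Confucius, Analects)
"If not us, who? If not now, when?"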
* Re: How to identify the NMI source in ghes.c? For example, knowing that an NMI error is caused by a PCIe error
From: Lin-Bao Zhang @ 2012-01-17 3:58 UTC
To: ying.huang, linux-kernel, linux-acpi
> In firmware first mode (BIOS hold AER service control), AER will be reported via
> APEI HEST Generic Hardware Error Source, AER will be logged by kernel there.
> AER recovery can be triggered there too, but the code has not been merged by
> Linux kernel upstream yet.
Wow! I found your patches on LKML here:
http://www.spinics.net/lists/linux-acpi/msg34003.html
I checked the latest code at
http://lxr.linux.no/linux+v3.2.1/drivers/acpi/apei/ghes.c, and this
patch has not been merged upstream yet.
Another question:
Corrected errors are corrected by the hardware.
But for uncorrected errors, especially fatal errors, the machine needs
to be restarted immediately. Does the OS need to do different
preparation work for different errors (NMI sources) before restarting
the machine?
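One way to structure "different preparation per error type, same final reboot" is a small table keyed by error kind. Everything here is hypothetical: err_kind, prep_step and fatal_error_plan() are illustrative names, the step descriptions are placeholders, and in the kernel the final step would be a panic or reset rather than a string.

```c
#include <stddef.h>
#include <stdio.h>

enum err_kind { ERR_PCIE, ERR_MEMORY, ERR_OTHER };

/* Hypothetical kind-specific preparation steps. */
static const char *const prep_step[] = {
    [ERR_PCIE]   = "save PCIe AER status of the reporting device",
    [ERR_MEMORY] = "record failing memory address",
    [ERR_OTHER]  = "dump raw error record",
};

/* Write the ordered plan for a fatal error into buf: the kind-specific step
 * first, then the reboot shared by every kind. Returns snprintf's result. */
int fatal_error_plan(enum err_kind kind, char *buf, size_t len)
{
    if ((unsigned)kind > ERR_OTHER)  /* unknown kinds get the generic step */
        kind = ERR_OTHER;
    return snprintf(buf, len, "%s; then reboot", prep_step[kind]);
}
```

Keeping the common reboot step outside the table makes it impossible for a new error kind to skip it.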
Thanks.
-Bob Zhang