From: "Luck, Tony" <tony.luck@intel.com>
To: Greg KH <gregkh@linuxfoundation.org>
Cc: "Joseph, Jithu" <jithu.joseph@intel.com>,
"hdegoede@redhat.com" <hdegoede@redhat.com>,
"markgross@kernel.org" <markgross@kernel.org>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"mingo@redhat.com" <mingo@redhat.com>,
"bp@alien8.de" <bp@alien8.de>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"x86@kernel.org" <x86@kernel.org>,
"hpa@zytor.com" <hpa@zytor.com>,
"corbet@lwn.net" <corbet@lwn.net>,
"andriy.shevchenko@linux.intel.com"
<andriy.shevchenko@linux.intel.com>,
"Raj, Ashok" <ashok.raj@intel.com>,
"rostedt@goodmis.org" <rostedt@goodmis.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"platform-driver-x86@vger.kernel.org"
<platform-driver-x86@vger.kernel.org>,
"patches@lists.linux.dev" <patches@lists.linux.dev>,
"Shankar, Ravi V" <ravi.v.shankar@intel.com>,
"Williams, Dan J" <dan.j.williams@intel.com>
Subject: RE: [RFC 00/10] Introduce In Field Scan driver
Date: Tue, 15 Mar 2022 16:10:59 +0000
Message-ID: <de895b9617aa412e95fdd14fcad285fa@intel.com>
In-Reply-To: <YjCwI4N00reBuIqA@kroah.com>
> Again, I have no idea what you are doing at all with this driver, nor
> what you want to do with it.
>
> Start over please.
TL;DR: silicon ages, and some of the things that break don't have parity/ECC checks.
So systems start behaving erratically. If you are lucky they crash. If you are less lucky
they give incorrect results.
There's a paper (and even an 11-minute video) describing Google's research
on this:
https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s01-hochschild.pdf
(https://www.youtube.com/watch?v=QMF3rqhjYuM)
> What is the hardware you have to support?
The feature is first available in Sapphire Rapids (Xeon, coming later this year).
> What is the expectation from userspace with regards to using the
> hardware?
The expectation from users is that they can run these tests frequently (many times
per day) to quickly catch silicon that has developed faults, and then take action to
isolate the cores that have issues.
On HT-enabled systems, both threads that share a core need to be put into
test mode together. The current version of the tests takes around 50 milliseconds,
so many workloads don't need much prep. Those with high sensitivity to latency
would need some additional userspace task binding to make sure they are moved
to another core while the h/w test runs.
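As a rough illustration of the "task binding" prep mentioned above, here is a minimal userspace sketch. It assumes the standard sysfs topology file thread_siblings_list is used to find the threads that share a core; the helper names (parse_cpu_list, core_siblings, cpus_outside_core) are made up for this example and are not part of the IFS driver.

```python
from pathlib import Path

# Standard Linux sysfs topology file; lists the logical CPUs sharing a core.
SIBLINGS = "/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0,56" or "0-3,8" into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def core_siblings(cpu):
    """Return every logical CPU that shares a physical core with `cpu`."""
    return parse_cpu_list(Path(SIBLINGS.format(cpu=cpu)).read_text())


def cpus_outside_core(all_cpus, busy_core):
    """CPUs a latency-sensitive task may stay on while `busy_core` is tested."""
    return sorted(set(all_cpus) - set(busy_core))
```

A latency-sensitive process could then be rebound with something like os.sched_setaffinity(pid, cpus_outside_core(os.sched_getaffinity(pid), core_siblings(cpu))) before the test is started on that core, and restored afterwards.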
There are three outcomes from running a test:
1) The test passes all stages.
2) The test did not complete (for a variety of reasons, e.g. power states).
3) The test indicates failure. The recommendation is to run it one more time in case
the failure was transient, e.g. caused by a neutron/alpha strike.
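
The three outcomes and the "run one more time" recommendation could be handled in userspace roughly as sketched below. This is only an illustration: the Outcome names, the returned verdict strings, and the run_test callable are hypothetical stand-ins, not the driver's actual interface.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"          # 1) all stages passed
    UNTESTED = "untested"  # 2) did not complete (e.g. power states)
    FAIL = "fail"          # 3) test indicated failure


def assess_core(run_test, core):
    """Run the scan on `core` and apply the retry-once recommendation.

    `run_test` is a hypothetical callable returning an Outcome; it is a
    stand-in for however the test is actually triggered.
    """
    first = run_test(core)
    if first is Outcome.PASS:
        return "healthy"
    if first is Outcome.UNTESTED:
        return "retry-later"
    # First run failed: run one more time in case the failure was transient.
    second = run_test(core)
    if second is Outcome.FAIL:
        return "isolate"
    if second is Outcome.PASS:
        return "transient"
    return "retry-later"
```

Only a repeated failure leads to the "isolate" verdict here; what isolating a core means in practice (offlining it, draining workloads) is left to site policy.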
-Tony