From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Shiju Jose <shiju.jose@huawei.com>,
"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"tony.luck@intel.com" <tony.luck@intel.com>,
"rafael@kernel.org" <rafael@kernel.org>,
"lenb@kernel.org" <lenb@kernel.org>,
"mchehab@kernel.org" <mchehab@kernel.org>,
"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
"dave@stgolabs.net" <dave@stgolabs.net>,
"dave.jiang@intel.com" <dave.jiang@intel.com>,
"alison.schofield@intel.com" <alison.schofield@intel.com>,
"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
"ira.weiny@intel.com" <ira.weiny@intel.com>,
"david@redhat.com" <david@redhat.com>,
"Vilas.Sridharan@amd.com" <Vilas.Sridharan@amd.com>,
"leo.duran@amd.com" <leo.duran@amd.com>,
"Yazen.Ghannam@amd.com" <Yazen.Ghannam@amd.com>,
"rientjes@google.com" <rientjes@google.com>,
"jiaqiyan@google.com" <jiaqiyan@google.com>,
"Jon.Grimm@amd.com" <Jon.Grimm@amd.com>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"naoya.horiguchi@nec.com" <naoya.horiguchi@nec.com>,
"james.morse@arm.com" <james.morse@arm.com>,
"jthoughton@google.com" <jthoughton@google.com>,
"somasundaram.a@hpe.com" <somasundaram.a@hpe.com>,
"erdemaktas@google.com" <erdemaktas@google.com>,
"pgonda@google.com" <pgonda@google.com>,
"duenwen@google.com" <duenwen@google.com>,
"gthelen@google.com" <gthelen@google.com>,
"wschwartz@amperecomputing.com" <wschwartz@amperecomputing.com>,
"dferguson@amperecomputing.com" <dferguson@amperecomputing.com>,
"wbs@os.amperecomputing.com" <wbs@os.amperecomputing.com>,
"nifan.cxl@gmail.com" <nifan.cxl@gmail.com>,
tanxiaofei <tanxiaofei@huawei.com>,
"Zengtao (B)" <prime.zeng@hisilicon.com>,
"Roberto Sassu" <roberto.sassu@huawei.com>,
"kangkang.shen@futurewei.com" <kangkang.shen@futurewei.com>,
wanghuiqiang <wanghuiqiang@huawei.com>,
Linuxarm <linuxarm@huawei.com>
Subject: Re: [PATCH v18 04/19] EDAC: Add memory repair control feature
Date: Thu, 9 Jan 2025 18:34:48 +0000 [thread overview]
Message-ID: <20250109183448.000059ec@huawei.com> (raw)
In-Reply-To: <20250109161902.GDZ3_29rH-sQMV4n0N@fat_crate.local>
On Thu, 9 Jan 2025 17:19:02 +0100
Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Jan 09, 2025 at 04:01:59PM +0000, Jonathan Cameron wrote:
> > Ok. To me the fact it's not a single write was relevant. Seems not
> > in your mental model of how this works. For me a single write
> > that you cannot query back is fine, setting lots of parameters and
> > being unable to query any of them less so. I guess you disagree.
>
> Why can't you query it back?
>
> grep -r . /sysfs/dir/
>
> All files' values have been previously set and should still be there on
> a read, I'd strongly hope. Your ->read routines should give the values back.
Today you can. Seems we are talking cross purposes.
I'm confused. I thought your proposal was for "bank" attribute to present an
allowed range on read.
"bank" attribute is currently written to and read back as the value of the bank on which
to conduct a repair. Maybe this disconnect is down to the fact max_ and min_
attributes should have been marked as RO in the docs. They aren't controls,
just presentation of limits to userspace.
Was intent a separate bank_range type attribute rather than max_bank, min_bank?
One of those would be absolutely fine (similar to the _available attributes
in IIO - I added those years ago to meet a similar need and we've never had
any issues with those).
>
> > In interests of progress I'm not going to argue further. No one is
> > going to use this interface by hand anyway so the lost of useability
> > I'm seeing doesn't matter a lot.
>
> I had the suspicion that this user interface is not really going to be used by
> a user but by a tool. But then if you don't have a tool, you're lost.
>
> This is one of the reasons why you can control ftrace directly on the shell
> too - without a tool. This is very useful in certain cases where you cannot
> run some userspace tools.
I fully agree. What I was saying was in response to me thinking you wanted it
to not be possible to read back the user set values (overlapping uses of
single bank attribute which wasn't what you meant). That is useful for a user
wanting to do the cat /sys/... that you mention above, but not vital if they are
directly reading the tracepoints for the error records and poking the
sysfs interface.
Given it seems I misunderstood that suggestion, ignore my reply to that
as irrelevant.
>
> > In at least the CXL case I'm fairly sure most of them are not discoverable.
> > Until you see errors you have no idea what the memory topology is.
>
> Ok.
>
> > For that you'd need to have a path to read back what happened.
>
> So how is this scrubbing going to work? You get an error, you parse it for all
> the attributes and you go and write those attributes into the scrub interface
> and it starts scrubbing?
Repair not scrubbing. They are different things we should keep separate,
scrub corrects the value, if it can, but doesn't change the underlying memory to
new memory cells to avoid repeated errors. Replacing scrub with repair
(which I think was the intent here)...
You get error records that describe the error seen in hardware, write back the
values into this interface and tell it to repair the memory. This is not
necessarily a synchronous or immediate thing - instead typically based on
trend analysis.
As an example, the decision might be that bit of ram threw up 3 errors
over a month including multiple system reboots (for other reasons) and
that is over some threshold so we use a spare memory line to replace it.
>
> But then why do you even need the interface at all?
>
> Why can't the kernel automatically collect all those attributes and start the
> scrubbing automatically - no need for any user interaction...?
>
> So why do you *actually* even need user interaction here and why can't the
> kernel be smart enough to start the scrub automatically?
Short answer, it needs to be very smart and there isn't a case of one size
fits all - hence suggested approach of making it a user space problem.
There are hardware autonomous solutions and ones handled by host firmware.
That is how repair is done in many servers - at most software sees a slightly
latency spike as the memory is repaired under the hood. Some CXL devices
will do this as well. Those CXL devices may provide an additional repair
interface for the less clear cut decisions that need more data processing
/ analysis than the device firmware is doing. Other CXL devices will take
the view the OS is best placed to make all the decisions - those sometimes
will give a 'maintenance needed' indication in the error records but that
is still a hint the host may or may not take any notice of.
Given in the systems being considered here, software is triggering the repair,
we want to allow for policy in the decision. In simple cases we could push
that policy into the kernel e.g. just repair the moment we see an error record.
These repair resources are very limited in number, so immediately repairing
may a bad idea. We want to build up a history of errors before making
such a decision. That can be done in kernel.
The decision to repair memory is heavily influenced by policy and time considerations
against device resource constraints.
Some options that are hard to do in kernel.
1. Typical asynchronous error report for a corrected error.
Tells us memory had an error (perhaps from a scrubbing engine on the device
running checks). No need to take action immediately. Instead build up more data
over time and if lots of errors occur make decision to repair as no we are sure it
is worth doing rather than a single random event. We may tune scrubbing engines
to check this memory more frequently and adjust our data analysis to take that
into account for setting thresholds etc.
When an admin considers it a good time to take action, offline the memory and
repair before bringing it back into use (sometimes by rebooting the machine).
Sometimes repair can be triggered in a software transparent way, sometimes not.
This also applies to uncorrectable errors though in that case you can't necessarily
repair it without ever seeing a synchronous poison with all the impacts that has.
2. Soft repair across boots. We are actually storing the error records, then only
applying the fix on reboot before using the memory - so maintaining a list
of bad memory and saving it to a file to read back on boot. We could provide
another kernel interface to get this info and reinject it after reboot instead
of doing it in userspace but that is another ABI to design.
3. Complex policy across fleets. A lot of work is going on around prediction techniques
that may change the local policy on each node dependent on the overall reliability
patterns of a particular batch of devices and local characteristics, service guarantees
etc. If it is hard repair, then once you've run out you need schedule an engineer
out to replace the DIMM. All complex inputs to the decision.
Similar cases like CPU offlining on repeated errors are done in userspace (e.g.
RAS Daemon) for similar reasons of long term data gathering and potentially
complex algorithms.
>
> > Ok. Then can we just drop the range discoverability entirely or we go with
> > your suggestion and do not support read back of what has been
> > requested but instead have the reads return a range if known or "" /
> > return -EONOTSUPP if simply not known?
>
> Probably.
Too many options in the above paragraph so just to check... Probably to which?
If it's a separate attribute from the one we write the control so then
we do what is already done here and don't present the interface at all if
the range isn't discoverable.
>
> > I can live with that though to me we are heading in the direction of
> > a less intuitive interface to save a small number of additional files.
>
> This is not the point. I already alluded to this earlier - we're talking about
> a user visible interface which, once it goes out, it is cast in stone forever.
>
> So those files better have a good reason to exist...
>
> And if we're not sure yet, we can upstream only those which are fine now and
> then continue discussing the rest.
Ok. Best path is drop the available range support then (so no min_ max_ or
anything to replace them for now).
Added bonus is we don't have to rush this conversation and can make sure we
come to the right solution driven by use cases.
Jonathan
> HTH.
>
next prev parent reply other threads:[~2025-01-09 18:34 UTC|newest]
Thread overview: 87+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-06 12:09 [PATCH v18 00/19] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
2025-01-06 12:09 ` [PATCH v18 01/19] EDAC: Add support for EDAC device features control shiju.jose
2025-01-06 13:37 ` Borislav Petkov
2025-01-06 14:48 ` Shiju Jose
2025-01-13 15:06 ` Mauro Carvalho Chehab
2025-01-14 9:55 ` Jonathan Cameron
2025-01-14 10:08 ` Shiju Jose
2025-01-14 11:33 ` Mauro Carvalho Chehab
2025-01-30 19:18 ` Daniel Ferguson
2025-01-06 12:09 ` [PATCH v18 02/19] EDAC: Add scrub control feature shiju.jose
2025-01-06 15:57 ` Borislav Petkov
2025-01-06 19:34 ` Shiju Jose
2025-01-07 7:32 ` Borislav Petkov
2025-01-07 9:23 ` Shiju Jose
2025-01-08 15:47 ` Shiju Jose
2025-01-13 15:50 ` Mauro Carvalho Chehab
2025-01-30 19:18 ` Daniel Ferguson
2025-01-06 12:09 ` [PATCH v18 03/19] EDAC: Add ECS " shiju.jose
2025-01-13 16:09 ` Mauro Carvalho Chehab
2025-01-06 12:10 ` [PATCH v18 04/19] EDAC: Add memory repair " shiju.jose
2025-01-09 9:19 ` Borislav Petkov
2025-01-09 11:00 ` Shiju Jose
2025-01-09 12:32 ` Borislav Petkov
2025-01-09 14:24 ` Jonathan Cameron
2025-01-09 15:18 ` Borislav Petkov
2025-01-09 16:01 ` Jonathan Cameron
2025-01-09 16:19 ` Borislav Petkov
2025-01-09 18:34 ` Jonathan Cameron [this message]
2025-01-09 23:51 ` Dan Williams
2025-01-10 11:01 ` Jonathan Cameron
2025-01-10 22:49 ` Dan Williams
2025-01-13 11:40 ` Jonathan Cameron
2025-01-14 19:35 ` Dan Williams
2025-01-15 10:07 ` Jonathan Cameron
2025-01-15 11:35 ` Mauro Carvalho Chehab
2025-01-11 17:12 ` Borislav Petkov
2025-01-13 11:07 ` Jonathan Cameron
2025-01-21 16:16 ` Borislav Petkov
2025-01-21 18:16 ` Jonathan Cameron
2025-01-22 19:09 ` Borislav Petkov
2025-02-06 13:39 ` Jonathan Cameron
2025-02-17 13:23 ` Borislav Petkov
2025-02-18 16:51 ` Jonathan Cameron
2025-02-19 18:45 ` Borislav Petkov
2025-02-20 12:19 ` Jonathan Cameron
2025-01-14 13:10 ` Mauro Carvalho Chehab
2025-01-14 12:57 ` Mauro Carvalho Chehab
2025-01-14 12:38 ` Mauro Carvalho Chehab
2025-01-14 13:05 ` Jonathan Cameron
2025-01-14 14:39 ` Mauro Carvalho Chehab
2025-01-14 11:47 ` Mauro Carvalho Chehab
2025-01-14 12:31 ` Shiju Jose
2025-01-14 14:26 ` Mauro Carvalho Chehab
2025-01-14 13:47 ` Mauro Carvalho Chehab
2025-01-14 14:30 ` Shiju Jose
2025-01-15 12:03 ` Mauro Carvalho Chehab
2025-01-06 12:10 ` [PATCH v18 05/19] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
2025-01-21 23:01 ` Daniel Ferguson
2025-01-22 15:38 ` Shiju Jose
2025-01-30 19:19 ` Daniel Ferguson
2025-01-06 12:10 ` [PATCH v18 06/19] ras: mem: Add memory " shiju.jose
2025-01-21 23:01 ` Daniel Ferguson
2025-01-30 19:19 ` Daniel Ferguson
2025-01-06 12:10 ` [PATCH v18 07/19] cxl: Refactor user ioctl command path from mds to mailbox shiju.jose
2025-01-06 12:10 ` [PATCH v18 08/19] cxl: Add skeletal features driver shiju.jose
2025-01-06 12:10 ` [PATCH v18 09/19] cxl: Enumerate feature commands shiju.jose
2025-01-06 12:10 ` [PATCH v18 10/19] cxl: Add Get Supported Features command for kernel usage shiju.jose
2025-01-06 12:10 ` [PATCH v18 11/19] cxl: Add features driver attribute to emit number of features supported shiju.jose
2025-01-06 12:10 ` [PATCH v18 12/19] cxl/mbox: Add GET_FEATURE mailbox command shiju.jose
2025-01-06 12:10 ` [PATCH v18 13/19] cxl/mbox: Add SET_FEATURE " shiju.jose
2025-01-06 12:10 ` [PATCH v18 14/19] cxl: Setup exclusive CXL features that are reserved for the kernel shiju.jose
2025-01-06 12:10 ` [PATCH v18 15/19] cxl/memfeature: Add CXL memory device patrol scrub control feature shiju.jose
2025-01-24 20:38 ` Dan Williams
2025-01-27 10:06 ` Jonathan Cameron
2025-01-27 12:53 ` Shiju Jose
2025-01-27 23:17 ` Dan Williams
2025-01-29 12:28 ` Shiju Jose
2025-01-06 12:10 ` [PATCH v18 16/19] cxl/memfeature: Add CXL memory device ECS " shiju.jose
2025-01-06 12:10 ` [PATCH v18 17/19] cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command shiju.jose
2025-01-06 12:10 ` [PATCH v18 18/19] cxl/memfeature: Add CXL memory device soft PPR control feature shiju.jose
2025-01-06 12:10 ` [PATCH v18 19/19] cxl/memfeature: Add CXL memory device memory sparing " shiju.jose
2025-01-13 14:46 ` [PATCH v18 00/19] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers Mauro Carvalho Chehab
2025-01-13 15:36 ` Jonathan Cameron
2025-01-14 14:06 ` Mauro Carvalho Chehab
2025-01-13 18:15 ` Shiju Jose
2025-01-30 19:18 ` Daniel Ferguson
2025-02-03 9:25 ` Shiju Jose
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250109183448.000059ec@huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=Jon.Grimm@amd.com \
--cc=Vilas.Sridharan@amd.com \
--cc=Yazen.Ghannam@amd.com \
--cc=alison.schofield@intel.com \
--cc=bp@alien8.de \
--cc=dan.j.williams@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=david@redhat.com \
--cc=dferguson@amperecomputing.com \
--cc=duenwen@google.com \
--cc=erdemaktas@google.com \
--cc=gthelen@google.com \
--cc=ira.weiny@intel.com \
--cc=james.morse@arm.com \
--cc=jiaqiyan@google.com \
--cc=jthoughton@google.com \
--cc=kangkang.shen@futurewei.com \
--cc=lenb@kernel.org \
--cc=leo.duran@amd.com \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxarm@huawei.com \
--cc=mchehab@kernel.org \
--cc=naoya.horiguchi@nec.com \
--cc=nifan.cxl@gmail.com \
--cc=pgonda@google.com \
--cc=prime.zeng@hisilicon.com \
--cc=rafael@kernel.org \
--cc=rientjes@google.com \
--cc=roberto.sassu@huawei.com \
--cc=shiju.jose@huawei.com \
--cc=somasundaram.a@hpe.com \
--cc=tanxiaofei@huawei.com \
--cc=tony.luck@intel.com \
--cc=vishal.l.verma@intel.com \
--cc=wanghuiqiang@huawei.com \
--cc=wbs@os.amperecomputing.com \
--cc=wschwartz@amperecomputing.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox