Linux CXL
* CXL RAS flows on Linux
@ 2023-07-31  6:06 Parthasarathy, Mohan (HPC/AI and Labs)
From: Parthasarathy, Mohan (HPC/AI and Labs) @ 2023-07-31  6:06 UTC
  To: linux-cxl@vger.kernel.org

Hi all,

I am very interested in the RAS enablement for CXL on Linux. Is there a RAS project for CXL on Linux?

1)	Is there a design specification somewhere for the Linux RAS interfaces for CXL that I can read? Is there any document describing the correctable and uncorrectable error flows for CXL.mem?
2)	Are there any error injection tests and test cases that I can experiment with, using QEMU, to see the CXL RAS flows on Linux?

Any pointers for both would be very much appreciated. 

Thanks and Regards,
Mohan


* Re: CXL RAS flows on Linux
From: Jonathan Cameron @ 2023-07-31 11:35 UTC
  To: "Parthasarathy, Mohan (HPC/AI and Labs)" <mohan_parthasarathy
  Cc: linux-cxl@vger.kernel.org, shiju.jose

On Mon, 31 Jul 2023 06:06:27 +0000
"Parthasarathy, Mohan (HPC/AI and Labs)"         <mohan_parthasarathy@hpe.com> wrote:

> Hi all,

Hi Mohan,

Great to have more interest in this aspect.

> 
> I am very interested in the RAS enablement for CXL on Linux. Is there a RAS project for CXL on Linux?

I'm not aware of a separate project, just a mixture of work in the kernel and
standard tools such as rasdaemon. You'll find all the relevant material in the
linux-cxl archive: https://lore.kernel.org/linux-cxl/

We are definitely only part of the way there for RAS flows: we have error
reporting, but beyond that there is still a lot of work to do.

> 
> 1)	Is there a design specification somewhere for the Linux RAS interfaces for CXL that I can read? Is there any document describing the correctable and uncorrectable error flows for CXL.mem?

From the Linux side of things I'm not aware of any public docs (there will be
various internal ones at the companies that are contributing).


> 2)	Are there any error injection tests and test cases that I can experiment with, using QEMU, to see the CXL RAS flows on Linux?

The infrastructure is there, but we don't have any automated, scripted flows
yet. Note that some of this only went upstream recently, so you will want to
build directly from the master branch or wait for the next QEMU release in a few
weeks' time. My staging branch at gitlab.com/jic23/qemu (cxl-<date>, whichever
date is the latest) runs ahead of that for features, but I don't think there is
much RAS work in the queue currently.
https://gitlab.com/jic23/qemu/-/commits/cxl-2023-07-17/


Documentation is lagging as well, so most of the instructions are in the commit
messages, e.g. for poison:
https://gitlab.com/qemu-project/qemu/-/commits/master/hw/cxl
For DRAM event records etc.:
https://lore.kernel.org/linux-cxl/20230530133603.16934-1-Jonathan.Cameron@huawei.com/
The same goes for uncorrectable and correctable error injection. Those have been
in for a while, so the easiest reference is the CXL QAPI schema:
https://elixir.bootlin.com/qemu/v8.1.0-rc1/source/qapi/cxl.json
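
As a rough illustration of what those QAPI definitions look like on the wire,
here is a Python sketch that builds QMP command objects for poison, correctable
and uncorrectable error injection. It assumes the command and argument names in
qapi/cxl.json as of QEMU 8.1 (cxl-inject-poison, cxl-inject-correctable-error,
cxl-inject-uncorrectable-errors). The helper names and the QOM path
/machine/peripheral/cxl-pmem0 (a type 3 device started with id=cxl-pmem0) are
hypothetical, so check the schema of the QEMU build you actually run.

```python
import json

# Hypothetical helpers. Command and argument names follow qapi/cxl.json in
# QEMU 8.1; the helper names themselves are illustrative only.


def qmp_command(name, **arguments):
    """Serialise one QMP command as compact JSON."""
    cmd = {"execute": name}
    if arguments:
        cmd["arguments"] = arguments
    return json.dumps(cmd, separators=(",", ":"))


def inject_poison(path, start, length):
    """Poison a device physical address range (cxl-inject-poison)."""
    # Poison is tracked per 64-byte cacheline, so keep the range 64-byte
    # aligned (assumed to match the checks QEMU performs).
    if start % 64 or length % 64:
        raise ValueError("start and length must be multiples of 64 bytes")
    return qmp_command("cxl-inject-poison", path=path, start=start, length=length)


def inject_correctable(path, err_type):
    """Inject one AER correctable error, e.g. err_type='mem-data-ecc'."""
    return qmp_command("cxl-inject-correctable-error", path=path, type=err_type)


def inject_uncorrectable(path, errors):
    """Inject uncorrectable errors; errors is a list of (type, header) pairs,
    where header is a list of 32-bit header-log words."""
    records = [{"type": t, "header": list(h)} for t, h in errors]
    return qmp_command("cxl-inject-uncorrectable-errors", path=path, errors=records)


if __name__ == "__main__":
    dev = "/machine/peripheral/cxl-pmem0"  # assumed QOM path of a type 3 device
    print(inject_poison(dev, 0x1000, 64))
    print(inject_correctable(dev, "mem-data-ecc"))
    print(inject_uncorrectable(dev, [("internal", [1, 2, 4])]))
```

Each string is one QMP command; it still has to be sent over a QMP socket after
the qmp_capabilities handshake.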

For now, upstream rasdaemon support is lagging, though you can see that the RAS
events are there, and there is a pull request for the various event-queue-based
reports:
https://github.com/mchehab/rasdaemon/commits/master
https://github.com/mchehab/rasdaemon/pull/104
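
Until rasdaemon catches up, one way to confirm the events are being emitted is
to enable the kernel's CXL tracepoints directly through tracefs. This sketch
assumes tracefs is mounted at /sys/kernel/tracing and that the CXL driver's
events live in the "cxl" event group (names such as cxl_aer_uncorrectable_error
or cxl_poison, which may differ between kernel versions), so it discovers
whatever events exist rather than hard-coding them. The function names are
hypothetical, and writing the enable files needs root.

```python
from pathlib import Path

# Assumption: tracefs is mounted at /sys/kernel/tracing (older systems may
# only have /sys/kernel/debug/tracing) and CXL events are in the "cxl" group.
DEFAULT_TRACEFS = "/sys/kernel/tracing"


def list_cxl_events(tracefs=DEFAULT_TRACEFS):
    """Return the names of the CXL trace events this kernel exposes."""
    group = Path(tracefs) / "events" / "cxl"
    if not group.is_dir():
        return []
    # Each event is a subdirectory containing its own 'enable' file.
    return sorted(p.name for p in group.iterdir() if (p / "enable").is_file())


def enable_cxl_events(tracefs=DEFAULT_TRACEFS, names=None):
    """Enable the given CXL events (all discovered ones by default)."""
    group = Path(tracefs) / "events" / "cxl"
    enabled = []
    for name in names if names is not None else list_cxl_events(tracefs):
        (group / name / "enable").write_text("1\n")
        enabled.append(name)
    return enabled
```

After enabling, reading /sys/kernel/tracing/trace_pipe while injecting errors
from QEMU should show the raw records, independent of rasdaemon.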

Injection is all done via the QMP interface qemu provides.
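
For anyone new to QMP: it is line-oriented JSON over a socket. QEMU sends a
greeting, the client must send qmp_capabilities before anything else, and each
command gets a "return" or "error" reply, possibly interleaved with asynchronous
"event" messages. Below is a minimal client sketch using only the Python
standard library. It assumes QEMU was started with a UNIX-socket monitor such as
-qmp unix:/tmp/qmp.sock,server=on,wait=off (the socket path is an assumption).
The QMPClient class is hypothetical and skips things a real client, such as
QEMU's own qemu.qmp Python package, handles properly.

```python
import json
import socket


class QMPClient:
    """Hypothetical minimal QMP client; no timeouts or reconnect logic."""

    def __init__(self, sock):
        self.sock = sock
        # QEMU ends each JSON message with a newline, so read line by line.
        self.reader = sock.makefile("r", encoding="utf-8")

    @classmethod
    def connect(cls, path):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(path)
        client = cls(sock)
        client.negotiate()
        return client

    def negotiate(self):
        """Consume the greeting and leave capabilities-negotiation mode."""
        greeting = self._read_message()
        if "QMP" not in greeting:
            raise ConnectionError(f"unexpected greeting: {greeting}")
        self.execute("qmp_capabilities")
        return greeting

    def _read_message(self):
        line = self.reader.readline()
        if not line:
            raise ConnectionError("QMP socket closed")
        return json.loads(line)

    def execute(self, name, **arguments):
        """Send one command and return the value of its 'return' reply."""
        cmd = {"execute": name}
        if arguments:
            cmd["arguments"] = arguments
        self.sock.sendall((json.dumps(cmd) + "\n").encode("utf-8"))
        while True:
            msg = self._read_message()
            if "event" in msg:
                continue  # asynchronous event, not our reply; dropped here
            if "error" in msg:
                raise RuntimeError(f"{name} failed: {msg['error'].get('desc')}")
            return msg["return"]

    def close(self):
        self.reader.close()
        self.sock.close()
```

For example, QMPClient.connect("/tmp/qmp.sock").execute("cxl-inject-correctable-error",
path="/machine/peripheral/cxl-pmem0", type="mem-data-ecc") would inject a single
correctable error, assuming that socket and device path exist.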

I've not used it but I gather https://github.com/pmem/run_qemu is useful
for bringing up suitable qemu configs to poke.

If no events are coming through, check that internal errors aren't masked in
AER, as I don't think we've fully resolved how to control that masking in the
kernel yet.
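
To check the masking, you can read the AER Uncorrectable and Correctable Error
Mask registers from the device's config space. As I understand it, CXL.cache and
CXL.mem protocol errors are signalled through AER as Uncorrectable Internal Error
(bit 22 of the uncorrectable mask) and Corrected Internal Error (bit 14 of the
correctable mask), so those are the bits of interest. The sketch below walks the
PCIe extended capability list from offset 0x100 to find AER (capability ID
0x0001) and decodes those two bits; the function names are hypothetical.

```python
import struct

# Register offsets and bits from the PCIe spec (mirrored in Linux's
# include/uapi/linux/pci_regs.h).
PCI_EXT_CAP_ID_ERR = 0x0001       # AER extended capability ID
PCI_ERR_UNCOR_MASK = 0x08         # Uncorrectable Error Mask register
PCI_ERR_COR_MASK = 0x14           # Correctable Error Mask register
PCI_ERR_UNC_INTN = 1 << 22        # Uncorrectable Internal Error
PCI_ERR_COR_INTERNAL = 1 << 14    # Corrected Internal Error


def find_ext_cap(cfg, cap_id):
    """Return the offset of extended capability cap_id in cfg, or None."""
    offset, seen = 0x100, set()
    while offset and offset + 4 <= len(cfg) and offset not in seen:
        seen.add(offset)
        (header,) = struct.unpack_from("<I", cfg, offset)
        if header in (0, 0xFFFFFFFF):      # empty list or unreadable space
            return None
        if header & 0xFFFF == cap_id:
            return offset
        offset = (header >> 20) & 0xFFC     # next-capability pointer
    return None


def internal_errors_masked(cfg):
    """Report whether the AER internal-error bits are masked, or None if no
    AER capability is visible (e.g. config space read without root)."""
    aer = find_ext_cap(cfg, PCI_EXT_CAP_ID_ERR)
    if aer is None:
        return None
    (unc_mask,) = struct.unpack_from("<I", cfg, aer + PCI_ERR_UNCOR_MASK)
    (cor_mask,) = struct.unpack_from("<I", cfg, aer + PCI_ERR_COR_MASK)
    return {
        "uncorrectable_internal": bool(unc_mask & PCI_ERR_UNC_INTN),
        "corrected_internal": bool(cor_mask & PCI_ERR_COR_INTERNAL),
    }
```

Usage would be something like internal_errors_masked(open("/sys/bus/pci/devices/0000:0d:00.0/config", "rb").read()),
where the BDF is a placeholder to look up with lspci. Run it as root: without
root, sysfs only returns the first 64 bytes, so no AER capability is found and
the function returns None.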

Let me know how you get on. We should document this stuff better but
as ever there are too many things on the todo list :(

Jonathan

> 
> Any pointers for both would be very much appreciated. 
> 
> Thanks and Regards,
> Mohan



* RE: CXL RAS flows on Linux
From: Parthasarathy, Mohan (HPC/AI and Labs) @ 2023-08-02  5:17 UTC
  To: Jonathan Cameron,
	"Parthasarathy, Mohan (HPC/AI and Labs)" <mohan_parthasarathy@hpe.com>
  Cc: linux-cxl@vger.kernel.org, shiju.jose@huawei.com

Thanks, Jonathan. I will look into your suggestions and get back to you if I face any issues.

Regards,
Mohan

