Linux CXL
From: Alejandro Lucero Palau <alucerop@amd.com>
To: Srirangan Madhavan <smadhavan@nvidia.com>,
	Ira Weiny <ira.weiny@intel.com>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>
Cc: Vishal Aslot <vaslot@nvidia.com>
Subject: Re: [RFC PATCH v1] cxl: add support for cxl reset
Date: Tue, 24 Dec 2024 10:03:03 +0000
Message-ID: <5685ba8b-128e-398b-32da-bfe71c678c29@amd.com>
In-Reply-To: <SJ2PR12MB7963AC888E432469404E4FC8C3022@SJ2PR12MB7963.namprd12.prod.outlook.com>


On 12/23/24 23:49, Srirangan Madhavan wrote:
>>>> What happens if there are current cxl regions mapped to the device being
>>>> reset?  I don't think it is enough to flush the caches.  Section 9.7
>>>> talks about system software requirements for the HDMs.  How are those
>>>> requirements met with this patch?
>>> Considering different platforms might have specific operations to
>>> effectively clear the regions, some type of optional helper routines in the
>>> CXL core would be appropriate.  This way if required, a Type 2 device driver
>>> for example can choose to use it if required.
>> There are a couple of ways to go here.  For memory devices I think this is
>> on the user.  But I don't think that is a primary use case as I'm still
>> unclear of why a user would need to do any reset of the memory device.
>>
>> For type 2 devices I think the use case will be device dependent.  The
>> specific driver in that case will need to tear down their regions (with or
>> without user space coordination) and then issue the reset.
> That makes sense. So in either case, based on this logic, I am assuming
> it is out of scope for this patch to remove the memory regions.
>
>> All that said I think this patch belongs with a series which implements
>> some device support.  Or is more general for type 2.  Which is what
>> Alejandro (cc'ed) is working on.
> Thanks for the feedback.
> @Alejandro, if you have any thoughts on this, kindly let me know.
> I can proceed based on that.


Sorry for the late reply.


I thought about this some time ago and shared my concerns about the
current CXL error management, which should lead to a CXL reset in some
situations.


Apart from that not being there yet, doing the reset has the problem of
restoring the device HDM registers. I think the kernel CXL core should
save that information before the reset and restore it once the reset is
done. The other possibility is to just unroll the HDM configuration,
which implies involving all the CXL switches in the path plus the CXL
Root Complex. I do not know if there are other considerations for
supporting restoration. In the (I hope) near future, CXL.cache should be
included here, but it is not clear yet how to support it, so it is even
less clear for a reset.
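
To make the save/restore idea concrete, here is a rough userspace
sketch, not kernel code. Every name in it (struct fake_cxl_dev,
hdm_save(), hdm_restore(), fake_reset(), HDM_MAX_DECODERS) is
hypothetical, and the "registers" are just a plain array, so it only
shows the ordering: snapshot, reset, write back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define HDM_MAX_DECODERS 4 /* illustrative limit, not taken from the spec */

/* Hypothetical model of one HDM decoder's programmable state. */
struct hdm_decoder_state {
	uint64_t base;      /* HPA base of the decoded range */
	uint64_t size;      /* length of the decoded range */
	bool     committed; /* decoder is active */
};

/* Hypothetical device: only the decoder state a reset would wipe. */
struct fake_cxl_dev {
	struct hdm_decoder_state hdm[HDM_MAX_DECODERS];
};

/* Copy the live decoder state into a snapshot held by the "core". */
static void hdm_save(const struct fake_cxl_dev *dev,
		     struct hdm_decoder_state *snap)
{
	assert(dev && snap);
	memcpy(snap, dev->hdm, sizeof(dev->hdm));
}

/* Model the reset: every decoder comes back unprogrammed. */
static void fake_reset(struct fake_cxl_dev *dev)
{
	assert(dev);
	memset(dev->hdm, 0, sizeof(dev->hdm));
}

/* Reprogram the decoders from the snapshot taken before the reset. */
static void hdm_restore(struct fake_cxl_dev *dev,
			const struct hdm_decoder_state *snap)
{
	assert(dev && snap);
	memcpy(dev->hdm, snap, sizeof(dev->hdm));
}

/* Save, reset, restore, in that order. */
static void reset_with_restore(struct fake_cxl_dev *dev)
{
	struct hdm_decoder_state snap[HDM_MAX_DECODERS];

	hdm_save(dev, snap);
	fake_reset(dev);
	hdm_restore(dev, snap);
}
```

A real implementation would also have to deal with commit ordering and
with the switch and Root Complex decoders mentioned above, which this
sketch deliberately ignores.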


Of course, CXL errors can be more problematic than PCIe errors, and a
reset could imply the system (or some cores) being stuck, so all this is
quite tricky.


In the specific case I'm adding along with the generic CXL Type2
support, if the system is stable after a detected CXL error, the device
HDM should be configured again after the reset, implicitly by some code
linked to the reset triggered by the driver. That could be quite similar
to the initial CXL probe or, if restoration is an option, the CXL core
could keep the same ranges and somehow give them back to the device.
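
The two post-reset options could look roughly like the following
userspace sketch. This is an assumption-laden illustration, not the
Type2 series: struct core_record, probe_like_setup() and
post_reset_setup() are made-up names, and the bump allocator stands in
for whatever the CXL core really does to pick an HPA range.

```c
#include <stdbool.h>
#include <stdint.h>

/* A host physical address range, as a decoder would be programmed with. */
struct hpa_range {
	uint64_t base;
	uint64_t size;
};

/* Hypothetical core-side record for one Type 2 device. */
struct core_record {
	struct hpa_range saved; /* range in use before the reset */
	bool have_saved;        /* true when restoration is an option */
	uint64_t next_free;     /* stand-in for a real HPA allocator */
};

/* Probe-like path: carve out a fresh range, as the initial setup would. */
static struct hpa_range probe_like_setup(struct core_record *core,
					 uint64_t size)
{
	struct hpa_range r = { .base = core->next_free, .size = size };

	core->next_free += size;
	core->saved = r;
	core->have_saved = true;
	return r;
}

/*
 * Post-reset path: hand back the same range when the core kept one of
 * the right size, otherwise fall back to the probe-like path.
 */
static struct hpa_range post_reset_setup(struct core_record *core,
					 uint64_t size)
{
	if (core->have_saved && core->saved.size == size)
		return core->saved;
	return probe_like_setup(core, size);
}
```

With restoration the device sees the same range before and after the
reset; without it the result depends on the allocator, which is why the
second path resembles a new probe.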


I hope this helps somehow.

Thank you

Alejandro


> Regards,
> Srirangan.
>
>

Thread overview: 6+ messages
2024-12-13  7:41 [RFC PATCH v1] cxl: add support for cxl reset Srirangan Madhavan
2024-12-17 17:02 ` Ira Weiny
2024-12-20 22:09   ` Srirangan Madhavan
     [not found]   ` <PH7PR12MB796800828DEC6E60D7C7C0F1C3072@PH7PR12MB7968.namprd12.prod.outlook.com>
2024-12-20 23:54     ` Ira Weiny
2024-12-23 23:49       ` Srirangan Madhavan
2024-12-24 10:03         ` Alejandro Lucero Palau [this message]
