Linux PCI subsystem development
 help / color / mirror / Atom feed
From: Alejandro Lucero Palau <alucerop@amd.com>
To: Srirangan Madhavan <smadhavan@nvidia.com>,
	Bjorn Helgaas <helgaas@kernel.org>,
	Lukas Wunner <lukas@wunner.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Alison Schofield <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Zhi Wang <zhiw@nvidia.com>, Vishal Aslot <vaslot@nvidia.com>,
	Shanker Donthineni <sdonthineni@nvidia.com>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>
Subject: Re: [PATCH v2 2/2] cxl: add support for cxl reset
Date: Fri, 22 Aug 2025 16:33:02 +0100	[thread overview]
Message-ID: <f90ba985-5cfb-4714-bbab-17de471a7332@amd.com> (raw)
In-Reply-To: <SJ2PR12MB79630812834531324D148DB8C3AA2@SJ2PR12MB7963.namprd12.prod.outlook.com>

Hi,


I'm sorry I missed your reply. Some comments below.


On 4/8/25 00:45, Srirangan Madhavan wrote:
> Thank you so much for the comments Alejandro, Lukas and Bjorn.
> My apologies for the late response. I was out of office for a few weeks.
> I am picking up this patch again and adding my responses.
>
>> I think cxl_reset_start after the *_prepare call makes more sense here.
> @Alejandro, Just for clarity,
> do you mean I should rename the cxl_reset_init -> cxl_reset_start?


Yes, that's my view. init and prepare are similar, so prepare or init, 
then start.


>> I do not know how safe is this. IMO, this needs to be synchronized by
>> the accel driver which could imply to tell user space first. In our case
>> it would imply to stop rx queues in the netdev for CXL.cache, and tx
>> queues for CXL.mem. Doing it unconditionally could make current CXL
>> transactions to stall ... although it could be argued the reset event
>> implies something is broken, but let's try to do it properly if there is
>> a chance of the system not unreliable.
> Regarding this, we feel that the reset framework already
> has a *reset_prepare and is called by the Linux kernel, before initiating the steps
> for acutal CXL reset. During this reset prepare is when the accel driver should
> quiesce its device. In this case, that would imply stopping the rx & tx and
> draining the queues. Is there any particular reason this wouldn't work/be sufficient?


I did not see the connection to the driver/device in the patch so I 
assumed it was not there, but after your comment, I studied the code and 
it is just a matter of implementing such reset prepare function in our 
driver, and I think that solves the problem I saw.


>> One thing this does not cover, and I do not know if it should, is the
>> fact that the device CXL regs will be reset, so the question is if the
>> old values should be restored or the device/driver should go through the
>> same initialization, if a hotplug device, or do it specifically if
>> already present at boot time and the BIOS doing that first
>> initialization. In one case the restoration needs to happen, in the
>> other the old values/objects need to be removed. I think the second case
>> is more problematic because this is likely involving CXL root complex
>> configuration performed by the BIOS ... Not trivial at all IMO.
> Here again, we think that the accel driver is the one that should be doing this.
> In our case, when the reset done call is made, the driver before making the device
> available to be reopened and used, needs to restore the config space/DVSEC etc. to
> their prior state. During the reset_prepare stage these need to be saved. Since this
> is how SBR is handled currently even in the PCIe world, (for ex. AER/EEH handling framework)
> where driver is responsible for save and restore of the regs, and it is also not possible
> for the kernel to know exactly what to save and restore for each specific type 2 devices,
> it seems logical to do the same here.


At the time I wrote that comment, it was not clear in my mind how this 
should work, and I had doubts about the BIOS.

But I can say now all this is fine and doable.


 From you comment (not in the thread but through discord) about being 
the sfc driver the client for this functionality, I think it could be. 
I'm not sure if as part of the current Type2 patchset effort or a 
follow-up work. I'll do some work the next week for seeing the implications.


Thank you


>> +1  The reset-related content in drivers/pci/pci.c has been growing
>> recently.  Maybe we should consider moving it all to a reset.c file.
> This makes sense. I'll prepare a patch to move the reset code and
> compile out CXL specific parts while making any changes for the next version.
>
> Thank you.
>
> Regards,
> Srirangan.
> ________________________________________
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Friday, February 21, 2025 4:13 PM
> To: Lukas Wunner
> Cc: Srirangan Madhavan; Davidlohr Bueso; Jonathan Cameron; Dave Jiang; Alison Schofield; Vishal Verma; Ira Weiny; Dan Williams; Zhi Wang; Vishal Aslot; Shanker Donthineni; linux-cxl@vger.kernel.org; Bjorn Helgaas; linux-pci@vger.kernel.org
> Subject: Re: [PATCH v2 2/2] cxl: add support for cxl reset
>
> External email: Use caution opening links or attachments
>
>
> On Fri, Feb 21, 2025 at 11:45:56AM +0100, Lukas Wunner wrote:
>> On Thu, Feb 20, 2025 at 08:39:06PM -0800, Srirangan Madhavan wrote:
>>> Type 2 devices are being introduced and will require finer-grained
>>> reset mechanisms beyond bus-wide reset methods.
>>>
>>> Add support for CXL reset per CXL v3.2 Section 9.6/9.7
>>>
>>> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
>>> ---
>>>   drivers/pci/pci.c   | 146 ++++++++++++++++++++++++++++++++++++++++++++
>> drivers/pci/pci.c is basically a catch-all for anything that doesn't fit
>> in one of the other .c files in drivers/pci.  I'm slightly worried that
>> this (otherwise legitimate) patch increases the clutter in pci.c further,
>> rendering it unmaintainable in the long term.
> +1  The reset-related content in drivers/pci/pci.c has been growing
> recently.  Maybe we should consider moving it all to a reset.c file.
>
>> At the very least, I'm wondering if this can be #ifdef'ed to
>> CONFIG_CXL_PCI?
>>
>> One idea would be to move this newly added reset method, as well as the
>> existing cxl_reset_bus_function(), to a new drivers/pci/cxl.c file.
>>
>> I guess moving it to drivers/cxl/ isn't an option because cxl can be
>> modular.
>>
>> Another idea would be to move all the reset handling (which makes up
>> a significant portion of pci.c) to a separate drivers/pci/reset.c.
>> This might be beyond the scope of your patch, but in the interim,
>> maybe at least an #ifdef can be added because the PCI core is also
>> used e.g. on memory-constrained wifi routers which don't care about
>> CXL at all.
> Agree, we'll need some way to make this optional.
>
> Bjorn

  reply	other threads:[~2025-08-22 15:33 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-21  4:39 [PATCH v2 0/2] Add CXL Reset Support for CXL Devices Srirangan Madhavan
2025-02-21  4:39 ` [PATCH v2 1/2] cxl: de-duplicate cxl DVSEC reg defines Srirangan Madhavan
2025-02-21  4:39 ` [PATCH v2 2/2] cxl: add support for cxl reset Srirangan Madhavan
2025-02-21 10:45   ` Lukas Wunner
2025-02-22  0:13     ` Bjorn Helgaas
2025-04-07 23:45       ` Srirangan Madhavan
2025-08-22 15:33         ` Alejandro Lucero Palau [this message]
2025-02-21 12:43   ` Alejandro Lucero Palau
2025-02-22  5:45   ` kernel test robot
2025-02-22  7:08   ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f90ba985-5cfb-4714-bbab-17de471a7332@amd.com \
    --to=alucerop@amd.com \
    --cc=alison.schofield@intel.com \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=helgaas@kernel.org \
    --cc=ira.weiny@intel.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=sdonthineni@nvidia.com \
    --cc=smadhavan@nvidia.com \
    --cc=vaslot@nvidia.com \
    --cc=vishal.l.verma@intel.com \
    --cc=zhiw@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox