From: Alejandro Lucero Palau <alucerop@amd.com>
To: Srirangan Madhavan <smadhavan@nvidia.com>,
Bjorn Helgaas <helgaas@kernel.org>,
Lukas Wunner <lukas@wunner.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>,
Jonathan Cameron <jonathan.cameron@huawei.com>,
Dave Jiang <dave.jiang@intel.com>,
Alison Schofield <alison.schofield@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
Ira Weiny <ira.weiny@intel.com>,
Dan Williams <dan.j.williams@intel.com>,
Zhi Wang <zhiw@nvidia.com>, Vishal Aslot <vaslot@nvidia.com>,
Shanker Donthineni <sdonthineni@nvidia.com>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
Bjorn Helgaas <bhelgaas@google.com>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>
Subject: Re: [PATCH v2 2/2] cxl: add support for cxl reset
Date: Fri, 22 Aug 2025 16:33:02 +0100 [thread overview]
Message-ID: <f90ba985-5cfb-4714-bbab-17de471a7332@amd.com> (raw)
In-Reply-To: <SJ2PR12MB79630812834531324D148DB8C3AA2@SJ2PR12MB7963.namprd12.prod.outlook.com>
Hi,
I'm sorry I missed your reply. Some comments below.
On 4/8/25 00:45, Srirangan Madhavan wrote:
> Thank you so much for the comments Alejandro, Lukas and Bjorn.
> My apologies for the late response. I was out of office for a few weeks.
> I am picking up this patch again and adding my responses.
>
>> I think cxl_reset_start after the *_prepare call makes more sense here.
> @Alejandro, Just for clarity,
> do you mean I should rename the cxl_reset_init -> cxl_reset_start?
Yes, that's my view. init and prepare are similar, so prepare or init,
then start.
>> I do not know how safe is this. IMO, this needs to be synchronized by
>> the accel driver which could imply to tell user space first. In our case
>> it would imply to stop rx queues in the netdev for CXL.cache, and tx
>> queues for CXL.mem. Doing it unconditionally could make current CXL
>> transactions to stall ... although it could be argued the reset event
>> implies something is broken, but let's try to do it properly if there is
>> a chance of the system not unreliable.
> Regarding this, we feel that the reset framework already
> has a *reset_prepare and is called by the Linux kernel, before initiating the steps
> for acutal CXL reset. During this reset prepare is when the accel driver should
> quiesce its device. In this case, that would imply stopping the rx & tx and
> draining the queues. Is there any particular reason this wouldn't work/be sufficient?
I did not see the connection to the driver/device in the patch so I
assumed it was not there, but after your comment, I studied the code and
it is just a matter of implementing such reset prepare function in our
driver, and I think that solves the problem I saw.
>> One thing this does not cover, and I do not know if it should, is the
>> fact that the device CXL regs will be reset, so the question is if the
>> old values should be restored or the device/driver should go through the
>> same initialization, if a hotplug device, or do it specifically if
>> already present at boot time and the BIOS doing that first
>> initialization. In one case the restoration needs to happen, in the
>> other the old values/objects need to be removed. I think the second case
>> is more problematic because this is likely involving CXL root complex
>> configuration performed by the BIOS ... Not trivial at all IMO.
> Here again, we think that the accel driver is the one that should be doing this.
> In our case, when the reset done call is made, the driver before making the device
> available to be reopened and used, needs to restore the config space/DVSEC etc. to
> their prior state. During the reset_prepare stage these need to be saved. Since this
> is how SBR is handled currently even in the PCIe world, (for ex. AER/EEH handling framework)
> where driver is responsible for save and restore of the regs, and it is also not possible
> for the kernel to know exactly what to save and restore for each specific type 2 devices,
> it seems logical to do the same here.
At the time I wrote that comment, it was not clear in my mind how this
should work, and I had doubts about the BIOS.
But I can say now all this is fine and doable.
From you comment (not in the thread but through discord) about being
the sfc driver the client for this functionality, I think it could be.
I'm not sure if as part of the current Type2 patchset effort or a
follow-up work. I'll do some work the next week for seeing the implications.
Thank you
>> +1 The reset-related content in drivers/pci/pci.c has been growing
>> recently. Maybe we should consider moving it all to a reset.c file.
> This makes sense. I'll prepare a patch to move the reset code and
> compile out CXL specific parts while making any changes for the next version.
>
> Thank you.
>
> Regards,
> Srirangan.
> ________________________________________
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Friday, February 21, 2025 4:13 PM
> To: Lukas Wunner
> Cc: Srirangan Madhavan; Davidlohr Bueso; Jonathan Cameron; Dave Jiang; Alison Schofield; Vishal Verma; Ira Weiny; Dan Williams; Zhi Wang; Vishal Aslot; Shanker Donthineni; linux-cxl@vger.kernel.org; Bjorn Helgaas; linux-pci@vger.kernel.org
> Subject: Re: [PATCH v2 2/2] cxl: add support for cxl reset
>
> External email: Use caution opening links or attachments
>
>
> On Fri, Feb 21, 2025 at 11:45:56AM +0100, Lukas Wunner wrote:
>> On Thu, Feb 20, 2025 at 08:39:06PM -0800, Srirangan Madhavan wrote:
>>> Type 2 devices are being introduced and will require finer-grained
>>> reset mechanisms beyond bus-wide reset methods.
>>>
>>> Add support for CXL reset per CXL v3.2 Section 9.6/9.7
>>>
>>> Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
>>> ---
>>> drivers/pci/pci.c | 146 ++++++++++++++++++++++++++++++++++++++++++++
>> drivers/pci/pci.c is basically a catch-all for anything that doesn't fit
>> in one of the other .c files in drivers/pci. I'm slightly worried that
>> this (otherwise legitimate) patch increases the clutter in pci.c further,
>> rendering it unmaintainable in the long term.
> +1 The reset-related content in drivers/pci/pci.c has been growing
> recently. Maybe we should consider moving it all to a reset.c file.
>
>> At the very least, I'm wondering if this can be #ifdef'ed to
>> CONFIG_CXL_PCI?
>>
>> One idea would be to move this newly added reset method, as well as the
>> existing cxl_reset_bus_function(), to a new drivers/pci/cxl.c file.
>>
>> I guess moving it to drivers/cxl/ isn't an option because cxl can be
>> modular.
>>
>> Another idea would be to move all the reset handling (which makes up
>> a significant portion of pci.c) to a separate drivers/pci/reset.c.
>> This might be beyond the scope of your patch, but in the interim,
>> maybe at least an #ifdef can be added because the PCI core is also
>> used e.g. on memory-constrained wifi routers which don't care about
>> CXL at all.
> Agree, we'll need some way to make this optional.
>
> Bjorn
next prev parent reply other threads:[~2025-08-22 15:33 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-21 4:39 [PATCH v2 0/2] Add CXL Reset Support for CXL Devices Srirangan Madhavan
2025-02-21 4:39 ` [PATCH v2 1/2] cxl: de-duplicate cxl DVSEC reg defines Srirangan Madhavan
2025-02-21 4:39 ` [PATCH v2 2/2] cxl: add support for cxl reset Srirangan Madhavan
2025-02-21 10:45 ` Lukas Wunner
2025-02-22 0:13 ` Bjorn Helgaas
2025-04-07 23:45 ` Srirangan Madhavan
2025-08-22 15:33 ` Alejandro Lucero Palau [this message]
2025-02-21 12:43 ` Alejandro Lucero Palau
2025-02-22 5:45 ` kernel test robot
2025-02-22 7:08 ` kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f90ba985-5cfb-4714-bbab-17de471a7332@amd.com \
--to=alucerop@amd.com \
--cc=alison.schofield@intel.com \
--cc=bhelgaas@google.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=helgaas@kernel.org \
--cc=ira.weiny@intel.com \
--cc=jonathan.cameron@huawei.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=sdonthineni@nvidia.com \
--cc=smadhavan@nvidia.com \
--cc=vaslot@nvidia.com \
--cc=vishal.l.verma@intel.com \
--cc=zhiw@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox