linux-pci.vger.kernel.org archive mirror
* Re: CXL Hot and Warm Reset Testing
       [not found] <DM5PR12MB2534D383B0226498DD7F2005BDFA9@DM5PR12MB2534.namprd12.prod.outlook.com>
@ 2021-08-13 17:14 ` Bjorn Helgaas
  2021-08-13 21:27   ` Dan Williams
  2021-08-14 11:16   ` Amey Narkhede
  0 siblings, 2 replies; 5+ messages in thread
From: Bjorn Helgaas @ 2021-08-13 17:14 UTC (permalink / raw)
  To: Vikram Sethi
  Cc: Dan Williams, Chris Browy, linux-cxl@vger.kernel.org,
	Ben Widawsky, Jonathan Cameron, Alex Williamson, Bjorn Helgaas,
	Shanker Donthineni, Amey Narkhede, linux-pci

[+cc Amey (working on PCI resets), linux-pci]

On Fri, Aug 13, 2021 at 05:01:32PM +0000, Vikram Sethi wrote:
> Hi Dan, 
> 
> > -----Original Message-----
> > From: Dan Williams <dan.j.williams@intel.com>
> > 
> > On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com>
> > wrote:
> > 
> > /sys/bus/pci/devices/$device/reset is a method to trigger PCI
> > device reset, but I do not expect that will ever gain CXL specific
> > knowledge.
> > 
> CXL reset may need some thought, especially for devices that don't
> expose FLR but do expose CXL reset (while the former does not affect
> CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
> device and there is discoverability as to whether or not memory
> contents can be cleared as part of CXL reset). We may need a way of
> triggering CXL reset from userspace, and if the existing
> /sys/bus/pci/devices/$device/reset won't have knowledge of CXL
> reset, there still should be a prioritized order in the kernel in
> which CXL reset is attempted before more drastic resets like SBR.
> IIRC CXL reset can also impact all functions that use CXL.cache/mem,
> but not legacy PCIe functions on the device which do not use
> CXL.cache/mem (there is discoverability as to which functions are
> not impacted by CXL reset). 
> 
> Thanks,
> Vikram
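
A minimal sketch of the prioritized ordering Vikram describes, written in
Python for brevity rather than kernel C. The `ResetMethod` type, the method
names, and the probe/apply callables are all hypothetical. The only idea
taken from the thread is that a CXL reset would be attempted before a more
drastic reset such as SBR; placing FLR first is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ResetMethod:
    name: str                      # hypothetical label, e.g. "cxl_reset"
    supported: Callable[[], bool]  # probe: does the device expose this reset?
    apply: Callable[[], bool]      # perform the reset, True on success


def reset_device(methods: List[ResetMethod]) -> Optional[str]:
    """Try each method in list order and return the name of the first
    one that is supported and succeeds, or None if none worked."""
    for method in methods:
        if not method.supported():
            continue
        if method.apply():
            return method.name
    return None


# Hypothetical device: no FLR, but CXL reset and SBR are available.
# Ordering assumption: least drastic first, so CXL reset precedes SBR.
methods = [
    ResetMethod("flr", lambda: False, lambda: True),
    ResetMethod("cxl_reset", lambda: True, lambda: True),
    ResetMethod("sbr", lambda: True, lambda: True),
]
chosen = reset_device(methods)
```

With this list, a device that lacks FLR falls through to the CXL reset entry
and never reaches SBR unless the CXL reset itself fails.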


* Re: CXL Hot and Warm Reset Testing
  2021-08-13 17:14 ` CXL Hot and Warm Reset Testing Bjorn Helgaas
@ 2021-08-13 21:27   ` Dan Williams
  2021-08-17  3:03     ` Vikram Sethi
  2021-08-14 11:16   ` Amey Narkhede
  1 sibling, 1 reply; 5+ messages in thread
From: Dan Williams @ 2021-08-13 21:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Vikram Sethi, Chris Browy, linux-cxl@vger.kernel.org,
	Ben Widawsky, Jonathan Cameron, Alex Williamson, Bjorn Helgaas,
	Shanker Donthineni, Amey Narkhede, Linux PCI

On Fri, Aug 13, 2021 at 10:14 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Amey (working on PCI resets), linux-pci]
>
> On Fri, Aug 13, 2021 at 05:01:32PM +0000, Vikram Sethi wrote:
> > Hi Dan,
> >
> > > -----Original Message-----
> > > From: Dan Williams <dan.j.williams@intel.com>
> > >
> > > On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com>
> > > wrote:
> > >
> > > /sys/bus/pci/devices/$device/reset is a method to trigger PCI
> > > device reset, but I do not expect that will ever gain CXL specific
> > > knowledge.
> > >
> > CXL reset may need some thought, especially for devices that don't
> > expose FLR but do expose CXL reset (while the former does not affect
> > CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
> > device and there is discoverability as to whether or not memory
> > contents can be cleared as part of CXL reset). We may need a way of
> > triggering CXL reset from userspace, and if the existing
> > /sys/bus/pci/devices/$device/reset won't have knowledge of CXL
> > reset, there still should be a prioritized order in the kernel in
> > which CXL reset is attempted before more drastic resets like SBR.
> > IIRC CXL reset can also impact all functions that use CXL.cache/mem,
> > but not legacy PCIe functions on the device which do not use
> > CXL.cache/mem (there is discoverability as to which functions are
> > not impacted by CXL reset).

What's the Linux use case for supporting CXL reset for a CXL memory
expander? PCI reset is useful for device assignment, and CXL reset
might be useful for similarly assigning an accelerator. CXL.mem on the
other hand can be directly assigned at a per-page level without also
needing to assign the device. How could a VM reliably program HDM
decoders when it cannot perceive the host physical address space? I
understand the utility of CXL reset for device bring-up and test, and
software that knows what it is doing can write config space directly,
but that software would assume all responsibility.


* Re: CXL Hot and Warm Reset Testing
  2021-08-13 17:14 ` CXL Hot and Warm Reset Testing Bjorn Helgaas
  2021-08-13 21:27   ` Dan Williams
@ 2021-08-14 11:16   ` Amey Narkhede
  2021-08-14 19:47     ` Dan Williams
  1 sibling, 1 reply; 5+ messages in thread
From: Amey Narkhede @ 2021-08-14 11:16 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Vikram Sethi, Chris Browy, linux-cxl, Ben Widawsky,
	Jonathan Cameron, Alex Williamson, Shanker Donthineni,
	Amey Narkhede, Linux PCI

On 21/08/13 12:14PM, Bjorn Helgaas wrote:
> [+cc Amey (working on PCI resets), linux-pci]
>
> On Fri, Aug 13, 2021 at 05:01:32PM +0000, Vikram Sethi wrote:
> > Hi Dan,
> >
> > > -----Original Message-----
> > > From: Dan Williams <dan.j.williams@intel.com>
> > >
> > > On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com>
> > > wrote:
> > >
> > > /sys/bus/pci/devices/$device/reset is a method to trigger PCI
> > > device reset, but I do not expect that will ever gain CXL specific
> > > knowledge.
> > >
> > CXL reset may need some thought, especially for devices that don't
> > expose FLR but do expose CXL reset (while the former does not affect
> > CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
> > device and there is discoverability as to whether or not memory
> > contents can be cleared as part of CXL reset). We may need a way of
> > triggering CXL reset from userspace, and if the existing
> > /sys/bus/pci/devices/$device/reset won't have knowledge of CXL
> > reset, there still should be a prioritized order in the kernel in
> > which CXL reset is attempted before more drastic resets like SBR.
> > IIRC CXL reset can also impact all functions that use CXL.cache/mem,
> > but not legacy PCIe functions on the device which do not use
> > CXL.cache/mem (there is discoverability as to which functions are
> > not impacted by CXL reset).
> >
> > Thanks,
> > Vikram

We can add a new reset method and expose it to userspace via the new
'reset_method' sysfs attribute introduced in this series:
https://lore.kernel.org/linux-pci/20210805162917.3989-1-ameynarkhede03@gmail.com/

Thanks,
Amey
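
A rough userspace sketch of how the 'reset_method' attribute could be
driven, assuming it reads back as a whitespace-separated list of method
names in priority order and accepts the same format on write. The helper
names below are hypothetical, and no CXL-specific method name is assumed to
exist.

```python
from pathlib import Path
from typing import List


def read_reset_methods(dev_dir: Path) -> List[str]:
    """Return the method names listed in reset_method, in priority order."""
    return (dev_dir / "reset_method").read_text().split()


def set_reset_methods(dev_dir: Path, methods: List[str]) -> None:
    """Write a new space-separated priority order to reset_method."""
    (dev_dir / "reset_method").write_text(" ".join(methods) + "\n")


def trigger_reset(dev_dir: Path) -> None:
    """Ask the kernel to reset the device by writing 1 to its reset file."""
    (dev_dir / "reset").write_text("1\n")
```

Here `dev_dir` stands for a device directory under /sys/bus/pci/devices/,
and the writes would normally require root. A new method added for CXL
reset would simply appear as one more name in the list.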


* Re: CXL Hot and Warm Reset Testing
  2021-08-14 11:16   ` Amey Narkhede
@ 2021-08-14 19:47     ` Dan Williams
  0 siblings, 0 replies; 5+ messages in thread
From: Dan Williams @ 2021-08-14 19:47 UTC (permalink / raw)
  To: Amey Narkhede
  Cc: Bjorn Helgaas, Vikram Sethi, Chris Browy, linux-cxl, Ben Widawsky,
	Jonathan Cameron, Alex Williamson, Shanker Donthineni, Linux PCI

On Sat, Aug 14, 2021 at 4:16 AM Amey Narkhede <ameynarkhede03@gmail.com> wrote:
>
> On 21/08/13 12:14PM, Bjorn Helgaas wrote:
> > [+cc Amey (working on PCI resets), linux-pci]
> >
> > On Fri, Aug 13, 2021 at 05:01:32PM +0000, Vikram Sethi wrote:
> > > Hi Dan,
> > >
> > > > -----Original Message-----
> > > > From: Dan Williams <dan.j.williams@intel.com>
> > > >
> > > > On Wed, Aug 11, 2021 at 9:42 AM Chris Browy <cbrowy@avery-design.com>
> > > > wrote:
> > > >
> > > > /sys/bus/pci/devices/$device/reset is a method to trigger PCI
> > > > device reset, but I do not expect that will ever gain CXL specific
> > > > knowledge.
> > > >
> > > > CXL reset may need some thought, especially for devices that don't
> > > > expose FLR but do expose CXL reset (while the former does not affect
> > > CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
> > > device and there is discoverability as to whether or not memory
> > > contents can be cleared as part of CXL reset). We may need a way of
> > > triggering CXL reset from userspace, and if the existing
> > > /sys/bus/pci/devices/$device/reset won't have knowledge of CXL
> > > reset, there still should be a prioritized order in the kernel in
> > > which CXL reset is attempted before more drastic resets like SBR.
> > > IIRC CXL reset can also impact all functions that use CXL.cache/mem,
> > > but not legacy PCIe functions on the device which do not use
> > > CXL.cache/mem (there is discoverability as to which functions are
> > > not impacted by CXL reset).
> > >
> > > Thanks,
> > > Vikram
>
> We can add new reset method and expose it to userspace via new 'reset_method'
> sysfs attribute introduced in this series
> https://lore.kernel.org/linux-pci/20210805162917.3989-1-ameynarkhede03@gmail.com/

It's not clear to me that's a suitable place for CXL reset though. CXL
reset wants to coordinate with the device's participation in a
potential interleave-set across multiple devices. So something like
/sys/bus/cxl/devices/memX/reset might be a better location for
coordinated CXL reset if needed. Again though, the primary use case
for userspace triggered reset is device assignment, and there are
better mechanisms to assign CXL.mem resources to a guest.
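
To illustrate the coordination concern, here is a sketch of a check that
refuses to reset a memory device while it still backs an active interleave
set. The data model (`InterleaveSet`, the memdev and region names,
`can_reset`) is entirely hypothetical and does not correspond to any
existing kernel or sysfs interface.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class InterleaveSet:
    name: str                  # hypothetical region label
    members: FrozenSet[str]    # memdev names, e.g. "mem0"
    active: bool               # True while the region is mapped and in use


def blocking_sets(memdev: str, sets: List[InterleaveSet]) -> List[str]:
    """Names of active interleave sets that include memdev."""
    return [s.name for s in sets if s.active and memdev in s.members]


def can_reset(memdev: str, sets: List[InterleaveSet]) -> bool:
    """A reset is only safe here if no active set depends on memdev."""
    return not blocking_sets(memdev, sets)


sets = [
    InterleaveSet("region0", frozenset({"mem0", "mem1"}), active=True),
    InterleaveSet("region1", frozenset({"mem2"}), active=False),
]
```

A per-function PCI reset file has no natural place for this kind of
cross-device check, which is one way to read the suggestion of a reset
attribute under /sys/bus/cxl/devices/memX/.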


* RE: CXL Hot and Warm Reset Testing
  2021-08-13 21:27   ` Dan Williams
@ 2021-08-17  3:03     ` Vikram Sethi
  0 siblings, 0 replies; 5+ messages in thread
From: Vikram Sethi @ 2021-08-17  3:03 UTC (permalink / raw)
  To: Dan Williams, Bjorn Helgaas
  Cc: Chris Browy, linux-cxl@vger.kernel.org, Ben Widawsky,
	Jonathan Cameron, Alex Williamson, Bjorn Helgaas,
	Shanker Donthineni, Amey Narkhede, Linux PCI


> -----Original Message-----
> From: Dan Williams <dan.j.williams@intel.com>
> On Fri, Aug 13, 2021 at 10:14 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > [+cc Amey (working on PCI resets), linux-pci]
> >
> > On Fri, Aug 13, 2021 at 05:01:32PM +0000, Vikram Sethi wrote:
> > > Hi Dan,
> > >
> > > > -----Original Message-----
> > > > From: Dan Williams <dan.j.williams@intel.com>
> > > >
> > > > On Wed, Aug 11, 2021 at 9:42 AM Chris Browy
> > > > <cbrowy@avery-design.com>
> > > > wrote:
> > > >
> > > > /sys/bus/pci/devices/$device/reset is a method to trigger PCI
> > > > device reset, but I do not expect that will ever gain CXL specific
> > > > knowledge.
> > > >
> > > > CXL reset may need some thought, especially for devices that don't
> > > > expose FLR but do expose CXL reset (while the former does not affect
> > > CXL.cache/mem, the latter wipes out CXL.cache/mem state in the
> > > device and there is discoverability as to whether or not memory
> > > contents can be cleared as part of CXL reset). We may need a way of
> > > triggering CXL reset from userspace, and if the existing
> > > /sys/bus/pci/devices/$device/reset won't have knowledge of CXL
> > > reset, there still should be a prioritized order in the kernel in
> > > which CXL reset is attempted before more drastic resets like SBR.
> > > IIRC CXL reset can also impact all functions that use CXL.cache/mem,
> > > but not legacy PCIe functions on the device which do not use
> > > CXL.cache/mem (there is discoverability as to which functions are
> > > not impacted by CXL reset).
> 
> What's the Linux use case for supporting CXL reset for a CXL memory
> expander? PCI reset is useful for device assignment, and CXL reset might be
> useful for similarly assigning an accelerator. CXL.mem on the other hand can
> be directly assigned at a per-page level without also needing to assign the
> device. How could a VM reliably program HDM decoders when it cannot
> perceive the host physical address space? I understand the utility of CXL
> reset for device bring-up and test, and software that knows what it is doing
> can write config space directly, but that software would assume all responsibility.

Agreed that CXL reset will be needed for Type 1/2 CXL devices (accelerators),
which will need a sysfs interface for userspace to trigger CXL reset.
 

