From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: Alex Williamson <alex@shazbot.org>
Cc: Lukas Wunner <lukas@wunner.de>, <smadhavan@nvidia.com>,
<dave@stgolabs.net>, <dave.jiang@intel.com>,
<alison.schofield@intel.com>, <vishal.l.verma@intel.com>,
<ira.weiny@intel.com>, <dan.j.williams@intel.com>,
<bhelgaas@google.com>, <ming.li@zohomail.com>, <rrichter@amd.com>,
<Smita.KoralahalliChannabasappa@amd.com>,
<huaisheng.ye@intel.com>, <linux-cxl@vger.kernel.org>,
<linux-pci@vger.kernel.org>, <vaslot@nvidia.com>,
<vsethi@nvidia.com>, <sdonthineni@nvidia.com>,
<vidyas@nvidia.com>, <mochs@nvidia.com>, <jsequeira@nvidia.com>,
Terry Bowman <terry.bowman@amd.com>
Subject: Re: [PATCH v4 09/10] PCI: save/restore CXL config around reset
Date: Thu, 12 Mar 2026 18:24:01 +0000
Message-ID: <20260312182401.00001adc@huawei.com>
In-Reply-To: <20260126153435.5f1557df@shazbot.org>
On Mon, 26 Jan 2026 15:34:35 -0700
Alex Williamson <alex@shazbot.org> wrote:
> On Thu, 22 Jan 2026 10:47:45 +0000
> Jonathan Cameron <jonathan.cameron@huawei.com> wrote:
>
> > On Thu, 22 Jan 2026 11:01:57 +0100
> > Lukas Wunner <lukas@wunner.de> wrote:
> >
> > > On Tue, Jan 20, 2026 at 10:26:09PM +0000, smadhavan@nvidia.com wrote:
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -4989,6 +4990,11 @@ static int cxl_reset(struct pci_dev *dev, bool probe)
> > > > >  	if (probe)
> > > > >  		return 0;
> > > > >
> > > > > +	pci_save_state(dev);
> > > > > +	rc = cxl_config_save_state(dev, &cxl_state);
> > > > > +	if (rc)
> > > > > +		pci_warn(dev, "Failed to save CXL config state: %d\n", rc);
> > > > +
> > >
> > > Hm, shouldn't the call to cxl_config_save_state() be moved to
> > > pci_save_state() (and likewise, cxl_config_restore_state() moved to
> > > pci_restore_state())?
> > >
> > > E.g. when a DPC event occurs, I assume CXL registers need to
> > > be restored as well on recovery, right?
> >
> > The CXL spec has some comic language around DPC that basically says
> > "use with care, DPC trigger will bring down physical link, reset device state,
> > disrupt CXL.cache and CXL.mem traffic".
> > or in shorter words
> > 'Good luck'
> >
> > If a CXL device undergoes DPC, there's a high chance you'll either trigger
> > CXL isolation, which we aren't handling yet in Linux because we aren't
> > convinced software can really recover from it, or stall a CPU and end up
> > rebooting.
> >
> > Maybe one day we'll figure this out. Today, turn off DPC on CXL ports! :)
>
> Even if we hand-wave that DPC isn't an issue, save/restore of the PCI
> state happens at a higher level for every other PCI reset method and
> we're creating inconsistency here.
>
> PCI-core includes interfaces for saving PCI state, offloading PCI state
> as an opaque blob, reloading, and restoring that state, and performing
> resets without saving and restoring state. This has a couple users,
> including vfio.
>
> If we want similar behavior for CXL type2 devices for a future vfio use
> case, we shouldn't create unnecessary differentiation here with saving
> the CXL state separately and making the reset method behave
> differently. Thanks,
>
I'm a bit concerned that, unlike PCI where no traffic flows after reset
and restore of basic PCIe stuff, for CXL once you've put the decoders
etc back in place, CXL.mem traffic can happen autonomously. It's
cacheable and physical address prefetchers on the CPU side may be able
to wander into it more or less randomly, whether there are page tables yet
or not.

This is somewhat similar to PCI devices misbehaving if you enable
bus mastering without ensuring they are in a clean state (just in the
other direction).

So I'm not sure how safe it is to restore the generic CXL state
without the driver taking control.

I don't think there are tight enough guarantees that devices will be
able to survive this if their drivers haven't managed the setup of CXL.mem
as carefully as they did during driver bind etc. Maybe they had to
load firmware first before there was anything behind the CXL protocol
front end.

The drivers can't stop CXL.mem in a prepare reset callback
prior to saving state, as the relevant controls may be RWL and locked
by an annoying BIOS.

Maybe I'm overly paranoid and all device manufacturers are sensible.
Or I missed some spec text that says devices should politely handle
traffic turning up before they are ready. If they implement the memory
ready checks then we may be fine, as hopefully Media Status == Ready
doesn't happen until it's safe to enable access (though I can't find
anywhere that the spec actually says that is sufficient).

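To make the worry concrete, here is a toy userspace model of the ordering
problem. It is not kernel code, and every name in it is made up for
illustration. It contrasts restoring decoder state unconditionally with
deferring it until both a media-ready check and the driver say it is safe.
Whether media ready alone would be a sufficient gate is exactly the open
question, so the separate driver_ready flag is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a CXL.mem device; none of these are kernel APIs. */
struct model_dev {
	bool media_ready;      /* stands in for Media Status == Ready */
	bool driver_ready;     /* driver finished its own setup, e.g. firmware load */
	bool decoders_enabled; /* decoders back in place: CXL.mem traffic can arrive */
};

enum restore_result { RESTORE_DONE, RESTORE_DEFERRED };

/* Generic restore that puts the decoders back regardless of device state. */
static void model_restore_unconditional(struct model_dev *d)
{
	d->decoders_enabled = true;
}

/* Restore that refuses to re-enable decoders until the device is ready. */
static enum restore_result model_restore_gated(struct model_dev *d)
{
	if (!d->media_ready || !d->driver_ready)
		return RESTORE_DEFERRED;

	d->decoders_enabled = true;
	return RESTORE_DONE;
}

/*
 * An autonomous host access (say, from a prefetcher) is harmless in this
 * model if it cannot reach the device, or if the device is fully set up.
 */
static bool model_host_access_is_safe(const struct model_dev *d)
{
	return !d->decoders_enabled || (d->media_ready && d->driver_ready);
}
```

In this model, an unconditional restore on a device whose driver has not
finished setup leaves a window in which model_host_access_is_safe() returns
false, while the gated variant returns RESTORE_DEFERRED and leaves the
decoders off until the driver opts in.
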
I need to do some more digging and maybe a spot of prototyping.
It's also more than plausible that I'm missing a nugget of code in here
that makes this all safe.

Jonathan
> Alex