linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chris Li <chrisl@kernel.org>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Bjorn Helgaas <bhelgaas@google.com>,
	 Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	 Danilo Krummrich <dakr@kernel.org>, Len Brown <lenb@kernel.org>,
	linux-kernel@vger.kernel.org,  linux-pci@vger.kernel.org,
	linux-acpi@vger.kernel.org,  David Matlack <dmatlack@google.com>,
	Pasha Tatashin <tatashin@google.com>,
	 Jason Miu <jasonmiu@google.com>,
	Vipin Sharma <vipinsh@google.com>,
	 Saeed Mahameed <saeedm@nvidia.com>,
	Adithya Jayachandran <ajayachandra@nvidia.com>,
	 Parav Pandit <parav@nvidia.com>, William Tu <witu@nvidia.com>,
	Mike Rapoport <rppt@kernel.org>,
	 Leon Romanovsky <leon@kernel.org>,
	Junaid Shahid <junaids@google.com>
Subject: Re: [PATCH RFC 20/25] PCI/LUO: Avoid write to liveupdate devices at boot
Date: Wed, 6 Aug 2025 17:50:37 -0700	[thread overview]
Message-ID: <CAF8kJuN4yjBzaTuAA9wERbxbJQs=YSf-1RY_nHu+XvMybpYbfA@mail.gmail.com> (raw)
In-Reply-To: <20250802135034.GJ26511@ziepe.ca>

Hi Jason,

Thanks for your feedback.

On Sat, Aug 2, 2025 at 6:50 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Aug 01, 2025 at 04:04:39PM -0700, Chris Li wrote:
> > My philosophy is that the LUO PCI subsystem is for service of the PCI
> > device driver. Ultimately it is the PCI device driver who decides what
> > part of the config space they want to preserve or overwrite. The PCI
> > layer is just there to facilitate that service.
>
> I don't think this makes any sense at all. There is nothing the device
> driver can contribute here.

I am considering that the device driver owner will know a lot more
device internal knowledge, e.g. why it needs to reserve this and that
register where the PCI layer might not know much about the internal
device behavior.

> > If you still think it is unjustifiable to have one test try to
> > preserve all config space for liveupdate.
>
> I do think it is unjustifiable, it is architecurally wrong. You only
> should be preserving the absolute bare minimum of config space bits
> and everything else should be rewritten by the next kernel in the
> normal way. This MSI is a prime example of a nonsensical outcome if
> you take the position the config space should not be written to.

OK. Let me rework the V2 with your approach.

>
> > > Only some config accesse are bad. Each and every "bad" one needs to be
> > > clearly explained *why* it is bad and only then mitigated.
> >
> > That is exactly the reason why we have the conservative test that
> > preserves every config space test as a starting point.
>
> That is completely the opposite of what I said. Preserving everything
> is giving up on the harder job of identifying which bits cannot be
> changed, explaining why they can't be changed, and then mitigating
> only those things.

We can still preserve every thing then work backwards to preserve
less.  As I said, I will rework V2 with your approach preserving bare
minimum as the starting place.

> > Another constraint is that the data center servers are dependent on
> > the network device able to connect to the network appropriately. Take
> > diorite NIC  for example, if I try only preserving ATS/PASID did not
> > finish the rest of liveupdate, the nic wasn't able to boot up and
> > connect to the network all the way. Even if the test passes for the
> > ATS part, the over test fails because the server is not back online. I
> > can't include that test into the test dashboard, because it brings
> > down the server. The only way to recover from that is rebooting the
> > server, which takes a long time for a big server. I can only keep that
> > non-passing test as my own private developing test, not the regression
> > test set.
>
> I have no idea what this is trying to say and it sounds like you also
> can't explain exactly what is "wrong" and justify why things are being
> preserved.

I know what register is causing the trouble but I think we are under a
different philosophy of addressing the problem from different ends.
Another consideration is the device testing matrixs. The kexec with
device liveupdate is a rare event. With that many device state
re-initializing might trigger some very rare bug in the device or
firmware. So it might be due to the device internal implementation,
even though PCI spec might say otherwise or undefined.

Anyway, let me do it your way in V2 then.

> Again, your series should be starting simpler. Perserve the dumbest
> simplest PCI configuration. Certainly no switches, P2P, ATS or
> PASID. When that is working you can then add on more complex PCI
> features piece by piece.

With the V1 the patch series deliverable is having an Intel diorite
NVMe device preserve every config space access and pass to the vfio
and iommu people to build the vfio and iommu on top of it. Let's
forget about V1.

With V2 I want to start with the minimal end. No switches,P2P, ATS or
PASID. I need some help to define what is deliverable in such a
minimal preserve. e.g. Do I be able to read back the config value not
changed then call it a day. Or do I expect to see the device fully
initialized, it is able to be used by the user space. Will the device
need to perform any DMA? Interrupt?

I will probably find a device as simple as possible and it is attached
to the root PCI host bridge, not the PCI-PCI bridge.
Maybe no interrupt as the first step. One possibility is using the
Intel DSA device that does the DMA streaming.

If you have any other feedback on the candidate device and deliverable
test for V2, I am looking forward to it.

Thanks.

Chris

  reply	other threads:[~2025-08-07  0:50 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-28  8:24 [RFC PATCH 00/25] Live Update Orchestrator: PCI subsystem Chris Li
2025-07-28  8:24 ` [PATCH RFC 01/25] PCI/LUO: Register with Liveupdate Orchestrator Chris Li
2025-07-28  8:24 ` [PATCH RFC 02/25] PCI/LUO: Add struct dev_liveupdate Chris Li
2025-07-28  8:24 ` [PATCH RFC 03/25] PCI/LUO: Create requested liveupdate device list Chris Li
2025-07-28  8:24 ` [PATCH RFC 04/25] PCI/LUO: Forward prepare()/freeze()/cancel() callbacks to driver Chris Li
2025-07-28  8:24 ` [PATCH RFC 05/25] PCI/LUO: Restore state at PCI enumeration Chris Li
2025-07-28  8:24 ` [PATCH RFC 06/25] PCI/LUO: Forward finish callbacks to drivers Chris Li
2025-07-28  8:24 ` [PATCH RFC 07/25] PCI/LUO: Save and restore driver name Chris Li
2025-07-28  8:24 ` [PATCH RFC 08/25] PCI/LUO: Add liveupdate to pcieport driver Chris Li
2025-07-28  8:24 ` [PATCH RFC 09/25] PCI/LUO: Save SR-IOV number of VF Chris Li
2025-07-28  8:24 ` [PATCH RFC 10/25] PCI/LUO: Add pci_liveupdate_get_driver_data() Chris Li
2025-07-28  8:24 ` [PATCH RFC 11/25] PCI: pci-lu-stub: Add a stub driver for Live Update testing Chris Li
2025-07-28  8:24 ` [PATCH RFC 12/25] PCI/LUO: Save struct pci_dev info during prepare phase chrisl
2025-07-28  8:24 ` [PATCH RFC 13/25] PCI/LUO: Check the device function numbers in restoration chrisl
2025-07-28  8:24 ` [PATCH RFC 14/25] PCI/LUO: Restore power state of a PCI device chrisl
2025-07-28  8:24 ` [PATCH RFC 15/25] PCI/LUO: Restore PM related fields chrisl
2025-07-28  8:24 ` [PATCH RFC 16/25] PCI/LUO: Restore the pme_poll flag chrisl
2025-07-28  8:24 ` [PATCH RFC 17/25] PCI/LUO: Restore the no_d3cold flag chrisl
2025-07-28  8:24 ` [PATCH RFC 18/25] PCI/LUO: Restore pci_dev fields during probe chrisl
2025-07-28  8:24 ` [PATCH RFC 19/25] PCI/LUO: Track liveupdate buses Chris Li
2025-07-28  8:24 ` [PATCH RFC 20/25] PCI/LUO: Avoid write to liveupdate devices at boot Chris Li
2025-07-28 17:23   ` Thomas Gleixner
2025-07-28 23:50     ` Jason Gunthorpe
2025-07-30  4:13       ` Chris Li
2025-07-30  1:51     ` Chris Li
2025-07-31 15:01       ` Jason Gunthorpe
2025-08-01 23:04         ` Chris Li
2025-08-02 13:50           ` Jason Gunthorpe
2025-08-07  0:50             ` Chris Li [this message]
2025-07-28  8:24 ` [PATCH RFC 21/25] PCI/LUO: Save and restore the PCI resource chrisl
2025-07-28  8:24 ` [PATCH RFC 22/25] PCI/LUO: Save PCI bus and host bridge states chrisl
2025-07-28  8:24 ` [PATCH RFC 23/25] PCI/LUO: Check the PCI bus state after restoration chrisl
2025-07-28  8:24 ` [PATCH RFC 24/25] PCI: pci-lu-pf-stub: Add a PF stub driver for Live Update testing Chris Li
2025-07-28  8:24 ` [PATCH RFC 25/25] PCI/LUO: Clean up PCI_SER_GET() chrisl

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAF8kJuN4yjBzaTuAA9wERbxbJQs=YSf-1RY_nHu+XvMybpYbfA@mail.gmail.com' \
    --to=chrisl@kernel.org \
    --cc=ajayachandra@nvidia.com \
    --cc=bhelgaas@google.com \
    --cc=dakr@kernel.org \
    --cc=dmatlack@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jasonmiu@google.com \
    --cc=jgg@ziepe.ca \
    --cc=junaids@google.com \
    --cc=lenb@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=parav@nvidia.com \
    --cc=rafael@kernel.org \
    --cc=rppt@kernel.org \
    --cc=saeedm@nvidia.com \
    --cc=tatashin@google.com \
    --cc=tglx@linutronix.de \
    --cc=vipinsh@google.com \
    --cc=witu@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).