From: Jason Gunthorpe <jgg@nvidia.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>, <x86@kernel.org>,
	Marc Zyngier <maz@kernel.org>, Megha Dey <megha.dey@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	"Jacob Pan" <jacob.jun.pan@intel.com>,
	Baolu Lu <baolu.lu@intel.com>, Kevin Tian <kevin.tian@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Joerg Roedel <joro@8bytes.org>,
	<iommu@lists.linux-foundation.org>,
	<linux-hyperv@vger.kernel.org>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	"Jon Derrick" <jonathan.derrick@intel.com>,
	Lu Baolu <baolu.lu@linux.intel.com>, Wei Liu <wei.liu@kernel.org>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	"Stephen Hemminger" <sthemmin@microsoft.com>,
	Steve Wahl <steve.wahl@hpe.com>,
	"Dimitri Sivanich" <sivanich@hpe.com>,
	Russ Anderson <rja@hpe.com>, <linux-pci@vger.kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	"Lorenzo Pieralisi" <lorenzo.pieralisi@arm.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	<xen-devel@lists.xenproject.org>, Juergen Gross <jgross@suse.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>
Subject: Re: [patch RFC 38/38] irqchip: Add IMS array driver - NOT FOR MERGING
Date: Sat, 22 Aug 2020 20:05:11 -0300
Message-ID: <20200822230511.GD1152540@nvidia.com>
In-Reply-To: <874kovqx8q.fsf@nanos.tec.linutronix.de>

On Sat, Aug 22, 2020 at 03:34:45AM +0200, Thomas Gleixner wrote:
> >> One question is whether the device can see partial updates to that
> >> memory due to the async 'swap' of context from the device CPU.
> >
> > It is worse than just partial updates.. The device operation is much
> > more like you'd imagine a CPU cache. There could be copies of the RAM
> > in the device for long periods of time, dirty data in the device that
> > will flush back to CPU RAM overwriting CPU changes, etc.
> 
> TBH, that's insane. You clearly want to think about this some
> more. If

I think this general design is around 15 years old, across a healthy
number of silicon generations, and a rather large number of shipped
devices. People have thought about it :)

> you swap out device state and device control state then you definitely
> want to have regions which are read only from the device POV and never
> written back. 

It is not as useful as you'd think - the issue with atomicity of
update still largely prevents doing much useful from the CPU, and to
make any CPU side changes visible a device command would still be
needed to synchronize the internal state to that modified memory.

So, CPU centric updates would cover a very limited number of
operations, and a device command is required anyhow. Little is
actually gained.

> The MSI msg store clearly belongs into that category.
> But that's not restricted to the MSI msg store, there is certainly other
> stuff which never wants to be written back by the device.

To get a design where you'd be able to run everything from a CPU
atomic context that can't trigger a WQ..

New silicon would have to implement some MSI-only 'cache' that can
invalidate entries based on a simple MemWr TLP.

Then the affinity update would write to the host memory, then send a
MemWr to the device to trigger invalidate.
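
To make that flow concrete, here is a rough userspace model of it. It is
only a sketch: no such hardware exists today, and every name in it
(host_store, dev_cache, dev_invalidate_write, dev_fetch_msg,
cpu_set_affinity) is made up for illustration. The device side is
simulated with plain arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IMS_SLOTS 4

/* An MSI message as the CPU stores it in host memory (hypothetical layout). */
struct msi_msg_sketch {
	uint64_t address;
	uint32_t data;
};

/* Host memory: written by the CPU only, never written back by the device. */
static struct msi_msg_sketch host_store[IMS_SLOTS];

/* Hypothetical MSI-only cache inside the device. */
static struct {
	struct msi_msg_sketch msg;
	bool valid;
} dev_cache[IMS_SLOTS];

/* Models the device receiving a MemWr TLP that invalidates one entry. */
static void dev_invalidate_write(unsigned int slot)
{
	dev_cache[slot].valid = false;
}

/* Models the device raising an interrupt: refill from host memory on a miss. */
static struct msi_msg_sketch dev_fetch_msg(unsigned int slot)
{
	if (!dev_cache[slot].valid) {
		dev_cache[slot].msg = host_store[slot];
		dev_cache[slot].valid = true;
	}
	return dev_cache[slot].msg;
}

/* CPU side: no command queue involved, so usable from atomic context. */
static void cpu_set_affinity(unsigned int slot, uint64_t address, uint32_t data)
{
	host_store[slot].address = address;
	host_store[slot].data = data;
	/*
	 * Real code would need a write barrier here so the store is
	 * visible to the device before the invalidating MemWr arrives.
	 */
	dev_invalidate_write(slot);
}
```

Without the invalidate write the device keeps using its stale cached
copy, which is the behaviour existing devices have today.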

As a silicon design it might work, but it means existing devices can't
be used with this dev_msi. It is also the sort of thing that would
need a standard document to have any hope of multiple vendors fitting
into it. Eg at PCI-SIG or something.

> If you don't do that then you simply can't write to that space from the
> CPU and you have to transport this kind of information always via command
> queues.

Yes, exactly. This is part of the architectural design of the device,
has been for a long time. Has positives and negatives.

> > I suppose the core code could provide this as a service? Sort of a
> > variant of the other lazy things above?
> 
> Kinda. That needs a lot of thought for the affinity setting stuff
> because it can be called from contexts which do not allow that. It's
> solvable though, but I clearly need to stare at the corner cases for a
> while.

If possible, this would be ideal, as we could use the dev_msi on a big
installed base of existing HW.
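
Roughly the shape I have in mind, as a userspace sketch only. The
struct and function names are invented for illustration, the worker
stands in for a workqueue handler, and the corner cases mentioned above
(ordering against teardown, interrupts arriving with the old message,
etc) are not handled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-interrupt state kept by the core code. */
struct ims_irq_sketch {
	uint32_t pending_data;      /* latest message requested by the CPU */
	uint32_t device_data;       /* message the device currently uses */
	bool update_pending;
	unsigned int commands_sent; /* device commands issued so far */
};

/* Callable from atomic context: only records the request. */
static void ims_set_affinity_atomic(struct ims_irq_sketch *irq, uint32_t data)
{
	irq->pending_data = data;
	irq->update_pending = true;
	/* A kernel driver would queue deferred work here, e.g. schedule_work(). */
}

/* Runs later in sleepable context, standing in for the workqueue handler. */
static void ims_update_worker(struct ims_irq_sketch *irq)
{
	if (!irq->update_pending)
		return;
	irq->update_pending = false;
	/* Stand-in for issuing a device command and waiting for completion. */
	irq->device_data = irq->pending_data;
	irq->commands_sent++;
}
```

Two back-to-back affinity changes collapse into a single device command
here, and the device keeps delivering with the old message until the
worker has run.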

I suspect other HW can probably fit into this too as the basic
ingredients should be fairly widespread.

Even a restricted version for situations where affinity does not need
a device update would possibly be interesting (eg x86 IOMMU remap, ARM
GIC, etc)

> OTOH, in normal operation for MSI interrupts (edge type) masking is not
> used at all and just restricted to the startup teardown.

Yeah, at least this device doesn't need masking at runtime, just
startup/teardown and affinity update.

Thanks,
Jason
