All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH vfio] vfio/pci: remove msi domain on msi disable
@ 2023-09-14 19:14 Shannon Nelson
  2023-09-18 14:17 ` Jason Gunthorpe
  0 siblings, 1 reply; 15+ messages in thread
From: Shannon Nelson @ 2023-09-14 19:14 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian, reinette.chatre, tglx, kvm
  Cc: shannon.nelson, brett.creeley, linux-kernel

The new MSI dynamic allocation machinery is great for making the irq
management more flexible.  It includes caching information about the
MSI domain which gets reused on each new open of a VFIO fd.  However,
this causes an issue when the underlying hardware has flexible MSI-x
configurations, as a changed configuration doesn't get seen between
new opens, and is only refreshed between PCI unbind/bind cycles.

In our device we can change the per-VF MSI-x resource allocation
without the need for rebooting or function reset.  For example,

  1. Initial power up and kernel boot:
	# lspci -s 2e:00.1 -vv | grep MSI-X
	        Capabilities: [a0] MSI-X: Enable+ Count=8 Masked-

  2. Device VF configuration change happens with no reset

  3. New MSI-x count value seen:
	# lspci -s 2e:00.1 -vv | grep MSI-X
		Capabilities: [a0] MSI-X: Enable- Count=64 Masked-

This allows for per-VF dynamic reconfiguration of interrupt resources
for the VMs using the VFIO devices supported by our hardware.

The problem comes where the dynamic IRQ management created the MSI
domain when the VFIO device creates the first IRQ in the first ioctl()
VFIO_DEVICE_SET_IRQS request.  The current MSI-x count (hwsize) is read
when setting up the irq vectors under pci_alloc_irq_vectors_affinity(),
and the MSI domain information is set up, which includes the hwsize.

When the VFIO fd is closed, the IRQs are removed, but the MSI domain
information is kept for later use since we're only closing the current
VFIO fd, not unbinding the PCI device connection.  When a new VFIO fd
open happens and a new VFIO_DEVICE_SET_IRQS request comes down, the cycle
starts again, reusing the existing MSI domain with the previous hwsize.
This is fine until this new QEMU instance has read the new larger MSI-x
count from PCI config space (QEMU:vfio_msi_enable()) and tries to create
more IRQs than were available before.  We fail in msi_insert_desc()
because the MSI domain still is set up for the earlier hwsize and has no
room for the n+1 IRQ.

This can be easily fixed by simply adding msi_remove_device_irq_domain()
into vfio_msi_disable() which is called when the VFIO IRQs are removed
either by an ioctl() call from QEMU or from the VFIO fd close.  This forces
the MSI domain to be recreated with the new MSI-x count on the next
VFIO_DEVICE_SET_IRQS request.

Link: https://lore.kernel.org/all/cover.1683740667.git.reinette.chatre@intel.com/
Link: https://lore.kernel.org/r/20221124232325.798556374@linutronix.de
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
---
 drivers/vfio/pci/vfio_pci_intrs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index cbb4bcbfbf83..f66d5e7e078b 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -538,6 +538,7 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
 	cmd = vfio_pci_memory_lock_and_enable(vdev);
 	pci_free_irq_vectors(pdev);
 	vfio_pci_memory_unlock_and_restore(vdev, cmd);
+	msi_remove_device_irq_domain(&pdev->dev, MSI_DEFAULT_DOMAIN);
 
 	/*
 	 * Both disable paths above use pci_intx_for_msi() to clear DisINTx
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread
* Re: [PATCH vfio] vfio/pci: remove msi domain on msi disable
@ 2023-09-15  7:38 kernel test robot
  0 siblings, 0 replies; 15+ messages in thread
From: kernel test robot @ 2023-09-15  7:38 UTC (permalink / raw)
  To: oe-kbuild; +Cc: lkp

:::::: 
:::::: Manual check reason: "git am base is a link in commit message"
:::::: 

BCC: lkp@intel.com
CC: oe-kbuild-all@lists.linux.dev
In-Reply-To: <20230914191406.54656-1-shannon.nelson@amd.com>
References: <20230914191406.54656-1-shannon.nelson@amd.com>
TO: Shannon Nelson <shannon.nelson@amd.com>
TO: alex.williamson@redhat.com
TO: jgg@ziepe.ca
TO: kevin.tian@intel.com
TO: reinette.chatre@intel.com
TO: tglx@linutronix.de
TO: kvm@vger.kernel.org
CC: shannon.nelson@amd.com
CC: brett.creeley@amd.com
CC: linux-kernel@vger.kernel.org

Hi Shannon,

kernel test robot noticed the following build errors:

[auto build test ERROR on awilliam-vfio/for-linus]
[cannot apply to awilliam-vfio/next linus/master v6.6-rc1 next-20230915]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shannon-Nelson/vfio-pci-remove-msi-domain-on-msi-disable/20230915-031544
base:   https://github.com/awilliam/linux-vfio.git for-linus
patch link:    https://lore.kernel.org/r/20230914191406.54656-1-shannon.nelson%40amd.com
patch subject: [PATCH vfio] vfio/pci: remove msi domain on msi disable
:::::: branch date: 12 hours ago
:::::: commit date: 12 hours ago
config: i386-randconfig-004-20230915 (https://download.01.org/0day-ci/archive/20230915/202309151535.q1bqfuzX-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230915/202309151535.q1bqfuzX-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/r/202309151535.q1bqfuzX-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/vfio/pci/vfio_pci_intrs.c: In function 'vfio_msi_disable':
>> drivers/vfio/pci/vfio_pci_intrs.c:541:9: error: implicit declaration of function 'msi_remove_device_irq_domain' [-Werror=implicit-function-declaration]
     541 |         msi_remove_device_irq_domain(&pdev->dev, MSI_DEFAULT_DOMAIN);
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/msi_remove_device_irq_domain +541 drivers/vfio/pci/vfio_pci_intrs.c

89e1f7d4c66d85f Alex Williamson   2012-07-31  524  
536475109c82841 Max Gurtovoy      2021-08-26  525  static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
89e1f7d4c66d85f Alex Williamson   2012-07-31  526  {
89e1f7d4c66d85f Alex Williamson   2012-07-31  527  	struct pci_dev *pdev = vdev->pdev;
d74e9fd8c9c81e2 Reinette Chatre   2023-05-11  528  	struct vfio_pci_irq_ctx *ctx;
7f95ee792e66322 Reinette Chatre   2023-05-11  529  	unsigned long i;
abafbc551fddede Alex Williamson   2020-04-22  530  	u16 cmd;
89e1f7d4c66d85f Alex Williamson   2012-07-31  531  
7f95ee792e66322 Reinette Chatre   2023-05-11  532  	xa_for_each(&vdev->ctx, i, ctx) {
d74e9fd8c9c81e2 Reinette Chatre   2023-05-11  533  		vfio_virqfd_disable(&ctx->unmask);
d74e9fd8c9c81e2 Reinette Chatre   2023-05-11  534  		vfio_virqfd_disable(&ctx->mask);
287c9512e767e82 Reinette Chatre   2023-05-11  535  		vfio_msi_set_vector_signal(vdev, i, -1, msix);
89e1f7d4c66d85f Alex Williamson   2012-07-31  536  	}
89e1f7d4c66d85f Alex Williamson   2012-07-31  537  
abafbc551fddede Alex Williamson   2020-04-22  538  	cmd = vfio_pci_memory_lock_and_enable(vdev);
61771468e0a567f Christoph Hellwig 2016-09-11  539  	pci_free_irq_vectors(pdev);
abafbc551fddede Alex Williamson   2020-04-22  540  	vfio_pci_memory_unlock_and_restore(vdev, cmd);
d32739dbb39d42c Shannon Nelson    2023-09-14 @541  	msi_remove_device_irq_domain(&pdev->dev, MSI_DEFAULT_DOMAIN);
89e1f7d4c66d85f Alex Williamson   2012-07-31  542  
c93a97ee0583cd6 Alex Williamson   2016-09-26  543  	/*
c93a97ee0583cd6 Alex Williamson   2016-09-26  544  	 * Both disable paths above use pci_intx_for_msi() to clear DisINTx
c93a97ee0583cd6 Alex Williamson   2016-09-26  545  	 * via their shutdown paths.  Restore for NoINTx devices.
c93a97ee0583cd6 Alex Williamson   2016-09-26  546  	 */
c93a97ee0583cd6 Alex Williamson   2016-09-26  547  	if (vdev->nointx)
c93a97ee0583cd6 Alex Williamson   2016-09-26  548  		pci_intx(pdev, 0);
c93a97ee0583cd6 Alex Williamson   2016-09-26  549  
89e1f7d4c66d85f Alex Williamson   2012-07-31  550  	vdev->irq_type = VFIO_PCI_NUM_IRQS;
89e1f7d4c66d85f Alex Williamson   2012-07-31  551  }
89e1f7d4c66d85f Alex Williamson   2012-07-31  552  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [PATCH vfio] vfio/pci: remove msi domain on msi disable
@ 2023-09-15 14:22 kernel test robot
  2023-09-28  1:52 ` Liu, Yujie
  0 siblings, 1 reply; 15+ messages in thread
From: kernel test robot @ 2023-09-15 14:22 UTC (permalink / raw)
  To: oe-kbuild; +Cc: lkp

:::::: 
:::::: Manual check reason: "git am base is a link in commit message"
:::::: 

BCC: lkp@intel.com
CC: oe-kbuild-all@lists.linux.dev
In-Reply-To: <20230914191406.54656-1-shannon.nelson@amd.com>
References: <20230914191406.54656-1-shannon.nelson@amd.com>
TO: Shannon Nelson <shannon.nelson@amd.com>
TO: alex.williamson@redhat.com
TO: jgg@ziepe.ca
TO: kevin.tian@intel.com
TO: reinette.chatre@intel.com
TO: tglx@linutronix.de
TO: kvm@vger.kernel.org
CC: shannon.nelson@amd.com
CC: brett.creeley@amd.com
CC: linux-kernel@vger.kernel.org

Hi Shannon,

kernel test robot noticed the following build errors:

[auto build test ERROR on awilliam-vfio/for-linus]
[cannot apply to awilliam-vfio/next linus/master v6.6-rc1 next-20230915]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shannon-Nelson/vfio-pci-remove-msi-domain-on-msi-disable/20230915-031544
base:   https://github.com/awilliam/linux-vfio.git for-linus
patch link:    https://lore.kernel.org/r/20230914191406.54656-1-shannon.nelson%40amd.com
patch subject: [PATCH vfio] vfio/pci: remove msi domain on msi disable
:::::: branch date: 19 hours ago
:::::: commit date: 19 hours ago
config: x86_64-randconfig-071-20230915 (https://download.01.org/0day-ci/archive/20230915/202309152211.7YCkF75G-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230915/202309152211.7YCkF75G-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/r/202309152211.7YCkF75G-lkp@intel.com/

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: "msi_remove_device_irq_domain" [drivers/vfio/pci/vfio-pci-core.ko] undefined!

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-09-28  1:53 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-14 19:14 [PATCH vfio] vfio/pci: remove msi domain on msi disable Shannon Nelson
2023-09-18 14:17 ` Jason Gunthorpe
2023-09-18 17:48   ` Nelson, Shannon
2023-09-18 23:32     ` Jason Gunthorpe
2023-09-19  0:13       ` Nelson, Shannon
2023-09-18 18:43   ` Thomas Gleixner
2023-09-18 23:37     ` Jason Gunthorpe
2023-09-18 23:47       ` Thomas Gleixner
2023-09-19  0:02         ` Jason Gunthorpe
2023-09-19  0:25           ` Thomas Gleixner
2023-09-19  0:32             ` Jason Gunthorpe
2023-09-19  0:57               ` Thomas Gleixner
  -- strict thread matches above, loose matches on Subject: below --
2023-09-15  7:38 kernel test robot
2023-09-15 14:22 kernel test robot
2023-09-28  1:52 ` Liu, Yujie

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.