public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg
@ 2023-04-08 21:11 Dexuan Cui
  2023-04-11 17:27 ` Michael Kelley (LINUX)
  0 siblings, 1 reply; 4+ messages in thread
From: Dexuan Cui @ 2023-04-08 21:11 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, lpieralisi, kw, robh, bhelgaas,
	linux-hyperv, linux-pci, mikelley
  Cc: linux-kernel

4 commits are involved here:
A (2016): commit 0de8ce3ee8e3 ("PCI: hv: Allocate physically contiguous hypercall params buffer")
B (2017): commit be66b6736591 ("PCI: hv: Use page allocation for hbus structure")
C (2019): commit 877b911a5ba0 ("PCI: hv: Avoid a kmemleak false positive caused by the hbus buffer")
D (2018): commit 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments")

Patch D introduced the per-CPU hypercall input page "hyperv_pcpu_input_arg"
in 2018. With patch D, we no longer need the per-Hyper-V-PCI-bus hypercall
input page "hbus->retarget_msi_interrupt_params" that was added in patch A,
and the issue addressed by patch B is no longer an issue, and we can also
get rid of patch C.

The change here is required for PCI device assignment to work for
Confidential VMs (CVMs), because otherwise we would have to call
set_memory_decrypted() for "hbus->retarget_msi_interrupt_params" before
calling the hypercall HVCALL_RETARGET_INTERRUPT.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/pci/controller/pci-hyperv.c | 48 +++++------------------------
 1 file changed, 7 insertions(+), 41 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 337f3b4a04fc0..bc32662c6bb7f 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -508,20 +508,11 @@ struct hv_pcibus_device {
 	struct msi_domain_info msi_info;
 	struct irq_domain *irq_domain;
 
-	spinlock_t retarget_msi_interrupt_lock;
-
 	struct workqueue_struct *wq;
 
 	/* Highest slot of child device with resources allocated */
 	int wslot_res_allocated;
 	bool use_calls; /* Use hypercalls to access mmio cfg space */
-
-	/* hypercall arg, must not cross page boundary */
-	struct hv_retarget_device_interrupt retarget_msi_interrupt_params;
-
-	/*
-	 * Don't put anything here: retarget_msi_interrupt_params must be last
-	 */
 };
 
 /*
@@ -645,9 +636,9 @@ static void hv_arch_irq_unmask(struct irq_data *data)
 	hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata);
 	int_desc = data->chip_data;
 
-	spin_lock_irqsave(&hbus->retarget_msi_interrupt_lock, flags);
+	local_irq_save(flags);
 
-	params = &hbus->retarget_msi_interrupt_params;
+	params = *this_cpu_ptr(hyperv_pcpu_input_arg);
 	memset(params, 0, sizeof(*params));
 	params->partition_id = HV_PARTITION_ID_SELF;
 	params->int_entry.source = HV_INTERRUPT_SOURCE_MSI;
@@ -680,7 +671,7 @@ static void hv_arch_irq_unmask(struct irq_data *data)
 
 		if (!alloc_cpumask_var(&tmp, GFP_ATOMIC)) {
 			res = 1;
-			goto exit_unlock;
+			goto out;
 		}
 
 		cpumask_and(tmp, dest, cpu_online_mask);
@@ -689,7 +680,7 @@ static void hv_arch_irq_unmask(struct irq_data *data)
 
 		if (nr_bank <= 0) {
 			res = 1;
-			goto exit_unlock;
+			goto out;
 		}
 
 		/*
@@ -708,8 +699,8 @@ static void hv_arch_irq_unmask(struct irq_data *data)
 	res = hv_do_hypercall(HVCALL_RETARGET_INTERRUPT | (var_size << 17),
 			      params, NULL);
 
-exit_unlock:
-	spin_unlock_irqrestore(&hbus->retarget_msi_interrupt_lock, flags);
+out:
+	local_irq_restore(flags);
 
 	/*
 	 * During hibernation, when a CPU is offlined, the kernel tries
@@ -3598,35 +3589,11 @@ static int hv_pci_probe(struct hv_device *hdev,
 	bool enter_d0_retry = true;
 	int ret;
 
-	/*
-	 * hv_pcibus_device contains the hypercall arguments for retargeting in
-	 * hv_irq_unmask(). Those must not cross a page boundary.
-	 */
-	BUILD_BUG_ON(sizeof(*hbus) > HV_HYP_PAGE_SIZE);
-
 	bridge = devm_pci_alloc_host_bridge(&hdev->device, 0);
 	if (!bridge)
 		return -ENOMEM;
 
-	/*
-	 * With the recent 59bb47985c1d ("mm, sl[aou]b: guarantee natural
-	 * alignment for kmalloc(power-of-two)"), kzalloc() is able to allocate
-	 * a 4KB buffer that is guaranteed to be 4KB-aligned. Here the size and
-	 * alignment of hbus is important because hbus's field
-	 * retarget_msi_interrupt_params must not cross a 4KB page boundary.
-	 *
-	 * Here we prefer kzalloc to get_zeroed_page(), because a buffer
-	 * allocated by the latter is not tracked and scanned by kmemleak, and
-	 * hence kmemleak reports the pointer contained in the hbus buffer
-	 * (i.e. the hpdev struct, which is created in new_pcichild_device() and
-	 * is tracked by hbus->children) as memory leak (false positive).
-	 *
-	 * If the kernel doesn't have 59bb47985c1d, get_zeroed_page() *must* be
-	 * used to allocate the hbus buffer and we can avoid the kmemleak false
-	 * positive by using kmemleak_alloc() and kmemleak_free() to ask
-	 * kmemleak to track and scan the hbus buffer.
-	 */
-	hbus = kzalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
+	hbus = kzalloc(sizeof(*hbus), GFP_KERNEL);
 	if (!hbus)
 		return -ENOMEM;
 
@@ -3683,7 +3650,6 @@ static int hv_pci_probe(struct hv_device *hdev,
 	INIT_LIST_HEAD(&hbus->dr_list);
 	spin_lock_init(&hbus->config_lock);
 	spin_lock_init(&hbus->device_list_lock);
-	spin_lock_init(&hbus->retarget_msi_interrupt_lock);
 	hbus->wq = alloc_ordered_workqueue("hv_pci_%x", 0,
 					   hbus->bridge->domain_nr);
 	if (!hbus->wq) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* RE: [PATCH] PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg
  2023-04-08 21:11 [PATCH] PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg Dexuan Cui
@ 2023-04-11 17:27 ` Michael Kelley (LINUX)
  2023-04-13  1:41   ` Wei Liu
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Kelley (LINUX) @ 2023-04-11 17:27 UTC (permalink / raw)
  To: Dexuan Cui, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org,
	lpieralisi@kernel.org, kw@linux.com, robh@kernel.org,
	bhelgaas@google.com, linux-hyperv@vger.kernel.org,
	linux-pci@vger.kernel.org
  Cc: linux-kernel@vger.kernel.org

From: Dexuan Cui <decui@microsoft.com> Sent: Saturday, April 8, 2023 2:11 PM
> 
> 4 commits are involved here:
> A (2016): commit 0de8ce3ee8e3 ("PCI: hv: Allocate physically contiguous hypercall
> params buffer")
> B (2017): commit be66b6736591 ("PCI: hv: Use page allocation for hbus structure")
> C (2019): commit 877b911a5ba0 ("PCI: hv: Avoid a kmemleak false positive caused by
> the hbus buffer")
> D (2018): commit 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments")
> 
> Patch D introduced the per-CPU hypercall input page "hyperv_pcpu_input_arg"
> in 2018. With patch D, we no longer need the per-Hyper-V-PCI-bus hypercall
> input page "hbus->retarget_msi_interrupt_params" that was added in patch A,
> and the issue addressed by patch B is no longer an issue, and we can also
> get rid of patch C.
> 
> The change here is required for PCI device assignment to work for
> Confidential VMs (CVMs), because otherwise we would have to call
> set_memory_decrypted() for "hbus->retarget_msi_interrupt_params" before
> calling the hypercall HVCALL_RETARGET_INTERRUPT.

Well, not all CVMs.  It's not required for SEV-SNP vTOM VMs on Hyper-V because
of the paravisor.  Is it more accurate to say "for Confidential VMs (CVMs) running
without a paravisor"?

Otherwise,

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  drivers/pci/controller/pci-hyperv.c | 48 +++++------------------------
>  1 file changed, 7 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 337f3b4a04fc0..bc32662c6bb7f 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -508,20 +508,11 @@ struct hv_pcibus_device {
>  	struct msi_domain_info msi_info;
>  	struct irq_domain *irq_domain;
> 
> -	spinlock_t retarget_msi_interrupt_lock;
> -
>  	struct workqueue_struct *wq;
> 
>  	/* Highest slot of child device with resources allocated */
>  	int wslot_res_allocated;
>  	bool use_calls; /* Use hypercalls to access mmio cfg space */
> -
> -	/* hypercall arg, must not cross page boundary */
> -	struct hv_retarget_device_interrupt retarget_msi_interrupt_params;
> -
> -	/*
> -	 * Don't put anything here: retarget_msi_interrupt_params must be last
> -	 */
>  };
> 
>  /*
> @@ -645,9 +636,9 @@ static void hv_arch_irq_unmask(struct irq_data *data)
>  	hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata);
>  	int_desc = data->chip_data;
> 
> -	spin_lock_irqsave(&hbus->retarget_msi_interrupt_lock, flags);
> +	local_irq_save(flags);
> 
> -	params = &hbus->retarget_msi_interrupt_params;
> +	params = *this_cpu_ptr(hyperv_pcpu_input_arg);
>  	memset(params, 0, sizeof(*params));
>  	params->partition_id = HV_PARTITION_ID_SELF;
>  	params->int_entry.source = HV_INTERRUPT_SOURCE_MSI;
> @@ -680,7 +671,7 @@ static void hv_arch_irq_unmask(struct irq_data *data)
> 
>  		if (!alloc_cpumask_var(&tmp, GFP_ATOMIC)) {
>  			res = 1;
> -			goto exit_unlock;
> +			goto out;
>  		}
> 
>  		cpumask_and(tmp, dest, cpu_online_mask);
> @@ -689,7 +680,7 @@ static void hv_arch_irq_unmask(struct irq_data *data)
> 
>  		if (nr_bank <= 0) {
>  			res = 1;
> -			goto exit_unlock;
> +			goto out;
>  		}
> 
>  		/*
> @@ -708,8 +699,8 @@ static void hv_arch_irq_unmask(struct irq_data *data)
>  	res = hv_do_hypercall(HVCALL_RETARGET_INTERRUPT | (var_size << 17),
>  			      params, NULL);
> 
> -exit_unlock:
> -	spin_unlock_irqrestore(&hbus->retarget_msi_interrupt_lock, flags);
> +out:
> +	local_irq_restore(flags);
> 
>  	/*
>  	 * During hibernation, when a CPU is offlined, the kernel tries
> @@ -3598,35 +3589,11 @@ static int hv_pci_probe(struct hv_device *hdev,
>  	bool enter_d0_retry = true;
>  	int ret;
> 
> -	/*
> -	 * hv_pcibus_device contains the hypercall arguments for retargeting in
> -	 * hv_irq_unmask(). Those must not cross a page boundary.
> -	 */
> -	BUILD_BUG_ON(sizeof(*hbus) > HV_HYP_PAGE_SIZE);
> -
>  	bridge = devm_pci_alloc_host_bridge(&hdev->device, 0);
>  	if (!bridge)
>  		return -ENOMEM;
> 
> -	/*
> -	 * With the recent 59bb47985c1d ("mm, sl[aou]b: guarantee natural
> -	 * alignment for kmalloc(power-of-two)"), kzalloc() is able to allocate
> -	 * a 4KB buffer that is guaranteed to be 4KB-aligned. Here the size and
> -	 * alignment of hbus is important because hbus's field
> -	 * retarget_msi_interrupt_params must not cross a 4KB page boundary.
> -	 *
> -	 * Here we prefer kzalloc to get_zeroed_page(), because a buffer
> -	 * allocated by the latter is not tracked and scanned by kmemleak, and
> -	 * hence kmemleak reports the pointer contained in the hbus buffer
> -	 * (i.e. the hpdev struct, which is created in new_pcichild_device() and
> -	 * is tracked by hbus->children) as memory leak (false positive).
> -	 *
> -	 * If the kernel doesn't have 59bb47985c1d, get_zeroed_page() *must* be
> -	 * used to allocate the hbus buffer and we can avoid the kmemleak false
> -	 * positive by using kmemleak_alloc() and kmemleak_free() to ask
> -	 * kmemleak to track and scan the hbus buffer.
> -	 */
> -	hbus = kzalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
> +	hbus = kzalloc(sizeof(*hbus), GFP_KERNEL);
>  	if (!hbus)
>  		return -ENOMEM;
> 
> @@ -3683,7 +3650,6 @@ static int hv_pci_probe(struct hv_device *hdev,
>  	INIT_LIST_HEAD(&hbus->dr_list);
>  	spin_lock_init(&hbus->config_lock);
>  	spin_lock_init(&hbus->device_list_lock);
> -	spin_lock_init(&hbus->retarget_msi_interrupt_lock);
>  	hbus->wq = alloc_ordered_workqueue("hv_pci_%x", 0,
>  					   hbus->bridge->domain_nr);
>  	if (!hbus->wq) {
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg
  2023-04-11 17:27 ` Michael Kelley (LINUX)
@ 2023-04-13  1:41   ` Wei Liu
  2023-04-21  1:36     ` Dexuan Cui
  0 siblings, 1 reply; 4+ messages in thread
From: Wei Liu @ 2023-04-13  1:41 UTC (permalink / raw)
  To: Michael Kelley (LINUX)
  Cc: Dexuan Cui, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org,
	lpieralisi@kernel.org, kw@linux.com, robh@kernel.org,
	bhelgaas@google.com, linux-hyperv@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org

On Tue, Apr 11, 2023 at 05:27:41PM +0000, Michael Kelley (LINUX) wrote:
> From: Dexuan Cui <decui@microsoft.com> Sent: Saturday, April 8, 2023 2:11 PM
> > 
> > 4 commits are involved here:
> > A (2016): commit 0de8ce3ee8e3 ("PCI: hv: Allocate physically contiguous hypercall
> > params buffer")
> > B (2017): commit be66b6736591 ("PCI: hv: Use page allocation for hbus structure")
> > C (2019): commit 877b911a5ba0 ("PCI: hv: Avoid a kmemleak false positive caused by
> > the hbus buffer")
> > D (2018): commit 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments")
> > 
> > Patch D introduced the per-CPU hypercall input page "hyperv_pcpu_input_arg"
> > in 2018. With patch D, we no longer need the per-Hyper-V-PCI-bus hypercall
> > input page "hbus->retarget_msi_interrupt_params" that was added in patch A,
> > and the issue addressed by patch B is no longer an issue, and we can also
> > get rid of patch C.
> > 
> > The change here is required for PCI device assignment to work for
> > Confidential VMs (CVMs), because otherwise we would have to call
> > set_memory_decrypted() for "hbus->retarget_msi_interrupt_params" before
> > calling the hypercall HVCALL_RETARGET_INTERRUPT.
> 
> Well, not all CVMs.  It's not required for SEV-SNP vTOM VMs on Hyper-V because
> of the paravisor.  Is it more accurate to say "for Confidential VMs (CVMs) running
> without a paravisor"?
> 

Let me know how this text should be reworded.

> Otherwise,
> 
> Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [PATCH] PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg
  2023-04-13  1:41   ` Wei Liu
@ 2023-04-21  1:36     ` Dexuan Cui
  0 siblings, 0 replies; 4+ messages in thread
From: Dexuan Cui @ 2023-04-21  1:36 UTC (permalink / raw)
  To: Wei Liu, Michael Kelley (LINUX)
  Cc: KY Srinivasan, Haiyang Zhang, lpieralisi@kernel.org, kw@linux.com,
	robh@kernel.org, bhelgaas@google.com,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

> From: Wei Liu <wei.liu@kernel.org>
> Sent: Wednesday, April 12, 2023 6:42 PM
> > > ...
> > > The change here is required for PCI device assignment to work for
> > > Confidential VMs (CVMs), because otherwise we would have to call
> > > set_memory_decrypted() for "hbus->retarget_msi_interrupt_params"
> > > before calling the hypercall HVCALL_RETARGET_INTERRUPT.
> >
> > Well, not all CVMs.  It's not required for SEV-SNP vTOM VMs on 
> > Hyper-V because of the paravisor. Is it more accurate to say 
> > "for Confidential VMs (CVMs) running without a paravisor"?
> 
> Let me know how this text should be reworded.

Thanks, I just posted v2:
https://lwn.net/ml/linux-kernel/20230421013025.17152-1-decui%40microsoft.com/

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-04-21  1:36 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-08 21:11 [PATCH] PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg Dexuan Cui
2023-04-11 17:27 ` Michael Kelley (LINUX)
2023-04-13  1:41   ` Wei Liu
2023-04-21  1:36     ` Dexuan Cui

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox