Linux-HyperV List
* Re: [PATCH net-next v5] net: mana: Add MAC address to vPort logs and clarify error messages
From: Simon Horman @ 2026-03-05  9:19 UTC
  To: Erni Sri Satya Vennela
  Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, dipayanroy, shirazsaleem, kees,
	shradhagupta, gargaditya, linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260302174204.234837-1-ernis@linux.microsoft.com>

On Mon, Mar 02, 2026 at 09:41:52AM -0800, Erni Sri Satya Vennela wrote:
> Add MAC address to vPort configuration success message and update error
> message to be more specific about HWC message errors in
> mana_send_request.
> 
> Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
> ---
> Changes in v5:
> * Remove __func__ and __LINE__ from error logs in hw_channel.c

Thanks for the update.

Reviewed-by: Simon Horman <horms@kernel.org>


* Re: [PATCH net-next v5] net: mana: Add MAC address to vPort logs and clarify error messages
From: patchwork-bot+netdevbpf @ 2026-03-05 11:30 UTC
  To: Erni Sri Satya Vennela
  Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, dipayanroy, shirazsaleem, kees,
	shradhagupta, gargaditya, linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260302174204.234837-1-ernis@linux.microsoft.com>

Hello:

This patch was applied to netdev/net-next.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Mon,  2 Mar 2026 09:41:52 -0800 you wrote:
> Add MAC address to vPort configuration success message and update error
> message to be more specific about HWC message errors in
> mana_send_request.
> 
> Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
> ---
> Changes in v5:
> * Remove __func__ and __LINE__ from error logs in hw_channel.c
> Changes in v4:
> * Remove logs that do not add value in hw_channel.c.
> Changes in v3:
> * Remove the changes from v2 and Update commit message.
> * Use "Enabled vPort ..." instead of "Configured vPort" in
>   mana_cfg_vport.
> * Update error logs in mana_hwc_send_request.
> Changes in v2:
> * Update commit message.
> * Use "Enabled vPort ..." instead of "Configured vPort" in
>   mana_cfg_vport.
> * Add info log in mana_uncfg_vport, mana_gd_verify_vf_version,
>   mana_gd_query_max_resources, mana_query_device_cfg and
>   mana_query_vport_cfg.
> 
> [...]

Here is the summary with links:
  - [net-next,v5] net: mana: Add MAC address to vPort logs and clarify error messages
    https://git.kernel.org/netdev/net-next/c/0172f8d80220

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* [PATCH AUTOSEL 6.19-5.10] scsi: storvsc: Fix scheduling while atomic on PREEMPT_RT
From: Sasha Levin @ 2026-03-05 15:36 UTC
  To: patches, stable
  Cc: Jan Kiszka, Florian Bezdeka, Michael Kelley, Martin K. Petersen,
	Sasha Levin, kys, haiyangz, wei.liu, decui, longli,
	James.Bottomley, linux-hyperv, linux-scsi, linux-kernel
In-Reply-To: <20260305153704.106918-1-sashal@kernel.org>

From: Jan Kiszka <jan.kiszka@siemens.com>

[ Upstream commit 57297736c08233987e5d29ce6584c6ca2a831b12 ]

This resolves the following splat and lock-up when running with PREEMPT_RT
enabled on Hyper-V:

[  415.140818] BUG: scheduling while atomic: stress-ng-iomix/1048/0x00000002
[  415.140822] INFO: lockdep is turned off.
[  415.140823] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec ghash_clmulni_intel aesni_intel rapl binfmt_misc nls_ascii nls_cp437 vfat fat snd_pcm hyperv_drm snd_timer drm_client_lib drm_shmem_helper snd sg soundcore drm_kms_helper pcspkr hv_balloon hv_utils evdev joydev drm configfs efi_pstore nfnetlink vsock_loopback vmw_vsock_virtio_transport_common hv_sock vmw_vsock_vmci_transport vsock vmw_vmci efivarfs autofs4 ext4 crc16 mbcache jbd2 sr_mod sd_mod cdrom hv_storvsc serio_raw hid_generic scsi_transport_fc hid_hyperv scsi_mod hid hv_netvsc hyperv_keyboard scsi_common
[  415.140846] Preemption disabled at:
[  415.140847] [<ffffffffc0656171>] storvsc_queuecommand+0x2e1/0xbe0 [hv_storvsc]
[  415.140854] CPU: 8 UID: 0 PID: 1048 Comm: stress-ng-iomix Not tainted 6.19.0-rc7 #30 PREEMPT_{RT,(full)}
[  415.140856] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/04/2024
[  415.140857] Call Trace:
[  415.140861]  <TASK>
[  415.140861]  ? storvsc_queuecommand+0x2e1/0xbe0 [hv_storvsc]
[  415.140863]  dump_stack_lvl+0x91/0xb0
[  415.140870]  __schedule_bug+0x9c/0xc0
[  415.140875]  __schedule+0xdf6/0x1300
[  415.140877]  ? rtlock_slowlock_locked+0x56c/0x1980
[  415.140879]  ? rcu_is_watching+0x12/0x60
[  415.140883]  schedule_rtlock+0x21/0x40
[  415.140885]  rtlock_slowlock_locked+0x502/0x1980
[  415.140891]  rt_spin_lock+0x89/0x1e0
[  415.140893]  hv_ringbuffer_write+0x87/0x2a0
[  415.140899]  vmbus_sendpacket_mpb_desc+0xb6/0xe0
[  415.140900]  ? rcu_is_watching+0x12/0x60
[  415.140902]  storvsc_queuecommand+0x669/0xbe0 [hv_storvsc]
[  415.140904]  ? HARDIRQ_verbose+0x10/0x10
[  415.140908]  ? __rq_qos_issue+0x28/0x40
[  415.140911]  scsi_queue_rq+0x760/0xd80 [scsi_mod]
[  415.140926]  __blk_mq_issue_directly+0x4a/0xc0
[  415.140928]  blk_mq_issue_direct+0x87/0x2b0
[  415.140931]  blk_mq_dispatch_queue_requests+0x120/0x440
[  415.140933]  blk_mq_flush_plug_list+0x7a/0x1a0
[  415.140935]  __blk_flush_plug+0xf4/0x150
[  415.140940]  __submit_bio+0x2b2/0x5c0
[  415.140944]  ? submit_bio_noacct_nocheck+0x272/0x360
[  415.140946]  submit_bio_noacct_nocheck+0x272/0x360
[  415.140951]  ext4_read_bh_lock+0x3e/0x60 [ext4]
[  415.140995]  ext4_block_write_begin+0x396/0x650 [ext4]
[  415.141018]  ? __pfx_ext4_da_get_block_prep+0x10/0x10 [ext4]
[  415.141038]  ext4_da_write_begin+0x1c4/0x350 [ext4]
[  415.141060]  generic_perform_write+0x14e/0x2c0
[  415.141065]  ext4_buffered_write_iter+0x6b/0x120 [ext4]
[  415.141083]  vfs_write+0x2ca/0x570
[  415.141087]  ksys_write+0x76/0xf0
[  415.141089]  do_syscall_64+0x99/0x1490
[  415.141093]  ? rcu_is_watching+0x12/0x60
[  415.141095]  ? finish_task_switch.isra.0+0xdf/0x3d0
[  415.141097]  ? rcu_is_watching+0x12/0x60
[  415.141098]  ? lock_release+0x1f0/0x2a0
[  415.141100]  ? rcu_is_watching+0x12/0x60
[  415.141101]  ? finish_task_switch.isra.0+0xe4/0x3d0
[  415.141103]  ? rcu_is_watching+0x12/0x60
[  415.141104]  ? __schedule+0xb34/0x1300
[  415.141106]  ? hrtimer_try_to_cancel+0x1d/0x170
[  415.141109]  ? do_nanosleep+0x8b/0x160
[  415.141111]  ? hrtimer_nanosleep+0x89/0x100
[  415.141114]  ? __pfx_hrtimer_wakeup+0x10/0x10
[  415.141116]  ? xfd_validate_state+0x26/0x90
[  415.141118]  ? rcu_is_watching+0x12/0x60
[  415.141120]  ? do_syscall_64+0x1e0/0x1490
[  415.141121]  ? do_syscall_64+0x1e0/0x1490
[  415.141123]  ? rcu_is_watching+0x12/0x60
[  415.141124]  ? do_syscall_64+0x1e0/0x1490
[  415.141125]  ? do_syscall_64+0x1e0/0x1490
[  415.141127]  ? irqentry_exit+0x140/0x7e0
[  415.141129]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

get_cpu() disables preemption, while the spinlock that hv_ringbuffer_write()
takes is converted to a sleeping rt-mutex under PREEMPT_RT.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Tested-by: Florian Bezdeka <florian.bezdeka@siemens.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Link: https://patch.msgid.link/0c7fb5cd-fb21-4760-8593-e04bade84744@siemens.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:


## Analysis

### 1. Problem Description
The commit fixes a **"BUG: scheduling while atomic"** crash and
**lockup** on Hyper-V VMs running with `PREEMPT_RT` enabled. The stack
trace in the commit message clearly shows the issue:

- `storvsc_queuecommand()` calls `get_cpu()` which disables preemption
- It then calls `storvsc_do_io()` → `vmbus_sendpacket_mpb_desc()` →
  `hv_ringbuffer_write()`
- `hv_ringbuffer_write()` takes a spinlock that, under PREEMPT_RT, is
  converted to an rt-mutex
- rt-mutexes can sleep/schedule, but preemption is disabled →
  **scheduling while atomic BUG**

### 2. The Fix
The fix replaces:
```c
ret = storvsc_do_io(dev, cmd_request, get_cpu());
put_cpu();
```
with:
```c
migrate_disable();
ret = storvsc_do_io(dev, cmd_request, smp_processor_id());
migrate_enable();
```

The purpose of `get_cpu()` here was to get a stable CPU number to use as
a channel index in `storvsc_do_io()`. The actual requirement is just to
prevent migration (so the CPU number stays valid), not to disable
preemption entirely. `migrate_disable()` achieves this while allowing
scheduling under PREEMPT_RT.

### 3. Correctness
- `migrate_disable()` prevents the task from being migrated to another
  CPU, so `smp_processor_id()` remains valid throughout the call
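- On non-PREEMPT_RT kernels, the CPU number is equally stable for the
  duration of the call. On older kernels such as 5.10, migrate_disable()
  is implemented as preempt_disable(); newer SMP kernels block migration
  while leaving the task preemptible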
- On PREEMPT_RT, it allows the rt-mutex in `hv_ringbuffer_write()` to
  sleep as needed

### 4. Scope and Risk
- **3 insertions, 2 deletions** - extremely small and surgical
- Only affects `storvsc_queuecommand()` in the Hyper-V storage driver
- Well-understood transformation pattern
  (`get_cpu()`→`migrate_disable()`+`smp_processor_id()`) used
  extensively across the kernel for PREEMPT_RT fixes
- Has been **Tested-by** two people and **Reviewed-by** the Hyper-V
  subsystem expert (Michael Kelley)

### 5. User Impact
- **Hyper-V VMs with PREEMPT_RT**: This is a hard crash/lockup during
  normal I/O operations (ext4 writes), making the system completely
  unusable
- PREEMPT_RT is increasingly used in enterprise and embedded
  deployments, including on Hyper-V/Azure
- Without this fix, PREEMPT_RT kernels on Hyper-V are effectively broken
  for any storage I/O

### 6. Stable Criteria Assessment
- **Obviously correct**: Yes - standard PREEMPT_RT fix pattern
- **Fixes a real bug**: Yes - kernel BUG + lockup
- **Important issue**: Yes - system lockup during normal I/O
- **Small and contained**: Yes - 3 insertions and 2 deletions in one file
- **No new features**: Correct - pure bug fix
- **Tested**: Yes - two Tested-by tags

### Verification
- Read `storvsc_drv.c:1858` confirming the old code uses
  `get_cpu()`/`put_cpu()` (matches the diff)
- Read `storvsc_do_io()` at line 1453: confirms `q_num` is used as a CPU
  index into `stor_chns[]` array and passed to
  `cpumask_of_node(cpu_to_node(q_num))` - only needs migration
  stability, not preemption disabled
- Confirmed the call chain: `storvsc_do_io()` →
  `vmbus_sendpacket_mpb_desc()` (line 1547) → `hv_ringbuffer_write()`
  which takes a spinlock (as shown in the stack trace)
- The stack trace shows `rt_spin_lock` → `rtlock_slowlock_locked` →
  `schedule_rtlock` confirming the spinlock-to-rt-mutex conversion is
  the trigger
- `git log` confirmed this is in a well-maintained driver with recent
  activity

**YES**

 drivers/scsi/storvsc_drv.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index b43d876747b76..68c837146b9ea 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1855,8 +1855,9 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 	cmd_request->payload_sz = payload_sz;
 
 	/* Invokes the vsc to start an IO */
-	ret = storvsc_do_io(dev, cmd_request, get_cpu());
-	put_cpu();
+	migrate_disable();
+	ret = storvsc_do_io(dev, cmd_request, smp_processor_id());
+	migrate_enable();
 
 	if (ret)
 		scsi_dma_unmap(scmnd);
-- 
2.51.0



* RE: [PATCH 1/4] mshv: Support larger memory deposits
From: Michael Kelley @ 2026-03-05 19:43 UTC
  To: Stanislav Kinsburskii, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <177258381446.229866.108795434668770412.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Tuesday, March 3, 2026 4:24 PM
> 
> Convert hv_call_deposit_pages() into a wrapper supporting arbitrary number
> of pages, and use it in the memory deposit code paths.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/hv_proc.c |   50 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 49 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
> index 5f4fd9c3231c..0f84a70def30 100644
> --- a/drivers/hv/hv_proc.c
> +++ b/drivers/hv/hv_proc.c
> @@ -16,7 +16,7 @@
>  #define HV_DEPOSIT_MAX (HV_HYP_PAGE_SIZE / sizeof(u64) - 1)
> 
>  /* Deposits exact number of pages. Must be called with interrupts enabled.  */
> -int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
> +static int __hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
>  {
>  	struct page **pages, *page;
>  	int *counts;
> @@ -108,6 +108,54 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
>  	kfree(counts);
>  	return ret;
>  }
> +
> +/**
> + * hv_call_deposit_pages - Deposit memory pages to a partition
> + * @node        : NUMA node from which to allocate pages
> + * @partition_id: Target partition ID to deposit pages to
> + * @num_pages   : Number of pages to deposit
> + *
> + * Deposits memory pages to the specified partition. The deposit is
> + * performed in chunks of HV_DEPOSIT_MAX pages to handle large requests
> + * efficiently.
> + *
> + * Return: 0 on success, negative error code on failure

For the failure case, a key fact seems to be that there's no attempt to
withdraw any pages that might have been successfully deposited. In
such failure case, the caller has no information about how many pages
were, or were not, deposited. The 2x for L1VH further muddies the
picture.

__hv_call_deposit_pages() apparently assumes that if the underlying
hypercall fails, none of the pages were deposited.  So it frees all the
allocated pages. But I wonder if that's really true. The hypercall is
a rep hypercall, which can get partly through the list, return to the
guest, then restart where it left off.  If there's a failure after a
restart, I wonder if the hypercall goes back and withdraws any
pages that were successfully deposited before the restart. The
restart behaves like a new invocation of the hypercall.

> + */
> +int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)

Perhaps the num_pages parameter should be a u64. The u32 imposes
a limit of 8 Tbytes on the amount of memory that can be deposited
(allowing for the 2x multiplier for L1VH partitions). Azure has VM sizes
today with up to 30 Tbytes of memory, so it's certainly possible.
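To make the u32 limit concrete, the arithmetic can be checked with a small userspace C sketch. It assumes 4 KiB hypervisor pages, and the helper names are hypothetical illustrations, not functions from the patch.

```c
#include <stdint.h>

/* Assumption: 4 KiB hypervisor pages (HV_HYP_PAGE_SIZE on x86-64). */
#define HYP_PAGE_SIZE	4096ULL

/* Bytes describable by a u32 page count: 2^32 pages * 4 KiB = 16 TiB. */
static uint64_t u32_page_limit_bytes(void)
{
	return ((uint64_t)UINT32_MAX + 1) * HYP_PAGE_SIZE;
}

/* The L1VH "num_pages *= 2" step with a u32 counter: wraps modulo 2^32. */
static uint32_t l1vh_double_u32(uint32_t num_pages)
{
	return num_pages * 2u;
}

/* The same step with a u64 counter, as suggested in the review. */
static uint64_t l1vh_double_u64(uint64_t num_pages)
{
	return num_pages * 2;
}

/* Convert a size in TiB to a hypervisor page count. */
static uint64_t tib_to_pages(uint64_t tib)
{
	return (tib << 40) / HYP_PAGE_SIZE;
}
```

Under these assumptions a 10 TiB request is 2684354560 pages, which still fits in a u32, but doubling it for L1VH wraps to 1073741824 pages (4 TiB) instead of 5368709120.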

> +{
> +	u32 done;

Same here. Use u64.

> +	int ret = 0;
> +
> +	/*
> +	 * Do a double deposit for L1VH. This reserves enough memory for
> +	 * Hypervisor Hot Restart (HHR).
> +	 *
> +	 * During HHR, every data structure must be recreated in the new
> +	 * ("proto") hypervisor. Memory is required by the proto hypervisor
> +	 * to do this work.
> +	 *
> +	 * For regular L1 partitions, more memory can be requested from the
> +	 * root during HHR by sending an asynchronous message. But this is
> +	 * not supported for L1VHs. A guest must not be allowed to block
> +	 * HHR by refusing to deposit more memory.
> +	 *
> +	 * So for L1VH a deposit is always required for both current needs
> +	 * and future HHR work.
> +	 */
> +	if (hv_l1vh_partition())
> +		num_pages *= 2;
> +
> +	for (done = 0; done < num_pages; done += HV_DEPOSIT_MAX) {
> +		u32 to_deposit = min(num_pages - done, HV_DEPOSIT_MAX);
> +
> +		ret = __hv_call_deposit_pages(node, partition_id,
> +					      to_deposit);
> +		if (ret)
> +			break;
> +	}
> +
> +	return ret;
> +}
>  EXPORT_SYMBOL_GPL(hv_call_deposit_pages);
> 
>  int hv_deposit_memory_node(int node, u64 partition_id,
> 
> 
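For reference, the chunking loop in the new wrapper can be modeled in userspace C. This is a sketch under stated assumptions: the hypercall is replaced by a caller-supplied callback, the count is widened to u64 as suggested above, and the hypothetical `*deposited` output reports how many pages went in before a failure. Like `__hv_call_deposit_pages()`, it assumes a failed chunk deposits nothing, which is exactly the assumption questioned in the review.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors HV_DEPOSIT_MAX with 4 KiB hypervisor pages: 4096 / 8 - 1 = 511. */
#define DEPOSIT_MAX	(4096 / sizeof(uint64_t) - 1)

/* Hypothetical stand-in for __hv_call_deposit_pages(): 0 or -errno. */
typedef int (*deposit_chunk_fn)(uint64_t nr_pages, void *ctx);

/*
 * Hypothetical variant of hv_call_deposit_pages(): u64 count, L1VH
 * doubling, and *deposited reports the pages deposited before any failure.
 */
static int deposit_pages_chunked(uint64_t num_pages, bool l1vh,
				 deposit_chunk_fn deposit, void *ctx,
				 uint64_t *deposited)
{
	uint64_t done;
	int ret = 0;

	if (l1vh)
		num_pages *= 2;

	*deposited = 0;
	for (done = 0; done < num_pages; done += DEPOSIT_MAX) {
		uint64_t left = num_pages - done;
		uint64_t chunk = left < DEPOSIT_MAX ? left : DEPOSIT_MAX;

		ret = deposit(chunk, ctx);
		if (ret)
			break;
		*deposited += chunk;
	}
	return ret;
}

/* Test double: records chunk sizes and fails on call number fail_at. */
struct fake_hv {
	uint64_t chunks[16];
	int calls;
	int fail_at;	/* -1 = never fail */
};

static int fake_deposit(uint64_t nr_pages, void *ctx)
{
	struct fake_hv *hv = ctx;

	if (hv->calls == hv->fail_at)
		return -12;	/* -ENOMEM */
	if (hv->calls >= (int)(sizeof(hv->chunks) / sizeof(hv->chunks[0])))
		return -28;	/* -ENOSPC: more chunks than the fake can record */
	hv->chunks[hv->calls++] = nr_pages;
	return 0;
}
```

With these assumptions a 1200-page request is issued as chunks of 511, 511 and 178 pages; for an L1VH request of 600 pages whose third chunk fails, `*deposited` ends at 1022.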



* RE: [PATCH 2/4] mshv: Fix pre-depositing of pages for partition initialization
From: Michael Kelley @ 2026-03-05 19:43 UTC
  To: Stanislav Kinsburskii, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <177258381999.229866.4628731518107275272.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Tuesday, March 3, 2026 4:24 PM
> 
> Deposit enough pages upfront to avoid partition initialization failures due
> to low memory. This also speeds up partition initialization.
> 
> Move page depositing from the hypercall wrapper to the partition
> initialization code. The required number of pages is empirical. This logic
> fits better in the partition initialization code than in the hypercall
> wrapper.
> 
> A partition with nested capability requires 40x more pages (20 MB) to
> accommodate the nested MSHV hypervisor. This may be improved in the future.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_root.h         |    1 +
>  drivers/hv/mshv_root_hv_call.c |    6 ------
>  drivers/hv/mshv_root_main.c    |   23 +++++++++++++++++++++--
>  3 files changed, 22 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> index 947dfb76bb19..40cf7bdbd62f 100644
> --- a/drivers/hv/mshv_root.h
> +++ b/drivers/hv/mshv_root.h
> @@ -106,6 +106,7 @@ struct mshv_partition {
> 
>  	struct hlist_node pt_hnode;
>  	u64 pt_id;
> +	u64 pt_flags;
>  	refcount_t pt_ref_count;
>  	struct mutex pt_mutex;
> 
> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> index bdcb8de7fb47..b8d199f95299 100644
> --- a/drivers/hv/mshv_root_hv_call.c
> +++ b/drivers/hv/mshv_root_hv_call.c
> @@ -15,7 +15,6 @@
>  #include "mshv_root.h"
> 
>  /* Determined empirically */

I think the above comment applies to HV_INIT_PARTITION_DEPOSIT_PAGES
(not to HV_UMAP_GPA_PAGES) and should be removed.

> -#define HV_INIT_PARTITION_DEPOSIT_PAGES 208
>  #define HV_UMAP_GPA_PAGES		512
> 
>  #define HV_PAGE_COUNT_2M_ALIGNED(pg_count) (!((pg_count) & (0x200 - 1)))
> @@ -139,11 +138,6 @@ int hv_call_initialize_partition(u64 partition_id)
> 
>  	input.partition_id = partition_id;
> 
> -	ret = hv_call_deposit_pages(NUMA_NO_NODE, partition_id,
> -				    HV_INIT_PARTITION_DEPOSIT_PAGES);
> -	if (ret)
> -		return ret;
> -
>  	do {
>  		status = hv_do_fast_hypercall8(HVCALL_INITIALIZE_PARTITION,
>  					       *(u64 *)&input);
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index d753f41d3b57..fbfc50de332c 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -35,6 +35,10 @@
>  #include "mshv.h"
>  #include "mshv_root.h"
> 
> +/* The deposit values below are empirical and may need to be adjusted. */
> +#define MSHV_PARTITION_DEPOSIT_PAGES		(SZ_512K >> PAGE_SHIFT)
> +#define MSHV_PARTITION_DEPOSIT_PAGES_NESTED	(20 * SZ_1M >> PAGE_SHIFT)

Nit: The placement of these #defines *above* the MODULE_* notations seems
a bit odd to me. 

> +
>  MODULE_AUTHOR("Microsoft");
>  MODULE_LICENSE("GPL");
>  MODULE_DESCRIPTION("Microsoft Hyper-V root partition VMM interface /dev/mshv");
> @@ -1587,6 +1591,15 @@ mshv_partition_ioctl_set_msi_routing(struct mshv_partition *partition,
>  	return ret;
>  }
> 
> +static u64
> +mshv_partition_deposit_pages(struct mshv_partition *partition)

Nit: This function name makes it seem like it will "deposit pages".  Maybe
mshv_partition_get_deposit_cnt(), or something similar, would be better?

> +{
> +	if (partition->pt_flags &
> +	    HV_PARTITION_CREATION_FLAG_NESTED_VIRTUALIZATION_CAPABLE)
> +		return MSHV_PARTITION_DEPOSIT_PAGES_NESTED;
> +	return MSHV_PARTITION_DEPOSIT_PAGES;
> +}
> +
>  static long
>  mshv_partition_ioctl_initialize(struct mshv_partition *partition)
>  {
> @@ -1595,6 +1608,11 @@ mshv_partition_ioctl_initialize(struct mshv_partition *partition)
>  	if (partition->pt_initialized)
>  		return 0;
> 
> +	ret = hv_call_deposit_pages(NUMA_NO_NODE, partition->pt_id,
> +				    mshv_partition_deposit_pages(partition));
> +	if (ret)
> +		goto withdraw_mem;
> +
>  	ret = hv_call_initialize_partition(partition->pt_id);
>  	if (ret)
>  		goto withdraw_mem;
> @@ -1610,8 +1628,8 @@ mshv_partition_ioctl_initialize(struct mshv_partition *partition)
>  finalize_partition:
>  	hv_call_finalize_partition(partition->pt_id);
>  withdraw_mem:
> -	hv_call_withdraw_memory(U64_MAX, NUMA_NO_NODE, partition->pt_id);
> -
> +	hv_call_withdraw_memory(MSHV_PARTITION_DEPOSIT_PAGES,
> +				NUMA_NO_NODE, partition->pt_id);

What's the strategy here for withdrawing memory after a failure? As I noted in
Patch 1 of the series, there's no way to know how many pages were deposited.
Might have been zero, or significantly more than MSHV_PARTITION_DEPOSIT_PAGES.
And in Patches 3 and 4 of the series, there's no attempt to withdraw pages if
hv_call_deposit_pages() fails, which seems inconsistent.

>  	return ret;
>  }
> 
> @@ -2032,6 +2050,7 @@ mshv_ioctl_create_partition(void __user *user_arg, struct device *module_dev)
>  		return -ENOMEM;
> 
>  	partition->pt_module_dev = module_dev;
> +	partition->pt_flags = creation_flags;
>  	partition->isolation_type = isolation_properties.isolation_type;
> 
>  	refcount_set(&partition->pt_ref_count, 1);
> 
> 



* RE: [PATCH 3/4] mshv: Fix pre-depositing of pages for virtual processor initialization
From: Michael Kelley @ 2026-03-05 19:44 UTC
  To: Stanislav Kinsburskii, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <177258382549.229866.5072213647599344057.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Tuesday, March 3, 2026 4:24 PM
> 
> Deposit enough pages up front to avoid virtual processor creation failures
> due to low memory. This also speeds up guest creation. A VP uses 25% more
> pages in a partition with nested virtualization enabled, but the exact
> number doesn't vary much, so deposit a fixed number of pages per VP that
> works for nested virtualization.
> 
> Move page depositing from the hypercall wrapper to the virtual processor
> creation code. The required number of pages is based on empirical data.
> This logic fits better in the virtual processor creation code than in the
> hypercall wrapper.
> 
> Also withdraw the deposited memory if virtual processor creation fails.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/hv_proc.c        |    8 --------
>  drivers/hv/mshv_root_main.c |   11 ++++++++++-
>  2 files changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
> index 0f84a70def30..3d41f52efd9a 100644
> --- a/drivers/hv/hv_proc.c
> +++ b/drivers/hv/hv_proc.c
> @@ -251,14 +251,6 @@ int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags)
>  	unsigned long irq_flags;
>  	int ret = 0;
> 
> -	/* Root VPs don't seem to need pages deposited */
> -	if (partition_id != hv_current_partition_id) {
> -		/* The value 90 is empirically determined. It may change. */
> -		ret = hv_call_deposit_pages(node, partition_id, 90);
> -		if (ret)
> -			return ret;
> -	}
> -
>  	do {
>  		local_irq_save(irq_flags);
> 
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index fbfc50de332c..48c842b6938d 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -38,6 +38,7 @@
>  /* The deposit values below are empirical and may need to be adjusted. */
>  #define MSHV_PARTITION_DEPOSIT_PAGES		(SZ_512K >> PAGE_SHIFT)
>  #define MSHV_PARTITION_DEPOSIT_PAGES_NESTED	(20 * SZ_1M >> PAGE_SHIFT)
> +#define MSHV_VP_DEPOSIT_PAGES			(1 * SZ_1M >> PAGE_SHIFT)
> 
>  MODULE_AUTHOR("Microsoft");
>  MODULE_LICENSE("GPL");
> @@ -1077,10 +1078,15 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
>  	if (partition->pt_vp_array[args.vp_index])
>  		return -EEXIST;
> 
> +	ret = hv_call_deposit_pages(NUMA_NO_NODE, partition->pt_id,
> +				    MSHV_VP_DEPOSIT_PAGES);
> +	if (ret)
> +		return ret;
> +
>  	ret = hv_call_create_vp(NUMA_NO_NODE, partition->pt_id, args.vp_index,
>  				0 /* Only valid for root partition VPs */);
>  	if (ret)
> -		return ret;
> +		goto withdraw_mem;
> 
>  	ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
>  				   HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
> @@ -1177,6 +1183,9 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
>  			       intercept_msg_page, input_vtl_zero);
>  destroy_vp:
>  	hv_call_delete_vp(partition->pt_id, args.vp_index);
> +withdraw_mem:
> +	hv_call_withdraw_memory(MSHV_VP_DEPOSIT_PAGES, NUMA_NO_NODE,
> +				partition->pt_id);

If the partition is an L1VH partition, hv_call_deposit_pages() will have deposited
2 * MSHV_VP_DEPOSIT_PAGES, but here in the failure case you are withdrawing
only MSHV_VP_DEPOSIT_PAGES.
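The L1VH mismatch can be expressed as a small userspace C sketch: a hypothetical helper returns the number of pages hv_call_deposit_pages() actually deposits, so an error path could withdraw that same count. It assumes 4 KiB pages (so MSHV_VP_DEPOSIT_PAGES evaluates to 256) and that the full deposit succeeded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, so 1 MiB >> 12 = 256 pages. */
#define MSHV_VP_DEPOSIT_PAGES	((1 * 0x00100000ULL) >> 12)

/* Hypothetical helper: pages actually deposited, including the L1VH 2x. */
static uint64_t effective_deposit_pages(uint64_t requested, bool l1vh)
{
	return l1vh ? requested * 2 : requested;
}

/* Deposited pages not covered when the error path withdraws 'requested'. */
static uint64_t pages_not_withdrawn(uint64_t requested, bool l1vh)
{
	return effective_deposit_pages(requested, l1vh) - requested;
}
```

For an L1VH partition, the VP error path as written would withdraw only 256 of the 512 pages deposited.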

>  out:
>  	trace_mshv_create_vp(partition->pt_id, args.vp_index, ret);
>  	return ret;
> 
> 



* RE: [PATCH 4/4] mshv: Pre-deposit pages for SLAT creation
From: Michael Kelley @ 2026-03-05 19:44 UTC
  To: Stanislav Kinsburskii, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <177258383107.229866.16867493994305727391.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Tuesday, March 3, 2026 4:24 PM
> 
> Deposit enough pages up front to avoid guest address space region creation
> failures due to low memory. This also speeds up guest creation.
> 
> Calculate the required number of pages based on the guest's physical
> address space size, rounded up to 1 GB chunks. Even the smallest guests are
> assumed to need at least 1 GB worth of deposits. This is because every
> guest requires tens of megabytes of deposited pages for hypervisor
> overhead, making smaller deposits impractical.
> 
> Estimating in 1 GB chunks prevents over-depositing for larger guests while
> accepting some over-deposit for smaller ones. This trade-off keeps the
> estimate close to actual needs for larger guests.
> 
> Also withdraw the deposited pages if address space region creation fails.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_root_main.c |   25 +++++++++++++++++++++++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index 48c842b6938d..cb5b4505f8eb 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -39,6 +39,7 @@
>  #define MSHV_PARTITION_DEPOSIT_PAGES		(SZ_512K >> PAGE_SHIFT)
>  #define MSHV_PARTITION_DEPOSIT_PAGES_NESTED	(20 * SZ_1M >> PAGE_SHIFT)
>  #define MSHV_VP_DEPOSIT_PAGES			(1 * SZ_1M >> PAGE_SHIFT)
> +#define MSHV_1G_DEPOSIT_PAGES			(6 * SZ_1M >> PAGE_SHIFT)
> 
>  MODULE_AUTHOR("Microsoft");
>  MODULE_LICENSE("GPL");
> @@ -1324,6 +1325,18 @@ static int mshv_prepare_pinned_region(struct mshv_mem_region *region)
>  	return ret;
>  }
> 
> +static u64
> +mshv_region_deposit_slat_pages(struct mshv_mem_region *region)

Same nit about the function name. This one seems like it will "deposit slat pages".

> +{
> +	u64 region_in_gbs, slat_pages;
> +
> +	/* SLAT needs 6 MB per 1 GB of address space. */
> +	region_in_gbs = DIV_ROUND_UP(region->nr_pages << HV_HYP_PAGE_SHIFT, SZ_1G);

This local variable "region_in_gbs" is computed in units of bytes.

> +	slat_pages = region_in_gbs * MSHV_1G_DEPOSIT_PAGES;

But here region_in_gbs is used as if it were in units of Gbytes.  So the
slat_pages return value is much larger than intended.
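Tracing the helper's arithmetic for a few concrete region sizes is a quick way to check the units discussed above. The sketch copies the patch's expressions into userspace C with locally defined constants, assuming 4 KiB pages; `slat_pages_for()` is a hypothetical name taking the region's page count directly.

```c
#include <stdint.h>

#define HV_HYP_PAGE_SHIFT	12	/* 4 KiB hypervisor pages */
#define PAGE_SHIFT		12	/* assumption: 4 KiB kernel pages */
#define SZ_1M			0x00100000ULL
#define SZ_1G			0x40000000ULL
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

#define MSHV_1G_DEPOSIT_PAGES	(6 * SZ_1M >> PAGE_SHIFT)	/* 1536 */

/* Same expressions as mshv_region_deposit_slat_pages(). */
static uint64_t slat_pages_for(uint64_t nr_pages)
{
	uint64_t region_in_gbs, slat_pages;

	/* Region size in bytes, divided by 1 GiB and rounded up. */
	region_in_gbs = DIV_ROUND_UP(nr_pages << HV_HYP_PAGE_SHIFT, SZ_1G);
	slat_pages = region_in_gbs * MSHV_1G_DEPOSIT_PAGES;
	return slat_pages;
}
```

With these assumptions a region of exactly 1 GiB (262144 pages) yields 1536 pages, one page more rounds up to 3072, and a 4 GiB region yields 6144.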

> +
> +	return slat_pages;
> +}
> +
>  /*
>   * This maps two things: guest RAM and for pci passthru mmio space.
>   *
> @@ -1364,6 +1377,11 @@ mshv_map_user_memory(struct mshv_partition *partition,
>  	if (ret)
>  		return ret;
> 
> +	ret = hv_call_deposit_pages(NUMA_NO_NODE, partition->pt_id,
> +				    mshv_region_deposit_slat_pages(region));
> +	if (ret)
> +		goto free_region;
> +
>  	switch (region->mreg_type) {
>  	case MSHV_REGION_TYPE_MEM_PINNED:
>  		ret = mshv_prepare_pinned_region(region);
> @@ -1392,7 +1410,7 @@ mshv_map_user_memory(struct mshv_partition *partition,
>  				   region->hv_map_flags, ret);
> 
>  	if (ret)
> -		goto errout;
> +		goto withdraw_memory;
> 
>  	spin_lock(&partition->pt_mem_regions_lock);
>  	hlist_add_head(&region->hnode, &partition->pt_mem_regions);
> @@ -1400,7 +1418,10 @@ mshv_map_user_memory(struct mshv_partition *partition,
> 
>  	return 0;
> 
> -errout:
> +withdraw_memory:
> +	hv_call_withdraw_memory(mshv_region_deposit_slat_pages(region),
> +				NUMA_NO_NODE, partition->pt_id);

Again, for an L1VH partition, the actual number of pages deposited would
be 2x what mshv_region_deposit_slat_pages() returns.

> +free_region:
>  	vfree(region);
>  	return ret;
>  }
> 
> 



* [PATCH net-next] net: mana: Expose hardware diagnostic info via debugfs
From: Erni Sri Satya Vennela @ 2026-03-05 20:52 UTC
  To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, kotaranov, horms, shradhagupta,
	dipayanroy, yury.norov, kees, ernis, ssengar, shirazsaleem,
	linux-hyperv, netdev, linux-kernel, linux-rdma

Add debugfs entries to expose hardware configuration and diagnostic
information that aids in debugging driver initialization and runtime
operations without adding noise to dmesg.

Device-level entries (under /sys/kernel/debug/mana/<slot>/):
  - num_msix_usable, max_num_queues: Max resources from hardware
  - gdma_protocol_ver, pf_cap_flags1: VF version negotiation results
  - num_vports, bm_hostmode: Device configuration

vPort-level entries (under /sys/kernel/debug/mana/<slot>/vport<N>/):
  - port_handle: Hardware vPort handle
  - max_sq, max_rq: Max queues from vPort config
  - indir_table_sz: Indirection table size
  - steer_rx, steer_rss, steer_update_tab, steer_cqe_coalescing:
    Last applied steering configuration parameters

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
---
 .../net/ethernet/microsoft/mana/gdma_main.c   | 12 +++++++
 drivers/net/ethernet/microsoft/mana/mana_en.c | 31 +++++++++++++++++++
 include/net/mana/gdma.h                       |  1 +
 include/net/mana/mana.h                       |  8 +++++
 4 files changed, 52 insertions(+)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 0055c231acf6..2ba8d224fd26 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -152,6 +152,11 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
 	if (gc->max_num_queues > gc->num_msix_usable - 1)
 		gc->max_num_queues = gc->num_msix_usable - 1;
 
+	debugfs_create_u32("num_msix_usable", 0400, gc->mana_pci_debugfs,
+			   &gc->num_msix_usable);
+	debugfs_create_u32("max_num_queues", 0400, gc->mana_pci_debugfs,
+			   &gc->max_num_queues);
+
 	return 0;
 }
 
@@ -1221,6 +1226,13 @@ int mana_gd_verify_vf_version(struct pci_dev *pdev)
 		return err ? err : -EPROTO;
 	}
 	gc->pf_cap_flags1 = resp.pf_cap_flags1;
+	gc->gdma_protocol_ver = resp.gdma_protocol_ver;
+
+	debugfs_create_x64("gdma_protocol_ver", 0400, gc->mana_pci_debugfs,
+			   &gc->gdma_protocol_ver);
+	debugfs_create_x64("pf_cap_flags1", 0400, gc->mana_pci_debugfs,
+			   &gc->pf_cap_flags1);
+
 	if (resp.pf_cap_flags1 & GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG) {
 		err = mana_gd_query_hwc_timeout(pdev, &hwc->hwc_timeout);
 		if (err) {
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 53f24244de75..25ce81283e92 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1265,6 +1265,9 @@ static int mana_query_vport_cfg(struct mana_port_context *apc, u32 vport_index,
 	apc->port_handle = resp.vport;
 	ether_addr_copy(apc->mac_addr, resp.mac_addr);
 
+	apc->vport_max_sq = *max_sq;
+	apc->vport_max_rq = *max_rq;
+
 	return 0;
 }
 
@@ -1411,6 +1414,11 @@ static int mana_cfg_vport_steering(struct mana_port_context *apc,
 
 	netdev_info(ndev, "Configured steering vPort %llu entries %u\n",
 		    apc->port_handle, apc->indir_table_sz);
+
+	apc->steer_rx = rx;
+	apc->steer_rss = apc->rss_state;
+	apc->steer_update_tab = update_tab;
+	apc->steer_cqe_coalescing = req->cqe_coalescing_enable;
 out:
 	kfree(req);
 	return err;
@@ -3102,6 +3110,24 @@ static int mana_init_port(struct net_device *ndev)
 	eth_hw_addr_set(ndev, apc->mac_addr);
 	sprintf(vport, "vport%d", port_idx);
 	apc->mana_port_debugfs = debugfs_create_dir(vport, gc->mana_pci_debugfs);
+
+	debugfs_create_u64("port_handle", 0400, apc->mana_port_debugfs,
+			   &apc->port_handle);
+	debugfs_create_u32("max_sq", 0400, apc->mana_port_debugfs,
+			   &apc->vport_max_sq);
+	debugfs_create_u32("max_rq", 0400, apc->mana_port_debugfs,
+			   &apc->vport_max_rq);
+	debugfs_create_u32("indir_table_sz", 0400, apc->mana_port_debugfs,
+			   &apc->indir_table_sz);
+	debugfs_create_u32("steer_rx", 0400, apc->mana_port_debugfs,
+			   &apc->steer_rx);
+	debugfs_create_u32("steer_rss", 0400, apc->mana_port_debugfs,
+			   &apc->steer_rss);
+	debugfs_create_u32("steer_update_tab", 0400, apc->mana_port_debugfs,
+			   &apc->steer_update_tab);
+	debugfs_create_u32("steer_cqe_coalescing", 0400, apc->mana_port_debugfs,
+			   &apc->steer_cqe_coalescing);
+
 	return 0;
 
 reset_apc:
@@ -3587,6 +3613,11 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
 		ac->num_ports = num_ports;
 
 		INIT_WORK(&ac->link_change_work, mana_link_state_handle);
+
+		debugfs_create_u16("num_vports", 0400, gc->mana_pci_debugfs,
+				   &ac->num_ports);
+		debugfs_create_u8("bm_hostmode", 0400, gc->mana_pci_debugfs,
+				  &ac->bm_hostmode);
 	} else {
 		if (ac->num_ports != num_ports) {
 			dev_err(dev, "The number of vPorts changed: %d->%d\n",
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 766f4fb25e26..9bbb7fb0c964 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -434,6 +434,7 @@ struct gdma_context {
 	struct gdma_dev		mana_ib;
 
 	u64 pf_cap_flags1;
+	u64 gdma_protocol_ver;
 
 	struct workqueue_struct *service_wq;
 
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index a078af283bdd..83f6de67c0cc 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -563,6 +563,14 @@ struct mana_port_context {
 
 	/* Debugfs */
 	struct dentry *mana_port_debugfs;
+
+	/* Cached vport/steering config for debugfs */
+	u32 vport_max_sq;
+	u32 vport_max_rq;
+	u32 steer_rx;
+	u32 steer_rss;
+	u32 steer_update_tab;
+	u32 steer_cqe_coalescing;
 };
 
 netdev_tx_t mana_start_xmit(struct sk_buff *skb, struct net_device *ndev);
-- 
2.34.1

^ permalink raw reply related

* Re: [PATCH 1/4] mshv: Support larger memory deposits
From: Mukesh R @ 2026-03-06  3:19 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui, longli
  Cc: linux-hyperv, linux-kernel
In-Reply-To: <177258381446.229866.108795434668770412.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

On 3/3/26 16:23, Stanislav Kinsburskii wrote:
> Convert hv_call_deposit_pages() into a wrapper supporting arbitrary number
> of pages, and use it in the memory deposit code paths.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>   drivers/hv/hv_proc.c |   50 +++++++++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 49 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
> index 5f4fd9c3231c..0f84a70def30 100644
> --- a/drivers/hv/hv_proc.c
> +++ b/drivers/hv/hv_proc.c
> @@ -16,7 +16,7 @@
>   #define HV_DEPOSIT_MAX (HV_HYP_PAGE_SIZE / sizeof(u64) - 1)
>   
>   /* Deposits exact number of pages. Must be called with interrupts enabled.  */
> -int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
> +static int __hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
>   {
>   	struct page **pages, *page;
>   	int *counts;
> @@ -108,6 +108,54 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
>   	kfree(counts);
>   	return ret;
>   }
> +
> +/**
> + * hv_call_deposit_pages - Deposit memory pages to a partition
> + * @node        : NUMA node from which to allocate pages
> + * @partition_id: Target partition ID to deposit pages to
> + * @num_pages   : Number of pages to deposit
> + *
> + * Deposits memory pages to the specified partition. The deposit is
> + * performed in chunks of HV_DEPOSIT_MAX pages to handle large requests
> + * efficiently.
> + *
> + * Return: 0 on success, negative error code on failure
> + */
> +int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
> +{
> +	u32 done;
> +	int ret = 0;
> +
> +	/*
> +	 * Do a double deposit for L1VH. This reserves enough memory for
> +	 * Hypervisor Hot Restart (HHR).
> +	 *
> +	 * During HHR, every data structure must be recreated in the new
> +	 * ("proto") hypervisor. Memory is required by the proto hypervisor
> +	 * to do this work.
> +	 *
> +	 * For regular L1 partitions, more memory can be requested from the
> +	 * root during HHR by sending an asynchronous message. But this is
> +	 * not supported for L1VHs. A guest must not be allowed to block
> +	 * HHR by refusing to deposit more memory.
> +	 *
> +	 * So for L1VH a deposit is always required for both current needs
> +	 * and future HHR work.
> +	 */
> +	if (hv_l1vh_partition())
> +		num_pages *= 2;

I'm not sure it is a good idea to do this unconditionally for all
L1VH cases. I'd like to experiment to see whether this is actually
true for all the passthru and interrupt-related hypercalls that fail
with insufficient memory.

> +
> +	for (done = 0; done < num_pages; done += HV_DEPOSIT_MAX) {
> +		u32 to_deposit = min(num_pages - done, HV_DEPOSIT_MAX);
> +
> +		ret = __hv_call_deposit_pages(node, partition_id,
> +					      to_deposit);
> +		if (ret)
> +			break;
> +	}
> +
> +	return ret;
> +}
>   EXPORT_SYMBOL_GPL(hv_call_deposit_pages);
>   
>   int hv_deposit_memory_node(int node, u64 partition_id,
> 
> 
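A small userspace model makes the patched wrapper's chunking easy to check. This sketch assumes 4 KiB hypervisor pages, which makes HV_DEPOSIT_MAX equal to 4096 / 8 - 1 = 511 pages. count_deposit_chunks() is a hypothetical stand-in: it mirrors the loop above but issues no hypercalls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HYP_PAGE_SIZE	4096u	/* assumption: 4 KiB hypervisor pages */
/* Mirrors HV_DEPOSIT_MAX in hv_proc.c. */
#define DEPOSIT_MAX	((uint32_t)(HYP_PAGE_SIZE / sizeof(uint64_t) - 1))

static_assert(DEPOSIT_MAX == 511, "one u64 slot of the input page is the header");

/*
 * Model of the patched hv_call_deposit_pages(): double the request on
 * L1VH, then split it into chunks of at most DEPOSIT_MAX pages.
 * Returns the number of chunks (hypercalls); *total gets the page sum.
 */
static uint32_t count_deposit_chunks(uint32_t num_pages, bool l1vh,
				     uint32_t *total)
{
	uint32_t done, chunks = 0;

	*total = 0;
	if (l1vh)
		num_pages *= 2;

	for (done = 0; done < num_pages; done += DEPOSIT_MAX) {
		uint32_t left = num_pages - done;
		uint32_t to_deposit = left < DEPOSIT_MAX ? left : DEPOSIT_MAX;

		*total += to_deposit;
		chunks++;
	}
	return chunks;
}
```

Under this model, a 256-page request from an L1VH partition grows to 512 pages and needs two hypercalls (511 + 1). That doubling is the behavior questioned in the reply above.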


^ permalink raw reply

* Re: [PATCH 2/4] mshv: Fix pre-depositing of pages for partition initialization
From: Mukesh R @ 2026-03-06  3:26 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui, longli
  Cc: linux-hyperv, linux-kernel
In-Reply-To: <177258381999.229866.4628731518107275272.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

On 3/3/26 16:23, Stanislav Kinsburskii wrote:
> Deposit enough pages upfront to avoid partition initialization failures due
> to low memory. This also speeds up partition initialization.

I am curious what kind of failures are observed. Normally, a hypercall
fails with insufficient memory, and we keep depositing until it
succeeds, right? Is the issue that some calls are not looping
in the deposit path?

> Move page depositing from the hypercall wrapper to the partition
> initialization code. The required number of pages is empirical. This logic
> fits better in the partition initialization code than in the hypercall
> wrapper.
> 
> A partition with nested capability requires 40x more pages (20 MB) to
> accommodate the nested MSHV hypervisor. This may be improved in the future.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>   drivers/hv/mshv_root.h         |    1 +
>   drivers/hv/mshv_root_hv_call.c |    6 ------
>   drivers/hv/mshv_root_main.c    |   23 +++++++++++++++++++++--
>   3 files changed, 22 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> index 947dfb76bb19..40cf7bdbd62f 100644
> --- a/drivers/hv/mshv_root.h
> +++ b/drivers/hv/mshv_root.h
> @@ -106,6 +106,7 @@ struct mshv_partition {
>   
>   	struct hlist_node pt_hnode;
>   	u64 pt_id;
> +	u64 pt_flags;
>   	refcount_t pt_ref_count;
>   	struct mutex pt_mutex;
>   
> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> index bdcb8de7fb47..b8d199f95299 100644
> --- a/drivers/hv/mshv_root_hv_call.c
> +++ b/drivers/hv/mshv_root_hv_call.c
> @@ -15,7 +15,6 @@
>   #include "mshv_root.h"
>   
>   /* Determined empirically */
> -#define HV_INIT_PARTITION_DEPOSIT_PAGES 208
>   #define HV_UMAP_GPA_PAGES		512
>   
>   #define HV_PAGE_COUNT_2M_ALIGNED(pg_count) (!((pg_count) & (0x200 - 1)))
> @@ -139,11 +138,6 @@ int hv_call_initialize_partition(u64 partition_id)
>   
>   	input.partition_id = partition_id;
>   
> -	ret = hv_call_deposit_pages(NUMA_NO_NODE, partition_id,
> -				    HV_INIT_PARTITION_DEPOSIT_PAGES);
> -	if (ret)
> -		return ret;
> -
>   	do {
>   		status = hv_do_fast_hypercall8(HVCALL_INITIALIZE_PARTITION,
>   					       *(u64 *)&input);
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index d753f41d3b57..fbfc50de332c 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -35,6 +35,10 @@
>   #include "mshv.h"
>   #include "mshv_root.h"
>   
> +/* The deposit values below are empirical and may need to be adjusted. */
> +#define MSHV_PARTITION_DEPOSIT_PAGES		(SZ_512K >> PAGE_SHIFT)
> +#define MSHV_PARTITION_DEPOSIT_PAGES_NESTED	(20 * SZ_1M >> PAGE_SHIFT)

This name suggests an action rather than a count. IMO, these would be much better:

   #define MSHV_PT_NUM_DEPOSIT_PAGES      	(SZ_512K >> PAGE_SHIFT)
   #define MSHV_PT_NUM_DEPOSIT_PAGES_NESTED	(20 * SZ_1M >> PAGE_SHIFT)

> +
>   MODULE_AUTHOR("Microsoft");
>   MODULE_LICENSE("GPL");
>   MODULE_DESCRIPTION("Microsoft Hyper-V root partition VMM interface /dev/mshv");
> @@ -1587,6 +1591,15 @@ mshv_partition_ioctl_set_msi_routing(struct mshv_partition *partition,
>   	return ret;
>   }
>   
> +static u64
> +mshv_partition_deposit_pages(struct mshv_partition *partition)
> +{
> +	if (partition->pt_flags &
> +	    HV_PARTITION_CREATION_FLAG_NESTED_VIRTUALIZATION_CAPABLE)
> +		return MSHV_PARTITION_DEPOSIT_PAGES_NESTED;
> +	return MSHV_PARTITION_DEPOSIT_PAGES;
> +}
> +
>   static long
>   mshv_partition_ioctl_initialize(struct mshv_partition *partition)
>   {
> @@ -1595,6 +1608,11 @@ mshv_partition_ioctl_initialize(struct mshv_partition *partition)
>   	if (partition->pt_initialized)
>   		return 0;
>   
> +	ret = hv_call_deposit_pages(NUMA_NO_NODE, partition->pt_id,
> +				    mshv_partition_deposit_pages(partition));
> +	if (ret)
> +		goto withdraw_mem;
> +
>   	ret = hv_call_initialize_partition(partition->pt_id);
>   	if (ret)
>   		goto withdraw_mem;
> @@ -1610,8 +1628,8 @@ mshv_partition_ioctl_initialize(struct mshv_partition *partition)
>   finalize_partition:
>   	hv_call_finalize_partition(partition->pt_id);
>   withdraw_mem:
> -	hv_call_withdraw_memory(U64_MAX, NUMA_NO_NODE, partition->pt_id);
> -
> +	hv_call_withdraw_memory(MSHV_PARTITION_DEPOSIT_PAGES,
> +				NUMA_NO_NODE, partition->pt_id);
>   	return ret;
>   }
>   
> @@ -2032,6 +2050,7 @@ mshv_ioctl_create_partition(void __user *user_arg, struct device *module_dev)
>   		return -ENOMEM;
>   
>   	partition->pt_module_dev = module_dev;
> +	partition->pt_flags = creation_flags;
>   	partition->isolation_type = isolation_properties.isolation_type;
>   
>   	refcount_set(&partition->pt_ref_count, 1);
> 
> 
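To give a sense of scale, the byte sizes in these macros convert to the page counts below. This minimal sketch assumes 4 KiB kernel pages (PAGE_SHIFT of 12) and uses the renamed macros suggested above. The SZ_* values are the ones in include/linux/sizes.h, and mshv_pt_num_deposit_pages() is a hypothetical mirror of the patch's mshv_partition_deposit_pages().

```c
#include <assert.h>

#define PAGE_SHIFT	12		/* assumption: 4 KiB kernel pages */
#define SZ_512K		0x00080000	/* as in include/linux/sizes.h */
#define SZ_1M		0x00100000

#define MSHV_PT_NUM_DEPOSIT_PAGES		(SZ_512K >> PAGE_SHIFT)
#define MSHV_PT_NUM_DEPOSIT_PAGES_NESTED	(20 * SZ_1M >> PAGE_SHIFT)

/* The constant removed from mshv_root_hv_call.c, for comparison. */
#define HV_INIT_PARTITION_DEPOSIT_PAGES	208

/* The commit message's "40x more pages (20 MB)", checked at build time. */
static_assert(MSHV_PT_NUM_DEPOSIT_PAGES_NESTED ==
	      40 * MSHV_PT_NUM_DEPOSIT_PAGES, "nested deposit is 40x");

/* Hypothetical mirror of mshv_partition_deposit_pages(). */
static unsigned long mshv_pt_num_deposit_pages(int nested)
{
	return nested ? MSHV_PT_NUM_DEPOSIT_PAGES_NESTED
		      : MSHV_PT_NUM_DEPOSIT_PAGES;
}
```

Under that assumption, a non-nested partition now receives 128 pages up front. That is fewer than the 208 pages deposited by the removed HV_INIT_PARTITION_DEPOSIT_PAGES. A nested-capable partition receives 5120 pages.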


^ permalink raw reply

* Re: [PATCH 3/4] mshv: Fix pre-depositing of pages for virtual processor initialization
From: Mukesh R @ 2026-03-06  3:33 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui, longli
  Cc: linux-hyperv, linux-kernel
In-Reply-To: <177258382549.229866.5072213647599344057.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

On 3/3/26 16:23, Stanislav Kinsburskii wrote:
> Deposit enough pages up front to avoid virtual processor creation failures
> due to low memory. This also speeds up guest creation. A VP uses 25% more
> pages in a partition with nested virtualization enabled, but the exact
> number doesn't vary much, so deposit a fixed number of pages per VP that
> works for nested virtualization.
> 
> Move page depositing from the hypercall wrapper to the virtual processor
> creation code. The required number of pages is based on empirical data.
> This logic fits better in the virtual processor creation code than in the
> hypercall wrapper.
> 
> Also withdraw the deposited memory if virtual processor creation fails.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>   drivers/hv/hv_proc.c        |    8 --------
>   drivers/hv/mshv_root_main.c |   11 ++++++++++-
>   2 files changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
> index 0f84a70def30..3d41f52efd9a 100644
> --- a/drivers/hv/hv_proc.c
> +++ b/drivers/hv/hv_proc.c
> @@ -251,14 +251,6 @@ int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags)
>   	unsigned long irq_flags;
>   	int ret = 0;
>   
> -	/* Root VPs don't seem to need pages deposited */
> -	if (partition_id != hv_current_partition_id) {
> -		/* The value 90 is empirically determined. It may change. */
> -		ret = hv_call_deposit_pages(node, partition_id, 90);
> -		if (ret)
> -			return ret;
> -	}
> -
>   	do {
>   		local_irq_save(irq_flags);
>   
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index fbfc50de332c..48c842b6938d 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -38,6 +38,7 @@
>   /* The deposit values below are empirical and may need to be adjusted. */
>   #define MSHV_PARTITION_DEPOSIT_PAGES		(SZ_512K >> PAGE_SHIFT)
>   #define MSHV_PARTITION_DEPOSIT_PAGES_NESTED	(20 * SZ_1M >> PAGE_SHIFT)
> +#define MSHV_VP_DEPOSIT_PAGES			(1 * SZ_1M >> PAGE_SHIFT)


This seems to assume that each VP will use up a total of 1M, and I don't
think that is the case. My understanding is that the hypervisor will reuse
the remaining chunks. IOW, 6M may be enough for 8 vCPUs.


>   MODULE_AUTHOR("Microsoft");
>   MODULE_LICENSE("GPL");
> @@ -1077,10 +1078,15 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
>   	if (partition->pt_vp_array[args.vp_index])
>   		return -EEXIST;
>   
> +	ret = hv_call_deposit_pages(NUMA_NO_NODE, partition->pt_id,
> +				    MSHV_VP_DEPOSIT_PAGES);
> +	if (ret)
> +		return ret;
> +
>   	ret = hv_call_create_vp(NUMA_NO_NODE, partition->pt_id, args.vp_index,
>   				0 /* Only valid for root partition VPs */);
>   	if (ret)
> -		return ret;
> +		goto withdraw_mem;
>   
>   	ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
>   				   HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
> @@ -1177,6 +1183,9 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
>   			       intercept_msg_page, input_vtl_zero);
>   destroy_vp:
>   	hv_call_delete_vp(partition->pt_id, args.vp_index);
> +withdraw_mem:
> +	hv_call_withdraw_memory(MSHV_VP_DEPOSIT_PAGES, NUMA_NO_NODE,
> +				partition->pt_id);
>   out:
>   	trace_mshv_create_vp(partition->pt_id, args.vp_index, ret);
>   	return ret;
> 
> 
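Here is the concern above in numbers. With 4 KiB pages (an assumption), MSHV_VP_DEPOSIT_PAGES is 256 pages, compared with the 90 pages the removed code in hv_call_create_vp() deposited. The sketch below totals the fixed per-VP deposit for a guest. The helper names and OLD_VP_DEPOSIT_PAGES are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12		/* assumption: 4 KiB pages */
#define SZ_1M		0x00100000

#define MSHV_VP_DEPOSIT_PAGES	(1 * SZ_1M >> PAGE_SHIFT)
#define OLD_VP_DEPOSIT_PAGES	90	/* hypothetical name for the removed literal */

static_assert(MSHV_VP_DEPOSIT_PAGES == 256, "1 MiB of 4 KiB pages");

/* Total pages deposited up front when every VP gets a fixed deposit. */
static uint64_t total_vp_deposit_pages(uint32_t nr_vps, uint32_t per_vp_pages,
				       int l1vh)
{
	uint64_t total = (uint64_t)nr_vps * per_vp_pages;

	return l1vh ? total * 2 : total;	/* patch 1 doubles on L1VH */
}

/* Convert a page count to whole MiB (rounded down) for comparison. */
static uint64_t pages_to_mib(uint64_t pages)
{
	return (pages << PAGE_SHIFT) >> 20;
}
```

For 8 VPs, this comes to 2048 pages (8 MiB), compared with the 6M estimate above. On an L1VH partition, the wrapper from patch 1 doubles each request, so 16 MiB would be deposited.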


^ permalink raw reply

* Re: [PATCH 4/4] mshv: Pre-deposit pages for SLAT creation
From: Mukesh R @ 2026-03-06  3:41 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui, longli
  Cc: linux-hyperv, linux-kernel
In-Reply-To: <177258383107.229866.16867493994305727391.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

On 3/3/26 16:23, Stanislav Kinsburskii wrote:
> Deposit enough pages up front to avoid guest address space region creation
> failures due to low memory. This also speeds up guest creation.

Does this imply that some hypercall fails without returning an
insufficient-memory status?

> Calculate the required number of pages based on the guest's physical
> address space size, rounded up to 1 GB chunks. Even the smallest guests are
> assumed to need at least 1 GB worth of deposits. This is because every
> guest requires tens of megabytes of deposited pages for hypervisor
> overhead, making smaller deposits impractical.
> 
> Estimating in 1 GB chunks prevents over-depositing for larger guests while
> accepting some over-deposit for smaller ones. This trade-off keeps the
> estimate close to actual needs for larger guests.
> 
> Also withdraw the deposited pages if address space region creation fails.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>   drivers/hv/mshv_root_main.c |   25 +++++++++++++++++++++++--
>   1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index 48c842b6938d..cb5b4505f8eb 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -39,6 +39,7 @@
>   #define MSHV_PARTITION_DEPOSIT_PAGES		(SZ_512K >> PAGE_SHIFT)
>   #define MSHV_PARTITION_DEPOSIT_PAGES_NESTED	(20 * SZ_1M >> PAGE_SHIFT)
>   #define MSHV_VP_DEPOSIT_PAGES			(1 * SZ_1M >> PAGE_SHIFT)
> +#define MSHV_1G_DEPOSIT_PAGES			(6 * SZ_1M >> PAGE_SHIFT)
>   
>   MODULE_AUTHOR("Microsoft");
>   MODULE_LICENSE("GPL");
> @@ -1324,6 +1325,18 @@ static int mshv_prepare_pinned_region(struct mshv_mem_region *region)
>   	return ret;
>   }
>   
> +static u64
> +mshv_region_deposit_slat_pages(struct mshv_mem_region *region)

I don't think it is accurate to call these SLAT pages, because, as I
understand it, in case of over-deposit they may be used for non-SLAT
purposes.

> +{
> +	u64 region_in_gbs, slat_pages;
> +
> +	/* SLAT needs 6 MB per 1 GB of address space. */
> +	region_in_gbs = DIV_ROUND_UP(region->nr_pages << HV_HYP_PAGE_SHIFT, SZ_1G);
> +	slat_pages = region_in_gbs * MSHV_1G_DEPOSIT_PAGES;
> +
> +	return slat_pages;
> +}
> +

Again, unconditionally depositing for each region is not good, because
the amount is empirical and the hypervisor will reuse leftover RAM.

>   /*
>    * This maps two things: guest RAM and for pci passthru mmio space.
>    *
> @@ -1364,6 +1377,11 @@ mshv_map_user_memory(struct mshv_partition *partition,
>   	if (ret)
>   		return ret;
>   
> +	ret = hv_call_deposit_pages(NUMA_NO_NODE, partition->pt_id,
> +				    mshv_region_deposit_slat_pages(region));
> +	if (ret)
> +		goto free_region;
> +
>   	switch (region->mreg_type) {
>   	case MSHV_REGION_TYPE_MEM_PINNED:
>   		ret = mshv_prepare_pinned_region(region);
> @@ -1392,7 +1410,7 @@ mshv_map_user_memory(struct mshv_partition *partition,
>   				   region->hv_map_flags, ret);
>   
>   	if (ret)
> -		goto errout;
> +		goto withdraw_memory;
>   
>   	spin_lock(&partition->pt_mem_regions_lock);
>   	hlist_add_head(&region->hnode, &partition->pt_mem_regions);
> @@ -1400,7 +1418,10 @@ mshv_map_user_memory(struct mshv_partition *partition,
>   
>   	return 0;
>   
> -errout:
> +withdraw_memory:
> +	hv_call_withdraw_memory(mshv_region_deposit_slat_pages(region),
> +				NUMA_NO_NODE, partition->pt_id);
> +free_region:
>   	vfree(region);
>   	return ret;
>   }
> 
> 
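Working through the per-region estimate in userspace shows how the rounding behaves. This sketch assumes 4 KiB kernel and hypervisor pages and defines DIV_ROUND_UP() and the SZ_* constants locally. slat_deposit_pages() is a hypothetical mirror of mshv_region_deposit_slat_pages(). DIV_ROUND_UP() divides by SZ_1G, so the intermediate value is a count of GiB, not bytes.

```c
#include <assert.h>
#include <stdint.h>

#define HV_HYP_PAGE_SHIFT	12	/* assumption: 4 KiB hypervisor pages */
#define PAGE_SHIFT		12	/* assumption: 4 KiB kernel pages */
#define SZ_1M			0x00100000ULL
#define SZ_1G			0x40000000ULL
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

#define MSHV_1G_DEPOSIT_PAGES	(6 * SZ_1M >> PAGE_SHIFT)

static_assert(MSHV_1G_DEPOSIT_PAGES == 1536, "6 MiB of 4 KiB pages");

/* nr_pages is rounded up to whole GiB, then 6 MiB is deposited per GiB. */
static uint64_t slat_deposit_pages(uint64_t nr_pages)
{
	uint64_t region_in_gbs =
		DIV_ROUND_UP(nr_pages << HV_HYP_PAGE_SHIFT, SZ_1G);

	return region_in_gbs * MSHV_1G_DEPOSIT_PAGES;
}
```

A one-page region and an exactly 1 GiB region both deposit 1536 pages (6 MiB). Adding a single page past 1 GiB doubles the deposit to 3072 pages. This is the over-deposit for small regions that the commit message accepts.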


^ permalink raw reply

* Re: [PATCH 0/4] mshv: Fix and improve memory pre-depositing
From: Mukesh R @ 2026-03-06  3:44 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui, longli
  Cc: linux-hyperv, linux-kernel
In-Reply-To: <177258296744.229866.4926075663598294228.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

On 3/3/26 16:23, Stanislav Kinsburskii wrote:
> This series fixes and improves memory pre-depositing in the Microsoft
> Hypervisor (MSHV) driver to avoid partition and virtual processor
> creation failures due to insufficient deposited memory, and to speed
> up guest creation.
> 
> The first patch converts hv_call_deposit_pages() into a wrapper that
> supports arbitrarily large deposit requests by splitting them into
> HV_DEPOSIT_MAX-sized chunks. It also doubles the deposit amount for
> L1 virtual hypervisor (L1VH) partitions to reserve memory for
> Hypervisor Hot Restart (HHR), since L1VH guests cannot request
> additional memory from the root partition during HHR.
> 
> The second patch moves partition initialization page depositing from
> the hypercall wrapper to the partition initialization ioctl. The
> required number of pages is determined empirically. Partitions with
> nested virtualization capability require significantly more pages
> (20 MB) to accommodate the nested hypervisor. The partition creation
> flags are saved in the partition structure to allow selecting the
> correct deposit size at initialization time.
> 
> The third patch moves virtual processor page depositing from
> hv_call_create_vp() to mshv_partition_ioctl_create_vp(). A fixed
> deposit of 1 MB per VP is used, which covers both regular and nested
> virtualization cases. Deposited memory is now properly withdrawn if
> VP creation fails.
> 
> The fourth patch adds pre-depositing of pages for guest address space
> (SLAT) region creation. The deposit size is calculated based on the
> region size rounded up to 1 GB chunks, with 6 MB deposited per GB of
> address space. Deposited pages are withdrawn on failure.


Can't we get away with changing the deposit to just 2M for most
cases? My theory is that we won't find any measurable performance
hit, and it keeps things simple.

Thanks,
-Mukesh


> ---
> 
> Stanislav Kinsburskii (4):
>        mshv: Support larger memory deposits
>        mshv: Fix pre-depositing of pages for partition initialization
>        mshv: Fix pre-depositing of pages for virtual processor initialization
>        mshv: Pre-deposit pages for SLAT creation
> 
> 
>   drivers/hv/hv_proc.c           |   58 +++++++++++++++++++++++++++++++++------
>   drivers/hv/mshv_root.h         |    1 +
>   drivers/hv/mshv_root_hv_call.c |    6 ----
>   drivers/hv/mshv_root_main.c    |   59 +++++++++++++++++++++++++++++++++++++---
>   4 files changed, 104 insertions(+), 20 deletions(-)
> 


^ permalink raw reply

* Re: [PATCH 4/4] mshv: Pre-deposit pages for SLAT creation
From: Mukesh R @ 2026-03-06  3:54 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui, longli
  Cc: linux-hyperv, linux-kernel
In-Reply-To: <177258383107.229866.16867493994305727391.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

On 3/3/26 16:23, Stanislav Kinsburskii wrote:
> Deposit enough pages up front to avoid guest address space region creation
> failures due to low memory. This also speeds up guest creation.
> 
> Calculate the required number of pages based on the guest's physical
> address space size, rounded up to 1 GB chunks. Even the smallest guests are
> assumed to need at least 1 GB worth of deposits. This is because every
> guest requires tens of megabytes of deposited pages for hypervisor
> overhead, making smaller deposits impractical.
> 
> Estimating in 1 GB chunks prevents over-depositing for larger guests while
> accepting some over-deposit for smaller ones. This trade-off keeps the
> estimate close to actual needs for larger guests.
> 
> Also withdraw the deposited pages if address space region creation fails.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>   drivers/hv/mshv_root_main.c |   25 +++++++++++++++++++++++--
>   1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index 48c842b6938d..cb5b4505f8eb 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -39,6 +39,7 @@
>   #define MSHV_PARTITION_DEPOSIT_PAGES		(SZ_512K >> PAGE_SHIFT)
>   #define MSHV_PARTITION_DEPOSIT_PAGES_NESTED	(20 * SZ_1M >> PAGE_SHIFT)
>   #define MSHV_VP_DEPOSIT_PAGES			(1 * SZ_1M >> PAGE_SHIFT)
> +#define MSHV_1G_DEPOSIT_PAGES			(6 * SZ_1M >> PAGE_SHIFT)
>   
>   MODULE_AUTHOR("Microsoft");
>   MODULE_LICENSE("GPL");
> @@ -1324,6 +1325,18 @@ static int mshv_prepare_pinned_region(struct mshv_mem_region *region)
>   	return ret;
>   }
>   
> +static u64
> +mshv_region_deposit_slat_pages(struct mshv_mem_region *region)
> +{
> +	u64 region_in_gbs, slat_pages;
> +
> +	/* SLAT needs 6 MB per 1 GB of address space. */
> +	region_in_gbs = DIV_ROUND_UP(region->nr_pages << HV_HYP_PAGE_SHIFT, SZ_1G);
> +	slat_pages = region_in_gbs * MSHV_1G_DEPOSIT_PAGES;
> +
> +	return slat_pages;
> +}
> +
>   /*
>    * This maps two things: guest RAM and for pci passthru mmio space.
>    *
> @@ -1364,6 +1377,11 @@ mshv_map_user_memory(struct mshv_partition *partition,
>   	if (ret)
>   		return ret;
>   
> +	ret = hv_call_deposit_pages(NUMA_NO_NODE, partition->pt_id,
> +				    mshv_region_deposit_slat_pages(region));
> +	if (ret)
> +		goto free_region;
> +


Also, for MSHV_REGION_TYPE_MEM_PINNED, deposit is not needed.


>   	switch (region->mreg_type) {
>   	case MSHV_REGION_TYPE_MEM_PINNED:
>   		ret = mshv_prepare_pinned_region(region);
> @@ -1392,7 +1410,7 @@ mshv_map_user_memory(struct mshv_partition *partition,
>   				   region->hv_map_flags, ret);
>   
>   	if (ret)
> -		goto errout;
> +		goto withdraw_memory;
>   
>   	spin_lock(&partition->pt_mem_regions_lock);
>   	hlist_add_head(&region->hnode, &partition->pt_mem_regions);
> @@ -1400,7 +1418,10 @@ mshv_map_user_memory(struct mshv_partition *partition,
>   
>   	return 0;
>   
> -errout:
> +withdraw_memory:
> +	hv_call_withdraw_memory(mshv_region_deposit_slat_pages(region),
> +				NUMA_NO_NODE, partition->pt_id);
> +free_region:
>   	vfree(region);
>   	return ret;
>   }
> 
> 
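If pinned regions are excluded as suggested above, the deposit and the error-path withdrawal should both come from the same helper so that they stay symmetric. This is a hypothetical sketch. Only the MSHV_REGION_TYPE_MEM_PINNED name comes from the patch, and the enum here is a stand-in. The 6 MiB-per-GiB rule and the 4 KiB page size are carried over as assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define HV_HYP_PAGE_SHIFT	12	/* assumption: 4 KiB hypervisor pages */
#define SZ_1G			0x40000000ULL
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
#define DEPOSIT_PAGES_PER_GB	1536	/* 6 MiB per GiB, from the patch */

static_assert(DEPOSIT_PAGES_PER_GB == (6 << 20) >> 12, "6 MiB of 4 KiB pages");

/* Stand-in for the driver's region types; only PINNED is from the patch. */
enum region_type {
	REGION_TYPE_MEM_PINNED,
	REGION_TYPE_OTHER,
};

/* Used by both the deposit and the withdraw path, so the counts agree. */
static uint64_t region_deposit_pages(enum region_type type, uint64_t nr_pages)
{
	if (type == REGION_TYPE_MEM_PINNED)
		return 0;	/* per review: pinned regions need no deposit */

	return DIV_ROUND_UP(nr_pages << HV_HYP_PAGE_SHIFT, SZ_1G) *
	       DEPOSIT_PAGES_PER_GB;
}
```

On the deposit side, returning 0 for pinned regions requires no special casing. The loop in the patched hv_call_deposit_pages() makes no hypercalls for a zero-page request and returns 0. This thread does not show how hv_call_withdraw_memory() handles a count of 0.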


^ permalink raw reply

* RE: [PATCH 4/4] mshv: Pre-deposit pages for SLAT creation
From: mhklkml @ 2026-03-06  4:15 UTC (permalink / raw)
  To: 'Michael Kelley', 'Stanislav Kinsburskii', kys,
	haiyangz, wei.liu, decui, longli
  Cc: linux-hyperv, linux-kernel
In-Reply-To: <SN6PR02MB4157C408547E59A469C5CE08D47DA@SN6PR02MB4157.namprd02.prod.outlook.com>

From: Michael Kelley <mhklinux@outlook.com> Sent: Thursday, March 5, 2026 11:45 AM
> 
> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Tuesday, March 3, 2026 4:24 PM
> >
> > Deposit enough pages up front to avoid guest address space region creation
> > failures due to low memory. This also speeds up guest creation.
> >
> > Calculate the required number of pages based on the guest's physical
> > address space size, rounded up to 1 GB chunks. Even the smallest guests are
> > assumed to need at least 1 GB worth of deposits. This is because every
> > guest requires tens of megabytes of deposited pages for hypervisor
> > overhead, making smaller deposits impractical.
> >
> > Estimating in 1 GB chunks prevents over-depositing for larger guests while
> > accepting some over-deposit for smaller ones. This trade-off keeps the
> > estimate close to actual needs for larger guests.
> >
> > Also withdraw the deposited pages if address space region creation fails.
> >
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> >  drivers/hv/mshv_root_main.c |   25 +++++++++++++++++++++++--
> >  1 file changed, 23 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> > index 48c842b6938d..cb5b4505f8eb 100644
> > --- a/drivers/hv/mshv_root_main.c
> > +++ b/drivers/hv/mshv_root_main.c
> > @@ -39,6 +39,7 @@
> >  #define MSHV_PARTITION_DEPOSIT_PAGES		(SZ_512K >> PAGE_SHIFT)
> >  #define MSHV_PARTITION_DEPOSIT_PAGES_NESTED	(20 * SZ_1M >> PAGE_SHIFT)
> >  #define MSHV_VP_DEPOSIT_PAGES			(1 * SZ_1M >> PAGE_SHIFT)
> > +#define MSHV_1G_DEPOSIT_PAGES			(6 * SZ_1M >> PAGE_SHIFT)
> >
> >  MODULE_AUTHOR("Microsoft");
> >  MODULE_LICENSE("GPL");
> > @@ -1324,6 +1325,18 @@ static int mshv_prepare_pinned_region(struct mshv_mem_region *region)
> >  	return ret;
> >  }
> >
> > +static u64
> > +mshv_region_deposit_slat_pages(struct mshv_mem_region *region)
> 
> Same nit about the function name. This one seems like it will "deposit slat pages".
> 
> > +{
> > +	u64 region_in_gbs, slat_pages;
> > +
> > +	/* SLAT needs 6 MB per 1 GB of address space. */
> > +	region_in_gbs = DIV_ROUND_UP(region->nr_pages << HV_HYP_PAGE_SHIFT, SZ_1G);
> 
> This local variable "region_in_gbs" is computed in units of bytes.

Ignore this comment and the following one in this function. I saw the
ROUND_UP(), but somehow failed to see that it was DIV_ROUND_UP().  :-(

Michael

> 
> > +	slat_pages = region_in_gbs * MSHV_1G_DEPOSIT_PAGES;
> 
> But here region_in_gbs is used as if it were in units of Gbytes.  So the
> slat_pages return value is much larger than intended.
> 
> > +
> > +	return slat_pages;
> > +}
> > +
> >  /*
> >   * This maps two things: guest RAM and for pci passthru mmio space.
> >   *
> > @@ -1364,6 +1377,11 @@ mshv_map_user_memory(struct mshv_partition *partition,
> >  	if (ret)
> >  		return ret;
> >
> > +	ret = hv_call_deposit_pages(NUMA_NO_NODE, partition->pt_id,
> > +				    mshv_region_deposit_slat_pages(region));
> > +	if (ret)
> > +		goto free_region;
> > +
> >  	switch (region->mreg_type) {
> >  	case MSHV_REGION_TYPE_MEM_PINNED:
> >  		ret = mshv_prepare_pinned_region(region);
> > @@ -1392,7 +1410,7 @@ mshv_map_user_memory(struct mshv_partition *partition,
> >  				   region->hv_map_flags, ret);
> >
> >  	if (ret)
> > -		goto errout;
> > +		goto withdraw_memory;
> >
> >  	spin_lock(&partition->pt_mem_regions_lock);
> >  	hlist_add_head(&region->hnode, &partition->pt_mem_regions);
> > @@ -1400,7 +1418,10 @@ mshv_map_user_memory(struct mshv_partition *partition,
> >
> >  	return 0;
> >
> > -errout:
> > +withdraw_memory:
> > +	hv_call_withdraw_memory(mshv_region_deposit_slat_pages(region),
> > +				NUMA_NO_NODE, partition->pt_id);
> 
> Again, for an L1VH partition, the actual number of pages deposited would
> be 2x what mshv_region_deposit_slat_pages() returns.
> 
> > +free_region:
> >  	vfree(region);
> >  	return ret;
> >  }
> >
> >
> 
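Michael's L1VH point can be made concrete. On an L1VH partition, the patched hv_call_deposit_pages() doubles every request, but the error path withdraws only the undoubled count. The sketch below is a hypothetical model. It assumes that the failed operation consumed none of the deposit and that a withdrawal returns at most the requested count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pages actually deposited for a request, per the patched wrapper. */
static uint64_t effective_deposit(uint64_t num_pages, bool l1vh)
{
	assert(num_pages <= UINT64_MAX / 2);	/* doubling must not wrap */
	return l1vh ? num_pages * 2 : num_pages;
}

/*
 * Pages still held by the hypervisor after an error path withdraws
 * `withdrawn` pages, assuming the failed operation consumed nothing.
 */
static uint64_t leftover_after_withdraw(uint64_t requested, bool l1vh,
					uint64_t withdrawn)
{
	uint64_t deposited = effective_deposit(requested, l1vh);

	return withdrawn >= deposited ? 0 : deposited - withdrawn;
}
```

In this model, an L1VH region that deposits twice 1536 pages and withdraws 1536 on failure leaves 1536 pages behind. Computing the withdrawal count with the same doubling leaves none.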



^ permalink raw reply

* Re: [PATCH net-next] net: mana: Force full-page RX buffers for 4K page size on specific systems.
From: Dipayaan Roy @ 2026-03-06 13:12 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	kuba, leon, longli, kotaranov, horms, shradhagupta, ssengar,
	ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, dipayanroy
In-Reply-To: <81b7e296-0cfe-4c24-ac97-7f6c712aa0e9@redhat.com>

On Tue, Mar 03, 2026 at 11:56:29AM +0100, Paolo Abeni wrote:
> On 2/27/26 11:15 AM, Dipayaan Roy wrote:
> > On certain systems configured with 4K PAGE_SIZE, utilizing page_pool
> > fragments for RX buffers results in a significant throughput regression.
> > Profiling reveals that this regression correlates with high overhead in the
> > fragment allocation and reference counting paths on these specific
> > platforms, rendering the multi-buffer-per-page strategy counterproductive.
> > 
> > To mitigate this, bypass the page_pool fragment path and force a single RX
> > packet per page allocation when all the following conditions are met:
> >   1. The system is configured with a 4K PAGE_SIZE.
> >   2. A processor-specific quirk is detected via SMBIOS Type 4 data.
> > 
> > This approach restores expected line-rate performance by ensuring
> > predictable RX refill behavior on affected hardware.
> > 
> > There is no behavioral change for systems using larger page sizes
> > (16K/64K), or platforms where this processor-specific quirk do not
> > apply.
> > 
> > Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
> > ---
> >  .../net/ethernet/microsoft/mana/gdma_main.c   | 120 ++++++++++++++++++
> >  drivers/net/ethernet/microsoft/mana/mana_en.c |  23 +++-
> >  include/net/mana/gdma.h                       |  10 ++
> >  3 files changed, 151 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > index 0055c231acf6..26bbe736a770 100644
> > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > @@ -9,6 +9,7 @@
> >  #include <linux/msi.h>
> >  #include <linux/irqdomain.h>
> >  #include <linux/export.h>
> > +#include <linux/dmi.h>
> >  
> >  #include <net/mana/mana.h>
> >  #include <net/mana/hw_channel.h>
> > @@ -1955,6 +1956,115 @@ static bool mana_is_pf(unsigned short dev_id)
> >  	return dev_id == MANA_PF_DEVICE_ID;
> >  }
> >  
> > +/*
> > + * Table for Processor Version strings found from SMBIOS Type 4 information,
> > + * for processors that needs to force single RX buffer per page quirk for
> > + * meeting line rate performance with ARM64 + 4K pages.
> > + * Note: These strings are exactly matched with version fetched from SMBIOS.
> > + */
> > +static const char * const mana_single_rxbuf_per_page_quirk_tbl[] = {
> > +	"Cobalt 200",
> > +};
> > +
> > +static const char *smbios_get_string(const struct dmi_header *hdr, u8 idx)
> > +{
> > +	const u8 *start, *end;
> > +	u8 i;
> > +
> > +	/* Indexing starts from 1. */
> > +	if (!idx)
> > +		return NULL;
> > +
> > +	start   = (const u8 *)hdr + hdr->length;
> > +	end = start + SMBIOS_STR_AREA_MAX;
> > +
> > +	for (i = 1; i < idx; i++) {
> > +		while (start < end && *start)
> > +			start++;
> > +		if (start < end)
> > +			start++;
> > +		if (start + 1 < end && start[0] == 0 && start[1] == 0)
> > +			return NULL;
> > +	}
> > +
> > +	if (start >= end || *start == 0)
> > +		return NULL;
> > +
> > +	return (const char *)start;
> 
> If I read correctly, the above sort of duplicate dmi_decode_table().
>
Yes, it's not exported.
 
> I think you are better of:
> - use the mana_get_proc_ver_from_smbios() decoder to store the
> SMBIOS_TYPE4_PROC_VERSION_OFFSET index into gd
> - do a 2nd walk with a different decoder to fetch the string at the
> specified index.
Sure, will implement the 2nd walk for fetching string in v2.

> 
> /P

Thank you, Paolo, for the comments, and apologies for my delayed response; I am on call this week.
I will send out v2 with the suggested changes.

Regards
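Paolo's suggested two-pass approach can be sketched in userspace C for readers following the review. This is an illustration only: walk_table() is a hypothetical stand-in for the kernel's dmi_walk(), the table is synthetic, and the only value taken from the patch is the Type 4 Processor Version offset 0x10. Pass 1 records the string number from the first Type 4 record; pass 2 resolves that number to the string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SMBIOS_HDR_LEN      4     /* type, length, 16-bit handle */
#define TYPE4_PROC_INFO     4
#define TYPE4_PROC_VER_OFF  0x10  /* SMBIOS_TYPE4_PROC_VERSION_OFFSET in the patch */

/* Callback receives the whole record: formatted area plus string set. */
typedef void (*walk_fn)(const uint8_t *rec, size_t rec_len, void *data);

/* Hypothetical stand-in for dmi_walk(): visit each structure in buf. */
static void walk_table(const uint8_t *buf, size_t len, walk_fn fn, void *data)
{
	size_t off = 0;

	while (off + SMBIOS_HDR_LEN <= len) {
		size_t flen = buf[off + 1];
		size_t p = off + flen;

		if (flen < SMBIOS_HDR_LEN || p > len)
			return;
		/* The string set ends with a double NUL. */
		while (p + 1 < len && (buf[p] || buf[p + 1]))
			p++;
		if (p + 1 >= len)
			return;
		fn(buf + off, p + 2 - off, data);
		off = p + 2;
	}
}

struct proc_ver {
	uint8_t strno;		/* pass 1 result */
	const char *ver;	/* pass 2 result (points into the table) */
};

/* Pass 1: remember the string number from the first Type 4 record. */
static void find_strno(const uint8_t *rec, size_t rec_len, void *data)
{
	struct proc_ver *pv = data;

	(void)rec_len;
	if (rec[0] != TYPE4_PROC_INFO || rec[1] <= TYPE4_PROC_VER_OFF || pv->strno)
		return;
	pv->strno = rec[TYPE4_PROC_VER_OFF];
}

/* Pass 2: resolve that string number within a Type 4 record's string set. */
static void fetch_string(const uint8_t *rec, size_t rec_len, void *data)
{
	struct proc_ver *pv = data;
	const char *s = (const char *)rec + rec[1];
	const char *end = (const char *)rec + rec_len;
	uint8_t n;

	if (rec[0] != TYPE4_PROC_INFO || !pv->strno || pv->ver)
		return;
	for (n = pv->strno; n > 1 && s < end && *s; n--)
		s += strlen(s) + 1;
	if (s < end && *s)
		pv->ver = s;
}
```

In the kernel, the second callback would kstrdup() the string, as v2 does, because the pointer is only meaningful while the DMI table is mapped.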

^ permalink raw reply

* Re: [PATCH net-next] net: mana: Force full-page RX buffers for 4K page size on specific systems.
From: Dipayaan Roy @ 2026-03-06 13:25 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	kuba, leon, longli, kotaranov, horms, shradhagupta, ssengar,
	ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, dipayanroy
In-Reply-To: <03ac38dc-69c5-4641-98ea-5679465c0b7f@redhat.com>

On Tue, Mar 03, 2026 at 12:56:35PM +0100, Paolo Abeni wrote:
> On 2/27/26 11:15 AM, Dipayaan Roy wrote:
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > index 91c418097284..a53a8921050b 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > @@ -748,6 +748,26 @@ static void *mana_get_rxbuf_pre(struct mana_rxq *rxq, dma_addr_t *da)
> >  	return va;
> >  }
> >  
> > +static inline bool
> > +mana_use_single_rxbuf_per_page(struct mana_port_context *apc, u32 mtu)
> > +{
> 
> I almost forgot: please avoid the 'inline' keyword in .c files. This
> function is used only once and should be inlined by the compiler anyway.
>
Ack, will remove it in v2.
> > +	struct gdma_context *gc = apc->ac->gdma_dev->gdma_context;
> > +
> > +	/* On some systems with 4K PAGE_SIZE, page_pool RX fragments can
> > +	 * trigger a throughput regression. Hence forces one RX buffer per page
> > +	 * to avoid the fragment allocation/refcounting overhead in the RX
> > +	 * refill path for those processors only.
> > +	 */
> > +	if (gc->force_full_page_rx_buffer)
> > +		return true;
> 
> Side note: since you could keep the above flag up2date according to the
> current mtu and xdp configuration and just test it in the data path.
> 
If it is not an issue, I would like to keep it this way for better readability.
> /P
> 

Regards
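Paolo's side note amounts to computing the decision once, whenever the MTU or XDP state changes, and testing a cached flag on the refill path. A minimal sketch of that alternative follows; the PAGE_SIZE and MANA_RXBUF_PAD values are placeholders (the driver's real definitions live elsewhere), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for illustration only. */
#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_RXBUF_PAD  512u

struct port_cfg {
	bool force_full_page;	/* platform quirk, fixed at probe time */
	bool xdp_attached;
	uint32_t mtu;
	bool single_rxbuf;	/* cached result, read on the RX refill path */
};

/* Recompute the cached decision whenever MTU or XDP state changes. */
static void port_update_rxbuf_mode(struct port_cfg *p)
{
	p->single_rxbuf = p->force_full_page ||
			  p->xdp_attached ||
			  p->mtu + SKETCH_RXBUF_PAD > SKETCH_PAGE_SIZE / 2;
}
```

The trade-off is the one discussed above: a cached flag saves a few comparisons per refill but must be refreshed on every MTU or XDP change, while the helper in the patch keeps the logic in one readable place.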

^ permalink raw reply

* [PATCH net-next, v2] net: mana: Force full-page RX buffers for 4K page size on specific systems.
From: Dipayaan Roy @ 2026-03-06 13:33 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	kuba, pabeni, leon, longli, kotaranov, horms, shradhagupta,
	ssengar, ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, dipayanroy

On certain systems configured with 4K PAGE_SIZE, utilizing page_pool
fragments for RX buffers results in a significant throughput regression.
Profiling reveals that this regression correlates with high overhead in the
fragment allocation and reference counting paths on these specific
platforms, rendering the multi-buffer-per-page strategy counterproductive.

To mitigate this, bypass the page_pool fragment path and force a single RX
packet per page allocation when all the following conditions are met:
  1. The system is configured with a 4K PAGE_SIZE.
  2. A processor-specific quirk is detected via SMBIOS Type 4 data.

This approach restores expected line-rate performance by ensuring
predictable RX refill behavior on affected hardware.

There is no behavioral change for systems using larger page sizes
(16K/64K), or on platforms where this processor-specific quirk does not
apply.

Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
---
Changes in v2:
  - Separate reading the string index from reading the string; remove inline.
---
 .../net/ethernet/microsoft/mana/gdma_main.c   | 133 ++++++++++++++++++
 drivers/net/ethernet/microsoft/mana/mana_en.c |  23 ++-
 include/net/mana/gdma.h                       |   9 ++
 3 files changed, 163 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index aef8612b73cb..05fecc00a90c 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -9,6 +9,7 @@
 #include <linux/msi.h>
 #include <linux/irqdomain.h>
 #include <linux/export.h>
+#include <linux/dmi.h>
 
 #include <net/mana/mana.h>
 #include <net/mana/hw_channel.h>
@@ -1959,6 +1960,128 @@ static bool mana_is_pf(unsigned short dev_id)
 	return dev_id == MANA_PF_DEVICE_ID;
 }
 
+/*
+ * Table of Processor Version strings, as found in SMBIOS Type 4 information,
+ * for processors that need the single-RX-buffer-per-page quirk to meet
+ * line-rate performance with ARM64 + 4K pages.
+ * Note: these strings must exactly match the version fetched from SMBIOS.
+ */
+static const char * const mana_single_rxbuf_per_page_quirk_tbl[] = {
+	"Cobalt 200",
+};
+
+/* On some systems with 4K PAGE_SIZE, page_pool RX fragments can
+ * trigger a throughput regression. Hence, identify those processors
+ * from the extracted SMBIOS table and apply the quirk that forces one
+ * RX buffer per page to avoid the fragment allocation/refcounting
+ * overhead in the RX refill path for those processors only.
+ */
+static bool mana_needs_single_rxbuf_per_page(struct gdma_context *gc)
+{
+	int i = 0;
+	const char *ver = gc->processor_version;
+
+	if (!ver)
+		return false;
+
+	if (PAGE_SIZE != SZ_4K)
+		return false;
+
+	while (i < ARRAY_SIZE(mana_single_rxbuf_per_page_quirk_tbl)) {
+		if (!strcmp(ver, mana_single_rxbuf_per_page_quirk_tbl[i]))
+			return true;
+		i++;
+	}
+
+	return false;
+}
+
+static void mana_get_proc_ver_strno(const struct dmi_header *hdr, void *data)
+{
+	struct gdma_context *gc = data;
+	const u8 *d = (const u8 *)hdr;
+
+	/* We are only looking for Type 4: Processor Information */
+	if (hdr->type != SMBIOS_TYPE_4_PROCESSOR_INFO)
+		return;
+
+	/* Ensure the record is long enough to contain the Processor Version
+	 * field
+	 */
+	if (hdr->length <= SMBIOS_TYPE4_PROC_VERSION_OFFSET)
+		return;
+
+	/* The 'Processor Version' string is located at index pointed by
+	 * SMBIOS_TYPE4_PROC_VERSION_OFFSET.  Make a copy of the index.
+	 * There could be multiple Type 4 tables so read and store the
+	 * processor version index found the first time.
+	 */
+	if (gc->proc_ver_strno)
+		return;
+
+	gc->proc_ver_strno = d[SMBIOS_TYPE4_PROC_VERSION_OFFSET];
+}
+
+static const char *mana_dmi_string_nosave(const struct dmi_header *hdr, u8 s)
+{
+	const char *bp = (const char *)hdr + hdr->length;
+
+	if (!s)
+		return NULL;
+
+	/* String numbers start at 1 */
+	while (--s > 0 && *bp)
+		bp += strlen(bp) + 1;
+
+	if (!*bp)
+		return NULL;
+
+	return bp;
+}
+
+static void mana_fetch_proc_ver_string(const struct dmi_header *hdr,
+				       void *data)
+{
+	struct gdma_context *gc = data;
+	const char *ver;
+
+	/* We are only looking for Type 4: Processor Information */
+	if (hdr->type != SMBIOS_TYPE_4_PROCESSOR_INFO)
+		return;
+
+	/* Extract proc version found the first time only */
+	if (!gc->proc_ver_strno || gc->processor_version)
+		return;
+
+	ver = mana_dmi_string_nosave(hdr, gc->proc_ver_strno);
+	if (ver)
+		gc->processor_version = kstrdup(ver, GFP_KERNEL);
+}
+
+/* Check and initialize all processor optimizations/quirks here */
+static bool mana_init_processor_optimization(struct gdma_context *gc)
+{
+	bool opt_initialized = false;
+
+	gc->proc_ver_strno = 0;
+	gc->processor_version = NULL;
+
+	dmi_walk(mana_get_proc_ver_strno, gc);
+	if (!gc->proc_ver_strno)
+		return false;
+
+	dmi_walk(mana_fetch_proc_ver_string, gc);
+	if (!gc->processor_version)
+		return false;
+
+	if (mana_needs_single_rxbuf_per_page(gc)) {
+		gc->force_full_page_rx_buffer = true;
+		opt_initialized = true;
+	}
+
+	return opt_initialized;
+}
+
 static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
 	struct gdma_context *gc;
@@ -2013,6 +2136,11 @@ static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		gc->mana_pci_debugfs = debugfs_create_dir(pci_slot_name(pdev->slot),
 							  mana_debugfs_root);
 
+	if (mana_init_processor_optimization(gc))
+		dev_info(&pdev->dev,
+			 "Processor specific optimization initialized on: %s\n",
+			 gc->processor_version);
+
 	err = mana_gd_setup(pdev);
 	if (err)
 		goto unmap_bar;
@@ -2055,6 +2183,8 @@ static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	pci_iounmap(pdev, bar0_va);
 free_gc:
 	pci_set_drvdata(pdev, NULL);
+	kfree(gc->processor_version);
+	gc->processor_version = NULL;
 	vfree(gc);
 release_region:
 	pci_release_regions(pdev);
@@ -2110,6 +2240,9 @@ static void mana_gd_remove(struct pci_dev *pdev)
 
 	pci_iounmap(pdev, gc->bar0_va);
 
+	kfree(gc->processor_version);
+	gc->processor_version = NULL;
+
 	vfree(gc);
 
 	pci_release_regions(pdev);
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index a868c28c8280..38f94f7619ad 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -744,6 +744,26 @@ static void *mana_get_rxbuf_pre(struct mana_rxq *rxq, dma_addr_t *da)
 	return va;
 }
 
+static bool
+mana_use_single_rxbuf_per_page(struct mana_port_context *apc, u32 mtu)
+{
+	struct gdma_context *gc = apc->ac->gdma_dev->gdma_context;
+
+	/* On some systems with 4K PAGE_SIZE, page_pool RX fragments can
+	 * trigger a throughput regression. Hence, force one RX buffer per page
+	 * to avoid the fragment allocation/refcounting overhead in the RX
+	 * refill path for those processors only.
+	 */
+	if (gc->force_full_page_rx_buffer)
+		return true;
+
+	/* For xdp and jumbo frames make sure only one packet fits per page. */
+	if (mtu + MANA_RXBUF_PAD > PAGE_SIZE / 2 || mana_xdp_get(apc))
+		return true;
+
+	return false;
+}
+
 /* Get RX buffer's data size, alloc size, XDP headroom based on MTU */
 static void mana_get_rxbuf_cfg(struct mana_port_context *apc,
 			       int mtu, u32 *datasize, u32 *alloc_size,
@@ -754,8 +774,7 @@ static void mana_get_rxbuf_cfg(struct mana_port_context *apc,
 	/* Calculate datasize first (consistent across all cases) */
 	*datasize = mtu + ETH_HLEN;
 
-	/* For xdp and jumbo frames make sure only one packet fits per page */
-	if (mtu + MANA_RXBUF_PAD > PAGE_SIZE / 2 || mana_xdp_get(apc)) {
+	if (mana_use_single_rxbuf_per_page(apc, mtu)) {
 		if (mana_xdp_get(apc)) {
 			*headroom = XDP_PACKET_HEADROOM;
 			*alloc_size = PAGE_SIZE;
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index ec17004b10c0..be56b347f3f6 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -9,6 +9,12 @@
 
 #include "shm_channel.h"
 
+/* SMBIOS Type 4: Processor Information table */
+#define SMBIOS_TYPE_4_PROCESSOR_INFO 4
+
+/* Byte offset containing the Processor Version string number. */
+#define SMBIOS_TYPE4_PROC_VERSION_OFFSET 0x10
+
 #define GDMA_STATUS_MORE_ENTRIES	0x00000105
 #define GDMA_STATUS_CMD_UNSUPPORTED	0xffffffff
 
@@ -444,6 +450,9 @@ struct gdma_context {
 	struct workqueue_struct *service_wq;
 
 	unsigned long		flags;
+	u8			*processor_version;
+	u8			proc_ver_strno;
+	bool			force_full_page_rx_buffer;
 };
 
 static inline bool mana_gd_is_mana(struct gdma_dev *gd)
-- 
2.34.1
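To illustrate the 1-based SMBIOS string numbering that mana_dmi_string_nosave() relies on, here is a userspace port of that helper together with the exact-match quirk check. It takes a pointer to the string set rather than a struct dmi_header, and assumes, as the kernel's DMI walker arranges before invoking a callback, that the set ends in a double NUL; string_at() and needs_quirk() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Port of mana_dmi_string_nosave(): 'strings' points at the string set that
 * follows a structure's formatted area and ends with a double NUL. */
static const char *string_at(const char *strings, unsigned char s)
{
	const char *bp = strings;

	if (!s)
		return NULL;		/* string number 0 means "no string" */

	/* String numbers start at 1: skip s - 1 strings. */
	while (--s > 0 && *bp)
		bp += strlen(bp) + 1;

	return *bp ? bp : NULL;	/* NULL if the index ran past the last string */
}

static const char *const quirk_tbl[] = { "Cobalt 200" };

/* Exact, case-sensitive match, as in mana_needs_single_rxbuf_per_page(). */
static int needs_quirk(const char *ver)
{
	size_t i;

	for (i = 0; i < sizeof(quirk_tbl) / sizeof(quirk_tbl[0]); i++)
		if (!strcmp(ver, quirk_tbl[i]))
			return 1;
	return 0;
}
```

Because the match is exact, a firmware string with trailing whitespace or a different suffix will not trigger the quirk.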


^ permalink raw reply related

* RE: [PATCH v2] hv_utils: Allow implicit ICTIMESYNCFLAG_SYNC
From: David Balazic @ 2026-03-06 14:25 UTC (permalink / raw)
  To: pmartincic@linux.microsoft.com, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Dexuan Cui, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20231127213524.52783-1-pmartincic@linux.microsoft.com>

On Mon, Nov 27, 2023 at 01:35:24PM -0800, pmartincic@linux.microsoft.com wrote:

> From: Peter Martincic <pmartincic@microsoft.com>
>
> Hyper-V hosts can omit the _SYNC flag due to a bug on resume from modern
> suspend. In such a case, the guest may fail to update its time-of-day to
> account for the period when it was suspended, and could proceed with a
> significantly wrong time-of-day. In such a case when the guest is significantly
> behind, fix it by treating a _SAMPLE the same as if _SYNC was received so that
> the guest time-of-day is updated.
>
> This is hidden behind param hv_utils.timesync_implicit.
>
> Signed-off-by: Peter Martincic <pmartincic@microsoft.com>

Hi!

As Peter's email address no longer seems to exist, I'll ask here (LKML and LHV), if that's all right:

Is there any news about this bug on Hyper-V hosts?
If this is actually a Hyper-V (host) bug, was it ever addressed and fixed?

I encounter this bug with a Windows 11 Enterprise (25H2) host and Oracle Linux 8 and 9 guests.
They use older kernels, so I can't use this parameter to work around it.

Windows guests do not seem to be affected by it, in my experience.

Feel free to reply off list, as it is not really on topic.

Regards,
David Balažic
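The workaround described in the quoted commit message can be pictured as a small decision function. This is a hypothetical sketch, not the merged hv_utils code: the flag values, the function name and the "significantly behind" threshold are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag values for the timesync messages. */
#define TS_FLAG_SYNC   0x1
#define TS_FLAG_SAMPLE 0x2

/* Hypothetical: should the guest step its clock to the host time?
 * lag_ns is host time minus guest time (positive when the guest is behind). */
static bool should_step_clock(uint8_t flags, bool implicit_sync,
			      int64_t lag_ns, int64_t threshold_ns)
{
	if (flags & TS_FLAG_SYNC)
		return true;
	/* Workaround: treat a _SAMPLE like a _SYNC if the guest is far behind. */
	return implicit_sync && (flags & TS_FLAG_SAMPLE) && lag_ns > threshold_ns;
}
```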




^ permalink raw reply

* Re: [v2,net-next,1/6] net: mana: Create separate EQs for each vPort
From: Simon Horman @ 2026-03-06 16:08 UTC (permalink / raw)
  To: longli
  Cc: Simon Horman, davem, edumazet, haiyangz, linux-rdma, linux-hyperv,
	kuba, wei.liu, shradhagupta, pabeni, kotaranov, schakrabarti,
	erick.archer, linux-kernel, kys, decui, netdev
In-Reply-To: <20260304000017.333312-2-longli@microsoft.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.

For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
net: mana: Create separate EQs for each vPort

To prepare for assigning vPorts to dedicated MSI-X vectors, this commit
removes EQ sharing among vPorts and creates dedicated EQs for each vPort.
The EQ definition is moved from struct mana_context to struct
mana_port_context and related support functions are updated.

> diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
> --- a/include/net/mana/mana.h
> +++ b/include/net/mana/mana.h

[ ... ]

> @@ -499,6 +497,9 @@ struct mana_port_context {
>
>  	u8 mac_addr[ETH_ALEN];
>
> +	struct mana_eq *eqs;
> +	struct dentry *mana_eqs_debugfs;
> +
>  	enum TRI_STATE rss_state;
>
>  	mana_handle_t default_rxobj;

[ ... ]

> diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
> --- a/drivers/infiniband/hw/mana/qp.c
> +++ b/drivers/infiniband/hw/mana/qp.c
> @@ -188,7 +188,7 @@ struct mana_obj_spec cq_spec = {};
>  		cq_spec.gdma_region = cq->queue.gdma_region;
>  		cq_spec.queue_size = cq->cqe * COMP_ENTRY_SIZE;
>  		cq_spec.modr_ctx_id = 0;
> -		eq = &mpc->ac->eqs[cq->comp_vector];
> +		eq = &mpc->eqs[cq->comp_vector % mpc->num_queues];
                           ^^^^^^^
>  		cq_spec.attached_eq = eq->eq->id;

Can mpc->eqs be NULL here? Looking at the initialization paths:

Before this patch, mpc->ac->eqs was allocated during mana_probe() and was
always non-NULL after driver initialization.

After this patch, mpc->eqs is only allocated in two places:
1. mana_alloc_queues() - called when the Ethernet interface is brought up
2. mana_ib_cfg_vport() - called when RDMA RAW QPs configure the vport

However, mana_ib_create_qp_rss() doesn't call mana_ib_cfg_vport(). If an
RSS QP is created via RDMA verbs before the Ethernet port is brought up
and before any RAW QP has configured the vport, mpc->eqs would still be
NULL.

Computing &mpc->eqs[cq->comp_vector % mpc->num_queues] from a NULL pointer
produces an invalid address, and the subsequent dereference eq->eq->id
would crash the kernel.

Should there be a NULL check here, or should mana_ib_create_qp_rss()
ensure EQs are created before reaching this point?

>
>  		ret = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_RQ,

[ ... ]

^ permalink raw reply

* RE: [EXTERNAL] Re: [v2,net-next,1/6] net: mana: Create separate EQs for each vPort
From: Long Li @ 2026-03-06 20:45 UTC (permalink / raw)
  To: Simon Horman
  Cc: davem@davemloft.net, edumazet@google.com, Haiyang Zhang,
	linux-rdma@vger.kernel.org, linux-hyperv@vger.kernel.org,
	kuba@kernel.org, wei.liu@kernel.org,
	shradhagupta@linux.microsoft.com, pabeni@redhat.com,
	Konstantin Taranov, schakrabarti@linux.microsoft.com,
	erick.archer@outlook.com, linux-kernel@vger.kernel.org,
	KY Srinivasan, Dexuan Cui, netdev@vger.kernel.org
In-Reply-To: <20260306160820.525640-1-horms@kernel.org>

> Subject: [EXTERNAL] Re: [v2,net-next,1/6] net: mana: Create separate EQs for
> each vPort
> 
> This is an AI-generated review of your patch. The human sending this email has
> considered the AI review valid, or at least plausible.
> 
> For local reproduction steps see
> https://netdev-ai.bots.linux.dev/ai-local.html
> ---
> net: mana: Create separate EQs for each vPort
> 
> To prepare for assigning vPorts to dedicated MSI-X vectors, this commit
> removes EQ sharing among vPorts and creates dedicated EQs for each vPort.
> The EQ definition is moved from struct mana_context to struct
> mana_port_context and related support functions are updated.
> 
> > diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
> > --- a/include/net/mana/mana.h
> > +++ b/include/net/mana/mana.h
> 
> [ ... ]
> 
> > @@ -499,6 +497,9 @@ struct mana_port_context {
> >
> >  	u8 mac_addr[ETH_ALEN];
> >
> > +	struct mana_eq *eqs;
> > +	struct dentry *mana_eqs_debugfs;
> > +
> >  	enum TRI_STATE rss_state;
> >
> >  	mana_handle_t default_rxobj;
> 
> [ ... ]
> 
> > diff --git a/drivers/infiniband/hw/mana/qp.c
> > b/drivers/infiniband/hw/mana/qp.c
> > --- a/drivers/infiniband/hw/mana/qp.c
> > +++ b/drivers/infiniband/hw/mana/qp.c
> > @@ -188,7 +188,7 @@ struct mana_obj_spec cq_spec = {};
> >  		cq_spec.gdma_region = cq->queue.gdma_region;
> >  		cq_spec.queue_size = cq->cqe * COMP_ENTRY_SIZE;
> >  		cq_spec.modr_ctx_id = 0;
> > -		eq = &mpc->ac->eqs[cq->comp_vector];
> > +		eq = &mpc->eqs[cq->comp_vector % mpc->num_queues];
>                            ^^^^^^^
> >  		cq_spec.attached_eq = eq->eq->id;
> 
> Can mpc->eqs be NULL here? Looking at the initialization paths:
> 
> Before this patch, mpc->ac->eqs was allocated during mana_probe() and was
> always non-NULL after driver initialization.
> 
> After this patch, mpc->eqs is only allocated in two places:
> 1. mana_alloc_queues() - called when the Ethernet interface is brought up
> 2. mana_ib_cfg_vport() - called when RDMA RAW QPs configure the vport
> 
> However, mana_ib_create_qp_rss() doesn't call mana_ib_cfg_vport(). If an RSS
> QP is created via RDMA verbs before the Ethernet port is brought up and
> before any RAW QP has configured the vport, mpc->eqs would still be NULL.
> 
> Computing &mpc->eqs[cq->comp_vector % mpc->num_queues] from a NULL
> pointer produces an invalid address, and the subsequent dereference
> eq->eq->id would crash the kernel.
> 
> Should there be a NULL check here, or should mana_ib_create_qp_rss() ensure
> EQs are created before reaching this point?

I'm fixing it in v3.

Thank you,
Long


^ permalink raw reply

* [PATCH net-next] net: mana: hardening: Validate doorbell ID from GDMA_REGISTER_DEVICE response
From: Erni Sri Satya Vennela @ 2026-03-06 21:12 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, kotaranov, horms, shradhagupta,
	dipayanroy, yury.norov, kees, linux-hyperv, netdev, linux-kernel,
	linux-rdma
  Cc: Erni Sri Satya Vennela

As part of the MANA hardening for CVMs, add validation for the doorbell
ID (db_id) received from hardware in the GDMA_REGISTER_DEVICE response
to prevent out-of-bounds memory access when calculating the doorbell
page address.

In mana_gd_ring_doorbell(), the doorbell page address is calculated as:
  addr = db_page_base + db_page_size * db_index
       = (bar0_va + db_page_off) + db_page_size * db_index

The hardware could return values that cause this address to fall outside
the BAR0 MMIO region. In Confidential VM environments, hardware responses
cannot be fully trusted.

Add the following validations:
- Store the BAR0 size (bar0_size) in gdma_context during probe.
- Validate that the doorbell page offset (db_page_off) read from device
  registers lies within BAR0 (is less than bar0_size) during
  initialization, converting mana_gd_init_registers() to return an error code.
- Validate db_id from GDMA_REGISTER_DEVICE response against the
  maximum number of doorbell pages that fit within BAR0.

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
---
 .../net/ethernet/microsoft/mana/gdma_main.c   | 60 ++++++++++++++-----
 include/net/mana/gdma.h                       |  4 +-
 2 files changed, 49 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index aef8612b73cb..ef0dbfaac8f4 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -39,49 +39,66 @@ static u64 mana_gd_r64(struct gdma_context *g, u64 offset)
 	return readq(g->bar0_va + offset);
 }
 
-static void mana_gd_init_pf_regs(struct pci_dev *pdev)
+static int mana_gd_init_pf_regs(struct pci_dev *pdev)
 {
 	struct gdma_context *gc = pci_get_drvdata(pdev);
 	void __iomem *sriov_base_va;
 	u64 sriov_base_off;
 
 	gc->db_page_size = mana_gd_r32(gc, GDMA_PF_REG_DB_PAGE_SIZE) & 0xFFFF;
-	gc->db_page_base = gc->bar0_va +
-				mana_gd_r64(gc, GDMA_PF_REG_DB_PAGE_OFF);
+	gc->db_page_off = mana_gd_r64(gc, GDMA_PF_REG_DB_PAGE_OFF);
 
-	gc->phys_db_page_base = gc->bar0_pa +
-				mana_gd_r64(gc, GDMA_PF_REG_DB_PAGE_OFF);
+	/* Validate doorbell offset is within BAR0 */
+	if (gc->db_page_off >= gc->bar0_size) {
+		dev_err(gc->dev,
+			"Doorbell offset 0x%llx exceeds BAR0 size 0x%llx\n",
+			gc->db_page_off, (u64)gc->bar0_size);
+		return -EPROTO;
+	}
+
+	gc->db_page_base = gc->bar0_va + gc->db_page_off;
+	gc->phys_db_page_base = gc->bar0_pa + gc->db_page_off;
 
 	sriov_base_off = mana_gd_r64(gc, GDMA_SRIOV_REG_CFG_BASE_OFF);
 
 	sriov_base_va = gc->bar0_va + sriov_base_off;
 	gc->shm_base = sriov_base_va +
 			mana_gd_r64(gc, sriov_base_off + GDMA_PF_REG_SHM_OFF);
+
+	return 0;
 }
 
-static void mana_gd_init_vf_regs(struct pci_dev *pdev)
+static int mana_gd_init_vf_regs(struct pci_dev *pdev)
 {
 	struct gdma_context *gc = pci_get_drvdata(pdev);
 
 	gc->db_page_size = mana_gd_r32(gc, GDMA_REG_DB_PAGE_SIZE) & 0xFFFF;
+	gc->db_page_off = mana_gd_r64(gc, GDMA_REG_DB_PAGE_OFFSET);
 
-	gc->db_page_base = gc->bar0_va +
-				mana_gd_r64(gc, GDMA_REG_DB_PAGE_OFFSET);
+	/* Validate doorbell offset is within BAR0 */
+	if (gc->db_page_off >= gc->bar0_size) {
+		dev_err(gc->dev,
+			"Doorbell offset 0x%llx exceeds BAR0 size 0x%llx\n",
+			gc->db_page_off, (u64)gc->bar0_size);
+		return -EPROTO;
+	}
 
-	gc->phys_db_page_base = gc->bar0_pa +
-				mana_gd_r64(gc, GDMA_REG_DB_PAGE_OFFSET);
+	gc->db_page_base = gc->bar0_va + gc->db_page_off;
+	gc->phys_db_page_base = gc->bar0_pa + gc->db_page_off;
 
 	gc->shm_base = gc->bar0_va + mana_gd_r64(gc, GDMA_REG_SHM_OFFSET);
+
+	return 0;
 }
 
-static void mana_gd_init_registers(struct pci_dev *pdev)
+static int mana_gd_init_registers(struct pci_dev *pdev)
 {
 	struct gdma_context *gc = pci_get_drvdata(pdev);
 
 	if (gc->is_pf)
-		mana_gd_init_pf_regs(pdev);
+		return mana_gd_init_pf_regs(pdev);
 	else
-		mana_gd_init_vf_regs(pdev);
+		return mana_gd_init_vf_regs(pdev);
 }
 
 /* Suppress logging when we set timeout to zero */
@@ -1256,6 +1273,17 @@ int mana_gd_register_device(struct gdma_dev *gd)
 		return err ? err : -EPROTO;
 	}
 
+	/* Validate that doorbell page for db_id is within the BAR0 region.
+	 * In mana_gd_ring_doorbell(), the address is calculated as:
+	 *   addr = db_page_base + db_page_size * db_id
+	 *        = (bar0_va + db_page_off) + (db_page_size * db_id)
+	 * So we need: db_page_off + db_page_size * (db_id + 1) <= bar0_size
+	 */
+	if (gc->db_page_off + gc->db_page_size * ((u64)resp.db_id + 1) > gc->bar0_size) {
+		dev_err(gc->dev, "Doorbell ID %u out of range\n", resp.db_id);
+		return -EPROTO;
+	}
+
 	gd->pdid = resp.pdid;
 	gd->gpa_mkey = resp.gpa_mkey;
 	gd->doorbell = resp.db_id;
@@ -1890,7 +1918,10 @@ static int mana_gd_setup(struct pci_dev *pdev)
 	struct gdma_context *gc = pci_get_drvdata(pdev);
 	int err;
 
-	mana_gd_init_registers(pdev);
+	err = mana_gd_init_registers(pdev);
+	if (err)
+		return err;
+
 	mana_smc_init(&gc->shm_channel, gc->dev, gc->shm_base);
 
 	gc->service_wq = alloc_ordered_workqueue("gdma_service_wq", 0);
@@ -1996,6 +2027,7 @@ static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	mutex_init(&gc->eq_test_event_mutex);
 	pci_set_drvdata(pdev, gc);
 	gc->bar0_pa = pci_resource_start(pdev, 0);
+	gc->bar0_size = pci_resource_len(pdev, 0);
 
 	bar0_va = pci_iomap(pdev, bar, 0);
 	if (!bar0_va)
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index ec17004b10c0..7fe3a1b61b2d 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -421,10 +421,12 @@ struct gdma_context {
 
 	phys_addr_t		bar0_pa;
 	void __iomem		*bar0_va;
+	resource_size_t		bar0_size;
 	void __iomem		*shm_base;
 	void __iomem		*db_page_base;
 	phys_addr_t		phys_db_page_base;
-	u32 db_page_size;
+	u64 db_page_off;
+	u64 db_page_size;
 	int                     numa_node;
 
 	/* Shared memory chanenl (used to bootstrap HWC) */
-- 
2.34.1
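The range check added to mana_gd_register_device() can be exercised on its own. Below is a minimal userspace sketch with made-up numbers. With db_page_off = 0x1000, db_page_size = 0x1000 and a 64 KiB BAR0, doorbell page k spans [0x1000 + 0x1000*k, 0x1000 + 0x1000*(k+1)), so the highest valid db_id is 14. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same condition as the patch: doorbell page db_id must end inside BAR0. */
static bool db_page_in_bar(uint64_t db_page_off, uint64_t db_page_size,
			   uint32_t db_id, uint64_t bar0_size)
{
	if (db_page_off >= bar0_size)	/* checked at init time in the patch */
		return false;
	return db_page_off + db_page_size * ((uint64_t)db_id + 1) <= bar0_size;
}
```

The patch masks db_page_size to 16 bits and, assuming db_id is 32 bits wide (as the %u format suggests), the product stays below 2^48. With db_page_off already known to be below bar0_size, the 64-bit sum cannot overflow for realistic BAR0 sizes.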


^ permalink raw reply related

* [PATCH net-next v3 0/6] net: mana: Per-vPort EQ and MSI-X interrupt management
From: Long Li @ 2026-03-06 21:32 UTC (permalink / raw)
  To: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Shradha Gupta, Simon Horman, Konstantin Taranov,
	Souradeep Chakrabarti, Erick Archer, linux-hyperv, netdev,
	linux-kernel, linux-rdma
  Cc: Long Li

This series adds per-vPort Event Queue (EQ) allocation and MSI-X interrupt
management for the MANA driver. Previously, all vPorts shared a single set
of EQs. This change enables dedicated EQs per vPort with support for both
dedicated and shared MSI-X vector allocation modes.

Patch 1 moves EQ ownership from mana_context to per-vPort mana_port_context
and exports create/destroy functions for the RDMA driver.

Patch 2 adds device capability queries to determine whether MSI-X vectors
should be dedicated per-vPort or shared. When the number of available MSI-X
vectors is insufficient for dedicated allocation, the driver enables sharing
mode with bitmap-based vector assignment.

Patch 3 introduces the GIC (GDMA IRQ Context) abstraction with reference
counting, allowing multiple EQs to safely share a single MSI-X vector.

Patch 4 converts the global EQ allocation in probe/resume to use the new
GIC functions.

Patch 5 adds per-vPort GIC lifecycle management, calling get/put on each
EQ creation and destruction during vPort open/close.

Patch 6 extends the same GIC lifecycle management to the RDMA driver's EQ
allocation path.

Tested on Azure VMs with both MSI-X sharing mode 0 (dedicated) and mode 1
(shared): NIC down/up tests, iperf3 traffic tests up to 181 Gbps.

Changes in v3:
- Rebased on net-next/main
- Patch 1: Added NULL check for mpc->eqs in mana_ib_create_qp_rss() to
  prevent NULL pointer dereference when RSS QP is created before a raw QP
  has configured the vport and allocated EQs

Changes in v2:
- Rebased on net-next/main (adapted to kzalloc_objs/kzalloc_obj macros,
  new GDMA_DRV_CAP_FLAG definitions)
- Patch 2: Fixed misleading comment for max_num_queues vs
  max_num_queues_vport in gdma.h
- Patch 3: Fixed spelling typo in gdma_main.c ("difference" -> "different")

Long Li (6):
  net: mana: Create separate EQs for each vPort
  net: mana: Query device capabilities and configure MSI-X sharing for
    EQs
  net: mana: Introduce GIC context with refcounting for interrupt
    management
  net: mana: Use GIC functions to allocate global EQs
  net: mana: Allocate interrupt context for each EQ when creating vPort
  RDMA/mana_ib: Allocate interrupt contexts on EQs

 drivers/infiniband/hw/mana/main.c             |  47 ++-
 drivers/infiniband/hw/mana/qp.c               |  12 +-
 .../net/ethernet/microsoft/mana/gdma_main.c   | 309 +++++++++++++-----
 drivers/net/ethernet/microsoft/mana/mana_en.c | 162 +++++----
 include/net/mana/gdma.h                       |  31 +-
 include/net/mana/mana.h                       |   7 +-
 6 files changed, 410 insertions(+), 158 deletions(-)

-- 
2.43.0



* [PATCH net-next v3 1/6] net: mana: Create separate EQs for each vPort
From: Long Li @ 2026-03-06 21:32 UTC (permalink / raw)
  To: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Shradha Gupta, Simon Horman, Konstantin Taranov,
	Souradeep Chakrabarti, Erick Archer, linux-hyperv, netdev,
	linux-kernel, linux-rdma
  Cc: Long Li
In-Reply-To: <20260306213302.544681-1-longli@microsoft.com>

To prepare for assigning vPorts to dedicated MSI-X vectors, remove EQ
sharing among the vPorts and create dedicated EQs for each vPort.

Move the EQ definition from struct mana_context to struct mana_port_context
and update related support functions. Export mana_create_eq() and
mana_destroy_eq() for use by the MANA RDMA driver.
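Once EQs belong to the vPort, `mana_alloc_queues()` creates them between vPort configuration and TXQ creation and unwinds in reverse order on failure. The model below traces that ordering as described by this patch's diff. `step()`, `open_port()`, `close_port()` and the trace letters are hypothetical stand-ins, and `"ntx"` represents `netif_set_real_num_tx_queues()`.

```c
#include <assert.h>
#include <string.h>

/* Teardown steps that ran, in order: R=RXQs, T=TXQs, E=EQs, V=vPort. */
static char trace[16];

/* Hypothetical bring-up step; fails when it is the one named by fail_at. */
static int step(const char *name, const char *fail_at)
{
	return (fail_at && strcmp(name, fail_at) == 0) ? -1 : 0;
}

/* Bring-up order: vPort, EQs, TXQs, real TX queue count, RXQs. Each failure
 * jumps to the label that undoes what already exists; labels fall through. */
static int open_port(const char *fail_at)
{
	int err;

	trace[0] = '\0';
	err = step("vport", fail_at);
	if (err)
		return err;
	err = step("eq", fail_at);
	if (err)
		goto destroy_vport;
	err = step("txq", fail_at);
	if (err)
		goto destroy_eq;
	err = step("ntx", fail_at);
	if (err)
		goto destroy_txq;
	err = step("rxq", fail_at); /* partially created RXQs are cleaned up too */
	if (err)
		goto destroy_rxq;
	return 0;

destroy_rxq:
	strcat(trace, "R");
destroy_txq:
	strcat(trace, "T");
destroy_eq:
	strcat(trace, "E");
destroy_vport:
	strcat(trace, "V");
	return err;
}

/* Close follows the same order as mana_dealloc_queues(). */
static void close_port(void)
{
	trace[0] = '\0';
	strcat(trace, "R"); /* RXQs */
	strcat(trace, "T"); /* TXQs */
	strcat(trace, "E"); /* EQs */
	strcat(trace, "V"); /* vPort */
}
```

The fall-through labels ensure that EQs are never destroyed while a TXQ or RXQ whose CQ points at them still exists.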

Signed-off-by: Long Li <longli@microsoft.com>
---
Changes in v3:
- Added NULL check for mpc->eqs in mana_ib_create_qp_rss() to prevent
  kernel crash when RSS QP is created before EQs are allocated
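The guarded lookup can be summarized as follows: fail cleanly when the vPort has no EQs yet, otherwise fold the CQ's completion vector onto the vPort's EQ array with `comp_vector % num_queues`, as the diff does. `struct eq_sketch`, `struct port_sketch` and `pick_eq()` are hypothetical stand-ins for the driver's types.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mana_eq and struct mana_port_context. */
struct eq_sketch {
	unsigned int id;
};

struct port_sketch {
	struct eq_sketch *eqs; /* NULL until a raw QP has configured the vPort */
	unsigned int num_queues;
};

/* Select the EQ for a CQ's completion vector. Returns -EINVAL rather than
 * dereferencing NULL when no EQs have been allocated yet. */
static int pick_eq(const struct port_sketch *port, unsigned int comp_vector,
		   const struct eq_sketch **out)
{
	if (!port->eqs || port->num_queues == 0)
		return -EINVAL;
	*out = &port->eqs[comp_vector % port->num_queues];
	return 0;
}
```

A completion vector of 6 on a vPort with four EQs maps to EQ 2.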
---
 drivers/infiniband/hw/mana/main.c             |  14 ++-
 drivers/infiniband/hw/mana/qp.c               |  12 +-
 drivers/net/ethernet/microsoft/mana/mana_en.c | 109 ++++++++++--------
 include/net/mana/mana.h                       |   7 +-
 4 files changed, 90 insertions(+), 52 deletions(-)

diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index 8d99cd00f002..d51dd0ee85f4 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -20,8 +20,10 @@ void mana_ib_uncfg_vport(struct mana_ib_dev *dev, struct mana_ib_pd *pd,
 	pd->vport_use_count--;
 	WARN_ON(pd->vport_use_count < 0);
 
-	if (!pd->vport_use_count)
+	if (!pd->vport_use_count) {
+		mana_destroy_eq(mpc);
 		mana_uncfg_vport(mpc);
+	}
 
 	mutex_unlock(&pd->vport_mutex);
 }
@@ -55,15 +57,21 @@ int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port, struct mana_ib_pd *pd,
 		return err;
 	}
 
-	mutex_unlock(&pd->vport_mutex);
 
 	pd->tx_shortform_allowed = mpc->tx_shortform_allowed;
 	pd->tx_vp_offset = mpc->tx_vp_offset;
+	err = mana_create_eq(mpc);
+	if (err) {
+		mana_uncfg_vport(mpc);
+		pd->vport_use_count--;
+	}
+
+	mutex_unlock(&pd->vport_mutex);
 
 	ibdev_dbg(&dev->ib_dev, "vport handle %llx pdid %x doorbell_id %x\n",
 		  mpc->port_handle, pd->pdn, doorbell_id);
 
-	return 0;
+	return err;
 }
 
 int mana_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
index 82f84f7ad37a..039c86eca0dc 100644
--- a/drivers/infiniband/hw/mana/qp.c
+++ b/drivers/infiniband/hw/mana/qp.c
@@ -188,7 +188,15 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
 		cq_spec.gdma_region = cq->queue.gdma_region;
 		cq_spec.queue_size = cq->cqe * COMP_ENTRY_SIZE;
 		cq_spec.modr_ctx_id = 0;
-		eq = &mpc->ac->eqs[cq->comp_vector];
+		/* EQs are created when a raw QP configures the vport.
+		 * A raw QP must be created before creating rwq_ind_tbl.
+		 */
+		if (!mpc->eqs) {
+			ret = -EINVAL;
+			i--;
+			goto fail;
+		}
+		eq = &mpc->eqs[cq->comp_vector % mpc->num_queues];
 		cq_spec.attached_eq = eq->eq->id;
 
 		ret = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_RQ,
@@ -340,7 +348,7 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
 	cq_spec.queue_size = send_cq->cqe * COMP_ENTRY_SIZE;
 	cq_spec.modr_ctx_id = 0;
 	eq_vec = send_cq->comp_vector;
-	eq = &mpc->ac->eqs[eq_vec];
+	eq = &mpc->eqs[eq_vec % mpc->num_queues];
 	cq_spec.attached_eq = eq->eq->id;
 
 	err = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_SQ, &wq_spec,
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 56ee993e3a43..428dafaf315b 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1588,78 +1588,82 @@ void mana_destroy_wq_obj(struct mana_port_context *apc, u32 wq_type,
 }
 EXPORT_SYMBOL_NS(mana_destroy_wq_obj, "NET_MANA");
 
-static void mana_destroy_eq(struct mana_context *ac)
+void mana_destroy_eq(struct mana_port_context *apc)
 {
+	struct mana_context *ac = apc->ac;
 	struct gdma_context *gc = ac->gdma_dev->gdma_context;
 	struct gdma_queue *eq;
 	int i;
 
-	if (!ac->eqs)
+	if (!apc->eqs)
 		return;
 
-	debugfs_remove_recursive(ac->mana_eqs_debugfs);
-	ac->mana_eqs_debugfs = NULL;
+	debugfs_remove_recursive(apc->mana_eqs_debugfs);
+	apc->mana_eqs_debugfs = NULL;
 
-	for (i = 0; i < gc->max_num_queues; i++) {
-		eq = ac->eqs[i].eq;
+	for (i = 0; i < apc->num_queues; i++) {
+		eq = apc->eqs[i].eq;
 		if (!eq)
 			continue;
 
 		mana_gd_destroy_queue(gc, eq);
 	}
 
-	kfree(ac->eqs);
-	ac->eqs = NULL;
+	kfree(apc->eqs);
+	apc->eqs = NULL;
 }
+EXPORT_SYMBOL_NS(mana_destroy_eq, "NET_MANA");
 
-static void mana_create_eq_debugfs(struct mana_context *ac, int i)
+static void mana_create_eq_debugfs(struct mana_port_context *apc, int i)
 {
-	struct mana_eq eq = ac->eqs[i];
+	struct mana_eq eq = apc->eqs[i];
 	char eqnum[32];
 
 	sprintf(eqnum, "eq%d", i);
-	eq.mana_eq_debugfs = debugfs_create_dir(eqnum, ac->mana_eqs_debugfs);
+	eq.mana_eq_debugfs = debugfs_create_dir(eqnum, apc->mana_eqs_debugfs);
 	debugfs_create_u32("head", 0400, eq.mana_eq_debugfs, &eq.eq->head);
 	debugfs_create_u32("tail", 0400, eq.mana_eq_debugfs, &eq.eq->tail);
 	debugfs_create_file("eq_dump", 0400, eq.mana_eq_debugfs, eq.eq, &mana_dbg_q_fops);
 }
 
-static int mana_create_eq(struct mana_context *ac)
+int mana_create_eq(struct mana_port_context *apc)
 {
-	struct gdma_dev *gd = ac->gdma_dev;
+	struct gdma_dev *gd = apc->ac->gdma_dev;
 	struct gdma_context *gc = gd->gdma_context;
 	struct gdma_queue_spec spec = {};
 	int err;
 	int i;
 
-	ac->eqs = kzalloc_objs(struct mana_eq, gc->max_num_queues);
-	if (!ac->eqs)
+	WARN_ON(apc->eqs);
+	apc->eqs = kzalloc_objs(struct mana_eq, apc->num_queues);
+	if (!apc->eqs)
 		return -ENOMEM;
 
 	spec.type = GDMA_EQ;
 	spec.monitor_avl_buf = false;
 	spec.queue_size = EQ_SIZE;
 	spec.eq.callback = NULL;
-	spec.eq.context = ac->eqs;
+	spec.eq.context = apc->eqs;
 	spec.eq.log2_throttle_limit = LOG2_EQ_THROTTLE;
 
-	ac->mana_eqs_debugfs = debugfs_create_dir("EQs", gc->mana_pci_debugfs);
+	apc->mana_eqs_debugfs = debugfs_create_dir("EQs", apc->mana_port_debugfs);
 
-	for (i = 0; i < gc->max_num_queues; i++) {
+	for (i = 0; i < apc->num_queues; i++) {
 		spec.eq.msix_index = (i + 1) % gc->num_msix_usable;
-		err = mana_gd_create_mana_eq(gd, &spec, &ac->eqs[i].eq);
+		err = mana_gd_create_mana_eq(gd, &spec, &apc->eqs[i].eq);
 		if (err) {
 			dev_err(gc->dev, "Failed to create EQ %d : %d\n", i, err);
 			goto out;
 		}
-		mana_create_eq_debugfs(ac, i);
+		mana_create_eq_debugfs(apc, i);
 	}
 
 	return 0;
 out:
-	mana_destroy_eq(ac);
+	mana_destroy_eq(apc);
 	return err;
 }
+EXPORT_SYMBOL_NS(mana_create_eq, "NET_MANA");
 
 static int mana_fence_rq(struct mana_port_context *apc, struct mana_rxq *rxq)
 {
@@ -2377,7 +2381,7 @@ static int mana_create_txq(struct mana_port_context *apc,
 		spec.monitor_avl_buf = false;
 		spec.queue_size = cq_size;
 		spec.cq.callback = mana_schedule_napi;
-		spec.cq.parent_eq = ac->eqs[i].eq;
+		spec.cq.parent_eq = apc->eqs[i].eq;
 		spec.cq.context = cq;
 		err = mana_gd_create_mana_wq_cq(gd, &spec, &cq->gdma_cq);
 		if (err)
@@ -2770,13 +2774,12 @@ static void mana_create_rxq_debugfs(struct mana_port_context *apc, int idx)
 static int mana_add_rx_queues(struct mana_port_context *apc,
 			      struct net_device *ndev)
 {
-	struct mana_context *ac = apc->ac;
 	struct mana_rxq *rxq;
 	int err = 0;
 	int i;
 
 	for (i = 0; i < apc->num_queues; i++) {
-		rxq = mana_create_rxq(apc, i, &ac->eqs[i], ndev);
+		rxq = mana_create_rxq(apc, i, &apc->eqs[i], ndev);
 		if (!rxq) {
 			err = -ENOMEM;
 			netdev_err(ndev, "Failed to create rxq %d : %d\n", i, err);
@@ -2795,9 +2798,8 @@ static int mana_add_rx_queues(struct mana_port_context *apc,
 	return err;
 }
 
-static void mana_destroy_vport(struct mana_port_context *apc)
+static void mana_destroy_rxqs(struct mana_port_context *apc)
 {
-	struct gdma_dev *gd = apc->ac->gdma_dev;
 	struct mana_rxq *rxq;
 	u32 rxq_idx;
 
@@ -2809,8 +2811,12 @@ static void mana_destroy_vport(struct mana_port_context *apc)
 		mana_destroy_rxq(apc, rxq, true);
 		apc->rxqs[rxq_idx] = NULL;
 	}
+}
+
+static void mana_destroy_vport(struct mana_port_context *apc)
+{
+	struct gdma_dev *gd = apc->ac->gdma_dev;
 
-	mana_destroy_txq(apc);
 	mana_uncfg_vport(apc);
 
 	if (gd->gdma_context->is_pf && !apc->ac->bm_hostmode)
@@ -2831,11 +2837,7 @@ static int mana_create_vport(struct mana_port_context *apc,
 			return err;
 	}
 
-	err = mana_cfg_vport(apc, gd->pdid, gd->doorbell);
-	if (err)
-		return err;
-
-	return mana_create_txq(apc, net);
+	return mana_cfg_vport(apc, gd->pdid, gd->doorbell);
 }
 
 static int mana_rss_table_alloc(struct mana_port_context *apc)
@@ -3112,21 +3114,36 @@ int mana_alloc_queues(struct net_device *ndev)
 
 	err = mana_create_vport(apc, ndev);
 	if (err) {
-		netdev_err(ndev, "Failed to create vPort %u : %d\n", apc->port_idx, err);
+		netdev_err(ndev, "Failed to create vPort %u : %d\n",
+			   apc->port_idx, err);
 		return err;
 	}
 
+	err = mana_create_eq(apc);
+	if (err) {
+		netdev_err(ndev, "Failed to create EQ on vPort %u: %d\n",
+			   apc->port_idx, err);
+		goto destroy_vport;
+	}
+
+	err = mana_create_txq(apc, ndev);
+	if (err) {
+		netdev_err(ndev, "Failed to create TXQ on vPort %u: %d\n",
+			   apc->port_idx, err);
+		goto destroy_eq;
+	}
+
 	err = netif_set_real_num_tx_queues(ndev, apc->num_queues);
 	if (err) {
 		netdev_err(ndev,
 			   "netif_set_real_num_tx_queues () failed for ndev with num_queues %u : %d\n",
 			   apc->num_queues, err);
-		goto destroy_vport;
+		goto destroy_txq;
 	}
 
 	err = mana_add_rx_queues(apc, ndev);
 	if (err)
-		goto destroy_vport;
+		goto destroy_rxq;
 
 	apc->rss_state = apc->num_queues > 1 ? TRI_STATE_TRUE : TRI_STATE_FALSE;
 
@@ -3135,7 +3152,7 @@ int mana_alloc_queues(struct net_device *ndev)
 		netdev_err(ndev,
 			   "netif_set_real_num_rx_queues () failed for ndev with num_queues %u : %d\n",
 			   apc->num_queues, err);
-		goto destroy_vport;
+		goto destroy_rxq;
 	}
 
 	mana_rss_table_init(apc);
@@ -3143,19 +3160,25 @@ int mana_alloc_queues(struct net_device *ndev)
 	err = mana_config_rss(apc, TRI_STATE_TRUE, true, true);
 	if (err) {
 		netdev_err(ndev, "Failed to configure RSS table: %d\n", err);
-		goto destroy_vport;
+		goto destroy_rxq;
 	}
 
 	if (gd->gdma_context->is_pf && !apc->ac->bm_hostmode) {
 		err = mana_pf_register_filter(apc);
 		if (err)
-			goto destroy_vport;
+			goto destroy_rxq;
 	}
 
 	mana_chn_setxdp(apc, mana_xdp_get(apc));
 
 	return 0;
 
+destroy_rxq:
+	mana_destroy_rxqs(apc);
+destroy_txq:
+	mana_destroy_txq(apc);
+destroy_eq:
+	mana_destroy_eq(apc);
 destroy_vport:
 	mana_destroy_vport(apc);
 	return err;
@@ -3258,6 +3281,9 @@ static int mana_dealloc_queues(struct net_device *ndev)
 		netdev_err(ndev, "Failed to disable vPort: %d\n", err);
 
 	/* Even in err case, still need to cleanup the vPort */
+	mana_destroy_rxqs(apc);
+	mana_destroy_txq(apc);
+	mana_destroy_eq(apc);
 	mana_destroy_vport(apc);
 
 	return 0;
@@ -3572,12 +3598,6 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
 		gd->driver_data = ac;
 	}
 
-	err = mana_create_eq(ac);
-	if (err) {
-		dev_err(dev, "Failed to create EQs: %d\n", err);
-		goto out;
-	}
-
 	err = mana_query_device_cfg(ac, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
 				    MANA_MICRO_VERSION, &num_ports, &bm_hostmode);
 	if (err)
@@ -3716,7 +3736,6 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
 		free_netdev(ndev);
 	}
 
-	mana_destroy_eq(ac);
 out:
 	if (ac->per_port_queue_reset_wq) {
 		destroy_workqueue(ac->per_port_queue_reset_wq);
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index a078af283bdd..787e637059df 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -478,8 +478,6 @@ struct mana_context {
 	u8 bm_hostmode;
 
 	struct mana_ethtool_hc_stats hc_stats;
-	struct mana_eq *eqs;
-	struct dentry *mana_eqs_debugfs;
 	struct workqueue_struct *per_port_queue_reset_wq;
 	/* Workqueue for querying hardware stats */
 	struct delayed_work gf_stats_work;
@@ -499,6 +497,9 @@ struct mana_port_context {
 
 	u8 mac_addr[ETH_ALEN];
 
+	struct mana_eq *eqs;
+	struct dentry *mana_eqs_debugfs;
+
 	enum TRI_STATE rss_state;
 
 	mana_handle_t default_rxobj;
@@ -1023,6 +1024,8 @@ void mana_destroy_wq_obj(struct mana_port_context *apc, u32 wq_type,
 int mana_cfg_vport(struct mana_port_context *apc, u32 protection_dom_id,
 		   u32 doorbell_pg_id);
 void mana_uncfg_vport(struct mana_port_context *apc);
+int mana_create_eq(struct mana_port_context *apc);
+void mana_destroy_eq(struct mana_port_context *apc);
 
 struct net_device *mana_get_primary_netdev(struct mana_context *ac,
 					   u32 port_index,
-- 
2.43.0



* [PATCH net-next v3 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs
From: Long Li @ 2026-03-06 21:32 UTC (permalink / raw)
  To: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Shradha Gupta, Simon Horman, Konstantin Taranov,
	Souradeep Chakrabarti, Erick Archer, linux-hyperv, netdev,
	linux-kernel, linux-rdma
  Cc: Long Li
In-Reply-To: <20260306213302.544681-1-longli@microsoft.com>

When querying the device, adjust the maximum number of queues so that
each vPort can have dedicated MSI-X vectors. The per-vPort queue count
is rounded up to a power of two and clamped to no less than 16, but it
never exceeds the hardware maximum. MSI-X sharing among vPorts is
disabled by default and is enabled only when there are not enough MSI-X
vectors for dedicated allocation.
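The sizing rule in `mana_gd_query_max_resources()` can be checked with a few worked cases. The function below is a userspace transcription of that arithmetic. `plan_queues()`, `struct queue_plan` and `roundup_pow2()` are hypothetical names; `hw_max` is assumed to be `gc->max_num_queues` after it has already been capped to `num_msix_usable - 1`. The separate path in `mana_gd_setup_hwc_irqs()` that forces sharing is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Smallest power of two >= n for n >= 1 (stand-in for roundup_pow_of_two()). */
static unsigned int roundup_pow2(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

struct queue_plan {
	unsigned int per_vport; /* plays the role of gc->max_num_queues_vport */
	bool msi_sharing;       /* plays the role of gc->msi_sharing */
};

static struct queue_plan plan_queues(unsigned int hw_max,
				     unsigned int msix_usable,
				     unsigned int num_ports)
{
	unsigned int avail = msix_usable - 1; /* vector 0 serves the HWC */
	struct queue_plan plan;
	unsigned int q;

	assert(num_ports > 0); /* the driver returns -EINVAL for zero ports */
	q = avail / num_ports;
	q = roundup_pow2(q > 1 ? q : 1);
	if (q < 16)
		q = 16;
	q = hw_max < q ? hw_max : q;

	/* Share vectors only when dedicated vectors would not fit. */
	plan.msi_sharing = q * num_ports > avail;
	plan.per_vport = plan.msi_sharing ? (avail < hw_max ? avail : hw_max) : q;
	return plan;
}
```

With 65 usable vectors and 2 ports, each vPort gets 32 dedicated queues (64 vectors in total). With 17 vectors and 4 ports, the 16-queue floor needs 64 vectors, so sharing is enabled. With 41 vectors and 2 ports, 20 rounds up to 32 and 64 > 40 forces sharing, after which each vPort may use all 40.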

Rename mana_query_device_cfg() to mana_gd_query_device_cfg() as it is
used at GDMA device probe time for querying device capabilities.

Signed-off-by: Long Li <longli@microsoft.com>
---
 .../net/ethernet/microsoft/mana/gdma_main.c   | 66 ++++++++++++++++---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 36 +++++-----
 include/net/mana/gdma.h                       | 13 +++-
 3 files changed, 91 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index aef8612b73cb..a6ab2f053fe9 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -107,6 +107,9 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
 	struct gdma_context *gc = pci_get_drvdata(pdev);
 	struct gdma_query_max_resources_resp resp = {};
 	struct gdma_general_req req = {};
+	unsigned int max_num_queues;
+	u8 bm_hostmode;
+	u16 num_ports;
 	int err;
 
 	mana_gd_init_req_hdr(&req.hdr, GDMA_QUERY_MAX_RESOURCES,
@@ -152,6 +155,40 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
 	if (gc->max_num_queues > gc->num_msix_usable - 1)
 		gc->max_num_queues = gc->num_msix_usable - 1;
 
+	err = mana_gd_query_device_cfg(gc, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
+				       MANA_MICRO_VERSION, &num_ports, &bm_hostmode);
+	if (err)
+		return err;
+
+	if (!num_ports)
+		return -EINVAL;
+
+	/*
+	 * Adjust gc->max_num_queues returned from the SOC to allow dedicated MSIx
+	 * for each vPort. Reduce max_num_queues to no less than 16 if necessary
+	 */
+	max_num_queues = (gc->num_msix_usable - 1) / num_ports;
+	max_num_queues = roundup_pow_of_two(max(max_num_queues, 1U));
+	if (max_num_queues < 16)
+		max_num_queues = 16;
+
+	/*
+	 * Use dedicated MSIx for EQs whenever possible, use MSIx sharing for
+	 * Ethernet EQs when (max_num_queues * num_ports > num_msix_usable - 1)
+	 */
+	max_num_queues = min(gc->max_num_queues, max_num_queues);
+	if (max_num_queues * num_ports > gc->num_msix_usable - 1)
+		gc->msi_sharing = true;
+
+	/* If MSI is shared, use max allowed value */
+	if (gc->msi_sharing)
+		gc->max_num_queues_vport = min(gc->num_msix_usable - 1, gc->max_num_queues);
+	else
+		gc->max_num_queues_vport = max_num_queues;
+
+	dev_info(gc->dev, "MSI sharing mode %d max queues %d\n",
+		 gc->msi_sharing, gc->max_num_queues);
+
 	return 0;
 }
 
@@ -1803,6 +1840,7 @@ static int mana_gd_setup_hwc_irqs(struct pci_dev *pdev)
 		/* Need 1 interrupt for HWC */
 		max_irqs = min(num_online_cpus(), MANA_MAX_NUM_QUEUES) + 1;
 		min_irqs = 2;
+		gc->msi_sharing = true;
 	}
 
 	nvec = pci_alloc_irq_vectors(pdev, min_irqs, max_irqs, PCI_IRQ_MSIX);
@@ -1881,6 +1919,8 @@ static void mana_gd_remove_irqs(struct pci_dev *pdev)
 
 	pci_free_irq_vectors(pdev);
 
+	bitmap_free(gc->msi_bitmap);
+	gc->msi_bitmap = NULL;
 	gc->max_num_msix = 0;
 	gc->num_msix_usable = 0;
 }
@@ -1912,20 +1952,30 @@ static int mana_gd_setup(struct pci_dev *pdev)
 	if (err)
 		goto destroy_hwc;
 
-	err = mana_gd_query_max_resources(pdev);
+	err = mana_gd_detect_devices(pdev);
 	if (err)
 		goto destroy_hwc;
 
-	err = mana_gd_setup_remaining_irqs(pdev);
-	if (err) {
-		dev_err(gc->dev, "Failed to setup remaining IRQs: %d", err);
-		goto destroy_hwc;
-	}
-
-	err = mana_gd_detect_devices(pdev);
+	err = mana_gd_query_max_resources(pdev);
 	if (err)
 		goto destroy_hwc;
 
+	if (!gc->msi_sharing) {
+		gc->msi_bitmap = bitmap_zalloc(gc->num_msix_usable, GFP_KERNEL);
+		if (!gc->msi_bitmap) {
+			err = -ENOMEM;
+			goto destroy_hwc;
+		}
+		/* Set bit for HWC */
+		set_bit(0, gc->msi_bitmap);
+	} else {
+		err = mana_gd_setup_remaining_irqs(pdev);
+		if (err) {
+			dev_err(gc->dev, "Failed to setup remaining IRQs: %d", err);
+			goto destroy_hwc;
+		}
+	}
+
 	dev_dbg(&pdev->dev, "mana gdma setup successful\n");
 	return 0;
 
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 428dafaf315b..bfa0f354355d 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1000,10 +1000,9 @@ static int mana_init_port_context(struct mana_port_context *apc)
 	return !apc->rxqs ? -ENOMEM : 0;
 }
 
-static int mana_send_request(struct mana_context *ac, void *in_buf,
-			     u32 in_len, void *out_buf, u32 out_len)
+static int gdma_mana_send_request(struct gdma_context *gc, void *in_buf,
+				  u32 in_len, void *out_buf, u32 out_len)
 {
-	struct gdma_context *gc = ac->gdma_dev->gdma_context;
 	struct gdma_resp_hdr *resp = out_buf;
 	struct gdma_req_hdr *req = in_buf;
 	struct device *dev = gc->dev;
@@ -1037,6 +1036,14 @@ static int mana_send_request(struct mana_context *ac, void *in_buf,
 	return 0;
 }
 
+static int mana_send_request(struct mana_context *ac, void *in_buf,
+			     u32 in_len, void *out_buf, u32 out_len)
+{
+	struct gdma_context *gc = ac->gdma_dev->gdma_context;
+
+	return gdma_mana_send_request(gc, in_buf, in_len, out_buf, out_len);
+}
+
 static int mana_verify_resp_hdr(const struct gdma_resp_hdr *resp_hdr,
 				const enum mana_command_code expected_code,
 				const u32 min_size)
@@ -1170,11 +1177,10 @@ static void mana_pf_deregister_filter(struct mana_port_context *apc)
 			   err, resp.hdr.status);
 }
 
-static int mana_query_device_cfg(struct mana_context *ac, u32 proto_major_ver,
-				 u32 proto_minor_ver, u32 proto_micro_ver,
-				 u16 *max_num_vports, u8 *bm_hostmode)
+int mana_gd_query_device_cfg(struct gdma_context *gc, u32 proto_major_ver,
+			     u32 proto_minor_ver, u32 proto_micro_ver,
+			     u16 *max_num_vports, u8 *bm_hostmode)
 {
-	struct gdma_context *gc = ac->gdma_dev->gdma_context;
 	struct mana_query_device_cfg_resp resp = {};
 	struct mana_query_device_cfg_req req = {};
 	struct device *dev = gc->dev;
@@ -1189,7 +1195,7 @@ static int mana_query_device_cfg(struct mana_context *ac, u32 proto_major_ver,
 	req.proto_minor_ver = proto_minor_ver;
 	req.proto_micro_ver = proto_micro_ver;
 
-	err = mana_send_request(ac, &req, sizeof(req), &resp, sizeof(resp));
+	err = gdma_mana_send_request(gc, &req, sizeof(req), &resp, sizeof(resp));
 	if (err) {
 		dev_err(dev, "Failed to query config: %d", err);
 		return err;
@@ -1217,8 +1223,6 @@ static int mana_query_device_cfg(struct mana_context *ac, u32 proto_major_ver,
 	else
 		*bm_hostmode = 0;
 
-	debugfs_create_u16("adapter-MTU", 0400, gc->mana_pci_debugfs, &gc->adapter_mtu);
-
 	return 0;
 }
 
@@ -3329,7 +3333,7 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
 	int err;
 
 	ndev = alloc_etherdev_mq(sizeof(struct mana_port_context),
-				 gc->max_num_queues);
+				 gc->max_num_queues_vport);
 	if (!ndev)
 		return -ENOMEM;
 
@@ -3338,8 +3342,8 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
 	apc = netdev_priv(ndev);
 	apc->ac = ac;
 	apc->ndev = ndev;
-	apc->max_queues = gc->max_num_queues;
-	apc->num_queues = gc->max_num_queues;
+	apc->max_queues = gc->max_num_queues_vport;
+	apc->num_queues = gc->max_num_queues_vport;
 	apc->tx_queue_size = DEF_TX_BUFFERS_PER_QUEUE;
 	apc->rx_queue_size = DEF_RX_BUFFERS_PER_QUEUE;
 	apc->port_handle = INVALID_MANA_HANDLE;
@@ -3598,13 +3602,15 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
 		gd->driver_data = ac;
 	}
 
-	err = mana_query_device_cfg(ac, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
-				    MANA_MICRO_VERSION, &num_ports, &bm_hostmode);
+	err = mana_gd_query_device_cfg(gc, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
+				       MANA_MICRO_VERSION, &num_ports, &bm_hostmode);
 	if (err)
 		goto out;
 
 	ac->bm_hostmode = bm_hostmode;
 
+	debugfs_create_u16("adapter-MTU", 0400, gc->mana_pci_debugfs, &gc->adapter_mtu);
+
 	if (!resuming) {
 		ac->num_ports = num_ports;
 
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index ec17004b10c0..b744253b44e8 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -399,8 +399,10 @@ struct gdma_context {
 	struct device		*dev;
 	struct dentry		*mana_pci_debugfs;
 
-	/* Per-vPort max number of queues */
+	/* Hardware max number of queues */
 	unsigned int		max_num_queues;
+	/* Per-vPort max number of queues */
+	unsigned int		max_num_queues_vport;
 	unsigned int		max_num_msix;
 	unsigned int		num_msix_usable;
 	struct xarray		irq_contexts;
@@ -444,6 +446,12 @@ struct gdma_context {
 	struct workqueue_struct *service_wq;
 
 	unsigned long		flags;
+
+	/* Indicate if this device is sharing MSI for EQs on MANA */
+	bool msi_sharing;
+
+	/* Bitmap tracks where MSI is allocated when it is not shared for EQs */
+	unsigned long *msi_bitmap;
 };
 
 static inline bool mana_gd_is_mana(struct gdma_dev *gd)
@@ -1011,4 +1019,7 @@ int mana_gd_resume(struct pci_dev *pdev);
 
 bool mana_need_log(struct gdma_context *gc, int err);
 
+int mana_gd_query_device_cfg(struct gdma_context *gc, u32 proto_major_ver,
+			     u32 proto_minor_ver, u32 proto_micro_ver,
+			     u16 *max_num_vports, u8 *bm_hostmode);
 #endif /* _GDMA_H */
-- 
2.43.0



