Linux-HyperV List


* [PATCH net v3 4/5] net: mana: Don't overwrite port probe error with add_adev result
From: Erni Sri Satya Vennela @ 2026-04-15  8:09 UTC
  To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, ernis, ssengar, dipayanroy, gargaditya,
	shirazsaleem, kees, kotaranov, leon, shacharr, stephen,
	linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260415080944.732901-1-ernis@linux.microsoft.com>

In mana_probe(), if mana_probe_port() fails for any port, the error
is stored in 'err' and the loop breaks. However, the subsequent
unconditional 'err = add_adev(gd, "eth")' overwrites this error.
If add_adev() succeeds, mana_probe() returns success despite ports
being left in a partially initialized state (ac->ports[i] == NULL).

Only call add_adev() when there is no prior error, so the probe
correctly fails and triggers mana_remove() cleanup.

Fixes: ced82fce77e9 ("net: mana: Probe rdma device in mana driver")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
---
Changes in v3:
* Fix inaccurate comments.
Changes in v2:
* Apply the patch in net instead of net-next.
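
For readers following the control flow, the difference described in the
commit message can be modelled outside the kernel. This is only a sketch:
fake_probe_port(), fake_add_adev(), probe_ports(), probe_before() and
probe_after() are hypothetical stand-ins, not driver code, and the -ENOMEM
failure is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for mana_probe_port(): fails only for failing_port. */
static int fake_probe_port(int port, int failing_port)
{
	return port == failing_port ? -ENOMEM : 0;
}

/* Hypothetical stand-in for add_adev(); assumed to succeed here. */
static int fake_add_adev(void)
{
	return 0;
}

/* Shared loop: stop at the first failing port and return its error. */
static int probe_ports(int num_ports, int failing_port)
{
	int err = 0;

	assert(num_ports >= 0);
	for (int i = 0; i < num_ports; i++) {
		err = fake_probe_port(i, failing_port);
		if (err)
			break;
	}
	return err;
}

/* Before the patch: the add_adev result always replaces err. */
static int probe_before(int num_ports, int failing_port)
{
	int err = probe_ports(num_ports, failing_port);

	err = fake_add_adev();
	return err;
}

/* After the patch: add_adev runs only if no port failed. */
static int probe_after(int num_ports, int failing_port)
{
	int err = probe_ports(num_ports, failing_port);

	if (!err)
		err = fake_add_adev();
	return err;
}
```

With port 1 failing, probe_before(4, 1) reports success while
probe_after(4, 1) returns the port error; passing -1 as failing_port
models the case where every port probes cleanly.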
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index ce1b7ec46a27..39b18577fb51 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -3680,10 +3680,9 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
 	if (!resuming) {
 		for (i = 0; i < ac->num_ports; i++) {
 			err = mana_probe_port(ac, i, &ac->ports[i]);
-			/* we log the port for which the probe failed and stop
-			 * probes for subsequent ports.
-			 * Note that we keep running ports, for which the probes
-			 * were successful, unless add_adev fails too
+			/* Log the port for which the probe failed, stop probing
+			 * subsequent ports, and skip add_adev.
+			 * mana_remove() will clean up already-probed ports.
 			 */
 			if (err) {
 				dev_err(dev, "Probe Failed for port %d\n", i);
@@ -3697,10 +3696,9 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
 			enable_work(&apc->queue_reset_work);
 			err = mana_attach(ac->ports[i]);
 			rtnl_unlock();
-			/* we log the port for which the attach failed and stop
-			 * attach for subsequent ports
-			 * Note that we keep running ports, for which the attach
-			 * were successful, unless add_adev fails too
+			/* Log the port for which the attach failed, stop
+			 * attaching subsequent ports, and skip add_adev.
+			 * mana_remove() will clean up already-attached ports.
 			 */
 			if (err) {
 				dev_err(dev, "Attach Failed for port %d\n", i);
@@ -3709,7 +3707,8 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
 		}
 	}
 
-	err = add_adev(gd, "eth");
+	if (!err)
+		err = add_adev(gd, "eth");
 
 	schedule_delayed_work(&ac->gf_stats_work, MANA_GF_STATS_PERIOD);
 
-- 
2.34.1



* [PATCH net v3 5/5] net: mana: Fix EQ leak in mana_remove on NULL port
From: Erni Sri Satya Vennela @ 2026-04-15  8:09 UTC
  To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, ernis, ssengar, dipayanroy, gargaditya,
	shirazsaleem, kees, kotaranov, leon, shacharr, stephen,
	linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260415080944.732901-1-ernis@linux.microsoft.com>

In mana_remove(), when a NULL port is encountered in the port iteration
loop, 'goto out' skips the mana_destroy_eq(ac) call, leaking the event
queues allocated earlier by mana_create_eq().

This can happen when mana_probe_port() fails for port 0, leaving
ac->ports[0] as NULL. On driver unload or error cleanup, mana_remove()
hits the NULL entry and jumps past mana_destroy_eq().

Change 'goto out' to 'break' so the for-loop exits normally and
mana_destroy_eq() is always reached. Remove the now-unreferenced out:
label.

Fixes: 1e2d0824a9c3 ("net: mana: Add support for EQ sharing")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
---
Changes in v3:
* Update Fixes tag to appropriate commit id.
Changes in v2:
* Apply the patch in net instead of net-next.
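
The effect of replacing 'goto out' with 'break' can be seen in a small
stand-alone model. This is a sketch under stated assumptions:
remove_before() and remove_after() are hypothetical names, ports with an
index at or above first_null_port are treated as NULL, and the return
value only records whether the EQ teardown step was reached.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Before the patch: a NULL port jumps to the 'out' label, which sits
 * after the EQ teardown, so the teardown is skipped.
 * Returns true when the EQ teardown step was reached.
 */
static bool remove_before(int num_ports, int first_null_port)
{
	bool eq_destroyed = false;

	assert(num_ports >= 0);
	for (int i = 0; i < num_ports; i++) {
		if (i >= first_null_port)
			goto out;	/* models ac->ports[i] == NULL */
	}
	eq_destroyed = true;		/* models mana_destroy_eq(ac) */
out:
	return eq_destroyed;
}

/*
 * After the patch: the loop is left with 'break', so execution falls
 * through to the EQ teardown in every case.
 */
static bool remove_after(int num_ports, int first_null_port)
{
	bool eq_destroyed = false;

	assert(num_ports >= 0);
	for (int i = 0; i < num_ports; i++) {
		if (i >= first_null_port)
			break;		/* models ac->ports[i] == NULL */
	}
	eq_destroyed = true;		/* models mana_destroy_eq(ac) */
	return eq_destroyed;
}
```

remove_before(2, 0) models the port 0 failure from the commit message and
never reaches the teardown; remove_after(2, 0) does.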
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 39b18577fb51..98e2fcc797ca 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -3752,7 +3752,7 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
 		if (!ndev) {
 			if (i == 0)
 				dev_err(dev, "No net device to remove\n");
-			goto out;
+			break;
 		}
 
 		apc = netdev_priv(ndev);
@@ -3783,7 +3783,7 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
 	}
 
 	mana_destroy_eq(ac);
-out:
+
 	if (ac->per_port_queue_reset_wq) {
 		destroy_workqueue(ac->per_port_queue_reset_wq);
 		ac->per_port_queue_reset_wq = NULL;
-- 
2.34.1



* Re: [PATCH 2/2] Drivers: hv: Move add_interrupt_randomness() to hypervisor callback sysvec
From: Tianyu Lan @ 2026-04-15  8:28 UTC
  To: mhklinux
  Cc: kys, haiyangz, wei.liu, decui, longli, tglx, mingo, bp,
	dave.hansen, hpa, maz, bigeasy, x86, linux-kernel, linux-hyperv
In-Reply-To: <20260402202400.1707-3-mhklkml@zohomail.com>

On Fri, Apr 3, 2026 at 4:29 AM Michael Kelley <mhklkml@zohomail.com> wrote:
>
> From: Michael Kelley <mhklinux@outlook.com>
>
> The Hyper-V ISRs, for normal guests and when running in the
> hypervisor root partition, are calling add_interrupt_randomness() as a
> primary source of entropy. The call is currently in the ISRs as a common
> place to handle both x86/x64 and arm64. On x86/x64, hypervisor interrupts
> come through a custom sysvec entry, and do not go through a generic
> interrupt handler. On arm64, hypervisor interrupts come through an
> emulated GICv3. GICv3 uses the generic handler handle_percpu_devid_irq(),
> which does not do add_interrupt_randomness() -- unlike its counterpart
> handle_percpu_irq(). But handle_percpu_devid_irq() is now updated to do
> the add_interrupt_randomness(). So add_interrupt_randomness() is now
> needed only in Hyper-V's x86/x64 custom sysvec path.
>
> Move add_interrupt_randomness() from the Hyper-V ISRs into the Hyper-V
> x86/x64 custom sysvec path, matching the existing STIMER0 sysvec path.
> With this change, add_interrupt_randomness() is no longer called from any
> device drivers, which is appropriate.
>
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> ---

Reviewed-by: Tianyu Lan <tiala@microsoft.com>

-- 
Thanks
Tianyu Lan


* Re: [PATCH net] hv_sock: Report EOF instead of -EIO for FIN
From: Stefano Garzarella @ 2026-04-15 10:38 UTC
  To: Dexuan Cui
  Cc: kys, haiyangz, wei.liu, longli, davem, edumazet, kuba, pabeni,
	horms, niuxuewei.nxw, linux-hyperv, virtualization, netdev,
	linux-kernel, stable, Ben Hillis, Mitchell Levy
In-Reply-To: <20260414234316.711578-1-decui@microsoft.com>

On Tue, Apr 14, 2026 at 04:43:16PM -0700, Dexuan Cui wrote:
>Commit f0c5827d07cb unluckily causes a regression for the FIN packet,
>and the final read syscall gets an error rather than 0.
>
>Ideally, we would want to fix hvs_channel_readable_payload() so that it
>could return 0 in the FIN scenario, but it's not good for the hv_sock
>driver to use the VMBus ringbuffer's cached priv_read_index, which is
>internal data in the VMBus driver.
>
>Fix the regression in hv_sock by returning 0 rather than -EIO.
>
>Fixes: f0c5827d07cb ("hv_sock: Return the readable bytes in hvs_stream_has_data()")
>Cc: stable@vger.kernel.org
>Reported-by: Ben Hillis <Ben.Hillis@microsoft.com>
>Reported-by: Mitchell Levy <levymitchell0@gmail.com>
>Signed-off-by: Dexuan Cui <decui@microsoft.com>
>---
> net/vmw_vsock/hyperv_transport.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
>diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
>index 069386a74557..63d3549125be 100644
>--- a/net/vmw_vsock/hyperv_transport.c
>+++ b/net/vmw_vsock/hyperv_transport.c
>@@ -703,8 +703,22 @@ static s64 hvs_stream_has_data(struct vsock_sock *vsk)
> 	switch (hvs_channel_readable_payload(hvs->chan)) {
> 	case 1:
> 		need_refill = !hvs->recv_desc;
>-		if (!need_refill)
>-			return -EIO;
>+		if (!need_refill) {

Can we drop `need_refill` entirely and just check `hvs->recv_desc` here?

Mainly because the comment we are adding now leaves me confused about what
`need_refill` means.

The rest LGTM.

Thanks,
Stefano

>+			/* Here hvs->recv_data_len is 0, so hvs->recv_desc must
>+			 * be NULL unless it points to the 0-byte-payload FIN
>+			 * packet: see hvs_update_recv_data().
>+			 *
>+			 * Here all the payload has been dequeued, but
>+			 * hvs_channel_readable_payload() still returns 1,
>+			 * because the VMBus ringbuffer's read_index is not
>+			 * updated for the FIN packet: hvs_stream_dequeue() ->
>+			 * hv_pkt_iter_next() updates the cached priv_read_index
>+			 * but has no opportunity to update the read_index in
>+			 * hv_pkt_iter_close() as hvs_stream_has_data() returns
>+			 * 0 for the FIN packet, so it won't get dequeued.
>+			 */
>+			return 0;
>+		}
>
> 		hvs->recv_desc = hv_pkt_iter_first(hvs->chan);
> 		if (!hvs->recv_desc)
>-- 
>2.49.0
>
>
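
For reference, the decision being discussed can be modelled on its own.
This is a simplified sketch, not the kernel function: it covers only the
'case 1' branch quoted above, model_has_data() and MODEL_NEED_REFILL are
hypothetical names, and the early return for a non-zero recv_data_len is
an assumption inferred from the comment "Here hvs->recv_data_len is 0".
It checks the descriptor directly, as suggested, rather than going
through need_refill.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical marker: the caller should fetch the next packet descriptor. */
#define MODEL_NEED_REFILL (-1L)

/*
 * have_recv_desc models hvs->recv_desc != NULL and recv_data_len models
 * hvs->recv_data_len.
 */
static long model_has_data(bool have_recv_desc, long recv_data_len)
{
	assert(recv_data_len >= 0);

	if (recv_data_len > 0)
		return recv_data_len;	/* payload still pending */

	/*
	 * A descriptor with no payload left can only be the zero-byte FIN
	 * packet, so report EOF (0) where the old code returned -EIO.
	 */
	if (have_recv_desc)
		return 0;

	return MODEL_NEED_REFILL;
}
```

A drained FIN descriptor, model_has_data(true, 0), yields 0, which is what
lets the final read syscall see EOF instead of an error.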



* Re: [PATCH net v2 3/4] net: mana: Don't overwrite port probe error with add_adev result
From: Simon Horman @ 2026-04-15 12:37 UTC
  To: Erni Sri Satya Vennela
  Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, ssengar, dipayanroy, gargaditya,
	shirazsaleem, kees, kotaranov, leon, shacharr, stephen,
	linux-hyperv, netdev, linux-kernel
In-Reply-To: <ad84k4qaMd9wnYwd@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>

On Wed, Apr 15, 2026 at 12:04:51AM -0700, Erni Sri Satya Vennela wrote:
> On Tue, Apr 14, 2026 at 04:35:03PM +0100, Simon Horman wrote:
> > From: 'Simon Horman' <horms@kernel.org>
> > 
> > This is an AI-generated review of your patch. The human sending this
> > email has considered the AI review valid, or at least plausible.
> > Full review at: https://sashiko.dev
> > ---
> > net: mana: Don't overwrite port probe error with add_adev result
> > 
> > This commit ensures that if mana_probe_port() or mana_attach() fails, the
> > error is not overwritten by the result of add_adev(). By preserving the
> > error, it triggers mana_remove() cleanup instead of leaving ports partially
> > initialized.
> > 
> > > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > > index 9d095a6fb56c7..4d42cbfb3f15b 100644
> > > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > > @@ -3634,10 +3634,9 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
> > >  	if (!resuming) {
> > >  		for (i = 0; i < ac->num_ports; i++) {
> > >  			err = mana_probe_port(ac, i, &ac->ports[i]);
> > > -			/* we log the port for which the probe failed and stop
> > > -			 * probes for subsequent ports.
> > > -			 * Note that we keep running ports, for which the probes
> > > -			 * were successful, unless add_adev fails too
> > > +			/* Log the port for which the probe failed, stop probing
> > > +			 * subsequent ports, and skip add_adev.
> > > +			 * Already-probed ports remain functional.
> > >  			 */
> > 
> > Are these comments accurate regarding ports remaining functional?
> > 
> > Since this patch preserves the error code, execution falls through to the end
> > of mana_probe() where it checks if err is set and calls mana_remove().
> > mana_remove() will iterate over the successfully allocated ports, completely
> > detach them, unregister their net_devices, and destroy event queues.
> > 
> > It appears no ports remain functional because the device is completely torn
> > down.
> >
> I will update the patch with the updated comment message. 

Thanks.

> > [ ... ]
> > 
> > > @@ -3651,10 +3650,9 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
> > >  			enable_work(&apc->queue_reset_work);
> > >  			err = mana_attach(ac->ports[i]);
> > >  			rtnl_unlock();
> > > -			/* we log the port for which the attach failed and stop
> > > -			 * attach for subsequent ports
> > > -			 * Note that we keep running ports, for which the attach
> > > -			 * were successful, unless add_adev fails too
> > > +			/* Log the port for which the attach failed, stop
> > > +			 * attaching subsequent ports, and skip add_adev.
> > > +			 * Already-attached ports remain functional.
> > >  			 */
> > >  			if (err) {
> > >  				dev_err(dev, "Attach Failed for port %d\n", i);
> > > @@ -3663,7 +3661,8 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
> > 
> > Does propagating this error during PM resume introduce a NULL pointer
> > dereference?
> > 
> > If PM resume fails here (resuming == true), the preserved error code causes
> > mana_probe() to call mana_remove(), which tears down the device and sets
> > gd->gdma_context = NULL.
> > 
> > However, a failed resume callback does not automatically unbind the driver.
> > When the device is eventually unbound later, mana_remove() is invoked again:
> > 
> > drivers/net/ethernet/microsoft/mana/mana_en.c:mana_remove() {
> >     struct gdma_context *gc = gd->gdma_context;
> >     struct device *dev = gc->dev;
> >     ...
> > }
> > 
> > In the second invocation, gd->gdma_context is NULL, meaning gc is NULL,
> > which could lead to a kernel panic when dereferencing gc->dev.
> 
> Thank you for pointing it out, Simon.
> Since this is a pre-existing bug, I will create a separate patch for
> this change and make it part of this patchset.

Likewise, thanks.

FTR, if it is a pre-existing bug then I don't think it needs
to block progress of your patchset, even if fixing things
sooner rather than later is a good maxim.
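
To make the double-remove scenario from the review concrete, here is a
stand-alone model of one possible guard. It is only a sketch and not the
follow-up patch: the model_* types and functions are hypothetical, and
returning -ENODEV for the "nothing left to tear down" case is an
assumption made for illustration (the real mana_remove() returns void).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the gdma structures. */
struct model_gdma_context { int dev; };
struct model_gdma_dev { struct model_gdma_context *gdma_context; };

/* Returns 0 when teardown ran, -ENODEV when there was nothing left. */
static int model_remove(struct model_gdma_dev *gd)
{
	struct model_gdma_context *gc;

	assert(gd != NULL);
	gc = gd->gdma_context;
	if (!gc)
		return -ENODEV;	/* already torn down: do not touch gc->dev */

	gc->dev = 0;		/* stands in for the real teardown work */
	gd->gdma_context = NULL;
	return 0;
}

/* Failed resume followed by a later unbind: remove runs twice. */
static int model_remove_twice(void)
{
	struct model_gdma_context gc = { .dev = 1 };
	struct model_gdma_dev gd = { .gdma_context = &gc };
	int first = model_remove(&gd);

	assert(first == 0);
	return model_remove(&gd);
}
```

Without the NULL check, the second call would dereference gc while it is
NULL, which is the panic described in the review.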


* Re: [EXTERNAL] Re: [PATCH 1/8] hv: Select CONFIG_SYSFB only for CONFIG_HYPERV_VMBUS
From: Thomas Zimmermann @ 2026-04-15 13:42 UTC
  To: Saurabh Singh Sengar, javierm@redhat.com, arnd@arndb.de,
	ardb@kernel.org, ilias.apalodimas@linaro.org,
	chenhuacai@kernel.org, kernel@xen0n.name,
	maarten.lankhorst@linux.intel.com, mripard@kernel.org,
	airlied@gmail.com, simona@ffwll.ch, KY Srinivasan, Haiyang Zhang,
	wei.liu@kernel.org, Dexuan Cui, Long Li, deller@gmx.de
  Cc: linux-arm-kernel@lists.infradead.org, loongarch@lists.linux.dev,
	linux-efi@vger.kernel.org, linux-riscv@lists.infradead.org,
	dri-devel@lists.freedesktop.org, linux-hyperv@vger.kernel.org,
	linux-fbdev@vger.kernel.org, Michael Kelley, Saurabh Sengar,
	stable@vger.kernel.org
In-Reply-To: <KUZP153MB1444885C302B353C02C2FA2FBE242@KUZP153MB1444.APCP153.PROD.OUTLOOK.COM>



On 13.04.26 at 10:22, Saurabh Singh Sengar wrote:
[...]
>>
>>> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
>> This fix is independent from the rest of the series. Do you want to merge it or
>> can I take it into DRM trees?
> Please feel free to take it via DRM tree.

Done now. Thanks a lot.

> CC : Wei Liu
>
> - Saurabh
>

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstr. 146, 90461 Nürnberg, Germany, www.suse.com
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich, (HRB 36809, AG Nürnberg)




* RE: [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
From: Dexuan Cui @ 2026-04-15 15:30 UTC
  To: Michael Kelley, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org,
	Long Li, lpieralisi@kernel.org, kwilczynski@kernel.org,
	mani@kernel.org, robh@kernel.org, bhelgaas@google.com,
	Jake Oshins, linux-hyperv@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	matthew.ruffell@canonical.com, kjlx@templeofstupid.com
  Cc: Krister Johansen, stable@vger.kernel.org
In-Reply-To: <SN6PR02MB4157D5C8EE35A221B130FC5AD45BA@SN6PR02MB4157.namprd02.prod.outlook.com>

> From: Michael Kelley <mhklinux@outlook.com>
> Sent: Wednesday, April 8, 2026 6:54 AM
> > > ...
> > > A slightly different approach to the whole problem is to change
> > > vmbus_reserve_fb(). If it is unable to get a non-zero "start" value, then
> > > it should use the same assumption as above, and reserve a frame buffer
> > > area starting at the lowest address in low MMIO space. The reserved size

The framebuffer base of Gen1 VMs always starts at 4GB-128MB, even if
the low mmio base is 1GB.

> > > could be the max possible frame buffer size, which I think is 64 MiB (?).
> >
> > It can be 128MB with the highest resolution 7680*4320 (I hope the
> > highest resolution won't become bigger in the future).
> 
> Indeed!
> 
> >
> > > This still leaves low MMIO space for subsequent PCI devices, and allows
> > > 32-bit BARs to continue to work. This approach requires one further
> > > assumption, which is that the host, plus any movement by hyperv_drm,
> > > has kept the frame buffer at the low end of the low MMIO space. From
> > > what I've seen, that assumption is reality -- the frame buffer always
> > > starts at the beginning of low MMIO space.
> > >
> > > This approach could be taken one step further, where vmbus_reserve_fb()
> > > *always* reserves 64 MiB starting at the low end of low MMIO space,
> > > regardless of the value of "start". The messy code for getting "start"
> > > could be dropped entirely, and the dependency on CONFIG_SYSFB goes
> > > away. Or maybe still get the value of "start" and "size", and if non-zero
> > > just do a sanity check that they are within the fixed 64 MiB reserved area.
> > >
> > > Thoughts? To me tweaking vmbus_reserve_fb() is a more
> > > straightforward and explicit way to do the reserving, vs. modifying
> > > the requested range in the Hyper-V PCI driver.
> >
> > Agreed. Let me try to make a new patch for review.

Please refer to my testing results and my thoughts below:


On x86-64 lab hosts, I tested Gen1 and Gen2 VMs on the latest
Hyper-V build, and on Windows Server 2019
(Hyper-V: Hypervisor Build 10.0.17763.8510-8-0), and I saw the
same host behavior on both hosts:

1) The max required framebuffer size is determined by Set-VMVideo,
   and is reported to the guest hyperv_drm driver via
   hdev->channel->offermsg.offer.mmio_megabytes.

   1.1) For Gen1 VMs, the framebuffer's base is reported via the
        legacy PCI graphics device's BAR: the PCI BAR's base is
        hardcoded to 4G-128MB, and the size is hardcoded to 64MB,
        but the hyperv_drm driver can use a framebuffer size bigger
        than 64MB when Set-VMVideo specifies a big framebuffer.

   1.2) For Gen2 VMs, the framebuffer's base is reported via the
        UEFI firmware, and the size is hardcoded to 3MB, but the
        hyperv_drm driver can use a framebuffer size bigger than
        64MB when Set-VMVideo specifies a big framebuffer.

2) The low mmio range is affected by the PowerShell command
   "Set-VM -LowMemoryMappedIoSpace". Note: the command only accepts
   a value between 128MB and 3.5GB.

3) For Gen2 VMs, the low mmio range is also affected by another
   command "Set-VMVideo", and the framebuffer always starts at the
   beginning of the low mmio range.

   3.1) By default, both the low mmio range and the framebuffer
        start at the fixed location 4G-128MB. If the max
        framebuffer size is X MB bigger than 64MB, the
        low_mmio_base decreases by 2*X MB.

   3.2) With "Set-VM -LowMemoryMappedIoSpace 1GB", the
        low_mmio_base is 3GB, the low_mmio_size=1GB. The
        fb_mmio_base is also 3GB; if the max framebuffer size is
        X MB bigger than 64MB, the low_mmio_base decreases by
        2*X MB.

4) For Gen1 VMs, the framebuffer always starts at the fixed
   location 4G-128MB.

   4.1) By default, the low mmio range also starts at 4G-128MB,
        and the size is 127.75 MB, i.e. if
        hdev->channel->offermsg.offer.mmio_megabytes needs 128MB,
        the guest hyperv_drm driver can't find enough available
        mmio in the low mmio range, and has to use the high mmio
        range.

   4.2) With "Set-VM -LowMemoryMappedIoSpace 1GB", the
        low_mmio_base is 3GB, the low_mmio_size=1023.75 MB. The
        fb_mmio_base is still 4G-128MB, i.e. if hyperv_drm needs
        128 MB of mmio, it still has to use the high mmio range.

5) Note: the mmio range [VTPM_BASE_ADDRESS, 4GB), whose size is
   18.75MB, can not be used by the framebuffer.

To recap, according to my testing, the pseudo code of the
host/guest firmware that determines the low mmio range and the
framebuffer range should be:

max_fb_size = round_up_to_2MB(HorizontalResolution *
                              VerticalResolution * 4);

if (is_gen1_VM) {
    low_mmio_base = 4G - 128MB
    fb_mmio_base = 4G - 128MB
    low_mmio_size = 128MB - 0.25MB
} else { /* Gen2 VMs */
    low_mmio_base = 4G - 128MB
    low_mmio_size = 128MB

    excess_fb_size = (max_fb_size > 64MB) ?
                     (max_fb_size - 64MB) : 0;
    low_mmio_base -= excess_fb_size * 2;
    low_mmio_size = 4GB - low_mmio_base
    fb_mmio_base = low_mmio_base;
}

if ("Set-VM -LowMemoryMappedIoSpace" sets a target_low_mmio_size) {
    target_low_mmio_size = round_up_to_2MB(target_low_mmio_size)

    if (4GB - target_low_mmio_size < low_mmio_base) {
        low_mmio_base = 4GB - target_low_mmio_size

        if (is_gen1_VM) {
            low_mmio_size = target_low_mmio_size - 0.25MB
            // fb_mmio_base is still 4GB - 128MB
        } else {
            low_mmio_size = target_low_mmio_size
            fb_mmio_base = low_mmio_base;
        }
    }
}
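
The pseudo code above can be written as compilable C so that the worked
examples can be checked mechanically. This is a sketch of the x86-64
behavior as described in this mail only; it is not host or firmware code.
The names struct mmio_layout, round_up_to_2mb() and compute_layout() are
made up for illustration, and a target_low_mmio_size of 0 is an assumed
convention meaning "Set-VM -LowMemoryMappedIoSpace was not used".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MB (1ULL << 20)
#define GB (1ULL << 30)

struct mmio_layout {
	uint64_t max_fb_size;
	uint64_t low_mmio_base;
	uint64_t low_mmio_size;
	uint64_t fb_mmio_base;
};

static uint64_t round_up_to_2mb(uint64_t bytes)
{
	return (bytes + 2 * MB - 1) / (2 * MB) * (2 * MB);
}

/* target_low_mmio_size == 0: no explicit low mmio size was requested. */
static struct mmio_layout compute_layout(bool is_gen1, uint64_t hres,
					 uint64_t vres,
					 uint64_t target_low_mmio_size)
{
	struct mmio_layout l;

	assert(target_low_mmio_size < 4 * GB);

	l.max_fb_size = round_up_to_2mb(hres * vres * 4);
	l.low_mmio_base = 4 * GB - 128 * MB;

	if (is_gen1) {
		l.fb_mmio_base = 4 * GB - 128 * MB;
		l.low_mmio_size = 128 * MB - MB / 4;
	} else {
		uint64_t excess = l.max_fb_size > 64 * MB ?
				  l.max_fb_size - 64 * MB : 0;

		l.low_mmio_base -= excess * 2;
		l.low_mmio_size = 4 * GB - l.low_mmio_base;
		l.fb_mmio_base = l.low_mmio_base;
	}

	if (target_low_mmio_size) {
		uint64_t target = round_up_to_2mb(target_low_mmio_size);

		if (4 * GB - target < l.low_mmio_base) {
			l.low_mmio_base = 4 * GB - target;
			if (is_gen1) {
				/* fb_mmio_base stays at 4GB - 128MB */
				l.low_mmio_size = target - MB / 4;
			} else {
				l.low_mmio_size = target;
				l.fb_mmio_base = l.low_mmio_base;
			}
		}
	}
	return l;
}
```

For instance, compute_layout(false, 4834, 3622, 128 * MB) gives a
low_mmio_base of 0xf7800000 and a low_mmio_size of 136MB, matching the
first Gen2 example below.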

e.g. for a Gen2 VM with the below commands:
   Set-VM -LowMemoryMappedIoSpace 128MB \
          -VMName decui-u2204-gen2-fb
   // i.e. the default setting on a lab host
   Set-VMVideo -VMName decui-u2204-gen2-fb \
               -HorizontalResolution 4834 \
               -VerticalResolution 3622 \
               -ResolutionType Single
we have:
    max_fb_size = round_up_to_2MB(4834*3622*4) = 68 MB
    excess_fb_size = 4MB
    low_mmio_base = 4GB - 128MB - 4MB * 2
                  = 4GB - 136 MB = 0xf7800000
    fb_mmio_base = low_mmio_base
    low_mmio_size = 4GB - low_mmio_base = 136MB

    In this case, we'd like to reserve low_mmio_size/2 = 68MB
    (rather than a fixed value of 128MB) for the framebuffer mmio:
    actually we can't reserve 128MB from the low mmio range,
    because the range [VTPM_BASE_ADDRESS, 4GB), whose size is
    18.75MB, is reserved for vTPM and other system devices like
    the I/O APIC, so the available low mmio size is only
    136MB - 18.75MB = 117.25MB.

    If we further run
    "Set-VM -LowMemoryMappedIoSpace 150MB \
     -VMName decui-u2204-gen2-fb", we have
    max_fb_size = round_up_to_2MB(4834*3622*4) = 68 MB
    excess_fb_size = 4MB
    low_mmio_base = 4GB - 128MB - 4MB * 2
                  = 4GB - 136 MB = 0xf7800000
    but 4GB - target_low_mmio_size = 4GB - 150MB, which is
    smaller than low_mmio_base, so low_mmio_base and
    fb_mmio_base are both set to 4GB - 150MB = 0xf6a00000,
    and low_mmio_size = 150MB. In this case, we'd like to
    reserve low_mmio_size/2 = 75MB for the framebuffer mmio,
    since we don't know the exact framebuffer size in
    vmbus_reserve_fb().

    With the same PowerShell commands, if the VM is a Gen1 VM,
    the low_mmio_base = 0xf6a00000, and
    low_mmio_size = 149.75MB but the fb_mmio_base is
    4GB - 128MB = 0xf8000000.

Another example is: for a Gen2 VM with the below commands:
   Set-VM -LowMemoryMappedIoSpace 1GB \
          -VMName decui-u2204-gen2-fb
   // i.e. the default setting on Azure. Let's ignore CVMs here.
   Set-VMVideo -VMName decui-u2204-gen2-fb \
               -HorizontalResolution 4834 \
               -VerticalResolution 3622 \
               -ResolutionType Single
we have:
    max_fb_size = round_up_to_2MB(4834*3622*4) = 68 MB
    excess_fb_size = 4MB
    low_mmio_base = 4GB - 128MB - 4MB * 2
                  = 4GB - 136 MB = 0xf7800000
    but 4GB - target_low_mmio_size = 4GB - 1GB, which is
    smaller than low_mmio_base, so low_mmio_base and
    fb_mmio_base are both set to 4GB - 1GB = 0xc0000000,
    and low_mmio_size = 1GB.
    In this case, we'd like to reserve
    min(low_mmio_size/2, 128MB) = 128MB for the framebuffer
    mmio, since the max possible framebuffer so far is 128MB.

************************************

On an ARM64 lab host, I also tested Gen2 VMs (there is no Gen1 VM
for ARM VMs):

By default:
  low_mmio_base = 4GB - 512MB, i.e. 0xe0000000
  low_mmio_size = 512MB
  fb_mmio_base = low_mmio_base
  The default framebuffer size is 3MB
  (i.e. screen.lfb_size = 3MB) but hyperv_drm:
  mmio_megabytes = 8 MB, which supports up to 1920 * 1080.

With the below commands:
   Set-VM -LowMemoryMappedIoSpace 512MB \
          -VMName decui-u2204-gen2-fb
   // the command only accepts a value between 512MB and 3.5GB.
   Set-VMVideo -VMName decui-u2204-gen2-fb \
               -HorizontalResolution 4834 \
               -VerticalResolution 3622 \
               -ResolutionType Single
I thought we would have:
    max_fb_size = round_up_to_2MB(4834*3622*4) = 68 MB
    excess_fb_size = 4MB
    low_mmio_base = 4GB - 512MB - 4MB * 2
                  = 4GB - 520MB
    fb_mmio_base = low_mmio_base
    low_mmio_size = 4GB - low_mmio_base = 520MB

    Since 4GB - target_low_mmio_size = 4GB - 512MB, which is
    smaller than low_mmio_base, so low_mmio_base and
    fb_mmio_base would be both set to 4GB - 520MB, and
    low_mmio_size would be 520MB.

    However, the actual result is:
    max_fb_size is indeed 68MB.
    but fb_mmio_base = low_mmio_base = 4GB - 512MB, and
    low_mmio_size = 512MB, i.e. the 'excess_fb_size' is not
    considered on ARM64!

    In this case, we'd like to reserve
    min(low_mmio_size/2, 128MB) = 128MB for the framebuffer
    mmio, since the max possible framebuffer so far is 128MB.

With the below command:
   Set-VM -LowMemoryMappedIoSpace 3GB \
          -VMName decui-u2204-gen2-fb
   // i.e. the default setting on Azure. Unlike x86-64, an ARM64
   // VM on Azure has 3GB of mmio below 4GB.
   Set-VMVideo -VMName decui-u2204-gen2-fb \
               -HorizontalResolution 4834 \
               -VerticalResolution 3622 \
               -ResolutionType Single
we have:
    max_fb_size = round_up_to_2MB(4834*3622*4) = 68 MB
    low_mmio_base = 4GB - 3GB = 1GB = 0x40000000
    low_mmio_size = 3GB
    fb_mmio_base = low_mmio_base = 1GB

    In this case, we'd like to reserve
    min(low_mmio_size/2, 128MB) = 128MB for the framebuffer
    mmio, since the max possible framebuffer so far is 128MB.

************************************

To recap, I think the bottom line is:

a) For Gen2 VMs, we can safely reserve a mmio range starting at
   sysfb_primary_display.screen.lfb_base with a size of
   min(low_mmio_size/2, 128MB).

   If sysfb_primary_display.screen.lfb_base is 0, i.e. in the case
   of kdump kernel, we use low_mmio_base instead.
   This should fix the mmio conflict in the kdump kernel.

b) For Gen1 VMs, let's still only reserve a mmio range starting at
   4GB - 128MB with a size of 64MB, because when we are in
   vmbus_reserve_fb(), we still don't know the exact size of the
   max_fb_size, and we don't want to reserve too much as we would
   want to reserve some low mmio space for PCI devices with 32-bit
   BARs (if any).

   If the user runs Set-VMVideo and needs a framebuffer size
   bigger than 64MB (IMO this is not a typical scenario in
   practice), we have to use high mmio for hyperv_drm in the first
   kernel, and the kdump kernel still suffers from the mmio
   conflict between hyperv_drm and hv_pci. We encourage Gen1 VM
   users to upgrade to Gen2 VMs to resolve the issue. Anyway, the
   mmio conflict is inevitable for Gen1 VMs, if the max required
   framebuffer size is bigger than 108MB (Note: the [VTPM_BASE_ADDRESS,
   4GB) range takes 18.75MB, so 128MB - 18.75MB = 109.25MB, and the
   required framebuffer size is always rounded up to 2MB).

c) CVMs don't have the framebuffer device, so we don't need to
   reserve any mmio in vmbus_reserve_fb() for them.
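
Points a) to c) can be summarized as a small stand-alone function. This is
a sketch of the policy as described in this mail, not the patch itself:
enum vm_kind, struct fb_reservation and pick_fb_reservation() are
hypothetical names, and a returned size of 0 is an assumed convention for
"reserve nothing".

```c
#include <assert.h>
#include <stdint.h>

#define MB (1ULL << 20)
#define GB (1ULL << 30)

enum vm_kind { VM_GEN1, VM_GEN2, VM_CVM };

struct fb_reservation {
	uint64_t start;
	uint64_t size;	/* 0 means nothing is reserved */
};

static struct fb_reservation pick_fb_reservation(enum vm_kind kind,
						 uint64_t lfb_base,
						 uint64_t low_mmio_base,
						 uint64_t low_mmio_size)
{
	struct fb_reservation r = { 0, 0 };

	switch (kind) {
	case VM_GEN2:
		/* lfb_base is 0 in the kdump kernel: fall back to low_mmio_base. */
		r.start = lfb_base ? lfb_base : low_mmio_base;
		r.size = low_mmio_size / 2;
		if (r.size > 128 * MB)
			r.size = 128 * MB;
		break;
	case VM_GEN1:
		/* Fixed window; the exact max_fb_size is not known yet. */
		r.start = 4 * GB - 128 * MB;
		r.size = 64 * MB;
		break;
	case VM_CVM:
		/* No framebuffer device, so nothing to reserve. */
		break;
	}
	return r;
}
```

With the numbers from the examples above, a Gen2 VM with a 136MB low mmio
range reserves 68MB, one with 150MB reserves 75MB, and anything from 256MB
upwards is capped at 128MB.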

Thanks for reading through this long email!

I'm making a patch right now...

Thanks,
Dexuan


* RE: [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
From: Dexuan Cui @ 2026-04-15 16:46 UTC
  To: Dexuan Cui, Michael Kelley, KY Srinivasan, Haiyang Zhang,
	wei.liu@kernel.org, Long Li, lpieralisi@kernel.org,
	kwilczynski@kernel.org, mani@kernel.org, robh@kernel.org,
	bhelgaas@google.com, Jake Oshins, linux-hyperv@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	matthew.ruffell@canonical.com, kjlx@templeofstupid.com
  Cc: Krister Johansen, stable@vger.kernel.org
In-Reply-To: <SA1PR21MB69218F955B62DFF62E3E88D2BF222@SA1PR21MB6921.namprd21.prod.outlook.com>

> From: Dexuan Cui <DECUI@microsoft.com>
> Sent: Wednesday, April 15, 2026 8:31 AM
>  ...
> 4) For Gen1 VMs, the framebuffer always starts at the fixed
>    location 4G-128MB.
> 
>    4.1) By default, the low mmio range also starts at 4G-128MB,
>         and the size is 127.75 MB, i.e. if
>         hdev->channel->offermsg.offer.mmio_megabytes needs 128MB,
>         the guest hyperv_drm driver can't find enough available
>         mmio in the low mmio range, and has to use the high mmio
>         range.
> 
>    4.2) With "Set-VM -LowMemoryMappedIoSpace 1GB", the
>         low_mmio_base is 3GB, the low_mmio_size=1023.75 MB. The
>         fb_mmio_base is still 4G-128MB, i.e. if hyperv_drm needs
>         128 MB of mmio, it still has to use the high mmio range.

Well, this is inaccurate: in this case we could reserve 128MB low
mmio for hyperv_drm, but this is not really what we want: our
purpose is that we reserve the "initial" framebuffer mmio range so that
hyperv_drm in the first kernel doesn't have to relocate the framebuffer
mmio range. Even if we reserve 128MB low mmio for hyperv_drm
starting at 1GB:

a) hyperv_drm can be blacklisted by users, so from the host's perspective
it's still the "initial" framebuffer mmio range that takes effect, and we can
still have the mmio conflict in the kdump kernel.

b) hyperv_drm can load after hv_pci, so we can even have the mmio
conflict in the first kernel.

> On an ARM64 lab host, I also tested Gen2 VMs (there is no Gen1 VM
> for ARM VMs):
> 
> By default:
>   low_mmio_base = 4GB - 512MB, i.e. 0xe0000000
>   low_mmio_size = 512MB
>   fb_mmio_base = low_mmio_base
>   The default framebuffer size is 3MB
>   (i.e. screen.lfb_size = 3MB) but hyperv_drm:
>   mmio_megabytes = 8 MB, which supports up to 1920 * 1080.
> 
> With the below commands:
>    Set-VM -LowMemoryMappedIoSpace 512MB \
>           -VMName decui-u2204-gen2-fb
>    // the command only accepts a value between 512MB and 3.5GB.
>    Set-VMVideo -VMName decui-u2204-gen2-fb \
>                -HorizontalResolution 4834 \
>                -VerticalResolution 3622 \
>                -ResolutionType Single
> I thought we would have:
>     max_fb_size = round_up_to_2MB(4834*3622*4) = 68 MB
>     excess_fb_size = 4MB
>     low_mmio_base = 4GB - 512MB - 4MB * 2
>                   = 4GB - 520MB
>     fb_mmio_base = low_mmio_base
>     low_mmio_size = 4GB - low_mmio_base = 520MB
> 
>     Since 4GB - target_low_mmio_size = 4GB - 512MB, which is
>     smaller than low_mmio_base, so low_mmio_base and

Sorry for the typo: here the "smaller" should be "bigger".

>     fb_mmio_base would be both set to 4GB - 520MB, and
>     low_mmio_size would be 520MB.
> 
>     However, the actual result is:
>     max_fb_size is indeed 68MB.
>     but fb_mmio_base = low_mmio_base = 4GB - 512MB, and
>     low_mmio_size = 512MB, i.e. the 'excess_fb_size' is not
>     considered on ARM64!

I think this makes sense, since "low_mmio_size = 512MB" is
already big enough for the framebuffer.

>     In this case, we'd like to reserve
>     min(low_mmio_size/2, 128MB) = 128MB for the framebuffer
>     mmio, since the max possible framebuffer so far is 128MB.
> 
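The arithmetic quoted above can be checked with a small userspace sketch.
This is not kernel code: the helper names (round_up_pow2, max_fb_size,
fb_reserve_size) are made up for illustration, and 32 bits per pixel is
assumed, as in the 4834*3622*4 expression.

```c
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Round x up to the next multiple of align (align must be a power of two). */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Bytes needed for a width x height framebuffer at 4 bytes per pixel,
 * rounded up to a 2 MiB boundary (the round_up_to_2MB() step above).
 */
static uint64_t max_fb_size(uint64_t width, uint64_t height)
{
	return round_up_pow2(width * height * 4, 2 * MiB);
}

/* The proposed reservation rule: min(low_mmio_size / 2, 128 MiB). */
static uint64_t fb_reserve_size(uint64_t low_mmio_size)
{
	uint64_t half = low_mmio_size / 2;

	return half < 128 * MiB ? half : 128 * MiB;
}
```

With these helpers, max_fb_size(4834, 3622) comes out at 68 MiB and
max_fb_size(1920, 1080) at 8 MiB, matching the numbers in the thread, and
fb_reserve_size() for a 512 MiB low mmio window gives 128 MiB.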
 
Thanks,
Dexuan

^ permalink raw reply

* RE: [EXTERNAL] Re: [PATCH net] hv_sock: Report EOF instead of -EIO for FIN
From: Dexuan Cui @ 2026-04-15 16:55 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org, Long Li,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, horms@kernel.org, niuxuewei.nxw@antgroup.com,
	linux-hyperv@vger.kernel.org, virtualization@lists.linux.dev,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org, Ben Hillis, Mitchell Levy
In-Reply-To: <ad9pPrji1uYSgNir@sgarzare-redhat>

> From: Stefano Garzarella <sgarzare@redhat.com>
> Sent: Wednesday, April 15, 2026 3:38 AM
> >@@ -703,8 +703,22 @@ static s64 hvs_stream_has_data(struct vsock_sock
> *vsk)
> > 	switch (hvs_channel_readable_payload(hvs->chan)) {
> > 	case 1:
> > 		need_refill = !hvs->recv_desc;
> >-		if (!need_refill)
> >-			return -EIO;
> >+		if (!need_refill) {
> 
> Can we drop `need_refill` entirely and just check `hvs->recv_desc` here?

OK. Will post v2 later today.

> Mainly because now the comment we are adding is confusing me about what
> `need_refill` means.
> 
> The rest LGTM.
> 
> Thanks,
> Stefano

Thanks for the review!


^ permalink raw reply

* Re: [RFC v1 3/5] hyperv: Introduce new hypercall interfaces used by Hyper-V guest IOMMU
From: Tianyu Lan @ 2026-04-16  3:30 UTC (permalink / raw)
  To: Yu Zhang
  Cc: linux-kernel, linux-hyperv, iommu, linux-pci, kys, haiyangz,
	wei.liu, decui, lpieralisi, kwilczynski, mani, robh, bhelgaas,
	arnd, joro, will, robin.murphy, easwar.hariharan, jacob.pan,
	nunodasneves, mrathor, mhklinux, peterz, linux-arch
In-Reply-To: <20251209051128.76913-4-zhangyu1@linux.microsoft.com>

On Tue, Dec 9, 2025 at 1:13 PM Yu Zhang <zhangyu1@linux.microsoft.com> wrote:
>
> From: Wei Liu <wei.liu@kernel.org>
>
> Hyper-V guest IOMMU is a para-virtualized IOMMU based on hypercalls.
> Introduce the hypercalls used by the child partition to interact with
> this facility.
>
> These hypercalls fall into below categories:
> - Detection and capability: HVCALL_GET_IOMMU_CAPABILITIES is used to
>   detect the existence and capabilities of the guest IOMMU.
>
> - Device management: HVCALL_GET_LOGICAL_DEVICE_PROPERTY is used to
>   check whether an endpoint device is managed by the guest IOMMU.
>
> - Domain management: A set of hypercalls is provided to handle the
>   creation, configuration, and deletion of guest domains, as well as
>   the attachment/detachment of endpoint devices to/from those domains.
>
> - IOTLB flushing: HVCALL_FLUSH_DEVICE_DOMAIN is used to ask Hyper-V
>   for a domain-selective IOTLB flush(which in its handler may flush
>   the device TLB as well). Page-selective IOTLB flushes will be offered
>   by new hypercalls in future patches.
>
> Signed-off-by: Wei Liu <wei.liu@kernel.org>
> Co-developed-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
> Co-developed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
> Signed-off-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
> Co-developed-by: Yu Zhang <zhangyu1@linux.microsoft.com>
> Signed-off-by: Yu Zhang <zhangyu1@linux.microsoft.com>
> ---
>  include/hyperv/hvgdk_mini.h |   8 +++
>  include/hyperv/hvhdk_mini.h | 123 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 131 insertions(+)
>
> diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
> index 77abddfc750e..e5b302bbfe14 100644
> --- a/include/hyperv/hvgdk_mini.h
> +++ b/include/hyperv/hvgdk_mini.h
> @@ -478,10 +478,16 @@ union hv_vp_assist_msr_contents {  /* HV_REGISTER_VP_ASSIST_PAGE */
>  #define HVCALL_GET_VP_INDEX_FROM_APIC_ID                       0x009a
>  #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE      0x00af
>  #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST       0x00b0
> +#define HVCALL_CREATE_DEVICE_DOMAIN                    0x00b1
> +#define HVCALL_ATTACH_DEVICE_DOMAIN                    0x00b2
>  #define HVCALL_SIGNAL_EVENT_DIRECT                     0x00c0
>  #define HVCALL_POST_MESSAGE_DIRECT                     0x00c1
>  #define HVCALL_DISPATCH_VP                             0x00c2
> +#define HVCALL_DETACH_DEVICE_DOMAIN                    0x00c4
> +#define HVCALL_DELETE_DEVICE_DOMAIN                    0x00c5
>  #define HVCALL_GET_GPA_PAGES_ACCESS_STATES             0x00c9
> +#define HVCALL_CONFIGURE_DEVICE_DOMAIN                 0x00ce
> +#define HVCALL_FLUSH_DEVICE_DOMAIN                     0x00d0
>  #define HVCALL_ACQUIRE_SPARSE_SPA_PAGE_HOST_ACCESS     0x00d7
>  #define HVCALL_RELEASE_SPARSE_SPA_PAGE_HOST_ACCESS     0x00d8
>  #define HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY  0x00db
> @@ -492,6 +498,8 @@ union hv_vp_assist_msr_contents {    /* HV_REGISTER_VP_ASSIST_PAGE */
>  #define HVCALL_GET_VP_CPUID_VALUES                     0x00f4
>  #define HVCALL_MMIO_READ                               0x0106
>  #define HVCALL_MMIO_WRITE                              0x0107
> +#define HVCALL_GET_IOMMU_CAPABILITIES                  0x0125
> +#define HVCALL_GET_LOGICAL_DEVICE_PROPERTY             0x0127
>
>  /* HV_HYPERCALL_INPUT */
>  #define HV_HYPERCALL_RESULT_MASK       GENMASK_ULL(15, 0)
> diff --git a/include/hyperv/hvhdk_mini.h b/include/hyperv/hvhdk_mini.h
> index 858f6a3925b3..ba6b91746b13 100644
> --- a/include/hyperv/hvhdk_mini.h
> +++ b/include/hyperv/hvhdk_mini.h
> @@ -400,4 +400,127 @@ union hv_device_id {              /* HV_DEVICE_ID */
>         } acpi;
>  } __packed;
>
> +/* Device domain types */
> +#define HV_DEVICE_DOMAIN_TYPE_S1       1 /* Stage 1 domain */
> +
> +/* ID for default domain and NULL domain */
> +#define HV_DEVICE_DOMAIN_ID_DEFAULT 0
> +#define HV_DEVICE_DOMAIN_ID_NULL    0xFFFFFFFFULL
> +
> +union hv_device_domain_id {
> +       u64 as_uint64;
> +       struct {
> +               u32 type: 4;
> +               u32 reserved: 28;
> +               u32 id;
> +       } __packed;
> +};
> +
> +struct hv_input_device_domain {
> +       u64 partition_id;
> +       union hv_input_vtl owner_vtl;
> +       u8 padding[7];
> +       union hv_device_domain_id domain_id;
> +} __packed;
> +
> +union hv_create_device_domain_flags {
> +       u32 as_uint32;
> +       struct {
> +               u32 forward_progress_required: 1;
> +               u32 inherit_owning_vtl: 1;
> +               u32 reserved: 30;
> +       } __packed;
> +};
> +
> +struct hv_input_create_device_domain {
> +       struct hv_input_device_domain device_domain;
> +       union hv_create_device_domain_flags create_device_domain_flags;
> +} __packed;
> +
> +struct hv_input_delete_device_domain {
> +       struct hv_input_device_domain device_domain;
> +} __packed;
> +
> +struct hv_input_attach_device_domain {
> +       struct hv_input_device_domain device_domain;
> +       union hv_device_id device_id;
> +} __packed;
> +
> +struct hv_input_detach_device_domain {
> +       u64 partition_id;
> +       union hv_device_id device_id;
> +} __packed;
> +
> +struct hv_device_domain_settings {
> +       struct {
> +               /*
> +                * Enable translations. If not enabled, all transaction bypass
> +                * S1 translations.
> +                */
> +               u64 translation_enabled: 1;
> +               u64 blocked: 1;
> +               /*
> +                * First stage address translation paging mode:
> +                * 0: 4-level paging (default)
> +                * 1: 5-level paging
> +                */
> +               u64 first_stage_paging_mode: 1;
> +               u64 reserved: 61;
> +       } flags;
> +
> +       /* Address of translation table */
> +       u64 page_table_root;
> +} __packed;
> +
> +struct hv_input_configure_device_domain {
> +       struct hv_input_device_domain device_domain;
> +       struct hv_device_domain_settings settings;
> +} __packed;
> +
> +struct hv_input_get_iommu_capabilities {
> +       u64 partition_id;
> +       u64 reserved;
> +} __packed;
> +
> +struct hv_output_get_iommu_capabilities {
> +       u32 size;
> +       u16 reserved;
> +       u8  max_iova_width;
> +       u8  max_pasid_width;
> +
> +#define HV_IOMMU_CAP_PRESENT (1ULL << 0)
> +#define HV_IOMMU_CAP_S2 (1ULL << 1)
> +#define HV_IOMMU_CAP_S1 (1ULL << 2)
> +#define HV_IOMMU_CAP_S1_5LVL (1ULL << 3)
> +#define HV_IOMMU_CAP_PASID (1ULL << 4)
> +#define HV_IOMMU_CAP_ATS (1ULL << 5)
> +#define HV_IOMMU_CAP_PRI (1ULL << 6)
> +
> +       u64 iommu_cap;
> +       u64 pgsize_bitmap;
> +} __packed;
> +
> +enum hv_logical_device_property_code {
> +       HV_LOGICAL_DEVICE_PROPERTY_PVIOMMU = 10,
> +};
> +
> +struct hv_input_get_logical_device_property {
> +       u64 partition_id;
> +       u64 logical_device_id;
> +       enum hv_logical_device_property_code code;
> +       u32 reserved;
> +} __packed;
> +
> +struct hv_output_get_logical_device_property {
> +#define HV_DEVICE_IOMMU_ENABLED (1ULL << 0)
> +       u64 device_iommu;
> +       u64 reserved;
> +} __packed;
> +
> +struct hv_input_flush_device_domain {
> +       struct hv_input_device_domain device_domain;
> +       u32 flags;
> +       u32 reserved;
> +} __packed;
> +
>  #endif /* _HV_HVHDK_MINI_H */
> --
> 2.49.0
>
>

No code uses these definitions in this patch. Would it be better to merge
it with Patch 5?
-- 
Thanks
Tianyu Lan

^ permalink raw reply

* Re: [PATCH 55/61] interconnect: Prefer IS_ERR_OR_NULL over manual NULL check
From: Krzysztof Kozlowski @ 2026-04-16 12:24 UTC (permalink / raw)
  To: Philipp Hahn, amd-gfx, apparmor, bpf, ceph-devel, cocci, dm-devel,
	dri-devel, gfs2, intel-gfx, intel-wired-lan, iommu, kvm,
	linux-arm-kernel, linux-block, linux-bluetooth, linux-btrfs,
	linux-cifs, linux-clk, linux-erofs, linux-ext4, linux-fsdevel,
	linux-gpio, linux-hyperv, linux-input, linux-kernel, linux-leds,
	linux-media, linux-mips, linux-mm, linux-modules, linux-mtd,
	linux-nfs, linux-omap, linux-phy, linux-pm, linux-rockchip,
	linux-s390, linux-scsi, linux-sctp, linux-security-module,
	linux-sh, linux-sound, linux-stm32, linux-trace-kernel, linux-usb,
	linux-wireless, netdev, ntfs3, samba-technical, sched-ext,
	target-devel, tipc-discussion, v9fs
  Cc: Georgi Djakov
In-Reply-To: <20260310-b4-is_err_or_null-v1-55-bd63b656022d@avm.de>

On 10/03/2026 12:49, Philipp Hahn wrote:
> Prefer using IS_ERR_OR_NULL() over using IS_ERR() and a manual NULL
> check.
> 
> Semantic change: Previously the code only printed the warning on error,
> but not when the pointer was NULL. Now the warning is printed in both
> cases!

NAK, read the code

> 
> Change found with coccinelle.
> 
> To: Georgi Djakov <djakov@kernel.org>
> Cc: linux-pm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Philipp Hahn <phahn-oss@avm.de>
> ---
>  drivers/interconnect/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> index 8569b78a18517b33abeafac091978b25cbc1acc7..22e92b30f73853d5bd2e05b4f52cb5aa22556468 100644
> --- a/drivers/interconnect/core.c
> +++ b/drivers/interconnect/core.c
> @@ -790,7 +790,7 @@ void icc_put(struct icc_path *path)
>  	size_t i;
>  	int ret;
>  
> -	if (!path || WARN_ON(IS_ERR(path)))
> +	if (WARN_ON(IS_ERR_OR_NULL(path)))

IS_ERR_OR_NULL is simply discouraged, but code preference aside, you
just added a bug here. This is clearly not equivalent, and you now emit a
warning in a perfectly valid case!
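To make the difference concrete, here is a userspace sketch of the two
conditions. It does not use the kernel headers: is_err(), is_err_or_null()
and warn_on() are simplified stand-ins for IS_ERR(), IS_ERR_OR_NULL() and
WARN_ON(), and warn_on() only counts how often it fires.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

static int warn_count;

/* Stand-in for IS_ERR(): pointer lies in the top MAX_ERRNO addresses. */
static bool is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Stand-in for IS_ERR_OR_NULL(). */
static bool is_err_or_null(const void *p)
{
	return !p || is_err(p);
}

/* Stand-in for WARN_ON(): counts the warning and returns the condition. */
static bool warn_on(bool cond)
{
	if (cond)
		warn_count++;
	return cond;
}

/* Condition before the patch: NULL short-circuits, no warning. */
static bool old_check(const void *path)
{
	return !path || warn_on(is_err(path));
}

/* Condition after the patch: NULL now goes through warn_on(). */
static bool new_check(const void *path)
{
	return warn_on(is_err_or_null(path));
}

/* Number of warnings a single call to check(path) produces. */
static int warnings_for(bool (*check)(const void *), const void *path)
{
	warn_count = 0;
	check(path);
	return warn_count;
}
```

Both checks bail out for NULL and for error pointers, but only new_check()
warns on NULL: warnings_for(old_check, NULL) is 0 and
warnings_for(new_check, NULL) is 1.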

Best regards,
Krzysztof

^ permalink raw reply

* Re: [PATCH 01/61] Coccinelle: Prefer IS_ERR_OR_NULL over manual NULL check
From: Krzysztof Kozlowski @ 2026-04-16 12:30 UTC (permalink / raw)
  To: Philipp Hahn, amd-gfx, apparmor, bpf, ceph-devel, cocci, dm-devel,
	dri-devel, gfs2, intel-gfx, intel-wired-lan, iommu, kvm,
	linux-arm-kernel, linux-block, linux-bluetooth, linux-btrfs,
	linux-cifs, linux-clk, linux-erofs, linux-ext4, linux-fsdevel,
	linux-gpio, linux-hyperv, linux-input, linux-kernel, linux-leds,
	linux-media, linux-mips, linux-mm, linux-modules, linux-mtd,
	linux-nfs, linux-omap, linux-phy, linux-pm, linux-rockchip,
	linux-s390, linux-scsi, linux-sctp, linux-security-module,
	linux-sh, linux-sound, linux-stm32, linux-trace-kernel, linux-usb,
	linux-wireless, netdev, ntfs3, samba-technical, sched-ext,
	target-devel, tipc-discussion, v9fs
  Cc: Julia Lawall, Nicolas Palix
In-Reply-To: <20260310-b4-is_err_or_null-v1-1-bd63b656022d@avm.de>

On 10/03/2026 12:48, Philipp Hahn wrote:
> Find and convert uses of IS_ERR() plus NULL check to IS_ERR_OR_NULL().
> 
> There are several cases where `!ptr && WARN_ON[_ONCE](IS_ERR(ptr))` is
> used:
> - arch/x86/kernel/callthunks.c:215 WARN_ON_ONCE
> - drivers/clk/clk.c:4561 WARN_ON_ONCE
> - drivers/interconnect/core.c:793 WARN_ON
> - drivers/reset/core.c:718 WARN_ON
> The change is not 100% semantically equivalent, as the warning will now
> also happen when the pointer is NULL.
> 
> To: Julia Lawall <Julia.Lawall@inria.fr>
> To: Nicolas Palix <nicolas.palix@imag.fr>
> Cc: cocci@inria.fr
> Cc: linux-kernel@vger.kernel.org
> 
> ---
> drivers/clocksource/mips-gic-timer.c:283 looks suspicious: ret != clk,
> but Daniel Lezcano verified it as correct.
> 
> There are some cases where the checks are part of a larger expression:
> - mm/kmemleak.c:1095
> - mm/kmemleak.c:1155
> - mm/kmemleak.c:1173
> - mm/kmemleak.c:1290
> - mm/kmemleak.c:1328
> - mm/kmemleak.c:1241
> - mm/kmemleak.c:1310
> - mm/kmemleak.c:1258
> - net/netlink/af_netlink.c:2670
> Thanks to Julia Lawall for the help to also handle them.
> 
> Signed-off-by: Philipp Hahn <phahn-oss@avm.de>
> ---
>  scripts/coccinelle/api/is_err_or_null.cocci | 125 ++++++++++++++++++++++++++++
>  1 file changed, 125 insertions(+)
> 

Neither this, nor the attempt from 2011, nor any future attempt should be
accepted, because it creates the impression that IS_ERR_OR_NULL is somehow
okay. No, it is not okay; it is a discouraged pattern leading to less
readable and less maintainable code. We should therefore not have any tools
suggesting usage of IS_ERR_OR_NULL, because people will be converting poor
code into that instead of fixing the poor code.

Best regards,
Krzysztof

^ permalink raw reply

* [PATCH] mshv: remove page order restriction to enable 1G hugepage support
From: Anirudh Rayabharam (Microsoft) @ 2026-04-16 13:37 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li
  Cc: linux-hyperv, linux-kernel, Anirudh Rayabharam (Microsoft)

The hypervisor's map GPA hypercall handles large pages intelligently,
combining 2M pages into 1G mappings when alignment allows.

Remove the PMD_ORDER check in mshv_chunk_stride() so that 1G hugepages
and other large page orders are passed through as 2M-aligned chunks,
letting the hypervisor promote them to 1G mappings automatically.
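As a rough illustration of what "1 << page_order" evaluates to, here is a
userspace sketch. It assumes 4 KiB base pages, where a 2M huge page is
order 9 and a 1G huge page is order 18; DEMO_PAGE_SHIFT and the helper
names are hypothetical and not part of the driver.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB base pages */

/* Stride in base pages for a folio of the given order. */
static uint64_t stride_pages(unsigned int page_order)
{
	return 1ULL << page_order;
}

/* The same stride expressed in bytes. */
static uint64_t stride_bytes(unsigned int page_order)
{
	return stride_pages(page_order) << DEMO_PAGE_SHIFT;
}
```

Under these assumptions stride_pages(9) is 512 pages (2 MiB) and
stride_pages(18) is 262144 pages (1 GiB), i.e. 512 chunks of 2 MiB.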

Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
---
 drivers/hv/mshv_regions.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
index fdffd4f002f6..5f617a96d97a 100644
--- a/drivers/hv/mshv_regions.c
+++ b/drivers/hv/mshv_regions.c
@@ -29,7 +29,7 @@
  * Uses huge page stride if the backing page is huge and the guest mapping
  * is properly aligned; otherwise falls back to single page stride.
  *
- * Return: Stride in pages, or -EINVAL if page order is unsupported.
+ * Return: Stride in pages.
  */
 static int mshv_chunk_stride(struct page *page,
 			     u64 gfn, u64 page_count)
@@ -47,9 +47,6 @@ static int mshv_chunk_stride(struct page *page,
 		return 1;

 	page_order = folio_order(page_folio(page));
-	/* The hypervisor only supports 2M huge page */
-	if (page_order != PMD_ORDER)
-		return -EINVAL;

 	return 1 << page_order;
 }

---
base-commit: cd9f2e7d6e5b1837ef40b96e300fa28b73ab5a77
change-id: 20260416-huge_1g-e44461393c8f

Best regards,
-- 
Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>

^ permalink raw reply related

* [PATCH] Drivers: hv: mshv: add bounds check on vp_index in mshv_intercept_isr()
From: Junrui Luo @ 2026-04-16 14:18 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Jinank Jain, Praveen K Paladugu, Mukesh Rathor, Nuno Das Neves,
	Anirudh Rayabharam
  Cc: Roman Kisel, Muminul Islam, linux-hyperv, linux-kernel,
	Stanislav Kinsburskii, Yuhao Jiang, Junrui Luo

mshv_intercept_isr() extracts vp_index from the hypervisor message
payload and uses it directly to index into pt_vp_array without
validation. handle_bitset_message() and handle_pair_message() already
validate vp_index against MSHV_MAX_VPS before array access.

A vp_index exceeding MSHV_MAX_VPS leads to an out-of-bounds read from
pt_vp_array.

Add the same MSHV_MAX_VPS bounds check for consistency with the other
message handlers.
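The pattern is the usual "validate the index before dereferencing" one. A
minimal userspace sketch, using hypothetical demo_* names and a small
DEMO_MAX_VPS in place of MSHV_MAX_VPS:

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_VPS 8	/* hypothetical; stands in for MSHV_MAX_VPS */

struct demo_vp {
	int id;
};

struct demo_partition {
	struct demo_vp *vp_array[DEMO_MAX_VPS];
};

/*
 * Look up a VP by an index taken from an untrusted message.
 * Returns NULL both for an out-of-range index and for an empty slot,
 * so the array is never read past its end.
 */
static struct demo_vp *demo_lookup_vp(struct demo_partition *pt,
				      uint32_t vp_index)
{
	if (vp_index >= DEMO_MAX_VPS)
		return NULL;

	return pt->vp_array[vp_index];
}
```

Because vp_index is unsigned, the single ">=" comparison is enough; there
is no negative case to handle separately.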

Fixes: 621191d709b1 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs")
Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
---
 drivers/hv/mshv_synic.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/hv/mshv_synic.c b/drivers/hv/mshv_synic.c
index 43f1bcbbf2d3..5bceb8122981 100644
--- a/drivers/hv/mshv_synic.c
+++ b/drivers/hv/mshv_synic.c
@@ -384,6 +384,10 @@ mshv_intercept_isr(struct hv_message *msg)
 	 */
 	vp_index =
 	       ((struct hv_opaque_intercept_message *)msg->u.payload)->vp_index;
+	if (unlikely(vp_index >= MSHV_MAX_VPS)) {
+		pr_debug("VP index %u out of bounds\n", vp_index);
+		goto unlock_out;
+	}
 	vp = partition->pt_vp_array[vp_index];
 	if (unlikely(!vp)) {
 		pr_debug("failed to find VP %u\n", vp_index);

---
base-commit: 7aaa8047eafd0bd628065b15757d9b48c5f9c07d
change-id: 20260416-fixes-693196e52f93

Best regards,
-- 
Junrui Luo <moonafterrain@outlook.com>


^ permalink raw reply related

* Re: [PATCH net-next v6 0/2] net: mana: add ethtool private flag for full-page RX buffers
From: Jakub Kicinski @ 2026-04-16 15:31 UTC (permalink / raw)
  To: Dipayaan Roy
  Cc: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	pabeni, leon, longli, kotaranov, horms, shradhagupta, ssengar,
	ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, stephen, jacob.e.keller, leitao, kees, john.fastabend,
	hawk, bpf, daniel, ast, sdf, dipayanroy
In-Reply-To: <ad5kuCZz+gR1TlSh@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>

On Tue, 14 Apr 2026 09:00:56 -0700 Dipayaan Roy wrote:
> I still see roughly a 5% overhead from the atomic refcount operation
> itself, but on that platform there is no throughput drop when using
> page fragments versus full-page mode.

That seems to contradict your claim that it's a problem with a specific
platform. Since we're in the merge window I asked David Wei to try to
experiment with disabling page fragmentation on the ARM64 platforms we
have at Meta. If it repros we should use the generic rx-buf-len
ringparam because more NICs may want to implement this strategy.

^ permalink raw reply

* [PATCH] Drivers: hv: vmbus: Improve the logc of reserving fb_mmio on Gen2 VMs
From: Dexuan Cui @ 2026-04-16 18:35 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel,
	mhklinux, matthew.ruffell, johansen
  Cc: stable

If vmbus_reserve_fb() in the kdump kernel fails to properly reserve the
framebuffer MMIO range due to a Gen2 VM's screen.lfb_base being zero [1],
there is an MMIO conflict between the drivers hyperv_drm and pci-hyperv.
This is especially an issue if pci-hyperv is built-in and hyperv_drm is
built as a module. Consequently, the kdump kernel fails to detect PCI
devices via pci-hyperv, and may fail to mount the root file system,
which may reside on an NVMe disk.

On Gen2 VMs, if the screen.lfb_base is 0 in the kdump kernel, fall
back to the low MMIO base, which should be equal to the framebuffer
MMIO base (Tested on x64 Windows Server 2016, and on x64 and ARM64 Windows
Server 2025 and on Azure) [2]. In the first kernel, screen.lfb_base
is not 0; if the user specifies a high resolution, it's not enough to
only reserve 8MB: in this case, reserve half of the space below 4GB, but
cap the reservation to 128MB, which is the required framebuffer size of
the highest resolution 7680*4320 supported by Hyper-V.
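The selection of the reserved range can be sketched in userspace as below.
This is a simplified model, not the patch itself: it leaves out the sanity
check on the low MMIO base, and struct fb_range and pick_fb_range() are
names made up for illustration.

```c
#include <stdint.h>

#define SZ_128M 0x08000000ULL
#define SZ_4G   0x100000000ULL

struct fb_range {
	uint64_t start;
	uint64_t size;
};

/*
 * lfb_base may be 0 (the kdump case); low_mmio_base is assumed to be a
 * valid, non-zero address below 4GB.
 */
static struct fb_range pick_fb_range(uint64_t lfb_base, uint64_t low_mmio_base)
{
	struct fb_range r;
	uint64_t half;

	/* Fall back to the low MMIO base when lfb_base is 0. */
	r.start = lfb_base ? lfb_base : low_mmio_base;

	/* Half of the space below 4GB, capped at 128MB. */
	half = (SZ_4G - r.start) / 2;
	r.size = half < SZ_128M ? half : SZ_128M;

	return r;
}
```

For a low MMIO base of 0xe0000000 and lfb_base == 0 this yields a 128MB
reservation starting at 0xe0000000. A 7680*4320 framebuffer at 4 bytes per
pixel needs 132710400 bytes, which fits in 128MB.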

Add the cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) check, because a CoCo
VM (i.e. Confidential VM) on Hyper-V doesn't have any framebuffer
device, so there is no need to reserve any MMIO for it.

While at it, fix the comparison "end > VTPM_BASE_ADDRESS" by changing
the > to >=. Here the 'end' is an inclusive end (typically, it's
0xFFFF_FFFF).
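A userspace sketch of the corrected truncation, with 'end' treated as an
inclusive address. VTPM_BASE below is assumed to match the driver's
VTPM_BASE_ADDRESS, and truncate_at_vtpm() is a hypothetical helper:

```c
#include <stdint.h>

#define VTPM_BASE 0xfed40000ULL	/* assumed value of VTPM_BASE_ADDRESS */

/*
 * Given an inclusive [start, end] range, return the (possibly truncated)
 * inclusive end so that the range stops just below the vTPM region.
 */
static uint64_t truncate_at_vtpm(uint64_t start, uint64_t end)
{
	if (end >= VTPM_BASE && start < VTPM_BASE)
		end = VTPM_BASE - 1;

	return end;
}
```

With the old "end > VTPM_BASE_ADDRESS" test, a range ending exactly at
VTPM_BASE_ADDRESS would have been left untouched and would still have
covered the first byte of the vTPM region.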

[1] https://lore.kernel.org/all/SA1PR21MB692176C1BC53BFC9EAE5CF8EBF51A@SA1PR21MB6921.namprd21.prod.outlook.com/
[2] https://lore.kernel.org/all/SA1PR21MB69218F955B62DFF62E3E88D2BF222@SA1PR21MB6921.namprd21.prod.outlook.com/

Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
CC: stable@vger.kernel.org
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/hv/vmbus_drv.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index f0d0803d1e16..a0b34f9e426a 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -37,6 +37,7 @@
 #include <linux/dma-map-ops.h>
 #include <linux/pci.h>
 #include <linux/export.h>
+#include <linux/cc_platform.h>
 #include <clocksource/hyperv_timer.h>
 #include <asm/mshyperv.h>
 #include "hyperv_vmbus.h"
@@ -2327,8 +2328,8 @@ static acpi_status vmbus_walk_resources(struct acpi_resource *res, void *ctx)
 		return AE_NO_MEMORY;

 	/* If this range overlaps the virtual TPM, truncate it. */
-	if (end > VTPM_BASE_ADDRESS && start < VTPM_BASE_ADDRESS)
-		end = VTPM_BASE_ADDRESS;
+	if (end >= VTPM_BASE_ADDRESS && start < VTPM_BASE_ADDRESS)
+		end = VTPM_BASE_ADDRESS - 1;

 	new_res->name = "hyperv mmio";
 	new_res->flags = IORESOURCE_MEM;
@@ -2395,13 +2396,36 @@ static void vmbus_mmio_remove(void)
 static void __maybe_unused vmbus_reserve_fb(void)
 {
 	resource_size_t start = 0, size;
+	resource_size_t low_mmio_base;
 	struct pci_dev *pdev;

+	/* Hyper-V CoCo guests do not have a framebuffer device. */
+	if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
+		return;
+
 	if (efi_enabled(EFI_BOOT)) {
 		/* Gen2 VM: get FB base from EFI framebuffer */
 		if (IS_ENABLED(CONFIG_SYSFB)) {
 			start = sysfb_primary_display.screen.lfb_base;
 			size = max_t(__u32, sysfb_primary_display.screen.lfb_size, 0x800000);
+
+			low_mmio_base = hyperv_mmio->start;
+			if (!low_mmio_base || low_mmio_base >= SZ_4G ||
+			    (start && start < low_mmio_base)) {
+				pr_warn("Unexpected low mmio base 0x%pa\n", &low_mmio_base);
+			} else {
+				/*
+				 * If the kdump kernel's lfb_base is 0,
+				 * fall back to the low mmio base.
+				 */
+				if (!start)
+					start = low_mmio_base;
+				/*
+				 * Reserve half of the space below 4GB for high
+				 * resolutions, but cap the reservation to 128MB.
+				 */
+				size = min((SZ_4G - start) / 2, SZ_128M);
+			}
 		}
 	} else {
 		/* Gen1 VM: get FB base from PCI */
@@ -2433,6 +2457,8 @@ static void __maybe_unused vmbus_reserve_fb(void)
 	 */
 	for (; !fb_mmio && (size >= 0x100000); size >>= 1)
 		fb_mmio = __request_region(hyperv_mmio, start, size, fb_mmio_name, 0);
+
+	pr_info("hv_mmio=%pR,%pR fb=%pR\n", hyperv_mmio, hyperv_mmio->sibling, fb_mmio);
 }

 /**
-- 
2.34.1

^ permalink raw reply related

* RE: [PATCH v2] PCI: hv: Allocate MMIO from above 4GB for the config window
From: Dexuan Cui @ 2026-04-16 18:49 UTC (permalink / raw)
  To: Michael Kelley, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org,
	Long Li, lpieralisi@kernel.org, kwilczynski@kernel.org,
	mani@kernel.org, robh@kernel.org, bhelgaas@google.com,
	Jake Oshins, linux-hyperv@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	matthew.ruffell@canonical.com, kjlx@templeofstupid.com
  Cc: Krister Johansen, stable@vger.kernel.org
In-Reply-To: <SA1PR21MB69215C164B06109C6682984EBF5BA@SA1PR21MB6921.namprd21.prod.outlook.com>

> > ...
> > This approach could be taken one step further, where vmbus_reserve_fb()
> > *always* reserves 64 MiB starting at the low end of low MMIO space,
> > regardless of the value of "start". The messy code for getting "start"
> > could be dropped entirely, and the dependency on CONFIG_SYSFB goes
> > away. Or maybe still get the value of "start" and "size", and if non-zero
> > just do a sanity check that they are within the fixed 64 MiB reserved area.

My earlier reply yesterday explains why we shouldn't get rid of the
screen.lfb_base. I'm trying to make as few assumptions as possible.

> > Thoughts? To me tweaking vmbus_reserve_fb() is a more
> > straightforward and explicit way to do the reserving, vs. modifying
> > the requested range in the Hyper-V PCI driver.
> 
> Agreed. Let me try to make a new patch for review.

I just posted a patch here:
https://lwn.net/ml/linux-kernel/20260416183529.838321-1-decui%40microsoft.com/
Please review.

The new patch changes the vmbus driver. With it, the previous v2 pci-hyperv
patch is no longer necessary.

Thanks,
-- Dexuan

^ permalink raw reply

* [PATCH net v2] hv_sock: Report EOF instead of -EIO for FIN
From: Dexuan Cui @ 2026-04-16 19:14 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli, sgarzare, davem, edumazet,
	kuba, pabeni, horms, niuxuewei.nxw, linux-hyperv, virtualization,
	netdev, linux-kernel
  Cc: stable, Ben Hillis, Mitchell Levy

Commit f0c5827d07cb unluckily causes a regression for the FIN packet,
and the final read syscall gets an error rather than 0.

Ideally, we would want to fix hvs_channel_readable_payload() so that it
could return 0 in the FIN scenario, but it's not good for the hv_sock
driver to use the VMBus ringbuffer's cached priv_read_index, which is
internal data in the VMBus driver.

Fix the regression in hv_sock by returning 0 rather than -EIO.
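The behaviour userspace expects can be illustrated with an ordinary pipe,
used here only as an analogy for a stream socket whose peer has shut down:
once all data is consumed and the writer is gone, read() returns 0, not an
error. This is not hv_sock code, and read_after_writer_close() is a
hypothetical helper.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns the result of the read() issued after all data has been
 * consumed and the writer has closed its end, or -2 on setup failure.
 */
static ssize_t read_after_writer_close(void)
{
	int fds[2];
	char buf[8];
	ssize_t n;

	if (pipe(fds) != 0)
		return -2;

	if (write(fds[1], "hi", 2) != 2) {
		close(fds[0]);
		close(fds[1]);
		return -2;
	}

	/* Writer is done: roughly analogous to the peer sending FIN. */
	close(fds[1]);

	/* First read drains the two buffered bytes. */
	n = read(fds[0], buf, sizeof(buf));

	/* Second read sees no data and no writer: end of file. */
	if (n == 2)
		n = read(fds[0], buf, sizeof(buf));

	close(fds[0]);
	return n;
}
```

A reader loop that treats a return value of 0 as "peer closed" and a
negative value as a failure will misreport a clean shutdown if the final
read fails with EIO instead.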

Fixes: f0c5827d07cb ("hv_sock: Return the readable bytes in hvs_stream_has_data()")
Cc: stable@vger.kernel.org
Reported-by: Ben Hillis <Ben.Hillis@microsoft.com>
Reported-by: Mitchell Levy <levymitchell0@gmail.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

Changes since v1:
    Removed the local variable 'need_refill' to make the code more 
    readable. Stefano, thanks!

    No other change.

 net/vmw_vsock/hyperv_transport.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 069386a74557..e5ee7aa14d0c 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -694,7 +694,6 @@ static ssize_t hvs_stream_enqueue(struct vsock_sock *vsk, struct msghdr *msg,
 static s64 hvs_stream_has_data(struct vsock_sock *vsk)
 {
 	struct hvsock *hvs = vsk->trans;
-	bool need_refill;
 	s64 ret;
 
 	if (hvs->recv_data_len > 0)
@@ -702,9 +701,22 @@ static s64 hvs_stream_has_data(struct vsock_sock *vsk)
 
 	switch (hvs_channel_readable_payload(hvs->chan)) {
 	case 1:
-		need_refill = !hvs->recv_desc;
-		if (!need_refill)
-			return -EIO;
+		if (hvs->recv_desc) {
+			/* Here hvs->recv_data_len is 0, so hvs->recv_desc must
+			 * be NULL unless it points to the 0-byte-payload FIN
+			 * packet: see hvs_update_recv_data().
+			 *
+			 * Here all the payload has been dequeued, but
+			 * hvs_channel_readable_payload() still returns 1,
+			 * because the VMBus ringbuffer's read_index is not
+			 * updated for the FIN packet: hvs_stream_dequeue() ->
+			 * hv_pkt_iter_next() updates the cached priv_read_index
+			 * but has no opportunity to update the read_index in
+			 * hv_pkt_iter_close() as hvs_stream_has_data() returns
+			 * 0 for the FIN packet, so it won't get dequeued.
+			 */
+			return 0;
+		}
 
 		hvs->recv_desc = hv_pkt_iter_first(hvs->chan);
 		if (!hvs->recv_desc)
-- 
2.49.0


^ permalink raw reply related

* RE: [EXTERNAL] Re: [PATCH net] hv_sock: Report EOF instead of -EIO for FIN
From: Dexuan Cui @ 2026-04-16 19:30 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org, Long Li,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, horms@kernel.org, niuxuewei.nxw@antgroup.com,
	linux-hyperv@vger.kernel.org, virtualization@lists.linux.dev,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org, Ben Hillis, Mitchell Levy
In-Reply-To: <SA1PR21MB6921C57E27E17305E56BC0F9BF222@SA1PR21MB6921.namprd21.prod.outlook.com>

> From: Dexuan Cui
> Sent: Wednesday, April 15, 2026 9:56 AM
> To: 'Stefano Garzarella' <sgarzare@redhat.com>
> > ...
> > Can we drop `need_refill` entirely and just check `hvs->recv_desc` here?
> 
> OK. Will post v2 later today.
> 
> > Mainly because now the comment we are adding is confusing me about what
> > `need_refill` means.
> >
> > The rest LGTM.
> >
> > Thanks,
> > Stefano

Hi Stefano, I just posted v2 here:
https://lore.kernel.org/linux-hyperv/20260416191433.840637-1-decui@microsoft.com/T/#u

^ permalink raw reply

* RE: [PATCH] Drivers: hv: vmbus: Improve the logc of reserving fb_mmio on Gen2 VMs
From: Dexuan Cui @ 2026-04-16 19:58 UTC (permalink / raw)
  To: Dexuan Cui, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org,
	Long Li, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, mhklinux@outlook.com,
	matthew.ruffell@canonical.com, johansen@templeofstupid.com
  Cc: stable@vger.kernel.org
In-Reply-To: <20260416183529.838321-1-decui@microsoft.com>

> Subject: [PATCH] Drivers: hv: vmbus: Improve the logc of reserving fb_mmio on
> Gen2 VMs

Sorry for the typo in the subject -- the "logc" should be "logic". If this is the only
issue, I guess Wei can fix it for me :-)


^ permalink raw reply

* Re: [PATCH net-next 0/2] net: mana: Avoid queue struct allocation failure under memory fragmentation
From: Jakub Kicinski @ 2026-04-17  2:08 UTC (permalink / raw)
  To: Aditya Garg
  Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, pabeni, kotaranov, horms, ssengar, jacob.e.keller,
	dipayanroy, ernis, shirazsaleem, kees, sbhatta, leitao, netdev,
	linux-hyperv, linux-kernel, linux-rdma, bpf, gargaditya
In-Reply-To: <20260414151456.687506-1-gargaditya@linux.microsoft.com>

On Tue, 14 Apr 2026 08:13:28 -0700 Aditya Garg wrote:
> The MANA driver can fail to load on systems with high memory
> utilization because several allocations in the queue setup paths
> require large physically contiguous blocks via kmalloc. Under memory
> fragmentation these high-order allocations may fail, preventing the
> driver from creating queues at probe time or when reconfiguring
> channels, ring parameters or MTU at runtime.

## Form letter - net-next-closed

We have already submitted our pull request with net-next material for v7.1,
and therefore net-next is closed for new drivers, features, code refactoring
and optimizations. We are currently accepting bug fixes only.

Please repost when net-next reopens after Apr 27th.

RFC patches sent for review only are obviously welcome at any time.

See: https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#development-cycle
-- 
pw-bot: defer
pv-bot: closed

^ permalink raw reply

* Re: [PATCH net v2] hv_sock: Report EOF instead of -EIO for FIN
From: Stefano Garzarella @ 2026-04-17  8:11 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: kys, haiyangz, wei.liu, longli, davem, edumazet, kuba, pabeni,
	horms, niuxuewei.nxw, linux-hyperv, virtualization, netdev,
	linux-kernel, stable, Ben Hillis, Mitchell Levy
In-Reply-To: <20260416191433.840637-1-decui@microsoft.com>

On Thu, Apr 16, 2026 at 12:14:33PM -0700, Dexuan Cui wrote:
>Commit f0c5827d07cb unluckily causes a regression for the FIN packet,
>and the final read syscall gets an error rather than 0.
>
>Ideally, we would want to fix hvs_channel_readable_payload() so that it
>could return 0 in the FIN scenario, but it's not good for the hv_sock
>driver to use the VMBus ringbuffer's cached priv_read_index, which is
>internal data in the VMBus driver.
>
>Fix the regression in hv_sock by returning 0 rather than -EIO.
>
>Fixes: f0c5827d07cb ("hv_sock: Return the readable bytes in hvs_stream_has_data()")
>Cc: stable@vger.kernel.org
>Reported-by: Ben Hillis <Ben.Hillis@microsoft.com>
>Reported-by: Mitchell Levy <levymitchell0@gmail.com>
>Signed-off-by: Dexuan Cui <decui@microsoft.com>
>---
>
>Changes since v1:
>    Removed the local variable 'need_refill' to make the code more
>    readable. Stefano, thanks!

Thanks for the fix!

>
>    No other change.
>
> net/vmw_vsock/hyperv_transport.c | 20 ++++++++++++++++----
> 1 file changed, 16 insertions(+), 4 deletions(-)

Acked-by: Stefano Garzarella <sgarzare@redhat.com>


^ permalink raw reply

* Re: [PATCH net v3 1/5] net: mana: Init link_change_work before potential error paths in probe
From: Simon Horman @ 2026-04-17 14:08 UTC (permalink / raw)
  To: Erni Sri Satya Vennela
  Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, ssengar, dipayanroy, gargaditya,
	shirazsaleem, kees, kotaranov, leon, shacharr, stephen,
	linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260415080944.732901-2-ernis@linux.microsoft.com>

On Wed, Apr 15, 2026 at 01:09:37AM -0700, Erni Sri Satya Vennela wrote:
> Move INIT_WORK(link_change_work) to right after the mana_context
> allocation, before any error path that could reach mana_remove().
> 
> Previously, if mana_create_eq() or mana_query_device_cfg() failed,
> mana_probe() would jump to the error path which calls mana_remove().
> mana_remove() unconditionally calls disable_work_sync(link_change_work),
> but the work struct had not been initialized yet. This can trigger
> debug object warnings with CONFIG_DEBUG_OBJECTS_WORK enabled.
> 
> Fixes: 54133f9b4b53 ("net: mana: Support HW link state events")
> Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
> ---
> Changes in v3:
> * No change.
> Changes in v2:
> * Apply the patch in net instead of net-next.

Reviewed-by: Simon Horman <horms@kernel.org>
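
The ordering problem described in the quoted commit message can be
illustrated with a small userspace sketch. This is not the MANA code:
toy_work, toy_ctx, toy_probe() and toy_remove() are hypothetical
stand-ins, and the "bad_cancels" counter only models the misuse that the
kernel's debug-object checks would report. The sketch assumes a remove
path that always cancels the work item, as the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a work item; func == NULL means "never initialized". */
struct toy_work {
	void (*func)(struct toy_work *w);
};

/* Hypothetical stand-in for the driver context. */
struct toy_ctx {
	struct toy_work link_work;
	int bad_cancels;	/* cancels issued on an uninitialized work item */
};

static void toy_handler(struct toy_work *w)
{
	(void)w;
}

static void toy_init_work(struct toy_work *w)
{
	w->func = toy_handler;
}

/* Models an unconditional cancel in the remove path. */
static void toy_remove(struct toy_ctx *c)
{
	assert(c != NULL);
	if (!c->link_work.func)
		c->bad_cancels++;	/* the misuse the patch avoids */
}

/*
 * init_early selects the fixed ordering (init before any error path);
 * fail_setup simulates a failure in an early setup step.
 */
static int toy_probe(struct toy_ctx *c, bool init_early, bool fail_setup)
{
	c->link_work.func = NULL;
	c->bad_cancels = 0;

	if (init_early)
		toy_init_work(&c->link_work);

	if (fail_setup) {
		toy_remove(c);	/* error path reaches remove */
		return -1;
	}

	if (!init_early)
		toy_init_work(&c->link_work);

	return 0;
}
```

With the late ordering, a failed setup step leaves bad_cancels at 1;
with the early ordering the same failure leaves it at 0, which is the
property the patch is after.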


^ permalink raw reply

* Re: [PATCH net v3 2/5] net: mana: Init gf_stats_work before potential error paths in probe
From: Simon Horman @ 2026-04-17 14:08 UTC (permalink / raw)
  To: Erni Sri Satya Vennela
  Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, ssengar, dipayanroy, gargaditya,
	shirazsaleem, kees, kotaranov, leon, shacharr, stephen,
	linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260415080944.732901-3-ernis@linux.microsoft.com>

On Wed, Apr 15, 2026 at 01:09:38AM -0700, Erni Sri Satya Vennela wrote:
> Move INIT_DELAYED_WORK(gf_stats_work) to before mana_create_eq(),
> while keeping schedule_delayed_work() at its original location.
> 
> Previously, if any function between mana_create_eq() and the
> INIT_DELAYED_WORK call failed, mana_probe() would call mana_remove()
> which unconditionally calls cancel_delayed_work_sync(gf_stats_work)
> on the uninitialized work struct. This can trigger a warning
> in __flush_work() or debug object warnings with
> CONFIG_DEBUG_OBJECTS_WORK enabled.
> 
> Fixes: be4f1d67ec56 ("net: mana: Add standard counter rx_missed_errors")
> Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
> ---
> Changes in v3:
> * No change
> Changes in v2:
> * Apply the patch in net instead of net-next.

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply

* Re: [PATCH net v3 3/5] net: mana: Guard mana_remove against double invocation
From: Simon Horman @ 2026-04-17 14:09 UTC (permalink / raw)
  To: ernis
  Cc: Simon Horman, haiyangz, davem, longli, ssengar, linux-kernel,
	kuba, kotaranov, stephen, andrew+netdev, kys, wei.liu,
	shirazsaleem, kees, pabeni, linux-hyperv, edumazet, decui, leon,
	gargaditya, dipayanroy, netdev, shacharr
In-Reply-To: <20260415080944.732901-4-ernis@linux.microsoft.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.

For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
net: mana: Guard mana_remove against double invocation

If PM resume fails (e.g., mana_attach() returns an error), mana_probe()
calls mana_remove(), which tears down the device and sets
gd->gdma_context = NULL and gd->driver_data = NULL. However, a failed
resume callback does not automatically unbind the driver. When the device
is eventually unbound, mana_remove() is invoked a second time. Without a
NULL check, it dereferences gc->dev with gc == NULL, causing a kernel
panic. Add an early return if gdma_context or driver_data is NULL so the
second invocation is harmless.

> Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")

The Fixes: tag points to ca9c54d2d6a5, which introduced mana_remove()
without NULL checks. However, the exploitable double invocation scenario
was actually introduced by commit 635096a86edb ("net: mana: Support
hibernation and kexec"), which added suspend/resume support and made
mana_probe() call mana_remove() on error.

Should the Fixes: tag be:

Fixes: 635096a86edb ("net: mana: Support hibernation and kexec")

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index 9d095a6fb56c..2ab7c89e2fed 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -3685,11 +3685,16 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
>  	struct gdma_context *gc = gd->gdma_context;
>  	struct mana_context *ac = gd->driver_data;
>  	struct mana_port_context *apc;
> -	struct device *dev = gc->dev;
> +	struct device *dev;
>  	struct net_device *ndev;
>  	int err;
>  	int i;
>
> +	if (!gc || !ac)
> +		return;
> +
> +	dev = gc->dev;
> +
>  	disable_work_sync(&ac->link_change_work);
>  	cancel_delayed_work_sync(&ac->gf_stats_work);

^ permalink raw reply
