* [PATCH] Drivers: hv: vmbus: fix typo in function name reference
From: Julia Lawall @ 2025-12-30 14:14 UTC (permalink / raw)
To: K. Y. Srinivasan
Cc: yunbolyu, kexinsun, ratnadiraw, xutong.ma, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, linux-hyperv, linux-kernel
Replace cmxchg by cmpxchg.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
---
drivers/hv/hyperv_vmbus.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index b2862e0a317a..cdbc5f5c3215 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -375,7 +375,7 @@ static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type)
return;
/*
- * The cmxchg() above does an implicit memory barrier to
+ * The cmpxchg() above does an implicit memory barrier to
* ensure the write to MessageType (ie set to
* HVMSG_NONE) happens before we read the
* MessagePending and EOMing. Otherwise, the EOMing
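For readers unfamiliar with the ordering requirement this comment describes, here is a minimal userspace sketch using C11 atomics. It is not the kernel code: struct msg_slot, MSG_NONE and release_slot() are hypothetical stand-ins, and the sequentially consistent compare-exchange only approximates the full barrier that the kernel's cmpxchg() implies on success.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSG_NONE 0u	/* hypothetical stand-in for HVMSG_NONE */

/* Hypothetical, simplified message slot; not the real struct hv_message. */
struct msg_slot {
	_Atomic uint32_t message_type;
	_Atomic bool msg_pending;
};

/*
 * Mark the slot free, then check whether another message is pending.
 * Returns true when the caller should signal end-of-message to the host.
 */
static bool release_slot(struct msg_slot *slot, uint32_t old_type)
{
	uint32_t expected = old_type;

	assert(slot != NULL);

	/* seq_cst compare-exchange; stands in for the kernel's cmpxchg(). */
	if (!atomic_compare_exchange_strong(&slot->message_type, &expected,
					    MSG_NONE))
		return false;	/* the slot no longer holds old_type */

	/* Ordered after the store above, so a pending message is not missed. */
	return atomic_load(&slot->msg_pending);
}
```

If the read of the pending flag could be reordered before the write of MSG_NONE, the host might queue a message in between and neither side would signal it, which is the situation the comment warns about.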
* Re: [PATCH 1/3] drivers: video: fbdev: Remove hyperv_fb driver
From: Helge Deller @ 2025-12-30 9:06 UTC (permalink / raw)
To: Prasanna Kumar T S M, linux-fbdev, dri-devel, linux-hyperv,
ssengar, mhklinux, wei.liu, kys, haiyangz, decui
Cc: linux-kernel
In-Reply-To: <1766809486-24731-1-git-send-email-ptsm@linux.microsoft.com>
On 12/27/25 05:24, Prasanna Kumar T S M wrote:
> The HyperV DRM driver has been available since 5.14. This makes the hyperv_fb
> driver redundant, so remove it.
>
> Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
> ---
> MAINTAINERS | 10 -
> drivers/video/fbdev/Kconfig | 11 -
> drivers/video/fbdev/Makefile | 1 -
> drivers/video/fbdev/hyperv_fb.c | 1388 -------------------------------
> 4 files changed, 1410 deletions(-)
> delete mode 100644 drivers/video/fbdev/hyperv_fb.c
applied to fbdev git tree.
Thanks!
Helge
* RE: [PATCH 1/1] Drivers: hv: Fix uninit'ed variable in hv_kmsg_dump() if CONFIG_PRINTK not set
From: Michael Kelley @ 2025-12-30 3:04 UTC (permalink / raw)
To: vdso@mailbox.org, mhkelley58@gmail.com
Cc: haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com,
kys@microsoft.com, linux-kernel@vger.kernel.org,
linux-hyperv@vger.kernel.org, dan.carpenter@linaro.org
In-Reply-To: <21281086.36492.1766981516854@app.mailbox.org>
From: vdso@mailbox.org <vdso@mailbox.org> Sent: Sunday, December 28, 2025 8:12 PM
>
> > On 12/19/2025 8:08 AM mhkelley58@gmail.com wrote:
[snip]
> > @@ -198,9 +199,9 @@ static void hv_kmsg_dump(struct kmsg_dumper *dumper,
> > * be single-threaded.
> > */
> > kmsg_dump_rewind(&iter);
> > - kmsg_dump_get_buffer(&iter, false, hv_panic_page, HV_HYP_PAGE_SIZE,
> > - &bytes_written);
> > - if (!bytes_written)
> > + ret = kmsg_dump_get_buffer(&iter, false, hv_panic_page, HV_HYP_PAGE_SIZE,
> > + &bytes_written);
> > + if (!ret || !bytes_written)
> > return;
> > /*
> > * P3 to contain the physical address of the panic page & P4 to
>
> The existing code
>
> 1. doesn't care about the return value from kmsg_dump_get_buffer.
> The return value wouldn't make the function return before, why does that
> need to change?
The existing code depends on the implementation of kmsg_dump_get_buffer()
always setting bytes_written, even if it fails. That's atypical behavior, but it is
what kmsg_dump_get_buffer() does -- except that if CONFIG_PRINTK=n, the
stub kmsg_dump_get_buffer() does *not* do that. Testing the return value is
the more typical pattern, and bytes_written should be used only if the return
value indicates success. So that's why I proposed this change, instead of just
initializing bytes_written to zero when it is defined. My proposed change
makes the overall pattern more typical, and would work if the implementation
of kmsg_dump_get_buffer() should ever change to not set bytes_written in
some error case.
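The pattern described above (use the out-parameter only after the return value reports success) can be sketched in plain userspace C. This is only an illustration under stated assumptions: get_buffer_stub(), get_buffer_full() and usable_bytes() are hypothetical names, not kernel functions, and the stub merely mimics a helper that fails without writing its length argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stub: fails and never writes *len. */
static bool get_buffer_stub(char *buf, size_t size, size_t *len)
{
	(void)buf;
	(void)size;
	(void)len;
	return false;
}

/* Hypothetical full version: always sets *len, even on failure. */
static bool get_buffer_full(char *buf, size_t size, size_t *len)
{
	assert(buf != NULL && len != NULL);

	if (size == 0) {
		*len = 0;
		return false;
	}
	buf[0] = 'x';
	*len = 1;
	return true;
}

typedef bool (*get_buffer_fn)(char *buf, size_t size, size_t *len);

/* Returns the number of usable bytes; reads len only after success. */
static size_t usable_bytes(get_buffer_fn get, char *buf, size_t size)
{
	size_t len;	/* deliberately left uninitialized */

	if (!get(buf, size, &len))
		return 0;	/* len may be garbage here, so never read it */
	return len;
}
```

With this shape the caller behaves the same whichever helper is linked in, because the uninitialized value is never read on the failure path.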
>
> 2. returns early when there are no bytes written.
> I think it shouldn't as otherwise the crash control register isn't written to,
> and the panic isn't signalled to the host. Is there another path maybe that
> I'm not noticing?
You make an excellent point. I didn't even think about the possibility of the
current logic being wrong. There is hyperv_report_panic(), but it is not called
if hv_panic_page is allocated, in order to avoid duplicate reports. I agree that
this code should go ahead and send the panic report even if there's no
message data. And in that case the discussion about testing the return value
from kmsg_dump_get_buffer() is moot.
I'll submit a new version of the patch that sends the panic report to
the host even if the message length is zero. I did a quick test of that case,
and it behaves like the case where HV_CRASH_CTL_CRASH_NOTIFY_MSG
is not set, which is fine. Avoiding the uninitialized bytes_written will
fall out of that change.
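A rough userspace sketch of that direction, assuming nothing about the final patch: the length is initialized up front, the helper's return value is discarded explicitly, and the report is always sent. struct panic_report, get_buffer_stub() and dump_and_report() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of what would be reported to the host. */
struct panic_report {
	bool sent;
	const char *page;	/* NULL when there is no message data */
	size_t len;
};

/* Hypothetical stub: fails and never writes *len. */
static bool get_buffer_stub(char *buf, size_t size, size_t *len)
{
	(void)buf;
	(void)size;
	(void)len;
	return false;
}

typedef bool (*get_buffer_fn)(char *buf, size_t size, size_t *len);

static void dump_and_report(get_buffer_fn get, char *page, size_t page_size,
			    struct panic_report *report)
{
	size_t bytes_written = 0;	/* defined even if get() never sets it */

	assert(get != NULL && page != NULL && report != NULL);

	/* Return value ignored on purpose: report whatever was captured. */
	(void)get(page, page_size, &bytes_written);

	report->page = bytes_written ? page : NULL;
	report->len = bytes_written;
	report->sent = true;		/* always notify, even with no data */
}
```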
See a comment below in your suggested patch.
>
> That said, would it make sense to you for the patch to be something like:
>
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index 0a3ab7efed46..20e4a9a13b32 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -188,6 +188,7 @@ static void hv_kmsg_dump(struct kmsg_dumper *dumper,
> {
> struct kmsg_dump_iter iter;
> size_t bytes_written;
> + bool ret;
>
> /* We are only interested in panics. */
> if (detail->reason != KMSG_DUMP_PANIC || !sysctl_record_panic_msg)
> @@ -197,11 +198,16 @@ static void hv_kmsg_dump(struct kmsg_dumper *dumper,
> * Write dump contents to the page. No need to synchronize; panic should
> * be single-threaded.
> */
> + bytes_written = 0;
> kmsg_dump_rewind(&iter);
> - kmsg_dump_get_buffer(&iter, false, hv_panic_page, HV_HYP_PAGE_SIZE,
> + ret = kmsg_dump_get_buffer(&iter, false, hv_panic_page, HV_HYP_PAGE_SIZE,
> &bytes_written);
Ignoring the return value can be made explicit as:
+ (void)kmsg_dump_get_buffer(&iter, false, hv_panic_page, HV_HYP_PAGE_SIZE,
&bytes_written);
Plus an appropriate comment. Then there's no need to introduce the "ret" local
variable and the somewhat funky:
(void) ret;
Michael
> - if (!bytes_written)
> - return;
> + /*
> + * Whether there is more data available or not, send what has been captured
> + * to the host. Ignore the return value.
> + */
> + (void) ret;
> +
> /*
> * P3 to contain the physical address of the panic page & P4 to
> * contain the size of the panic data in that page. Rest of the
> @@ -210,7 +216,7 @@ static void hv_kmsg_dump(struct kmsg_dumper *dumper,
> hv_set_msr(HV_MSR_CRASH_P0, 0);
> hv_set_msr(HV_MSR_CRASH_P1, 0);
> hv_set_msr(HV_MSR_CRASH_P2, 0);
> - hv_set_msr(HV_MSR_CRASH_P3, virt_to_phys(hv_panic_page));
> + hv_set_msr(HV_MSR_CRASH_P3, bytes_written ? virt_to_phys(hv_panic_page) : NULL);
> hv_set_msr(HV_MSR_CRASH_P4, bytes_written);
>
> /*
>
> --
> Cheers,
> Roman
>
> > --
> > 2.25.1
* Re: [PATCH v2 1/3] mshv: Ignore second stats page map result failure
From: Nuno Das Neves @ 2025-12-30 0:27 UTC (permalink / raw)
To: Michael Kelley, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, skinsburskii@linux.microsoft.com
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, longli@microsoft.com,
prapal@linux.microsoft.com, mrathor@linux.microsoft.com,
paekkaladevi@linux.microsoft.com
In-Reply-To: <SN6PR02MB41578C85BD5C114340677F84D4A2A@SN6PR02MB4157.namprd02.prod.outlook.com>
On 12/8/2025 7:12 AM, Michael Kelley wrote:
> From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Friday, December 5, 2025 10:59 AM
>>
>> From: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com>
>>
>> Older versions of the hypervisor do not support HV_STATS_AREA_PARENT
>> and return HV_STATUS_INVALID_PARAMETER for the second stats page
>> mapping request.
>>
>> This results in a failure in module init. Instead of failing, gracefully
>> fall back to populating stats_pages[HV_STATS_AREA_PARENT] with the
>> already-mapped stats_pages[HV_STATS_AREA_SELF].
>
> This explains "what" this patch does. But could you add an explanation of "why"
> substituting SELF for the unavailable PARENT is the right thing to do? As a somewhat
> outside reviewer, I don't know enough about SELF vs. PARENT to immediately know
> why this substitution makes sense.
>
I'll attempt to explain. I'm a little hindered by the fact that like many of the
root interfaces this is not well-documented, but this is my understanding:
The stats areas HV_STATS_AREA_SELF and HV_STATS_AREA_PARENT indicate the privilege
level of the data in the mapped stats page.
Both SELF and PARENT contain the same fields, but some fields that are 0 in the
SELF page may be nonzero in the PARENT page, and vice-versa. So, to read all the fields
we need to map both pages if possible, and prioritize reading non-zero data from
each field, by checking both the SELF and PARENT pages.
I don't know if it's possible for a given field to have a different (nonzero) value
in both SELF and PARENT pages. I imagine in that case we'd want to prioritize the
PARENT value, but it may simply not be possible.
The API is designed in this way to be backward-compatible with older hypervisors
that didn't have a concept of SELF and PARENT. Hence on older hypervisors (detectable
via the error code), all we can do is map SELF and use it for everything.
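To make the fallback concrete, here is a small userspace sketch of the idea as I understand it from the description above. The names AREA_SELF, AREA_PARENT, alias_missing_parent() and read_counter() are hypothetical, and preferring the PARENT value when both are nonzero is an assumption, not documented hypervisor behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical indices mirroring the SELF/PARENT stats areas. */
enum { AREA_SELF = 0, AREA_PARENT = 1, AREA_COUNT = 2 };

/* On older hypervisors PARENT is not mapped (NULL); reuse SELF instead. */
static void alias_missing_parent(const uint64_t *pages[AREA_COUNT])
{
	assert(pages != NULL && pages[AREA_SELF] != NULL);

	if (pages[AREA_PARENT] == NULL)
		pages[AREA_PARENT] = pages[AREA_SELF];
}

/* Read one field, taking whichever page holds a nonzero value. */
static uint64_t read_counter(const uint64_t *pages[AREA_COUNT], size_t idx)
{
	uint64_t value;

	assert(pages != NULL && pages[AREA_SELF] != NULL &&
	       pages[AREA_PARENT] != NULL);

	/* Assumption: prefer PARENT if both pages are nonzero. */
	value = pages[AREA_PARENT][idx];
	return value ? value : pages[AREA_SELF][idx];
}
```

Once PARENT aliases SELF, the same reader works unchanged on old and new hypervisors; it simply sees the SELF data twice.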
> Also, does this patch affect the logic in mshv_vp_dispatch_thread_blocked() where
> a zero value for the SELF version of VpRootDispatchThreadBlocked is replaced by
> the PARENT value? But that logic seems to be in the reverse direction -- replacing
> a missing SELF value with the PARENT value -- whereas this patch is about replacing
> missing PARENT values with SELF values. So are there two separate PARENT vs. SELF
> issues overall? And after this patch is in place and PARENT values are replaced with
> SELF on older hypervisor versions, the logic in mshv_vp_dispatch_thread_blocked()
> then effectively becomes a no-op if the SELF value is zero, and the return value will
> be zero. Is that a problem?
>
This is the same issue, because we only care about any nonzero value in
mshv_vp_dispatch_thread_blocked(). It doesn't matter which page we check first in that
code, just that any nonzero value is returned as a boolean to indicate a blocked state.
The code in question could be rewritten:
return self_vp_cntrs[VpRootDispatchThreadBlocked] || parent_vp_cntrs[VpRootDispatchThreadBlocked];
>>
>> Signed-off-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com>
>> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
>> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
>> ---
>> drivers/hv/mshv_root_hv_call.c | 41 ++++++++++++++++++++++++++++++----
>> drivers/hv/mshv_root_main.c | 3 +++
>> 2 files changed, 40 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
>> index 598eaff4ff29..b1770c7b500c 100644
>> --- a/drivers/hv/mshv_root_hv_call.c
>> +++ b/drivers/hv/mshv_root_hv_call.c
>> @@ -855,6 +855,24 @@ static int hv_call_map_stats_page2(enum hv_stats_object_type type,
>> return ret;
>> }
>>
>> +static int
>> +hv_stats_get_area_type(enum hv_stats_object_type type,
>> + const union hv_stats_object_identity *identity)
>> +{
>> + switch (type) {
>> + case HV_STATS_OBJECT_HYPERVISOR:
>> + return identity->hv.stats_area_type;
>> + case HV_STATS_OBJECT_LOGICAL_PROCESSOR:
>> + return identity->lp.stats_area_type;
>> + case HV_STATS_OBJECT_PARTITION:
>> + return identity->partition.stats_area_type;
>> + case HV_STATS_OBJECT_VP:
>> + return identity->vp.stats_area_type;
>> + }
>> +
>> + return -EINVAL;
>> +}
>> +
>> static int hv_call_map_stats_page(enum hv_stats_object_type type,
>> const union hv_stats_object_identity *identity,
>> void **addr)
>> @@ -863,7 +881,7 @@ static int hv_call_map_stats_page(enum hv_stats_object_type type,
>> struct hv_input_map_stats_page *input;
>> struct hv_output_map_stats_page *output;
>> u64 status, pfn;
>> - int ret = 0;
>> + int hv_status, ret = 0;
>>
>> do {
>> local_irq_save(flags);
>> @@ -878,11 +896,26 @@ static int hv_call_map_stats_page(enum hv_stats_object_type type,
>> pfn = output->map_location;
>>
>> local_irq_restore(flags);
>> - if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
>> - ret = hv_result_to_errno(status);
>> +
>> + hv_status = hv_result(status);
>> + if (hv_status != HV_STATUS_INSUFFICIENT_MEMORY) {
>> if (hv_result_success(status))
>> break;
>> - return ret;
>> +
>> + /*
>> + * Older versions of the hypervisor do not support the
>> + * PARENT stats area. In this case return "success" but
>> + * set the page to NULL. The caller should check for
>> + * this case and instead just use the SELF area.
>> + */
>> + if (hv_stats_get_area_type(type, identity) == HV_STATS_AREA_PARENT &&
>> + hv_status == HV_STATUS_INVALID_PARAMETER) {
>> + *addr = NULL;
>> + return 0;
>> + }
>> +
>> + hv_status_debug(status, "\n");
>> + return hv_result_to_errno(status);
>
> Does the hv_call_map_stats_page2() function need a similar fix? Or is there a linkage
> in hypervisor functionality where any hypervisor version that supports an overlay GPFN
> also supports the PARENT stats? If such a linkage is why hv_call_map_stats_page2()
> doesn't need a similar fix, please add a code comment to that effect in
> hv_call_map_stats_page2().
>
Exactly; hv_call_map_stats_page2() is only available on hypervisors where the PARENT
page is also available. I'll add a comment.
>> }
>>
>> ret = hv_call_deposit_pages(NUMA_NO_NODE,
>> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
>> index bc15d6f6922f..f59a4ab47685 100644
>> --- a/drivers/hv/mshv_root_main.c
>> +++ b/drivers/hv/mshv_root_main.c
>> @@ -905,6 +905,9 @@ static int mshv_vp_stats_map(u64 partition_id, u32 vp_index,
>> if (err)
>> goto unmap_self;
>>
>> + if (!stats_pages[HV_STATS_AREA_PARENT])
>> + stats_pages[HV_STATS_AREA_PARENT] = stats_pages[HV_STATS_AREA_SELF];
>> +
>> return 0;
>>
>> unmap_self:
>> --
>> 2.34.1
* [PATCH 00/12] Recover sysfb after DRM probe failure
From: Zack Rusin @ 2025-12-29 21:58 UTC (permalink / raw)
To: dri-devel
Cc: Alex Deucher, amd-gfx, Ard Biesheuvel, Ce Sun, Chia-I Wu,
Christian König, Danilo Krummrich, Dave Airlie, Deepak Rawat,
Dmitry Osipenko, Gerd Hoffmann, Gurchetan Singh, Hans de Goede,
Hawking Zhang, Helge Deller, intel-gfx, intel-xe, Jani Nikula,
Javier Martinez Canillas, Jocelyn Falempe, Joonas Lahtinen,
Lijo Lazar, linux-efi, linux-fbdev, linux-hyperv, linux-kernel,
Lucas De Marchi, Lyude Paul, Maarten Lankhorst,
Mario Limonciello (AMD), Mario Limonciello, Maxime Ripard,
nouveau, Rodrigo Vivi, Simona Vetter, spice-devel,
Thomas Hellström, Thomas Zimmermann, Timur Kristóf,
Tvrtko Ursulin, virtualization, Vitaly Prosyak
Almost a rite of passage for every DRM developer and most Linux users
is upgrading a DRM driver, updating boot flags, or changing some config
and then having the DRM driver fail at probe, resulting in a blank screen.
Currently there's no way to recover from a DRM driver probe failure. PCI
DRM drivers explicitly throw out the existing sysfb to get exclusive
access to PCI resources, so if the probe fails the system is left without
a functioning display driver.
Add code to sysfb to recover the system framebuffer when a DRM driver's
probe fails. This means that a DRM driver that fails to load causes the
system framebuffer driver to be reloaded.
This works best with simpledrm. Without it Xorg won't recover because
it still tries to load the vendor-specific driver, which usually ends up
not working at all. With simpledrm the system recovers really nicely,
ending up with a working console and not a blank screen.
There's a caveat in that some hardware might require a special magic
register write to recover the EFI display. I'd appreciate it a lot if
maintainers could introduce a temporary failure in their driver's
probe to validate that the sysfb recovers and they get a working console.
The easiest way to double check it is by adding:
/* XXX: Temporary failure to test sysfb restore - REMOVE BEFORE COMMIT */
dev_info(&pdev->dev, "Testing sysfb restore: forcing probe failure\n");
ret = -EINVAL;
goto out_error;
or similar right after the devm_aperture_remove_conflicting_pci_devices() call.
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ce Sun <cesun102@amd.com>
Cc: Chia-I Wu <olvaffe@gmail.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Deepak Rawat <drawat.floss@gmail.com>
Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Cc: dri-devel@lists.freedesktop.org
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Gurchetan Singh <gurchetansingh@chromium.org>
Cc: Hans de Goede <hansg@kernel.org>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Helge Deller <deller@gmx.de>
Cc: intel-gfx@lists.freedesktop.org
Cc: intel-xe@lists.freedesktop.org
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: linux-efi@vger.kernel.org
Cc: linux-fbdev@vger.kernel.org
Cc: linux-hyperv@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: "Mario Limonciello (AMD)" <superm1@kernel.org>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: nouveau@lists.freedesktop.org
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: spice-devel@lists.freedesktop.org
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: "Timur Kristóf" <timur.kristof@gmail.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Cc: virtualization@lists.linux.dev
Cc: Vitaly Prosyak <vitaly.prosyak@amd.com>
Zack Rusin (12):
video/aperture: Add sysfb restore on DRM probe failure
drm/vmwgfx: Use devm aperture helpers for sysfb restore on probe
failure
drm/xe: Use devm aperture helpers for sysfb restore on probe failure
drm/amdgpu: Use devm aperture helpers for sysfb restore on probe
failure
drm/virtio: Add sysfb restore on probe failure
drm/nouveau: Use devm aperture helpers for sysfb restore on probe
failure
drm/qxl: Use devm aperture helpers for sysfb restore on probe failure
drm/vboxvideo: Use devm aperture helpers for sysfb restore on probe
failure
drm/hyperv: Add sysfb restore on probe failure
drm/ast: Use devm aperture helpers for sysfb restore on probe failure
drm/radeon: Use devm aperture helpers for sysfb restore on probe
failure
drm/i915: Use devm aperture helpers for sysfb restore on probe failure
drivers/firmware/efi/sysfb_efi.c | 2 +-
drivers/firmware/sysfb.c | 191 +++++++++++++--------
drivers/firmware/sysfb_simplefb.c | 10 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 7 +
drivers/gpu/drm/ast/ast_drv.c | 13 +-
drivers/gpu/drm/hyperv/hyperv_drm_drv.c | 23 +++
drivers/gpu/drm/i915/i915_driver.c | 13 +-
drivers/gpu/drm/nouveau/nouveau_drm.c | 16 +-
drivers/gpu/drm/qxl/qxl_drv.c | 14 +-
drivers/gpu/drm/radeon/radeon_drv.c | 15 +-
drivers/gpu/drm/vboxvideo/vbox_drv.c | 13 +-
drivers/gpu/drm/virtio/virtgpu_drv.c | 29 ++++
drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 13 +-
drivers/gpu/drm/xe/xe_device.c | 7 +-
drivers/gpu/drm/xe/xe_pci.c | 7 +
drivers/video/aperture.c | 54 ++++++
include/linux/aperture.h | 14 ++
include/linux/sysfb.h | 6 +
19 files changed, 368 insertions(+), 88 deletions(-)
--
2.48.1
* [PATCH 09/12] drm/hyperv: Add sysfb restore on probe failure
From: Zack Rusin @ 2025-12-29 21:58 UTC (permalink / raw)
To: dri-devel
Cc: Deepak Rawat, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, linux-hyperv, linux-kernel
In-Reply-To: <20251229215906.3688205-1-zack.rusin@broadcom.com>
Register a devm action on the vmbus device to restore the system
framebuffer (efifb/simpledrm) if the driver's probe fails after
removing the firmware framebuffer.
Unlike PCI drivers, hyperv cannot use the
devm_aperture_remove_conflicting_pci_devices() helper because this
is a vmbus device, not a PCI device. Instead, register the sysfb
restore action on the hv device (&hdev->device) which will be
released if probe fails. Cancel the action after successful probe
since the driver is now responsible for display output.
This ensures users don't lose display output if the hyperv driver
fails to probe after removing the firmware framebuffer.
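The register-then-cancel pattern described above can be modeled in a few lines of userspace C. This is a sketch only: struct fake_device and the helper functions are hypothetical stand-ins for the devres machinery, and the release step that the driver core would perform after a failed probe is called by hand here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*cleanup_fn)(void *data);

/* Hypothetical device holding at most one deferred cleanup action. */
struct fake_device {
	cleanup_fn action;
	void *data;
};

static int restore_count;	/* how often the firmware FB was restored */

static void restore_firmware_fb(void *unused)
{
	(void)unused;
	restore_count++;
}

static void add_action(struct fake_device *dev, cleanup_fn fn, void *data)
{
	dev->action = fn;
	dev->data = data;
}

/* Cancel the pending action without running it. */
static void remove_action(struct fake_device *dev)
{
	dev->action = NULL;
}

/* Models managed-resource release: run the pending action, if any. */
static void release_device(struct fake_device *dev)
{
	if (dev->action) {
		dev->action(dev->data);
		dev->action = NULL;
	}
}

static int probe(struct fake_device *dev, bool setup_ok)
{
	assert(dev != NULL);

	/* The firmware framebuffer is gone from here on; arm the restore. */
	add_action(dev, restore_firmware_fb, NULL);

	if (!setup_ok) {
		release_device(dev);	/* done by the driver core in reality */
		return -1;
	}

	/* Success: this driver now owns the display, so disarm the restore. */
	remove_action(dev);
	return 0;
}
```

The restore therefore runs exactly once on a failed probe and never after a successful one, including when the device is released later.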
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Cc: Deepak Rawat <drawat.floss@gmail.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: linux-hyperv@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
---
drivers/gpu/drm/hyperv/hyperv_drm_drv.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_drv.c b/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
index 06b5d96e6eaf..6d66cd243bab 100644
--- a/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
+++ b/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
@@ -8,6 +8,7 @@
#include <linux/hyperv.h>
#include <linux/module.h>
#include <linux/pci.h>
+#include <linux/sysfb.h>
#include <drm/clients/drm_client_setup.h>
#include <drm/drm_atomic_helper.h>
@@ -102,6 +103,11 @@ static int hyperv_setup_vram(struct hyperv_drm_device *hv,
return ret;
}
+static void hyperv_restore_sysfb(void *unused)
+{
+ sysfb_restore();
+}
+
static int hyperv_vmbus_probe(struct hv_device *hdev,
const struct hv_vmbus_device_id *dev_id)
{
@@ -127,6 +133,17 @@ static int hyperv_vmbus_probe(struct hv_device *hdev,
aperture_remove_all_conflicting_devices(hyperv_driver.name);
+ /*
+ * Register sysfb restore on the hv device. We can't use
+ * devm_aperture_remove_conflicting_pci_devices() because this
+ * is a vmbus device, not a PCI device. Register on &hdev->device
+ * so it fires if our probe fails after removing firmware FB.
+ */
+ ret = devm_add_action_or_reset(&hdev->device, hyperv_restore_sysfb,
+ NULL);
+ if (ret)
+ goto err_vmbus_close;
+
ret = hyperv_setup_vram(hv, hdev);
if (ret)
goto err_vmbus_close;
@@ -152,6 +169,12 @@ static int hyperv_vmbus_probe(struct hv_device *hdev,
drm_client_setup(dev, NULL);
+ /*
+ * Probe succeeded - cancel sysfb restore. We're now responsible
+ * for display output.
+ */
+ devm_remove_action(&hdev->device, hyperv_restore_sysfb, NULL);
+
return 0;
err_free_mmio:
--
2.48.1
* Re: [PATCH 1/1] Drivers: hv: Fix uninit'ed variable in hv_kmsg_dump() if CONFIG_PRINTK not set
From: vdso @ 2025-12-29 4:11 UTC (permalink / raw)
To: mhklinux, mhkelley58
Cc: haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com,
kys@microsoft.com, linux-kernel@vger.kernel.org,
linux-hyperv@vger.kernel.org, dan.carpenter@linaro.org
In-Reply-To: <20251219160832.1628-1-mhklinux@outlook.com>
> On 12/19/2025 8:08 AM mhkelley58@gmail.com wrote:
>
>
> From: Michael Kelley <mhklinux@outlook.com>
>
> When CONFIG_PRINTK is not set, kmsg_dump_get_buffer() returns 'false'
> without setting the bytes_written argument. In such case, bytes_written
> is uninitialized when it is tested for zero.
>
> This is admittedly an unlikely scenario, but in the interest of correctness
> and avoiding tool noise about uninitialized variables, fix this by testing
> the return value before testing bytes_written.
>
> Fixes: 9c318a1d9b50 ("Drivers: hv: move panic report code from vmbus to hv early init code")
> Reported-by: kernel test robot <lkp@intel.com>
> Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
> Closes: https://lore.kernel.org/all/202512172102.OcUspn1Z-lkp@intel.com/
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> ---
> drivers/hv/hv_common.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index f466a6099eff..de9e069c5a0c 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -188,6 +188,7 @@ static void hv_kmsg_dump(struct kmsg_dumper *dumper,
> {
> struct kmsg_dump_iter iter;
> size_t bytes_written;
> + bool ret;
>
> /* We are only interested in panics. */
> if (detail->reason != KMSG_DUMP_PANIC || !sysctl_record_panic_msg)
> @@ -198,9 +199,9 @@ static void hv_kmsg_dump(struct kmsg_dumper *dumper,
> * be single-threaded.
> */
> kmsg_dump_rewind(&iter);
> - kmsg_dump_get_buffer(&iter, false, hv_panic_page, HV_HYP_PAGE_SIZE,
> - &bytes_written);
> - if (!bytes_written)
> + ret = kmsg_dump_get_buffer(&iter, false, hv_panic_page, HV_HYP_PAGE_SIZE,
> + &bytes_written);
> + if (!ret || !bytes_written)
> return;
> /*
> * P3 to contain the physical address of the panic page & P4 to
The existing code
1. doesn't care about the return value from kmsg_dump_get_buffer.
The return value wouldn't make the function return before, why does that
need to change?
2. returns early when there are no bytes written.
I think it shouldn't as otherwise the crash control register isn't written to,
and the panic isn't signalled to the host. Is there another path maybe that
I'm not noticing?
That said, would it make sense to you for the patch to be something like:
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 0a3ab7efed46..20e4a9a13b32 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -188,6 +188,7 @@ static void hv_kmsg_dump(struct kmsg_dumper *dumper,
{
struct kmsg_dump_iter iter;
size_t bytes_written;
+ bool ret;
/* We are only interested in panics. */
if (detail->reason != KMSG_DUMP_PANIC || !sysctl_record_panic_msg)
@@ -197,11 +198,16 @@ static void hv_kmsg_dump(struct kmsg_dumper *dumper,
* Write dump contents to the page. No need to synchronize; panic should
* be single-threaded.
*/
+ bytes_written = 0;
kmsg_dump_rewind(&iter);
- kmsg_dump_get_buffer(&iter, false, hv_panic_page, HV_HYP_PAGE_SIZE,
+ ret = kmsg_dump_get_buffer(&iter, false, hv_panic_page, HV_HYP_PAGE_SIZE,
&bytes_written);
- if (!bytes_written)
- return;
+ /*
+ * Whether there is more data available or not, send what has been captured
+ * to the host. Ignore the return value.
+ */
+ (void) ret;
+
/*
* P3 to contain the physical address of the panic page & P4 to
* contain the size of the panic data in that page. Rest of the
@@ -210,7 +216,7 @@ static void hv_kmsg_dump(struct kmsg_dumper *dumper,
hv_set_msr(HV_MSR_CRASH_P0, 0);
hv_set_msr(HV_MSR_CRASH_P1, 0);
hv_set_msr(HV_MSR_CRASH_P2, 0);
- hv_set_msr(HV_MSR_CRASH_P3, virt_to_phys(hv_panic_page));
+ hv_set_msr(HV_MSR_CRASH_P3, bytes_written ? virt_to_phys(hv_panic_page) : NULL);
hv_set_msr(HV_MSR_CRASH_P4, bytes_written);
/*
--
Cheers,
Roman
> --
> 2.25.1
* Re: [PATCH net, v2] net: mana: Fix use-after-free in reset service rescan path
From: patchwork-bot+netdevbpf @ 2025-12-28 9:51 UTC (permalink / raw)
To: Dipayaan Roy
Cc: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
kuba, pabeni, longli, kotaranov, horms, shradhagupta, ssengar,
ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
linux-rdma, dipayanroy
In-Reply-To: <20251218131054.GA3173@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
Hello:
This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:
On Thu, 18 Dec 2025 05:10:54 -0800 you wrote:
> When mana_serv_reset() encounters -ETIMEDOUT or -EPROTO from
> mana_gd_resume(), it performs a PCI rescan via mana_serv_rescan().
>
> mana_serv_rescan() calls pci_stop_and_remove_bus_device(), which can
> invoke the driver's remove path and free the gdma_context associated
> with the device. After returning, mana_serv_reset() currently jumps to
> the out label and attempts to clear gc->in_service, dereferencing a
> freed gdma_context.
>
> [...]
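The hazard in the quoted description can be illustrated with a short userspace sketch. It is not the applied fix, whose details are elided above: struct ctx, rescan() and serv_reset() are hypothetical stand-ins, and the only point shown is that the context must not be touched once the rescan may have freed it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver context freed by the remove path. */
struct ctx {
	bool in_service;
};

static struct ctx *live_ctx;	/* owned by the (simulated) driver core */

/* Simulated rescan: removing the device frees its context. */
static void rescan(void)
{
	free(live_ctx);
	live_ctx = NULL;
}

/* Returns true if gc is still valid when the function returns. */
static bool serv_reset(struct ctx *gc, int resume_err)
{
	assert(gc != NULL);

	gc->in_service = true;

	if (resume_err) {
		rescan();
		/* gc may be freed now: return without dereferencing it. */
		return false;
	}

	gc->in_service = false;
	return true;
}
```

Jumping to a shared exit label that clears gc->in_service after the rescan is the kind of path the patch description says dereferenced freed memory.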
Here is the summary with links:
- [net,v2] net: mana: Fix use-after-free in reset service rescan path
https://git.kernel.org/netdev/net/c/3387a7ad478b
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* [PATCH 3/3] drm/hyperv: Remove reference to hyperv_fb driver
From: Prasanna Kumar T S M @ 2025-12-27 4:31 UTC (permalink / raw)
To: ptsm, linux-hyperv, drawat.floss, tzimmermann
Cc: linux-kernel, dri-devel, simona, airlied, mripard,
maarten.lankhorst
In-Reply-To: <1766809486-24731-1-git-send-email-ptsm@linux.microsoft.com>
Remove hyperv_fb reference as the driver is removed.
Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
---
drivers/gpu/drm/Kconfig | 3 +--
drivers/gpu/drm/hyperv/hyperv_drm_proto.c | 15 +++++----------
2 files changed, 6 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 7e6bc0b3a589..01a1438b35a0 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -407,8 +407,7 @@ config DRM_HYPERV
help
This is a KMS driver for Hyper-V synthetic video device. Choose this
option if you would like to enable drm driver for Hyper-V virtual
- machine. Unselect Hyper-V framebuffer driver (CONFIG_FB_HYPERV) so
- that DRM driver is used by default.
+ machine.
If M is selected the module will be called hyperv_drm.
diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
index 013a7829182d..051ecc526832 100644
--- a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
+++ b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
@@ -1,8 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright 2021 Microsoft
- *
- * Portions of this code is derived from hyperv_fb.c
*/
#include <linux/hyperv.h>
@@ -304,16 +302,13 @@ int hyperv_update_situation(struct hv_device *hdev, u8 active, u32 bpp,
* but the Hyper-V host still draws a point as an extra mouse pointer,
* which is unwanted, especially when Xorg is running.
*
- * The hyperv_fb driver uses synthvid_send_ptr() to hide the unwanted
- * pointer, by setting msg.ptr_pos.is_visible = 1 and setting the
- * msg.ptr_shape.data. Note: setting msg.ptr_pos.is_visible to 0 doesn't
+ * Hide the unwanted pointer, by setting msg.ptr_pos.is_visible = 1 and setting
+ * the msg.ptr_shape.data. Note: setting msg.ptr_pos.is_visible to 0 doesn't
* work in tests.
*
- * Copy synthvid_send_ptr() to hyperv_drm and rename it to
- * hyperv_hide_hw_ptr(). Note: hyperv_hide_hw_ptr() is also called in the
- * handler of the SYNTHVID_FEATURE_CHANGE event, otherwise the host still
- * draws an extra unwanted mouse pointer after the VM Connection window is
- * closed and reopened.
+ * The hyperv_hide_hw_ptr() is also called in the handler of the
+ * SYNTHVID_FEATURE_CHANGE event, otherwise the host still draws an extra
+ * unwanted mouse pointer after the VM Connection window is closed and reopened.
*/
int hyperv_hide_hw_ptr(struct hv_device *hdev)
{
--
2.49.0
* [PATCH 2/3] drivers: hv: vmbus_drv: Remove reference to hyperv_fb
From: Prasanna Kumar T S M @ 2025-12-27 4:27 UTC (permalink / raw)
To: linux-hyperv, longli, decui, wei.liu, haiyangz, kys; +Cc: linux-kernel
In-Reply-To: <1766809486-24731-1-git-send-email-ptsm@linux.microsoft.com>
Remove hyperv_fb reference as the driver is removed.
Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
---
drivers/hv/vmbus_drv.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index a53af6fe81a6..7758d7e25a7b 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2356,8 +2356,8 @@ static void __maybe_unused vmbus_reserve_fb(void)
}
/*
- * Release the PCI device so hyperv_drm or hyperv_fb driver can
- * grab it later.
+ * Release the PCI device so hyperv_drm driver can grab it
+ * later.
*/
pci_dev_put(pdev);
}
--
2.49.0
* [PATCH 1/3] drivers: video: fbdev: Remove hyperv_fb driver
From: Prasanna Kumar T S M @ 2025-12-27 4:24 UTC
To: linux-fbdev, dri-devel, linux-hyperv, ssengar, mhklinux, wei.liu,
ptsm, tzimmermann, deller, kys, haiyangz, decui
Cc: mrathor, rdunlap, soci, hsukrut3, linux-kernel
The HyperV DRM driver has been available since 5.14. This makes the hyperv_fb
driver redundant, so remove it.
Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
---
MAINTAINERS | 10 -
drivers/video/fbdev/Kconfig | 11 -
drivers/video/fbdev/Makefile | 1 -
drivers/video/fbdev/hyperv_fb.c | 1388 -------------------------------
4 files changed, 1410 deletions(-)
delete mode 100644 drivers/video/fbdev/hyperv_fb.c
diff --git a/MAINTAINERS b/MAINTAINERS
index dc731d37c8fe..73073551048e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11762,16 +11762,6 @@ F: include/uapi/rdma/mana-abi.h
F: net/vmw_vsock/hyperv_transport.c
F: tools/hv/
-HYPER-V FRAMEBUFFER DRIVER
-M: "K. Y. Srinivasan" <kys@microsoft.com>
-M: Haiyang Zhang <haiyangz@microsoft.com>
-M: Wei Liu <wei.liu@kernel.org>
-M: Dexuan Cui <decui@microsoft.com>
-L: linux-hyperv@vger.kernel.org
-S: Obsolete
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git
-F: drivers/video/fbdev/hyperv_fb.c
-
HYPERBUS SUPPORT
M: Vignesh Raghavendra <vigneshr@ti.com>
R: Tudor Ambarus <tudor.ambarus@linaro.org>
diff --git a/drivers/video/fbdev/Kconfig b/drivers/video/fbdev/Kconfig
index a733f90eca55..45733522ff48 100644
--- a/drivers/video/fbdev/Kconfig
+++ b/drivers/video/fbdev/Kconfig
@@ -1770,17 +1770,6 @@ config FB_BROADSHEET
and could also have been called by other names when coupled with
a bridge adapter.
-config FB_HYPERV
- tristate "Microsoft Hyper-V Synthetic Video support (DEPRECATED)"
- depends on FB && HYPERV_VMBUS
- select DMA_CMA if HAVE_DMA_CONTIGUOUS && CMA
- select FB_IOMEM_HELPERS_DEFERRED
- help
- This framebuffer driver supports Microsoft Hyper-V Synthetic Video.
-
- This driver is deprecated, please use the Hyper-V DRM driver at
- drivers/gpu/drm/hyperv (CONFIG_DRM_HYPERV) instead.
-
config FB_SIMPLE
tristate "Simple framebuffer support"
depends on FB
diff --git a/drivers/video/fbdev/Makefile b/drivers/video/fbdev/Makefile
index b3d12f977c06..36a18d958ba0 100644
--- a/drivers/video/fbdev/Makefile
+++ b/drivers/video/fbdev/Makefile
@@ -111,7 +111,6 @@ obj-y += omap2/
obj-$(CONFIG_XEN_FBDEV_FRONTEND) += xen-fbfront.o
obj-$(CONFIG_FB_CARMINE) += carminefb.o
obj-$(CONFIG_FB_MB862XX) += mb862xx/
-obj-$(CONFIG_FB_HYPERV) += hyperv_fb.o
obj-$(CONFIG_FB_OPENCORES) += ocfb.o
obj-$(CONFIG_FB_SM712) += sm712fb.o
diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
deleted file mode 100644
index c99e2ea4b3de..000000000000
--- a/drivers/video/fbdev/hyperv_fb.c
+++ /dev/null
@@ -1,1388 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Copyright (c) 2012, Microsoft Corporation.
- *
- * Author:
- * Haiyang Zhang <haiyangz@microsoft.com>
- */
-
-/*
- * Hyper-V Synthetic Video Frame Buffer Driver
- *
- * This is the driver for the Hyper-V Synthetic Video, which supports
- * screen resolution up to Full HD 1920x1080 with 32 bit color on Windows
- * Server 2012, and 1600x1200 with 16 bit color on Windows Server 2008 R2
- * or earlier.
- *
- * It also solves the double mouse cursor issue of the emulated video mode.
- *
- * The default screen resolution is 1152x864, which may be changed by a
- * kernel parameter:
- * video=hyperv_fb:<width>x<height>
- * For example: video=hyperv_fb:1280x1024
- *
- * Portrait orientation is also supported:
- * For example: video=hyperv_fb:864x1152
- *
- * When a Windows 10 RS5+ host is used, the virtual machine screen
- * resolution is obtained from the host. The "video=hyperv_fb" option is
- * not needed, but still can be used to overwrite what the host specifies.
- * The VM resolution on the host could be set by executing the powershell
- * "set-vmvideo" command. For example
- * set-vmvideo -vmname name -horizontalresolution:1920 \
- * -verticalresolution:1200 -resolutiontype single
- *
- * Gen 1 VMs also support direct using VM's physical memory for framebuffer.
- * It could improve the efficiency and performance for framebuffer and VM.
- * This requires to allocate contiguous physical memory from Linux kernel's
- * CMA memory allocator. To enable this, supply a kernel parameter to give
- * enough memory space to CMA allocator for framebuffer. For example:
- * cma=130m
- * This gives 130MB memory to CMA allocator that can be allocated to
- * framebuffer. For reference, 8K resolution (7680x4320) takes about
- * 127MB memory.
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include <linux/aperture.h>
-#include <linux/module.h>
-#include <linux/kernel.h>
-#include <linux/vmalloc.h>
-#include <linux/init.h>
-#include <linux/completion.h>
-#include <linux/fb.h>
-#include <linux/pci.h>
-#include <linux/panic_notifier.h>
-#include <linux/efi.h>
-#include <linux/console.h>
-
-#include <linux/hyperv.h>
-
-/* Hyper-V Synthetic Video Protocol definitions and structures */
-#define MAX_VMBUS_PKT_SIZE 0x4000
-
-#define SYNTHVID_VERSION(major, minor) ((minor) << 16 | (major))
-/* Support for VERSION_WIN7 is removed. #define is retained for reference. */
-#define SYNTHVID_VERSION_WIN7 SYNTHVID_VERSION(3, 0)
-#define SYNTHVID_VERSION_WIN8 SYNTHVID_VERSION(3, 2)
-#define SYNTHVID_VERSION_WIN10 SYNTHVID_VERSION(3, 5)
-
-#define SYNTHVID_VER_GET_MAJOR(ver) (ver & 0x0000ffff)
-#define SYNTHVID_VER_GET_MINOR(ver) ((ver & 0xffff0000) >> 16)
-
-#define SYNTHVID_DEPTH_WIN8 32
-#define SYNTHVID_FB_SIZE_WIN8 (8 * 1024 * 1024)
-
-enum pipe_msg_type {
- PIPE_MSG_INVALID,
- PIPE_MSG_DATA,
- PIPE_MSG_MAX
-};
-
-struct pipe_msg_hdr {
- u32 type;
- u32 size; /* size of message after this field */
-} __packed;
-
-
-enum synthvid_msg_type {
- SYNTHVID_ERROR = 0,
- SYNTHVID_VERSION_REQUEST = 1,
- SYNTHVID_VERSION_RESPONSE = 2,
- SYNTHVID_VRAM_LOCATION = 3,
- SYNTHVID_VRAM_LOCATION_ACK = 4,
- SYNTHVID_SITUATION_UPDATE = 5,
- SYNTHVID_SITUATION_UPDATE_ACK = 6,
- SYNTHVID_POINTER_POSITION = 7,
- SYNTHVID_POINTER_SHAPE = 8,
- SYNTHVID_FEATURE_CHANGE = 9,
- SYNTHVID_DIRT = 10,
- SYNTHVID_RESOLUTION_REQUEST = 13,
- SYNTHVID_RESOLUTION_RESPONSE = 14,
-
- SYNTHVID_MAX = 15
-};
-
-#define SYNTHVID_EDID_BLOCK_SIZE 128
-#define SYNTHVID_MAX_RESOLUTION_COUNT 64
-
-struct hvd_screen_info {
- u16 width;
- u16 height;
-} __packed;
-
-struct synthvid_msg_hdr {
- u32 type;
- u32 size; /* size of this header + payload after this field*/
-} __packed;
-
-struct synthvid_version_req {
- u32 version;
-} __packed;
-
-struct synthvid_version_resp {
- u32 version;
- u8 is_accepted;
- u8 max_video_outputs;
-} __packed;
-
-struct synthvid_supported_resolution_req {
- u8 maximum_resolution_count;
-} __packed;
-
-struct synthvid_supported_resolution_resp {
- u8 edid_block[SYNTHVID_EDID_BLOCK_SIZE];
- u8 resolution_count;
- u8 default_resolution_index;
- u8 is_standard;
- struct hvd_screen_info
- supported_resolution[SYNTHVID_MAX_RESOLUTION_COUNT];
-} __packed;
-
-struct synthvid_vram_location {
- u64 user_ctx;
- u8 is_vram_gpa_specified;
- u64 vram_gpa;
-} __packed;
-
-struct synthvid_vram_location_ack {
- u64 user_ctx;
-} __packed;
-
-struct video_output_situation {
- u8 active;
- u32 vram_offset;
- u8 depth_bits;
- u32 width_pixels;
- u32 height_pixels;
- u32 pitch_bytes;
-} __packed;
-
-struct synthvid_situation_update {
- u64 user_ctx;
- u8 video_output_count;
- struct video_output_situation video_output[1];
-} __packed;
-
-struct synthvid_situation_update_ack {
- u64 user_ctx;
-} __packed;
-
-struct synthvid_pointer_position {
- u8 is_visible;
- u8 video_output;
- s32 image_x;
- s32 image_y;
-} __packed;
-
-
-#define CURSOR_MAX_X 96
-#define CURSOR_MAX_Y 96
-#define CURSOR_ARGB_PIXEL_SIZE 4
-#define CURSOR_MAX_SIZE (CURSOR_MAX_X * CURSOR_MAX_Y * CURSOR_ARGB_PIXEL_SIZE)
-#define CURSOR_COMPLETE (-1)
-
-struct synthvid_pointer_shape {
- u8 part_idx;
- u8 is_argb;
- u32 width; /* CURSOR_MAX_X at most */
- u32 height; /* CURSOR_MAX_Y at most */
- u32 hot_x; /* hotspot relative to upper-left of pointer image */
- u32 hot_y;
- u8 data[4];
-} __packed;
-
-struct synthvid_feature_change {
- u8 is_dirt_needed;
- u8 is_ptr_pos_needed;
- u8 is_ptr_shape_needed;
- u8 is_situ_needed;
-} __packed;
-
-struct rect {
- s32 x1, y1; /* top left corner */
- s32 x2, y2; /* bottom right corner, exclusive */
-} __packed;
-
-struct synthvid_dirt {
- u8 video_output;
- u8 dirt_count;
- struct rect rect[1];
-} __packed;
-
-struct synthvid_msg {
- struct pipe_msg_hdr pipe_hdr;
- struct synthvid_msg_hdr vid_hdr;
- union {
- struct synthvid_version_req ver_req;
- struct synthvid_version_resp ver_resp;
- struct synthvid_vram_location vram;
- struct synthvid_vram_location_ack vram_ack;
- struct synthvid_situation_update situ;
- struct synthvid_situation_update_ack situ_ack;
- struct synthvid_pointer_position ptr_pos;
- struct synthvid_pointer_shape ptr_shape;
- struct synthvid_feature_change feature_chg;
- struct synthvid_dirt dirt;
- struct synthvid_supported_resolution_req resolution_req;
- struct synthvid_supported_resolution_resp resolution_resp;
- };
-} __packed;
-
-
-/* FB driver definitions and structures */
-#define HVFB_WIDTH 1152 /* default screen width */
-#define HVFB_HEIGHT 864 /* default screen height */
-#define HVFB_WIDTH_MIN 640
-#define HVFB_HEIGHT_MIN 480
-
-#define RING_BUFSIZE (256 * 1024)
-#define VSP_TIMEOUT (10 * HZ)
-#define HVFB_UPDATE_DELAY (HZ / 20)
-#define HVFB_ONDEMAND_THROTTLE (HZ / 20)
-
-struct hvfb_par {
- struct fb_info *info;
- struct resource *mem;
- bool fb_ready; /* fb device is ready */
- struct completion wait;
- u32 synthvid_version;
-
- struct delayed_work dwork;
- bool update;
- bool update_saved; /* The value of 'update' before hibernation */
-
- u32 pseudo_palette[16];
- u8 init_buf[MAX_VMBUS_PKT_SIZE];
- u8 recv_buf[MAX_VMBUS_PKT_SIZE];
-
- /* If true, the VSC notifies the VSP on every framebuffer change */
- bool synchronous_fb;
-
- /* If true, need to copy from deferred IO mem to framebuffer mem */
- bool need_docopy;
-
- struct notifier_block hvfb_panic_nb;
-
- /* Memory for deferred IO and frame buffer itself */
- unsigned char *dio_vp;
- unsigned char *mmio_vp;
- phys_addr_t mmio_pp;
-
- /* Dirty rectangle, protected by delayed_refresh_lock */
- int x1, y1, x2, y2;
- bool delayed_refresh;
- spinlock_t delayed_refresh_lock;
-};
-
-static uint screen_width = HVFB_WIDTH;
-static uint screen_height = HVFB_HEIGHT;
-static uint screen_depth;
-static uint screen_fb_size;
-static uint dio_fb_size; /* FB size for deferred IO */
-
-static void hvfb_putmem(struct fb_info *info);
-
-/* Send message to Hyper-V host */
-static inline int synthvid_send(struct hv_device *hdev,
- struct synthvid_msg *msg)
-{
- static atomic64_t request_id = ATOMIC64_INIT(0);
- int ret;
-
- msg->pipe_hdr.type = PIPE_MSG_DATA;
- msg->pipe_hdr.size = msg->vid_hdr.size;
-
- ret = vmbus_sendpacket(hdev->channel, msg,
- msg->vid_hdr.size + sizeof(struct pipe_msg_hdr),
- atomic64_inc_return(&request_id),
- VM_PKT_DATA_INBAND, 0);
-
- if (ret)
- pr_err_ratelimited("Unable to send packet via vmbus; error %d\n", ret);
-
- return ret;
-}
-
-
-/* Send screen resolution info to host */
-static int synthvid_send_situ(struct hv_device *hdev)
-{
- struct fb_info *info = hv_get_drvdata(hdev);
- struct synthvid_msg msg;
-
- if (!info)
- return -ENODEV;
-
- memset(&msg, 0, sizeof(struct synthvid_msg));
-
- msg.vid_hdr.type = SYNTHVID_SITUATION_UPDATE;
- msg.vid_hdr.size = sizeof(struct synthvid_msg_hdr) +
- sizeof(struct synthvid_situation_update);
- msg.situ.user_ctx = 0;
- msg.situ.video_output_count = 1;
- msg.situ.video_output[0].active = 1;
- msg.situ.video_output[0].vram_offset = 0;
- msg.situ.video_output[0].depth_bits = info->var.bits_per_pixel;
- msg.situ.video_output[0].width_pixels = info->var.xres;
- msg.situ.video_output[0].height_pixels = info->var.yres;
- msg.situ.video_output[0].pitch_bytes = info->fix.line_length;
-
- synthvid_send(hdev, &msg);
-
- return 0;
-}
-
-/* Send mouse pointer info to host */
-static int synthvid_send_ptr(struct hv_device *hdev)
-{
- struct synthvid_msg msg;
-
- memset(&msg, 0, sizeof(struct synthvid_msg));
- msg.vid_hdr.type = SYNTHVID_POINTER_POSITION;
- msg.vid_hdr.size = sizeof(struct synthvid_msg_hdr) +
- sizeof(struct synthvid_pointer_position);
- msg.ptr_pos.is_visible = 1;
- msg.ptr_pos.video_output = 0;
- msg.ptr_pos.image_x = 0;
- msg.ptr_pos.image_y = 0;
- synthvid_send(hdev, &msg);
-
- memset(&msg, 0, sizeof(struct synthvid_msg));
- msg.vid_hdr.type = SYNTHVID_POINTER_SHAPE;
- msg.vid_hdr.size = sizeof(struct synthvid_msg_hdr) +
- sizeof(struct synthvid_pointer_shape);
- msg.ptr_shape.part_idx = CURSOR_COMPLETE;
- msg.ptr_shape.is_argb = 1;
- msg.ptr_shape.width = 1;
- msg.ptr_shape.height = 1;
- msg.ptr_shape.hot_x = 0;
- msg.ptr_shape.hot_y = 0;
- msg.ptr_shape.data[0] = 0;
- msg.ptr_shape.data[1] = 1;
- msg.ptr_shape.data[2] = 1;
- msg.ptr_shape.data[3] = 1;
- synthvid_send(hdev, &msg);
-
- return 0;
-}
-
-/* Send updated screen area (dirty rectangle) location to host */
-static int
-synthvid_update(struct fb_info *info, int x1, int y1, int x2, int y2)
-{
- struct hv_device *hdev = device_to_hv_device(info->device);
- struct synthvid_msg msg;
-
- memset(&msg, 0, sizeof(struct synthvid_msg));
- if (x2 == INT_MAX)
- x2 = info->var.xres;
- if (y2 == INT_MAX)
- y2 = info->var.yres;
-
- msg.vid_hdr.type = SYNTHVID_DIRT;
- msg.vid_hdr.size = sizeof(struct synthvid_msg_hdr) +
- sizeof(struct synthvid_dirt);
- msg.dirt.video_output = 0;
- msg.dirt.dirt_count = 1;
- msg.dirt.rect[0].x1 = (x1 > x2) ? 0 : x1;
- msg.dirt.rect[0].y1 = (y1 > y2) ? 0 : y1;
- msg.dirt.rect[0].x2 =
- (x2 < x1 || x2 > info->var.xres) ? info->var.xres : x2;
- msg.dirt.rect[0].y2 =
- (y2 < y1 || y2 > info->var.yres) ? info->var.yres : y2;
-
- synthvid_send(hdev, &msg);
-
- return 0;
-}
-
-static void hvfb_docopy(struct hvfb_par *par,
- unsigned long offset,
- unsigned long size)
-{
- if (!par || !par->mmio_vp || !par->dio_vp || !par->fb_ready ||
- size == 0 || offset >= dio_fb_size)
- return;
-
- if (offset + size > dio_fb_size)
- size = dio_fb_size - offset;
-
- memcpy(par->mmio_vp + offset, par->dio_vp + offset, size);
-}
-
-/* Deferred IO callback */
-static void synthvid_deferred_io(struct fb_info *p, struct list_head *pagereflist)
-{
- struct hvfb_par *par = p->par;
- struct fb_deferred_io_pageref *pageref;
- unsigned long start, end;
- int y1, y2, miny, maxy;
-
- miny = INT_MAX;
- maxy = 0;
-
- /*
- * Merge dirty pages. It is possible that last page cross
- * over the end of frame buffer row yres. This is taken care of
- * in synthvid_update function by clamping the y2
- * value to yres.
- */
- list_for_each_entry(pageref, pagereflist, list) {
- start = pageref->offset;
- end = start + PAGE_SIZE - 1;
- y1 = start / p->fix.line_length;
- y2 = end / p->fix.line_length;
- miny = min_t(int, miny, y1);
- maxy = max_t(int, maxy, y2);
-
- /* Copy from dio space to mmio address */
- if (par->fb_ready && par->need_docopy)
- hvfb_docopy(par, start, PAGE_SIZE);
- }
-
- if (par->fb_ready && par->update)
- synthvid_update(p, 0, miny, p->var.xres, maxy + 1);
-}
-
-static struct fb_deferred_io synthvid_defio = {
- .delay = HZ / 20,
- .deferred_io = synthvid_deferred_io,
-};
-
-/*
- * Actions on received messages from host:
- * Complete the wait event.
- * Or, reply with screen and cursor info.
- */
-static void synthvid_recv_sub(struct hv_device *hdev)
-{
- struct fb_info *info = hv_get_drvdata(hdev);
- struct hvfb_par *par;
- struct synthvid_msg *msg;
-
- if (!info)
- return;
-
- par = info->par;
- msg = (struct synthvid_msg *)par->recv_buf;
-
- /* Complete the wait event */
- if (msg->vid_hdr.type == SYNTHVID_VERSION_RESPONSE ||
- msg->vid_hdr.type == SYNTHVID_RESOLUTION_RESPONSE ||
- msg->vid_hdr.type == SYNTHVID_VRAM_LOCATION_ACK) {
- memcpy(par->init_buf, msg, MAX_VMBUS_PKT_SIZE);
- complete(&par->wait);
- return;
- }
-
- /* Reply with screen and cursor info */
- if (msg->vid_hdr.type == SYNTHVID_FEATURE_CHANGE) {
- if (par->fb_ready) {
- synthvid_send_ptr(hdev);
- synthvid_send_situ(hdev);
- }
-
- par->update = msg->feature_chg.is_dirt_needed;
- if (par->update)
- schedule_delayed_work(&par->dwork, HVFB_UPDATE_DELAY);
- }
-}
-
-/* Receive callback for messages from the host */
-static void synthvid_receive(void *ctx)
-{
- struct hv_device *hdev = ctx;
- struct fb_info *info = hv_get_drvdata(hdev);
- struct hvfb_par *par;
- struct synthvid_msg *recv_buf;
- u32 bytes_recvd;
- u64 req_id;
- int ret;
-
- if (!info)
- return;
-
- par = info->par;
- recv_buf = (struct synthvid_msg *)par->recv_buf;
-
- do {
- ret = vmbus_recvpacket(hdev->channel, recv_buf,
- MAX_VMBUS_PKT_SIZE,
- &bytes_recvd, &req_id);
- if (bytes_recvd > 0 &&
- recv_buf->pipe_hdr.type == PIPE_MSG_DATA)
- synthvid_recv_sub(hdev);
- } while (bytes_recvd > 0 && ret == 0);
-}
-
-/* Check if the ver1 version is equal or greater than ver2 */
-static inline bool synthvid_ver_ge(u32 ver1, u32 ver2)
-{
- if (SYNTHVID_VER_GET_MAJOR(ver1) > SYNTHVID_VER_GET_MAJOR(ver2) ||
- (SYNTHVID_VER_GET_MAJOR(ver1) == SYNTHVID_VER_GET_MAJOR(ver2) &&
- SYNTHVID_VER_GET_MINOR(ver1) >= SYNTHVID_VER_GET_MINOR(ver2)))
- return true;
-
- return false;
-}
-
-/* Check synthetic video protocol version with the host */
-static int synthvid_negotiate_ver(struct hv_device *hdev, u32 ver)
-{
- struct fb_info *info = hv_get_drvdata(hdev);
- struct hvfb_par *par = info->par;
- struct synthvid_msg *msg = (struct synthvid_msg *)par->init_buf;
- int ret = 0;
- unsigned long t;
-
- memset(msg, 0, sizeof(struct synthvid_msg));
- msg->vid_hdr.type = SYNTHVID_VERSION_REQUEST;
- msg->vid_hdr.size = sizeof(struct synthvid_msg_hdr) +
- sizeof(struct synthvid_version_req);
- msg->ver_req.version = ver;
- synthvid_send(hdev, msg);
-
- t = wait_for_completion_timeout(&par->wait, VSP_TIMEOUT);
- if (!t) {
- pr_err("Time out on waiting version response\n");
- ret = -ETIMEDOUT;
- goto out;
- }
- if (!msg->ver_resp.is_accepted) {
- ret = -ENODEV;
- goto out;
- }
-
- par->synthvid_version = ver;
- pr_info("Synthvid Version major %d, minor %d\n",
- SYNTHVID_VER_GET_MAJOR(ver), SYNTHVID_VER_GET_MINOR(ver));
-
-out:
- return ret;
-}
-
-/* Get current resolution from the host */
-static int synthvid_get_supported_resolution(struct hv_device *hdev)
-{
- struct fb_info *info = hv_get_drvdata(hdev);
- struct hvfb_par *par = info->par;
- struct synthvid_msg *msg = (struct synthvid_msg *)par->init_buf;
- int ret = 0;
- unsigned long t;
- u8 index;
-
- memset(msg, 0, sizeof(struct synthvid_msg));
- msg->vid_hdr.type = SYNTHVID_RESOLUTION_REQUEST;
- msg->vid_hdr.size = sizeof(struct synthvid_msg_hdr) +
- sizeof(struct synthvid_supported_resolution_req);
-
- msg->resolution_req.maximum_resolution_count =
- SYNTHVID_MAX_RESOLUTION_COUNT;
- synthvid_send(hdev, msg);
-
- t = wait_for_completion_timeout(&par->wait, VSP_TIMEOUT);
- if (!t) {
- pr_err("Time out on waiting resolution response\n");
- ret = -ETIMEDOUT;
- goto out;
- }
-
- if (msg->resolution_resp.resolution_count == 0) {
- pr_err("No supported resolutions\n");
- ret = -ENODEV;
- goto out;
- }
-
- index = msg->resolution_resp.default_resolution_index;
- if (index >= msg->resolution_resp.resolution_count) {
- pr_err("Invalid resolution index: %d\n", index);
- ret = -ENODEV;
- goto out;
- }
-
- screen_width =
- msg->resolution_resp.supported_resolution[index].width;
- screen_height =
- msg->resolution_resp.supported_resolution[index].height;
-
-out:
- return ret;
-}
-
-/* Connect to VSP (Virtual Service Provider) on host */
-static int synthvid_connect_vsp(struct hv_device *hdev)
-{
- struct fb_info *info = hv_get_drvdata(hdev);
- struct hvfb_par *par = info->par;
- int ret;
-
- ret = vmbus_open(hdev->channel, RING_BUFSIZE, RING_BUFSIZE,
- NULL, 0, synthvid_receive, hdev);
- if (ret) {
- pr_err("Unable to open vmbus channel\n");
- return ret;
- }
-
- /* Negotiate the protocol version with host */
- switch (vmbus_proto_version) {
- case VERSION_WIN10:
- case VERSION_WIN10_V5:
- ret = synthvid_negotiate_ver(hdev, SYNTHVID_VERSION_WIN10);
- if (!ret)
- break;
- fallthrough;
- case VERSION_WIN8:
- case VERSION_WIN8_1:
- ret = synthvid_negotiate_ver(hdev, SYNTHVID_VERSION_WIN8);
- break;
- default:
- ret = synthvid_negotiate_ver(hdev, SYNTHVID_VERSION_WIN10);
- break;
- }
-
- if (ret) {
- pr_err("Synthetic video device version not accepted\n");
- goto error;
- }
-
- screen_depth = SYNTHVID_DEPTH_WIN8;
- if (synthvid_ver_ge(par->synthvid_version, SYNTHVID_VERSION_WIN10)) {
- ret = synthvid_get_supported_resolution(hdev);
- if (ret)
- pr_info("Failed to get supported resolution from host, use default\n");
- }
-
- screen_fb_size = hdev->channel->offermsg.offer.
- mmio_megabytes * 1024 * 1024;
-
- return 0;
-
-error:
- vmbus_close(hdev->channel);
- return ret;
-}
-
-/* Send VRAM and Situation messages to the host */
-static int synthvid_send_config(struct hv_device *hdev)
-{
- struct fb_info *info = hv_get_drvdata(hdev);
- struct hvfb_par *par = info->par;
- struct synthvid_msg *msg = (struct synthvid_msg *)par->init_buf;
- int ret = 0;
- unsigned long t;
-
- /* Send VRAM location */
- memset(msg, 0, sizeof(struct synthvid_msg));
- msg->vid_hdr.type = SYNTHVID_VRAM_LOCATION;
- msg->vid_hdr.size = sizeof(struct synthvid_msg_hdr) +
- sizeof(struct synthvid_vram_location);
- msg->vram.user_ctx = msg->vram.vram_gpa = par->mmio_pp;
- msg->vram.is_vram_gpa_specified = 1;
- synthvid_send(hdev, msg);
-
- t = wait_for_completion_timeout(&par->wait, VSP_TIMEOUT);
- if (!t) {
- pr_err("Time out on waiting vram location ack\n");
- ret = -ETIMEDOUT;
- goto out;
- }
- if (msg->vram_ack.user_ctx != par->mmio_pp) {
- pr_err("Unable to set VRAM location\n");
- ret = -ENODEV;
- goto out;
- }
-
- /* Send pointer and situation update */
- synthvid_send_ptr(hdev);
- synthvid_send_situ(hdev);
-
-out:
- return ret;
-}
-
-
-/*
- * Delayed work callback:
- * It is scheduled to call whenever update request is received and it has
- * not been called in last HVFB_ONDEMAND_THROTTLE time interval.
- */
-static void hvfb_update_work(struct work_struct *w)
-{
- struct hvfb_par *par = container_of(w, struct hvfb_par, dwork.work);
- struct fb_info *info = par->info;
- unsigned long flags;
- int x1, x2, y1, y2;
- int j;
-
- spin_lock_irqsave(&par->delayed_refresh_lock, flags);
- /* Reset the request flag */
- par->delayed_refresh = false;
-
- /* Store the dirty rectangle to local variables */
- x1 = par->x1;
- x2 = par->x2;
- y1 = par->y1;
- y2 = par->y2;
-
- /* Clear dirty rectangle */
- par->x1 = par->y1 = INT_MAX;
- par->x2 = par->y2 = 0;
-
- spin_unlock_irqrestore(&par->delayed_refresh_lock, flags);
-
- if (x1 > info->var.xres || x2 > info->var.xres ||
- y1 > info->var.yres || y2 > info->var.yres || x2 <= x1)
- return;
-
- /* Copy the dirty rectangle to frame buffer memory */
- if (par->need_docopy)
- for (j = y1; j < y2; j++)
- hvfb_docopy(par,
- j * info->fix.line_length +
- (x1 * screen_depth / 8),
- (x2 - x1) * screen_depth / 8);
-
- /* Refresh */
- if (par->fb_ready && par->update)
- synthvid_update(info, x1, y1, x2, y2);
-}
-
-/*
- * Control the on-demand refresh frequency. It schedules a delayed
- * screen update if it has not yet.
- */
-static void hvfb_ondemand_refresh_throttle(struct hvfb_par *par,
- int x1, int y1, int w, int h)
-{
- unsigned long flags;
- int x2 = x1 + w;
- int y2 = y1 + h;
-
- spin_lock_irqsave(&par->delayed_refresh_lock, flags);
-
- /* Merge dirty rectangle */
- par->x1 = min_t(int, par->x1, x1);
- par->y1 = min_t(int, par->y1, y1);
- par->x2 = max_t(int, par->x2, x2);
- par->y2 = max_t(int, par->y2, y2);
-
- /* Schedule a delayed screen update if not yet */
- if (par->delayed_refresh == false) {
- schedule_delayed_work(&par->dwork,
- HVFB_ONDEMAND_THROTTLE);
- par->delayed_refresh = true;
- }
-
- spin_unlock_irqrestore(&par->delayed_refresh_lock, flags);
-}
-
-static int hvfb_on_panic(struct notifier_block *nb,
- unsigned long e, void *p)
-{
- struct hv_device *hdev;
- struct hvfb_par *par;
- struct fb_info *info;
-
- par = container_of(nb, struct hvfb_par, hvfb_panic_nb);
- info = par->info;
- hdev = device_to_hv_device(info->device);
-
- if (hv_ringbuffer_spinlock_busy(hdev->channel))
- return NOTIFY_DONE;
-
- par->synchronous_fb = true;
- if (par->need_docopy)
- hvfb_docopy(par, 0, dio_fb_size);
- synthvid_update(info, 0, 0, INT_MAX, INT_MAX);
-
- return NOTIFY_DONE;
-}
-
-/* Framebuffer operation handlers */
-
-static int hvfb_check_var(struct fb_var_screeninfo *var, struct fb_info *info)
-{
- if (var->xres < HVFB_WIDTH_MIN || var->yres < HVFB_HEIGHT_MIN ||
- var->xres > screen_width || var->yres > screen_height ||
- var->bits_per_pixel != screen_depth)
- return -EINVAL;
-
- var->xres_virtual = var->xres;
- var->yres_virtual = var->yres;
-
- return 0;
-}
-
-static int hvfb_set_par(struct fb_info *info)
-{
- struct hv_device *hdev = device_to_hv_device(info->device);
-
- return synthvid_send_situ(hdev);
-}
-
-
-static inline u32 chan_to_field(u32 chan, struct fb_bitfield *bf)
-{
- return ((chan & 0xffff) >> (16 - bf->length)) << bf->offset;
-}
-
-static int hvfb_setcolreg(unsigned regno, unsigned red, unsigned green,
- unsigned blue, unsigned transp, struct fb_info *info)
-{
- u32 *pal = info->pseudo_palette;
-
- if (regno > 15)
- return -EINVAL;
-
- pal[regno] = chan_to_field(red, &info->var.red)
- | chan_to_field(green, &info->var.green)
- | chan_to_field(blue, &info->var.blue)
- | chan_to_field(transp, &info->var.transp);
-
- return 0;
-}
-
-static int hvfb_blank(int blank, struct fb_info *info)
-{
- return 1; /* get fb_blank to set the colormap to all black */
-}
-
-static void hvfb_ops_damage_range(struct fb_info *info, off_t off, size_t len)
-{
- /* TODO: implement damage handling */
-}
-
-static void hvfb_ops_damage_area(struct fb_info *info, u32 x, u32 y, u32 width, u32 height)
-{
- struct hvfb_par *par = info->par;
-
- if (par->synchronous_fb)
- synthvid_update(info, 0, 0, INT_MAX, INT_MAX);
- else
- hvfb_ondemand_refresh_throttle(par, x, y, width, height);
-}
-
-/*
- * fb_ops.fb_destroy is called by the last put_fb_info() call at the end
- * of unregister_framebuffer() or fb_release(). Do any cleanup related to
- * framebuffer here.
- */
-static void hvfb_destroy(struct fb_info *info)
-{
- hvfb_putmem(info);
- framebuffer_release(info);
-}
-
-/*
- * TODO: GEN1 codepaths allocate from system or DMA-able memory. Fix the
- * driver to use the _SYSMEM_ or _DMAMEM_ helpers in these cases.
- */
-FB_GEN_DEFAULT_DEFERRED_IOMEM_OPS(hvfb_ops,
- hvfb_ops_damage_range,
- hvfb_ops_damage_area)
-
-static const struct fb_ops hvfb_ops = {
- .owner = THIS_MODULE,
- FB_DEFAULT_DEFERRED_OPS(hvfb_ops),
- .fb_check_var = hvfb_check_var,
- .fb_set_par = hvfb_set_par,
- .fb_setcolreg = hvfb_setcolreg,
- .fb_blank = hvfb_blank,
- .fb_destroy = hvfb_destroy,
-};
-
-/* Get options from kernel paramenter "video=" */
-static void hvfb_get_option(struct fb_info *info)
-{
- struct hvfb_par *par = info->par;
- char *opt = NULL, *p;
- uint x = 0, y = 0;
-
- if (fb_get_options(KBUILD_MODNAME, &opt) || !opt || !*opt)
- return;
-
- p = strsep(&opt, "x");
- if (!*p || kstrtouint(p, 0, &x) ||
- !opt || !*opt || kstrtouint(opt, 0, &y)) {
- pr_err("Screen option is invalid: skipped\n");
- return;
- }
-
- if (x < HVFB_WIDTH_MIN || y < HVFB_HEIGHT_MIN ||
- (synthvid_ver_ge(par->synthvid_version, SYNTHVID_VERSION_WIN10) &&
- (x * y * screen_depth / 8 > screen_fb_size)) ||
- (par->synthvid_version == SYNTHVID_VERSION_WIN8 &&
- x * y * screen_depth / 8 > SYNTHVID_FB_SIZE_WIN8)) {
- pr_err("Screen resolution option is out of range: skipped\n");
- return;
- }
-
- screen_width = x;
- screen_height = y;
- return;
-}
-
-/*
- * Allocate enough contiguous physical memory.
- * Return physical address if succeeded or -1 if failed.
- */
-static phys_addr_t hvfb_get_phymem(struct hv_device *hdev,
- unsigned int request_size)
-{
- struct page *page = NULL;
- dma_addr_t dma_handle;
- void *vmem;
- phys_addr_t paddr = 0;
- unsigned int order = get_order(request_size);
-
- if (request_size == 0)
- return -1;
-
- if (order <= MAX_PAGE_ORDER) {
- /* Call alloc_pages if the size is less than 2^MAX_PAGE_ORDER */
- page = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
- if (!page)
- return -1;
-
- paddr = (page_to_pfn(page) << PAGE_SHIFT);
- } else {
- /* Allocate from CMA */
- hdev->device.coherent_dma_mask = DMA_BIT_MASK(64);
-
- vmem = dma_alloc_coherent(&hdev->device,
- round_up(request_size, PAGE_SIZE),
- &dma_handle,
- GFP_KERNEL | __GFP_NOWARN);
-
- if (!vmem)
- return -1;
-
- paddr = virt_to_phys(vmem);
- }
-
- return paddr;
-}
-
-/* Release contiguous physical memory */
-static void hvfb_release_phymem(struct device *device,
- phys_addr_t paddr, unsigned int size)
-{
- unsigned int order = get_order(size);
-
- if (order <= MAX_PAGE_ORDER)
- __free_pages(pfn_to_page(paddr >> PAGE_SHIFT), order);
- else
- dma_free_coherent(device,
- round_up(size, PAGE_SIZE),
- phys_to_virt(paddr),
- paddr);
-}
-
-
-/* Get framebuffer memory from Hyper-V video pci space */
-static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
-{
- struct hvfb_par *par = info->par;
- struct pci_dev *pdev = NULL;
- void __iomem *fb_virt;
- int gen2vm = efi_enabled(EFI_BOOT);
- resource_size_t base = 0;
- resource_size_t size = 0;
- phys_addr_t paddr;
- int ret;
-
- if (!gen2vm) {
- pdev = pci_get_device(PCI_VENDOR_ID_MICROSOFT,
- PCI_DEVICE_ID_HYPERV_VIDEO, NULL);
- if (!pdev) {
- pr_err("Unable to find PCI Hyper-V video\n");
- return -ENODEV;
- }
-
- base = pci_resource_start(pdev, 0);
- size = pci_resource_len(pdev, 0);
- aperture_remove_conflicting_devices(base, size, KBUILD_MODNAME);
-
- /*
- * For Gen 1 VM, we can directly use the contiguous memory
- * from VM. If we succeed, deferred IO happens directly
- * on this allocated framebuffer memory, avoiding extra
- * memory copy.
- */
- paddr = hvfb_get_phymem(hdev, screen_fb_size);
- if (paddr != (phys_addr_t) -1) {
- par->mmio_pp = paddr;
- par->mmio_vp = par->dio_vp = __va(paddr);
-
- info->fix.smem_start = paddr;
- info->fix.smem_len = screen_fb_size;
- info->screen_base = par->mmio_vp;
- info->screen_size = screen_fb_size;
-
- par->need_docopy = false;
- goto getmem_done;
- }
- pr_info("Unable to allocate enough contiguous physical memory on Gen 1 VM. Using MMIO instead.\n");
- } else {
- aperture_remove_all_conflicting_devices(KBUILD_MODNAME);
- }
-
- /*
- * Cannot use contiguous physical memory, so allocate MMIO space for
- * the framebuffer. At this point in the function, conflicting devices
- * that might have claimed the framebuffer MMIO space based on
- * screen_info.lfb_base must have already been removed so that
- * vmbus_allocate_mmio() does not allocate different MMIO space. If the
- * kdump image were to be loaded using kexec_file_load(), the
- * framebuffer location in the kdump image would be set from
- * screen_info.lfb_base at the time that kdump is enabled. If the
- * framebuffer has moved elsewhere, this could be the wrong location,
- * causing kdump to hang when efifb (for example) loads.
- */
- dio_fb_size =
- screen_width * screen_height * screen_depth / 8;
-
- ret = vmbus_allocate_mmio(&par->mem, hdev, 0, -1,
- screen_fb_size, 0x100000, true);
- if (ret != 0) {
- pr_err("Unable to allocate framebuffer memory\n");
- goto err1;
- }
-
- /*
- * Map the VRAM cacheable for performance. This is also required for
- * VM Connect to display properly for ARM64 Linux VM, as the host also
- * maps the VRAM cacheable.
- */
- fb_virt = ioremap_cache(par->mem->start, screen_fb_size);
- if (!fb_virt)
- goto err2;
-
- /* Allocate memory for deferred IO */
- par->dio_vp = vzalloc(round_up(dio_fb_size, PAGE_SIZE));
- if (par->dio_vp == NULL)
- goto err3;
-
- /* Physical address of FB device */
- par->mmio_pp = par->mem->start;
- /* Virtual address of FB device */
- par->mmio_vp = (unsigned char *) fb_virt;
-
- info->fix.smem_start = par->mem->start;
- info->fix.smem_len = dio_fb_size;
- info->screen_base = par->dio_vp;
- info->screen_size = dio_fb_size;
-
-getmem_done:
- if (!gen2vm)
- pci_dev_put(pdev);
-
- return 0;
-
-err3:
- iounmap(fb_virt);
-err2:
- vmbus_free_mmio(par->mem->start, screen_fb_size);
- par->mem = NULL;
-err1:
- if (!gen2vm)
- pci_dev_put(pdev);
-
- return -ENOMEM;
-}
-
-/* Release the framebuffer */
-static void hvfb_putmem(struct fb_info *info)
-{
- struct hvfb_par *par = info->par;
-
- if (par->need_docopy) {
- vfree(par->dio_vp);
- iounmap(par->mmio_vp);
- vmbus_free_mmio(par->mem->start, screen_fb_size);
- } else {
- hvfb_release_phymem(info->device, info->fix.smem_start,
- screen_fb_size);
- }
-
- par->mem = NULL;
-}
-
-
-static int hvfb_probe(struct hv_device *hdev,
- const struct hv_vmbus_device_id *dev_id)
-{
- struct fb_info *info;
- struct hvfb_par *par;
- int ret;
-
- info = framebuffer_alloc(sizeof(struct hvfb_par), &hdev->device);
- if (!info)
- return -ENOMEM;
-
- par = info->par;
- par->info = info;
- par->fb_ready = false;
- par->need_docopy = true;
- init_completion(&par->wait);
- INIT_DELAYED_WORK(&par->dwork, hvfb_update_work);
-
- par->delayed_refresh = false;
- spin_lock_init(&par->delayed_refresh_lock);
- par->x1 = par->y1 = INT_MAX;
- par->x2 = par->y2 = 0;
-
- /* Connect to VSP */
- hv_set_drvdata(hdev, info);
- ret = synthvid_connect_vsp(hdev);
- if (ret) {
- pr_err("Unable to connect to VSP\n");
- goto error1;
- }
-
- hvfb_get_option(info);
- pr_info("Screen resolution: %dx%d, Color depth: %d, Frame buffer size: %d\n",
- screen_width, screen_height, screen_depth, screen_fb_size);
-
- ret = hvfb_getmem(hdev, info);
- if (ret) {
- pr_err("No memory for framebuffer\n");
- goto error2;
- }
-
- /* Set up fb_info */
- info->var.xres_virtual = info->var.xres = screen_width;
- info->var.yres_virtual = info->var.yres = screen_height;
- info->var.bits_per_pixel = screen_depth;
-
- if (info->var.bits_per_pixel == 16) {
- info->var.red = (struct fb_bitfield){11, 5, 0};
- info->var.green = (struct fb_bitfield){5, 6, 0};
- info->var.blue = (struct fb_bitfield){0, 5, 0};
- info->var.transp = (struct fb_bitfield){0, 0, 0};
- } else {
- info->var.red = (struct fb_bitfield){16, 8, 0};
- info->var.green = (struct fb_bitfield){8, 8, 0};
- info->var.blue = (struct fb_bitfield){0, 8, 0};
- info->var.transp = (struct fb_bitfield){24, 8, 0};
- }
-
- info->var.activate = FB_ACTIVATE_NOW;
- info->var.height = -1;
- info->var.width = -1;
- info->var.vmode = FB_VMODE_NONINTERLACED;
-
- strcpy(info->fix.id, KBUILD_MODNAME);
- info->fix.type = FB_TYPE_PACKED_PIXELS;
- info->fix.visual = FB_VISUAL_TRUECOLOR;
- info->fix.line_length = screen_width * screen_depth / 8;
- info->fix.accel = FB_ACCEL_NONE;
-
- info->fbops = &hvfb_ops;
- info->pseudo_palette = par->pseudo_palette;
-
- /* Initialize deferred IO */
- info->fbdefio = &synthvid_defio;
- fb_deferred_io_init(info);
-
- /* Send config to host */
- ret = synthvid_send_config(hdev);
- if (ret)
- goto error;
-
- ret = devm_register_framebuffer(&hdev->device, info);
- if (ret) {
- pr_err("Unable to register framebuffer\n");
- goto error;
- }
-
- par->fb_ready = true;
-
- par->synchronous_fb = false;
-
- /*
- * We need to be sure this panic notifier runs _before_ the
- * vmbus disconnect, so order it by priority. It must execute
- * before the function hv_panic_vmbus_unload() [drivers/hv/vmbus_drv.c],
- * which is almost at the end of list, with priority = INT_MIN + 1.
- */
- par->hvfb_panic_nb.notifier_call = hvfb_on_panic;
- par->hvfb_panic_nb.priority = INT_MIN + 10;
- atomic_notifier_chain_register(&panic_notifier_list,
- &par->hvfb_panic_nb);
-
- return 0;
-
-error:
- fb_deferred_io_cleanup(info);
- hvfb_putmem(info);
-error2:
- vmbus_close(hdev->channel);
-error1:
- cancel_delayed_work_sync(&par->dwork);
- hv_set_drvdata(hdev, NULL);
- framebuffer_release(info);
- return ret;
-}
-
-static void hvfb_remove(struct hv_device *hdev)
-{
- struct fb_info *info = hv_get_drvdata(hdev);
- struct hvfb_par *par = info->par;
-
- atomic_notifier_chain_unregister(&panic_notifier_list,
- &par->hvfb_panic_nb);
-
- par->update = false;
- par->fb_ready = false;
-
- fb_deferred_io_cleanup(info);
-
- cancel_delayed_work_sync(&par->dwork);
-
- vmbus_close(hdev->channel);
- hv_set_drvdata(hdev, NULL);
-}
-
-static int hvfb_suspend(struct hv_device *hdev)
-{
- struct fb_info *info = hv_get_drvdata(hdev);
- struct hvfb_par *par = info->par;
-
- console_lock();
-
- /* 1 means do suspend */
- fb_set_suspend(info, 1);
-
- cancel_delayed_work_sync(&par->dwork);
- cancel_delayed_work_sync(&info->deferred_work);
-
- par->update_saved = par->update;
- par->update = false;
- par->fb_ready = false;
-
- vmbus_close(hdev->channel);
-
- console_unlock();
-
- return 0;
-}
-
-static int hvfb_resume(struct hv_device *hdev)
-{
- struct fb_info *info = hv_get_drvdata(hdev);
- struct hvfb_par *par = info->par;
- int ret;
-
- console_lock();
-
- ret = synthvid_connect_vsp(hdev);
- if (ret != 0)
- goto out;
-
- ret = synthvid_send_config(hdev);
- if (ret != 0) {
- vmbus_close(hdev->channel);
- goto out;
- }
-
- par->fb_ready = true;
- par->update = par->update_saved;
-
- schedule_delayed_work(&info->deferred_work, info->fbdefio->delay);
- schedule_delayed_work(&par->dwork, HVFB_UPDATE_DELAY);
-
- /* 0 means do resume */
- fb_set_suspend(info, 0);
-
-out:
- console_unlock();
-
- return ret;
-}
-
-
-static const struct pci_device_id pci_stub_id_table[] = {
- {
- .vendor = PCI_VENDOR_ID_MICROSOFT,
- .device = PCI_DEVICE_ID_HYPERV_VIDEO,
- },
- { /* end of list */ }
-};
-
-static const struct hv_vmbus_device_id id_table[] = {
- /* Synthetic Video Device GUID */
- {HV_SYNTHVID_GUID},
- {}
-};
-
-MODULE_DEVICE_TABLE(pci, pci_stub_id_table);
-MODULE_DEVICE_TABLE(vmbus, id_table);
-
-static struct hv_driver hvfb_drv = {
- .name = KBUILD_MODNAME,
- .id_table = id_table,
- .probe = hvfb_probe,
- .remove = hvfb_remove,
- .suspend = hvfb_suspend,
- .resume = hvfb_resume,
- .driver = {
- .probe_type = PROBE_PREFER_ASYNCHRONOUS,
- },
-};
-
-static int hvfb_pci_stub_probe(struct pci_dev *pdev,
- const struct pci_device_id *ent)
-{
- return 0;
-}
-
-static void hvfb_pci_stub_remove(struct pci_dev *pdev)
-{
-}
-
-static struct pci_driver hvfb_pci_stub_driver = {
- .name = KBUILD_MODNAME,
- .id_table = pci_stub_id_table,
- .probe = hvfb_pci_stub_probe,
- .remove = hvfb_pci_stub_remove,
- .driver = {
- .probe_type = PROBE_PREFER_ASYNCHRONOUS,
- }
-};
-
-static int __init hvfb_drv_init(void)
-{
- int ret;
-
- pr_warn("Deprecated: use Hyper-V DRM driver instead\n");
-
- if (fb_modesetting_disabled("hyper_fb"))
- return -ENODEV;
-
- ret = vmbus_driver_register(&hvfb_drv);
- if (ret != 0)
- return ret;
-
- ret = pci_register_driver(&hvfb_pci_stub_driver);
- if (ret != 0) {
- vmbus_driver_unregister(&hvfb_drv);
- return ret;
- }
-
- return 0;
-}
-
-static void __exit hvfb_drv_exit(void)
-{
- pci_unregister_driver(&hvfb_pci_stub_driver);
- vmbus_driver_unregister(&hvfb_drv);
-}
-
-module_init(hvfb_drv_init);
-module_exit(hvfb_drv_exit);
-
-MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("Microsoft Hyper-V Synthetic Video Frame Buffer Driver");
--
2.49.0
^ permalink raw reply related
* Re: [PATCH net-next v12 04/12] vsock: add netns support to virtio transports
From: Stefano Garzarella @ 2025-12-24 13:01 UTC (permalink / raw)
To: Bobby Eshleman
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin, Jason Wang,
Eugenio Pérez, Xuan Zhuo, K. Y. Srinivasan, Haiyang Zhang,
Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, Shuah Khan, linux-kernel,
virtualization, netdev, kvm, linux-hyperv, linux-kselftest,
berrange, Sargun Dhillon, Bobby Eshleman
In-Reply-To: <aUs0no+ni8/R8/1N@devvm11784.nha0.facebook.com>
On Tue, Dec 23, 2025 at 04:32:30PM -0800, Bobby Eshleman wrote:
>On Mon, Dec 15, 2025 at 05:22:02PM -0800, Bobby Eshleman wrote:
>> On Mon, Dec 15, 2025 at 03:11:22PM +0100, Stefano Garzarella wrote:
[...]
>> >
>> > FYI I'll be off from Dec 25 to Jan 6, so if we want to do an RFC in the
>> > middle, I'll do my best to take a look before my time off.
>> >
>> > Thanks,
>> > Stefano
>
>Just sent this out, though I acknowledge it's pretty last minute WRT
>your time off.
Thanks for that, but yeah, I didn't have time to take a closer look :-(
I'll do so as soon as I'm back!
>
>If I don't hear from you before then, have a good holiday!
Thanks, you too if you have the opportunity!
Thanks,
Stefano
^ permalink raw reply
* [PATCH RFC net-next v13 00/13] vsock: add namespace support to vhost-vsock and loopback
From: Bobby Eshleman @ 2025-12-24 0:28 UTC (permalink / raw)
To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Eugenio Pérez, Xuan Zhuo, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, Shuah Khan, Long Li
Cc: linux-kernel, virtualization, netdev, kvm, linux-hyperv,
linux-kselftest, berrange, Sargun Dhillon, Bobby Eshleman,
Bobby Eshleman
This series adds namespace support to vhost-vsock and loopback. It does
not add namespaces to any of the other guest transports (virtio-vsock,
hyperv, or vmci).
The current revision supports two modes: local and global. Local
mode is complete isolation of namespaces, while global mode is complete
sharing between namespaces of CIDs (the original behavior).
The mode is set using the parent namespace's
/proc/sys/net/vsock/child_ns_mode and inherited when a new namespace is
created. The mode of the current namespace can be queried by reading
/proc/sys/net/vsock/ns_mode. The mode cannot change after the namespace
has been created.
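As a rough illustration of querying the mode from user space, the sketch
below reads the first line of a sysctl-style file such as
/proc/sys/net/vsock/ns_mode. The helper name read_sysctl_line() is
hypothetical and not part of this series, and the exact strings the
kernel reports (assumed here to be "global" or "local") should be checked
against the patches themselves.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the first line of a sysctl-style file into
 * buf and strip the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int read_sysctl_line(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(path && buf && len > 0);

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* fgets() keeps the newline if it fit; cut the string there. */
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

A caller would pass "/proc/sys/net/vsock/ns_mode" and a small buffer; on
a kernel without this series the open fails and the helper returns -1.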
Modes are per-netns. This allows a system to configure namespaces
independently (some may share CIDs, others are completely isolated).
This also supports future possible mixed use cases, where there may be
namespaces in global mode spinning up VMs while there are mixed mode
namespaces that provide services to the VMs, but are not allowed to
allocate from the global CID pool (this mode is not implemented in this
series).
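To make the two modes concrete, here is a small user-space toy model of
the reachability rule as described above: two namespaces can reach each
other's CIDs if they are the same namespace, or if both are in global
mode. The types and the function toy_ns_can_reach() are made up for
illustration; this is not the kernel helper used in the patches, only a
sketch that matches the behavior suggested by the test names in this
cover letter.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins, not the kernel's types. */
enum toy_ns_mode { TOY_MODE_GLOBAL, TOY_MODE_LOCAL };

struct toy_ns {
	int id;			/* identifies the namespace */
	enum toy_ns_mode mode;	/* fixed at namespace creation */
};

/* Return true if a socket in namespace a may reach a CID owned by
 * namespace b under the local/global rules sketched above.
 */
static bool toy_ns_can_reach(const struct toy_ns *a, const struct toy_ns *b)
{
	assert(a && b);

	/* The same namespace always sees its own CIDs. */
	if (a->id == b->id)
		return true;

	/* Different namespaces share CIDs only if both are global. */
	return a->mode == TOY_MODE_GLOBAL && b->mode == TOY_MODE_GLOBAL;
}
```

Under this model, two distinct global namespaces can connect, while any
pairing that involves a distinct local namespace is rejected.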
Additionally, this series adds tests for the new namespace features:
tools/testing/selftests/vsock/vmtest.sh
1..25
ok 1 vm_server_host_client
ok 2 vm_client_host_server
ok 3 vm_loopback
ok 4 ns_host_vsock_ns_mode_ok
ok 5 ns_host_vsock_child_ns_mode_ok
ok 6 ns_global_same_cid_fails
ok 7 ns_local_same_cid_ok
ok 8 ns_global_local_same_cid_ok
ok 9 ns_local_global_same_cid_ok
ok 10 ns_diff_global_host_connect_to_global_vm_ok
ok 11 ns_diff_global_host_connect_to_local_vm_fails
ok 12 ns_diff_global_vm_connect_to_global_host_ok
ok 13 ns_diff_global_vm_connect_to_local_host_fails
ok 14 ns_diff_local_host_connect_to_local_vm_fails
ok 15 ns_diff_local_vm_connect_to_local_host_fails
ok 16 ns_diff_global_to_local_loopback_local_fails
ok 17 ns_diff_local_to_global_loopback_fails
ok 18 ns_diff_local_to_local_loopback_fails
ok 19 ns_diff_global_to_global_loopback_ok
ok 20 ns_same_local_loopback_ok
ok 21 ns_same_local_host_connect_to_local_vm_ok
ok 22 ns_same_local_vm_connect_to_local_host_ok
ok 23 ns_delete_vm_ok
ok 24 ns_delete_host_ok
ok 25 ns_delete_both_ok
SUMMARY: PASS=25 SKIP=0 FAIL=0
Thanks again for everyone's help and reviews!
Suggested-by: Sargun Dhillon <sargun@sargun.me>
Signed-off-by: Bobby Eshleman <bobbyeshleman@gmail.com>
Changes in v13:
- add support for immutable sysctl ns_mode and inheritance from sysctl child_ns_mode
- remove passing around of net_mode, can be accessed now via
vsock_net_mode(net) since it is immutable
- update tests for new uAPI
- add one patch to extend the kselftest timeout (it was starting to
fail with the new tests added)
- Link to v12: https://lore.kernel.org/r/20251126-vsock-vmtest-v12-0-257ee21cd5de@meta.com
Changes in v12:
- add ns mode checking to _allow() callbacks to reject local mode for
incompatible transports (Stefano)
- flip vhost/loopback to return true for stream_allow() and
seqpacket_allow() in "vsock: add netns support to virtio transports"
(Stefano)
- add VMADDR_CID_ANY + local mode documentation in af_vsock.c (Stefano)
- change "selftests/vsock: add tests for host <-> vm connectivity with
namespaces" to skip test 29 in vsock_test for namespace local
vsock_test calls in a host local-mode namespace. There is a
false-positive edge case for that test encountered with the
->stream_allow() approach. More details in that patch.
- updated cover letter with new test output
- Link to v11: https://lore.kernel.org/r/20251120-vsock-vmtest-v11-0-55cbc80249a7@meta.com
Changes in v11:
- vmtest: add a patch to use ss in wait_for_listener functions and
support vsock, tcp, and unix. Change all patches to use the new
functions.
- vmtest: add a patch to re-use vm dmesg / warn counting functions
- Link to v10: https://lore.kernel.org/r/20251117-vsock-vmtest-v10-0-df08f165bf3e@meta.com
Changes in v10:
- Combine virtio common patches into one (Stefano)
- Resolve vsock_loopback virtio_transport_reset_no_sock() issue
with info->vsk setting. This eliminates the need for skb->cb,
so remove skb->cb patches.
- many line width 80 fixes
- Link to v9: https://lore.kernel.org/all/20251111-vsock-vmtest-v9-0-852787a37bed@meta.com
Changes in v9:
- reorder loopback patch after patch for virtio transport common code
- remove module ordering tests patch because loopback no longer depends
on pernet ops
- major simplifications in vsock_loopback
- added a new patch for blocking local mode for guests, added test case
to check
- add net ref tracking to vsock_loopback patch
- Link to v8: https://lore.kernel.org/r/20251023-vsock-vmtest-v8-0-dea984d02bb0@meta.com
Changes in v8:
- Break generic cleanup/refactoring patches into standalone series,
remove those from this series
- Link to dependency: https://lore.kernel.org/all/20251022-vsock-selftests-fixes-and-improvements-v1-0-edeb179d6463@meta.com/
- Link to v7: https://lore.kernel.org/r/20251021-vsock-vmtest-v7-0-0661b7b6f081@meta.com
Changes in v7:
- fix hv_sock build
- break out vmtest patches into distinct, more well-scoped patches
- change `orig_net_mode` to `net_mode`
- many fixes and style changes in per-patch change sets (see individual
patches for specific changes)
- optimize `virtio_vsock_skb_cb` layout
- update commit messages with more useful descriptions
- vsock_loopback: use orig_net_mode instead of current net mode
- add tests for edge cases (ns deletion, mode changing, loopback module
load ordering)
- Link to v6: https://lore.kernel.org/r/20250916-vsock-vmtest-v6-0-064d2eb0c89d@meta.com
Changes in v6:
- define behavior when mode changes to local while socket/VM is alive
- af_vsock: clarify description of CID behavior
- af_vsock: use stronger language around CID rules (don't use "may")
- af_vsock: improve naming of buf/buffer
- af_vsock: improve string length checking on proc writes
- vsock_loopback: add space in struct to clarify lock protection
- vsock_loopback: do proper cleanup/unregister on vsock_loopback_exit()
- vsock_loopback: use virtio_vsock_skb_net() instead of sock_net()
- vsock_loopback: set loopback to NULL after kfree()
- vsock_loopback: use pernet_operations and remove callback mechanism
- vsock_loopback: add macros for "global" and "local"
- vsock_loopback: fix length checking
- vmtest.sh: check for namespace support in vmtest.sh
- Link to v5: https://lore.kernel.org/r/20250827-vsock-vmtest-v5-0-0ba580bede5b@meta.com
Changes in v5:
- /proc/net/vsock_ns_mode -> /proc/sys/net/vsock/ns_mode
- vsock_global_net -> vsock_global_dummy_net
- fix netns lookup in vhost_vsock to respect pid namespaces
- add callbacks for vsock_loopback to avoid circular dependency
- vmtest.sh loads vsock_loopback module
- remove vsock_net_mode_can_set()
- change vsock_net_write_mode() to return true/false based on success
- make vsock_net_mode enum instead of u8
- Link to v4: https://lore.kernel.org/r/20250805-vsock-vmtest-v4-0-059ec51ab111@meta.com
Changes in v4:
- removed RFC tag
- implemented loopback support
- renamed new tests to better reflect behavior
- completed suite of tests with permutations of ns modes and vsock_test
as guest/host
- simplified socat bridging with unix socket instead of tcp + veth
- only use vsock_test for success case, socat for failure case (context
in commit message)
- lots of cleanup
Changes in v3:
- add notion of "modes"
- add procfs /proc/net/vsock_ns_mode
- local and global modes only
- no /dev/vhost-vsock-netns
- vmtest.sh already merged, so new patch just adds new tests for NS
- Link to v2:
https://lore.kernel.org/kvm/20250312-vsock-netns-v2-0-84bffa1aa97a@gmail.com
Changes in v2:
- only support vhost-vsock namespaces
- all g2h namespaces retain old behavior, only common API changes
impacted by vhost-vsock changes
- add /dev/vhost-vsock-netns for "opt-in"
- leave /dev/vhost-vsock to old behavior
- removed netns module param
- Link to v1:
https://lore.kernel.org/r/20200116172428.311437-1-sgarzare@redhat.com
Changes in v1:
- added 'netns' module param to vsock.ko to enable the
network namespace support (disabled by default)
- added 'vsock_net_eq()' to check the "net" assigned to a socket
only when 'netns' support is enabled
- Link to RFC: https://patchwork.ozlabs.org/cover/1202235/
---
Bobby Eshleman (13):
vsock: add per-net vsock NS mode state
vsock: add netns to vsock core
virtio: set skb owner of virtio_transport_reset_no_sock() reply
vsock: add netns support to virtio transports
selftests/vsock: increase timeout to 1200
selftests/vsock: add namespace helpers to vmtest.sh
selftests/vsock: prepare vm management helpers for namespaces
selftests/vsock: add vm_dmesg_{warn,oops}_count() helpers
selftests/vsock: use ss to wait for listeners instead of /proc/net
selftests/vsock: add tests for proc sys vsock ns_mode
selftests/vsock: add namespace tests for CID collisions
selftests/vsock: add tests for host <-> vm connectivity with namespaces
selftests/vsock: add tests for namespace deletion
MAINTAINERS | 1 +
drivers/vhost/vsock.c | 44 +-
include/linux/virtio_vsock.h | 9 +-
include/net/af_vsock.h | 53 +-
include/net/net_namespace.h | 4 +
include/net/netns/vsock.h | 17 +
net/vmw_vsock/af_vsock.c | 296 ++++++++-
net/vmw_vsock/hyperv_transport.c | 7 +-
net/vmw_vsock/virtio_transport.c | 22 +-
net/vmw_vsock/virtio_transport_common.c | 62 +-
net/vmw_vsock/vmci_transport.c | 26 +-
net/vmw_vsock/vsock_loopback.c | 22 +-
tools/testing/selftests/vsock/settings | 2 +-
tools/testing/selftests/vsock/vmtest.sh | 1055 +++++++++++++++++++++++++++++--
14 files changed, 1487 insertions(+), 133 deletions(-)
---
base-commit: 962ac5ca99a5c3e7469215bf47572440402dfd59
change-id: 20250325-vsock-vmtest-b3a21d2102c2
prerequisite-message-id: <20251022-vsock-selftests-fixes-and-improvements-v1-0-edeb179d6463@meta.com>
prerequisite-patch-id: a2eecc3851f2509ed40009a7cab6990c6d7cfff5
prerequisite-patch-id: 501db2100636b9c8fcb3b64b8b1df797ccbede85
prerequisite-patch-id: ba1a2f07398a035bc48ef72edda41888614be449
prerequisite-patch-id: fd5cc5445aca9355ce678e6d2bfa89fab8a57e61
prerequisite-patch-id: 795ab4432ffb0843e22b580374782e7e0d99b909
prerequisite-patch-id: 1499d263dc933e75366c09e045d2125ca39f7ddd
prerequisite-patch-id: f92d99bb1d35d99b063f818a19dcda999152d74c
prerequisite-patch-id: e3296f38cdba6d903e061cff2bbb3e7615e8e671
prerequisite-patch-id: bc4662b4710d302d4893f58708820fc2a0624325
prerequisite-patch-id: f8991f2e98c2661a706183fde6b35e2b8d9aedcf
prerequisite-patch-id: 44bf9ed69353586d284e5ee63d6fffa30439a698
prerequisite-patch-id: d50621bc630eeaf608bbaf260370c8dabf6326df
Best regards,
--
Bobby Eshleman <bobbyeshleman@meta.com>
^ permalink raw reply
* [PATCH RFC net-next v13 04/13] vsock: add netns support to virtio transports
From: Bobby Eshleman @ 2025-12-24 0:28 UTC (permalink / raw)
To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Eugenio Pérez, Xuan Zhuo, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, Shuah Khan, Long Li
Cc: linux-kernel, virtualization, netdev, kvm, linux-hyperv,
linux-kselftest, berrange, Sargun Dhillon, Bobby Eshleman,
Bobby Eshleman
In-Reply-To: <20251223-vsock-vmtest-v13-0-9d6db8e7c80b@meta.com>
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add netns support to loopback and vhost. Keep netns disabled for
virtio-vsock, but add necessary changes to comply with common API
updates.
This is the patch in the series where vhost-vsock namespaces actually
come online.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v13:
- do not store or pass the mode around now that net->vsock.mode is
immutable
- move virtio_transport_stream_allow() into virtio_transport.c
because virtio is the only caller now
Changes in v12:
- change seqpacket_allow() and stream_allow() to return true for
loopback and vhost (Stefano)
Changes in v11:
- reorder with the skb ownership patch for loopback (Stefano)
- toggle vhost_transport_supports_local_mode() to true
Changes in v10:
- Splitting patches complicates the series with meaningless placeholder
values that eventually get replaced anyway, so to avoid that this
patch combines into one. Links to previous patches here:
- Link: https://lore.kernel.org/all/20251111-vsock-vmtest-v9-3-852787a37bed@meta.com/
- Link: https://lore.kernel.org/all/20251111-vsock-vmtest-v9-6-852787a37bed@meta.com/
- Link: https://lore.kernel.org/all/20251111-vsock-vmtest-v9-7-852787a37bed@meta.com/
- remove placeholder values (Stefano)
- update comment describing net/net_mode for
virtio_transport_reset_no_sock()
---
drivers/vhost/vsock.c | 41 ++++++++++++++++---------
include/linux/virtio_vsock.h | 5 +--
net/vmw_vsock/virtio_transport.c | 13 ++++++--
net/vmw_vsock/virtio_transport_common.c | 54 +++++++++++++++++++--------------
net/vmw_vsock/vsock_loopback.c | 14 +++++++--
5 files changed, 84 insertions(+), 43 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 8eca59fd1afb..d939cac2b52e 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -48,6 +48,8 @@ static DEFINE_READ_MOSTLY_HASHTABLE(vhost_vsock_hash, 8);
struct vhost_vsock {
struct vhost_dev dev;
struct vhost_virtqueue vqs[2];
+ struct net *net;
+ netns_tracker ns_tracker;
/* Link to global vhost_vsock_hash, writes use vhost_vsock_mutex */
struct hlist_node hash;
@@ -69,7 +71,7 @@ static u32 vhost_transport_get_local_cid(void)
/* Callers that dereference the return value must hold vhost_vsock_mutex or the
* RCU read lock.
*/
-static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
+static struct vhost_vsock *vhost_vsock_get(u32 guest_cid, struct net *net)
{
struct vhost_vsock *vsock;
@@ -80,9 +82,9 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
if (other_cid == 0)
continue;
- if (other_cid == guest_cid)
+ if (other_cid == guest_cid &&
+ vsock_net_check_mode(net, vsock->net))
return vsock;
-
}
return NULL;
@@ -271,7 +273,7 @@ static void vhost_transport_send_pkt_work(struct vhost_work *work)
}
static int
-vhost_transport_send_pkt(struct sk_buff *skb)
+vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
{
struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
struct vhost_vsock *vsock;
@@ -280,7 +282,7 @@ vhost_transport_send_pkt(struct sk_buff *skb)
rcu_read_lock();
/* Find the vhost_vsock according to guest context id */
- vsock = vhost_vsock_get(le64_to_cpu(hdr->dst_cid));
+ vsock = vhost_vsock_get(le64_to_cpu(hdr->dst_cid), net);
if (!vsock) {
rcu_read_unlock();
kfree_skb(skb);
@@ -307,7 +309,8 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
rcu_read_lock();
/* Find the vhost_vsock according to guest context id */
- vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
+ vsock = vhost_vsock_get(vsk->remote_addr.svm_cid,
+ sock_net(sk_vsock(vsk)));
if (!vsock)
goto out;
@@ -409,6 +412,12 @@ static bool vhost_transport_msgzerocopy_allow(void)
static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk,
u32 remote_cid);
+static bool
+vhost_transport_stream_allow(struct vsock_sock *vsk, u32 cid, u32 port)
+{
+ return true;
+}
+
static struct virtio_transport vhost_transport = {
.transport = {
.module = THIS_MODULE,
@@ -433,7 +442,7 @@ static struct virtio_transport vhost_transport = {
.stream_has_space = virtio_transport_stream_has_space,
.stream_rcvhiwat = virtio_transport_stream_rcvhiwat,
.stream_is_active = virtio_transport_stream_is_active,
- .stream_allow = virtio_transport_stream_allow,
+ .stream_allow = vhost_transport_stream_allow,
.seqpacket_dequeue = virtio_transport_seqpacket_dequeue,
.seqpacket_enqueue = virtio_transport_seqpacket_enqueue,
@@ -466,14 +475,12 @@ static struct virtio_transport vhost_transport = {
static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk,
u32 remote_cid)
{
+ struct net *net = sock_net(sk_vsock(vsk));
struct vhost_vsock *vsock;
bool seqpacket_allow = false;
- if (vsock_net_mode(sock_net(sk_vsock(vsk))) != VSOCK_NET_MODE_GLOBAL)
- return false;
-
rcu_read_lock();
- vsock = vhost_vsock_get(remote_cid);
+ vsock = vhost_vsock_get(remote_cid, net);
if (vsock)
seqpacket_allow = vsock->seqpacket_allow;
@@ -544,7 +551,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
if (le64_to_cpu(hdr->src_cid) == vsock->guest_cid &&
le64_to_cpu(hdr->dst_cid) ==
vhost_transport_get_local_cid())
- virtio_transport_recv_pkt(&vhost_transport, skb);
+ virtio_transport_recv_pkt(&vhost_transport, skb,
+ vsock->net);
else
kfree_skb(skb);
@@ -661,6 +669,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
{
struct vhost_virtqueue **vqs;
struct vhost_vsock *vsock;
+ struct net *net;
int ret;
/* This struct is large and allocation could fail, fall back to vmalloc
@@ -676,6 +685,9 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
goto out;
}
+ net = current->nsproxy->net_ns;
+ vsock->net = get_net_track(net, &vsock->ns_tracker, GFP_KERNEL);
+
vsock->guest_cid = 0; /* no CID assigned yet */
vsock->seqpacket_allow = false;
@@ -715,7 +727,7 @@ static void vhost_vsock_reset_orphans(struct sock *sk)
*/
/* If the peer is still valid, no need to reset connection */
- if (vhost_vsock_get(vsk->remote_addr.svm_cid))
+ if (vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk)))
return;
/* If the close timeout is pending, let it expire. This avoids races
@@ -760,6 +772,7 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
virtio_vsock_skb_queue_purge(&vsock->send_pkt_queue);
vhost_dev_cleanup(&vsock->dev);
+ put_net_track(vsock->net, &vsock->ns_tracker);
kfree(vsock->dev.vqs);
vhost_vsock_free(vsock);
return 0;
@@ -786,7 +799,7 @@ static int vhost_vsock_set_cid(struct vhost_vsock *vsock, u64 guest_cid)
/* Refuse if CID is already in use */
mutex_lock(&vhost_vsock_mutex);
- other = vhost_vsock_get(guest_cid);
+ other = vhost_vsock_get(guest_cid, vsock->net);
if (other && other != vsock) {
mutex_unlock(&vhost_vsock_mutex);
return -EADDRINUSE;
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 1845e8d4f78d..f91704731057 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -173,6 +173,7 @@ struct virtio_vsock_pkt_info {
u32 remote_cid, remote_port;
struct vsock_sock *vsk;
struct msghdr *msg;
+ struct net *net;
u32 pkt_len;
u16 type;
u16 op;
@@ -185,7 +186,7 @@ struct virtio_transport {
struct vsock_transport transport;
/* Takes ownership of the packet */
- int (*send_pkt)(struct sk_buff *skb);
+ int (*send_pkt)(struct sk_buff *skb, struct net *net);
/* Used in MSG_ZEROCOPY mode. Checks, that provided data
* (number of buffers) could be transmitted with zerocopy
@@ -280,7 +281,7 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
void virtio_transport_destruct(struct vsock_sock *vsk);
void virtio_transport_recv_pkt(struct virtio_transport *t,
- struct sk_buff *skb);
+ struct sk_buff *skb, struct net *net);
void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct sk_buff *skb);
u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 wanted);
void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit);
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 37eeefddb48c..22ff5a503070 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -231,7 +231,7 @@ static int virtio_transport_send_skb_fast_path(struct virtio_vsock *vsock, struc
}
static int
-virtio_transport_send_pkt(struct sk_buff *skb)
+virtio_transport_send_pkt(struct sk_buff *skb, struct net *net)
{
struct virtio_vsock_hdr *hdr;
struct virtio_vsock *vsock;
@@ -536,6 +536,11 @@ static bool virtio_transport_msgzerocopy_allow(void)
return true;
}
+bool virtio_transport_stream_allow(struct vsock_sock *vsk, u32 cid, u32 port)
+{
+ return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
+}
+
static bool virtio_transport_seqpacket_allow(struct vsock_sock *vsk,
u32 remote_cid);
@@ -665,7 +670,11 @@ static void virtio_transport_rx_work(struct work_struct *work)
virtio_vsock_skb_put(skb, payload_len);
virtio_transport_deliver_tap_pkt(skb);
- virtio_transport_recv_pkt(&virtio_transport, skb);
+
+ /* Force virtio-transport into global mode since it
+ * does not yet support local-mode namespacing.
+ */
+ virtio_transport_recv_pkt(&virtio_transport, skb, NULL);
}
} while (!virtqueue_enable_cb(vq));
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 718be9f33274..c126aa235091 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -413,7 +413,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
virtio_transport_inc_tx_pkt(vvs, skb);
- ret = t_ops->send_pkt(skb);
+ ret = t_ops->send_pkt(skb, info->net);
if (ret < 0)
break;
@@ -527,6 +527,7 @@ static int virtio_transport_send_credit_update(struct vsock_sock *vsk)
struct virtio_vsock_pkt_info info = {
.op = VIRTIO_VSOCK_OP_CREDIT_UPDATE,
.vsk = vsk,
+ .net = sock_net(sk_vsock(vsk)),
};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1043,12 +1044,6 @@ bool virtio_transport_stream_is_active(struct vsock_sock *vsk)
}
EXPORT_SYMBOL_GPL(virtio_transport_stream_is_active);
-bool virtio_transport_stream_allow(struct vsock_sock *vsk, u32 cid, u32 port)
-{
- return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
-}
-EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
-
int virtio_transport_dgram_bind(struct vsock_sock *vsk,
struct sockaddr_vm *addr)
{
@@ -1067,6 +1062,7 @@ int virtio_transport_connect(struct vsock_sock *vsk)
struct virtio_vsock_pkt_info info = {
.op = VIRTIO_VSOCK_OP_REQUEST,
.vsk = vsk,
+ .net = sock_net(sk_vsock(vsk)),
};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1082,6 +1078,7 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode)
(mode & SEND_SHUTDOWN ?
VIRTIO_VSOCK_SHUTDOWN_SEND : 0),
.vsk = vsk,
+ .net = sock_net(sk_vsock(vsk)),
};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1108,6 +1105,7 @@ virtio_transport_stream_enqueue(struct vsock_sock *vsk,
.msg = msg,
.pkt_len = len,
.vsk = vsk,
+ .net = sock_net(sk_vsock(vsk)),
};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1145,6 +1143,7 @@ static int virtio_transport_reset(struct vsock_sock *vsk,
.op = VIRTIO_VSOCK_OP_RST,
.reply = !!skb,
.vsk = vsk,
+ .net = sock_net(sk_vsock(vsk)),
};
/* Send RST only if the original pkt is not a RST pkt */
@@ -1156,9 +1155,13 @@ static int virtio_transport_reset(struct vsock_sock *vsk,
/* Normally packets are associated with a socket. There may be no socket if an
* attempt was made to connect to a socket that does not exist.
+ *
+ * net refers to the namespace of whoever sent the invalid message. For
+ * loopback, this is the namespace of the socket. For vhost, this is the
+ * namespace of the VM (i.e., vhost_vsock).
*/
static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
- struct sk_buff *skb)
+ struct sk_buff *skb, struct net *net)
{
struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
struct virtio_vsock_pkt_info info = {
@@ -1171,6 +1174,12 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
* sock_net(sk) until the reply skb is freed.
*/
.vsk = vsock_sk(skb->sk),
+
+ /* net is not defined here because we pass it directly to
+ * t->send_pkt(), instead of relying on
+ * virtio_transport_send_pkt_info() to pass it. It is not needed
+ * by virtio_transport_alloc_skb().
+ */
};
struct sk_buff *reply;
@@ -1189,7 +1198,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
if (!reply)
return -ENOMEM;
- return t->send_pkt(reply);
+ return t->send_pkt(reply, net);
}
/* This function should be called with sk_lock held and SOCK_DONE set */
@@ -1471,6 +1480,7 @@ virtio_transport_send_response(struct vsock_sock *vsk,
.remote_port = le32_to_cpu(hdr->src_port),
.reply = true,
.vsk = vsk,
+ .net = sock_net(sk_vsock(vsk)),
};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1513,12 +1523,12 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
int ret;
if (le16_to_cpu(hdr->op) != VIRTIO_VSOCK_OP_REQUEST) {
- virtio_transport_reset_no_sock(t, skb);
+ virtio_transport_reset_no_sock(t, skb, sock_net(sk));
return -EINVAL;
}
if (sk_acceptq_is_full(sk)) {
- virtio_transport_reset_no_sock(t, skb);
+ virtio_transport_reset_no_sock(t, skb, sock_net(sk));
return -ENOMEM;
}
@@ -1526,13 +1536,13 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
* Subsequent enqueues would lead to a memory leak.
*/
if (sk->sk_shutdown == SHUTDOWN_MASK) {
- virtio_transport_reset_no_sock(t, skb);
+ virtio_transport_reset_no_sock(t, skb, sock_net(sk));
return -ESHUTDOWN;
}
child = vsock_create_connected(sk);
if (!child) {
- virtio_transport_reset_no_sock(t, skb);
+ virtio_transport_reset_no_sock(t, skb, sock_net(sk));
return -ENOMEM;
}
@@ -1554,7 +1564,7 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
*/
if (ret || vchild->transport != &t->transport) {
release_sock(child);
- virtio_transport_reset_no_sock(t, skb);
+ virtio_transport_reset_no_sock(t, skb, sock_net(sk));
sock_put(child);
return ret;
}
@@ -1582,7 +1592,7 @@ static bool virtio_transport_valid_type(u16 type)
* lock.
*/
void virtio_transport_recv_pkt(struct virtio_transport *t,
- struct sk_buff *skb)
+ struct sk_buff *skb, struct net *net)
{
struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
struct sockaddr_vm src, dst;
@@ -1605,24 +1615,24 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
le32_to_cpu(hdr->fwd_cnt));
if (!virtio_transport_valid_type(le16_to_cpu(hdr->type))) {
- (void)virtio_transport_reset_no_sock(t, skb);
+ (void)virtio_transport_reset_no_sock(t, skb, net);
goto free_pkt;
}
/* The socket must be in connected or bound table
* otherwise send reset back
*/
- sk = vsock_find_connected_socket(&src, &dst);
+ sk = vsock_find_connected_socket_net(&src, &dst, net);
if (!sk) {
- sk = vsock_find_bound_socket(&dst);
+ sk = vsock_find_bound_socket_net(&dst, net);
if (!sk) {
- (void)virtio_transport_reset_no_sock(t, skb);
+ (void)virtio_transport_reset_no_sock(t, skb, net);
goto free_pkt;
}
}
if (virtio_transport_get_type(sk) != le16_to_cpu(hdr->type)) {
- (void)virtio_transport_reset_no_sock(t, skb);
+ (void)virtio_transport_reset_no_sock(t, skb, net);
sock_put(sk);
goto free_pkt;
}
@@ -1641,7 +1651,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
*/
if (sock_flag(sk, SOCK_DONE) ||
(sk->sk_state != TCP_LISTEN && vsk->transport != &t->transport)) {
- (void)virtio_transport_reset_no_sock(t, skb);
+ (void)virtio_transport_reset_no_sock(t, skb, net);
release_sock(sk);
sock_put(sk);
goto free_pkt;
@@ -1673,7 +1683,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
kfree_skb(skb);
break;
default:
- (void)virtio_transport_reset_no_sock(t, skb);
+ (void)virtio_transport_reset_no_sock(t, skb, net);
kfree_skb(skb);
break;
}
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index 378a96dcb666..dbd4d81e0acb 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -26,7 +26,7 @@ static u32 vsock_loopback_get_local_cid(void)
return VMADDR_CID_LOCAL;
}
-static int vsock_loopback_send_pkt(struct sk_buff *skb)
+static int vsock_loopback_send_pkt(struct sk_buff *skb, struct net *net)
{
struct vsock_loopback *vsock = &the_vsock_loopback;
int len = skb->len;
@@ -48,6 +48,13 @@ static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk)
static bool vsock_loopback_seqpacket_allow(struct vsock_sock *vsk,
u32 remote_cid);
+
+static bool vsock_loopback_stream_allow(struct vsock_sock *vsk, u32 cid,
+ u32 port)
+{
+ return true;
+}
+
static bool vsock_loopback_msgzerocopy_allow(void)
{
return true;
@@ -77,7 +84,7 @@ static struct virtio_transport loopback_transport = {
.stream_has_space = virtio_transport_stream_has_space,
.stream_rcvhiwat = virtio_transport_stream_rcvhiwat,
.stream_is_active = virtio_transport_stream_is_active,
- .stream_allow = virtio_transport_stream_allow,
+ .stream_allow = vsock_loopback_stream_allow,
.seqpacket_dequeue = virtio_transport_seqpacket_dequeue,
.seqpacket_enqueue = virtio_transport_seqpacket_enqueue,
@@ -132,7 +139,8 @@ static void vsock_loopback_work(struct work_struct *work)
*/
virtio_transport_consume_skb_sent(skb, false);
virtio_transport_deliver_tap_pkt(skb);
- virtio_transport_recv_pkt(&loopback_transport, skb);
+ virtio_transport_recv_pkt(&loopback_transport, skb,
+ sock_net(skb->sk));
}
}
--
2.47.3
^ permalink raw reply related
* Re: [PATCH net-next v12 04/12] vsock: add netns support to virtio transports
From: Bobby Eshleman @ 2025-12-24 0:32 UTC (permalink / raw)
To: Stefano Garzarella
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin, Jason Wang,
Eugenio Pérez, Xuan Zhuo, K. Y. Srinivasan, Haiyang Zhang,
Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, Shuah Khan, linux-kernel,
virtualization, netdev, kvm, linux-hyperv, linux-kselftest,
berrange, Sargun Dhillon, Bobby Eshleman
In-Reply-To: <aUC0Op2trtt3z405@devvm11784.nha0.facebook.com>
On Mon, Dec 15, 2025 at 05:22:02PM -0800, Bobby Eshleman wrote:
> On Mon, Dec 15, 2025 at 03:11:22PM +0100, Stefano Garzarella wrote:
> > On Fri, Dec 12, 2025 at 07:26:15AM -0800, Bobby Eshleman wrote:
> > > On Tue, Dec 02, 2025 at 02:01:04PM -0800, Bobby Eshleman wrote:
> > > > On Tue, Dec 02, 2025 at 09:47:19PM +0100, Paolo Abeni wrote:
> > > > > On 12/2/25 6:56 PM, Bobby Eshleman wrote:
> > > > > > On Tue, Dec 02, 2025 at 11:18:14AM +0100, Paolo Abeni wrote:
> > > > > >> On 11/27/25 8:47 AM, Bobby Eshleman wrote:
> > > > > >>> @@ -674,6 +689,17 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
> > > > > >>> goto out;
> > > > > >>> }
> > > > > >>>
> > > > > >>> + net = current->nsproxy->net_ns;
> > > > > >>> + vsock->net = get_net_track(net, &vsock->ns_tracker, GFP_KERNEL);
> > > > > >>> +
> > > > > >>> + /* Store the mode of the namespace at the time of creation. If this
> > > > > >>> + * namespace later changes from "global" to "local", we want this vsock
> > > > > >>> + * to continue operating normally and not suddenly break. For that
> > > > > >>> + * reason, we save the mode here and later use it when performing
> > > > > >>> + * socket lookups with vsock_net_check_mode() (see vhost_vsock_get()).
> > > > > >>> + */
> > > > > >>> + vsock->net_mode = vsock_net_mode(net);
> > > > > >>
> > > > > >> I'm sorry for the very late feedback. I think that at very least the
> > > > > >> user-space needs a way to query if the given transport is in local or
> > > > > >> global mode, as AFAICS there is no way to tell that when socket creation
> > > > > >> races with mode change.
> > > > > >
> > > > > > Are you thinking something along the lines of sockopt?
> > > > >
> > > > > I'd like to see a way for the user-space to query the socket 'namespace
> > > > > mode'.
> > > > >
> > > > > > sockopt could be an option; a possibly better one could be sock_diag. Or
> > > > > > you could do both, dumping the info with a shared helper invoked by
> > > > > > both code paths, like what TCP is doing.
> > > > > >> Also I'm a bit uneasy with the model implemented here, as a 'local' socket
> > > > > >> may cross netns boundaries and connect to a 'local' socket in another netns
> > > > > >> (if I read patch 2/12 correctly). That in turn, AFAICS, breaks the netns
> > > > > >> isolation.
> > > > > >
> > > > > > Local mode sockets are unable to communicate with local mode (and global
> > > > > > mode too) sockets that are in other namespaces. The key piece of code
> > > > > > > for that is vsock_net_check_mode(), where if either mode is local the
> > > > > > namespaces must be the same.
> > > > >
> > > > > Sorry, I likely misread the large comment in patch 2:
> > > > >
> > > > > https://lore.kernel.org/netdev/20251126-vsock-vmtest-v12-2-257ee21cd5de@meta.com/
> > > > >
> > > > > >> Have you considered instead a slightly different model, where the
> > > > > >> local/global model is set in stone at netns creation time - alike what
> > > > > >> /proc/sys/net/ipv4/tcp_child_ehash_entries is doing[1] - and
> > > > > >> inter-netns connectivity is explicitly granted by the admin (I guess
> > > > > >> you will need new transport operations for that)?
> > > > > >>
> > > > > >> /P
> > > > > >>
> > > > > >> [1] tcp allows using per-netns established socket lookup tables - as
> > > > > >> opposed to the default global lookup table (even if match always takes
> > > > > >> in account the netns obviously). The mentioned sysctl specify such
> > > > > >> configuration for the children namespaces, if any.
> > > > > >
> > > > > > I'll save this discussion if the above doesn't resolve your concerns.
> > > > > I still have some concern WRT the dynamic mode change after netns
> > > > > creation. I fear some 'unsolvable' (or very hard to solve) race I can't
> > > > > see now. A tcp_child_ehash_entries-like model will avoid completely the
> > > > > issue, but I understand it would be a significant change over the
> > > > > current status.
> > > > >
> > > > > "Luckily" the merge window is on us and we have some time to discuss. Do
> > > > > you have a specific use-case for the ability to change the netns mode
> > > > > after creation?
> > > > >
> > > > > /P
> > > >
> > > > I don't think there is a hard requirement that the mode be change-able
> > > > after creation. Though I'd love to avoid such a big change... or at
> > > > least leave unchanged as much of what we've already reviewed as
> > > > possible.
> > > >
> > > > In the scheme of defining the mode at creation and following the
> > > > tcp_child_ehash_entries-ish model, what I'm imagining is:
> > > > - /proc/sys/net/vsock/child_ns_mode can be set to "local" or "global"
> > > > - /proc/sys/net/vsock/child_ns_mode is not immutable, can change any
> > > > number of times
> > > >
> > > > - when a netns is created, the new netns mode is inherited from
> > > > child_ns_mode, being assigned using something like:
> > > >
> > > > net->vsock.ns_mode =
> > > > get_net_ns_by_pid(current->pid)->child_ns_mode
> > > >
> > > > - /proc/sys/net/vsock/ns_mode queries the current mode, returning
> > > > "local" or "global", returning value of net->vsock.ns_mode
> > > > - /proc/sys/net/vsock/ns_mode and net->vsock.ns_mode are immutable and
> > > > reject writes
> > > >
> > > > Does that align with what you have in mind?
> > >
> > > Hey Paolo, I just wanted to sync up on this one. Does the above align
> > > with what you envision?
> >
> > Hi Bobby, AFAIK Paolo was at LPC, so there could be some delay.
> >
> > FYI I'll be off from Dec 25 to Jan 6, so if we want to do an RFC in the
> > middle, I'll do my best to take a look before my time off.
> >
> > Thanks,
> > Stefano
Just sent this out, though I acknowledge it's pretty last-minute WRT
your time off.
If I don't hear from you before then, have a good holiday!
Best,
Bobby
^ permalink raw reply
* [PATCH RFC net-next v13 13/13] selftests/vsock: add tests for namespace deletion
From: Bobby Eshleman @ 2025-12-24 0:28 UTC (permalink / raw)
To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Eugenio Pérez, Xuan Zhuo, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, Shuah Khan, Long Li
Cc: linux-kernel, virtualization, netdev, kvm, linux-hyperv,
linux-kselftest, berrange, Sargun Dhillon, Bobby Eshleman,
Bobby Eshleman
In-Reply-To: <20251223-vsock-vmtest-v13-0-9d6db8e7c80b@meta.com>
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add tests that validate vsock sockets are resilient to deleting
namespaces. The vsock sockets should still function normally.
The function check_ns_delete_doesnt_break_connection() is added to
re-use the step-by-step logic of 1) setup connections, 2) delete ns,
3) check that the connections are still ok.
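The three steps can be pictured as a small driver that stops at the first
failing step. The following is only a sketch: run_delete_check and its
callback arguments are hypothetical names and are not part of vmtest.sh.

```shell
# Hypothetical skeleton, not taken from vmtest.sh: each argument names a
# function (or command) that implements one step.
run_delete_check() {
	local setup="$1"
	local delete="$2"
	local check="$3"

	"${setup}" || return 1    # 1) set up connections
	"${delete}" || return 1   # 2) delete the namespace(s)
	"${check}"                # 3) verify that data still flows
}
```

In the patch itself, step 2 corresponds to "ip netns del" and step 3 to
writing "TEST" into the pipe and grepping for it in the VM-side output file.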
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v13:
- remove tests that change the mode after socket creation (this is not
supported behavior now and the immutability property is tested in other
tests)
- remove "change_mode" behavior of
check_ns_changes_dont_break_connection() and rename to
check_ns_delete_doesnt_break_connection() because we only need to test
namespace deletion (other tests confirm that the mode cannot change)
Changes in v11:
- remove pipefile (Stefano)
Changes in v9:
- more consistent shell style
- clarify -u usage comment for pipefile
---
tools/testing/selftests/vsock/vmtest.sh | 84 +++++++++++++++++++++++++++++++++
1 file changed, 84 insertions(+)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index a9eaf37bc31b..dc8dbe74a6d0 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -68,6 +68,9 @@ readonly TEST_NAMES=(
ns_same_local_loopback_ok
ns_same_local_host_connect_to_local_vm_ok
ns_same_local_vm_connect_to_local_host_ok
+ ns_delete_vm_ok
+ ns_delete_host_ok
+ ns_delete_both_ok
)
readonly TEST_DESCS=(
# vm_server_host_client
@@ -135,6 +138,15 @@ readonly TEST_DESCS=(
# ns_same_local_vm_connect_to_local_host_ok
"Run vsock_test client in VM in a local ns with server in same ns."
+
+ # ns_delete_vm_ok
+ "Check that deleting the VM's namespace does not break the socket connection"
+
+ # ns_delete_host_ok
+ "Check that deleting the host's namespace does not break the socket connection"
+
+ # ns_delete_both_ok
+ "Check that deleting the VM and host's namespaces does not break the socket connection"
)
readonly USE_SHARED_VM=(
@@ -1287,6 +1299,78 @@ test_vm_loopback() {
return "${KSFT_PASS}"
}
+check_ns_delete_doesnt_break_connection() {
+ local pipefile pidfile outfile
+ local ns0="global0"
+ local ns1="global1"
+ local port=12345
+ local pids=()
+ local rc=0
+
+ init_namespaces
+
+ pidfile="$(create_pidfile)"
+ if ! vm_start "${pidfile}" "${ns0}"; then
+ return "${KSFT_FAIL}"
+ fi
+ vm_wait_for_ssh "${ns0}"
+
+ outfile=$(mktemp)
+ vm_ssh "${ns0}" -- \
+ socat VSOCK-LISTEN:"${port}",fork STDOUT > "${outfile}" 2>/dev/null &
+ pids+=($!)
+ vm_wait_for_listener "${ns0}" "${port}" "vsock"
+
+ # We use a pipe here so that we can echo into the pipe instead of using
+ # socat and a unix socket file. We just need a name for the pipe (not a
+ # regular file) so use -u.
+ pipefile=$(mktemp -u /tmp/vmtest_pipe_XXXX)
+ ip netns exec "${ns1}" \
+ socat PIPE:"${pipefile}" VSOCK-CONNECT:"${VSOCK_CID}":"${port}" &
+ pids+=($!)
+
+ timeout "${WAIT_PERIOD}" \
+ bash -c 'while [[ ! -e '"${pipefile}"' ]]; do sleep 1; done; exit 0'
+
+ if [[ "$1" == "vm" ]]; then
+ ip netns del "${ns0}"
+ elif [[ "$1" == "host" ]]; then
+ ip netns del "${ns1}"
+ elif [[ "$1" == "both" ]]; then
+ ip netns del "${ns0}"
+ ip netns del "${ns1}"
+ fi
+
+ echo "TEST" > "${pipefile}"
+
+ timeout "${WAIT_PERIOD}" \
+ bash -c 'while [[ ! -s '"${outfile}"' ]]; do sleep 1; done; exit 0'
+
+ if grep -q "TEST" "${outfile}"; then
+ rc="${KSFT_PASS}"
+ else
+ rc="${KSFT_FAIL}"
+ fi
+
+ terminate_pidfiles "${pidfile}"
+ terminate_pids "${pids[@]}"
+ rm -f "${outfile}" "${pipefile}"
+
+ return "${rc}"
+}
+
+test_ns_delete_vm_ok() {
+ check_ns_delete_doesnt_break_connection "vm"
+}
+
+test_ns_delete_host_ok() {
+ check_ns_delete_doesnt_break_connection "host"
+}
+
+test_ns_delete_both_ok() {
+ check_ns_delete_doesnt_break_connection "both"
+}
+
shared_vm_test() {
local tname
--
2.47.3
^ permalink raw reply related
* [PATCH RFC net-next v13 12/13] selftests/vsock: add tests for host <-> vm connectivity with namespaces
From: Bobby Eshleman @ 2025-12-24 0:28 UTC (permalink / raw)
To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Eugenio Pérez, Xuan Zhuo, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, Shuah Khan, Long Li
Cc: linux-kernel, virtualization, netdev, kvm, linux-hyperv,
linux-kselftest, berrange, Sargun Dhillon, Bobby Eshleman,
Bobby Eshleman
In-Reply-To: <20251223-vsock-vmtest-v13-0-9d6db8e7c80b@meta.com>
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add tests to validate namespace correctness using vsock_test and socat.
The vsock_test tool is used to validate expected success tests, but
socat is used for expected failure tests. socat is used to ensure that
connections are rejected outright instead of failing due to some other
socket behavior (as tested in vsock_test). Additionally, socat is
already required for tunneling TCP traffic from vsock_test. Using only
one of the vsock_test tests like 'test_stream_client_close_client' would
have yielded a similar result, but doing so wouldn't remove the socat
dependency.
Additionally, check for the dependency socat. socat needs special
handling beyond just checking if it is on the path because it must be
compiled with support for both vsock and unix. The function
check_socat() checks that this support exists.
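As a rough illustration of that check, the sketch below does the same
substring match on feature macros, but takes the version text as an argument
so it can be tried without socat installed. The helper name
socat_has_feature and the sample text are assumptions made for illustration;
only the "WITH_VSOCK 1" and "WITH_UNIX 1" strings come from the patch.

```shell
# Hypothetical helper: $1 is the text printed by `socat -V`, $2 is a feature
# macro name. Succeeds if the text contains "<macro> 1".
socat_has_feature() {
	case "$1" in
	*"$2 1"*) return 0 ;;
	*) return 1 ;;
	esac
}

# Sample text in the style of `socat -V` output (illustrative, not captured
# from a real run).
sample='#define WITH_UNIX 1
#define WITH_VSOCK 1'

if socat_has_feature "${sample}" "WITH_VSOCK" &&
   socat_has_feature "${sample}" "WITH_UNIX"; then
	echo "socat support ok"
fi
```

Against a real binary the first argument would be "$(socat -V)", as
check_socat() does in the diff below.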
Add more padding to test name printf strings because the tests added in
this patch would otherwise overflow.
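To see why the wider column helps: a left-justified "%-Ns" field only pads
names shorter than N, so names longer than 35 characters (for example
ns_diff_global_host_connect_to_local_vm_fails) push the description out of
alignment. A minimal sketch, where format_row is a hypothetical name:

```shell
# Left-justify a test name in a fixed-width column, then print the
# description. The width of 55 matches the value used in the patch.
format_row() {
	printf '\t%-55s%s\n' "$1" "$2"
}

format_row "ns_diff_global_host_connect_to_local_vm_fails" "example description"
format_row "vm_loopback" "example description"
```

Both descriptions start in the same column because each name fits within
the 55-character field.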
Add vm_dmesg_* helpers to encapsulate checking dmesg
for oops and warnings.
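The idea behind these helpers is a before/after comparison of match counts.
The sketch below shows that pattern against an ordinary file instead of a
VM's dmesg. count_matches and no_new_matches are hypothetical names, not
functions from vmtest.sh.

```shell
# Hypothetical helper: count lines in file $2 matching pattern $1,
# case-insensitively. grep -c prints 0 but exits non-zero when nothing
# matches, hence the "|| :".
count_matches() {
	grep -c -i "$1" "$2" || :
}

# Succeeds only if the count did not grow between two samples.
no_new_matches() {
	local before="$1"
	local after="$2"

	[ "${after}" -le "${before}" ]
}
```

A caller would sample the count once before the test and once after, and
fail the test if no_new_matches returns non-zero.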
Add ability to pass extra args to host-side vsock_test so that tests
that cause false positives may be skipped with arg --skip.
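The forwarding relies on the usual "shift, then pass the rest" idiom. A
minimal sketch follows. run_with_extra_args is a hypothetical stand-in, and
the --skip=29 value is only an example based on the v12 changelog note below.

```shell
# Hypothetical stand-in for a test runner: the first two arguments are
# fixed, everything after them is forwarded untouched.
run_with_extra_args() {
	local mode="$1"
	local port="$2"
	shift 2

	# "$@" now holds only the extra arguments, each kept as one word.
	printf '%s\n' "--mode=${mode}" "--control-port=${port}" "$@"
}

run_with_extra_args client 1234 --skip=29
```

When no extra arguments are given, "$@" expands to nothing, so existing
callers are unaffected.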
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v12:
- add test skip (vsock_test test 29) when host_vsock_test() uses client
mode in a local namespace. Test 29 causes a false positive to trigger.
Changes in v11:
- add 'sleep "${WAIT_PERIOD}"' after any non-TCP socat LISTEN cmd
(Stefano)
- add host_wait_for_listener() after any socat TCP-LISTEN (Stefano)
- reuse vm_dmesg_{oops,warn}_count() inside vm_dmesg_check()
- fix copy-paste in test_ns_same_local_vm_connect_to_local_host_ok()
(Stefano)
Changes in v10:
- add vm_dmesg_start() and vm_dmesg_check()
Changes in v9:
- consistent variable quoting
---
tools/testing/selftests/vsock/vmtest.sh | 572 +++++++++++++++++++++++++++++++-
1 file changed, 568 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index 1bf537410ea6..a9eaf37bc31b 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -7,6 +7,7 @@
# * virtme-ng
# * busybox-static (used by virtme-ng)
# * qemu (used by virtme-ng)
+# * socat
#
# shellcheck disable=SC2317,SC2119
@@ -54,6 +55,19 @@ readonly TEST_NAMES=(
ns_local_same_cid_ok
ns_global_local_same_cid_ok
ns_local_global_same_cid_ok
+ ns_diff_global_host_connect_to_global_vm_ok
+ ns_diff_global_host_connect_to_local_vm_fails
+ ns_diff_global_vm_connect_to_global_host_ok
+ ns_diff_global_vm_connect_to_local_host_fails
+ ns_diff_local_host_connect_to_local_vm_fails
+ ns_diff_local_vm_connect_to_local_host_fails
+ ns_diff_global_to_local_loopback_local_fails
+ ns_diff_local_to_global_loopback_fails
+ ns_diff_local_to_local_loopback_fails
+ ns_diff_global_to_global_loopback_ok
+ ns_same_local_loopback_ok
+ ns_same_local_host_connect_to_local_vm_ok
+ ns_same_local_vm_connect_to_local_host_ok
)
readonly TEST_DESCS=(
# vm_server_host_client
@@ -82,6 +96,45 @@ readonly TEST_DESCS=(
# ns_local_global_same_cid_ok
"Check QEMU successfully starts one VM in a local ns and then another VM in a global ns with the same CID."
+
+ # ns_diff_global_host_connect_to_global_vm_ok
+ "Run vsock_test client in global ns with server in VM in another global ns."
+
+ # ns_diff_global_host_connect_to_local_vm_fails
+ "Run socat to test a process in a global ns fails to connect to a VM in a local ns."
+
+ # ns_diff_global_vm_connect_to_global_host_ok
+ "Run vsock_test client in VM in a global ns with server in another global ns."
+
+ # ns_diff_global_vm_connect_to_local_host_fails
+ "Run socat to test a VM in a global ns fails to connect to a host process in a local ns."
+
+ # ns_diff_local_host_connect_to_local_vm_fails
+ "Run socat to test a host process in a local ns fails to connect to a VM in another local ns."
+
+ # ns_diff_local_vm_connect_to_local_host_fails
+ "Run socat to test a VM in a local ns fails to connect to a host process in another local ns."
+
+ # ns_diff_global_to_local_loopback_local_fails
+ "Run socat to test a loopback vsock in a global ns fails to connect to a vsock in a local ns."
+
+ # ns_diff_local_to_global_loopback_fails
+ "Run socat to test a loopback vsock in a local ns fails to connect to a vsock in a global ns."
+
+ # ns_diff_local_to_local_loopback_fails
+ "Run socat to test a loopback vsock in a local ns fails to connect to a vsock in another local ns."
+
+ # ns_diff_global_to_global_loopback_ok
+ "Run socat to test a loopback vsock in a global ns successfully connects to a vsock in another global ns."
+
+ # ns_same_local_loopback_ok
+ "Run socat to test a loopback vsock in a local ns successfully connects to a vsock in the same ns."
+
+ # ns_same_local_host_connect_to_local_vm_ok
+ "Run vsock_test client in a local ns with server in VM in same ns."
+
+ # ns_same_local_vm_connect_to_local_host_ok
+ "Run vsock_test client in VM in a local ns with server in same ns."
)
readonly USE_SHARED_VM=(
@@ -112,7 +165,7 @@ usage() {
for ((i = 0; i < ${#TEST_NAMES[@]}; i++)); do
name=${TEST_NAMES[${i}]}
desc=${TEST_DESCS[${i}]}
- printf "\t%-35s%-35s\n" "${name}" "${desc}"
+ printf "\t%-55s%-35s\n" "${name}" "${desc}"
done
echo
@@ -222,7 +275,7 @@ check_args() {
}
check_deps() {
- for dep in vng ${QEMU} busybox pkill ssh ss; do
+ for dep in vng ${QEMU} busybox pkill ssh ss socat; do
if [[ ! -x $(command -v "${dep}") ]]; then
echo -e "skip: dependency ${dep} not found!\n"
exit "${KSFT_SKIP}"
@@ -273,6 +326,20 @@ check_vng() {
fi
}
+check_socat() {
+ local support_string
+
+ support_string="$(socat -V)"
+
+ if [[ "${support_string}" != *"WITH_VSOCK 1"* ]]; then
+ die "err: socat is missing vsock support"
+ fi
+
+ if [[ "${support_string}" != *"WITH_UNIX 1"* ]]; then
+ die "err: socat is missing unix support"
+ fi
+}
+
handle_build() {
if [[ ! "${BUILD}" -eq 1 ]]; then
return
@@ -321,6 +388,14 @@ terminate_pidfiles() {
done
}
+terminate_pids() {
+ local pid
+
+ for pid in "$@"; do
+ kill -SIGTERM "${pid}" &>/dev/null || :
+ done
+}
+
vm_start() {
local pidfile=$1
local ns=$2
@@ -459,6 +534,28 @@ vm_dmesg_warn_count() {
vm_ssh "${ns}" -- dmesg --level=warn 2>/dev/null | grep -c -i 'vsock'
}
+vm_dmesg_check() {
+ local pidfile=$1
+ local ns=$2
+ local oops_before=$3
+ local warn_before=$4
+ local oops_after warn_after
+
+ oops_after=$(vm_dmesg_oops_count "${ns}")
+ if [[ "${oops_after}" -gt "${oops_before}" ]]; then
+ echo "FAIL: kernel oops detected on vm in ns ${ns}" | log_host
+ return 1
+ fi
+
+ warn_after=$(vm_dmesg_warn_count "${ns}")
+ if [[ "${warn_after}" -gt "${warn_before}" ]]; then
+ echo "FAIL: kernel warning detected on vm in ns ${ns}" | log_host
+ return 1
+ fi
+
+ return 0
+}
+
vm_vsock_test() {
local ns=$1
local host=$2
@@ -502,6 +599,8 @@ host_vsock_test() {
local host=$2
local cid=$3
local port=$4
+ shift 4
+ local extra_args=("$@")
local rc
local cmd="${VSOCK_TEST}"
@@ -516,13 +615,15 @@ host_vsock_test() {
--mode=client \
--peer-cid="${cid}" \
--control-host="${host}" \
- --control-port="${port}" 2>&1 | log_host
+ --control-port="${port}" \
+ "${extra_args[@]}" 2>&1 | log_host
rc=$?
else
${cmd} \
--mode=server \
--peer-cid="${cid}" \
- --control-port="${port}" 2>&1 | log_host &
+ --control-port="${port}" \
+ "${extra_args[@]}" 2>&1 | log_host &
rc=$?
if [[ $rc -ne 0 ]]; then
@@ -593,6 +694,468 @@ test_ns_host_vsock_ns_mode_ok() {
return "${KSFT_PASS}"
}
+test_ns_diff_global_host_connect_to_global_vm_ok() {
+ local oops_before warn_before
+ local pids pid pidfile
+ local ns0 ns1 port
+ declare -a pids
+ local unixfile
+ ns0="global0"
+ ns1="global1"
+ port=1234
+ local rc
+
+ init_namespaces
+
+ pidfile="$(create_pidfile)"
+
+ if ! vm_start "${pidfile}" "${ns0}"; then
+ return "${KSFT_FAIL}"
+ fi
+
+ vm_wait_for_ssh "${ns0}"
+ oops_before=$(vm_dmesg_oops_count "${ns0}")
+ warn_before=$(vm_dmesg_warn_count "${ns0}")
+
+ unixfile=$(mktemp -u /tmp/XXXX.sock)
+ ip netns exec "${ns1}" \
+ socat TCP-LISTEN:"${TEST_HOST_PORT}",fork \
+ UNIX-CONNECT:"${unixfile}" &
+ pids+=($!)
+ host_wait_for_listener "${ns1}" "${TEST_HOST_PORT}" "tcp"
+
+ ip netns exec "${ns0}" socat UNIX-LISTEN:"${unixfile}",fork \
+ TCP-CONNECT:localhost:"${TEST_HOST_PORT}" &
+ pids+=($!)
+ host_wait_for_listener "${ns0}" "${unixfile}" "unix"
+
+ vm_vsock_test "${ns0}" "server" 2 "${TEST_GUEST_PORT}"
+ vm_wait_for_listener "${ns0}" "${TEST_GUEST_PORT}" "tcp"
+ host_vsock_test "${ns1}" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"
+ rc=$?
+
+ vm_dmesg_check "${pidfile}" "${ns0}" "${oops_before}" "${warn_before}"
+ dmesg_rc=$?
+
+ terminate_pids "${pids[@]}"
+ terminate_pidfiles "${pidfile}"
+
+ if [[ "${rc}" -ne 0 ]] || [[ "${dmesg_rc}" -ne 0 ]]; then
+ return "${KSFT_FAIL}"
+ fi
+
+ return "${KSFT_PASS}"
+}
+
+test_ns_diff_global_host_connect_to_local_vm_fails() {
+ local oops_before warn_before
+ local ns0="global0"
+ local ns1="local0"
+ local port=12345
+ local dmesg_rc
+ local pidfile
+ local result
+ local pid
+
+ init_namespaces
+
+ outfile=$(mktemp)
+
+ pidfile="$(create_pidfile)"
+ if ! vm_start "${pidfile}" "${ns1}"; then
+ log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns1})"
+ return "${KSFT_FAIL}"
+ fi
+
+ vm_wait_for_ssh "${ns1}"
+ oops_before=$(vm_dmesg_oops_count "${ns1}")
+ warn_before=$(vm_dmesg_warn_count "${ns1}")
+
+ vm_ssh "${ns1}" -- socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" &
+ vm_wait_for_listener "${ns1}" "${port}" "vsock"
+ echo TEST | ip netns exec "${ns0}" \
+ socat STDIN VSOCK-CONNECT:"${VSOCK_CID}":"${port}" 2>/dev/null
+
+ vm_dmesg_check "${pidfile}" "${ns1}" "${oops_before}" "${warn_before}"
+ dmesg_rc=$?
+
+ terminate_pidfiles "${pidfile}"
+ result=$(cat "${outfile}")
+ rm -f "${outfile}"
+
+ if [[ "${result}" == "TEST" ]] || [[ "${dmesg_rc}" -ne 0 ]]; then
+ return "${KSFT_FAIL}"
+ fi
+
+ return "${KSFT_PASS}"
+}
+
+test_ns_diff_global_vm_connect_to_global_host_ok() {
+ local oops_before warn_before
+ local ns0="global0"
+ local ns1="global1"
+ local port=12345
+ local unixfile
+ local dmesg_rc
+ local pidfile
+ local pids
+ local rc
+
+ init_namespaces
+
+ declare -a pids
+
+ log_host "Setup socat bridge from ns ${ns0} to ns ${ns1} over port ${port}"
+
+ unixfile=$(mktemp -u /tmp/XXXX.sock)
+
+ ip netns exec "${ns0}" \
+ socat TCP-LISTEN:"${port}" UNIX-CONNECT:"${unixfile}" &
+ pids+=($!)
+ host_wait_for_listener "${ns0}" "${port}" "tcp"
+
+ ip netns exec "${ns1}" \
+ socat UNIX-LISTEN:"${unixfile}" TCP-CONNECT:127.0.0.1:"${port}" &
+ pids+=($!)
+ host_wait_for_listener "${ns1}" "${unixfile}" "unix"
+
+ log_host "Launching ${VSOCK_TEST} in ns ${ns1}"
+ host_vsock_test "${ns1}" "server" "${VSOCK_CID}" "${port}"
+
+ pidfile="$(create_pidfile)"
+ if ! vm_start "${pidfile}" "${ns0}"; then
+ log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns0})"
+ terminate_pids "${pids[@]}"
+ rm -f "${unixfile}"
+ return "${KSFT_FAIL}"
+ fi
+
+ vm_wait_for_ssh "${ns0}"
+
+ oops_before=$(vm_dmesg_oops_count "${ns0}")
+ warn_before=$(vm_dmesg_warn_count "${ns0}")
+
+ vm_vsock_test "${ns0}" "10.0.2.2" 2 "${port}"
+ rc=$?
+
+ vm_dmesg_check "${pidfile}" "${ns0}" "${oops_before}" "${warn_before}"
+ dmesg_rc=$?
+
+ terminate_pidfiles "${pidfile}"
+ terminate_pids "${pids[@]}"
+ rm -f "${unixfile}"
+
+ if [[ "${rc}" -ne 0 ]] || [[ "${dmesg_rc}" -ne 0 ]]; then
+ return "${KSFT_FAIL}"
+ fi
+
+ return "${KSFT_PASS}"
+
+}
+
+test_ns_diff_global_vm_connect_to_local_host_fails() {
+ local ns0="global0"
+ local ns1="local0"
+ local port=12345
+ local oops_before warn_before
+ local dmesg_rc
+ local pidfile
+ local result
+ local pid
+
+ init_namespaces
+
+ log_host "Launching socat in ns ${ns1}"
+ outfile=$(mktemp)
+
+ ip netns exec "${ns1}" socat VSOCK-LISTEN:"${port}" STDOUT &> "${outfile}" &
+ pid=$!
+ host_wait_for_listener "${ns1}" "${port}" "vsock"
+
+ pidfile="$(create_pidfile)"
+ if ! vm_start "${pidfile}" "${ns0}"; then
+ log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns0})"
+ terminate_pids "${pid}"
+ rm -f "${outfile}"
+ return "${KSFT_FAIL}"
+ fi
+
+ vm_wait_for_ssh "${ns0}"
+
+ oops_before=$(vm_dmesg_oops_count "${ns0}")
+ warn_before=$(vm_dmesg_warn_count "${ns0}")
+
+ vm_ssh "${ns0}" -- \
+ bash -c "echo TEST | socat STDIN VSOCK-CONNECT:2:${port}" 2>&1 | log_guest
+
+ vm_dmesg_check "${pidfile}" "${ns0}" "${oops_before}" "${warn_before}"
+ dmesg_rc=$?
+
+ terminate_pidfiles "${pidfile}"
+ terminate_pids "${pid}"
+
+ result=$(cat "${outfile}")
+ rm -f "${outfile}"
+
+ if [[ "${result}" != TEST ]] && [[ "${dmesg_rc}" -eq 0 ]]; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
+test_ns_diff_local_host_connect_to_local_vm_fails() {
+ local ns0="local0"
+ local ns1="local1"
+ local port=12345
+ local oops_before warn_before
+ local dmesg_rc
+ local pidfile
+ local result
+ local pid
+
+ init_namespaces
+
+ outfile=$(mktemp)
+
+ pidfile="$(create_pidfile)"
+ if ! vm_start "${pidfile}" "${ns1}"; then
+ log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns1})"
+ return "${KSFT_FAIL}"
+ fi
+
+ vm_wait_for_ssh "${ns1}"
+ oops_before=$(vm_dmesg_oops_count "${ns1}")
+ warn_before=$(vm_dmesg_warn_count "${ns1}")
+
+ vm_ssh "${ns1}" -- socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" &
+ vm_wait_for_listener "${ns1}" "${port}" "vsock"
+
+ echo TEST | ip netns exec "${ns0}" \
+ socat STDIN VSOCK-CONNECT:"${VSOCK_CID}":"${port}" 2>/dev/null
+
+ vm_dmesg_check "${pidfile}" "${ns1}" "${oops_before}" "${warn_before}"
+ dmesg_rc=$?
+
+ terminate_pidfiles "${pidfile}"
+
+ result=$(cat "${outfile}")
+ rm -f "${outfile}"
+
+ if [[ "${result}" != TEST ]] && [[ "${dmesg_rc}" -eq 0 ]]; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
+test_ns_diff_local_vm_connect_to_local_host_fails() {
+ local oops_before warn_before
+ local ns0="local0"
+ local ns1="local1"
+ local port=12345
+ local dmesg_rc
+ local pidfile
+ local result
+ local pid
+
+ init_namespaces
+
+ log_host "Launching socat in ns ${ns1}"
+ outfile=$(mktemp)
+ ip netns exec "${ns1}" socat VSOCK-LISTEN:"${port}" STDOUT &> "${outfile}" &
+ pid=$!
+ host_wait_for_listener "${ns1}" "${port}" "vsock"
+
+ pidfile="$(create_pidfile)"
+ if ! vm_start "${pidfile}" "${ns0}"; then
+ log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns0})"
+ rm -f "${outfile}"
+ return "${KSFT_FAIL}"
+ fi
+
+ vm_wait_for_ssh "${ns0}"
+ oops_before=$(vm_dmesg_oops_count "${ns0}")
+ warn_before=$(vm_dmesg_warn_count "${ns0}")
+
+ vm_ssh "${ns0}" -- \
+ bash -c "echo TEST | socat STDIN VSOCK-CONNECT:2:${port}" 2>&1 | log_guest
+
+ vm_dmesg_check "${pidfile}" "${ns0}" "${oops_before}" "${warn_before}"
+ dmesg_rc=$?
+
+ terminate_pidfiles "${pidfile}"
+ terminate_pids "${pid}"
+
+ result=$(cat "${outfile}")
+ rm -f "${outfile}"
+
+ if [[ "${result}" != TEST ]] && [[ "${dmesg_rc}" -eq 0 ]]; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
+__test_loopback_two_netns() {
+ local ns0=$1
+ local ns1=$2
+ local port=12345
+ local result
+ local pid
+
+ modprobe vsock_loopback &> /dev/null || :
+
+ log_host "Launching socat in ns ${ns1}"
+ outfile=$(mktemp)
+
+ ip netns exec "${ns1}" socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" 2>/dev/null &
+ pid=$!
+ host_wait_for_listener "${ns1}" "${port}" "vsock"
+
+ log_host "Launching socat in ns ${ns0}"
+ echo TEST | ip netns exec "${ns0}" socat STDIN VSOCK-CONNECT:1:"${port}" 2>/dev/null
+ terminate_pids "${pid}"
+
+ result=$(cat "${outfile}")
+ rm -f "${outfile}"
+
+ if [[ "${result}" == TEST ]]; then
+ return 0
+ fi
+
+ return 1
+}
+
+test_ns_diff_global_to_local_loopback_local_fails() {
+ init_namespaces
+
+ if ! __test_loopback_two_netns "global0" "local0"; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
+test_ns_diff_local_to_global_loopback_fails() {
+ init_namespaces
+
+ if ! __test_loopback_two_netns "local0" "global0"; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
+test_ns_diff_local_to_local_loopback_fails() {
+ init_namespaces
+
+ if ! __test_loopback_two_netns "local0" "local1"; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
+test_ns_diff_global_to_global_loopback_ok() {
+ init_namespaces
+
+ if __test_loopback_two_netns "global0" "global1"; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
+test_ns_same_local_loopback_ok() {
+ init_namespaces
+
+ if __test_loopback_two_netns "local0" "local0"; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
+test_ns_same_local_host_connect_to_local_vm_ok() {
+ local oops_before warn_before
+ local ns="local0"
+ local port=1234
+ local dmesg_rc
+ local pidfile
+ local rc
+
+ init_namespaces
+
+ pidfile="$(create_pidfile)"
+
+ if ! vm_start "${pidfile}" "${ns}"; then
+ return "${KSFT_FAIL}"
+ fi
+
+ vm_wait_for_ssh "${ns}"
+ oops_before=$(vm_dmesg_oops_count "${ns}")
+ warn_before=$(vm_dmesg_warn_count "${ns}")
+
+ vm_vsock_test "${ns}" "server" 2 "${TEST_GUEST_PORT}"
+
+ # Skip test 29 (transport release use-after-free): This test attempts
+ # binding both G2H and H2G CIDs. Because virtio-vsock (G2H) doesn't
+ # support local namespaces the test will fail when
+ # transport_g2h->stream_allow() returns false. This edge case only
+ # happens for vsock_test in client mode on the host in a local
+ # namespace. This is a false positive.
+ host_vsock_test "${ns}" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}" --skip=29
+ rc=$?
+
+ vm_dmesg_check "${pidfile}" "${ns}" "${oops_before}" "${warn_before}"
+ dmesg_rc=$?
+
+ terminate_pidfiles "${pidfile}"
+
+ if [[ "${rc}" -ne 0 ]] || [[ "${dmesg_rc}" -ne 0 ]]; then
+ return "${KSFT_FAIL}"
+ fi
+
+ return "${KSFT_PASS}"
+}
+
+test_ns_same_local_vm_connect_to_local_host_ok() {
+ local oops_before warn_before
+ local ns="local0"
+ local port=1234
+ local dmesg_rc
+ local pidfile
+ local rc
+
+ init_namespaces
+
+ pidfile="$(create_pidfile)"
+
+ if ! vm_start "${pidfile}" "${ns}"; then
+ return "${KSFT_FAIL}"
+ fi
+
+ vm_wait_for_ssh "${ns}"
+ oops_before=$(vm_dmesg_oops_count "${ns}")
+ warn_before=$(vm_dmesg_warn_count "${ns}")
+
+ host_vsock_test "${ns}" "server" "${VSOCK_CID}" "${port}"
+ vm_vsock_test "${ns}" "10.0.2.2" 2 "${port}"
+ rc=$?
+
+ vm_dmesg_check "${pidfile}" "${ns}" "${oops_before}" "${warn_before}"
+ dmesg_rc=$?
+
+ terminate_pidfiles "${pidfile}"
+
+ if [[ "${rc}" -ne 0 ]] || [[ "${dmesg_rc}" -ne 0 ]]; then
+ return "${KSFT_FAIL}"
+ fi
+
+ return "${KSFT_PASS}"
+}
+
namespaces_can_boot_same_cid() {
local ns0=$1
local ns1=$2
@@ -882,6 +1445,7 @@ fi
check_args "${ARGS[@]}"
check_deps
check_vng
+check_socat
handle_build
echo "1..${#ARGS[@]}"
--
2.47.3
* [PATCH RFC net-next v13 11/13] selftests/vsock: add namespace tests for CID collisions
From: Bobby Eshleman @ 2025-12-24 0:28 UTC
To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Eugenio Pérez, Xuan Zhuo, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, Shuah Khan, Long Li
Cc: linux-kernel, virtualization, netdev, kvm, linux-hyperv,
linux-kselftest, berrange, Sargun Dhillon, Bobby Eshleman,
Bobby Eshleman
In-Reply-To: <20251223-vsock-vmtest-v13-0-9d6db8e7c80b@meta.com>
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add tests to verify CID collision rules across different vsock namespace
modes.
1. Two VMs with the same CID cannot start in different global namespaces
(ns_global_same_cid_fails)
2. Two VMs with the same CID can start in different local namespaces
(ns_local_same_cid_ok)
3. VMs with the same CID can coexist when one is in a global namespace
and another is in a local namespace (ns_global_local_same_cid_ok and
ns_local_global_same_cid_ok)
The tests ns_global_local_same_cid_ok and ns_local_global_same_cid_ok
make sure that ordering does not matter.
The tests use a shared helper function namespaces_can_boot_same_cid()
that attempts to start two VMs with identical CIDs in the specified
namespaces and verifies whether VM initialization failed or succeeded.
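As a rough illustration of the rules listed above (not part of the patch), the expected outcome depends only on the vsock modes of the two namespaces. The helper name same_cid_expected_ok() is hypothetical and exists only for this sketch:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from vmtest.sh: models the expected CID
# collision outcome for two *different* namespaces given only their
# vsock NS modes. Returns 0 if two VMs with the same CID are expected
# to boot side by side, 1 if the second VM is expected to fail.
same_cid_expected_ok() {
    local mode0=$1
    local mode1=$2

    # Only two global namespaces share a single CID space.
    if [[ "${mode0}" == "global" ]] && [[ "${mode1}" == "global" ]]; then
        return 1
    fi

    return 0
}

for pair in "global global" "local local" "global local" "local global"; do
    read -r m0 m1 <<< "${pair}"
    if same_cid_expected_ok "${m0}" "${m1}"; then
        echo "${m0} + ${m1}: second VM expected to start"
    else
        echo "${m0} + ${m1}: second VM expected to fail"
    fi
done
```

The real tests do not compute this; they boot VMs through namespaces_can_boot_same_cid() and compare the result against the expectation encoded in each test name.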
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v11:
- check vm_start() rc in namespaces_can_boot_same_cid() (Stefano)
- fix ns_local_same_cid_ok() to use local0 and local1 instead of reusing
local0 twice. This check should pass, ensuring local namespaces do not
collide (Stefano)
---
tools/testing/selftests/vsock/vmtest.sh | 78 +++++++++++++++++++++++++++++++++
1 file changed, 78 insertions(+)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index 38785a102236..1bf537410ea6 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -50,6 +50,10 @@ readonly TEST_NAMES=(
vm_loopback
ns_host_vsock_ns_mode_ok
ns_host_vsock_child_ns_mode_ok
+ ns_global_same_cid_fails
+ ns_local_same_cid_ok
+ ns_global_local_same_cid_ok
+ ns_local_global_same_cid_ok
)
readonly TEST_DESCS=(
# vm_server_host_client
@@ -66,6 +70,18 @@ readonly TEST_DESCS=(
# ns_host_vsock_child_ns_mode_ok
"Check /proc/sys/net/vsock/ns_mode is read-only and child_ns_mode is writable."
+
+ # ns_global_same_cid_fails
+ "Check QEMU fails to start two VMs with same CID in two different global namespaces."
+
+ # ns_local_same_cid_ok
+ "Check QEMU successfully starts two VMs with same CID in two different local namespaces."
+
+ # ns_global_local_same_cid_ok
+ "Check QEMU successfully starts one VM in a global ns and then another VM in a local ns with the same CID."
+
+ # ns_local_global_same_cid_ok
+ "Check QEMU successfully starts one VM in a local ns and then another VM in a global ns with the same CID."
)
readonly USE_SHARED_VM=(
@@ -577,6 +593,68 @@ test_ns_host_vsock_ns_mode_ok() {
return "${KSFT_PASS}"
}
+namespaces_can_boot_same_cid() {
+ local ns0=$1
+ local ns1=$2
+ local pidfile1 pidfile2
+ local rc
+
+ pidfile1="$(create_pidfile)"
+
+ # The first VM should be able to start. If it can't then we have
+ # problems and need to return non-zero.
+ if ! vm_start "${pidfile1}" "${ns0}"; then
+ return 1
+ fi
+
+ pidfile2="$(create_pidfile)"
+ vm_start "${pidfile2}" "${ns1}"
+ rc=$?
+ terminate_pidfiles "${pidfile1}" "${pidfile2}"
+
+ return "${rc}"
+}
+
+test_ns_global_same_cid_fails() {
+ init_namespaces
+
+ if namespaces_can_boot_same_cid "global0" "global1"; then
+ return "${KSFT_FAIL}"
+ fi
+
+ return "${KSFT_PASS}"
+}
+
+test_ns_local_global_same_cid_ok() {
+ init_namespaces
+
+ if namespaces_can_boot_same_cid "local0" "global0"; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
+test_ns_global_local_same_cid_ok() {
+ init_namespaces
+
+ if namespaces_can_boot_same_cid "global0" "local0"; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
+test_ns_local_same_cid_ok() {
+ init_namespaces
+
+ if namespaces_can_boot_same_cid "local0" "local1"; then
+ return "${KSFT_PASS}"
+ fi
+
+ return "${KSFT_FAIL}"
+}
+
test_ns_host_vsock_child_ns_mode_ok() {
local orig_mode
local rc
--
2.47.3
* [PATCH RFC net-next v13 10/13] selftests/vsock: add tests for proc sys vsock ns_mode
From: Bobby Eshleman @ 2025-12-24 0:28 UTC
To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Eugenio Pérez, Xuan Zhuo, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, Shuah Khan, Long Li
Cc: linux-kernel, virtualization, netdev, kvm, linux-hyperv,
linux-kselftest, berrange, Sargun Dhillon, Bobby Eshleman,
Bobby Eshleman
In-Reply-To: <20251223-vsock-vmtest-v13-0-9d6db8e7c80b@meta.com>
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add tests for the /proc/sys/net/vsock/{ns_mode,child_ns_mode}
interfaces. Namely, that they accept/report "global" and "local" strings
and enforce their access policies.
Start a convention of commenting the test name over the test
description. Add test name comments over test descriptions that existed
before this convention.
Add a check_netns() function that checks if the test requires namespaces
and if the current kernel supports namespaces. Skip tests that require
namespaces if the system does not have namespace support.
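A minimal sketch of the name-based gating described above, assuming only the ns_ prefix convention. The helper needs_netns() is hypothetical; the patch's check_netns() additionally tests for /proc/self/ns:

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration: a test is treated as a
# namespace test purely by its ns_ name prefix.
needs_netns() {
    local tname=$1

    [[ "${tname}" =~ ^ns_ ]]
}

for t in vm_loopback ns_host_vsock_ns_mode_ok; do
    if needs_netns "${t}"; then
        echo "${t}: requires namespace support"
    else
        echo "${t}: no namespace setup needed"
    fi
done
```

Keying off the prefix keeps TEST_NAMES as the single source of truth, at the cost of making the naming convention mandatory.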
This patch is the first to add tests that do *not* re-use the same
shared VM. For that reason, it adds a run_ns_tests() function to run
these tests and filter out the shared VM tests.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v13:
- remove write-once test ns_host_vsock_ns_mode_write_once_ok to reflect
removing the write-once policy
- add child_ns_mode test test_ns_host_vsock_child_ns_mode_ok
- modify test_ns_host_vsock_ns_mode_ok() to check that the correct mode
was inherited from child_ns_mode
Changes in v12:
- remove ns_vm_local_mode_rejected test, due to dropping that constraint
Changes in v11:
- Document ns_ prefix above TEST_NAMES (Stefano)
Changes in v10:
- Remove extraneous add_namespaces/del_namespaces calls.
- Rename run_tests() to run_ns_tests() since it is designed to only
run ns tests.
Changes in v9:
- add test ns_vm_local_mode_rejected to check that guests cannot use
local mode
---
tools/testing/selftests/vsock/vmtest.sh | 140 +++++++++++++++++++++++++++++++-
1 file changed, 138 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index 0e681d4c3a15..38785a102236 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -41,14 +41,38 @@ readonly KERNEL_CMDLINE="\
virtme.ssh virtme_ssh_channel=tcp virtme_ssh_user=$USER \
"
readonly LOG=$(mktemp /tmp/vsock_vmtest_XXXX.log)
-readonly TEST_NAMES=(vm_server_host_client vm_client_host_server vm_loopback)
+
+# Namespace tests must use the ns_ prefix. This is checked in check_netns() and
+# is used to determine if a test needs namespace setup before test execution.
+readonly TEST_NAMES=(
+ vm_server_host_client
+ vm_client_host_server
+ vm_loopback
+ ns_host_vsock_ns_mode_ok
+ ns_host_vsock_child_ns_mode_ok
+)
readonly TEST_DESCS=(
+ # vm_server_host_client
"Run vsock_test in server mode on the VM and in client mode on the host."
+
+ # vm_client_host_server
"Run vsock_test in client mode on the VM and in server mode on the host."
+
+ # vm_loopback
"Run vsock_test using the loopback transport in the VM."
+
+ # ns_host_vsock_ns_mode_ok
+ "Check /proc/sys/net/vsock/ns_mode strings on the host."
+
+ # ns_host_vsock_child_ns_mode_ok
+ "Check /proc/sys/net/vsock/ns_mode is read-only and child_ns_mode is writable."
)
-readonly USE_SHARED_VM=(vm_server_host_client vm_client_host_server vm_loopback)
+readonly USE_SHARED_VM=(
+ vm_server_host_client
+ vm_client_host_server
+ vm_loopback
+)
readonly NS_MODES=("local" "global")
VERBOSE=0
@@ -196,6 +220,20 @@ check_deps() {
fi
}
+check_netns() {
+ local tname=$1
+
+ # If the test requires NS support, check if NS support exists
+ # using /proc/self/ns
+ if [[ "${tname}" =~ ^ns_ ]] &&
+ [[ ! -e /proc/self/ns ]]; then
+ log_host "No NS support detected for test ${tname}"
+ return 1
+ fi
+
+ return 0
+}
+
check_vng() {
local tested_versions
local version
@@ -519,6 +557,54 @@ log_guest() {
LOG_PREFIX=guest log "$@"
}
+ns_get_mode() {
+ local ns=$1
+
+ ip netns exec "${ns}" cat /proc/sys/net/vsock/ns_mode 2>/dev/null
+}
+
+test_ns_host_vsock_ns_mode_ok() {
+ for mode in "${NS_MODES[@]}"; do
+ local actual
+
+ actual=$(ns_get_mode "${mode}0")
+ if [[ "${actual}" != "${mode}" ]]; then
+ log_host "expected mode ${mode}, got ${actual}"
+ return "${KSFT_FAIL}"
+ fi
+ done
+
+ return "${KSFT_PASS}"
+}
+
+test_ns_host_vsock_child_ns_mode_ok() {
+ local orig_mode
+ local rc
+
+ orig_mode=$(cat /proc/sys/net/vsock/child_ns_mode)
+
+ rc="${KSFT_PASS}"
+ for mode in "${NS_MODES[@]}"; do
+ local ns="${mode}0"
+
+ if echo "${mode}" 2>/dev/null > /proc/sys/net/vsock/ns_mode; then
+ log_host "ns_mode should be read-only but write succeeded"
+ rc="${KSFT_FAIL}"
+ continue
+ fi
+
+ if ! echo "${mode}" > /proc/sys/net/vsock/child_ns_mode; then
+ log_host "child_ns_mode should be writable to ${mode}"
+ rc="${KSFT_FAIL}"
+ continue
+ fi
+ done
+
+ echo "${orig_mode}" > /proc/sys/net/vsock/child_ns_mode
+
+ return "${rc}"
+}
+
test_vm_server_host_client() {
if ! vm_vsock_test "init_ns" "server" 2 "${TEST_GUEST_PORT}"; then
return "${KSFT_FAIL}"
@@ -592,6 +678,11 @@ run_shared_vm_tests() {
continue
fi
+ if ! check_netns "${arg}"; then
+ check_result "${KSFT_SKIP}" "${arg}"
+ continue
+ fi
+
run_shared_vm_test "${arg}"
check_result "$?" "${arg}"
done
@@ -645,6 +736,49 @@ run_shared_vm_test() {
return "${rc}"
}
+run_ns_tests() {
+ for arg in "${ARGS[@]}"; do
+ if shared_vm_test "${arg}"; then
+ continue
+ fi
+
+ if ! check_netns "${arg}"; then
+ check_result "${KSFT_SKIP}" "${arg}"
+ continue
+ fi
+
+ add_namespaces
+
+ name=$(echo "${arg}" | awk '{ print $1 }')
+ log_host "Executing test_${name}"
+
+ host_oops_before=$(dmesg 2>/dev/null | grep -c -i 'Oops')
+ host_warn_before=$(dmesg --level=warn 2>/dev/null | grep -c -i 'vsock')
+ eval test_"${name}"
+ rc=$?
+
+ host_oops_after=$(dmesg 2>/dev/null | grep -c -i 'Oops')
+ if [[ "${host_oops_after}" -gt "${host_oops_before}" ]]; then
+ echo "FAIL: kernel oops detected on host" | log_host
+ check_result "${KSFT_FAIL}" "${name}"
+ del_namespaces
+ continue
+ fi
+
+ host_warn_after=$(dmesg --level=warn 2>/dev/null | grep -c -i 'vsock')
+ if [[ "${host_warn_after}" -gt "${host_warn_before}" ]]; then
+ echo "FAIL: kernel warning detected on host" | log_host
+ check_result "${KSFT_FAIL}" "${name}"
+ del_namespaces
+ continue
+ fi
+
+ check_result "${rc}" "${name}"
+
+ del_namespaces
+ done
+}
+
BUILD=0
QEMU="qemu-system-$(uname -m)"
@@ -690,6 +824,8 @@ if shared_vm_tests_requested "${ARGS[@]}"; then
terminate_pidfiles "${pidfile}"
fi
+run_ns_tests "${ARGS[@]}"
+
echo "SUMMARY: PASS=${cnt_pass} SKIP=${cnt_skip} FAIL=${cnt_fail}"
echo "Log: ${LOG}"
--
2.47.3
* [PATCH RFC net-next v13 09/13] selftests/vsock: use ss to wait for listeners instead of /proc/net
From: Bobby Eshleman @ 2025-12-24 0:28 UTC
To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Eugenio Pérez, Xuan Zhuo, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, Shuah Khan, Long Li
Cc: linux-kernel, virtualization, netdev, kvm, linux-hyperv,
linux-kselftest, berrange, Sargun Dhillon, Bobby Eshleman,
Bobby Eshleman
In-Reply-To: <20251223-vsock-vmtest-v13-0-9d6db8e7c80b@meta.com>
From: Bobby Eshleman <bobbyeshleman@meta.com>
Replace /proc/net parsing with ss(8) for detecting listening sockets in
wait_for_listener() functions and add support for TCP, VSOCK, and Unix
socket protocols.
The previous implementation parsed /proc/net/tcp using awk to detect
listening sockets, but this approach could not support vsock because
vsock does not export socket information to /proc/net/.
Instead, use ss so that we can detect listeners on tcp, vsock, and unix.
The protocol parameter is now required for all wait_for_listener family
functions (wait_for_listener, vm_wait_for_listener,
host_wait_for_listener) to explicitly specify which socket type to wait
for.
ss is added to the dependency check in check_deps().
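The matching step can be tried in isolation. The sketch below applies the same ":${port} " pattern to canned text instead of live ss output; the helper has_listener() and the sample line are made up for illustration and are not claimed to be exact ss output:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads listener text on stdin (in the real script
# this comes from `ss --listening ... --numeric`) and succeeds if some
# line contains ":<port> ".
has_listener() {
    local port=$1

    grep -q ":${port} "
}

# Made-up sample line, only for exercising the pattern.
sample='LISTEN 0 128 *:12345 *:*'

if printf '%s\n' "${sample}" | has_listener 12345; then
    echo "listener found on 12345"
fi

# The trailing space in the pattern keeps 1234 from matching 12345.
if ! printf '%s\n' "${sample}" | has_listener 1234; then
    echo "no listener on 1234"
fi
```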
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
tools/testing/selftests/vsock/vmtest.sh | 47 +++++++++++++++++++++------------
1 file changed, 30 insertions(+), 17 deletions(-)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index 4b5929ffc9eb..0e681d4c3a15 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -182,7 +182,7 @@ check_args() {
}
check_deps() {
- for dep in vng ${QEMU} busybox pkill ssh; do
+ for dep in vng ${QEMU} busybox pkill ssh ss; do
if [[ ! -x $(command -v "${dep}") ]]; then
echo -e "skip: dependency ${dep} not found!\n"
exit "${KSFT_SKIP}"
@@ -337,21 +337,32 @@ wait_for_listener()
local port=$1
local interval=$2
local max_intervals=$3
- local protocol=tcp
- local pattern
+ local protocol=$4
local i
- pattern=":$(printf "%04X" "${port}") "
-
- # for tcp protocol additionally check the socket state
- [ "${protocol}" = "tcp" ] && pattern="${pattern}0A"
-
for i in $(seq "${max_intervals}"); do
- if awk -v pattern="${pattern}" \
- 'BEGIN {rc=1} $2" "$4 ~ pattern {rc=0} END {exit rc}' \
- /proc/net/"${protocol}"*; then
+ case "${protocol}" in
+ tcp)
+ if ss --listening --tcp --numeric | grep -q ":${port} "; then
+ break
+ fi
+ ;;
+ vsock)
+ if ss --listening --vsock --numeric | grep -q ":${port} "; then
+ break
+ fi
+ ;;
+ unix)
+ # For unix sockets, port is actually the socket path
+ if ss --listening --unix | grep -q "${port}"; then
+ break
+ fi
+ ;;
+ *)
+ echo "Unknown protocol: ${protocol}" >&2
break
- fi
+ ;;
+ esac
sleep "${interval}"
done
}
@@ -359,23 +370,25 @@ wait_for_listener()
vm_wait_for_listener() {
local ns=$1
local port=$2
+ local protocol=$3
vm_ssh "${ns}" <<EOF
$(declare -f wait_for_listener)
-wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX}
+wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX} ${protocol}
EOF
}
host_wait_for_listener() {
local ns=$1
local port=$2
+ local protocol=$3
if [[ "${ns}" == "init_ns" ]]; then
- wait_for_listener "${port}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}"
+ wait_for_listener "${port}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}" "${protocol}"
else
ip netns exec "${ns}" bash <<-EOF
$(declare -f wait_for_listener)
- wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX}
+ wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX} ${protocol}
EOF
fi
}
@@ -422,7 +435,7 @@ vm_vsock_test() {
return $rc
fi
- vm_wait_for_listener "${ns}" "${port}"
+ vm_wait_for_listener "${ns}" "${port}" "tcp"
rc=$?
fi
set +o pipefail
@@ -463,7 +476,7 @@ host_vsock_test() {
return $rc
fi
- host_wait_for_listener "${ns}" "${port}"
+ host_wait_for_listener "${ns}" "${port}" "tcp"
rc=$?
fi
set +o pipefail
--
2.47.3
* [PATCH RFC net-next v13 08/13] selftests/vsock: add vm_dmesg_{warn,oops}_count() helpers
From: Bobby Eshleman @ 2025-12-24 0:28 UTC
To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Eugenio Pérez, Xuan Zhuo, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, Shuah Khan, Long Li
Cc: linux-kernel, virtualization, netdev, kvm, linux-hyperv,
linux-kselftest, berrange, Sargun Dhillon, Bobby Eshleman,
Bobby Eshleman
In-Reply-To: <20251223-vsock-vmtest-v13-0-9d6db8e7c80b@meta.com>
From: Bobby Eshleman <bobbyeshleman@meta.com>
These functions are reused by the VM tests to collect and compare dmesg
warnings and oops counts. The future VM-specific tests use them heavily.
This patch relies on vm_ssh() already supporting namespaces.
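For illustration, the before/after comparison these helpers enable can be sketched on canned log text. count_oops() below is a hypothetical stand-in that reads stdin rather than calling vm_ssh and dmesg:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for vm_dmesg_oops_count(): counts lines that
# mention "Oops" (case-insensitive) in text supplied on stdin.
count_oops() {
    grep -c -i 'Oops'
}

# grep -c prints 0 but exits non-zero when nothing matches, so guard
# the substitution in case the caller runs with `set -e`.
before=$(printf 'boot ok\n' | count_oops || true)
after=$(printf 'boot ok\nOops: 0000 [#1]\n' | count_oops || true)

if [[ "${after}" -gt "${before}" ]]; then
    echo "FAIL: new oops detected (before=${before}, after=${after})"
else
    echo "no new oops"
fi
```

Comparing counts rather than checking for any match lets a test pass on a VM whose log already contained unrelated entries before the test started.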
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v11:
- break these out into an earlier patch so that they can be used
directly in new patches (instead of causing churn by adding this
later)
---
tools/testing/selftests/vsock/vmtest.sh | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index 1d03acb62347..4b5929ffc9eb 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -380,6 +380,17 @@ host_wait_for_listener() {
fi
}
+vm_dmesg_oops_count() {
+ local ns=$1
+
+ vm_ssh "${ns}" -- dmesg 2>/dev/null | grep -c -i 'Oops'
+}
+
+vm_dmesg_warn_count() {
+ local ns=$1
+
+ vm_ssh "${ns}" -- dmesg --level=warn 2>/dev/null | grep -c -i 'vsock'
+}
vm_vsock_test() {
local ns=$1
@@ -587,8 +598,8 @@ run_shared_vm_test() {
host_oops_cnt_before=$(dmesg | grep -c -i 'Oops')
host_warn_cnt_before=$(dmesg --level=warn | grep -c -i 'vsock')
- vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -c -i 'Oops')
- vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | grep -c -i 'vsock')
+ vm_oops_cnt_before=$(vm_dmesg_oops_count "init_ns")
+ vm_warn_cnt_before=$(vm_dmesg_warn_count "init_ns")
name=$(echo "${1}" | awk '{ print $1 }')
eval test_"${name}"
@@ -606,13 +617,13 @@ run_shared_vm_test() {
rc=$KSFT_FAIL
fi
- vm_oops_cnt_after=$(vm_ssh -- dmesg | grep -i 'Oops' | wc -l)
+ vm_oops_cnt_after=$(vm_dmesg_oops_count "init_ns")
if [[ ${vm_oops_cnt_after} -gt ${vm_oops_cnt_before} ]]; then
echo "FAIL: kernel oops detected on vm" | log_host
rc=$KSFT_FAIL
fi
- vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | grep -c -i 'vsock')
+ vm_warn_cnt_after=$(vm_dmesg_warn_count "init_ns")
if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then
echo "FAIL: kernel warning detected on vm" | log_host
rc=$KSFT_FAIL
--
2.47.3
* [PATCH RFC net-next v13 07/13] selftests/vsock: prepare vm management helpers for namespaces
From: Bobby Eshleman @ 2025-12-24 0:28 UTC
To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Eugenio Pérez, Xuan Zhuo, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, Shuah Khan, Long Li
Cc: linux-kernel, virtualization, netdev, kvm, linux-hyperv,
linux-kselftest, berrange, Sargun Dhillon, Bobby Eshleman,
Bobby Eshleman
In-Reply-To: <20251223-vsock-vmtest-v13-0-9d6db8e7c80b@meta.com>
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add namespace support to vm management, ssh helpers, and vsock_test
wrapper functions. This enables running VMs and test helpers in specific
namespaces, which is required for upcoming namespace isolation tests.
The functions still work correctly within the init ns, though the caller
must now pass "init_ns" explicitly.
No functional changes for existing tests. All have been updated to pass
"init_ns" explicitly.
Affected functions (such as vm_start() and vm_ssh()) now wrap their
commands with 'ip netns exec' when executing commands in non-init
namespaces.
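The wrapping pattern can be shown as a dry run that only prints the command line it would execute. ns_exec_prefix() is a hypothetical helper for this sketch; the patch inlines the same check in each affected function:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the command prefix needed to run
# something inside namespace $1, or nothing for the init namespace.
ns_exec_prefix() {
    local ns=$1

    if [[ "${ns}" != "init_ns" ]]; then
        echo "ip netns exec ${ns}"
    fi
}

for ns in init_ns local0; do
    prefix=$(ns_exec_prefix "${ns}")
    # Dry run: show the command instead of executing it.
    echo "${ns}: ${prefix:+${prefix} }ssh -q localhost true"
done
```

Treating "init_ns" as a sentinel name, instead of an empty argument, is what forces every caller to state the namespace explicitly.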
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
tools/testing/selftests/vsock/vmtest.sh | 93 +++++++++++++++++++++++----------
1 file changed, 65 insertions(+), 28 deletions(-)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index c2bdc293b94c..1d03acb62347 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -135,7 +135,18 @@ del_namespaces() {
}
vm_ssh() {
- ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost "$@"
+ local ns_exec
+
+ if [[ "${1}" == init_ns ]]; then
+ ns_exec=""
+ else
+ ns_exec="ip netns exec ${1}"
+ fi
+
+ shift
+
+ ${ns_exec} ssh -q -o UserKnownHostsFile=/dev/null -p "${SSH_HOST_PORT}" localhost "$@"
+
return $?
}
@@ -258,10 +269,12 @@ terminate_pidfiles() {
vm_start() {
local pidfile=$1
+ local ns=$2
local logfile=/dev/null
local verbose_opt=""
local kernel_opt=""
local qemu_opts=""
+ local ns_exec=""
local qemu
qemu=$(command -v "${QEMU}")
@@ -282,7 +295,11 @@ vm_start() {
kernel_opt="${KERNEL_CHECKOUT}"
fi
- vng \
+ if [[ "${ns}" != "init_ns" ]]; then
+ ns_exec="ip netns exec ${ns}"
+ fi
+
+ ${ns_exec} vng \
--run \
${kernel_opt} \
${verbose_opt} \
@@ -297,6 +314,7 @@ vm_start() {
}
vm_wait_for_ssh() {
+ local ns=$1
local i
i=0
@@ -304,7 +322,8 @@ vm_wait_for_ssh() {
if [[ ${i} -gt ${WAIT_PERIOD_MAX} ]]; then
die "Timed out waiting for guest ssh"
fi
- if vm_ssh -- true; then
+
+ if vm_ssh "${ns}" -- true; then
break
fi
i=$(( i + 1 ))
@@ -338,30 +357,41 @@ wait_for_listener()
}
vm_wait_for_listener() {
- local port=$1
+ local ns=$1
+ local port=$2
- vm_ssh <<EOF
+ vm_ssh "${ns}" <<EOF
$(declare -f wait_for_listener)
wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX}
EOF
}
host_wait_for_listener() {
- local port=$1
+ local ns=$1
+ local port=$2
- wait_for_listener "${port}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}"
+ if [[ "${ns}" == "init_ns" ]]; then
+ wait_for_listener "${port}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}"
+ else
+ ip netns exec "${ns}" bash <<-EOF
+ $(declare -f wait_for_listener)
+ wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX}
+ EOF
+ fi
}
+
vm_vsock_test() {
- local host=$1
- local cid=$2
- local port=$3
+ local ns=$1
+ local host=$2
+ local cid=$3
+ local port=$4
local rc
# log output and use pipefail to respect vsock_test errors
set -o pipefail
if [[ "${host}" != server ]]; then
- vm_ssh -- "${VSOCK_TEST}" \
+ vm_ssh "${ns}" -- "${VSOCK_TEST}" \
--mode=client \
--control-host="${host}" \
--peer-cid="${cid}" \
@@ -369,7 +399,7 @@ vm_vsock_test() {
2>&1 | log_guest
rc=$?
else
- vm_ssh -- "${VSOCK_TEST}" \
+ vm_ssh "${ns}" -- "${VSOCK_TEST}" \
--mode=server \
--peer-cid="${cid}" \
--control-port="${port}" \
@@ -381,7 +411,7 @@ vm_vsock_test() {
return $rc
fi
- vm_wait_for_listener "${port}"
+ vm_wait_for_listener "${ns}" "${port}"
rc=$?
fi
set +o pipefail
@@ -390,22 +420,28 @@ vm_vsock_test() {
}
host_vsock_test() {
- local host=$1
- local cid=$2
- local port=$3
+ local ns=$1
+ local host=$2
+ local cid=$3
+ local port=$4
local rc
+ local cmd="${VSOCK_TEST}"
+ if [[ "${ns}" != "init_ns" ]]; then
+ cmd="ip netns exec ${ns} ${cmd}"
+ fi
+
# log output and use pipefail to respect vsock_test errors
set -o pipefail
if [[ "${host}" != server ]]; then
- ${VSOCK_TEST} \
+ ${cmd} \
--mode=client \
--peer-cid="${cid}" \
--control-host="${host}" \
--control-port="${port}" 2>&1 | log_host
rc=$?
else
- ${VSOCK_TEST} \
+ ${cmd} \
--mode=server \
--peer-cid="${cid}" \
--control-port="${port}" 2>&1 | log_host &
@@ -416,7 +452,7 @@ host_vsock_test() {
return $rc
fi
- host_wait_for_listener "${port}"
+ host_wait_for_listener "${ns}" "${port}"
rc=$?
fi
set +o pipefail
@@ -460,11 +496,11 @@ log_guest() {
}
test_vm_server_host_client() {
- if ! vm_vsock_test "server" 2 "${TEST_GUEST_PORT}"; then
+ if ! vm_vsock_test "init_ns" "server" 2 "${TEST_GUEST_PORT}"; then
return "${KSFT_FAIL}"
fi
- if ! host_vsock_test "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"; then
+ if ! host_vsock_test "init_ns" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"; then
return "${KSFT_FAIL}"
fi
@@ -472,11 +508,11 @@ test_vm_server_host_client() {
}
test_vm_client_host_server() {
- if ! host_vsock_test "server" "${VSOCK_CID}" "${TEST_HOST_PORT_LISTENER}"; then
+ if ! host_vsock_test "init_ns" "server" "${VSOCK_CID}" "${TEST_HOST_PORT_LISTENER}"; then
return "${KSFT_FAIL}"
fi
- if ! vm_vsock_test "10.0.2.2" 2 "${TEST_HOST_PORT_LISTENER}"; then
+ if ! vm_vsock_test "init_ns" "10.0.2.2" 2 "${TEST_HOST_PORT_LISTENER}"; then
return "${KSFT_FAIL}"
fi
@@ -486,13 +522,14 @@ test_vm_client_host_server() {
test_vm_loopback() {
local port=60000 # non-forwarded local port
- vm_ssh -- modprobe vsock_loopback &> /dev/null || :
+ vm_ssh "init_ns" -- modprobe vsock_loopback &> /dev/null || :
- if ! vm_vsock_test "server" 1 "${port}"; then
+ if ! vm_vsock_test "init_ns" "server" 1 "${port}"; then
return "${KSFT_FAIL}"
fi
- if ! vm_vsock_test "127.0.0.1" 1 "${port}"; then
+
+ if ! vm_vsock_test "init_ns" "127.0.0.1" 1 "${port}"; then
return "${KSFT_FAIL}"
fi
@@ -621,8 +658,8 @@ cnt_total=0
if shared_vm_tests_requested "${ARGS[@]}"; then
log_host "Booting up VM"
pidfile="$(create_pidfile)"
- vm_start "${pidfile}"
- vm_wait_for_ssh
+ vm_start "${pidfile}" "init_ns"
+ vm_wait_for_ssh "init_ns"
log_host "VM booted up"
run_shared_vm_tests "${ARGS[@]}"
--
2.47.3
* [PATCH RFC net-next v13 06/13] selftests/vsock: add namespace helpers to vmtest.sh
From: Bobby Eshleman @ 2025-12-24 0:28 UTC
To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Eugenio Pérez, Xuan Zhuo, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, Shuah Khan, Long Li
Cc: linux-kernel, virtualization, netdev, kvm, linux-hyperv,
linux-kselftest, berrange, Sargun Dhillon, Bobby Eshleman,
Bobby Eshleman
In-Reply-To: <20251223-vsock-vmtest-v13-0-9d6db8e7c80b@meta.com>
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add functions for initializing namespaces with the different vsock NS
modes. Callers can use add_namespaces() and del_namespaces() to create
namespaces global0, global1, local0, and local1.
The add_namespaces() function initializes each namespace (global0,
global1, local0, local1) with its respective vsock NS mode by setting
child_ns_mode before creating the namespace.
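The save, toggle, and restore sequence can be sketched without root privileges by pointing it at a temporary file. MODE_FILE and plan_namespaces() are assumptions for this sketch; the real function writes /proc/sys/net/vsock/child_ns_mode and calls ip netns add:

```shell
#!/usr/bin/env bash
# Stand-in for /proc/sys/net/vsock/child_ns_mode so the sketch runs
# unprivileged and on kernels without this series applied.
MODE_FILE=$(mktemp)
echo "global" > "${MODE_FILE}"

# Hypothetical dry-run version of add_namespaces().
plan_namespaces() {
    local orig_mode mode

    # Remember the current setting so it can be restored afterwards.
    orig_mode=$(cat "${MODE_FILE}")

    for mode in local global; do
        # New namespaces take their mode from child_ns_mode at creation.
        echo "${mode}" > "${MODE_FILE}"
        # The real script would run: ip netns add "${mode}0" (and "${mode}1")
        echo "would create ${mode}0 and ${mode}1 with mode $(cat "${MODE_FILE}")"
    done

    echo "${orig_mode}" > "${MODE_FILE}"
}

plan_namespaces
echo "restored mode: $(cat "${MODE_FILE}")"
```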
Remove namespaces upon exiting the program in cleanup(). This is
unlikely to be needed for a healthy run, but it is useful for tests that
are manually killed mid-test.
This patch is in preparation for later namespace tests.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v13:
- initialize namespaces to use the child_ns_mode mechanism
- remove setting modes from init_namespaces() function (this function
only sets up the lo device now)
- remove ns_set_mode(ns) because ns_mode is no longer mutable
---
tools/testing/selftests/vsock/vmtest.sh | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index c7b270dd77a9..c2bdc293b94c 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -49,6 +49,7 @@ readonly TEST_DESCS=(
)
readonly USE_SHARED_VM=(vm_server_host_client vm_client_host_server vm_loopback)
+readonly NS_MODES=("local" "global")
VERBOSE=0
@@ -103,6 +104,36 @@ check_result() {
fi
}
+add_namespaces() {
+ local orig_mode
+ orig_mode=$(cat /proc/sys/net/vsock/child_ns_mode)
+
+ for mode in "${NS_MODES[@]}"; do
+ echo "${mode}" > /proc/sys/net/vsock/child_ns_mode
+ ip netns add "${mode}0" 2>/dev/null
+ ip netns add "${mode}1" 2>/dev/null
+ done
+
+ echo "${orig_mode}" > /proc/sys/net/vsock/child_ns_mode
+}
+
+init_namespaces() {
+ for mode in "${NS_MODES[@]}"; do
+ # we need lo for qemu port forwarding
+ ip netns exec "${mode}0" ip link set dev lo up
+ ip netns exec "${mode}1" ip link set dev lo up
+ done
+}
+
+del_namespaces() {
+ for mode in "${NS_MODES[@]}"; do
+ ip netns del "${mode}0" &>/dev/null
+ ip netns del "${mode}1" &>/dev/null
+ log_host "removed ns ${mode}0"
+ log_host "removed ns ${mode}1"
+ done
+}
+
vm_ssh() {
ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost "$@"
return $?
@@ -110,6 +141,7 @@ vm_ssh() {
cleanup() {
terminate_pidfiles "${!PIDFILES[@]}"
+ del_namespaces
}
check_args() {
--
2.47.3
^ permalink raw reply related
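Editor's note: the save/toggle/restore pattern that add_namespaces() applies to child_ns_mode can be tried without root. The following is a minimal sketch, not the patch's code. The function name with_each_mode, the mode_file argument, and the create_fn callback are hypothetical stand-ins for /proc/sys/net/vsock/child_ns_mode and `ip netns add`.

```shell
# Sketch only: with_each_mode, mode_file and create_fn are hypothetical
# names. The patch writes /proc/sys/net/vsock/child_ns_mode and calls
# `ip netns add` instead.
with_each_mode() {
	local mode_file="$1"
	local create_fn="$2"
	local orig_mode
	local mode

	# Remember the current mode so it can be restored afterwards.
	orig_mode=$(cat "${mode_file}") || return 1

	for mode in local global; do
		# Objects created after this write inherit the new mode.
		echo "${mode}" > "${mode_file}" || break
		"${create_fn}" "${mode}0"
		"${create_fn}" "${mode}1"
	done

	# Restore the original mode, even if a write above failed.
	echo "${orig_mode}" > "${mode_file}"
}
```

Pointing mode_file at a temporary file and passing a callback that records the file's contents shows the ordering: local0 and local1 are created while the file holds "local", global0 and global1 while it holds "global", and the file ends with its original value.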
* [PATCH RFC net-next v13 05/13] selftests/vsock: increase timeout to 1200
From: Bobby Eshleman @ 2025-12-24 0:28 UTC (permalink / raw)
To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Eugenio Pérez, Xuan Zhuo, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, Shuah Khan, Long Li
Cc: linux-kernel, virtualization, netdev, kvm, linux-hyperv,
linux-kselftest, berrange, Sargun Dhillon, Bobby Eshleman,
Bobby Eshleman
In-Reply-To: <20251223-vsock-vmtest-v13-0-9d6db8e7c80b@meta.com>
From: Bobby Eshleman <bobbyeshleman@meta.com>
Increase the timeout from 300s to 1200s. On a modern bare metal server
my last run showed the new set of tests taking ~400s. Multiply by an
(arbitrary) factor of three to account for slower/nested runners.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
tools/testing/selftests/vsock/settings | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/vsock/settings b/tools/testing/selftests/vsock/settings
index 694d70710ff0..79b65bdf05db 100644
--- a/tools/testing/selftests/vsock/settings
+++ b/tools/testing/selftests/vsock/settings
@@ -1 +1 @@
-timeout=300
+timeout=1200
--
2.47.3
^ permalink raw reply related
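Editor's note: the arithmetic behind the new value is the measured runtime (~400s) times the chosen factor of three. A minimal sketch of that calculation follows. The helper name suggest_timeout is hypothetical and is not part of the selftests.

```shell
# Sketch only: suggest_timeout is a hypothetical helper, not part of
# tools/testing/selftests. It multiplies a measured runtime (seconds) by a
# safety factor (default 3) and prints a line in the settings-file format.
suggest_timeout() {
	local measured_s="$1"
	local factor="${2:-3}"

	echo "timeout=$((measured_s * factor))"
}

suggest_timeout 400 3   # prints timeout=1200
```

The factor is a judgment call, as the commit message says. A slower or nested runner may need a larger one.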
* [PATCH RFC net-next v13 03/13] virtio: set skb owner of virtio_transport_reset_no_sock() reply
From: Bobby Eshleman @ 2025-12-24 0:28 UTC (permalink / raw)
To: Stefano Garzarella, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, Michael S. Tsirkin,
Jason Wang, Eugenio Pérez, Xuan Zhuo, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
Broadcom internal kernel review list, Shuah Khan, Long Li
Cc: linux-kernel, virtualization, netdev, kvm, linux-hyperv,
linux-kselftest, berrange, Sargun Dhillon, Bobby Eshleman,
Bobby Eshleman
In-Reply-To: <20251223-vsock-vmtest-v13-0-9d6db8e7c80b@meta.com>
From: Bobby Eshleman <bobbyeshleman@meta.com>
Associate reply packets with the sending socket. When vsock must reply
with an RST packet and there exists a sending socket (e.g., for
loopback), setting the skb owner to the socket correctly handles
reference counting between the skb and sk (i.e., the sk stays alive
until the skb is freed).
This allows the net namespace to be used for socket lookups for the
duration of the reply skb's lifetime, preventing race conditions between
the namespace lifecycle and vsock socket search using the namespace
pointer.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v11:
- move before adding to netns support (Stefano)
Changes in v10:
- break this out into its own patch for easy revert (Stefano)
---
net/vmw_vsock/virtio_transport_common.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index fdb8f5b3fa60..718be9f33274 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1165,6 +1165,12 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
.op = VIRTIO_VSOCK_OP_RST,
.type = le16_to_cpu(hdr->type),
.reply = true,
+
+ /* Set sk owner to socket we are replying to (may be NULL for
+ * non-loopback). This keeps a reference to the sock and
+ * sock_net(sk) until the reply skb is freed.
+ */
+ .vsk = vsock_sk(skb->sk),
};
struct sk_buff *reply;
--
2.47.3
^ permalink raw reply related