* Re: [PATCH 03/15] mshyperv: Introduce numa_node_to_proximity_domain_info
From: Wei Liu @ 2023-08-02 23:47 UTC (permalink / raw)
To: Nuno Das Neves
Cc: linux-hyperv, linux-kernel, x86, linux-arm-kernel, linux-arch,
mikelley, kys, wei.liu, haiyangz, decui, ssengar, mukeshrathor,
stanislav.kinsburskiy, jinankjain, apais, Tianyu.Lan, vkuznets,
tglx, mingo, bp, dave.hansen, hpa, will, catalin.marinas, rafael,
lenb
In-Reply-To: <1690487690-2428-4-git-send-email-nunodasneves@linux.microsoft.com>
On Thu, Jul 27, 2023 at 12:54:38PM -0700, Nuno Das Neves wrote:
> Factor out logic for converting numa node to proximity domain info into
> a helper function, and export it.
>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
> ---
> arch/x86/hyperv/hv_proc.c | 8 ++------
> drivers/acpi/numa/srat.c | 1 +
> include/asm-generic/mshyperv.h | 18 ++++++++++++++++++
> 3 files changed, 21 insertions(+), 6 deletions(-)
>
[...]
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index 1f4fc5f8a819..0cf9f0574495 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -48,6 +48,7 @@ int node_to_pxm(int node)
> return PXM_INVAL;
> return node_to_pxm_map[node];
> }
> +EXPORT_SYMBOL(node_to_pxm);
Rafael and Len, I would like to get an ACK from you on this one line
change. I see a lot of other functions in that file are already
exported, so I hope this is okay, too.
It's user is the function below numa_node_to_proximity_domain_info.
Thanks,
Wei.
>
> static void __acpi_map_pxm_to_node(int pxm, int node)
> {
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index 233c976344e5..447e7ebe67ee 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -21,6 +21,7 @@
> #include <linux/types.h>
> #include <linux/atomic.h>
> #include <linux/bitops.h>
> +#include <acpi/acpi_numa.h>
> #include <linux/cpumask.h>
> #include <linux/nmi.h>
> #include <asm/ptrace.h>
> @@ -28,6 +29,23 @@
>
> #define VTPM_BASE_ADDRESS 0xfed40000
>
> +static inline union hv_proximity_domain_info
> +numa_node_to_proximity_domain_info(int node)
> +{
> + union hv_proximity_domain_info proximity_domain_info;
> +
> + if (node != NUMA_NO_NODE) {
> + proximity_domain_info.domain_id = node_to_pxm(node);
> + proximity_domain_info.flags.reserved = 0;
> + proximity_domain_info.flags.proximity_info_valid = 1;
> + proximity_domain_info.flags.proximity_preferred = 1;
> + } else {
> + proximity_domain_info.as_uint64 = 0;
> + }
> +
> + return proximity_domain_info;
> +}
> +
> struct ms_hyperv_info {
> u32 features;
> u32 priv_high;
> --
> 2.25.1
>
^ permalink raw reply
* Re: [PATCH 04/15] asm-generic/mshyperv: Introduce hv_recommend_using_aeoi()
From: Wei Liu @ 2023-08-02 23:48 UTC (permalink / raw)
To: Nuno Das Neves
Cc: linux-hyperv, linux-kernel, x86, linux-arm-kernel, linux-arch,
mikelley, kys, wei.liu, haiyangz, decui, ssengar, mukeshrathor,
stanislav.kinsburskiy, jinankjain, apais, Tianyu.Lan, vkuznets,
tglx, mingo, bp, dave.hansen, hpa, will, catalin.marinas
In-Reply-To: <1690487690-2428-5-git-send-email-nunodasneves@linux.microsoft.com>
On Thu, Jul 27, 2023 at 12:54:39PM -0700, Nuno Das Neves wrote:
> Factor out logic for determining if we should set the auto eoi flag in SINT
> register.
>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
^ permalink raw reply
* Re: [PATCH 05/15] hyperv: Move hv_connection_id to hyperv-tlfs
From: Wei Liu @ 2023-08-02 23:55 UTC (permalink / raw)
To: Nuno Das Neves
Cc: linux-hyperv, linux-kernel, x86, linux-arm-kernel, linux-arch,
mikelley, kys, wei.liu, haiyangz, decui, ssengar, mukeshrathor,
stanislav.kinsburskiy, jinankjain, apais, Tianyu.Lan, vkuznets,
tglx, mingo, bp, dave.hansen, hpa, will, catalin.marinas
In-Reply-To: <1690487690-2428-6-git-send-email-nunodasneves@linux.microsoft.com>
On Thu, Jul 27, 2023 at 12:54:40PM -0700, Nuno Das Neves wrote:
> This structure should be in hyperv-tlfs.h anyway, since it is part of
> the TLFS document.
Missing blank line here.
> The definition conflicts with one added in hvgdk.h as part of the mshv
> driver so must be moved to hyperv-tlfs.h.
>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> ---
> include/asm-generic/hyperv-tlfs.h | 9 +++++++++
> include/linux/hyperv.h | 9 ---------
> 2 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index 373f26efa18a..8fc5e5a9d7cb 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -845,4 +845,13 @@ struct hv_mmio_write_input {
> u8 data[HV_HYPERCALL_MMIO_MAX_DATA_LENGTH];
> } __packed;
>
> +/* Define connection identifier type. */
> +union hv_connection_id {
> + u32 asu32;
> + struct {
> + u32 id:24;
> + u32 reserved:8;
> + } u;
> +};
> +
Missing __packed here, but since this is already aligned it probably
doesn't matter much.
> #endif
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index bfbc37ce223b..f90de5abcd50 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -748,15 +748,6 @@ struct vmbus_close_msg {
> struct vmbus_channel_close_channel msg;
> };
>
> -/* Define connection identifier type. */
> -union hv_connection_id {
> - u32 asu32;
> - struct {
> - u32 id:24;
> - u32 reserved:8;
> - } u;
> -};
> -
> enum vmbus_device_type {
> HV_IDE = 0,
> HV_SCSI,
> --
> 2.25.1
>
^ permalink raw reply
* Re: [PATCH 07/15] Drivers: hv: Move hv_call_deposit_pages and hv_call_create_vp to common code
From: Wei Liu @ 2023-08-03 0:02 UTC (permalink / raw)
To: Nuno Das Neves
Cc: linux-hyperv, linux-kernel, x86, linux-arm-kernel, linux-arch,
mikelley, kys, wei.liu, haiyangz, decui, ssengar, mukeshrathor,
stanislav.kinsburskiy, jinankjain, apais, Tianyu.Lan, vkuznets,
tglx, mingo, bp, dave.hansen, hpa, will, catalin.marinas
In-Reply-To: <1690487690-2428-8-git-send-email-nunodasneves@linux.microsoft.com>
On Thu, Jul 27, 2023 at 12:54:42PM -0700, Nuno Das Neves wrote:
> These hypercalls are not arch-specific.
...
> Move them to common code.
>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Pure code movement so:
Acked-by: Wei Liu <wei.liu@kernel.org>
^ permalink raw reply
* Re: [PATCH 08/15] Drivers: hv: Introduce per-cpu event ring tail
From: Wei Liu @ 2023-08-03 0:10 UTC (permalink / raw)
To: Nuno Das Neves
Cc: linux-hyperv, linux-kernel, x86, linux-arm-kernel, linux-arch,
mikelley, kys, wei.liu, haiyangz, decui, ssengar, mukeshrathor,
stanislav.kinsburskiy, jinankjain, apais, Tianyu.Lan, vkuznets,
tglx, mingo, bp, dave.hansen, hpa, will, catalin.marinas
In-Reply-To: <1690487690-2428-9-git-send-email-nunodasneves@linux.microsoft.com>
On Thu, Jul 27, 2023 at 12:54:43PM -0700, Nuno Das Neves wrote:
> Add a pointer hv_synic_eventring_tail to track the tail pointer for the
> SynIC event ring buffer for each SINT.
> This will be used by the mshv driver, but must be tracked independently
> since the driver module could be removed and re-inserted.
>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
I understand the general idea, but see below.
> ---
> drivers/hv/hv_common.c | 25 +++++++++++++++++++++++++
> include/asm-generic/mshyperv.h | 2 ++
> 2 files changed, 27 insertions(+)
>
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index 9f9c3dc89bb2..99d9b262b8a7 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -61,6 +61,16 @@ static void hv_kmsg_dump_unregister(void);
>
> static struct ctl_table_header *hv_ctl_table_hdr;
>
> +/*
> + * Per-cpu array holding the tail pointer for the SynIC event ring buffer
> + * for each SINT.
> + *
> + * We cannot maintain this in mshv driver because the tail pointer should
> + * persist even if the mshv driver is unloaded.
> + */
> +u8 __percpu **hv_synic_eventring_tail;
> +EXPORT_SYMBOL_GPL(hv_synic_eventring_tail);
> +
> /*
> * Hyper-V specific initialization and shutdown code that is
> * common across all architectures. Called from architecture
> @@ -332,6 +342,8 @@ int __init hv_common_init(void)
> if (hv_root_partition) {
> hyperv_pcpu_output_arg = alloc_percpu(void *);
> BUG_ON(!hyperv_pcpu_output_arg);
> + hv_synic_eventring_tail = alloc_percpu(u8 *);
> + BUG_ON(hv_synic_eventring_tail == NULL);
> }
>
> hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
> @@ -356,6 +368,7 @@ int __init hv_common_init(void)
> int hv_common_cpu_init(unsigned int cpu)
> {
> void **inputarg, **outputarg;
> + u8 **synic_eventring_tail;
> u64 msr_vp_index;
> gfp_t flags;
> int pgcount = hv_root_partition ? 2 : 1;
> @@ -371,6 +384,14 @@ int hv_common_cpu_init(unsigned int cpu)
> if (hv_root_partition) {
> outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> *outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
> + synic_eventring_tail = (u8 **)this_cpu_ptr(hv_synic_eventring_tail);
> + *synic_eventring_tail = kcalloc(HV_SYNIC_SINT_COUNT, sizeof(u8),
> + flags);
> +
> + if (unlikely(!*synic_eventring_tail)) {
> + kfree(*inputarg);
> + return -ENOMEM;
> + }
> }
>
> msr_vp_index = hv_get_register(HV_MSR_VP_INDEX);
> @@ -387,6 +408,7 @@ int hv_common_cpu_die(unsigned int cpu)
> {
> unsigned long flags;
> void **inputarg, **outputarg;
> + u8 **synic_eventring_tail;
> void *mem;
>
> local_irq_save(flags);
> @@ -398,6 +420,9 @@ int hv_common_cpu_die(unsigned int cpu)
> if (hv_root_partition) {
> outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> *outputarg = NULL;
> + synic_eventring_tail = (u8 **)this_cpu_ptr(hv_synic_eventring_tail);
> + kfree(*synic_eventring_tail);
> + *synic_eventring_tail = NULL;
The upstream code looks different now. See 9636be85cc.
> }
>
> local_irq_restore(flags);
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index 0c94d20b4d44..9118d678b27a 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -73,6 +73,8 @@ extern bool hv_nested;
> extern void * __percpu *hyperv_pcpu_input_arg;
> extern void * __percpu *hyperv_pcpu_output_arg;
>
> +extern u8 __percpu **hv_synic_eventring_tail;
> +
> extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
> extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
> extern bool hv_isolation_type_snp(void);
> --
> 2.25.1
>
^ permalink raw reply
* RE: [PATCH v2 net] net: hv_netvsc: fix netvsc_send_completion to avoid multiple message length checks
From: Michael Kelley (LINUX) @ 2023-08-03 0:14 UTC (permalink / raw)
To: Sonia Sharma, linux-kernel@vger.kernel.org
Cc: linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
Sonia Sharma, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org,
Dexuan Cui, Long Li, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com
In-Reply-To: <1691013161-14054-1-git-send-email-sosha@linux.microsoft.com>
From: Sonia Sharma <sosha@linux.microsoft.com> Sent: Wednesday, August 2, 2023 2:53 PM
>
> The switch statement in netvsc_send_completion() is incorrectly validating
> the length of incoming network packets by falling through to the next case.
> Avoid the fallthrough. Instead break after a case match and then process
> the complete() call.
>
> Signed-off-by: Sonia Sharma <sonia.sharma@linux.microsoft.com>
> ---
> drivers/net/hyperv/netvsc.c | 17 +++++++++--------
> 1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index 82e9796c8f5e..347688dd2eb9 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -851,7 +851,7 @@ static void netvsc_send_completion(struct net_device *ndev,
> msglen);
> return;
> }
> - fallthrough;
> + break;
>
> case NVSP_MSG1_TYPE_SEND_RECV_BUF_COMPLETE:
> if (msglen < sizeof(struct nvsp_message_header) +
> @@ -860,7 +860,7 @@ static void netvsc_send_completion(struct net_device *ndev,
> msglen);
> return;
> }
> - fallthrough;
> + break;
>
> case NVSP_MSG1_TYPE_SEND_SEND_BUF_COMPLETE:
> if (msglen < sizeof(struct nvsp_message_header) +
> @@ -869,7 +869,7 @@ static void netvsc_send_completion(struct net_device *ndev,
> msglen);
> return;
> }
> - fallthrough;
> + break;
>
> case NVSP_MSG5_TYPE_SUBCHANNEL:
> if (msglen < sizeof(struct nvsp_message_header) +
> @@ -878,10 +878,6 @@ static void netvsc_send_completion(struct net_device *ndev,
> msglen);
> return;
> }
> - /* Copy the response back */
> - memcpy(&net_device->channel_init_pkt, nvsp_packet,
> - sizeof(struct nvsp_message));
> - complete(&net_device->channel_init_wait);
> break;
>
> case NVSP_MSG1_TYPE_SEND_RNDIS_PKT_COMPLETE:
> @@ -904,13 +900,18 @@ static void netvsc_send_completion(struct net_device *ndev,
>
> netvsc_send_tx_complete(ndev, net_device, incoming_channel,
> desc, budget);
> - break;
> + return;
>
> default:
> netdev_err(ndev,
> "Unknown send completion type %d received!!\n",
> nvsp_packet->hdr.msg_type);
Per my previous comment, still need a "return" statement here
so the default error case does not do the memcpy() and complete().
Michael
> }
> +
> + /* Copy the response back */
> + memcpy(&net_device->channel_init_pkt, nvsp_packet,
> + sizeof(struct nvsp_message));
> + complete(&net_device->channel_init_wait);
> }
>
> static u32 netvsc_get_next_send_section(struct netvsc_device *net_device)
> --
> 2.25.1
^ permalink raw reply
* Re: [PATCH 09/15] Drivers: hv: Introduce hv_output_arg_exists in hv_common
From: Wei Liu @ 2023-08-03 0:16 UTC (permalink / raw)
To: Nuno Das Neves
Cc: linux-hyperv, linux-kernel, x86, linux-arm-kernel, linux-arch,
mikelley, kys, wei.liu, haiyangz, decui, ssengar, mukeshrathor,
stanislav.kinsburskiy, jinankjain, apais, Tianyu.Lan, vkuznets,
tglx, mingo, bp, dave.hansen, hpa, will, catalin.marinas
In-Reply-To: <1690487690-2428-10-git-send-email-nunodasneves@linux.microsoft.com>
On Thu, Jul 27, 2023 at 12:54:44PM -0700, Nuno Das Neves wrote:
> This is a more flexible approach for determining whether to allocate the
> output page.
> This will be used in both mshv_vtl and root partition.
>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> ---
> drivers/hv/hv_common.c | 30 +++++++++++++++++++++++++-----
> 1 file changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index 99d9b262b8a7..16f069beda78 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -57,6 +57,18 @@ EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
> void * __percpu *hyperv_pcpu_output_arg;
> EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
>
> +/*
> + * Determine whether output arg is in use, for allocation/deallocation
> + */
> +static bool hv_output_arg_exists(void)
> +{
> + bool ret = hv_root_partition ? true : false;
> +#ifdef CONFIG_MSHV_VTL
> + ret = true;
> +#endif
This should not be here. As far as I can tell, CONFIG_MSHV_VTL is
introduced in a later patch.
The rest looks okay.
Thanks,
Wei.
^ permalink raw reply
* Re: [PATCH 10/15] x86: hyperv: Add mshv_handler irq handler and setup function
From: Wei Liu @ 2023-08-03 0:17 UTC (permalink / raw)
To: Nuno Das Neves
Cc: linux-hyperv, linux-kernel, x86, linux-arm-kernel, linux-arch,
mikelley, kys, wei.liu, haiyangz, decui, ssengar, mukeshrathor,
stanislav.kinsburskiy, jinankjain, apais, Tianyu.Lan, vkuznets,
tglx, mingo, bp, dave.hansen, hpa, will, catalin.marinas
In-Reply-To: <1690487690-2428-11-git-send-email-nunodasneves@linux.microsoft.com>
On Thu, Jul 27, 2023 at 12:54:45PM -0700, Nuno Das Neves wrote:
> This will handle SYNIC interrupts such as intercepts, doorbells, and
> scheduling messages intended for the mshv driver.
>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
^ permalink raw reply
* Re: [PATCH 11/15] Drivers: hv: export vmbus_isr, hv_context and hv_post_message
From: Wei Liu @ 2023-08-03 0:17 UTC (permalink / raw)
To: Nuno Das Neves
Cc: linux-hyperv, linux-kernel, x86, linux-arm-kernel, linux-arch,
mikelley, kys, wei.liu, haiyangz, decui, ssengar, mukeshrathor,
stanislav.kinsburskiy, jinankjain, apais, Tianyu.Lan, vkuznets,
tglx, mingo, bp, dave.hansen, hpa, will, catalin.marinas
In-Reply-To: <1690487690-2428-12-git-send-email-nunodasneves@linux.microsoft.com>
On Thu, Jul 27, 2023 at 12:54:46PM -0700, Nuno Das Neves wrote:
> These will be used by the mshv_vtl driver.
>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
^ permalink raw reply
* Re: [PATCH 12/15] Documentation: Reserve ioctl number for mshv driver
From: Wei Liu @ 2023-08-03 0:23 UTC (permalink / raw)
To: Nuno Das Neves, corbet
Cc: linux-hyperv, linux-kernel, x86, linux-arm-kernel, linux-arch,
mikelley, kys, wei.liu, haiyangz, decui, ssengar, mukeshrathor,
stanislav.kinsburskiy, jinankjain, apais, Tianyu.Lan, vkuznets,
tglx, mingo, bp, dave.hansen, hpa, will, catalin.marinas
In-Reply-To: <1690487690-2428-13-git-send-email-nunodasneves@linux.microsoft.com>
This needs an ack from Jonathan.
On Thu, Jul 27, 2023 at 12:54:47PM -0700, Nuno Das Neves wrote:
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> ---
> Documentation/userspace-api/ioctl/ioctl-number.rst | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
> index 0a1882e296ae..ca6b82419118 100644
> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> @@ -355,6 +355,8 @@ Code Seq# Include File Comments
> 0xB6 all linux/fpga-dfl.h
> 0xB7 all uapi/linux/remoteproc_cdev.h <mailto:linux-remoteproc@vger.kernel.org>
> 0xB7 all uapi/linux/nsfs.h <mailto:Andrei Vagin <avagin@openvz.org>>
> +0xB8 all uapi/linux/mshv.h Microsoft Hypervisor VM management APIs
> + <mailto:linux-hyperv@vger.kernel.org>
> 0xC0 00-0F linux/usb/iowarrior.h
> 0xCA 00-0F uapi/misc/cxl.h
> 0xCA 10-2F uapi/misc/ocxl.h
> --
> 2.25.1
>
^ permalink raw reply
* Re: [PATCH 13/15] uapi: hyperv: Add mshv driver headers hvhdk.h, hvhdk_mini.h, hvgdk.h, hvgdk_mini.h
From: Wei Liu @ 2023-08-03 0:27 UTC (permalink / raw)
To: Nuno Das Neves
Cc: linux-hyperv, linux-kernel, x86, linux-arm-kernel, linux-arch,
mikelley, kys, wei.liu, haiyangz, decui, ssengar, mukeshrathor,
stanislav.kinsburskiy, jinankjain, apais, Tianyu.Lan, vkuznets,
tglx, mingo, bp, dave.hansen, hpa, will, catalin.marinas
In-Reply-To: <1690487690-2428-14-git-send-email-nunodasneves@linux.microsoft.com>
On Thu, Jul 27, 2023 at 12:54:48PM -0700, Nuno Das Neves wrote:
> Containing hypervisor ABI definitions to use in mshv driver.
>
> Version numbers for each file:
> hvhdk.h 25212
> hvhdk_mini.h 25294
> hvgdk.h 25125
> hvgdk_mini.h 25294
>
> These are unstable interfaces and as such must be compiled independently
> from published interfaces found in hyperv-tlfs.h.
>
> These are in uapi because they will be used in the mshv ioctl API.
>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
^ permalink raw reply
* Re: [PATCH RFC net-next v5 03/14] af_vsock: support multi-transport datagrams
From: Bobby Eshleman @ 2023-08-02 22:24 UTC (permalink / raw)
To: Arseniy Krasnov
Cc: Bobby Eshleman, Stefan Hajnoczi, Stefano Garzarella,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
VMware PV-Drivers Reviewers, Dan Carpenter, Simon Horman, kvm,
virtualization, netdev, linux-kernel, linux-hyperv, bpf
In-Reply-To: <43fad7ab-2ca9-608e-566f-80e607d2d6b8@gmail.com>
On Sun, Jul 23, 2023 at 12:53:15AM +0300, Arseniy Krasnov wrote:
>
>
> On 19.07.2023 03:50, Bobby Eshleman wrote:
> > This patch adds support for multi-transport datagrams.
> >
> > This includes:
> > - Per-packet lookup of transports when using sendto(sockaddr_vm)
> > - Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in
> > sockaddr_vm
> > - rename VSOCK_TRANSPORT_F_DGRAM to VSOCK_TRANSPORT_F_DGRAM_FALLBACK
> > - connect() now assigns the transport for (similar to connectible
> > sockets)
> >
> > To preserve backwards compatibility with VMCI, some important changes
> > are made. The "transport_dgram" / VSOCK_TRANSPORT_F_DGRAM is changed to
> > be used for dgrams only if there is not yet a g2h or h2g transport that
> > has been registered that can transmit the packet. If there is a g2h/h2g
> > transport for that remote address, then that transport will be used and
> > not "transport_dgram". This essentially makes "transport_dgram" a
> > fallback transport for when h2g/g2h has not yet gone online, and so it
> > is renamed "transport_dgram_fallback". VMCI implements this transport.
> >
> > The logic around "transport_dgram" needs to be retained to prevent
> > breaking VMCI:
> >
> > 1) VMCI datagrams existed prior to h2g/g2h and so operate under a
> > different paradigm. When the vmci transport comes online, it registers
> > itself with the DGRAM feature, but not H2G/G2H. Only later when the
> > transport has more information about its environment does it register
> > H2G or G2H. In the case that a datagram socket is created after
> > VSOCK_TRANSPORT_F_DGRAM registration but before G2H/H2G registration,
> > the "transport_dgram" transport is the only registered transport and so
> > needs to be used.
> >
> > 2) VMCI seems to require a special message be sent by the transport when a
> > datagram socket calls bind(). Under the h2g/g2h model, the transport
> > is selected using the remote_addr which is set by connect(). At
> > bind time there is no remote_addr because often no connect() has been
> > called yet: the transport is null. Therefore, with a null transport
> > there doesn't seem to be any good way for a datagram socket to tell the
> > VMCI transport that it has just had bind() called upon it.
> >
> > With the new fallback logic, after H2G/G2H comes online the socket layer
> > will access the VMCI transport via transport_{h2g,g2h}. Prior to H2G/G2H
> > coming online, the socket layer will access the VMCI transport via
> > "transport_dgram_fallback".
> >
> > Only transports with a special datagram fallback use-case such as VMCI
> > need to register VSOCK_TRANSPORT_F_DGRAM_FALLBACK.
> >
> > Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> > ---
> > drivers/vhost/vsock.c | 1 -
> > include/linux/virtio_vsock.h | 2 --
> > include/net/af_vsock.h | 10 +++---
> > net/vmw_vsock/af_vsock.c | 64 ++++++++++++++++++++++++++-------
> > net/vmw_vsock/hyperv_transport.c | 6 ----
> > net/vmw_vsock/virtio_transport.c | 1 -
> > net/vmw_vsock/virtio_transport_common.c | 7 ----
> > net/vmw_vsock/vmci_transport.c | 2 +-
> > net/vmw_vsock/vsock_loopback.c | 1 -
> > 9 files changed, 58 insertions(+), 36 deletions(-)
> >
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index ae8891598a48..d5d6a3c3f273 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
> > .cancel_pkt = vhost_transport_cancel_pkt,
> >
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > - .dgram_bind = virtio_transport_dgram_bind,
> > .dgram_allow = virtio_transport_dgram_allow,
> >
> > .stream_enqueue = virtio_transport_stream_enqueue,
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 18cbe8d37fca..7632552bee58 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -211,8 +211,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> > u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> > bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> > bool virtio_transport_stream_allow(u32 cid, u32 port);
> > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > - struct sockaddr_vm *addr);
> > bool virtio_transport_dgram_allow(u32 cid, u32 port);
> >
> > int virtio_transport_connect(struct vsock_sock *vsk);
> > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > index 305d57502e89..f6a0ca9d7c3e 100644
> > --- a/include/net/af_vsock.h
> > +++ b/include/net/af_vsock.h
> > @@ -96,13 +96,13 @@ struct vsock_transport_send_notify_data {
> >
> > /* Transport features flags */
> > /* Transport provides host->guest communication */
> > -#define VSOCK_TRANSPORT_F_H2G 0x00000001
> > +#define VSOCK_TRANSPORT_F_H2G 0x00000001
> > /* Transport provides guest->host communication */
> > -#define VSOCK_TRANSPORT_F_G2H 0x00000002
> > -/* Transport provides DGRAM communication */
> > -#define VSOCK_TRANSPORT_F_DGRAM 0x00000004
> > +#define VSOCK_TRANSPORT_F_G2H 0x00000002
> > +/* Transport provides fallback for DGRAM communication */
> > +#define VSOCK_TRANSPORT_F_DGRAM_FALLBACK 0x00000004
> > /* Transport provides local (loopback) communication */
> > -#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
> > +#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
> >
> > struct vsock_transport {
> > struct module *module;
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index ae5ac5531d96..26c97b33d55a 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -139,8 +139,8 @@ struct proto vsock_proto = {
> > static const struct vsock_transport *transport_h2g;
> > /* Transport used for guest->host communication */
> > static const struct vsock_transport *transport_g2h;
> > -/* Transport used for DGRAM communication */
> > -static const struct vsock_transport *transport_dgram;
> > +/* Transport used as a fallback for DGRAM communication */
> > +static const struct vsock_transport *transport_dgram_fallback;
> > /* Transport used for local communication */
> > static const struct vsock_transport *transport_local;
> > static DEFINE_MUTEX(vsock_register_mutex);
> > @@ -439,6 +439,18 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
> > return transport;
> > }
> >
> > +static const struct vsock_transport *
> > +vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
> > +{
> > + const struct vsock_transport *transport;
> > +
> > + transport = vsock_connectible_lookup_transport(cid, flags);
> > + if (transport)
> > + return transport;
> > +
> > + return transport_dgram_fallback;
> > +}
> > +
> > /* Assign a transport to a socket and call the .init transport callback.
> > *
> > * Note: for connection oriented socket this must be called when vsk->remote_addr
> > @@ -475,7 +487,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> >
> > switch (sk->sk_type) {
> > case SOCK_DGRAM:
> > - new_transport = transport_dgram;
> > + new_transport = vsock_dgram_lookup_transport(remote_cid,
> > + remote_flags);
>
> I'm a little bit confused about this:
> 1) Let's create SOCK_DGRAM socket using vsock_create()
> 2) for SOCK_DGRAM it calls 'vsock_assign_transport()' and we go here, remote_cid == -1
> 3) I guess 'vsock_dgram_lookup_transport()' calls logic from 0002 and returns h2g for such remote cid, which is not
> correct I think...
>
> Please correct me if i'm wrong
>
> Thanks, Arseniy
>
As I understand, for the VMCI case, if transport_h2g != NULL, then
transport_h2g == transport_dgram_fallback. In either case,
vsk->transport == transport_dgram_fallback.
For the virtio/vhost case, temporarily vsk->transport == transport_h2g,
but it is unused because vsk->transport->dgram_bind == NULL.
Until SS_CONNECTED is set by connect() and vsk->transport is set
correctly, the send path is barred from using the bad transport.
I guess the recvmsg() path is a little more sketchy, and probably only
works in my test cases because h2g/g2h in the vhost/virtio case have
identical dgram_addr_init() implementations.
I think a cleaner solution is maybe checking in vsock_create() if
dgram_bind is implemented. If it is not, then vsk->transport should be
reset to NULL and a comment added explaining why VMCI requires this.
Then the other calls can begin explicitly checking for vsk->transport ==
NULL.
Thoughts?
> > break;
> > case SOCK_STREAM:
> > case SOCK_SEQPACKET:
> > @@ -692,6 +705,9 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> > static int __vsock_bind_dgram(struct vsock_sock *vsk,
> > struct sockaddr_vm *addr)
> > {
> > + if (!vsk->transport || !vsk->transport->dgram_bind)
> > + return -EINVAL;
> > +
> > return vsk->transport->dgram_bind(vsk, addr);
> > }
> >
> > @@ -1162,6 +1178,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > struct vsock_sock *vsk;
> > struct sockaddr_vm *remote_addr;
> > const struct vsock_transport *transport;
> > + bool module_got = false;
> >
> > if (msg->msg_flags & MSG_OOB)
> > return -EOPNOTSUPP;
> > @@ -1173,19 +1190,34 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> >
> > lock_sock(sk);
> >
> > - transport = vsk->transport;
> > -
> > err = vsock_auto_bind(vsk);
> > if (err)
> > goto out;
> >
> > -
> > /* If the provided message contains an address, use that. Otherwise
> > * fall back on the socket's remote handle (if it has been connected).
> > */
> > if (msg->msg_name &&
> > vsock_addr_cast(msg->msg_name, msg->msg_namelen,
> > &remote_addr) == 0) {
> > + transport = vsock_dgram_lookup_transport(remote_addr->svm_cid,
> > + remote_addr->svm_flags);
> > + if (!transport) {
> > + err = -EINVAL;
> > + goto out;
> > + }
> > +
> > + if (!try_module_get(transport->module)) {
> > + err = -ENODEV;
> > + goto out;
> > + }
> > +
> > + /* When looking up a transport dynamically and acquiring a
> > + * reference on the module, we need to remember to release the
> > + * reference later.
> > + */
> > + module_got = true;
> > +
> > /* Ensure this address is of the right type and is a valid
> > * destination.
> > */
> > @@ -1200,6 +1232,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > } else if (sock->state == SS_CONNECTED) {
> > remote_addr = &vsk->remote_addr;
> >
> > + transport = vsk->transport;
> > if (remote_addr->svm_cid == VMADDR_CID_ANY)
> > remote_addr->svm_cid = transport->get_local_cid();
> >
> > @@ -1224,6 +1257,8 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
> >
> > out:
> > + if (module_got)
> > + module_put(transport->module);
> > release_sock(sk);
> > return err;
> > }
> > @@ -1256,13 +1291,18 @@ static int vsock_dgram_connect(struct socket *sock,
> > if (err)
> > goto out;
> >
> > + memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
> > +
> > + err = vsock_assign_transport(vsk, NULL);
> > + if (err)
> > + goto out;
> > +
> > if (!vsk->transport->dgram_allow(remote_addr->svm_cid,
> > remote_addr->svm_port)) {
> > err = -EINVAL;
> > goto out;
> > }
> >
> > - memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
> > sock->state = SS_CONNECTED;
> >
> > /* sock map disallows redirection of non-TCP sockets with sk_state !=
> > @@ -2487,7 +2527,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
> >
> > t_h2g = transport_h2g;
> > t_g2h = transport_g2h;
> > - t_dgram = transport_dgram;
> > + t_dgram = transport_dgram_fallback;
> > t_local = transport_local;
> >
> > if (features & VSOCK_TRANSPORT_F_H2G) {
> > @@ -2506,7 +2546,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
> > t_g2h = t;
> > }
> >
> > - if (features & VSOCK_TRANSPORT_F_DGRAM) {
> > + if (features & VSOCK_TRANSPORT_F_DGRAM_FALLBACK) {
> > if (t_dgram) {
> > err = -EBUSY;
> > goto err_busy;
> > @@ -2524,7 +2564,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
> >
> > transport_h2g = t_h2g;
> > transport_g2h = t_g2h;
> > - transport_dgram = t_dgram;
> > + transport_dgram_fallback = t_dgram;
> > transport_local = t_local;
> >
> > err_busy:
> > @@ -2543,8 +2583,8 @@ void vsock_core_unregister(const struct vsock_transport *t)
> > if (transport_g2h == t)
> > transport_g2h = NULL;
> >
> > - if (transport_dgram == t)
> > - transport_dgram = NULL;
> > + if (transport_dgram_fallback == t)
> > + transport_dgram_fallback = NULL;
> >
> > if (transport_local == t)
> > transport_local = NULL;
> > diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> > index 7f1ea434656d..c29000f2612a 100644
> > --- a/net/vmw_vsock/hyperv_transport.c
> > +++ b/net/vmw_vsock/hyperv_transport.c
> > @@ -551,11 +551,6 @@ static void hvs_destruct(struct vsock_sock *vsk)
> > kfree(hvs);
> > }
> >
> > -static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
> > -{
> > - return -EOPNOTSUPP;
> > -}
> > -
> > static int hvs_dgram_enqueue(struct vsock_sock *vsk,
> > struct sockaddr_vm *remote, struct msghdr *msg,
> > size_t dgram_len)
> > @@ -826,7 +821,6 @@ static struct vsock_transport hvs_transport = {
> > .connect = hvs_connect,
> > .shutdown = hvs_shutdown,
> >
> > - .dgram_bind = hvs_dgram_bind,
> > .dgram_enqueue = hvs_dgram_enqueue,
> > .dgram_allow = hvs_dgram_allow,
> >
> > diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> > index 66edffdbf303..ac2126c7dac5 100644
> > --- a/net/vmw_vsock/virtio_transport.c
> > +++ b/net/vmw_vsock/virtio_transport.c
> > @@ -428,7 +428,6 @@ static struct virtio_transport virtio_transport = {
> > .shutdown = virtio_transport_shutdown,
> > .cancel_pkt = virtio_transport_cancel_pkt,
> >
> > - .dgram_bind = virtio_transport_dgram_bind,
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > .dgram_allow = virtio_transport_dgram_allow,
> >
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index 01ea1402ad40..ffcbdd77feaa 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -781,13 +781,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
> > }
> > EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
> >
> > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > - struct sockaddr_vm *addr)
> > -{
> > - return -EOPNOTSUPP;
> > -}
> > -EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
> > -
> > bool virtio_transport_dgram_allow(u32 cid, u32 port)
> > {
> > return false;
> > diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> > index 0bbbdb222245..857b0461f856 100644
> > --- a/net/vmw_vsock/vmci_transport.c
> > +++ b/net/vmw_vsock/vmci_transport.c
> > @@ -2072,7 +2072,7 @@ static int __init vmci_transport_init(void)
> > /* Register only with dgram feature, other features (H2G, G2H) will be
> > * registered when the first host or guest becomes active.
> > */
> > - err = vsock_core_register(&vmci_transport, VSOCK_TRANSPORT_F_DGRAM);
> > + err = vsock_core_register(&vmci_transport, VSOCK_TRANSPORT_F_DGRAM_FALLBACK);
> > if (err < 0)
> > goto err_unsubscribe;
> >
> > diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> > index 2a59dd177c74..278235ea06c4 100644
> > --- a/net/vmw_vsock/vsock_loopback.c
> > +++ b/net/vmw_vsock/vsock_loopback.c
> > @@ -61,7 +61,6 @@ static struct virtio_transport loopback_transport = {
> > .shutdown = virtio_transport_shutdown,
> > .cancel_pkt = vsock_loopback_cancel_pkt,
> >
> > - .dgram_bind = virtio_transport_dgram_bind,
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > .dgram_allow = virtio_transport_dgram_allow,
> >
> >
^ permalink raw reply
* [PATCH v3 net] net: hv_netvsc: fix netvsc_send_completion to avoid multiple message length checks
From: Sonia Sharma @ 2023-08-03 0:45 UTC (permalink / raw)
To: linux-kernel
Cc: linux-hyperv, netdev, sosha, kys, mikelley, haiyangz, wei.liu,
decui, longli, davem, edumazet, kuba, pabeni
From: Sonia Sharma <sonia.sharma@linux.microsoft.com>
The switch statement in netvsc_send_completion() is incorrectly validating
the length of incoming network packets by falling through to the next case.
Avoid the fallthrough. Instead break after a case match and then process
the complete() call.
Signed-off-by: Sonia Sharma <sonia.sharma@linux.microsoft.com>
---
Changes in v3:
* added return statement in default case as pointed by Michael Kelley..
---
drivers/net/hyperv/netvsc.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 82e9796c8f5e..0f7e4d377776 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -851,7 +851,7 @@ static void netvsc_send_completion(struct net_device *ndev,
msglen);
return;
}
- fallthrough;
+ break;
case NVSP_MSG1_TYPE_SEND_RECV_BUF_COMPLETE:
if (msglen < sizeof(struct nvsp_message_header) +
@@ -860,7 +860,7 @@ static void netvsc_send_completion(struct net_device *ndev,
msglen);
return;
}
- fallthrough;
+ break;
case NVSP_MSG1_TYPE_SEND_SEND_BUF_COMPLETE:
if (msglen < sizeof(struct nvsp_message_header) +
@@ -869,7 +869,7 @@ static void netvsc_send_completion(struct net_device *ndev,
msglen);
return;
}
- fallthrough;
+ break;
case NVSP_MSG5_TYPE_SUBCHANNEL:
if (msglen < sizeof(struct nvsp_message_header) +
@@ -878,10 +878,6 @@ static void netvsc_send_completion(struct net_device *ndev,
msglen);
return;
}
- /* Copy the response back */
- memcpy(&net_device->channel_init_pkt, nvsp_packet,
- sizeof(struct nvsp_message));
- complete(&net_device->channel_init_wait);
break;
case NVSP_MSG1_TYPE_SEND_RNDIS_PKT_COMPLETE:
@@ -904,13 +900,19 @@ static void netvsc_send_completion(struct net_device *ndev,
netvsc_send_tx_complete(ndev, net_device, incoming_channel,
desc, budget);
- break;
+ return;
default:
netdev_err(ndev,
"Unknown send completion type %d received!!\n",
nvsp_packet->hdr.msg_type);
+ return;
}
+
+ /* Copy the response back */
+ memcpy(&net_device->channel_init_pkt, nvsp_packet,
+ sizeof(struct nvsp_message));
+ complete(&net_device->channel_init_wait);
}
static u32 netvsc_get_next_send_section(struct netvsc_device *net_device)
--
2.25.1
^ permalink raw reply related
* Re: [PATCH 14/15] asm-generic: hyperv: Use mshv headers conditionally. Add asm-generic/hyperv-defs.h
From: Wei Liu @ 2023-08-03 0:48 UTC (permalink / raw)
To: Nuno Das Neves
Cc: linux-hyperv, linux-kernel, x86, linux-arm-kernel, linux-arch,
mikelley, kys, wei.liu, haiyangz, decui, ssengar, mukeshrathor,
stanislav.kinsburskiy, jinankjain, apais, Tianyu.Lan, vkuznets,
tglx, mingo, bp, dave.hansen, hpa, will, catalin.marinas
In-Reply-To: <1690487690-2428-15-git-send-email-nunodasneves@linux.microsoft.com>
On Thu, Jul 27, 2023 at 12:54:49PM -0700, Nuno Das Neves wrote:
> In order to keep unstable hyper-v interfaces independent of
> hyperv-tlfs.h, hvhdk.h must replace hyperv-tlfs.h eveywhere it will be
> used in the mshv driver.
Please properly capitalize "Hyper-V" when it is used as a term.
> Add hyperv-defs.h to replace some inclusions of hyperv-tlfs.h.
> It includes hyperv-tlfs.h or hvhdk.h depending on a compile-time constant
> HV_HYPERV_DEFS which will be defined in the mshv driver.
>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> ---
> arch/arm64/include/asm/mshyperv.h | 2 +-
> arch/x86/include/asm/mshyperv.h | 3 +--
> drivers/hv/hyperv_vmbus.h | 1 -
> include/asm-generic/hyperv-defs.h | 26 ++++++++++++++++++++++++++
> include/asm-generic/mshyperv.h | 2 +-
> include/linux/hyperv.h | 2 +-
> 6 files changed, 30 insertions(+), 6 deletions(-)
> create mode 100644 include/asm-generic/hyperv-defs.h
>
[...]
> diff --git a/include/asm-generic/hyperv-defs.h b/include/asm-generic/hyperv-defs.h
> new file mode 100644
> index 000000000000..ac6fcba35c8c
> --- /dev/null
> +++ b/include/asm-generic/hyperv-defs.h
> @@ -0,0 +1,26 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_GENERIC_HYPERV_DEFS_H
> +#define _ASM_GENERIC_HYPERV_DEFS_H
> +
> +/*
> + * There are cases where Microsoft Hypervisor ABIs are needed which may not be
> + * stable or present in the Hyper-V TLFS document. E.g. the mshv_root driver.
> + *
> + * As these interfaces are unstable and may differ from hyperv-tlfs.h, they
> + * must be kept separate and independent.
> + *
> + * However, code from files that depend on hyperv-tlfs.h (such as mshyperv.h)
> + * is still needed, so work around the issue by conditionally including the
> + * correct definitions.
> + *
> + * Note: Since they are independent of each other, there are many definitions
> + * duplicated in both hyperv-tlfs.h and uapi/hyperv/hv*.h files.
> + */
Is this because we accidentally introduced some host only, unstable
interfaces to hyperv-tlfs.h? Is the long term plan to drop those from
hyperv-tlfs.h and further separate the helper functions?
Thanks,
Wei.
> +#ifdef HV_HYPERV_DEFS
> +#include <uapi/hyperv/hvhdk.h>
> +#else
> +#include <asm/hyperv-tlfs.h>
> +#endif
> +
> +#endif /* _ASM_GENERIC_HYPERV_DEFS_H */
> +
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index 2b20994d809e..e86b6f51fb64 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -25,7 +25,7 @@
> #include <linux/cpumask.h>
> #include <linux/nmi.h>
> #include <asm/ptrace.h>
> -#include <asm/hyperv-tlfs.h>
> +#include <asm-generic/hyperv-defs.h>
>
> #define VTPM_BASE_ADDRESS 0xfed40000
>
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index f90de5abcd50..66ed8b3e5d89 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -24,7 +24,7 @@
> #include <linux/mod_devicetable.h>
> #include <linux/interrupt.h>
> #include <linux/reciprocal_div.h>
> -#include <asm/hyperv-tlfs.h>
> +#include <asm-generic/hyperv-defs.h>
>
> #define MAX_PAGE_BUFFER_COUNT 32
> #define MAX_MULTIPAGE_BUFFER_COUNT 32 /* 128K */
> --
> 2.25.1
>
^ permalink raw reply
* Re: [PATCH RFC net-next v5 03/14] af_vsock: support multi-transport datagrams
From: Bobby Eshleman @ 2023-08-03 0:53 UTC (permalink / raw)
To: Arseniy Krasnov
Cc: Bobby Eshleman, Stefan Hajnoczi, Stefano Garzarella,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
VMware PV-Drivers Reviewers, Dan Carpenter, Simon Horman, kvm,
virtualization, netdev, linux-kernel, linux-hyperv, bpf
In-Reply-To: <ZMrXrBHuaEcpxGwA@bullseye>
On Wed, Aug 02, 2023 at 10:24:44PM +0000, Bobby Eshleman wrote:
> On Sun, Jul 23, 2023 at 12:53:15AM +0300, Arseniy Krasnov wrote:
> >
> >
> > On 19.07.2023 03:50, Bobby Eshleman wrote:
> > > This patch adds support for multi-transport datagrams.
> > >
> > > This includes:
> > > - Per-packet lookup of transports when using sendto(sockaddr_vm)
> > > - Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in
> > > sockaddr_vm
> > > - rename VSOCK_TRANSPORT_F_DGRAM to VSOCK_TRANSPORT_F_DGRAM_FALLBACK
> > > - connect() now assigns the transport for (similar to connectible
> > > sockets)
> > >
> > > To preserve backwards compatibility with VMCI, some important changes
> > > are made. The "transport_dgram" / VSOCK_TRANSPORT_F_DGRAM is changed to
> > > be used for dgrams only if there is not yet a g2h or h2g transport that
> > > has been registered that can transmit the packet. If there is a g2h/h2g
> > > transport for that remote address, then that transport will be used and
> > > not "transport_dgram". This essentially makes "transport_dgram" a
> > > fallback transport for when h2g/g2h has not yet gone online, and so it
> > > is renamed "transport_dgram_fallback". VMCI implements this transport.
> > >
> > > The logic around "transport_dgram" needs to be retained to prevent
> > > breaking VMCI:
> > >
> > > 1) VMCI datagrams existed prior to h2g/g2h and so operate under a
> > > different paradigm. When the vmci transport comes online, it registers
> > > itself with the DGRAM feature, but not H2G/G2H. Only later when the
> > > transport has more information about its environment does it register
> > > H2G or G2H. In the case that a datagram socket is created after
> > > VSOCK_TRANSPORT_F_DGRAM registration but before G2H/H2G registration,
> > > the "transport_dgram" transport is the only registered transport and so
> > > needs to be used.
> > >
> > > 2) VMCI seems to require a special message be sent by the transport when a
> > > datagram socket calls bind(). Under the h2g/g2h model, the transport
> > > is selected using the remote_addr which is set by connect(). At
> > > bind time there is no remote_addr because often no connect() has been
> > > called yet: the transport is null. Therefore, with a null transport
> > > there doesn't seem to be any good way for a datagram socket to tell the
> > > VMCI transport that it has just had bind() called upon it.
> > >
> > > With the new fallback logic, after H2G/G2H comes online the socket layer
> > > will access the VMCI transport via transport_{h2g,g2h}. Prior to H2G/G2H
> > > coming online, the socket layer will access the VMCI transport via
> > > "transport_dgram_fallback".
> > >
> > > Only transports with a special datagram fallback use-case such as VMCI
> > > need to register VSOCK_TRANSPORT_F_DGRAM_FALLBACK.
> > >
> > > Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> > > ---
> > > drivers/vhost/vsock.c | 1 -
> > > include/linux/virtio_vsock.h | 2 --
> > > include/net/af_vsock.h | 10 +++---
> > > net/vmw_vsock/af_vsock.c | 64 ++++++++++++++++++++++++++-------
> > > net/vmw_vsock/hyperv_transport.c | 6 ----
> > > net/vmw_vsock/virtio_transport.c | 1 -
> > > net/vmw_vsock/virtio_transport_common.c | 7 ----
> > > net/vmw_vsock/vmci_transport.c | 2 +-
> > > net/vmw_vsock/vsock_loopback.c | 1 -
> > > 9 files changed, 58 insertions(+), 36 deletions(-)
> > >
> > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > index ae8891598a48..d5d6a3c3f273 100644
> > > --- a/drivers/vhost/vsock.c
> > > +++ b/drivers/vhost/vsock.c
> > > @@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
> > > .cancel_pkt = vhost_transport_cancel_pkt,
> > >
> > > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > > - .dgram_bind = virtio_transport_dgram_bind,
> > > .dgram_allow = virtio_transport_dgram_allow,
> > >
> > > .stream_enqueue = virtio_transport_stream_enqueue,
> > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > > index 18cbe8d37fca..7632552bee58 100644
> > > --- a/include/linux/virtio_vsock.h
> > > +++ b/include/linux/virtio_vsock.h
> > > @@ -211,8 +211,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> > > u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> > > bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> > > bool virtio_transport_stream_allow(u32 cid, u32 port);
> > > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > > - struct sockaddr_vm *addr);
> > > bool virtio_transport_dgram_allow(u32 cid, u32 port);
> > >
> > > int virtio_transport_connect(struct vsock_sock *vsk);
> > > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > > index 305d57502e89..f6a0ca9d7c3e 100644
> > > --- a/include/net/af_vsock.h
> > > +++ b/include/net/af_vsock.h
> > > @@ -96,13 +96,13 @@ struct vsock_transport_send_notify_data {
> > >
> > > /* Transport features flags */
> > > /* Transport provides host->guest communication */
> > > -#define VSOCK_TRANSPORT_F_H2G 0x00000001
> > > +#define VSOCK_TRANSPORT_F_H2G 0x00000001
> > > /* Transport provides guest->host communication */
> > > -#define VSOCK_TRANSPORT_F_G2H 0x00000002
> > > -/* Transport provides DGRAM communication */
> > > -#define VSOCK_TRANSPORT_F_DGRAM 0x00000004
> > > +#define VSOCK_TRANSPORT_F_G2H 0x00000002
> > > +/* Transport provides fallback for DGRAM communication */
> > > +#define VSOCK_TRANSPORT_F_DGRAM_FALLBACK 0x00000004
> > > /* Transport provides local (loopback) communication */
> > > -#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
> > > +#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
> > >
> > > struct vsock_transport {
> > > struct module *module;
> > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > > index ae5ac5531d96..26c97b33d55a 100644
> > > --- a/net/vmw_vsock/af_vsock.c
> > > +++ b/net/vmw_vsock/af_vsock.c
> > > @@ -139,8 +139,8 @@ struct proto vsock_proto = {
> > > static const struct vsock_transport *transport_h2g;
> > > /* Transport used for guest->host communication */
> > > static const struct vsock_transport *transport_g2h;
> > > -/* Transport used for DGRAM communication */
> > > -static const struct vsock_transport *transport_dgram;
> > > +/* Transport used as a fallback for DGRAM communication */
> > > +static const struct vsock_transport *transport_dgram_fallback;
> > > /* Transport used for local communication */
> > > static const struct vsock_transport *transport_local;
> > > static DEFINE_MUTEX(vsock_register_mutex);
> > > @@ -439,6 +439,18 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
> > > return transport;
> > > }
> > >
> > > +static const struct vsock_transport *
> > > +vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
> > > +{
> > > + const struct vsock_transport *transport;
> > > +
> > > + transport = vsock_connectible_lookup_transport(cid, flags);
> > > + if (transport)
> > > + return transport;
> > > +
> > > + return transport_dgram_fallback;
> > > +}
> > > +
> > > /* Assign a transport to a socket and call the .init transport callback.
> > > *
> > > * Note: for connection oriented socket this must be called when vsk->remote_addr
> > > @@ -475,7 +487,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> > >
> > > switch (sk->sk_type) {
> > > case SOCK_DGRAM:
> > > - new_transport = transport_dgram;
> > > + new_transport = vsock_dgram_lookup_transport(remote_cid,
> > > + remote_flags);
> >
> > I'm a little bit confused about this:
> > 1) Let's create SOCK_DGRAM socket using vsock_create()
> > 2) for SOCK_DGRAM it calls 'vsock_assign_transport()' and we go here, remote_cid == -1
> > 3) I guess 'vsock_dgram_lookup_transport()' calls logic from 0002 and returns h2g for such remote cid, which is not
> > correct I think...
> >
> > Please correct me if i'm wrong
> >
> > Thanks, Arseniy
> >
>
> As I understand, for the VMCI case, if transport_h2g != NULL, then
> transport_h2g == transport_dgram_fallback. In either case,
> vsk->transport == transport_dgram_fallback.
>
> For the virtio/vhost case, temporarily vsk->transport == transport_h2g,
> but it is unused because vsk->transport->dgram_bind == NULL.
>
> Until SS_CONNECTED is set by connect() and vsk->transport is set
> correctly, the send path is barred from using the bad transport.
>
> I guess the recvmsg() path is a little more sketchy, and probably only
> works in my test cases because h2g/g2h in the vhost/virtio case have
> identical dgram_addr_init() implementations.
>
> I think a cleaner solution is maybe checking in vsock_create() if
> dgram_bind is implemented. If it is not, then vsk->transport should be
> reset to NULL and a comment added explaining why VMCI requires this.
>
> Then the other calls can begin explicitly checking for vsk->transport ==
> NULL.
Actually, on further reflection here, in order for the vsk->transport to
be called in time for ->dgram_addr_init(), it is going to be necessary
to call vsock_assign_transport() in vsock_dgram_bind() anyway.
I think this means that the vsock_assign_transport() call can be removed
from vsock_create() call entirely, and yet VMCI can still dispatch
messages upon bind() calls as needed.
This would then simplify the whole arrangement, if there aren't other
unseen issues.
>
> Thoughts?
>
> > > break;
> > > case SOCK_STREAM:
> > > case SOCK_SEQPACKET:
> > > @@ -692,6 +705,9 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> > > static int __vsock_bind_dgram(struct vsock_sock *vsk,
> > > struct sockaddr_vm *addr)
> > > {
> > > + if (!vsk->transport || !vsk->transport->dgram_bind)
> > > + return -EINVAL;
> > > +
> > > return vsk->transport->dgram_bind(vsk, addr);
> > > }
> > >
> > > @@ -1162,6 +1178,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > > struct vsock_sock *vsk;
> > > struct sockaddr_vm *remote_addr;
> > > const struct vsock_transport *transport;
> > > + bool module_got = false;
> > >
> > > if (msg->msg_flags & MSG_OOB)
> > > return -EOPNOTSUPP;
> > > @@ -1173,19 +1190,34 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > >
> > > lock_sock(sk);
> > >
> > > - transport = vsk->transport;
> > > -
> > > err = vsock_auto_bind(vsk);
> > > if (err)
> > > goto out;
> > >
> > > -
> > > /* If the provided message contains an address, use that. Otherwise
> > > * fall back on the socket's remote handle (if it has been connected).
> > > */
> > > if (msg->msg_name &&
> > > vsock_addr_cast(msg->msg_name, msg->msg_namelen,
> > > &remote_addr) == 0) {
> > > + transport = vsock_dgram_lookup_transport(remote_addr->svm_cid,
> > > + remote_addr->svm_flags);
> > > + if (!transport) {
> > > + err = -EINVAL;
> > > + goto out;
> > > + }
> > > +
> > > + if (!try_module_get(transport->module)) {
> > > + err = -ENODEV;
> > > + goto out;
> > > + }
> > > +
> > > + /* When looking up a transport dynamically and acquiring a
> > > + * reference on the module, we need to remember to release the
> > > + * reference later.
> > > + */
> > > + module_got = true;
> > > +
> > > /* Ensure this address is of the right type and is a valid
> > > * destination.
> > > */
> > > @@ -1200,6 +1232,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > > } else if (sock->state == SS_CONNECTED) {
> > > remote_addr = &vsk->remote_addr;
> > >
> > > + transport = vsk->transport;
> > > if (remote_addr->svm_cid == VMADDR_CID_ANY)
> > > remote_addr->svm_cid = transport->get_local_cid();
> > >
> > > @@ -1224,6 +1257,8 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > > err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
> > >
> > > out:
> > > + if (module_got)
> > > + module_put(transport->module);
> > > release_sock(sk);
> > > return err;
> > > }
> > > @@ -1256,13 +1291,18 @@ static int vsock_dgram_connect(struct socket *sock,
> > > if (err)
> > > goto out;
> > >
> > > + memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
> > > +
> > > + err = vsock_assign_transport(vsk, NULL);
> > > + if (err)
> > > + goto out;
> > > +
> > > if (!vsk->transport->dgram_allow(remote_addr->svm_cid,
> > > remote_addr->svm_port)) {
> > > err = -EINVAL;
> > > goto out;
> > > }
> > >
> > > - memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
> > > sock->state = SS_CONNECTED;
> > >
> > > /* sock map disallows redirection of non-TCP sockets with sk_state !=
> > > @@ -2487,7 +2527,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
> > >
> > > t_h2g = transport_h2g;
> > > t_g2h = transport_g2h;
> > > - t_dgram = transport_dgram;
> > > + t_dgram = transport_dgram_fallback;
> > > t_local = transport_local;
> > >
> > > if (features & VSOCK_TRANSPORT_F_H2G) {
> > > @@ -2506,7 +2546,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
> > > t_g2h = t;
> > > }
> > >
> > > - if (features & VSOCK_TRANSPORT_F_DGRAM) {
> > > + if (features & VSOCK_TRANSPORT_F_DGRAM_FALLBACK) {
> > > if (t_dgram) {
> > > err = -EBUSY;
> > > goto err_busy;
> > > @@ -2524,7 +2564,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
> > >
> > > transport_h2g = t_h2g;
> > > transport_g2h = t_g2h;
> > > - transport_dgram = t_dgram;
> > > + transport_dgram_fallback = t_dgram;
> > > transport_local = t_local;
> > >
> > > err_busy:
> > > @@ -2543,8 +2583,8 @@ void vsock_core_unregister(const struct vsock_transport *t)
> > > if (transport_g2h == t)
> > > transport_g2h = NULL;
> > >
> > > - if (transport_dgram == t)
> > > - transport_dgram = NULL;
> > > + if (transport_dgram_fallback == t)
> > > + transport_dgram_fallback = NULL;
> > >
> > > if (transport_local == t)
> > > transport_local = NULL;
> > > diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> > > index 7f1ea434656d..c29000f2612a 100644
> > > --- a/net/vmw_vsock/hyperv_transport.c
> > > +++ b/net/vmw_vsock/hyperv_transport.c
> > > @@ -551,11 +551,6 @@ static void hvs_destruct(struct vsock_sock *vsk)
> > > kfree(hvs);
> > > }
> > >
> > > -static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
> > > -{
> > > - return -EOPNOTSUPP;
> > > -}
> > > -
> > > static int hvs_dgram_enqueue(struct vsock_sock *vsk,
> > > struct sockaddr_vm *remote, struct msghdr *msg,
> > > size_t dgram_len)
> > > @@ -826,7 +821,6 @@ static struct vsock_transport hvs_transport = {
> > > .connect = hvs_connect,
> > > .shutdown = hvs_shutdown,
> > >
> > > - .dgram_bind = hvs_dgram_bind,
> > > .dgram_enqueue = hvs_dgram_enqueue,
> > > .dgram_allow = hvs_dgram_allow,
> > >
> > > diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> > > index 66edffdbf303..ac2126c7dac5 100644
> > > --- a/net/vmw_vsock/virtio_transport.c
> > > +++ b/net/vmw_vsock/virtio_transport.c
> > > @@ -428,7 +428,6 @@ static struct virtio_transport virtio_transport = {
> > > .shutdown = virtio_transport_shutdown,
> > > .cancel_pkt = virtio_transport_cancel_pkt,
> > >
> > > - .dgram_bind = virtio_transport_dgram_bind,
> > > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > > .dgram_allow = virtio_transport_dgram_allow,
> > >
> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > index 01ea1402ad40..ffcbdd77feaa 100644
> > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > @@ -781,13 +781,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
> > > }
> > > EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
> > >
> > > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > > - struct sockaddr_vm *addr)
> > > -{
> > > - return -EOPNOTSUPP;
> > > -}
> > > -EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
> > > -
> > > bool virtio_transport_dgram_allow(u32 cid, u32 port)
> > > {
> > > return false;
> > > diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> > > index 0bbbdb222245..857b0461f856 100644
> > > --- a/net/vmw_vsock/vmci_transport.c
> > > +++ b/net/vmw_vsock/vmci_transport.c
> > > @@ -2072,7 +2072,7 @@ static int __init vmci_transport_init(void)
> > > /* Register only with dgram feature, other features (H2G, G2H) will be
> > > * registered when the first host or guest becomes active.
> > > */
> > > - err = vsock_core_register(&vmci_transport, VSOCK_TRANSPORT_F_DGRAM);
> > > + err = vsock_core_register(&vmci_transport, VSOCK_TRANSPORT_F_DGRAM_FALLBACK);
> > > if (err < 0)
> > > goto err_unsubscribe;
> > >
> > > diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> > > index 2a59dd177c74..278235ea06c4 100644
> > > --- a/net/vmw_vsock/vsock_loopback.c
> > > +++ b/net/vmw_vsock/vsock_loopback.c
> > > @@ -61,7 +61,6 @@ static struct virtio_transport loopback_transport = {
> > > .shutdown = virtio_transport_shutdown,
> > > .cancel_pkt = vsock_loopback_cancel_pkt,
> > >
> > > - .dgram_bind = virtio_transport_dgram_bind,
> > > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > > .dgram_allow = virtio_transport_dgram_allow,
> > >
> > >
^ permalink raw reply
* [PATCH bpf-next v2 1/3] eth: add missing xdp.h includes in drivers
From: Jakub Kicinski @ 2023-08-03 1:02 UTC (permalink / raw)
To: ast
Cc: netdev, bpf, hawk, amritha.nambiar, aleksander.lobakin,
Jakub Kicinski, Wei Fang, Gerhard Engleder, j.vosburgh, andy,
shayagr, akiyano, ioana.ciornei, claudiu.manoil, vladimir.oltean,
shenwei.wang, xiaoning.wang, linux-imx, dmichail, jeroendb,
pkaligineedi, shailend, jesse.brandeburg, anthony.l.nguyen,
horatiu.vultur, UNGLinuxDriver, kys, haiyangz, wei.liu, decui,
peppe.cavallaro, alexandre.torgue, joabreu, mcoquelin.stm32,
grygorii.strashko, longli, sharmaajay, daniel, john.fastabend,
simon.horman, leon, linux-hyperv
In-Reply-To: <20230803010230.1755386-1-kuba@kernel.org>
Handful of drivers currently expect to get xdp.h by virtue
of including netdevice.h. This will soon no longer be the case
so add explicit includes.
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2: try a little harder with the alphabetic order of includes
CC: j.vosburgh@gmail.com
CC: andy@greyhouse.net
CC: shayagr@amazon.com
CC: akiyano@amazon.com
CC: ioana.ciornei@nxp.com
CC: claudiu.manoil@nxp.com
CC: vladimir.oltean@nxp.com
CC: shenwei.wang@nxp.com
CC: xiaoning.wang@nxp.com
CC: linux-imx@nxp.com
CC: dmichail@fungible.com
CC: jeroendb@google.com
CC: pkaligineedi@google.com
CC: shailend@google.com
CC: jesse.brandeburg@intel.com
CC: anthony.l.nguyen@intel.com
CC: horatiu.vultur@microchip.com
CC: UNGLinuxDriver@microchip.com
CC: kys@microsoft.com
CC: haiyangz@microsoft.com
CC: wei.liu@kernel.org
CC: decui@microsoft.com
CC: peppe.cavallaro@st.com
CC: alexandre.torgue@foss.st.com
CC: joabreu@synopsys.com
CC: mcoquelin.stm32@gmail.com
CC: grygorii.strashko@ti.com
CC: longli@microsoft.com
CC: sharmaajay@microsoft.com
CC: daniel@iogearbox.net
CC: hawk@kernel.org
CC: john.fastabend@gmail.com
CC: simon.horman@corigine.com
CC: leon@kernel.org
CC: linux-hyperv@vger.kernel.org
CC: bpf@vger.kernel.org
---
drivers/net/bonding/bond_main.c | 1 +
drivers/net/ethernet/amazon/ena/ena_netdev.h | 1 +
drivers/net/ethernet/engleder/tsnep.h | 1 +
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h | 1 +
drivers/net/ethernet/freescale/enetc/enetc.h | 1 +
drivers/net/ethernet/freescale/fec.h | 1 +
drivers/net/ethernet/fungible/funeth/funeth_txrx.h | 1 +
drivers/net/ethernet/google/gve/gve.h | 1 +
drivers/net/ethernet/intel/igc/igc.h | 1 +
drivers/net/ethernet/microchip/lan966x/lan966x_main.h | 1 +
drivers/net/ethernet/microsoft/mana/mana_en.c | 1 +
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 1 +
drivers/net/ethernet/ti/cpsw_priv.h | 1 +
drivers/net/hyperv/hyperv_net.h | 1 +
drivers/net/tap.c | 1 +
include/net/mana/mana.h | 2 ++
16 files changed, 17 insertions(+)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 7a0f25301f7e..2f21cca4fdaf 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -90,6 +90,7 @@
#include <net/tls.h>
#endif
#include <net/ip6_route.h>
+#include <net/xdp.h>
#include "bonding_priv.h"
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index 248b715b4d68..33c923e1261a 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -14,6 +14,7 @@
#include <linux/interrupt.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>
+#include <net/xdp.h>
#include <uapi/linux/bpf.h>
#include "ena_com.h"
diff --git a/drivers/net/ethernet/engleder/tsnep.h b/drivers/net/ethernet/engleder/tsnep.h
index 11b29f56aaf9..6e14c918e3fb 100644
--- a/drivers/net/ethernet/engleder/tsnep.h
+++ b/drivers/net/ethernet/engleder/tsnep.h
@@ -14,6 +14,7 @@
#include <linux/net_tstamp.h>
#include <linux/ptp_clock_kernel.h>
#include <linux/miscdevice.h>
+#include <net/xdp.h>
#define TSNEP "tsnep"
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h
index d56d7a13262e..bfb6c96c3b2f 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h
@@ -12,6 +12,7 @@
#include <linux/fsl/mc.h>
#include <linux/net_tstamp.h>
#include <net/devlink.h>
+#include <net/xdp.h>
#include <soc/fsl/dpaa2-io.h>
#include <soc/fsl/dpaa2-fd.h>
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 8577cf7699a0..7439739cd81a 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -11,6 +11,7 @@
#include <linux/if_vlan.h>
#include <linux/phylink.h>
#include <linux/dim.h>
+#include <net/xdp.h>
#include "enetc_hw.h"
diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
index 8f1edcca96c4..5a0974e62f99 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -22,6 +22,7 @@
#include <linux/timecounter.h>
#include <dt-bindings/firmware/imx/rsrc.h>
#include <linux/firmware/imx/sci.h>
+#include <net/xdp.h>
#if defined(CONFIG_M523x) || defined(CONFIG_M527x) || defined(CONFIG_M528x) || \
defined(CONFIG_M520x) || defined(CONFIG_M532x) || defined(CONFIG_ARM) || \
diff --git a/drivers/net/ethernet/fungible/funeth/funeth_txrx.h b/drivers/net/ethernet/fungible/funeth/funeth_txrx.h
index 53b7e95213a8..5eec552a1f24 100644
--- a/drivers/net/ethernet/fungible/funeth/funeth_txrx.h
+++ b/drivers/net/ethernet/fungible/funeth/funeth_txrx.h
@@ -5,6 +5,7 @@
#include <linux/netdevice.h>
#include <linux/u64_stats_sync.h>
+#include <net/xdp.h>
/* Tx descriptor size */
#define FUNETH_SQE_SIZE 64U
diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index 4b425bf71ede..a31256f70348 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -11,6 +11,7 @@
#include <linux/netdevice.h>
#include <linux/pci.h>
#include <linux/u64_stats_sync.h>
+#include <net/xdp.h>
#include "gve_desc.h"
#include "gve_desc_dqo.h"
diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 9db384f66a8e..4bffc3cb502f 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -15,6 +15,7 @@
#include <linux/net_tstamp.h>
#include <linux/bitfield.h>
#include <linux/hrtimer.h>
+#include <net/xdp.h>
#include "igc_hw.h"
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
index 27f272831ea5..eb7d81b5e9f8 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
@@ -14,6 +14,7 @@
#include <net/pkt_cls.h>
#include <net/pkt_sched.h>
#include <net/switchdev.h>
+#include <net/xdp.h>
#include <vcap_api.h>
#include <vcap_api_client.h>
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index ac2acc9aca9d..21665f114fe9 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -11,6 +11,7 @@
#include <net/checksum.h>
#include <net/ip6_checksum.h>
+#include <net/xdp.h>
#include <net/mana/mana.h>
#include <net/mana/mana_auxiliary.h>
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index 4ce5eaaae513..a6d034968654 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -22,6 +22,7 @@
#include <linux/net_tstamp.h>
#include <linux/reset.h>
#include <net/page_pool.h>
+#include <net/xdp.h>
#include <uapi/linux/bpf.h>
struct stmmac_resources {
diff --git a/drivers/net/ethernet/ti/cpsw_priv.h b/drivers/net/ethernet/ti/cpsw_priv.h
index 34230145ca0b..0e27c433098d 100644
--- a/drivers/net/ethernet/ti/cpsw_priv.h
+++ b/drivers/net/ethernet/ti/cpsw_priv.h
@@ -6,6 +6,7 @@
#ifndef DRIVERS_NET_ETHERNET_TI_CPSW_PRIV_H_
#define DRIVERS_NET_ETHERNET_TI_CPSW_PRIV_H_
+#include <net/xdp.h>
#include <uapi/linux/bpf.h>
#include "davinci_cpdma.h"
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index c9dd69dbe1b8..810977952f95 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -16,6 +16,7 @@
#include <linux/hyperv.h>
#include <linux/rndis.h>
#include <linux/jhash.h>
+#include <net/xdp.h>
/* RSS related */
#define OID_GEN_RECEIVE_SCALE_CAPABILITIES 0x00010203 /* query only */
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 9137fb8c1c42..b196a2a54355 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -22,6 +22,7 @@
#include <net/net_namespace.h>
#include <net/rtnetlink.h>
#include <net/sock.h>
+#include <net/xdp.h>
#include <linux/virtio_net.h>
#include <linux/skb_array.h>
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 024ad8ddb27e..1ccdca03e166 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -4,6 +4,8 @@
#ifndef _MANA_H
#define _MANA_H
+#include <net/xdp.h>
+
#include "gdma.h"
#include "hw_channel.h"
--
2.41.0
^ permalink raw reply related
* Re: [PATCH 15/15] Drivers: hv: Add modules to expose /dev/mshv to VMMs running on Hyper-V
From: Wei Liu @ 2023-08-03 1:23 UTC (permalink / raw)
To: Nuno Das Neves
Cc: linux-hyperv, linux-kernel, x86, linux-arm-kernel, linux-arch,
mikelley, kys, wei.liu, haiyangz, decui, ssengar, mukeshrathor,
stanislav.kinsburskiy, jinankjain, apais, Tianyu.Lan, vkuznets,
tglx, mingo, bp, dave.hansen, hpa, will, catalin.marinas
In-Reply-To: <1690487690-2428-16-git-send-email-nunodasneves@linux.microsoft.com>
On Thu, Jul 27, 2023 at 12:54:50PM -0700, Nuno Das Neves wrote:
> Add mshv, mshv_root, and mshv_vtl modules.
> - mshv provides /dev/mshv and common code, and is the parent module
> - mshv_root provides APIs for creating and managing child partitions
> - mshv_vtl provides VTL (Virtual Trust Level) support for VMMs
Please provide a slightly more detailed description of what these
modules do. This is huge patch after all. People doing code archaeology
will appreciate a better commit message.
For example (please correct if I'm wrong):
Module mshv provides /dev/mshv and common code, and is the parent module
to the other two modules. At its core, it implements an eventfd frame
work, and defines some helper functions for the other modules.
Module mshv_root provides APIs for creating and managing child
partitions. It defines abstractions for vcpus, partitions and other
things related to running a guest inside the kernel. It also exposes
user space interfaces for the VMMs.
Module mshv_vtl provides VTL (Virtual Trust Level) support for VMMs. It
allows the VMM to run in a higher trust level than the guest but still
within the same context as the guest. This is a useful feature for in
guest emulation for better isolation and performance.
>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> ---
> drivers/hv/Kconfig | 54 +
> drivers/hv/Makefile | 21 +
> drivers/hv/hv_call.c | 119 ++
> drivers/hv/mshv.h | 156 +++
> drivers/hv/mshv_eventfd.c | 758 ++++++++++++
> drivers/hv/mshv_eventfd.h | 80 ++
> drivers/hv/mshv_main.c | 208 ++++
> drivers/hv/mshv_msi.c | 129 +++
> drivers/hv/mshv_portid_table.c | 84 ++
> drivers/hv/mshv_root.h | 194 ++++
> drivers/hv/mshv_root_hv_call.c | 1064 +++++++++++++++++
> drivers/hv/mshv_root_main.c | 1964 ++++++++++++++++++++++++++++++++
> drivers/hv/mshv_synic.c | 689 +++++++++++
> drivers/hv/mshv_vtl.h | 52 +
> drivers/hv/mshv_vtl_main.c | 1541 +++++++++++++++++++++++++
> drivers/hv/xfer_to_guest.c | 28 +
> include/uapi/linux/mshv.h | 298 +++++
> 17 files changed, 7439 insertions(+)
> create mode 100644 drivers/hv/hv_call.c
> create mode 100644 drivers/hv/mshv.h
> create mode 100644 drivers/hv/mshv_eventfd.c
> create mode 100644 drivers/hv/mshv_eventfd.h
> create mode 100644 drivers/hv/mshv_main.c
> create mode 100644 drivers/hv/mshv_msi.c
> create mode 100644 drivers/hv/mshv_portid_table.c
> create mode 100644 drivers/hv/mshv_root.h
> create mode 100644 drivers/hv/mshv_root_hv_call.c
> create mode 100644 drivers/hv/mshv_root_main.c
> create mode 100644 drivers/hv/mshv_synic.c
> create mode 100644 drivers/hv/mshv_vtl.h
> create mode 100644 drivers/hv/mshv_vtl_main.c
> create mode 100644 drivers/hv/xfer_to_guest.c
> create mode 100644 include/uapi/linux/mshv.h
>
> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> index 00242107d62e..b150d686e902 100644
> --- a/drivers/hv/Kconfig
> +++ b/drivers/hv/Kconfig
> @@ -54,4 +54,58 @@ config HYPERV_BALLOON
> help
> Select this option to enable Hyper-V Balloon driver.
>
> +config MSHV
> + tristate "Microsoft Hypervisor root partition interfaces: /dev/mshv"
> + depends on X86_64 && HYPERV
> + select EVENTFD
> + select MSHV_VFIO
This is not needed yet, right? I think this is just dead code right now.
It can be introduced when we start upstreaming the VFIO bits.
> + select MSHV_XFER_TO_GUEST_WORK
> + help
> + Select this option to enable core functionality for managing guest
> + virtual machines running under the Microsoft Hypervisor.
> +
> + The interfaces are provided via a device named /dev/mshv.
> +
> + To compile this as a module, choose M here.
> +
> + If unsure, say N.
> +
> +config MSHV_ROOT
> + tristate "Microsoft Hyper-V root partition APIs driver"
> + depends on MSHV
> + help
> + Select this option to provide /dev/mshv interfaces specific to
> + running as the root partition on Microsoft Hypervisor.
> +
> + To compile this as a module, choose M here.
> +
> + If unsure, say N.
> +
> +config MSHV_VTL
> + tristate "Microsoft Hyper-V VTL driver"
> + depends on MSHV
> + select HYPERV_VTL_MODE
> + select TRANSPARENT_HUGEPAGE
> + help
> + Select this option to enable Hyper-V VTL driver.
> + Virtual Secure Mode (VSM) is a set of hypervisor capabilities and
> + enlightenments offered to host and guest partitions which enables
> + the creation and management of new security boundaries within
> + operating system software.
> +
> + VSM achieves and maintains isolation through Virtual Trust Levels
> + (VTLs). Virtual Trust Levels are hierarchical, with higher levels
> + being more privileged than lower levels. VTL0 is the least privileged
> + level, and currently only other level supported is VTL2.
> +
> + To compile this as a module, choose M here.
> +
> + If unsure, say N.
The changes to the function which indicates if output pages are needed
should be in this patch.
> +
> +config MSHV_VFIO
> + bool
> +
> +config MSHV_XFER_TO_GUEST_WORK
> + bool
> +
> endmenu
> diff --git a/drivers/hv/Makefile b/drivers/hv/Makefile
> index d76df5c8c2a9..113c79cfadb9 100644
> --- a/drivers/hv/Makefile
> +++ b/drivers/hv/Makefile
> @@ -2,10 +2,31 @@
> obj-$(CONFIG_HYPERV) += hv_vmbus.o
> obj-$(CONFIG_HYPERV_UTILS) += hv_utils.o
> obj-$(CONFIG_HYPERV_BALLOON) += hv_balloon.o
> +obj-$(CONFIG_DXGKRNL) += dxgkrnl/
This is not yet upstreamed. It shouldn't be here. Does this not break
the build for you?
The rest is basically a copy of what was posted many moons before plus
some VTL stuff, and new code for the root scheduler and async hypercall
support. I've probably gone through some versions of this code already,
so I only skim the code.
Since this is a Microsoft only driver, I don't expect to get much review
from the community -- the last few rounds were quiet. I will however let
this patch series float for a while before taking any further actions
just in case.
If people are interested in specific bits of the code in the driver,
please let Nuno and I know.
Thanks,
Wei.
^ permalink raw reply
* Re: [PATCH 01/15] hyperv-tlfs: Change shared HV_REGISTER_* defines to HV_MSR_*
From: Wei Liu @ 2023-08-03 1:25 UTC (permalink / raw)
To: Nuno Das Neves
Cc: linux-hyperv, linux-kernel, x86, linux-arm-kernel, linux-arch,
mikelley, kys, wei.liu, haiyangz, decui, ssengar, mukeshrathor,
stanislav.kinsburskiy, jinankjain, apais, Tianyu.Lan, vkuznets,
tglx, mingo, bp, dave.hansen, hpa, will, catalin.marinas
In-Reply-To: <1690487690-2428-2-git-send-email-nunodasneves@linux.microsoft.com>
On Thu, Jul 27, 2023 at 12:54:36PM -0700, Nuno Das Neves wrote:
> In x86 hyperv-tlfs, HV_REGISTER_ prefix is used to indicate MSRs
> accessed via rdmsrl/wrmsrl. But in ARM64, HV_REGISTER_ instead indicates
> VP registers accessed via get/set vp registers hypercall.
>
> This is due to HV_REGISTER_* names being used by hv_set/get_register,
> with the arch-specific version delegating to the appropriate mechanism.
>
> The problem is, using prefix HV_REGISTER_ for MSRs will conflict with
> VP registers when they are introduced for x86 in future.
>
> This patch solves the issue by:
>
> 1. Defining all the x86 MSRs with a consistent prefix: HV_X64_MSR_.
> This is so HV_REGISTER_ can be reserved for VP registers.
>
> 2. Change the non-arch-specific alias used by hv_set/get_register to
> HV_MSR_. This is also happens to be the same name HyperV uses for this
> purpose.
>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Assuming the prior (private?) discussion with Michael led to this patch:
Acked-by: Wei Liu <wei.liu@kernel.org>
^ permalink raw reply
* Re: [PATCH V6 net-next] net: mana: Configure hwc timeout from hardware
From: Jesse Brandeburg @ 2023-08-03 1:35 UTC (permalink / raw)
To: Souradeep Chakrabarti, kys, haiyangz, wei.liu, decui, davem,
edumazet, kuba, pabeni, longli, sharmaajay, leon, cai.huoqing,
ssengar, vkuznets, tglx, linux-hyperv, netdev, linux-kernel,
linux-rdma
Cc: schakrabarti
In-Reply-To: <1690974460-15660-1-git-send-email-schakrabarti@linux.microsoft.com>
On 8/2/2023 4:07 AM, Souradeep Chakrabarti wrote:
> At present hwc timeout value is a fixed value. This patch sets the hwc
> timeout from the hardware. It now uses a new hardware capability
> GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG to query and set the value
> in hwc_timeout.
>
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
Looks sane, thanks!
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
For future patches please use imperative mood for your patch
descriptions, no "This patch" [1]
[1]
https://docs.kernel.org/process/submitting-patches.html?highlight=imperative+mood#:~:text=Describe%20your%20changes%20in%20imperative%20mood%2C%20e.g.%20%22make%20xyzzy%20do%20frotz%22%20instead%20of%20%22%5BThis%20patch%5D%20makes%20xyzzy%20do%20frotz%22%20or%20%22%5BI%5D%20changed%20xyzzy%20to%20do%20frotz%22%2C%20as%20if%20you%20are%20giving%20orders%20to%20the%20codebase%20to%20change%20its%20behaviour.
^ permalink raw reply
* Re: [PATCH V5,net-next] net: mana: Add page pool for RX buffers
From: Jesse Brandeburg @ 2023-08-03 1:44 UTC (permalink / raw)
To: Haiyang Zhang, linux-hyperv, netdev
Cc: decui, kys, paulros, olaf, vkuznets, davem, wei.liu, edumazet,
kuba, pabeni, leon, longli, ssengar, linux-rdma, daniel,
john.fastabend, bpf, ast, sharmaajay, hawk, tglx, shradhagupta,
linux-kernel
In-Reply-To: <1690999650-9557-1-git-send-email-haiyangz@microsoft.com>
On 8/2/2023 11:07 AM, Haiyang Zhang wrote:
> Add page pool for RX buffers for faster buffer cycle and reduce CPU
> usage.
>
> The standard page pool API is used.
>
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> ---
> V5:
> In err path, set page_pool_put_full_page(..., false) as suggested by
> Jakub Kicinski
> V4:
> Add nid setting, remove page_pool_nid_changed(), as suggested by
> Jesper Dangaard Brouer
> V3:
> Update xdp mem model, pool param, alloc as suggested by Jakub Kicinski
> V2:
> Use the standard page pool API as suggested by Jesper Dangaard Brouer
> ---
> diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
> index 024ad8ddb27e..b12859511839 100644
> --- a/include/net/mana/mana.h
> +++ b/include/net/mana/mana.h
> @@ -280,6 +280,7 @@ struct mana_recv_buf_oob {
> struct gdma_wqe_request wqe_req;
>
> void *buf_va;
> + bool from_pool; /* allocated from a page pool */
suggest you use flags and not bools, as bools waste 7 bits each, plus
your packing of this struct will be full of holes, made worse by this
patch. (see pahole tool)
>
> /* SGL of the buffer going to be sent has part of the work request. */
> u32 num_sge;
> @@ -330,6 +331,8 @@ struct mana_rxq {
> bool xdp_flush;
> int xdp_rc; /* XDP redirect return code */
>
> + struct page_pool *page_pool;
> +
> /* MUST BE THE LAST MEMBER:
> * Each receive buffer has an associated mana_recv_buf_oob.
> */
The rest of the patch looks ok and is remarkably compact for a
conversion to page pool. I'd prefer someone with more page pool exposure
review this for correctness, but FWIW
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
^ permalink raw reply
* RE: [PATCH v3 net] net: hv_netvsc: fix netvsc_send_completion to avoid multiple message length checks
From: Michael Kelley (LINUX) @ 2023-08-03 2:04 UTC (permalink / raw)
To: Sonia Sharma, linux-kernel@vger.kernel.org
Cc: linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
Sonia Sharma, KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org,
Dexuan Cui, Long Li, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com
In-Reply-To: <1691023528-5270-1-git-send-email-sosha@linux.microsoft.com>
From: Sonia Sharma <sosha@linux.microsoft.com> Sent: Wednesday, August 2, 2023 5:45 PM
>
> The switch statement in netvsc_send_completion() is incorrectly validating
> the length of incoming network packets by falling through to the next case.
> Avoid the fallthrough. Instead break after a case match and then process
> the complete() call.
>
> Signed-off-by: Sonia Sharma <sonia.sharma@linux.microsoft.com>
> ---
> Changes in v3:
> * added return statement in default case as pointed by Michael Kelley..
> ---
> drivers/net/hyperv/netvsc.c | 18 ++++++++++--------
> 1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index 82e9796c8f5e..0f7e4d377776 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -851,7 +851,7 @@ static void netvsc_send_completion(struct net_device *ndev,
> msglen);
> return;
> }
> - fallthrough;
> + break;
>
> case NVSP_MSG1_TYPE_SEND_RECV_BUF_COMPLETE:
> if (msglen < sizeof(struct nvsp_message_header) +
> @@ -860,7 +860,7 @@ static void netvsc_send_completion(struct net_device *ndev,
> msglen);
> return;
> }
> - fallthrough;
> + break;
>
> case NVSP_MSG1_TYPE_SEND_SEND_BUF_COMPLETE:
> if (msglen < sizeof(struct nvsp_message_header) +
> @@ -869,7 +869,7 @@ static void netvsc_send_completion(struct net_device *ndev,
> msglen);
> return;
> }
> - fallthrough;
> + break;
>
> case NVSP_MSG5_TYPE_SUBCHANNEL:
> if (msglen < sizeof(struct nvsp_message_header) +
> @@ -878,10 +878,6 @@ static void netvsc_send_completion(struct net_device *ndev,
> msglen);
> return;
> }
> - /* Copy the response back */
> - memcpy(&net_device->channel_init_pkt, nvsp_packet,
> - sizeof(struct nvsp_message));
> - complete(&net_device->channel_init_wait);
> break;
>
> case NVSP_MSG1_TYPE_SEND_RNDIS_PKT_COMPLETE:
> @@ -904,13 +900,19 @@ static void netvsc_send_completion(struct net_device
> *ndev,
>
> netvsc_send_tx_complete(ndev, net_device, incoming_channel,
> desc, budget);
> - break;
> + return;
>
> default:
> netdev_err(ndev,
> "Unknown send completion type %d received!!\n",
> nvsp_packet->hdr.msg_type);
> + return;
> }
> +
> + /* Copy the response back */
> + memcpy(&net_device->channel_init_pkt, nvsp_packet,
> + sizeof(struct nvsp_message));
> + complete(&net_device->channel_init_wait);
> }
>
> static u32 netvsc_get_next_send_section(struct netvsc_device *net_device)
> --
> 2.25.1
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
^ permalink raw reply
* RE: [PATCH v3 2/3] tools: hv: Add vmbus_bufring
From: Saurabh Singh Sengar @ 2023-08-03 12:06 UTC (permalink / raw)
To: Michael Kelley (LINUX), Saurabh Sengar, KY Srinivasan,
Haiyang Zhang, wei.liu@kernel.org, Dexuan Cui,
gregkh@linuxfoundation.org, corbet@lwn.net,
linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
linux-doc@vger.kernel.org
In-Reply-To: <BYAPR21MB1688F11EF5924D8D69F97203D70BA@BYAPR21MB1688.namprd21.prod.outlook.com>
> -----Original Message-----
> From: Michael Kelley (LINUX) <mikelley@microsoft.com>
> Sent: Thursday, August 3, 2023 3:14 AM
> To: Saurabh Sengar <ssengar@linux.microsoft.com>; KY Srinivasan
> <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>;
> wei.liu@kernel.org; Dexuan Cui <decui@microsoft.com>;
> gregkh@linuxfoundation.org; corbet@lwn.net; linux-kernel@vger.kernel.org;
> linux-hyperv@vger.kernel.org; linux-doc@vger.kernel.org
> Subject: [EXTERNAL] RE: [PATCH v3 2/3] tools: hv: Add vmbus_bufring
>
> From: Saurabh Sengar <ssengar@linux.microsoft.com> Sent: Friday, July 14,
> 2023 3:26 AM
> >
> > Provide a userspace interface for userspace drivers or applications to
> > read/write a VMBus ringbuffer. A significant part of this code is
> > borrowed from DPDK[1]. Current library is supported exclusively for
> > the x86 architecture.
> >
> > To build this library:
> > make -C tools/hv libvmbus_bufring.a
> >
> > Applications using this library can include the vmbus_bufring.h header
> > file and libvmbus_bufring.a statically.
> >
> > [1]
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
> >
> ub.com%2FDPDK%2Fdpdk%2F&data=05%7C01%7Cssengar%40microsoft.com
> %7C7aa6d
> >
> 4dbbcb44895db5008db93a193c9%7C72f988bf86f141af91ab2d7cd011db47%7
> C1%7C0
> >
> %7C638266094508922046%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLj
> AwMDAiLCJQ
> >
> IjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata
> =O0cvl
> > EWlbNS51VoaBHo5l2wWDDjAFJVdfDeT3t%2FR36Y%3D&reserved=0
> >
> > Signed-off-by: Mary Hardy <maryhardy@microsoft.com>
> > Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> > ---
> > [V3]
> > - Made ring buffer data offset depend on page size
> > - remove rte_smp_rwmb macro and reused rte_compiler_barrier instead
> > - Added legal counsel sign-off
> > - Removed "Link:" tag
> > - Improve commit messages
> > - new library compilation dependent on x86
> > - simplify mmap
> >
> > [V2]
> > - simpler sysfs path, less parsing
> >
> > tools/hv/Build | 1 +
> > tools/hv/Makefile | 13 +-
> > tools/hv/vmbus_bufring.c | 297
> > +++++++++++++++++++++++++++++++++++++++
> > tools/hv/vmbus_bufring.h | 154 ++++++++++++++++++++
> > 4 files changed, 464 insertions(+), 1 deletion(-) create mode 100644
> > tools/hv/vmbus_bufring.c create mode 100644 tools/hv/vmbus_bufring.h
> >
> > diff --git a/tools/hv/Build b/tools/hv/Build index
> > 6cf51fa4b306..2a667d3d94cb 100644
> > --- a/tools/hv/Build
> > +++ b/tools/hv/Build
> > @@ -1,3 +1,4 @@
> > hv_kvp_daemon-y += hv_kvp_daemon.o
> > hv_vss_daemon-y += hv_vss_daemon.o
> > hv_fcopy_daemon-y += hv_fcopy_daemon.o
> > +vmbus_bufring-y += vmbus_bufring.o
> > diff --git a/tools/hv/Makefile b/tools/hv/Makefile index
> > fe770e679ae8..33cf488fd20f 100644
> > --- a/tools/hv/Makefile
> > +++ b/tools/hv/Makefile
> > @@ -11,14 +11,19 @@ srctree := $(patsubst %/,%,$(dir $(CURDIR)))
> > srctree := $(patsubst %/,%,$(dir $(srctree))) endif
> >
> > +include $(srctree)/tools/scripts/Makefile.arch
> > +
> > # Do not use make's built-in rules
> > # (this improves performance and avoids hard-to-debug behaviour);
> > MAKEFLAGS += -r
> >
> > override CFLAGS += -O2 -Wall -g -D_GNU_SOURCE -I$(OUTPUT)include
> >
> > +ifeq ($(SRCARCH),x86)
> > +ALL_LIBS := libvmbus_bufring.a
> > +endif
> > ALL_TARGETS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
> > -ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))
> > +ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS)) $(patsubst
> > %,$(OUTPUT)%,$(ALL_LIBS))
> >
> > ALL_SCRIPTS := hv_get_dhcp_info.sh hv_get_dns_info.sh
> > hv_set_ifconfig.sh
> >
> > @@ -27,6 +32,12 @@ all: $(ALL_PROGRAMS) export srctree OUTPUT CC LD
> > CFLAGS include $(srctree)/tools/build/Makefile.include
> >
> > +HV_VMBUS_BUFRING_IN := $(OUTPUT)vmbus_bufring.o
> > +$(HV_VMBUS_BUFRING_IN): FORCE
> > + $(Q)$(MAKE) $(build)=vmbus_bufring
> > +$(OUTPUT)libvmbus_bufring.a : vmbus_bufring.o
> > + $(AR) rcs $@ $^
> > +
> > HV_KVP_DAEMON_IN := $(OUTPUT)hv_kvp_daemon-in.o
> > $(HV_KVP_DAEMON_IN): FORCE
> > $(Q)$(MAKE) $(build)=hv_kvp_daemon
> > diff --git a/tools/hv/vmbus_bufring.c b/tools/hv/vmbus_bufring.c new
> > file mode 100644 index 000000000000..fb1f0489c625
> > --- /dev/null
> > +++ b/tools/hv/vmbus_bufring.c
> > @@ -0,0 +1,297 @@
> > +// SPDX-License-Identifier: BSD-3-Clause
> > +/*
> > + * Copyright (c) 2009-2012,2016,2023 Microsoft Corp.
> > + * Copyright (c) 2012 NetApp Inc.
> > + * Copyright (c) 2012 Citrix Inc.
> > + * All rights reserved.
> > + */
> > +
> > +#include <errno.h>
> > +#include <emmintrin.h>
> > +#include <stdio.h>
> > +#include <string.h>
> > +#include <unistd.h>
> > +#include <sys/uio.h>
> > +#include "vmbus_bufring.h"
> > +
> > +#define rte_compiler_barrier() ({ asm volatile ("" : : : "memory"); })
> > +#define RINGDATA_START_OFFSET (getpagesize())
> > +#define VMBUS_RQST_ERROR 0xFFFFFFFFFFFFFFFF
> > +#define ALIGN(val, align) ((typeof(val))((val) & (~((typeof(val))((align) -
> 1)))))
> > +
> > +/* Increase bufring index by inc with wraparound */ static inline
> > +uint32_t vmbus_br_idxinc(uint32_t idx, uint32_t inc, uint32_t sz) {
> > + idx += inc;
> > + if (idx >= sz)
> > + idx -= sz;
> > +
> > + return idx;
> > +}
> > +
> > +void vmbus_br_setup(struct vmbus_br *br, void *buf, unsigned int
> > +blen) {
> > + br->vbr = buf;
> > + br->windex = br->vbr->windex;
> > + br->dsize = blen - RINGDATA_START_OFFSET; }
> > +
> > +static inline __always_inline void
> > +rte_smp_mb(void)
> > +{
> > + asm volatile("lock addl $0, -128(%%rsp); " ::: "memory"); }
> > +
> > +static inline int
> > +rte_atomic32_cmpset(volatile uint32_t *dst, uint32_t exp, uint32_t
> > +src) {
> > + uint8_t res;
> > +
> > + asm volatile("lock ; "
> > + "cmpxchgl %[src], %[dst];"
> > + "sete %[res];"
> > + : [res] "=a" (res), /* output */
> > + [dst] "=m" (*dst)
> > + : [src] "r" (src), /* input */
> > + "a" (exp),
> > + "m" (*dst)
> > + : "memory"); /* no-clobber list */
> > + return res;
> > +}
> > +
> > +static inline uint32_t
> > +vmbus_txbr_copyto(const struct vmbus_br *tbr, uint32_t windex,
> > + const void *src0, uint32_t cplen) {
> > + uint8_t *br_data = (uint8_t *)tbr->vbr + RINGDATA_START_OFFSET;
> > + uint32_t br_dsize = tbr->dsize;
> > + const uint8_t *src = src0;
> > +
> > + if (cplen > br_dsize - windex) {
> > + uint32_t fraglen = br_dsize - windex;
> > +
> > + /* Wrap-around detected */
> > + memcpy(br_data + windex, src, fraglen);
> > + memcpy(br_data, src + fraglen, cplen - fraglen);
> > + } else {
> > + memcpy(br_data + windex, src, cplen);
> > + }
> > +
> > + return vmbus_br_idxinc(windex, cplen, br_dsize); }
> > +
> > +/*
> > + * Write scattered channel packet to TX bufring.
> > + *
> > + * The offset of this channel packet is written as a 64bits value
> > + * immediately after this channel packet.
> > + *
> > + * The write goes through three stages:
> > + * 1. Reserve space in ring buffer for the new data.
> > + * Writer atomically moves priv_write_index.
> > + * 2. Copy the new data into the ring.
> > + * 3. Update the tail of the ring (visible to host) that indicates
> > + * next read location. Writer updates write_index
> > + */
> > +static int
> > +vmbus_txbr_write(struct vmbus_br *tbr, const struct iovec iov[], int iovlen,
> > + bool *need_sig)
> > +{
> > + struct vmbus_bufring *vbr = tbr->vbr;
> > + uint32_t ring_size = tbr->dsize;
> > + uint32_t old_windex, next_windex, windex, total;
> > + uint64_t save_windex;
> > + int i;
> > +
> > + total = 0;
> > + for (i = 0; i < iovlen; i++)
> > + total += iov[i].iov_len;
> > + total += sizeof(save_windex);
> > +
> > + /* Reserve space in ring */
> > + do {
> > + uint32_t avail;
> > +
> > + /* Get current free location */
> > + old_windex = tbr->windex;
> > +
> > + /* Prevent compiler reordering this with calculation */
> > + rte_compiler_barrier();
> > +
> > + avail = vmbus_br_availwrite(tbr, old_windex);
> > +
> > + /* If not enough space in ring, then tell caller. */
> > + if (avail <= total)
> > + return -EAGAIN;
> > +
> > + next_windex = vmbus_br_idxinc(old_windex, total, ring_size);
> > +
> > + /* Atomic update of next write_index for other threads */
> > + } while (!rte_atomic32_cmpset(&tbr->windex, old_windex,
> > +next_windex));
> > +
> > + /* Space from old..new is now reserved */
> > + windex = old_windex;
> > + for (i = 0; i < iovlen; i++)
> > + windex = vmbus_txbr_copyto(tbr, windex, iov[i].iov_base,
> > +iov[i].iov_len);
> > +
> > + /* Set the offset of the current channel packet. */
> > + save_windex = ((uint64_t)old_windex) << 32;
> > + windex = vmbus_txbr_copyto(tbr, windex, &save_windex,
> > + sizeof(save_windex));
> > +
> > + /* The region reserved should match region used */
> > + if (windex != next_windex)
> > + return -EINVAL;
> > +
> > + /* Ensure that data is available before updating host index */
> > + rte_compiler_barrier();
> > +
> > + /* Checkin for our reservation. wait for our turn to update host */
> > + while (!rte_atomic32_cmpset(&vbr->windex, old_windex,
> next_windex))
> > + _mm_pause();
> > +
> > + return 0;
> > +}
> > +
> > +int rte_vmbus_chan_send(struct vmbus_br *txbr, uint16_t type, void
> *data,
> > + uint32_t dlen, uint32_t flags)
> > +{
> > + struct vmbus_chanpkt pkt;
> > + unsigned int pktlen, pad_pktlen;
> > + const uint32_t hlen = sizeof(pkt);
> > + bool send_evt = false;
> > + uint64_t pad = 0;
> > + struct iovec iov[3];
> > + int error;
> > +
> > + pktlen = hlen + dlen;
> > + pad_pktlen = ALIGN(pktlen, sizeof(uint64_t));
>
> This ALIGN function rounds down. So pad_pktlen could be less than pktlen.
Thanks for pointing this, will fix.
>
> > +
> > + pkt.hdr.type = type;
> > + pkt.hdr.flags = flags;
> > + pkt.hdr.hlen = hlen >> VMBUS_CHANPKT_SIZE_SHIFT;
> > + pkt.hdr.tlen = pad_pktlen >> VMBUS_CHANPKT_SIZE_SHIFT;
> > + pkt.hdr.xactid = VMBUS_RQST_ERROR; /* doesn't support multiple
> > +requests at same time */
> > +
> > + iov[0].iov_base = &pkt;
> > + iov[0].iov_len = hlen;
> > + iov[1].iov_base = data;
> > + iov[1].iov_len = dlen;
> > + iov[2].iov_base = &pad;
> > + iov[2].iov_len = pad_pktlen - pktlen;
>
> Given the way your ALIGN function works, the above could produce a
> negative value for iov[2].iov_len. Then bad things will happen. :-(
Got it.
>
> > +
> > + error = vmbus_txbr_write(txbr, iov, 3, &send_evt);
> > +
> > + return error;
> > +}
> > +
<snip>
> > + * we can request that the receiver interrupt the sender
> > + * when the ring transitions from being full to being able
> > + * to handle a message of size "pending_send_sz".
> > + *
> > + * Add necessary state for this enhancement.
> > + */
> > + volatile uint32_t pending_send;
> > + uint32_t reserved1[12];
> > +
> > + union {
> > + struct {
> > + uint32_t feat_pending_send_sz:1;
> > + };
> > + uint32_t value;
> > + } feature_bits;
> > +
> > + /*
> > + * Ring data starts here + RingDataStartOffset
>
> This mention of RingDataStartOffset looks stale. I could not find it defined
> anywhere.
Will correct it to:
Ring data starts after PAGE_SIZE offset from the start of this struct (RINGDATA_START_OFFSET).
- Saurabh
^ permalink raw reply
* RE: [PATCH v3 3/3] tools: hv: Add new fcopy application based on uio driver
From: Saurabh Singh Sengar @ 2023-08-03 12:12 UTC (permalink / raw)
To: Michael Kelley (LINUX), Saurabh Sengar, KY Srinivasan,
Haiyang Zhang, wei.liu@kernel.org, Dexuan Cui,
gregkh@linuxfoundation.org, corbet@lwn.net,
linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
linux-doc@vger.kernel.org
In-Reply-To: <BYAPR21MB16884ABB3DAD7B8213EF5D6DD70BA@BYAPR21MB1688.namprd21.prod.outlook.com>
> -----Original Message-----
> From: Michael Kelley (LINUX) <mikelley@microsoft.com>
> Sent: Thursday, August 3, 2023 3:15 AM
> To: Saurabh Sengar <ssengar@linux.microsoft.com>; KY Srinivasan
> <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>;
> wei.liu@kernel.org; Dexuan Cui <decui@microsoft.com>;
> gregkh@linuxfoundation.org; corbet@lwn.net; linux-kernel@vger.kernel.org;
> linux-hyperv@vger.kernel.org; linux-doc@vger.kernel.org
> Subject: [EXTERNAL] RE: [PATCH v3 3/3] tools: hv: Add new fcopy application
> based on uio driver
>
> From: Saurabh Sengar <ssengar@linux.microsoft.com> Sent: Friday, July 14,
> 2023 3:26 AM
> >
> > Implement the file copy service for Linux guests on Hyper-V. This
> > permits the host to copy a file (over VMBus) into the guest. This
> > facility is part of "guest integration services" supported on the
> > Hyper-V platform.
> >
> > Here is a link that provides additional details on this functionality:
> >
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Flear
> > n.microsoft.com%2Fen-us%2Fpowershell%2Fmodule%2Fhyper-v%2Fcopy-
> vmfile%
> > 3Fview%3Dwindowsserver2022-
> ps&data=05%7C01%7Cssengar%40microsoft.com%7
> >
> Ca5edc1b9d6574e2e6e3108db93a1c558%7C72f988bf86f141af91ab2d7cd011
> db47%7
> >
> C1%7C0%7C638266095311741847%7CUnknown%7CTWFpbGZsb3d8eyJWIjoi
> MC4wLjAwMD
> >
> AiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C
> &sdata
> > =VudSwIKYJJJgPKxNbbfCnOjia1lfKCdijnSn94OWm8Q%3D&reserved=0
> >
> > This new fcopy application uses uio_hv_vmbus_client driver which makes
> > the earlier hv_util based driver and application obsolete.
> >
> > Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> > ---
> > [V3]
> > - Improve cover letter and commit messages
> > - Improve debug prints
> > - Instead of hardcoded instance id, query from class id sysfs
> > - Set the ring_size value from application
> > - Update the application to mmap /dev/uio instead of sysfs
> > - new application compilation dependent on x86
> >
> > [V2]
> > - simpler sysfs path
> >
> > tools/hv/Build | 1 +
> > tools/hv/Makefile | 10 +-
> > tools/hv/hv_fcopy_uio_daemon.c | 578
> > +++++++++++++++++++++++++++++++++
> > 3 files changed, 588 insertions(+), 1 deletion(-) create mode 100644
> > tools/hv/hv_fcopy_uio_daemon.c
> >
> > diff --git a/tools/hv/Build b/tools/hv/Build index
> > 2a667d3d94cb..efcbb74a0d23 100644
> > --- a/tools/hv/Build
> > +++ b/tools/hv/Build
> > @@ -2,3 +2,4 @@ hv_kvp_daemon-y += hv_kvp_daemon.o
> hv_vss_daemon-y +=
> > hv_vss_daemon.o hv_fcopy_daemon-y += hv_fcopy_daemon.o
> > vmbus_bufring-y += vmbus_bufring.o
> > +hv_fcopy_uio_daemon-y += hv_fcopy_uio_daemon.o
> > diff --git a/tools/hv/Makefile b/tools/hv/Makefile index
> > 33cf488fd20f..678c6c450a53 100644
> > --- a/tools/hv/Makefile
> > +++ b/tools/hv/Makefile
> > @@ -21,8 +21,10 @@ override CFLAGS += -O2 -Wall -g -D_GNU_SOURCE -
> > I$(OUTPUT)include
> >
> > ifeq ($(SRCARCH),x86)
> > ALL_LIBS := libvmbus_bufring.a
> > -endif
> > +ALL_TARGETS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
> > hv_fcopy_uio_daemon
> > +else
> > ALL_TARGETS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
> > +endif
> > ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS)) $(patsubst
> > %,$(OUTPUT)%,$(ALL_LIBS))
> >
> > ALL_SCRIPTS := hv_get_dhcp_info.sh hv_get_dns_info.sh
> > hv_set_ifconfig.sh @@ -56,6 +58,12 @@ $(HV_FCOPY_DAEMON_IN):
> FORCE
> > $(OUTPUT)hv_fcopy_daemon: $(HV_FCOPY_DAEMON_IN)
> > $(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
> >
> > +HV_FCOPY_UIO_DAEMON_IN := $(OUTPUT)hv_fcopy_uio_daemon-in.o
> > +$(HV_FCOPY_UIO_DAEMON_IN): FORCE
> > + $(Q)$(MAKE) $(build)=hv_fcopy_uio_daemon
> > +$(OUTPUT)hv_fcopy_uio_daemon: $(HV_FCOPY_UIO_DAEMON_IN)
> > libvmbus_bufring.a
> > + $(QUIET_LINK)$(CC) -lm $< -L. -lvmbus_bufring -o $@
> > +
> > clean:
> > rm -f $(ALL_PROGRAMS)
> > find $(or $(OUTPUT),.) -name '*.o' -delete -o -name '\.*.d' -delete
> > diff --git a/tools/hv/hv_fcopy_uio_daemon.c
> > b/tools/hv/hv_fcopy_uio_daemon.c new file mode 100644 index
> > 000000000000..e8618a30dc7e
> > --- /dev/null
> > +++ b/tools/hv/hv_fcopy_uio_daemon.c
> > @@ -0,0 +1,578 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * An implementation of host to guest copy functionality for Linux.
> > + *
> > + * Copyright (C) 2023, Microsoft, Inc.
> > + *
> > + * Author : K. Y. Srinivasan <kys@microsoft.com>
> > + * Author : Saurabh Sengar <ssengar@microsoft.com>
> > + *
> > + */
> > +
> > +#include <dirent.h>
> > +#include <errno.h>
> > +#include <fcntl.h>
> > +#include <getopt.h>
> > +#include <locale.h>
> > +#include <stdbool.h>
> > +#include <stddef.h>
> > +#include <stdint.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <syslog.h>
> > +#include <unistd.h>
> > +#include <sys/mman.h>
> > +#include <sys/stat.h>
> > +#include <linux/hyperv.h>
> > +#include "vmbus_bufring.h"
> > +
> > +#define ICMSGTYPE_NEGOTIATE 0
> > +#define ICMSGTYPE_FCOPY 7
> > +
> > +#define WIN8_SRV_MAJOR 1
> > +#define WIN8_SRV_MINOR 1
> > +#define WIN8_SRV_VERSION (WIN8_SRV_MAJOR << 16 |
> WIN8_SRV_MINOR)
> > +
> > +#define MAX_PATH_LEN 300
> > +#define MAX_LINE_LEN 40
> > +#define DEVICES_SYSFS "/sys/bus/vmbus/devices"
> > +#define FCOPY_CLASS_ID "34d14be3-dee4-41c8-9ae7-
> 6b174977c192"
> > +
> > +#define FCOPY_VER_COUNT 1
> > +static const int fcopy_versions[] = {
> > + WIN8_SRV_VERSION
> > +};
> > +
> > +#define FW_VER_COUNT 1
> > +static const int fw_versions[] = {
> > + UTIL_FW_VERSION
> > +};
> > +
> > +#define HV_RING_SIZE (4 * 4096)
> > +
> > +unsigned char desc[HV_RING_SIZE];
> > +
> > +static int target_fd;
> > +static char target_fname[PATH_MAX];
> > +static unsigned long long filesize;
> > +
> > +static int hv_fcopy_create_file(char *file_name, char *path_name,
> > +__u32 flags) {
> > + int error = HV_E_FAIL;
> > + char *q, *p;
> > +
> > + filesize = 0;
> > + p = (char *)path_name;
> > + snprintf(target_fname, sizeof(target_fname), "%s/%s",
> > + (char *)path_name, (char *)file_name);
> > +
> > + /*
> > + * Check to see if the path is already in place; if not,
> > + * create if required.
> > + */
> > + while ((q = strchr(p, '/')) != NULL) {
> > + if (q == p) {
> > + p++;
> > + continue;
> > + }
> > + *q = '\0';
> > + if (access(path_name, F_OK)) {
> > + if (flags & CREATE_PATH) {
> > + if (mkdir(path_name, 0755)) {
> > + syslog(LOG_ERR, "Failed to create
> %s",
> > + path_name);
> > + goto done;
> > + }
> > + } else {
> > + syslog(LOG_ERR, "Invalid path: %s",
> path_name);
> > + goto done;
> > + }
> > + }
> > + p = q + 1;
> > + *q = '/';
> > + }
> > +
> > + if (!access(target_fname, F_OK)) {
> > + syslog(LOG_INFO, "File: %s exists", target_fname);
> > + if (!(flags & OVER_WRITE)) {
> > + error = HV_ERROR_ALREADY_EXISTS;
> > + goto done;
> > + }
> > + }
> > +
> > + target_fd = open(target_fname,
> > + O_RDWR | O_CREAT | O_TRUNC | O_CLOEXEC,
> 0744);
> > + if (target_fd == -1) {
> > + syslog(LOG_INFO, "Open Failed: %s", strerror(errno));
> > + goto done;
> > + }
> > +
> > + error = 0;
> > +done:
> > + if (error)
> > + target_fname[0] = '\0';
> > + return error;
> > +}
> > +
> > +static int hv_copy_data(struct hv_do_fcopy *cpmsg) {
> > + ssize_t bytes_written;
> > + int ret = 0;
> > +
> > + bytes_written = pwrite(target_fd, cpmsg->data, cpmsg->size,
> > + cpmsg->offset);
> > +
> > + filesize += cpmsg->size;
> > + if (bytes_written != cpmsg->size) {
> > + switch (errno) {
> > + case ENOSPC:
> > + ret = HV_ERROR_DISK_FULL;
> > + break;
> > + default:
> > + ret = HV_E_FAIL;
> > + break;
> > + }
> > + syslog(LOG_ERR, "pwrite failed to write %llu bytes: %ld (%s)",
> > + filesize, (long)bytes_written, strerror(errno));
> > + }
> > +
> > + return ret;
> > +}
> > +
> > +/*
> > + * Reset target_fname to "" in the two below functions for
> > +hibernation: if
> > + * the fcopy operation is aborted by hibernation, the daemon should
> > +remove the
> > + * partially-copied file; to achieve this, the hv_utils driver always
> > +fakes a
> > + * CANCEL_FCOPY message upon suspend, and later when the VM
> resumes
> > +back,
> > + * the daemon calls hv_copy_cancel() to remove the file; if a file is
> > +copied
> > + * successfully before suspend, hv_copy_finished() must reset
> > +target_fname to
> > + * avoid that the file can be incorrectly removed upon resume, since
> > +the faked
> > + * CANCEL_FCOPY message is spurious in this case.
> > + */
> > +static int hv_copy_finished(void)
> > +{
> > + close(target_fd);
> > + target_fname[0] = '\0';
> > + return 0;
> > +}
> > +
> > +static void print_usage(char *argv[]) {
> > + fprintf(stderr, "Usage: %s [options]\n"
> > + "Options are:\n"
> > + " -n, --no-daemon stay in foreground, don't
> daemonize\n"
> > + " -h, --help print this help\n", argv[0]);
> > +}
> > +
> > +static bool vmbus_prep_negotiate_resp(struct icmsg_hdr *icmsghdrp,
> unsigned char *buf,
> > + unsigned int buflen, const int *fw_version,
> int fw_vercnt,
> > + const int *srv_version, int srv_vercnt,
> > + int *nego_fw_version, int *nego_srv_version)
> {
> > + int icframe_major, icframe_minor;
> > + int icmsg_major, icmsg_minor;
> > + int fw_major, fw_minor;
> > + int srv_major, srv_minor;
> > + int i, j;
> > + bool found_match = false;
> > + struct icmsg_negotiate *negop;
> > +
> > + /* Check that there's enough space for icframe_vercnt, icmsg_vercnt
> */
> > + if (buflen < ICMSG_HDR + offsetof(struct icmsg_negotiate, reserved)) {
> > + syslog(LOG_ERR, "Invalid icmsg negotiate");
> > + return false;
> > + }
> > +
> > + icmsghdrp->icmsgsize = 0x10;
> > + negop = (struct icmsg_negotiate *)&buf[ICMSG_HDR];
> > +
> > + icframe_major = negop->icframe_vercnt;
> > + icframe_minor = 0;
> > +
> > + icmsg_major = negop->icmsg_vercnt;
> > + icmsg_minor = 0;
> > +
> > + /* Validate negop packet */
> > + if (icframe_major > IC_VERSION_NEGOTIATION_MAX_VER_COUNT ||
> > + icmsg_major > IC_VERSION_NEGOTIATION_MAX_VER_COUNT ||
> > + ICMSG_NEGOTIATE_PKT_SIZE(icframe_major, icmsg_major) >
> buflen) {
> > + syslog(LOG_ERR, "Invalid icmsg negotiate - icframe_major:
> %u, icmsg_major: %u\n",
> > + icframe_major, icmsg_major);
> > + goto fw_error;
> > + }
> > +
> > + /*
> > + * Select the framework version number we will
> > + * support.
> > + */
> > +
> > + for (i = 0; i < fw_vercnt; i++) {
> > + fw_major = (fw_version[i] >> 16);
> > + fw_minor = (fw_version[i] & 0xFFFF);
> > +
> > + for (j = 0; j < negop->icframe_vercnt; j++) {
> > + if (negop->icversion_data[j].major == fw_major &&
> > + negop->icversion_data[j].minor == fw_minor) {
> > + icframe_major = negop-
> >icversion_data[j].major;
> > + icframe_minor = negop-
> >icversion_data[j].minor;
> > + found_match = true;
> > + break;
> > + }
> > + }
> > +
> > + if (found_match)
> > + break;
> > + }
> > +
> > + if (!found_match)
> > + goto fw_error;
> > +
> > + found_match = false;
> > +
> > + for (i = 0; i < srv_vercnt; i++) {
> > + srv_major = (srv_version[i] >> 16);
> > + srv_minor = (srv_version[i] & 0xFFFF);
> > +
> > + for (j = negop->icframe_vercnt;
> > + (j < negop->icframe_vercnt + negop->icmsg_vercnt);
> > + j++) {
> > + if (negop->icversion_data[j].major == srv_major &&
> > + negop->icversion_data[j].minor == srv_minor) {
> > + icmsg_major = negop-
> >icversion_data[j].major;
> > + icmsg_minor = negop-
> >icversion_data[j].minor;
> > + found_match = true;
> > + break;
> > + }
> > + }
> > +
> > + if (found_match)
> > + break;
> > + }
> > +
> > + /*
> > + * Respond with the framework and service
> > + * version numbers we can support.
> > + */
> > +fw_error:
> > + if (!found_match) {
> > + negop->icframe_vercnt = 0;
> > + negop->icmsg_vercnt = 0;
> > + } else {
> > + negop->icframe_vercnt = 1;
> > + negop->icmsg_vercnt = 1;
> > + }
> > +
> > + if (nego_fw_version)
> > + *nego_fw_version = (icframe_major << 16) | icframe_minor;
> > +
> > + if (nego_srv_version)
> > + *nego_srv_version = (icmsg_major << 16) | icmsg_minor;
> > +
> > + negop->icversion_data[0].major = icframe_major;
> > + negop->icversion_data[0].minor = icframe_minor;
> > + negop->icversion_data[1].major = icmsg_major;
> > + negop->icversion_data[1].minor = icmsg_minor;
> > +
> > + return found_match;
> > +}
> > +
> > +static void wcstoutf8(char *dest, const __u16 *src, size_t dest_size)
> > +{
> > + size_t len = 0;
> > +
> > + while (len < dest_size) {
> > + if (src[len] < 0x80)
> > + dest[len++] = (char)(*src++);
> > + else
> > + dest[len++] = 'X';
> > + }
> > +
> > + dest[len] = '\0';
> > +}
> > +
> > +static int hv_fcopy_start(struct hv_start_fcopy *smsg_in) {
> > + setlocale(LC_ALL, "en_US.utf8");
> > + size_t file_size, path_size;
> > + char *file_name, *path_name;
> > + char *in_file_name = (char *)smsg_in->file_name;
> > + char *in_path_name = (char *)smsg_in->path_name;
> > +
> > + file_size = wcstombs(NULL, (const wchar_t *restrict)in_file_name, 0) +
> 1;
> > + path_size = wcstombs(NULL, (const wchar_t *restrict)in_path_name,
> 0)
> > ++ 1;
> > +
> > + file_name = (char *)malloc(file_size * sizeof(char));
> > + path_name = (char *)malloc(path_size * sizeof(char));
> > +
> > + wcstoutf8(file_name, (__u16 *)in_file_name, file_size);
> > + wcstoutf8(path_name, (__u16 *)in_path_name, path_size);
> > +
> > + return hv_fcopy_create_file(file_name, path_name,
> > +smsg_in->copy_flags); }
> > +
> > +static int hv_fcopy_send_data(struct hv_fcopy_hdr *fcopy_msg, int
> > +recvlen) {
> > + int operation = fcopy_msg->operation;
> > +
> > + /*
> > + * The strings sent from the host are encoded in
> > + * utf16; convert it to utf8 strings.
> > + * The host assures us that the utf16 strings will not exceed
> > + * the max lengths specified. We will however, reserve room
> > + * for the string terminating character - in the utf16s_utf8s()
> > + * function we limit the size of the buffer where the converted
> > + * string is placed to W_MAX_PATH -1 to guarantee
> > + * that the strings can be properly terminated!
> > + */
> > +
> > + switch (operation) {
> > + case START_FILE_COPY:
> > + return hv_fcopy_start((struct hv_start_fcopy *)fcopy_msg);
> > + case WRITE_TO_FILE:
> > + return hv_copy_data((struct hv_do_fcopy *)fcopy_msg);
> > + case COMPLETE_FCOPY:
> > + return hv_copy_finished();
> > + }
> > +
> > + return HV_E_FAIL;
> > +}
> > +
> > +/* process the packet recv from host */ static int
> > +fcopy_pkt_process(struct vmbus_br *txbr) {
> > + int ret, offset, pktlen;
> > + int fcopy_srv_version;
> > + const struct vmbus_chanpkt_hdr *pkt;
> > + struct hv_fcopy_hdr *fcopy_msg;
> > + struct icmsg_hdr *icmsghdr;
> > +
> > + pkt = (const struct vmbus_chanpkt_hdr *)desc;
> > + offset = pkt->hlen << 3;
> > + pktlen = (pkt->tlen << 3) - offset;
> > + icmsghdr = (struct icmsg_hdr *)&desc[offset + sizeof(struct
> vmbuspipe_hdr)];
> > + icmsghdr->status = HV_E_FAIL;
> > +
> > + if (icmsghdr->icmsgtype == ICMSGTYPE_NEGOTIATE) {
> > + if (vmbus_prep_negotiate_resp(icmsghdr, desc + offset,
> pktlen, fw_versions,
> > + FW_VER_COUNT, fcopy_versions,
> FCOPY_VER_COUNT,
> > + NULL, &fcopy_srv_version)) {
> > + syslog(LOG_INFO, "FCopy IC version %d.%d",
> > + fcopy_srv_version >> 16, fcopy_srv_version &
> 0xFFFF);
> > + icmsghdr->status = 0;
> > + }
> > + } else if (icmsghdr->icmsgtype == ICMSGTYPE_FCOPY) {
> > + /* Ensure recvlen is big enough to contain hv_fcopy_hdr */
> > + if (pktlen < ICMSG_HDR + sizeof(struct hv_fcopy_hdr)) {
> > + syslog(LOG_ERR, "Invalid Fcopy hdr. Packet length too
> small: %u",
> > + pktlen);
> > + return -ENOBUFS;
> > + }
> > +
> > + fcopy_msg = (struct hv_fcopy_hdr *)&desc[offset +
> ICMSG_HDR];
> > + icmsghdr->status = hv_fcopy_send_data(fcopy_msg, pktlen);
> > + }
> > +
> > + icmsghdr->icflags = ICMSGHDRFLAG_TRANSACTION |
> ICMSGHDRFLAG_RESPONSE;
> > + ret = rte_vmbus_chan_send(txbr, 0x6, desc + offset, pktlen, 0);
> > + if (ret) {
> > + syslog(LOG_ERR, "Write to ringbuffer failed err: %d", ret);
> > + return ret;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static void fcopy_get_first_folder(char *path, char *chan_no) {
> > + DIR *dir = opendir(path);
> > + struct dirent *entry;
> > +
> > + if (!dir) {
> > + syslog(LOG_ERR, "Failed to open directory (errno=%s).\n",
> strerror(errno));
> > + return;
> > + }
> > +
> > + while ((entry = readdir(dir)) != NULL) {
> > + if (entry->d_type == DT_DIR && strcmp(entry->d_name, ".")
> != 0 &&
> > + strcmp(entry->d_name, "..") != 0) {
> > + strcpy(chan_no, entry->d_name);
> > + break;
> > + }
> > + }
> > +
> > + closedir(dir);
> > +}
> > +
> > +static void fcopy_set_ring_size(char *path, char *inst, int size) {
> > + char ring_size_path[MAX_PATH_LEN] = {0};
> > + FILE *fd;
> > +
> > + snprintf(ring_size_path, sizeof(ring_size_path), "%s/%s/%s", path, inst,
> "ring_size");
> > + fd = fopen(ring_size_path, "w");
> > + if (!fd) {
> > + syslog(LOG_WARNING, "Failed to open ring_size file
> (errno=%s).\n", strerror(errno));
> > + return;
> > + }
> > + fprintf(fd, "%d", size);
>
> Check for and log an error if the new value isn't accepted by the kernel
> driver?
> The code is using a ring size value that should be accepted by the kernel
> driver, but weird stuff happens and it's probably better to know about it.
Ok, will add error check.
>
> > + fclose(fd);
> > +}
> > +
> > +static char *fcopy_read_sysfs(char *path, char *buf, int len) {
> > + FILE *fd;
> > + char *ret;
> > +
> > + fd = fopen(path, "r");
> > + if (!fd)
> > + return NULL;
> > +
> > + ret = fgets(buf, len, fd);
> > + fclose(fd);
> > +
> > + return ret;
> > +}
> > +
> > +static int fcopy_get_instance_id(char *path, char *class_id, char
> > +*inst) {
> > + DIR *dir = opendir(path);
> > + struct dirent *entry;
> > + char tmp_path[MAX_PATH_LEN] = {0};
> > + char line[MAX_LINE_LEN];
> > +
> > + if (!dir) {
> > + syslog(LOG_ERR, "Failed to open directory (errno=%s).\n",
> strerror(errno));
> > + return -EINVAL;
> > + }
> > +
> > + while ((entry = readdir(dir)) != NULL) {
> > + if (entry->d_type == DT_LNK && strcmp(entry->d_name, ".")
> != 0 &&
> > + strcmp(entry->d_name, "..") != 0) {
> > + /* search for the sysfs path with matching class_id */
> > + snprintf(tmp_path, sizeof(tmp_path), "%s/%s/%s",
> > + path, entry->d_name, "class_id");
> > + if (!fcopy_read_sysfs(tmp_path, line, MAX_LINE_LEN))
> > + continue;
> > +
> > + /* class id matches, now fetch the instance id from
> device_id */
> > + if (strstr(line, class_id)) {
> > + snprintf(tmp_path, sizeof(tmp_path),
> "%s/%s/%s",
> > + path, entry->d_name, "device_id");
> > + if (!fcopy_read_sysfs(tmp_path, line,
> MAX_LINE_LEN))
> > + continue;
> > + /* remove braces */
> > + strncpy(inst, line + 1, strlen(line) - 3);
> > + break;
> > + }
> > + }
> > + }
> > +
> > + closedir(dir);
> > + return 0;
>
> If this function doesn't find a matching class_id, it appears that it returns 0,
> but with the "inst" parameter unset. The caller will then proceed as if "inst"
> was set when it is actually an uninitialized stack variable. Probably need
> some better error detection and handling.
Good point !
Let me fix it in next version.
>
> > +}
> > +
> > +int main(int argc, char *argv[])
> > +{
> > + int fcopy_fd = -1, tmp = 1;
> > + int daemonize = 1, long_index = 0, opt, ret = -EINVAL;
> > + struct vmbus_br txbr, rxbr;
> > + void *ring;
> > + uint32_t len = HV_RING_SIZE;
> > + char uio_name[10] = {0};
> > + char uio_dev_path[15] = {0};
> > + char uio_path[MAX_PATH_LEN] = {0};
> > + char inst[MAX_LINE_LEN] = {0};
> > +
> > + static struct option long_options[] = {
> > + {"help", no_argument, 0, 'h' },
> > + {"no-daemon", no_argument, 0, 'n' },
> > + {0, 0, 0, 0 }
> > + };
> > +
> > + while ((opt = getopt_long(argc, argv, "hn", long_options,
> > + &long_index)) != -1) {
> > + switch (opt) {
> > + case 'n':
> > + daemonize = 0;
> > + break;
> > + case 'h':
> > + default:
> > + print_usage(argv);
> > + exit(EXIT_FAILURE);
> > + }
> > + }
> > +
> > + if (daemonize && daemon(1, 0)) {
> > + syslog(LOG_ERR, "daemon() failed; error: %s",
> strerror(errno));
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + openlog("HV_UIO_FCOPY", 0, LOG_USER);
> > + syslog(LOG_INFO, "starting; pid is:%d", getpid());
> > +
> > + /* get instance id */
> > + if (fcopy_get_instance_id(DEVICES_SYSFS, FCOPY_CLASS_ID, inst))
> > + exit(EXIT_FAILURE);
>
> Per above, need better error handling. And since the syslog is now open, any
> errors should be logged rather than having the process just mysteriously exit.
OK.
- Saurabh
^ permalink raw reply
* Re: [PATCH v3 net] net: hv_netvsc: fix netvsc_send_completion to avoid multiple message length checks
From: Simon Horman @ 2023-08-03 12:14 UTC (permalink / raw)
To: Sonia Sharma
Cc: linux-kernel, linux-hyperv, netdev, sosha, kys, mikelley,
haiyangz, wei.liu, decui, longli, davem, edumazet, kuba, pabeni
In-Reply-To: <1691023528-5270-1-git-send-email-sosha@linux.microsoft.com>
On Wed, Aug 02, 2023 at 05:45:28PM -0700, Sonia Sharma wrote:
> From: Sonia Sharma <sonia.sharma@linux.microsoft.com>
>
> The switch statement in netvsc_send_completion() is incorrectly validating
> the length of incoming network packets by falling through to the next case.
> Avoid the fallthrough. Instead break after a case match and then process
> the complete() call.
>
> Signed-off-by: Sonia Sharma <sonia.sharma@linux.microsoft.com>
Hi Sonia,
if this is a bug-fix, which seems to be the case, then it probably warrants
a Fixes tag.
^ permalink raw reply
* Re: [PATCH RFC net-next v5 03/14] af_vsock: support multi-transport datagrams
From: Stefano Garzarella @ 2023-08-03 12:42 UTC (permalink / raw)
To: Bobby Eshleman
Cc: Arseniy Krasnov, Bobby Eshleman, Stefan Hajnoczi,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
VMware PV-Drivers Reviewers, Dan Carpenter, Simon Horman, kvm,
virtualization, netdev, linux-kernel, linux-hyperv, bpf
In-Reply-To: <ZMr6giur//A1hrND@bullseye>
On Thu, Aug 03, 2023 at 12:53:22AM +0000, Bobby Eshleman wrote:
>On Wed, Aug 02, 2023 at 10:24:44PM +0000, Bobby Eshleman wrote:
>> On Sun, Jul 23, 2023 at 12:53:15AM +0300, Arseniy Krasnov wrote:
>> >
>> >
>> > On 19.07.2023 03:50, Bobby Eshleman wrote:
>> > > This patch adds support for multi-transport datagrams.
>> > >
>> > > This includes:
>> > > - Per-packet lookup of transports when using sendto(sockaddr_vm)
>> > > - Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in
>> > > sockaddr_vm
>> > > - rename VSOCK_TRANSPORT_F_DGRAM to VSOCK_TRANSPORT_F_DGRAM_FALLBACK
>> > > - connect() now assigns the transport for (similar to connectible
>> > > sockets)
>> > >
>> > > To preserve backwards compatibility with VMCI, some important changes
>> > > are made. The "transport_dgram" / VSOCK_TRANSPORT_F_DGRAM is changed to
>> > > be used for dgrams only if there is not yet a g2h or h2g transport that
>> > > has been registered that can transmit the packet. If there is a g2h/h2g
>> > > transport for that remote address, then that transport will be used and
>> > > not "transport_dgram". This essentially makes "transport_dgram" a
>> > > fallback transport for when h2g/g2h has not yet gone online, and so it
>> > > is renamed "transport_dgram_fallback". VMCI implements this transport.
>> > >
>> > > The logic around "transport_dgram" needs to be retained to prevent
>> > > breaking VMCI:
>> > >
>> > > 1) VMCI datagrams existed prior to h2g/g2h and so operate under a
>> > > different paradigm. When the vmci transport comes online, it registers
>> > > itself with the DGRAM feature, but not H2G/G2H. Only later when the
>> > > transport has more information about its environment does it register
>> > > H2G or G2H. In the case that a datagram socket is created after
>> > > VSOCK_TRANSPORT_F_DGRAM registration but before G2H/H2G registration,
>> > > the "transport_dgram" transport is the only registered transport and so
>> > > needs to be used.
>> > >
>> > > 2) VMCI seems to require a special message be sent by the transport when a
>> > > datagram socket calls bind(). Under the h2g/g2h model, the transport
>> > > is selected using the remote_addr which is set by connect(). At
>> > > bind time there is no remote_addr because often no connect() has been
>> > > called yet: the transport is null. Therefore, with a null transport
>> > > there doesn't seem to be any good way for a datagram socket to tell the
>> > > VMCI transport that it has just had bind() called upon it.
>> > >
>> > > With the new fallback logic, after H2G/G2H comes online the socket layer
>> > > will access the VMCI transport via transport_{h2g,g2h}. Prior to H2G/G2H
>> > > coming online, the socket layer will access the VMCI transport via
>> > > "transport_dgram_fallback".
>> > >
>> > > Only transports with a special datagram fallback use-case such as VMCI
>> > > need to register VSOCK_TRANSPORT_F_DGRAM_FALLBACK.
>> > >
>> > > Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>> > > ---
>> > > drivers/vhost/vsock.c | 1 -
>> > > include/linux/virtio_vsock.h | 2 --
>> > > include/net/af_vsock.h | 10 +++---
>> > > net/vmw_vsock/af_vsock.c | 64 ++++++++++++++++++++++++++-------
>> > > net/vmw_vsock/hyperv_transport.c | 6 ----
>> > > net/vmw_vsock/virtio_transport.c | 1 -
>> > > net/vmw_vsock/virtio_transport_common.c | 7 ----
>> > > net/vmw_vsock/vmci_transport.c | 2 +-
>> > > net/vmw_vsock/vsock_loopback.c | 1 -
>> > > 9 files changed, 58 insertions(+), 36 deletions(-)
>> > >
>> > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> > > index ae8891598a48..d5d6a3c3f273 100644
>> > > --- a/drivers/vhost/vsock.c
>> > > +++ b/drivers/vhost/vsock.c
>> > > @@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
>> > > .cancel_pkt = vhost_transport_cancel_pkt,
>> > >
>> > > .dgram_enqueue = virtio_transport_dgram_enqueue,
>> > > - .dgram_bind = virtio_transport_dgram_bind,
>> > > .dgram_allow = virtio_transport_dgram_allow,
>> > >
>> > > .stream_enqueue = virtio_transport_stream_enqueue,
>> > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> > > index 18cbe8d37fca..7632552bee58 100644
>> > > --- a/include/linux/virtio_vsock.h
>> > > +++ b/include/linux/virtio_vsock.h
>> > > @@ -211,8 +211,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
>> > > u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
>> > > bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
>> > > bool virtio_transport_stream_allow(u32 cid, u32 port);
>> > > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
>> > > - struct sockaddr_vm *addr);
>> > > bool virtio_transport_dgram_allow(u32 cid, u32 port);
>> > >
>> > > int virtio_transport_connect(struct vsock_sock *vsk);
>> > > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>> > > index 305d57502e89..f6a0ca9d7c3e 100644
>> > > --- a/include/net/af_vsock.h
>> > > +++ b/include/net/af_vsock.h
>> > > @@ -96,13 +96,13 @@ struct vsock_transport_send_notify_data {
>> > >
>> > > /* Transport features flags */
>> > > /* Transport provides host->guest communication */
>> > > -#define VSOCK_TRANSPORT_F_H2G 0x00000001
>> > > +#define VSOCK_TRANSPORT_F_H2G 0x00000001
>> > > /* Transport provides guest->host communication */
>> > > -#define VSOCK_TRANSPORT_F_G2H 0x00000002
>> > > -/* Transport provides DGRAM communication */
>> > > -#define VSOCK_TRANSPORT_F_DGRAM 0x00000004
>> > > +#define VSOCK_TRANSPORT_F_G2H 0x00000002
>> > > +/* Transport provides fallback for DGRAM communication */
>> > > +#define VSOCK_TRANSPORT_F_DGRAM_FALLBACK 0x00000004
>> > > /* Transport provides local (loopback) communication */
>> > > -#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
>> > > +#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
>> > >
>> > > struct vsock_transport {
>> > > struct module *module;
>> > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> > > index ae5ac5531d96..26c97b33d55a 100644
>> > > --- a/net/vmw_vsock/af_vsock.c
>> > > +++ b/net/vmw_vsock/af_vsock.c
>> > > @@ -139,8 +139,8 @@ struct proto vsock_proto = {
>> > > static const struct vsock_transport *transport_h2g;
>> > > /* Transport used for guest->host communication */
>> > > static const struct vsock_transport *transport_g2h;
>> > > -/* Transport used for DGRAM communication */
>> > > -static const struct vsock_transport *transport_dgram;
>> > > +/* Transport used as a fallback for DGRAM communication */
>> > > +static const struct vsock_transport *transport_dgram_fallback;
>> > > /* Transport used for local communication */
>> > > static const struct vsock_transport *transport_local;
>> > > static DEFINE_MUTEX(vsock_register_mutex);
>> > > @@ -439,6 +439,18 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
>> > > return transport;
>> > > }
>> > >
>> > > +static const struct vsock_transport *
>> > > +vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
>> > > +{
>> > > + const struct vsock_transport *transport;
>> > > +
>> > > + transport = vsock_connectible_lookup_transport(cid, flags);
>> > > + if (transport)
>> > > + return transport;
>> > > +
>> > > + return transport_dgram_fallback;
>> > > +}
>> > > +
>> > > /* Assign a transport to a socket and call the .init transport callback.
>> > > *
>> > > * Note: for connection oriented socket this must be called when vsk->remote_addr
>> > > @@ -475,7 +487,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
>> > >
>> > > switch (sk->sk_type) {
>> > > case SOCK_DGRAM:
>> > > - new_transport = transport_dgram;
>> > > + new_transport = vsock_dgram_lookup_transport(remote_cid,
>> > > + remote_flags);
>> >
>> > I'm a little bit confused about this:
>> > 1) Let's create SOCK_DGRAM socket using vsock_create()
>> > 2) for SOCK_DGRAM it calls 'vsock_assign_transport()' and we go here, remote_cid == -1
>> > 3) I guess 'vsock_dgram_lookup_transport()' calls logic from 0002 and returns h2g for such remote cid, which is not
>> > correct I think...
>> >
>> > Please correct me if i'm wrong
>> >
>> > Thanks, Arseniy
>> >
>>
>> As I understand, for the VMCI case, if transport_h2g != NULL, then
>> transport_h2g == transport_dgram_fallback. In either case,
>> vsk->transport == transport_dgram_fallback.
>>
>> For the virtio/vhost case, temporarily vsk->transport == transport_h2g,
>> but it is unused because vsk->transport->dgram_bind == NULL.
>>
>> Until SS_CONNECTED is set by connect() and vsk->transport is set
>> correctly, the send path is barred from using the bad transport.
>>
>> I guess the recvmsg() path is a little more sketchy, and probably only
>> works in my test cases because h2g/g2h in the vhost/virtio case have
>> identical dgram_addr_init() implementations.
>>
>> I think a cleaner solution is maybe checking in vsock_create() if
>> dgram_bind is implemented. If it is not, then vsk->transport should be
>> reset to NULL and a comment added explaining why VMCI requires this.
>>
>> Then the other calls can begin explicitly checking for vsk->transport ==
>> NULL.
>
>Actually, on further reflection here, in order for the vsk->transport to
>be called in time for ->dgram_addr_init(), it is going to be necessary
>to call vsock_assign_transport() in vsock_dgram_bind() anyway.
>
>I think this means that the vsock_assign_transport() call can be removed
>from vsock_create() call entirely, and yet VMCI can still dispatch
>messages upon bind() calls as needed.
>
>This would then simplify the whole arrangement, if there aren't other
>unseen issues.
This sounds like a good approach.
My only question is whether vsock_dgram_bind() is always called for each
dgram socket.
Stefano
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox