* [PATCH 1/2] clocksource/Hyper-v: Allocate Hyper-V tsc page statically
From: lantianyu1986 @ 2019-07-29 7:52 UTC
To: luto, tglx, mingo, bp, hpa, x86, kys, haiyangz, sthemmin, sashal,
daniel.lezcano, arnd, michael.h.kelley, ashal
Cc: Tianyu Lan, linux-kernel, linux-hyperv, linux-arch
In-Reply-To: <20190729075243.22745-1-Tianyu.Lan@microsoft.com>
From: Tianyu Lan <Tianyu.Lan@microsoft.com>
This prepares for adding a Hyper-V sched clock callback and for moving
the Hyper-V reference TSC initialization much earlier in the boot
process, while the timestamp is still 0, so that no discontinuity is
observed when pv_ops.time.sched_clock calculates its offset. This
earlier initialization requires that the Hyper-V TSC page be allocated
statically instead of with vmalloc(), so fix up the references to the
TSC page and the method of getting its physical address.
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
arch/x86/entry/vdso/vma.c | 2 +-
drivers/clocksource/hyperv_timer.c | 12 ++++--------
2 files changed, 5 insertions(+), 9 deletions(-)
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 349a61d8bf34..f5937742b290 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -122,7 +122,7 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
if (tsc_pg && vclock_was_used(VCLOCK_HVCLOCK))
return vmf_insert_pfn(vma, vmf->address,
- vmalloc_to_pfn(tsc_pg));
+ virt_to_phys(tsc_pg) >> PAGE_SHIFT);
}
return VM_FAULT_SIGBUS;
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index ba2c79e6a0ee..86764ec9a854 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -214,17 +214,17 @@ EXPORT_SYMBOL_GPL(hyperv_cs);
#ifdef CONFIG_HYPERV_TSCPAGE
-static struct ms_hyperv_tsc_page *tsc_pg;
+static struct ms_hyperv_tsc_page tsc_pg __aligned(PAGE_SIZE);
struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
{
- return tsc_pg;
+ return &tsc_pg;
}
EXPORT_SYMBOL_GPL(hv_get_tsc_page);
static u64 notrace read_hv_sched_clock_tsc(void)
{
- u64 current_tick = hv_read_tsc_page(tsc_pg);
+ u64 current_tick = hv_read_tsc_page(&tsc_pg);
if (current_tick == U64_MAX)
hv_get_time_ref_count(current_tick);
@@ -280,12 +280,8 @@ static bool __init hv_init_tsc_clocksource(void)
if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
return false;
- tsc_pg = vmalloc(PAGE_SIZE);
- if (!tsc_pg)
- return false;
-
hyperv_cs = &hyperv_cs_tsc;
- phys_addr = page_to_phys(vmalloc_to_page(tsc_pg));
+ phys_addr = virt_to_phys(&tsc_pg) & PAGE_MASK;
/*
* The Hyper-V TLFS specifies to preserve the value of reserved
--
2.14.5
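The patch relies on two properties of a statically allocated, page-aligned object. The object starts on a page boundary, so `virt_to_phys(&tsc_pg) & PAGE_MASK` changes nothing. Shifting its address right by PAGE_SHIFT gives its frame number. A vmalloc() buffer, by contrast, needed a lookup of its backing page (vmalloc_to_pfn(), vmalloc_to_page()). Below is a minimal user-space sketch of that arithmetic. It assumes 4 KiB pages. All `sketch_*` / `SKETCH_*` names are hypothetical, and the struct only mirrors the general shape of struct ms_hyperv_tsc_page. User space cannot call virt_to_phys(), so the masks are applied to a virtual address only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. The kernel takes these from <asm/page.h>. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uintptr_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical mirror of the TSC page layout: exactly one page in size. */
struct sketch_tsc_page {
	volatile uint32_t tsc_sequence;
	uint32_t reserved1;
	volatile uint64_t tsc_scale;
	volatile int64_t tsc_offset;
	uint64_t reserved2[509];
};

_Static_assert(sizeof(struct sketch_tsc_page) == SKETCH_PAGE_SIZE,
	       "the TSC page must fill exactly one page");

/* Static, page-aligned allocation, analogous to __aligned(PAGE_SIZE). */
static _Alignas(SKETCH_PAGE_SIZE) struct sketch_tsc_page sketch_tsc_pg;

/* Frame number of addr: the same shift as "virt_to_phys(tsc_pg) >> PAGE_SHIFT". */
static uintptr_t sketch_page_frame(uintptr_t addr)
{
	return addr >> SKETCH_PAGE_SHIFT;
}

/* Start of the page holding addr: the same mask as "& PAGE_MASK". */
static uintptr_t sketch_page_base(uintptr_t addr)
{
	return addr & SKETCH_PAGE_MASK;
}

/* Address of the static page, for checking its alignment. */
static uintptr_t sketch_tsc_page_addr(void)
{
	return (uintptr_t)&sketch_tsc_pg;
}
```

Because the object is page aligned, masking is a no-op on its address. In the kernel, that is why `& PAGE_MASK` in hv_init_tsc_clocksource() is only defensive.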
* [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
From: lantianyu1986 @ 2019-07-29 7:52 UTC
To: luto, tglx, mingo, bp, hpa, x86, kys, haiyangz, sthemmin, sashal,
daniel.lezcano, arnd, michael.h.kelley, ashal
Cc: Tianyu Lan, linux-arch, linux-hyperv, linux-kernel
From: Tianyu Lan <Tianyu.Lan@microsoft.com>
Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
on x86. But native_sched_clock() directly uses the raw TSC value, which
can be discontinuous in a Hyper-V VM. Add the generic hv_setup_sched_clock()
to set the sched clock function appropriately. On x86, this sets
pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
scaled and adjusted to be continuous.
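Reading the reference TSC page works as follows. The guest snapshots the sequence number, reads the raw TSC together with the scale and offset, and retries if the hypervisor changed the sequence in between. The reference time is then ((tsc * scale) >> 64) + offset. A sequence of 0 marks the page invalid. In that case the reader returns U64_MAX, and read_hv_sched_clock_tsc() in patch 1/2 falls back to hv_get_time_ref_count().

Below is a minimal single-threaded sketch under these assumptions. The kernel additionally uses READ_ONCE() and virt_rmb(), which are omitted here. `unsigned __int128` is a GCC/Clang extension standing in for mul_u64_u64_shr(). The `sketch_*` names and the fake counter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed mirror of the reference TSC page fields. */
struct sketch_ref_tsc_page {
	volatile uint32_t tsc_sequence;
	uint32_t reserved1;
	volatile uint64_t tsc_scale;
	volatile int64_t tsc_offset;
};

typedef uint64_t (*sketch_raw_counter_fn)(void);

/* Returns reference time, or UINT64_MAX if the page is not valid. */
static uint64_t sketch_read_tsc_page(const struct sketch_ref_tsc_page *pg,
				     sketch_raw_counter_fn read_raw)
{
	uint32_t seq;
	uint64_t tsc, scale;
	int64_t offset;

	do {
		seq = pg->tsc_sequence;
		if (seq == 0)
			return UINT64_MAX;   /* caller falls back to the MSR */
		tsc = read_raw();
		scale = pg->tsc_scale;
		offset = pg->tsc_offset;
	} while (pg->tsc_sequence != seq);  /* page updated: retry */

	/* 64x64 -> 128-bit multiply, keep the high 64 bits, add the offset. */
	return (uint64_t)(((unsigned __int128)tsc * scale) >> 64) +
	       (uint64_t)offset;
}

/* Fake raw counter for trying the sketch out (real code executes rdtsc). */
static uint64_t sketch_fake_tsc(void)
{
	return 1000;
}
```

With a scale of 2^63 (a factor of one half) and an offset of 100, a raw reading of 1000 maps to 500 + 100 = 600.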
Also move the Hyper-V reference TSC initialization much earlier in the boot
process so no discontinuity is observed when pv_ops.time.sched_clock
calculates its offset. This earlier initialization requires that the Hyper-V TSC
page be allocated statically instead of with vmalloc(), so fix up the references
to the TSC page and the method of getting its physical address.
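The offset idea can be sketched in a few lines. sched_clock() reports nanoseconds, while the Hyper-V reference counter ticks in 100 ns units. Capturing the counter once during early init gives a baseline: the sched clock then counts from roughly zero instead of jumping when the callback is installed.

This is a minimal sketch under that assumption, not the code from patch 2/2, which is not included here. sketch_setup_sched_clock(), sketch_sched_clock() and the fake counter are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hyper-V reference time ticks in 100 ns units; sched_clock() reports ns. */
#define SKETCH_NS_PER_REF_TICK 100ULL

/* Hypothetical stand-in for a Hyper-V reference counter reader. */
typedef uint64_t (*sketch_ref_reader_fn)(void);

static sketch_ref_reader_fn sketch_read_ref;
static uint64_t sketch_sched_clock_offset;

/* Capture the counter once, early in boot, and remember the reader. */
static void sketch_setup_sched_clock(sketch_ref_reader_fn read_ref)
{
	sketch_read_ref = read_ref;
	sketch_sched_clock_offset = read_ref();
}

/* Nanoseconds elapsed since sketch_setup_sched_clock() ran. */
static uint64_t sketch_sched_clock(void)
{
	return (sketch_read_ref() - sketch_sched_clock_offset) *
	       SKETCH_NS_PER_REF_TICK;
}

/* Fake counter for trying the sketch out: starts at an arbitrary value. */
static uint64_t sketch_fake_ref_ticks = 5000;

static uint64_t sketch_fake_read_ref(void)
{
	return sketch_fake_ref_ticks;
}
```

Setting the clock up later in boot would still start at zero. However, any earlier readings taken through native_sched_clock() would not line up with it, which is why the cover letter moves the initialization earlier.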
Tianyu Lan (2):
clocksource/Hyper-v: Allocate Hyper-V tsc page statically
clocksource/Hyper-V: Add Hyper-V specific sched clock function
arch/x86/entry/vdso/vma.c | 2 +-
arch/x86/hyperv/hv_init.c | 2 --
arch/x86/kernel/cpu/mshyperv.c | 8 ++++++++
drivers/clocksource/hyperv_timer.c | 34 ++++++++++++++++------------------
include/asm-generic/mshyperv.h | 1 +
5 files changed, 26 insertions(+), 21 deletions(-)
--
2.14.5
* [PATCH net] hv_sock: Fix hang when a connection is closed
From: Dexuan Cui @ 2019-07-28 18:32 UTC
To: Sunil Muthuswamy, David Miller, netdev@vger.kernel.org
Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
sashal@kernel.org, Michael Kelley, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, olaf@aepfle.de, apw@canonical.com,
jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com
hvs_do_close_lock_held() may decrease the reference count to 0 and free the
sk struct completely, and then the following release_sock(sk) may hang.
Fixes: a9eeb998c28d ("hv_sock: Add support for delayed close")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: stable@vger.kernel.org
---
With the proper kernel debugging options enabled, first a warning can
appear:
kworker/1:0/4467 is freeing memory ..., with a lock still held there!
stack backtrace:
Workqueue: events vmbus_onmessage_work [hv_vmbus]
Call Trace:
dump_stack+0x67/0x90
debug_check_no_locks_freed.cold.52+0x78/0x7d
slab_free_freelist_hook+0x85/0x140
kmem_cache_free+0xa5/0x380
__sk_destruct+0x150/0x260
hvs_close_connection+0x24/0x30 [hv_sock]
vmbus_onmessage_work+0x1d/0x30 [hv_vmbus]
process_one_work+0x241/0x600
worker_thread+0x3c/0x390
kthread+0x11b/0x140
ret_from_fork+0x24/0x30
and then the following release_sock(sk) can hang:
watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:0:4467]
...
irq event stamp: 62890
CPU: 1 PID: 4467 Comm: kworker/1:0 Tainted: G W 5.2.0+ #39
Workqueue: events vmbus_onmessage_work [hv_vmbus]
RIP: 0010:queued_spin_lock_slowpath+0x2b/0x1e0
...
Call Trace:
do_raw_spin_lock+0xab/0xb0
release_sock+0x19/0xb0
vmbus_onmessage_work+0x1d/0x30 [hv_vmbus]
process_one_work+0x241/0x600
worker_thread+0x3c/0x390
kthread+0x11b/0x140
ret_from_fork+0x24/0x30
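The fix follows the usual pin-across-unlock pattern:

1. Take an extra reference before calling a helper that may drop the last one.
2. Release the lock.
3. Only then drop the pinning reference.

Below is a single-threaded user-space sketch of that ordering. sketch_sock, its plain int reference count and its `locked` flag are hypothetical stand-ins for struct sock, sock_hold()/sock_put() and lock_sock()/release_sock(). The assertion in sketch_sock_put() fires on exactly the situation in the traces above: freeing the object while its embedded lock is still held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sock: a refcount plus an embedded lock. */
struct sketch_sock {
	int refcnt;   /* plain int: single-threaded sketch, unlike refcount_t */
	bool locked;  /* stands in for the lock taken by lock_sock() */
};

static int sketch_frees; /* number of objects freed so far */

static struct sketch_sock *sketch_sock_alloc(void)
{
	struct sketch_sock *sk = calloc(1, sizeof(*sk));

	if (sk)
		sk->refcnt = 1;
	return sk;
}

static void sketch_sock_hold(struct sketch_sock *sk)
{
	sk->refcnt++;
}

static void sketch_sock_put(struct sketch_sock *sk)
{
	if (--sk->refcnt == 0) {
		/* Freeing while the embedded lock is held is the bug being fixed. */
		assert(!sk->locked);
		free(sk);
		sketch_frees++;
	}
}

/* Like hvs_do_close_lock_held(): may drop what would be the last reference. */
static void sketch_do_close_lock_held(struct sketch_sock *sk)
{
	sketch_sock_put(sk);
}

static void sketch_close_connection(struct sketch_sock *sk)
{
	sketch_sock_hold(sk);           /* pin sk across the unlock below */
	sk->locked = true;              /* lock_sock(sk) */
	sketch_do_close_lock_held(sk);
	sk->locked = false;             /* release_sock(sk): sk is still valid */
	sketch_sock_put(sk);            /* may free sk, now with the lock released */
}
```

Without the hold/put pair, the put inside sketch_do_close_lock_held() would free the object while `locked` is still true. The following unlock would then write to freed memory, mirroring release_sock(sk) spinning on a freed lock.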
net/vmw_vsock/hyperv_transport.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index f2084e3f7aa4..efbda8ef1eff 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -309,9 +309,16 @@ static void hvs_close_connection(struct vmbus_channel *chan)
{
struct sock *sk = get_per_channel_state(chan);
+ /* Grab an extra reference since hvs_do_close_lock_held() may decrease
+ * the reference count to 0 by calling sock_put(sk).
+ */
+ sock_hold(sk);
+
lock_sock(sk);
hvs_do_close_lock_held(vsock_sk(sk), true);
release_sock(sk);
+
+ sock_put(sk);
}
static void hvs_open_connection(struct vmbus_channel *chan)
--
2.19.1
* Re: [PATCH] hv_sock: use HV_HYP_PAGE_SIZE instead of PAGE_SIZE_4K
From: kbuild test robot @ 2019-07-28 4:06 UTC
To: Himadri Pandya
Cc: kbuild-all, mikelley, kys, haiyangz, sthemmin, sashal, davem,
linux-hyperv, netdev, linux-kernel, Himadri Pandya
In-Reply-To: <20190725051125.10605-1-himadri18.07@gmail.com>
Hi Himadri,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on linus/master]
[cannot apply to v5.3-rc1 next-20190726]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Himadri-Pandya/hv_sock-use-HV_HYP_PAGE_SIZE-instead-of-PAGE_SIZE_4K/20190726-085229
reproduce:
# apt-get install sparse
# sparse version: v0.6.1-rc1-7-g2b96cd8-dirty
make ARCH=x86_64 allmodconfig
make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'
If you fix the issue, kindly add the following tag
Reported-by: kbuild test robot <lkp@intel.com>
sparse warnings: (new ones prefixed by >>)
include/linux/sched.h:609:43: sparse: sparse: bad integer constant expression
include/linux/sched.h:609:73: sparse: sparse: invalid named zero-width bitfield `value'
include/linux/sched.h:610:43: sparse: sparse: bad integer constant expression
include/linux/sched.h:610:67: sparse: sparse: invalid named zero-width bitfield `bucket_id'
net/vmw_vsock/hyperv_transport.c:214:39: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:214:39: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse: sparse: incompatible types for operation (-)
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse: left side has type bad type
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse: right side has type int
net/vmw_vsock/hyperv_transport.c:214:39: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse: sparse: incompatible types for operation (-)
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse: left side has type bad type
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse: right side has type int
net/vmw_vsock/hyperv_transport.c:65:17: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:65:17: sparse: sparse: bad constant expression type
net/vmw_vsock/hyperv_transport.c:387:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:388:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:392:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:392:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:392:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:392:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:395:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:395:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:395:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:395:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:465:25: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:466:25: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:666:9: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: cast from unknown type
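Two of the warnings point at line 65 (`u8 data[HVS_SEND_BUF_SIZE]`) and line 214 (`HVS_PKT_LEN(HVS_SEND_BUF_SIZE)`) in the listing below. Both expand the undefined HV_HYP_PAGE_SIZE. The `(-)` warning at 214:39 suggests HVS_SEND_BUF_SIZE is one Hyper-V page minus the pipe header.

Once the identifier is defined, the packet-length arithmetic works out as in the worked example below. It mirrors HVS_HEADER_LEN, HVS_PKT_LEN() and hvs_channel_writable_bytes() from the listing. It assumes HV_HYP_PAGE_SIZE is 4096, a 16-byte vmpacket_descriptor and an 8-byte vmpipe_proto_header. The `sketch_*` structs are simplified, hypothetical mirrors, not the real headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified mirrors of the VMBus headers (layout assumed: 16 and 8 bytes). */
struct sketch_vmpacket_descriptor {
	uint16_t type;
	uint16_t offset8;
	uint16_t len8;
	uint16_t flags;
	uint64_t trans_id;
};

struct sketch_vmpipe_proto_header {
	uint32_t pkt_type;
	uint32_t data_size;
};

#define SKETCH_HYP_PAGE_SIZE 4096  /* assumed value of HV_HYP_PAGE_SIZE */
#define SKETCH_SEND_BUF_SIZE \
	(SKETCH_HYP_PAGE_SIZE - sizeof(struct sketch_vmpipe_proto_header))
#define SKETCH_HEADER_LEN (sizeof(struct sketch_vmpacket_descriptor) + \
			   sizeof(struct sketch_vmpipe_proto_header))
#define SKETCH_TRAILER_SIZE sizeof(uint64_t)   /* like VMBUS_PKT_TRAILER_SIZE */
#define SKETCH_ALIGN8(x) (((size_t)(x) + 7) & ~(size_t)7)
#define SKETCH_PKT_LEN(len) \
	(SKETCH_HEADER_LEN + SKETCH_ALIGN8(len) + SKETCH_TRAILER_SIZE)

/* Mirrors hvs_channel_writable_bytes(): keep room for 1 byte plus a FIN. */
static size_t sketch_writable_bytes(uint32_t writeable)
{
	size_t reserve = SKETCH_PKT_LEN(1) + SKETCH_PKT_LEN(0);

	if (writeable <= reserve)
		return 0;
	return (writeable - reserve) & ~(size_t)7;  /* round_down(ret, 8) */
}
```

Under these assumptions a zero-length FIN packet takes 32 bytes, a 1-byte packet 40 bytes, and a full 4088-byte send buffer becomes a 4120-byte ring-buffer packet.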
vim +214 net/vmw_vsock/hyperv_transport.c
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 59
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 60 struct hvs_send_buf {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 61 /* The header before the payload data */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 62 struct vmpipe_proto_header hdr;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 63
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 64 /* The payload */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 @65 u8 data[HVS_SEND_BUF_SIZE];
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 66 };
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 67
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 68 #define HVS_HEADER_LEN (sizeof(struct vmpacket_descriptor) + \
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 69 sizeof(struct vmpipe_proto_header))
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 70
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 71 /* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write(), and
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 72 * __hv_pkt_iter_next().
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 73 */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 74 #define VMBUS_PKT_TRAILER_SIZE (sizeof(u64))
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 75
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 76 #define HVS_PKT_LEN(payload_len) (HVS_HEADER_LEN + \
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 77 ALIGN((payload_len), 8) + \
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 78 VMBUS_PKT_TRAILER_SIZE)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 79
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 80 union hvs_service_id {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 81 uuid_le srv_id;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 82
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 83 struct {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 84 unsigned int svm_port;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 85 unsigned char b[sizeof(uuid_le) - sizeof(unsigned int)];
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 86 };
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 87 };
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 88
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 89 /* Per-socket state (accessed via vsk->trans) */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 90 struct hvsock {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 91 struct vsock_sock *vsk;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 92
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 93 uuid_le vm_srv_id;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 94 uuid_le host_srv_id;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 95
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 96 struct vmbus_channel *chan;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 97 struct vmpacket_descriptor *recv_desc;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 98
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 99 /* The length of the payload not delivered to userland yet */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 100 u32 recv_data_len;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 101 /* The offset of the payload */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 102 u32 recv_data_off;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 103
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 104 /* Have we sent the zero-length packet (FIN)? */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 105 bool fin_sent;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 106 };
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 107
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 108 /* In the VM, we support Hyper-V Sockets with AF_VSOCK, and the endpoint is
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 109 * <cid, port> (see struct sockaddr_vm). Note: cid is not really used here:
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 110 * when we write apps to connect to the host, we can only use VMADDR_CID_ANY
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 111 * or VMADDR_CID_HOST (both are equivalent) as the remote cid, and when we
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 112 * write apps to bind() & listen() in the VM, we can only use VMADDR_CID_ANY
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 113 * as the local cid.
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 114 *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 115 * On the host, Hyper-V Sockets are supported by Winsock AF_HYPERV:
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 116 * https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 117 * guide/make-integration-service, and the endpoint is <VmID, ServiceId> with
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 118 * the below sockaddr:
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 119 *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 120 * struct SOCKADDR_HV
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 121 * {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 122 * ADDRESS_FAMILY Family;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 123 * USHORT Reserved;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 124 * GUID VmId;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 125 * GUID ServiceId;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 126 * };
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 127 * Note: VmID is not used by Linux VM and actually it isn't transmitted via
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 128 * VMBus, because here it's obvious the host and the VM can easily identify
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 129 * each other. Though the VmID is useful on the host, especially in the case
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 130 * of Windows container, Linux VM doesn't need it at all.
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 131 *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 132 * To make use of the AF_VSOCK infrastructure in Linux VM, we have to limit
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 133 * the available GUID space of SOCKADDR_HV so that we can create a mapping
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 134 * between AF_VSOCK port and SOCKADDR_HV Service GUID. The rule of writing
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 135 * Hyper-V Sockets apps on the host and in Linux VM is:
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 136 *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 137 ****************************************************************************
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 138 * The only valid Service GUIDs, from the perspectives of both the host and *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 139 * Linux VM, that can be connected by the other end, must conform to this *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 140 * format: <port>-facb-11e6-bd58-64006a7986d3, and the "port" must be in *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 141 * this range [0, 0x7FFFFFFF]. *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 142 ****************************************************************************
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 143 *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 144 * When we write apps on the host to connect(), the GUID ServiceID is used.
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 145 * When we write apps in Linux VM to connect(), we only need to specify the
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 146 * port and the driver will form the GUID and use that to request the host.
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 147 *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 148 * From the perspective of Linux VM:
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 149 * 1. the local ephemeral port (i.e. the local auto-bound port when we call
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 150 * connect() without explicit bind()) is generated by __vsock_bind_stream(),
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 151 * and the range is [1024, 0xFFFFFFFF).
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 152 * 2. the remote ephemeral port (i.e. the auto-generated remote port for
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 153 * a connect request initiated by the host's connect()) is generated by
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 154 * hvs_remote_addr_init() and the range is [0x80000000, 0xFFFFFFFF).
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 155 */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 156
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 157 #define MAX_LISTEN_PORT ((u32)0x7FFFFFFF)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 158 #define MAX_VM_LISTEN_PORT MAX_LISTEN_PORT
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 159 #define MAX_HOST_LISTEN_PORT MAX_LISTEN_PORT
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 160 #define MIN_HOST_EPHEMERAL_PORT (MAX_HOST_LISTEN_PORT + 1)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 161
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 162 /* 00000000-facb-11e6-bd58-64006a7986d3 */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 163 static const uuid_le srv_id_template =
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 164 UUID_LE(0x00000000, 0xfacb, 0x11e6, 0xbd, 0x58,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 165 0x64, 0x00, 0x6a, 0x79, 0x86, 0xd3);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 166
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 167 static bool is_valid_srv_id(const uuid_le *id)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 168 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 169 return !memcmp(&id->b[4], &srv_id_template.b[4], sizeof(uuid_le) - 4);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 170 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 171
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 172 static unsigned int get_port_by_srv_id(const uuid_le *svr_id)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 173 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 174 return *((unsigned int *)svr_id);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 175 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 176
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 177 static void hvs_addr_init(struct sockaddr_vm *addr, const uuid_le *svr_id)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 178 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 179 unsigned int port = get_port_by_srv_id(svr_id);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 180
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 181 vsock_addr_init(addr, VMADDR_CID_ANY, port);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 182 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 183
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 184 static void hvs_remote_addr_init(struct sockaddr_vm *remote,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 185 struct sockaddr_vm *local)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 186 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 187 static u32 host_ephemeral_port = MIN_HOST_EPHEMERAL_PORT;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 188 struct sock *sk;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 189
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 190 vsock_addr_init(remote, VMADDR_CID_ANY, VMADDR_PORT_ANY);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 191
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 192 while (1) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 193 /* Wrap around ? */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 194 if (host_ephemeral_port < MIN_HOST_EPHEMERAL_PORT ||
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 195 host_ephemeral_port == VMADDR_PORT_ANY)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 196 host_ephemeral_port = MIN_HOST_EPHEMERAL_PORT;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 197
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 198 remote->svm_port = host_ephemeral_port++;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 199
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 200 sk = vsock_find_connected_socket(remote, local);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 201 if (!sk) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 202 /* Found an available ephemeral port */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 203 return;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 204 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 205
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 206 /* Release refcnt got in vsock_find_connected_socket */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 207 sock_put(sk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 208 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 209 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 210
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 211 static void hvs_set_channel_pending_send_size(struct vmbus_channel *chan)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 212 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 213 set_channel_pending_send_size(chan,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 @214 HVS_PKT_LEN(HVS_SEND_BUF_SIZE));
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 215
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 216 virt_mb();
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 217 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 218
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 219 static bool hvs_channel_readable(struct vmbus_channel *chan)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 220 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 221 u32 readable = hv_get_bytes_to_read(&chan->inbound);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 222
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 223 /* 0-size payload means FIN */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 224 return readable >= HVS_PKT_LEN(0);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 225 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 226
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 227 static int hvs_channel_readable_payload(struct vmbus_channel *chan)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 228 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 229 u32 readable = hv_get_bytes_to_read(&chan->inbound);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 230
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 231 if (readable > HVS_PKT_LEN(0)) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 232 /* At least we have 1 byte to read. We don't need to return
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 233 * the exact readable bytes: see vsock_stream_recvmsg() ->
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 234 * vsock_stream_has_data().
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 235 */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 236 return 1;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 237 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 238
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 239 if (readable == HVS_PKT_LEN(0)) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 240 /* 0-size payload means FIN */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 241 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 242 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 243
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 244 /* No payload or FIN */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 245 return -1;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 246 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 247
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 248 static size_t hvs_channel_writable_bytes(struct vmbus_channel *chan)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 249 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 250 u32 writeable = hv_get_bytes_to_write(&chan->outbound);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 251 size_t ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 252
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 253 /* The ringbuffer mustn't be 100% full, and we should reserve a
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 254 * zero-length-payload packet for the FIN: see hv_ringbuffer_write()
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 255 * and hvs_shutdown().
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 256 */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 257 if (writeable <= HVS_PKT_LEN(1) + HVS_PKT_LEN(0))
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 258 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 259
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 260 ret = writeable - HVS_PKT_LEN(1) - HVS_PKT_LEN(0);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 261
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 262 return round_down(ret, 8);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 263 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 264
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 265 static int hvs_send_data(struct vmbus_channel *chan,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 266 struct hvs_send_buf *send_buf, size_t to_write)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 267 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 268 send_buf->hdr.pkt_type = 1;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 269 send_buf->hdr.data_size = to_write;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 270 return vmbus_sendpacket(chan, &send_buf->hdr,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 271 sizeof(send_buf->hdr) + to_write,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 272 0, VM_PKT_DATA_INBAND, 0);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 273 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 274
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 275 static void hvs_channel_cb(void *ctx)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 276 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 277 struct sock *sk = (struct sock *)ctx;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 278 struct vsock_sock *vsk = vsock_sk(sk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 279 struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 280 struct vmbus_channel *chan = hvs->chan;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 281
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 282 if (hvs_channel_readable(chan))
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 283 sk->sk_data_ready(sk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 284
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 285 if (hv_get_bytes_to_write(&chan->outbound) > 0)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 286 sk->sk_write_space(sk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 287 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 288
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 289 static void hvs_do_close_lock_held(struct vsock_sock *vsk,
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 290 bool cancel_timeout)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 291 {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 292 struct sock *sk = sk_vsock(vsk);
b4562ca7925a3be Dexuan Cui 2017-10-19 293
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 294 sock_set_flag(sk, SOCK_DONE);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 295 vsk->peer_shutdown = SHUTDOWN_MASK;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 296 if (vsock_stream_has_data(vsk) <= 0)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 297 sk->sk_state = TCP_CLOSING;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 298 sk->sk_state_change(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 299 if (vsk->close_work_scheduled &&
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 300 (!cancel_timeout || cancel_delayed_work(&vsk->close_work))) {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 301 vsk->close_work_scheduled = false;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 302 vsock_remove_sock(vsk);
b4562ca7925a3be Dexuan Cui 2017-10-19 303
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 304 /* Release the reference taken while scheduling the timeout */
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 305 sock_put(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 306 }
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 307 }
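The reference handoff in hvs_do_close_lock_held() is easy to misread. The reference taken when the close timeout was armed is dropped only by the path that "owns" the pending work: either the caller that successfully cancels it, or the timeout handler itself (cancel_timeout == false). The following is a minimal user-space sketch under stated assumptions. vsk_model, cancel_work() and do_close() are hypothetical stand-ins for the vsock socket, cancel_delayed_work() and hvs_do_close_lock_held(), and locking is ignored.

```c
#include <stdbool.h>

/* Hypothetical model of the fields hvs_do_close_lock_held() touches. */
struct vsk_model {
	bool work_scheduled; /* close_work_scheduled */
	bool work_pending;   /* timer armed but handler not yet running */
	int refs;            /* socket refcount */
	bool removed;        /* vsock_remove_sock() has been called */
};

/* cancel_delayed_work() stand-in: succeeds only if the work had not run. */
static bool cancel_work(struct vsk_model *v)
{
	bool was_pending = v->work_pending;

	v->work_pending = false;
	return was_pending;
}

static void do_close(struct vsk_model *v, bool cancel_timeout)
{
	/* Drop the timeout's reference only if we own the pending work. */
	if (v->work_scheduled && (!cancel_timeout || cancel_work(v))) {
		v->work_scheduled = false;
		v->removed = true;
		v->refs--; /* reference taken when the timeout was armed */
	}
}
```

If the timer has already fired, the rescind path's cancel fails and leaves the reference for the timeout handler to drop, so it is released exactly once.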
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 308
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 309 static void hvs_close_connection(struct vmbus_channel *chan)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 310 {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 311 struct sock *sk = get_per_channel_state(chan);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 312
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 313 lock_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 314 hvs_do_close_lock_held(vsock_sk(sk), true);
b4562ca7925a3be Dexuan Cui 2017-10-19 315 release_sock(sk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 316 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 317
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 318 static void hvs_open_connection(struct vmbus_channel *chan)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 319 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 320 uuid_le *if_instance, *if_type;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 321 unsigned char conn_from_host;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 322
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 323 struct sockaddr_vm addr;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 324 struct sock *sk, *new = NULL;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 325 struct vsock_sock *vnew = NULL;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 326 struct hvsock *hvs = NULL;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 327 struct hvsock *hvs_new = NULL;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 328 int rcvbuf;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 329 int ret;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 330 int sndbuf;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 331
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 332 if_type = &chan->offermsg.offer.if_type;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 333 if_instance = &chan->offermsg.offer.if_instance;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 334 conn_from_host = chan->offermsg.offer.u.pipe.user_def[0];
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 335
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 336 /* The host or the VM should only listen on a port in
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 337 * [0, MAX_LISTEN_PORT]
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 338 */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 339 if (!is_valid_srv_id(if_type) ||
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 340 get_port_by_srv_id(if_type) > MAX_LISTEN_PORT)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 341 return;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 342
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 343 hvs_addr_init(&addr, conn_from_host ? if_type : if_instance);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 344 sk = vsock_find_bound_socket(&addr);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 345 if (!sk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 346 return;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 347
b4562ca7925a3be Dexuan Cui 2017-10-19 348 lock_sock(sk);
3b4477d2dcf2709 Stefan Hajnoczi 2017-10-05 349 if ((conn_from_host && sk->sk_state != TCP_LISTEN) ||
3b4477d2dcf2709 Stefan Hajnoczi 2017-10-05 350 (!conn_from_host && sk->sk_state != TCP_SYN_SENT))
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 351 goto out;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 352
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 353 if (conn_from_host) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 354 if (sk->sk_ack_backlog >= sk->sk_max_ack_backlog)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 355 goto out;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 356
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 357 new = __vsock_create(sock_net(sk), NULL, sk, GFP_KERNEL,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 358 sk->sk_type, 0);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 359 if (!new)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 360 goto out;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 361
3b4477d2dcf2709 Stefan Hajnoczi 2017-10-05 362 new->sk_state = TCP_SYN_SENT;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 363 vnew = vsock_sk(new);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 364 hvs_new = vnew->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 365 hvs_new->chan = chan;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 366 } else {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 367 hvs = vsock_sk(sk)->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 368 hvs->chan = chan;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 369 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 370
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 371 set_channel_read_mode(chan, HV_CALL_DIRECT);
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 372
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 373 /* Use the socket buffer sizes as hints for the VMBUS ring size. For
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 374 * server side sockets, 'sk' is the parent socket and thus, this will
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 375 * allow the child sockets to inherit the size from the parent. Keep
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 376 * the minimums at the default value and align to page size as per VMBUS
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 377 * requirements.
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 378 * For the max, the socket core library limits the socket buffer size
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 379 * that the user can set, but since the hv_sock VMBUS ring buffer is
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 380 * currently a physically contiguous allocation, restrict it
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 381 * further.
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 382 * Older versions of the hv_sock host side code cannot handle a bigger
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 383 * VMBUS ring buffer size. Use the version number to limit the change
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 384 * to newer versions.
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 385 */
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 386 if (vmbus_proto_version < VERSION_WIN10_V5) {
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 387 sndbuf = RINGBUFFER_HVS_SND_SIZE;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 388 rcvbuf = RINGBUFFER_HVS_RCV_SIZE;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 389 } else {
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 @390 sndbuf = max_t(int, sk->sk_sndbuf, RINGBUFFER_HVS_SND_SIZE);
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 391 sndbuf = min_t(int, sndbuf, RINGBUFFER_HVS_MAX_SIZE);
31113cc83e30924 Himadri Pandya 2019-07-25 392 sndbuf = ALIGN(sndbuf, HV_HYP_PAGE_SIZE);
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 393 rcvbuf = max_t(int, sk->sk_rcvbuf, RINGBUFFER_HVS_RCV_SIZE);
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 394 rcvbuf = min_t(int, rcvbuf, RINGBUFFER_HVS_MAX_SIZE);
31113cc83e30924 Himadri Pandya 2019-07-25 395 rcvbuf = ALIGN(rcvbuf, HV_HYP_PAGE_SIZE);
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 396 }
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 397
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 398 ret = vmbus_open(chan, sndbuf, rcvbuf, NULL, 0, hvs_channel_cb,
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 399 conn_from_host ? new : sk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 400 if (ret != 0) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 401 if (conn_from_host) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 402 hvs_new->chan = NULL;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 403 sock_put(new);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 404 } else {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 405 hvs->chan = NULL;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 406 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 407 goto out;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 408 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 409
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 410 set_per_channel_state(chan, conn_from_host ? new : sk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 411 vmbus_set_chn_rescind_callback(chan, hvs_close_connection);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 412
cb359b60416701c Sunil Muthuswamy 2019-06-17 413 /* Set the pending send size to max packet size to always get
cb359b60416701c Sunil Muthuswamy 2019-06-17 414 * notifications from the host when there is enough writable space.
cb359b60416701c Sunil Muthuswamy 2019-06-17 415 * The host is optimized to send notifications only when the pending
cb359b60416701c Sunil Muthuswamy 2019-06-17 416 * size boundary is crossed, and not always.
cb359b60416701c Sunil Muthuswamy 2019-06-17 417 */
cb359b60416701c Sunil Muthuswamy 2019-06-17 418 hvs_set_channel_pending_send_size(chan);
cb359b60416701c Sunil Muthuswamy 2019-06-17 419
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 420 if (conn_from_host) {
3b4477d2dcf2709 Stefan Hajnoczi 2017-10-05 421 new->sk_state = TCP_ESTABLISHED;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 422 sk->sk_ack_backlog++;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 423
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 424 hvs_addr_init(&vnew->local_addr, if_type);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 425 hvs_remote_addr_init(&vnew->remote_addr, &vnew->local_addr);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 426
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 427 hvs_new->vm_srv_id = *if_type;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 428 hvs_new->host_srv_id = *if_instance;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 429
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 430 vsock_insert_connected(vnew);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 431
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 432 vsock_enqueue_accept(sk, new);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 433 } else {
3b4477d2dcf2709 Stefan Hajnoczi 2017-10-05 434 sk->sk_state = TCP_ESTABLISHED;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 435 sk->sk_socket->state = SS_CONNECTED;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 436
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 437 vsock_insert_connected(vsock_sk(sk));
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 438 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 439
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 440 sk->sk_state_change(sk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 441
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 442 out:
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 443 /* Release refcnt obtained when we called vsock_find_bound_socket() */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 444 sock_put(sk);
b4562ca7925a3be Dexuan Cui 2017-10-19 445
b4562ca7925a3be Dexuan Cui 2017-10-19 446 release_sock(sk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 447 }
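On newer hosts, the ring-size hint in hvs_open_connection() reduces to clamp-then-align: raise the socket buffer size to the default, cap it at the maximum, then round up to a Hyper-V page. The sketch below is a standalone model of that arithmetic. It assumes a 4 KiB Hyper-V page and uses illustrative values of a 6-page default and a 64-page cap. The driver's real RINGBUFFER_HVS_* constants are defined elsewhere in the file and may differ. ring_size_hint() is a hypothetical name.

```c
/* Illustrative values only: 4 KiB Hyper-V page, 6-page default, 64-page cap. */
#define HYP_PAGE_SIZE 4096
#define RING_DEFAULT  (6 * HYP_PAGE_SIZE)
#define RING_MAX      (64 * HYP_PAGE_SIZE)

/* Round x up to a multiple of a (a power of two), like the kernel's ALIGN(). */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

static int ring_size_hint(int sockbuf, int host_supports_big_rings)
{
	int size;

	/* Models vmbus_proto_version < VERSION_WIN10_V5: fixed default size. */
	if (!host_supports_big_rings)
		return RING_DEFAULT;

	size = sockbuf > RING_DEFAULT ? sockbuf : RING_DEFAULT; /* max_t */
	size = size < RING_MAX ? size : RING_MAX;               /* min_t */
	return ALIGN_UP(size, HYP_PAGE_SIZE);                   /* ALIGN */
}
```

With these assumed constants, ring_size_hint(100000, 1) rounds 100000 up to 102400 (25 pages). A 1 GiB request is capped at 262144, and an older host always gets 24576.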
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 448
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 449 static u32 hvs_get_local_cid(void)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 450 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 451 return VMADDR_CID_ANY;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 452 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 453
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 454 static int hvs_sock_init(struct vsock_sock *vsk, struct vsock_sock *psk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 455 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 456 struct hvsock *hvs;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 457 struct sock *sk = sk_vsock(vsk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 458
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 459 hvs = kzalloc(sizeof(*hvs), GFP_KERNEL);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 460 if (!hvs)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 461 return -ENOMEM;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 462
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 463 vsk->trans = hvs;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 464 hvs->vsk = vsk;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 465 sk->sk_sndbuf = RINGBUFFER_HVS_SND_SIZE;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 466 sk->sk_rcvbuf = RINGBUFFER_HVS_RCV_SIZE;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 467 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 468 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 469
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 470 static int hvs_connect(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 471 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 472 union hvs_service_id vm, host;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 473 struct hvsock *h = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 474
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 475 vm.srv_id = srv_id_template;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 476 vm.svm_port = vsk->local_addr.svm_port;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 477 h->vm_srv_id = vm.srv_id;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 478
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 479 host.srv_id = srv_id_template;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 480 host.svm_port = vsk->remote_addr.svm_port;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 481 h->host_srv_id = host.srv_id;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 482
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 483 return vmbus_send_tl_connect_request(&h->vm_srv_id, &h->host_srv_id);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 484 }
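hvs_connect() derives both service IDs by overlaying a 32-bit port on the first four bytes of a fixed GUID template. The sketch below shows that overlay, using a plain 16-byte array in place of the kernel's UUID type. The template bytes are believed to encode srv_id_template (00000000-facb-11e6-bd58-64006a7986d3) in little-endian GUID layout, but treat them as an assumption. guid16, union service_id, make_service_id() and port_of() are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel's 16-byte GUID type. */
typedef struct { uint8_t b[16]; } guid16;

/* Assumed encoding of srv_id_template, 00000000-facb-11e6-bd58-64006a7986d3,
 * with Data1..Data3 stored little-endian as in a uuid_le/guid_t. */
static const guid16 srv_id_template = { {
	0x00, 0x00, 0x00, 0x00, 0xcb, 0xfa, 0xe6, 0x11,
	0xbd, 0x58, 0x64, 0x00, 0x6a, 0x79, 0x86, 0xd3 } };

/* Overlay a host-endian 32-bit port on bytes 0..3, like union hvs_service_id. */
union service_id {
	guid16 srv_id;
	struct {
		uint32_t port;
		uint8_t rest[12];
	} p;
};

static guid16 make_service_id(const guid16 *tmpl, uint32_t port)
{
	union service_id id;

	id.srv_id = *tmpl; /* copy the fixed trailing 12 bytes */
	id.p.port = port;  /* then replace the first 4 bytes with the port */
	return id.srv_id;
}

static uint32_t port_of(const guid16 *srv_id)
{
	union service_id id;

	id.srv_id = *srv_id;
	return id.p.port;
}
```

On a little-endian guest such as x86, this places the port in the GUID's Data1 field and leaves the other 12 bytes identical to the template.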
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 485
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 486 static void hvs_shutdown_lock_held(struct hvsock *hvs, int mode)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 487 {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 488 struct vmpipe_proto_header hdr;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 489
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 490 if (hvs->fin_sent || !hvs->chan)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 491 return;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 492
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 493 /* It can't fail: see hvs_channel_writable_bytes(). */
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 494 (void)hvs_send_data(hvs->chan, (struct hvs_send_buf *)&hdr, 0);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 495 hvs->fin_sent = true;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 496 }
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 497
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 498 static int hvs_shutdown(struct vsock_sock *vsk, int mode)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 499 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 500 struct sock *sk = sk_vsock(vsk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 501
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 502 if (!(mode & SEND_SHUTDOWN))
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 503 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 504
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 505 lock_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 506 hvs_shutdown_lock_held(vsk->trans, mode);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 507 release_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 508 return 0;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 509 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 510
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 511 static void hvs_close_timeout(struct work_struct *work)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 512 {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 513 struct vsock_sock *vsk =
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 514 container_of(work, struct vsock_sock, close_work.work);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 515 struct sock *sk = sk_vsock(vsk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 516
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 517 sock_hold(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 518 lock_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 519 if (!sock_flag(sk, SOCK_DONE))
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 520 hvs_do_close_lock_held(vsk, false);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 521
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 522 vsk->close_work_scheduled = false;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 523 release_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 524 sock_put(sk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 525 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 526
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 527 /* Returns true if it is safe to remove the socket; false otherwise */
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 528 static bool hvs_close_lock_held(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 529 {
b4562ca7925a3be Dexuan Cui 2017-10-19 530 struct sock *sk = sk_vsock(vsk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 531
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 532 if (!(sk->sk_state == TCP_ESTABLISHED ||
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 533 sk->sk_state == TCP_CLOSING))
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 534 return true;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 535
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 536 if ((sk->sk_shutdown & SHUTDOWN_MASK) != SHUTDOWN_MASK)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 537 hvs_shutdown_lock_held(vsk->trans, SHUTDOWN_MASK);
b4562ca7925a3be Dexuan Cui 2017-10-19 538
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 539 if (sock_flag(sk, SOCK_DONE))
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 540 return true;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 541
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 542 /* This reference will be dropped by the delayed close routine */
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 543 sock_hold(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 544 INIT_DELAYED_WORK(&vsk->close_work, hvs_close_timeout);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 545 vsk->close_work_scheduled = true;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 546 schedule_delayed_work(&vsk->close_work, HVS_CLOSE_TIMEOUT);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 547 return false;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 548 }
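hvs_close_lock_held() chooses between removing the socket immediately and deferring removal until the peer closes or HVS_CLOSE_TIMEOUT expires. The decision can be modeled as a pure function. close_decision(), the enum values and the fin_requested flag are hypothetical. The flag only records that the real code would call hvs_shutdown_lock_held(), which itself skips the FIN if one was already sent.

```c
#include <stdbool.h>

/* Only the states hvs_close_lock_held() tests for are distinguished. */
enum sk_state_model { ST_ESTABLISHED, ST_CLOSING, ST_OTHER };

enum close_action {
	REMOVE_NOW,    /* caller may vsock_remove_sock() right away */
	WAIT_FOR_PEER, /* hold a reference and arm the close timeout */
};

static enum close_action close_decision(enum sk_state_model st,
					bool fully_shut_down, bool sock_done,
					bool *fin_requested)
{
	/* Not connected: nothing to wait for. */
	if (st != ST_ESTABLISHED && st != ST_CLOSING)
		return REMOVE_NOW;

	/* Not yet shut down in both directions: send our FIN. */
	if (!fully_shut_down)
		*fin_requested = true;

	/* Peer already closed (SOCK_DONE): safe to remove now. */
	if (sock_done)
		return REMOVE_NOW;

	return WAIT_FOR_PEER;
}
```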
b4562ca7925a3be Dexuan Cui 2017-10-19 549
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 550 static void hvs_release(struct vsock_sock *vsk)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 551 {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 552 struct sock *sk = sk_vsock(vsk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 553 bool remove_sock;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 554
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 555 lock_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 556 remove_sock = hvs_close_lock_held(vsk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 557 release_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 558 if (remove_sock)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 559 vsock_remove_sock(vsk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 560 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 561
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 562 static void hvs_destruct(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 563 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 564 struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 565 struct vmbus_channel *chan = hvs->chan;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 566
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 567 if (chan)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 568 vmbus_hvsock_device_unregister(chan);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 569
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 570 kfree(hvs);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 571 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 572
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 573 static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 574 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 575 return -EOPNOTSUPP;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 576 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 577
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 578 static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 579 size_t len, int flags)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 580 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 581 return -EOPNOTSUPP;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 582 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 583
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 584 static int hvs_dgram_enqueue(struct vsock_sock *vsk,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 585 struct sockaddr_vm *remote, struct msghdr *msg,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 586 size_t dgram_len)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 587 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 588 return -EOPNOTSUPP;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 589 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 590
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 591 static bool hvs_dgram_allow(u32 cid, u32 port)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 592 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 593 return false;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 594 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 595
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 596 static int hvs_update_recv_data(struct hvsock *hvs)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 597 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 598 struct hvs_recv_buf *recv_buf;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 599 u32 payload_len;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 600
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 601 recv_buf = (struct hvs_recv_buf *)(hvs->recv_desc + 1);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 602 payload_len = recv_buf->hdr.data_size;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 603
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 604 if (payload_len > HVS_MTU_SIZE)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 605 return -EIO;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 606
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 607 if (payload_len == 0)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 608 hvs->vsk->peer_shutdown |= SEND_SHUTDOWN;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 609
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 610 hvs->recv_data_len = payload_len;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 611 hvs->recv_data_off = 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 612
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 613 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 614 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 615
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 616 static ssize_t hvs_stream_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 617 size_t len, int flags)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 618 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 619 struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 620 bool need_refill = !hvs->recv_desc;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 621 struct hvs_recv_buf *recv_buf;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 622 u32 to_read;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 623 int ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 624
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 625 if (flags & MSG_PEEK)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 626 return -EOPNOTSUPP;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 627
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 628 if (need_refill) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 629 hvs->recv_desc = hv_pkt_iter_first(hvs->chan);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 630 ret = hvs_update_recv_data(hvs);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 631 if (ret)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 632 return ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 633 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 634
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 635 recv_buf = (struct hvs_recv_buf *)(hvs->recv_desc + 1);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 636 to_read = min_t(u32, len, hvs->recv_data_len);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 637 ret = memcpy_to_msg(msg, recv_buf->data + hvs->recv_data_off, to_read);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 638 if (ret != 0)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 639 return ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 640
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 641 hvs->recv_data_len -= to_read;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 642 if (hvs->recv_data_len == 0) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 643 hvs->recv_desc = hv_pkt_iter_next(hvs->chan, hvs->recv_desc);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 644 if (hvs->recv_desc) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 645 ret = hvs_update_recv_data(hvs);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 646 if (ret)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 647 return ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 648 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 649 } else {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 650 hvs->recv_data_off += to_read;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 651 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 652
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 653 return to_read;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 654 }
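hvs_stream_dequeue() never copies across a packet boundary in a single call. It tracks the bytes left (recv_data_len) and the read offset (recv_data_off) inside the current packet, and advances to the next packet only when the current one is drained. The sketch below models that bookkeeping in user space, using an in-memory array of packets in place of the VMBus ring iterator. struct pkt, struct reader and dequeue() are hypothetical. Zero-length (FIN) packets and MSG_PEEK are not modeled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for packets on the VMBus ring. */
struct pkt { const char *data; size_t len; };

struct reader {
	const struct pkt *pkts;
	size_t npkts;
	size_t cur;      /* index of current packet (like recv_desc) */
	size_t data_len; /* bytes left in current packet (recv_data_len) */
	size_t data_off; /* read offset in current packet (recv_data_off) */
	int primed;      /* 0 until the first packet is loaded */
};

/* Like hvs_update_recv_data(): reset the length and offset for r->cur. */
static void load(struct reader *r)
{
	r->data_len = r->cur < r->npkts ? r->pkts[r->cur].len : 0;
	r->data_off = 0;
}

/* Copy at most len bytes from the current packet; returns bytes copied. */
static size_t dequeue(struct reader *r, char *out, size_t len)
{
	size_t to_read;

	if (!r->primed) { /* first call: fetch the first packet */
		r->cur = 0;
		load(r);
		r->primed = 1;
	}
	if (r->cur >= r->npkts)
		return 0;

	to_read = len < r->data_len ? len : r->data_len;
	memcpy(out, r->pkts[r->cur].data + r->data_off, to_read);

	r->data_len -= to_read;
	if (r->data_len == 0) { /* drained: move to the next packet */
		r->cur++;
		load(r);
	} else {                /* partial read: remember where we stopped */
		r->data_off += to_read;
	}
	return to_read;
}
```

For packets "hello" and "world!", reads of 3, 10 and 10 bytes return "hel", "lo" and "world!". The second read stops at the packet boundary even though more data is queued.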
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 655
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 656 static ssize_t hvs_stream_enqueue(struct vsock_sock *vsk, struct msghdr *msg,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 657 size_t len)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 658 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 659 struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 660 struct vmbus_channel *chan = hvs->chan;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 661 struct hvs_send_buf *send_buf;
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 662 ssize_t to_write, max_writable;
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 663 ssize_t ret = 0;
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 664 ssize_t bytes_written = 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 665
31113cc83e30924 Himadri Pandya 2019-07-25 666 BUILD_BUG_ON(sizeof(*send_buf) != HV_HYP_PAGE_SIZE);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 667
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 668 send_buf = kmalloc(sizeof(*send_buf), GFP_KERNEL);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 669 if (!send_buf)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 670 return -ENOMEM;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 671
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 672 /* Reader(s) could be draining data from the channel as we write.
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 673 * Maximize bandwidth by iterating until the channel is found to be
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 674 * full.
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 675 */
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 676 while (len) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 677 max_writable = hvs_channel_writable_bytes(chan);
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 678 if (!max_writable)
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 679 break;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 680 to_write = min_t(ssize_t, len, max_writable);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 681 to_write = min_t(ssize_t, to_write, HVS_SEND_BUF_SIZE);
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 682 /* memcpy_from_msg() is safe to call in a loop, as it advances the
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 683 * offsets within the message iterator.
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 684 */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 685 ret = memcpy_from_msg(send_buf->data, msg, to_write);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 686 if (ret < 0)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 687 goto out;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 688
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 689 ret = hvs_send_data(hvs->chan, send_buf, to_write);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 690 if (ret < 0)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 691 goto out;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 692
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 693 bytes_written += to_write;
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 694 len -= to_write;
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 695 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 696 out:
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 697 /* If any data has been sent, return that */
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 698 if (bytes_written)
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 699 ret = bytes_written;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 700 kfree(send_buf);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 701 return ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 702 }
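hvs_stream_enqueue() keeps writing while readers drain the ring. It caps each packet at both the writable space and HVS_SEND_BUF_SIZE, and reports partial progress rather than an error. The sketch below models that loop with a toy channel whose writable space is a plain counter; the real hvs_channel_writable_bytes() also accounts for packet headers. SEND_CHUNK, struct chan_model and enqueue() are hypothetical, and error paths are omitted.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-packet payload cap standing in for HVS_SEND_BUF_SIZE. */
#define SEND_CHUNK 4000

/* Toy channel: 'space' is how many payload bytes are currently writable. */
struct chan_model {
	size_t space;
	size_t sent;
};

/* Send len bytes in chunks until done or the channel is full; like the
 * driver, return the number of bytes accepted (possibly 0). */
static ssize_t enqueue(struct chan_model *c, size_t len)
{
	ssize_t written = 0;

	while (len) {
		size_t n = c->space; /* max_writable */

		if (!n)
			break;       /* ring full: stop and report progress */
		if (n > len)
			n = len;
		if (n > SEND_CHUNK)
			n = SEND_CHUNK;
		c->space -= n;       /* "send" one packet of n bytes */
		c->sent += n;
		written += (ssize_t)n;
		len -= n;
	}
	return written;
}
```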
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 703
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 704 static s64 hvs_stream_has_data(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 705 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 706 struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 707 s64 ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 708
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 709 if (hvs->recv_data_len > 0)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 710 return 1;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 711
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 712 switch (hvs_channel_readable_payload(hvs->chan)) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 713 case 1:
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 714 ret = 1;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 715 break;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 716 case 0:
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 717 vsk->peer_shutdown |= SEND_SHUTDOWN;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 718 ret = 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 719 break;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 720 default: /* -1 */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 721 ret = 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 722 break;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 723 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 724
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 725 return ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 726 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 727
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 728 static s64 hvs_stream_has_space(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 729 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 730 struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 731
cb359b60416701c Sunil Muthuswamy 2019-06-17 732 return hvs_channel_writable_bytes(hvs->chan);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 733 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 734
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 735 static u64 hvs_stream_rcvhiwat(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 736 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 737 return HVS_MTU_SIZE + 1;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 738 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 739
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 740 static bool hvs_stream_is_active(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 741 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 742 struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 743
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 744 return hvs->chan != NULL;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 745 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 746
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 747 static bool hvs_stream_allow(u32 cid, u32 port)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 748 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 749 /* The host's port range [MIN_HOST_EPHEMERAL_PORT, 0xFFFFFFFF) is
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 750 * reserved as ephemeral ports, which are used as the host's ports
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 751 * when the host initiates connections.
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 752 *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 753 * Perform this check in the guest so an immediate error is produced
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 754 * instead of a timeout.
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 755 */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 756 if (port > MAX_HOST_LISTEN_PORT)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 757 return false;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 758
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 759 if (cid == VMADDR_CID_HOST)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 760 return true;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 761
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 762 return false;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 763 }
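The admission check in hvs_stream_allow() reduces to two comparisons. The sketch below assumes VMADDR_CID_HOST is 2, its value in <linux/vm_sockets.h>. It also assumes MAX_HOST_LISTEN_PORT is 0x7FFFFFFF, which is believed but not confirmed by this excerpt. stream_allow() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define CID_HOST             2u          /* VMADDR_CID_HOST */
#define MAX_HOST_LISTEN_PORT 0x7FFFFFFFu /* assumed value */

static bool stream_allow(uint32_t cid, uint32_t port)
{
	/* Ports above the listen range are the host's ephemeral ports:
	 * fail immediately instead of waiting for a connect timeout. */
	if (port > MAX_HOST_LISTEN_PORT)
		return false;

	/* hv_sock stream sockets only connect to the host. */
	return cid == CID_HOST;
}
```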
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 764
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 765 static
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 766 int hvs_notify_poll_in(struct vsock_sock *vsk, size_t target, bool *readable)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 767 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 768 struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 769
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 770 *readable = hvs_channel_readable(hvs->chan);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 771 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 772 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 773
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 774 static
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 775 int hvs_notify_poll_out(struct vsock_sock *vsk, size_t target, bool *writable)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 776 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 777 *writable = hvs_stream_has_space(vsk) > 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 778
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 779 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 780 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 781
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 782 static
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 783 int hvs_notify_recv_init(struct vsock_sock *vsk, size_t target,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 784 struct vsock_transport_recv_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 785 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 786 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 787 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 788
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 789 static
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 790 int hvs_notify_recv_pre_block(struct vsock_sock *vsk, size_t target,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 791 struct vsock_transport_recv_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 792 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 793 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 794 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 795
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 796 static
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 797 int hvs_notify_recv_pre_dequeue(struct vsock_sock *vsk, size_t target,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 798 struct vsock_transport_recv_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 799 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 800 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 801 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 802
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 803 static
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 804 int hvs_notify_recv_post_dequeue(struct vsock_sock *vsk, size_t target,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 805 ssize_t copied, bool data_read,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 806 struct vsock_transport_recv_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 807 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 808 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 809 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 810
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 811 static
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 812 int hvs_notify_send_init(struct vsock_sock *vsk,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 813 struct vsock_transport_send_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 814 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 815 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 816 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 817
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 818 static
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 819 int hvs_notify_send_pre_block(struct vsock_sock *vsk,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 820 struct vsock_transport_send_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 821 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 822 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 823 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 824
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 825 static
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 826 int hvs_notify_send_pre_enqueue(struct vsock_sock *vsk,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 827 struct vsock_transport_send_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 828 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 829 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 830 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 831
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 832 static
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 833 int hvs_notify_send_post_enqueue(struct vsock_sock *vsk, ssize_t written,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 834 struct vsock_transport_send_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 835 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 836 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 837 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 838
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 839 static void hvs_set_buffer_size(struct vsock_sock *vsk, u64 val)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 840 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 841 /* Ignored. */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 842 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 843
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 844 static void hvs_set_min_buffer_size(struct vsock_sock *vsk, u64 val)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 845 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 846 /* Ignored. */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 847 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 848
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 849 static void hvs_set_max_buffer_size(struct vsock_sock *vsk, u64 val)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 850 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 851 /* Ignored. */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 852 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 853
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 854 static u64 hvs_get_buffer_size(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 855 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 856 return -ENOPROTOOPT;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 857 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 858
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 859 static u64 hvs_get_min_buffer_size(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 860 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 861 return -ENOPROTOOPT;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 862 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 863
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 864 static u64 hvs_get_max_buffer_size(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 865 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 866 return -ENOPROTOOPT;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 867 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 868
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 869 static struct vsock_transport hvs_transport = {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 870 .get_local_cid = hvs_get_local_cid,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 871
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 872 .init = hvs_sock_init,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 873 .destruct = hvs_destruct,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 874 .release = hvs_release,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 875 .connect = hvs_connect,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 876 .shutdown = hvs_shutdown,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 877
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 878 .dgram_bind = hvs_dgram_bind,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 879 .dgram_dequeue = hvs_dgram_dequeue,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 880 .dgram_enqueue = hvs_dgram_enqueue,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 881 .dgram_allow = hvs_dgram_allow,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 882
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 883 .stream_dequeue = hvs_stream_dequeue,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 884 .stream_enqueue = hvs_stream_enqueue,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 885 .stream_has_data = hvs_stream_has_data,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 886 .stream_has_space = hvs_stream_has_space,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 887 .stream_rcvhiwat = hvs_stream_rcvhiwat,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 888 .stream_is_active = hvs_stream_is_active,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 889 .stream_allow = hvs_stream_allow,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 890
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 891 .notify_poll_in = hvs_notify_poll_in,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 892 .notify_poll_out = hvs_notify_poll_out,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 893 .notify_recv_init = hvs_notify_recv_init,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 894 .notify_recv_pre_block = hvs_notify_recv_pre_block,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 895 .notify_recv_pre_dequeue = hvs_notify_recv_pre_dequeue,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 896 .notify_recv_post_dequeue = hvs_notify_recv_post_dequeue,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 897 .notify_send_init = hvs_notify_send_init,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 898 .notify_send_pre_block = hvs_notify_send_pre_block,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 899 .notify_send_pre_enqueue = hvs_notify_send_pre_enqueue,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 900 .notify_send_post_enqueue = hvs_notify_send_post_enqueue,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 901
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 902 .set_buffer_size = hvs_set_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 903 .set_min_buffer_size = hvs_set_min_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 904 .set_max_buffer_size = hvs_set_max_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 905 .get_buffer_size = hvs_get_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 906 .get_min_buffer_size = hvs_get_min_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 907 .get_max_buffer_size = hvs_get_max_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 908 };
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 909
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 910 static int hvs_probe(struct hv_device *hdev,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 911 const struct hv_vmbus_device_id *dev_id)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 912 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 913 struct vmbus_channel *chan = hdev->channel;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 914
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 915 hvs_open_connection(chan);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 916
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 917 /* Always return success to suppress the unnecessary error message
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 918 * in vmbus_probe(): on error the host will rescind the device in
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 919 * 30 seconds and we can do cleanup at that time in
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 920 * vmbus_onoffer_rescind().
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 921 */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 922 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 923 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 924
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 925 static int hvs_remove(struct hv_device *hdev)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 926 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 927 struct vmbus_channel *chan = hdev->channel;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 928
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 929 vmbus_close(chan);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 930
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 931 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 932 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 933
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 934 /* This isn't really used. See vmbus_match() and vmbus_probe() */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 935 static const struct hv_vmbus_device_id id_table[] = {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 936 {},
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 937 };
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 938
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 939 static struct hv_driver hvs_drv = {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 940 .name = "hv_sock",
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 941 .hvsock = true,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 942 .id_table = id_table,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 943 .probe = hvs_probe,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 944 .remove = hvs_remove,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 945 };
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 946
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 947 static int __init hvs_init(void)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 948 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 949 int ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 950
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 951 if (vmbus_proto_version < VERSION_WIN10)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 952 return -ENODEV;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 953
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 954 ret = vmbus_driver_register(&hvs_drv);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 955 if (ret != 0)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 956 return ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 957
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 958 ret = vsock_core_init(&hvs_transport);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 959 if (ret) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 960 vmbus_driver_unregister(&hvs_drv);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 961 return ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 962 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 963
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 964 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 965 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 966
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 967 static void __exit hvs_exit(void)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 968 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 969 vsock_core_exit();
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 970 vmbus_driver_unregister(&hvs_drv);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 971 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 972
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 973 module_init(hvs_init);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 974 module_exit(hvs_exit);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 975
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 976 MODULE_DESCRIPTION("Hyper-V Sockets");
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 977 MODULE_VERSION("1.0.0");
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 978 MODULE_LICENSE("GPL");
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 979 MODULE_ALIAS_NETPROTO(PF_VSOCK);
:::::: The code at line 214 was first introduced by commit
:::::: ae0078fcf0a5eb3a8623bfb5f988262e0911fdb9 hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)
:::::: TO: Dexuan Cui <decui@microsoft.com>
:::::: CC: David S. Miller <davem@davemloft.net>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
^ permalink raw reply
* Re: [PATCH] hv_sock: use HV_HYP_PAGE_SIZE instead of PAGE_SIZE_4K
From: Himadri Pandya @ 2019-07-27 11:50 UTC (permalink / raw)
To: kbuild test robot
Cc: kbuild-all, mikelley, kys, haiyangz, sthemmin, sashal, davem,
linux-hyperv, netdev, linux-kernel, Himadri Pandya
In-Reply-To: <201907271302.tDRkl9uU%lkp@intel.com>
On 7/27/2019 10:50 AM, kbuild test robot wrote:
> Hi Himadri,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on linus/master]
> [cannot apply to v5.3-rc1 next-20190726]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
This patch should be applied to the linux-next git tree.
Thank you.
- Himadri
^ permalink raw reply
* Re: [PATCH] hv_sock: use HV_HYP_PAGE_SIZE instead of PAGE_SIZE_4K
From: kbuild test robot @ 2019-07-27 5:20 UTC (permalink / raw)
To: Himadri Pandya
Cc: kbuild-all, mikelley, kys, haiyangz, sthemmin, sashal, davem,
linux-hyperv, netdev, linux-kernel, Himadri Pandya
In-Reply-To: <20190725051125.10605-1-himadri18.07@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 4160 bytes --]
Hi Himadri,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linus/master]
[cannot apply to v5.3-rc1 next-20190726]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Himadri-Pandya/hv_sock-use-HV_HYP_PAGE_SIZE-instead-of-PAGE_SIZE_4K/20190726-085229
config: x86_64-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-10) 7.4.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>
All error/warnings (new ones prefixed by >>):
>> net/vmw_vsock/hyperv_transport.c:58:28: error: 'HV_HYP_PAGE_SIZE' undeclared here (not in a function); did you mean 'HV_MESSAGE_SIZE'?
#define HVS_SEND_BUF_SIZE (HV_HYP_PAGE_SIZE - sizeof(struct vmpipe_proto_header))
^
>> net/vmw_vsock/hyperv_transport.c:65:10: note: in expansion of macro 'HVS_SEND_BUF_SIZE'
u8 data[HVS_SEND_BUF_SIZE];
^~~~~~~~~~~~~~~~~
In file included from include/linux/list.h:9:0,
from include/linux/module.h:9,
from net/vmw_vsock/hyperv_transport.c:11:
net/vmw_vsock/hyperv_transport.c: In function 'hvs_open_connection':
>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
__builtin_choose_expr(__safe_cmp(x, y), \
^
include/linux/kernel.h:921:27: note: in expansion of macro '__careful_cmp'
#define max_t(type, x, y) __careful_cmp((type)(x), (type)(y), >)
^~~~~~~~~~~~~
>> net/vmw_vsock/hyperv_transport.c:390:12: note: in expansion of macro 'max_t'
sndbuf = max_t(int, sk->sk_sndbuf, RINGBUFFER_HVS_SND_SIZE);
^~~~~
>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
__builtin_choose_expr(__safe_cmp(x, y), \
^
include/linux/kernel.h:913:27: note: in expansion of macro '__careful_cmp'
#define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
^~~~~~~~~~~~~
>> net/vmw_vsock/hyperv_transport.c:391:12: note: in expansion of macro 'min_t'
sndbuf = min_t(int, sndbuf, RINGBUFFER_HVS_MAX_SIZE);
^~~~~
>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
__builtin_choose_expr(__safe_cmp(x, y), \
^
include/linux/kernel.h:921:27: note: in expansion of macro '__careful_cmp'
#define max_t(type, x, y) __careful_cmp((type)(x), (type)(y), >)
^~~~~~~~~~~~~
net/vmw_vsock/hyperv_transport.c:393:12: note: in expansion of macro 'max_t'
rcvbuf = max_t(int, sk->sk_rcvbuf, RINGBUFFER_HVS_RCV_SIZE);
^~~~~
>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
__builtin_choose_expr(__safe_cmp(x, y), \
^
include/linux/kernel.h:913:27: note: in expansion of macro '__careful_cmp'
#define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
^~~~~~~~~~~~~
net/vmw_vsock/hyperv_transport.c:394:12: note: in expansion of macro 'min_t'
rcvbuf = min_t(int, rcvbuf, RINGBUFFER_HVS_MAX_SIZE);
^~~~~
net/vmw_vsock/hyperv_transport.c: In function 'hvs_stream_enqueue':
>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
__builtin_choose_expr(__safe_cmp(x, y), \
^
include/linux/kernel.h:913:27: note: in expansion of macro '__careful_cmp'
#define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
^~~~~~~~~~~~~
net/vmw_vsock/hyperv_transport.c:681:14: note: in expansion of macro 'min_t'
to_write = min_t(ssize_t, to_write, HVS_SEND_BUF_SIZE);
^~~~~
vim +58 net/vmw_vsock/hyperv_transport.c
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 69531 bytes --]
^ permalink raw reply
* [PATCH] clocksource/drivers: hyperv_timer: Fix CPU offlining by unbinding the timer
From: Dexuan Cui @ 2019-07-27 5:07 UTC (permalink / raw)
To: tglx@linutronix.de, daniel.lezcano@linaro.org,
gregkh@linuxfoundation.org, sashal@kernel.org, Stephen Hemminger,
Haiyang Zhang, KY Srinivasan, Michael Kelley,
linux-hyperv@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Dexuan Cui
Commit fd1fea6834d0 says "No behavior is changed", but it actually
removes the clockevents_unbind_device() call from hv_synic_cleanup().
In the discussion earlier this month, I thought the unbind call was
unnecessary (see https://www.spinics.net/lists/arm-kernel/msg739888.html).
However, further investigation shows that when a VM runs on Hyper-V, the
unbind call must be kept; otherwise CPU offlining may not work, because a
per-cpu timer device is still needed after hv_synic_cleanup() disables
the per-cpu Hyper-V timer device.
The issue was found during hibernation testing. Here are the details:
1. CPU0 hangs in wait_for_ap_thread() when trying to offline CPU1:
    hibernation_snapshot
      create_image
        suspend_disable_secondary_cpus
          freeze_secondary_cpus
            _cpu_down(1, 1, CPUHP_OFFLINE)
              cpuhp_kick_ap_work
                cpuhp_kick_ap
                  __cpuhp_kick_ap
                    wait_for_ap_thread()
2. CPU0 hangs because CPU1 hangs: after CPU1 disables the per-cpu
Hyper-V timer device in hv_synic_cleanup(), CPU1 still arms a timer.
The steps below show how this can happen.
2.1 Via _cpu_down(1, 1, CPUHP_OFFLINE), CPU0 first tries to move CPU1 to
the CPUHP_TEARDOWN_CPU state, which wakes up the cpuhp/1 thread on CPU1;
the thread is basically a loop that executes the callbacks defined in
the global array cpuhp_hp_states[]: see smpboot_thread_fn().
2.2 This is how a callback is called on CPU1:
    smpboot_thread_fn
      ht->thread_fn(td->cpu), i.e. cpuhp_thread_fun
        cpuhp_invoke_callback
          state = st->state
          st->state--
          cpuhp_get_step(state)->teardown.single()
2.3 Initially, the state of CPU1 is CPUHP_ONLINE, whose .teardown.single
is NULL, so execution returns to the loop in smpboot_thread_fn(), which
then reruns cpuhp_invoke_callback() with a smaller st->state.
2.4 The .teardown.single of every state between CPUHP_ONLINE and
CPUHP_TEARDOWN_CPU runs one by one.
2.5 When the CPUHP_AP_ONLINE_DYN range is reached, hv_synic_cleanup()
runs (see vmbus_bus_init()). It calls hv_stimer_cleanup() ->
hv_ce_shutdown() to disable the per-cpu timer device, so timer interrupts
no longer occur on CPU1.
2.6 Later, the .teardown.single of CPUHP_AP_SMPBOOT_THREADS, i.e.
smpboot_park_threads(), runs and tries to park all the other
hotplug_threads, e.g. ksoftirqd/1 and rcuc/1. On this path a timer can be
armed as shown below, and it never fires because CPU1 no longer has an
active timer device, so CPU1 hangs and cannot be offlined:
    smpboot_park_threads
      smpboot_park_thread
        kthread_park
          wait_task_inactive
            schedule_hrtimeout(&to, HRTIMER_MODE_REL)
With this patch, when the per-cpu Hyper-V timer device is disabled, the
system switches to the Local APIC timer, so the hang cannot happen.
Fixes: fd1fea6834d0 ("clocksource/drivers: Make Hyper-V clocksource ISA agnostic")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
drivers/clocksource/hyperv_timer.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index 41c31a7ac0e4..8f3422c66cbb 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -139,6 +139,7 @@ void hv_stimer_cleanup(unsigned int cpu)
/* Turn off clockevent device */
if (ms_hyperv.features & HV_MSR_SYNTIMER_AVAILABLE) {
ce = per_cpu_ptr(hv_clock_event, cpu);
+ clockevents_unbind_device(ce, cpu);
hv_ce_shutdown(ce);
}
}
--
2.19.1
^ permalink raw reply related
* Re: [PATCH 1/2] Drivers: hv: Specify receive buffer size using Hyper-V page size
From: Stephen Hemminger @ 2019-07-26 16:07 UTC (permalink / raw)
To: Himadri Pandya
Cc: Michael Kelley, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
sashal, linux-hyperv, linux-kernel, himadri18.07
In-Reply-To: <20190725050315.6935-2-himadri18.07@gmail.com>
On Wed, 24 Jul 2019 22:03:14 -0700
"Himadri Pandya" <himadrispandya@gmail.com> wrote:
> The recv_buffer is used to retrieve data from the VMbus ring buffer.
> VMbus ring buffers are sized based on the guest page size which
> Hyper-V assumes to be 4KB. But it may be different on some
> architectures. So use the Hyper-V page size to allocate the
> recv_buffer and set the maximum size to receive.
>
> Signed-off-by: Himadri Pandya <himadri18.07@gmail.com>
If pagesize is 64K, then doing it this way will waste lots of
memory.
^ permalink raw reply
* Re: [PATCH 2/2] Drivers: hv: util: Specify ring buffer size using Hyper-V page size
From: Stephen Hemminger @ 2019-07-26 16:06 UTC (permalink / raw)
To: Himadri Pandya
Cc: Michael Kelley, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
sashal, linux-hyperv, linux-kernel, himadri18.07
In-Reply-To: <20190725050315.6935-3-himadri18.07@gmail.com>
On Wed, 24 Jul 2019 22:03:15 -0700
"Himadri Pandya" <himadrispandya@gmail.com> wrote:
> VMbus ring buffers are sized based on the 4K page size used by
> Hyper-V. The Linux guest page size may not be 4K on all architectures
> so use the Hyper-V page size to specify the ring buffer size.
>
> Signed-off-by: Himadri Pandya <himadri18.07@gmail.com>
> ---
> drivers/hv/hv_util.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c
> index c2c08f26bd5f..766bd8457346 100644
> --- a/drivers/hv/hv_util.c
> +++ b/drivers/hv/hv_util.c
> @@ -413,8 +413,9 @@ static int util_probe(struct hv_device *dev,
>
> hv_set_drvdata(dev, srv);
>
> - ret = vmbus_open(dev->channel, 4 * PAGE_SIZE, 4 * PAGE_SIZE, NULL, 0,
> - srv->util_cb, dev->channel);
> + ret = vmbus_open(dev->channel, 4 * HV_HYP_PAGE_SIZE,
> + 4 * HV_HYP_PAGE_SIZE, NULL, 0, srv->util_cb,
> + dev->channel);
> if (ret)
> goto error;
>
hv_util doesn't need lots of buffering. Why not define a fixed
value across all architectures, maybe rounded up to HV_HYP_PAGE_SIZE?
^ permalink raw reply
* Re: [PATCH v3 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
From: Juergen Gross @ 2019-07-26 7:28 UTC (permalink / raw)
To: Nadav Amit, Andy Lutomirski, Dave Hansen
Cc: Borislav Petkov, Peter Zijlstra, Sasha Levin, x86,
Thomas Gleixner, virtualization, xen-devel, Haiyang Zhang,
K. Y. Srinivasan, Stephen Hemminger, Boris Ostrovsky, Ingo Molnar,
Paolo Bonzini, kvm, linux-hyperv, linux-kernel
In-Reply-To: <20190719005837.4150-5-namit@vmware.com>
On 19.07.19 02:58, Nadav Amit wrote:
> To improve TLB shootdown performance, flush the remote and local TLBs
> concurrently. Introduce flush_tlb_multi() that does so. Introduce
> paravirtual versions of flush_tlb_multi() for KVM, Xen and hyper-v (Xen
> and hyper-v are only compile-tested).
>
> While the updated smp infrastructure is capable of running a function on
> a single local core, it is not optimized for this case. The multiple
> function calls and the indirect branch introduce some overhead, and
> might make local TLB flushes slower than they were before the recent
> changes.
>
> Before calling the SMP infrastructure, check if only a local TLB flush
> is needed, to restore the lost performance in this common case. This
> requires checking mm_cpumask() one more time, but unless this mask is
> updated very frequently, this should not impact performance negatively.
>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Sasha Levin <sashal@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: x86@kernel.org
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: linux-hyperv@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: virtualization@lists.linux-foundation.org
> Cc: kvm@vger.kernel.org
> Cc: xen-devel@lists.xenproject.org
> Signed-off-by: Nadav Amit <namit@vmware.com>
> ---
> arch/x86/hyperv/mmu.c | 10 +++---
> arch/x86/include/asm/paravirt.h | 6 ++--
> arch/x86/include/asm/paravirt_types.h | 4 +--
> arch/x86/include/asm/tlbflush.h | 8 ++---
> arch/x86/include/asm/trace/hyperv.h | 2 +-
> arch/x86/kernel/kvm.c | 11 +++++--
> arch/x86/kernel/paravirt.c | 2 +-
> arch/x86/mm/tlb.c | 47 ++++++++++++++++++---------
> arch/x86/xen/mmu_pv.c | 11 +++----
> include/trace/events/xen.h | 2 +-
> 10 files changed, 62 insertions(+), 41 deletions(-)
Xen and paravirt parts: Reviewed-by: Juergen Gross <jgross@suse.com>
Juergen
^ permalink raw reply
* Re: [PATCH] hv_sock: use HV_HYP_PAGE_SIZE instead of PAGE_SIZE_4K
From: David Miller @ 2019-07-26 0:26 UTC (permalink / raw)
To: himadrispandya
Cc: mikelley, kys, haiyangz, sthemmin, sashal, linux-hyperv, netdev,
linux-kernel, himadri18.07
In-Reply-To: <20190725051125.10605-1-himadri18.07@gmail.com>
From: Himadri Pandya <himadrispandya@gmail.com>
Date: Thu, 25 Jul 2019 05:11:25 +0000
> Older Windows hosts require the hv_sock ring buffer to be defined
> using 4K pages. This was achieved by using the symbol PAGE_SIZE_4K,
> defined specifically for this purpose. But now we have a new symbol,
> HV_HYP_PAGE_SIZE, defined in hyperv-tlfs, which can be used for this.
>
> This patch removes the definition of the symbol PAGE_SIZE_4K and replaces
> its usage with the symbol HV_HYP_PAGE_SIZE. This patch also aligns
> sndbuf and rcvbuf to the Hyper-V page size using HV_HYP_PAGE_SIZE
> instead of the guest page size (PAGE_SIZE), because Hyper-V expects a
> 4K page size, which might not be the guest page size on ARM64.
>
> Signed-off-by: Himadri Pandya <himadri18.07@gmail.com>
This doesn't compile:
CC [M] net/vmw_vsock/hyperv_transport.o
net/vmw_vsock/hyperv_transport.c:58:28: error: ‘HV_HYP_PAGE_SIZE’ undeclared here (not in a function); did you mean ‘HV_MESSAGE_SIZE’?
#define HVS_SEND_BUF_SIZE (HV_HYP_PAGE_SIZE - sizeof(struct vmpipe_proto_header))
^~~~~~~~~~~~~~~~
^ permalink raw reply
* Re: [PATCH net-next] Name NICs based on vmbus offer and enable async probe by default
From: David Miller @ 2019-07-25 18:46 UTC (permalink / raw)
To: haiyangz
Cc: sashal, linux-hyperv, netdev, kys, sthemmin, olaf, vkuznets,
linux-kernel
In-Reply-To: <1563908517-55735-1-git-send-email-haiyangz@microsoft.com>
1) Subject: line lacks proper subsystem prefix
2) No module parameters in networking drivers, sorry. Find some generic way to do
this via devlink or similar.
^ permalink raw reply
* [PATCH] hv_sock: use HV_HYP_PAGE_SIZE instead of PAGE_SIZE_4K
From: Himadri Pandya @ 2019-07-25 5:11 UTC (permalink / raw)
To: mikelley, kys, haiyangz, sthemmin, sashal, davem
Cc: linux-hyperv, netdev, linux-kernel, Himadri Pandya
Older Windows hosts require the hv_sock ring buffer to be defined
using 4K pages. This was achieved by using the symbol PAGE_SIZE_4K,
defined specifically for this purpose. But now we have a new symbol,
HV_HYP_PAGE_SIZE, defined in hyperv-tlfs, which can be used for this.
This patch removes the definition of the symbol PAGE_SIZE_4K and replaces
its usage with the symbol HV_HYP_PAGE_SIZE. This patch also aligns
sndbuf and rcvbuf to the Hyper-V page size using HV_HYP_PAGE_SIZE
instead of the guest page size (PAGE_SIZE), because Hyper-V expects a
4K page size, which might not be the guest page size on ARM64.
Signed-off-by: Himadri Pandya <himadri18.07@gmail.com>
---
net/vmw_vsock/hyperv_transport.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index f2084e3f7aa4..ecb5d72d8010 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -13,15 +13,16 @@
#include <linux/hyperv.h>
#include <net/sock.h>
#include <net/af_vsock.h>
+#include <asm/hyperv-tlfs.h>
/* Older (VMBUS version 'VERSION_WIN10' or before) Windows hosts have some
- * stricter requirements on the hv_sock ring buffer size of six 4K pages. Newer
- * hosts don't have this limitation; but, keep the defaults the same for compat.
+ * stricter requirements on the hv_sock ring buffer size of six 4K pages.
+ * hyperv-tlfs defines HV_HYP_PAGE_SIZE as 4K. Newer hosts don't have this
+ * limitation; but, keep the defaults the same for compat.
*/
-#define PAGE_SIZE_4K 4096
-#define RINGBUFFER_HVS_RCV_SIZE (PAGE_SIZE_4K * 6)
-#define RINGBUFFER_HVS_SND_SIZE (PAGE_SIZE_4K * 6)
-#define RINGBUFFER_HVS_MAX_SIZE (PAGE_SIZE_4K * 64)
+#define RINGBUFFER_HVS_RCV_SIZE (HV_HYP_PAGE_SIZE * 6)
+#define RINGBUFFER_HVS_SND_SIZE (HV_HYP_PAGE_SIZE * 6)
+#define RINGBUFFER_HVS_MAX_SIZE (HV_HYP_PAGE_SIZE * 64)
/* The MTU is 16KB per the host side's design */
#define HVS_MTU_SIZE (1024 * 16)
@@ -54,7 +55,7 @@ struct hvs_recv_buf {
* ringbuffer APIs that allow us to directly copy data from userspace buffer
* to VMBus ringbuffer.
*/
-#define HVS_SEND_BUF_SIZE (PAGE_SIZE_4K - sizeof(struct vmpipe_proto_header))
+#define HVS_SEND_BUF_SIZE (HV_HYP_PAGE_SIZE - sizeof(struct vmpipe_proto_header))
struct hvs_send_buf {
/* The header before the payload data */
@@ -388,10 +389,10 @@ static void hvs_open_connection(struct vmbus_channel *chan)
} else {
sndbuf = max_t(int, sk->sk_sndbuf, RINGBUFFER_HVS_SND_SIZE);
sndbuf = min_t(int, sndbuf, RINGBUFFER_HVS_MAX_SIZE);
- sndbuf = ALIGN(sndbuf, PAGE_SIZE);
+ sndbuf = ALIGN(sndbuf, HV_HYP_PAGE_SIZE);
rcvbuf = max_t(int, sk->sk_rcvbuf, RINGBUFFER_HVS_RCV_SIZE);
rcvbuf = min_t(int, rcvbuf, RINGBUFFER_HVS_MAX_SIZE);
- rcvbuf = ALIGN(rcvbuf, PAGE_SIZE);
+ rcvbuf = ALIGN(rcvbuf, HV_HYP_PAGE_SIZE);
}
ret = vmbus_open(chan, sndbuf, rcvbuf, NULL, 0, hvs_channel_cb,
@@ -662,7 +663,7 @@ static ssize_t hvs_stream_enqueue(struct vsock_sock *vsk, struct msghdr *msg,
ssize_t ret = 0;
ssize_t bytes_written = 0;
- BUILD_BUG_ON(sizeof(*send_buf) != PAGE_SIZE_4K);
+ BUILD_BUG_ON(sizeof(*send_buf) != HV_HYP_PAGE_SIZE);
send_buf = kmalloc(sizeof(*send_buf), GFP_KERNEL);
if (!send_buf)
--
2.17.1
* [PATCH 2/2] Drivers: hv: util: Specify ring buffer size using Hyper-V page size
From: Himadri Pandya @ 2019-07-25 5:03 UTC
To: mikelley, kys, haiyangz, sthemmin, sashal
Cc: linux-hyperv, linux-kernel, Himadri Pandya
In-Reply-To: <20190725050315.6935-1-himadri18.07@gmail.com>
VMbus ring buffers are sized based on the 4K page size used by
Hyper-V. The Linux guest page size may not be 4K on all architectures,
so use the Hyper-V page size to specify the ring buffer size.
Signed-off-by: Himadri Pandya <himadri18.07@gmail.com>
---
drivers/hv/hv_util.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c
index c2c08f26bd5f..766bd8457346 100644
--- a/drivers/hv/hv_util.c
+++ b/drivers/hv/hv_util.c
@@ -413,8 +413,9 @@ static int util_probe(struct hv_device *dev,
hv_set_drvdata(dev, srv);
- ret = vmbus_open(dev->channel, 4 * PAGE_SIZE, 4 * PAGE_SIZE, NULL, 0,
- srv->util_cb, dev->channel);
+ ret = vmbus_open(dev->channel, 4 * HV_HYP_PAGE_SIZE,
+ 4 * HV_HYP_PAGE_SIZE, NULL, 0, srv->util_cb,
+ dev->channel);
if (ret)
goto error;
--
2.17.1
* [PATCH 1/2] Drivers: hv: Specify receive buffer size using Hyper-V page size
From: Himadri Pandya @ 2019-07-25 5:03 UTC
To: mikelley, kys, haiyangz, sthemmin, sashal
Cc: linux-hyperv, linux-kernel, Himadri Pandya
In-Reply-To: <20190725050315.6935-1-himadri18.07@gmail.com>
The recv_buffer is used to retrieve data from the VMbus ring buffer.
VMbus ring buffers are sized based on the guest page size, which
Hyper-V assumes to be 4KB. But the guest page size may differ on some
architectures, so use the Hyper-V page size to allocate the
recv_buffer and to set the maximum size to receive.
Signed-off-by: Himadri Pandya <himadri18.07@gmail.com>
---
drivers/hv/hv_fcopy.c | 3 ++-
drivers/hv/hv_kvp.c | 3 ++-
drivers/hv/hv_snapshot.c | 3 ++-
drivers/hv/hv_util.c | 8 ++++----
4 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
index 7e30ae0635cc..08fa4a5de644 100644
--- a/drivers/hv/hv_fcopy.c
+++ b/drivers/hv/hv_fcopy.c
@@ -13,6 +13,7 @@
#include <linux/workqueue.h>
#include <linux/hyperv.h>
#include <linux/sched.h>
+#include <asm/hyperv-tlfs.h>
#include "hyperv_vmbus.h"
#include "hv_utils_transport.h"
@@ -234,7 +235,7 @@ void hv_fcopy_onchannelcallback(void *context)
if (fcopy_transaction.state > HVUTIL_READY)
return;
- vmbus_recvpacket(channel, recv_buffer, PAGE_SIZE * 2, &recvlen,
+ vmbus_recvpacket(channel, recv_buffer, HV_HYP_PAGE_SIZE * 2, &recvlen,
&requestid);
if (recvlen <= 0)
return;
diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index 5054d1105236..ae7c028dc5a8 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -27,6 +27,7 @@
#include <linux/connector.h>
#include <linux/workqueue.h>
#include <linux/hyperv.h>
+#include <asm/hyperv-tlfs.h>
#include "hyperv_vmbus.h"
#include "hv_utils_transport.h"
@@ -661,7 +662,7 @@ void hv_kvp_onchannelcallback(void *context)
if (kvp_transaction.state > HVUTIL_READY)
return;
- vmbus_recvpacket(channel, recv_buffer, PAGE_SIZE * 4, &recvlen,
+ vmbus_recvpacket(channel, recv_buffer, HV_HYP_PAGE_SIZE * 4, &recvlen,
&requestid);
if (recvlen > 0) {
diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c
index 20ba95b75a94..03b6454268b3 100644
--- a/drivers/hv/hv_snapshot.c
+++ b/drivers/hv/hv_snapshot.c
@@ -12,6 +12,7 @@
#include <linux/connector.h>
#include <linux/workqueue.h>
#include <linux/hyperv.h>
+#include <asm/hyperv-tlfs.h>
#include "hyperv_vmbus.h"
#include "hv_utils_transport.h"
@@ -297,7 +298,7 @@ void hv_vss_onchannelcallback(void *context)
if (vss_transaction.state > HVUTIL_READY)
return;
- vmbus_recvpacket(channel, recv_buffer, PAGE_SIZE * 2, &recvlen,
+ vmbus_recvpacket(channel, recv_buffer, HV_HYP_PAGE_SIZE * 2, &recvlen,
&requestid);
if (recvlen > 0) {
diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c
index e32681ee7b9f..c2c08f26bd5f 100644
--- a/drivers/hv/hv_util.c
+++ b/drivers/hv/hv_util.c
@@ -136,7 +136,7 @@ static void shutdown_onchannelcallback(void *context)
struct icmsg_hdr *icmsghdrp;
vmbus_recvpacket(channel, shut_txf_buf,
- PAGE_SIZE, &recvlen, &requestid);
+ HV_HYP_PAGE_SIZE, &recvlen, &requestid);
if (recvlen > 0) {
icmsghdrp = (struct icmsg_hdr *)&shut_txf_buf[
@@ -284,7 +284,7 @@ static void timesync_onchannelcallback(void *context)
u8 *time_txf_buf = util_timesynch.recv_buffer;
vmbus_recvpacket(channel, time_txf_buf,
- PAGE_SIZE, &recvlen, &requestid);
+ HV_HYP_PAGE_SIZE, &recvlen, &requestid);
if (recvlen > 0) {
icmsghdrp = (struct icmsg_hdr *)&time_txf_buf[
@@ -346,7 +346,7 @@ static void heartbeat_onchannelcallback(void *context)
while (1) {
vmbus_recvpacket(channel, hbeat_txf_buf,
- PAGE_SIZE, &recvlen, &requestid);
+ HV_HYP_PAGE_SIZE, &recvlen, &requestid);
if (!recvlen)
break;
@@ -390,7 +390,7 @@ static int util_probe(struct hv_device *dev,
(struct hv_util_service *)dev_id->driver_data;
int ret;
- srv->recv_buffer = kmalloc(PAGE_SIZE * 4, GFP_KERNEL);
+ srv->recv_buffer = kmalloc(HV_HYP_PAGE_SIZE * 4, GFP_KERNEL);
if (!srv->recv_buffer)
return -ENOMEM;
srv->channel = dev->channel;
--
2.17.1
* [PATCH 0/2] Drivers: hv: Specify buffer size using Hyper-V page size
From: Himadri Pandya @ 2019-07-25 5:03 UTC
To: mikelley, kys, haiyangz, sthemmin, sashal
Cc: linux-hyperv, linux-kernel, Himadri Pandya
recv_buffer and VMbus ring buffers are sized based on the guest page
size, which Hyper-V assumes to be 4KB. That might not be the case on some
architectures, so use the Hyper-V page size instead.
Himadri Pandya (2):
Drivers: hv: Specify receive buffer size using Hyper-V page size
Drivers: hv: util: Specify ring buffer size using Hyper-V page size
drivers/hv/hv_fcopy.c | 3 ++-
drivers/hv/hv_kvp.c | 3 ++-
drivers/hv/hv_snapshot.c | 3 ++-
drivers/hv/hv_util.c | 13 +++++++------
4 files changed, 13 insertions(+), 9 deletions(-)
--
2.17.1
* Re: [PATCH v3] locking/spinlocks, paravirt, hyperv: Correct the hv_nopvspin case
From: Zhenzhong Duan @ 2019-07-24 7:29 UTC
To: linux-kernel
Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
Juergen Gross, Boris Ostrovsky, Peter Zijlstra, Waiman Long,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, linux-hyperv
In-Reply-To: <1562120635-9806-1-git-send-email-zhenzhong.duan@oracle.com>
Hi Maintainers,
Any further comments on this? Thanks
Zhenzhong
On 2019/7/3 10:23, Zhenzhong Duan wrote:
> With the boot parameter "hv_nopvspin" specified, a Hyper-V guest should
> not make use of paravirt spinlocks, but behave as if running on bare
> metal. This is not true, however, as the qspinlock code will fall back
> to a test-and-set scheme when it detects a hypervisor.
>
> To avoid this, disable the virt_spin_lock_key.
>
> The same change for Xen is already in commit e6fd28eb3522
> ("locking/spinlocks, paravirt, xen: Correct the xen_nopvspin case")
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Sasha Levin <sashal@kernel.org>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Waiman Long <longman@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: linux-hyperv@vger.kernel.org
> ---
> v3: remove unlikely() as suggested by Sasha
>
> arch/x86/hyperv/hv_spinlock.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/hyperv/hv_spinlock.c b/arch/x86/hyperv/hv_spinlock.c
> index 07f21a0..210495b 100644
> --- a/arch/x86/hyperv/hv_spinlock.c
> +++ b/arch/x86/hyperv/hv_spinlock.c
> @@ -64,6 +64,9 @@ __visible bool hv_vcpu_is_preempted(int vcpu)
>
> void __init hv_init_spinlocks(void)
> {
> + if (!hv_pvspin)
> + static_branch_disable(&virt_spin_lock_key);
> +
> if (!hv_pvspin || !apic ||
> !(ms_hyperv.hints & HV_X64_CLUSTER_IPI_RECOMMENDED) ||
> !(ms_hyperv.features & HV_X64_MSR_GUEST_IDLE_AVAILABLE)) {
* Re: [PATCH v1] hv_sock: Use consistent types for UUIDs
From: David Miller @ 2019-07-23 20:58 UTC
To: andriy.shevchenko; +Cc: haiyangz, kys, sthemmin, sashal, linux-hyperv, netdev
In-Reply-To: <20190723163943.65991-1-andriy.shevchenko@linux.intel.com>
From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date: Tue, 23 Jul 2019 19:39:43 +0300
> The rest of the Hyper-V code uses the new types for UUID handling.
> Convert hv_sock as well.
>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Applied to net-next.
* [PATCH net-next] Name NICs based on vmbus offer and enable async probe by default
From: Haiyang Zhang @ 2019-07-23 19:02 UTC
To: sashal@kernel.org, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org
Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger, olaf@aepfle.de,
vkuznets, davem@davemloft.net, linux-kernel@vger.kernel.org
Previously, async probing caused NICs to be named in random order.
The patch adds a dev_num field to the vmbus channel structure. It is assigned
the first available number when the channel is offered, so netvsc can
use it for NIC naming based on the channel offer sequence. Now we re-enable
async probing by default for faster probing.
Also add a module parameter, probe_type, to select synchronous probing if
a user wants it.
Fixes: af0a5646cb8d ("use the new async probing feature for the hyperv drivers")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/hv/channel_mgmt.c | 46 +++++++++++++++++++++++++++++++++++++++--
drivers/net/hyperv/netvsc_drv.c | 33 ++++++++++++++++++++++++++---
include/linux/hyperv.h | 4 ++++
3 files changed, 78 insertions(+), 5 deletions(-)
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index addcef5..ab7c05b 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -304,6 +304,8 @@ bool vmbus_prep_negotiate_resp(struct icmsg_hdr *icmsghdrp,
EXPORT_SYMBOL_GPL(vmbus_prep_negotiate_resp);
+#define HV_DEV_NUM_INVALID (-1)
+
/*
* alloc_channel - Allocate and initialize a vmbus channel object
*/
@@ -315,6 +317,8 @@ static struct vmbus_channel *alloc_channel(void)
if (!channel)
return NULL;
+ channel->dev_num = HV_DEV_NUM_INVALID;
+
spin_lock_init(&channel->lock);
init_completion(&channel->rescind_event);
@@ -533,6 +537,42 @@ static void vmbus_add_channel_work(struct work_struct *work)
}
/*
+ * Get the first available device number of its type, then
+ * record it in the channel structure.
+ */
+static void hv_set_devnum(struct vmbus_channel *newchannel)
+{
+ struct vmbus_channel *channel;
+ unsigned int i = 0;
+ bool found;
+
+ BUG_ON(!mutex_is_locked(&vmbus_connection.channel_mutex));
+
+ /* Only HV_NIC uses this number for now */
+ if (hv_get_dev_type(newchannel) != HV_NIC)
+ return;
+
+next:
+ found = false;
+
+ list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
+ if (i == channel->dev_num &&
+ guid_equal(&channel->offermsg.offer.if_type,
+ &newchannel->offermsg.offer.if_type)) {
+ found = true;
+ break;
+ }
+ }
+
+ if (found) {
+ i++;
+ goto next;
+ }
+
+ newchannel->dev_num = i;
+}
+
+/*
* vmbus_process_offer - Process the offer by creating a channel/device
* associated with this offer
*/
@@ -561,10 +601,12 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
}
}
- if (fnew)
+ if (fnew) {
+ hv_set_devnum(newchannel);
+
list_add_tail(&newchannel->listentry,
&vmbus_connection.chn_list);
- else {
+ } else {
/*
* Check to see if this is a valid sub-channel.
*/
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index afdcc56..af53690 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -57,6 +57,10 @@
module_param(debug, int, 0444);
MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
+static unsigned int probe_type __ro_after_init = PROBE_PREFER_ASYNCHRONOUS;
+module_param(probe_type, uint, 0444);
+MODULE_PARM_DESC(probe_type, "Probe type: 1=async(default), 2=sync");
+
static LIST_HEAD(netvsc_dev_list);
static void netvsc_change_rx_flags(struct net_device *net, int change)
@@ -2233,10 +2237,19 @@ static int netvsc_probe(struct hv_device *dev,
struct net_device_context *net_device_ctx;
struct netvsc_device_info *device_info = NULL;
struct netvsc_device *nvdev;
+ char name[IFNAMSIZ];
int ret = -ENOMEM;
- net = alloc_etherdev_mq(sizeof(struct net_device_context),
- VRSS_CHANNEL_MAX);
+ if (probe_type == PROBE_PREFER_ASYNCHRONOUS) {
+ snprintf(name, IFNAMSIZ, "eth%d", dev->channel->dev_num);
+ net = alloc_netdev_mqs(sizeof(struct net_device_context), name,
+ NET_NAME_ENUM, ether_setup,
+ VRSS_CHANNEL_MAX, VRSS_CHANNEL_MAX);
+ } else {
+ net = alloc_etherdev_mq(sizeof(struct net_device_context),
+ VRSS_CHANNEL_MAX);
+ }
+
if (!net)
goto no_net;
@@ -2323,6 +2336,14 @@ static int netvsc_probe(struct hv_device *dev,
net->max_mtu = ETH_DATA_LEN;
ret = register_netdevice(net);
+
+ if (ret == -EEXIST) {
+ pr_info("NIC name %s exists, request another name.\n",
+ net->name);
+ strlcpy(net->name, "eth%d", IFNAMSIZ);
+ ret = register_netdevice(net);
+ }
+
if (ret != 0) {
pr_err("Unable to register netdev.\n");
goto register_failed;
@@ -2407,7 +2428,7 @@ static int netvsc_remove(struct hv_device *dev)
.probe = netvsc_probe,
.remove = netvsc_remove,
.driver = {
- .probe_type = PROBE_FORCE_SYNCHRONOUS,
+ .probe_type = PROBE_PREFER_ASYNCHRONOUS,
},
};
@@ -2473,6 +2494,12 @@ static int __init netvsc_drv_init(void)
}
netvsc_ring_bytes = ring_size * PAGE_SIZE;
+ if (probe_type != PROBE_PREFER_ASYNCHRONOUS)
+ probe_type = PROBE_FORCE_SYNCHRONOUS;
+
+ netvsc_drv.driver.probe_type = probe_type;
+ pr_info("probe_type: %u\n", probe_type);
+
ret = vmbus_driver_register(&netvsc_drv);
if (ret)
return ret;
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 6256cc3..12fc5ea 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -841,6 +841,10 @@ struct vmbus_channel {
*/
struct vmbus_channel *primary_channel;
/*
+ * Used for device naming based on channel offer sequence.
+ */
+ int dev_num;
+ /*
* Support per-channel state for use by vmbus drivers.
*/
void *per_channel_state;
--
1.8.3.1
* RE: [PATCH v1] hv_sock: Use consistent types for UUIDs
From: Dexuan Cui @ 2019-07-23 16:57 UTC
To: Andy Shevchenko, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
Sasha Levin, linux-hyperv@vger.kernel.org, David S. Miller,
netdev@vger.kernel.org
In-Reply-To: <20190723163943.65991-1-andriy.shevchenko@linux.intel.com>
> From: linux-hyperv-owner@vger.kernel.org
> <linux-hyperv-owner@vger.kernel.org> On Behalf Of Andy Shevchenko
> Sent: Tuesday, July 23, 2019 9:40 AM
>
> The rest of the Hyper-V code uses the new types for UUID handling.
> Convert hv_sock as well.
>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Looks good to me. Thanks, Andy!
Thanks,
-- Dexuan
* [PATCH v1] hv_sock: Use consistent types for UUIDs
From: Andy Shevchenko @ 2019-07-23 16:39 UTC
To: Haiyang Zhang, K. Y. Srinivasan, Stephen Hemminger, Sasha Levin,
linux-hyperv, David S. Miller, netdev
Cc: Andy Shevchenko
The rest of the Hyper-V code uses the new types for UUID handling.
Convert hv_sock as well.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
net/vmw_vsock/hyperv_transport.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index f2084e3f7aa4..2a1719c0f8d2 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -77,11 +77,11 @@ struct hvs_send_buf {
VMBUS_PKT_TRAILER_SIZE)
union hvs_service_id {
- uuid_le srv_id;
+ guid_t srv_id;
struct {
unsigned int svm_port;
- unsigned char b[sizeof(uuid_le) - sizeof(unsigned int)];
+ unsigned char b[sizeof(guid_t) - sizeof(unsigned int)];
};
};
@@ -89,8 +89,8 @@ union hvs_service_id {
struct hvsock {
struct vsock_sock *vsk;
- uuid_le vm_srv_id;
- uuid_le host_srv_id;
+ guid_t vm_srv_id;
+ guid_t host_srv_id;
struct vmbus_channel *chan;
struct vmpacket_descriptor *recv_desc;
@@ -159,21 +159,21 @@ struct hvsock {
#define MIN_HOST_EPHEMERAL_PORT (MAX_HOST_LISTEN_PORT + 1)
/* 00000000-facb-11e6-bd58-64006a7986d3 */
-static const uuid_le srv_id_template =
- UUID_LE(0x00000000, 0xfacb, 0x11e6, 0xbd, 0x58,
- 0x64, 0x00, 0x6a, 0x79, 0x86, 0xd3);
+static const guid_t srv_id_template =
+ GUID_INIT(0x00000000, 0xfacb, 0x11e6, 0xbd, 0x58,
+ 0x64, 0x00, 0x6a, 0x79, 0x86, 0xd3);
-static bool is_valid_srv_id(const uuid_le *id)
+static bool is_valid_srv_id(const guid_t *id)
{
- return !memcmp(&id->b[4], &srv_id_template.b[4], sizeof(uuid_le) - 4);
+ return !memcmp(&id->b[4], &srv_id_template.b[4], sizeof(guid_t) - 4);
}
-static unsigned int get_port_by_srv_id(const uuid_le *svr_id)
+static unsigned int get_port_by_srv_id(const guid_t *svr_id)
{
return *((unsigned int *)svr_id);
}
-static void hvs_addr_init(struct sockaddr_vm *addr, const uuid_le *svr_id)
+static void hvs_addr_init(struct sockaddr_vm *addr, const guid_t *svr_id)
{
unsigned int port = get_port_by_srv_id(svr_id);
@@ -316,7 +316,7 @@ static void hvs_close_connection(struct vmbus_channel *chan)
static void hvs_open_connection(struct vmbus_channel *chan)
{
- uuid_le *if_instance, *if_type;
+ guid_t *if_instance, *if_type;
unsigned char conn_from_host;
struct sockaddr_vm addr;
--
2.20.1
* Re: [PATCH v3 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
From: Peter Zijlstra @ 2019-07-22 19:32 UTC
To: Nadav Amit
Cc: Andy Lutomirski, Dave Hansen, the arch/x86 maintainers, LKML,
Thomas Gleixner, Ingo Molnar, K. Y. Srinivasan, Haiyang Zhang,
Stephen Hemminger, Sasha Levin, Borislav Petkov, Juergen Gross,
Paolo Bonzini, Boris Ostrovsky, linux-hyperv@vger.kernel.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
xen-devel@lists.xenproject.org
In-Reply-To: <58DA0841-33C2-4D16-A671-08064A15001C@vmware.com>
On Mon, Jul 22, 2019 at 07:27:09PM +0000, Nadav Amit wrote:
> > On Jul 22, 2019, at 12:14 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > But then we can still do something like the below, which doesn't change
> > things and still gets rid of that dual function crud, simplifying
> > smp_call_function_many again.
> Nice! I will add it on top, if you don’t mind (instead of squashing it).
Not at all.
> The original decision to have local/remote functions was mostly to provide
> the generality.
>
> I would change the last argument of __smp_call_function_many() from “wait”
> to “flags” that would indicate whether to run the function locally, since I
> don’t want to change the semantics of smp_call_function_many() and decide
> whether to run the function locally purely based on the mask. Let me know if
> you disagree.
Agreed.
* Re: [PATCH v3 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
From: Nadav Amit @ 2019-07-22 19:27 UTC
To: Peter Zijlstra
Cc: Andy Lutomirski, Dave Hansen, the arch/x86 maintainers, LKML,
Thomas Gleixner, Ingo Molnar, K. Y. Srinivasan, Haiyang Zhang,
Stephen Hemminger, Sasha Levin, Borislav Petkov, Juergen Gross,
Paolo Bonzini, Boris Ostrovsky, linux-hyperv@vger.kernel.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
xen-devel@lists.xenproject.org
In-Reply-To: <20190722191433.GD6698@worktop.programming.kicks-ass.net>
> On Jul 22, 2019, at 12:14 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Jul 18, 2019 at 05:58:32PM -0700, Nadav Amit wrote:
>> @@ -709,8 +716,9 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
>> * doing a speculative memory access.
>> */
>> if (info->freed_tables) {
>> - smp_call_function_many(cpumask, flush_tlb_func_remote,
>> - (void *)info, 1);
>> + __smp_call_function_many(cpumask, flush_tlb_func_remote,
>> + flush_tlb_func_local,
>> + (void *)info, 1);
>> } else {
>> /*
>> * Although we could have used on_each_cpu_cond_mask(),
>> @@ -737,7 +745,8 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
>> if (tlb_is_not_lazy(cpu))
>> __cpumask_set_cpu(cpu, cond_cpumask);
>> }
>> - smp_call_function_many(cond_cpumask, flush_tlb_func_remote,
>> + __smp_call_function_many(cond_cpumask, flush_tlb_func_remote,
>> + flush_tlb_func_local,
>> (void *)info, 1);
>> }
>> }
>
> Do we really need that _local/_remote distinction? ISTR you had a patch
> that frobbed flush_tlb_info into the csd and that gave space
> constraints, but I'm not seeing that here (probably wise; get stuff
> merged etc.).
>
> struct __call_single_data {
> struct llist_node llist; /* 0 8 */
> smp_call_func_t func; /* 8 8 */
> void * info; /* 16 8 */
> unsigned int flags; /* 24 4 */
>
> /* size: 32, cachelines: 1, members: 4 */
> /* padding: 4 */
> /* last cacheline: 32 bytes */
> };
>
> struct flush_tlb_info {
> struct mm_struct * mm; /* 0 8 */
> long unsigned int start; /* 8 8 */
> long unsigned int end; /* 16 8 */
> u64 new_tlb_gen; /* 24 8 */
> unsigned int stride_shift; /* 32 4 */
> bool freed_tables; /* 36 1 */
>
> /* size: 40, cachelines: 1, members: 6 */
> /* padding: 3 */
> /* last cacheline: 40 bytes */
> };
>
> IIRC what you did was make void *__call_single_data::info the last
> member and a union until the full cacheline size (64). Given the above
> that would get us 24 bytes for csd, leaving us 40 for that
> flush_tlb_info.
>
> But then we can still do something like the below, which doesn't change
> things and still gets rid of that dual function crud, simplifying
> smp_call_function_many again.
>
> Index: linux-2.6/arch/x86/include/asm/tlbflush.h
> ===================================================================
> --- linux-2.6.orig/arch/x86/include/asm/tlbflush.h
> +++ linux-2.6/arch/x86/include/asm/tlbflush.h
> @@ -546,8 +546,9 @@ struct flush_tlb_info {
> unsigned long start;
> unsigned long end;
> u64 new_tlb_gen;
> - unsigned int stride_shift;
> - bool freed_tables;
> + unsigned int cpu;
> + unsigned short stride_shift;
> + unsigned char freed_tables;
> };
>
> #define local_flush_tlb() __flush_tlb()
> Index: linux-2.6/arch/x86/mm/tlb.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/mm/tlb.c
> +++ linux-2.6/arch/x86/mm/tlb.c
> @@ -659,6 +659,27 @@ static void flush_tlb_func_remote(void *
> flush_tlb_func_common(f, false, TLB_REMOTE_SHOOTDOWN);
> }
>
> +static void flush_tlb_func(void *info)
> +{
> + const struct flush_tlb_info *f = info;
> + enum tlb_flush_reason reason = TLB_REMOTE_SHOOTDOWN;
> + bool local = false;
> +
> + if (f->cpu == smp_processor_id()) {
> + local = true;
> + reason = (f->mm == NULL) ? TLB_LOCAL_SHOOTDOWN : TLB_LOCAL_MM_SHOOTDOWN;
> + } else {
> + inc_irq_stat(irq_tlb_count);
> +
> + if (f->mm && f->mm != this_cpu_read(cpu_tlbstate.loaded_mm))
> + return;
> +
> + count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
> + }
> +
> + flush_tlb_func_common(f, local, reason);
> +}
> +
> static bool tlb_is_not_lazy(int cpu)
> {
> return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu);
Nice! I will add it on top, if you don’t mind (instead of squashing it).
The original decision to have local/remote functions was mostly to provide
the generality.
I would change the last argument of __smp_call_function_many() from “wait”
to “flags” that would indicate whether to run the function locally, since I
don’t want to change the semantics of smp_call_function_many() and decide
whether to run the function locally purely based on the mask. Let me know if
you disagree.
* Re: [PATCH v3 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
From: Peter Zijlstra @ 2019-07-22 19:14 UTC
To: Nadav Amit
Cc: Andy Lutomirski, Dave Hansen, x86, linux-kernel, Thomas Gleixner,
Ingo Molnar, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
Sasha Levin, Borislav Petkov, Juergen Gross, Paolo Bonzini,
Boris Ostrovsky, linux-hyperv, virtualization, kvm, xen-devel
In-Reply-To: <20190719005837.4150-5-namit@vmware.com>
On Thu, Jul 18, 2019 at 05:58:32PM -0700, Nadav Amit wrote:
> @@ -709,8 +716,9 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
> * doing a speculative memory access.
> */
> if (info->freed_tables) {
> - smp_call_function_many(cpumask, flush_tlb_func_remote,
> - (void *)info, 1);
> + __smp_call_function_many(cpumask, flush_tlb_func_remote,
> + flush_tlb_func_local,
> + (void *)info, 1);
> } else {
> /*
> * Although we could have used on_each_cpu_cond_mask(),
> @@ -737,7 +745,8 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
> if (tlb_is_not_lazy(cpu))
> __cpumask_set_cpu(cpu, cond_cpumask);
> }
> - smp_call_function_many(cond_cpumask, flush_tlb_func_remote,
> + __smp_call_function_many(cond_cpumask, flush_tlb_func_remote,
> + flush_tlb_func_local,
> (void *)info, 1);
> }
> }
Do we really need that _local/_remote distinction? ISTR you had a patch
that frobbed flush_tlb_info into the csd and that gave space
constraints, but I'm not seeing that here (probably wise; get stuff
merged etc.).
struct __call_single_data {
struct llist_node llist; /* 0 8 */
smp_call_func_t func; /* 8 8 */
void * info; /* 16 8 */
unsigned int flags; /* 24 4 */
/* size: 32, cachelines: 1, members: 4 */
/* padding: 4 */
/* last cacheline: 32 bytes */
};
struct flush_tlb_info {
struct mm_struct * mm; /* 0 8 */
long unsigned int start; /* 8 8 */
long unsigned int end; /* 16 8 */
u64 new_tlb_gen; /* 24 8 */
unsigned int stride_shift; /* 32 4 */
bool freed_tables; /* 36 1 */
/* size: 40, cachelines: 1, members: 6 */
/* padding: 3 */
/* last cacheline: 40 bytes */
};
IIRC what you did was make void *__call_single_data::info the last
member and a union until the full cacheline size (64). Given the above
that would get us 24 bytes for csd, leaving us 40 for that
flush_tlb_info.
But then we can still do something like the below, which doesn't change
things and still gets rid of that dual function crud, simplifying
smp_call_function_many again.
Index: linux-2.6/arch/x86/include/asm/tlbflush.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/tlbflush.h
+++ linux-2.6/arch/x86/include/asm/tlbflush.h
@@ -546,8 +546,9 @@ struct flush_tlb_info {
unsigned long start;
unsigned long end;
u64 new_tlb_gen;
- unsigned int stride_shift;
- bool freed_tables;
+ unsigned int cpu;
+ unsigned short stride_shift;
+ unsigned char freed_tables;
};
#define local_flush_tlb() __flush_tlb()
Index: linux-2.6/arch/x86/mm/tlb.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/tlb.c
+++ linux-2.6/arch/x86/mm/tlb.c
@@ -659,6 +659,27 @@ static void flush_tlb_func_remote(void *
flush_tlb_func_common(f, false, TLB_REMOTE_SHOOTDOWN);
}
+static void flush_tlb_func(void *info)
+{
+ const struct flush_tlb_info *f = info;
+ enum tlb_flush_reason reason = TLB_REMOTE_SHOOTDOWN;
+ bool local = false;
+
+ if (f->cpu == smp_processor_id()) {
+ local = true;
+ reason = (f->mm == NULL) ? TLB_LOCAL_SHOOTDOWN : TLB_LOCAL_MM_SHOOTDOWN;
+ } else {
+ inc_irq_stat(irq_tlb_count);
+
+ if (f->mm && f->mm != this_cpu_read(cpu_tlbstate.loaded_mm))
+ return;
+
+ count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
+ }
+
+ flush_tlb_func_common(f, local, reason);
+}
+
static bool tlb_is_not_lazy(int cpu)
{
return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu);
* Re: [PATCH] hv: Use the correct style for SPDX License Identifier
From: Greg Kroah-Hartman @ 2019-07-22 14:08 UTC
To: Nishad Kamdar
Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
Joe Perches, Uwe Kleine-König, linux-hyperv, linux-kernel
In-Reply-To: <20190722133112.GA7990@nishad>
On Mon, Jul 22, 2019 at 07:01:17PM +0530, Nishad Kamdar wrote:
> This patch corrects the SPDX License Identifier style
> in the trace header file related to Microsoft Hyper-V
> client drivers.
> For C header files Documentation/process/license-rules.rst
> mandates C-like comments (as opposed to C source files, where
> C++-style comments should be used).
>
> Changes made by using a script provided by Joe Perches here:
> https://lkml.org/lkml/2019/2/7/46
>
> Suggested-by: Joe Perches <joe@perches.com>
> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>