* [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams
@ 2024-07-10 21:25 Amery Hung
2024-07-10 21:25 ` [RFC PATCH net-next v6 01/14] af_vsock: generalize vsock_dgram_recvmsg() to all transports Amery Hung
` (15 more replies)
0 siblings, 16 replies; 51+ messages in thread
From: Amery Hung @ 2024-07-10 21:25 UTC (permalink / raw)
To: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers
Cc: dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, ameryhung, xiyou.wangcong
Hey all!
This series introduces support for datagrams to virtio/vsock.
It is a spin-off (and smaller version) of this series from the summer:
https://lore.kernel.org/all/cover.1660362668.git.bobby.eshleman@bytedance.com/
Please note that this is an RFC and should not be merged until
associated changes are made to the virtio specification, which will
follow after discussion from this series.
As another aside, v4 of the series was only lightly tested with a
run of tools/testing/vsock/vsock_test. Some code likely needs cleaning
up, but I'm hoping to get some of the design choices agreed upon before
spending too much time making it pretty.
This series first adds basic datagram support for virtio, and then
optimizes the send path for all datagram transports.
The result is a fast datagram communication protocol that outperforms
even UDP on multi-queue virtio-net with vhost across a variety of
multi-threaded workload samples.
For those that are curious, some summary data comparing UDP and VSOCK
DGRAM (N=5):
vCPUS: 16
virtio-net queues: 16
payload size: 4KB
Setup: bare metal + vm (non-nested)
UDP: 287.59 MB/s
VSOCK DGRAM: 509.2 MB/s
Some notes about the implementation...
This datagram implementation forces datagrams to self-throttle according
to the threshold set by sk_sndbuf. Its effect on throughput and memory
consumption is similar to that of the credits used by streams, but
unlike credits, it is not influenced by the receiving socket.
The device drops packets silently.
As discussed previously, this series introduces datagrams and defers
fairness to future work. See discussion in v2 for more context around
datagrams, fairness, and this implementation.
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Signed-off-by: Amery Hung <amery.hung@bytedance.com>
---
Changes in v6:
- allow empty transport in datagram vsock
- add empty transport checks in various paths
- transport layer now saves source cid and port to control buffer of skb
to remove the dependency of transport in recvmsg()
- fix virtio dgram_enqueue() by looking up the transport to be used when
using sendto(2)
- fix skb memory leaks in two places
- add dgram auto-bind test
- Link to v5: https://lore.kernel.org/r/20230413-b4-vsock-dgram-v5-0-581bd37fdb26@bytedance.com
Changes in v5:
- teach vhost to drop dgram when a datagram exceeds the receive buffer
- now uses MSG_ERRQUEUE and depends on Arseniy's zerocopy patch:
"vsock: read from socket's error queue"
- replace multiple ->dgram_* callbacks with single ->dgram_addr_init()
callback
- refactor virtio dgram skb allocator to reduce conflicts w/ zerocopy series
- add _fallback/_FALLBACK suffix to dgram transport variables/macros
- add WARN_ONCE() for table_size / VSOCK_HASH issue
- add static to vsock_find_bound_socket_common
- dedupe code in vsock_dgram_sendmsg() using module_got var
- drop concurrent sendmsg() for dgram and defer to future series
- Add more tests
- test EHOSTUNREACH in errqueue
- test stream + dgram address collision
- improve clarity of dgram msg bounds test code
- Link to v4: https://lore.kernel.org/r/20230413-b4-vsock-dgram-v4-0-0cebbb2ae899@bytedance.com
Changes in v4:
- style changes
- vsock: use sk_vsock(vsk) in vsock_dgram_recvmsg instead of
&sk->vsk
- vsock: fix xmas tree declaration
- vsock: fix spacing issues
- virtio/vsock: virtio_transport_recv_dgram returns void because err
unused
- sparse analysis warnings/errors
- virtio/vsock: fix uninitialized skerr on destroy
- virtio/vsock: fix uninitialized err var on goto out
- vsock: fix declarations that need static
- vsock: fix __rcu annotation order
- bugs
- vsock: fix null ptr in remote_info code
- vsock/dgram: make transport_dgram a fallback instead of first
priority
- vsock: remove redundant rcu read lock acquire in getname()
- tests
- add more tests (message bounds and more)
- add vsock_dgram_bind() helper
- add vsock_dgram_connect() helper
Changes in v3:
- Support multi-transport dgram, changing logic in connect/bind
to support VMCI case
- Support per-pkt transport lookup for sendto() case
- Fix dgram_allow() implementation
- Fix dgram feature bit number (now it is 3)
- Fix binding so dgram and connectible (cid,port) spaces are
non-overlapping
- RCU-protect the transport ptr so that connect() never lets
a lockless read see the transport and remote_addr out of
sync
- Link to v2: https://lore.kernel.org/r/20230413-b4-vsock-dgram-v2-0-079cc7cee62e@bytedance.com
Bobby Eshleman (14):
af_vsock: generalize vsock_dgram_recvmsg() to all transports
af_vsock: refactor transport lookup code
af_vsock: support multi-transport datagrams
af_vsock: generalize bind table functions
af_vsock: use a separate dgram bind table
virtio/vsock: add VIRTIO_VSOCK_TYPE_DGRAM
virtio/vsock: add common datagram send path
af_vsock: add vsock_find_bound_dgram_socket()
virtio/vsock: add common datagram recv path
virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
vhost/vsock: implement datagram support
vsock/loopback: implement datagram support
virtio/vsock: implement datagram support
test/vsock: add vsock dgram tests
drivers/vhost/vsock.c | 62 +-
include/linux/virtio_vsock.h | 9 +-
include/net/af_vsock.h | 24 +-
include/uapi/linux/virtio_vsock.h | 2 +
net/vmw_vsock/af_vsock.c | 343 ++++++--
net/vmw_vsock/hyperv_transport.c | 13 -
net/vmw_vsock/virtio_transport.c | 24 +-
net/vmw_vsock/virtio_transport_common.c | 188 ++++-
net/vmw_vsock/vmci_transport.c | 61 +-
net/vmw_vsock/vsock_loopback.c | 9 +-
tools/testing/vsock/util.c | 177 +++-
tools/testing/vsock/util.h | 10 +
tools/testing/vsock/vsock_test.c | 1032 ++++++++++++++++++++---
13 files changed, 1638 insertions(+), 316 deletions(-)
--
2.20.1
^ permalink raw reply [flat|nested] 51+ messages in thread
* [RFC PATCH net-next v6 01/14] af_vsock: generalize vsock_dgram_recvmsg() to all transports
2024-07-10 21:25 [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams Amery Hung
@ 2024-07-10 21:25 ` Amery Hung
2024-07-15 8:02 ` Luigi Leonardi
2024-07-29 19:25 ` Arseniy Krasnov
2024-07-10 21:25 ` [RFC PATCH net-next v6 02/14] af_vsock: refactor transport lookup code Amery Hung
` (14 subsequent siblings)
15 siblings, 2 replies; 51+ messages in thread
From: Amery Hung @ 2024-07-10 21:25 UTC (permalink / raw)
To: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers
Cc: dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, ameryhung, xiyou.wangcong
From: Bobby Eshleman <bobby.eshleman@bytedance.com>
This commit drops the transport->dgram_dequeue callback and makes
vsock_dgram_recvmsg() generic to all transports.
To make this possible, two transport-level changes are introduced:
- the transport in the receiving path now stores the cid and port in
the skb's control buffer when populating the skb. This information
is later used to initialize the sockaddr_vm structure in recvmsg()
without referencing vsk->transport.
- transport implementations set the skb->data pointer to the beginning
of the payload prior to adding the skb to the socket's receive queue.
That is, they must use skb_pull() before enqueuing. This is an
agreement between the transport and the socket layer that skb->data
always points to the beginning of the payload (and not, for example,
the packet header).
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Signed-off-by: Amery Hung <amery.hung@bytedance.com>
---
drivers/vhost/vsock.c | 1 -
include/linux/virtio_vsock.h | 5 ---
include/net/af_vsock.h | 11 ++++-
net/vmw_vsock/af_vsock.c | 42 +++++++++++++++++-
net/vmw_vsock/hyperv_transport.c | 7 ---
net/vmw_vsock/virtio_transport.c | 1 -
net/vmw_vsock/virtio_transport_common.c | 9 ----
net/vmw_vsock/vmci_transport.c | 59 +++----------------------
net/vmw_vsock/vsock_loopback.c | 1 -
9 files changed, 55 insertions(+), 81 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index ec20ecff85c7..97fffa914e66 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -419,7 +419,6 @@ static struct virtio_transport vhost_transport = {
.cancel_pkt = vhost_transport_cancel_pkt,
.dgram_enqueue = virtio_transport_dgram_enqueue,
- .dgram_dequeue = virtio_transport_dgram_dequeue,
.dgram_bind = virtio_transport_dgram_bind,
.dgram_allow = virtio_transport_dgram_allow,
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index c82089dee0c8..8b56b8a19ddd 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -177,11 +177,6 @@ virtio_transport_stream_dequeue(struct vsock_sock *vsk,
size_t len,
int type);
int
-virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
- struct msghdr *msg,
- size_t len, int flags);
-
-int
virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
struct msghdr *msg,
size_t len);
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 535701efc1e5..7aa1f5f2b1a5 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -120,8 +120,6 @@ struct vsock_transport {
/* DGRAM. */
int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
- int (*dgram_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
- size_t len, int flags);
int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
struct msghdr *, size_t len);
bool (*dgram_allow)(u32 cid, u32 port);
@@ -219,6 +217,15 @@ void vsock_for_each_connected_socket(struct vsock_transport *transport,
int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
bool vsock_find_cid(unsigned int cid);
+struct vsock_skb_cb {
+ unsigned int src_cid;
+ unsigned int src_port;
+};
+
+static inline struct vsock_skb_cb *vsock_skb_cb(struct sk_buff *skb)
+{
+ return (struct vsock_skb_cb *)skb->cb;
+}
+
/**** TAP ****/
struct vsock_tap {
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 4b040285aa78..5e7d4d99ea2c 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1273,11 +1273,15 @@ static int vsock_dgram_connect(struct socket *sock,
int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
size_t len, int flags)
{
+ struct vsock_skb_cb *vsock_cb;
#ifdef CONFIG_BPF_SYSCALL
const struct proto *prot;
#endif
struct vsock_sock *vsk;
+ struct sk_buff *skb;
+ size_t payload_len;
struct sock *sk;
+ int err;
sk = sock->sk;
vsk = vsock_sk(sk);
@@ -1288,7 +1292,43 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
return prot->recvmsg(sk, msg, len, flags, NULL);
#endif
- return vsk->transport->dgram_dequeue(vsk, msg, len, flags);
+ if (flags & MSG_OOB)
+ return -EOPNOTSUPP;
+
+ if (unlikely(flags & MSG_ERRQUEUE))
+ return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, 0);
+
+ /* Retrieve the head sk_buff from the socket's receive queue. */
+ err = 0;
+ skb = skb_recv_datagram(sk_vsock(vsk), flags, &err);
+ if (!skb)
+ return err;
+
+ payload_len = skb->len;
+
+ if (payload_len > len) {
+ payload_len = len;
+ msg->msg_flags |= MSG_TRUNC;
+ }
+
+ /* Place the datagram payload in the user's iovec. */
+ err = skb_copy_datagram_msg(skb, 0, msg, payload_len);
+ if (err)
+ goto out;
+
+ if (msg->msg_name) {
+ /* Provide the address of the sender. */
+ DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
+
+ vsock_cb = vsock_skb_cb(skb);
+ vsock_addr_init(vm_addr, vsock_cb->src_cid, vsock_cb->src_port);
+ msg->msg_namelen = sizeof(*vm_addr);
+ }
+ err = payload_len;
+
+out:
+ skb_free_datagram(&vsk->sk, skb);
+ return err;
}
EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg);
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index e2157e387217..326dd41ee2d5 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -556,12 +556,6 @@ static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
return -EOPNOTSUPP;
}
-static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
- size_t len, int flags)
-{
- return -EOPNOTSUPP;
-}
-
static int hvs_dgram_enqueue(struct vsock_sock *vsk,
struct sockaddr_vm *remote, struct msghdr *msg,
size_t dgram_len)
@@ -833,7 +827,6 @@ static struct vsock_transport hvs_transport = {
.shutdown = hvs_shutdown,
.dgram_bind = hvs_dgram_bind,
- .dgram_dequeue = hvs_dgram_dequeue,
.dgram_enqueue = hvs_dgram_enqueue,
.dgram_allow = hvs_dgram_allow,
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 43d405298857..a8c97e95622a 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -508,7 +508,6 @@ static struct virtio_transport virtio_transport = {
.cancel_pkt = virtio_transport_cancel_pkt,
.dgram_bind = virtio_transport_dgram_bind,
- .dgram_dequeue = virtio_transport_dgram_dequeue,
.dgram_enqueue = virtio_transport_dgram_enqueue,
.dgram_allow = virtio_transport_dgram_allow,
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 16ff976a86e3..4bf73d20c12a 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -810,15 +810,6 @@ virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
}
EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_enqueue);
-int
-virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
- struct msghdr *msg,
- size_t len, int flags)
-{
- return -EOPNOTSUPP;
-}
-EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
-
s64 virtio_transport_stream_has_data(struct vsock_sock *vsk)
{
struct virtio_vsock_sock *vvs = vsk->trans;
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index b370070194fa..b39df3ed8c8d 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -610,6 +610,7 @@ vmci_transport_datagram_create_hnd(u32 resource_id,
static int vmci_transport_recv_dgram_cb(void *data, struct vmci_datagram *dg)
{
+ struct vsock_skb_cb *vsock_cb;
struct sock *sk;
size_t size;
struct sk_buff *skb;
@@ -637,10 +638,14 @@ static int vmci_transport_recv_dgram_cb(void *data, struct vmci_datagram *dg)
if (!skb)
return VMCI_ERROR_NO_MEM;
+ vsock_cb = vsock_skb_cb(skb);
+ vsock_cb->src_cid = dg->src.context;
+ vsock_cb->src_port = dg->src.resource;
/* sk_receive_skb() will do a sock_put(), so hold here. */
sock_hold(sk);
skb_put(skb, size);
memcpy(skb->data, dg, size);
+ skb_pull(skb, VMCI_DG_HEADERSIZE);
sk_receive_skb(sk, skb, 0);
return VMCI_SUCCESS;
@@ -1731,59 +1736,6 @@ static int vmci_transport_dgram_enqueue(
return err - sizeof(*dg);
}
-static int vmci_transport_dgram_dequeue(struct vsock_sock *vsk,
- struct msghdr *msg, size_t len,
- int flags)
-{
- int err;
- struct vmci_datagram *dg;
- size_t payload_len;
- struct sk_buff *skb;
-
- if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
- return -EOPNOTSUPP;
-
- /* Retrieve the head sk_buff from the socket's receive queue. */
- err = 0;
- skb = skb_recv_datagram(&vsk->sk, flags, &err);
- if (!skb)
- return err;
-
- dg = (struct vmci_datagram *)skb->data;
- if (!dg)
- /* err is 0, meaning we read zero bytes. */
- goto out;
-
- payload_len = dg->payload_size;
- /* Ensure the sk_buff matches the payload size claimed in the packet. */
- if (payload_len != skb->len - sizeof(*dg)) {
- err = -EINVAL;
- goto out;
- }
-
- if (payload_len > len) {
- payload_len = len;
- msg->msg_flags |= MSG_TRUNC;
- }
-
- /* Place the datagram payload in the user's iovec. */
- err = skb_copy_datagram_msg(skb, sizeof(*dg), msg, payload_len);
- if (err)
- goto out;
-
- if (msg->msg_name) {
- /* Provide the address of the sender. */
- DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
- vsock_addr_init(vm_addr, dg->src.context, dg->src.resource);
- msg->msg_namelen = sizeof(*vm_addr);
- }
- err = payload_len;
-
-out:
- skb_free_datagram(&vsk->sk, skb);
- return err;
-}
-
static bool vmci_transport_dgram_allow(u32 cid, u32 port)
{
if (cid == VMADDR_CID_HYPERVISOR) {
@@ -2040,7 +1992,6 @@ static struct vsock_transport vmci_transport = {
.release = vmci_transport_release,
.connect = vmci_transport_connect,
.dgram_bind = vmci_transport_dgram_bind,
- .dgram_dequeue = vmci_transport_dgram_dequeue,
.dgram_enqueue = vmci_transport_dgram_enqueue,
.dgram_allow = vmci_transport_dgram_allow,
.stream_dequeue = vmci_transport_stream_dequeue,
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index 6dea6119f5b2..11488887a5cc 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -66,7 +66,6 @@ static struct virtio_transport loopback_transport = {
.cancel_pkt = vsock_loopback_cancel_pkt,
.dgram_bind = virtio_transport_dgram_bind,
- .dgram_dequeue = virtio_transport_dgram_dequeue,
.dgram_enqueue = virtio_transport_dgram_enqueue,
.dgram_allow = virtio_transport_dgram_allow,
--
2.20.1
* [RFC PATCH net-next v6 02/14] af_vsock: refactor transport lookup code
2024-07-10 21:25 [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams Amery Hung
2024-07-10 21:25 ` [RFC PATCH net-next v6 01/14] af_vsock: generalize vsock_dgram_recvmsg() to all transports Amery Hung
@ 2024-07-10 21:25 ` Amery Hung
2024-07-25 6:29 ` Arseniy Krasnov
2024-07-10 21:25 ` [RFC PATCH net-next v6 03/14] af_vsock: support multi-transport datagrams Amery Hung
` (13 subsequent siblings)
15 siblings, 1 reply; 51+ messages in thread
From: Amery Hung @ 2024-07-10 21:25 UTC (permalink / raw)
To: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers
Cc: dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, ameryhung, xiyou.wangcong
From: Bobby Eshleman <bobby.eshleman@bytedance.com>
Introduce new reusable function vsock_connectible_lookup_transport()
that performs the transport lookup logic.
No functional change intended.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
---
net/vmw_vsock/af_vsock.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 5e7d4d99ea2c..98d10cd30483 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -424,6 +424,22 @@ static void vsock_deassign_transport(struct vsock_sock *vsk)
vsk->transport = NULL;
}
+static const struct vsock_transport *
+vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
+{
+ const struct vsock_transport *transport;
+
+ if (vsock_use_local_transport(cid))
+ transport = transport_local;
+ else if (cid <= VMADDR_CID_HOST || !transport_h2g ||
+ (flags & VMADDR_FLAG_TO_HOST))
+ transport = transport_g2h;
+ else
+ transport = transport_h2g;
+
+ return transport;
+}
+
/* Assign a transport to a socket and call the .init transport callback.
*
* Note: for connection oriented socket this must be called when vsk->remote_addr
@@ -464,13 +480,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
break;
case SOCK_STREAM:
case SOCK_SEQPACKET:
- if (vsock_use_local_transport(remote_cid))
- new_transport = transport_local;
- else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g ||
- (remote_flags & VMADDR_FLAG_TO_HOST))
- new_transport = transport_g2h;
- else
- new_transport = transport_h2g;
+ new_transport = vsock_connectible_lookup_transport(remote_cid,
+ remote_flags);
break;
default:
return -ESOCKTNOSUPPORT;
--
2.20.1
* [RFC PATCH net-next v6 03/14] af_vsock: support multi-transport datagrams
2024-07-10 21:25 [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams Amery Hung
2024-07-10 21:25 ` [RFC PATCH net-next v6 01/14] af_vsock: generalize vsock_dgram_recvmsg() to all transports Amery Hung
2024-07-10 21:25 ` [RFC PATCH net-next v6 02/14] af_vsock: refactor transport lookup code Amery Hung
@ 2024-07-10 21:25 ` Amery Hung
2024-07-15 8:13 ` Arseniy Krasnov
2024-07-28 20:28 ` Arseniy Krasnov
2024-07-10 21:25 ` [RFC PATCH net-next v6 04/14] af_vsock: generalize bind table functions Amery Hung
` (12 subsequent siblings)
15 siblings, 2 replies; 51+ messages in thread
From: Amery Hung @ 2024-07-10 21:25 UTC (permalink / raw)
To: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers
Cc: dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, ameryhung, xiyou.wangcong
From: Bobby Eshleman <bobby.eshleman@bytedance.com>
This patch adds support for multi-transport datagrams.
This includes:
- Allow transport to be undecided (i.e., empty) for non-VMCI datagram
use cases during socket creation.
- connect() now assigns the transport (similar to connectible
sockets)
- Per-packet lookup of transports when using sendto(sockaddr_vm)
- Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in
sockaddr_vm
- Rename VSOCK_TRANSPORT_F_DGRAM to VSOCK_TRANSPORT_F_DGRAM_FALLBACK
* Dynamic transport lookup *
virtio datagrams will follow the h2g/g2h paradigm. Since it is impossible
to know which transport to use during socket creation, the transport is
allowed to remain empty. The transport will be assigned only when
connect() is called. Otherwise, in the sendmsg() path, if sendto() is
used, the cid is used to look up the transport to use. In the
recvmsg() path, since the receive method is generalized and shared by
different transports, there is no longer any need to resolve the transport.
Finally, a couple of checks for empty transport are added in other paths
to prevent null-pointer dereference.
* Compatibility with VMCI *
To preserve backwards compatibility with VMCI, some important changes
are made. The "transport_dgram" / VSOCK_TRANSPORT_F_DGRAM is changed to
be used for dgrams only if there is not yet a g2h or h2g transport that
has been registered that can transmit the packet. If there is a g2h/h2g
transport for that remote address, then that transport will be used and
not "transport_dgram". This essentially makes "transport_dgram" a
fallback transport for when h2g/g2h has not yet gone online, and so it
is renamed "transport_dgram_fallback". VMCI implements this transport.
The logic around "transport_dgram" needs to be retained to prevent
breaking VMCI:
1) VMCI datagrams existed prior to h2g/g2h and so operate under a
different paradigm. When the vmci transport comes online, it registers
itself with the DGRAM feature, but not H2G/G2H. Only later when the
transport has more information about its environment does it register
H2G or G2H. In the case that a datagram socket is created after
VSOCK_TRANSPORT_F_DGRAM registration but before G2H/H2G registration,
the "transport_dgram" transport is the only registered transport and so
needs to be used.
2) VMCI seems to require a special message be sent by the transport when a
datagram socket calls bind(). Under the h2g/g2h model, the transport
is selected using the remote_addr which is set by connect(). At
bind time there is no remote_addr because often no connect() has been
called yet: the transport is null. Therefore, with a null transport
there doesn't seem to be any good way for a datagram socket to tell the
VMCI transport that it has just had bind() called upon it.
With the new fallback logic, after H2G/G2H comes online the socket layer
will access the VMCI transport via transport_{h2g,g2h}. Prior to H2G/G2H
coming online, the socket layer will access the VMCI transport via
"transport_dgram_fallback".
Only transports with a special datagram fallback use-case such as VMCI
need to register VSOCK_TRANSPORT_F_DGRAM_FALLBACK.
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Signed-off-by: Amery Hung <amery.hung@bytedance.com>
---
drivers/vhost/vsock.c | 1 -
include/linux/virtio_vsock.h | 2 -
include/net/af_vsock.h | 10 +-
net/vmw_vsock/af_vsock.c | 127 +++++++++++++++++++-----
net/vmw_vsock/hyperv_transport.c | 6 --
net/vmw_vsock/virtio_transport.c | 1 -
net/vmw_vsock/virtio_transport_common.c | 7 --
net/vmw_vsock/vmci_transport.c | 2 +-
net/vmw_vsock/vsock_loopback.c | 1 -
9 files changed, 107 insertions(+), 50 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 97fffa914e66..fa1aefb78016 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -419,7 +419,6 @@ static struct virtio_transport vhost_transport = {
.cancel_pkt = vhost_transport_cancel_pkt,
.dgram_enqueue = virtio_transport_dgram_enqueue,
- .dgram_bind = virtio_transport_dgram_bind,
.dgram_allow = virtio_transport_dgram_allow,
.stream_enqueue = virtio_transport_stream_enqueue,
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 8b56b8a19ddd..f749a066af46 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -221,8 +221,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
bool virtio_transport_stream_allow(u32 cid, u32 port);
-int virtio_transport_dgram_bind(struct vsock_sock *vsk,
- struct sockaddr_vm *addr);
bool virtio_transport_dgram_allow(u32 cid, u32 port);
int virtio_transport_connect(struct vsock_sock *vsk);
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 7aa1f5f2b1a5..44db8f2c507d 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -96,13 +96,13 @@ struct vsock_transport_send_notify_data {
/* Transport features flags */
/* Transport provides host->guest communication */
-#define VSOCK_TRANSPORT_F_H2G 0x00000001
+#define VSOCK_TRANSPORT_F_H2G 0x00000001
/* Transport provides guest->host communication */
-#define VSOCK_TRANSPORT_F_G2H 0x00000002
-/* Transport provides DGRAM communication */
-#define VSOCK_TRANSPORT_F_DGRAM 0x00000004
+#define VSOCK_TRANSPORT_F_G2H 0x00000002
+/* Transport provides fallback for DGRAM communication */
+#define VSOCK_TRANSPORT_F_DGRAM_FALLBACK 0x00000004
/* Transport provides local (loopback) communication */
-#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
+#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
struct vsock_transport {
struct module *module;
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 98d10cd30483..acc15e11700c 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -140,8 +140,8 @@ struct proto vsock_proto = {
static const struct vsock_transport *transport_h2g;
/* Transport used for guest->host communication */
static const struct vsock_transport *transport_g2h;
-/* Transport used for DGRAM communication */
-static const struct vsock_transport *transport_dgram;
+/* Transport used as a fallback for DGRAM communication */
+static const struct vsock_transport *transport_dgram_fallback;
/* Transport used for local communication */
static const struct vsock_transport *transport_local;
static DEFINE_MUTEX(vsock_register_mutex);
@@ -440,19 +440,20 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
return transport;
}
-/* Assign a transport to a socket and call the .init transport callback.
- *
- * Note: for connection oriented socket this must be called when vsk->remote_addr
- * is set (e.g. during the connect() or when a connection request on a listener
- * socket is received).
- * The vsk->remote_addr is used to decide which transport to use:
- * - remote CID == VMADDR_CID_LOCAL or g2h->local_cid or VMADDR_CID_HOST if
- * g2h is not loaded, will use local transport;
- * - remote CID <= VMADDR_CID_HOST or h2g is not loaded or remote flags field
- * includes VMADDR_FLAG_TO_HOST flag value, will use guest->host transport;
- * - remote CID > VMADDR_CID_HOST will use host->guest transport;
- */
-int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
+static const struct vsock_transport *
+vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
+{
+ const struct vsock_transport *transport;
+
+ transport = vsock_connectible_lookup_transport(cid, flags);
+ if (transport)
+ return transport;
+
+ return transport_dgram_fallback;
+}
+
+static int __vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk,
+ bool create_sock)
{
const struct vsock_transport *new_transport;
struct sock *sk = sk_vsock(vsk);
@@ -476,7 +477,21 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
switch (sk->sk_type) {
case SOCK_DGRAM:
- new_transport = transport_dgram;
+ /* During vsock_create(), the transport cannot be decided yet if
+ * using virtio. While for VMCI, it is transport_dgram_fallback.
+ * Therefore, we try to initialize it to transport_dgram_fallback
+ * so that we don't break VMCI. If VMCI is not present, it is okay
+ * to leave the transport empty since vsk->transport != NULL checks
+ * will be performed in send and receive paths.
+ *
+ * During vsock_dgram_connect(), since remote_cid is available,
+ * the right transport is assigned after lookup.
+ */
+ if (create_sock)
+ new_transport = transport_dgram_fallback;
+ else
+ new_transport = vsock_dgram_lookup_transport(remote_cid,
+ remote_flags);
break;
case SOCK_STREAM:
case SOCK_SEQPACKET:
@@ -501,6 +516,10 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
vsock_deassign_transport(vsk);
}
+ /* Only allow empty transport during vsock_create() for datagram */
+ if (!new_transport && sk->sk_type == SOCK_DGRAM && create_sock)
+ return 0;
+
/* We increase the module refcnt to prevent the transport unloading
* while there are open sockets assigned to it.
*/
@@ -525,6 +544,23 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
return 0;
}
+
+/* Assign a transport to a socket and call the .init transport callback.
+ *
+ * Note: for connection oriented socket this must be called when vsk->remote_addr
+ * is set (e.g. during the connect() or when a connection request on a listener
+ * socket is received).
+ * The vsk->remote_addr is used to decide which transport to use:
+ * - remote CID == VMADDR_CID_LOCAL or g2h->local_cid or VMADDR_CID_HOST if
+ * g2h is not loaded, will use local transport;
+ * - remote CID <= VMADDR_CID_HOST or h2g is not loaded or remote flags field
+ * includes VMADDR_FLAG_TO_HOST flag value, will use guest->host transport;
+ * - remote CID > VMADDR_CID_HOST will use host->guest transport;
+ */
+int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
+{
+ return __vsock_assign_transport(vsk, psk, false);
+}
EXPORT_SYMBOL_GPL(vsock_assign_transport);
bool vsock_find_cid(unsigned int cid)
@@ -693,6 +729,9 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
static int __vsock_bind_dgram(struct vsock_sock *vsk,
struct sockaddr_vm *addr)
{
+ if (!vsk->transport || !vsk->transport->dgram_bind)
+ return -EINVAL;
+
return vsk->transport->dgram_bind(vsk, addr);
}
@@ -825,6 +864,9 @@ static void __vsock_release(struct sock *sk, int level)
vsk->transport->release(vsk);
else if (sock_type_connectible(sk->sk_type))
vsock_remove_sock(vsk);
+ else if (sk->sk_type == SOCK_DGRAM &&
+ (!vsk->transport || !vsk->transport->dgram_bind))
+ vsock_remove_sock(vsk);
sock_orphan(sk);
sk->sk_shutdown = SHUTDOWN_MASK;
@@ -1152,6 +1194,9 @@ static int vsock_read_skb(struct sock *sk, skb_read_actor_t read_actor)
{
struct vsock_sock *vsk = vsock_sk(sk);
+ if (!vsk->transport)
+ return -EINVAL;
+
return vsk->transport->read_skb(vsk, read_actor);
}
@@ -1163,6 +1208,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
struct vsock_sock *vsk;
struct sockaddr_vm *remote_addr;
const struct vsock_transport *transport;
+ bool module_got = false;
if (msg->msg_flags & MSG_OOB)
return -EOPNOTSUPP;
@@ -1174,19 +1220,40 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
lock_sock(sk);
- transport = vsk->transport;
-
err = vsock_auto_bind(vsk);
if (err)
goto out;
-
/* If the provided message contains an address, use that. Otherwise
* fall back on the socket's remote handle (if it has been connected).
*/
if (msg->msg_name &&
vsock_addr_cast(msg->msg_name, msg->msg_namelen,
&remote_addr) == 0) {
+ transport = vsock_dgram_lookup_transport(remote_addr->svm_cid,
+ remote_addr->svm_flags);
+ if (!transport) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ /* transport_dgram_fallback needs to be initialized to be called */
+ if (transport == transport_dgram_fallback && transport != vsk->transport) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ if (!try_module_get(transport->module)) {
+ err = -ENODEV;
+ goto out;
+ }
+
+ /* When looking up a transport dynamically and acquiring a
+ * reference on the module, we need to remember to release the
+ * reference later.
+ */
+ module_got = true;
+
/* Ensure this address is of the right type and is a valid
* destination.
*/
@@ -1201,6 +1268,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
} else if (sock->state == SS_CONNECTED) {
remote_addr = &vsk->remote_addr;
+ transport = vsk->transport;
if (remote_addr->svm_cid == VMADDR_CID_ANY)
remote_addr->svm_cid = transport->get_local_cid();
@@ -1225,6 +1293,8 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
out:
+ if (module_got)
+ module_put(transport->module);
release_sock(sk);
return err;
}
@@ -1257,13 +1327,18 @@ static int vsock_dgram_connect(struct socket *sock,
if (err)
goto out;
+ memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
+
+ err = vsock_assign_transport(vsk, NULL);
+ if (err)
+ goto out;
+
if (!vsk->transport->dgram_allow(remote_addr->svm_cid,
remote_addr->svm_port)) {
err = -EINVAL;
goto out;
}
- memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
sock->state = SS_CONNECTED;
/* sock map disallows redirection of non-TCP sockets with sk_state !=
@@ -2406,7 +2481,7 @@ static int vsock_create(struct net *net, struct socket *sock,
vsk = vsock_sk(sk);
if (sock->type == SOCK_DGRAM) {
- ret = vsock_assign_transport(vsk, NULL);
+ ret = __vsock_assign_transport(vsk, NULL, true);
if (ret < 0) {
sock_put(sk);
return ret;
@@ -2548,7 +2623,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
t_h2g = transport_h2g;
t_g2h = transport_g2h;
- t_dgram = transport_dgram;
+ t_dgram = transport_dgram_fallback;
t_local = transport_local;
if (features & VSOCK_TRANSPORT_F_H2G) {
@@ -2567,7 +2642,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
t_g2h = t;
}
- if (features & VSOCK_TRANSPORT_F_DGRAM) {
+ if (features & VSOCK_TRANSPORT_F_DGRAM_FALLBACK) {
if (t_dgram) {
err = -EBUSY;
goto err_busy;
@@ -2585,7 +2660,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
transport_h2g = t_h2g;
transport_g2h = t_g2h;
- transport_dgram = t_dgram;
+ transport_dgram_fallback = t_dgram;
transport_local = t_local;
err_busy:
@@ -2604,8 +2679,8 @@ void vsock_core_unregister(const struct vsock_transport *t)
if (transport_g2h == t)
transport_g2h = NULL;
- if (transport_dgram == t)
- transport_dgram = NULL;
+ if (transport_dgram_fallback == t)
+ transport_dgram_fallback = NULL;
if (transport_local == t)
transport_local = NULL;
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 326dd41ee2d5..64ad87a3879c 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -551,11 +551,6 @@ static void hvs_destruct(struct vsock_sock *vsk)
kfree(hvs);
}
-static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
-{
- return -EOPNOTSUPP;
-}
-
static int hvs_dgram_enqueue(struct vsock_sock *vsk,
struct sockaddr_vm *remote, struct msghdr *msg,
size_t dgram_len)
@@ -826,7 +821,6 @@ static struct vsock_transport hvs_transport = {
.connect = hvs_connect,
.shutdown = hvs_shutdown,
- .dgram_bind = hvs_dgram_bind,
.dgram_enqueue = hvs_dgram_enqueue,
.dgram_allow = hvs_dgram_allow,
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index a8c97e95622a..4891b845fcde 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -507,7 +507,6 @@ static struct virtio_transport virtio_transport = {
.shutdown = virtio_transport_shutdown,
.cancel_pkt = virtio_transport_cancel_pkt,
- .dgram_bind = virtio_transport_dgram_bind,
.dgram_enqueue = virtio_transport_dgram_enqueue,
.dgram_allow = virtio_transport_dgram_allow,
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 4bf73d20c12a..a1c76836d798 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1008,13 +1008,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
}
EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
-int virtio_transport_dgram_bind(struct vsock_sock *vsk,
- struct sockaddr_vm *addr)
-{
- return -EOPNOTSUPP;
-}
-EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
-
bool virtio_transport_dgram_allow(u32 cid, u32 port)
{
return false;
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index b39df3ed8c8d..49aba9c48415 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -2061,7 +2061,7 @@ static int __init vmci_transport_init(void)
/* Register only with dgram feature, other features (H2G, G2H) will be
* registered when the first host or guest becomes active.
*/
- err = vsock_core_register(&vmci_transport, VSOCK_TRANSPORT_F_DGRAM);
+ err = vsock_core_register(&vmci_transport, VSOCK_TRANSPORT_F_DGRAM_FALLBACK);
if (err < 0)
goto err_unsubscribe;
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index 11488887a5cc..4dd4886f29d1 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -65,7 +65,6 @@ static struct virtio_transport loopback_transport = {
.shutdown = virtio_transport_shutdown,
.cancel_pkt = vsock_loopback_cancel_pkt,
- .dgram_bind = virtio_transport_dgram_bind,
.dgram_enqueue = virtio_transport_dgram_enqueue,
.dgram_allow = virtio_transport_dgram_allow,
--
2.20.1
^ permalink raw reply related [flat|nested] 51+ messages in thread
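The per-message transport pinning added in vsock_dgram_sendmsg() above (the `module_got` flag) can be made concrete with a small userspace sketch: pin the transport's module only when it was looked up dynamically from msg_name, and drop the pin on every exit path. `fake_module`, `fake_try_module_get()`, and `send_one_dgram()` are hypothetical stand-ins for the kernel's module refcounting, not real kernel APIs:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel module with a reference count. */
struct fake_module {
	int refcnt;
	bool live; /* false once the module starts unloading */
};

/* Like try_module_get(): fails if the module is going away. */
static bool fake_try_module_get(struct fake_module *m)
{
	if (!m->live)
		return false;
	m->refcnt++;
	return true;
}

static void fake_module_put(struct fake_module *m)
{
	m->refcnt--;
}

/* Mirrors the sendmsg flow: the module is pinned only for transports
 * looked up dynamically, and module_got remembers whether a reference
 * must be dropped before returning.
 */
static int send_one_dgram(struct fake_module *transport_mod, bool dynamic_lookup)
{
	bool module_got = false;
	int err = 0;

	if (dynamic_lookup) {
		if (!fake_try_module_get(transport_mod))
			return -19; /* -ENODEV */
		module_got = true;
	}

	/* ... dgram_enqueue() would run here ... */

	if (module_got)
		fake_module_put(transport_mod);
	return err;
}
```

The connected-socket path skips the pin entirely because vsk->transport already holds a module reference taken at assignment time.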
* [RFC PATCH net-next v6 04/14] af_vsock: generalize bind table functions
2024-07-10 21:25 [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams Amery Hung
` (2 preceding siblings ...)
2024-07-10 21:25 ` [RFC PATCH net-next v6 03/14] af_vsock: support multi-transport datagrams Amery Hung
@ 2024-07-10 21:25 ` Amery Hung
2024-07-23 14:39 ` Stefano Garzarella
2024-07-10 21:25 ` [RFC PATCH net-next v6 05/14] af_vsock: use a separate dgram bind table Amery Hung
` (11 subsequent siblings)
15 siblings, 1 reply; 51+ messages in thread
From: Amery Hung @ 2024-07-10 21:25 UTC (permalink / raw)
To: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers
Cc: dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, ameryhung, xiyou.wangcong
From: Bobby Eshleman <bobby.eshleman@bytedance.com>
This commit makes the bind table management functions in vsock usable
for different bind tables. Future work will introduce a new table for
datagrams to avoid address collisions, and these functions will be used
there.
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
---
net/vmw_vsock/af_vsock.c | 34 +++++++++++++++++++++++++++-------
1 file changed, 27 insertions(+), 7 deletions(-)
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index acc15e11700c..d571be9cdbf0 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -232,11 +232,12 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
sock_put(&vsk->sk);
}
-static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
+static struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
+ struct list_head *bind_table)
{
struct vsock_sock *vsk;
- list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
+ list_for_each_entry(vsk, bind_table, bound_table) {
if (vsock_addr_equals_addr(addr, &vsk->local_addr))
return sk_vsock(vsk);
@@ -249,6 +250,11 @@ static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
return NULL;
}
+static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
+{
+ return vsock_find_bound_socket_common(addr, vsock_bound_sockets(addr));
+}
+
static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
struct sockaddr_vm *dst)
{
@@ -671,12 +677,18 @@ static void vsock_pending_work(struct work_struct *work)
/**** SOCKET OPERATIONS ****/
-static int __vsock_bind_connectible(struct vsock_sock *vsk,
- struct sockaddr_vm *addr)
+static int vsock_bind_common(struct vsock_sock *vsk,
+ struct sockaddr_vm *addr,
+ struct list_head *bind_table,
+ size_t table_size)
{
static u32 port;
struct sockaddr_vm new_addr;
+ if (WARN_ONCE(table_size < VSOCK_HASH_SIZE,
+ "table size too small, may cause overflow"))
+ return -EINVAL;
+
if (!port)
port = get_random_u32_above(LAST_RESERVED_PORT);
@@ -692,7 +704,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
new_addr.svm_port = port++;
- if (!__vsock_find_bound_socket(&new_addr)) {
+ if (!vsock_find_bound_socket_common(&new_addr,
+ &bind_table[VSOCK_HASH(addr)])) {
found = true;
break;
}
@@ -709,7 +722,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
return -EACCES;
}
- if (__vsock_find_bound_socket(&new_addr))
+ if (vsock_find_bound_socket_common(&new_addr,
+ &bind_table[VSOCK_HASH(addr)]))
return -EADDRINUSE;
}
@@ -721,11 +735,17 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
* by AF_UNIX.
*/
__vsock_remove_bound(vsk);
- __vsock_insert_bound(vsock_bound_sockets(&vsk->local_addr), vsk);
+ __vsock_insert_bound(&bind_table[VSOCK_HASH(&vsk->local_addr)], vsk);
return 0;
}
+static int __vsock_bind_connectible(struct vsock_sock *vsk,
+ struct sockaddr_vm *addr)
+{
+ return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
+}
+
static int __vsock_bind_dgram(struct vsock_sock *vsk,
struct sockaddr_vm *addr)
{
--
2.20.1
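The generalization in this patch boils down to passing the bucket array into the lookup helper instead of hard-coding one global table. A standalone sketch of that shape, where `entry`, `bucket_of()`, and the 251-bucket size are illustrative stand-ins (the kernel's VSOCK_HASH() mods by VSOCK_HASH_SIZE):

```c
#include <assert.h>
#include <stddef.h>

#define HASH_SIZE 251 /* illustrative; the kernel uses VSOCK_HASH_SIZE */

struct entry {
	unsigned int cid, port;
	struct entry *next;
};

static size_t bucket_of(unsigned int port)
{
	return port % HASH_SIZE; /* mirrors VSOCK_HASH(addr) */
}

/* Generalized lookup: the caller supplies the table, as
 * vsock_find_bound_socket_common() now does, so the same code can serve
 * both the connectible and (future) datagram bind tables.
 */
static struct entry *find_bound_common(unsigned int cid, unsigned int port,
				       struct entry **table)
{
	for (struct entry *e = table[bucket_of(port)]; e; e = e->next)
		if (e->cid == cid && e->port == port)
			return e;
	return NULL;
}

static void insert_bound(struct entry *e, struct entry **table)
{
	size_t b = bucket_of(e->port);

	e->next = table[b];
	table[b] = e;
}
```

An address bound in one table is invisible to lookups against another, which is exactly the property the later dgram bind table relies on.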
* [RFC PATCH net-next v6 05/14] af_vsock: use a separate dgram bind table
2024-07-10 21:25 [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams Amery Hung
` (3 preceding siblings ...)
2024-07-10 21:25 ` [RFC PATCH net-next v6 04/14] af_vsock: generalize bind table functions Amery Hung
@ 2024-07-10 21:25 ` Amery Hung
2024-07-23 14:41 ` Stefano Garzarella
2024-07-10 21:25 ` [RFC PATCH net-next v6 06/14] virtio/vsock: add VIRTIO_VSOCK_TYPE_DGRAM Amery Hung
` (10 subsequent siblings)
15 siblings, 1 reply; 51+ messages in thread
From: Amery Hung @ 2024-07-10 21:25 UTC (permalink / raw)
To: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers
Cc: dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, ameryhung, xiyou.wangcong
From: Bobby Eshleman <bobby.eshleman@bytedance.com>
This commit tracks bound dgram sockets in a separate bind table from
connectible sockets in order to avoid address collisions. With this
commit, users can simultaneously bind a dgram socket and a connectible
socket to the same CID and port.
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
---
net/vmw_vsock/af_vsock.c | 103 +++++++++++++++++++++++++++++----------
1 file changed, 76 insertions(+), 27 deletions(-)
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index d571be9cdbf0..ab08cd81720e 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -10,18 +10,23 @@
* - There are two kinds of sockets: those created by user action (such as
* calling socket(2)) and those created by incoming connection request packets.
*
- * - There are two "global" tables, one for bound sockets (sockets that have
- * specified an address that they are responsible for) and one for connected
- * sockets (sockets that have established a connection with another socket).
- * These tables are "global" in that all sockets on the system are placed
- * within them. - Note, though, that the bound table contains an extra entry
- * for a list of unbound sockets and SOCK_DGRAM sockets will always remain in
- * that list. The bound table is used solely for lookup of sockets when packets
- * are received and that's not necessary for SOCK_DGRAM sockets since we create
- * a datagram handle for each and need not perform a lookup. Keeping SOCK_DGRAM
- * sockets out of the bound hash buckets will reduce the chance of collisions
- * when looking for SOCK_STREAM sockets and prevents us from having to check the
- * socket type in the hash table lookups.
+ * - There are three "global" tables, one for bound connectible (stream /
+ * seqpacket) sockets, one for bound datagram sockets, and one for connected
+ * sockets. Bound sockets are sockets that have specified an address that
+ * they are responsible for. Connected sockets are sockets that have
+ * established a connection with another socket. These tables are "global" in
+ * that all sockets on the system are placed within them. - Note, though,
+ * that the bound tables contain an extra entry for a list of unbound
+ * sockets. The bound tables are used solely for lookup of sockets when packets
+ * are received.
+ *
+ * - There are separate bind tables for connectible and datagram sockets to avoid
+ * address collisions between stream/seqpacket sockets and datagram sockets.
+ *
+ * - Transports may elect to NOT use the global datagram bind table by
+ * implementing the ->dgram_bind() callback. If that callback is implemented,
+ * the global bind table is not used and the responsibility of bound datagram
+ * socket tracking is deferred to the transport.
*
* - Sockets created by user action will either be "client" sockets that
* initiate a connection or "server" sockets that listen for connections; we do
@@ -116,6 +121,7 @@
static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
static void vsock_sk_destruct(struct sock *sk);
static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
+static bool sock_type_connectible(u16 type);
/* Protocol family. */
struct proto vsock_proto = {
@@ -152,21 +158,25 @@ static DEFINE_MUTEX(vsock_register_mutex);
* VSocket is stored in the connected hash table.
*
* Unbound sockets are all put on the same list attached to the end of the hash
- * table (vsock_unbound_sockets). Bound sockets are added to the hash table in
- * the bucket that their local address hashes to (vsock_bound_sockets(addr)
- * represents the list that addr hashes to).
+ * tables (vsock_unbound_sockets/vsock_unbound_dgram_sockets). Bound sockets
+ * are added to the hash table in the bucket that their local address hashes to
+ * (vsock_bound_sockets(addr) and vsock_bound_dgram_sockets(addr) represents
+ * the list that addr hashes to).
*
- * Specifically, we initialize the vsock_bind_table array to a size of
- * VSOCK_HASH_SIZE + 1 so that vsock_bind_table[0] through
- * vsock_bind_table[VSOCK_HASH_SIZE - 1] are for bound sockets and
- * vsock_bind_table[VSOCK_HASH_SIZE] is for unbound sockets. The hash function
- * mods with VSOCK_HASH_SIZE to ensure this.
+ * Specifically, taking connectible sockets as an example we initialize the
+ * vsock_bind_table array to a size of VSOCK_HASH_SIZE + 1 so that
+ * vsock_bind_table[0] through vsock_bind_table[VSOCK_HASH_SIZE - 1] are for
+ * bound sockets and vsock_bind_table[VSOCK_HASH_SIZE] is for unbound sockets.
+ * The hash function mods with VSOCK_HASH_SIZE to ensure this.
+ * Datagrams and vsock_dgram_bind_table operate in the same way.
*/
#define MAX_PORT_RETRIES 24
#define VSOCK_HASH(addr) ((addr)->svm_port % VSOCK_HASH_SIZE)
#define vsock_bound_sockets(addr) (&vsock_bind_table[VSOCK_HASH(addr)])
+#define vsock_bound_dgram_sockets(addr) (&vsock_dgram_bind_table[VSOCK_HASH(addr)])
#define vsock_unbound_sockets (&vsock_bind_table[VSOCK_HASH_SIZE])
+#define vsock_unbound_dgram_sockets (&vsock_dgram_bind_table[VSOCK_HASH_SIZE])
/* XXX This can probably be implemented in a better way. */
#define VSOCK_CONN_HASH(src, dst) \
@@ -182,6 +192,8 @@ struct list_head vsock_connected_table[VSOCK_HASH_SIZE];
EXPORT_SYMBOL_GPL(vsock_connected_table);
DEFINE_SPINLOCK(vsock_table_lock);
EXPORT_SYMBOL_GPL(vsock_table_lock);
+static struct list_head vsock_dgram_bind_table[VSOCK_HASH_SIZE + 1];
+static DEFINE_SPINLOCK(vsock_dgram_table_lock);
/* Autobind this socket to the local address if necessary. */
static int vsock_auto_bind(struct vsock_sock *vsk)
@@ -204,6 +216,9 @@ static void vsock_init_tables(void)
for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++)
INIT_LIST_HEAD(&vsock_connected_table[i]);
+
+ for (i = 0; i < ARRAY_SIZE(vsock_dgram_bind_table); i++)
+ INIT_LIST_HEAD(&vsock_dgram_bind_table[i]);
}
static void __vsock_insert_bound(struct list_head *list,
@@ -271,13 +286,28 @@ static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
return NULL;
}
-static void vsock_insert_unbound(struct vsock_sock *vsk)
+static void __vsock_insert_dgram_unbound(struct vsock_sock *vsk)
+{
+ spin_lock_bh(&vsock_dgram_table_lock);
+ __vsock_insert_bound(vsock_unbound_dgram_sockets, vsk);
+ spin_unlock_bh(&vsock_dgram_table_lock);
+}
+
+static void __vsock_insert_connectible_unbound(struct vsock_sock *vsk)
{
spin_lock_bh(&vsock_table_lock);
__vsock_insert_bound(vsock_unbound_sockets, vsk);
spin_unlock_bh(&vsock_table_lock);
}
+static void vsock_insert_unbound(struct vsock_sock *vsk)
+{
+ if (sock_type_connectible(sk_vsock(vsk)->sk_type))
+ __vsock_insert_connectible_unbound(vsk);
+ else
+ __vsock_insert_dgram_unbound(vsk);
+}
+
void vsock_insert_connected(struct vsock_sock *vsk)
{
struct list_head *list = vsock_connected_sockets(
@@ -289,6 +319,14 @@ void vsock_insert_connected(struct vsock_sock *vsk)
}
EXPORT_SYMBOL_GPL(vsock_insert_connected);
+static void vsock_remove_dgram_bound(struct vsock_sock *vsk)
+{
+ spin_lock_bh(&vsock_dgram_table_lock);
+ if (__vsock_in_bound_table(vsk))
+ __vsock_remove_bound(vsk);
+ spin_unlock_bh(&vsock_dgram_table_lock);
+}
+
void vsock_remove_bound(struct vsock_sock *vsk)
{
spin_lock_bh(&vsock_table_lock);
@@ -340,7 +378,10 @@ EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
void vsock_remove_sock(struct vsock_sock *vsk)
{
- vsock_remove_bound(vsk);
+ if (sock_type_connectible(sk_vsock(vsk)->sk_type))
+ vsock_remove_bound(vsk);
+ else
+ vsock_remove_dgram_bound(vsk);
vsock_remove_connected(vsk);
}
EXPORT_SYMBOL_GPL(vsock_remove_sock);
@@ -746,11 +787,19 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
}
-static int __vsock_bind_dgram(struct vsock_sock *vsk,
- struct sockaddr_vm *addr)
+static int vsock_bind_dgram(struct vsock_sock *vsk,
+ struct sockaddr_vm *addr)
{
- if (!vsk->transport || !vsk->transport->dgram_bind)
- return -EINVAL;
+ if (!vsk->transport || !vsk->transport->dgram_bind) {
+ int retval;
+
+ spin_lock_bh(&vsock_dgram_table_lock);
+ retval = vsock_bind_common(vsk, addr, vsock_dgram_bind_table,
+ VSOCK_HASH_SIZE);
+ spin_unlock_bh(&vsock_dgram_table_lock);
+
+ return retval;
+ }
return vsk->transport->dgram_bind(vsk, addr);
}
@@ -781,7 +830,7 @@ static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
break;
case SOCK_DGRAM:
- retval = __vsock_bind_dgram(vsk, addr);
+ retval = vsock_bind_dgram(vsk, addr);
break;
default:
--
2.20.1
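The dispatch added to vsock_insert_unbound() and vsock_remove_sock() can be sketched in isolation: socket type alone decides which table a socket lives in, so datagram and connectible binds never meet. The counters and `T_*` names below are hypothetical stand-ins for the two kernel lists:

```c
#include <assert.h>
#include <stdbool.h>

enum sock_type { T_STREAM, T_SEQPACKET, T_DGRAM };

/* Two separate "unbound" lists, as in the patch: connectible sockets
 * and datagram sockets never share a bucket, so a dgram bind and a
 * stream bind to the same (CID, port) cannot collide.
 */
static int n_connectible_unbound, n_dgram_unbound;

static bool sock_type_connectible(enum sock_type t)
{
	return t == T_STREAM || t == T_SEQPACKET;
}

static void insert_unbound(enum sock_type t)
{
	if (sock_type_connectible(t))
		n_connectible_unbound++; /* -> vsock_unbound_sockets */
	else
		n_dgram_unbound++;       /* -> vsock_unbound_dgram_sockets */
}
```

Because each table has its own spinlock (vsock_table_lock vs. vsock_dgram_table_lock), dgram binds also never contend with connectible binds.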
* [RFC PATCH net-next v6 06/14] virtio/vsock: add VIRTIO_VSOCK_TYPE_DGRAM
2024-07-10 21:25 [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams Amery Hung
` (4 preceding siblings ...)
2024-07-10 21:25 ` [RFC PATCH net-next v6 05/14] af_vsock: use a separate dgram bind table Amery Hung
@ 2024-07-10 21:25 ` Amery Hung
2024-07-10 21:25 ` [RFC PATCH net-next v6 07/14] virtio/vsock: add common datagram send path Amery Hung
` (9 subsequent siblings)
15 siblings, 0 replies; 51+ messages in thread
From: Amery Hung @ 2024-07-10 21:25 UTC (permalink / raw)
To: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers
Cc: dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, ameryhung, xiyou.wangcong
From: Bobby Eshleman <bobby.eshleman@bytedance.com>
This commit adds the datagram packet type for inclusion in virtio vsock
packet headers. It is a standalone commit because multiple later,
distinct commits depend on it.
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
---
include/uapi/linux/virtio_vsock.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
index 64738838bee5..331be28b1d30 100644
--- a/include/uapi/linux/virtio_vsock.h
+++ b/include/uapi/linux/virtio_vsock.h
@@ -69,6 +69,7 @@ struct virtio_vsock_hdr {
enum virtio_vsock_type {
VIRTIO_VSOCK_TYPE_STREAM = 1,
VIRTIO_VSOCK_TYPE_SEQPACKET = 2,
+ VIRTIO_VSOCK_TYPE_DGRAM = 3,
};
enum virtio_vsock_op {
--
2.20.1
* [RFC PATCH net-next v6 07/14] virtio/vsock: add common datagram send path
2024-07-10 21:25 [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams Amery Hung
` (5 preceding siblings ...)
2024-07-10 21:25 ` [RFC PATCH net-next v6 06/14] virtio/vsock: add VIRTIO_VSOCK_TYPE_DGRAM Amery Hung
@ 2024-07-10 21:25 ` Amery Hung
2024-07-23 14:42 ` Stefano Garzarella
2024-07-29 20:00 ` Arseniy Krasnov
2024-07-10 21:25 ` [RFC PATCH net-next v6 08/14] af_vsock: add vsock_find_bound_dgram_socket() Amery Hung
` (8 subsequent siblings)
15 siblings, 2 replies; 51+ messages in thread
From: Amery Hung @ 2024-07-10 21:25 UTC (permalink / raw)
To: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers
Cc: dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, ameryhung, xiyou.wangcong
From: Bobby Eshleman <bobby.eshleman@bytedance.com>
This commit implements the common function
virtio_transport_dgram_enqueue() for enqueueing datagrams. It does not
yet add users in either vhost or virtio.
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Signed-off-by: Amery Hung <amery.hung@bytedance.com>
---
include/linux/virtio_vsock.h | 1 +
include/net/af_vsock.h | 2 +
net/vmw_vsock/af_vsock.c | 2 +-
net/vmw_vsock/virtio_transport_common.c | 87 ++++++++++++++++++++++++-
4 files changed, 90 insertions(+), 2 deletions(-)
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index f749a066af46..4408749febd2 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -152,6 +152,7 @@ struct virtio_vsock_pkt_info {
u16 op;
u32 flags;
bool reply;
+ u8 remote_flags;
};
struct virtio_transport {
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 44db8f2c507d..6e97d344ac75 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -216,6 +216,8 @@ void vsock_for_each_connected_socket(struct vsock_transport *transport,
void (*fn)(struct sock *sk));
int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
bool vsock_find_cid(unsigned int cid);
+const struct vsock_transport *vsock_dgram_lookup_transport(unsigned int cid,
+ __u8 flags);
struct vsock_skb_cb {
unsigned int src_cid;
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index ab08cd81720e..f83b655fdbe9 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -487,7 +487,7 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
return transport;
}
-static const struct vsock_transport *
+const struct vsock_transport *
vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
{
const struct vsock_transport *transport;
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index a1c76836d798..46cd1807f8e3 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1040,13 +1040,98 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode)
}
EXPORT_SYMBOL_GPL(virtio_transport_shutdown);
+static int virtio_transport_dgram_send_pkt_info(struct vsock_sock *vsk,
+ struct virtio_vsock_pkt_info *info)
+{
+ u32 src_cid, src_port, dst_cid, dst_port;
+ const struct vsock_transport *transport;
+ const struct virtio_transport *t_ops;
+ struct sock *sk = sk_vsock(vsk);
+ struct virtio_vsock_hdr *hdr;
+ struct sk_buff *skb;
+ void *payload;
+ int noblock = 0;
+ int err;
+
+ info->type = virtio_transport_get_type(sk_vsock(vsk));
+
+ if (info->pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
+ return -EMSGSIZE;
+
+ transport = vsock_dgram_lookup_transport(info->remote_cid, info->remote_flags);
+ if (unlikely(!transport))
+ return -EFAULT;
+ t_ops = container_of(transport, struct virtio_transport, transport);
+
+ if (info->msg)
+ noblock = info->msg->msg_flags & MSG_DONTWAIT;
+
+ /* Use sock_alloc_send_skb to throttle by sk_sndbuf. This helps avoid
+ * triggering the OOM.
+ */
+ skb = sock_alloc_send_skb(sk, info->pkt_len + VIRTIO_VSOCK_SKB_HEADROOM,
+ noblock, &err);
+ if (!skb)
+ return err;
+
+ skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM);
+
+ src_cid = t_ops->transport.get_local_cid();
+ src_port = vsk->local_addr.svm_port;
+ dst_cid = info->remote_cid;
+ dst_port = info->remote_port;
+
+ hdr = virtio_vsock_hdr(skb);
+ hdr->type = cpu_to_le16(info->type);
+ hdr->op = cpu_to_le16(info->op);
+ hdr->src_cid = cpu_to_le64(src_cid);
+ hdr->dst_cid = cpu_to_le64(dst_cid);
+ hdr->src_port = cpu_to_le32(src_port);
+ hdr->dst_port = cpu_to_le32(dst_port);
+ hdr->flags = cpu_to_le32(info->flags);
+ hdr->len = cpu_to_le32(info->pkt_len);
+
+ if (info->msg && info->pkt_len > 0) {
+ payload = skb_put(skb, info->pkt_len);
+ err = memcpy_from_msg(payload, info->msg, info->pkt_len);
+ if (err)
+ goto out;
+ }
+
+ trace_virtio_transport_alloc_pkt(src_cid, src_port,
+ dst_cid, dst_port,
+ info->pkt_len,
+ info->type,
+ info->op,
+ info->flags,
+ false);
+
+ return t_ops->send_pkt(skb);
+out:
+ kfree_skb(skb);
+ return err;
+}
+
int
virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
struct sockaddr_vm *remote_addr,
struct msghdr *msg,
size_t dgram_len)
{
- return -EOPNOTSUPP;
+ /* Here we are only using the info struct to retain style uniformity
+ * and to ease future refactoring and merging.
+ */
+ struct virtio_vsock_pkt_info info = {
+ .op = VIRTIO_VSOCK_OP_RW,
+ .remote_cid = remote_addr->svm_cid,
+ .remote_port = remote_addr->svm_port,
+ .remote_flags = remote_addr->svm_flags,
+ .msg = msg,
+ .vsk = vsk,
+ .pkt_len = dgram_len,
+ };
+
+ return virtio_transport_dgram_send_pkt_info(vsk, &info);
}
EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
--
2.20.1
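The header fields that virtio_transport_dgram_send_pkt_info() fills with cpu_to_le*() land on the wire as a packed little-endian struct virtio_vsock_hdr. A userspace sketch of that layout, where `pack_hdr()` and the `put_le*()` helpers are illustrative (the offsets follow the header as defined in include/uapi/linux/virtio_vsock.h):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static void put_le16(uint8_t *p, uint16_t v) { p[0] = v & 0xff; p[1] = v >> 8; }
static void put_le32(uint8_t *p, uint32_t v) { for (int i = 0; i < 4; i++) p[i] = (v >> (8 * i)) & 0xff; }
static void put_le64(uint8_t *p, uint64_t v) { for (int i = 0; i < 8; i++) p[i] = (v >> (8 * i)) & 0xff; }

#define HDR_LEN 44 /* sizeof(struct virtio_vsock_hdr), packed */

/* Pack the fields virtio_transport_dgram_send_pkt_info() sets; for a
 * datagram, buf_alloc and fwd_cnt (offsets 36 and 40) carry no credit
 * state and stay zero in this sketch.
 */
static void pack_hdr(uint8_t *buf, uint64_t src_cid, uint64_t dst_cid,
		     uint32_t src_port, uint32_t dst_port,
		     uint32_t len, uint16_t type, uint16_t op, uint32_t flags)
{
	memset(buf, 0, HDR_LEN);
	put_le64(buf + 0,  src_cid);
	put_le64(buf + 8,  dst_cid);
	put_le32(buf + 16, src_port);
	put_le32(buf + 20, dst_port);
	put_le32(buf + 24, len);
	put_le16(buf + 28, type);  /* VIRTIO_VSOCK_TYPE_DGRAM == 3 in this series */
	put_le16(buf + 30, op);    /* VIRTIO_VSOCK_OP_RW == 5 */
	put_le32(buf + 32, flags);
}
```

The payload then follows the header in the skb, copied in with memcpy_from_msg() after skb_put().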
* [RFC PATCH net-next v6 08/14] af_vsock: add vsock_find_bound_dgram_socket()
2024-07-10 21:25 [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams Amery Hung
` (6 preceding siblings ...)
2024-07-10 21:25 ` [RFC PATCH net-next v6 07/14] virtio/vsock: add common datagram send path Amery Hung
@ 2024-07-10 21:25 ` Amery Hung
2024-07-10 21:25 ` [RFC PATCH net-next v6 09/14] virtio/vsock: add common datagram recv path Amery Hung
` (7 subsequent siblings)
15 siblings, 0 replies; 51+ messages in thread
From: Amery Hung @ 2024-07-10 21:25 UTC (permalink / raw)
To: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers
Cc: dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, ameryhung, xiyou.wangcong
From: Bobby Eshleman <bobby.eshleman@bytedance.com>
This commit adds vsock_find_bound_dgram_socket() which allows transports
to find bound dgram sockets in the global dgram bind table. It is
intended to be used for "routing" incoming packets to the correct
sockets if the transport uses the global bind table.
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
---
include/net/af_vsock.h | 1 +
net/vmw_vsock/af_vsock.c | 16 ++++++++++++++++
2 files changed, 17 insertions(+)
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 6e97d344ac75..9d0882b82bfa 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -218,6 +218,7 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
bool vsock_find_cid(unsigned int cid);
const struct vsock_transport *vsock_dgram_lookup_transport(unsigned int cid,
__u8 flags);
+struct sock *vsock_find_bound_dgram_socket(struct sockaddr_vm *addr);
struct vsock_skb_cb {
unsigned int src_cid;
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index f83b655fdbe9..f0e5db0eb43a 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -265,6 +265,22 @@ static struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
return NULL;
}
+struct sock *
+vsock_find_bound_dgram_socket(struct sockaddr_vm *addr)
+{
+ struct sock *sk;
+
+ spin_lock_bh(&vsock_dgram_table_lock);
+ sk = vsock_find_bound_socket_common(addr, vsock_bound_dgram_sockets(addr));
+ if (sk)
+ sock_hold(sk);
+
+ spin_unlock_bh(&vsock_dgram_table_lock);
+
+ return sk;
+}
+EXPORT_SYMBOL_GPL(vsock_find_bound_dgram_socket);
+
static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
{
return vsock_find_bound_socket_common(addr, vsock_bound_sockets(addr));
--
2.20.1
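The key detail of vsock_find_bound_dgram_socket() is that the reference is taken while the table lock is still held, so the socket cannot be freed between lookup and use. A minimal sketch of that pattern, with `fake_sock`, a single-entry "table", and comments standing in for the real spinlock and sock_hold():

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket with a reference count. */
struct fake_sock {
	int refcnt;
	unsigned int cid, port;
};

static struct fake_sock *bound_dgram; /* stand-in for the bind table */

/* Mirror of vsock_find_bound_dgram_socket(): hold a reference on the
 * match before dropping the table lock, and return it pinned.
 */
static struct fake_sock *find_bound_dgram(unsigned int cid, unsigned int port)
{
	struct fake_sock *sk = NULL;

	/* spin_lock_bh(&vsock_dgram_table_lock) would go here */
	if (bound_dgram && bound_dgram->cid == cid && bound_dgram->port == port) {
		sk = bound_dgram;
		sk->refcnt++; /* sock_hold() under the lock */
	}
	/* spin_unlock_bh(&vsock_dgram_table_lock) */

	return sk;
}
```

Callers on the receive path are then responsible for the matching sock_put() once the packet has been queued or dropped.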
* [RFC PATCH net-next v6 09/14] virtio/vsock: add common datagram recv path
2024-07-10 21:25 [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams Amery Hung
` (7 preceding siblings ...)
2024-07-10 21:25 ` [RFC PATCH net-next v6 08/14] af_vsock: add vsock_find_bound_dgram_socket() Amery Hung
@ 2024-07-10 21:25 ` Amery Hung
2024-07-23 14:42 ` Stefano Garzarella
2024-07-10 21:25 ` [RFC PATCH net-next v6 10/14] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit Amery Hung
` (6 subsequent siblings)
15 siblings, 1 reply; 51+ messages in thread
From: Amery Hung @ 2024-07-10 21:25 UTC (permalink / raw)
To: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers
Cc: dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, ameryhung, xiyou.wangcong
From: Bobby Eshleman <bobby.eshleman@bytedance.com>
This commit adds the common datagram receive functionality for virtio
transports. It does not add the vhost/virtio users of that
functionality.
This functionality includes:
- changes to the virtio_transport_recv_pkt() path for finding the
bound socket that should receive an incoming packet
- virtio_transport_recv_pkt() saves the source CID and port to the
control buffer so that recvmsg() can initialize the sockaddr_vm
structure when using datagrams
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Signed-off-by: Amery Hung <amery.hung@bytedance.com>
---
net/vmw_vsock/virtio_transport_common.c | 79 +++++++++++++++++++++----
1 file changed, 66 insertions(+), 13 deletions(-)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 46cd1807f8e3..a571b575fde9 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -235,7 +235,9 @@ EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt);
static u16 virtio_transport_get_type(struct sock *sk)
{
- if (sk->sk_type == SOCK_STREAM)
+ if (sk->sk_type == SOCK_DGRAM)
+ return VIRTIO_VSOCK_TYPE_DGRAM;
+ else if (sk->sk_type == SOCK_STREAM)
return VIRTIO_VSOCK_TYPE_STREAM;
else
return VIRTIO_VSOCK_TYPE_SEQPACKET;
@@ -1422,6 +1424,33 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
kfree_skb(skb);
}
+static void
+virtio_transport_dgram_kfree_skb(struct sk_buff *skb, int err)
+{
+ if (err == -ENOMEM)
+ kfree_skb_reason(skb, SKB_DROP_REASON_SOCKET_RCVBUFF);
+ else if (err == -ENOBUFS)
+ kfree_skb_reason(skb, SKB_DROP_REASON_PROTO_MEM);
+ else
+ kfree_skb(skb);
+}
+
+/* This function takes ownership of the skb.
+ *
+ * It either places the skb on the sk_receive_queue or frees it.
+ */
+static void
+virtio_transport_recv_dgram(struct sock *sk, struct sk_buff *skb)
+{
+ int err;
+
+ err = sock_queue_rcv_skb(sk, skb);
+ if (err) {
+ virtio_transport_dgram_kfree_skb(skb, err);
+ return;
+ }
+}
+
static int
virtio_transport_recv_connected(struct sock *sk,
struct sk_buff *skb)
@@ -1591,7 +1620,8 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
static bool virtio_transport_valid_type(u16 type)
{
return (type == VIRTIO_VSOCK_TYPE_STREAM) ||
- (type == VIRTIO_VSOCK_TYPE_SEQPACKET);
+ (type == VIRTIO_VSOCK_TYPE_SEQPACKET) ||
+ (type == VIRTIO_VSOCK_TYPE_DGRAM);
}
/* We are under the virtio-vsock's vsock->rx_lock or vhost-vsock's vq->mutex
@@ -1601,44 +1631,57 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
struct sk_buff *skb)
{
struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
+ struct vsock_skb_cb *vsock_cb;
struct sockaddr_vm src, dst;
struct vsock_sock *vsk;
struct sock *sk;
bool space_available;
+ u16 type;
vsock_addr_init(&src, le64_to_cpu(hdr->src_cid),
le32_to_cpu(hdr->src_port));
vsock_addr_init(&dst, le64_to_cpu(hdr->dst_cid),
le32_to_cpu(hdr->dst_port));
+ type = le16_to_cpu(hdr->type);
+
trace_virtio_transport_recv_pkt(src.svm_cid, src.svm_port,
dst.svm_cid, dst.svm_port,
le32_to_cpu(hdr->len),
- le16_to_cpu(hdr->type),
+ type,
le16_to_cpu(hdr->op),
le32_to_cpu(hdr->flags),
le32_to_cpu(hdr->buf_alloc),
le32_to_cpu(hdr->fwd_cnt));
- if (!virtio_transport_valid_type(le16_to_cpu(hdr->type))) {
+ if (!virtio_transport_valid_type(type)) {
(void)virtio_transport_reset_no_sock(t, skb);
goto free_pkt;
}
- /* The socket must be in connected or bound table
- * otherwise send reset back
+ /* For stream/seqpacket, the socket must be in connected or bound table
+ * otherwise send reset back.
+ *
+ * For datagrams, no reset is sent back.
*/
sk = vsock_find_connected_socket(&src, &dst);
if (!sk) {
- sk = vsock_find_bound_socket(&dst);
- if (!sk) {
- (void)virtio_transport_reset_no_sock(t, skb);
- goto free_pkt;
+ if (type == VIRTIO_VSOCK_TYPE_DGRAM) {
+ sk = vsock_find_bound_dgram_socket(&dst);
+ if (!sk)
+ goto free_pkt;
+ } else {
+ sk = vsock_find_bound_socket(&dst);
+ if (!sk) {
+ (void)virtio_transport_reset_no_sock(t, skb);
+ goto free_pkt;
+ }
}
}
- if (virtio_transport_get_type(sk) != le16_to_cpu(hdr->type)) {
- (void)virtio_transport_reset_no_sock(t, skb);
+ if (virtio_transport_get_type(sk) != type) {
+ if (type != VIRTIO_VSOCK_TYPE_DGRAM)
+ (void)virtio_transport_reset_no_sock(t, skb);
sock_put(sk);
goto free_pkt;
}
@@ -1654,12 +1697,21 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
/* Check if sk has been closed before lock_sock */
if (sock_flag(sk, SOCK_DONE)) {
- (void)virtio_transport_reset_no_sock(t, skb);
+ if (type != VIRTIO_VSOCK_TYPE_DGRAM)
+ (void)virtio_transport_reset_no_sock(t, skb);
release_sock(sk);
sock_put(sk);
goto free_pkt;
}
+ if (sk->sk_type == SOCK_DGRAM) {
+ vsock_cb = vsock_skb_cb(skb);
+ vsock_cb->src_cid = src.svm_cid;
+ vsock_cb->src_port = src.svm_port;
+ virtio_transport_recv_dgram(sk, skb);
+ goto out;
+ }
+
space_available = virtio_transport_space_update(sk, skb);
/* Update CID in case it has changed after a transport reset event */
@@ -1691,6 +1743,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
break;
}
+out:
release_sock(sk);
/* Release refcnt obtained when we fetched this socket out of the
--
2.20.1
* [RFC PATCH net-next v6 10/14] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
2024-07-10 21:25 [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams Amery Hung
` (8 preceding siblings ...)
2024-07-10 21:25 ` [RFC PATCH net-next v6 09/14] virtio/vsock: add common datagram recv path Amery Hung
@ 2024-07-10 21:25 ` Amery Hung
2024-07-10 21:25 ` [RFC PATCH net-next v6 11/14] vhost/vsock: implement datagram support Amery Hung
` (5 subsequent siblings)
15 siblings, 0 replies; 51+ messages in thread
From: Amery Hung @ 2024-07-10 21:25 UTC
From: Bobby Eshleman <bobby.eshleman@bytedance.com>
This commit adds a feature bit for virtio vsock to support datagrams.
Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
---
include/uapi/linux/virtio_vsock.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
index 331be28b1d30..27b4b2b8bf13 100644
--- a/include/uapi/linux/virtio_vsock.h
+++ b/include/uapi/linux/virtio_vsock.h
@@ -40,6 +40,7 @@
/* The feature bitmap for virtio vsock */
#define VIRTIO_VSOCK_F_SEQPACKET 1 /* SOCK_SEQPACKET supported */
+#define VIRTIO_VSOCK_F_DGRAM 3 /* SOCK_DGRAM supported */
struct virtio_vsock_config {
__le64 guest_cid;
--
2.20.1
* [RFC PATCH net-next v6 11/14] vhost/vsock: implement datagram support
2024-07-10 21:25 [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams Amery Hung
` (9 preceding siblings ...)
2024-07-10 21:25 ` [RFC PATCH net-next v6 10/14] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit Amery Hung
@ 2024-07-10 21:25 ` Amery Hung
2024-07-10 21:25 ` [RFC PATCH net-next v6 12/14] vsock/loopback: " Amery Hung
` (4 subsequent siblings)
15 siblings, 0 replies; 51+ messages in thread
From: Amery Hung @ 2024-07-10 21:25 UTC
From: Bobby Eshleman <bobby.eshleman@bytedance.com>
This commit implements datagram support for vhost/vsock by teaching
vhost to use the common virtio transport datagram functions.
If the virtio RX buffer is too small to hold the packet, the
transmission is abandoned: the packet is dropped and EHOSTUNREACH is
queued on the socket's error queue.
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Signed-off-by: Amery Hung <amery.hung@bytedance.com>
---
drivers/vhost/vsock.c | 60 ++++++++++++++++++++++++++++++++++++++--
net/vmw_vsock/af_vsock.c | 2 +-
2 files changed, 58 insertions(+), 4 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index fa1aefb78016..13c3cbff21da 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -8,6 +8,7 @@
*/
#include <linux/miscdevice.h>
#include <linux/atomic.h>
+#include <linux/errqueue.h>
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/vmalloc.h>
@@ -32,7 +33,8 @@
enum {
VHOST_VSOCK_FEATURES = VHOST_FEATURES |
(1ULL << VIRTIO_F_ACCESS_PLATFORM) |
- (1ULL << VIRTIO_VSOCK_F_SEQPACKET)
+ (1ULL << VIRTIO_VSOCK_F_SEQPACKET) |
+ (1ULL << VIRTIO_VSOCK_F_DGRAM)
};
enum {
@@ -56,6 +58,7 @@ struct vhost_vsock {
atomic_t queued_replies;
u32 guest_cid;
+ bool dgram_allow;
bool seqpacket_allow;
};
@@ -86,6 +89,32 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
return NULL;
}
+/* Claims ownership of the skb, do not free the skb after calling! */
+static void
+vhost_transport_error(struct sk_buff *skb, int err)
+{
+ struct sock_exterr_skb *serr;
+ struct sock *sk = skb->sk;
+ struct sk_buff *clone;
+
+ serr = SKB_EXT_ERR(skb);
+ memset(serr, 0, sizeof(*serr));
+ serr->ee.ee_errno = err;
+ serr->ee.ee_origin = SO_EE_ORIGIN_NONE;
+
+ clone = skb_clone(skb, GFP_KERNEL);
+ if (!clone)
+ goto out;
+
+ if (sock_queue_err_skb(sk, clone))
+ kfree_skb(clone);
+
+ sk->sk_err = err;
+ sk_error_report(sk);
+out:
+ kfree_skb(skb);
+}
+
static void
vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
struct vhost_virtqueue *vq)
@@ -162,9 +191,15 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
hdr = virtio_vsock_hdr(skb);
/* If the packet is greater than the space available in the
- * buffer, we split it using multiple buffers.
+ * buffer, we split it using multiple buffers for connectible
+ * sockets and drop the packet for datagram sockets.
*/
if (payload_len > iov_len - sizeof(*hdr)) {
+ if (le16_to_cpu(hdr->type) == VIRTIO_VSOCK_TYPE_DGRAM) {
+ vhost_transport_error(skb, EHOSTUNREACH);
+ continue;
+ }
+
payload_len = iov_len - sizeof(*hdr);
/* As we are copying pieces of large packet's buffer to
@@ -403,6 +438,22 @@ static bool vhost_transport_msgzerocopy_allow(void)
return true;
}
+static bool vhost_transport_dgram_allow(u32 cid, u32 port)
+{
+ struct vhost_vsock *vsock;
+ bool dgram_allow = false;
+
+ rcu_read_lock();
+ vsock = vhost_vsock_get(cid);
+
+ if (vsock)
+ dgram_allow = vsock->dgram_allow;
+
+ rcu_read_unlock();
+
+ return dgram_allow;
+}
+
static bool vhost_transport_seqpacket_allow(u32 remote_cid);
static struct virtio_transport vhost_transport = {
@@ -419,7 +470,7 @@ static struct virtio_transport vhost_transport = {
.cancel_pkt = vhost_transport_cancel_pkt,
.dgram_enqueue = virtio_transport_dgram_enqueue,
- .dgram_allow = virtio_transport_dgram_allow,
+ .dgram_allow = vhost_transport_dgram_allow,
.stream_enqueue = virtio_transport_stream_enqueue,
.stream_dequeue = virtio_transport_stream_dequeue,
@@ -811,6 +862,9 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
if (features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET))
vsock->seqpacket_allow = true;
+ if (features & (1ULL << VIRTIO_VSOCK_F_DGRAM))
+ vsock->dgram_allow = true;
+
for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
vq = &vsock->vqs[i];
mutex_lock(&vq->mutex);
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index f0e5db0eb43a..344db0f3a602 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1463,7 +1463,7 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
return prot->recvmsg(sk, msg, len, flags, NULL);
#endif
- if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
+ if (unlikely(flags & MSG_OOB))
return -EOPNOTSUPP;
if (unlikely(flags & MSG_ERRQUEUE))
--
2.20.1
* [RFC PATCH net-next v6 12/14] vsock/loopback: implement datagram support
2024-07-10 21:25 [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams Amery Hung
` (10 preceding siblings ...)
2024-07-10 21:25 ` [RFC PATCH net-next v6 11/14] vhost/vsock: implement datagram support Amery Hung
@ 2024-07-10 21:25 ` Amery Hung
2024-08-01 12:18 ` Luigi Leonardi
2024-07-10 21:25 ` [RFC PATCH net-next v6 13/14] virtio/vsock: " Amery Hung
` (3 subsequent siblings)
15 siblings, 1 reply; 51+ messages in thread
From: Amery Hung @ 2024-07-10 21:25 UTC
From: Bobby Eshleman <bobby.eshleman@bytedance.com>
This commit implements datagram support for vsock loopback.
It amounts to little more than toggling on "dgram_allow" and continuing
to use the common virtio functions.
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Signed-off-by: Amery Hung <amery.hung@bytedance.com>
---
net/vmw_vsock/vsock_loopback.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index 4dd4886f29d1..0de4e2c8573c 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -46,6 +46,11 @@ static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk)
return 0;
}
+static bool vsock_loopback_dgram_allow(u32 cid, u32 port)
+{
+ return true;
+}
+
static bool vsock_loopback_seqpacket_allow(u32 remote_cid);
static bool vsock_loopback_msgzerocopy_allow(void)
{
@@ -66,7 +71,7 @@ static struct virtio_transport loopback_transport = {
.cancel_pkt = vsock_loopback_cancel_pkt,
.dgram_enqueue = virtio_transport_dgram_enqueue,
- .dgram_allow = virtio_transport_dgram_allow,
+ .dgram_allow = vsock_loopback_dgram_allow,
.stream_dequeue = virtio_transport_stream_dequeue,
.stream_enqueue = virtio_transport_stream_enqueue,
--
2.20.1
* [RFC PATCH net-next v6 13/14] virtio/vsock: implement datagram support
2024-07-10 21:25 [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams Amery Hung
` (11 preceding siblings ...)
2024-07-10 21:25 ` [RFC PATCH net-next v6 12/14] vsock/loopback: " Amery Hung
@ 2024-07-10 21:25 ` Amery Hung
2024-07-11 23:02 ` Luigi Leonardi
2024-07-10 21:25 ` [RFC PATCH net-next v6 14/14] test/vsock: add vsock dgram tests Amery Hung
` (2 subsequent siblings)
15 siblings, 1 reply; 51+ messages in thread
From: Amery Hung @ 2024-07-10 21:25 UTC
From: Bobby Eshleman <bobby.eshleman@bytedance.com>
This commit implements datagram support with a new version of
->dgram_allow().
Additionally, it drops virtio_transport_dgram_allow() as an exported
symbol because it is no longer used in other transports.
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Signed-off-by: Amery Hung <amery.hung@bytedance.com>
---
include/linux/virtio_vsock.h | 1 -
net/vmw_vsock/virtio_transport.c | 22 +++++++++++++++++++++-
net/vmw_vsock/virtio_transport_common.c | 6 ------
3 files changed, 21 insertions(+), 8 deletions(-)
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 4408749febd2..fe8fa0a9669d 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -222,7 +222,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
bool virtio_transport_stream_allow(u32 cid, u32 port);
-bool virtio_transport_dgram_allow(u32 cid, u32 port);
int virtio_transport_connect(struct vsock_sock *vsk);
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 4891b845fcde..4e1ed3b11e26 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -63,6 +63,7 @@ struct virtio_vsock {
u32 guest_cid;
bool seqpacket_allow;
+ bool dgram_allow;
/* These fields are used only in tx path in function
* 'virtio_transport_send_pkt_work()', so to save
@@ -492,6 +493,21 @@ static bool virtio_transport_msgzerocopy_allow(void)
return true;
}
+static bool virtio_transport_dgram_allow(u32 cid, u32 port)
+{
+ struct virtio_vsock *vsock;
+ bool dgram_allow;
+
+ dgram_allow = false;
+ rcu_read_lock();
+ vsock = rcu_dereference(the_virtio_vsock);
+ if (vsock)
+ dgram_allow = vsock->dgram_allow;
+ rcu_read_unlock();
+
+ return dgram_allow;
+}
+
static bool virtio_transport_seqpacket_allow(u32 remote_cid);
static struct virtio_transport virtio_transport = {
@@ -753,6 +769,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_SEQPACKET))
vsock->seqpacket_allow = true;
+ if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))
+ vsock->dgram_allow = true;
+
vdev->priv = vsock;
ret = virtio_vsock_vqs_init(vsock);
@@ -850,7 +869,8 @@ static struct virtio_device_id id_table[] = {
};
static unsigned int features[] = {
- VIRTIO_VSOCK_F_SEQPACKET
+ VIRTIO_VSOCK_F_SEQPACKET,
+ VIRTIO_VSOCK_F_DGRAM
};
static struct virtio_driver virtio_vsock_driver = {
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index a571b575fde9..52f671287fe3 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1010,12 +1010,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
}
EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
-bool virtio_transport_dgram_allow(u32 cid, u32 port)
-{
- return false;
-}
-EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
-
int virtio_transport_connect(struct vsock_sock *vsk)
{
struct virtio_vsock_pkt_info info = {
--
2.20.1
* [RFC PATCH net-next v6 14/14] test/vsock: add vsock dgram tests
2024-07-10 21:25 [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams Amery Hung
` (12 preceding siblings ...)
2024-07-10 21:25 ` [RFC PATCH net-next v6 13/14] virtio/vsock: " Amery Hung
@ 2024-07-10 21:25 ` Amery Hung
2024-07-20 19:58 ` Arseniy Krasnov
2024-07-23 14:43 ` Stefano Garzarella
2024-07-23 14:38 ` [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams Stefano Garzarella
2025-07-22 14:35 ` Stefano Garzarella
15 siblings, 2 replies; 51+ messages in thread
From: Amery Hung @ 2024-07-10 21:25 UTC
From: Bobby Eshleman <bobby.eshleman@bytedance.com>
From: Jiang Wang <jiang.wang@bytedance.com>
This commit adds tests for vsock datagrams.
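The new cases slot into the existing vsock_test harness, so they are run the same way as the stream/seqpacket tests; a typical host/guest invocation (option names taken from the tools/testing/vsock README, control port and host address are placeholders) might look like:

```sh
# On the host (CID 2), one end of the TCP control channel:
./vsock_test --mode=server --control-port=1234 --peer-cid=3

# In the guest (CID 3), pointing the control channel at the host:
./vsock_test --mode=client --control-host=<host-ip> --control-port=1234 --peer-cid=2
```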
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
Signed-off-by: Amery Hung <amery.hung@bytedance.com>
---
tools/testing/vsock/util.c | 177 ++++-
tools/testing/vsock/util.h | 10 +
tools/testing/vsock/vsock_test.c | 1032 ++++++++++++++++++++++++++----
3 files changed, 1099 insertions(+), 120 deletions(-)
diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
index 554b290fefdc..14d6cd90ca15 100644
--- a/tools/testing/vsock/util.c
+++ b/tools/testing/vsock/util.c
@@ -154,7 +154,8 @@ static int vsock_connect(unsigned int cid, unsigned int port, int type)
int ret;
int fd;
- control_expectln("LISTENING");
+ if (type != SOCK_DGRAM)
+ control_expectln("LISTENING");
fd = socket(AF_VSOCK, type, 0);
if (fd < 0) {
@@ -189,6 +190,11 @@ int vsock_seqpacket_connect(unsigned int cid, unsigned int port)
return vsock_connect(cid, port, SOCK_SEQPACKET);
}
+int vsock_dgram_connect(unsigned int cid, unsigned int port)
+{
+ return vsock_connect(cid, port, SOCK_DGRAM);
+}
+
/* Listen on <cid, port> and return the file descriptor. */
static int vsock_listen(unsigned int cid, unsigned int port, int type)
{
@@ -287,6 +293,34 @@ int vsock_seqpacket_accept(unsigned int cid, unsigned int port,
return vsock_accept(cid, port, clientaddrp, SOCK_SEQPACKET);
}
+int vsock_dgram_bind(unsigned int cid, unsigned int port)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = port,
+ .svm_cid = cid,
+ },
+ };
+ int fd;
+
+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+ perror("bind");
+ exit(EXIT_FAILURE);
+ }
+
+ return fd;
+}
+
/* Transmit bytes from a buffer and check the return value.
*
* expected_ret:
@@ -425,6 +459,147 @@ void recv_byte(int fd, int expected_ret, int flags)
}
}
+/* Transmit bytes to the given address from a buffer and check the return value.
+ *
+ * expected_ret:
+ * <0 Negative errno (for testing errors)
+ * 0 End-of-file
+ * >0 Success (bytes successfully written)
+ */
+void sendto_buf(int fd, void *buf, size_t len, struct sockaddr *dst, socklen_t addrlen,
+ int flags, ssize_t expected_ret)
+{
+ ssize_t nwritten = 0;
+ ssize_t ret;
+
+ timeout_begin(TIMEOUT);
+ do {
+ ret = sendto(fd, buf + nwritten, len - nwritten, flags, dst, addrlen);
+ timeout_check("sendto");
+
+ if (ret == 0 || (ret < 0 && errno != EINTR))
+ break;
+
+ nwritten += ret;
+ } while (nwritten < len);
+ timeout_end();
+
+ if (expected_ret < 0) {
+ if (ret != -1) {
+ fprintf(stderr, "bogus sendto(2) return value %zd\n",
+ ret);
+ exit(EXIT_FAILURE);
+ }
+ if (errno != -expected_ret) {
+ perror("sendto");
+ exit(EXIT_FAILURE);
+ }
+ return;
+ }
+
+ if (ret < 0) {
+ perror("sendto");
+ exit(EXIT_FAILURE);
+ }
+
+ if (nwritten != expected_ret) {
+ if (ret == 0)
+ fprintf(stderr, "unexpected EOF while sending bytes\n");
+
+ fprintf(stderr, "bogus sendto(2) bytes written %zd (expected %zd)\n",
+ nwritten, expected_ret);
+ exit(EXIT_FAILURE);
+ }
+}
+
+/* Receive bytes from the given address in a buffer and check the return value.
+ *
+ * expected_ret:
+ * <0 Negative errno (for testing errors)
+ * 0 End-of-file
+ * >0 Success (bytes successfully read)
+ */
+void recvfrom_buf(int fd, void *buf, size_t len, struct sockaddr *src, socklen_t *addrlen,
+ int flags, ssize_t expected_ret)
+{
+ ssize_t nread = 0;
+ ssize_t ret;
+
+ timeout_begin(TIMEOUT);
+ do {
+ ret = recvfrom(fd, buf + nread, len - nread, flags, src, addrlen);
+ timeout_check("recvfrom");
+
+ if (ret == 0 || (ret < 0 && errno != EINTR))
+ break;
+
+ nread += ret;
+ } while (nread < len);
+ timeout_end();
+
+ if (expected_ret < 0) {
+ if (ret != -1) {
+ fprintf(stderr, "bogus recvfrom(2) return value %zd\n",
+ ret);
+ exit(EXIT_FAILURE);
+ }
+ if (errno != -expected_ret) {
+ perror("recvfrom");
+ exit(EXIT_FAILURE);
+ }
+ return;
+ }
+
+ if (ret < 0) {
+ perror("recvfrom");
+ exit(EXIT_FAILURE);
+ }
+
+ if (nread != expected_ret) {
+ if (ret == 0)
+ fprintf(stderr, "unexpected EOF while receiving bytes\n");
+
+ fprintf(stderr, "bogus recvfrom(2) bytes read %zd (expected %zd)\n",
+ nread, expected_ret);
+ exit(EXIT_FAILURE);
+ }
+}
+
+/* Transmit one byte to the given address and check the return value.
+ *
+ * expected_ret:
+ * <0 Negative errno (for testing errors)
+ * 0 End-of-file
+ * 1 Success
+ */
+void sendto_byte(int fd, struct sockaddr *dst, socklen_t addrlen,
+ int expected_ret, int flags)
+{
+ uint8_t byte = 'A';
+
+ sendto_buf(fd, &byte, sizeof(byte), dst, addrlen, flags, expected_ret);
+}
+
+/* Receive one byte from the given address and check the return value.
+ *
+ * expected_ret:
+ * <0 Negative errno (for testing errors)
+ * 0 End-of-file
+ * 1 Success
+ */
+void recvfrom_byte(int fd, struct sockaddr *src, socklen_t *addrlen,
+ int expected_ret, int flags)
+{
+ uint8_t byte;
+
+ recvfrom_buf(fd, &byte, sizeof(byte), src, addrlen, flags, expected_ret);
+
+ if (byte != 'A') {
+ fprintf(stderr, "unexpected byte read %c\n", byte);
+ exit(EXIT_FAILURE);
+ }
+}
+
/* Run test cases. The program terminates if a failure occurs. */
void run_tests(const struct test_case *test_cases,
const struct test_opts *opts)
diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
index e95e62485959..3367262b53c9 100644
--- a/tools/testing/vsock/util.h
+++ b/tools/testing/vsock/util.h
@@ -43,17 +43,27 @@ int vsock_stream_connect(unsigned int cid, unsigned int port);
int vsock_bind_connect(unsigned int cid, unsigned int port,
unsigned int bind_port, int type);
int vsock_seqpacket_connect(unsigned int cid, unsigned int port);
+int vsock_dgram_connect(unsigned int cid, unsigned int port);
int vsock_stream_accept(unsigned int cid, unsigned int port,
struct sockaddr_vm *clientaddrp);
int vsock_stream_listen(unsigned int cid, unsigned int port);
int vsock_seqpacket_accept(unsigned int cid, unsigned int port,
struct sockaddr_vm *clientaddrp);
+int vsock_dgram_bind(unsigned int cid, unsigned int port);
void vsock_wait_remote_close(int fd);
void send_buf(int fd, const void *buf, size_t len, int flags,
ssize_t expected_ret);
void recv_buf(int fd, void *buf, size_t len, int flags, ssize_t expected_ret);
void send_byte(int fd, int expected_ret, int flags);
void recv_byte(int fd, int expected_ret, int flags);
+void sendto_buf(int fd, void *buf, size_t len, struct sockaddr *dst,
+ socklen_t addrlen, int flags, ssize_t expected_ret);
+void recvfrom_buf(int fd, void *buf, size_t len, struct sockaddr *src,
+ socklen_t *addrlen, int flags, ssize_t expected_ret);
+void sendto_byte(int fd, struct sockaddr *dst, socklen_t addrlen,
+ int expected_ret, int flags);
+void recvfrom_byte(int fd, struct sockaddr *src, socklen_t *addrlen,
+ int expected_ret, int flags);
void run_tests(const struct test_case *test_cases,
const struct test_opts *opts);
void list_tests(const struct test_case *test_cases);
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index f851f8961247..1e1576ca87d0 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -13,6 +13,7 @@
#include <string.h>
#include <errno.h>
#include <unistd.h>
+#include <linux/errqueue.h>
#include <linux/kernel.h>
#include <sys/types.h>
#include <sys/socket.h>
@@ -26,6 +27,12 @@
#include "control.h"
#include "util.h"
+#ifndef SOL_VSOCK
+#define SOL_VSOCK 287
+#endif
+
+#define DGRAM_MSG_CNT 16
+
static void test_stream_connection_reset(const struct test_opts *opts)
{
union {
@@ -1403,125 +1410,912 @@ static void test_stream_cred_upd_on_set_rcvlowat(const struct test_opts *opts)
test_stream_credit_update_test(opts, false);
}
-static struct test_case test_cases[] = {
- {
- .name = "SOCK_STREAM connection reset",
- .run_client = test_stream_connection_reset,
- },
- {
- .name = "SOCK_STREAM bind only",
- .run_client = test_stream_bind_only_client,
- .run_server = test_stream_bind_only_server,
- },
- {
- .name = "SOCK_STREAM client close",
- .run_client = test_stream_client_close_client,
- .run_server = test_stream_client_close_server,
- },
- {
- .name = "SOCK_STREAM server close",
- .run_client = test_stream_server_close_client,
- .run_server = test_stream_server_close_server,
- },
- {
- .name = "SOCK_STREAM multiple connections",
- .run_client = test_stream_multiconn_client,
- .run_server = test_stream_multiconn_server,
- },
- {
- .name = "SOCK_STREAM MSG_PEEK",
- .run_client = test_stream_msg_peek_client,
- .run_server = test_stream_msg_peek_server,
- },
- {
- .name = "SOCK_SEQPACKET msg bounds",
- .run_client = test_seqpacket_msg_bounds_client,
- .run_server = test_seqpacket_msg_bounds_server,
- },
- {
- .name = "SOCK_SEQPACKET MSG_TRUNC flag",
- .run_client = test_seqpacket_msg_trunc_client,
- .run_server = test_seqpacket_msg_trunc_server,
- },
- {
- .name = "SOCK_SEQPACKET timeout",
- .run_client = test_seqpacket_timeout_client,
- .run_server = test_seqpacket_timeout_server,
- },
- {
- .name = "SOCK_SEQPACKET invalid receive buffer",
- .run_client = test_seqpacket_invalid_rec_buffer_client,
- .run_server = test_seqpacket_invalid_rec_buffer_server,
- },
- {
- .name = "SOCK_STREAM poll() + SO_RCVLOWAT",
- .run_client = test_stream_poll_rcvlowat_client,
- .run_server = test_stream_poll_rcvlowat_server,
- },
- {
- .name = "SOCK_SEQPACKET big message",
- .run_client = test_seqpacket_bigmsg_client,
- .run_server = test_seqpacket_bigmsg_server,
- },
- {
- .name = "SOCK_STREAM test invalid buffer",
- .run_client = test_stream_inv_buf_client,
- .run_server = test_stream_inv_buf_server,
- },
- {
- .name = "SOCK_SEQPACKET test invalid buffer",
- .run_client = test_seqpacket_inv_buf_client,
- .run_server = test_seqpacket_inv_buf_server,
- },
- {
- .name = "SOCK_STREAM virtio skb merge",
- .run_client = test_stream_virtio_skb_merge_client,
- .run_server = test_stream_virtio_skb_merge_server,
- },
- {
- .name = "SOCK_SEQPACKET MSG_PEEK",
- .run_client = test_seqpacket_msg_peek_client,
- .run_server = test_seqpacket_msg_peek_server,
- },
- {
- .name = "SOCK_STREAM SHUT_WR",
- .run_client = test_stream_shutwr_client,
- .run_server = test_stream_shutwr_server,
- },
- {
- .name = "SOCK_STREAM SHUT_RD",
- .run_client = test_stream_shutrd_client,
- .run_server = test_stream_shutrd_server,
- },
- {
- .name = "SOCK_STREAM MSG_ZEROCOPY",
- .run_client = test_stream_msgzcopy_client,
- .run_server = test_stream_msgzcopy_server,
- },
- {
- .name = "SOCK_SEQPACKET MSG_ZEROCOPY",
- .run_client = test_seqpacket_msgzcopy_client,
- .run_server = test_seqpacket_msgzcopy_server,
- },
- {
- .name = "SOCK_STREAM MSG_ZEROCOPY empty MSG_ERRQUEUE",
- .run_client = test_stream_msgzcopy_empty_errq_client,
- .run_server = test_stream_msgzcopy_empty_errq_server,
- },
- {
- .name = "SOCK_STREAM double bind connect",
- .run_client = test_double_bind_connect_client,
- .run_server = test_double_bind_connect_server,
- },
- {
- .name = "SOCK_STREAM virtio credit update + SO_RCVLOWAT",
- .run_client = test_stream_rcvlowat_def_cred_upd_client,
- .run_server = test_stream_cred_upd_on_set_rcvlowat,
- },
- {
- .name = "SOCK_STREAM virtio credit update + low rx_bytes",
- .run_client = test_stream_rcvlowat_def_cred_upd_client,
- .run_server = test_stream_cred_upd_on_low_rx_bytes,
+static void test_dgram_sendto_client(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
+ .svm_cid = opts->peer_cid,
+ },
+ };
+ int fd;
+
+ /* Wait for the server to be ready */
+ control_expectln("BIND");
+
+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ sendto_byte(fd, &addr.sa, sizeof(addr.svm), 1, 0);
+
+ /* Notify the server that the client has finished */
+ control_writeln("DONE");
+
+ close(fd);
+}
+
+static void test_dgram_sendto_server(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
+ .svm_cid = VMADDR_CID_ANY,
+ },
+ };
+ socklen_t addrlen = sizeof(addr.sa);
+ unsigned long sock_buf_size;
+ int fd;
+
+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+ perror("bind");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Set receive buffer to maximum */
+ sock_buf_size = -1;
+ if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
+ &sock_buf_size, sizeof(sock_buf_size))) {
+ perror("setsockopt(SO_RCVBUF)");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Notify the client that the server is ready */
+ control_writeln("BIND");
+
+ recvfrom_byte(fd, &addr.sa, &addrlen, 1, 0);
+
+ /* Wait for the client to finish */
+ control_expectln("DONE");
+
+ close(fd);
+}
+
+static void test_dgram_sendto_auto_bind_client(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
+ .svm_cid = opts->peer_cid,
+ },
+ };
+ struct sockaddr_vm bind_addr;
+ socklen_t addrlen;
+ unsigned int port;
+ int fd;
+
+ /* Wait for the server to be ready */
+ control_expectln("BIND");
+
+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ sendto_byte(fd, &addr.sa, sizeof(addr.svm), 1, 0);
+
+ /* Get auto-bound port after sendto */
+ addrlen = sizeof(bind_addr);
+ if (getsockname(fd, (struct sockaddr *)&bind_addr, &addrlen)) {
+ perror("getsockname");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Send the port number to the server */
+ port = bind_addr.svm_port;
+ sendto_buf(fd, &port, sizeof(port), &addr.sa, sizeof(addr.svm), 0, sizeof(port));
+
+ addr.svm.svm_port = port;
+ recvfrom_byte(fd, &addr.sa, &addrlen, 1, 0);
+
+ /* Notify the server that the client has finished */
+ control_writeln("DONE");
+
+ close(fd);
+}
+
+static void test_dgram_sendto_auto_bind_server(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
+ .svm_cid = VMADDR_CID_ANY,
+ },
+ };
+ socklen_t addrlen = sizeof(addr.sa);
+ unsigned long sock_buf_size;
+ unsigned int port;
+ int fd;
+
+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+ perror("bind");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Set receive buffer to maximum */
+ sock_buf_size = -1;
+ if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
+ &sock_buf_size, sizeof(sock_buf_size))) {
+ perror("setsockopt(SO_RCVBUF)");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Notify the client that the server is ready */
+ control_writeln("BIND");
+
+ recvfrom_byte(fd, &addr.sa, &addrlen, 1, 0);
+
+ /* Receive the port the client is listening to */
+ recvfrom_buf(fd, &port, sizeof(port), &addr.sa, &addrlen, 0, sizeof(port));
+
+ addr.svm.svm_port = port;
+ addr.svm.svm_cid = opts->peer_cid;
+ sendto_byte(fd, &addr.sa, sizeof(addr.svm), 1, 0);
+
+ /* Wait for the client to finish */
+ control_expectln("DONE");
+
+ close(fd);
+}
+
+static void test_dgram_connect_client(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
+ .svm_cid = opts->peer_cid,
+ },
+ };
+ int ret;
+ int fd;
+
+ /* Wait for the server to be ready */
+ control_expectln("BIND");
+
+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ ret = connect(fd, &addr.sa, sizeof(addr.svm));
+ if (ret < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ send_byte(fd, 1, 0);
+
+ /* Notify the server that the client has finished */
+ control_writeln("DONE");
+
+ close(fd);
+}
+
+static void test_dgram_connect_server(const struct test_opts *opts)
+{
+ test_dgram_sendto_server(opts);
+}
+
+static void test_dgram_multiconn_sendto_client(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
+ .svm_cid = opts->peer_cid,
+ },
+ };
+ int fds[MULTICONN_NFDS];
+ int i;
+
+ /* Wait for the server to be ready */
+ control_expectln("BIND");
+
+ for (i = 0; i < MULTICONN_NFDS; i++) {
+ fds[i] = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fds[i] < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+ }
+
+ for (i = 0; i < MULTICONN_NFDS; i++) {
+ sendto_byte(fds[i], &addr.sa, sizeof(addr.svm), 1, 0);
+
+ /* This is here to make explicit the case of the test failing
+ * due to packet loss. Otherwise the test fails when recv()
+ * times out, which is much more confusing.
+ */
+ control_expectln("PKTRECV");
+ }
+
+ /* Notify the server that the client has finished */
+ control_writeln("DONE");
+
+ for (i = 0; i < MULTICONN_NFDS; i++)
+ close(fds[i]);
+}
+
+static void test_dgram_multiconn_sendto_server(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
+ .svm_cid = VMADDR_CID_ANY,
+ },
+ };
+ socklen_t len = sizeof(addr.sa);
+ int fd;
+ int i;
+
+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+ perror("bind");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Notify the client that the server is ready */
+ control_writeln("BIND");
+
+ for (i = 0; i < MULTICONN_NFDS; i++) {
+ recvfrom_byte(fd, &addr.sa, &len, 1, 0);
+ control_writeln("PKTRECV");
+ }
+
+ /* Wait for the client to finish */
+ control_expectln("DONE");
+
+ close(fd);
+}
+
+static void test_dgram_multiconn_send_client(const struct test_opts *opts)
+{
+ int fds[MULTICONN_NFDS];
+ int i;
+
+ /* Wait for the server to be ready */
+ control_expectln("BIND");
+
+ for (i = 0; i < MULTICONN_NFDS; i++) {
+ fds[i] = vsock_dgram_connect(opts->peer_cid, 1234);
+ if (fds[i] < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+ }
+
+ for (i = 0; i < MULTICONN_NFDS; i++) {
+ send_byte(fds[i], 1, 0);
+ /* This is here to make explicit the case of the test failing
+ * due to packet loss.
+ */
+ control_expectln("PKTRECV");
+ }
+
+ /* Notify the server that the client has finished */
+ control_writeln("DONE");
+
+ for (i = 0; i < MULTICONN_NFDS; i++)
+ close(fds[i]);
+}
+
+static void test_dgram_multiconn_send_server(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
+ .svm_cid = VMADDR_CID_ANY,
+ },
+ };
+ unsigned long sock_buf_size;
+ int fd;
+ int i;
+
+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+ if (fd < 0) {
+ perror("socket");
+ exit(EXIT_FAILURE);
+ }
+
+ if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+ perror("bind");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Set receive buffer to maximum */
+ sock_buf_size = -1;
+ if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
+ &sock_buf_size, sizeof(sock_buf_size))) {
+ perror("setsockopt(SO_RCVBUF)");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Notify the client that the server is ready */
+ control_writeln("BIND");
+
+ for (i = 0; i < MULTICONN_NFDS; i++) {
+ recv_byte(fd, 1, 0);
+ control_writeln("PKTRECV");
+ }
+
+ /* Wait for the client to finish */
+ control_expectln("DONE");
+
+ close(fd);
+}
+
+/*
+ * This test is similar to the seqpacket msg bounds tests, but it is unreliable
+ * because it may also fail in the unlikely case that packets are dropped.
+ */
+static void test_dgram_bounds_unreliable_client(const struct test_opts *opts)
+{
+ unsigned long recv_buf_size;
+ unsigned long *hashes;
+ size_t max_msg_size;
+ int page_size;
+ int fd;
+ int i;
+
+ fd = vsock_dgram_connect(opts->peer_cid, 1234);
+ if (fd < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ hashes = malloc(DGRAM_MSG_CNT * sizeof(unsigned long));
+ if (!hashes) {
+ perror("malloc");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Let the server know the client is ready */
+ control_writeln("CLNTREADY");
+
+ /* Wait until the receiver sets the buffer size. */
+ control_expectln("SRVREADY");
+
+ recv_buf_size = control_readulong();
+
+ page_size = getpagesize();
+ max_msg_size = MAX_MSG_PAGES * page_size;
+
+ for (i = 0; i < DGRAM_MSG_CNT; i++) {
+ ssize_t send_size;
+ size_t buf_size;
+ void *buf;
+
+ /* Use "small" buffers and "big" buffers. */
+ if (opts->peer_cid <= VMADDR_CID_HOST && (i & 1))
+ buf_size = page_size +
+ (rand() % (max_msg_size - page_size));
+ else
+ buf_size = 1 + (rand() % page_size);
+
+ buf_size = min(buf_size, recv_buf_size);
+
+ buf = malloc(buf_size);
+
+ if (!buf) {
+ perror("malloc");
+ exit(EXIT_FAILURE);
+ }
+
+ memset(buf, rand() & 0xff, buf_size);
+
+ send_size = send(fd, buf, buf_size, 0);
+ if (send_size < 0) {
+ perror("send");
+ exit(EXIT_FAILURE);
+ }
+
+ if (send_size != buf_size) {
+ fprintf(stderr, "Invalid send size\n");
+ exit(EXIT_FAILURE);
+ }
+
+ /* In theory the implementation isn't required to transmit
+ * these packets in order, so we use this PKTSENT/PKTRECV
+ * message sequence so that server and client coordinate
+ * sending and receiving one packet at a time. The client sends
+ * a packet and waits until it has been received before sending
+ * another.
+ *
+ * Also in theory these packets can be lost and the test will
+ * fail for that reason.
+ */
+ control_writeln("PKTSENT");
+ control_expectln("PKTRECV");
+
+ /* Send the server a hash of the packet */
+ hashes[i] = hash_djb2(buf, buf_size);
+ free(buf);
+ }
+
+ control_writeln("SENDDONE");
+ close(fd);
+
+ for (i = 0; i < DGRAM_MSG_CNT; i++) {
+ if (hashes[i] != control_readulong())
+ fprintf(stderr, "broken dgram message bounds or packet loss\n");
+ }
+ free(hashes);
+}
+
+static void test_dgram_bounds_unreliable_server(const struct test_opts *opts)
+{
+ unsigned long hashes[DGRAM_MSG_CNT];
+ unsigned long sock_buf_size;
+ struct msghdr msg = {0};
+ struct iovec iov = {0};
+ socklen_t len;
+ int fd;
+ int i;
+
+ fd = vsock_dgram_bind(VMADDR_CID_ANY, 1234);
+ if (fd < 0) {
+ perror("bind");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Set receive buffer to maximum */
+ sock_buf_size = -1;
+ if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
+ &sock_buf_size, sizeof(sock_buf_size))) {
+ perror("setsockopt(SO_RCVBUF)");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Retrieve the receive buffer size */
+ len = sizeof(sock_buf_size);
+ if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF,
+ &sock_buf_size, &len)) {
+ perror("getsockopt(SO_RECVBUF)");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Client ready to receive parameters */
+ control_expectln("CLNTREADY");
+
+ /* Ready to receive data. */
+ control_writeln("SRVREADY");
+
+ if (opts->peer_cid > VMADDR_CID_HOST)
+ control_writeulong(sock_buf_size);
+ else
+ control_writeulong(getpagesize());
+
+ iov.iov_len = MAX_MSG_PAGES * getpagesize();
+ iov.iov_base = malloc(iov.iov_len);
+ if (!iov.iov_base) {
+ perror("malloc");
+ exit(EXIT_FAILURE);
+ }
+
+ msg.msg_iov = &iov;
+ msg.msg_iovlen = 1;
+
+ for (i = 0; i < DGRAM_MSG_CNT; i++) {
+ ssize_t recv_size;
+
+ control_expectln("PKTSENT");
+ recv_size = recvmsg(fd, &msg, 0);
+ control_writeln("PKTRECV");
+
+ if (!recv_size)
+ break;
+
+ if (recv_size < 0) {
+ perror("recvmsg");
+ exit(EXIT_FAILURE);
+ }
+
+ hashes[i] = hash_djb2(msg.msg_iov[0].iov_base, recv_size);
+ }
+
+ control_expectln("SENDDONE");
+
+ free(iov.iov_base);
+ close(fd);
+
+ for (i = 0; i < DGRAM_MSG_CNT; i++)
+ control_writeulong(hashes[i]);
+}
+
+#define POLL_TIMEOUT_MS 1000
+void vsock_recv_error(int fd)
+{
+ struct sock_extended_err *serr;
+ struct msghdr msg = { 0 };
+ struct pollfd fds = { 0 };
+ char cmsg_data[128];
+ struct cmsghdr *cm;
+ ssize_t res;
+
+ fds.fd = fd;
+ fds.events = 0;
+
+ if (poll(&fds, 1, POLL_TIMEOUT_MS) < 0) {
+ perror("poll");
+ exit(EXIT_FAILURE);
+ }
+
+ if (!(fds.revents & POLLERR)) {
+ fprintf(stderr, "POLLERR expected\n");
+ exit(EXIT_FAILURE);
+ }
+
+ msg.msg_control = cmsg_data;
+ msg.msg_controllen = sizeof(cmsg_data);
+
+ res = recvmsg(fd, &msg, MSG_ERRQUEUE);
+ if (res) {
+ fprintf(stderr, "failed to read error queue: %zi\n", res);
+ exit(EXIT_FAILURE);
+ }
+
+ cm = CMSG_FIRSTHDR(&msg);
+ if (!cm) {
+ fprintf(stderr, "cmsg: no cmsg\n");
+ exit(EXIT_FAILURE);
+ }
+
+ if (cm->cmsg_level != SOL_VSOCK) {
+ fprintf(stderr, "cmsg: unexpected 'cmsg_level'\n");
+ exit(EXIT_FAILURE);
+ }
+
+ if (cm->cmsg_type != 0) {
+ fprintf(stderr, "cmsg: unexpected 'cmsg_type'\n");
+ exit(EXIT_FAILURE);
+ }
+
+ serr = (void *)CMSG_DATA(cm);
+ if (serr->ee_origin != 0) {
+ fprintf(stderr, "serr: unexpected 'ee_origin'\n");
+ exit(EXIT_FAILURE);
+ }
+
+ if (serr->ee_errno != EHOSTUNREACH) {
+ fprintf(stderr, "serr: wrong error code: %u\n", serr->ee_errno);
+ exit(EXIT_FAILURE);
+ }
+}
+
+/*
+ * Attempt to send a packet larger than the client's RX buffer. Test that the
+ * packet was dropped and that there is an error in the error queue.
+ */
+static void test_dgram_drop_big_packets_server(const struct test_opts *opts)
+{
+ unsigned long client_rx_buf_size;
+ size_t buf_size;
+ void *buf;
+ int fd;
+
+ if (opts->peer_cid <= VMADDR_CID_HOST) {
+ printf("The server's peer must be a guest (not CID %u), skipped...\n",
+ opts->peer_cid);
+ return;
+ }
+
+ /* Wait for the client to be ready */
+ control_expectln("READY");
+
+ fd = vsock_dgram_connect(opts->peer_cid, 1234);
+ if (fd < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ client_rx_buf_size = control_readulong();
+
+ buf_size = client_rx_buf_size + 1;
+ buf = malloc(buf_size);
+ if (!buf) {
+ perror("malloc");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Even though the buffer is exceeded, the send() should still succeed. */
+ if (send(fd, buf, buf_size, 0) < 0) {
+ perror("send");
+ exit(EXIT_FAILURE);
+ }
+
+ vsock_recv_error(fd);
+
+ /* Notify the server that the client has finished */
+ control_writeln("DONE");
+
+ close(fd);
+}
+
+static void test_dgram_drop_big_packets_client(const struct test_opts *opts)
+{
+ unsigned long buf_size = getpagesize();
+
+ if (opts->peer_cid > VMADDR_CID_HOST) {
+ printf("The client's peer must be the host (not CID %u), skipped...\n",
+ opts->peer_cid);
+ return;
+ }
+
+ control_writeln("READY");
+ control_writeulong(buf_size);
+ control_expectln("DONE");
+}
+
+static void test_stream_dgram_address_collision_client(const struct test_opts *opts)
+{
+ int dgram_fd, stream_fd;
+
+ stream_fd = vsock_stream_connect(opts->peer_cid, 1234);
+ if (stream_fd < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ /* This simply tests whether connect() causes an address collision client-side.
+ * Keep in mind that there is no exchange of packets with the
+ * bound socket on the server.
+ */
+ dgram_fd = vsock_dgram_connect(opts->peer_cid, 1234);
+ if (dgram_fd < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ close(stream_fd);
+ close(dgram_fd);
+
+ /* Notify the server that the client has finished */
+ control_writeln("DONE");
+}
+
+static void test_stream_dgram_address_collision_server(const struct test_opts *opts)
+{
+ int dgram_fd, stream_fd;
+ struct sockaddr_vm addr;
+ socklen_t addrlen;
+
+ stream_fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, 0);
+ if (stream_fd < 0) {
+ perror("accept");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Retrieve the CID/port for re-use. */
+ addrlen = sizeof(addr);
+ if (getsockname(stream_fd, (struct sockaddr *)&addr, &addrlen)) {
+ perror("getsockname");
+ exit(EXIT_FAILURE);
+ }
+
+ /* See the note in the client function about the pairwise connect call. */
+ dgram_fd = vsock_dgram_bind(addr.svm_cid, addr.svm_port);
+ if (dgram_fd < 0) {
+ perror("bind");
+ exit(EXIT_FAILURE);
+ }
+
+ control_expectln("DONE");
+
+ close(stream_fd);
+ close(dgram_fd);
+}
+
+static struct test_case test_cases[] = {
+ {
+ .name = "SOCK_STREAM connection reset",
+ .run_client = test_stream_connection_reset,
+ },
+ {
+ .name = "SOCK_STREAM bind only",
+ .run_client = test_stream_bind_only_client,
+ .run_server = test_stream_bind_only_server,
+ },
+ {
+ .name = "SOCK_STREAM client close",
+ .run_client = test_stream_client_close_client,
+ .run_server = test_stream_client_close_server,
+ },
+ {
+ .name = "SOCK_STREAM server close",
+ .run_client = test_stream_server_close_client,
+ .run_server = test_stream_server_close_server,
+ },
+ {
+ .name = "SOCK_STREAM multiple connections",
+ .run_client = test_stream_multiconn_client,
+ .run_server = test_stream_multiconn_server,
+ },
+ {
+ .name = "SOCK_STREAM MSG_PEEK",
+ .run_client = test_stream_msg_peek_client,
+ .run_server = test_stream_msg_peek_server,
+ },
+ {
+ .name = "SOCK_SEQPACKET msg bounds",
+ .run_client = test_seqpacket_msg_bounds_client,
+ .run_server = test_seqpacket_msg_bounds_server,
+ },
+ {
+ .name = "SOCK_SEQPACKET MSG_TRUNC flag",
+ .run_client = test_seqpacket_msg_trunc_client,
+ .run_server = test_seqpacket_msg_trunc_server,
+ },
+ {
+ .name = "SOCK_SEQPACKET timeout",
+ .run_client = test_seqpacket_timeout_client,
+ .run_server = test_seqpacket_timeout_server,
+ },
+ {
+ .name = "SOCK_SEQPACKET invalid receive buffer",
+ .run_client = test_seqpacket_invalid_rec_buffer_client,
+ .run_server = test_seqpacket_invalid_rec_buffer_server,
+ },
+ {
+ .name = "SOCK_STREAM poll() + SO_RCVLOWAT",
+ .run_client = test_stream_poll_rcvlowat_client,
+ .run_server = test_stream_poll_rcvlowat_server,
+ },
+ {
+ .name = "SOCK_SEQPACKET big message",
+ .run_client = test_seqpacket_bigmsg_client,
+ .run_server = test_seqpacket_bigmsg_server,
+ },
+ {
+ .name = "SOCK_STREAM test invalid buffer",
+ .run_client = test_stream_inv_buf_client,
+ .run_server = test_stream_inv_buf_server,
+ },
+ {
+ .name = "SOCK_SEQPACKET test invalid buffer",
+ .run_client = test_seqpacket_inv_buf_client,
+ .run_server = test_seqpacket_inv_buf_server,
+ },
+ {
+ .name = "SOCK_STREAM virtio skb merge",
+ .run_client = test_stream_virtio_skb_merge_client,
+ .run_server = test_stream_virtio_skb_merge_server,
+ },
+ {
+ .name = "SOCK_SEQPACKET MSG_PEEK",
+ .run_client = test_seqpacket_msg_peek_client,
+ .run_server = test_seqpacket_msg_peek_server,
+ },
+ {
+ .name = "SOCK_STREAM SHUT_WR",
+ .run_client = test_stream_shutwr_client,
+ .run_server = test_stream_shutwr_server,
+ },
+ {
+ .name = "SOCK_STREAM SHUT_RD",
+ .run_client = test_stream_shutrd_client,
+ .run_server = test_stream_shutrd_server,
+ },
+ {
+ .name = "SOCK_STREAM MSG_ZEROCOPY",
+ .run_client = test_stream_msgzcopy_client,
+ .run_server = test_stream_msgzcopy_server,
+ },
+ {
+ .name = "SOCK_SEQPACKET MSG_ZEROCOPY",
+ .run_client = test_seqpacket_msgzcopy_client,
+ .run_server = test_seqpacket_msgzcopy_server,
+ },
+ {
+ .name = "SOCK_STREAM MSG_ZEROCOPY empty MSG_ERRQUEUE",
+ .run_client = test_stream_msgzcopy_empty_errq_client,
+ .run_server = test_stream_msgzcopy_empty_errq_server,
+ },
+ {
+ .name = "SOCK_STREAM double bind connect",
+ .run_client = test_double_bind_connect_client,
+ .run_server = test_double_bind_connect_server,
+ },
+ {
+ .name = "SOCK_STREAM virtio credit update + SO_RCVLOWAT",
+ .run_client = test_stream_rcvlowat_def_cred_upd_client,
+ .run_server = test_stream_cred_upd_on_set_rcvlowat,
+ },
+ {
+ .name = "SOCK_STREAM virtio credit update + low rx_bytes",
+ .run_client = test_stream_rcvlowat_def_cred_upd_client,
+ .run_server = test_stream_cred_upd_on_low_rx_bytes,
+ },
+ {
+ .name = "SOCK_DGRAM client sendto",
+ .run_client = test_dgram_sendto_client,
+ .run_server = test_dgram_sendto_server,
+ },
+ {
+ .name = "SOCK_DGRAM client sendto auto bind",
+ .run_client = test_dgram_sendto_auto_bind_client,
+ .run_server = test_dgram_sendto_auto_bind_server,
+ },
+ {
+ .name = "SOCK_DGRAM client connect",
+ .run_client = test_dgram_connect_client,
+ .run_server = test_dgram_connect_server,
+ },
+ {
+ .name = "SOCK_DGRAM multiple connections using sendto",
+ .run_client = test_dgram_multiconn_sendto_client,
+ .run_server = test_dgram_multiconn_sendto_server,
+ },
+ {
+ .name = "SOCK_DGRAM multiple connections using send",
+ .run_client = test_dgram_multiconn_send_client,
+ .run_server = test_dgram_multiconn_send_server,
+ },
+ {
+ .name = "SOCK_DGRAM msg bounds unreliable",
+ .run_client = test_dgram_bounds_unreliable_client,
+ .run_server = test_dgram_bounds_unreliable_server,
+ },
+ {
+ .name = "SOCK_DGRAM drop big packets",
+ .run_client = test_dgram_drop_big_packets_client,
+ .run_server = test_dgram_drop_big_packets_server,
+ },
+ {
+ .name = "SOCK_STREAM and SOCK_DGRAM address collision",
+ .run_client = test_stream_dgram_address_collision_client,
+ .run_server = test_stream_dgram_address_collision_server,
},
{},
};
--
2.20.1
^ permalink raw reply related [flat|nested] 51+ messages in thread
* Re: [RFC PATCH net-next v6 13/14] virtio/vsock: implement datagram support
2024-07-10 21:25 ` [RFC PATCH net-next v6 13/14] virtio/vsock: " Amery Hung
@ 2024-07-11 23:02 ` Luigi Leonardi
2024-07-11 23:07 ` Amery Hung
0 siblings, 1 reply; 51+ messages in thread
From: Luigi Leonardi @ 2024-07-11 23:02 UTC (permalink / raw)
To: ameryhung, Bobby Eshleman
Cc: amery.hung, bpf, bryantan, dan.carpenter, davem, decui, edumazet,
haiyangz, jasowang, jiang.wang, kuba, kvm, kys, linux-hyperv,
linux-kernel, mst, netdev, oxffffaa, pabeni, pv-drivers, sgarzare,
simon.horman, stefanha, vdasa, virtualization, wei.liu,
xiyou.wangcong, xuanzhuo, Luigi Leonardi
Hi Bobby, Amery
Thank you for working on this!
> This commit implements datagram support with a new version of
> ->dgram_allow().
Commit messages should use the imperative mood: "This commit implements X" -> "Implement X".
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes
This suggestion applies to many of the commits in this series.
> +static bool virtio_transport_dgram_allow(u32 cid, u32 port)
> +{
> + struct virtio_vsock *vsock;
> + bool dgram_allow;
> +
> + dgram_allow = false;
I think you can initialize the variable in the declaration.
> + rcu_read_lock();
> + vsock = rcu_dereference(the_virtio_vsock);
> + if (vsock)
> + dgram_allow = vsock->dgram_allow;
> + rcu_read_unlock();
> +
> + return dgram_allow;
> +}
> +
The rest LGTM.
Thanks,
Luigi
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFC PATCH net-next v6 13/14] virtio/vsock: implement datagram support
2024-07-11 23:02 ` Luigi Leonardi
@ 2024-07-11 23:07 ` Amery Hung
0 siblings, 0 replies; 51+ messages in thread
From: Amery Hung @ 2024-07-11 23:07 UTC (permalink / raw)
To: Luigi Leonardi
Cc: Bobby Eshleman, amery.hung, bpf, bryantan, dan.carpenter, davem,
decui, edumazet, haiyangz, jasowang, jiang.wang, kuba, kvm, kys,
linux-hyperv, linux-kernel, mst, netdev, oxffffaa, pabeni,
pv-drivers, sgarzare, simon.horman, stefanha, vdasa,
virtualization, wei.liu, xiyou.wangcong, xuanzhuo
On Thu, Jul 11, 2024 at 4:03 PM Luigi Leonardi
<luigi.leonardi@outlook.com> wrote:
>
> Hi Bobby, Amery
>
> Thank you for working on this!
>
> > This commit implements datagram support with a new version of
> > ->dgram_allow().
>
> Commit messages should use the imperative mood: "This commit implements X" -> "Implement X".
> https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes
> This suggestion applies to many of the commits in this series.
Thanks for pointing this out. I will change the commit message in the
next version.
>
> > +static bool virtio_transport_dgram_allow(u32 cid, u32 port)
> > +{
> > + struct virtio_vsock *vsock;
> > + bool dgram_allow;
> > +
> > + dgram_allow = false;
>
> I think you can initialize the variable in the declaration.
Got it.
Thanks,
Amery
>
> > + rcu_read_lock();
> > + vsock = rcu_dereference(the_virtio_vsock);
> > + if (vsock)
> > + dgram_allow = vsock->dgram_allow;
> > + rcu_read_unlock();
> > +
> > + return dgram_allow;
> > +}
> > +
>
> The rest LGTM.
>
> Thanks,
> Luigi
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFC PATCH net-next v6 01/14] af_vsock: generalize vsock_dgram_recvmsg() to all transports
2024-07-10 21:25 ` [RFC PATCH net-next v6 01/14] af_vsock: generalize vsock_dgram_recvmsg() to all transports Amery Hung
@ 2024-07-15 8:02 ` Luigi Leonardi
2024-07-15 23:39 ` Amery Hung
2024-07-29 19:25 ` Arseniy Krasnov
1 sibling, 1 reply; 51+ messages in thread
From: Luigi Leonardi @ 2024-07-15 8:02 UTC (permalink / raw)
To: ameryhung
Cc: amery.hung, bobby.eshleman, bpf, bryantan, dan.carpenter, davem,
decui, edumazet, haiyangz, jasowang, jiang.wang, kuba, kvm, kys,
linux-hyperv, linux-kernel, mst, netdev, oxffffaa, pabeni,
pv-drivers, sgarzare, simon.horman, stefanha, vdasa,
virtualization, wei.liu, xiyou.wangcong, xuanzhuo, Luigi Leonardi
Hi Amery, Bobby
> From: Bobby Eshleman <bobby.eshleman@bytedance.com>
>
> This commit drops the transport->dgram_dequeue callback and makes
> vsock_dgram_recvmsg() generic to all transports.
>
> To make this possible, two transport-level changes are introduced:
> - transport in the receiving path now stores the cid and port into
> the control buffer of an skb when populating an skb. The information
> later is used to initialize sockaddr_vm structure in recvmsg()
> without referencing vsk->transport.
> - transport implementations set the skb->data pointer to the beginning
> of the payload prior to adding the skb to the socket's receive queue.
> That is, they must use skb_pull() before enqueuing. This is an
> agreement between the transport and the socket layer that skb->data
> always points to the beginning of the payload (and not, for example,
> the packet header).
>
Like in the other patch, please use imperative in the commit message.
>
> Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> Signed-off-by: Amery Hung <amery.hung@bytedance.com>
> ---
> drivers/vhost/vsock.c | 1 -
> include/linux/virtio_vsock.h | 5 ---
> include/net/af_vsock.h | 11 ++++-
> net/vmw_vsock/af_vsock.c | 42 +++++++++++++++++-
> net/vmw_vsock/hyperv_transport.c | 7 ---
> net/vmw_vsock/virtio_transport.c | 1 -
> net/vmw_vsock/virtio_transport_common.c | 9 ----
> net/vmw_vsock/vmci_transport.c | 59 +++----------------------
> net/vmw_vsock/vsock_loopback.c | 1 -
> 9 files changed, 55 insertions(+), 81 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index ec20ecff85c7..97fffa914e66 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -419,7 +419,6 @@ static struct virtio_transport vhost_transport = {
> .cancel_pkt = vhost_transport_cancel_pkt,
>
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> - .dgram_dequeue = virtio_transport_dgram_dequeue,
> .dgram_bind = virtio_transport_dgram_bind,
> .dgram_allow = virtio_transport_dgram_allow,
>
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index c82089dee0c8..8b56b8a19ddd 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -177,11 +177,6 @@ virtio_transport_stream_dequeue(struct vsock_sock *vsk,
> size_t len,
> int type);
> int
> -virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
> - struct msghdr *msg,
> - size_t len, int flags);
> -
> -int
> virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
> struct msghdr *msg,
> size_t len);
> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> index 535701efc1e5..7aa1f5f2b1a5 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -120,8 +120,6 @@ struct vsock_transport {
>
> /* DGRAM. */
> int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
> - int (*dgram_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
> - size_t len, int flags);
> int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
> struct msghdr *, size_t len);
> bool (*dgram_allow)(u32 cid, u32 port);
> @@ -219,6 +217,15 @@ void vsock_for_each_connected_socket(struct vsock_transport *transport,
> int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
> bool vsock_find_cid(unsigned int cid);
>
> +struct vsock_skb_cb {
> + unsigned int src_cid;
> + unsigned int src_port;
> +};
> +
> +static inline struct vsock_skb_cb *vsock_skb_cb(struct sk_buff *skb) {
> + return (struct vsock_skb_cb *)skb->cb;
> +};
> +
>
Running scripts/checkpatch.pl --strict --codespell on the patch shows this error:
ERROR: open brace '{' following function definitions go on the next line
#183: FILE: include/net/af_vsock.h:225:
+static inline struct vsock_skb_cb *vsock_skb_cb(struct sk_buff *skb) {
total: 1 errors, 0 warnings, 0 checks, 235 lines checked
>
> /**** TAP ****/
>
> struct vsock_tap {
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index 4b040285aa78..5e7d4d99ea2c 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -1273,11 +1273,15 @@ static int vsock_dgram_connect(struct socket *sock,
> int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> size_t len, int flags)
> {
> + struct vsock_skb_cb *vsock_cb;
> #ifdef CONFIG_BPF_SYSCALL
> const struct proto *prot;
> #endif
> struct vsock_sock *vsk;
> + struct sk_buff *skb;
> + size_t payload_len;
> struct sock *sk;
> + int err;
>
> sk = sock->sk;
> vsk = vsock_sk(sk);
> @@ -1288,7 +1292,43 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> return prot->recvmsg(sk, msg, len, flags, NULL);
> #endif
>
> - return vsk->transport->dgram_dequeue(vsk, msg, len, flags);
> + if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> + return -EOPNOTSUPP;
> +
> + if (unlikely(flags & MSG_ERRQUEUE))
> + return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, 0);
>
This if statement is always false!
>
> +
> + /* Retrieve the head sk_buff from the socket's receive queue. */
> + err = 0;
> + skb = skb_recv_datagram(sk_vsock(vsk), flags, &err);
> + if (!skb)
> + return err;
> +
> + payload_len = skb->len;
> +
nit: I'd remove this blank line.
> + if (payload_len > len) {
> + payload_len = len;
> + msg->msg_flags |= MSG_TRUNC;
> + }
> +
> + /* Place the datagram payload in the user's iovec. */
> + err = skb_copy_datagram_msg(skb, 0, msg, payload_len);
> + if (err)
> + goto out;
> +
> + if (msg->msg_name) {
> + /* Provide the address of the sender. */
> + DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> +
> + vsock_cb = vsock_skb_cb(skb);
> + vsock_addr_init(vm_addr, vsock_cb->src_cid, vsock_cb->src_port);
> + msg->msg_namelen = sizeof(*vm_addr);
> + }
> + err = payload_len;
> +
> +out:
> + skb_free_datagram(&vsk->sk, skb);
> + return err;
> }
> EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg);
>
> diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> index e2157e387217..326dd41ee2d5 100644
> --- a/net/vmw_vsock/hyperv_transport.c
> +++ b/net/vmw_vsock/hyperv_transport.c
> @@ -556,12 +556,6 @@ static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
> return -EOPNOTSUPP;
> }
>
> -static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
> - size_t len, int flags)
> -{
> - return -EOPNOTSUPP;
> -}
> -
> static int hvs_dgram_enqueue(struct vsock_sock *vsk,
> struct sockaddr_vm *remote, struct msghdr *msg,
> size_t dgram_len)
> @@ -833,7 +827,6 @@ static struct vsock_transport hvs_transport = {
> .shutdown = hvs_shutdown,
>
> .dgram_bind = hvs_dgram_bind,
> - .dgram_dequeue = hvs_dgram_dequeue,
> .dgram_enqueue = hvs_dgram_enqueue,
> .dgram_allow = hvs_dgram_allow,
>
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 43d405298857..a8c97e95622a 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -508,7 +508,6 @@ static struct virtio_transport virtio_transport = {
> .cancel_pkt = virtio_transport_cancel_pkt,
>
> .dgram_bind = virtio_transport_dgram_bind,
> - .dgram_dequeue = virtio_transport_dgram_dequeue,
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> .dgram_allow = virtio_transport_dgram_allow,
>
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 16ff976a86e3..4bf73d20c12a 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -810,15 +810,6 @@ virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
> }
> EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_enqueue);
>
> -int
> -virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
> - struct msghdr *msg,
> - size_t len, int flags)
> -{
> - return -EOPNOTSUPP;
> -}
> -EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
> -
> s64 virtio_transport_stream_has_data(struct vsock_sock *vsk)
> {
> struct virtio_vsock_sock *vvs = vsk->trans;
> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> index b370070194fa..b39df3ed8c8d 100644
> --- a/net/vmw_vsock/vmci_transport.c
> +++ b/net/vmw_vsock/vmci_transport.c
> @@ -610,6 +610,7 @@ vmci_transport_datagram_create_hnd(u32 resource_id,
>
> static int vmci_transport_recv_dgram_cb(void *data, struct vmci_datagram *dg)
> {
> + struct vsock_skb_cb *vsock_cb;
> struct sock *sk;
> size_t size;
> struct sk_buff *skb;
> @@ -637,10 +638,14 @@ static int vmci_transport_recv_dgram_cb(void *data, struct vmci_datagram *dg)
> if (!skb)
> return VMCI_ERROR_NO_MEM;
>
> + vsock_cb = vsock_skb_cb(skb);
> + vsock_cb->src_cid = dg->src.context;
> + vsock_cb->src_port = dg->src.resource;
> /* sk_receive_skb() will do a sock_put(), so hold here. */
> sock_hold(sk);
> skb_put(skb, size);
> memcpy(skb->data, dg, size);
> + skb_pull(skb, VMCI_DG_HEADERSIZE);
> sk_receive_skb(sk, skb, 0);
>
> return VMCI_SUCCESS;
> @@ -1731,59 +1736,6 @@ static int vmci_transport_dgram_enqueue(
> return err - sizeof(*dg);
> }
>
> -static int vmci_transport_dgram_dequeue(struct vsock_sock *vsk,
> - struct msghdr *msg, size_t len,
> - int flags)
> -{
> - int err;
> - struct vmci_datagram *dg;
> - size_t payload_len;
> - struct sk_buff *skb;
> -
> - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> - return -EOPNOTSUPP;
> -
> - /* Retrieve the head sk_buff from the socket's receive queue. */
> - err = 0;
> - skb = skb_recv_datagram(&vsk->sk, flags, &err);
> - if (!skb)
> - return err;
> -
> - dg = (struct vmci_datagram *)skb->data;
> - if (!dg)
> - /* err is 0, meaning we read zero bytes. */
> - goto out;
> -
> - payload_len = dg->payload_size;
> - /* Ensure the sk_buff matches the payload size claimed in the packet. */
> - if (payload_len != skb->len - sizeof(*dg)) {
> - err = -EINVAL;
> - goto out;
> - }
> -
> - if (payload_len > len) {
> - payload_len = len;
> - msg->msg_flags |= MSG_TRUNC;
> - }
> -
> - /* Place the datagram payload in the user's iovec. */
> - err = skb_copy_datagram_msg(skb, sizeof(*dg), msg, payload_len);
> - if (err)
> - goto out;
> -
> - if (msg->msg_name) {
> - /* Provide the address of the sender. */
> - DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> - vsock_addr_init(vm_addr, dg->src.context, dg->src.resource);
> - msg->msg_namelen = sizeof(*vm_addr);
> - }
> - err = payload_len;
> -
> -out:
> - skb_free_datagram(&vsk->sk, skb);
> - return err;
> -}
> -
> static bool vmci_transport_dgram_allow(u32 cid, u32 port)
> {
> if (cid == VMADDR_CID_HYPERVISOR) {
> @@ -2040,7 +1992,6 @@ static struct vsock_transport vmci_transport = {
> .release = vmci_transport_release,
> .connect = vmci_transport_connect,
> .dgram_bind = vmci_transport_dgram_bind,
> - .dgram_dequeue = vmci_transport_dgram_dequeue,
> .dgram_enqueue = vmci_transport_dgram_enqueue,
> .dgram_allow = vmci_transport_dgram_allow,
> .stream_dequeue = vmci_transport_stream_dequeue,
> diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> index 6dea6119f5b2..11488887a5cc 100644
> --- a/net/vmw_vsock/vsock_loopback.c
> +++ b/net/vmw_vsock/vsock_loopback.c
> @@ -66,7 +66,6 @@ static struct virtio_transport loopback_transport = {
> .cancel_pkt = vsock_loopback_cancel_pkt,
>
> .dgram_bind = virtio_transport_dgram_bind,
> - .dgram_dequeue = virtio_transport_dgram_dequeue,
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> .dgram_allow = virtio_transport_dgram_allow,
>
> --
> 2.20.1
>
>
Small changes :)
Rest LGTM!
Thanks,
Luigi
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFC PATCH net-next v6 03/14] af_vsock: support multi-transport datagrams
2024-07-10 21:25 ` [RFC PATCH net-next v6 03/14] af_vsock: support multi-transport datagrams Amery Hung
@ 2024-07-15 8:13 ` Arseniy Krasnov
2024-07-15 17:41 ` Amery Hung
2024-07-28 20:28 ` Arseniy Krasnov
1 sibling, 1 reply; 51+ messages in thread
From: Arseniy Krasnov @ 2024-07-15 8:13 UTC (permalink / raw)
To: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers
Cc: dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, ameryhung, xiyou.wangcong, kernel
Hi! Sorry, I was not in CC, so I'll reply in this way :)
+static const struct vsock_transport *
+vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
+{
+ const struct vsock_transport *transport;
+
+ transport = vsock_connectible_lookup_transport(cid, flags);
+ if (transport)
+ return transport;
+
+ return transport_dgram_fallback;
+}
+
^^^
I guess this must be under EXPORT_SYMBOL, because it is called from
virtio_transport_common.c, so the module build fails.
Thanks
* Re: [RFC PATCH net-next v6 03/14] af_vsock: support multi-transport datagrams
2024-07-15 8:13 ` Arseniy Krasnov
@ 2024-07-15 17:41 ` Amery Hung
0 siblings, 0 replies; 51+ messages in thread
From: Amery Hung @ 2024-07-15 17:41 UTC (permalink / raw)
To: Arseniy Krasnov
Cc: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers, dan.carpenter, simon.horman, oxffffaa, kvm,
virtualization, netdev, linux-kernel, linux-hyperv, bpf,
bobby.eshleman, jiang.wang, amery.hung, xiyou.wangcong, kernel
On Mon, Jul 15, 2024 at 1:25 AM Arseniy Krasnov
<avkrasnov@salutedevices.com> wrote:
>
> Hi! Sorry, i was not in cc, so I'll reply in this way :)
Ope. I will copy you in the next version.
>
> +static const struct vsock_transport *
> +vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
> +{
> + const struct vsock_transport *transport;
> +
> + transport = vsock_connectible_lookup_transport(cid, flags);
> + if (transport)
> + return transport;
> +
> + return transport_dgram_fallback;
> +}
> +
> ^^^
>
> I guess this must be under EXPORT_SYMBOL, because it is called from
> virtio_transport_common.c, so module build fails.
>
> Thanks
Right. I will fix it by exporting vsock_dgram_lookup_transport() in patch 7.
Thanks!
Amery
* Re: [RFC PATCH net-next v6 01/14] af_vsock: generalize vsock_dgram_recvmsg() to all transports
2024-07-15 8:02 ` Luigi Leonardi
@ 2024-07-15 23:39 ` Amery Hung
0 siblings, 0 replies; 51+ messages in thread
From: Amery Hung @ 2024-07-15 23:39 UTC (permalink / raw)
To: Luigi Leonardi
Cc: amery.hung, bobby.eshleman, bpf, bryantan, dan.carpenter, davem,
decui, edumazet, haiyangz, jasowang, jiang.wang, kuba, kvm, kys,
linux-hyperv, linux-kernel, mst, netdev, oxffffaa, pabeni,
pv-drivers, sgarzare, simon.horman, stefanha, vdasa,
virtualization, wei.liu, xiyou.wangcong, xuanzhuo
On Mon, Jul 15, 2024 at 1:02 AM Luigi Leonardi
<luigi.leonardi@outlook.com> wrote:
>
> Hi Amery, Bobby
>
> > From: Bobby Eshleman <bobby.eshleman@bytedance.com>
> >
> > This commit drops the transport->dgram_dequeue callback and makes
> > vsock_dgram_recvmsg() generic to all transports.
> >
> > To make this possible, two transport-level changes are introduced:
> > - transport in the receiving path now stores the cid and port into
> > the control buffer of an skb when populating an skb. The information
> > later is used to initialize sockaddr_vm structure in recvmsg()
> > without referencing vsk->transport.
> > - transport implementations set the skb->data pointer to the beginning
> > of the payload prior to adding the skb to the socket's receive queue.
> > That is, they must use skb_pull() before enqueuing. This is an
> > agreement between the transport and the socket layer that skb->data
> > always points to the beginning of the payload (and not, for example,
> > the packet header).
> >
> Like in the other patch, please use imperative in the commit message.
> >
> > Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> > Signed-off-by: Amery Hung <amery.hung@bytedance.com>
> > ---
> > drivers/vhost/vsock.c | 1 -
> > include/linux/virtio_vsock.h | 5 ---
> > include/net/af_vsock.h | 11 ++++-
> > net/vmw_vsock/af_vsock.c | 42 +++++++++++++++++-
> > net/vmw_vsock/hyperv_transport.c | 7 ---
> > net/vmw_vsock/virtio_transport.c | 1 -
> > net/vmw_vsock/virtio_transport_common.c | 9 ----
> > net/vmw_vsock/vmci_transport.c | 59 +++----------------------
> > net/vmw_vsock/vsock_loopback.c | 1 -
> > 9 files changed, 55 insertions(+), 81 deletions(-)
> >
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index ec20ecff85c7..97fffa914e66 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -419,7 +419,6 @@ static struct virtio_transport vhost_transport = {
> > .cancel_pkt = vhost_transport_cancel_pkt,
> >
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > - .dgram_dequeue = virtio_transport_dgram_dequeue,
> > .dgram_bind = virtio_transport_dgram_bind,
> > .dgram_allow = virtio_transport_dgram_allow,
> >
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index c82089dee0c8..8b56b8a19ddd 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -177,11 +177,6 @@ virtio_transport_stream_dequeue(struct vsock_sock *vsk,
> > size_t len,
> > int type);
> > int
> > -virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
> > - struct msghdr *msg,
> > - size_t len, int flags);
> > -
> > -int
> > virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
> > struct msghdr *msg,
> > size_t len);
> > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > index 535701efc1e5..7aa1f5f2b1a5 100644
> > --- a/include/net/af_vsock.h
> > +++ b/include/net/af_vsock.h
> > @@ -120,8 +120,6 @@ struct vsock_transport {
> >
> > /* DGRAM. */
> > int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
> > - int (*dgram_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
> > - size_t len, int flags);
> > int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
> > struct msghdr *, size_t len);
> > bool (*dgram_allow)(u32 cid, u32 port);
> > @@ -219,6 +217,15 @@ void vsock_for_each_connected_socket(struct vsock_transport *transport,
> > int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
> > bool vsock_find_cid(unsigned int cid);
> >
> > +struct vsock_skb_cb {
> > + unsigned int src_cid;
> > + unsigned int src_port;
> > +};
> > +
> > +static inline struct vsock_skb_cb *vsock_skb_cb(struct sk_buff *skb) {
> > + return (struct vsock_skb_cb *)skb->cb;
> > +};
> > +
> >
>
> Running scripts/checkpatch.pl --strict --codespell on the patch shows this error:
>
> ERROR: open brace '{' following function definitions go on the next line
> #183: FILE: include/net/af_vsock.h:225:
> +static inline struct vsock_skb_cb *vsock_skb_cb(struct sk_buff *skb) {
>
> total: 1 errors, 0 warnings, 0 checks, 235 lines checked
> >
> > /**** TAP ****/
> >
> > struct vsock_tap {
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index 4b040285aa78..5e7d4d99ea2c 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -1273,11 +1273,15 @@ static int vsock_dgram_connect(struct socket *sock,
> > int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> > size_t len, int flags)
> > {
> > + struct vsock_skb_cb *vsock_cb;
> > #ifdef CONFIG_BPF_SYSCALL
> > const struct proto *prot;
> > #endif
> > struct vsock_sock *vsk;
> > + struct sk_buff *skb;
> > + size_t payload_len;
> > struct sock *sk;
> > + int err;
> >
> > sk = sock->sk;
> > vsk = vsock_sk(sk);
> > @@ -1288,7 +1292,43 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> > return prot->recvmsg(sk, msg, len, flags, NULL);
> > #endif
> >
> > - return vsk->transport->dgram_dequeue(vsk, msg, len, flags);
> > + if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> > + return -EOPNOTSUPP;
> > +
> > + if (unlikely(flags & MSG_ERRQUEUE))
> > + return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, 0);
> >
> This if statement is always false!
> >
> > +
> > + /* Retrieve the head sk_buff from the socket's receive queue. */
> > + err = 0;
> > + skb = skb_recv_datagram(sk_vsock(vsk), flags, &err);
> > + if (!skb)
> > + return err;
> > +
> > + payload_len = skb->len;
> > +
> nit: I'd remove this blank line.
> > + if (payload_len > len) {
> > + payload_len = len;
> > + msg->msg_flags |= MSG_TRUNC;
> > + }
> > +
> > + /* Place the datagram payload in the user's iovec. */
> > + err = skb_copy_datagram_msg(skb, 0, msg, payload_len);
> > + if (err)
> > + goto out;
> > +
> > + if (msg->msg_name) {
> > + /* Provide the address of the sender. */
> > + DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> > +
> > + vsock_cb = vsock_skb_cb(skb);
> > + vsock_addr_init(vm_addr, vsock_cb->src_cid, vsock_cb->src_port);
> > + msg->msg_namelen = sizeof(*vm_addr);
> > + }
> > + err = payload_len;
> > +
> > +out:
> > + skb_free_datagram(&vsk->sk, skb);
> > + return err;
> > }
> > EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg);
> >
> > diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> > index e2157e387217..326dd41ee2d5 100644
> > --- a/net/vmw_vsock/hyperv_transport.c
> > +++ b/net/vmw_vsock/hyperv_transport.c
> > @@ -556,12 +556,6 @@ static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
> > return -EOPNOTSUPP;
> > }
> >
> > -static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
> > - size_t len, int flags)
> > -{
> > - return -EOPNOTSUPP;
> > -}
> > -
> > static int hvs_dgram_enqueue(struct vsock_sock *vsk,
> > struct sockaddr_vm *remote, struct msghdr *msg,
> > size_t dgram_len)
> > @@ -833,7 +827,6 @@ static struct vsock_transport hvs_transport = {
> > .shutdown = hvs_shutdown,
> >
> > .dgram_bind = hvs_dgram_bind,
> > - .dgram_dequeue = hvs_dgram_dequeue,
> > .dgram_enqueue = hvs_dgram_enqueue,
> > .dgram_allow = hvs_dgram_allow,
> >
> > diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> > index 43d405298857..a8c97e95622a 100644
> > --- a/net/vmw_vsock/virtio_transport.c
> > +++ b/net/vmw_vsock/virtio_transport.c
> > @@ -508,7 +508,6 @@ static struct virtio_transport virtio_transport = {
> > .cancel_pkt = virtio_transport_cancel_pkt,
> >
> > .dgram_bind = virtio_transport_dgram_bind,
> > - .dgram_dequeue = virtio_transport_dgram_dequeue,
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > .dgram_allow = virtio_transport_dgram_allow,
> >
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index 16ff976a86e3..4bf73d20c12a 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -810,15 +810,6 @@ virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
> > }
> > EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_enqueue);
> >
> > -int
> > -virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
> > - struct msghdr *msg,
> > - size_t len, int flags)
> > -{
> > - return -EOPNOTSUPP;
> > -}
> > -EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
> > -
> > s64 virtio_transport_stream_has_data(struct vsock_sock *vsk)
> > {
> > struct virtio_vsock_sock *vvs = vsk->trans;
> > diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> > index b370070194fa..b39df3ed8c8d 100644
> > --- a/net/vmw_vsock/vmci_transport.c
> > +++ b/net/vmw_vsock/vmci_transport.c
> > @@ -610,6 +610,7 @@ vmci_transport_datagram_create_hnd(u32 resource_id,
> >
> > static int vmci_transport_recv_dgram_cb(void *data, struct vmci_datagram *dg)
> > {
> > + struct vsock_skb_cb *vsock_cb;
> > struct sock *sk;
> > size_t size;
> > struct sk_buff *skb;
> > @@ -637,10 +638,14 @@ static int vmci_transport_recv_dgram_cb(void *data, struct vmci_datagram *dg)
> > if (!skb)
> > return VMCI_ERROR_NO_MEM;
> >
> > + vsock_cb = vsock_skb_cb(skb);
> > + vsock_cb->src_cid = dg->src.context;
> > + vsock_cb->src_port = dg->src.resource;
> > /* sk_receive_skb() will do a sock_put(), so hold here. */
> > sock_hold(sk);
> > skb_put(skb, size);
> > memcpy(skb->data, dg, size);
> > + skb_pull(skb, VMCI_DG_HEADERSIZE);
> > sk_receive_skb(sk, skb, 0);
> >
> > return VMCI_SUCCESS;
> > @@ -1731,59 +1736,6 @@ static int vmci_transport_dgram_enqueue(
> > return err - sizeof(*dg);
> > }
> >
> > -static int vmci_transport_dgram_dequeue(struct vsock_sock *vsk,
> > - struct msghdr *msg, size_t len,
> > - int flags)
> > -{
> > - int err;
> > - struct vmci_datagram *dg;
> > - size_t payload_len;
> > - struct sk_buff *skb;
> > -
> > - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> > - return -EOPNOTSUPP;
> > -
> > - /* Retrieve the head sk_buff from the socket's receive queue. */
> > - err = 0;
> > - skb = skb_recv_datagram(&vsk->sk, flags, &err);
> > - if (!skb)
> > - return err;
> > -
> > - dg = (struct vmci_datagram *)skb->data;
> > - if (!dg)
> > - /* err is 0, meaning we read zero bytes. */
> > - goto out;
> > -
> > - payload_len = dg->payload_size;
> > - /* Ensure the sk_buff matches the payload size claimed in the packet. */
> > - if (payload_len != skb->len - sizeof(*dg)) {
> > - err = -EINVAL;
> > - goto out;
> > - }
> > -
> > - if (payload_len > len) {
> > - payload_len = len;
> > - msg->msg_flags |= MSG_TRUNC;
> > - }
> > -
> > - /* Place the datagram payload in the user's iovec. */
> > - err = skb_copy_datagram_msg(skb, sizeof(*dg), msg, payload_len);
> > - if (err)
> > - goto out;
> > -
> > - if (msg->msg_name) {
> > - /* Provide the address of the sender. */
> > - DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> > - vsock_addr_init(vm_addr, dg->src.context, dg->src.resource);
> > - msg->msg_namelen = sizeof(*vm_addr);
> > - }
> > - err = payload_len;
> > -
> > -out:
> > - skb_free_datagram(&vsk->sk, skb);
> > - return err;
> > -}
> > -
> > static bool vmci_transport_dgram_allow(u32 cid, u32 port)
> > {
> > if (cid == VMADDR_CID_HYPERVISOR) {
> > @@ -2040,7 +1992,6 @@ static struct vsock_transport vmci_transport = {
> > .release = vmci_transport_release,
> > .connect = vmci_transport_connect,
> > .dgram_bind = vmci_transport_dgram_bind,
> > - .dgram_dequeue = vmci_transport_dgram_dequeue,
> > .dgram_enqueue = vmci_transport_dgram_enqueue,
> > .dgram_allow = vmci_transport_dgram_allow,
> > .stream_dequeue = vmci_transport_stream_dequeue,
> > diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> > index 6dea6119f5b2..11488887a5cc 100644
> > --- a/net/vmw_vsock/vsock_loopback.c
> > +++ b/net/vmw_vsock/vsock_loopback.c
> > @@ -66,7 +66,6 @@ static struct virtio_transport loopback_transport = {
> > .cancel_pkt = vsock_loopback_cancel_pkt,
> >
> > .dgram_bind = virtio_transport_dgram_bind,
> > - .dgram_dequeue = virtio_transport_dgram_dequeue,
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > .dgram_allow = virtio_transport_dgram_allow,
> >
> > --
> > 2.20.1
> >
> >
>
> Small changes :)
> Rest LGTM!
>
I will fix the two style issues.
Thank you,
Amery
> Thanks,
> Luigi
* Re: [RFC PATCH net-next v6 14/14] test/vsock: add vsock dgram tests
2024-07-10 21:25 ` [RFC PATCH net-next v6 14/14] test/vsock: add vsock dgram tests Amery Hung
@ 2024-07-20 19:58 ` Arseniy Krasnov
2024-07-23 14:43 ` Stefano Garzarella
1 sibling, 0 replies; 51+ messages in thread
From: Arseniy Krasnov @ 2024-07-20 19:58 UTC (permalink / raw)
To: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers
Cc: dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, ameryhung, xiyou.wangcong
+static void test_dgram_sendto_client(const struct test_opts *opts)
+{
+ union {
+ struct sockaddr sa;
+ struct sockaddr_vm svm;
+ } addr = {
+ .svm = {
+ .svm_family = AF_VSOCK,
+ .svm_port = 1234,
^^^
the port should not be hardcoded here; use 'opts->peer_port'
+ .svm_cid = opts->peer_cid,
+ },
+ };
+ int fd;
Thanks, Arseniy
* Re: [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams
2024-07-10 21:25 [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams Amery Hung
` (13 preceding siblings ...)
2024-07-10 21:25 ` [RFC PATCH net-next v6 14/14] test/vsock: add vsock dgram tests Amery Hung
@ 2024-07-23 14:38 ` Stefano Garzarella
2025-07-22 14:35 ` Stefano Garzarella
15 siblings, 0 replies; 51+ messages in thread
From: Stefano Garzarella @ 2024-07-23 14:38 UTC (permalink / raw)
To: Amery Hung
Cc: stefanha, mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
kys, haiyangz, wei.liu, decui, bryantan, vdasa, pv-drivers,
dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, xiyou.wangcong
Hi Amery,
On Wed, Jul 10, 2024 at 09:25:41PM GMT, Amery Hung wrote:
>Hey all!
>
>This series introduces support for datagrams to virtio/vsock.
>
>It is a spin-off (and smaller version) of this series from the summer:
> https://lore.kernel.org/all/cover.1660362668.git.bobby.eshleman@bytedance.com/
Cool! Thanks for restarting this work!
>
>Please note that this is an RFC and should not be merged until
>associated changes are made to the virtio specification, which will
>follow after discussion from this series.
>
>Another aside, the v4 of the series has only been mildly tested with a
>run of tools/testing/vsock/vsock_test. Some code likely needs cleaning
>up, but I'm hoping to get some of the design choices agreed upon before
>spending too much time making it pretty.
What are the main points where you would like an agreement?
>
>This series first supports datagrams in a basic form for virtio, and
>then optimizes the sendpath for all datagram transports.
What kind of optimization?
>
>The result is a very fast datagram communication protocol that
>outperforms even UDP on multi-queue virtio-net w/ vhost on a variety
>of multi-threaded workload samples.
>
>For those that are curious, some summary data comparing UDP and VSOCK
>DGRAM (N=5):
>
> vCPUS: 16
> virtio-net queues: 16
> payload size: 4KB
> Setup: bare metal + vm (non-nested)
>
> UDP: 287.59 MB/s
> VSOCK DGRAM: 509.2 MB/s
Nice!
I have not tested it because the series does not compile, as has already
been pointed out; I will test the next version.
>
>Some notes about the implementation...
>
>This datagram implementation forces datagrams to self-throttle according
>to the threshold set by sk_sndbuf. It behaves similar to the credits
>used by streams in its effect on throughput and memory consumption, but
>it is not influenced by the receiving socket as credits are.
>
>The device drops packets silently.
>
>As discussed previously, this series introduces datagrams and defers
>fairness to future work. See discussion in v2 for more context around
>datagrams, fairness, and this implementation.
So IIUC we are re-using the same virtqueues used by stream/seqpacket,
right?
I did a fast review, there's something to fix, but it looks like this
can work well, so I'd start to discuss virtio spec changes ASAP.
Thanks,
Stefano
>
>Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>Signed-off-by: Amery Hung <amery.hung@bytedance.com>
>---
>Changes in v6:
>- allow empty transport in datagram vsock
>- add empty transport checks in various paths
>- transport layer now saves source cid and port to control buffer of skb
> to remove the dependency of transport in recvmsg()
>- fix virtio dgram_enqueue() by looking up the transport to be used when
> using sendto(2)
>- fix skb memory leaks in two places
>- add dgram auto-bind test
>- Link to v5: https://lore.kernel.org/r/20230413-b4-vsock-dgram-v5-0-581bd37fdb26@bytedance.com
>
>Changes in v5:
>- teach vhost to drop dgram when a datagram exceeds the receive buffer
> - now uses MSG_ERRQUEUE and depends on Arseniy's zerocopy patch:
> "vsock: read from socket's error queue"
>- replace multiple ->dgram_* callbacks with single ->dgram_addr_init()
> callback
>- refactor virtio dgram skb allocator to reduce conflicts w/ zerocopy series
>- add _fallback/_FALLBACK suffix to dgram transport variables/macros
>- add WARN_ONCE() for table_size / VSOCK_HASH issue
>- add static to vsock_find_bound_socket_common
>- dedupe code in vsock_dgram_sendmsg() using module_got var
>- drop concurrent sendmsg() for dgram and defer to future series
>- Add more tests
> - test EHOSTUNREACH in errqueue
> - test stream + dgram address collision
>- improve clarity of dgram msg bounds test code
>- Link to v4: https://lore.kernel.org/r/20230413-b4-vsock-dgram-v4-0-0cebbb2ae899@bytedance.com
>
>Changes in v4:
>- style changes
> - vsock: use sk_vsock(vsk) in vsock_dgram_recvmsg instead of
> &sk->vsk
> - vsock: fix xmas tree declaration
> - vsock: fix spacing issues
> - virtio/vsock: virtio_transport_recv_dgram returns void because err
> unused
>- sparse analysis warnings/errors
> - virtio/vsock: fix unitialized skerr on destroy
> - virtio/vsock: fix uninitialized err var on goto out
> - vsock: fix declarations that need static
> - vsock: fix __rcu annotation order
>- bugs
> - vsock: fix null ptr in remote_info code
> - vsock/dgram: make transport_dgram a fallback instead of first
> priority
> - vsock: remove redundant rcu read lock acquire in getname()
>- tests
> - add more tests (message bounds and more)
> - add vsock_dgram_bind() helper
> - add vsock_dgram_connect() helper
>
>Changes in v3:
>- Support multi-transport dgram, changing logic in connect/bind
> to support VMCI case
>- Support per-pkt transport lookup for sendto() case
>- Fix dgram_allow() implementation
>- Fix dgram feature bit number (now it is 3)
>- Fix binding so dgram and connectible (cid,port) spaces are
> non-overlapping
>- RCU protect transport ptr so connect() calls never leave
> a lockless read of the transport and remote_addr are always
> in sync
>- Link to v2: https://lore.kernel.org/r/20230413-b4-vsock-dgram-v2-0-079cc7cee62e@bytedance.com
>
>
>Bobby Eshleman (14):
> af_vsock: generalize vsock_dgram_recvmsg() to all transports
> af_vsock: refactor transport lookup code
> af_vsock: support multi-transport datagrams
> af_vsock: generalize bind table functions
> af_vsock: use a separate dgram bind table
> virtio/vsock: add VIRTIO_VSOCK_TYPE_DGRAM
> virtio/vsock: add common datagram send path
> af_vsock: add vsock_find_bound_dgram_socket()
> virtio/vsock: add common datagram recv path
> virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
> vhost/vsock: implement datagram support
> vsock/loopback: implement datagram support
> virtio/vsock: implement datagram support
> test/vsock: add vsock dgram tests
>
> drivers/vhost/vsock.c | 62 +-
> include/linux/virtio_vsock.h | 9 +-
> include/net/af_vsock.h | 24 +-
> include/uapi/linux/virtio_vsock.h | 2 +
> net/vmw_vsock/af_vsock.c | 343 ++++++--
> net/vmw_vsock/hyperv_transport.c | 13 -
> net/vmw_vsock/virtio_transport.c | 24 +-
> net/vmw_vsock/virtio_transport_common.c | 188 ++++-
> net/vmw_vsock/vmci_transport.c | 61 +-
> net/vmw_vsock/vsock_loopback.c | 9 +-
> tools/testing/vsock/util.c | 177 +++-
> tools/testing/vsock/util.h | 10 +
> tools/testing/vsock/vsock_test.c | 1032 ++++++++++++++++++++---
> 13 files changed, 1638 insertions(+), 316 deletions(-)
>
>--
>2.20.1
>
* Re: [RFC PATCH net-next v6 04/14] af_vsock: generalize bind table functions
2024-07-10 21:25 ` [RFC PATCH net-next v6 04/14] af_vsock: generalize bind table functions Amery Hung
@ 2024-07-23 14:39 ` Stefano Garzarella
2024-07-28 18:52 ` Amery Hung
0 siblings, 1 reply; 51+ messages in thread
From: Stefano Garzarella @ 2024-07-23 14:39 UTC (permalink / raw)
To: Amery Hung
Cc: stefanha, mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
kys, haiyangz, wei.liu, decui, bryantan, vdasa, pv-drivers,
dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, xiyou.wangcong
On Wed, Jul 10, 2024 at 09:25:45PM GMT, Amery Hung wrote:
>From: Bobby Eshleman <bobby.eshleman@bytedance.com>
>
>This commit makes the bind table management functions in vsock usable
>for different bind tables. Future work will introduce a new table for
>datagrams to avoid address collisions, and these functions will be used
>there.
>
>Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>---
> net/vmw_vsock/af_vsock.c | 34 +++++++++++++++++++++++++++-------
> 1 file changed, 27 insertions(+), 7 deletions(-)
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index acc15e11700c..d571be9cdbf0 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -232,11 +232,12 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
> sock_put(&vsk->sk);
> }
>
>-static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
>+static struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
>+ struct list_head *bind_table)
> {
> struct vsock_sock *vsk;
>
>- list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
>+ list_for_each_entry(vsk, bind_table, bound_table) {
> if (vsock_addr_equals_addr(addr, &vsk->local_addr))
> return sk_vsock(vsk);
>
>@@ -249,6 +250,11 @@ static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> return NULL;
> }
>
>+static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
>+{
>+ return vsock_find_bound_socket_common(addr, vsock_bound_sockets(addr));
>+}
>+
> static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
> struct sockaddr_vm *dst)
> {
>@@ -671,12 +677,18 @@ static void vsock_pending_work(struct work_struct *work)
>
> /**** SOCKET OPERATIONS ****/
>
>-static int __vsock_bind_connectible(struct vsock_sock *vsk,
>- struct sockaddr_vm *addr)
>+static int vsock_bind_common(struct vsock_sock *vsk,
>+ struct sockaddr_vm *addr,
>+ struct list_head *bind_table,
>+ size_t table_size)
> {
> static u32 port;
> struct sockaddr_vm new_addr;
>
>+ if (WARN_ONCE(table_size < VSOCK_HASH_SIZE,
>+ "table size too small, may cause overflow"))
>+ return -EINVAL;
>+
I'd add this in another commit.
> if (!port)
> port = get_random_u32_above(LAST_RESERVED_PORT);
>
>@@ -692,7 +704,8 @@ static int __vsock_bind_connectible(struct
>vsock_sock *vsk,
>
> new_addr.svm_port = port++;
>
>- if (!__vsock_find_bound_socket(&new_addr)) {
>+ if (!vsock_find_bound_socket_common(&new_addr,
>+ &bind_table[VSOCK_HASH(addr)])) {
Can we add a macro for `&bind_table[VSOCK_HASH(addr)])` ?
> found = true;
> break;
> }
>@@ -709,7 +722,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> return -EACCES;
> }
>
>- if (__vsock_find_bound_socket(&new_addr))
>+ if (vsock_find_bound_socket_common(&new_addr,
>+ &bind_table[VSOCK_HASH(addr)]))
> return -EADDRINUSE;
> }
>
>@@ -721,11 +735,17 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> * by AF_UNIX.
> */
> __vsock_remove_bound(vsk);
>- __vsock_insert_bound(vsock_bound_sockets(&vsk->local_addr), vsk);
>+ __vsock_insert_bound(&bind_table[VSOCK_HASH(&vsk->local_addr)], vsk);
>
> return 0;
> }
>
>+static int __vsock_bind_connectible(struct vsock_sock *vsk,
>+ struct sockaddr_vm *addr)
>+{
>+ return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
What about using ARRAY_SIZE(x) ?
BTW we are using that size just to check it, but all the arrays we use
are statically allocated, so what about a compile-time check like
BUILD_BUG_ON()?
Thanks,
Stefano
>+}
>+
> static int __vsock_bind_dgram(struct vsock_sock *vsk,
> struct sockaddr_vm *addr)
> {
>--
>2.20.1
>
* Re: [RFC PATCH net-next v6 05/14] af_vsock: use a separate dgram bind table
2024-07-10 21:25 ` [RFC PATCH net-next v6 05/14] af_vsock: use a separate dgram bind table Amery Hung
@ 2024-07-23 14:41 ` Stefano Garzarella
2024-07-28 21:37 ` Amery Hung
0 siblings, 1 reply; 51+ messages in thread
From: Stefano Garzarella @ 2024-07-23 14:41 UTC (permalink / raw)
To: Amery Hung
Cc: stefanha, mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
kys, haiyangz, wei.liu, decui, bryantan, vdasa, pv-drivers,
dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, xiyou.wangcong
On Wed, Jul 10, 2024 at 09:25:46PM GMT, Amery Hung wrote:
>From: Bobby Eshleman <bobby.eshleman@bytedance.com>
>
>This commit adds support for bound dgram sockets to be tracked in a
>separate bind table from connectible sockets in order to avoid address
>collisions. With this commit, users can simultaneously bind a dgram
>socket and connectible socket to the same CID and port.
>
>Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>---
> net/vmw_vsock/af_vsock.c | 103 +++++++++++++++++++++++++++++----------
> 1 file changed, 76 insertions(+), 27 deletions(-)
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index d571be9cdbf0..ab08cd81720e 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -10,18 +10,23 @@
> * - There are two kinds of sockets: those created by user action (such as
> * calling socket(2)) and those created by incoming connection request packets.
> *
>- * - There are two "global" tables, one for bound sockets (sockets that have
>- * specified an address that they are responsible for) and one for connected
>- * sockets (sockets that have established a connection with another socket).
>- * These tables are "global" in that all sockets on the system are placed
>- * within them. - Note, though, that the bound table contains an extra entry
>- * for a list of unbound sockets and SOCK_DGRAM sockets will always remain in
>- * that list. The bound table is used solely for lookup of sockets when packets
>- * are received and that's not necessary for SOCK_DGRAM sockets since we create
>- * a datagram handle for each and need not perform a lookup. Keeping SOCK_DGRAM
>- * sockets out of the bound hash buckets will reduce the chance of collisions
>- * when looking for SOCK_STREAM sockets and prevents us from having to check the
>- * socket type in the hash table lookups.
>+ * - There are three "global" tables, one for bound connectible (stream /
>+ * seqpacket) sockets, one for bound datagram sockets, and one for connected
>+ * sockets. Bound sockets are sockets that have specified an address that
>+ * they are responsible for. Connected sockets are sockets that have
>+ * established a connection with another socket. These tables are "global" in
>+ * that all sockets on the system are placed within them. - Note, though,
>+ * that the bound tables contain an extra entry for a list of unbound
>+ * sockets. The bound tables are used solely for lookup of sockets when packets
>+ * are received.
>+ *
>+ * - There are separate bind tables for connectible and datagram sockets to avoid
>+ * address collisions between stream/seqpacket sockets and datagram sockets.
>+ *
>+ * - Transports may elect to NOT use the global datagram bind table by
>+ * implementing the ->dgram_bind() callback. If that callback is implemented,
>+ * the global bind table is not used and the responsibility of bound datagram
>+ * socket tracking is deferred to the transport.
> *
> * - Sockets created by user action will either be "client" sockets that
> * initiate a connection or "server" sockets that listen for connections; we do
>@@ -116,6 +121,7 @@
> static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
> static void vsock_sk_destruct(struct sock *sk);
> static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
>+static bool sock_type_connectible(u16 type);
>
> /* Protocol family. */
> struct proto vsock_proto = {
>@@ -152,21 +158,25 @@ static DEFINE_MUTEX(vsock_register_mutex);
> * VSocket is stored in the connected hash table.
> *
> * Unbound sockets are all put on the same list attached to the end of the hash
>- * table (vsock_unbound_sockets). Bound sockets are added to the hash table in
>- * the bucket that their local address hashes to (vsock_bound_sockets(addr)
>- * represents the list that addr hashes to).
>+ * tables (vsock_unbound_sockets/vsock_unbound_dgram_sockets). Bound sockets
>+ * are added to the hash table in the bucket that their local address hashes to
>+ * (vsock_bound_sockets(addr) and vsock_bound_dgram_sockets(addr) represents
>+ * the list that addr hashes to).
> *
>- * Specifically, we initialize the vsock_bind_table array to a size of
>- * VSOCK_HASH_SIZE + 1 so that vsock_bind_table[0] through
>- * vsock_bind_table[VSOCK_HASH_SIZE - 1] are for bound sockets and
>- * vsock_bind_table[VSOCK_HASH_SIZE] is for unbound sockets. The hash function
>- * mods with VSOCK_HASH_SIZE to ensure this.
>+ * Specifically, taking connectible sockets as an example we initialize the
>+ * vsock_bind_table array to a size of VSOCK_HASH_SIZE + 1 so that
>+ * vsock_bind_table[0] through vsock_bind_table[VSOCK_HASH_SIZE - 1] are for
>+ * bound sockets and vsock_bind_table[VSOCK_HASH_SIZE] is for unbound sockets.
>+ * The hash function mods with VSOCK_HASH_SIZE to ensure this.
>+ * Datagrams and vsock_dgram_bind_table operate in the same way.
> */
> #define MAX_PORT_RETRIES 24
>
> #define VSOCK_HASH(addr) ((addr)->svm_port % VSOCK_HASH_SIZE)
> #define vsock_bound_sockets(addr) (&vsock_bind_table[VSOCK_HASH(addr)])
>+#define vsock_bound_dgram_sockets(addr) (&vsock_dgram_bind_table[VSOCK_HASH(addr)])
> #define vsock_unbound_sockets (&vsock_bind_table[VSOCK_HASH_SIZE])
>+#define vsock_unbound_dgram_sockets (&vsock_dgram_bind_table[VSOCK_HASH_SIZE])
>
> /* XXX This can probably be implemented in a better way. */
> #define VSOCK_CONN_HASH(src, dst) \
>@@ -182,6 +192,8 @@ struct list_head vsock_connected_table[VSOCK_HASH_SIZE];
> EXPORT_SYMBOL_GPL(vsock_connected_table);
> DEFINE_SPINLOCK(vsock_table_lock);
> EXPORT_SYMBOL_GPL(vsock_table_lock);
>+static struct list_head vsock_dgram_bind_table[VSOCK_HASH_SIZE + 1];
>+static DEFINE_SPINLOCK(vsock_dgram_table_lock);
>
> /* Autobind this socket to the local address if necessary. */
> static int vsock_auto_bind(struct vsock_sock *vsk)
>@@ -204,6 +216,9 @@ static void vsock_init_tables(void)
>
> for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++)
> INIT_LIST_HEAD(&vsock_connected_table[i]);
>+
>+ for (i = 0; i < ARRAY_SIZE(vsock_dgram_bind_table); i++)
>+ INIT_LIST_HEAD(&vsock_dgram_bind_table[i]);
> }
>
> static void __vsock_insert_bound(struct list_head *list,
>@@ -271,13 +286,28 @@ static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
> return NULL;
> }
>
>-static void vsock_insert_unbound(struct vsock_sock *vsk)
>+static void __vsock_insert_dgram_unbound(struct vsock_sock *vsk)
>+{
>+ spin_lock_bh(&vsock_dgram_table_lock);
>+ __vsock_insert_bound(vsock_unbound_dgram_sockets, vsk);
>+ spin_unlock_bh(&vsock_dgram_table_lock);
>+}
>+
>+static void __vsock_insert_connectible_unbound(struct vsock_sock *vsk)
> {
> spin_lock_bh(&vsock_table_lock);
> __vsock_insert_bound(vsock_unbound_sockets, vsk);
> spin_unlock_bh(&vsock_table_lock);
> }
>
>+static void vsock_insert_unbound(struct vsock_sock *vsk)
>+{
>+ if (sock_type_connectible(sk_vsock(vsk)->sk_type))
>+ __vsock_insert_connectible_unbound(vsk);
>+ else
>+ __vsock_insert_dgram_unbound(vsk);
>+}
>+
> void vsock_insert_connected(struct vsock_sock *vsk)
> {
> struct list_head *list = vsock_connected_sockets(
>@@ -289,6 +319,14 @@ void vsock_insert_connected(struct vsock_sock *vsk)
> }
> EXPORT_SYMBOL_GPL(vsock_insert_connected);
>
>+static void vsock_remove_dgram_bound(struct vsock_sock *vsk)
>+{
>+ spin_lock_bh(&vsock_dgram_table_lock);
>+ if (__vsock_in_bound_table(vsk))
>+ __vsock_remove_bound(vsk);
>+ spin_unlock_bh(&vsock_dgram_table_lock);
>+}
>+
> void vsock_remove_bound(struct vsock_sock *vsk)
> {
> spin_lock_bh(&vsock_table_lock);
>@@ -340,7 +378,10 @@ EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
>
> void vsock_remove_sock(struct vsock_sock *vsk)
> {
>- vsock_remove_bound(vsk);
>+ if (sock_type_connectible(sk_vsock(vsk)->sk_type))
>+ vsock_remove_bound(vsk);
>+ else
>+ vsock_remove_dgram_bound(vsk);
Can we try to be consistent? For example, vsock_insert_unbound()
internally calls sock_type_connectible(), while vsock_remove_bound()
handles only connectible sockets and the caller has to branch. It's a
bit confusing.
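One way to make it symmetric, sketched in userspace (the types and helpers below are toy stand-ins, not the real kernel ones): let vsock_remove_bound() dispatch on the socket type exactly as vsock_insert_unbound() does, so vsock_remove_sock() never branches itself.

```c
#include <assert.h>
#include <stdbool.h>

enum sock_type { SOCK_STREAM = 1, SOCK_DGRAM = 2, SOCK_SEQPACKET = 5 };

struct vsock_sock {
	enum sock_type type;
	bool in_connectible_table;
	bool in_dgram_table;
};

static bool sock_type_connectible(enum sock_type t)
{
	return t == SOCK_STREAM || t == SOCK_SEQPACKET;
}

/* Per-table removal helpers (stand-ins for the locked kernel paths). */
static void __remove_connectible_bound(struct vsock_sock *vsk)
{
	vsk->in_connectible_table = false;
}

static void __remove_dgram_bound(struct vsock_sock *vsk)
{
	vsk->in_dgram_table = false;
}

/* Single entry point dispatches on the socket type, mirroring the
 * shape vsock_insert_unbound() already has in the patch. */
static void vsock_remove_bound(struct vsock_sock *vsk)
{
	if (sock_type_connectible(vsk->type))
		__remove_connectible_bound(vsk);
	else
		__remove_dgram_bound(vsk);
}
```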
> vsock_remove_connected(vsk);
> }
> EXPORT_SYMBOL_GPL(vsock_remove_sock);
>@@ -746,11 +787,19 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
> }
>
>-static int __vsock_bind_dgram(struct vsock_sock *vsk,
>- struct sockaddr_vm *addr)
>+static int vsock_bind_dgram(struct vsock_sock *vsk,
>+ struct sockaddr_vm *addr)
Why are we renaming this?
> {
>- if (!vsk->transport || !vsk->transport->dgram_bind)
>- return -EINVAL;
>+ if (!vsk->transport || !vsk->transport->dgram_bind) {
Why this condition?
Maybe a comment here is needed because I'm lost...
>+ int retval;
>+
>+ spin_lock_bh(&vsock_dgram_table_lock);
>+ retval = vsock_bind_common(vsk, addr, vsock_dgram_bind_table,
>+ VSOCK_HASH_SIZE);
Should we use VSOCK_HASH_SIZE + 1 here?
Using ARRAY_SIZE(x) should avoid this problem.
>+ spin_unlock_bh(&vsock_dgram_table_lock);
>+
>+ return retval;
>+ }
>
> return vsk->transport->dgram_bind(vsk, addr);
> }
>@@ -781,7 +830,7 @@ static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
> break;
>
> case SOCK_DGRAM:
>- retval = __vsock_bind_dgram(vsk, addr);
>+ retval = vsock_bind_dgram(vsk, addr);
> break;
>
> default:
>--
>2.20.1
>
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFC PATCH net-next v6 07/14] virtio/vsock: add common datagram send path
2024-07-10 21:25 ` [RFC PATCH net-next v6 07/14] virtio/vsock: add common datagram send path Amery Hung
@ 2024-07-23 14:42 ` Stefano Garzarella
2024-07-26 23:22 ` Amery Hung
2024-07-29 20:00 ` Arseniy Krasnov
1 sibling, 1 reply; 51+ messages in thread
From: Stefano Garzarella @ 2024-07-23 14:42 UTC (permalink / raw)
To: Amery Hung
Cc: stefanha, mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
kys, haiyangz, wei.liu, decui, bryantan, vdasa, pv-drivers,
dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, xiyou.wangcong
On Wed, Jul 10, 2024 at 09:25:48PM GMT, Amery Hung wrote:
>From: Bobby Eshleman <bobby.eshleman@bytedance.com>
>
>This commit implements the common function
>virtio_transport_dgram_enqueue for enqueueing datagrams. It does not add
>usage in either vhost or virtio yet.
>
>Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>Signed-off-by: Amery Hung <amery.hung@bytedance.com>
>---
> include/linux/virtio_vsock.h | 1 +
> include/net/af_vsock.h | 2 +
> net/vmw_vsock/af_vsock.c | 2 +-
> net/vmw_vsock/virtio_transport_common.c | 87 ++++++++++++++++++++++++-
> 4 files changed, 90 insertions(+), 2 deletions(-)
>
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index f749a066af46..4408749febd2 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -152,6 +152,7 @@ struct virtio_vsock_pkt_info {
> u16 op;
> u32 flags;
> bool reply;
>+ u8 remote_flags;
> };
>
> struct virtio_transport {
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index 44db8f2c507d..6e97d344ac75 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -216,6 +216,8 @@ void vsock_for_each_connected_socket(struct vsock_transport *transport,
> void (*fn)(struct sock *sk));
> int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
> bool vsock_find_cid(unsigned int cid);
>+const struct vsock_transport *vsock_dgram_lookup_transport(unsigned int cid,
>+ __u8 flags);
Why __u8 and not just u8?
>
> struct vsock_skb_cb {
> unsigned int src_cid;
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index ab08cd81720e..f83b655fdbe9 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -487,7 +487,7 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
> return transport;
> }
>
>-static const struct vsock_transport *
>+const struct vsock_transport *
> vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
> {
> const struct vsock_transport *transport;
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index a1c76836d798..46cd1807f8e3 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -1040,13 +1040,98 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode)
> }
> EXPORT_SYMBOL_GPL(virtio_transport_shutdown);
>
>+static int virtio_transport_dgram_send_pkt_info(struct vsock_sock *vsk,
>+ struct virtio_vsock_pkt_info *info)
>+{
>+ u32 src_cid, src_port, dst_cid, dst_port;
>+ const struct vsock_transport *transport;
>+ const struct virtio_transport *t_ops;
>+ struct sock *sk = sk_vsock(vsk);
>+ struct virtio_vsock_hdr *hdr;
>+ struct sk_buff *skb;
>+ void *payload;
>+ int noblock = 0;
>+ int err;
>+
>+ info->type = virtio_transport_get_type(sk_vsock(vsk));
>+
>+ if (info->pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
>+ return -EMSGSIZE;
>+
>+ transport = vsock_dgram_lookup_transport(info->remote_cid, info->remote_flags);
Can `transport` be NULL here?
I don't understand why we are calling vsock_dgram_lookup_transport()
again. Didn't we already do that in vsock_dgram_sendmsg()?
Also, should we add a comment mentioning that we can't use
virtio_transport_get_ops()? IIUC that's because the vsk may not be
assigned to a specific transport, right?
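To illustrate the NULL-handling concern in userspace (the types, the lookup, and the return value below are toy stand-ins): the check has to happen on the pointer the lookup returned, before container_of(), because in general container_of() applied to NULL does not yield NULL (it subtracts the member offset), so a post-container_of() check can never trigger.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct vsock_transport { int id; };

struct virtio_transport {
	struct vsock_transport transport;
	int (*send_pkt)(void);
};

static int fake_send_pkt(void) { return 0; }

static struct virtio_transport the_transport = {
	.transport = { .id = 1 },
	.send_pkt  = fake_send_pkt,
};

/* Hypothetical lookup that can fail and return NULL. */
static struct vsock_transport *dgram_lookup_transport(int cid)
{
	return cid == 1 ? &the_transport.transport : NULL;
}

static int dgram_send_pkt_info(int cid)
{
	struct vsock_transport *transport = dgram_lookup_transport(cid);
	struct virtio_transport *t_ops;

	/* Check the looked-up pointer itself, before container_of(). */
	if (!transport)
		return -ENODEV;

	t_ops = container_of(transport, struct virtio_transport, transport);
	return t_ops->send_pkt();
}
```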
>+ t_ops = container_of(transport, struct virtio_transport, transport);
>+ if (unlikely(!t_ops))
>+ return -EFAULT;
>+
>+ if (info->msg)
>+ noblock = info->msg->msg_flags & MSG_DONTWAIT;
>+
>+ /* Use sock_alloc_send_skb to throttle by sk_sndbuf. This helps avoid
>+ * triggering the OOM.
>+ */
>+ skb = sock_alloc_send_skb(sk, info->pkt_len + VIRTIO_VSOCK_SKB_HEADROOM,
>+ noblock, &err);
>+ if (!skb)
>+ return err;
>+
>+ skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM);
>+
>+ src_cid = t_ops->transport.get_local_cid();
>+ src_port = vsk->local_addr.svm_port;
>+ dst_cid = info->remote_cid;
>+ dst_port = info->remote_port;
>+
>+ hdr = virtio_vsock_hdr(skb);
>+ hdr->type = cpu_to_le16(info->type);
>+ hdr->op = cpu_to_le16(info->op);
>+ hdr->src_cid = cpu_to_le64(src_cid);
>+ hdr->dst_cid = cpu_to_le64(dst_cid);
>+ hdr->src_port = cpu_to_le32(src_port);
>+ hdr->dst_port = cpu_to_le32(dst_port);
>+ hdr->flags = cpu_to_le32(info->flags);
>+ hdr->len = cpu_to_le32(info->pkt_len);
>+
>+ if (info->msg && info->pkt_len > 0) {
>+ payload = skb_put(skb, info->pkt_len);
>+ err = memcpy_from_msg(payload, info->msg, info->pkt_len);
>+ if (err)
>+ goto out;
>+ }
>+
>+ trace_virtio_transport_alloc_pkt(src_cid, src_port,
>+ dst_cid, dst_port,
>+ info->pkt_len,
>+ info->type,
>+ info->op,
>+ info->flags,
>+ false);
>+
>+ return t_ops->send_pkt(skb);
>+out:
>+ kfree_skb(skb);
>+ return err;
>+}
>+
> int
> virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
> struct sockaddr_vm *remote_addr,
> struct msghdr *msg,
> size_t dgram_len)
> {
>- return -EOPNOTSUPP;
>+ /* Here we are only using the info struct to retain style uniformity
>+ * and to ease future refactoring and merging.
>+ */
>+ struct virtio_vsock_pkt_info info = {
>+ .op = VIRTIO_VSOCK_OP_RW,
>+ .remote_cid = remote_addr->svm_cid,
>+ .remote_port = remote_addr->svm_port,
>+ .remote_flags = remote_addr->svm_flags,
>+ .msg = msg,
>+ .vsk = vsk,
>+ .pkt_len = dgram_len,
>+ };
>+
>+ return virtio_transport_dgram_send_pkt_info(vsk, &info);
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
>
>--
>2.20.1
>
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFC PATCH net-next v6 09/14] virtio/vsock: add common datagram recv path
2024-07-10 21:25 ` [RFC PATCH net-next v6 09/14] virtio/vsock: add common datagram recv path Amery Hung
@ 2024-07-23 14:42 ` Stefano Garzarella
2024-07-30 0:35 ` Amery Hung
0 siblings, 1 reply; 51+ messages in thread
From: Stefano Garzarella @ 2024-07-23 14:42 UTC (permalink / raw)
To: Amery Hung
Cc: stefanha, mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
kys, haiyangz, wei.liu, decui, bryantan, vdasa, pv-drivers,
dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, xiyou.wangcong
On Wed, Jul 10, 2024 at 09:25:50PM GMT, Amery Hung wrote:
>From: Bobby Eshleman <bobby.eshleman@bytedance.com>
>
>This commit adds the common datagram receive functionality for virtio
>transports. It does not add the vhost/virtio users of that
>functionality.
>
>This functionality includes:
>- changes to the virtio_transport_recv_pkt() path for finding the
> bound socket receiver for incoming packets
>- virtio_transport_recv_pkt() saves the source cid and port to the
> control buffer for recvmsg() to initialize sockaddr_vm structure
> when using datagram
>
>Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>Signed-off-by: Amery Hung <amery.hung@bytedance.com>
>---
> net/vmw_vsock/virtio_transport_common.c | 79 +++++++++++++++++++++----
> 1 file changed, 66 insertions(+), 13 deletions(-)
>
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index 46cd1807f8e3..a571b575fde9 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -235,7 +235,9 @@ EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt);
>
> static u16 virtio_transport_get_type(struct sock *sk)
> {
>- if (sk->sk_type == SOCK_STREAM)
>+ if (sk->sk_type == SOCK_DGRAM)
>+ return VIRTIO_VSOCK_TYPE_DGRAM;
>+ else if (sk->sk_type == SOCK_STREAM)
> return VIRTIO_VSOCK_TYPE_STREAM;
> else
> return VIRTIO_VSOCK_TYPE_SEQPACKET;
>@@ -1422,6 +1424,33 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
> kfree_skb(skb);
> }
>
>+static void
>+virtio_transport_dgram_kfree_skb(struct sk_buff *skb, int err)
>+{
>+ if (err == -ENOMEM)
>+ kfree_skb_reason(skb, SKB_DROP_REASON_SOCKET_RCVBUFF);
>+ else if (err == -ENOBUFS)
>+ kfree_skb_reason(skb, SKB_DROP_REASON_PROTO_MEM);
>+ else
>+ kfree_skb(skb);
>+}
>+
>+/* This function takes ownership of the skb.
>+ *
>+ * It either places the skb on the sk_receive_queue or frees it.
>+ */
>+static void
>+virtio_transport_recv_dgram(struct sock *sk, struct sk_buff *skb)
>+{
>+ int err;
>+
>+ err = sock_queue_rcv_skb(sk, skb);
>+ if (err) {
>+ virtio_transport_dgram_kfree_skb(skb, err);
>+ return;
>+ }
>+}
>+
> static int
> virtio_transport_recv_connected(struct sock *sk,
> struct sk_buff *skb)
>@@ -1591,7 +1620,8 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
> static bool virtio_transport_valid_type(u16 type)
> {
> return (type == VIRTIO_VSOCK_TYPE_STREAM) ||
>- (type == VIRTIO_VSOCK_TYPE_SEQPACKET);
>+ (type == VIRTIO_VSOCK_TYPE_SEQPACKET) ||
>+ (type == VIRTIO_VSOCK_TYPE_DGRAM);
> }
>
> /* We are under the virtio-vsock's vsock->rx_lock or vhost-vsock's vq->mutex
>@@ -1601,44 +1631,57 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> struct sk_buff *skb)
> {
> struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
>+ struct vsock_skb_cb *vsock_cb;
This can be defined in the block where it's used.
> struct sockaddr_vm src, dst;
> struct vsock_sock *vsk;
> struct sock *sk;
> bool space_available;
>+ u16 type;
>
> vsock_addr_init(&src, le64_to_cpu(hdr->src_cid),
> le32_to_cpu(hdr->src_port));
> vsock_addr_init(&dst, le64_to_cpu(hdr->dst_cid),
> le32_to_cpu(hdr->dst_port));
>
>+ type = le16_to_cpu(hdr->type);
>+
> trace_virtio_transport_recv_pkt(src.svm_cid, src.svm_port,
> dst.svm_cid, dst.svm_port,
> le32_to_cpu(hdr->len),
>- le16_to_cpu(hdr->type),
>+ type,
> le16_to_cpu(hdr->op),
> le32_to_cpu(hdr->flags),
> le32_to_cpu(hdr->buf_alloc),
> le32_to_cpu(hdr->fwd_cnt));
>
>- if (!virtio_transport_valid_type(le16_to_cpu(hdr->type))) {
>+ if (!virtio_transport_valid_type(type)) {
> (void)virtio_transport_reset_no_sock(t, skb);
> goto free_pkt;
> }
>
>- /* The socket must be in connected or bound table
>- * otherwise send reset back
>+ /* For stream/seqpacket, the socket must be in connected or bound table
>+ * otherwise send reset back.
>+ *
>+ * For datagrams, no reset is sent back.
> */
> sk = vsock_find_connected_socket(&src, &dst);
> if (!sk) {
>- sk = vsock_find_bound_socket(&dst);
>- if (!sk) {
>- (void)virtio_transport_reset_no_sock(t, skb);
>- goto free_pkt;
>+ if (type == VIRTIO_VSOCK_TYPE_DGRAM) {
>+ sk = vsock_find_bound_dgram_socket(&dst);
>+ if (!sk)
>+ goto free_pkt;
>+ } else {
>+ sk = vsock_find_bound_socket(&dst);
>+ if (!sk) {
>+ (void)virtio_transport_reset_no_sock(t, skb);
>+ goto free_pkt;
>+ }
> }
> }
>
>- if (virtio_transport_get_type(sk) != le16_to_cpu(hdr->type)) {
>- (void)virtio_transport_reset_no_sock(t, skb);
>+ if (virtio_transport_get_type(sk) != type) {
>+ if (type != VIRTIO_VSOCK_TYPE_DGRAM)
>+ (void)virtio_transport_reset_no_sock(t, skb);
> sock_put(sk);
> goto free_pkt;
> }
>@@ -1654,12 +1697,21 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>
> /* Check if sk has been closed before lock_sock */
> if (sock_flag(sk, SOCK_DONE)) {
>- (void)virtio_transport_reset_no_sock(t, skb);
>+ if (type != VIRTIO_VSOCK_TYPE_DGRAM)
>+ (void)virtio_transport_reset_no_sock(t, skb);
> release_sock(sk);
> sock_put(sk);
> goto free_pkt;
> }
>
>+ if (sk->sk_type == SOCK_DGRAM) {
>+ vsock_cb = vsock_skb_cb(skb);
>+ vsock_cb->src_cid = src.svm_cid;
>+ vsock_cb->src_port = src.svm_port;
>+ virtio_transport_recv_dgram(sk, skb);
What about adding an API that transports can use to hide this?
I mean something that hides the vsock_cb setup and queues the packet in
the socket receive queue. I'd also not expose vsock_skb_cb in a header,
but handle it internally in af_vsock.c, and just expose APIs to
queue/dequeue.
Also, why does VMCI use sk_receive_skb() while we are using
sock_queue_rcv_skb()?
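A userspace sketch of what such an API could look like (the helper name vsock_dgram_queue_rcv_skb() and every type below are hypothetical stand-ins): the transport hands over the skb together with the source address, and the cb layout stays private to af_vsock.c.

```c
#include <assert.h>
#include <string.h>

/* Minimal stand-ins for the kernel structures. */
struct sk_buff { unsigned char cb[48]; };
struct sock   { struct sk_buff *queue_head; int err; };

/* Would stay private to af_vsock.c, never exposed in a header. */
struct vsock_skb_cb {
	unsigned int src_cid;
	unsigned int src_port;
};

/* Stand-in for the kernel's receive-queue insertion. */
static int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
{
	if (sk->err)
		return sk->err;
	sk->queue_head = skb;
	return 0;
}

/* Hypothetical exported helper: transports call this instead of
 * touching vsock_skb_cb themselves. */
static int vsock_dgram_queue_rcv_skb(struct sock *sk, struct sk_buff *skb,
				     unsigned int src_cid,
				     unsigned int src_port)
{
	struct vsock_skb_cb *cb = (struct vsock_skb_cb *)skb->cb;

	cb->src_cid = src_cid;
	cb->src_port = src_port;
	return sock_queue_rcv_skb(sk, skb);
}
```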
Thanks,
Stefano
>+ goto out;
>+ }
>+
> space_available = virtio_transport_space_update(sk, skb);
>
> /* Update CID in case it has changed after a transport reset event */
>@@ -1691,6 +1743,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> break;
> }
>
>+out:
> release_sock(sk);
>
> /* Release refcnt obtained when we fetched this socket out of the
>--
>2.20.1
>
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFC PATCH net-next v6 14/14] test/vsock: add vsock dgram tests
2024-07-10 21:25 ` [RFC PATCH net-next v6 14/14] test/vsock: add vsock dgram tests Amery Hung
2024-07-20 19:58 ` Arseniy Krasnov
@ 2024-07-23 14:43 ` Stefano Garzarella
2024-07-28 22:06 ` Amery Hung
1 sibling, 1 reply; 51+ messages in thread
From: Stefano Garzarella @ 2024-07-23 14:43 UTC (permalink / raw)
To: Amery Hung
Cc: stefanha, mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
kys, haiyangz, wei.liu, decui, bryantan, vdasa, pv-drivers,
dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, xiyou.wangcong
On Wed, Jul 10, 2024 at 09:25:55PM GMT, Amery Hung wrote:
>From: Bobby Eshleman <bobby.eshleman@bytedance.com>
>
>From: Jiang Wang <jiang.wang@bytedance.com>
>
>This commit adds tests for vsock datagram.
>
>Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
>Signed-off-by: Amery Hung <amery.hung@bytedance.com>
>---
> tools/testing/vsock/util.c | 177 ++++-
> tools/testing/vsock/util.h | 10 +
> tools/testing/vsock/vsock_test.c | 1032 ++++++++++++++++++++++++++----
> 3 files changed, 1099 insertions(+), 120 deletions(-)
>
>diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
>index 554b290fefdc..14d6cd90ca15 100644
>--- a/tools/testing/vsock/util.c
>+++ b/tools/testing/vsock/util.c
>@@ -154,7 +154,8 @@ static int vsock_connect(unsigned int cid, unsigned int port, int type)
> int ret;
> int fd;
>
>- control_expectln("LISTENING");
>+ if (type != SOCK_DGRAM)
>+ control_expectln("LISTENING");
Why is it not needed?
BTW this patch is too big to be reviewed, please split it.
Thanks,
Stefano
>
> fd = socket(AF_VSOCK, type, 0);
> if (fd < 0) {
>@@ -189,6 +190,11 @@ int vsock_seqpacket_connect(unsigned int cid, unsigned int port)
> return vsock_connect(cid, port, SOCK_SEQPACKET);
> }
>
>+int vsock_dgram_connect(unsigned int cid, unsigned int port)
>+{
>+ return vsock_connect(cid, port, SOCK_DGRAM);
>+}
>+
> /* Listen on <cid, port> and return the file descriptor. */
> static int vsock_listen(unsigned int cid, unsigned int port, int type)
> {
>@@ -287,6 +293,34 @@ int vsock_seqpacket_accept(unsigned int cid, unsigned int port,
> return vsock_accept(cid, port, clientaddrp, SOCK_SEQPACKET);
> }
>
>+int vsock_dgram_bind(unsigned int cid, unsigned int port)
>+{
>+ union {
>+ struct sockaddr sa;
>+ struct sockaddr_vm svm;
>+ } addr = {
>+ .svm = {
>+ .svm_family = AF_VSOCK,
>+ .svm_port = port,
>+ .svm_cid = cid,
>+ },
>+ };
>+ int fd;
>+
>+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
>+ if (fd < 0) {
>+ perror("socket");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
>+ perror("bind");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ return fd;
>+}
>+
> /* Transmit bytes from a buffer and check the return value.
> *
> * expected_ret:
>@@ -425,6 +459,147 @@ void recv_byte(int fd, int expected_ret, int flags)
> }
> }
>
>+/* Transmit bytes to the given address from a buffer and check the return value.
>+ *
>+ * expected_ret:
>+ * <0 Negative errno (for testing errors)
>+ * 0 End-of-file
>+ * >0 Success (bytes successfully written)
>+ */
>+void sendto_buf(int fd, void *buf, size_t len, struct sockaddr *dst, socklen_t addrlen,
>+ int flags, ssize_t expected_ret)
>+{
>+ ssize_t nwritten = 0;
>+ ssize_t ret;
>+
>+ timeout_begin(TIMEOUT);
>+ do {
>+ ret = sendto(fd, buf + nwritten, len - nwritten, flags, dst, addrlen);
>+ timeout_check("sendto");
>+
>+ if (ret == 0 || (ret < 0 && errno != EINTR))
>+ break;
>+
>+ nwritten += ret;
>+ } while (nwritten < len);
>+ timeout_end();
>+
>+ if (expected_ret < 0) {
>+ if (nwritten != -1) {
>+ fprintf(stderr, "bogus sendto(2) return value %zd\n",
>+ nwritten);
>+ exit(EXIT_FAILURE);
>+ }
>+ if (errno != -expected_ret) {
>+ perror("sendto");
>+ exit(EXIT_FAILURE);
>+ }
>+ return;
>+ }
>+
>+ if (ret < 0) {
>+ perror("sendto");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (nwritten != expected_ret) {
>+ if (ret == 0)
>+ fprintf(stderr, "unexpected EOF while sending bytes\n");
>+
>+ fprintf(stderr, "bogus sendto(2) bytes written %zd (expected %zd)\n",
>+ nwritten, expected_ret);
>+ exit(EXIT_FAILURE);
>+ }
>+}
>+
>+/* Receive bytes from the given address in a buffer and check the return value.
>+ *
>+ * expected_ret:
>+ * <0 Negative errno (for testing errors)
>+ * 0 End-of-file
>+ * >0 Success (bytes successfully read)
>+ */
>+void recvfrom_buf(int fd, void *buf, size_t len, struct sockaddr *src, socklen_t *addrlen,
>+ int flags, ssize_t expected_ret)
>+{
>+ ssize_t nread = 0;
>+ ssize_t ret;
>+
>+ timeout_begin(TIMEOUT);
>+ do {
>+ ret = recvfrom(fd, buf + nread, len - nread, flags, src, addrlen);
>+ timeout_check("recvfrom");
>+
>+ if (ret == 0 || (ret < 0 && errno != EINTR))
>+ break;
>+
>+ nread += ret;
>+ } while (nread < len);
>+ timeout_end();
>+
>+ if (expected_ret < 0) {
>+ if (nread != -1) {
>+ fprintf(stderr, "bogus recvfrom(2) return value %zd\n",
>+ nread);
>+ exit(EXIT_FAILURE);
>+ }
>+ if (errno != -expected_ret) {
>+ perror("recvfrom");
>+ exit(EXIT_FAILURE);
>+ }
>+ return;
>+ }
>+
>+ if (ret < 0) {
>+ perror("recvfrom");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (nread != expected_ret) {
>+ if (ret == 0)
>+ fprintf(stderr, "unexpected EOF while receiving bytes\n");
>+
>+ fprintf(stderr, "bogus recv(2) bytes read %zd (expected %zd)\n",
>+ nread, expected_ret);
>+ exit(EXIT_FAILURE);
>+ }
>+}
>+
>+/* Transmit one byte to the given address and check the return value.
>+ *
>+ * expected_ret:
>+ * <0 Negative errno (for testing errors)
>+ * 0 End-of-file
>+ * 1 Success
>+ */
>+void sendto_byte(int fd, struct sockaddr *dst, socklen_t addrlen,
>+ int expected_ret, int flags)
>+{
>+ uint8_t byte = 'A';
>+
>+ sendto_buf(fd, &byte, sizeof(byte), dst, addrlen, flags, expected_ret);
>+}
>+
>+/* Receive one byte from the given address and check the return value.
>+ *
>+ * expected_ret:
>+ * <0 Negative errno (for testing errors)
>+ * 0 End-of-file
>+ * 1 Success
>+ */
>+void recvfrom_byte(int fd, struct sockaddr *src, socklen_t *addrlen,
>+ int expected_ret, int flags)
>+{
>+ uint8_t byte;
>+
>+ recvfrom_buf(fd, &byte, sizeof(byte), src, addrlen, flags, expected_ret);
>+
>+ if (byte != 'A') {
>+ fprintf(stderr, "unexpected byte read %c\n", byte);
>+ exit(EXIT_FAILURE);
>+ }
>+}
>+
> /* Run test cases. The program terminates if a failure occurs. */
> void run_tests(const struct test_case *test_cases,
> const struct test_opts *opts)
>diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
>index e95e62485959..3367262b53c9 100644
>--- a/tools/testing/vsock/util.h
>+++ b/tools/testing/vsock/util.h
>@@ -43,17 +43,27 @@ int vsock_stream_connect(unsigned int cid, unsigned int port);
> int vsock_bind_connect(unsigned int cid, unsigned int port,
> unsigned int bind_port, int type);
> int vsock_seqpacket_connect(unsigned int cid, unsigned int port);
>+int vsock_dgram_connect(unsigned int cid, unsigned int port);
> int vsock_stream_accept(unsigned int cid, unsigned int port,
> struct sockaddr_vm *clientaddrp);
> int vsock_stream_listen(unsigned int cid, unsigned int port);
> int vsock_seqpacket_accept(unsigned int cid, unsigned int port,
> struct sockaddr_vm *clientaddrp);
>+int vsock_dgram_bind(unsigned int cid, unsigned int port);
> void vsock_wait_remote_close(int fd);
> void send_buf(int fd, const void *buf, size_t len, int flags,
> ssize_t expected_ret);
> void recv_buf(int fd, void *buf, size_t len, int flags, ssize_t expected_ret);
> void send_byte(int fd, int expected_ret, int flags);
> void recv_byte(int fd, int expected_ret, int flags);
>+void sendto_buf(int fd, void *buf, size_t len, struct sockaddr *dst,
>+ socklen_t addrlen, int flags, ssize_t expected_ret);
>+void recvfrom_buf(int fd, void *buf, size_t len, struct sockaddr *src,
>+ socklen_t *addrlen, int flags, ssize_t expected_ret);
>+void sendto_byte(int fd, struct sockaddr *dst, socklen_t addrlen,
>+ int expected_ret, int flags);
>+void recvfrom_byte(int fd, struct sockaddr *src, socklen_t *addrlen,
>+ int expected_ret, int flags);
> void run_tests(const struct test_case *test_cases,
> const struct test_opts *opts);
> void list_tests(const struct test_case *test_cases);
>diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
>index f851f8961247..1e1576ca87d0 100644
>--- a/tools/testing/vsock/vsock_test.c
>+++ b/tools/testing/vsock/vsock_test.c
>@@ -13,6 +13,7 @@
> #include <string.h>
> #include <errno.h>
> #include <unistd.h>
>+#include <linux/errqueue.h>
> #include <linux/kernel.h>
> #include <sys/types.h>
> #include <sys/socket.h>
>@@ -26,6 +27,12 @@
> #include "control.h"
> #include "util.h"
>
>+#ifndef SOL_VSOCK
>+#define SOL_VSOCK 287
>+#endif
>+
>+#define DGRAM_MSG_CNT 16
>+
> static void test_stream_connection_reset(const struct test_opts *opts)
> {
> union {
>@@ -1403,125 +1410,912 @@ static void test_stream_cred_upd_on_set_rcvlowat(const struct test_opts *opts)
> test_stream_credit_update_test(opts, false);
> }
>
>-static struct test_case test_cases[] = {
>- {
>- .name = "SOCK_STREAM connection reset",
>- .run_client = test_stream_connection_reset,
>- },
>- {
>- .name = "SOCK_STREAM bind only",
>- .run_client = test_stream_bind_only_client,
>- .run_server = test_stream_bind_only_server,
>- },
>- {
>- .name = "SOCK_STREAM client close",
>- .run_client = test_stream_client_close_client,
>- .run_server = test_stream_client_close_server,
>- },
>- {
>- .name = "SOCK_STREAM server close",
>- .run_client = test_stream_server_close_client,
>- .run_server = test_stream_server_close_server,
>- },
>- {
>- .name = "SOCK_STREAM multiple connections",
>- .run_client = test_stream_multiconn_client,
>- .run_server = test_stream_multiconn_server,
>- },
>- {
>- .name = "SOCK_STREAM MSG_PEEK",
>- .run_client = test_stream_msg_peek_client,
>- .run_server = test_stream_msg_peek_server,
>- },
>- {
>- .name = "SOCK_SEQPACKET msg bounds",
>- .run_client = test_seqpacket_msg_bounds_client,
>- .run_server = test_seqpacket_msg_bounds_server,
>- },
>- {
>- .name = "SOCK_SEQPACKET MSG_TRUNC flag",
>- .run_client = test_seqpacket_msg_trunc_client,
>- .run_server = test_seqpacket_msg_trunc_server,
>- },
>- {
>- .name = "SOCK_SEQPACKET timeout",
>- .run_client = test_seqpacket_timeout_client,
>- .run_server = test_seqpacket_timeout_server,
>- },
>- {
>- .name = "SOCK_SEQPACKET invalid receive buffer",
>- .run_client = test_seqpacket_invalid_rec_buffer_client,
>- .run_server = test_seqpacket_invalid_rec_buffer_server,
>- },
>- {
>- .name = "SOCK_STREAM poll() + SO_RCVLOWAT",
>- .run_client = test_stream_poll_rcvlowat_client,
>- .run_server = test_stream_poll_rcvlowat_server,
>- },
>- {
>- .name = "SOCK_SEQPACKET big message",
>- .run_client = test_seqpacket_bigmsg_client,
>- .run_server = test_seqpacket_bigmsg_server,
>- },
>- {
>- .name = "SOCK_STREAM test invalid buffer",
>- .run_client = test_stream_inv_buf_client,
>- .run_server = test_stream_inv_buf_server,
>- },
>- {
>- .name = "SOCK_SEQPACKET test invalid buffer",
>- .run_client = test_seqpacket_inv_buf_client,
>- .run_server = test_seqpacket_inv_buf_server,
>- },
>- {
>- .name = "SOCK_STREAM virtio skb merge",
>- .run_client = test_stream_virtio_skb_merge_client,
>- .run_server = test_stream_virtio_skb_merge_server,
>- },
>- {
>- .name = "SOCK_SEQPACKET MSG_PEEK",
>- .run_client = test_seqpacket_msg_peek_client,
>- .run_server = test_seqpacket_msg_peek_server,
>- },
>- {
>- .name = "SOCK_STREAM SHUT_WR",
>- .run_client = test_stream_shutwr_client,
>- .run_server = test_stream_shutwr_server,
>- },
>- {
>- .name = "SOCK_STREAM SHUT_RD",
>- .run_client = test_stream_shutrd_client,
>- .run_server = test_stream_shutrd_server,
>- },
>- {
>- .name = "SOCK_STREAM MSG_ZEROCOPY",
>- .run_client = test_stream_msgzcopy_client,
>- .run_server = test_stream_msgzcopy_server,
>- },
>- {
>- .name = "SOCK_SEQPACKET MSG_ZEROCOPY",
>- .run_client = test_seqpacket_msgzcopy_client,
>- .run_server = test_seqpacket_msgzcopy_server,
>- },
>- {
>- .name = "SOCK_STREAM MSG_ZEROCOPY empty MSG_ERRQUEUE",
>- .run_client = test_stream_msgzcopy_empty_errq_client,
>- .run_server = test_stream_msgzcopy_empty_errq_server,
>- },
>- {
>- .name = "SOCK_STREAM double bind connect",
>- .run_client = test_double_bind_connect_client,
>- .run_server = test_double_bind_connect_server,
>- },
>- {
>- .name = "SOCK_STREAM virtio credit update + SO_RCVLOWAT",
>- .run_client = test_stream_rcvlowat_def_cred_upd_client,
>- .run_server = test_stream_cred_upd_on_set_rcvlowat,
>- },
>- {
>- .name = "SOCK_STREAM virtio credit update + low rx_bytes",
>- .run_client = test_stream_rcvlowat_def_cred_upd_client,
>- .run_server = test_stream_cred_upd_on_low_rx_bytes,
>+static void test_dgram_sendto_client(const struct test_opts *opts)
>+{
>+ union {
>+ struct sockaddr sa;
>+ struct sockaddr_vm svm;
>+ } addr = {
>+ .svm = {
>+ .svm_family = AF_VSOCK,
>+ .svm_port = 1234,
>+ .svm_cid = opts->peer_cid,
>+ },
>+ };
>+ int fd;
>+
>+ /* Wait for the server to be ready */
>+ control_expectln("BIND");
>+
>+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
>+ if (fd < 0) {
>+ perror("socket");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ sendto_byte(fd, &addr.sa, sizeof(addr.svm), 1, 0);
>+
>+ /* Notify the server that the client has finished */
>+ control_writeln("DONE");
>+
>+ close(fd);
>+}
>+
>+static void test_dgram_sendto_server(const struct test_opts *opts)
>+{
>+ union {
>+ struct sockaddr sa;
>+ struct sockaddr_vm svm;
>+ } addr = {
>+ .svm = {
>+ .svm_family = AF_VSOCK,
>+ .svm_port = 1234,
>+ .svm_cid = VMADDR_CID_ANY,
>+ },
>+ };
>+ socklen_t addrlen = sizeof(addr.sa);
>+ unsigned long sock_buf_size;
>+ int fd;
>+
>+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
>+ if (fd < 0) {
>+ perror("socket");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
>+ perror("bind");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* Set receive buffer to maximum */
>+ sock_buf_size = -1;
>+ if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
>+ &sock_buf_size, sizeof(sock_buf_size))) {
>+ perror("setsockopt(SO_RCVBUF)");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* Notify the client that the server is ready */
>+ control_writeln("BIND");
>+
>+ recvfrom_byte(fd, &addr.sa, &addrlen, 1, 0);
>+
>+ /* Wait for the client to finish */
>+ control_expectln("DONE");
>+
>+ close(fd);
>+}
>+
>+static void test_dgram_sendto_auto_bind_client(const struct test_opts *opts)
>+{
>+ union {
>+ struct sockaddr sa;
>+ struct sockaddr_vm svm;
>+ } addr = {
>+ .svm = {
>+ .svm_family = AF_VSOCK,
>+ .svm_port = 1234,
>+ .svm_cid = opts->peer_cid,
>+ },
>+ };
>+ struct sockaddr_vm bind_addr;
>+ socklen_t addrlen;
>+ unsigned int port;
>+ int fd;
>+
>+ /* Wait for the server to be ready */
>+ control_expectln("BIND");
>+
>+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
>+ if (fd < 0) {
>+ perror("socket");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ sendto_byte(fd, &addr.sa, sizeof(addr.svm), 1, 0);
>+
>+ /* Get auto-bound port after sendto */
>+ addrlen = sizeof(bind_addr);
>+ if (getsockname(fd, (struct sockaddr *)&bind_addr, &addrlen)) {
>+ perror("getsockname");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* Send the port number to the server */
>+ port = bind_addr.svm_port;
>+ sendto_buf(fd, &port, sizeof(port), &addr.sa, sizeof(addr.svm), 0, sizeof(port));
>+
>+ addr.svm.svm_port = port;
>+ recvfrom_byte(fd, &addr.sa, &addrlen, 1, 0);
>+
>+ /* Notify the server that the client has finished */
>+ control_writeln("DONE");
>+
>+ close(fd);
>+}
>+
>+static void test_dgram_sendto_auto_bind_server(const struct test_opts *opts)
>+{
>+ union {
>+ struct sockaddr sa;
>+ struct sockaddr_vm svm;
>+ } addr = {
>+ .svm = {
>+ .svm_family = AF_VSOCK,
>+ .svm_port = 1234,
>+ .svm_cid = VMADDR_CID_ANY,
>+ },
>+ };
>+ socklen_t addrlen = sizeof(addr.sa);
>+ unsigned long sock_buf_size;
>+ unsigned int port;
>+ int fd;
>+
>+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
>+ if (fd < 0) {
>+ perror("socket");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
>+ perror("bind");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* Set receive buffer to maximum */
>+ sock_buf_size = -1;
>+ if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
>+ &sock_buf_size, sizeof(sock_buf_size))) {
>+ perror("setsockopt(SO_RCVBUF)");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* Notify the client that the server is ready */
>+ control_writeln("BIND");
>+
>+ recvfrom_byte(fd, &addr.sa, &addrlen, 1, 0);
>+
>+ /* Receive the port the client is listening to */
>+ recvfrom_buf(fd, &port, sizeof(port), &addr.sa, &addrlen, 0, sizeof(port));
>+
>+ addr.svm.svm_port = port;
>+ addr.svm.svm_cid = opts->peer_cid;
>+ sendto_byte(fd, &addr.sa, sizeof(addr.svm), 1, 0);
>+
>+ /* Wait for the client to finish */
>+ control_expectln("DONE");
>+
>+ close(fd);
>+}
>+
>+static void test_dgram_connect_client(const struct test_opts *opts)
>+{
>+ union {
>+ struct sockaddr sa;
>+ struct sockaddr_vm svm;
>+ } addr = {
>+ .svm = {
>+ .svm_family = AF_VSOCK,
>+ .svm_port = 1234,
>+ .svm_cid = opts->peer_cid,
>+ },
>+ };
>+ int ret;
>+ int fd;
>+
>+ /* Wait for the server to be ready */
>+ control_expectln("BIND");
>+
>+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
>+ if (fd < 0) {
>+ perror("socket");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ ret = connect(fd, &addr.sa, sizeof(addr.svm));
>+ if (ret < 0) {
>+ perror("connect");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ send_byte(fd, 1, 0);
>+
>+ /* Notify the server that the client has finished */
>+ control_writeln("DONE");
>+
>+ close(fd);
>+}
>+
>+static void test_dgram_connect_server(const struct test_opts *opts)
>+{
>+ test_dgram_sendto_server(opts);
>+}
>+
>+static void test_dgram_multiconn_sendto_client(const struct test_opts *opts)
>+{
>+ union {
>+ struct sockaddr sa;
>+ struct sockaddr_vm svm;
>+ } addr = {
>+ .svm = {
>+ .svm_family = AF_VSOCK,
>+ .svm_port = 1234,
>+ .svm_cid = opts->peer_cid,
>+ },
>+ };
>+ int fds[MULTICONN_NFDS];
>+ int i;
>+
>+ /* Wait for the server to be ready */
>+ control_expectln("BIND");
>+
>+ for (i = 0; i < MULTICONN_NFDS; i++) {
>+ fds[i] = socket(AF_VSOCK, SOCK_DGRAM, 0);
>+ if (fds[i] < 0) {
>+ perror("socket");
>+ exit(EXIT_FAILURE);
>+ }
>+ }
>+
>+ for (i = 0; i < MULTICONN_NFDS; i++) {
>+ sendto_byte(fds[i], &addr.sa, sizeof(addr.svm), 1, 0);
>+
>+ /* This is here to make explicit the case of the test failing
>+ * due to packet loss. The test fails when recv() times out
>+ * otherwise, which is much more confusing.
>+ */
>+ control_expectln("PKTRECV");
>+ }
>+
>+ /* Notify the server that the client has finished */
>+ control_writeln("DONE");
>+
>+ for (i = 0; i < MULTICONN_NFDS; i++)
>+ close(fds[i]);
>+}
>+
>+static void test_dgram_multiconn_sendto_server(const struct test_opts *opts)
>+{
>+ union {
>+ struct sockaddr sa;
>+ struct sockaddr_vm svm;
>+ } addr = {
>+ .svm = {
>+ .svm_family = AF_VSOCK,
>+ .svm_port = 1234,
>+ .svm_cid = VMADDR_CID_ANY,
>+ },
>+ };
>+ socklen_t len = sizeof(addr.sa);
>+ int fd;
>+ int i;
>+
>+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
>+ if (fd < 0) {
>+ perror("socket");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
>+ perror("bind");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* Notify the client that the server is ready */
>+ control_writeln("BIND");
>+
>+ for (i = 0; i < MULTICONN_NFDS; i++) {
>+ recvfrom_byte(fd, &addr.sa, &len, 1, 0);
>+ control_writeln("PKTRECV");
>+ }
>+
>+ /* Wait for the client to finish */
>+ control_expectln("DONE");
>+
>+ close(fd);
>+}
>+
>+static void test_dgram_multiconn_send_client(const struct test_opts *opts)
>+{
>+ int fds[MULTICONN_NFDS];
>+ int i;
>+
>+ /* Wait for the server to be ready */
>+ control_expectln("BIND");
>+
>+ for (i = 0; i < MULTICONN_NFDS; i++) {
>+ fds[i] = vsock_dgram_connect(opts->peer_cid, 1234);
>+ if (fds[i] < 0) {
>+ perror("connect");
>+ exit(EXIT_FAILURE);
>+ }
>+ }
>+
>+ for (i = 0; i < MULTICONN_NFDS; i++) {
>+ send_byte(fds[i], 1, 0);
>+ /* This is here to make explicit the case of the test failing
>+ * due to packet loss.
>+ */
>+ control_expectln("PKTRECV");
>+ }
>+
>+ /* Notify the server that the client has finished */
>+ control_writeln("DONE");
>+
>+ for (i = 0; i < MULTICONN_NFDS; i++)
>+ close(fds[i]);
>+}
>+
>+static void test_dgram_multiconn_send_server(const struct test_opts *opts)
>+{
>+ union {
>+ struct sockaddr sa;
>+ struct sockaddr_vm svm;
>+ } addr = {
>+ .svm = {
>+ .svm_family = AF_VSOCK,
>+ .svm_port = 1234,
>+ .svm_cid = VMADDR_CID_ANY,
>+ },
>+ };
>+ unsigned long sock_buf_size;
>+ int fd;
>+ int i;
>+
>+ fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
>+ if (fd < 0) {
>+ perror("socket");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
>+ perror("bind");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* Set receive buffer to maximum */
>+ sock_buf_size = -1;
>+ if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
>+ &sock_buf_size, sizeof(sock_buf_size))) {
>+ perror("setsockopt(SO_RCVBUF)");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* Notify the client that the server is ready */
>+ control_writeln("BIND");
>+
>+ for (i = 0; i < MULTICONN_NFDS; i++) {
>+ recv_byte(fd, 1, 0);
>+ control_writeln("PKTRECV");
>+ }
>+
>+ /* Wait for the client to finish */
>+ control_expectln("DONE");
>+
>+ close(fd);
>+}
>+
>+/*
>+ * This test is similar to the seqpacket msg bounds tests, but it is unreliable
>+ * because it may also fail in the unlikely case that packets are dropped.
>+ */
>+static void test_dgram_bounds_unreliable_client(const struct test_opts *opts)
>+{
>+ unsigned long recv_buf_size;
>+ unsigned long *hashes;
>+ size_t max_msg_size;
>+ int page_size;
>+ int fd;
>+ int i;
>+
>+ fd = vsock_dgram_connect(opts->peer_cid, 1234);
>+ if (fd < 0) {
>+ perror("connect");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ hashes = malloc(DGRAM_MSG_CNT * sizeof(unsigned long));
>+ if (!hashes) {
>+ perror("malloc");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* Let the server know the client is ready */
>+ control_writeln("CLNTREADY");
>+
>+ /* Wait until the receiver sets its buffer size. */
>+ control_expectln("SRVREADY");
>+
>+ recv_buf_size = control_readulong();
>+
>+ page_size = getpagesize();
>+ max_msg_size = MAX_MSG_PAGES * page_size;
>+
>+ for (i = 0; i < DGRAM_MSG_CNT; i++) {
>+ ssize_t send_size;
>+ size_t buf_size;
>+ void *buf;
>+
>+ /* Use "small" buffers and "big" buffers. */
>+ if (opts->peer_cid <= VMADDR_CID_HOST && (i & 1))
>+ buf_size = page_size +
>+ (rand() % (max_msg_size - page_size));
>+ else
>+ buf_size = 1 + (rand() % page_size);
>+
>+ buf_size = min(buf_size, recv_buf_size);
>+
>+ buf = malloc(buf_size);
>+
>+ if (!buf) {
>+ perror("malloc");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ memset(buf, rand() & 0xff, buf_size);
>+
>+ send_size = send(fd, buf, buf_size, 0);
>+ if (send_size < 0) {
>+ perror("send");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (send_size != buf_size) {
>+ fprintf(stderr, "Invalid send size\n");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* In theory the implementation isn't required to transmit
>+ * these packets in order, so we use this PKTSENT/PKTRECV
>+ * message sequence so that server and client coordinate
>+ * sending and receiving one packet at a time. The client sends
>+ * a packet and waits until it has been received before sending
>+ * another.
>+ *
>+ * Also in theory these packets can be lost and the test will
>+ * fail for that reason.
>+ */
>+ control_writeln("PKTSENT");
>+ control_expectln("PKTRECV");
>+
>+ /* Record a hash of the packet to compare with the server's later */
>+ hashes[i] = hash_djb2(buf, buf_size);
>+ free(buf);
>+ }
>+
>+ control_writeln("SENDDONE");
>+ close(fd);
>+
>+ for (i = 0; i < DGRAM_MSG_CNT; i++) {
>+ if (hashes[i] != control_readulong())
>+ fprintf(stderr, "broken dgram message bounds or packet loss\n");
>+ }
>+ free(hashes);
>+}
>+
>+static void test_dgram_bounds_unreliable_server(const struct test_opts *opts)
>+{
>+ unsigned long hashes[DGRAM_MSG_CNT];
>+ unsigned long sock_buf_size;
>+ struct msghdr msg = {0};
>+ struct iovec iov = {0};
>+ socklen_t len;
>+ int fd;
>+ int i;
>+
>+ fd = vsock_dgram_bind(VMADDR_CID_ANY, 1234);
>+ if (fd < 0) {
>+ perror("bind");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* Set receive buffer to maximum */
>+ sock_buf_size = -1;
>+ if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
>+ &sock_buf_size, sizeof(sock_buf_size))) {
>+ perror("setsockopt(SO_RCVBUF)");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* Retrieve the receive buffer size */
>+ len = sizeof(sock_buf_size);
>+ if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF,
>+ &sock_buf_size, &len)) {
>+ perror("getsockopt(SO_RECVBUF)");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* Client ready to receive parameters */
>+ control_expectln("CLNTREADY");
>+
>+ /* Ready to receive data. */
>+ control_writeln("SRVREADY");
>+
>+ if (opts->peer_cid > VMADDR_CID_HOST)
>+ control_writeulong(sock_buf_size);
>+ else
>+ control_writeulong(getpagesize());
>+
>+ iov.iov_len = MAX_MSG_PAGES * getpagesize();
>+ iov.iov_base = malloc(iov.iov_len);
>+ if (!iov.iov_base) {
>+ perror("malloc");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ msg.msg_iov = &iov;
>+ msg.msg_iovlen = 1;
>+
>+ for (i = 0; i < DGRAM_MSG_CNT; i++) {
>+ ssize_t recv_size;
>+
>+ control_expectln("PKTSENT");
>+ recv_size = recvmsg(fd, &msg, 0);
>+ control_writeln("PKTRECV");
>+
>+ if (!recv_size)
>+ break;
>+
>+ if (recv_size < 0) {
>+ perror("recvmsg");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ hashes[i] = hash_djb2(msg.msg_iov[0].iov_base, recv_size);
>+ }
>+
>+ control_expectln("SENDDONE");
>+
>+ free(iov.iov_base);
>+ close(fd);
>+
>+ for (i = 0; i < DGRAM_MSG_CNT; i++)
>+ control_writeulong(hashes[i]);
>+}
>+
>+#define POLL_TIMEOUT_MS 1000
>+void vsock_recv_error(int fd)
>+{
>+ struct sock_extended_err *serr;
>+ struct msghdr msg = { 0 };
>+ struct pollfd fds = { 0 };
>+ char cmsg_data[128];
>+ struct cmsghdr *cm;
>+ ssize_t res;
>+
>+ fds.fd = fd;
>+ fds.events = 0;
>+
>+ if (poll(&fds, 1, POLL_TIMEOUT_MS) < 0) {
>+ perror("poll");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (!(fds.revents & POLLERR)) {
>+ fprintf(stderr, "POLLERR expected\n");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ msg.msg_control = cmsg_data;
>+ msg.msg_controllen = sizeof(cmsg_data);
>+
>+ res = recvmsg(fd, &msg, MSG_ERRQUEUE);
>+ if (res) {
>+ fprintf(stderr, "failed to read error queue: %zi\n", res);
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ cm = CMSG_FIRSTHDR(&msg);
>+ if (!cm) {
>+ fprintf(stderr, "cmsg: no cmsg\n");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (cm->cmsg_level != SOL_VSOCK) {
>+ fprintf(stderr, "cmsg: unexpected 'cmsg_level'\n");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (cm->cmsg_type != 0) {
>+ fprintf(stderr, "cmsg: unexpected 'cmsg_type'\n");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ serr = (void *)CMSG_DATA(cm);
>+ if (serr->ee_origin != 0) {
>+ fprintf(stderr, "serr: unexpected 'ee_origin'\n");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (serr->ee_errno != EHOSTUNREACH) {
>+ fprintf(stderr, "serr: wrong error code: %u\n", serr->ee_errno);
>+ exit(EXIT_FAILURE);
>+ }
>+}
>+
>+/*
>+ * Attempt to send a packet larger than the client's RX buffer. Test that the
>+ * packet was dropped and that there is an error in the error queue.
>+ */
>+static void test_dgram_drop_big_packets_server(const struct test_opts *opts)
>+{
>+ unsigned long client_rx_buf_size;
>+ size_t buf_size;
>+ void *buf;
>+ int fd;
>+
>+ if (opts->peer_cid <= VMADDR_CID_HOST) {
>+ printf("The server's peer must be a guest (not CID %u), skipped...\n",
>+ opts->peer_cid);
>+ return;
>+ }
>+
>+ /* Wait for the server to be ready */
>+ control_expectln("READY");
>+
>+ fd = vsock_dgram_connect(opts->peer_cid, 1234);
>+ if (fd < 0) {
>+ perror("connect");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ client_rx_buf_size = control_readulong();
>+
>+ buf_size = client_rx_buf_size + 1;
>+ buf = malloc(buf_size);
>+ if (!buf) {
>+ perror("malloc");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* Even though the buffer is exceeded, the send() should still succeed. */
>+ if (send(fd, buf, buf_size, 0) < 0) {
>+ perror("send");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ vsock_recv_error(fd);
>+
>+ /* Notify the server that the client has finished */
>+ control_writeln("DONE");
>+
>+ close(fd);
>+}
>+
>+static void test_dgram_drop_big_packets_client(const struct test_opts *opts)
>+{
>+ unsigned long buf_size = getpagesize();
>+
>+ if (opts->peer_cid > VMADDR_CID_HOST) {
>+ printf("The client's peer must be the host (not CID %u), skipped...\n",
>+ opts->peer_cid);
>+ return;
>+ }
>+
>+ control_writeln("READY");
>+ control_writeulong(buf_size);
>+ control_expectln("DONE");
>+}
>+
>+static void test_stream_dgram_address_collision_client(const struct test_opts *opts)
>+{
>+ int dgram_fd, stream_fd;
>+
>+ stream_fd = vsock_stream_connect(opts->peer_cid, 1234);
>+ if (stream_fd < 0) {
>+ perror("connect");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* This simply tests if connect() causes address collision client-side.
>+ * Keep in mind that there is no exchange of packets with the
>+ * bound socket on the server.
>+ */
>+ dgram_fd = vsock_dgram_connect(opts->peer_cid, 1234);
>+ if (dgram_fd < 0) {
>+ perror("connect");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ close(stream_fd);
>+ close(dgram_fd);
>+
>+ /* Notify the server that the client has finished */
>+ control_writeln("DONE");
>+}
>+
>+static void test_stream_dgram_address_collision_server(const struct test_opts *opts)
>+{
>+ int dgram_fd, stream_fd;
>+ struct sockaddr_vm addr;
>+ socklen_t addrlen;
>+
>+ stream_fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, 0);
>+ if (stream_fd < 0) {
>+ perror("accept");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* Retrieve the CID/port for re-use. */
>+ addrlen = sizeof(addr);
>+ if (getsockname(stream_fd, (struct sockaddr *)&addr, &addrlen)) {
>+ perror("getsockname");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* See note in the client function about the pairwise connect call. */
>+ dgram_fd = vsock_dgram_bind(addr.svm_cid, addr.svm_port);
>+ if (dgram_fd < 0) {
>+ perror("bind");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ control_expectln("DONE");
>+
>+ close(stream_fd);
>+ close(dgram_fd);
>+}
>+
>+static struct test_case test_cases[] = {
>+ {
>+ .name = "SOCK_STREAM connection reset",
>+ .run_client = test_stream_connection_reset,
>+ },
>+ {
>+ .name = "SOCK_STREAM bind only",
>+ .run_client = test_stream_bind_only_client,
>+ .run_server = test_stream_bind_only_server,
>+ },
>+ {
>+ .name = "SOCK_STREAM client close",
>+ .run_client = test_stream_client_close_client,
>+ .run_server = test_stream_client_close_server,
>+ },
>+ {
>+ .name = "SOCK_STREAM server close",
>+ .run_client = test_stream_server_close_client,
>+ .run_server = test_stream_server_close_server,
>+ },
>+ {
>+ .name = "SOCK_STREAM multiple connections",
>+ .run_client = test_stream_multiconn_client,
>+ .run_server = test_stream_multiconn_server,
>+ },
>+ {
>+ .name = "SOCK_STREAM MSG_PEEK",
>+ .run_client = test_stream_msg_peek_client,
>+ .run_server = test_stream_msg_peek_server,
>+ },
>+ {
>+ .name = "SOCK_SEQPACKET msg bounds",
>+ .run_client = test_seqpacket_msg_bounds_client,
>+ .run_server = test_seqpacket_msg_bounds_server,
>+ },
>+ {
>+ .name = "SOCK_SEQPACKET MSG_TRUNC flag",
>+ .run_client = test_seqpacket_msg_trunc_client,
>+ .run_server = test_seqpacket_msg_trunc_server,
>+ },
>+ {
>+ .name = "SOCK_SEQPACKET timeout",
>+ .run_client = test_seqpacket_timeout_client,
>+ .run_server = test_seqpacket_timeout_server,
>+ },
>+ {
>+ .name = "SOCK_SEQPACKET invalid receive buffer",
>+ .run_client = test_seqpacket_invalid_rec_buffer_client,
>+ .run_server = test_seqpacket_invalid_rec_buffer_server,
>+ },
>+ {
>+ .name = "SOCK_STREAM poll() + SO_RCVLOWAT",
>+ .run_client = test_stream_poll_rcvlowat_client,
>+ .run_server = test_stream_poll_rcvlowat_server,
>+ },
>+ {
>+ .name = "SOCK_SEQPACKET big message",
>+ .run_client = test_seqpacket_bigmsg_client,
>+ .run_server = test_seqpacket_bigmsg_server,
>+ },
>+ {
>+ .name = "SOCK_STREAM test invalid buffer",
>+ .run_client = test_stream_inv_buf_client,
>+ .run_server = test_stream_inv_buf_server,
>+ },
>+ {
>+ .name = "SOCK_SEQPACKET test invalid buffer",
>+ .run_client = test_seqpacket_inv_buf_client,
>+ .run_server = test_seqpacket_inv_buf_server,
>+ },
>+ {
>+ .name = "SOCK_STREAM virtio skb merge",
>+ .run_client = test_stream_virtio_skb_merge_client,
>+ .run_server = test_stream_virtio_skb_merge_server,
>+ },
>+ {
>+ .name = "SOCK_SEQPACKET MSG_PEEK",
>+ .run_client = test_seqpacket_msg_peek_client,
>+ .run_server = test_seqpacket_msg_peek_server,
>+ },
>+ {
>+ .name = "SOCK_STREAM SHUT_WR",
>+ .run_client = test_stream_shutwr_client,
>+ .run_server = test_stream_shutwr_server,
>+ },
>+ {
>+ .name = "SOCK_STREAM SHUT_RD",
>+ .run_client = test_stream_shutrd_client,
>+ .run_server = test_stream_shutrd_server,
>+ },
>+ {
>+ .name = "SOCK_STREAM MSG_ZEROCOPY",
>+ .run_client = test_stream_msgzcopy_client,
>+ .run_server = test_stream_msgzcopy_server,
>+ },
>+ {
>+ .name = "SOCK_SEQPACKET MSG_ZEROCOPY",
>+ .run_client = test_seqpacket_msgzcopy_client,
>+ .run_server = test_seqpacket_msgzcopy_server,
>+ },
>+ {
>+ .name = "SOCK_STREAM MSG_ZEROCOPY empty MSG_ERRQUEUE",
>+ .run_client = test_stream_msgzcopy_empty_errq_client,
>+ .run_server = test_stream_msgzcopy_empty_errq_server,
>+ },
>+ {
>+ .name = "SOCK_STREAM double bind connect",
>+ .run_client = test_double_bind_connect_client,
>+ .run_server = test_double_bind_connect_server,
>+ },
>+ {
>+ .name = "SOCK_STREAM virtio credit update + SO_RCVLOWAT",
>+ .run_client = test_stream_rcvlowat_def_cred_upd_client,
>+ .run_server = test_stream_cred_upd_on_set_rcvlowat,
>+ },
>+ {
>+ .name = "SOCK_STREAM virtio credit update + low rx_bytes",
>+ .run_client = test_stream_rcvlowat_def_cred_upd_client,
>+ .run_server = test_stream_cred_upd_on_low_rx_bytes,
>+ },
>+ {
>+ .name = "SOCK_DGRAM client sendto",
>+ .run_client = test_dgram_sendto_client,
>+ .run_server = test_dgram_sendto_server,
>+ },
>+ {
>+ .name = "SOCK_DGRAM client sendto auto bind",
>+ .run_client = test_dgram_sendto_auto_bind_client,
>+ .run_server = test_dgram_sendto_auto_bind_server,
>+ },
>+ {
>+ .name = "SOCK_DGRAM client connect",
>+ .run_client = test_dgram_connect_client,
>+ .run_server = test_dgram_connect_server,
>+ },
>+ {
>+ .name = "SOCK_DGRAM multiple connections using sendto",
>+ .run_client = test_dgram_multiconn_sendto_client,
>+ .run_server = test_dgram_multiconn_sendto_server,
>+ },
>+ {
>+ .name = "SOCK_DGRAM multiple connections using send",
>+ .run_client = test_dgram_multiconn_send_client,
>+ .run_server = test_dgram_multiconn_send_server,
>+ },
>+ {
>+ .name = "SOCK_DGRAM msg bounds unreliable",
>+ .run_client = test_dgram_bounds_unreliable_client,
>+ .run_server = test_dgram_bounds_unreliable_server,
>+ },
>+ {
>+ .name = "SOCK_DGRAM drop big packets",
>+ .run_client = test_dgram_drop_big_packets_client,
>+ .run_server = test_dgram_drop_big_packets_server,
>+ },
>+ {
>+ .name = "SOCK_STREAM and SOCK_DGRAM address collision",
>+ .run_client = test_stream_dgram_address_collision_client,
>+ .run_server = test_stream_dgram_address_collision_server,
> },
> {},
> };
>--
>2.20.1
>
^ permalink raw reply [flat|nested] 51+ messages in thread
* [RFC PATCH net-next v6 02/14] af_vsock: refactor transport lookup code
2024-07-10 21:25 ` [RFC PATCH net-next v6 02/14] af_vsock: refactor transport lookup code Amery Hung
@ 2024-07-25 6:29 ` Arseniy Krasnov
2024-07-28 22:10 ` Amery Hung
0 siblings, 1 reply; 51+ messages in thread
From: Arseniy Krasnov @ 2024-07-25 6:29 UTC (permalink / raw)
To: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers
Cc: dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, ameryhung, xiyou.wangcong
Hi
+static const struct vsock_transport *
+vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
^^^ may be just 'u8' ?
+{
+ const struct vsock_transport *transport;
^^^ do we really need this variable now?
May be shorter like:
if (A)
return transport_local;
else if (B)
return transport_g2h;
else
return transport_h2g;
+
+ if (vsock_use_local_transport(cid))
+ transport = transport_local;
+ else if (cid <= VMADDR_CID_HOST || !transport_h2g ||
+ (flags & VMADDR_FLAG_TO_HOST))
+ transport = transport_g2h;
+ else
+ transport = transport_h2g;
+
+ return transport;
+}
+
Thanks
* Re: [RFC PATCH net-next v6 07/14] virtio/vsock: add common datagram send path
2024-07-23 14:42 ` Stefano Garzarella
@ 2024-07-26 23:22 ` Amery Hung
2024-07-30 8:22 ` Stefano Garzarella
0 siblings, 1 reply; 51+ messages in thread
From: Amery Hung @ 2024-07-26 23:22 UTC (permalink / raw)
To: Stefano Garzarella
Cc: stefanha, mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
kys, haiyangz, wei.liu, decui, bryantan, vdasa, pv-drivers,
dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, xiyou.wangcong
On Tue, Jul 23, 2024 at 7:42 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Wed, Jul 10, 2024 at 09:25:48PM GMT, Amery Hung wrote:
> >From: Bobby Eshleman <bobby.eshleman@bytedance.com>
> >
> >This commit implements the common function
> >virtio_transport_dgram_enqueue for enqueueing datagrams. It does not add
> >usage in either vhost or virtio yet.
> >
> >Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> >Signed-off-by: Amery Hung <amery.hung@bytedance.com>
> >---
> > include/linux/virtio_vsock.h | 1 +
> > include/net/af_vsock.h | 2 +
> > net/vmw_vsock/af_vsock.c | 2 +-
> > net/vmw_vsock/virtio_transport_common.c | 87 ++++++++++++++++++++++++-
> > 4 files changed, 90 insertions(+), 2 deletions(-)
> >
> >diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> >index f749a066af46..4408749febd2 100644
> >--- a/include/linux/virtio_vsock.h
> >+++ b/include/linux/virtio_vsock.h
> >@@ -152,6 +152,7 @@ struct virtio_vsock_pkt_info {
> > u16 op;
> > u32 flags;
> > bool reply;
> >+ u8 remote_flags;
> > };
> >
> > struct virtio_transport {
> >diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> >index 44db8f2c507d..6e97d344ac75 100644
> >--- a/include/net/af_vsock.h
> >+++ b/include/net/af_vsock.h
> >@@ -216,6 +216,8 @@ void vsock_for_each_connected_socket(struct vsock_transport *transport,
> > void (*fn)(struct sock *sk));
> > int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
> > bool vsock_find_cid(unsigned int cid);
> >+const struct vsock_transport *vsock_dgram_lookup_transport(unsigned int cid,
> >+ __u8 flags);
>
> Why __u8 and not just u8?
>
Will change to u8.
>
> >
> > struct vsock_skb_cb {
> > unsigned int src_cid;
> >diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> >index ab08cd81720e..f83b655fdbe9 100644
> >--- a/net/vmw_vsock/af_vsock.c
> >+++ b/net/vmw_vsock/af_vsock.c
> >@@ -487,7 +487,7 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
> > return transport;
> > }
> >
> >-static const struct vsock_transport *
> >+const struct vsock_transport *
> > vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
> > {
> > const struct vsock_transport *transport;
> >diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> >index a1c76836d798..46cd1807f8e3 100644
> >--- a/net/vmw_vsock/virtio_transport_common.c
> >+++ b/net/vmw_vsock/virtio_transport_common.c
> >@@ -1040,13 +1040,98 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode)
> > }
> > EXPORT_SYMBOL_GPL(virtio_transport_shutdown);
> >
> >+static int virtio_transport_dgram_send_pkt_info(struct vsock_sock *vsk,
> >+ struct virtio_vsock_pkt_info *info)
> >+{
> >+ u32 src_cid, src_port, dst_cid, dst_port;
> >+ const struct vsock_transport *transport;
> >+ const struct virtio_transport *t_ops;
> >+ struct sock *sk = sk_vsock(vsk);
> >+ struct virtio_vsock_hdr *hdr;
> >+ struct sk_buff *skb;
> >+ void *payload;
> >+ int noblock = 0;
> >+ int err;
> >+
> >+ info->type = virtio_transport_get_type(sk_vsock(vsk));
> >+
> >+ if (info->pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> >+ return -EMSGSIZE;
> >+
> >+ transport = vsock_dgram_lookup_transport(info->remote_cid, info->remote_flags);
>
> Can `transport` be null?
>
> I don't understand why we are calling vsock_dgram_lookup_transport()
> again. Didn't we already do that in vsock_dgram_sendmsg()?
>
transport should be valid here since we null-checked it in
vsock_dgram_sendmsg(). The reason vsock_dgram_lookup_transport() is
called again here is that we don't have the transport when we call into
transport->dgram_enqueue(). Alternatively, I can add transport to the
arguments of dgram_enqueue() to eliminate this redundant lookup.
> Also should we add a comment mentioning that we can't use
> virtio_transport_get_ops()? IIUC because the vsk can be not assigned
> to a specific transport, right?
>
Correct. For a virtio dgram socket, the transport is not assigned unless
vsock_dgram_connect() is called. I will add a comment here explaining
this.
> >+ t_ops = container_of(transport, struct virtio_transport, transport);
> >+ if (unlikely(!t_ops))
> >+ return -EFAULT;
> >+
> >+ if (info->msg)
> >+ noblock = info->msg->msg_flags & MSG_DONTWAIT;
> >+
> >+ /* Use sock_alloc_send_skb to throttle by sk_sndbuf. This helps avoid
> >+ * triggering the OOM.
> >+ */
> >+ skb = sock_alloc_send_skb(sk, info->pkt_len + VIRTIO_VSOCK_SKB_HEADROOM,
> >+ noblock, &err);
> >+ if (!skb)
> >+ return err;
> >+
> >+ skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM);
> >+
> >+ src_cid = t_ops->transport.get_local_cid();
> >+ src_port = vsk->local_addr.svm_port;
> >+ dst_cid = info->remote_cid;
> >+ dst_port = info->remote_port;
> >+
> >+ hdr = virtio_vsock_hdr(skb);
> >+ hdr->type = cpu_to_le16(info->type);
> >+ hdr->op = cpu_to_le16(info->op);
> >+ hdr->src_cid = cpu_to_le64(src_cid);
> >+ hdr->dst_cid = cpu_to_le64(dst_cid);
> >+ hdr->src_port = cpu_to_le32(src_port);
> >+ hdr->dst_port = cpu_to_le32(dst_port);
> >+ hdr->flags = cpu_to_le32(info->flags);
> >+ hdr->len = cpu_to_le32(info->pkt_len);
> >+
> >+ if (info->msg && info->pkt_len > 0) {
> >+ payload = skb_put(skb, info->pkt_len);
> >+ err = memcpy_from_msg(payload, info->msg, info->pkt_len);
> >+ if (err)
> >+ goto out;
> >+ }
> >+
> >+ trace_virtio_transport_alloc_pkt(src_cid, src_port,
> >+ dst_cid, dst_port,
> >+ info->pkt_len,
> >+ info->type,
> >+ info->op,
> >+ info->flags,
> >+ false);
> >+
> >+ return t_ops->send_pkt(skb);
> >+out:
> >+ kfree_skb(skb);
> >+ return err;
> >+}
> >+
> > int
> > virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
> > struct sockaddr_vm *remote_addr,
> > struct msghdr *msg,
> > size_t dgram_len)
> > {
> >- return -EOPNOTSUPP;
> >+ /* Here we are only using the info struct to retain style uniformity
> >+ * and to ease future refactoring and merging.
> >+ */
> >+ struct virtio_vsock_pkt_info info = {
> >+ .op = VIRTIO_VSOCK_OP_RW,
> >+ .remote_cid = remote_addr->svm_cid,
> >+ .remote_port = remote_addr->svm_port,
> >+ .remote_flags = remote_addr->svm_flags,
> >+ .msg = msg,
> >+ .vsk = vsk,
> >+ .pkt_len = dgram_len,
> >+ };
> >+
> >+ return virtio_transport_dgram_send_pkt_info(vsk, &info);
> > }
> > EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
> >
> >--
> >2.20.1
> >
>
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFC PATCH net-next v6 04/14] af_vsock: generalize bind table functions
2024-07-23 14:39 ` Stefano Garzarella
@ 2024-07-28 18:52 ` Amery Hung
2024-07-30 8:00 ` Stefano Garzarella
0 siblings, 1 reply; 51+ messages in thread
From: Amery Hung @ 2024-07-28 18:52 UTC (permalink / raw)
To: Stefano Garzarella
Cc: stefanha, mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
kys, haiyangz, wei.liu, decui, bryantan, vdasa, pv-drivers,
dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, xiyou.wangcong
On Tue, Jul 23, 2024 at 7:40 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Wed, Jul 10, 2024 at 09:25:45PM GMT, Amery Hung wrote:
> >From: Bobby Eshleman <bobby.eshleman@bytedance.com>
> >
> >This commit makes the bind table management functions in vsock usable
> >for different bind tables. Future work will introduce a new table for
> >datagrams to avoid address collisions, and these functions will be used
> >there.
> >
> >Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> >---
> > net/vmw_vsock/af_vsock.c | 34 +++++++++++++++++++++++++++-------
> > 1 file changed, 27 insertions(+), 7 deletions(-)
> >
> >diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> >index acc15e11700c..d571be9cdbf0 100644
> >--- a/net/vmw_vsock/af_vsock.c
> >+++ b/net/vmw_vsock/af_vsock.c
> >@@ -232,11 +232,12 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
> > sock_put(&vsk->sk);
> > }
> >
> >-static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> >+static struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
> >+ struct list_head *bind_table)
> > {
> > struct vsock_sock *vsk;
> >
> >- list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
> >+ list_for_each_entry(vsk, bind_table, bound_table) {
> > if (vsock_addr_equals_addr(addr, &vsk->local_addr))
> > return sk_vsock(vsk);
> >
> >@@ -249,6 +250,11 @@ static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> > return NULL;
> > }
> >
> >+static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> >+{
> >+ return vsock_find_bound_socket_common(addr, vsock_bound_sockets(addr));
> >+}
> >+
> > static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
> > struct sockaddr_vm *dst)
> > {
> >@@ -671,12 +677,18 @@ static void vsock_pending_work(struct work_struct *work)
> >
> > /**** SOCKET OPERATIONS ****/
> >
> >-static int __vsock_bind_connectible(struct vsock_sock *vsk,
> >- struct sockaddr_vm *addr)
> >+static int vsock_bind_common(struct vsock_sock *vsk,
> >+ struct sockaddr_vm *addr,
> >+ struct list_head *bind_table,
> >+ size_t table_size)
> > {
> > static u32 port;
> > struct sockaddr_vm new_addr;
> >
> >+ if (WARN_ONCE(table_size < VSOCK_HASH_SIZE,
> >+ "table size too small, may cause overflow"))
> >+ return -EINVAL;
> >+
>
> I'd add this in another commit.
>
> > if (!port)
> > port = get_random_u32_above(LAST_RESERVED_PORT);
> >
> >@@ -692,7 +704,8 @@ static int __vsock_bind_connectible(struct
> >vsock_sock *vsk,
> >
> > new_addr.svm_port = port++;
> >
> >- if (!__vsock_find_bound_socket(&new_addr)) {
> >+ if (!vsock_find_bound_socket_common(&new_addr,
> >+ &bind_table[VSOCK_HASH(addr)])) {
>
> Can we add a macro for `&bind_table[VSOCK_HASH(addr)])` ?
>
Definitely. I will add the following macro:
#define vsock_bound_sockets_in_table(bind_table, addr) \
(&bind_table[VSOCK_HASH(addr)])
> > found = true;
> > break;
> > }
> >@@ -709,7 +722,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> > return -EACCES;
> > }
> >
> >- if (__vsock_find_bound_socket(&new_addr))
> >+ if (vsock_find_bound_socket_common(&new_addr,
> >+ &bind_table[VSOCK_HASH(addr)]))
> > return -EADDRINUSE;
> > }
> >
> >@@ -721,11 +735,17 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> > * by AF_UNIX.
> > */
> > __vsock_remove_bound(vsk);
> >- __vsock_insert_bound(vsock_bound_sockets(&vsk->local_addr), vsk);
> >+ __vsock_insert_bound(&bind_table[VSOCK_HASH(&vsk->local_addr)], vsk);
> >
> > return 0;
> > }
> >
> >+static int __vsock_bind_connectible(struct vsock_sock *vsk,
> >+ struct sockaddr_vm *addr)
> >+{
> >+ return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
>
> What about using ARRAY_SIZE(x) ?
>
> BTW we are using that size just to check it, but all the arrays we use
> are statically allocated, so what about a compile time check like
> BUILD_BUG_ON()?
>
I will remove the table_size check you mentioned earlier, as well as
the argument here, since the arrays are statically allocated as you
mentioned.
If you still think the check is a good addition, I can add a
BUILD_BUG_ON() in the new vsock_bound_sockets_in_table() macro.
Thanks,
Amery
> Thanks,
> Stefano
>
>
> >+}
> >+
> > static int __vsock_bind_dgram(struct vsock_sock *vsk,
> > struct sockaddr_vm *addr)
> > {
> >--
> >2.20.1
> >
>
* Re: [RFC PATCH net-next v6 03/14] af_vsock: support multi-transport datagrams
2024-07-10 21:25 ` [RFC PATCH net-next v6 03/14] af_vsock: support multi-transport datagrams Amery Hung
2024-07-15 8:13 ` Arseniy Krasnov
@ 2024-07-28 20:28 ` Arseniy Krasnov
2024-07-28 21:53 ` Amery Hung
1 sibling, 1 reply; 51+ messages in thread
From: Arseniy Krasnov @ 2024-07-28 20:28 UTC (permalink / raw)
To: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers
Cc: dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, ameryhung, xiyou.wangcong, kernel
Hi Amery
> /* Transport features flags */
> /* Transport provides host->guest communication */
> -#define VSOCK_TRANSPORT_F_H2G 0x00000001
> +#define VSOCK_TRANSPORT_F_H2G 0x00000001
> /* Transport provides guest->host communication */
> -#define VSOCK_TRANSPORT_F_G2H 0x00000002
> -/* Transport provides DGRAM communication */
> -#define VSOCK_TRANSPORT_F_DGRAM 0x00000004
> +#define VSOCK_TRANSPORT_F_G2H 0x00000002
> +/* Transport provides fallback for DGRAM communication */
> +#define VSOCK_TRANSPORT_F_DGRAM_FALLBACK 0x00000004
> /* Transport provides local (loopback) communication */
> -#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
> +#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
^^^ This is refactoring ?
> + /* During vsock_create(), the transport cannot be decided yet if
> + * using virtio. While for VMCI, it is transport_dgram_fallback.
I'm not an English speaker, but 'decided' -> 'detected'/'resolved' ?
Thanks, Arseniy
* Re: [RFC PATCH net-next v6 05/14] af_vsock: use a separate dgram bind table
2024-07-23 14:41 ` Stefano Garzarella
@ 2024-07-28 21:37 ` Amery Hung
2024-07-30 8:05 ` Stefano Garzarella
0 siblings, 1 reply; 51+ messages in thread
From: Amery Hung @ 2024-07-28 21:37 UTC (permalink / raw)
To: Stefano Garzarella
Cc: stefanha, mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
kys, haiyangz, wei.liu, decui, bryantan, vdasa, pv-drivers,
dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, xiyou.wangcong
On Tue, Jul 23, 2024 at 7:41 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Wed, Jul 10, 2024 at 09:25:46PM GMT, Amery Hung wrote:
> >From: Bobby Eshleman <bobby.eshleman@bytedance.com>
> >
> >This commit adds support for bound dgram sockets to be tracked in a
> >separate bind table from connectible sockets in order to avoid address
> >collisions. With this commit, users can simultaneously bind a dgram
> >socket and connectible socket to the same CID and port.
> >
> >Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> >---
> > net/vmw_vsock/af_vsock.c | 103 +++++++++++++++++++++++++++++----------
> > 1 file changed, 76 insertions(+), 27 deletions(-)
> >
> >diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> >index d571be9cdbf0..ab08cd81720e 100644
> >--- a/net/vmw_vsock/af_vsock.c
> >+++ b/net/vmw_vsock/af_vsock.c
> >@@ -10,18 +10,23 @@
> > * - There are two kinds of sockets: those created by user action (such as
> > * calling socket(2)) and those created by incoming connection request packets.
> > *
> >- * - There are two "global" tables, one for bound sockets (sockets that have
> >- * specified an address that they are responsible for) and one for connected
> >- * sockets (sockets that have established a connection with another socket).
> >- * These tables are "global" in that all sockets on the system are placed
> >- * within them. - Note, though, that the bound table contains an extra entry
> >- * for a list of unbound sockets and SOCK_DGRAM sockets will always remain in
> >- * that list. The bound table is used solely for lookup of sockets when packets
> >- * are received and that's not necessary for SOCK_DGRAM sockets since we create
> >- * a datagram handle for each and need not perform a lookup. Keeping SOCK_DGRAM
> >- * sockets out of the bound hash buckets will reduce the chance of collisions
> >- * when looking for SOCK_STREAM sockets and prevents us from having to check the
> >- * socket type in the hash table lookups.
> >+ * - There are three "global" tables, one for bound connectible (stream /
> >+ * seqpacket) sockets, one for bound datagram sockets, and one for connected
> >+ * sockets. Bound sockets are sockets that have specified an address that
> >+ * they are responsible for. Connected sockets are sockets that have
> >+ * established a connection with another socket. These tables are "global" in
> >+ * that all sockets on the system are placed within them. - Note, though,
> >+ * that the bound tables contain an extra entry for a list of unbound
> >+ * sockets. The bound tables are used solely for lookup of sockets when packets
> >+ * are received.
> >+ *
> >+ * - There are separate bind tables for connectible and datagram sockets to avoid
> >+ * address collisions between stream/seqpacket sockets and datagram sockets.
> >+ *
> >+ * - Transports may elect to NOT use the global datagram bind table by
> >+ * implementing the ->dgram_bind() callback. If that callback is implemented,
> >+ * the global bind table is not used and the responsibility of bound datagram
> >+ * socket tracking is deferred to the transport.
> > *
> > * - Sockets created by user action will either be "client" sockets that
> > * initiate a connection or "server" sockets that listen for connections; we do
> >@@ -116,6 +121,7 @@
> > static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
> > static void vsock_sk_destruct(struct sock *sk);
> > static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
> >+static bool sock_type_connectible(u16 type);
> >
> > /* Protocol family. */
> > struct proto vsock_proto = {
> >@@ -152,21 +158,25 @@ static DEFINE_MUTEX(vsock_register_mutex);
> > * VSocket is stored in the connected hash table.
> > *
> > * Unbound sockets are all put on the same list attached to the end of the hash
> >- * table (vsock_unbound_sockets). Bound sockets are added to the hash table in
> >- * the bucket that their local address hashes to (vsock_bound_sockets(addr)
> >- * represents the list that addr hashes to).
> >+ * tables (vsock_unbound_sockets/vsock_unbound_dgram_sockets). Bound sockets
> >+ * are added to the hash table in the bucket that their local address hashes to
> >+ * (vsock_bound_sockets(addr) and vsock_bound_dgram_sockets(addr) represents
> >+ * the list that addr hashes to).
> > *
> >- * Specifically, we initialize the vsock_bind_table array to a size of
> >- * VSOCK_HASH_SIZE + 1 so that vsock_bind_table[0] through
> >- * vsock_bind_table[VSOCK_HASH_SIZE - 1] are for bound sockets and
> >- * vsock_bind_table[VSOCK_HASH_SIZE] is for unbound sockets. The hash function
> >- * mods with VSOCK_HASH_SIZE to ensure this.
> >+ * Specifically, taking connectible sockets as an example we initialize the
> >+ * vsock_bind_table array to a size of VSOCK_HASH_SIZE + 1 so that
> >+ * vsock_bind_table[0] through vsock_bind_table[VSOCK_HASH_SIZE - 1] are for
> >+ * bound sockets and vsock_bind_table[VSOCK_HASH_SIZE] is for unbound sockets.
> >+ * The hash function mods with VSOCK_HASH_SIZE to ensure this.
> >+ * Datagrams and vsock_dgram_bind_table operate in the same way.
> > */
> > #define MAX_PORT_RETRIES 24
> >
> > #define VSOCK_HASH(addr) ((addr)->svm_port % VSOCK_HASH_SIZE)
> > #define vsock_bound_sockets(addr) (&vsock_bind_table[VSOCK_HASH(addr)])
> >+#define vsock_bound_dgram_sockets(addr) (&vsock_dgram_bind_table[VSOCK_HASH(addr)])
> > #define vsock_unbound_sockets (&vsock_bind_table[VSOCK_HASH_SIZE])
> >+#define vsock_unbound_dgram_sockets (&vsock_dgram_bind_table[VSOCK_HASH_SIZE])
> >
> > /* XXX This can probably be implemented in a better way. */
> > #define VSOCK_CONN_HASH(src, dst) \
> >@@ -182,6 +192,8 @@ struct list_head vsock_connected_table[VSOCK_HASH_SIZE];
> > EXPORT_SYMBOL_GPL(vsock_connected_table);
> > DEFINE_SPINLOCK(vsock_table_lock);
> > EXPORT_SYMBOL_GPL(vsock_table_lock);
> >+static struct list_head vsock_dgram_bind_table[VSOCK_HASH_SIZE + 1];
> >+static DEFINE_SPINLOCK(vsock_dgram_table_lock);
> >
> > /* Autobind this socket to the local address if necessary. */
> > static int vsock_auto_bind(struct vsock_sock *vsk)
> >@@ -204,6 +216,9 @@ static void vsock_init_tables(void)
> >
> > for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++)
> > INIT_LIST_HEAD(&vsock_connected_table[i]);
> >+
> >+ for (i = 0; i < ARRAY_SIZE(vsock_dgram_bind_table); i++)
> >+ INIT_LIST_HEAD(&vsock_dgram_bind_table[i]);
> > }
> >
> > static void __vsock_insert_bound(struct list_head *list,
> >@@ -271,13 +286,28 @@ static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
> > return NULL;
> > }
> >
> >-static void vsock_insert_unbound(struct vsock_sock *vsk)
> >+static void __vsock_insert_dgram_unbound(struct vsock_sock *vsk)
> >+{
> >+ spin_lock_bh(&vsock_dgram_table_lock);
> >+ __vsock_insert_bound(vsock_unbound_dgram_sockets, vsk);
> >+ spin_unlock_bh(&vsock_dgram_table_lock);
> >+}
> >+
> >+static void __vsock_insert_connectible_unbound(struct vsock_sock *vsk)
> > {
> > spin_lock_bh(&vsock_table_lock);
> > __vsock_insert_bound(vsock_unbound_sockets, vsk);
> > spin_unlock_bh(&vsock_table_lock);
> > }
> >
> >+static void vsock_insert_unbound(struct vsock_sock *vsk)
> >+{
> >+ if (sock_type_connectible(sk_vsock(vsk)->sk_type))
> >+ __vsock_insert_connectible_unbound(vsk);
> >+ else
> >+ __vsock_insert_dgram_unbound(vsk);
> >+}
> >+
> > void vsock_insert_connected(struct vsock_sock *vsk)
> > {
> > struct list_head *list = vsock_connected_sockets(
> >@@ -289,6 +319,14 @@ void vsock_insert_connected(struct vsock_sock *vsk)
> > }
> > EXPORT_SYMBOL_GPL(vsock_insert_connected);
> >
> >+static void vsock_remove_dgram_bound(struct vsock_sock *vsk)
> >+{
> >+ spin_lock_bh(&vsock_dgram_table_lock);
> >+ if (__vsock_in_bound_table(vsk))
> >+ __vsock_remove_bound(vsk);
> >+ spin_unlock_bh(&vsock_dgram_table_lock);
> >+}
> >+
> > void vsock_remove_bound(struct vsock_sock *vsk)
> > {
> > spin_lock_bh(&vsock_table_lock);
> >@@ -340,7 +378,10 @@ EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
> >
> > void vsock_remove_sock(struct vsock_sock *vsk)
> > {
> >- vsock_remove_bound(vsk);
> >+ if (sock_type_connectible(sk_vsock(vsk)->sk_type))
> >+ vsock_remove_bound(vsk);
> >+ else
> >+ vsock_remove_dgram_bound(vsk);
>
> Can we try to be consistent, for example we have vsock_insert_unbound()
> which calls internally sock_type_connectible(), while
> vsock_remove_bound() is just for connectible sockets. It's a bit
> confusing.
I agree with you. I will make the style more consistent by having
vsock_insert_unbound() work only on connectible sockets.
>
> > vsock_remove_connected(vsk);
> > }
> > EXPORT_SYMBOL_GPL(vsock_remove_sock);
> >@@ -746,11 +787,19 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> > return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
> > }
> >
> >-static int __vsock_bind_dgram(struct vsock_sock *vsk,
> >- struct sockaddr_vm *addr)
> >+static int vsock_bind_dgram(struct vsock_sock *vsk,
> >+ struct sockaddr_vm *addr)
>
> Why we are renaming this?
I will keep the original __vsock_bind_dgram() for consistency.
>
> > {
> >- if (!vsk->transport || !vsk->transport->dgram_bind)
> >- return -EINVAL;
> >+ if (!vsk->transport || !vsk->transport->dgram_bind) {
>
> Why this condition?
>
> Maybe a comment here is needed because I'm lost...
We currently use vsk->transport->dgram_bind to tell whether this is
the VMCI dgram transport: VMCI implements dgram_bind and tracks bound
sockets itself, while transports without it use the global dgram bind
table. I will add a comment explaining this.
>
> >+ int retval;
> >+
> >+ spin_lock_bh(&vsock_dgram_table_lock);
> >+ retval = vsock_bind_common(vsk, addr, vsock_dgram_bind_table,
> >+ VSOCK_HASH_SIZE);
>
> Should we use VSOCK_HASH_SIZE + 1 here?
>
> Using ARRAY_SIZE(x) should avoid this problem.
Yes. The size here is wrong. I will remove the size check (the
discussion is in patch 4).
Thanks,
Amery
>
>
> >+ spin_unlock_bh(&vsock_dgram_table_lock);
> >+
> >+ return retval;
> >+ }
> >
> > return vsk->transport->dgram_bind(vsk, addr);
> > }
> >@@ -781,7 +830,7 @@ static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
> > break;
> >
> > case SOCK_DGRAM:
> >- retval = __vsock_bind_dgram(vsk, addr);
> >+ retval = vsock_bind_dgram(vsk, addr);
> > break;
> >
> > default:
> >--
> >2.20.1
> >
>
* Re: [RFC PATCH net-next v6 03/14] af_vsock: support multi-transport datagrams
2024-07-28 20:28 ` Arseniy Krasnov
@ 2024-07-28 21:53 ` Amery Hung
2024-07-29 5:12 ` Arseniy Krasnov
0 siblings, 1 reply; 51+ messages in thread
From: Amery Hung @ 2024-07-28 21:53 UTC (permalink / raw)
To: Arseniy Krasnov
Cc: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers, dan.carpenter, simon.horman, oxffffaa, kvm,
virtualization, netdev, linux-kernel, linux-hyperv, bpf,
bobby.eshleman, jiang.wang, amery.hung, xiyou.wangcong, kernel
On Sun, Jul 28, 2024 at 1:40 PM Arseniy Krasnov
<avkrasnov@salutedevices.com> wrote:
>
> Hi Amery
>
> > /* Transport features flags */
> > /* Transport provides host->guest communication */
> > -#define VSOCK_TRANSPORT_F_H2G 0x00000001
> > +#define VSOCK_TRANSPORT_F_H2G 0x00000001
> > /* Transport provides guest->host communication */
> > -#define VSOCK_TRANSPORT_F_G2H 0x00000002
> > -/* Transport provides DGRAM communication */
> > -#define VSOCK_TRANSPORT_F_DGRAM 0x00000004
> > +#define VSOCK_TRANSPORT_F_G2H 0x00000002
> > +/* Transport provides fallback for DGRAM communication */
> > +#define VSOCK_TRANSPORT_F_DGRAM_FALLBACK 0x00000004
> > /* Transport provides local (loopback) communication */
> > -#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
> > +#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
>
> ^^^ This is refactoring ?
>
This part contains no functional change.
Since virtio datagrams use transport_h2g/g2h instead of VMCI's
transport_dgram (renamed to transport_dgram_fallback in this patch),
we rename the flags here to describe the transports more accurately.
For a datagram vsock, during socket creation, if VMCI is present,
transport_dgram will be registered as a fallback.
During vsock_dgram_sendmsg(), we will always try to resolve the
transport to transport_h2g/g2h/local first and then fall back on
transport_dgram.
Let me know if there is anything that is confusing here.
>
> > + /* During vsock_create(), the transport cannot be decided yet if
> > + * using virtio. While for VMCI, it is transport_dgram_fallback.
>
>
> I'm not English speaker, but 'decided' -> 'detected'/'resolved' ?
>
Not a native English speaker either, but I think resolve is also
pretty accurate.
Thanks,
Amery
>
>
> Thanks, Arseniy
* Re: [RFC PATCH net-next v6 14/14] test/vsock: add vsock dgram tests
2024-07-23 14:43 ` Stefano Garzarella
@ 2024-07-28 22:06 ` Amery Hung
0 siblings, 0 replies; 51+ messages in thread
From: Amery Hung @ 2024-07-28 22:06 UTC (permalink / raw)
To: Stefano Garzarella
Cc: stefanha, mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
kys, haiyangz, wei.liu, decui, bryantan, vdasa, pv-drivers,
dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, xiyou.wangcong
On Tue, Jul 23, 2024 at 7:43 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Wed, Jul 10, 2024 at 09:25:55PM GMT, Amery Hung wrote:
> >From: Bobby Eshleman <bobby.eshleman@bytedance.com>
> >
> >From: Jiang Wang <jiang.wang@bytedance.com>
> >
> >This commit adds tests for vsock datagram.
> >
> >Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> >Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> >Signed-off-by: Amery Hung <amery.hung@bytedance.com>
> >---
> > tools/testing/vsock/util.c | 177 ++++-
> > tools/testing/vsock/util.h | 10 +
> > tools/testing/vsock/vsock_test.c | 1032 ++++++++++++++++++++++++++----
> > 3 files changed, 1099 insertions(+), 120 deletions(-)
> >
> >diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
> >index 554b290fefdc..14d6cd90ca15 100644
> >--- a/tools/testing/vsock/util.c
> >+++ b/tools/testing/vsock/util.c
> >@@ -154,7 +154,8 @@ static int vsock_connect(unsigned int cid, unsigned int port, int type)
> > int ret;
> > int fd;
> >
> >- control_expectln("LISTENING");
> >+ if (type != SOCK_DGRAM)
> >+ control_expectln("LISTENING");
>
> Why it is not needed?
>
I think we actually need it. I will add control_write("LISTENING") in
vsock_dgram_bind().
> BTW this patch is too big to be reviewed, please split it.
Will do.
Thank you,
Amery
>
> Thanks,
> Stefano
>
> >
> > fd = socket(AF_VSOCK, type, 0);
> > if (fd < 0) {
> >@@ -189,6 +190,11 @@ int vsock_seqpacket_connect(unsigned int cid, unsigned int port)
> > return vsock_connect(cid, port, SOCK_SEQPACKET);
> > }
> >
[...]
* Re: [RFC PATCH net-next v6 02/14] af_vsock: refactor transport lookup code
2024-07-25 6:29 ` Arseniy Krasnov
@ 2024-07-28 22:10 ` Amery Hung
0 siblings, 0 replies; 51+ messages in thread
From: Amery Hung @ 2024-07-28 22:10 UTC (permalink / raw)
To: Arseniy Krasnov
Cc: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers, dan.carpenter, simon.horman, oxffffaa, kvm,
virtualization, netdev, linux-kernel, linux-hyperv, bpf,
bobby.eshleman, jiang.wang, amery.hung, xiyou.wangcong
On Wed, Jul 24, 2024 at 11:41 PM Arseniy Krasnov
<avkrasnov@salutedevices.com> wrote:
>
> Hi
>
> +static const struct vsock_transport *
> +vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
^^^ maybe just 'u8' ?
> +{
> + const struct vsock_transport *transport;
>                 ^^^ do we really need this variable now?
>                 Maybe shorter, like:
> if (A)
> return transport_local;
> else if (B)
> return transport_g2h;
> else
> return transport_h2g;
Looks good to me. Will change it in the next version.
Thanks,
Amery
> +
> + if (vsock_use_local_transport(cid))
> + transport = transport_local;
> + else if (cid <= VMADDR_CID_HOST || !transport_h2g ||
> + (flags & VMADDR_FLAG_TO_HOST))
> + transport = transport_g2h;
> + else
> + transport = transport_h2g;
> +
> + return transport;
> +}
> +
>
> Thanks
* Re: [RFC PATCH net-next v6 03/14] af_vsock: support multi-transport datagrams
2024-07-28 21:53 ` Amery Hung
@ 2024-07-29 5:12 ` Arseniy Krasnov
0 siblings, 0 replies; 51+ messages in thread
From: Arseniy Krasnov @ 2024-07-29 5:12 UTC (permalink / raw)
To: Amery Hung
Cc: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers, dan.carpenter, simon.horman, oxffffaa, kvm,
virtualization, netdev, linux-kernel, linux-hyperv, bpf,
bobby.eshleman, jiang.wang, amery.hung, xiyou.wangcong, kernel
On 29.07.2024 00:53, Amery Hung wrote:
> On Sun, Jul 28, 2024 at 1:40 PM Arseniy Krasnov
> <avkrasnov@salutedevices.com> wrote:
>>
>> Hi Amery
>>
>>> /* Transport features flags */
>>> /* Transport provides host->guest communication */
>>> -#define VSOCK_TRANSPORT_F_H2G 0x00000001
>>> +#define VSOCK_TRANSPORT_F_H2G 0x00000001
>>> /* Transport provides guest->host communication */
>>> -#define VSOCK_TRANSPORT_F_G2H 0x00000002
>>> -/* Transport provides DGRAM communication */
>>> -#define VSOCK_TRANSPORT_F_DGRAM 0x00000004
>>> +#define VSOCK_TRANSPORT_F_G2H 0x00000002
>>> +/* Transport provides fallback for DGRAM communication */
>>> +#define VSOCK_TRANSPORT_F_DGRAM_FALLBACK 0x00000004
>>> /* Transport provides local (loopback) communication */
>>> -#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
>>> +#define VSOCK_TRANSPORT_F_LOCAL 0x00000008
>>
>> ^^^ This is refactoring ?
>>
>
> This part contains no functional change.
Ah I see, sorry )
Thanks, Arseniy
>
> Since virtio dgram uses transport_h2g/g2h instead of transport_dgram
> (renamed totransport_dgam_fallback to in this patch) of VMCI, we
> rename the flags here to describe the transport in a more accurate
> way.
>
> For a datagram vsock, during socket creation, if VMCI is present,
> transport_dgram will be registered as a fallback.
>
> During vsock_dgram_sendmsg(), we will always try to resolve the
> transport to transport_h2g/g2h/local first and then fallback on
> transport_dgram.
>
> Let me know if there is anything that is confusing here.
>
>>
>>> + /* During vsock_create(), the transport cannot be decided yet if
>>> + * using virtio. While for VMCI, it is transport_dgram_fallback.
>>
>>
>> I'm not English speaker, but 'decided' -> 'detected'/'resolved' ?
>>
>
> Not a native English speaker either, but I think resolve is also
> pretty accurate.
>
> Thanks,
> Amery
>
>>
>>
>> Thanks, Arseniy
* [RFC PATCH net-next v6 01/14] af_vsock: generalize vsock_dgram_recvmsg() to all transports
2024-07-10 21:25 ` [RFC PATCH net-next v6 01/14] af_vsock: generalize vsock_dgram_recvmsg() to all transports Amery Hung
2024-07-15 8:02 ` Luigi Leonardi
@ 2024-07-29 19:25 ` Arseniy Krasnov
1 sibling, 0 replies; 51+ messages in thread
From: Arseniy Krasnov @ 2024-07-29 19:25 UTC (permalink / raw)
To: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers
Cc: dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, ameryhung, xiyou.wangcong, kernel
> @@ -1273,11 +1273,15 @@ static int vsock_dgram_connect(struct socket *sock,
> int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> size_t len, int flags)
> {
> + struct vsock_skb_cb *vsock_cb;
> #ifdef CONFIG_BPF_SYSCALL
> const struct proto *prot;
> #endif
> struct vsock_sock *vsk;
> + struct sk_buff *skb;
> + size_t payload_len;
> struct sock *sk;
> + int err;
>
> sk = sock->sk;
> vsk = vsock_sk(sk);
> @@ -1288,7 +1292,43 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> return prot->recvmsg(sk, msg, len, flags, NULL);
> #endif
>
> - return vsk->transport->dgram_dequeue(vsk, msg, len, flags);
> + if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> + return -EOPNOTSUPP;
> +
> + if (unlikely(flags & MSG_ERRQUEUE))
> + return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, 0);
> +
> + /* Retrieve the head sk_buff from the socket's receive queue. */
> + err = 0;
> + skb = skb_recv_datagram(sk_vsock(vsk), flags, &err);
> + if (!skb)
> + return err;
> +
> + payload_len = skb->len;
> +
> + if (payload_len > len) {
> + payload_len = len;
> + msg->msg_flags |= MSG_TRUNC;
> + }
> +
> + /* Place the datagram payload in the user's iovec. */
> + err = skb_copy_datagram_msg(skb, 0, msg, payload_len);
> + if (err)
> + goto out;
> +
> + if (msg->msg_name) {
> + /* Provide the address of the sender. */
> + DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> +
> + vsock_cb = vsock_skb_cb(skb);
Maybe we can declare 'vsock_cb' here, reducing its scope?
> + vsock_addr_init(vm_addr, vsock_cb->src_cid, vsock_cb->src_port);
> + msg->msg_namelen = sizeof(*vm_addr);
> + }
> + err = payload_len;
> +
> +out:
> + skb_free_datagram(&vsk->sk, skb);
> + return err;
> }
> EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg);
>
> --- a/net/vmw_vsock/vmci_transport.c
> +++ b/net/vmw_vsock/vmci_transport.c
> @@ -610,6 +610,7 @@ vmci_transport_datagram_create_hnd(u32 resource_id,
>
> static int vmci_transport_recv_dgram_cb(void *data, struct vmci_datagram *dg)
> {
> + struct vsock_skb_cb *vsock_cb;
> struct sock *sk;
> size_t size;
> struct sk_buff *skb;
> @@ -637,10 +638,14 @@ static int vmci_transport_recv_dgram_cb(void *data, struct vmci_datagram *dg)
> if (!skb)
> return VMCI_ERROR_NO_MEM;
>
> + vsock_cb = vsock_skb_cb(skb);
> + vsock_cb->src_cid = dg->src.context;
> + vsock_cb->src_port = dg->src.resource;
> /* sk_receive_skb() will do a sock_put(), so hold here. */
> sock_hold(sk);
> skb_put(skb, size);
> memcpy(skb->data, dg, size);
> + skb_pull(skb, VMCI_DG_HEADERSIZE);
Small suggestion: here we do:
1) skb_put(skb, size of entire datagram)
2) memcpy(entire datagram)
3) skb_pull(VMCI_DG_HEADERSIZE)
If we provide only data to the upper layer, we can do:
1) skb_put(dg->payload_size)
2) memcpy(dg->payload_size)
Also (I'm no expert in VMCI), I guess using dg->payload_size is a
safer way to know the number of data bytes than relying on
VMCI_DG_HEADERSIZE.
WDYT?
> sk_receive_skb(sk, skb, 0);
>
> return VMCI_SUCCESS;
> @@ -1731,59 +1736,6 @@ static int vmci_transport_dgram_enqueue(
> return err - sizeof(*dg);
> }
>
Thanks
* [RFC PATCH net-next v6 07/14] virtio/vsock: add common datagram send path
2024-07-10 21:25 ` [RFC PATCH net-next v6 07/14] virtio/vsock: add common datagram send path Amery Hung
2024-07-23 14:42 ` Stefano Garzarella
@ 2024-07-29 20:00 ` Arseniy Krasnov
2024-07-29 22:51 ` Amery Hung
1 sibling, 1 reply; 51+ messages in thread
From: Arseniy Krasnov @ 2024-07-29 20:00 UTC (permalink / raw)
To: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers
Cc: dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, ameryhung, xiyou.wangcong, kernel
Hi,
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index a1c76836d798..46cd1807f8e3 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -1040,13 +1040,98 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode)
> }
> EXPORT_SYMBOL_GPL(virtio_transport_shutdown);
>
> +static int virtio_transport_dgram_send_pkt_info(struct vsock_sock *vsk,
> + struct virtio_vsock_pkt_info *info)
> +{
> + u32 src_cid, src_port, dst_cid, dst_port;
> + const struct vsock_transport *transport;
> + const struct virtio_transport *t_ops;
> + struct sock *sk = sk_vsock(vsk);
> + struct virtio_vsock_hdr *hdr;
> + struct sk_buff *skb;
> + void *payload;
> + int noblock = 0;
> + int err;
> +
> + info->type = virtio_transport_get_type(sk_vsock(vsk));
> +
> + if (info->pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> + return -EMSGSIZE;
Small suggestion: I think we can check the packet length earlier, before the
info->type = ... assignment?
> +
> + transport = vsock_dgram_lookup_transport(info->remote_cid, info->remote_flags);
> + t_ops = container_of(transport, struct virtio_transport, transport);
> + if (unlikely(!t_ops))
> + return -EFAULT;
> +
> + if (info->msg)
> + noblock = info->msg->msg_flags & MSG_DONTWAIT;
> +
> + /* Use sock_alloc_send_skb to throttle by sk_sndbuf. This helps avoid
> + * triggering the OOM.
> + */
> + skb = sock_alloc_send_skb(sk, info->pkt_len + VIRTIO_VSOCK_SKB_HEADROOM,
> + noblock, &err);
> + if (!skb)
> + return err;
> +
> + skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM);
> +
> + src_cid = t_ops->transport.get_local_cid();
> + src_port = vsk->local_addr.svm_port;
> + dst_cid = info->remote_cid;
> + dst_port = info->remote_port;
> +
> + hdr = virtio_vsock_hdr(skb);
> + hdr->type = cpu_to_le16(info->type);
> + hdr->op = cpu_to_le16(info->op);
> + hdr->src_cid = cpu_to_le64(src_cid);
> + hdr->dst_cid = cpu_to_le64(dst_cid);
> + hdr->src_port = cpu_to_le32(src_port);
> + hdr->dst_port = cpu_to_le32(dst_port);
> + hdr->flags = cpu_to_le32(info->flags);
> + hdr->len = cpu_to_le32(info->pkt_len);
There is a function 'virtio_transport_init_hdr()' in this file; maybe reuse it?
> +
> + if (info->msg && info->pkt_len > 0) {
If pkt_len is 0, do we really need to send such packets? Because for connectible
sockets, we ignore empty OP_RW packets.
> + payload = skb_put(skb, info->pkt_len);
> + err = memcpy_from_msg(payload, info->msg, info->pkt_len);
> + if (err)
> + goto out;
> + }
> +
> + trace_virtio_transport_alloc_pkt(src_cid, src_port,
> + dst_cid, dst_port,
> + info->pkt_len,
> + info->type,
> + info->op,
> + info->flags,
> + false);
^^^ For SOCK_DGRAM, include/trace/events/vsock_virtio_transport_common.h also should
be updated?
> +
> + return t_ops->send_pkt(skb);
> +out:
> + kfree_skb(skb);
> + return err;
> +}
> +
> int
> virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
> struct sockaddr_vm *remote_addr,
> struct msghdr *msg,
> size_t dgram_len)
> {
> - return -EOPNOTSUPP;
> + /* Here we are only using the info struct to retain style uniformity
> + * and to ease future refactoring and merging.
> + */
> + struct virtio_vsock_pkt_info info = {
> + .op = VIRTIO_VSOCK_OP_RW,
> + .remote_cid = remote_addr->svm_cid,
> + .remote_port = remote_addr->svm_port,
> + .remote_flags = remote_addr->svm_flags,
> + .msg = msg,
> + .vsk = vsk,
> + .pkt_len = dgram_len,
> + };
> +
> + return virtio_transport_dgram_send_pkt_info(vsk, &info);
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
>
> --
> 2.20.1
Thanks, Arseniy
* Re: [RFC PATCH net-next v6 07/14] virtio/vsock: add common datagram send path
2024-07-29 20:00 ` Arseniy Krasnov
@ 2024-07-29 22:51 ` Amery Hung
2024-07-30 5:09 ` Arseniy Krasnov
0 siblings, 1 reply; 51+ messages in thread
From: Amery Hung @ 2024-07-29 22:51 UTC (permalink / raw)
To: Arseniy Krasnov
Cc: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers, dan.carpenter, simon.horman, oxffffaa, kvm,
virtualization, netdev, linux-kernel, linux-hyperv, bpf,
bobby.eshleman, jiang.wang, amery.hung, xiyou.wangcong, kernel
On Mon, Jul 29, 2024 at 1:12 PM Arseniy Krasnov
<avkrasnov@salutedevices.com> wrote:
>
> Hi,
>
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index a1c76836d798..46cd1807f8e3 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -1040,13 +1040,98 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode)
> > }
> > EXPORT_SYMBOL_GPL(virtio_transport_shutdown);
> >
> > +static int virtio_transport_dgram_send_pkt_info(struct vsock_sock *vsk,
> > + struct virtio_vsock_pkt_info *info)
> > +{
> > + u32 src_cid, src_port, dst_cid, dst_port;
> > + const struct vsock_transport *transport;
> > + const struct virtio_transport *t_ops;
> > + struct sock *sk = sk_vsock(vsk);
> > + struct virtio_vsock_hdr *hdr;
> > + struct sk_buff *skb;
> > + void *payload;
> > + int noblock = 0;
> > + int err;
> > +
> > + info->type = virtio_transport_get_type(sk_vsock(vsk));
> > +
> > + if (info->pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> > + return -EMSGSIZE;
>
> Small suggestion: I think we can check the packet length earlier, before the
> info->type = ... assignment?
Certainly.
>
> > +
> > + transport = vsock_dgram_lookup_transport(info->remote_cid, info->remote_flags);
> > + t_ops = container_of(transport, struct virtio_transport, transport);
> > + if (unlikely(!t_ops))
> > + return -EFAULT;
> > +
> > + if (info->msg)
> > + noblock = info->msg->msg_flags & MSG_DONTWAIT;
> > +
> > + /* Use sock_alloc_send_skb to throttle by sk_sndbuf. This helps avoid
> > + * triggering the OOM.
> > + */
> > + skb = sock_alloc_send_skb(sk, info->pkt_len + VIRTIO_VSOCK_SKB_HEADROOM,
> > + noblock, &err);
> > + if (!skb)
> > + return err;
> > +
> > + skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM);
> > +
> > + src_cid = t_ops->transport.get_local_cid();
> > + src_port = vsk->local_addr.svm_port;
> > + dst_cid = info->remote_cid;
> > + dst_port = info->remote_port;
> > +
> > + hdr = virtio_vsock_hdr(skb);
> > + hdr->type = cpu_to_le16(info->type);
> > + hdr->op = cpu_to_le16(info->op);
> > + hdr->src_cid = cpu_to_le64(src_cid);
> > + hdr->dst_cid = cpu_to_le64(dst_cid);
> > + hdr->src_port = cpu_to_le32(src_port);
> > + hdr->dst_port = cpu_to_le32(dst_port);
> > + hdr->flags = cpu_to_le32(info->flags);
> > + hdr->len = cpu_to_le32(info->pkt_len);
>
> There is a function 'virtio_transport_init_hdr()' in this file; maybe reuse it?
Will do.
>
> > +
> > + if (info->msg && info->pkt_len > 0) {
>
> If pkt_len is 0, do we really need to send such packets? Because for connectible
> sockets, we ignore empty OP_RW packets.
Thanks for pointing this out. I think virtio dgram should also follow that.
>
> > + payload = skb_put(skb, info->pkt_len);
> > + err = memcpy_from_msg(payload, info->msg, info->pkt_len);
> > + if (err)
> > + goto out;
> > + }
> > +
> > + trace_virtio_transport_alloc_pkt(src_cid, src_port,
> > + dst_cid, dst_port,
> > + info->pkt_len,
> > + info->type,
> > + info->op,
> > + info->flags,
> > + false);
>
> ^^^ For SOCK_DGRAM, include/trace/events/vsock_virtio_transport_common.h also should
> be updated?
Can you elaborate what needs to be changed?
Thank you,
Amery
>
> > +
> > + return t_ops->send_pkt(skb);
> > +out:
> > + kfree_skb(skb);
> > + return err;
> > +}
> > +
> > int
> > virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
> > struct sockaddr_vm *remote_addr,
> > struct msghdr *msg,
> > size_t dgram_len)
> > {
> > - return -EOPNOTSUPP;
> > + /* Here we are only using the info struct to retain style uniformity
> > + * and to ease future refactoring and merging.
> > + */
> > + struct virtio_vsock_pkt_info info = {
> > + .op = VIRTIO_VSOCK_OP_RW,
> > + .remote_cid = remote_addr->svm_cid,
> > + .remote_port = remote_addr->svm_port,
> > + .remote_flags = remote_addr->svm_flags,
> > + .msg = msg,
> > + .vsk = vsk,
> > + .pkt_len = dgram_len,
> > + };
> > +
> > + return virtio_transport_dgram_send_pkt_info(vsk, &info);
> > }
> > EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
> >
> > --
> > 2.20.1
>
> Thanks, Arseniy
* Re: [RFC PATCH net-next v6 09/14] virtio/vsock: add common datagram recv path
2024-07-23 14:42 ` Stefano Garzarella
@ 2024-07-30 0:35 ` Amery Hung
2024-07-30 8:32 ` Stefano Garzarella
0 siblings, 1 reply; 51+ messages in thread
From: Amery Hung @ 2024-07-30 0:35 UTC (permalink / raw)
To: Stefano Garzarella
Cc: stefanha, mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
kys, haiyangz, wei.liu, decui, bryantan, vdasa, pv-drivers,
dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, xiyou.wangcong
On Tue, Jul 23, 2024 at 7:42 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Wed, Jul 10, 2024 at 09:25:50PM GMT, Amery Hung wrote:
> >From: Bobby Eshleman <bobby.eshleman@bytedance.com>
> >
> >This commit adds the common datagram receive functionality for virtio
> >transports. It does not add the vhost/virtio users of that
> >functionality.
> >
> >This functionality includes:
> >- changes to the virtio_transport_recv_pkt() path for finding the
> > bound socket receiver for incoming packets
> >- virtio_transport_recv_pkt() saves the source cid and port to the
> > control buffer for recvmsg() to initialize sockaddr_vm structure
> > when using datagram
> >
> >Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> >Signed-off-by: Amery Hung <amery.hung@bytedance.com>
> >---
> > net/vmw_vsock/virtio_transport_common.c | 79 +++++++++++++++++++++----
> > 1 file changed, 66 insertions(+), 13 deletions(-)
> >
> >diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> >index 46cd1807f8e3..a571b575fde9 100644
> >--- a/net/vmw_vsock/virtio_transport_common.c
> >+++ b/net/vmw_vsock/virtio_transport_common.c
> >@@ -235,7 +235,9 @@ EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt);
> >
> > static u16 virtio_transport_get_type(struct sock *sk)
> > {
> >- if (sk->sk_type == SOCK_STREAM)
> >+ if (sk->sk_type == SOCK_DGRAM)
> >+ return VIRTIO_VSOCK_TYPE_DGRAM;
> >+ else if (sk->sk_type == SOCK_STREAM)
> > return VIRTIO_VSOCK_TYPE_STREAM;
> > else
> > return VIRTIO_VSOCK_TYPE_SEQPACKET;
> >@@ -1422,6 +1424,33 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
> > kfree_skb(skb);
> > }
> >
> >+static void
> >+virtio_transport_dgram_kfree_skb(struct sk_buff *skb, int err)
> >+{
> >+ if (err == -ENOMEM)
> >+ kfree_skb_reason(skb, SKB_DROP_REASON_SOCKET_RCVBUFF);
> >+ else if (err == -ENOBUFS)
> >+ kfree_skb_reason(skb, SKB_DROP_REASON_PROTO_MEM);
> >+ else
> >+ kfree_skb(skb);
> >+}
> >+
> >+/* This function takes ownership of the skb.
> >+ *
> >+ * It either places the skb on the sk_receive_queue or frees it.
> >+ */
> >+static void
> >+virtio_transport_recv_dgram(struct sock *sk, struct sk_buff *skb)
> >+{
> >+ int err;
> >+
> >+ err = sock_queue_rcv_skb(sk, skb);
> >+ if (err) {
> >+ virtio_transport_dgram_kfree_skb(skb, err);
> >+ return;
> >+ }
> >+}
> >+
> > static int
> > virtio_transport_recv_connected(struct sock *sk,
> > struct sk_buff *skb)
> >@@ -1591,7 +1620,8 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
> > static bool virtio_transport_valid_type(u16 type)
> > {
> > return (type == VIRTIO_VSOCK_TYPE_STREAM) ||
> >- (type == VIRTIO_VSOCK_TYPE_SEQPACKET);
> >+ (type == VIRTIO_VSOCK_TYPE_SEQPACKET) ||
> >+ (type == VIRTIO_VSOCK_TYPE_DGRAM);
> > }
> >
> > /* We are under the virtio-vsock's vsock->rx_lock or vhost-vsock's vq->mutex
> >@@ -1601,44 +1631,57 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> > struct sk_buff *skb)
> > {
> > struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
> >+ struct vsock_skb_cb *vsock_cb;
>
> This can be defined in the block where it's used.
>
Got it.
> > struct sockaddr_vm src, dst;
> > struct vsock_sock *vsk;
> > struct sock *sk;
> > bool space_available;
> >+ u16 type;
> >
> > vsock_addr_init(&src, le64_to_cpu(hdr->src_cid),
> > le32_to_cpu(hdr->src_port));
> > vsock_addr_init(&dst, le64_to_cpu(hdr->dst_cid),
> > le32_to_cpu(hdr->dst_port));
> >
> >+ type = le16_to_cpu(hdr->type);
> >+
> > trace_virtio_transport_recv_pkt(src.svm_cid, src.svm_port,
> > dst.svm_cid, dst.svm_port,
> > le32_to_cpu(hdr->len),
> >- le16_to_cpu(hdr->type),
> >+ type,
> > le16_to_cpu(hdr->op),
> > le32_to_cpu(hdr->flags),
> > le32_to_cpu(hdr->buf_alloc),
> > le32_to_cpu(hdr->fwd_cnt));
> >
> >- if (!virtio_transport_valid_type(le16_to_cpu(hdr->type))) {
> >+ if (!virtio_transport_valid_type(type)) {
> > (void)virtio_transport_reset_no_sock(t, skb);
> > goto free_pkt;
> > }
> >
> >- /* The socket must be in connected or bound table
> >- * otherwise send reset back
> >+ /* For stream/seqpacket, the socket must be in connected or bound table
> >+ * otherwise send reset back.
> >+ *
> >+ * For datagrams, no reset is sent back.
> > */
> > sk = vsock_find_connected_socket(&src, &dst);
> > if (!sk) {
> >- sk = vsock_find_bound_socket(&dst);
> >- if (!sk) {
> >- (void)virtio_transport_reset_no_sock(t, skb);
> >- goto free_pkt;
> >+ if (type == VIRTIO_VSOCK_TYPE_DGRAM) {
> >+ sk = vsock_find_bound_dgram_socket(&dst);
> >+ if (!sk)
> >+ goto free_pkt;
> >+ } else {
> >+ sk = vsock_find_bound_socket(&dst);
> >+ if (!sk) {
> >+ (void)virtio_transport_reset_no_sock(t, skb);
> >+ goto free_pkt;
> >+ }
> > }
> > }
> >
> >- if (virtio_transport_get_type(sk) != le16_to_cpu(hdr->type)) {
> >- (void)virtio_transport_reset_no_sock(t, skb);
> >+ if (virtio_transport_get_type(sk) != type) {
> >+ if (type != VIRTIO_VSOCK_TYPE_DGRAM)
> >+ (void)virtio_transport_reset_no_sock(t, skb);
> > sock_put(sk);
> > goto free_pkt;
> > }
> >@@ -1654,12 +1697,21 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> >
> > /* Check if sk has been closed before lock_sock */
> > if (sock_flag(sk, SOCK_DONE)) {
> >- (void)virtio_transport_reset_no_sock(t, skb);
> >+ if (type != VIRTIO_VSOCK_TYPE_DGRAM)
> >+ (void)virtio_transport_reset_no_sock(t, skb);
> > release_sock(sk);
> > sock_put(sk);
> > goto free_pkt;
> > }
> >
> >+ if (sk->sk_type == SOCK_DGRAM) {
> >+ vsock_cb = vsock_skb_cb(skb);
> >+ vsock_cb->src_cid = src.svm_cid;
> >+ vsock_cb->src_port = src.svm_port;
> >+ virtio_transport_recv_dgram(sk, skb);
>
>
> What about adding an API that transports can use to hide this?
>
> I mean something that hide vsock_cb creation and queue packet in the
> socket receive queue. I'd also not expose vsock_skb_cb in an header, but
> I'd handle it internally in af_vsock.c. So I'd just expose API to
> queue/dequeue them.
>
Got it. I will move vsock_skb_cb to af_vsock.c and create an API:

  vsock_dgram_skb_save_src_addr(struct sk_buff *skb, u32 cid, u32 port)

The different dgram implementations will call this API instead of the code
block above to save the source address information into the control buffer.
A side note on why this is a vsock API instead of a transport member
function: as we move to support multi-transport dgram, different transport
implementations can place skbs into sk->sk_receive_queue, so we cannot call
a transport-specific function in vsock_dgram_recvmsg() to initialize
struct sockaddr_vm. Hence, the receive paths of the different transports
need to call this API to save the source address.
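A rough userspace sketch of what I have in mind (the struct and helper
names mirror the proposal but are illustrative only, and mock_sk_buff
stands in for the real struct sk_buff):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct sk_buff: only the control buffer matters here. */
struct mock_sk_buff {
	char cb[48];	/* same size as the real skb->cb[] */
};

/* Kept private to af_vsock.c in the proposal; transports never see it. */
struct vsock_skb_cb {
	uint32_t src_cid;
	uint32_t src_port;
};

_Static_assert(sizeof(struct vsock_skb_cb) <=
	       sizeof(((struct mock_sk_buff *)0)->cb),
	       "vsock_skb_cb must fit in skb->cb");

/* Proposed helper: every dgram transport's receive path calls this to
 * stash the source address in the skb control buffer... */
static void vsock_dgram_skb_save_src_addr(struct mock_sk_buff *skb,
					  uint32_t cid, uint32_t port)
{
	struct vsock_skb_cb *cb = (struct vsock_skb_cb *)skb->cb;

	cb->src_cid = cid;
	cb->src_port = port;
}

/* ...and vsock_dgram_recvmsg() reads it back, transport-agnostically,
 * to fill in the sockaddr_vm returned to userspace. */
static void vsock_dgram_skb_get_src_addr(const struct mock_sk_buff *skb,
					 uint32_t *cid, uint32_t *port)
{
	const struct vsock_skb_cb *cb = (const struct vsock_skb_cb *)skb->cb;

	*cid = cb->src_cid;
	*port = cb->src_port;
}
```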
> Also why VMCI is using sk_receive_skb(), while we are using
> sock_queue_rcv_skb()?
>
I _think_ we originally referred to UDP and UDS when designing virtio
dgram, and ended up placing the skb into sk_receive_queue directly. I
will look into this to provide a better justification.
Thank you,
Amery
> Thanks,
> Stefano
>
> >+ goto out;
> >+ }
> >+
> > space_available = virtio_transport_space_update(sk, skb);
> >
> > /* Update CID in case it has changed after a transport reset event */
> >@@ -1691,6 +1743,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> > break;
> > }
> >
> >+out:
> > release_sock(sk);
> >
> > /* Release refcnt obtained when we fetched this socket out of the
> >--
> >2.20.1
> >
>
* Re: [RFC PATCH net-next v6 07/14] virtio/vsock: add common datagram send path
2024-07-29 22:51 ` Amery Hung
@ 2024-07-30 5:09 ` Arseniy Krasnov
0 siblings, 0 replies; 51+ messages in thread
From: Arseniy Krasnov @ 2024-07-30 5:09 UTC (permalink / raw)
To: Amery Hung
Cc: stefanha, sgarzare, mst, jasowang, xuanzhuo, davem, edumazet,
kuba, pabeni, kys, haiyangz, wei.liu, decui, bryantan, vdasa,
pv-drivers, dan.carpenter, simon.horman, oxffffaa, kvm,
virtualization, netdev, linux-kernel, linux-hyperv, bpf,
bobby.eshleman, jiang.wang, amery.hung, xiyou.wangcong, kernel
On 30.07.2024 01:51, Amery Hung wrote:
> On Mon, Jul 29, 2024 at 1:12 PM Arseniy Krasnov
> <avkrasnov@salutedevices.com> wrote:
>>
>> Hi,
>>
>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>> index a1c76836d798..46cd1807f8e3 100644
>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>> @@ -1040,13 +1040,98 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode)
>>> }
>>> EXPORT_SYMBOL_GPL(virtio_transport_shutdown);
>>>
>>> +static int virtio_transport_dgram_send_pkt_info(struct vsock_sock *vsk,
>>> + struct virtio_vsock_pkt_info *info)
>>> +{
>>> + u32 src_cid, src_port, dst_cid, dst_port;
>>> + const struct vsock_transport *transport;
>>> + const struct virtio_transport *t_ops;
>>> + struct sock *sk = sk_vsock(vsk);
>>> + struct virtio_vsock_hdr *hdr;
>>> + struct sk_buff *skb;
>>> + void *payload;
>>> + int noblock = 0;
>>> + int err;
>>> +
>>> + info->type = virtio_transport_get_type(sk_vsock(vsk));
>>> +
>>> + if (info->pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
>>> + return -EMSGSIZE;
>>
>> Small suggestion: I think we can check the packet length earlier, before the
>> info->type = ... assignment?
>
> Certainly.
>
>>
>>> +
>>> + transport = vsock_dgram_lookup_transport(info->remote_cid, info->remote_flags);
>>> + t_ops = container_of(transport, struct virtio_transport, transport);
>>> + if (unlikely(!t_ops))
>>> + return -EFAULT;
>>> +
>>> + if (info->msg)
>>> + noblock = info->msg->msg_flags & MSG_DONTWAIT;
>>> +
>>> + /* Use sock_alloc_send_skb to throttle by sk_sndbuf. This helps avoid
>>> + * triggering the OOM.
>>> + */
>>> + skb = sock_alloc_send_skb(sk, info->pkt_len + VIRTIO_VSOCK_SKB_HEADROOM,
>>> + noblock, &err);
>>> + if (!skb)
>>> + return err;
>>> +
>>> + skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM);
>>> +
>>> + src_cid = t_ops->transport.get_local_cid();
>>> + src_port = vsk->local_addr.svm_port;
>>> + dst_cid = info->remote_cid;
>>> + dst_port = info->remote_port;
>>> +
>>> + hdr = virtio_vsock_hdr(skb);
>>> + hdr->type = cpu_to_le16(info->type);
>>> + hdr->op = cpu_to_le16(info->op);
>>> + hdr->src_cid = cpu_to_le64(src_cid);
>>> + hdr->dst_cid = cpu_to_le64(dst_cid);
>>> + hdr->src_port = cpu_to_le32(src_port);
>>> + hdr->dst_port = cpu_to_le32(dst_port);
>>> + hdr->flags = cpu_to_le32(info->flags);
>>> + hdr->len = cpu_to_le32(info->pkt_len);
>>
>> There is a function 'virtio_transport_init_hdr()' in this file; maybe reuse it?
>
> Will do.
>
>>
>>> +
>>> + if (info->msg && info->pkt_len > 0) {
>>
>> If pkt_len is 0, do we really need to send such packets? Because for connectible
>> sockets, we ignore empty OP_RW packets.
>
> Thanks for pointing this out. I think virtio dgram should also follow that.
>
>>
>>> + payload = skb_put(skb, info->pkt_len);
>>> + err = memcpy_from_msg(payload, info->msg, info->pkt_len);
>>> + if (err)
>>> + goto out;
>>> + }
>>> +
>>> + trace_virtio_transport_alloc_pkt(src_cid, src_port,
>>> + dst_cid, dst_port,
>>> + info->pkt_len,
>>> + info->type,
>>> + info->op,
>>> + info->flags,
>>> + false);
>>
>> ^^^ For SOCK_DGRAM, include/trace/events/vsock_virtio_transport_common.h also should
>> be updated?
>
> Can you elaborate what needs to be changed?
Sure, there are:

  TRACE_DEFINE_ENUM(VIRTIO_VSOCK_TYPE_STREAM);
  TRACE_DEFINE_ENUM(VIRTIO_VSOCK_TYPE_SEQPACKET);

  #define show_type(val) \
          __print_symbolic(val, \
                  { VIRTIO_VSOCK_TYPE_STREAM, "STREAM" }, \
                  { VIRTIO_VSOCK_TYPE_SEQPACKET, "SEQPACKET" })

I guess SOCK_DGRAM handling should be added to print the socket type.
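Something like this (userspace sketch only: the kernel version would add a
TRACE_DEFINE_ENUM(VIRTIO_VSOCK_TYPE_DGRAM) and extend the __print_symbolic()
table instead of a switch, and the DGRAM enum value here follows this RFC's
spec proposal, not the released virtio spec):

```c
#include <assert.h>
#include <string.h>

/* STREAM/SEQPACKET match include/uapi/linux/virtio_vsock.h; DGRAM = 3 is
 * the value proposed by this series. */
enum {
	VIRTIO_VSOCK_TYPE_STREAM	= 1,
	VIRTIO_VSOCK_TYPE_SEQPACKET	= 2,
	VIRTIO_VSOCK_TYPE_DGRAM		= 3,
};

/* Userspace analogue of the show_type() __print_symbolic() mapping,
 * with the new DGRAM entry included. */
static const char *show_type(int val)
{
	switch (val) {
	case VIRTIO_VSOCK_TYPE_STREAM:		return "STREAM";
	case VIRTIO_VSOCK_TYPE_SEQPACKET:	return "SEQPACKET";
	case VIRTIO_VSOCK_TYPE_DGRAM:		return "DGRAM";
	default:				return "UNKNOWN";
	}
}
```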
Thanks, Arseniy
>
> Thank you,
> Amery
>
>>
>>> +
>>> + return t_ops->send_pkt(skb);
>>> +out:
>>> + kfree_skb(skb);
>>> + return err;
>>> +}
>>> +
>>> int
>>> virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
>>> struct sockaddr_vm *remote_addr,
>>> struct msghdr *msg,
>>> size_t dgram_len)
>>> {
>>> - return -EOPNOTSUPP;
>>> + /* Here we are only using the info struct to retain style uniformity
>>> + * and to ease future refactoring and merging.
>>> + */
>>> + struct virtio_vsock_pkt_info info = {
>>> + .op = VIRTIO_VSOCK_OP_RW,
>>> + .remote_cid = remote_addr->svm_cid,
>>> + .remote_port = remote_addr->svm_port,
>>> + .remote_flags = remote_addr->svm_flags,
>>> + .msg = msg,
>>> + .vsk = vsk,
>>> + .pkt_len = dgram_len,
>>> + };
>>> +
>>> + return virtio_transport_dgram_send_pkt_info(vsk, &info);
>>> }
>>> EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
>>>
>>> --
>>> 2.20.1
>>
>> Thanks, Arseniy
* Re: [RFC PATCH net-next v6 04/14] af_vsock: generalize bind table functions
2024-07-28 18:52 ` Amery Hung
@ 2024-07-30 8:00 ` Stefano Garzarella
2024-07-30 17:56 ` Amery Hung
0 siblings, 1 reply; 51+ messages in thread
From: Stefano Garzarella @ 2024-07-30 8:00 UTC (permalink / raw)
To: Amery Hung
Cc: stefanha, mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
kys, haiyangz, wei.liu, decui, bryantan, vdasa, pv-drivers,
dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, xiyou.wangcong
On Sun, Jul 28, 2024 at 11:52:54AM GMT, Amery Hung wrote:
>On Tue, Jul 23, 2024 at 7:40 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> On Wed, Jul 10, 2024 at 09:25:45PM GMT, Amery Hung wrote:
>> >From: Bobby Eshleman <bobby.eshleman@bytedance.com>
>> >
>> >This commit makes the bind table management functions in vsock usable
>> >for different bind tables. Future work will introduce a new table for
>> >datagrams to avoid address collisions, and these functions will be used
>> >there.
>> >
>> >Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>> >---
>> > net/vmw_vsock/af_vsock.c | 34 +++++++++++++++++++++++++++-------
>> > 1 file changed, 27 insertions(+), 7 deletions(-)
>> >
>> >diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> >index acc15e11700c..d571be9cdbf0 100644
>> >--- a/net/vmw_vsock/af_vsock.c
>> >+++ b/net/vmw_vsock/af_vsock.c
>> >@@ -232,11 +232,12 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
>> > sock_put(&vsk->sk);
>> > }
>> >
>> >-static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
>> >+static struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
>> >+ struct list_head *bind_table)
>> > {
>> > struct vsock_sock *vsk;
>> >
>> >- list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
>> >+ list_for_each_entry(vsk, bind_table, bound_table) {
>> > if (vsock_addr_equals_addr(addr, &vsk->local_addr))
>> > return sk_vsock(vsk);
>> >
>> >@@ -249,6 +250,11 @@ static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
>> > return NULL;
>> > }
>> >
>> >+static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
>> >+{
>> >+ return vsock_find_bound_socket_common(addr, vsock_bound_sockets(addr));
>> >+}
>> >+
>> > static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
>> > struct sockaddr_vm *dst)
>> > {
>> >@@ -671,12 +677,18 @@ static void vsock_pending_work(struct work_struct *work)
>> >
>> > /**** SOCKET OPERATIONS ****/
>> >
>> >-static int __vsock_bind_connectible(struct vsock_sock *vsk,
>> >- struct sockaddr_vm *addr)
>> >+static int vsock_bind_common(struct vsock_sock *vsk,
>> >+ struct sockaddr_vm *addr,
>> >+ struct list_head *bind_table,
>> >+ size_t table_size)
>> > {
>> > static u32 port;
>> > struct sockaddr_vm new_addr;
>> >
>> >+ if (WARN_ONCE(table_size < VSOCK_HASH_SIZE,
>> >+ "table size too small, may cause overflow"))
>> >+ return -EINVAL;
>> >+
>>
>> I'd add this in another commit.
>>
>> > if (!port)
>> > port = get_random_u32_above(LAST_RESERVED_PORT);
>> >
>> >@@ -692,7 +704,8 @@ static int __vsock_bind_connectible(struct
>> >vsock_sock *vsk,
>> >
>> > new_addr.svm_port = port++;
>> >
>> >- if (!__vsock_find_bound_socket(&new_addr)) {
>> >+ if (!vsock_find_bound_socket_common(&new_addr,
>> >+ &bind_table[VSOCK_HASH(addr)])) {
>>
>> Can we add a macro for `&bind_table[VSOCK_HASH(addr)]`?
>>
>
>Definitely. I will add the following macro:
>
>#define vsock_bound_sockets_in_table(bind_table, addr) \
> (&bind_table[VSOCK_HASH(addr)])
yeah.
>
>> > found = true;
>> > break;
>> > }
>> >@@ -709,7 +722,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
>> > return -EACCES;
>> > }
>> >
>> >- if (__vsock_find_bound_socket(&new_addr))
>> >+ if (vsock_find_bound_socket_common(&new_addr,
>> >+ &bind_table[VSOCK_HASH(addr)]))
>> > return -EADDRINUSE;
>> > }
>> >
>> >@@ -721,11 +735,17 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
>> > * by AF_UNIX.
>> > */
>> > __vsock_remove_bound(vsk);
>> >- __vsock_insert_bound(vsock_bound_sockets(&vsk->local_addr), vsk);
>> >+ __vsock_insert_bound(&bind_table[VSOCK_HASH(&vsk->local_addr)], vsk);
>> >
>> > return 0;
>> > }
>> >
>> >+static int __vsock_bind_connectible(struct vsock_sock *vsk,
>> >+ struct sockaddr_vm *addr)
>> >+{
>> >+ return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
>>
>> What about using ARRAY_SIZE(x)?
>>
>> BTW we are using that size just to check it, but all the arrays we use
>> are statically allocated, so what about a compile time check like
>> BUILD_BUG_ON()?
>>
>
>I will remove the table_size check you mentioned earlier and the
>argument here as the arrays are allocated statically like you
>mentioned.
>
>If you think this check may be a good addition, I can add a
>BUILD_BUG_ON() in the new vsock_bound_sockets_in_table() macro.
If you want to add it, we need to do it in a separate commit. But since
we already have so many changes and both arrays are statically allocated
in the same file, IMHO we can avoid the check.
Stefano
* Re: [RFC PATCH net-next v6 05/14] af_vsock: use a separate dgram bind table
2024-07-28 21:37 ` Amery Hung
@ 2024-07-30 8:05 ` Stefano Garzarella
0 siblings, 0 replies; 51+ messages in thread
From: Stefano Garzarella @ 2024-07-30 8:05 UTC (permalink / raw)
To: Amery Hung
Cc: stefanha, mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
kys, haiyangz, wei.liu, decui, bryantan, vdasa, pv-drivers,
dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, xiyou.wangcong
On Sun, Jul 28, 2024 at 02:37:24PM GMT, Amery Hung wrote:
>On Tue, Jul 23, 2024 at 7:41 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> On Wed, Jul 10, 2024 at 09:25:46PM GMT, Amery Hung wrote:
>> >From: Bobby Eshleman <bobby.eshleman@bytedance.com>
>> >
>> >This commit adds support for bound dgram sockets to be tracked in a
>> >separate bind table from connectible sockets in order to avoid address
>> >collisions. With this commit, users can simultaneously bind a dgram
>> >socket and connectible socket to the same CID and port.
>> >
>> >Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>> >---
>> > net/vmw_vsock/af_vsock.c | 103 +++++++++++++++++++++++++++++----------
>> > 1 file changed, 76 insertions(+), 27 deletions(-)
>> >
>> >diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> >index d571be9cdbf0..ab08cd81720e 100644
>> >--- a/net/vmw_vsock/af_vsock.c
>> >+++ b/net/vmw_vsock/af_vsock.c
>> >@@ -10,18 +10,23 @@
>> > * - There are two kinds of sockets: those created by user action (such as
>> > * calling socket(2)) and those created by incoming connection request packets.
>> > *
>> >- * - There are two "global" tables, one for bound sockets (sockets that have
>> >- * specified an address that they are responsible for) and one for connected
>> >- * sockets (sockets that have established a connection with another socket).
>> >- * These tables are "global" in that all sockets on the system are placed
>> >- * within them. - Note, though, that the bound table contains an extra entry
>> >- * for a list of unbound sockets and SOCK_DGRAM sockets will always remain in
>> >- * that list. The bound table is used solely for lookup of sockets when packets
>> >- * are received and that's not necessary for SOCK_DGRAM sockets since we create
>> >- * a datagram handle for each and need not perform a lookup. Keeping SOCK_DGRAM
>> >- * sockets out of the bound hash buckets will reduce the chance of collisions
>> >- * when looking for SOCK_STREAM sockets and prevents us from having to check the
>> >- * socket type in the hash table lookups.
>> >+ * - There are three "global" tables, one for bound connectible (stream /
>> >+ * seqpacket) sockets, one for bound datagram sockets, and one for connected
>> >+ * sockets. Bound sockets are sockets that have specified an address that
>> >+ * they are responsible for. Connected sockets are sockets that have
>> >+ * established a connection with another socket. These tables are "global" in
>> >+ * that all sockets on the system are placed within them. - Note, though,
>> >+ * that the bound tables contain an extra entry for a list of unbound
>> >+ * sockets. The bound tables are used solely for lookup of sockets when packets
>> >+ * are received.
>> >+ *
>> >+ * - There are separate bind tables for connectible and datagram sockets to avoid
>> >+ * address collisions between stream/seqpacket sockets and datagram sockets.
>> >+ *
>> >+ * - Transports may elect to NOT use the global datagram bind table by
>> >+ * implementing the ->dgram_bind() callback. If that callback is implemented,
>> >+ * the global bind table is not used and the responsibility of bound datagram
>> >+ * socket tracking is deferred to the transport.
>> > *
>> > * - Sockets created by user action will either be "client" sockets that
>> > * initiate a connection or "server" sockets that listen for connections; we do
>> >@@ -116,6 +121,7 @@
>> > static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
>> > static void vsock_sk_destruct(struct sock *sk);
>> > static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
>> >+static bool sock_type_connectible(u16 type);
>> >
>> > /* Protocol family. */
>> > struct proto vsock_proto = {
>> >@@ -152,21 +158,25 @@ static DEFINE_MUTEX(vsock_register_mutex);
>> > * VSocket is stored in the connected hash table.
>> > *
>> > * Unbound sockets are all put on the same list attached to the end of the hash
>> >- * table (vsock_unbound_sockets). Bound sockets are added to the hash table in
>> >- * the bucket that their local address hashes to (vsock_bound_sockets(addr)
>> >- * represents the list that addr hashes to).
>> >+ * tables (vsock_unbound_sockets/vsock_unbound_dgram_sockets). Bound sockets
>> >+ * are added to the hash table in the bucket that their local address hashes to
>> >+ * (vsock_bound_sockets(addr) and vsock_bound_dgram_sockets(addr) represents
>> >+ * the list that addr hashes to).
>> > *
>> >- * Specifically, we initialize the vsock_bind_table array to a size of
>> >- * VSOCK_HASH_SIZE + 1 so that vsock_bind_table[0] through
>> >- * vsock_bind_table[VSOCK_HASH_SIZE - 1] are for bound sockets and
>> >- * vsock_bind_table[VSOCK_HASH_SIZE] is for unbound sockets. The hash function
>> >- * mods with VSOCK_HASH_SIZE to ensure this.
>> >+ * Specifically, taking connectible sockets as an example we initialize the
>> >+ * vsock_bind_table array to a size of VSOCK_HASH_SIZE + 1 so that
>> >+ * vsock_bind_table[0] through vsock_bind_table[VSOCK_HASH_SIZE - 1] are for
>> >+ * bound sockets and vsock_bind_table[VSOCK_HASH_SIZE] is for unbound sockets.
>> >+ * The hash function mods with VSOCK_HASH_SIZE to ensure this.
>> >+ * Datagrams and vsock_dgram_bind_table operate in the same way.
>> > */
>> > #define MAX_PORT_RETRIES 24
>> >
>> > #define VSOCK_HASH(addr) ((addr)->svm_port % VSOCK_HASH_SIZE)
>> > #define vsock_bound_sockets(addr) (&vsock_bind_table[VSOCK_HASH(addr)])
>> >+#define vsock_bound_dgram_sockets(addr) (&vsock_dgram_bind_table[VSOCK_HASH(addr)])
>> > #define vsock_unbound_sockets (&vsock_bind_table[VSOCK_HASH_SIZE])
>> >+#define vsock_unbound_dgram_sockets (&vsock_dgram_bind_table[VSOCK_HASH_SIZE])
>> >
>> > /* XXX This can probably be implemented in a better way. */
>> > #define VSOCK_CONN_HASH(src, dst) \
>> >@@ -182,6 +192,8 @@ struct list_head vsock_connected_table[VSOCK_HASH_SIZE];
>> > EXPORT_SYMBOL_GPL(vsock_connected_table);
>> > DEFINE_SPINLOCK(vsock_table_lock);
>> > EXPORT_SYMBOL_GPL(vsock_table_lock);
>> >+static struct list_head vsock_dgram_bind_table[VSOCK_HASH_SIZE + 1];
>> >+static DEFINE_SPINLOCK(vsock_dgram_table_lock);
>> >
>> > /* Autobind this socket to the local address if necessary. */
>> > static int vsock_auto_bind(struct vsock_sock *vsk)
>> >@@ -204,6 +216,9 @@ static void vsock_init_tables(void)
>> >
>> > for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++)
>> > INIT_LIST_HEAD(&vsock_connected_table[i]);
>> >+
>> >+ for (i = 0; i < ARRAY_SIZE(vsock_dgram_bind_table); i++)
>> >+ INIT_LIST_HEAD(&vsock_dgram_bind_table[i]);
>> > }
>> >
>> > static void __vsock_insert_bound(struct list_head *list,
>> >@@ -271,13 +286,28 @@ static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
>> > return NULL;
>> > }
>> >
>> >-static void vsock_insert_unbound(struct vsock_sock *vsk)
>> >+static void __vsock_insert_dgram_unbound(struct vsock_sock *vsk)
>> >+{
>> >+ spin_lock_bh(&vsock_dgram_table_lock);
>> >+ __vsock_insert_bound(vsock_unbound_dgram_sockets, vsk);
>> >+ spin_unlock_bh(&vsock_dgram_table_lock);
>> >+}
>> >+
>> >+static void __vsock_insert_connectible_unbound(struct vsock_sock *vsk)
>> > {
>> > spin_lock_bh(&vsock_table_lock);
>> > __vsock_insert_bound(vsock_unbound_sockets, vsk);
>> > spin_unlock_bh(&vsock_table_lock);
>> > }
>> >
>> >+static void vsock_insert_unbound(struct vsock_sock *vsk)
>> >+{
>> >+ if (sock_type_connectible(sk_vsock(vsk)->sk_type))
>> >+ __vsock_insert_connectible_unbound(vsk);
>> >+ else
>> >+ __vsock_insert_dgram_unbound(vsk);
>> >+}
>> >+
>> > void vsock_insert_connected(struct vsock_sock *vsk)
>> > {
>> > struct list_head *list = vsock_connected_sockets(
>> >@@ -289,6 +319,14 @@ void vsock_insert_connected(struct vsock_sock *vsk)
>> > }
>> > EXPORT_SYMBOL_GPL(vsock_insert_connected);
>> >
>> >+static void vsock_remove_dgram_bound(struct vsock_sock *vsk)
>> >+{
>> >+ spin_lock_bh(&vsock_dgram_table_lock);
>> >+ if (__vsock_in_bound_table(vsk))
>> >+ __vsock_remove_bound(vsk);
>> >+ spin_unlock_bh(&vsock_dgram_table_lock);
>> >+}
>> >+
>> > void vsock_remove_bound(struct vsock_sock *vsk)
>> > {
>> > spin_lock_bh(&vsock_table_lock);
>> >@@ -340,7 +378,10 @@ EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
>> >
>> > void vsock_remove_sock(struct vsock_sock *vsk)
>> > {
>> >- vsock_remove_bound(vsk);
>> >+ if (sock_type_connectible(sk_vsock(vsk)->sk_type))
>> >+ vsock_remove_bound(vsk);
>> >+ else
>> >+ vsock_remove_dgram_bound(vsk);
>>
>> Can we try to be consistent? For example, we have vsock_insert_unbound(),
>> which internally calls sock_type_connectible(), while
>> vsock_remove_bound() is just for connectible sockets. It's a bit
>> confusing.
>
>I agree with you. I will make the style more consistent by making
>vsock_insert_unbound() work only on connectible sockets.
Maybe I would have done the opposite, making vsock_remove_bound() usable
on all sockets. But I haven't really looked at whether that's feasible
or not.
>
>>
>> > vsock_remove_connected(vsk);
>> > }
>> > EXPORT_SYMBOL_GPL(vsock_remove_sock);
>> >@@ -746,11 +787,19 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
>> > return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
>> > }
>> >
>> >-static int __vsock_bind_dgram(struct vsock_sock *vsk,
>> >- struct sockaddr_vm *addr)
>> >+static int vsock_bind_dgram(struct vsock_sock *vsk,
>> >+ struct sockaddr_vm *addr)
>>
>> Why we are renaming this?
>
>I will keep the original __vsock_bind_dgram() for consistency.
>
>>
>> > {
>> >- if (!vsk->transport || !vsk->transport->dgram_bind)
>> >- return -EINVAL;
>> >+ if (!vsk->transport || !vsk->transport->dgram_bind) {
>>
>> Why this condition?
>>
>> Maybe a comment here is needed because I'm lost...
>
>We currently use !vsk->transport->dgram_bind to determine if this is
>the VMCI dgram transport. I will add a comment explaining this.
Thanks, what's not clear to me is why before this series we returned an
error, whereas now we call vsock_bind_common().
Thanks,
Stefano
>
>>
>> >+ int retval;
>> >+
>> >+ spin_lock_bh(&vsock_dgram_table_lock);
>> >+ retval = vsock_bind_common(vsk, addr, vsock_dgram_bind_table,
>> >+ VSOCK_HASH_SIZE);
>>
>> Should we use VSOCK_HASH_SIZE + 1 here?
>>
>> Using ARRAY_SIZE(x) should avoid this problem.
>
>Yes. The size here is wrong. I will remove the size check (the
>discussion is in patch 4).
>
>Thanks,
>Amery
>
>
>
>>
>>
>> >+ spin_unlock_bh(&vsock_dgram_table_lock);
>> >+
>> >+ return retval;
>> >+ }
>> >
>> > return vsk->transport->dgram_bind(vsk, addr);
>> > }
>> >@@ -781,7 +830,7 @@ static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
>> > break;
>> >
>> > case SOCK_DGRAM:
>> >- retval = __vsock_bind_dgram(vsk, addr);
>> >+ retval = vsock_bind_dgram(vsk, addr);
>> > break;
>> >
>> > default:
>> >--
>> >2.20.1
>> >
>>
>
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFC PATCH net-next v6 07/14] virtio/vsock: add common datagram send path
2024-07-26 23:22 ` Amery Hung
@ 2024-07-30 8:22 ` Stefano Garzarella
0 siblings, 0 replies; 51+ messages in thread
From: Stefano Garzarella @ 2024-07-30 8:22 UTC (permalink / raw)
To: Amery Hung
Cc: stefanha, mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
kys, haiyangz, wei.liu, decui, bryantan, vdasa, pv-drivers,
dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, xiyou.wangcong
On Fri, Jul 26, 2024 at 04:22:16PM GMT, Amery Hung wrote:
>On Tue, Jul 23, 2024 at 7:42 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> On Wed, Jul 10, 2024 at 09:25:48PM GMT, Amery Hung wrote:
>> >From: Bobby Eshleman <bobby.eshleman@bytedance.com>
>> >
>> >This commit implements the common function
>> >virtio_transport_dgram_enqueue for enqueueing datagrams. It does not add
>> >usage in either vhost or virtio yet.
>> >
>> >Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>> >Signed-off-by: Amery Hung <amery.hung@bytedance.com>
>> >---
>> > include/linux/virtio_vsock.h | 1 +
>> > include/net/af_vsock.h | 2 +
>> > net/vmw_vsock/af_vsock.c | 2 +-
>> > net/vmw_vsock/virtio_transport_common.c | 87 ++++++++++++++++++++++++-
>> > 4 files changed, 90 insertions(+), 2 deletions(-)
>> >
>> >diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> >index f749a066af46..4408749febd2 100644
>> >--- a/include/linux/virtio_vsock.h
>> >+++ b/include/linux/virtio_vsock.h
>> >@@ -152,6 +152,7 @@ struct virtio_vsock_pkt_info {
>> > u16 op;
>> > u32 flags;
>> > bool reply;
>> >+ u8 remote_flags;
>> > };
>> >
>> > struct virtio_transport {
>> >diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>> >index 44db8f2c507d..6e97d344ac75 100644
>> >--- a/include/net/af_vsock.h
>> >+++ b/include/net/af_vsock.h
>> >@@ -216,6 +216,8 @@ void vsock_for_each_connected_socket(struct vsock_transport *transport,
>> > void (*fn)(struct sock *sk));
>> > int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
>> > bool vsock_find_cid(unsigned int cid);
>> >+const struct vsock_transport *vsock_dgram_lookup_transport(unsigned int cid,
>> >+ __u8 flags);
>>
>> Why __u8 and not just u8?
>>
>
>Will change to u8.
>
>>
>> >
>> > struct vsock_skb_cb {
>> > unsigned int src_cid;
>> >diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> >index ab08cd81720e..f83b655fdbe9 100644
>> >--- a/net/vmw_vsock/af_vsock.c
>> >+++ b/net/vmw_vsock/af_vsock.c
>> >@@ -487,7 +487,7 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
>> > return transport;
>> > }
>> >
>> >-static const struct vsock_transport *
>> >+const struct vsock_transport *
>> > vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
>> > {
>> > const struct vsock_transport *transport;
>> >diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> >index a1c76836d798..46cd1807f8e3 100644
>> >--- a/net/vmw_vsock/virtio_transport_common.c
>> >+++ b/net/vmw_vsock/virtio_transport_common.c
>> >@@ -1040,13 +1040,98 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode)
>> > }
>> > EXPORT_SYMBOL_GPL(virtio_transport_shutdown);
>> >
>> >+static int virtio_transport_dgram_send_pkt_info(struct vsock_sock *vsk,
>> >+ struct virtio_vsock_pkt_info *info)
>> >+{
>> >+ u32 src_cid, src_port, dst_cid, dst_port;
>> >+ const struct vsock_transport *transport;
>> >+ const struct virtio_transport *t_ops;
>> >+ struct sock *sk = sk_vsock(vsk);
>> >+ struct virtio_vsock_hdr *hdr;
>> >+ struct sk_buff *skb;
>> >+ void *payload;
>> >+ int noblock = 0;
>> >+ int err;
>> >+
>> >+ info->type = virtio_transport_get_type(sk_vsock(vsk));
>> >+
>> >+ if (info->pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
>> >+ return -EMSGSIZE;
>> >+
>> >+ transport = vsock_dgram_lookup_transport(info->remote_cid, info->remote_flags);
>>
>> Can `transport` be null?
>>
>> I don't understand why we are calling vsock_dgram_lookup_transport()
>> again. Didn't we already do that in vsock_dgram_sendmsg()?
>>
>
>transport should be valid here since we null-checked it in
>vsock_dgram_sendmsg(). The reason vsock_dgram_lookup_transport() is
>called again here is that we don't have the transport when we call into
>transport->dgram_enqueue(). I can also instead add the transport to the
>arguments of dgram_enqueue() to eliminate this redundant lookup.
Yes, I would absolutely eliminate this double lookup.
You can add either a parameter, or define the callback in each transport
and internally use the statically allocated transport in each.
For example, for vhost/vsock.c:

static int vhost_transport_dgram_enqueue(...)
{
	return virtio_transport_dgram_enqueue(&vhost_transport.transport,
					      ...);
}
In virtio_transport_recv_pkt() we already do something similar.
>
>> Also should we add a comment mentioning that we can't use
>> virtio_transport_get_ops()? IIUC because the vsk may not be assigned
>> to a specific transport, right?
>>
>
>Correct. For virtio dgram socket, transport is not assigned unless
>vsock_dgram_connect() is called. I will add a comment here explaining
>this.
Thanks,
Stefano
* Re: [RFC PATCH net-next v6 09/14] virtio/vsock: add common datagram recv path
2024-07-30 0:35 ` Amery Hung
@ 2024-07-30 8:32 ` Stefano Garzarella
0 siblings, 0 replies; 51+ messages in thread
From: Stefano Garzarella @ 2024-07-30 8:32 UTC (permalink / raw)
To: Amery Hung
Cc: stefanha, mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
kys, haiyangz, wei.liu, decui, bryantan, vdasa, pv-drivers,
dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, xiyou.wangcong
On Mon, Jul 29, 2024 at 05:35:01PM GMT, Amery Hung wrote:
>On Tue, Jul 23, 2024 at 7:42 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> On Wed, Jul 10, 2024 at 09:25:50PM GMT, Amery Hung wrote:
>> >From: Bobby Eshleman <bobby.eshleman@bytedance.com>
>> >
>> >This commit adds the common datagram receive functionality for virtio
>> >transports. It does not add the vhost/virtio users of that
>> >functionality.
>> >
>> >This functionality includes:
>> >- changes to the virtio_transport_recv_pkt() path for finding the
>> > bound socket receiver for incoming packets
>> >- virtio_transport_recv_pkt() saves the source cid and port to the
>> > control buffer for recvmsg() to initialize sockaddr_vm structure
>> > when using datagram
>> >
>> >Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>> >Signed-off-by: Amery Hung <amery.hung@bytedance.com>
>> >---
>> > net/vmw_vsock/virtio_transport_common.c | 79 +++++++++++++++++++++----
>> > 1 file changed, 66 insertions(+), 13 deletions(-)
>> >
>> >diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> >index 46cd1807f8e3..a571b575fde9 100644
>> >--- a/net/vmw_vsock/virtio_transport_common.c
>> >+++ b/net/vmw_vsock/virtio_transport_common.c
>> >@@ -235,7 +235,9 @@ EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt);
>> >
>> > static u16 virtio_transport_get_type(struct sock *sk)
>> > {
>> >- if (sk->sk_type == SOCK_STREAM)
>> >+ if (sk->sk_type == SOCK_DGRAM)
>> >+ return VIRTIO_VSOCK_TYPE_DGRAM;
>> >+ else if (sk->sk_type == SOCK_STREAM)
>> > return VIRTIO_VSOCK_TYPE_STREAM;
>> > else
>> > return VIRTIO_VSOCK_TYPE_SEQPACKET;
>> >@@ -1422,6 +1424,33 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
>> > kfree_skb(skb);
>> > }
>> >
>> >+static void
>> >+virtio_transport_dgram_kfree_skb(struct sk_buff *skb, int err)
>> >+{
>> >+ if (err == -ENOMEM)
>> >+ kfree_skb_reason(skb, SKB_DROP_REASON_SOCKET_RCVBUFF);
>> >+ else if (err == -ENOBUFS)
>> >+ kfree_skb_reason(skb, SKB_DROP_REASON_PROTO_MEM);
>> >+ else
>> >+ kfree_skb(skb);
>> >+}
>> >+
>> >+/* This function takes ownership of the skb.
>> >+ *
>> >+ * It either places the skb on the sk_receive_queue or frees it.
>> >+ */
>> >+static void
>> >+virtio_transport_recv_dgram(struct sock *sk, struct sk_buff *skb)
>> >+{
>> >+ int err;
>> >+
>> >+ err = sock_queue_rcv_skb(sk, skb);
>> >+ if (err) {
>> >+ virtio_transport_dgram_kfree_skb(skb, err);
>> >+ return;
>> >+ }
>> >+}
>> >+
>> > static int
>> > virtio_transport_recv_connected(struct sock *sk,
>> > struct sk_buff *skb)
>> >@@ -1591,7 +1620,8 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
>> > static bool virtio_transport_valid_type(u16 type)
>> > {
>> > return (type == VIRTIO_VSOCK_TYPE_STREAM) ||
>> >- (type == VIRTIO_VSOCK_TYPE_SEQPACKET);
>> >+ (type == VIRTIO_VSOCK_TYPE_SEQPACKET) ||
>> >+ (type == VIRTIO_VSOCK_TYPE_DGRAM);
>> > }
>> >
>> > /* We are under the virtio-vsock's vsock->rx_lock or vhost-vsock's vq->mutex
>> >@@ -1601,44 +1631,57 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>> > struct sk_buff *skb)
>> > {
>> > struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
>> >+ struct vsock_skb_cb *vsock_cb;
>>
>> This can be defined in the block where it's used.
>>
>
>Got it.
>
>> > struct sockaddr_vm src, dst;
>> > struct vsock_sock *vsk;
>> > struct sock *sk;
>> > bool space_available;
>> >+ u16 type;
>> >
>> > vsock_addr_init(&src, le64_to_cpu(hdr->src_cid),
>> > le32_to_cpu(hdr->src_port));
>> > vsock_addr_init(&dst, le64_to_cpu(hdr->dst_cid),
>> > le32_to_cpu(hdr->dst_port));
>> >
>> >+ type = le16_to_cpu(hdr->type);
>> >+
>> > trace_virtio_transport_recv_pkt(src.svm_cid, src.svm_port,
>> > dst.svm_cid, dst.svm_port,
>> > le32_to_cpu(hdr->len),
>> >- le16_to_cpu(hdr->type),
>> >+ type,
>> > le16_to_cpu(hdr->op),
>> > le32_to_cpu(hdr->flags),
>> > le32_to_cpu(hdr->buf_alloc),
>> > le32_to_cpu(hdr->fwd_cnt));
>> >
>> >- if (!virtio_transport_valid_type(le16_to_cpu(hdr->type))) {
>> >+ if (!virtio_transport_valid_type(type)) {
>> > (void)virtio_transport_reset_no_sock(t, skb);
>> > goto free_pkt;
>> > }
>> >
>> >- /* The socket must be in connected or bound table
>> >- * otherwise send reset back
>> >+ /* For stream/seqpacket, the socket must be in connected or bound table
>> >+ * otherwise send reset back.
>> >+ *
>> >+ * For datagrams, no reset is sent back.
>> > */
>> > sk = vsock_find_connected_socket(&src, &dst);
>> > if (!sk) {
>> >- sk = vsock_find_bound_socket(&dst);
>> >- if (!sk) {
>> >- (void)virtio_transport_reset_no_sock(t, skb);
>> >- goto free_pkt;
>> >+ if (type == VIRTIO_VSOCK_TYPE_DGRAM) {
>> >+ sk = vsock_find_bound_dgram_socket(&dst);
>> >+ if (!sk)
>> >+ goto free_pkt;
>> >+ } else {
>> >+ sk = vsock_find_bound_socket(&dst);
>> >+ if (!sk) {
>> >+ (void)virtio_transport_reset_no_sock(t, skb);
>> >+ goto free_pkt;
>> >+ }
>> > }
>> > }
>> >
>> >- if (virtio_transport_get_type(sk) != le16_to_cpu(hdr->type)) {
>> >- (void)virtio_transport_reset_no_sock(t, skb);
>> >+ if (virtio_transport_get_type(sk) != type) {
>> >+ if (type != VIRTIO_VSOCK_TYPE_DGRAM)
>> >+ (void)virtio_transport_reset_no_sock(t, skb);
>> > sock_put(sk);
>> > goto free_pkt;
>> > }
>> >@@ -1654,12 +1697,21 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>> >
>> > /* Check if sk has been closed before lock_sock */
>> > if (sock_flag(sk, SOCK_DONE)) {
>> >- (void)virtio_transport_reset_no_sock(t, skb);
>> >+ if (type != VIRTIO_VSOCK_TYPE_DGRAM)
>> >+ (void)virtio_transport_reset_no_sock(t, skb);
>> > release_sock(sk);
>> > sock_put(sk);
>> > goto free_pkt;
>> > }
>> >
>> >+ if (sk->sk_type == SOCK_DGRAM) {
>> >+ vsock_cb = vsock_skb_cb(skb);
>> >+ vsock_cb->src_cid = src.svm_cid;
>> >+ vsock_cb->src_port = src.svm_port;
>> >+ virtio_transport_recv_dgram(sk, skb);
>>
>>
>> What about adding an API that transports can use to hide this?
>>
>> I mean something that hide vsock_cb creation and queue packet in the
>> socket receive queue. I'd also not expose vsock_skb_cb in an header, but
>> I'd handle it internally in af_vsock.c. So I'd just expose API to
>> queue/dequeue them.
>>
>
>Got it. I will move vsock_skb_cb to af_vsock.c and create an API:
>
>vsock_dgram_skb_save_src_addr(struct sk_buff *skb, u32 cid, u32 port)
This is okay, but I would try to go further by directly adding an API to
queue dgrams in af_vsock.c (if it's feasible).
>
>Different dgram implementations will call this API instead of the code
>block above to save the source address information into the control
>buffer.
>
>A side note on why this is a vsock API instead of a member function in
>the transport: As we move to support multi-transport dgram, different
>transport implementations can place skbs into sk->sk_receive_queue.
>Therefore, we cannot call a transport-specific function in
>vsock_dgram_recvmsg() to initialize struct sockaddr_vm. Hence, the
>receiving paths of different transports need to call this API to save
>the source address.
What I meant is: why can't virtio_transport_recv_dgram() be exposed by
af_vsock.c as vsock_recv_dgram() and handle everything internally, e.g.
populate vsock_cb, call sock_queue_rcv_skb(), etc.?
>
>> Also why VMCI is using sk_receive_skb(), while we are using
>> sock_queue_rcv_skb()?
>>
>
>I _think_ originally we referred to UDP and UDS when designing virtio
>dgram, and ended up placing the skb into sk_receive_queue directly. I
>will look into this to provide a better justification.
Great, thanks.
Maybe we can also ping VMCI maintainers to understand if they can switch
to sock_queue_rcv_skb(). But we should understand better the difference.
Thanks,
Stefano
* Re: [RFC PATCH net-next v6 04/14] af_vsock: generalize bind table functions
2024-07-30 8:00 ` Stefano Garzarella
@ 2024-07-30 17:56 ` Amery Hung
0 siblings, 0 replies; 51+ messages in thread
From: Amery Hung @ 2024-07-30 17:56 UTC (permalink / raw)
To: Stefano Garzarella
Cc: stefanha, mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
kys, haiyangz, wei.liu, decui, bryantan, vdasa, pv-drivers,
dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, xiyou.wangcong
On Tue, Jul 30, 2024 at 1:00 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Sun, Jul 28, 2024 at 11:52:54AM GMT, Amery Hung wrote:
> >On Tue, Jul 23, 2024 at 7:40 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >>
> >> On Wed, Jul 10, 2024 at 09:25:45PM GMT, Amery Hung wrote:
> >> >From: Bobby Eshleman <bobby.eshleman@bytedance.com>
> >> >
> >> >This commit makes the bind table management functions in vsock usable
> >> >for different bind tables. Future work will introduce a new table for
> >> >datagrams to avoid address collisions, and these functions will be used
> >> >there.
> >> >
> >> >Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> >> >---
> >> > net/vmw_vsock/af_vsock.c | 34 +++++++++++++++++++++++++++-------
> >> > 1 file changed, 27 insertions(+), 7 deletions(-)
> >> >
> >> >diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> >> >index acc15e11700c..d571be9cdbf0 100644
> >> >--- a/net/vmw_vsock/af_vsock.c
> >> >+++ b/net/vmw_vsock/af_vsock.c
> >> >@@ -232,11 +232,12 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
> >> > sock_put(&vsk->sk);
> >> > }
> >> >
> >> >-static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> >> >+static struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
> >> >+ struct list_head *bind_table)
> >> > {
> >> > struct vsock_sock *vsk;
> >> >
> >> >- list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
> >> >+ list_for_each_entry(vsk, bind_table, bound_table) {
> >> > if (vsock_addr_equals_addr(addr, &vsk->local_addr))
> >> > return sk_vsock(vsk);
> >> >
> >> >@@ -249,6 +250,11 @@ static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> >> > return NULL;
> >> > }
> >> >
> >> >+static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> >> >+{
> >> >+ return vsock_find_bound_socket_common(addr, vsock_bound_sockets(addr));
> >> >+}
> >> >+
> >> > static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
> >> > struct sockaddr_vm *dst)
> >> > {
> >> >@@ -671,12 +677,18 @@ static void vsock_pending_work(struct work_struct *work)
> >> >
> >> > /**** SOCKET OPERATIONS ****/
> >> >
> >> >-static int __vsock_bind_connectible(struct vsock_sock *vsk,
> >> >- struct sockaddr_vm *addr)
> >> >+static int vsock_bind_common(struct vsock_sock *vsk,
> >> >+ struct sockaddr_vm *addr,
> >> >+ struct list_head *bind_table,
> >> >+ size_t table_size)
> >> > {
> >> > static u32 port;
> >> > struct sockaddr_vm new_addr;
> >> >
> >> >+ if (WARN_ONCE(table_size < VSOCK_HASH_SIZE,
> >> >+ "table size too small, may cause overflow"))
> >> >+ return -EINVAL;
> >> >+
> >>
> >> I'd add this in another commit.
> >>
> >> > if (!port)
> >> > port = get_random_u32_above(LAST_RESERVED_PORT);
> >> >
> >> >@@ -692,7 +704,8 @@ static int __vsock_bind_connectible(struct
> >> >vsock_sock *vsk,
> >> >
> >> > new_addr.svm_port = port++;
> >> >
> >> >- if (!__vsock_find_bound_socket(&new_addr)) {
> >> >+ if (!vsock_find_bound_socket_common(&new_addr,
> >> >+ &bind_table[VSOCK_HASH(addr)])) {
> >>
> >> Can we add a macro for `&bind_table[VSOCK_HASH(addr)]`?
> >>
> >
> >Definitely. I will add the following macro:
> >
> >#define vsock_bound_sockets_in_table(bind_table, addr) \
> > (&bind_table[VSOCK_HASH(addr)])
>
> yeah.
>
> >
> >> > found = true;
> >> > break;
> >> > }
> >> >@@ -709,7 +722,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> >> > return -EACCES;
> >> > }
> >> >
> >> >- if (__vsock_find_bound_socket(&new_addr))
> >> >+ if (vsock_find_bound_socket_common(&new_addr,
> >> >+ &bind_table[VSOCK_HASH(addr)]))
> >> > return -EADDRINUSE;
> >> > }
> >> >
> >> >@@ -721,11 +735,17 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> >> > * by AF_UNIX.
> >> > */
> >> > __vsock_remove_bound(vsk);
> >> >- __vsock_insert_bound(vsock_bound_sockets(&vsk->local_addr), vsk);
> >> >+ __vsock_insert_bound(&bind_table[VSOCK_HASH(&vsk->local_addr)], vsk);
> >> >
> >> > return 0;
> >> > }
> >> >
> >> >+static int __vsock_bind_connectible(struct vsock_sock *vsk,
> >> >+ struct sockaddr_vm *addr)
> >> >+{
> >> >+ return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
> >>
> >> What about using ARRAY_SIZE(x) ?
> >>
> >> BTW we are using that size just to check it, but all the arrays we use
> >> are statically allocated, so what about a compile time check like
> >> BUILD_BUG_ON()?
> >>
> >
> >I will remove the table_size check you mentioned earlier and the
> >argument here as the arrays are allocated statically like you
> >mentioned.
> >
> >If you think this check may be a good addition, I can add a
> >BUILD_BUG_ON() in the new vsock_bound_sockets_in_table() macro.
>
> If you want to add it, we need to do it in a separate commit. But since
> we already have so many changes and both arrays are statically allocated
> in the same file, IMHO we can avoid the check.
>
> Stefano
>
Okay. I will not add the check.
Thanks,
Amery
* Re: [RFC PATCH net-next v6 12/14] vsock/loopback: implement datagram support
2024-07-10 21:25 ` [RFC PATCH net-next v6 12/14] vsock/loopback: " Amery Hung
@ 2024-08-01 12:18 ` Luigi Leonardi
0 siblings, 0 replies; 51+ messages in thread
From: Luigi Leonardi @ 2024-08-01 12:18 UTC (permalink / raw)
To: ameryhung
Cc: amery.hung, bobby.eshleman, bpf, bryantan, dan.carpenter, davem,
decui, edumazet, haiyangz, jasowang, jiang.wang, kuba, kvm, kys,
linux-hyperv, linux-kernel, mst, netdev, oxffffaa, pabeni,
pv-drivers, sgarzare, simon.horman, stefanha, vdasa,
virtualization, wei.liu, xiyou.wangcong, xuanzhuo, Luigi Leonardi
> +static bool vsock_loopback_dgram_allow(u32 cid, u32 port)
> +{
> + return true;
> +}
> +
> static bool vsock_loopback_seqpacket_allow(u32 remote_cid);
> static bool vsock_loopback_msgzerocopy_allow(void)
> {
> @@ -66,7 +71,7 @@ static struct virtio_transport loopback_transport = {
> .cancel_pkt = vsock_loopback_cancel_pkt,
>
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> - .dgram_allow = virtio_transport_dgram_allow,
> + .dgram_allow = vsock_loopback_dgram_allow,
>
> .stream_dequeue = virtio_transport_stream_dequeue,
> .stream_enqueue = virtio_transport_stream_enqueue,
> --
> 2.20.1
Code LGTM! Since you have to send a new version anyway, I'd modify
the commit message to something like:
"Add 'vsock_loopback_dgram_allow' callback for datagram support."
Feel free to change it :)
Thank you,
Luigi
* Re: [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams
2024-07-10 21:25 [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams Amery Hung
` (14 preceding siblings ...)
2024-07-23 14:38 ` [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams Stefano Garzarella
@ 2025-07-22 14:35 ` Stefano Garzarella
2025-07-26 5:53 ` Amery Hung
15 siblings, 1 reply; 51+ messages in thread
From: Stefano Garzarella @ 2025-07-22 14:35 UTC (permalink / raw)
To: Amery Hung
Cc: stefanha, mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
kys, haiyangz, wei.liu, decui, bryantan, vdasa, pv-drivers,
dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, xiyou.wangcong
Hi Amery,
On Wed, Jul 10, 2024 at 09:25:41PM +0000, Amery Hung wrote:
>Hey all!
>
>This series introduces support for datagrams to virtio/vsock.
any update on v7 of this series?
Thanks,
Stefano
>
>It is a spin-off (and smaller version) of this series from the summer:
> https://lore.kernel.org/all/cover.1660362668.git.bobby.eshleman@bytedance.com/
>
>Please note that this is an RFC and should not be merged until
>associated changes are made to the virtio specification, which will
>follow after discussion from this series.
>
>Another aside, the v4 of the series has only been mildly tested with a
>run of tools/testing/vsock/vsock_test. Some code likely needs cleaning
>up, but I'm hoping to get some of the design choices agreed upon before
>spending too much time making it pretty.
>
>This series first supports datagrams in a basic form for virtio, and
>then optimizes the sendpath for all datagram transports.
>
>The result is a very fast datagram communication protocol that
>outperforms even UDP on multi-queue virtio-net w/ vhost on a variety
>of multi-threaded workload samples.
>
>For those that are curious, some summary data comparing UDP and VSOCK
>DGRAM (N=5):
>
> vCPUS: 16
> virtio-net queues: 16
> payload size: 4KB
> Setup: bare metal + vm (non-nested)
>
> UDP: 287.59 MB/s
> VSOCK DGRAM: 509.2 MB/s
>
>Some notes about the implementation...
>
>This datagram implementation forces datagrams to self-throttle according
>to the threshold set by sk_sndbuf. It behaves similar to the credits
>used by streams in its effect on throughput and memory consumption, but
>it is not influenced by the receiving socket as credits are.
>
>The device drops packets silently.
>
>As discussed previously, this series introduces datagrams and defers
>fairness to future work. See discussion in v2 for more context around
>datagrams, fairness, and this implementation.
>
>Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>Signed-off-by: Amery Hung <amery.hung@bytedance.com>
>---
>Changes in v6:
>- allow empty transport in datagram vsock
>- add empty transport checks in various paths
>- transport layer now saves source cid and port to control buffer of skb
> to remove the dependency of transport in recvmsg()
>- fix virtio dgram_enqueue() by looking up the transport to be used when
> using sendto(2)
>- fix skb memory leaks in two places
>- add dgram auto-bind test
>- Link to v5: https://lore.kernel.org/r/20230413-b4-vsock-dgram-v5-0-581bd37fdb26@bytedance.com
>
>Changes in v5:
>- teach vhost to drop dgram when a datagram exceeds the receive buffer
> - now uses MSG_ERRQUEUE and depends on Arseniy's zerocopy patch:
> "vsock: read from socket's error queue"
>- replace multiple ->dgram_* callbacks with single ->dgram_addr_init()
> callback
>- refactor virtio dgram skb allocator to reduce conflicts w/ zerocopy series
>- add _fallback/_FALLBACK suffix to dgram transport variables/macros
>- add WARN_ONCE() for table_size / VSOCK_HASH issue
>- add static to vsock_find_bound_socket_common
>- dedupe code in vsock_dgram_sendmsg() using module_got var
>- drop concurrent sendmsg() for dgram and defer to future series
>- Add more tests
> - test EHOSTUNREACH in errqueue
> - test stream + dgram address collision
>- improve clarity of dgram msg bounds test code
>- Link to v4: https://lore.kernel.org/r/20230413-b4-vsock-dgram-v4-0-0cebbb2ae899@bytedance.com
>
>Changes in v4:
>- style changes
> - vsock: use sk_vsock(vsk) in vsock_dgram_recvmsg instead of
> &sk->vsk
> - vsock: fix xmas tree declaration
> - vsock: fix spacing issues
> - virtio/vsock: virtio_transport_recv_dgram returns void because err
> unused
>- sparse analysis warnings/errors
> - virtio/vsock: fix unitialized skerr on destroy
> - virtio/vsock: fix uninitialized err var on goto out
> - vsock: fix declarations that need static
> - vsock: fix __rcu annotation order
>- bugs
> - vsock: fix null ptr in remote_info code
> - vsock/dgram: make transport_dgram a fallback instead of first
> priority
> - vsock: remove redundant rcu read lock acquire in getname()
>- tests
> - add more tests (message bounds and more)
> - add vsock_dgram_bind() helper
> - add vsock_dgram_connect() helper
>
>Changes in v3:
>- Support multi-transport dgram, changing logic in connect/bind
> to support VMCI case
>- Support per-pkt transport lookup for sendto() case
>- Fix dgram_allow() implementation
>- Fix dgram feature bit number (now it is 3)
>- Fix binding so dgram and connectible (cid,port) spaces are
> non-overlapping
>- RCU protect transport ptr so connect() calls never leave
> a lockless read of the transport and remote_addr are always
> in sync
>- Link to v2: https://lore.kernel.org/r/20230413-b4-vsock-dgram-v2-0-079cc7cee62e@bytedance.com
>
>
>Bobby Eshleman (14):
> af_vsock: generalize vsock_dgram_recvmsg() to all transports
> af_vsock: refactor transport lookup code
> af_vsock: support multi-transport datagrams
> af_vsock: generalize bind table functions
> af_vsock: use a separate dgram bind table
> virtio/vsock: add VIRTIO_VSOCK_TYPE_DGRAM
> virtio/vsock: add common datagram send path
> af_vsock: add vsock_find_bound_dgram_socket()
> virtio/vsock: add common datagram recv path
> virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
> vhost/vsock: implement datagram support
> vsock/loopback: implement datagram support
> virtio/vsock: implement datagram support
> test/vsock: add vsock dgram tests
>
> drivers/vhost/vsock.c | 62 +-
> include/linux/virtio_vsock.h | 9 +-
> include/net/af_vsock.h | 24 +-
> include/uapi/linux/virtio_vsock.h | 2 +
> net/vmw_vsock/af_vsock.c | 343 ++++++--
> net/vmw_vsock/hyperv_transport.c | 13 -
> net/vmw_vsock/virtio_transport.c | 24 +-
> net/vmw_vsock/virtio_transport_common.c | 188 ++++-
> net/vmw_vsock/vmci_transport.c | 61 +-
> net/vmw_vsock/vsock_loopback.c | 9 +-
> tools/testing/vsock/util.c | 177 +++-
> tools/testing/vsock/util.h | 10 +
> tools/testing/vsock/vsock_test.c | 1032 ++++++++++++++++++++---
> 13 files changed, 1638 insertions(+), 316 deletions(-)
>
>--
>2.20.1
>
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams
2025-07-22 14:35 ` Stefano Garzarella
@ 2025-07-26 5:53 ` Amery Hung
2025-07-29 12:40 ` Stefano Garzarella
0 siblings, 1 reply; 51+ messages in thread
From: Amery Hung @ 2025-07-26 5:53 UTC (permalink / raw)
To: Stefano Garzarella
Cc: stefanha, mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
kys, haiyangz, wei.liu, decui, bryantan, vdasa, pv-drivers,
dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, xiyou.wangcong
On Tue, Jul 22, 2025 at 7:35 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> Hi Amery,
>
> On Wed, Jul 10, 2024 at 09:25:41PM +0000, Amery Hung wrote:
> >Hey all!
> >
> >This series introduces support for datagrams to virtio/vsock.
>
> any update on v7 of this series?
>
Hi Stefano,
Sorry, I don't have personal time to work on v7. Since I don't
think the people involved in this set are still working on it, I am
posting my v7 WIP here to see if anyone is interested in finishing it.
I would greatly appreciate any help.
Link: https://github.com/ameryhung/linux/tree/vsock-dgram-v7
Here are the things that I haven't addressed in the WIP:
01/14
- Arseniy suggested doing skb_put(dg->payload_size) and memcpy(dg->payload_size)
07/14
- Remove the double transport lookup in the send path by passing
transport to dgram_enqueue
- Address Arseniy's comment about updating vsock_virtio_transport_common.h
14/14
- Split test/vsock into smaller patches
Finally, the spec change discussion also needs to happen.
> Thanks,
> Stefano
>
> >
> >It is a spin-off (and smaller version) of this series from the summer:
> > https://lore.kernel.org/all/cover.1660362668.git.bobby.eshleman@bytedance.com/
> >
> >Please note that this is an RFC and should not be merged until
> >associated changes are made to the virtio specification, which will
> >follow after discussion from this series.
> >
> >Another aside, the v4 of the series has only been mildly tested with a
> >run of tools/testing/vsock/vsock_test. Some code likely needs cleaning
> >up, but I'm hoping to get some of the design choices agreed upon before
> >spending too much time making it pretty.
> >
> >This series first supports datagrams in a basic form for virtio, and
> >then optimizes the sendpath for all datagram transports.
> >
> >The result is a very fast datagram communication protocol that
> >outperforms even UDP on multi-queue virtio-net w/ vhost on a variety
> >of multi-threaded workload samples.
> >
> >For those that are curious, some summary data comparing UDP and VSOCK
> >DGRAM (N=5):
> >
> > vCPUS: 16
> > virtio-net queues: 16
> > payload size: 4KB
> > Setup: bare metal + vm (non-nested)
> >
> > UDP: 287.59 MB/s
> > VSOCK DGRAM: 509.2 MB/s
> >
> >Some notes about the implementation...
> >
> >This datagram implementation forces datagrams to self-throttle according
> >to the threshold set by sk_sndbuf. It behaves similarly to the credits
> >used by streams in its effect on throughput and memory consumption, but
> >it is not influenced by the receiving socket as credits are.
> >
> >The device drops packets silently.
> >
> >As discussed previously, this series introduces datagrams and defers
> >fairness to future work. See discussion in v2 for more context around
> >datagrams, fairness, and this implementation.
> >
> >Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> >Signed-off-by: Amery Hung <amery.hung@bytedance.com>
> >---
> >Changes in v6:
> >- allow empty transport in datagram vsock
> >- add empty transport checks in various paths
> >- transport layer now saves source cid and port to control buffer of skb
> > to remove the dependency of transport in recvmsg()
> >- fix virtio dgram_enqueue() by looking up the transport to be used when
> > using sendto(2)
> >- fix skb memory leaks in two places
> >- add dgram auto-bind test
> >- Link to v5: https://lore.kernel.org/r/20230413-b4-vsock-dgram-v5-0-581bd37fdb26@bytedance.com
> >
> >Changes in v5:
> >- teach vhost to drop dgram when a datagram exceeds the receive buffer
> > - now uses MSG_ERRQUEUE and depends on Arseniy's zerocopy patch:
> > "vsock: read from socket's error queue"
> >- replace multiple ->dgram_* callbacks with single ->dgram_addr_init()
> > callback
> >- refactor virtio dgram skb allocator to reduce conflicts w/ zerocopy series
> >- add _fallback/_FALLBACK suffix to dgram transport variables/macros
> >- add WARN_ONCE() for table_size / VSOCK_HASH issue
> >- add static to vsock_find_bound_socket_common
> >- dedupe code in vsock_dgram_sendmsg() using module_got var
> >- drop concurrent sendmsg() for dgram and defer to future series
> >- Add more tests
> > - test EHOSTUNREACH in errqueue
> > - test stream + dgram address collision
> >- improve clarity of dgram msg bounds test code
> >- Link to v4: https://lore.kernel.org/r/20230413-b4-vsock-dgram-v4-0-0cebbb2ae899@bytedance.com
> >
> >Changes in v4:
> >- style changes
> > - vsock: use sk_vsock(vsk) in vsock_dgram_recvmsg instead of
> > &sk->vsk
> > - vsock: fix xmas tree declaration
> > - vsock: fix spacing issues
> > - virtio/vsock: virtio_transport_recv_dgram returns void because err
> > unused
> >- sparse analysis warnings/errors
> > - virtio/vsock: fix uninitialized skerr on destroy
> > - virtio/vsock: fix uninitialized err var on goto out
> > - vsock: fix declarations that need static
> > - vsock: fix __rcu annotation order
> >- bugs
> > - vsock: fix null ptr in remote_info code
> > - vsock/dgram: make transport_dgram a fallback instead of first
> > priority
> > - vsock: remove redundant rcu read lock acquire in getname()
> >- tests
> > - add more tests (message bounds and more)
> > - add vsock_dgram_bind() helper
> > - add vsock_dgram_connect() helper
> >
> >Changes in v3:
> >- Support multi-transport dgram, changing logic in connect/bind
> > to support VMCI case
> >- Support per-pkt transport lookup for sendto() case
> >- Fix dgram_allow() implementation
> >- Fix dgram feature bit number (now it is 3)
> >- Fix binding so dgram and connectible (cid,port) spaces are
> > non-overlapping
> >- RCU protect transport ptr so connect() calls never leave
> > a lockless read of the transport and remote_addr are always
> > in sync
> >- Link to v2: https://lore.kernel.org/r/20230413-b4-vsock-dgram-v2-0-079cc7cee62e@bytedance.com
> >
> >
> >Bobby Eshleman (14):
> > af_vsock: generalize vsock_dgram_recvmsg() to all transports
> > af_vsock: refactor transport lookup code
> > af_vsock: support multi-transport datagrams
> > af_vsock: generalize bind table functions
> > af_vsock: use a separate dgram bind table
> > virtio/vsock: add VIRTIO_VSOCK_TYPE_DGRAM
> > virtio/vsock: add common datagram send path
> > af_vsock: add vsock_find_bound_dgram_socket()
> > virtio/vsock: add common datagram recv path
> > virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
> > vhost/vsock: implement datagram support
> > vsock/loopback: implement datagram support
> > virtio/vsock: implement datagram support
> > test/vsock: add vsock dgram tests
> >
> > drivers/vhost/vsock.c | 62 +-
> > include/linux/virtio_vsock.h | 9 +-
> > include/net/af_vsock.h | 24 +-
> > include/uapi/linux/virtio_vsock.h | 2 +
> > net/vmw_vsock/af_vsock.c | 343 ++++++--
> > net/vmw_vsock/hyperv_transport.c | 13 -
> > net/vmw_vsock/virtio_transport.c | 24 +-
> > net/vmw_vsock/virtio_transport_common.c | 188 ++++-
> > net/vmw_vsock/vmci_transport.c | 61 +-
> > net/vmw_vsock/vsock_loopback.c | 9 +-
> > tools/testing/vsock/util.c | 177 +++-
> > tools/testing/vsock/util.h | 10 +
> > tools/testing/vsock/vsock_test.c | 1032 ++++++++++++++++++++---
> > 13 files changed, 1638 insertions(+), 316 deletions(-)
> >
> >--
> >2.20.1
> >
>
* Re: [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams
2025-07-26 5:53 ` Amery Hung
@ 2025-07-29 12:40 ` Stefano Garzarella
0 siblings, 0 replies; 51+ messages in thread
From: Stefano Garzarella @ 2025-07-29 12:40 UTC (permalink / raw)
To: Amery Hung, Sergio Lopez Pascual, Tyler Fanelli
Cc: stefanha, mst, jasowang, xuanzhuo, davem, edumazet, kuba, pabeni,
kys, haiyangz, wei.liu, decui, bryantan, vdasa, pv-drivers,
dan.carpenter, simon.horman, oxffffaa, kvm, virtualization,
netdev, linux-kernel, linux-hyperv, bpf, bobby.eshleman,
jiang.wang, amery.hung, xiyou.wangcong
Hi Amery,
On Sat, 26 Jul 2025 at 07:53, Amery Hung <ameryhung@gmail.com> wrote:
>
> On Tue, Jul 22, 2025 at 7:35 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > Hi Amery,
> >
> > On Wed, Jul 10, 2024 at 09:25:41PM +0000, Amery Hung wrote:
> > >Hey all!
> > >
> > >This series introduces support for datagrams to virtio/vsock.
> >
> > any update on v7 of this series?
> >
>
> Hi Stefano,
>
> Sorry that I don't have personal time to work on v7. Since I don't
> think people involved in this set are still working on it, I am
> posting my v7 WIP here to see if anyone is interested in finishing it.
> Would greatly appreciate any help.
>
> Link: https://github.com/ameryhung/linux/tree/vsock-dgram-v7
>
> Here are the things that I haven't addressed in the WIP:
>
> 01/14
> - Arseniy suggested doing skb_put(dg->payload_size) and memcpy(dg->payload_size)
>
> 07/14
> - Remove the double transport lookup in the send path by passing
> transport to dgram_enqueue
> - Address Arseniy's comment about updating vsock_virtio_transport_common.h
>
> 14/14
> - Split test/vsock into smaller patches
>
> Finally, the spec change discussion also needs to happen.
Thanks for the update!
I CCed Sergio and Tyler, who may be interested in completing this for
the libkrun use case.
Thanks,
Stefano
>
>
>
> > Thanks,
> > Stefano
> >
> > >
> > >It is a spin-off (and smaller version) of this series from the summer:
> > > https://lore.kernel.org/all/cover.1660362668.git.bobby.eshleman@bytedance.com/
> > >
> > >Please note that this is an RFC and should not be merged until
> > >associated changes are made to the virtio specification, which will
> > >follow after discussion from this series.
> > >
> > >Another aside, the v4 of the series has only been mildly tested with a
> > >run of tools/testing/vsock/vsock_test. Some code likely needs cleaning
> > >up, but I'm hoping to get some of the design choices agreed upon before
> > >spending too much time making it pretty.
> > >
> > >This series first supports datagrams in a basic form for virtio, and
> > >then optimizes the sendpath for all datagram transports.
> > >
> > >The result is a very fast datagram communication protocol that
> > >outperforms even UDP on multi-queue virtio-net w/ vhost on a variety
> > >of multi-threaded workload samples.
> > >
> > >For those that are curious, some summary data comparing UDP and VSOCK
> > >DGRAM (N=5):
> > >
> > > vCPUS: 16
> > > virtio-net queues: 16
> > > payload size: 4KB
> > > Setup: bare metal + vm (non-nested)
> > >
> > > UDP: 287.59 MB/s
> > > VSOCK DGRAM: 509.2 MB/s
> > >
> > >Some notes about the implementation...
> > >
> > >This datagram implementation forces datagrams to self-throttle according
> > >to the threshold set by sk_sndbuf. It behaves similarly to the credits
> > >used by streams in its effect on throughput and memory consumption, but
> > >it is not influenced by the receiving socket as credits are.
> > >
> > >The device drops packets silently.
> > >
> > >As discussed previously, this series introduces datagrams and defers
> > >fairness to future work. See discussion in v2 for more context around
> > >datagrams, fairness, and this implementation.
> > >
> > >Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> > >Signed-off-by: Amery Hung <amery.hung@bytedance.com>
> > >---
> > >Changes in v6:
> > >- allow empty transport in datagram vsock
> > >- add empty transport checks in various paths
> > >- transport layer now saves source cid and port to control buffer of skb
> > > to remove the dependency of transport in recvmsg()
> > >- fix virtio dgram_enqueue() by looking up the transport to be used when
> > > using sendto(2)
> > >- fix skb memory leaks in two places
> > >- add dgram auto-bind test
> > >- Link to v5: https://lore.kernel.org/r/20230413-b4-vsock-dgram-v5-0-581bd37fdb26@bytedance.com
> > >
> > >Changes in v5:
> > >- teach vhost to drop dgram when a datagram exceeds the receive buffer
> > > - now uses MSG_ERRQUEUE and depends on Arseniy's zerocopy patch:
> > > "vsock: read from socket's error queue"
> > >- replace multiple ->dgram_* callbacks with single ->dgram_addr_init()
> > > callback
> > >- refactor virtio dgram skb allocator to reduce conflicts w/ zerocopy series
> > >- add _fallback/_FALLBACK suffix to dgram transport variables/macros
> > >- add WARN_ONCE() for table_size / VSOCK_HASH issue
> > >- add static to vsock_find_bound_socket_common
> > >- dedupe code in vsock_dgram_sendmsg() using module_got var
> > >- drop concurrent sendmsg() for dgram and defer to future series
> > >- Add more tests
> > > - test EHOSTUNREACH in errqueue
> > > - test stream + dgram address collision
> > >- improve clarity of dgram msg bounds test code
> > >- Link to v4: https://lore.kernel.org/r/20230413-b4-vsock-dgram-v4-0-0cebbb2ae899@bytedance.com
> > >
> > >Changes in v4:
> > >- style changes
> > > - vsock: use sk_vsock(vsk) in vsock_dgram_recvmsg instead of
> > > &sk->vsk
> > > - vsock: fix xmas tree declaration
> > > - vsock: fix spacing issues
> > > - virtio/vsock: virtio_transport_recv_dgram returns void because err
> > > unused
> > >- sparse analysis warnings/errors
> > > - virtio/vsock: fix uninitialized skerr on destroy
> > > - virtio/vsock: fix uninitialized err var on goto out
> > > - vsock: fix declarations that need static
> > > - vsock: fix __rcu annotation order
> > >- bugs
> > > - vsock: fix null ptr in remote_info code
> > > - vsock/dgram: make transport_dgram a fallback instead of first
> > > priority
> > > - vsock: remove redundant rcu read lock acquire in getname()
> > >- tests
> > > - add more tests (message bounds and more)
> > > - add vsock_dgram_bind() helper
> > > - add vsock_dgram_connect() helper
> > >
> > >Changes in v3:
> > >- Support multi-transport dgram, changing logic in connect/bind
> > > to support VMCI case
> > >- Support per-pkt transport lookup for sendto() case
> > >- Fix dgram_allow() implementation
> > >- Fix dgram feature bit number (now it is 3)
> > >- Fix binding so dgram and connectible (cid,port) spaces are
> > > non-overlapping
> > >- RCU protect transport ptr so connect() calls never leave
> > > a lockless read of the transport and remote_addr are always
> > > in sync
> > >- Link to v2: https://lore.kernel.org/r/20230413-b4-vsock-dgram-v2-0-079cc7cee62e@bytedance.com
> > >
> > >
> > >Bobby Eshleman (14):
> > > af_vsock: generalize vsock_dgram_recvmsg() to all transports
> > > af_vsock: refactor transport lookup code
> > > af_vsock: support multi-transport datagrams
> > > af_vsock: generalize bind table functions
> > > af_vsock: use a separate dgram bind table
> > > virtio/vsock: add VIRTIO_VSOCK_TYPE_DGRAM
> > > virtio/vsock: add common datagram send path
> > > af_vsock: add vsock_find_bound_dgram_socket()
> > > virtio/vsock: add common datagram recv path
> > > virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
> > > vhost/vsock: implement datagram support
> > > vsock/loopback: implement datagram support
> > > virtio/vsock: implement datagram support
> > > test/vsock: add vsock dgram tests
> > >
> > > drivers/vhost/vsock.c | 62 +-
> > > include/linux/virtio_vsock.h | 9 +-
> > > include/net/af_vsock.h | 24 +-
> > > include/uapi/linux/virtio_vsock.h | 2 +
> > > net/vmw_vsock/af_vsock.c | 343 ++++++--
> > > net/vmw_vsock/hyperv_transport.c | 13 -
> > > net/vmw_vsock/virtio_transport.c | 24 +-
> > > net/vmw_vsock/virtio_transport_common.c | 188 ++++-
> > > net/vmw_vsock/vmci_transport.c | 61 +-
> > > net/vmw_vsock/vsock_loopback.c | 9 +-
> > > tools/testing/vsock/util.c | 177 +++-
> > > tools/testing/vsock/util.h | 10 +
> > > tools/testing/vsock/vsock_test.c | 1032 ++++++++++++++++++++---
> > > 13 files changed, 1638 insertions(+), 316 deletions(-)
> > >
> > >--
> > >2.20.1
> > >
> >
>
end of thread, other threads:[~2025-07-29 12:40 UTC | newest]
Thread overview: 51+ messages
2024-07-10 21:25 [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams Amery Hung
2024-07-10 21:25 ` [RFC PATCH net-next v6 01/14] af_vsock: generalize vsock_dgram_recvmsg() to all transports Amery Hung
2024-07-15 8:02 ` Luigi Leonardi
2024-07-15 23:39 ` Amery Hung
2024-07-29 19:25 ` Arseniy Krasnov
2024-07-10 21:25 ` [RFC PATCH net-next v6 02/14] af_vsock: refactor transport lookup code Amery Hung
2024-07-25 6:29 ` Arseniy Krasnov
2024-07-28 22:10 ` Amery Hung
2024-07-10 21:25 ` [RFC PATCH net-next v6 03/14] af_vsock: support multi-transport datagrams Amery Hung
2024-07-15 8:13 ` Arseniy Krasnov
2024-07-15 17:41 ` Amery Hung
2024-07-28 20:28 ` Arseniy Krasnov
2024-07-28 21:53 ` Amery Hung
2024-07-29 5:12 ` Arseniy Krasnov
2024-07-10 21:25 ` [RFC PATCH net-next v6 04/14] af_vsock: generalize bind table functions Amery Hung
2024-07-23 14:39 ` Stefano Garzarella
2024-07-28 18:52 ` Amery Hung
2024-07-30 8:00 ` Stefano Garzarella
2024-07-30 17:56 ` Amery Hung
2024-07-10 21:25 ` [RFC PATCH net-next v6 05/14] af_vsock: use a separate dgram bind table Amery Hung
2024-07-23 14:41 ` Stefano Garzarella
2024-07-28 21:37 ` Amery Hung
2024-07-30 8:05 ` Stefano Garzarella
2024-07-10 21:25 ` [RFC PATCH net-next v6 06/14] virtio/vsock: add VIRTIO_VSOCK_TYPE_DGRAM Amery Hung
2024-07-10 21:25 ` [RFC PATCH net-next v6 07/14] virtio/vsock: add common datagram send path Amery Hung
2024-07-23 14:42 ` Stefano Garzarella
2024-07-26 23:22 ` Amery Hung
2024-07-30 8:22 ` Stefano Garzarella
2024-07-29 20:00 ` Arseniy Krasnov
2024-07-29 22:51 ` Amery Hung
2024-07-30 5:09 ` Arseniy Krasnov
2024-07-10 21:25 ` [RFC PATCH net-next v6 08/14] af_vsock: add vsock_find_bound_dgram_socket() Amery Hung
2024-07-10 21:25 ` [RFC PATCH net-next v6 09/14] virtio/vsock: add common datagram recv path Amery Hung
2024-07-23 14:42 ` Stefano Garzarella
2024-07-30 0:35 ` Amery Hung
2024-07-30 8:32 ` Stefano Garzarella
2024-07-10 21:25 ` [RFC PATCH net-next v6 10/14] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit Amery Hung
2024-07-10 21:25 ` [RFC PATCH net-next v6 11/14] vhost/vsock: implement datagram support Amery Hung
2024-07-10 21:25 ` [RFC PATCH net-next v6 12/14] vsock/loopback: " Amery Hung
2024-08-01 12:18 ` Luigi Leonardi
2024-07-10 21:25 ` [RFC PATCH net-next v6 13/14] virtio/vsock: " Amery Hung
2024-07-11 23:02 ` Luigi Leonardi
2024-07-11 23:07 ` Amery Hung
2024-07-10 21:25 ` [RFC PATCH net-next v6 14/14] test/vsock: add vsock dgram tests Amery Hung
2024-07-20 19:58 ` Arseniy Krasnov
2024-07-23 14:43 ` Stefano Garzarella
2024-07-28 22:06 ` Amery Hung
2024-07-23 14:38 ` [RFC PATCH net-next v6 00/14] virtio/vsock: support datagrams Stefano Garzarella
2025-07-22 14:35 ` Stefano Garzarella
2025-07-26 5:53 ` Amery Hung
2025-07-29 12:40 ` Stefano Garzarella