All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bobby Eshleman <bobbyeshleman@gmail.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: "Jakub Kicinski" <kuba@kernel.org>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	"Haiyang Zhang" <haiyangz@microsoft.com>,
	"Wei Liu" <wei.liu@kernel.org>,
	"Dexuan Cui" <decui@microsoft.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>,
	"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	"Eugenio Pérez" <eperezma@redhat.com>,
	"Bryan Tan" <bryan-bt.tan@broadcom.com>,
	"Vishnu Dasa" <vishnu.dasa@broadcom.com>,
	"Broadcom internal kernel review list"
	<bcm-kernel-feedback-list@broadcom.com>,
	"David S. Miller" <davem@davemloft.net>,
	virtualization@lists.linux.dev, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
	kvm@vger.kernel.org
Subject: Re: [PATCH v2 1/3] vsock: add network namespace support
Date: Wed, 19 Mar 2025 12:00:38 -0700	[thread overview]
Message-ID: <Z9sUVs1Tq3SN83MQ@devvm6277.cco0.facebook.com> (raw)
In-Reply-To: <sqvqvlovlxpfo2tlkazugkocwmlhc7iay2kvq7b75bgwk7vhfw@tvgfe5fj3mw6>

On Wed, Mar 19, 2025 at 02:02:32PM +0100, Stefano Garzarella wrote:
> On Wed, Mar 12, 2025 at 01:59:35PM -0700, Bobby Eshleman wrote:
> > From: Stefano Garzarella <sgarzare@redhat.com>
> > 
> > This patch adds a check of the "net" assigned to a socket during
> > the vsock_find_bound_socket() and vsock_find_connected_socket()
> > to support network namespace, allowing to share the same address
> > (cid, port) across different network namespaces.
> > 
> > This patch preserves old behavior, and does not yet bring up namespace
> > support fully.
> > 
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> 
> I'd describe here a bit the new behaviour related to `fallback` that you
> developed.
> 
> Or we can split this patch in two patches, one with my changes without
> fallback, and another with fallback as you as author.
> 
> WDYT?
> 

I like the idea of splitting it, that way any unforeseen issues in the
new logic can be isolated to the one patch.

> 
> > Signed-off-by: Bobby Eshleman <bobbyeshleman@gmail.com>
> > ---
> > v1 -> v2:
> > * remove 'netns' module param
> > * remove vsock_net_eq()
> > * use vsock_global_net() for "global" namespace
> > * use fallback logic in socket lookup functions, giving precedence to
> >  non-global vsock namespaces
> > 
> > RFC -> v1
> > * added 'netns' module param
> > * added 'vsock_net_eq()' to check the "net" assigned to a socket
> >  only when 'netns' support is enabled
> > ---
> > include/net/af_vsock.h                  |  7 +++--
> > net/vmw_vsock/af_vsock.c                | 55 ++++++++++++++++++++++++---------
> > net/vmw_vsock/hyperv_transport.c        |  2 +-
> > net/vmw_vsock/virtio_transport_common.c |  5 +--
> > net/vmw_vsock/vmci_transport.c          |  4 +--
> > 5 files changed, 51 insertions(+), 22 deletions(-)
> > 
> > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > index 9e85424c834353d016a527070dd62e15ff3bfce1..41afbc18648c953da27a93571d408de968aa7668 100644
> > --- a/include/net/af_vsock.h
> > +++ b/include/net/af_vsock.h
> > @@ -213,9 +213,10 @@ void vsock_enqueue_accept(struct sock *listener, struct sock *connected);
> > void vsock_insert_connected(struct vsock_sock *vsk);
> > void vsock_remove_bound(struct vsock_sock *vsk);
> > void vsock_remove_connected(struct vsock_sock *vsk);
> > -struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
> > +struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr, struct net *net);
> > struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> > -					 struct sockaddr_vm *dst);
> > +					 struct sockaddr_vm *dst,
> > +					 struct net *net);
> > void vsock_remove_sock(struct vsock_sock *vsk);
> > void vsock_for_each_connected_socket(struct vsock_transport *transport,
> > 				     void (*fn)(struct sock *sk));
> > @@ -255,4 +256,6 @@ static inline bool vsock_msgzerocopy_allow(const struct vsock_transport *t)
> > {
> > 	return t->msgzerocopy_allow && t->msgzerocopy_allow();
> > }
> > +
> > +struct net *vsock_global_net(void);
> 
> If it just returns null, maybe we can make it inline here.
> 

Roger that.

> > #endif /* __AF_VSOCK_H__ */
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index 7e3db87ae4333cf63327ec105ca99253569bb9fe..d206489bf0a81cf989387c7c8063be91a7c21a7d 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -235,37 +235,60 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
> > 	sock_put(&vsk->sk);
> > }
> > 
> > -static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> > +struct net *vsock_global_net(void)
> > {
> > +	return NULL;
> > +}
> > +EXPORT_SYMBOL_GPL(vsock_global_net);
> > +
> > +static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr,
> > +					      struct net *net)
> > +{
> 
> Please add a comment here to describe what fallback is used for.
> And I would suggest also something on top of this file to explain a bit
> how netns are handled in AF_VSOCK.
> 

sgtm!

> > +	struct sock *fallback = NULL;
> > 	struct vsock_sock *vsk;
> > 
> > 	list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
> > -		if (vsock_addr_equals_addr(addr, &vsk->local_addr))
> > -			return sk_vsock(vsk);
> > +		if (vsock_addr_equals_addr(addr, &vsk->local_addr)) {
> > +			if (net_eq(net, sock_net(sk_vsock(vsk))))
> > +				return sk_vsock(vsk);
> > 
> > +			if (net_eq(net, vsock_global_net()))
> > +				fallback = sk_vsock(vsk);
> > +		}
> > 		if (addr->svm_port == vsk->local_addr.svm_port &&
> > 		    (vsk->local_addr.svm_cid == VMADDR_CID_ANY ||
> > -		     addr->svm_cid == VMADDR_CID_ANY))
> > -			return sk_vsock(vsk);
> > +		     addr->svm_cid == VMADDR_CID_ANY)) {
> > +			if (net_eq(net, sock_net(sk_vsock(vsk))))
> > +				return sk_vsock(vsk);
> > +
> > +			if (net_eq(net, vsock_global_net()))
> > +				fallback = sk_vsock(vsk);
> > +		}
> > 	}
> > 
> > -	return NULL;
> > +	return fallback;
> > }
> > 
> > static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
> > -						  struct sockaddr_vm *dst)
> > +						  struct sockaddr_vm *dst,
> > +						  struct net *net)
> > {
> > +	struct sock *fallback = NULL;
> > 	struct vsock_sock *vsk;
> > 
> > 	list_for_each_entry(vsk, vsock_connected_sockets(src, dst),
> > 			    connected_table) {
> > 		if (vsock_addr_equals_addr(src, &vsk->remote_addr) &&
> > 		    dst->svm_port == vsk->local_addr.svm_port) {
> > -			return sk_vsock(vsk);
> > +			if (net_eq(net, sock_net(sk_vsock(vsk))))
> > +				return sk_vsock(vsk);
> > +
> > +			if (net_eq(net, vsock_global_net()))
> > +				fallback = sk_vsock(vsk);
> 
> This pattern seems to be repeated 3 times, can we make a function/macro?
> 

yep, no problem!

> > 		}
> > 	}
> > 
> > -	return NULL;
> > +	return fallback;
> > }
> > 
> > static void vsock_insert_unbound(struct vsock_sock *vsk)
> > @@ -304,12 +327,12 @@ void vsock_remove_connected(struct vsock_sock *vsk)
> > }
> > EXPORT_SYMBOL_GPL(vsock_remove_connected);
> > 
> > -struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
> > +struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr, struct net *net)
> > {
> > 	struct sock *sk;
> > 
> > 	spin_lock_bh(&vsock_table_lock);
> > -	sk = __vsock_find_bound_socket(addr);
> > +	sk = __vsock_find_bound_socket(addr, net);
> > 	if (sk)
> > 		sock_hold(sk);
> > 
> > @@ -320,12 +343,13 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
> > EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
> > 
> > struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> > -					 struct sockaddr_vm *dst)
> > +					 struct sockaddr_vm *dst,
> > +					 struct net *net)
> > {
> > 	struct sock *sk;
> > 
> > 	spin_lock_bh(&vsock_table_lock);
> > -	sk = __vsock_find_connected_socket(src, dst);
> > +	sk = __vsock_find_connected_socket(src, dst, net);
> > 	if (sk)
> > 		sock_hold(sk);
> > 
> > @@ -644,6 +668,7 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> > {
> > 	static u32 port;
> > 	struct sockaddr_vm new_addr;
> > +	struct net *net = sock_net(sk_vsock(vsk));
> > 
> > 	if (!port)
> > 		port = get_random_u32_above(LAST_RESERVED_PORT);
> > @@ -660,7 +685,7 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> > 
> > 			new_addr.svm_port = port++;
> > 
> > -			if (!__vsock_find_bound_socket(&new_addr)) {
> > +			if (!__vsock_find_bound_socket(&new_addr, net)) {
> > 				found = true;
> > 				break;
> > 			}
> > @@ -677,7 +702,7 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> > 			return -EACCES;
> > 		}
> > 
> > -		if (__vsock_find_bound_socket(&new_addr))
> > +		if (__vsock_find_bound_socket(&new_addr, net))
> > 			return -EADDRINUSE;
> > 	}
> > 
> > diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> > index 31342ab502b4fc35feb812d2c94e0e35ded73771..253609898d24f8a484fcfc3296011c6f501a72a8 100644
> > --- a/net/vmw_vsock/hyperv_transport.c
> > +++ b/net/vmw_vsock/hyperv_transport.c
> > @@ -313,7 +313,7 @@ static void hvs_open_connection(struct vmbus_channel *chan)
> > 		return;
> > 
> > 	hvs_addr_init(&addr, conn_from_host ? if_type : if_instance);
> > -	sk = vsock_find_bound_socket(&addr);
> > +	sk = vsock_find_bound_socket(&addr, NULL);
> > 	if (!sk)
> > 		return;
> > 
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index 7f7de6d8809655fe522749fbbc9025df71f071bd..256d2a4fe482b3cb938a681b6924be69b2065616 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -1590,6 +1590,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> > 			       struct sk_buff *skb)
> > {
> > 	struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
> > +	struct net *net = vsock_global_net();
> 
> Why using vsock_global_net() in virtio and directly NULL in the others
> transports?
> 

This was an oversight on my part, I found an unnamed NULL harder to
reason about, switched to the func, but forgot to switch over the other
transports.

BTW, I was unsure about just making NULL a macro (e.g.,
VIRTIO_VSOCK_GLOBAL_NET?) instead of a function. I just used a function
because A) I noticed in the prior rev that the default net was a
function instead of some macro to &init_net, and B) the function seemed
a little more flexible for future changes. What are your thoughts here?


Thanks for the review!

Best,
Bobby

  reply	other threads:[~2025-03-19 19:00 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-12 20:59 [PATCH v2 0/3] vsock: add namespace support to vhost-vsock Bobby Eshleman
2025-03-12 20:59 ` [PATCH v2 1/3] vsock: add network namespace support Bobby Eshleman
2025-03-19 13:02   ` Stefano Garzarella
2025-03-19 19:00     ` Bobby Eshleman [this message]
2025-03-20  8:57       ` Stefano Garzarella
2025-03-20 20:56         ` Bobby Eshleman
2025-03-12 20:59 ` [PATCH v2 2/3] vsock/virtio_transport_common: handle netns of received packets Bobby Eshleman
2025-03-19 13:26   ` Stefano Garzarella
2025-03-19 19:05     ` Bobby Eshleman
2025-03-12 20:59 ` [PATCH v2 3/3] vhost/vsock: use netns of process that opens the vhost-vsock-netns device Bobby Eshleman
2025-03-19 14:15   ` Stefano Garzarella
2025-03-19 19:28     ` Bobby Eshleman
2025-03-19 21:09   ` Paolo Abeni
2025-03-20  9:08     ` Stefano Garzarella
2025-03-20 21:05       ` Bobby Eshleman
2025-03-21 10:02         ` Stefano Garzarella
2025-03-21 16:43           ` Bobby Eshleman
2025-03-26  0:11           ` Bobby Eshleman
2025-03-27  9:14             ` Stefano Garzarella
2025-03-28 16:07               ` Bobby Eshleman
2025-03-28 16:19                 ` Stefano Garzarella
2025-03-28 20:14                   ` Bobby Eshleman
2025-03-20 20:57     ` Bobby Eshleman
2025-03-13  2:28 ` [PATCH v2 0/3] vsock: add namespace support to vhost-vsock Bobby Eshleman
2025-03-13 15:37   ` Stefano Garzarella
2025-03-13 16:20     ` Bobby Eshleman
2025-03-21 19:49 ` Michael S. Tsirkin
2025-03-22  1:04   ` Bobby Eshleman
2025-03-28 17:03 ` Stefano Garzarella
2025-03-28 20:13   ` Bobby Eshleman
2025-04-01 19:05   ` Daniel P. Berrangé
2025-04-02  0:21     ` Bobby Eshleman
2025-04-02  8:13       ` Stefano Garzarella
2025-04-02  9:21         ` Daniel P. Berrangé
2025-04-02 22:18           ` Bobby Eshleman
2025-04-02 22:28             ` Bobby Eshleman
2025-04-03  9:33               ` Stefano Garzarella
2025-04-03 19:42                 ` Bobby Eshleman
2025-04-04 13:05             ` Daniel P. Berrangé
2025-04-18 17:57               ` Bobby Eshleman
2025-04-22 13:35                 ` Stefano Garzarella
2025-04-03  9:01           ` Stefano Garzarella

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z9sUVs1Tq3SN83MQ@devvm6277.cco0.facebook.com \
    --to=bobbyeshleman@gmail.com \
    --cc=bcm-kernel-feedback-list@broadcom.com \
    --cc=bryan-bt.tan@broadcom.com \
    --cc=davem@davemloft.net \
    --cc=decui@microsoft.com \
    --cc=eperezma@redhat.com \
    --cc=haiyangz@microsoft.com \
    --cc=jasowang@redhat.com \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=kys@microsoft.com \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=sgarzare@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=virtualization@lists.linux.dev \
    --cc=vishnu.dasa@broadcom.com \
    --cc=wei.liu@kernel.org \
    --cc=xuanzhuo@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.