All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bobby Eshleman <bobbyeshleman@gmail.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: linux-hyperv@vger.kernel.org,
	Bobby Eshleman <bobby.eshleman@bytedance.com>,
	kvm@vger.kernel.org, "Michael S. Tsirkin" <mst@redhat.com>,
	VMware PV-Drivers Reviewers <pv-drivers@vmware.com>,
	Simon Horman <simon.horman@corigine.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	virtualization@lists.linux-foundation.org,
	Eric Dumazet <edumazet@google.com>,
	Dan Carpenter <dan.carpenter@linaro.org>,
	Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
	Wei Liu <wei.liu@kernel.org>, Dexuan Cui <decui@microsoft.com>,
	Bryan Tan <bryantan@vmware.com>, Jakub Kicinski <kuba@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	Arseniy Krasnov <oxffffaa@gmail.com>,
	Vishnu Dasa <vdasa@vmware.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	bpf@vger.kernel.org, "David S. Miller" <davem@davemloft.net>
Subject: Re: [PATCH RFC net-next v5 03/14] af_vsock: support multi-transport datagrams
Date: Fri, 4 Aug 2023 17:01:07 +0000	[thread overview]
Message-ID: <ZM0u09yiCDA6Bbmy@bullseye> (raw)
In-Reply-To: <tqynchyi6ceb3d7lbd4cwt6f6hu6ma3nq6dkqfknnmwstgx62w@z2dp5ybgmx4s>

On Fri, Aug 04, 2023 at 04:11:58PM +0200, Stefano Garzarella wrote:
> On Thu, Aug 03, 2023 at 06:58:24PM +0000, Bobby Eshleman wrote:
> > On Thu, Aug 03, 2023 at 02:42:26PM +0200, Stefano Garzarella wrote:
> > > On Thu, Aug 03, 2023 at 12:53:22AM +0000, Bobby Eshleman wrote:
> > > > On Wed, Aug 02, 2023 at 10:24:44PM +0000, Bobby Eshleman wrote:
> > > > > On Sun, Jul 23, 2023 at 12:53:15AM +0300, Arseniy Krasnov wrote:
> > > > > >
> > > > > >
> > > > > > On 19.07.2023 03:50, Bobby Eshleman wrote:
> > > > > > > This patch adds support for multi-transport datagrams.
> > > > > > >
> > > > > > > This includes:
> > > > > > > - Per-packet lookup of transports when using sendto(sockaddr_vm)
> > > > > > > - Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in
> > > > > > >   sockaddr_vm
> > > > > > > - rename VSOCK_TRANSPORT_F_DGRAM to VSOCK_TRANSPORT_F_DGRAM_FALLBACK
> > > > > > > - connect() now assigns the transport for (similar to connectible
> > > > > > >   sockets)
> > > > > > >
> > > > > > > To preserve backwards compatibility with VMCI, some important changes
> > > > > > > are made. The "transport_dgram" / VSOCK_TRANSPORT_F_DGRAM is changed to
> > > > > > > be used for dgrams only if there is not yet a g2h or h2g transport that
> > > > > > > has been registered that can transmit the packet. If there is a g2h/h2g
> > > > > > > transport for that remote address, then that transport will be used and
> > > > > > > not "transport_dgram". This essentially makes "transport_dgram" a
> > > > > > > fallback transport for when h2g/g2h has not yet gone online, and so it
> > > > > > > is renamed "transport_dgram_fallback". VMCI implements this transport.
> > > > > > >
> > > > > > > The logic around "transport_dgram" needs to be retained to prevent
> > > > > > > breaking VMCI:
> > > > > > >
> > > > > > > 1) VMCI datagrams existed prior to h2g/g2h and so operate under a
> > > > > > >    different paradigm. When the vmci transport comes online, it registers
> > > > > > >    itself with the DGRAM feature, but not H2G/G2H. Only later when the
> > > > > > >    transport has more information about its environment does it register
> > > > > > >    H2G or G2H.  In the case that a datagram socket is created after
> > > > > > >    VSOCK_TRANSPORT_F_DGRAM registration but before G2H/H2G registration,
> > > > > > >    the "transport_dgram" transport is the only registered transport and so
> > > > > > >    needs to be used.
> > > > > > >
> > > > > > > 2) VMCI seems to require a special message be sent by the transport when a
> > > > > > >    datagram socket calls bind(). Under the h2g/g2h model, the transport
> > > > > > >    is selected using the remote_addr which is set by connect(). At
> > > > > > >    bind time there is no remote_addr because often no connect() has been
> > > > > > >    called yet: the transport is null. Therefore, with a null transport
> > > > > > >    there doesn't seem to be any good way for a datagram socket to tell the
> > > > > > >    VMCI transport that it has just had bind() called upon it.
> > > > > > >
> > > > > > > With the new fallback logic, after H2G/G2H comes online the socket layer
> > > > > > > will access the VMCI transport via transport_{h2g,g2h}. Prior to H2G/G2H
> > > > > > > coming online, the socket layer will access the VMCI transport via
> > > > > > > "transport_dgram_fallback".
> > > > > > >
> > > > > > > Only transports with a special datagram fallback use-case such as VMCI
> > > > > > > need to register VSOCK_TRANSPORT_F_DGRAM_FALLBACK.
> > > > > > >
> > > > > > > Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> > > > > > > ---
> > > > > > >  drivers/vhost/vsock.c                   |  1 -
> > > > > > >  include/linux/virtio_vsock.h            |  2 --
> > > > > > >  include/net/af_vsock.h                  | 10 +++---
> > > > > > >  net/vmw_vsock/af_vsock.c                | 64 ++++++++++++++++++++++++++-------
> > > > > > >  net/vmw_vsock/hyperv_transport.c        |  6 ----
> > > > > > >  net/vmw_vsock/virtio_transport.c        |  1 -
> > > > > > >  net/vmw_vsock/virtio_transport_common.c |  7 ----
> > > > > > >  net/vmw_vsock/vmci_transport.c          |  2 +-
> > > > > > >  net/vmw_vsock/vsock_loopback.c          |  1 -
> > > > > > >  9 files changed, 58 insertions(+), 36 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > > > > > index ae8891598a48..d5d6a3c3f273 100644
> > > > > > > --- a/drivers/vhost/vsock.c
> > > > > > > +++ b/drivers/vhost/vsock.c
> > > > > > > @@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
> > > > > > >  		.cancel_pkt               = vhost_transport_cancel_pkt,
> > > > > > >
> > > > > > >  		.dgram_enqueue            = virtio_transport_dgram_enqueue,
> > > > > > > -		.dgram_bind               = virtio_transport_dgram_bind,
> > > > > > >  		.dgram_allow              = virtio_transport_dgram_allow,
> > > > > > >
> > > > > > >  		.stream_enqueue           = virtio_transport_stream_enqueue,
> > > > > > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > > > > > > index 18cbe8d37fca..7632552bee58 100644
> > > > > > > --- a/include/linux/virtio_vsock.h
> > > > > > > +++ b/include/linux/virtio_vsock.h
> > > > > > > @@ -211,8 +211,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> > > > > > >  u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> > > > > > >  bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> > > > > > >  bool virtio_transport_stream_allow(u32 cid, u32 port);
> > > > > > > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > > > > > > -				struct sockaddr_vm *addr);
> > > > > > >  bool virtio_transport_dgram_allow(u32 cid, u32 port);
> > > > > > >
> > > > > > >  int virtio_transport_connect(struct vsock_sock *vsk);
> > > > > > > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > > > > > > index 305d57502e89..f6a0ca9d7c3e 100644
> > > > > > > --- a/include/net/af_vsock.h
> > > > > > > +++ b/include/net/af_vsock.h
> > > > > > > @@ -96,13 +96,13 @@ struct vsock_transport_send_notify_data {
> > > > > > >
> > > > > > >  /* Transport features flags */
> > > > > > >  /* Transport provides host->guest communication */
> > > > > > > -#define VSOCK_TRANSPORT_F_H2G		0x00000001
> > > > > > > +#define VSOCK_TRANSPORT_F_H2G			0x00000001
> > > > > > >  /* Transport provides guest->host communication */
> > > > > > > -#define VSOCK_TRANSPORT_F_G2H		0x00000002
> > > > > > > -/* Transport provides DGRAM communication */
> > > > > > > -#define VSOCK_TRANSPORT_F_DGRAM		0x00000004
> > > > > > > +#define VSOCK_TRANSPORT_F_G2H			0x00000002
> > > > > > > +/* Transport provides fallback for DGRAM communication */
> > > > > > > +#define VSOCK_TRANSPORT_F_DGRAM_FALLBACK	0x00000004
> > > > > > >  /* Transport provides local (loopback) communication */
> > > > > > > -#define VSOCK_TRANSPORT_F_LOCAL		0x00000008
> > > > > > > +#define VSOCK_TRANSPORT_F_LOCAL			0x00000008
> > > > > > >
> > > > > > >  struct vsock_transport {
> > > > > > >  	struct module *module;
> > > > > > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > > > > > > index ae5ac5531d96..26c97b33d55a 100644
> > > > > > > --- a/net/vmw_vsock/af_vsock.c
> > > > > > > +++ b/net/vmw_vsock/af_vsock.c
> > > > > > > @@ -139,8 +139,8 @@ struct proto vsock_proto = {
> > > > > > >  static const struct vsock_transport *transport_h2g;
> > > > > > >  /* Transport used for guest->host communication */
> > > > > > >  static const struct vsock_transport *transport_g2h;
> > > > > > > -/* Transport used for DGRAM communication */
> > > > > > > -static const struct vsock_transport *transport_dgram;
> > > > > > > +/* Transport used as a fallback for DGRAM communication */
> > > > > > > +static const struct vsock_transport *transport_dgram_fallback;
> > > > > > >  /* Transport used for local communication */
> > > > > > >  static const struct vsock_transport *transport_local;
> > > > > > >  static DEFINE_MUTEX(vsock_register_mutex);
> > > > > > > @@ -439,6 +439,18 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
> > > > > > >  	return transport;
> > > > > > >  }
> > > > > > >
> > > > > > > +static const struct vsock_transport *
> > > > > > > +vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
> > > > > > > +{
> > > > > > > +	const struct vsock_transport *transport;
> > > > > > > +
> > > > > > > +	transport = vsock_connectible_lookup_transport(cid, flags);
> > > > > > > +	if (transport)
> > > > > > > +		return transport;
> > > > > > > +
> > > > > > > +	return transport_dgram_fallback;
> > > > > > > +}
> > > > > > > +
> > > > > > >  /* Assign a transport to a socket and call the .init transport callback.
> > > > > > >   *
> > > > > > >   * Note: for connection oriented socket this must be called when vsk->remote_addr
> > > > > > > @@ -475,7 +487,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> > > > > > >
> > > > > > >  	switch (sk->sk_type) {
> > > > > > >  	case SOCK_DGRAM:
> > > > > > > -		new_transport = transport_dgram;
> > > > > > > +		new_transport = vsock_dgram_lookup_transport(remote_cid,
> > > > > > > +							     remote_flags);
> > > > > >
> > > > > > I'm a little bit confused about this:
> > > > > > 1) Let's create SOCK_DGRAM socket using vsock_create()
> > > > > > 2) for SOCK_DGRAM it calls 'vsock_assign_transport()' and we go here, remote_cid == -1
> > > > > > 3) I guess 'vsock_dgram_lookup_transport()' calls logic from 0002 and returns h2g for such remote cid, which is not
> > > > > >    correct I think...
> > > > > >
> > > > > > Please correct me if i'm wrong
> > > > > >
> > > > > > Thanks, Arseniy
> > > > > >
> > > > >
> > > > > As I understand, for the VMCI case, if transport_h2g != NULL, then
> > > > > transport_h2g == transport_dgram_fallback. In either case,
> > > > > vsk->transport == transport_dgram_fallback.
> > > > >
> > > > > For the virtio/vhost case, temporarily vsk->transport == transport_h2g,
> > > > > but it is unused because vsk->transport->dgram_bind == NULL.
> > > > >
> > > > > Until SS_CONNECTED is set by connect() and vsk->transport is set
> > > > > correctly, the send path is barred from using the bad transport.
> > > > >
> > > > > I guess the recvmsg() path is a little more sketchy, and probably only
> > > > > works in my test cases because h2g/g2h in the vhost/virtio case have
> > > > > identical dgram_addr_init() implementations.
> > > > >
> > > > > I think a cleaner solution is maybe checking in vsock_create() if
> > > > > dgram_bind is implemented. If it is not, then vsk->transport should be
> > > > > reset to NULL and a comment added explaining why VMCI requires this.
> > > > >
> > > > > Then the other calls can begin explicitly checking for vsk->transport ==
> > > > > NULL.
> > > >
> > > > Actually, on further reflection here, in order for the vsk->transport to
> > > > be called in time for ->dgram_addr_init(), it is going to be necessary
> > > > to call vsock_assign_transport() in vsock_dgram_bind() anyway.
> > > >
> > > > I think this means that the vsock_assign_transport() call can be removed
> > > > from vsock_create() call entirely, and yet VMCI can still dispatch
> > > > messages upon bind() calls as needed.
> > > >
> > > > This would then simplify the whole arrangement, if there aren't other
> > > > unseen issues.
> > > 
> > > This sounds like a good approach.
> > > 
> > > My only question is whether vsock_dgram_bind() is always called for each
> > > dgram socket.
> > > 
> > 
> > No, not yet.
> > 
> > Currently, receivers may use vsock_dgram_recvmsg() prior to any bind,
> > but this should probably change.
> > 
> > For UDP, if we initialize a socket and call recvmsg() with no prior
> > bind, then the socket will be auto-bound to 0.0.0.0. I guess vsock
> > should probably also auto-bind in this case.
> 
> I see.
> 
> > 
> > For other cases, bind may not be called prior to calls to vsock_poll() /
> > vsock_getname() (even if it doesn't make sense to do so), but I think it
> > is okay as long as vsk->transport is not used.
> 
> Makes sense.
> 
> > 
> > vsock_dgram_sendmsg() always auto-binds if needed.
> 
> Okay, but the transport for sending messages, doesn't depend on the
> local address, right?

That is correct.

Best,
Bobby

  reply	other threads:[~2023-08-04 17:02 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-19  0:50 [PATCH RFC net-next v5 00/14] virtio/vsock: support datagrams Bobby Eshleman
2023-07-19  0:50 ` [PATCH RFC net-next v5 01/14] af_vsock: generalize vsock_dgram_recvmsg() to all transports Bobby Eshleman
2023-07-24 18:11   ` Arseniy Krasnov
2023-07-26 18:21     ` Bobby Eshleman
2023-07-27  7:53       ` Arseniy Krasnov
2023-07-19  0:50 ` [PATCH RFC net-next v5 02/14] af_vsock: refactor transport lookup code Bobby Eshleman
2023-07-19  0:50 ` [PATCH RFC net-next v5 03/14] af_vsock: support multi-transport datagrams Bobby Eshleman
2023-07-22 21:53   ` Arseniy Krasnov
2023-08-02 22:24     ` Bobby Eshleman
2023-08-03  0:53       ` Bobby Eshleman
2023-08-03 12:42         ` Stefano Garzarella
2023-08-03 12:42           ` Stefano Garzarella
2023-08-03 18:58           ` Bobby Eshleman
2023-08-04 14:11             ` Stefano Garzarella
2023-08-04 14:11               ` Stefano Garzarella
2023-08-04 17:01               ` Bobby Eshleman [this message]
2023-07-19  0:50 ` [PATCH RFC net-next v5 04/14] af_vsock: generalize bind table functions Bobby Eshleman
2023-07-19  0:50 ` [PATCH RFC net-next v5 05/14] af_vsock: use a separate dgram bind table Bobby Eshleman
2023-07-19  0:50 ` [PATCH RFC net-next v5 06/14] virtio/vsock: add VIRTIO_VSOCK_TYPE_DGRAM Bobby Eshleman
2023-07-19  0:50 ` [PATCH RFC net-next v5 07/14] virtio/vsock: add common datagram send path Bobby Eshleman
2023-07-22  8:16   ` Arseniy Krasnov
2023-07-26 17:08     ` Bobby Eshleman
2023-07-27  7:57       ` Arseniy Krasnov
2023-08-02 20:08         ` Bobby Eshleman
2023-07-19  0:50 ` [PATCH RFC net-next v5 08/14] af_vsock: add vsock_find_bound_dgram_socket() Bobby Eshleman
2023-07-19  0:50 ` [PATCH RFC net-next v5 09/14] virtio/vsock: add common datagram recv path Bobby Eshleman
2023-07-19  0:50 ` [PATCH RFC net-next v5 10/14] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit Bobby Eshleman
2023-07-26 18:38   ` Michael S. Tsirkin
2023-07-26 18:38     ` Michael S. Tsirkin
2023-07-27  7:48     ` Stefano Garzarella
2023-07-27  7:48       ` Stefano Garzarella
2023-08-01  4:30       ` Bobby Eshleman
2023-08-01 16:04         ` Stefano Garzarella
2023-08-01 16:04           ` Stefano Garzarella
2023-07-19  0:50 ` [PATCH RFC net-next v5 11/14] vhost/vsock: implement datagram support Bobby Eshleman
2023-07-22  8:42   ` Arseniy Krasnov
2023-07-26 17:55     ` Bobby Eshleman
2023-07-27  8:00       ` Arseniy Krasnov
2023-08-02 21:23         ` Bobby Eshleman
2023-08-04 13:16           ` Arseniy Krasnov
2023-07-27  8:04       ` Arseniy Krasnov
2023-07-26 18:40   ` Michael S. Tsirkin
2023-07-26 18:40     ` Michael S. Tsirkin
2023-08-01  4:26     ` Bobby Eshleman
2023-08-02 19:28     ` Bobby Eshleman
2023-08-12  8:01       ` Bobby Eshleman
2023-07-19  0:50 ` [PATCH RFC net-next v5 12/14] vsock/loopback: " Bobby Eshleman
2023-07-19  0:50 ` [PATCH RFC net-next v5 13/14] virtio/vsock: " Bobby Eshleman
2023-07-22  8:45   ` Arseniy Krasnov
2023-07-26 17:58     ` Bobby Eshleman
2023-07-27  8:09       ` Arseniy Krasnov
2023-08-01  4:14         ` [PATCH RFC net-next v5 13/14] virtio/vsock: implement datagram supporty Bobby Eshleman
2023-07-19  0:50 ` [PATCH RFC net-next v5 14/14] test/vsock: add vsock dgram tests Bobby Eshleman
2023-07-22  8:51 ` [PATCH RFC net-next v5 00/14] virtio/vsock: support datagrams Arseniy Krasnov
2023-07-26 18:02 ` Bobby Eshleman
2023-07-27  7:51 ` Michael S. Tsirkin
2023-07-27  7:51   ` Michael S. Tsirkin
2023-08-01  5:30   ` Bobby Eshleman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZM0u09yiCDA6Bbmy@bullseye \
    --to=bobbyeshleman@gmail.com \
    --cc=bobby.eshleman@bytedance.com \
    --cc=bpf@vger.kernel.org \
    --cc=bryantan@vmware.com \
    --cc=dan.carpenter@linaro.org \
    --cc=davem@davemloft.net \
    --cc=decui@microsoft.com \
    --cc=edumazet@google.com \
    --cc=haiyangz@microsoft.com \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=oxffffaa@gmail.com \
    --cc=pabeni@redhat.com \
    --cc=pv-drivers@vmware.com \
    --cc=sgarzare@redhat.com \
    --cc=simon.horman@corigine.com \
    --cc=stefanha@redhat.com \
    --cc=vdasa@vmware.com \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=wei.liu@kernel.org \
    --cc=xuanzhuo@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.