netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ian Campbell <ian.campbell@docker.com>
To: "Michael S. Tsirkin" <mst@redhat.com>,
	Stefan Hajnoczi <stefanha@gmail.com>
Cc: marius vlad <marius.vlad0@gmail.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	kvm@vger.kernel.org, netdev@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>,
	Matt Benjamin <mbenjamin@redhat.com>,
	Greg Kurz <gkurz@linux.vnet.ibm.com>,
	Christoffer Dall <christoffer.dall@linaro.org>
Subject: Re: [RFC v5 0/5] Add virtio transport for AF_VSOCK
Date: Tue, 12 Apr 2016 17:07:49 +0100	[thread overview]
Message-ID: <1460477269.24107.1.camel@docker.com> (raw)
In-Reply-To: <20160411154517-mutt-send-email-mst@redhat.com>

Some how Stefan's reply disapeared from my INBOX (although I did see
it) so replying here.

On Mon, 2016-04-11 at 15:54 +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 11, 2016 at 11:45:48AM +0100, Stefan Hajnoczi wrote:
> > 
> > On Fri, Apr 08, 2016 at 04:35:05PM +0100, Ian Campbell wrote:
> > > 
> > > On Fri, 2016-04-01 at 15:23 +0100, Stefan Hajnoczi wrote:
> > > > 
> > > > This series is based on Michael Tsirkin's vhost branch (v4.5-rc6).
> > > > 
> > > > I'm about to process Claudio Imbrenda's locking fixes for virtio-vsock but
> > > > first I want to share the latest version of the code.  Several people are
> > > > playing with vsock now so sharing the latest code should avoid duplicate work.
> > > Thanks for this, I've been using it in my project and it mostly seems
> > > fine.
> > > 
> > > One wrinkle I came across, which I'm not sure if it is by design or a
> > > problem is that I can see this sequence coming from the guest (with
> > > other activity in between):
> > > 
> > >     1) OP_SHUTDOWN w/ flags == SHUTDOWN_RX
> > >     2) OP_SHUTDOWN w/ flags == SHUTDOWN_TX
> > >     3) OP_SHUTDOWN w/ flags == SHUTDOWN_TX|SHUTDOWN_RX
> > > 
> > > I orignally had my backend close things down at #2, however this meant
> > > that when #3 arrived it was for a non-existent socket (or, worse, an
> > > active one if the ports got reused). I checked v5 of the spec
> > > proposal[0] which says:
> > >     If these bits are set and there are no more virtqueue buffers
> > >     pending the socket is disconnected.
> > > 
> > > but I'm not entirely sure if this behaviour contradicts this or not
> > > (the bits have both been set at #2, but not at the same time).
> > > 
> > > BTW, how does one tell if there are no more virtqueue buffers pending
> > > or not while processing the op?
> > #2 is odd.  The shutdown bits are sticky so they cannot be cleared once
> > set.  I would have expected just #1 and #3.  The behavior you observe
> > look like a bug.
> > 
> > The spec text does not convey the meaning of OP_SHUTDOWN well.
> > OP_SHUTDOWN SHUTDOWN_TX|SHUTDOWN_RX means no further rx/tx is possible
> > for this connection.  "there are no more virtqueue buffers pending the
> > socket" really means that this isn't an immediate close from the
> > perspective of the application.  If the application still has unread rx
> > buffers then the socket stays readable until the rx data has been fully
> > read.

Thanks, distinguishing the local buffer to the application from the
vring would make that clearer. Perhaps by not talking about "virtqueue
buffers" since they sound like a vring thing.

However, as Michael observes I'm not sure that's the whole story.

> Yes but you also wrote:
> 	If these bits are set and there are no more virtqueue buffers
> 	pending the socket is disconnected.
> 
> how does remote know that there are no buffers pending and so it's safe
> to reuse the same source/destination address now?

Indeed this is one of the things I struggled with. e.g. If I send a
SHUTDOWN_RX to my peer am I supposed to wait for that buffer to come
back (so I know the peer has seen it) and then wait for an entire
"cycle" of the TX ring to know there is nothing still in flight? That's
some tricky book-keeping.

>   Maybe destination
> should send RST at that point?

i.e. upon receipt of SHUTDOWN_RX|SHUTDOWN_TX from the peer you are
expected to send a RST. When the peer observes that then they know
there is no further data in that connection on the ring?

That sounds like it would be helpful.

> > > Another thing I noticed, which is really more to do with the generic
> > > AF_VSOCK bits than anything to do with your patches is that there is no
> > > limitations on which vsock ports a non-privileged user can bind to and
> > > relatedly that there is no netns support so e.g. users in unproivileged
> > > containers can bind to any vsock port and talk to the host, which might
> > > be undesirable. For my use for now I just went with the big hammer
> > > approach of denying access from anything other than init_net
> > > namespace[1] while I consider what the right answer is.
> > From the vhost point of view each netns should have its own AF_VSOCK
> > namespace.  This way two containers could act as "the host" (CID 2) for
> > their respective guests.

When you say "should" you mean that's the intended design as opposed to
what the current code is actually doing, right?

Ian.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

      parent reply	other threads:[~2016-04-12 16:07 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-01 14:23 [RFC v5 0/5] Add virtio transport for AF_VSOCK Stefan Hajnoczi
2016-04-01 14:23 ` [RFC v5 1/5] VSOCK: transport-specific vsock_transport functions Stefan Hajnoczi
2016-04-01 14:23 ` [RFC v5 2/5] VSOCK: Introduce virtio_vsock_common.ko Stefan Hajnoczi
2016-04-01 14:23 ` [RFC v5 3/5] VSOCK: Introduce virtio_transport.ko Stefan Hajnoczi
2016-04-01 14:23 ` [RFC v5 4/5] VSOCK: Introduce vhost_vsock.ko Stefan Hajnoczi
2016-04-01 14:23 ` [RFC v5 5/5] VSOCK: Add Makefile and Kconfig Stefan Hajnoczi
2016-04-08 15:35 ` [RFC v5 0/5] Add virtio transport for AF_VSOCK Ian Campbell
2016-04-11 10:45   ` Stefan Hajnoczi
2016-04-11 12:54     ` Michael S. Tsirkin
2016-04-12 13:59       ` Stefan Hajnoczi
2016-04-12 16:37         ` Ian Campbell
2016-04-13 13:38           ` Stefan Hajnoczi
2016-04-12 16:07       ` Ian Campbell [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1460477269.24107.1.camel@docker.com \
    --to=ian.campbell@docker.com \
    --cc=christoffer.dall@linaro.org \
    --cc=gkurz@linux.vnet.ibm.com \
    --cc=imbrenda@linux.vnet.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=marius.vlad0@gmail.com \
    --cc=mbenjamin@redhat.com \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=stefanha@gmail.com \
    --cc=stefanha@redhat.com \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).