public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Bobby Eshleman <bobbyeshleman@gmail.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Bobby Eshleman <bobby.eshleman@gmail.com>,
	Bobby Eshleman <bobby.eshleman@bytedance.com>,
	Cong Wang <cong.wang@bytedance.com>,
	Jiang Wang <jiang.wang@bytedance.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Stefano Garzarella <sgarzare@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/6] vsock: add netdev to vhost/virtio vsock
Date: Tue, 16 Aug 2022 06:18:54 +0000	[thread overview]
Message-ID: <Yvs2qqx/sG0C3zvz@bullseye> (raw)
In-Reply-To: <20220816123701-mutt-send-email-mst@kernel.org>

On Tue, Aug 16, 2022 at 12:38:52PM -0400, Michael S. Tsirkin wrote:
> On Mon, Aug 15, 2022 at 10:56:06AM -0700, Bobby Eshleman wrote:
> > In order to support usage of qdisc on vsock traffic, this commit
> > introduces a struct net_device to vhost and virtio vsock.
> > 
> > Two new devices are created, vhost-vsock for vhost and virtio-vsock
> > for virtio. The devices are attached to the respective transports.
> > 
> > To bypass the usage of the device, the user may "down" the associated
> > network interface using common tools. For example, "ip link set dev
> > virtio-vsock down" lets vsock bypass the net_device and qdisc entirely,
> > simply using the FIFO logic of the prior implementation.
> 
> Ugh. That's quite a hack. Mark my words, at some point we will decide to
> have down mean "down".  Besides, multiple net devices with link up tend
> to confuse userspace. So might want to keep it down at all times
> even short term.
> 

I have to admit, this choice was born more of perceived necessity than
of a love for the design... but I can explain the pain points that led
to the current state, which I hope sparks some discussion.

When the state is down, dev_queue_xmit() will fail. To avoid this and
preserve the "zero-configuration" guarantee of vsock, I chose to make
transmission work regardless of device state by implementing this
"ignore up/down state" hack.

This is unfortunate because what we are really after here is just packet
scheduling, i.e., qdisc. We don't really need the rest of the
net_device, and I don't think up/down buys us anything of value. The
relationship between qdisc and net_device is so tightly knit together
though, that using qdisc without a net_device doesn't look very
practical (and maybe impossible).

Some alternative routes might be:

1) Default the state to up, and let users disable vsock by downing the
   device if they'd like. It still works out-of-the-box, but if users
   really want to disable vsock they may.

2) vsock may simply turn the device to state up when a socket is first
   used. For instance, the HCI device in net/bluetooth/hci_* uses a
   technique where the net_device is turned to up when bind() is called on
   any HCI socket (BTPROTO_HCI). It can also be turned up/down via
   ioctl().

3) Modify net_device registration to allow us to have an invisible
   device that is only known by the kernel. It may default to up and remain
   unchanged. The qdisc controls alone may be exposed to userspace,
   hopefully via netlink to still work with tc. This is not
   currently supported by register_netdevice(), but a series from 2007 was
   sent to the ML, tentatively approved in concept, but was never merged[1].

4) Currently NETDEV_UP/NETDEV_DOWN commands can't be vetoed.
   NETDEV_PRE_UP, however, is used to effectively veto NETDEV_UP
   commands[2]. We could introduce NETDEV_PRE_DOWN to support vetoing of
   NETDEV_DOWN. This would allow us to install a hook to determine if
   we actually want to allow the device to be downed.

In an ideal world, we could simply pass a set of netdev queues, a
packet, and maybe a blob of state to qdisc and let it work its
scheduling magic...

Any thoughts?

[1]: https://lore.kernel.org/netdev/20070129140958.0cf6880f@freekitty/
[2]: https://lore.kernel.org/all/20090529.220906.243061042.davem@davemloft.net/

Thanks,
Bobby

  reply	other threads:[~2022-08-16 21:21 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-15 17:56 [PATCH 0/6] virtio/vsock: introduce dgrams, sk_buff, and qdisc Bobby Eshleman
2022-08-15 17:56 ` [PATCH 1/6] vsock: replace virtio_vsock_pkt with sk_buff Bobby Eshleman
2022-08-16  2:30   ` Bobby Eshleman
2022-08-15 17:56 ` [PATCH 2/6] vsock: return errors other than -ENOMEM to socket Bobby Eshleman
2022-08-15 20:01   ` kernel test robot
2022-08-15 23:13   ` kernel test robot
2022-08-16  2:16   ` kernel test robot
2022-08-16  2:30   ` Bobby Eshleman
2022-08-17  5:28     ` [virtio-dev] " Arseniy Krasnov
2022-09-26 13:21   ` Stefano Garzarella
2022-09-26 21:30     ` Bobby Eshleman
2022-08-15 17:56 ` [PATCH 3/6] vsock: add netdev to vhost/virtio vsock Bobby Eshleman
2022-08-16  2:31   ` Bobby Eshleman
2022-08-16 16:38   ` Michael S. Tsirkin
2022-08-16  6:18     ` Bobby Eshleman [this message]
2022-08-16 18:07     ` Jakub Kicinski
2022-08-16  7:02       ` Bobby Eshleman
2022-08-16 23:07         ` Jakub Kicinski
2022-08-16  8:29           ` Bobby Eshleman
2022-08-17  1:15             ` Jakub Kicinski
2022-08-16 10:50               ` Bobby Eshleman
2022-08-17 17:20                 ` Michael S. Tsirkin
2022-08-18  4:34                   ` Jason Wang
2022-08-17  1:23           ` [External] " Cong Wang .
2022-09-06 10:58   ` Michael S. Tsirkin
2022-08-18 14:20     ` Bobby Eshleman
2022-08-15 17:56 ` [PATCH 4/6] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit Bobby Eshleman
2022-08-16  2:31   ` Bobby Eshleman
2022-09-26 13:17   ` Stefano Garzarella
2022-09-26 21:52     ` Bobby Eshleman
2022-08-15 17:56 ` [PATCH 5/6] virtio/vsock: add support for dgram Bobby Eshleman
2022-08-15 21:02   ` kernel test robot
2022-08-16  2:32   ` Bobby Eshleman
2022-08-17  5:01     ` [virtio-dev] " Arseniy Krasnov
2022-08-16  9:57       ` Bobby Eshleman
2022-08-18  8:24         ` Arseniy Krasnov
2022-08-17  5:42       ` Arseniy Krasnov
2022-08-16  9:58         ` Bobby Eshleman
2022-08-18  8:35           ` Arseniy Krasnov
2022-08-16 20:52             ` Bobby Eshleman
2022-08-19  4:30               ` Arseniy Krasnov
2022-08-15 17:56 ` [PATCH 6/6] vsock_test: add tests for vsock dgram Bobby Eshleman
2022-08-16  2:32   ` Bobby Eshleman
2022-08-15 20:39 ` [PATCH 0/6] virtio/vsock: introduce dgrams, sk_buff, and qdisc Michael S. Tsirkin
2022-08-16  1:55   ` Bobby Eshleman
2022-08-16  2:29 ` Bobby Eshleman
     [not found] ` <CAGxU2F7+L-UiNPtUm4EukOgTVJ1J6Orqs1LMvhWWvfL9zWb23g@mail.gmail.com>
2022-08-16  2:35   ` Bobby Eshleman
2022-08-17  6:54 ` Michael S. Tsirkin
2022-08-16  9:42   ` Bobby Eshleman
2022-08-17 17:02     ` Michael S. Tsirkin
2022-08-16 11:08       ` Bobby Eshleman
2022-08-17 17:53         ` Michael S. Tsirkin
2022-08-16 12:10           ` Bobby Eshleman
2022-08-18  4:28   ` Jason Wang
2022-09-06  9:00     ` Stefano Garzarella
2022-09-06 13:26 ` Stefan Hajnoczi
2022-08-18 14:39   ` Bobby Eshleman
2022-09-08  8:30     ` Stefano Garzarella
2022-09-08 14:36     ` Call to discuss vsock netdev/sk_buff [was Re: [PATCH 0/6] virtio/vsock: introduce dgrams, sk_buff, and qdisc] Stefano Garzarella
2022-09-09 18:13       ` Bobby Eshleman
2022-09-12 18:12         ` Stefano Garzarella
2022-09-09 23:33           ` Bobby Eshleman
2022-09-16  3:51             ` Stefano Garzarella
2022-09-10 16:29               ` Bobby Eshleman
2022-09-26 13:42 ` [PATCH 0/6] virtio/vsock: introduce dgrams, sk_buff, and qdisc Stefano Garzarella
2022-09-26 21:44   ` Bobby Eshleman
2022-09-27 17:45   ` Stefano Garzarella

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yvs2qqx/sG0C3zvz@bullseye \
    --to=bobbyeshleman@gmail.com \
    --cc=bobby.eshleman@bytedance.com \
    --cc=bobby.eshleman@gmail.com \
    --cc=cong.wang@bytedance.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=jasowang@redhat.com \
    --cc=jiang.wang@bytedance.com \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sgarzare@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox