netdev.vger.kernel.org archive mirror
From: John Fastabend <john.fastabend@gmail.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Shrijeet Mukherjee <shm@cumulusnetworks.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Thomas Graf <tgraf@suug.ch>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Jakub Kicinski <kubakici@wp.pl>,
	David Miller <davem@davemloft.net>,
	alexander.duyck@gmail.com, shrijeet@gmail.com,
	tom@herbertland.com, netdev@vger.kernel.org,
	Roopa Prabhu <roopa@cumulusnetworks.com>,
	Nikolay Aleksandrov <nikolay@cumulusnetworks.com>,
	aconole@redhat.com
Subject: Re: [PATCH net-next RFC WIP] Patch for XDP support for virtio_net
Date: Thu, 3 Nov 2016 16:29:22 -0700	[thread overview]
Message-ID: <581BC852.7010304@gmail.com> (raw)
In-Reply-To: <20161104002503-mutt-send-email-mst@kernel.org>

[...]

>>> - when XDP is attached disable all LRO using VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET
>>>   (not used by driver so far, designed to allow dynamic LRO control with
>>>    ethtool)
>>
>> I see there is a UAPI bit for this, but I guess we also need to add
>> support to vhost? Otherwise it seems we may just drop a bunch
>> of packets on the floor out of handle_rx() when recvmsg returns more
>> than a page size. Or did I read this wrong...
> 
> It's already supported host side. However you might
> get some packets that were in flight when you attached.
> 

Really? I must have missed it; I don't see any *GUEST_FEATURES* flag in
./drivers/vhost/?

>>> - start adding page-sized buffers
>>
>> I started to mangle add_recvbuf_big() and receive_big() here and this
>> didn't seem too bad.
> 
> I imagine it won't be ATM but I think we'll need to support
> mrg buffers with time and then it will be messy.
> Besides, it's not an architectural thing that receive_big
> uses page sized buffers, it could use any size.
> So a separate path just for xdp would be better imho.
> 
>>> - do something with non-page-sized buffers added previously - what
>>>   exactly? copy I guess? What about LRO packets that are too large -
>>>   can we drop or can we split them up?
>>
>> hmm not sure I understand this here. With LRO disabled and mergeable
>> buffers disabled all packets should fit in a page correct?
> 
> Assuming F_MTU is negotiated and the MTU field is small enough, yes.
> But if you disable mrg buffers dynamically you will get some packets
> in buffers that were added before the disable.
> Similarly for disabling LRO dynamically.
> 
>> For the LRO enabled case I guess to start with we block XDP from being
>> loaded, for the same reason we don't allow jumbo frames on physical
>> nics.
> 
> If you ask the host to disable the capability, then yes, it's easy.
> Let's do that for now, it's a start.
> 
> 
>>>
>>> I'm fine with disabling XDP for some configurations as the first step,
>>> and we can add that support later.
>>>
>>
>> In order for this to work though I guess we need to be able to
>> dynamically disable mergeable buffers; at the moment I just commented
>> it out of the features list and fixed up virtio_has_features so it
>> won't BUG_ON.
> 
> For now we can just set mrg_rxbuf=off on qemu command line, and
> fail XDP attach if not there. I think we'll be able to support it
> long term but you will need host side changes, or fully reset
> device and reconfigure it.

See question below. I agree that with mrg_rxbuf=off, lro=off, and an
xdp receive path this becomes relatively straightforward and clean, with
the MTU patch noted below as well.

> 
>>> Ideas about mergeable buffers (optional):
>>>
>>> At the moment mergeable buffers can't be disabled dynamically.
>>> They do bring a small benefit for XDP if host MTU is large (see below)
>>> and aren't hard to support:
>>> - if header is by itself skip 1st page
>>> - otherwise copy all data into first page
>>> and it's nicer not to add random limitations that require guest reboot.
>>> It might make sense to add a command that disables/enabled
>>> mergeable buffers dynamically but that's for newer hosts.
>>
>> Yep it seems disabling mergeable buffers solves this but didn't look at
>> it too closely. I'll look closer tomorrow.
>>
>>>
>>> Spec does not require it but in practice most hosts put all data
>>> in the 1st page or all in the 2nd page so the copy will be nop
>>> for these cases.
>>>
>>> Large host MTU - newer hosts report the host MTU, older ones don't.
>>> Using mergeable buffers we can at least detect this case
>>> (and then what? drop I guess).
>>>
>>
>> The physical nics just refuse to load XDP with large MTU.
> 
> So let's do the same for now, unfortunately you don't know
> the MTU unless _F_MTU is negotiated and QEMU does not
> implement that yet, but it's easy to add.
> In fact I suspect Aaron (cc) has an implementation since
> he posted a patch implementing that.
> Aaron could you post it pls?
> 

Great! Aaron, if you want me to review/test at all, let me know. I have
a few systems set up running this now, so I can help if needed.

>> Any reason
>> not to negotiate the mtu with the guest so that the guest can force
>> this?
> 
> There are generally many guests and many NICs on the host.
> A big packet arrives, what do you want to do with it?

Drop it, just like a physical nic would if the packet is larger
than the MTU. Maybe splat a message in the log so the user has some
clue that something got misconfigured.

> We probably want to build propagating MTU across all VMs and NICs
> but let's get a basic thing merged first.

That feels like an orchestration/QEMU type problem to me. Just because
some NIC has jumbo frames enabled doesn't necessarily mean they would
ever get to any specific VM based on forwarding configuration.

And to merge in the last email I sent out here: in mergeable and
big_packets modes, if LRO is off and MTU < PAGE_SIZE, it seems we should
always get physically contiguous data in a single page, correct? It
may be at some offset in the page, but the offset should not
matter to XDP. If I read this right we wouldn't need to add a new
XDP mode and could just use the existing merge or big modes. This would
seem cleaner to me than adding a new mode and requiring a qemu option.

Thanks for all the pointers, by the way; it's very helpful.

