Re: [PATCH net-next RFC WIP] Patch for XDP support for virtio_net

From: "Michael S. Tsirkin" <mst@redhat.com>
To: David Miller <davem@davemloft.net>
Cc: john.fastabend@gmail.com, alexander.duyck@gmail.com,
	brouer@redhat.com, shrijeet@gmail.com, tom@herbertland.com,
	netdev@vger.kernel.org, shm@cumulusnetworks.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com
Subject: Re: [PATCH net-next RFC WIP] Patch for XDP support for virtio_net
Date: Mon, 31 Oct 2016 00:53:53 +0200
Message-ID: <20161031004709-mutt-send-email-mst@kernel.org>
In-Reply-To: <20161028.131101.1982236040609043994.davem@davemloft.net>

On Fri, Oct 28, 2016 at 01:11:01PM -0400, David Miller wrote:
> From: John Fastabend <john.fastabend@gmail.com>
> Date: Fri, 28 Oct 2016 08:56:35 -0700
> 
> > On 16-10-27 07:10 PM, David Miller wrote:
> >> From: Alexander Duyck <alexander.duyck@gmail.com>
> >> Date: Thu, 27 Oct 2016 18:43:59 -0700
> >> 
> >>> On Thu, Oct 27, 2016 at 6:35 PM, David Miller <davem@davemloft.net> wrote:
> >>>> From: "Michael S. Tsirkin" <mst@redhat.com>
> >>>> Date: Fri, 28 Oct 2016 01:25:48 +0300
> >>>>
> >>>>> On Thu, Oct 27, 2016 at 05:42:18PM -0400, David Miller wrote:
> >>>>>> From: "Michael S. Tsirkin" <mst@redhat.com>
> >>>>>> Date: Fri, 28 Oct 2016 00:30:35 +0300
> >>>>>>
> >>>>>>> Something I'd like to understand is how does XDP address the
> >>>>>>> problem that 100Byte packets are consuming 4K of memory now.
> >>>>>>
> >>>>>> Via page pools.  We're going to make a generic one, but right now
> >>>>>> each and every driver implements a quick list of pages to allocate
> >>>>>> from (and thus avoid the DMA map/unmap overhead, etc.)
> >>>>>
> >>>>> So to clarify, ATM virtio doesn't attempt to avoid dma map/unmap
> >>>>> so there should be no issue with that even when using sub-page
> >>>>> regions, assuming DMA APIs support sub-page map/unmap correctly.
> >>>>
> >>>> That's not what I said.
> >>>>
> >>>> The page pools are meant to address the performance degradation from
> >>>> going to having one packet per page for the sake of XDP's
> >>>> requirements.
> >>>>
> >>>> You still need to have one packet per page for correct XDP operation
> >>>> whether you do page pools or not, and whether you have DMA mapping
> >>>> (or its equivalent virtualization operation) or not.
> >>>
> >>> Maybe I am missing something here, but why do you need to limit things
> >>> to one packet per page for correct XDP operation?  Most of the drivers
> >>> out there now are usually storing something closer to at least 2
> >>> packets per page, and with the DMA API fixes I am working on there
> >>> should be no issue with changing the contents inside those pages since
> >>> we won't invalidate or overwrite the data after the DMA buffer has
> >>> been synchronized for use by the CPU.
> >> 
> >> Because with SKB's you can share the page with other packets.
> >> 
> >> With XDP you simply cannot.
> >> 
> >> It's software semantics that are the issue.  SKB frag list pages
> >> are read only, XDP packets are writable.
> >> 
> >> This has nothing to do with "writability" of the pages wrt. DMA
> >> mapping or cpu mappings.
> >> 
> > 
> > Sorry I'm not seeing it either. The current xdp_buff is defined
> > by,
> > 
> >   struct xdp_buff {
> > 	void *data;
> > 	void *data_end;
> >   };
> > 
> > The verifier has an xdp_is_valid_access() check to ensure we don't go
> > past data_end. The page for now at least never leaves the driver. For
> > the work to get xmit to other devices working I'm still not sure I see
> > any issue.
> 
> I guess I can say that the packets must be "writable" until I'm blue
> in the face but I'll say it again, semantically writable pages are a
> requirement.  And if multiple packets share a page this requirement
> is not satisfied.
> 
> Also, we want to do several things in the future:
> 
> 1) Allow push/pop of headers via eBPF code, which means we need
>    headroom.

I think that with e.g. LRO or a large MTU, page per packet
does not guarantee headroom.
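
To put rough numbers on this, here is a minimal sketch. It assumes a
4096-byte page holding exactly one frame, with the frame placed at the
end of the page. The names EXAMPLE_PAGE_SIZE, headroom_in_page() and
can_push_header() are made up for illustration; they are not kernel
APIs and not what any driver actually does.

```c
#include <stddef.h>

/* Assumed page size, for illustration only. */
#define EXAMPLE_PAGE_SIZE 4096u

/*
 * Bytes left in front of the packet when a single frame occupies a
 * page and sits at the end of it. Returns 0 when the frame fills or
 * exceeds the page, which is the LRO / large MTU case.
 */
static size_t headroom_in_page(size_t frame_len)
{
	if (frame_len >= EXAMPLE_PAGE_SIZE)
		return 0;
	return EXAMPLE_PAGE_SIZE - frame_len;
}

/* Hypothetical check: can push_len bytes of header be prepended in place? */
static int can_push_header(size_t frame_len, size_t push_len)
{
	return headroom_in_page(frame_len) >= push_len;
}
```

Under these assumptions a 100-byte frame leaves 3996 bytes unused in
front of it, while a frame of 4096 bytes or more leaves none, so
dedicating a page to the packet does not by itself provide headroom.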

> 2) Transparently zero-copy pass packets into userspace, basically
>    the user will have a semi-permanently mapped ring of all the
>    packet pages sitting in the RX queue of the device and the
>    page pool associated with it.  This way we avoid all of the
>    TLB flush/map overhead for the user's mapping of the packets
>    just as we avoid the DMA map/unmap overhead.

Looks like you can share pages between packets as long as
they all come from the same pool and so are accessible
to the same userspace.

> And that's just the beginning.
> 
> I'm sure others can come up with more reasons why we have this
> requirement.

I'm still a bit confused :( Is this a requirement of the current code or
to enable future extensions?

-- 
MST
