All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
	"Florian Westphal" <fw@strlen.de>,
	netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
	"Toke Høiland-Jørgensen" <toke@toke.dk>,
	ard.biesheuvel@linaro.org, "Jason Wang" <jasowang@redhat.com>,
	BjörnTöpel <bjorn.topel@intel.com>,
	w@1wt.eu, "Saeed Mahameed" <saeedm@mellanox.com>,
	mykyta.iziumtsev@gmail.com,
	"Daniel Borkmann" <borkmann@iogearbox.net>,
	"Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
	"Tariq Toukan" <tariqt@mellanox.com>,
	brouer@redhat.com
Subject: Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API
Date: Sat, 8 Dec 2018 21:11:53 +0100	[thread overview]
Message-ID: <20181208211153.68f1424e@redhat.com> (raw)
In-Reply-To: <db053651-5de2-26c3-8fae-1a077a665dd0@gmail.com>

On Sat, 8 Dec 2018 11:26:56 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On 12/08/2018 06:57 AM, Ilias Apalodimas wrote:
> > Hi Eric,   
> >>>> This patch is changing struct sk_buff, and is thus per-definition
> >>>> controversial.
> >>>>
> >>>> Place a new member 'mem_info' of type struct xdp_mem_info, just after
> >>>> members (flags) head_frag and pfmemalloc, And not in between
> >>>> headers_start/end to ensure skb_copy() and pskb_copy() work as-is.
> >>>> Copying mem_info during skb_clone() is required.  This makes sure that
> >>>> pages are correctly freed or recycled during the altered
> >>>> skb_free_head() invocation.  
> >>>
> >>> I read this to mean that this 'info' isn't accessed/needed until skb
> >>> is freed.  Any reason its not added at the end?
> >>>
> >>> This would avoid moving other fields that are probably accessed
> >>> more frequently during processing.
> >>>  
> >>
> >> But I do not get why the patch is needed.
> >>
> >> Adding extra cost for each skb destruction is costly.
> >>
> >> I though XDP was all about _not_ having skbs.  
> > 
> > You hit the only point i don't personally like in the code, xdp info in the 
> > skb. Considering the benefits i am convinced this is ok and here's why:
> >   
> >> Please let's do not slow down the non XDP stack only to make XDP more appealing.  
> > 
> > We are not trying to do that, on the contrary. The patchset has nothing towards
> > speeding XDP and slowing down anything else. The patchset speeds up the 
> > mvneta driver on the default network stack. The only change that was needed was
> > to adapt the driver to using the page_pool API. The speed improvements we are
> > seeing on specific workloads (i.e 256b < packet < 400b) are almost 3x.
> > 
> > Lots of high speed drivers are doing similar recycling tricks themselves (and
> > there's no common code, everyone is doing something similar though). All we are
> > trying to do is provide a unified API to make that easier for the rest. Another
> > advantage is that if the some drivers switch to the API, adding XDP
> > functionality on them is pretty trivial.
> > 
> > Moreover our tests are only performed on systems without or with SMMU disabled.
> > On a platform i have access to, enabling and disabling the SMMU has some
> > performance impact. By keeping the buffers mapped we believe this impact
> > will be substantially less (i'll come back with results once i have them on
> > this).
> > 
> > I do understand your point, but the potential advantages on my head
> > overwight that by a lot.
> > 
> > I got other concerns on the patchset though. Like how much memory is it 'ok' to
> > keep mapped keeping in mind we are using the streaming DMA API. Are we going to
> > affect anyone else negatively by doing so ?
> >   
>  
> I want to make sure you guys thought about splice() stuff, and
> skb_try_coalesce(), and GRO, and skb cloning, and ...

Thanks for the pointers. To Ilias, we need to double check skb_try_coalesce()
code path, as it does look like we don't handle this correctly.

> I do not see how an essential property of page fragments would be
> attached to sk_buff, without adding extra code all over the places.
> 
> Page recycling in drivers is fine, but should operate on pages that
> the driver owns.
> 

I think we have addressed this.  We are only recycling pages that the
driver owns.  In case of fragments, then we release the DMA-mapping and
don't recycle the page, instead the normal code path is taken (which is
missing in case of skb_try_coalesce).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

  reply	other threads:[~2018-12-08 20:12 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-06 23:25 [net-next PATCH RFC 0/8] page_pool DMA handling and allow to recycles frames via SKB Jesper Dangaard Brouer
2018-12-06 23:25 ` [net-next PATCH RFC 1/8] page_pool: add helper functions for DMA Jesper Dangaard Brouer
2018-12-08  7:06   ` David Miller
2018-12-08  7:55     ` Ilias Apalodimas
2018-12-06 23:25 ` [net-next PATCH RFC 2/8] net: mvneta: use page pool API for sw buffer manager Jesper Dangaard Brouer
2018-12-06 23:25 ` [net-next PATCH RFC 3/8] xdp: reduce size of struct xdp_mem_info Jesper Dangaard Brouer
2018-12-06 23:25 ` [net-next PATCH RFC 4/8] net: core: add recycle capabilities on skbs via page_pool API Jesper Dangaard Brouer
2018-12-08  7:15   ` David Miller
2018-12-08  7:54     ` Ilias Apalodimas
2018-12-08  9:57   ` [net-next, RFC, " Florian Westphal
2018-12-08 11:36     ` Jesper Dangaard Brouer
2018-12-08 20:10       ` David Miller
2018-12-08 12:29     ` Eric Dumazet
2018-12-08 12:34       ` Eric Dumazet
2018-12-08 13:45       ` Jesper Dangaard Brouer
2018-12-08 14:57       ` Ilias Apalodimas
2018-12-08 17:07         ` Andrew Lunn
2018-12-08 19:26         ` Eric Dumazet
2018-12-08 20:11           ` Jesper Dangaard Brouer [this message]
2018-12-08 20:14             ` Ilias Apalodimas
2018-12-08 21:06               ` Willy Tarreau
2018-12-10  7:54                 ` Ilias Apalodimas
2018-12-08 22:36               ` Eric Dumazet
2018-12-08 20:21         ` David Miller
2018-12-08 20:29           ` Ilias Apalodimas
2018-12-10  9:51           ` Saeed Mahameed
2018-12-06 23:25 ` [net-next PATCH RFC 5/8] net: mvneta: remove copybreak, prefetch and use build_skb Jesper Dangaard Brouer
2018-12-06 23:25 ` [net-next PATCH RFC 6/8] mvneta: activate page recycling via skb using page_pool Jesper Dangaard Brouer
2018-12-06 23:26 ` [net-next PATCH RFC 7/8] xdp: bpf: cpumap redirect must update skb->mem_info Jesper Dangaard Brouer
2018-12-06 23:26 ` [net-next PATCH RFC 8/8] veth: xdp_frames redirected into veth need to transfer xdp_mem_info Jesper Dangaard Brouer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181208211153.68f1424e@redhat.com \
    --to=brouer@redhat.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=ard.biesheuvel@linaro.org \
    --cc=bjorn.topel@intel.com \
    --cc=borkmann@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=fw@strlen.de \
    --cc=ilias.apalodimas@linaro.org \
    --cc=jasowang@redhat.com \
    --cc=mykyta.iziumtsev@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@mellanox.com \
    --cc=tariqt@mellanox.com \
    --cc=toke@toke.dk \
    --cc=w@1wt.eu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.