From: Jesper Dangaard Brouer <brouer@redhat.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>,
	Lorenzo Bianconi <lorenzo@kernel.org>,
	bpf@vger.kernel.org, netdev@vger.kernel.org, davem@davemloft.net,
	kuba@kernel.org, ast@kernel.org, daniel@iogearbox.net,
	shayagr@amazon.com, sameehj@amazon.com, dsahern@kernel.org,
	echaudro@redhat.com, brouer@redhat.com
Subject: Re: [PATCH v4 bpf-next 00/13] mvneta: introduce XDP multi-buffer support
Date: Mon, 5 Oct 2020 11:52:47 +0200
Message-ID: <20201005115247.72429157@carbon>
In-Reply-To: <5f776c14d69b3_a6402087e@john-XPS-13-9370.notmuch>

On Fri, 02 Oct 2020 11:06:12 -0700
John Fastabend <john.fastabend@gmail.com> wrote:

> Lorenzo Bianconi wrote:
> > > Lorenzo Bianconi wrote:  
> > > > This series introduces XDP multi-buffer support. The mvneta driver is
> > > > the first to support these new "non-linear" xdp_{buff,frame}. Reviewers
> > > > please focus on how these new types of xdp_{buff,frame} packets
> > > > traverse the different layers and the layout design. It is on purpose
> > > > that BPF-helpers are kept simple, as we don't want to expose the
> > > > internal layout to allow later changes.
> > > > 
> > > > For now, to keep the design simple and to maintain performance, the XDP
> > > > BPF-prog (still) only has access to the first buffer. It is left for
> > > > later (another patchset) to add payload access across multiple buffers.
> > > > This patchset should still allow for these future extensions. The goal
> > > > is to lift the MTU restriction that comes with XDP, but maintain the
> > > > same performance as before.
> > > > 
> > > > The main idea for the new multi-buffer layout is to reuse the same
> > > > layout used for non-linear SKBs. This relies on the "skb_shared_info"
> > > > struct at the end of the first buffer to link together subsequent
> > > > buffers. Keeping the layout compatible with SKBs is also done to ease
> > > > and speed up creating an SKB from an xdp_{buff,frame}. Converting an
> > > > xdp_frame to an SKB and delivering it to the network stack is shown in
> > > > the cpumap code (patch 13/13).  
> > > 
> > > Using the end of the buffer for the skb_shared_info struct is going to
> > > become driver API so unwinding it if it proves to be a performance issue
> > > is going to be ugly. So same question as before, for the use case where
> > > we receive packet and do XDP_TX with it how do we avoid cache miss
> > > overhead? This is not just a hypothetical use case; the Facebook
> > > load balancer is doing this, as is Cilium, and allowing this with
> > > multi-buffer packets >1500B would be useful.
> > > 
> > > Can we write the skb_shared_info lazily? It should only be needed once
> > > we know the packet is going up the stack to some place that needs the
> > > info. Which we could learn from the return code of the XDP program.  
> > 
> > Hi John,  
> 
> Hi, I'll try to join the two threads (this one and the one on helpers) here
> so we don't get too fragmented.
> 
> > 
> > I agree, I think for the XDP_TX use-case it is not strictly necessary to fill
> > the skb_shared_info. The driver can just keep this info on the stack and use
> > it when inserting the packet back into the DMA ring.
> > For mvneta I implemented it this way to keep the code aligned with the
> > ndo_xdp_xmit path, since it is a low-end device. I guess we are not
> > introducing any API constraint
> > for XDP_TX. A high-end device can implement multi-buff for XDP_TX in a different way
> > in order to avoid the cache miss.  
> 
> Agree it would be an implementation detail for XDP_TX except the two
> helpers added in this series currently require it to be there.

That is a good point.  If you look at the details, the helpers use the
xdp_buff->mb bit to guard against accessing the "shared_info"
cache-line. Thus, for the normal single-frame case, XDP_TX should not see
a slowdown.  Do we really need to optimize the XDP_TX multi-frame case(?)
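
To make that concrete, here is a rough sketch of how such a helper can
bail out early on the mb bit (the mb field and the tailroom placement of
shared_info follow this series; treat the names as illustrative, not as
final API):

  /* Sketch only: single-buffer frames return before the shared_info
   * cache-line is ever touched, so plain XDP_TX keeps its fast path. */
  static int xdp_get_frags_count_sketch(struct xdp_buff *xdp)
  {
          struct skb_shared_info *sinfo;

          if (!xdp->mb)   /* single-buffer frame: no extra cache-line access */
                  return 0;

          /* shared_info lives in the tailroom of the first buffer */
          sinfo = (struct skb_shared_info *)(xdp->data_hard_start +
                          xdp->frame_sz -
                          SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
          return sinfo->nr_frags;
  }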


> > 
> > We need to fill the skb_shared_info only when we want to pass the frame to the
> > network stack (build_skb() can directly reuse skb_shared_info->frags[]) or for
> > the XDP_REDIRECT use-case.  
> 
> It might be good to think about the XDP_REDIRECT case as well then. If the
> frags list fits in the metadata/xdp_frame, would we expect better
> performance?

I don't like to use space in xdp_frame for this. (1) We (Ahern and I)
are planning to use the space in xdp_frame for RX-csum + RX-hash + vlan,
which will be more common (e.g. all packets will have HW RX+csum).  (2)
I consider XDP multi-buffer the exception case, which will not be used
in most cases, so why reserve space for that in this cache-line.

IMHO we CANNOT allow any slowdown for existing XDP use-cases, but
XDP multi-buffer use-cases are allowed to run "slower".


> Looking at skb_shared_info{} that is a rather large structure with many

A cache-line detail about skb_shared_info: the first frags[0] member is
in the first cache-line, meaning that it is still fast to have xdp
frames with one extra buffer.

> fields that look unnecessary for the XDP_REDIRECT case and only needed when
> passing to the stack. 

Yes, I think we can use the first cache-line of skb_shared_info more
optimally (via defining an xdp_shared_info struct), but I still want us
to use this specific cache-line.  Let me explain why below.  (Avoiding
cache-line misses is all about the details, so I hope you can follow.)
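
Purely to illustrate what I mean by an xdp_shared_info struct (the field
names here are made up, not part of this series), something like this
keeps nr_frags and the first frags[] entries within the first cache-line:

  /* Illustrative only: a slimmed-down shared_info for XDP frames. */
  struct xdp_shared_info {
          u16 nr_frags;
          u16 data_length;        /* total length of the fragments */
          skb_frag_t frags[MAX_SKB_FRAGS];
  };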

Hopefully most driver developers understand/know this.  In the RX-loop
the current RX-descriptor has a status bit that indicates there are more
frames, usually expressed as non-EOP (End-Of-Packet).  Thus, a driver
can start a prefetchw of this shared_info cache-line, prior to
processing the RX-descriptors that describe the multi-buffer frame.
 (Remember this shared_info is constructed prior to calling XDP and any
XDP_TX action, thus the XDP prog should not see a cache-line miss when
using the BPF-helper to read the shared_info area.)
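
In driver pseudo-code (the function below is a sketch I made up, not
existing API), the RX-loop would call something like this as soon as it
sees a non-EOP descriptor, e.g. when the mvneta "last descriptor" status
bit is not set:

  /* Sketch: warm up the shared_info cache-line before the rest of the
   * multi-buffer frame gets processed. */
  static void xdp_prefetch_shared_info_sketch(void *data_hard_start,
                                              unsigned int frame_sz)
  {
          void *tail = data_hard_start + frame_sz -
                       SKB_DATA_ALIGN(sizeof(struct skb_shared_info));

          prefetchw(tail);        /* frags[] will be written shortly */
  }

Issued that early, the prefetch has time to complete before the
shared_info area is filled by the driver or read via the BPF-helpers.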


> Fundamentally, a frag just needs
> 
>  struct bio_vec {
>      struct page *bv_page;     // 8B
>      unsigned int bv_len;      // 4B
>      unsigned int bv_offset;   // 4B
>  } // 16B
> 
> With header split + data we only need a single frag, so we could use just
> 16B. And worst case, jumbo frame + header split, 3 entries seem to be
> enough, giving 48B (header plus three 4k pages). 

For a jumbo-frame 9000 MTU, 2 entries might be enough, as we also have
room in the first buffer: (9000 - (4096 - 256 - 320)) / 4096 = 1.338,
i.e. after the 256B headroom and 320B shared_info tailroom the first 4K
buffer holds ~3520 bytes, and the remaining ~5480 bytes fit in two
extra 4K pages.

The problem is that we need to support the TSO (TCP Segmentation Offload)
use-case, which can have more frames. Thus, 3 entries will not be
enough.

> Could we just stick this in the metadata and make it read only? Then
> programs that care can read it and get all the info they need without
> helpers.

I don't see how that is possible. (1) The metadata area is only 32
bytes; (2) when freeing an xdp_frame the kernel needs to know the layout,
as the pages it points to will be freed.
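
To illustrate why (the function below is only a sketch; the real changes
are in patch 04/13), a multi-buffer aware free has to walk the frags
layout roughly like this:

  /* Sketch only: every fragment page must be returned to its memory
   * model on free, so the kernel (not just the BPF prog) must be able
   * to locate and parse the frags[] layout. */
  static void xdp_return_frame_mb_sketch(struct xdp_frame *xdpf)
  {
          void *hard_start = xdpf->data - xdpf->headroom - sizeof(*xdpf);
          struct skb_shared_info *sinfo;
          int i;

          if (xdpf->mb) {
                  sinfo = (struct skb_shared_info *)(hard_start +
                                  xdpf->frame_sz -
                                  SKB_DATA_ALIGN(sizeof(*sinfo)));
                  /* release each fragment page (put_page() here; a real
                   * version would honour xdpf->mem.type / page_pool) */
                  for (i = 0; i < sinfo->nr_frags; i++)
                          put_page(skb_frag_page(&sinfo->frags[i]));
          }
          /* ... then return the first buffer as done today ... */
  }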

> I would expect performance to be better in the XDP_TX and
> XDP_REDIRECT cases. And copying an extra worst-case 48B when passing to
> the stack I guess is not measurable, given all the work needed in that
> path.

I do agree that when passing to the netstack we can do a transformation
from xdp_shared_info to skb_shared_info with a fairly small cost.  (The
TSO case would require more copying.)

Notice that allocating an SKB will always clear the first 32 bytes of
skb_shared_info.  If the XDP driver code path has done the prefetch
as described above, then we should see a speedup for netstack delivery.
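
For completeness, a rough sketch of that delivery path (loosely modelled
on the cpumap code in patch 13/13; the extra nr_frags/frags_size
parameters are illustrative driver bookkeeping, not an existing API):

  /* Sketch: build_skb() places skb_shared_info at the same tailroom
   * offset the driver used, so frags[] survive; only the first ~32B
   * (nr_frags etc.) are cleared by the allocation and need re-setting. */
  static struct sk_buff *xdp_frame_to_skb_sketch(struct xdp_frame *xdpf,
                                                 u8 nr_frags,
                                                 unsigned int frags_size)
  {
          unsigned int headroom = sizeof(*xdpf) + xdpf->headroom;
          struct sk_buff *skb;

          skb = build_skb(xdpf->data - headroom, xdpf->frame_sz);
          if (!skb)
                  return NULL;

          skb_reserve(skb, headroom);
          __skb_put(skb, xdpf->len);
          if (xdpf->metasize)
                  skb_metadata_set(skb, xdpf->metasize);

          if (nr_frags) {
                  /* frags[] in the tailroom were written by the driver
                   * and survive build_skb(); restore the counters only */
                  skb_shinfo(skb)->nr_frags = nr_frags;
                  skb->data_len += frags_size;
                  skb->len      += frags_size;
                  skb->truesize += frags_size;
          }
          return skb;
  }

With the prefetchw from the RX-loop described above, the clearing of
those first 32 bytes should hit an already-warm cache-line.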

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

