From: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: Lorenzo Bianconi <lorenzo@kernel.org>,
netdev@vger.kernel.org, bpf@vger.kernel.org, davem@davemloft.net,
brouer@redhat.com, echaudro@redhat.com, sameehj@amazon.com,
kuba@kernel.org, daniel@iogearbox.net, ast@kernel.org,
shayagr@amazon.com, edumazet@google.com
Subject: Re: [PATCH v2 net-next 6/9] bpf: helpers: add bpf_xdp_adjust_mb_header helper
Date: Tue, 8 Sep 2020 23:31:20 +0200 [thread overview]
Message-ID: <20200908213120.GA27040@lore-desk> (raw)
In-Reply-To: <5f57e23e513b2_10343208e0@john-XPS-13-9370.notmuch>
[-- Attachment #1: Type: text/plain, Size: 8063 bytes --]
> Lorenzo Bianconi wrote:
> > > Lorenzo Bianconi wrote:
> > > > > Lorenzo Bianconi wrote:
> >
> > [...]
> >
> > > > > > + * Description
> > > > > > + * Adjust frame headers moving *offset* bytes from/to the second
> > > > > > + * buffer to/from the first one. This helper can be used to move
> > > > > > + * headers when the hw DMA SG does not copy all the headers in
> > > > > > + * the first fragment.
> > > >
> > > > + Eric to the discussion
> > > >
>
> [...]
>
> > > > > > +BPF_CALL_2(bpf_xdp_adjust_mb_header, struct xdp_buff *, xdp,
> > > > > > + int, offset)
> > > > > > +{
> > > > > > + void *data_hard_end, *data_end;
> > > > > > + struct skb_shared_info *sinfo;
> > > > > > + int frag_offset, frag_len;
> > > > > > + u8 *addr;
> > > > > > +
> > > > > > + if (!xdp->mb)
> > > > > > + return -EOPNOTSUPP;
> > >
> > > Not required for this patch necessarily but I think it would be better user
> > > experience if instead of EOPNOTSUPP here we did the header split. This
> > > would allocate a frag and copy the bytes around as needed. Yes it might
> > > be slow if you don't have a frag free in the driver, but if user wants to
> > > do header split and their hardware can't do it we would have a way out.
> > >
> > > I guess it could be an improvement for later though.
> >
> > I have no a strong opinion on this, I did it in this way to respect the rule "we
> > do not allocate memory for XDP".
> >
> > @Jesper, David: thoughts?
>
> Consider adding a flags field to the helper so we could do this later with
> a flag. Then users who want the alloc can set the flag and get it.
>
> [...]
>
> >
> > >
> > > How/when does the header split go wrong on the mvneta device? I guess
> > > this is to fix a real bug/issue not some theoritical one? An example
> > > in the commit message would make this concrete. Soemthing like,
> > > "When using RX zerocopy to mmap data into userspace application if
> > > a packet with [all these wild headers] is received rx zerocopy breaks
> > > because header split puts headers X in the data frag confusing apps".
> >
> > This issue does not occur with mvneta since the driver is not capable of
> > performing header split AFAIK. The helper has been introduced to cover the
> > "issue" reported by Eric in his NetDevConf presentation. In order to test the
> > helper I modified the mventa rx napi loop in a controlled way (this patch can't
> > be sent upstream, it is for testing only :))
> > I will improve commit message in v3.
>
> Ah ok so really there is no users for the helper then IMO just drop
> the patch until we have a user then.
I agree, this helper is not strictly related to the series. I added it in v2
to provide another example of using xdp_buff mb bit.
>
> >
> > >
> > > >
> > > > >
> > > > > Also and even more concerning I think this API requires the
> > > > > driver to populate shinfo. If we use TX_REDIRECT a lot or TX_XMIT
> > > > > this means we need to populate shinfo when its probably not ever
> > > > > used. If our driver is smart L2/L3 headers are in the readable
> > > > > data and prefetched. Writing frags into the end of a page is likely
> > > > > not free.
> > > >
> > > > Sorry I did not get what you mean with "populate shinfo" here. We need to
> > > > properly set shared_info in order to create the xdp multi-buff.
> > > > Apart of header splits, please consider the main uses-cases for
> > > > xdp multi-buff are XDP with TSO and Jumbo frames.
> > >
> > > The use case I have in mind is a XDP_TX or XDP_REDIRECT load balancer.
> > > I wont know this at the driver level and now I'll have to write into
> > > the back of every page with this shinfo just in case. If header
> > > split is working I should never need to even touch the page outside
> > > the first N bytes that were DMAd and in cache with DDIO. So its extra
> > > overhead for something that is unlikely to happen in the LB case.
> >
> > So far the skb_shared_info in constructed in mvneta only if the hw splits
> > the received data in multiple buffers (so if the MTU is greater than 1 PAGE,
> > please see comments below). Moreover the shared_info is present only in the
> > first buffer.
>
> Still in a normal L2/L3/L4 use case I expect all the headers you
> need to be in the fist buffer so its unlikely for use cases that
> send most traffic via XDP_TX for example to ever need the extra
> info. In these cases I think you are paying some penalty for
> having to do the work of populating the shinfo. Maybe its measurable
> maybe not I'm not sure.
>
> Also if we make it required for multi-buffer than we also need
> the shinfo on 40gbps or 100gbps nics and now even small costs
> matter.
Now I realized I used the word "split" in a not clear way here,
I apologize for that.
What I mean is not related "header" split, I am referring to the case where
the hw is configured with a given rx buffer size (e.g. 1 PAGE) and we have
set a higher MTU/max received size (e.g. 9K).
In this case the hw will "split" the jumbo received frame over multiple rx
buffers/descriptors. Populating the "xdp_shared_info" we will forward this
layout info to the eBPF sandbox and to a remote driver/cpu.
Please note this use case is not currently covered by XDP so if we develop it a
proper way I guess we should not get any performance hit for the legacy single-buffer
mode since we will not populate the shared_info for it (I think you refer to
the "legacy" use-case in your "normal L2/L3/L4" example, right?)
Anyway I will run some tests to verify the performances for the single buffer
use-case are not hit.
Regards,
Lorenzo
>
> >
> > >
> > > If you take the simplest possible program that just returns XDP_TX
> > > and run a pkt generator against it. I believe (haven't run any
> > > tests) that you will see overhead now just from populating this
> > > shinfo. I think it needs to only be done when its needed e.g. when
> > > user makes this helper call or we need to build the skb and populate
> > > the frags there.
> >
> > sure, I will carry out some tests.
>
> Thanks!
>
> >
> > >
> > > I think a smart driver will just keep the frags list in whatever
> > > form it has them (rx descriptors?) and push them over to the
> > > tx descriptors without having to do extra work with frag lists.
> >
> > I think there are many use-cases where we want to have this info available in
> > xdp_buff/xdp_frame. E.g: let's consider the following Jumbo frame example:
> > - MTU > 1 PAGE (so we the driver will split the received data in multiple rx
> > descriptors)
> > - the driver performs a XDP_REDIRECT to a veth or cpumap
> >
> > Relying on the proposed architecture we could enable GRO in veth or cpumap I
> > guess since we can build a non-linear skb from the xdp multi-buff, right?
>
> I'm not disputing there are use-cases. But, I'm trying to see if we
> can cover those without introducing additional latency in other
> cases. Hence the extra benchmarks request ;)
>
> >
> > >
> > > >
> > > > >
> > > > > Did you benchmark this?
> > > >
> > > > will do, I need to understand if we can use tiny buffers in mvneta.
> > >
> > > Why tiny buffers? How does mvneta layout the frags when doing
> > > header split? Can we just benchmark what mvneta is doing at the
> > > end of this patch series?
> >
> > for the moment mvneta can split the received data when the previous buffer is
> > full (e.g. when we the first page is completely written). I want to explore if
> > I can set a tiny buffer (e.g. 128B) as max received buffer to run some performance
> > tests and have some "comparable" results respect to the ones I got when I added XDP
> > support to mvneta.
>
> OK would be great.
>
> >
> > >
> > > Also can you try the basic XDP_TX case mentioned above.
> > > I don't want this to degrade existing use cases if at all
> > > possible.
> >
> > sure, will do.
>
> Thanks!
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
next prev parent reply other threads:[~2020-09-08 21:31 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-03 20:58 [PATCH v2 net-next 0/9] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
2020-09-03 20:58 ` [PATCH v2 net-next 1/9] xdp: introduce mb in xdp_buff/xdp_frame Lorenzo Bianconi
2020-09-04 1:07 ` Alexei Starovoitov
2020-09-04 7:19 ` Jesper Dangaard Brouer
2020-09-04 15:15 ` David Ahern
2020-09-04 15:59 ` Jesper Dangaard Brouer
2020-09-04 16:30 ` David Ahern
2020-09-07 18:02 ` Jesper Dangaard Brouer
2020-09-08 1:22 ` David Ahern
2020-09-03 20:58 ` [PATCH v2 net-next 2/9] xdp: initialize xdp_buff mb bit to 0 in all XDP drivers Lorenzo Bianconi
2020-09-04 7:35 ` Jesper Dangaard Brouer
2020-09-04 7:35 ` [Intel-wired-lan] " Jesper Dangaard Brouer
2020-09-04 7:54 ` Lorenzo Bianconi
2020-09-04 7:54 ` [Intel-wired-lan] " Lorenzo Bianconi
2020-09-03 20:58 ` [PATCH v2 net-next 3/9] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer Lorenzo Bianconi
2020-09-06 7:33 ` Shay Agroskin
2020-09-06 9:05 ` Lorenzo Bianconi
2020-09-03 20:58 ` [PATCH v2 net-next 4/9] xdp: add multi-buff support to xdp_return_{buff/frame} Lorenzo Bianconi
2020-09-03 20:58 ` [PATCH v2 net-next 5/9] net: mvneta: add multi buffer support to XDP_TX Lorenzo Bianconi
2020-09-06 7:20 ` Shay Agroskin
2020-09-06 8:43 ` Lorenzo Bianconi
2020-09-03 20:58 ` [PATCH v2 net-next 6/9] bpf: helpers: add bpf_xdp_adjust_mb_header helper Lorenzo Bianconi
2020-09-04 1:09 ` Alexei Starovoitov
2020-09-04 8:07 ` Lorenzo Bianconi
2020-09-04 1:13 ` Alexei Starovoitov
2020-09-04 7:50 ` Lorenzo Bianconi
2020-09-04 13:52 ` Jesper Dangaard Brouer
2020-09-04 14:27 ` Lorenzo Bianconi
2020-09-04 6:47 ` John Fastabend
2020-09-04 9:45 ` Lorenzo Bianconi
2020-09-04 15:23 ` John Fastabend
2020-09-06 13:36 ` Lorenzo Bianconi
2020-09-08 19:57 ` John Fastabend
2020-09-08 21:31 ` Lorenzo Bianconi [this message]
2020-09-09 20:51 ` Lorenzo Bianconi
2020-09-03 20:58 ` [PATCH v2 net-next 7/9] bpf: helpers: add multibuffer support Lorenzo Bianconi
2020-09-03 21:24 ` Maciej Fijalkowski
2020-09-04 9:47 ` Lorenzo Bianconi
2020-09-03 20:58 ` [PATCH v2 net-next 8/9] samples/bpf: add bpf program that uses xdp mb helpers Lorenzo Bianconi
2020-09-03 20:58 ` [PATCH v2 net-next 9/9] net: mvneta: enable jumbo frames for XDP Lorenzo Bianconi
2020-09-04 1:08 ` [PATCH v2 net-next 0/9] mvneta: introduce XDP multi-buffer support Alexei Starovoitov
2020-09-04 7:40 ` Lorenzo Bianconi
2020-09-04 5:41 ` John Fastabend
2020-09-04 7:39 ` Lorenzo Bianconi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200908213120.GA27040@lore-desk \
--to=lorenzo.bianconi@redhat.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brouer@redhat.com \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=echaudro@redhat.com \
--cc=edumazet@google.com \
--cc=john.fastabend@gmail.com \
--cc=kuba@kernel.org \
--cc=lorenzo@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=sameehj@amazon.com \
--cc=shayagr@amazon.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.