From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Netdev <netdev@vger.kernel.org>,
"Björn Töpel" <bjorn.topel@intel.com>,
"Karlsson, Magnus" <magnus.karlsson@intel.com>,
"Eugenia Emantayev" <eugenia@mellanox.com>,
"Jason Wang" <jasowang@redhat.com>,
"John Fastabend" <john.fastabend@gmail.com>,
"Eran Ben Elisha" <eranbe@mellanox.com>,
"Saeed Mahameed" <saeedm@mellanox.com>,
"Gal Pressman" <galp@mellanox.com>,
"Daniel Borkmann" <borkmann@iogearbox.net>,
"Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
"Tariq Toukan" <tariqt@mellanox.com>,
brouer@redhat.com
Subject: Re: [bpf-next V5 PATCH 14/15] xdp: transition into using xdp_frame for return API
Date: Mon, 26 Mar 2018 13:42:56 +0200 [thread overview]
Message-ID: <20180326134256.2cc7cf13@redhat.com> (raw)
In-Reply-To: <CAKgT0UfqqTt0J-6ssQJ4-drRHTMmOdGyEgadUAt9vcgfN-Enxg@mail.gmail.com>
On Fri, 23 Mar 2018 10:29:29 -0700 Alexander Duyck <alexander.duyck@gmail.com> wrote:
> On Fri, Mar 23, 2018 at 5:19 AM, Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> > Changing the API xdp_return_frame() to take struct xdp_frame as its
> > argument seems like a natural choice, but there are some subtle
> > performance details here that need extra care; the handling is a
> > deliberate choice.
> >
> > De-referencing xdp_frame on a remote CPU during DMA-TX completion
> > results in the cache-line changing to "Shared" state. Later, when the
> > page is reused for RX, this xdp_frame cache-line is written, which
> > changes the state to "Modified".
> >
> > This situation already happens (naturally) for virtio_net, tun and
> > cpumap, as the xdp_frame pointer is the queued object. In tun and
> > cpumap, the ptr_ring is used for efficiently transferring cache-lines
> > (with pointers) between CPUs. Thus, the only option is to
> > de-reference the xdp_frame.
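
To illustrate the pattern (a rough sketch only, not the actual cpumap/tun
code, and the helper names below are mine): only the pointer crosses the
ptr_ring, so the consumer has no choice but to de-reference the xdp_frame,
pulling its cache-line into "Shared" state on the remote CPU.

  #include <linux/ptr_ring.h>
  #include <net/xdp.h>

  /* Producer side (RX CPU): only the pointer is queued. */
  static int queue_xdp_frame(struct ptr_ring *ring, struct xdp_frame *xdpf)
  {
          return ptr_ring_produce(ring, xdpf);
  }

  /* Consumer side (remote CPU): must de-reference the frame, which
   * pulls the xdp_frame cache-line into "Shared" state over there.
   */
  static unsigned int consume_one(struct ptr_ring *ring)
  {
          struct xdp_frame *xdpf = ptr_ring_consume(ring);

          return xdpf ? xdpf->len : 0; /* first remote read of the line */
  }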
> >
> > Only the ixgbe driver had an optimization that let it avoid
> > de-referencing the xdp_frame. The driver already has a TX-ring queue,
> > which (in case of remote DMA-TX completion) has to be transferred
> > between CPUs anyhow. In this data area, we stored a struct
> > xdp_mem_info and a data pointer, which allowed us to avoid
> > de-referencing xdp_frame.
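
For reference, a simplified sketch of what the ixgbe TX-ring entry carries
before vs. after this patch (illustrative struct names, not the exact ixgbe
definitions; the call-site change is taken from the diff below):

  #include <net/xdp.h>

  /* Before: everything needed to return the memory lives in the
   * driver's own TX-ring entry; no xdp_frame de-reference is needed
   * at DMA-TX completion time.
   */
  struct tx_entry_before {                /* illustrative name */
          void *data;                     /* packet data pointer */
          struct xdp_mem_info xdp_mem;    /* how to return the page */
  };

  /* After: only the frame pointer is stored, so completion has to
   * read the xdp_frame itself -- the cache-line the prefetchw targets.
   */
  struct tx_entry_after {                 /* illustrative name */
          struct xdp_frame *xdpf;
  };

  /* Corresponding call-site change at DMA-TX completion:
   *   before: xdp_return_frame(tx_buffer->data, &tx_buffer->xdp_mem);
   *   after:  xdp_return_frame(tx_buffer->xdpf);
   */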
> >
> > To compensate for this, a prefetchw is used to tell the cache
> > coherency protocol about our access pattern. My benchmarks show that
> > this prefetchw is enough to compensate for this in the ixgbe driver.
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>
> I'm really not a fan of this patch. It seems like it is adding a ton
> of overhead for one case that is going to penalize all of the other
> use cases for XDP. Just the extra prefetch is going to have a
> significant impact on things like the XDP_DROP case.
>
[...]
> > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > index ff069597fccf..e6e9b28ecfba 100644
> > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > @@ -1207,7 +1207,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
> >
> > /* free the skb */
> > if (ring_is_xdp(tx_ring))
> > - xdp_return_frame(tx_buffer->data, &tx_buffer->xdp_mem);
> > + xdp_return_frame(tx_buffer->xdpf);
> > else
> > napi_consume_skb(tx_buffer->skb, napi_budget);
> >
> > @@ -2376,6 +2376,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
> > xdp.data_hard_start = xdp.data -
> > ixgbe_rx_offset(rx_ring);
> > xdp.data_end = xdp.data + size;
> > + prefetchw(xdp.data_hard_start); /* xdp_frame write */
> >
> > skb = ixgbe_run_xdp(adapter, rx_ring, &xdp);
> > }
>
> Do we really need to be prefetching yet another cache line? This is
> going to hurt general performance since for most cases you are now
> prefetching a cache line that will never be used.
My intuition was also that this would hurt performance, but I've done
many tests, and they all show no performance regression from this change!
XDP_DROP testing with xdp1 on ixgbe:
Baseline: 14,703,344 pkt/s
Patched: 14,711,574 pkt/s
For people reproducing this, note that reaching these extreme speeds
requires tuning on the generator side (with pktgen).
As you might have noticed, the prefetchw is only active when an XDP
program is loaded. Thus, I created a benchmark[1] that loads an XDP
program which always returns XDP_PASS (selected via --action).
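
Conceptually, the XDP side of that test is just the following (a minimal
sketch, not the exact program from [1]; the real benchmark selects the
return action at runtime, and the header/section names here are
approximations):

  /* Minimal "always XDP_PASS" program for exercising the prefetchw path */
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("xdp")
  int xdp_pass_prog(struct xdp_md *ctx)
  {
          return XDP_PASS;
  }

  char _license[] SEC("license") = "GPL";
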
Then, I drop packets in iptables raw table:
Baseline: 5,783,509 pps
Patched: 5,789,195 pps
Then I unload the netfilter modules and let packets reach the UDP layer's
"UdpNoPorts" stage (no socket is listening yet):
Baseline: 3,832,956 pps
Patched: 3,855,470 pps
Then, I add a udp_sink (--recvmsg) so packets travel deeper into
the network stack (and force udp_sink to run on another CPU):
Baseline: 2,278,855 pps
Patched: 2,270,373 pps
These measurements should make it clear that this patchset does not add
"a ton of overhead" and does not "penalize all of the other use cases
for XDP".
For the XDP_REDIRECT use-case, I actually find the prefetchw() quite
beautiful: the xdp_buff (which is a stack variable, and known to be hot)
is used by the bpf_prog, so the prefetch has time to fetch the memory
area that the conversion of xdp_buff to xdp_frame writes into.
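
To make that concrete, the conversion roughly does the following (a
simplified sketch of the convert_to_xdp_frame() helper from this series,
with details such as metadata handling trimmed): the xdp_frame is carved
out of the headroom at data_hard_start, which is exactly the cache-line
the prefetchw warms.

  struct xdp_frame *convert_to_xdp_frame_sketch(struct xdp_buff *xdp)
  {
          struct xdp_frame *xdpf;
          int headroom = xdp->data - xdp->data_hard_start;

          /* Need enough headroom to store the frame metadata */
          if (headroom < (int)sizeof(*xdpf))
                  return NULL;

          /* Place struct xdp_frame at the top of the headroom, i.e.
           * the area prefetchw(xdp.data_hard_start) already warmed.
           */
          xdpf = xdp->data_hard_start;
          xdpf->data     = xdp->data;
          xdpf->len      = xdp->data_end - xdp->data;
          xdpf->headroom = headroom - sizeof(*xdpf);
          xdpf->metasize = 0;             /* meta handling trimmed */
          xdpf->mem      = xdp->rxq->mem; /* how to return the memory */

          return xdpf;
  }
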
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_bench01_mem_access_cost_kern.c