From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Edward Cree <ecree@solarflare.com>
Cc: <netdev@vger.kernel.org>, <jakub.kicinski@netronome.com>,
"Michael S. Tsirkin" <mst@redhat.com>, <pavel.odintsov@gmail.com>,
Jason Wang <jasowang@redhat.com>, <mchan@broadcom.com>,
John Fastabend <john.fastabend@gmail.com>,
<peter.waskiewicz.jr@intel.com>, <ast@fiberby.dk>,
Daniel Borkmann <borkmann@iogearbox.net>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Andy Gospodarek <andy@greyhouse.net>,
brouer@redhat.com
Subject: Re: [net-next V7 PATCH 1/5] bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP
Date: Fri, 13 Oct 2017 10:17:57 +0200
Message-ID: <20171013101757.58758ed0@redhat.com>
In-Reply-To: <de3a227a-da92-b905-11d0-ecc5a05f3bc0@solarflare.com>
On Thu, 12 Oct 2017 21:35:05 +0100 Edward Cree <ecree@solarflare.com> wrote:
> On 12/10/17 13:26, Jesper Dangaard Brouer wrote:
> > The 'cpumap' is primary used as a backend map for XDP BPF helper
> s/primary/primarily.
> [...]
> Again, s/primary/primarily.
> > + * call bpf_redirect_map() and XDP_REDIRECT action, like 'devmap'.
> > + *
> > + * Unlike devmap which redirect XDP frames out another NIC device,
> > + * this map type redirect raw XDP frames to another CPU. The remote
> Also I think both of these 'redirect' should be 'redirects', just a
> grammatical nit pick ;)
> > + * CPU will do SKB-allocation and call the normal network stack.
> > + *
> > + * This is a scalability and isolation mechanism, that allow
> > + * separating the early driver network XDP layer, from the rest of the
> > + * netstack, and assigning dedicated CPUs for this stage. This
> > + * basically allows for 10G wirespeed pre-filtering via bpf.
> > + */
> > +#include <linux/bpf.h>
> > +#include <linux/filter.h>
> > +#include <linux/ptr_ring.h>
> > +
> > +#include <linux/sched.h>
> > +#include <linux/workqueue.h>
> > +#include <linux/kthread.h>
> > +#include <linux/capability.h>
> > +
> > +/* General idea: XDP packets getting XDP redirected to another CPU,
> > + * will maximum be stored/queued for one driver ->poll() call. It is
> > + * guaranteed that setting flush bit and flush operation happen on
> > + * same CPU. Thus, cpu_map_flush operation can deduct via this_cpu_ptr()
> > + * which queue in bpf_cpu_map_entry contains packets.
> > + */
> > +
> > +#define CPU_MAP_BULK_SIZE 8 /* 8 == one cacheline on 64-bit archs */
> > +struct xdp_bulk_queue {
> > + void *q[CPU_MAP_BULK_SIZE];
> > + unsigned int count;
> > +};
>
> I realise it's a bit late to say this on a v7, but it might be better to
> use a linked-list (list_heads) here instead of an array. Then, the
> struct xdp_pkt you store in the packet headroom could contain the
> list_head, there's no arbitrary bulking limit, and the flush just has
> to link the newly-created elements into the receiving CPU's list.
> Is there an obvious reason why this wouldn't work / can't perform as
> well, or should I try it and benchmark it?

No, I've tried to explain this before. I do want a bulking limit for
several reasons. (1) This is connected to how ptr_ring works. I want
a full cache-line to transfer/enqueue into the ptr_ring. The
ptr_ring is the key to making the transfer between CPUs work so
efficiently (I even rejected my own alf_queue in favor of ptr_ring).
(2) Due to latency concerns, I don't want to "wait" for 64 packets before
the remote CPU gets a chance to see them. I want to transfer/enqueue
packets to the remote CPU as soon as possible, and due to cacheline
constraints this is 8 packets.

The ptr_ring goes to great lengths to avoid cache-line bouncing, e.g.
fb9de9704775 ("ptr_ring: batch ring zeroing"), which helps avoid
cache-line bouncing when the queue is full. When the queue is almost
empty, cache-line bouncing still occurs, which is what I'm trying to
minimize here by transferring/enqueueing a full cacheline.
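
To make the "8 packets == one cacheline" point concrete, here is a
minimal userspace sketch. It is not the patch code: struct demo_ring,
DEMO_RING_SIZE, bq_flush() and bq_enqueue() are hypothetical stand-ins
for ptr_ring and the cpumap enqueue path, and the sketch assumes 64-byte
cachelines and a single producer. It only illustrates staging pointers
in the bulk array and handing them over in one cacheline-sized burst.

```c
#include <assert.h>
#include <stddef.h>

#define CPU_MAP_BULK_SIZE 8	/* from the patch: 8 pointers per bulk */
#define DEMO_RING_SIZE 64	/* hypothetical ring size, power of two */

struct xdp_bulk_queue {
	void *q[CPU_MAP_BULK_SIZE];
	unsigned int count;
};

/* Hypothetical stand-in for ptr_ring: a plain array ring. */
struct demo_ring {
	void *slot[DEMO_RING_SIZE];
	unsigned int head;	/* next slot the producer writes */
	unsigned int tail;	/* next slot the consumer reads */
};

/* With 8-byte pointers the staging array is 8 * 8 = 64 bytes, which is
 * one cacheline on the (assumed) 64-byte-cacheline machines.
 */
static_assert(sizeof(void *) != 8 ||
	      sizeof(((struct xdp_bulk_queue *)0)->q) == 64,
	      "bulk array should fill one 64-byte cacheline");

static unsigned int demo_ring_free(const struct demo_ring *r)
{
	return DEMO_RING_SIZE - (r->head - r->tail);
}

/* Move every staged pointer into the ring in one burst. Returns the
 * number moved; anything that does not fit is dropped in this sketch.
 */
static unsigned int bq_flush(struct xdp_bulk_queue *bq, struct demo_ring *r)
{
	unsigned int i, n = bq->count;

	if (n > demo_ring_free(r))
		n = demo_ring_free(r);
	for (i = 0; i < n; i++)
		r->slot[(r->head + i) % DEMO_RING_SIZE] = bq->q[i];
	r->head += n;
	bq->count = 0;
	return n;
}

/* Stage one packet pointer; flush as soon as a full bulk is staged.
 * Returns how many pointers this call pushed into the ring.
 */
static unsigned int bq_enqueue(struct xdp_bulk_queue *bq,
			       struct demo_ring *r, void *pkt)
{
	bq->q[bq->count++] = pkt;
	if (bq->count == CPU_MAP_BULK_SIZE)
		return bq_flush(bq, r);
	return 0;
}
```

In this sketch the consumer side never sees a partially written bulk:
the ring head only moves when a full bulk is staged or on an explicit
bq_flush() (e.g. at the end of a ->poll() cycle), so the remote CPU
waits for at most 8 packets rather than an unbounded list.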
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer