From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: "Jonathan Lemon" <jonathan.lemon@gmail.com>,
	"Magnus Karlsson" <magnus.karlsson@intel.com>,
	"Björn Töpel" <bjorn.topel@intel.com>,
	ast@kernel.org, "Daniel Borkmann" <daniel@iogearbox.net>,
	"Network Development" <netdev@vger.kernel.org>,
	"Jakub Kicinski" <jakub.kicinski@netronome.com>,
	"Björn Töpel" <bjorn.topel@gmail.com>,
	"Zhang, Qi Z" <qi.z.zhang@intel.com>,
	xiaolong.ye@intel.com, brouer@redhat.com,
	"xdp-newbies@vger.kernel.org" <xdp-newbies@vger.kernel.org>
Subject: Re: [PATCH bpf-next v4 0/2] libbpf: adding AF_XDP support
Date: Wed, 13 Feb 2019 12:55:30 +0100
Message-ID: <20190213125530.4a7fb8bc@carbon>
In-Reply-To: <CAJ8uoz19UjmEHTc28Qd_9KdY9D-ojXSBRTbmffRhUTX49mnWvg@mail.gmail.com>

On Wed, 13 Feb 2019 12:32:47 +0100
Magnus Karlsson <magnus.karlsson@gmail.com> wrote:

> On Mon, Feb 11, 2019 at 9:44 PM Jonathan Lemon <jonathan.lemon@gmail.com> wrote:
> >
> > On 8 Feb 2019, at 5:05, Magnus Karlsson wrote:
> >  
> > > This patch proposes to add AF_XDP support to libbpf. The main reason
> > > for this is to facilitate writing applications that use AF_XDP by
> > > offering higher-level APIs that hide many of the details of the AF_XDP
> > > uapi. This is in the same vein as libbpf facilitates XDP adoption by
> > > offering easy-to-use higher level interfaces of XDP
> > > functionality. Hopefully this will facilitate adoption of AF_XDP, make
> > > applications using it simpler and smaller, and finally also make it
> > > possible for applications to benefit from optimizations in the AF_XDP
> > > user space access code. Previously, people just copied and pasted the
> > > code from the sample application into their application, which is not
> > > desirable.  
> >
> > I like the idea of encapsulating the boilerplate logic in a library.
> >
> > I do think there is an important missing piece though - there should be
> > some code which queries the netdev for how many queues are attached, and
> > create the appropriate number of umem/AF_XDP sockets.
> >
> > I ran into this issue when testing the current AF_XDP code - on my test
> > boxes, the mlx5 card has 55 channels (aka queues), so when the test program
> > binds only to channel 0, nothing works as expected, since not all traffic
> > is being intercepted.  While obvious in hindsight, this took a while to
> > track down.  
> 
> Yes, agreed. You are not the first one to stumble upon this problem
> :-). Let me think a little bit on how to solve this in a good way. We
> need this to be simple and intuitive, as you say.

I see people hitting this with AF_XDP all the time... I had some
backup-slides[2] in our FOSDEM presentation[1] that describe the issue,
explain the performance reason behind it, and propose a workaround.

[1] https://github.com/xdp-project/xdp-project/tree/master/conference/FOSDEM2019
[2] https://github.com/xdp-project/xdp-project/blob/master/conference/FOSDEM2019/xdp_building_block.org#backup-slides

Alternative work-around
  * Create as many AF_XDP sockets as RXQs
  * Have userspace poll()/select() on all sockets
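
A minimal sketch of the "poll on all sockets" half of this workaround,
assuming the application already holds one file descriptor per RX queue.
The helper name wait_for_rx() and the MAX_SOCKS bound are hypothetical and
not part of libbpf or the AF_XDP uapi; only poll(2) itself is a real call.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

#define MAX_SOCKS 64	/* hypothetical upper bound for this sketch */

/*
 * Wait until at least one descriptor is readable.
 * fds[] would hold one AF_XDP socket fd per RX queue. On success,
 * ready[i] is set to 1 for every descriptor with POLLIN pending.
 * Returns the number of readable descriptors (0 on timeout, or when
 * only error events fired), or -1 if poll() itself failed.
 */
static int wait_for_rx(const int *fds, size_t nfds, int timeout_ms, int *ready)
{
	struct pollfd pfds[MAX_SOCKS];
	int n, nready = 0;
	size_t i;

	assert(nfds <= MAX_SOCKS);

	for (i = 0; i < nfds; i++) {
		pfds[i].fd = fds[i];
		pfds[i].events = POLLIN;
		pfds[i].revents = 0;
	}

	n = poll(pfds, nfds, timeout_ms);
	if (n <= 0)
		return n;	/* timeout or error: ready[] is left untouched */

	for (i = 0; i < nfds; i++) {
		ready[i] = (pfds[i].revents & POLLIN) != 0;
		nready += ready[i];
	}
	return nready;
}
```

The caller would then drain the RX ring of every socket flagged in ready[].
Nothing here is specific to AF_XDP, so the same helper works on any pollable
descriptor.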

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

* Backup Slides                                                      :export:

** Slide: Where does AF_XDP performance come from?                  :export:

/Lock-free [[https://lwn.net/Articles/169961/][channel]] directly from driver RX-queue into AF_XDP socket/
- Single-Producer/Single-Consumer (SPSC) descriptor ring queues
- *Single*-/Producer/ (SP) via bind to specific RX-*/queue id/*
  * NAPI-softirq ensures only one CPU processes a given RX-queue id (per sched)
- *Single*-/Consumer/ (SC) via 1-Application
- *Bounded* buffer pool (UMEM) allocated by userspace (registered with the kernel)
  * Descriptor(s) in ring(s) point into UMEM
  * /No memory allocation/, but return frames to UMEM in a timely manner
- [[http://www.lemis.com/grog/Documentation/vj/lca06vj.pdf][Transport signature]] Van Jacobson talked about
  * Replaced by XDP/eBPF program choosing to XDP_REDIRECT

** Slide: Details: Actually *four* SPSC ring queues                 :export:

AF_XDP /socket/: Has /two rings/: *RX* and *TX*
 - Descriptor(s) in ring point into UMEM
/UMEM/ consists of a number of equally sized chunks
 - Has /two rings/: *FILL* ring and *COMPLETION* ring
 - FILL ring: application gives the kernel areas to fill with RX packets
 - COMPLETION ring: kernel tells app TX is done for area (can be reused)
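
To illustrate why a single producer and a single consumer allow a lock-free
ring, here is a minimal generic SPSC ring sketch using C11 atomics. It is
not the kernel's actual ring layout: the struct, the function names and
RING_SIZE are hypothetical, and the 64-bit entries merely stand in for
descriptors that point into UMEM.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u	/* hypothetical size, must be a power of two */

static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be 2^n");

struct spsc_ring {
	_Atomic uint32_t producer;	/* written only by the producer */
	_Atomic uint32_t consumer;	/* written only by the consumer */
	uint64_t desc[RING_SIZE];	/* stand-in for offsets into UMEM */
};

/* Producer side: returns false when the ring is full. */
static bool ring_produce(struct spsc_ring *r, uint64_t addr)
{
	uint32_t prod = atomic_load_explicit(&r->producer, memory_order_relaxed);
	uint32_t cons = atomic_load_explicit(&r->consumer, memory_order_acquire);

	if (prod - cons == RING_SIZE)
		return false;
	r->desc[prod & (RING_SIZE - 1)] = addr;
	/* Publish the entry only after it has been written. */
	atomic_store_explicit(&r->producer, prod + 1, memory_order_release);
	return true;
}

/* Consumer side: returns false when the ring is empty. */
static bool ring_consume(struct spsc_ring *r, uint64_t *addr)
{
	uint32_t cons = atomic_load_explicit(&r->consumer, memory_order_relaxed);
	uint32_t prod = atomic_load_explicit(&r->producer, memory_order_acquire);

	if (cons == prod)
		return false;
	*addr = r->desc[cons & (RING_SIZE - 1)];
	/* Hand the slot back only after the entry has been read. */
	atomic_store_explicit(&r->consumer, cons + 1, memory_order_release);
	return true;
}
```

Each index has exactly one writer, so no lock or compare-and-swap is needed.
That property is what binding to a single RX-queue id preserves on the
producer side.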

** Slide: Gotcha by RX-queue id binding                             :export:

AF_XDP bound to */single RX-queue id/* (for SPSC performance reasons)
- NIC by default spreads flows with RSS-hashing over RX-queues
  * Traffic likely not hitting queue you expect
- You *MUST* configure NIC *HW filters* to /steer to RX-queue id/
  * Out of scope for XDP setup
  * Use ethtool or TC HW offloading for filter setup
- *Alternative* work-around
  * /Create as many AF_XDP sockets as RXQs/
  * Have userspace poll()/select() on all sockets
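
For the "as many AF_XDP sockets as RXQs" workaround, the application first
needs the number of RX queues. One possible way, sketched here under the
assumption that the device exposes its queues as rx-<n> entries under
/sys/class/net/<ifname>/queues, is to count those entries. The helper name
count_rx_queues() is hypothetical; querying the channel count through
ethtool would be an alternative.

```c
#include <dirent.h>
#include <string.h>

/*
 * Count the "rx-<n>" entries in a queues directory such as
 * /sys/class/net/<ifname>/queues.
 * Returns the count, or -1 if the directory cannot be opened.
 */
static int count_rx_queues(const char *queues_dir)
{
	DIR *dir = opendir(queues_dir);
	struct dirent *ent;
	int count = 0;

	if (!dir)
		return -1;

	while ((ent = readdir(dir)) != NULL) {
		if (strncmp(ent->d_name, "rx-", 3) == 0)
			count++;
	}
	closedir(dir);
	return count;
}
```

The result would be the number of sockets to create, one bound to each
queue id from 0 to count - 1.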

Thread overview: 17+ messages
2019-02-08 13:05 [PATCH bpf-next v4 0/2] libbpf: adding AF_XDP support Magnus Karlsson
2019-02-08 13:05 ` [PATCH bpf-next v4 1/2] libbpf: add support for using AF_XDP sockets Magnus Karlsson
2019-02-15 16:37   ` Daniel Borkmann
2019-02-18  8:59     ` Magnus Karlsson
2019-02-18 11:21       ` Maciej Fijalkowski
2019-02-08 13:05 ` [PATCH bpf-next v4 2/2] samples/bpf: convert xdpsock to use libbpf for AF_XDP access Magnus Karlsson
2019-02-11  6:33 ` [PATCH bpf-next v4 0/2] libbpf: adding AF_XDP support Jean-Mickael Guerin
2019-02-11  7:52   ` Magnus Karlsson
2019-02-11 19:48 ` Jonathan Lemon
2019-02-13 11:32   ` Magnus Karlsson
2019-02-13 11:55     ` Jesper Dangaard Brouer [this message]
2019-02-15 16:20       ` Daniel Borkmann
2019-02-18  8:20         ` Magnus Karlsson
2019-02-18  9:38           ` Daniel Borkmann
2019-02-18 10:09             ` Magnus Karlsson
2019-02-13 20:49     ` Jonathan Lemon
2019-02-14  8:25       ` Magnus Karlsson