netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: "Björn Töpel" <bjorn.topel@gmail.com>
Cc: magnus.karlsson@intel.com, alexander.h.duyck@intel.com,
	alexander.duyck@gmail.com, john.fastabend@gmail.com, ast@fb.com,
	willemdebruijn.kernel@gmail.com, daniel@iogearbox.net,
	netdev@vger.kernel.org, "Björn Töpel" <bjorn.topel@intel.com>,
	michael.lundkvist@ericsson.com, jesse.brandeburg@intel.com,
	anjali.singhai@intel.com, jeffrey.b.shaw@intel.com,
	ferruh.yigit@intel.com, qi.z.zhang@intel.com, brouer@redhat.com
Subject: Re: [RFC PATCH 05/24] bpf: added bpf_xdpsk_redirect
Date: Mon, 5 Feb 2018 14:42:35 +0100	[thread overview]
Message-ID: <20180205144235.00ff28ef@redhat.com> (raw)
In-Reply-To: <20180131135356.19134-6-bjorn.topel@gmail.com>

On Wed, 31 Jan 2018 14:53:37 +0100 Björn Töpel <bjorn.topel@gmail.com> wrote:

> The bpf_xdpsk_redirect call redirects the XDP context to the XDP
> socket bound to the receiving queue (if any).

As I explained in-person at FOSDEM, my suggestion is to use the
bpf-map infrastructure for AF_XDP redirect, but people on this
upstream mailing also need a chance to validate my idea ;-)

The important thing to keep in-mind is how we can still maintain a
SPSC (Single producer Single Consumer) relationship between an
RX-queue and a userspace consumer-process.

This AF_XDP "FOSDEM" patchset, store the "xsk" (xdp_sock) pointer
directly in the net_device (_rx[].netdev_rx_queue.xs) structure.  This
limit each RX-queue to service a single xdp_sock.  It sounds good from
a SPSC pov, but not very flexible.  With a "xdp_sock_map" we can get
the flexibility to select among multiple xdp_sock'ets (via XDP
pre-filter selecting a different map), and still maintain a SPSC
relationship.  The RX-queue will just have several SPSC relationships
with the individual xdp_sock's.

This is true for the AF_XDP-copy mode, and require less driver change.
For the AF_XDP-zero-copy (ZC) mode drivers need significant changes
anyhow, and in ZC case we will have to disallow this multiple
xdp_sock's, which is simply achieved by checking if the xdp_sock
pointer returned from the map lookup match the one that userspace
requested driver to register it's memory for RX-rings from.

The "xdp_sock_map" is an array, where the index correspond to the
queue_index.  The bpf_redirect_map() ignore the specified index and
instead use xdp_rxq_info->queue_index in the lookup.

Notice that a bpf-map have no pinned relationship with the device or
XDP prog loaded.  Thus, userspace need to bind() this map to the
device before traffic can flow, like the proposed bind() on the
xdp_sock.  This is to establish the SPSC binding.  My proposal is that
userspace insert the xdp_sock file-descriptor(s) in the map at the
queue-index, and the map (which is also just a file-descriptor) is
bound maybe via bind() to a specific device (via the ifindex).  Kernel
side will walk the map and do required actions xdp_sock's in find in
map.

TX-side is harder, as now multiple xdp_sock's can have the same
queue-pair ID with the same net_device. But Magnus propose that this
can be solved with hardware. As newer NICs have many TX-queue, and the
queue-pair ID is just an external visible number, while the kernel
internal structure can point to a dedicated TX-queue per xdp_sock.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

  reply	other threads:[~2018-02-05 13:42 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-31 13:53 [RFC PATCH 00/24] Introducing AF_XDP support Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 01/24] xsk: AF_XDP sockets buildable skeleton Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 02/24] xsk: add user memory registration sockopt Björn Töpel
2018-02-07 16:00   ` Willem de Bruijn
2018-02-07 21:39     ` Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 03/24] xsk: added XDP_{R,T}X_RING sockopt and supporting structures Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 04/24] xsk: add bind support and introduce Rx functionality Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 05/24] bpf: added bpf_xdpsk_redirect Björn Töpel
2018-02-05 13:42   ` Jesper Dangaard Brouer [this message]
2018-02-07 21:11     ` Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 06/24] net: wire up xsk support in the XDP_REDIRECT path Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 07/24] xsk: introduce Tx functionality Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 08/24] i40e: add support for XDP_REDIRECT Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 09/24] samples/bpf: added xdpsock program Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 10/24] netdevice: added XDP_{UN,}REGISTER_XSK command to ndo_bpf Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 11/24] netdevice: added ndo for transmitting a packet from an XDP socket Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 12/24] xsk: add iterator functions to xsk_ring Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 13/24] i40e: introduce external allocator support Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 14/24] i40e: implemented page recycling buff_pool Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 15/24] i40e: start using " Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 16/24] i40e: separated buff_pool interface from i40e implementaion Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 17/24] xsk: introduce xsk_buff_pool Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 18/24] xdp: added buff_pool support to struct xdp_buff Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 19/24] xsk: add support for zero copy Rx Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 20/24] xsk: add support for zero copy Tx Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 21/24] i40e: implement xsk sub-commands in ndo_bpf for zero copy Rx Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 22/24] i40e: introduced a clean_tx callback function Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 23/24] i40e: introduced Tx completion callbacks Björn Töpel
2018-01-31 13:53 ` [RFC PATCH 24/24] i40e: Tx support for zero copy allocator Björn Töpel
2018-02-01 16:42 ` [RFC PATCH 00/24] Introducing AF_XDP support Jesper Dangaard Brouer
2018-02-02 10:31 ` Jesper Dangaard Brouer
2018-02-05 15:05 ` Björn Töpel
2018-02-07 15:54   ` Willem de Bruijn
2018-02-07 21:28     ` Björn Töpel
2018-02-08 23:16       ` Willem de Bruijn
2018-02-07 17:59 ` Tom Herbert
2018-02-07 21:38   ` Björn Töpel
2018-03-26 16:06 ` William Tu
2018-03-26 16:38   ` Jesper Dangaard Brouer
2018-03-26 21:58     ` William Tu
2018-03-27  6:09       ` Björn Töpel
2018-03-27  9:37       ` Jesper Dangaard Brouer
2018-03-28  0:06         ` William Tu
2018-03-28  8:01           ` Jesper Dangaard Brouer
2018-03-28 15:05             ` William Tu
2018-03-26 22:54     ` Tushar Dave
2018-03-26 23:03       ` Alexander Duyck
2018-03-26 23:20         ` Tushar Dave
2018-03-28  0:49           ` William Tu
2018-03-27  6:30         ` Björn Töpel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180205144235.00ff28ef@redhat.com \
    --to=brouer@redhat.com \
    --cc=alexander.duyck@gmail.com \
    --cc=alexander.h.duyck@intel.com \
    --cc=anjali.singhai@intel.com \
    --cc=ast@fb.com \
    --cc=bjorn.topel@gmail.com \
    --cc=bjorn.topel@intel.com \
    --cc=daniel@iogearbox.net \
    --cc=ferruh.yigit@intel.com \
    --cc=jeffrey.b.shaw@intel.com \
    --cc=jesse.brandeburg@intel.com \
    --cc=john.fastabend@gmail.com \
    --cc=magnus.karlsson@intel.com \
    --cc=michael.lundkvist@ericsson.com \
    --cc=netdev@vger.kernel.org \
    --cc=qi.z.zhang@intel.com \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).