* [PATCH net-next 0/6] AF_XDP infrastructure improvements
@ 2019-04-26 11:42 Maxim Mikityanskiy
2019-04-26 11:42 ` [PATCH net-next 1/6] samples/bpf: Add hbm to .gitignore Maxim Mikityanskiy
` (6 more replies)
0 siblings, 7 replies; 18+ messages in thread
From: Maxim Mikityanskiy @ 2019-04-26 11:42 UTC (permalink / raw)
To: David S. Miller, Björn Töpel, Magnus Karlsson
Cc: netdev@vger.kernel.org, Saeed Mahameed, Jonathan Lemon,
Eran Ben Elisha, Tariq Toukan, Maxim Mikityanskiy
This series contains improvements to the AF_XDP kernel infrastructure.
They are required for the upcoming AF_XDP support in mlx5e, but also
some of them benefit to all drivers, and some can be useful for other
drivers that want to implement AF_XDP.
Maxim Mikityanskiy (6):
samples/bpf: Add hbm to .gitignore
xsk: Add API to check for available entries in FQ
xsk: Add getsockopt XDP_OPTIONS
xsk: Extend channels to support combined XSK/non-XSK traffic
xsk: Change the default frame size to 4096 and allow controlling it
xsk: Return the whole xdp_desc from xsk_umem_consume_tx
drivers/net/ethernet/intel/i40e/i40e_xsk.c | 12 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 15 ++-
include/net/xdp_sock.h | 28 +++-
include/uapi/linux/if_xdp.h | 18 +++
net/xdp/xsk.c | 45 +++++--
net/xdp/xsk_queue.h | 15 +++
samples/bpf/.gitignore | 1 +
samples/bpf/xdpsock_user.c | 47 +++++--
tools/include/uapi/linux/if_xdp.h | 18 +++
tools/lib/bpf/xsk.c | 127 +++++++++++++++----
tools/lib/bpf/xsk.h | 6 +-
11 files changed, 267 insertions(+), 65 deletions(-)
--
2.19.1
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH net-next 1/6] samples/bpf: Add hbm to .gitignore
2019-04-26 11:42 [PATCH net-next 0/6] AF_XDP infrastructure improvements Maxim Mikityanskiy
@ 2019-04-26 11:42 ` Maxim Mikityanskiy
2019-04-26 14:57 ` Maciej Fijalkowski
2019-04-26 11:42 ` [PATCH net-next 2/6] xsk: Add API to check for available entries in FQ Maxim Mikityanskiy
` (5 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Maxim Mikityanskiy @ 2019-04-26 11:42 UTC (permalink / raw)
To: David S. Miller, Björn Töpel, Magnus Karlsson
Cc: netdev@vger.kernel.org, Saeed Mahameed, Jonathan Lemon,
Eran Ben Elisha, Tariq Toukan, Maxim Mikityanskiy
The hbm binary was missing from .gitignore.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
---
samples/bpf/.gitignore | 1 +
1 file changed, 1 insertion(+)
diff --git a/samples/bpf/.gitignore b/samples/bpf/.gitignore
index 59e40998e249..c7498457595a 100644
--- a/samples/bpf/.gitignore
+++ b/samples/bpf/.gitignore
@@ -1,5 +1,6 @@
cpustat
fds_example
+hbm
lathist
lwt_len_hist
map_perf_test
--
2.19.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH net-next 2/6] xsk: Add API to check for available entries in FQ
2019-04-26 11:42 [PATCH net-next 0/6] AF_XDP infrastructure improvements Maxim Mikityanskiy
2019-04-26 11:42 ` [PATCH net-next 1/6] samples/bpf: Add hbm to .gitignore Maxim Mikityanskiy
@ 2019-04-26 11:42 ` Maxim Mikityanskiy
2019-04-26 15:18 ` Alexei Starovoitov
2019-04-26 11:42 ` [PATCH net-next 3/6] xsk: Add getsockopt XDP_OPTIONS Maxim Mikityanskiy
` (4 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Maxim Mikityanskiy @ 2019-04-26 11:42 UTC (permalink / raw)
To: David S. Miller, Björn Töpel, Magnus Karlsson
Cc: netdev@vger.kernel.org, Saeed Mahameed, Jonathan Lemon,
Eran Ben Elisha, Tariq Toukan, Maxim Mikityanskiy
Add a function that checks whether the Fill Ring has the specified
amount of descriptors available. It will be useful for mlx5e that wants
to check in advance, whether it can allocate a bulk of RX descriptors,
to get the best performance.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
---
include/net/xdp_sock.h | 22 ++++++++++++++++++++++
net/xdp/xsk.c | 8 ++++++++
net/xdp/xsk_queue.h | 15 +++++++++++++++
3 files changed, 45 insertions(+)
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index d074b6d60f8a..e466b4bf68b4 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -1,6 +1,7 @@
/* SPDX-License-Identifier: GPL-2.0 */
/* AF_XDP internal functions
* Copyright(c) 2018 Intel Corporation.
+ * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
*/
#ifndef _LINUX_XDP_SOCK_H
@@ -77,6 +78,7 @@ int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp);
void xsk_flush(struct xdp_sock *xs);
bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs);
/* Used from netdev driver */
+bool xsk_umem_has_addrs(struct xdp_umem *umem, u32 cnt);
u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr);
void xsk_umem_discard_addr(struct xdp_umem *umem);
void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries);
@@ -99,6 +101,16 @@ static inline dma_addr_t xdp_umem_get_dma(struct xdp_umem *umem, u64 addr)
}
/* Reuse-queue aware version of FILL queue helpers */
+static inline bool xsk_umem_has_addrs_rq(struct xdp_umem *umem, u32 cnt)
+{
+ struct xdp_umem_fq_reuse *rq = umem->fq_reuse;
+
+ if (rq->length >= cnt)
+ return true;
+
+ return xsk_umem_has_addrs(umem, cnt - rq->length);
+}
+
static inline u64 *xsk_umem_peek_addr_rq(struct xdp_umem *umem, u64 *addr)
{
struct xdp_umem_fq_reuse *rq = umem->fq_reuse;
@@ -146,6 +158,11 @@ static inline bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs)
return false;
}
+static inline bool xsk_umem_has_addrs(struct xdp_umem *umem, u32 cnt)
+{
+ return false;
+}
+
static inline u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
{
return NULL;
@@ -200,6 +217,11 @@ static inline dma_addr_t xdp_umem_get_dma(struct xdp_umem *umem, u64 addr)
return 0;
}
+static inline bool xsk_umem_has_addrs_rq(struct xdp_umem *umem, u32 cnt)
+{
+ return false;
+}
+
static inline u64 *xsk_umem_peek_addr_rq(struct xdp_umem *umem, u64 *addr)
{
return NULL;
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index a14e8864e4fa..ca764ed9da41 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -4,9 +4,11 @@
* AF_XDP sockets allows a channel between XDP programs and userspace
* applications.
* Copyright(c) 2018 Intel Corporation.
+ * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
*
* Author(s): Björn Töpel <bjorn.topel@intel.com>
* Magnus Karlsson <magnus.karlsson@intel.com>
+ * Maxim Mikityanskiy <maximmi@mellanox.com>
*/
#define pr_fmt(fmt) "AF_XDP: %s: " fmt, __func__
@@ -37,6 +39,12 @@ bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs)
READ_ONCE(xs->umem->fq);
}
+bool xsk_umem_has_addrs(struct xdp_umem *umem, u32 cnt)
+{
+ return xskq_has_addrs(umem->fq, cnt);
+}
+EXPORT_SYMBOL(xsk_umem_has_addrs);
+
u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
{
return xskq_peek_addr(umem->fq, addr);
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 610c0bdc0c2b..8e26aff59fd4 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -1,6 +1,7 @@
/* SPDX-License-Identifier: GPL-2.0 */
/* XDP user-space ring structure
* Copyright(c) 2018 Intel Corporation.
+ * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
*/
#ifndef _LINUX_XSK_QUEUE_H
@@ -75,6 +76,20 @@ static inline u32 xskq_nb_free(struct xsk_queue *q, u32 producer, u32 dcnt)
return q->nentries - (producer - q->cons_tail);
}
+static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
+{
+ u32 entries = q->prod_tail - q->cons_tail;
+
+ if (entries >= cnt)
+ return true;
+
+ /* Refresh the local pointer. */
+ q->prod_tail = READ_ONCE(q->ring->producer);
+ entries = q->prod_tail - q->cons_tail;
+
+ return entries >= cnt;
+}
+
/* UMEM queue */
static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
--
2.19.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH net-next 3/6] xsk: Add getsockopt XDP_OPTIONS
2019-04-26 11:42 [PATCH net-next 0/6] AF_XDP infrastructure improvements Maxim Mikityanskiy
2019-04-26 11:42 ` [PATCH net-next 1/6] samples/bpf: Add hbm to .gitignore Maxim Mikityanskiy
2019-04-26 11:42 ` [PATCH net-next 2/6] xsk: Add API to check for available entries in FQ Maxim Mikityanskiy
@ 2019-04-26 11:42 ` Maxim Mikityanskiy
2019-04-26 20:25 ` Björn Töpel
2019-04-26 11:42 ` [PATCH net-next 4/6] xsk: Extend channels to support combined XSK/non-XSK traffic Maxim Mikityanskiy
` (3 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Maxim Mikityanskiy @ 2019-04-26 11:42 UTC (permalink / raw)
To: David S. Miller, Björn Töpel, Magnus Karlsson
Cc: netdev@vger.kernel.org, Saeed Mahameed, Jonathan Lemon,
Eran Ben Elisha, Tariq Toukan, Maxim Mikityanskiy
Make it possible for the application to determine whether the AF_XDP
socket is running in zero-copy mode. To achieve this, add a new
getsockopt option XDP_OPTIONS that returns flags. The only flag
supported for now is the zero-copy mode indicator.
Also query XDP_OPTIONS in the xdpsock sample application. The zero-copy
flag will be used in the following commits.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
---
include/uapi/linux/if_xdp.h | 7 +++++++
net/xdp/xsk.c | 22 ++++++++++++++++++++++
tools/include/uapi/linux/if_xdp.h | 7 +++++++
tools/lib/bpf/xsk.c | 11 +++++++++++
4 files changed, 47 insertions(+)
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index caed8b1614ff..9ae4b4e08b68 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -46,6 +46,7 @@ struct xdp_mmap_offsets {
#define XDP_UMEM_FILL_RING 5
#define XDP_UMEM_COMPLETION_RING 6
#define XDP_STATISTICS 7
+#define XDP_OPTIONS 8
struct xdp_umem_reg {
__u64 addr; /* Start of packet data area */
@@ -60,6 +61,12 @@ struct xdp_statistics {
__u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
};
+struct xdp_options {
+ __u32 flags;
+};
+
+#define XDP_OPTIONS_FLAG_ZEROCOPY (1 << 0)
+
/* Pgoff for mmaping the rings */
#define XDP_PGOFF_RX_RING 0
#define XDP_PGOFF_TX_RING 0x80000000
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index ca764ed9da41..ff8427552e8d 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -652,6 +652,28 @@ static int xsk_getsockopt(struct socket *sock, int level, int optname,
return 0;
}
+ case XDP_OPTIONS:
+ {
+ struct xdp_options opts;
+
+ if (len < sizeof(opts))
+ return -EINVAL;
+
+ opts.flags = 0;
+
+ mutex_lock(&xs->mutex);
+ if (xs->zc)
+ opts.flags |= XDP_OPTIONS_FLAG_ZEROCOPY;
+ mutex_unlock(&xs->mutex);
+
+ len = sizeof(opts);
+ if (copy_to_user(optval, &opts, len))
+ return -EFAULT;
+ if (put_user(len, optlen))
+ return -EFAULT;
+
+ return 0;
+ }
default:
break;
}
diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index caed8b1614ff..9ae4b4e08b68 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -46,6 +46,7 @@ struct xdp_mmap_offsets {
#define XDP_UMEM_FILL_RING 5
#define XDP_UMEM_COMPLETION_RING 6
#define XDP_STATISTICS 7
+#define XDP_OPTIONS 8
struct xdp_umem_reg {
__u64 addr; /* Start of packet data area */
@@ -60,6 +61,12 @@ struct xdp_statistics {
__u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
};
+struct xdp_options {
+ __u32 flags;
+};
+
+#define XDP_OPTIONS_FLAG_ZEROCOPY (1 << 0)
+
/* Pgoff for mmaping the rings */
#define XDP_PGOFF_RX_RING 0
#define XDP_PGOFF_TX_RING 0x80000000
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 8d0078b65486..50d7235e3b21 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -67,6 +67,7 @@ struct xsk_socket {
int xsks_map_fd;
__u32 queue_id;
char ifname[IFNAMSIZ];
+ bool zc;
};
struct xsk_nl_info {
@@ -525,6 +526,7 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
{
struct sockaddr_xdp sxdp = {};
struct xdp_mmap_offsets off;
+ struct xdp_options opts;
struct xsk_socket *xsk;
socklen_t optlen;
void *map;
@@ -642,6 +644,15 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
goto out_mmap_tx;
}
+ optlen = sizeof(opts);
+ err = getsockopt(xsk->fd, SOL_XDP, XDP_OPTIONS, &opts, &optlen);
+ if (err) {
+ err = -errno;
+ goto out_mmap_tx;
+ }
+
+ xsk->zc = opts.flags & XDP_OPTIONS_FLAG_ZEROCOPY;
+
if (!(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) {
err = xsk_setup_xdp_prog(xsk);
if (err)
--
2.19.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH net-next 4/6] xsk: Extend channels to support combined XSK/non-XSK traffic
2019-04-26 11:42 [PATCH net-next 0/6] AF_XDP infrastructure improvements Maxim Mikityanskiy
` (2 preceding siblings ...)
2019-04-26 11:42 ` [PATCH net-next 3/6] xsk: Add getsockopt XDP_OPTIONS Maxim Mikityanskiy
@ 2019-04-26 11:42 ` Maxim Mikityanskiy
2019-04-26 15:24 ` Alexei Starovoitov
` (2 more replies)
2019-04-26 11:42 ` [PATCH net-next 5/6] xsk: Change the default frame size to 4096 and allow controlling it Maxim Mikityanskiy
` (2 subsequent siblings)
6 siblings, 3 replies; 18+ messages in thread
From: Maxim Mikityanskiy @ 2019-04-26 11:42 UTC (permalink / raw)
To: David S. Miller, Björn Töpel, Magnus Karlsson
Cc: netdev@vger.kernel.org, Saeed Mahameed, Jonathan Lemon,
Eran Ben Elisha, Tariq Toukan, Maxim Mikityanskiy
Currently, the drivers that implement AF_XDP zero-copy support (e.g.,
i40e) switch the channel into a different mode when an XSK is opened. It
causes some issues that have to be taken into account. For example, RSS
needs to be reconfigured to skip the XSK-enabled channels, or the XDP
program should filter out traffic not intended for that socket and
XDP_PASS it with an additional copy. As nothing validates or forces the
proper configuration, it's easy to have packets drops, when they get
into an XSK by mistake, and, in fact, it's the default configuration.
There has to be some tool to have RSS reconfigured on each socket open
and close event, but such a tool is problematic to implement, because no
one reports these events, and it's race-prone.
This commit extends XSK to support both kinds of traffic (XSK and
non-XSK) in the same channel. It implies having two RX queues in
XSK-enabled channels: one for the regular traffic, and the other for
XSK. It solves the problem with RSS: the default configuration just
works without the need to manually reconfigure RSS or to perform some
possibly complicated filtering in the XDP layer. It makes it easy to run
both AF_XDP and regular sockets on the same machine. In the XDP program,
the QID's most significant bit will serve as a flag to indicate whether
it's the XSK queue or not. The extension is compatible with the legacy
configuration, so if one wants to run the legacy mode, they can
reconfigure RSS and ignore the flag in the XDP program (implemented in
the reference XDP program in libbpf). mlx5e will support this extension.
A single XDP program can run both with drivers supporting or not
supporting this extension. The xdpsock sample and libbpf are updated
accordingly.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
---
include/uapi/linux/if_xdp.h | 11 +++
net/xdp/xsk.c | 5 +-
samples/bpf/xdpsock_user.c | 10 ++-
tools/include/uapi/linux/if_xdp.h | 11 +++
tools/lib/bpf/xsk.c | 116 ++++++++++++++++++++++--------
tools/lib/bpf/xsk.h | 4 ++
6 files changed, 126 insertions(+), 31 deletions(-)
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index 9ae4b4e08b68..cf6ff1ecc6bd 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -82,4 +82,15 @@ struct xdp_desc {
/* UMEM descriptor is __u64 */
+/* The driver may run a dedicated XSK RQ in the channel. The XDP program uses
+ * this flag bit in the queue index to distinguish between two RQs of the same
+ * channel.
+ */
+#define XDP_QID_FLAG_XSKRQ (1 << 31)
+
+static inline __u32 xdp_qid_get_channel(__u32 qid)
+{
+ return qid & ~XDP_QID_FLAG_XSKRQ;
+}
+
#endif /* _LINUX_IF_XDP_H */
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index ff8427552e8d..abc0d2e013e3 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -106,9 +106,12 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
{
+ struct xdp_rxq_info *rxq = xdp->rxq;
+ u32 channel = xdp_qid_get_channel(rxq->queue_index);
u32 len;
- if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
+ if (xs->dev != rxq->dev || xs->queue_id != channel ||
+ xs->zc != (rxq->mem.type == MEM_TYPE_ZERO_COPY))
return -EINVAL;
len = xdp->data_end - xdp->data;
diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index d08ee1ab7bb4..a6b13025ee79 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -62,6 +62,7 @@ enum benchmark_type {
static enum benchmark_type opt_bench = BENCH_RXDROP;
static u32 opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+static u32 opt_libbpf_flags;
static const char *opt_if = "";
static int opt_ifindex;
static int opt_queue;
@@ -306,7 +307,7 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
xsk->umem = umem;
cfg.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
cfg.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS;
- cfg.libbpf_flags = 0;
+ cfg.libbpf_flags = opt_libbpf_flags;
cfg.xdp_flags = opt_xdp_flags;
cfg.bind_flags = opt_xdp_bind_flags;
ret = xsk_socket__create(&xsk->xsk, opt_if, opt_queue, umem->umem,
@@ -346,6 +347,7 @@ static struct option long_options[] = {
{"interval", required_argument, 0, 'n'},
{"zero-copy", no_argument, 0, 'z'},
{"copy", no_argument, 0, 'c'},
+ {"combined", no_argument, 0, 'C'},
{0, 0, 0, 0}
};
@@ -365,6 +367,7 @@ static void usage(const char *prog)
" -n, --interval=n Specify statistics update interval (default 1 sec).\n"
" -z, --zero-copy Force zero-copy mode.\n"
" -c, --copy Force copy mode.\n"
+ " -C, --combined Driver supports combined XSK and non-XSK traffic in a channel.\n"
"\n";
fprintf(stderr, str, prog);
exit(EXIT_FAILURE);
@@ -377,7 +380,7 @@ static void parse_command_line(int argc, char **argv)
opterr = 0;
for (;;) {
- c = getopt_long(argc, argv, "Frtli:q:psSNn:cz", long_options,
+ c = getopt_long(argc, argv, "Frtli:q:psSNn:czC", long_options,
&option_index);
if (c == -1)
break;
@@ -420,6 +423,9 @@ static void parse_command_line(int argc, char **argv)
case 'F':
opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
break;
+ case 'C':
+ opt_libbpf_flags |= XSK_LIBBPF_FLAGS__COMBINED_CHANNELS;
+ break;
default:
usage(basename(argv[0]));
}
diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index 9ae4b4e08b68..cf6ff1ecc6bd 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -82,4 +82,15 @@ struct xdp_desc {
/* UMEM descriptor is __u64 */
+/* The driver may run a dedicated XSK RQ in the channel. The XDP program uses
+ * this flag bit in the queue index to distinguish between two RQs of the same
+ * channel.
+ */
+#define XDP_QID_FLAG_XSKRQ (1 << 31)
+
+static inline __u32 xdp_qid_get_channel(__u32 qid)
+{
+ return qid & ~XDP_QID_FLAG_XSKRQ;
+}
+
#endif /* _LINUX_IF_XDP_H */
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 50d7235e3b21..451638b6951b 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -76,6 +76,12 @@ struct xsk_nl_info {
int fd;
};
+enum qidconf {
+ QIDCONF_REGULAR,
+ QIDCONF_XSK,
+ QIDCONF_XSK_COMBINED,
+};
+
/* For 32-bit systems, we need to use mmap2 as the offsets are 64-bit.
* Unfortunately, it is not part of glibc.
*/
@@ -139,7 +145,7 @@ static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
return 0;
}
- if (usr_cfg->libbpf_flags & ~XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)
+ if (usr_cfg->libbpf_flags & ~XSK_LIBBPF_FLAGS_MASK)
return -EINVAL;
cfg->rx_size = usr_cfg->rx_size;
@@ -266,44 +272,93 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
/* This is the C-program:
* SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
* {
- * int *qidconf, index = ctx->rx_queue_index;
+ * int *qidconf, qc;
+ * int index = ctx->rx_queue_index & ~(1 << 31);
+ * bool is_xskrq = ctx->rx_queue_index & (1 << 31);
*
- * // A set entry here means that the correspnding queue_id
- * // has an active AF_XDP socket bound to it.
+ * // A set entry here means that the corresponding queue_id
+ * // has an active AF_XDP socket bound to it. Value 2 means
+ * // it's zero-copy multi-RQ mode.
* qidconf = bpf_map_lookup_elem(&qidconf_map, &index);
* if (!qidconf)
* return XDP_ABORTED;
*
- * if (*qidconf)
+ * qc = *qidconf;
+ *
+ * if (qc == 2)
+ * qc = is_xskrq ? 1 : 0;
+ *
+ * switch (qc) {
+ * case 0:
+ * return XDP_PASS;
+ * case 1:
* return bpf_redirect_map(&xsks_map, index, 0);
+ * }
*
- * return XDP_PASS;
+ * return XDP_ABORTED;
* }
*/
struct bpf_insn prog[] = {
- /* r1 = *(u32 *)(r1 + 16) */
- BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 16),
- /* *(u32 *)(r10 - 4) = r1 */
- BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_1, -4),
- BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
- BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
- BPF_LD_MAP_FD(BPF_REG_1, xsk->qidconf_map_fd),
+ /* Load index. */
+ /* r6 = *(u32 *)(r1 + 16) */
+ BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_ARG1, 16),
+ /* w7 = w6 */
+ BPF_MOV32_REG(BPF_REG_7, BPF_REG_6),
+ /* w7 &= 2147483647 */
+ BPF_ALU32_IMM(BPF_AND, BPF_REG_7, ~XDP_QID_FLAG_XSKRQ),
+ /* *(u32 *)(r10 - 4) = r7 */
+ BPF_STX_MEM(BPF_W, BPF_REG_FP, BPF_REG_7, -4),
+
+ /* Call bpf_map_lookup_elem. */
+ /* r2 = r10 */
+ BPF_MOV64_REG(BPF_REG_ARG2, BPF_REG_FP),
+ /* r2 += -4 */
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_ARG2, -4),
+ /* r1 = qidconf_map ll */
+ BPF_LD_MAP_FD(BPF_REG_ARG1, xsk->qidconf_map_fd),
+ /* call 1 */
BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
- BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
- BPF_MOV32_IMM(BPF_REG_0, 0),
- /* if r1 == 0 goto +8 */
- BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 8),
- BPF_MOV32_IMM(BPF_REG_0, 2),
- /* r1 = *(u32 *)(r1 + 0) */
- BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 0),
- /* if r1 == 0 goto +5 */
- BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 5),
- /* r2 = *(u32 *)(r10 - 4) */
- BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
- BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_10, -4),
+
+ /* Check the return value. */
+ /* if r0 == 0 goto +14 */
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 14),
+
+ /* Check qc == QIDCONF_XSK_COMBINED. */
+ /* r6 >>= 31 */
+ BPF_ALU64_IMM(BPF_RSH, BPF_REG_6, 31),
+ /* r1 = *(u32 *)(r0 + 0) */
+ BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_0, 0),
+ /* if r1 == 2 goto +1 */
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, QIDCONF_XSK_COMBINED, 1),
+
+ /* qc != QIDCONF_XSK_COMBINED */
+ /* r6 = r1 */
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+
+ /* switch (qc) */
+ /* w0 = 2 */
+ BPF_MOV32_IMM(BPF_REG_0, XDP_PASS),
+ /* if w6 == 0 goto +8 */
+ BPF_JMP32_IMM(BPF_JEQ, BPF_REG_6, QIDCONF_REGULAR, 8),
+ /* if w6 != 1 goto +6 */
+ BPF_JMP32_IMM(BPF_JNE, BPF_REG_6, QIDCONF_XSK, 6),
+
+ /* Call bpf_redirect_map. */
+ /* r1 = xsks_map ll */
+ BPF_LD_MAP_FD(BPF_REG_ARG1, xsk->xsks_map_fd),
+ /* w2 = w7 */
+ BPF_MOV32_REG(BPF_REG_ARG2, BPF_REG_7),
+ /* w3 = 0 */
BPF_MOV32_IMM(BPF_REG_3, 0),
+ /* call 51 */
BPF_EMIT_CALL(BPF_FUNC_redirect_map),
- /* The jumps are to this instruction */
+ /* exit */
+ BPF_EXIT_INSN(),
+
+ /* XDP_ABORTED */
+ /* w0 = 0 */
+ BPF_MOV32_IMM(BPF_REG_0, XDP_ABORTED),
+ /* exit */
BPF_EXIT_INSN(),
};
size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
@@ -482,6 +537,7 @@ static int xsk_update_bpf_maps(struct xsk_socket *xsk, int qidconf_value,
static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
{
+ int qidconf_value = QIDCONF_XSK;
bool prog_attached = false;
__u32 prog_id = 0;
int err;
@@ -504,7 +560,11 @@ static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
xsk->prog_fd = bpf_prog_get_fd_by_id(prog_id);
}
- err = xsk_update_bpf_maps(xsk, true, xsk->fd);
+ if (xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__COMBINED_CHANNELS)
+ if (xsk->zc)
+ qidconf_value = QIDCONF_XSK_COMBINED;
+
+ err = xsk_update_bpf_maps(xsk, qidconf_value, xsk->fd);
if (err)
goto out_load;
@@ -716,7 +776,7 @@ void xsk_socket__delete(struct xsk_socket *xsk)
if (!xsk)
return;
- (void)xsk_update_bpf_maps(xsk, 0, 0);
+ (void)xsk_update_bpf_maps(xsk, QIDCONF_REGULAR, 0);
optlen = sizeof(off);
err = getsockopt(xsk->fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen);
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index a497f00e2962..72a49b102ab3 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -170,6 +170,10 @@ struct xsk_umem_config {
/* Flags for the libbpf_flags field. */
#define XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD (1 << 0)
+#define XSK_LIBBPF_FLAGS__COMBINED_CHANNELS (1 << 1)
+#define XSK_LIBBPF_FLAGS_MASK ( \
+ XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD | \
+ XSK_LIBBPF_FLAGS__COMBINED_CHANNELS)
struct xsk_socket_config {
__u32 rx_size;
--
2.19.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH net-next 5/6] xsk: Change the default frame size to 4096 and allow controlling it
2019-04-26 11:42 [PATCH net-next 0/6] AF_XDP infrastructure improvements Maxim Mikityanskiy
` (3 preceding siblings ...)
2019-04-26 11:42 ` [PATCH net-next 4/6] xsk: Extend channels to support combined XSK/non-XSK traffic Maxim Mikityanskiy
@ 2019-04-26 11:42 ` Maxim Mikityanskiy
2019-04-26 20:27 ` Björn Töpel
2019-04-26 11:42 ` [PATCH net-next 6/6] xsk: Return the whole xdp_desc from xsk_umem_consume_tx Maxim Mikityanskiy
2019-04-26 20:25 ` [PATCH net-next 0/6] AF_XDP infrastructure improvements Björn Töpel
6 siblings, 1 reply; 18+ messages in thread
From: Maxim Mikityanskiy @ 2019-04-26 11:42 UTC (permalink / raw)
To: David S. Miller, Björn Töpel, Magnus Karlsson
Cc: netdev@vger.kernel.org, Saeed Mahameed, Jonathan Lemon,
Eran Ben Elisha, Tariq Toukan, Maxim Mikityanskiy
The typical XDP memory scheme is one packet per page. Change the AF_XDP
frame size in libbpf to 4096, which is the page size on x86, to allow
libbpf to be used with the drivers with the packet-per-page scheme.
Add a command line option -f to xdpsock to allow to specify a custom
frame size.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
---
samples/bpf/xdpsock_user.c | 39 +++++++++++++++++++++++++-------------
tools/lib/bpf/xsk.h | 2 +-
2 files changed, 27 insertions(+), 14 deletions(-)
diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index a6b13025ee79..3746f699cf2f 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -69,6 +69,7 @@ static int opt_queue;
static int opt_poll;
static int opt_interval = 1;
static u32 opt_xdp_bind_flags;
+static int opt_xsk_frame_shift = XSK_UMEM__DEFAULT_FRAME_SHIFT;
static __u32 prog_id;
struct xsk_umem_info {
@@ -93,6 +94,11 @@ struct xsk_socket_info {
static int num_socks;
struct xsk_socket_info *xsks[MAX_SOCKS];
+static inline int get_buf_size(int frames)
+{
+ return frames << opt_xsk_frame_shift;
+}
+
static unsigned long get_nsecs(void)
{
struct timespec ts;
@@ -277,6 +283,12 @@ static size_t gen_eth_frame(struct xsk_umem_info *umem, u64 addr)
static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
{
struct xsk_umem_info *umem;
+ struct xsk_umem_config cfg = {
+ .fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+ .comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
+ .frame_size = get_buf_size(1),
+ .frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
+ };
int ret;
umem = calloc(1, sizeof(*umem));
@@ -284,7 +296,7 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
exit_with_error(errno);
ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
- NULL);
+ &cfg);
if (ret)
exit_with_error(-ret);
@@ -325,9 +337,8 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
if (ret != XSK_RING_PROD__DEFAULT_NUM_DESCS)
exit_with_error(-ret);
for (i = 0;
- i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
- XSK_UMEM__DEFAULT_FRAME_SIZE;
- i += XSK_UMEM__DEFAULT_FRAME_SIZE)
+ i < get_buf_size(XSK_RING_PROD__DEFAULT_NUM_DESCS);
+ i += get_buf_size(1))
*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = i;
xsk_ring_prod__submit(&xsk->umem->fq,
XSK_RING_PROD__DEFAULT_NUM_DESCS);
@@ -348,6 +359,7 @@ static struct option long_options[] = {
{"zero-copy", no_argument, 0, 'z'},
{"copy", no_argument, 0, 'c'},
{"combined", no_argument, 0, 'C'},
+ {"frame-shift", required_argument, 0, 'f'},
{0, 0, 0, 0}
};
@@ -368,8 +380,9 @@ static void usage(const char *prog)
" -z, --zero-copy Force zero-copy mode.\n"
" -c, --copy Force copy mode.\n"
" -C, --combined Driver supports combined XSK and non-XSK traffic in a channel.\n"
+ " -f, --frame-shift=n Set the frame shift (default is %d, frame size is 1 << frame-shift).\n"
"\n";
- fprintf(stderr, str, prog);
+ fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SHIFT);
exit(EXIT_FAILURE);
}
@@ -380,7 +393,7 @@ static void parse_command_line(int argc, char **argv)
opterr = 0;
for (;;) {
- c = getopt_long(argc, argv, "Frtli:q:psSNn:czC", long_options,
+ c = getopt_long(argc, argv, "Frtli:q:psSNn:czCf:", long_options,
&option_index);
if (c == -1)
break;
@@ -426,6 +439,9 @@ static void parse_command_line(int argc, char **argv)
case 'C':
opt_libbpf_flags |= XSK_LIBBPF_FLAGS__COMBINED_CHANNELS;
break;
+ case 'f':
+ opt_xsk_frame_shift = atoi(optarg);
+ break;
default:
usage(basename(argv[0]));
}
@@ -589,8 +605,7 @@ static void tx_only(struct xsk_socket_info *xsk)
for (i = 0; i < BATCH_SIZE; i++) {
xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->addr
- = (frame_nb + i) <<
- XSK_UMEM__DEFAULT_FRAME_SHIFT;
+ = get_buf_size(frame_nb + i);
xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->len =
sizeof(pkt_data) - 1;
}
@@ -667,20 +682,18 @@ int main(int argc, char **argv)
}
ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
- NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE);
+ get_buf_size(NUM_FRAMES));
if (ret)
exit_with_error(ret);
/* Create sockets... */
- umem = xsk_configure_umem(bufs,
- NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE);
+ umem = xsk_configure_umem(bufs, get_buf_size(NUM_FRAMES));
xsks[num_socks++] = xsk_configure_socket(umem);
if (opt_bench == BENCH_TXONLY) {
int i;
- for (i = 0; i < NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE;
- i += XSK_UMEM__DEFAULT_FRAME_SIZE)
+ for (i = 0; i < get_buf_size(NUM_FRAMES); i += get_buf_size(1))
(void)gen_eth_frame(umem, i);
}
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index 72a49b102ab3..946245a7272e 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -157,7 +157,7 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
#define XSK_RING_CONS__DEFAULT_NUM_DESCS 2048
#define XSK_RING_PROD__DEFAULT_NUM_DESCS 2048
-#define XSK_UMEM__DEFAULT_FRAME_SHIFT 11 /* 2048 bytes */
+#define XSK_UMEM__DEFAULT_FRAME_SHIFT 12 /* 4096 bytes */
#define XSK_UMEM__DEFAULT_FRAME_SIZE (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
#define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
--
2.19.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH net-next 6/6] xsk: Return the whole xdp_desc from xsk_umem_consume_tx
2019-04-26 11:42 [PATCH net-next 0/6] AF_XDP infrastructure improvements Maxim Mikityanskiy
` (4 preceding siblings ...)
2019-04-26 11:42 ` [PATCH net-next 5/6] xsk: Change the default frame size to 4096 and allow controlling it Maxim Mikityanskiy
@ 2019-04-26 11:42 ` Maxim Mikityanskiy
2019-04-26 20:25 ` [PATCH net-next 0/6] AF_XDP infrastructure improvements Björn Töpel
6 siblings, 0 replies; 18+ messages in thread
From: Maxim Mikityanskiy @ 2019-04-26 11:42 UTC (permalink / raw)
To: David S. Miller, Björn Töpel, Magnus Karlsson
Cc: netdev@vger.kernel.org, Saeed Mahameed, Jonathan Lemon,
Eran Ben Elisha, Tariq Toukan, Maxim Mikityanskiy
Some drivers want to access the data transmitted in order to implement
acceleration features of the NICs. It is also useful in AF_XDP TX flow.
Change the xsk_umem_consume_tx API to return the whole xdp_desc, that
contains the data pointer, length and DMA address, instead of only the
latter two. Adapt the implementation of i40e and ixgbe to this change.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_xsk.c | 12 +++++++-----
drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 15 +++++++++------
include/net/xdp_sock.h | 6 +++---
net/xdp/xsk.c | 10 +++-------
4 files changed, 22 insertions(+), 21 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index b5c182e688e3..669eb67470d9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -637,8 +637,8 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
struct i40e_tx_desc *tx_desc = NULL;
struct i40e_tx_buffer *tx_bi;
bool work_done = true;
+ struct xdp_desc desc;
dma_addr_t dma;
- u32 len;
while (budget-- > 0) {
if (!unlikely(I40E_DESC_UNUSED(xdp_ring))) {
@@ -647,21 +647,23 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
break;
}
- if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &dma, &len))
+ if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
break;
- dma_sync_single_for_device(xdp_ring->dev, dma, len,
+ dma = xdp_umem_get_dma(xdp_ring->xsk_umem, desc.addr);
+
+ dma_sync_single_for_device(xdp_ring->dev, dma, desc.len,
DMA_BIDIRECTIONAL);
tx_bi = &xdp_ring->tx_bi[xdp_ring->next_to_use];
- tx_bi->bytecount = len;
+ tx_bi->bytecount = desc.len;
tx_desc = I40E_TX_DESC(xdp_ring, xdp_ring->next_to_use);
tx_desc->buffer_addr = cpu_to_le64(dma);
tx_desc->cmd_type_offset_bsz =
build_ctob(I40E_TX_DESC_CMD_ICRC
| I40E_TX_DESC_CMD_EOP,
- 0, len, 0);
+ 0, desc.len, 0);
xdp_ring->next_to_use++;
if (xdp_ring->next_to_use == xdp_ring->count)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index bfe95ce0bd7f..0297a70a4e2d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -621,8 +621,9 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
union ixgbe_adv_tx_desc *tx_desc = NULL;
struct ixgbe_tx_buffer *tx_bi;
bool work_done = true;
- u32 len, cmd_type;
+ struct xdp_desc desc;
dma_addr_t dma;
+ u32 cmd_type;
while (budget-- > 0) {
if (unlikely(!ixgbe_desc_unused(xdp_ring)) ||
@@ -631,14 +632,16 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
break;
}
- if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &dma, &len))
+ if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
break;
- dma_sync_single_for_device(xdp_ring->dev, dma, len,
+ dma = xdp_umem_get_dma(xdp_ring->xsk_umem, desc.addr);
+
+ dma_sync_single_for_device(xdp_ring->dev, dma, desc.len,
DMA_BIDIRECTIONAL);
tx_bi = &xdp_ring->tx_buffer_info[xdp_ring->next_to_use];
- tx_bi->bytecount = len;
+ tx_bi->bytecount = desc.len;
tx_bi->xdpf = NULL;
tx_desc = IXGBE_TX_DESC(xdp_ring, xdp_ring->next_to_use);
@@ -648,10 +651,10 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
cmd_type = IXGBE_ADVTXD_DTYP_DATA |
IXGBE_ADVTXD_DCMD_DEXT |
IXGBE_ADVTXD_DCMD_IFCS;
- cmd_type |= len | IXGBE_TXD_CMD;
+ cmd_type |= desc.len | IXGBE_TXD_CMD;
tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
tx_desc->read.olinfo_status =
- cpu_to_le32(len << IXGBE_ADVTXD_PAYLEN_SHIFT);
+ cpu_to_le32(desc.len << IXGBE_ADVTXD_PAYLEN_SHIFT);
xdp_ring->next_to_use++;
if (xdp_ring->next_to_use == xdp_ring->count)
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index e466b4bf68b4..efc1d03615db 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -82,7 +82,7 @@ bool xsk_umem_has_addrs(struct xdp_umem *umem, u32 cnt);
u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr);
void xsk_umem_discard_addr(struct xdp_umem *umem);
void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries);
-bool xsk_umem_consume_tx(struct xdp_umem *umem, dma_addr_t *dma, u32 *len);
+bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc);
void xsk_umem_consume_tx_done(struct xdp_umem *umem);
struct xdp_umem_fq_reuse *xsk_reuseq_prepare(u32 nentries);
struct xdp_umem_fq_reuse *xsk_reuseq_swap(struct xdp_umem *umem,
@@ -176,8 +176,8 @@ static inline void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries)
{
}
-static inline bool xsk_umem_consume_tx(struct xdp_umem *umem, dma_addr_t *dma,
- u32 *len)
+static inline bool xsk_umem_consume_tx(struct xdp_umem *umem,
+ struct xdp_desc *desc)
{
return false;
}
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index abc0d2e013e3..774169b107dc 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -177,22 +177,18 @@ void xsk_umem_consume_tx_done(struct xdp_umem *umem)
}
EXPORT_SYMBOL(xsk_umem_consume_tx_done);
-bool xsk_umem_consume_tx(struct xdp_umem *umem, dma_addr_t *dma, u32 *len)
+bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc)
{
- struct xdp_desc desc;
struct xdp_sock *xs;
rcu_read_lock();
list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
- if (!xskq_peek_desc(xs->tx, &desc))
+ if (!xskq_peek_desc(xs->tx, desc))
continue;
- if (xskq_produce_addr_lazy(umem->cq, desc.addr))
+ if (xskq_produce_addr_lazy(umem->cq, desc->addr))
goto out;
- *dma = xdp_umem_get_dma(umem, desc.addr);
- *len = desc.len;
-
xskq_discard_desc(xs->tx);
rcu_read_unlock();
return true;
--
2.19.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH net-next 1/6] samples/bpf: Add hbm to .gitignore
2019-04-26 11:42 ` [PATCH net-next 1/6] samples/bpf: Add hbm to .gitignore Maxim Mikityanskiy
@ 2019-04-26 14:57 ` Maciej Fijalkowski
0 siblings, 0 replies; 18+ messages in thread
From: Maciej Fijalkowski @ 2019-04-26 14:57 UTC (permalink / raw)
To: Maxim Mikityanskiy
Cc: David S. Miller, Björn Töpel, Magnus Karlsson,
netdev@vger.kernel.org, Saeed Mahameed, Jonathan Lemon,
Eran Ben Elisha, Tariq Toukan
On Fri, 26 Apr 2019 11:42:31 +0000
Maxim Mikityanskiy <maximmi@mellanox.com> wrote:
> The hbm binary was missing from .gitignore.
>
> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
> ---
> samples/bpf/.gitignore | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/samples/bpf/.gitignore b/samples/bpf/.gitignore
> index 59e40998e249..c7498457595a 100644
> --- a/samples/bpf/.gitignore
> +++ b/samples/bpf/.gitignore
> @@ -1,5 +1,6 @@
> cpustat
> fds_example
> +hbm
> lathist
> lwt_len_hist
> map_perf_test
That was already applied in ead442a0f9aa ("samples: bpf: add hbm sample
to .gitignore"). I guess you need to send a v2 rebased against latest bpf-next?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next 2/6] xsk: Add API to check for available entries in FQ
2019-04-26 11:42 ` [PATCH net-next 2/6] xsk: Add API to check for available entries in FQ Maxim Mikityanskiy
@ 2019-04-26 15:18 ` Alexei Starovoitov
0 siblings, 0 replies; 18+ messages in thread
From: Alexei Starovoitov @ 2019-04-26 15:18 UTC (permalink / raw)
To: Maxim Mikityanskiy
Cc: David S. Miller, Björn Töpel, Magnus Karlsson,
netdev@vger.kernel.org, Saeed Mahameed, Jonathan Lemon,
Eran Ben Elisha, Tariq Toukan
On Fri, Apr 26, 2019 at 11:42:33AM +0000, Maxim Mikityanskiy wrote:
> Add a function that checks whether the Fill Ring has the specified
> amount of descriptors available. It will be useful for mlx5e that wants
> to check in advance, whether it can allocate a bulk of RX descriptors,
> to get the best performance.
>
> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
...
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index a14e8864e4fa..ca764ed9da41 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -4,9 +4,11 @@
> * AF_XDP sockets allows a channel between XDP programs and userspace
> * applications.
> * Copyright(c) 2018 Intel Corporation.
> + * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
> *
> * Author(s): Björn Töpel <bjorn.topel@intel.com>
> * Magnus Karlsson <magnus.karlsson@intel.com>
> + * Maxim Mikityanskiy <maximmi@mellanox.com>
I don't think adding one function makes you an author.
Please remove and remove all copyright lines.
If a file is brand new it's fine to add one, but changes
to existing files do not warrant copyright line additions.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next 4/6] xsk: Extend channels to support combined XSK/non-XSK traffic
2019-04-26 11:42 ` [PATCH net-next 4/6] xsk: Extend channels to support combined XSK/non-XSK traffic Maxim Mikityanskiy
@ 2019-04-26 15:24 ` Alexei Starovoitov
2019-04-26 19:11 ` Jakub Kicinski
2019-04-26 20:26 ` Björn Töpel
2 siblings, 0 replies; 18+ messages in thread
From: Alexei Starovoitov @ 2019-04-26 15:24 UTC (permalink / raw)
To: Maxim Mikityanskiy
Cc: David S. Miller, Björn Töpel, Magnus Karlsson,
netdev@vger.kernel.org, Saeed Mahameed, Jonathan Lemon,
Eran Ben Elisha, Tariq Toukan
On Fri, Apr 26, 2019 at 11:42:37AM +0000, Maxim Mikityanskiy wrote:
>
> This commit extends XSK to support both kinds of traffic (XSK and
> non-XSK) in the same channel. It implies having two RX queues in
> XSK-enabled channels: one for the regular traffic, and the other for
> XSK.
...
> the reference XDP program in libbpf). mlx5e will support this extension.
Nack to this feature, because corresponding mlx patches are not provided.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next 4/6] xsk: Extend channels to support combined XSK/non-XSK traffic
2019-04-26 11:42 ` [PATCH net-next 4/6] xsk: Extend channels to support combined XSK/non-XSK traffic Maxim Mikityanskiy
2019-04-26 15:24 ` Alexei Starovoitov
@ 2019-04-26 19:11 ` Jakub Kicinski
2019-04-30 18:11 ` Maxim Mikityanskiy
2019-04-26 20:26 ` Björn Töpel
2 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2019-04-26 19:11 UTC (permalink / raw)
To: Maxim Mikityanskiy
Cc: David S. Miller, Björn Töpel, Magnus Karlsson,
netdev@vger.kernel.org, Saeed Mahameed, Jonathan Lemon,
Eran Ben Elisha, Tariq Toukan
On Fri, 26 Apr 2019 11:42:37 +0000, Maxim Mikityanskiy wrote:
> Currently, the drivers that implement AF_XDP zero-copy support (e.g.,
> i40e) switch the channel into a different mode when an XSK is opened. It
> causes some issues that have to be taken into account. For example, RSS
> needs to be reconfigured to skip the XSK-enabled channels, or the XDP
> program should filter out traffic not intended for that socket and
> XDP_PASS it with an additional copy. As nothing validates or forces the
> proper configuration, it's easy to have packets drops, when they get
> into an XSK by mistake, and, in fact, it's the default configuration.
> There has to be some tool to have RSS reconfigured on each socket open
> and close event, but such a tool is problematic to implement, because no
> one reports these events, and it's race-prone.
>
> This commit extends XSK to support both kinds of traffic (XSK and
> non-XSK) in the same channel. It implies having two RX queues in
> XSK-enabled channels: one for the regular traffic, and the other for
> XSK. It solves the problem with RSS: the default configuration just
> works without the need to manually reconfigure RSS or to perform some
> possibly complicated filtering in the XDP layer. It makes it easy to run
> both AF_XDP and regular sockets on the same machine. In the XDP program,
> the QID's most significant bit will serve as a flag to indicate whether
> it's the XSK queue or not. The extension is compatible with the legacy
> configuration, so if one wants to run the legacy mode, they can
> reconfigure RSS and ignore the flag in the XDP program (implemented in
> the reference XDP program in libbpf). mlx5e will support this extension.
>
> A single XDP program can run both with drivers supporting or not
> supporting this extension. The xdpsock sample and libbpf are updated
> accordingly.
>
> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
It's been a while but this patch makes very little sense to me.
User installs a UMEM on a queue, all packets are delivered to that
UMEM, if XDP program doesn't redirect the packet to a XSK socket the
driver should copy it to a normal kernel page and proceed. No drops
should happen.
Having two queues which pretend to share an ID but really one of them
has a magic encoding does not help IMO.
I definitely agree with you that the way queues are taken out of the
stack's ID space and the fact they aren't automatically excluded from
RSS etc is quite suboptimal..
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next 0/6] AF_XDP infrastructure improvements
2019-04-26 11:42 [PATCH net-next 0/6] AF_XDP infrastructure improvements Maxim Mikityanskiy
` (5 preceding siblings ...)
2019-04-26 11:42 ` [PATCH net-next 6/6] xsk: Return the whole xdp_desc from xsk_umem_consume_tx Maxim Mikityanskiy
@ 2019-04-26 20:25 ` Björn Töpel
6 siblings, 0 replies; 18+ messages in thread
From: Björn Töpel @ 2019-04-26 20:25 UTC (permalink / raw)
To: Maxim Mikityanskiy, David S. Miller, Magnus Karlsson
Cc: netdev@vger.kernel.org, Saeed Mahameed, Jonathan Lemon,
Eran Ben Elisha, Tariq Toukan
On 2019-04-26 13:42, Maxim Mikityanskiy wrote:
> This series contains improvements to the AF_XDP kernel infrastructure.
> They are required for the upcoming AF_XDP support in mlx5e, but also
> some of them benefit to all drivers, and some can be useful for other
> drivers that want to implement AF_XDP.
>,
Maxim,
Firstly; It's exciting to see Mellanox continuing to invest in XDP/BPF!
Keep it up!
I'll comment on the individual patches. Going forward, please rebase/tag
the patches to bpf-next (and add the bpf mailing list).
Cheers,
Björn
> Maxim Mikityanskiy (6):
> samples/bpf: Add hbm to .gitignore
> xsk: Add API to check for available entries in FQ
> xsk: Add getsockopt XDP_OPTIONS
> xsk: Extend channels to support combined XSK/non-XSK traffic
> xsk: Change the default frame size to 4096 and allow controlling it
> xsk: Return the whole xdp_desc from xsk_umem_consume_tx
>
> drivers/net/ethernet/intel/i40e/i40e_xsk.c | 12 +-
> drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 15 ++-
> include/net/xdp_sock.h | 28 +++-
> include/uapi/linux/if_xdp.h | 18 +++
> net/xdp/xsk.c | 45 +++++--
> net/xdp/xsk_queue.h | 15 +++
> samples/bpf/.gitignore | 1 +
> samples/bpf/xdpsock_user.c | 47 +++++--
> tools/include/uapi/linux/if_xdp.h | 18 +++
> tools/lib/bpf/xsk.c | 127 +++++++++++++++----
> tools/lib/bpf/xsk.h | 6 +-
> 11 files changed, 267 insertions(+), 65 deletions(-)
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next 3/6] xsk: Add getsockopt XDP_OPTIONS
2019-04-26 11:42 ` [PATCH net-next 3/6] xsk: Add getsockopt XDP_OPTIONS Maxim Mikityanskiy
@ 2019-04-26 20:25 ` Björn Töpel
0 siblings, 0 replies; 18+ messages in thread
From: Björn Töpel @ 2019-04-26 20:25 UTC (permalink / raw)
To: Maxim Mikityanskiy, David S. Miller, Magnus Karlsson
Cc: netdev@vger.kernel.org, Saeed Mahameed, Jonathan Lemon,
Eran Ben Elisha, Tariq Toukan
On 2019-04-26 13:42, Maxim Mikityanskiy wrote:
> Make it possible for the application to determine whether the AF_XDP
> socket is running in zero-copy mode. To achieve this, add a new
> getsockopt option XDP_OPTIONS that returns flags. The only flag
> supported for now is the zero-copy mode indicator.
>
> Also query XDP_OPTIONS in the xdpsock sample application. The zero-copy
> flag will be used in the following commits.
>
Please break this up into separate patches; xsk and libbpf.
Björn
> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
> ---
> include/uapi/linux/if_xdp.h | 7 +++++++
> net/xdp/xsk.c | 22 ++++++++++++++++++++++
> tools/include/uapi/linux/if_xdp.h | 7 +++++++
> tools/lib/bpf/xsk.c | 11 +++++++++++
> 4 files changed, 47 insertions(+)
>
> diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
> index caed8b1614ff..9ae4b4e08b68 100644
> --- a/include/uapi/linux/if_xdp.h
> +++ b/include/uapi/linux/if_xdp.h
> @@ -46,6 +46,7 @@ struct xdp_mmap_offsets {
> #define XDP_UMEM_FILL_RING 5
> #define XDP_UMEM_COMPLETION_RING 6
> #define XDP_STATISTICS 7
> +#define XDP_OPTIONS 8
>
> struct xdp_umem_reg {
> __u64 addr; /* Start of packet data area */
> @@ -60,6 +61,12 @@ struct xdp_statistics {
> __u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
> };
>
> +struct xdp_options {
> + __u32 flags;
> +};
> +
> +#define XDP_OPTIONS_FLAG_ZEROCOPY (1 << 0)
> +
> /* Pgoff for mmaping the rings */
> #define XDP_PGOFF_RX_RING 0
> #define XDP_PGOFF_TX_RING 0x80000000
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index ca764ed9da41..ff8427552e8d 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -652,6 +652,28 @@ static int xsk_getsockopt(struct socket *sock, int level, int optname,
>
> return 0;
> }
> + case XDP_OPTIONS:
> + {
> + struct xdp_options opts;
> +
> + if (len < sizeof(opts))
> + return -EINVAL;
> +
> + opts.flags = 0;
> +
> + mutex_lock(&xs->mutex);
> + if (xs->zc)
> + opts.flags |= XDP_OPTIONS_FLAG_ZEROCOPY;
> + mutex_unlock(&xs->mutex);
> +
> + len = sizeof(opts);
> + if (copy_to_user(optval, &opts, len))
> + return -EFAULT;
> + if (put_user(len, optlen))
> + return -EFAULT;
> +
> + return 0;
> + }
> default:
> break;
> }
> diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
> index caed8b1614ff..9ae4b4e08b68 100644
> --- a/tools/include/uapi/linux/if_xdp.h
> +++ b/tools/include/uapi/linux/if_xdp.h
> @@ -46,6 +46,7 @@ struct xdp_mmap_offsets {
> #define XDP_UMEM_FILL_RING 5
> #define XDP_UMEM_COMPLETION_RING 6
> #define XDP_STATISTICS 7
> +#define XDP_OPTIONS 8
>
> struct xdp_umem_reg {
> __u64 addr; /* Start of packet data area */
> @@ -60,6 +61,12 @@ struct xdp_statistics {
> __u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
> };
>
> +struct xdp_options {
> + __u32 flags;
> +};
> +
> +#define XDP_OPTIONS_FLAG_ZEROCOPY (1 << 0)
> +
> /* Pgoff for mmaping the rings */
> #define XDP_PGOFF_RX_RING 0
> #define XDP_PGOFF_TX_RING 0x80000000
> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> index 8d0078b65486..50d7235e3b21 100644
> --- a/tools/lib/bpf/xsk.c
> +++ b/tools/lib/bpf/xsk.c
> @@ -67,6 +67,7 @@ struct xsk_socket {
> int xsks_map_fd;
> __u32 queue_id;
> char ifname[IFNAMSIZ];
> + bool zc;
> };
>
> struct xsk_nl_info {
> @@ -525,6 +526,7 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
> {
> struct sockaddr_xdp sxdp = {};
> struct xdp_mmap_offsets off;
> + struct xdp_options opts;
> struct xsk_socket *xsk;
> socklen_t optlen;
> void *map;
> @@ -642,6 +644,15 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
> goto out_mmap_tx;
> }
>
> + optlen = sizeof(opts);
> + err = getsockopt(xsk->fd, SOL_XDP, XDP_OPTIONS, &opts, &optlen);
> + if (err) {
> + err = -errno;
> + goto out_mmap_tx;
> + }
> +
> + xsk->zc = opts.flags & XDP_OPTIONS_FLAG_ZEROCOPY;
> +
> if (!(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) {
> err = xsk_setup_xdp_prog(xsk);
> if (err)
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next 4/6] xsk: Extend channels to support combined XSK/non-XSK traffic
2019-04-26 11:42 ` [PATCH net-next 4/6] xsk: Extend channels to support combined XSK/non-XSK traffic Maxim Mikityanskiy
2019-04-26 15:24 ` Alexei Starovoitov
2019-04-26 19:11 ` Jakub Kicinski
@ 2019-04-26 20:26 ` Björn Töpel
2019-04-30 18:11 ` Maxim Mikityanskiy
2 siblings, 1 reply; 18+ messages in thread
From: Björn Töpel @ 2019-04-26 20:26 UTC (permalink / raw)
To: Maxim Mikityanskiy, David S. Miller, Magnus Karlsson
Cc: netdev@vger.kernel.org, Saeed Mahameed, Jonathan Lemon,
Eran Ben Elisha, Tariq Toukan
On 2019-04-26 13:42, Maxim Mikityanskiy wrote:
> Currently, the drivers that implement AF_XDP zero-copy support (e.g.,
> i40e) switch the channel into a different mode when an XSK is opened. It
> causes some issues that have to be taken into account. For example, RSS
> needs to be reconfigured to skip the XSK-enabled channels, or the XDP
> program should filter out traffic not intended for that socket and
> XDP_PASS it with an additional copy. As nothing validates or forces the
> proper configuration, it's easy to have packets drops, when they get
> into an XSK by mistake, and, in fact, it's the default configuration.
> There has to be some tool to have RSS reconfigured on each socket open
> and close event, but such a tool is problematic to implement, because no
> one reports these events, and it's race-prone.
>
> This commit extends XSK to support both kinds of traffic (XSK and
> non-XSK) in the same channel. It implies having two RX queues in
> XSK-enabled channels: one for the regular traffic, and the other for
> XSK. It solves the problem with RSS: the default configuration just
> works without the need to manually reconfigure RSS or to perform some
> possibly complicated filtering in the XDP layer. It makes it easy to run
> both AF_XDP and regular sockets on the same machine. In the XDP program,
> the QID's most significant bit will serve as a flag to indicate whether
> it's the XSK queue or not. The extension is compatible with the legacy
> configuration, so if one wants to run the legacy mode, they can
> reconfigure RSS and ignore the flag in the XDP program (implemented in
> the reference XDP program in libbpf). mlx5e will support this extension.
>
> A single XDP program can run both with drivers supporting or not
> supporting this extension. The xdpsock sample and libbpf are updated
> accordingly.
>
> Signed-off-by: Maxim Mikityanskiy<maximmi@mellanox.com>
I acknowledge the problem you're addressing, but I'm not a fan of this
design.
Netdevs, queues and now channels. This is too complicated. The setup is
complex as it is. Adding more cognitive load is not good.
There's already an assumption that a user attaching to a certain netdev
queue has to take care of the flow steering.
What about adding a mode where the AF_XDP user just get a *new* queue,
by binding to (say) netdev;qid==-1? This new queue wouldn't be included
in the existing queue set. Create a bunch of AF_XDP sockets and steer
traffic to that set. And yes, we need a mechanism to steer to that
queue, in-app/socket or out-of-band. Please see this [1] discussion.
I don't like this "let's split a queue into channels".
Björn
[1] https://lore.kernel.org/netdev/20190415183258.36dcee9a@carbon/
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next 5/6] xsk: Change the default frame size to 4096 and allow controlling it
2019-04-26 11:42 ` [PATCH net-next 5/6] xsk: Change the default frame size to 4096 and allow controlling it Maxim Mikityanskiy
@ 2019-04-26 20:27 ` Björn Töpel
0 siblings, 0 replies; 18+ messages in thread
From: Björn Töpel @ 2019-04-26 20:27 UTC (permalink / raw)
To: Maxim Mikityanskiy, David S. Miller, Magnus Karlsson
Cc: netdev@vger.kernel.org, Saeed Mahameed, Jonathan Lemon,
Eran Ben Elisha, Tariq Toukan
On 2019-04-26 13:42, Maxim Mikityanskiy wrote:
> The typical XDP memory scheme is one packet per page. Change the AF_XDP
> frame size in libbpf to 4096, which is the page size on x86, to allow
> libbpf to be used with the drivers with the packet-per-page scheme.
>
> Add a command line option -f to xdpsock to allow to specify a custom
> frame size.
>
I'd prefer size, in favor of "frame shift".
Björn
> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
> ---
> samples/bpf/xdpsock_user.c | 39 +++++++++++++++++++++++++-------------
> tools/lib/bpf/xsk.h | 2 +-
> 2 files changed, 27 insertions(+), 14 deletions(-)
>
> diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
> index a6b13025ee79..3746f699cf2f 100644
> --- a/samples/bpf/xdpsock_user.c
> +++ b/samples/bpf/xdpsock_user.c
> @@ -69,6 +69,7 @@ static int opt_queue;
> static int opt_poll;
> static int opt_interval = 1;
> static u32 opt_xdp_bind_flags;
> +static int opt_xsk_frame_shift = XSK_UMEM__DEFAULT_FRAME_SHIFT;
> static __u32 prog_id;
>
> struct xsk_umem_info {
> @@ -93,6 +94,11 @@ struct xsk_socket_info {
> static int num_socks;
> struct xsk_socket_info *xsks[MAX_SOCKS];
>
> +static inline int get_buf_size(int frames)
> +{
> + return frames << opt_xsk_frame_shift;
> +}
> +
> static unsigned long get_nsecs(void)
> {
> struct timespec ts;
> @@ -277,6 +283,12 @@ static size_t gen_eth_frame(struct xsk_umem_info *umem, u64 addr)
> static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
> {
> struct xsk_umem_info *umem;
> + struct xsk_umem_config cfg = {
> + .fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
> + .comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
> + .frame_size = get_buf_size(1),
> + .frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
> + };
> int ret;
>
> umem = calloc(1, sizeof(*umem));
> @@ -284,7 +296,7 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
> exit_with_error(errno);
>
> ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
> - NULL);
> + &cfg);
> if (ret)
> exit_with_error(-ret);
>
> @@ -325,9 +337,8 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
> if (ret != XSK_RING_PROD__DEFAULT_NUM_DESCS)
> exit_with_error(-ret);
> for (i = 0;
> - i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
> - XSK_UMEM__DEFAULT_FRAME_SIZE;
> - i += XSK_UMEM__DEFAULT_FRAME_SIZE)
> + i < get_buf_size(XSK_RING_PROD__DEFAULT_NUM_DESCS);
> + i += get_buf_size(1))
> *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = i;
> xsk_ring_prod__submit(&xsk->umem->fq,
> XSK_RING_PROD__DEFAULT_NUM_DESCS);
> @@ -348,6 +359,7 @@ static struct option long_options[] = {
> {"zero-copy", no_argument, 0, 'z'},
> {"copy", no_argument, 0, 'c'},
> {"combined", no_argument, 0, 'C'},
> + {"frame-shift", required_argument, 0, 'f'},
> {0, 0, 0, 0}
> };
>
> @@ -368,8 +380,9 @@ static void usage(const char *prog)
> " -z, --zero-copy Force zero-copy mode.\n"
> " -c, --copy Force copy mode.\n"
> " -C, --combined Driver supports combined XSK and non-XSK traffic in a channel.\n"
> + " -f, --frame-shift=n Set the frame shift (default is %d, frame size is 1 << frame-shift).\n"
> "\n";
> - fprintf(stderr, str, prog);
> + fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SHIFT);
> exit(EXIT_FAILURE);
> }
>
> @@ -380,7 +393,7 @@ static void parse_command_line(int argc, char **argv)
> opterr = 0;
>
> for (;;) {
> - c = getopt_long(argc, argv, "Frtli:q:psSNn:czC", long_options,
> + c = getopt_long(argc, argv, "Frtli:q:psSNn:czCf:", long_options,
> &option_index);
> if (c == -1)
> break;
> @@ -426,6 +439,9 @@ static void parse_command_line(int argc, char **argv)
> case 'C':
> opt_libbpf_flags |= XSK_LIBBPF_FLAGS__COMBINED_CHANNELS;
> break;
> + case 'f':
> + opt_xsk_frame_shift = atoi(optarg);
> + break;
> default:
> usage(basename(argv[0]));
> }
> @@ -589,8 +605,7 @@ static void tx_only(struct xsk_socket_info *xsk)
>
> for (i = 0; i < BATCH_SIZE; i++) {
> xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->addr
> - = (frame_nb + i) <<
> - XSK_UMEM__DEFAULT_FRAME_SHIFT;
> + = get_buf_size(frame_nb + i);
> xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->len =
> sizeof(pkt_data) - 1;
> }
> @@ -667,20 +682,18 @@ int main(int argc, char **argv)
> }
>
> ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
> - NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE);
> + get_buf_size(NUM_FRAMES));
> if (ret)
> exit_with_error(ret);
>
> /* Create sockets... */
> - umem = xsk_configure_umem(bufs,
> - NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE);
> + umem = xsk_configure_umem(bufs, get_buf_size(NUM_FRAMES));
> xsks[num_socks++] = xsk_configure_socket(umem);
>
> if (opt_bench == BENCH_TXONLY) {
> int i;
>
> - for (i = 0; i < NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE;
> - i += XSK_UMEM__DEFAULT_FRAME_SIZE)
> + for (i = 0; i < get_buf_size(NUM_FRAMES); i += get_buf_size(1))
> (void)gen_eth_frame(umem, i);
> }
>
> diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
> index 72a49b102ab3..946245a7272e 100644
> --- a/tools/lib/bpf/xsk.h
> +++ b/tools/lib/bpf/xsk.h
> @@ -157,7 +157,7 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
>
> #define XSK_RING_CONS__DEFAULT_NUM_DESCS 2048
> #define XSK_RING_PROD__DEFAULT_NUM_DESCS 2048
> -#define XSK_UMEM__DEFAULT_FRAME_SHIFT 11 /* 2048 bytes */
> +#define XSK_UMEM__DEFAULT_FRAME_SHIFT 12 /* 4096 bytes */
> #define XSK_UMEM__DEFAULT_FRAME_SIZE (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
> #define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next 4/6] xsk: Extend channels to support combined XSK/non-XSK traffic
2019-04-26 19:11 ` Jakub Kicinski
@ 2019-04-30 18:11 ` Maxim Mikityanskiy
0 siblings, 0 replies; 18+ messages in thread
From: Maxim Mikityanskiy @ 2019-04-30 18:11 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S. Miller, Björn Töpel, Magnus Karlsson,
netdev@vger.kernel.org, Saeed Mahameed, Jonathan Lemon,
Eran Ben Elisha, Tariq Toukan
On 2019-04-26 22:11, Jakub Kicinski wrote:
> On Fri, 26 Apr 2019 11:42:37 +0000, Maxim Mikityanskiy wrote:
>> Currently, the drivers that implement AF_XDP zero-copy support (e.g.,
>> i40e) switch the channel into a different mode when an XSK is opened. It
>> causes some issues that have to be taken into account. For example, RSS
>> needs to be reconfigured to skip the XSK-enabled channels, or the XDP
>> program should filter out traffic not intended for that socket and
>> XDP_PASS it with an additional copy. As nothing validates or forces the
>> proper configuration, it's easy to have packets drops, when they get
>> into an XSK by mistake, and, in fact, it's the default configuration.
>> There has to be some tool to have RSS reconfigured on each socket open
>> and close event, but such a tool is problematic to implement, because no
>> one reports these events, and it's race-prone.
>>
>> This commit extends XSK to support both kinds of traffic (XSK and
>> non-XSK) in the same channel. It implies having two RX queues in
>> XSK-enabled channels: one for the regular traffic, and the other for
>> XSK. It solves the problem with RSS: the default configuration just
>> works without the need to manually reconfigure RSS or to perform some
>> possibly complicated filtering in the XDP layer. It makes it easy to run
>> both AF_XDP and regular sockets on the same machine. In the XDP program,
>> the QID's most significant bit will serve as a flag to indicate whether
>> it's the XSK queue or not. The extension is compatible with the legacy
>> configuration, so if one wants to run the legacy mode, they can
>> reconfigure RSS and ignore the flag in the XDP program (implemented in
>> the reference XDP program in libbpf). mlx5e will support this extension.
>>
>> A single XDP program can run both with drivers supporting or not
>> supporting this extension. The xdpsock sample and libbpf are updated
>> accordingly.
>>
>> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
>
> It's been a while but this patch makes very little sense to me.
>
> User installs a UMEM on a queue, all packets are delivered to that
> UMEM, if XDP program doesn't redirect the packet to a XSK socket the
> driver should copy it to a normal kernel page and proceed. No drops
> should happen.
The thing is that the reference XDP program (included in libbpf)
redirects everything to the XSK. It doesn't do any filtering. So, the
packets steered to the queue by RSS also get into the XSK when they
shouldn't - here is where drops appear. Moreover, if the XDP program did
the filtering, you would have to setup the same rules both via ethtool
and in the XDP program, and this has a lot of disadvantages comparing to
just not having stray packets in the queue.
> Having two queues which pretend to share an ID but really one of them
> has a magic encoding does not help IMO.
>
> I definitely agree with you that the way queues are taken out of the
> stack's ID space and the fact they aren't automatically excluded from
> RSS etc is quite suboptimal..
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next 4/6] xsk: Extend channels to support combined XSK/non-XSK traffic
2019-04-26 20:26 ` Björn Töpel
@ 2019-04-30 18:11 ` Maxim Mikityanskiy
2019-04-30 18:24 ` Björn Töpel
0 siblings, 1 reply; 18+ messages in thread
From: Maxim Mikityanskiy @ 2019-04-30 18:11 UTC (permalink / raw)
To: Björn Töpel, David S. Miller, Magnus Karlsson
Cc: netdev@vger.kernel.org, Saeed Mahameed, Jonathan Lemon,
Eran Ben Elisha, Tariq Toukan, Jakub Kicinski
On 2019-04-26 23:26, Björn Töpel wrote:
> On 2019-04-26 13:42, Maxim Mikityanskiy wrote:
>> Currently, the drivers that implement AF_XDP zero-copy support (e.g.,
>> i40e) switch the channel into a different mode when an XSK is opened. It
>> causes some issues that have to be taken into account. For example, RSS
>> needs to be reconfigured to skip the XSK-enabled channels, or the XDP
>> program should filter out traffic not intended for that socket and
>> XDP_PASS it with an additional copy. As nothing validates or forces the
>> proper configuration, it's easy to have packets drops, when they get
>> into an XSK by mistake, and, in fact, it's the default configuration.
>> There has to be some tool to have RSS reconfigured on each socket open
>> and close event, but such a tool is problematic to implement, because no
>> one reports these events, and it's race-prone.
>>
>> This commit extends XSK to support both kinds of traffic (XSK and
>> non-XSK) in the same channel. It implies having two RX queues in
>> XSK-enabled channels: one for the regular traffic, and the other for
>> XSK. It solves the problem with RSS: the default configuration just
>> works without the need to manually reconfigure RSS or to perform some
>> possibly complicated filtering in the XDP layer. It makes it easy to run
>> both AF_XDP and regular sockets on the same machine. In the XDP program,
>> the QID's most significant bit will serve as a flag to indicate whether
>> it's the XSK queue or not. The extension is compatible with the legacy
>> configuration, so if one wants to run the legacy mode, they can
>> reconfigure RSS and ignore the flag in the XDP program (implemented in
>> the reference XDP program in libbpf). mlx5e will support this extension.
>>
>> A single XDP program can run both with drivers supporting or not
>> supporting this extension. The xdpsock sample and libbpf are updated
>> accordingly.
>>
>> Signed-off-by: Maxim Mikityanskiy<maximmi@mellanox.com>
>
>
> I acknowledge the problem you're addressing, but I'm not a fan of this
> design.
>
> Netdevs, queues and now channels. This is too complicated. The setup is
> complex as it is. Adding more cognitive load is not good.
IMO, using "queues" for AF_XDP is a misnomer. You don't really attach an
AF_XDP socket to RX queue number X and TX queue number Y. Instead, you
have a single number, and that number actually selects a channel (a ring
in i40e) number X, which has an RX queue and TX queue. Even libbpf
queries the number of channels to determine the maximum X allowed.
So, I'm not adding anything particularly new here. Everything that used
to work still works. I'm not using netdevs+queues+channels, I'm still
using netdevs+channels, just as it was before. There is no incompatible
change here, only an improvement for the drivers that support it.
I'm going to respin this series, adding the mlx5 patches and addressing
the other comments. If you feel like there are more things to discuss
regarding this patch, please move on to the v2.
Thanks for looking into it!
> There's already an assumption that a user attaching to a certain netdev
> queue has to take care of the flow steering.
>
> What about adding a mode where the AF_XDP user just get a *new* queue,
> by binding to (say) netdev;qid==-1? This new queue wouldn't be included
> in the existing queue set. Create a bunch of AF_XDP sockets and steer
> traffic to that set. And yes, we need a mechanism to steer to that
> queue, in-app/socket or out-of-band. Please see this [1] discussion.
>
> I don't like this "let's split a queue into channels".
>
>
> Björn
>
>
> [1] https://lore.kernel.org/netdev/20190415183258.36dcee9a@carbon/
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next 4/6] xsk: Extend channels to support combined XSK/non-XSK traffic
2019-04-30 18:11 ` Maxim Mikityanskiy
@ 2019-04-30 18:24 ` Björn Töpel
0 siblings, 0 replies; 18+ messages in thread
From: Björn Töpel @ 2019-04-30 18:24 UTC (permalink / raw)
To: Maxim Mikityanskiy, David S. Miller, Magnus Karlsson
Cc: netdev@vger.kernel.org, Saeed Mahameed, Jonathan Lemon,
Eran Ben Elisha, Tariq Toukan, Jakub Kicinski
On 2019-04-30 20:11, Maxim Mikityanskiy wrote:
> I'm going to respin this series, adding the mlx5 patches and addressing
> the other comments. If you feel like there are more things to discuss
> regarding this patch, please move on to the v2.
Very cool, thanks for working on this Maxim!
It's Labor Day in Sweden tomorrow, I'll have a look Thu/Fri!
Cheers,
Björn
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2019-04-30 18:24 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-04-26 11:42 [PATCH net-next 0/6] AF_XDP infrastructure improvements Maxim Mikityanskiy
2019-04-26 11:42 ` [PATCH net-next 1/6] samples/bpf: Add hbm to .gitignore Maxim Mikityanskiy
2019-04-26 14:57 ` Maciej Fijalkowski
2019-04-26 11:42 ` [PATCH net-next 2/6] xsk: Add API to check for available entries in FQ Maxim Mikityanskiy
2019-04-26 15:18 ` Alexei Starovoitov
2019-04-26 11:42 ` [PATCH net-next 3/6] xsk: Add getsockopt XDP_OPTIONS Maxim Mikityanskiy
2019-04-26 20:25 ` Björn Töpel
2019-04-26 11:42 ` [PATCH net-next 4/6] xsk: Extend channels to support combined XSK/non-XSK traffic Maxim Mikityanskiy
2019-04-26 15:24 ` Alexei Starovoitov
2019-04-26 19:11 ` Jakub Kicinski
2019-04-30 18:11 ` Maxim Mikityanskiy
2019-04-26 20:26 ` Björn Töpel
2019-04-30 18:11 ` Maxim Mikityanskiy
2019-04-30 18:24 ` Björn Töpel
2019-04-26 11:42 ` [PATCH net-next 5/6] xsk: Change the default frame size to 4096 and allow controlling it Maxim Mikityanskiy
2019-04-26 20:27 ` Björn Töpel
2019-04-26 11:42 ` [PATCH net-next 6/6] xsk: Return the whole xdp_desc from xsk_umem_consume_tx Maxim Mikityanskiy
2019-04-26 20:25 ` [PATCH net-next 0/6] AF_XDP infrastructure improvements Björn Töpel
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).