* Re: [PATCH v5] net: netfilter: Fix rpfilter dropping vrf packets by mistake
From: Pablo Neira Ayuso @ 2019-07-16 11:16 UTC (permalink / raw)
To: Miaohe Lin
Cc: kadlec, fw, davem, kuznet, yoshfuji, netfilter-devel, coreteam,
netdev, linux-kernel, mingfangsen
In-Reply-To: <1562039976-203880-1-git-send-email-linmiaohe@huawei.com>
On Tue, Jul 02, 2019 at 03:59:36AM +0000, Miaohe Lin wrote:
> When firewalld is enabled with ipv4/ipv6 rpfilter, vrf
> ipv4/ipv6 packets will be dropped. Vrf device will pass
> through netfilter hook twice. One with enslaved device
> and another one with l3 master device. So in device may
> dismatch witch out device because out device is always
> enslaved device.So failed with the check of the rpfilter
> and drop the packets by mistake.
Applied to nf.git, thanks.
^ permalink raw reply
* [PATCH v2 00/10] XDP unaligned chunk placement support
From: Kevin Laatz @ 2019-07-16 3:06 UTC (permalink / raw)
To: netdev, ast, daniel, bjorn.topel, magnus.karlsson, jakub.kicinski,
jonathan.lemon
Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz
In-Reply-To: <20190620090958.2135-1-kevin.laatz@intel.com>
This patch set adds the ability to use unaligned chunks in the XDP umem.
Currently, all chunk addresses passed to the umem are masked to be chunk
size aligned (default is 2k, max is PAGE_SIZE). This limits where we can
place chunks within the umem as well as limiting the packet sizes that are
supported.
The changes in this patch set removes these restrictions, allowing XDP to
be more flexible in where it can place a chunk within a umem. By relaxing
where the chunks can be placed, it allows us to use an arbitrary buffer
size and place that wherever we have a free address in the umem. These
changes add the ability to support arbitrary frame sizes up to 4k
(PAGE_SIZE) and make it easy to integrate with other existing frameworks
that have their own memory management systems, such as DPDK.
Since we are now dealing with arbitrary frame sizes, we need also need to
update how we pass around addresses. Currently, the addresses can simply be
masked to 2k to get back to the original address. This becomes less trivial
when using frame sizes that are not a 'power of 2' size. This patch set
modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
field for an offset value, leaving the lower 48-bits for the address (this
leaves us with 256 Terabytes, which should be enough!). We only need to use
the upper 16-bits to store the offset when running in unaligned mode.
Rather than adding the offset (headroom etc) to the address, we will store
it in the upper 16-bits of the address field. This way, we can easily add
the offset to the address where we need it, using some bit manipulation and
addition, and we can also easily get the original address wherever we need
it (for example in i40e_zca_free) by simply masking to get the lower
48-bits of the address field.
The numbers below were recorded with the following set up:
- Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
- Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
- Driver: i40e
- Application: xdpsock with l2fwd (single interface)
These are solely for comparing performance with and without the patches.
The largest drop was ~1% (in zero-copy mode).
+-------------------------+------------+-----------------+-------------+
| Buffer size: 2048 | SKB mode | Zero-copy | Copy |
+-------------------------+------------+-----------------+-------------+
| Aligned (baseline) | 1.7 Mpps | 15.3 Mpps | 2.08 Mpps |
+-------------------------+------------+-----------------+-------------+
| Aligned (with patches) | 1.7 Mpps | 15.1 Mpps | 2.08 Mpps |
+-------------------------+------------+-----------------+-------------+
| Unaligned | 1.7 Mpps | 14.5 Mpps | 2.08 Mpps |
+-------------------------+------------+-----------------+-------------+
NOTE: We are currently working on the changes required in the Mellanox
driver. We will include these in the v3.
Structure of the patchset:
Patch 1:
- Remove unnecessary masking and headroom addition during zero-copy Rx
buffer recycling in i40e. This change is required in order for the
buffer recycling to work in the unaligned chunk mode.
Patch 2:
- Remove unnecessary masking and headroom addition during
zero-copy Rx buffer recycling in ixgbe. This change is required in
order for the buffer recycling to work in the unaligned chunk mode.
Patch 3:
- Add infrastructure for unaligned chunks. Since we are dealing with
unaligned chunks that could potentially cross a physical page boundary,
we add checks to keep track of that information. We can later use this
information to correctly handle buffers that are placed at an address
where they cross a page boundary. This patch also modifies the
existing Rx and Tx functions to use the new descriptor format. To
handle addresses correctly, we need to mask appropriately based on
whether we are in aligned or unaligned mode.
Patch 4:
- This patch updates the i40e driver to make use of the new descriptor
format. The new format is particularly useful here since we can now
retrieve the original address in places like i40e_zca_free with ease.
This saves us doing various calculations to get the original address
back.
Patch 5:
- This patch updates the ixgbe driver to make use of the new descriptor
format. The new format is particularly useful here since we can now
retrieve the original address in places like ixgbe_zca_free with ease.
This saves us doing various calculations to get the original address
back.
Patch 6:
- Add flags for umem configuration to libbpf
Patch 7:
- Modify xdpsock application to add a command line option for
unaligned chunks
Patch 8:
- Since we can now run the application in unaligned chunk mode, we need
to make sure we recycle the buffers appropriately.
Patch 9:
- Adds hugepage support to the xdpsock application
Patch 10:
- Documentation update to include the unaligned chunk scenario. We need
to explicitly state that the incoming addresses are only masked in the
aligned chunk mode and not the unaligned chunk mode.
---
v2:
- fixed checkpatch issues
- fixed Rx buffer recycling for unaligned chunks in xdpsock
- removed unused defines
- fixed how chunk_size is calculated in xsk_diag.c
- added some performance numbers to cover letter
- modified descriptor format to make it easier to retrieve original
address
- removed patch adding off_t off to the zero copy allocator. This is no
longer needed with the new descriptor format.
Kevin Laatz (10):
i40e: simplify Rx buffer recycle
ixgbe: simplify Rx buffer recycle
xsk: add support to allow unaligned chunk placement
i40e: modify driver for handling offsets
ixgbe: modify driver for handling offsets
libbpf: add flags to umem config
samples/bpf: add unaligned chunks mode support to xdpsock
samples/bpf: add buffer recycling for unaligned chunks to xdpsock
samples/bpf: use hugepages in xdpsock app
doc/af_xdp: include unaligned chunk case
Documentation/networking/af_xdp.rst | 10 ++-
drivers/net/ethernet/intel/i40e/i40e_xsk.c | 39 +++++----
drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 39 +++++----
include/net/xdp_sock.h | 2 +
include/uapi/linux/if_xdp.h | 9 ++
net/xdp/xdp_umem.c | 17 ++--
net/xdp/xsk.c | 89 ++++++++++++++++----
net/xdp/xsk_diag.c | 2 +-
net/xdp/xsk_queue.h | 70 +++++++++++++--
samples/bpf/xdpsock_user.c | 61 ++++++++++----
tools/include/uapi/linux/if_xdp.h | 4 +
tools/lib/bpf/xsk.c | 3 +
tools/lib/bpf/xsk.h | 2 +
13 files changed, 266 insertions(+), 81 deletions(-)
--
2.17.1
^ permalink raw reply
* [PATCH v2 01/10] i40e: simplify Rx buffer recycle
From: Kevin Laatz @ 2019-07-16 3:06 UTC (permalink / raw)
To: netdev, ast, daniel, bjorn.topel, magnus.karlsson, jakub.kicinski,
jonathan.lemon
Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz
In-Reply-To: <20190716030637.5634-1-kevin.laatz@intel.com>
Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'old_bi' values
directly without having to mask and add the headroom.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_xsk.c | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 32bad014d76c..dfa096db2244 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -420,8 +420,6 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
struct i40e_rx_buffer *old_bi)
{
struct i40e_rx_buffer *new_bi = &rx_ring->rx_bi[rx_ring->next_to_alloc];
- unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
- u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
u16 nta = rx_ring->next_to_alloc;
/* update, and store next to alloc */
@@ -429,14 +427,9 @@ static void i40e_reuse_rx_buffer_zc(struct i40e_ring *rx_ring,
rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
/* transfer page from old buffer to new buffer */
- new_bi->dma = old_bi->dma & mask;
- new_bi->dma += hr;
-
- new_bi->addr = (void *)((unsigned long)old_bi->addr & mask);
- new_bi->addr += hr;
-
- new_bi->handle = old_bi->handle & mask;
- new_bi->handle += rx_ring->xsk_umem->headroom;
+ new_bi->dma = old_bi->dma;
+ new_bi->addr = old_bi->addr;
+ new_bi->handle = old_bi->handle;
old_bi->addr = NULL;
}
--
2.17.1
^ permalink raw reply related
* [PATCH v2 02/10] ixgbe: simplify Rx buffer recycle
From: Kevin Laatz @ 2019-07-16 3:06 UTC (permalink / raw)
To: netdev, ast, daniel, bjorn.topel, magnus.karlsson, jakub.kicinski,
jonathan.lemon
Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz
In-Reply-To: <20190716030637.5634-1-kevin.laatz@intel.com>
Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'obi' values
directly without having to mask and add the headroom.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 6b609553329f..bc86057628c8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -201,8 +201,6 @@ ixgbe_rx_buffer *ixgbe_get_rx_buffer_zc(struct ixgbe_ring *rx_ring,
static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
struct ixgbe_rx_buffer *obi)
{
- unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
- u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
u16 nta = rx_ring->next_to_alloc;
struct ixgbe_rx_buffer *nbi;
@@ -212,14 +210,9 @@ static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
/* transfer page from old buffer to new buffer */
- nbi->dma = obi->dma & mask;
- nbi->dma += hr;
-
- nbi->addr = (void *)((unsigned long)obi->addr & mask);
- nbi->addr += hr;
-
- nbi->handle = obi->handle & mask;
- nbi->handle += rx_ring->xsk_umem->headroom;
+ nbi->dma = obi->dma;
+ nbi->addr = obi->addr;
+ nbi->handle = obi->handle;
obi->addr = NULL;
obi->skb = NULL;
--
2.17.1
^ permalink raw reply related
* [PATCH v2 03/10] xsk: add support to allow unaligned chunk placement
From: Kevin Laatz @ 2019-07-16 3:06 UTC (permalink / raw)
To: netdev, ast, daniel, bjorn.topel, magnus.karlsson, jakub.kicinski,
jonathan.lemon
Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz
In-Reply-To: <20190716030637.5634-1-kevin.laatz@intel.com>
Currently, addresses are chunk size aligned. This means, we are very
restricted in terms of where we can place chunk within the umem. For
example, if we have a chunk size of 2k, then our chunks can only be placed
at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
This patch introduces the ability to use unaligned chunks. With these
changes, we are no longer bound to having to place chunks at a 2k (or
whatever your chunk size is) interval. Since we are no longer dealing with
aligned chunks, they can now cross page boundaries. Checks for page
contiguity have been added in order to keep track of which pages are
followed by a physically contiguous page.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
v2:
- Add checks for the flags coming from userspace
- Fix how we get chunk_size in xsk_diag.c
- Add defines for masking the new descriptor format
- Modified the rx functions to use new descriptor format
- Modified the tx functions to use new descriptor format
---
include/net/xdp_sock.h | 2 +
include/uapi/linux/if_xdp.h | 9 ++++
net/xdp/xdp_umem.c | 17 ++++---
net/xdp/xsk.c | 89 ++++++++++++++++++++++++++++++-------
net/xdp/xsk_diag.c | 2 +-
net/xdp/xsk_queue.h | 70 +++++++++++++++++++++++++----
6 files changed, 159 insertions(+), 30 deletions(-)
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 69796d264f06..f7ab8ff33f06 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -19,6 +19,7 @@ struct xsk_queue;
struct xdp_umem_page {
void *addr;
dma_addr_t dma;
+ bool next_pg_contig;
};
struct xdp_umem_fq_reuse {
@@ -48,6 +49,7 @@ struct xdp_umem {
bool zc;
spinlock_t xsk_list_lock;
struct list_head xsk_list;
+ u32 flags;
};
struct xdp_sock {
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index faaa5ca2a117..f8dc68fcdf78 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -17,6 +17,9 @@
#define XDP_COPY (1 << 1) /* Force copy-mode */
#define XDP_ZEROCOPY (1 << 2) /* Force zero-copy mode */
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
+
struct sockaddr_xdp {
__u16 sxdp_family;
__u16 sxdp_flags;
@@ -53,6 +56,7 @@ struct xdp_umem_reg {
__u64 len; /* Length of packet data area */
__u32 chunk_size;
__u32 headroom;
+ __u32 flags;
};
struct xdp_statistics {
@@ -74,6 +78,11 @@ struct xdp_options {
#define XDP_UMEM_PGOFF_FILL_RING 0x100000000ULL
#define XDP_UMEM_PGOFF_COMPLETION_RING 0x180000000ULL
+/* Masks for unaligned chunks mode */
+#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
+#define XSK_UNALIGNED_BUF_ADDR_MASK \
+ ((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
+
/* Rx/Tx descriptor */
struct xdp_desc {
__u64 addr;
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 20c91f02d3d8..6130735bdd3d 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -303,6 +303,7 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
{
+ bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNKS;
u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
unsigned int chunks, chunks_per_page;
u64 addr = mr->addr, size = mr->len;
@@ -318,7 +319,10 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
return -EINVAL;
}
- if (!is_power_of_2(chunk_size))
+ if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNKS))
+ return -EINVAL;
+
+ if (!unaligned_chunks && !is_power_of_2(chunk_size))
return -EINVAL;
if (!PAGE_ALIGNED(addr)) {
@@ -335,9 +339,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
if (chunks == 0)
return -EINVAL;
- chunks_per_page = PAGE_SIZE / chunk_size;
- if (chunks < chunks_per_page || chunks % chunks_per_page)
- return -EINVAL;
+ if (!unaligned_chunks) {
+ chunks_per_page = PAGE_SIZE / chunk_size;
+ if (chunks < chunks_per_page || chunks % chunks_per_page)
+ return -EINVAL;
+ }
headroom = ALIGN(headroom, 64);
@@ -346,13 +352,14 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
return -EINVAL;
umem->address = (unsigned long)addr;
- umem->chunk_mask = ~((u64)chunk_size - 1);
+ umem->chunk_mask = unaligned_chunks ? U64_MAX : ~((u64)chunk_size - 1);
umem->size = size;
umem->headroom = headroom;
umem->chunk_size_nohr = chunk_size - headroom;
umem->npgs = size / PAGE_SIZE;
umem->pgs = NULL;
umem->user = NULL;
+ umem->flags = mr->flags;
INIT_LIST_HEAD(&umem->xsk_list);
spin_lock_init(&umem->xsk_list_lock);
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index d4d6f10aa936..78089825821a 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
{
- return xskq_peek_addr(umem->fq, addr);
+ return xskq_peek_addr(umem->fq, addr, umem);
}
EXPORT_SYMBOL(xsk_umem_peek_addr);
@@ -55,21 +55,42 @@ void xsk_umem_discard_addr(struct xdp_umem *umem)
}
EXPORT_SYMBOL(xsk_umem_discard_addr);
+/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
+ * each page. This is only required in copy mode.
+ */
+static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
+ u32 len, u32 metalen)
+{
+ void *to_buf = xdp_umem_get_data(umem, addr);
+
+ if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
+ void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
+ u64 page_start = addr & (PAGE_SIZE - 1);
+ u64 first_len = PAGE_SIZE - (addr - page_start);
+
+ memcpy(to_buf, from_buf, first_len + metalen);
+ memcpy(next_pg_addr, from_buf + first_len, len - first_len);
+
+ return;
+ }
+
+ memcpy(to_buf, from_buf, len + metalen);
+}
+
static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
{
- void *to_buf, *from_buf;
+ u64 offset = xs->umem->headroom;
+ void *from_buf;
u32 metalen;
u64 addr;
int err;
- if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+ if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
xs->rx_dropped++;
return -ENOSPC;
}
- addr += xs->umem->headroom;
-
if (unlikely(xdp_data_meta_unsupported(xdp))) {
from_buf = xdp->data;
metalen = 0;
@@ -78,9 +99,13 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
metalen = xdp->data - xdp->data_meta;
}
- to_buf = xdp_umem_get_data(xs->umem, addr);
- memcpy(to_buf, from_buf, len + metalen);
- addr += metalen;
+ __xsk_rcv_memcpy(xs->umem, addr + offset, from_buf, len, metalen);
+
+ offset += metalen;
+ if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+ addr |= offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+ else
+ addr += offset;
err = xskq_produce_batch_desc(xs->rx, addr, len);
if (!err) {
xskq_discard_addr(xs->umem->fq);
@@ -127,6 +152,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
u32 len = xdp->data_end - xdp->data;
void *buffer;
u64 addr;
+ u64 offset = xs->umem->headroom;
int err;
spin_lock_bh(&xs->rx_lock);
@@ -136,17 +162,20 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
goto out_unlock;
}
- if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+ if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
err = -ENOSPC;
goto out_drop;
}
- addr += xs->umem->headroom;
-
- buffer = xdp_umem_get_data(xs->umem, addr);
+ buffer = xdp_umem_get_data(xs->umem, addr + offset);
memcpy(buffer, xdp->data_meta, len + metalen);
- addr += metalen;
+ offset += metalen;
+
+ if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+ addr |= offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+ else
+ addr += offset;
err = xskq_produce_batch_desc(xs->rx, addr, len);
if (err)
goto out_drop;
@@ -190,7 +219,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc)
rcu_read_lock();
list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
- if (!xskq_peek_desc(xs->tx, desc))
+ if (!xskq_peek_desc(xs->tx, desc, umem))
continue;
if (xskq_produce_addr_lazy(umem->cq, desc->addr))
@@ -240,7 +269,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
mutex_lock(&xs->mutex);
- while (xskq_peek_desc(xs->tx, &desc)) {
+ while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
char *buffer;
u64 addr;
u32 len;
@@ -265,6 +294,10 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
skb_put(skb, len);
addr = desc.addr;
+ if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+ addr = (addr & XSK_UNALIGNED_BUF_ADDR_MASK) |
+ (addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+
buffer = xdp_umem_get_data(xs->umem, addr);
err = skb_store_bits(skb, 0, buffer, len);
if (unlikely(err)) {
@@ -275,7 +308,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
skb->dev = xs->dev;
skb->priority = sk->sk_priority;
skb->mark = sk->sk_mark;
- skb_shinfo(skb)->destructor_arg = (void *)(long)addr;
+ skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
skb->destructor = xsk_destruct_skb;
err = dev_direct_xmit(skb, xs->queue_id);
@@ -415,6 +448,28 @@ static struct socket *xsk_lookup_xsk_from_fd(int fd)
return sock;
}
+/* Check if umem pages are contiguous.
+ * If zero-copy mode, use the DMA address to do the page contiguity check
+ * For all other modes we use addr (kernel virtual address)
+ */
+static void xsk_check_page_contiguity(struct xdp_umem *umem, u32 flags)
+{
+ int i;
+
+ if (flags & XDP_ZEROCOPY) {
+ for (i = 0; i < umem->npgs - 1; i++)
+ umem->pages[i].next_pg_contig =
+ (umem->pages[i].dma + PAGE_SIZE ==
+ umem->pages[i + 1].dma);
+ return;
+ }
+
+ for (i = 0; i < umem->npgs - 1; i++)
+ umem->pages[i].next_pg_contig =
+ (umem->pages[i].addr + PAGE_SIZE ==
+ umem->pages[i + 1].addr);
+}
+
static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
{
struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
@@ -502,6 +557,8 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
if (err)
goto out_unlock;
+
+ xsk_check_page_contiguity(xs->umem, flags);
}
xs->dev = dev;
diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index d5e06c8e0cbf..9986a759fe06 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -56,7 +56,7 @@ static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb)
du.id = umem->id;
du.size = umem->size;
du.num_pages = umem->npgs;
- du.chunk_size = (__u32)(~umem->chunk_mask + 1);
+ du.chunk_size = umem->chunk_size_nohr + umem->headroom;
du.headroom = umem->headroom;
du.ifindex = umem->dev ? umem->dev->ifindex : 0;
du.queue_id = umem->queue_id;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 909c5168ed0f..04afc9de86d9 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -133,6 +133,16 @@ static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
/* UMEM queue */
+static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, u64 addr,
+ u64 length)
+{
+ bool cross_pg = (addr & (PAGE_SIZE - 1)) + length > PAGE_SIZE;
+ bool next_pg_contig =
+ umem->pages[(addr >> PAGE_SHIFT) + 1].next_pg_contig;
+
+ return cross_pg && !next_pg_contig;
+}
+
static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
{
if (addr >= q->size) {
@@ -143,23 +153,52 @@ static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
return true;
}
-static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
+static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
+ u64 length,
+ struct xdp_umem *umem)
+{
+ addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+ addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
+ if (addr >= q->size ||
+ xskq_crosses_non_contig_pg(umem, addr, length)) {
+ q->invalid_descs++;
+ return false;
+ }
+
+ return true;
+}
+
+static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
+ struct xdp_umem *umem)
{
while (q->cons_tail != q->cons_head) {
struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
unsigned int idx = q->cons_tail & q->ring_mask;
*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
+ if (*addr & (~XSK_UNALIGNED_BUF_ADDR_MASK))
+ goto out;
+
+ if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS) {
+ if (xskq_is_valid_addr_unaligned(q, *addr,
+ umem->chunk_size_nohr,
+ umem))
+ return addr;
+ goto out;
+ }
+
if (xskq_is_valid_addr(q, *addr))
return addr;
+out:
q->cons_tail++;
}
return NULL;
}
-static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
+static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
+ struct xdp_umem *umem)
{
if (q->cons_tail == q->cons_head) {
smp_mb(); /* D, matches A */
@@ -170,7 +209,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
smp_rmb();
}
- return xskq_validate_addr(q, addr);
+ return xskq_validate_addr(q, addr, umem);
}
static inline void xskq_discard_addr(struct xsk_queue *q)
@@ -229,8 +268,21 @@ static inline int xskq_reserve_addr(struct xsk_queue *q)
/* Rx/Tx queue */
-static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
+static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d,
+ struct xdp_umem *umem)
{
+ if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS) {
+ if (!xskq_is_valid_addr_unaligned(q, d->addr, d->len, umem))
+ return false;
+
+ if (d->len > umem->chunk_size_nohr || d->options) {
+ q->invalid_descs++;
+ return false;
+ }
+
+ return true;
+ }
+
if (!xskq_is_valid_addr(q, d->addr))
return false;
@@ -244,14 +296,15 @@ static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
}
static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
- struct xdp_desc *desc)
+ struct xdp_desc *desc,
+ struct xdp_umem *umem)
{
while (q->cons_tail != q->cons_head) {
struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
unsigned int idx = q->cons_tail & q->ring_mask;
*desc = READ_ONCE(ring->desc[idx]);
- if (xskq_is_valid_desc(q, desc))
+ if (xskq_is_valid_desc(q, desc, umem))
return desc;
q->cons_tail++;
@@ -261,7 +314,8 @@ static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
}
static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
- struct xdp_desc *desc)
+ struct xdp_desc *desc,
+ struct xdp_umem *umem)
{
if (q->cons_tail == q->cons_head) {
smp_mb(); /* D, matches A */
@@ -272,7 +326,7 @@ static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
smp_rmb(); /* C, matches B */
}
- return xskq_validate_desc(q, desc);
+ return xskq_validate_desc(q, desc, umem);
}
static inline void xskq_discard_desc(struct xsk_queue *q)
--
2.17.1
^ permalink raw reply related
* [PATCH v2 04/10] i40e: modify driver for handling offsets
From: Kevin Laatz @ 2019-07-16 3:06 UTC (permalink / raw)
To: netdev, ast, daniel, bjorn.topel, magnus.karlsson, jakub.kicinski,
jonathan.lemon
Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz
In-Reply-To: <20190716030637.5634-1-kevin.laatz@intel.com>
With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_xsk.c | 26 +++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index dfa096db2244..b8316e9ba159 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -190,7 +190,9 @@ int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
**/
static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
{
+ struct xdp_umem *umem = rx_ring->xsk_umem;
int err, result = I40E_XDP_PASS;
+ u64 offset = umem->headroom;
struct i40e_ring *xdp_ring;
struct bpf_prog *xdp_prog;
u32 act;
@@ -201,7 +203,13 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
*/
xdp_prog = READ_ONCE(rx_ring->xdp_prog);
act = bpf_prog_run_xdp(xdp_prog, xdp);
- xdp->handle += xdp->data - xdp->data_hard_start;
+ offset += xdp->data - xdp->data_hard_start;
+
+ if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+ xdp->handle |= (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+ else
+ xdp->handle += offset;
+
switch (act) {
case XDP_PASS:
break;
@@ -262,7 +270,7 @@ static bool i40e_alloc_buffer_zc(struct i40e_ring *rx_ring,
bi->addr = xdp_umem_get_data(umem, handle);
bi->addr += hr;
- bi->handle = handle + umem->headroom;
+ bi->handle = handle;
xsk_umem_discard_addr(umem);
return true;
@@ -299,7 +307,7 @@ static bool i40e_alloc_buffer_slow_zc(struct i40e_ring *rx_ring,
bi->addr = xdp_umem_get_data(umem, handle);
bi->addr += hr;
- bi->handle = handle + umem->headroom;
+ bi->handle = handle;
xsk_umem_discard_addr_rq(umem);
return true;
@@ -456,7 +464,10 @@ void i40e_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
nta++;
rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
- handle &= mask;
+ if (rx_ring->xsk_umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+ handle &= XSK_UNALIGNED_BUF_ADDR_MASK;
+ else
+ handle &= mask;
bi->dma = xdp_umem_get_dma(rx_ring->xsk_umem, handle);
bi->dma += hr;
@@ -635,6 +646,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
struct i40e_tx_buffer *tx_bi;
bool work_done = true;
struct xdp_desc desc;
+ u64 addr, offset;
dma_addr_t dma;
while (budget-- > 0) {
@@ -647,7 +659,11 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
break;
- dma = xdp_umem_get_dma(xdp_ring->xsk_umem, desc.addr);
+ /* for unaligned chunks need to take offset from upper bits */
+ offset = (desc.addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+ addr = (desc.addr & XSK_UNALIGNED_BUF_ADDR_MASK);
+
+ dma = xdp_umem_get_dma(xdp_ring->xsk_umem, addr + offset);
dma_sync_single_for_device(xdp_ring->dev, dma, desc.len,
DMA_BIDIRECTIONAL);
--
2.17.1
^ permalink raw reply related
* [PATCH v2 05/10] ixgbe: modify driver for handling offsets
From: Kevin Laatz @ 2019-07-16 3:06 UTC (permalink / raw)
To: netdev, ast, daniel, bjorn.topel, magnus.karlsson, jakub.kicinski,
jonathan.lemon
Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz
In-Reply-To: <20190716030637.5634-1-kevin.laatz@intel.com>
With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 26 ++++++++++++++++----
1 file changed, 21 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index bc86057628c8..ac1669b18d13 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -143,7 +143,9 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
struct ixgbe_ring *rx_ring,
struct xdp_buff *xdp)
{
+ struct xdp_umem *umem = rx_ring->xsk_umem;
int err, result = IXGBE_XDP_PASS;
+ u64 offset = umem->headroom;
struct bpf_prog *xdp_prog;
struct xdp_frame *xdpf;
u32 act;
@@ -151,7 +153,13 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
rcu_read_lock();
xdp_prog = READ_ONCE(rx_ring->xdp_prog);
act = bpf_prog_run_xdp(xdp_prog, xdp);
- xdp->handle += xdp->data - xdp->data_hard_start;
+ offset += xdp->data - xdp->data_hard_start;
+
+ if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+ xdp->handle |= (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+ else
+ xdp->handle += offset;
+
switch (act) {
case XDP_PASS:
break;
@@ -235,7 +243,10 @@ void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
nta++;
rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
- handle &= mask;
+ if (rx_ring->xsk_umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
+ handle &= XSK_UNALIGNED_BUF_ADDR_MASK;
+ else
+ handle &= mask;
bi->dma = xdp_umem_get_dma(rx_ring->xsk_umem, handle);
bi->dma += hr;
@@ -269,7 +280,7 @@ static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
bi->addr = xdp_umem_get_data(umem, handle);
bi->addr += hr;
- bi->handle = handle + umem->headroom;
+ bi->handle = handle;
xsk_umem_discard_addr(umem);
return true;
@@ -296,7 +307,7 @@ static bool ixgbe_alloc_buffer_slow_zc(struct ixgbe_ring *rx_ring,
bi->addr = xdp_umem_get_data(umem, handle);
bi->addr += hr;
- bi->handle = handle + umem->headroom;
+ bi->handle = handle;
xsk_umem_discard_addr_rq(umem);
return true;
@@ -565,6 +576,7 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
struct ixgbe_tx_buffer *tx_bi;
bool work_done = true;
struct xdp_desc desc;
+ u64 addr, offset;
dma_addr_t dma;
u32 cmd_type;
@@ -578,7 +590,11 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
break;
- dma = xdp_umem_get_dma(xdp_ring->xsk_umem, desc.addr);
+ /* for unaligned chunks need to take offset from upper bits */
+ offset = (desc.addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+ addr = (desc.addr & XSK_UNALIGNED_BUF_ADDR_MASK);
+
+ dma = xdp_umem_get_dma(xdp_ring->xsk_umem, addr + offset);
dma_sync_single_for_device(xdp_ring->dev, dma, desc.len,
DMA_BIDIRECTIONAL);
--
2.17.1
^ permalink raw reply related
* [PATCH v2 07/10] samples/bpf: add unaligned chunks mode support to xdpsock
From: Kevin Laatz @ 2019-07-16 3:06 UTC (permalink / raw)
To: netdev, ast, daniel, bjorn.topel, magnus.karlsson, jakub.kicinski,
jonathan.lemon
Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz
In-Reply-To: <20190716030637.5634-1-kevin.laatz@intel.com>
This patch adds support for the unaligned chunks mode. The addition of the
unaligned chunks option will allow users to run the application with more
relaxed chunk placement in the XDP umem.
Unaligned chunks mode can be used with the '-u' or '--unaligned' command
line options.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
samples/bpf/xdpsock_user.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 93eaaf7239b2..26ba1a1fd582 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -67,6 +67,8 @@ static int opt_ifindex;
static int opt_queue;
static int opt_poll;
static int opt_interval = 1;
+static u32 opt_umem_flags;
+static int opt_unaligned_chunks;
static u32 opt_xdp_bind_flags;
static int opt_xsk_frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
static __u32 prog_id;
@@ -282,7 +284,9 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
.frame_size = opt_xsk_frame_size,
.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
+ .flags = opt_umem_flags
};
+
int ret;
umem = calloc(1, sizeof(*umem));
@@ -291,6 +295,7 @@ static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
&cfg);
+
if (ret)
exit_with_error(-ret);
@@ -352,6 +357,7 @@ static struct option long_options[] = {
{"zero-copy", no_argument, 0, 'z'},
{"copy", no_argument, 0, 'c'},
{"frame-size", required_argument, 0, 'f'},
+ {"unaligned", no_argument, 0, 'u'},
{0, 0, 0, 0}
};
@@ -372,6 +378,7 @@ static void usage(const char *prog)
" -z, --zero-copy Force zero-copy mode.\n"
" -c, --copy Force copy mode.\n"
" -f, --frame-size=n Set the frame size (must be a power of two, default is %d).\n"
+ " -u, --unaligned Enable unaligned chunk placement\n"
"\n";
fprintf(stderr, str, prog, XSK_UMEM__DEFAULT_FRAME_SIZE);
exit(EXIT_FAILURE);
@@ -384,7 +391,7 @@ static void parse_command_line(int argc, char **argv)
opterr = 0;
for (;;) {
- c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:", long_options,
+ c = getopt_long(argc, argv, "Frtli:q:psSNn:czf:u", long_options,
&option_index);
if (c == -1)
break;
@@ -424,12 +431,17 @@ static void parse_command_line(int argc, char **argv)
case 'c':
opt_xdp_bind_flags |= XDP_COPY;
break;
+ case 'u':
+ opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNKS;
+ opt_unaligned_chunks = 1;
+ break;
case 'F':
opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
break;
case 'f':
opt_xsk_frame_size = atoi(optarg);
break;
+
default:
usage(basename(argv[0]));
}
@@ -442,7 +454,8 @@ static void parse_command_line(int argc, char **argv)
usage(basename(argv[0]));
}
- if (opt_xsk_frame_size & (opt_xsk_frame_size - 1)) {
+ if ((opt_xsk_frame_size & (opt_xsk_frame_size - 1)) &&
+ !opt_unaligned_chunks) {
fprintf(stderr, "--frame-size=%d is not a power of two\n",
opt_xsk_frame_size);
usage(basename(argv[0]));
--
2.17.1
^ permalink raw reply related
* [PATCH v2 08/10] samples/bpf: add buffer recycling for unaligned chunks to xdpsock
From: Kevin Laatz @ 2019-07-16 3:06 UTC (permalink / raw)
To: netdev, ast, daniel, bjorn.topel, magnus.karlsson, jakub.kicinski,
jonathan.lemon
Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz
In-Reply-To: <20190716030637.5634-1-kevin.laatz@intel.com>
This patch adds buffer recycling support for unaligned buffers. Since we
don't mask the addr to 2k at umem_reg in unaligned mode, we need to make
sure we give back the correct (original) addr to the fill queue. We achieve
this using the new descriptor format and associated masks. The new format
uses the upper 16-bits for the offset and the lower 48-bits for the addr.
Since we have a field for the offset, we no longer need to modify the
actual address. As such, all we have to do to get back the original address
is mask for the lower 48 bits (i.e. strip the offset and we get the address
on it's own).
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
v2:
- Removed unused defines
- Fix buffer recycling for unaligned case
- Remove --buf-size (--frame-size merged before this)
- Modifications to use the new descriptor format for buffer recycling
---
samples/bpf/xdpsock_user.c | 29 +++++++++++++++++++----------
1 file changed, 19 insertions(+), 10 deletions(-)
diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 26ba1a1fd582..8f220afd549a 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -474,6 +474,7 @@ static void kick_tx(struct xsk_socket_info *xsk)
static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
{
+ struct xsk_umem_info *umem = xsk->umem;
u32 idx_cq = 0, idx_fq = 0;
unsigned int rcvd;
size_t ndescs;
@@ -486,22 +487,24 @@ static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
xsk->outstanding_tx;
/* re-add completed Tx buffers */
- rcvd = xsk_ring_cons__peek(&xsk->umem->cq, ndescs, &idx_cq);
+ rcvd = xsk_ring_cons__peek(&umem->cq, ndescs, &idx_cq);
if (rcvd > 0) {
unsigned int i;
int ret;
- ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
+ ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
while (ret != rcvd) {
if (ret < 0)
exit_with_error(-ret);
- ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd,
- &idx_fq);
+ ret = xsk_ring_prod__reserve(&umem->fq, rcvd, &idx_fq);
+ }
+
+ for (i = 0; i < rcvd; i++) {
+ u64 comp_addr =
+ *xsk_ring_cons__comp_addr(&umem->cq, idx_cq++);
+ *xsk_ring_prod__fill_addr(&umem->fq, idx_fq++) =
+ comp_addr & XSK_UNALIGNED_BUF_ADDR_MASK;
}
- for (i = 0; i < rcvd; i++)
- *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
- *xsk_ring_cons__comp_addr(&xsk->umem->cq,
- idx_cq++);
xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
xsk_ring_cons__release(&xsk->umem->cq, rcvd);
@@ -548,7 +551,11 @@ static void rx_drop(struct xsk_socket_info *xsk)
for (i = 0; i < rcvd; i++) {
u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
- char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
+ u64 offset = addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+
+ addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
+ char *pkt = xsk_umem__get_data(xsk->umem->buffer,
+ addr + offset);
hex_dump(pkt, len, addr);
*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
@@ -654,7 +661,9 @@ static void l2fwd(struct xsk_socket_info *xsk)
idx_rx)->addr;
u32 len = xsk_ring_cons__rx_desc(&xsk->rx,
idx_rx++)->len;
- char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
+ u64 offset = addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+ char *pkt = xsk_umem__get_data(xsk->umem->buffer,
+ (addr & XSK_UNALIGNED_BUF_ADDR_MASK) + offset);
swap_mac_addresses(pkt);
--
2.17.1
^ permalink raw reply related
* [PATCH v2 10/10] doc/af_xdp: include unaligned chunk case
From: Kevin Laatz @ 2019-07-16 3:06 UTC (permalink / raw)
To: netdev, ast, daniel, bjorn.topel, magnus.karlsson, jakub.kicinski,
jonathan.lemon
Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz
In-Reply-To: <20190716030637.5634-1-kevin.laatz@intel.com>
The addition of unaligned chunks mode, the documentation needs to be
updated to indicate that the incoming addr to the fill ring will only be
masked if the user application is run in the aligned chunk mode. This patch
also adds a line to explicitly indicate that the incoming addr will not be
masked if running the user application in the unaligned chunk mode.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
Documentation/networking/af_xdp.rst | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index eeedc2e826aa..83f7ae5fc045 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -153,10 +153,12 @@ an example, if the UMEM is 64k and each chunk is 4k, then the UMEM has
Frames passed to the kernel are used for the ingress path (RX rings).
-The user application produces UMEM addrs to this ring. Note that the
-kernel will mask the incoming addr. E.g. for a chunk size of 2k, the
-log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050
-and 3000 refers to the same chunk.
+The user application produces UMEM addrs to this ring. Note that, if
+running the application with aligned chunk mode, the kernel will mask
+the incoming addr. E.g. for a chunk size of 2k, the log2(2048) LSB of
+the addr will be masked off, meaning that 2048, 2050 and 3000 refers
+to the same chunk. If the user application is run in the unaligned
+chunks mode, then the incoming addr will be left untouched.
UMEM Completion Ring
--
2.17.1
^ permalink raw reply related
* [PATCH v2 09/10] samples/bpf: use hugepages in xdpsock app
From: Kevin Laatz @ 2019-07-16 3:06 UTC (permalink / raw)
To: netdev, ast, daniel, bjorn.topel, magnus.karlsson, jakub.kicinski,
jonathan.lemon
Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz
In-Reply-To: <20190716030637.5634-1-kevin.laatz@intel.com>
This patch modifies xdpsock to use mmap instead of posix_memalign. With
this change, we can use hugepages when running the application in unaligned
chunks mode. Using hugepages makes it more likely that we have physically
contiguous memory, which supports the unaligned chunk mode better.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
samples/bpf/xdpsock_user.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 8f220afd549a..958a27193582 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -69,6 +69,7 @@ static int opt_poll;
static int opt_interval = 1;
static u32 opt_umem_flags;
static int opt_unaligned_chunks;
+static int opt_mmap_flags;
static u32 opt_xdp_bind_flags;
static int opt_xsk_frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
static __u32 prog_id;
@@ -434,6 +435,7 @@ static void parse_command_line(int argc, char **argv)
case 'u':
opt_umem_flags |= XDP_UMEM_UNALIGNED_CHUNKS;
opt_unaligned_chunks = 1;
+ opt_mmap_flags = MAP_HUGETLB;
break;
case 'F':
opt_xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
@@ -696,11 +698,14 @@ int main(int argc, char **argv)
exit(EXIT_FAILURE);
}
- ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
- NUM_FRAMES * opt_xsk_frame_size);
- if (ret)
- exit_with_error(ret);
-
+ /* Reserve memory for the umem. Use hugepages if unaligned chunk mode */
+ bufs = mmap(NULL, NUM_FRAMES * opt_xsk_frame_size,
+ PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS | opt_mmap_flags, -1, 0);
+ if (bufs == MAP_FAILED) {
+ printf("ERROR: mmap failed\n");
+ exit(EXIT_FAILURE);
+ }
/* Create sockets... */
umem = xsk_configure_umem(bufs, NUM_FRAMES * opt_xsk_frame_size);
xsks[num_socks++] = xsk_configure_socket(umem);
--
2.17.1
^ permalink raw reply related
* [PATCH v2 06/10] libbpf: add flags to umem config
From: Kevin Laatz @ 2019-07-16 3:06 UTC (permalink / raw)
To: netdev, ast, daniel, bjorn.topel, magnus.karlsson, jakub.kicinski,
jonathan.lemon
Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan, Kevin Laatz
In-Reply-To: <20190716030637.5634-1-kevin.laatz@intel.com>
This patch adds a 'flags' field to the umem_config and umem_reg structs.
This will allow for more options to be added for configuring umems.
The first use for the flags field is to add a flag for unaligned chunks
mode. These flags can either be user-provided or filled with a default.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
v2:
- Removed the headroom check from this patch. It has moved to the
previous patch.
---
tools/include/uapi/linux/if_xdp.h | 4 ++++
tools/lib/bpf/xsk.c | 3 +++
tools/lib/bpf/xsk.h | 2 ++
3 files changed, 9 insertions(+)
diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index faaa5ca2a117..594bcebb189c 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -17,6 +17,9 @@
#define XDP_COPY (1 << 1) /* Force copy-mode */
#define XDP_ZEROCOPY (1 << 2) /* Force zero-copy mode */
+/* Flags for xsk_umem_config flags */
+#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
+
struct sockaddr_xdp {
__u16 sxdp_family;
__u16 sxdp_flags;
@@ -53,6 +56,7 @@ struct xdp_umem_reg {
__u64 len; /* Length of packet data area */
__u32 chunk_size;
__u32 headroom;
+ __u32 flags;
};
struct xdp_statistics {
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index b33740221b7e..d5ff3fc39e32 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -116,6 +116,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
cfg->comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
cfg->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
cfg->frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
+ cfg->flags = XSK_UMEM__DEFAULT_FLAGS;
return;
}
@@ -123,6 +124,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg,
cfg->comp_size = usr_cfg->comp_size;
cfg->frame_size = usr_cfg->frame_size;
cfg->frame_headroom = usr_cfg->frame_headroom;
+ cfg->flags = usr_cfg->flags;
}
static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
@@ -182,6 +184,7 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
mr.len = size;
mr.chunk_size = umem->config.frame_size;
mr.headroom = umem->config.frame_headroom;
+ mr.flags = umem->config.flags;
err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
if (err) {
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index 833a6e60d065..44a03d8c34b9 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -170,12 +170,14 @@ LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
#define XSK_UMEM__DEFAULT_FRAME_SHIFT 12 /* 4096 bytes */
#define XSK_UMEM__DEFAULT_FRAME_SIZE (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
#define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
+#define XSK_UMEM__DEFAULT_FLAGS 0
struct xsk_umem_config {
__u32 fill_size;
__u32 comp_size;
__u32 frame_size;
__u32 frame_headroom;
+ __u32 flags;
};
/* Flags for the libbpf_flags field. */
--
2.17.1
^ permalink raw reply related
* Re: [bpf-next RFC 2/6] tcp: add skb-less helpers to retrieve SYN cookie
From: Lorenz Bauer @ 2019-07-16 11:34 UTC (permalink / raw)
To: Petar Penkov
Cc: Networking, bpf, davem, Alexei Starovoitov, Daniel Borkmann,
Eric Dumazet, sdf, Petar Penkov
In-Reply-To: <20190716002650.154729-3-ppenkov.kernel@gmail.com>
On Tue, 16 Jul 2019 at 01:27, Petar Penkov <ppenkov.kernel@gmail.com> wrote:
>
> From: Petar Penkov <ppenkov@google.com>
>
> This patch allows generation of a SYN cookie before an SKB has been
> allocated, as is the case at XDP.
>
> Signed-off-by: Petar Penkov <ppenkov@google.com>
> ---
> include/net/tcp.h | 11 ++++++
> net/ipv4/tcp_input.c | 79 ++++++++++++++++++++++++++++++++++++++++++++
> net/ipv4/tcp_ipv4.c | 8 +++++
> net/ipv6/tcp_ipv6.c | 8 +++++
> 4 files changed, 106 insertions(+)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index cca3c59b98bf..a128e22c0d5d 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -414,6 +414,17 @@ void tcp_parse_options(const struct net *net, const struct sk_buff *skb,
> int estab, struct tcp_fastopen_cookie *foc);
> const u8 *tcp_parse_md5sig_option(const struct tcphdr *th);
>
> +/*
> + * BPF SKB-less helpers
> + */
> +u16 tcp_v4_get_syncookie(struct sock *sk, struct iphdr *iph,
> + struct tcphdr *tch, u32 *cookie);
> +u16 tcp_v6_get_syncookie(struct sock *sk, struct ipv6hdr *iph,
> + struct tcphdr *tch, u32 *cookie);
> +u16 tcp_get_syncookie(struct request_sock_ops *rsk_ops,
> + const struct tcp_request_sock_ops *af_ops,
> + struct sock *sk, void *iph, struct tcphdr *tch,
> + u32 *cookie);
> /*
> * TCP v4 functions exported for the inet6 API
> */
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 8892df6de1d4..1406d7e0953c 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3782,6 +3782,52 @@ static void smc_parse_options(const struct tcphdr *th,
> #endif
> }
>
> +/* Try to parse the MSS option from the TCP header. Return 0 on failure, clamped
> + * value on success.
> + *
> + * Invoked for BPF SYN cookie generation, so th should be a SYN.
> + */
> +static u16 tcp_parse_mss_option(const struct net *net, const struct tcphdr *th,
> + u16 user_mss)
net seems unused?
> +{
> + const unsigned char *ptr = (const unsigned char *)(th + 1);
> + int length = (th->doff * 4) - sizeof(struct tcphdr);
> + u16 mss = 0;
> +
> + while (length > 0) {
> + int opcode = *ptr++;
> + int opsize;
> +
> + switch (opcode) {
> + case TCPOPT_EOL:
> + return mss;
> + case TCPOPT_NOP: /* Ref: RFC 793 section 3.1 */
> + length--;
> + continue;
> + default:
> + if (length < 2)
> + return mss;
> + opsize = *ptr++;
> + if (opsize < 2) /* "silly options" */
> + return mss;
> + if (opsize > length)
> + return mss; /* fail on partial options */
> + if (opcode == TCPOPT_MSS && opsize == TCPOLEN_MSS) {
> + u16 in_mss = get_unaligned_be16(ptr);
> +
> + if (in_mss) {
> + if (user_mss && user_mss < in_mss)
> + in_mss = user_mss;
> + mss = in_mss;
> + }
> + }
> + ptr += opsize - 2;
> + length -= opsize;
> + }
> + }
> + return mss;
> +}
> +
> /* Look for tcp options. Normally only called on SYN and SYNACK packets.
> * But, this can also be called on packets in the established flow when
> * the fast version below fails.
> @@ -6464,6 +6510,39 @@ static void tcp_reqsk_record_syn(const struct sock *sk,
> }
> }
>
> +u16 tcp_get_syncookie(struct request_sock_ops *rsk_ops,
> + const struct tcp_request_sock_ops *af_ops,
> + struct sock *sk, void *iph, struct tcphdr *th,
> + u32 *cookie)
> +{
> + u16 mss = 0;
> +#ifdef CONFIG_SYN_COOKIES
> + bool is_v4 = rsk_ops->family == AF_INET;
> + struct tcp_sock *tp = tcp_sk(sk);
> +
> + if (sock_net(sk)->ipv4.sysctl_tcp_syncookies != 2 &&
> + !inet_csk_reqsk_queue_is_full(sk))
> + return 0;
> +
> + if (!tcp_syn_flood_action(sk, rsk_ops->slab_name))
> + return 0;
> +
> + if (sk_acceptq_is_full(sk)) {
> + NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
> + return 0;
> + }
> +
> + mss = tcp_parse_mss_option(sock_net(sk), th, tp->rx_opt.user_mss);
> + if (!mss)
> + mss = af_ops->mss_clamp;
> +
> + tcp_synq_overflow(sk);
> + *cookie = is_v4 ? __cookie_v4_init_sequence(iph, th, &mss)
> + : __cookie_v6_init_sequence(iph, th, &mss);
> +#endif
> + return mss;
> +}
> +
> int tcp_conn_request(struct request_sock_ops *rsk_ops,
> const struct tcp_request_sock_ops *af_ops,
> struct sock *sk, struct sk_buff *skb)
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index d57641cb3477..0e06e59784bd 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1515,6 +1515,14 @@ static struct sock *tcp_v4_cookie_check(struct sock *sk, struct sk_buff *skb)
> return sk;
> }
>
> +u16 tcp_v4_get_syncookie(struct sock *sk, struct iphdr *iph,
> + struct tcphdr *tch, u32 *cookie)
> +{
> + return tcp_get_syncookie(&tcp_request_sock_ops,
> + &tcp_request_sock_ipv4_ops, sk, iph, tch,
> + cookie);
> +}
> +
> /* The socket must have it's spinlock held when we get
> * here, unless it is a TCP_LISTEN socket.
> *
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index d56a9019a0fe..ce46cdba54bc 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -1058,6 +1058,14 @@ static struct sock *tcp_v6_cookie_check(struct sock *sk, struct sk_buff *skb)
> return sk;
> }
>
> +u16 tcp_v6_get_syncookie(struct sock *sk, struct ipv6hdr *iph,
> + struct tcphdr *tch, u32 *cookie)
> +{
> + return tcp_get_syncookie(&tcp6_request_sock_ops,
> + &tcp_request_sock_ipv6_ops, sk, iph, tch,
> + cookie);
> +}
> +
> static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
> {
> if (skb->protocol == htons(ETH_P_IP))
> --
> 2.22.0.510.g264f2c817a-goog
>
--
Lorenz Bauer | Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK
www.cloudflare.com
^ permalink raw reply
* Re: [bpf-next RFC 3/6] bpf: add bpf_tcp_gen_syncookie helper
From: Lorenz Bauer @ 2019-07-16 11:54 UTC (permalink / raw)
To: Petar Penkov
Cc: Networking, bpf, davem, Alexei Starovoitov, Daniel Borkmann,
Eric Dumazet, sdf, Petar Penkov
In-Reply-To: <20190716002650.154729-4-ppenkov.kernel@gmail.com>
On Tue, 16 Jul 2019 at 01:27, Petar Penkov <ppenkov.kernel@gmail.com> wrote:
>
> From: Petar Penkov <ppenkov@google.com>
>
> This helper function allows BPF programs to try to generate SYN
> cookies, given a reference to a listener socket. The function works
> from XDP and with an skb context since bpf_skc_lookup_tcp can lookup a
> socket in both cases.
>
> Signed-off-by: Petar Penkov <ppenkov@google.com>
> Suggested-by: Eric Dumazet <edumazet@google.com>
> ---
> include/uapi/linux/bpf.h | 30 ++++++++++++++++++-
> net/core/filter.c | 62 ++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 91 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 6f68438aa4ed..abf4a85c76d1 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2713,6 +2713,33 @@ union bpf_attr {
> * **-EPERM** if no permission to send the *sig*.
> *
> * **-EAGAIN** if bpf program can try again.
> + *
> + * s64 bpf_tcp_gen_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
> + * Description
> + * Try to issue a SYN cookie for the packet with corresponding
> + * IP/TCP headers, *iph* and *th*, on the listening socket in *sk*.
> + *
> + * *iph* points to the start of the IPv4 or IPv6 header, while
> + * *iph_len* contains **sizeof**\ (**struct iphdr**) or
> + * **sizeof**\ (**struct ip6hdr**).
> + *
> + * *th* points to the start of the TCP header, while *th_len*
> + * contains **sizeof**\ (**struct tcphdr**).
> + *
> + * Return
> + * On success, lower 32 bits hold the generated SYN cookie in
> + * network order and the higher 32 bits hold the MSS value for that
> + * cookie.
I prefer returning the cookie vs. directly creating the packet.
Returning a struct would
be nicest, but it doesn't seem worth it to extend the verifier for
that. Taking a pointer
to a result struct would also be good, but we're out of arguments :/
Nit: the MSS is only 16 bit?
> + *
> + * On failure, the returned value is one of the following:
> + *
> + * **-EINVAL** SYN cookie cannot be issued due to error
> + *
> + * **-ENOENT** SYN cookie should not be issued (no SYN flood)
> + *
> + * **-ENOTSUPP** kernel configuration does not enable SYN cookies
> + *
> + * **-EPROTONOSUPPORT** *sk* family is not AF_INET/AF_INET6
> */
> #define __BPF_FUNC_MAPPER(FN) \
> FN(unspec), \
> @@ -2824,7 +2851,8 @@ union bpf_attr {
> FN(strtoul), \
> FN(sk_storage_get), \
> FN(sk_storage_delete), \
> - FN(send_signal),
> + FN(send_signal), \
> + FN(tcp_gen_syncookie),
>
> /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> * function eBPF program intends to call
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 47f6386fb17a..109fd1e286f4 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5850,6 +5850,64 @@ static const struct bpf_func_proto bpf_tcp_check_syncookie_proto = {
> .arg5_type = ARG_CONST_SIZE,
> };
>
> +BPF_CALL_5(bpf_tcp_gen_syncookie, struct sock *, sk, void *, iph, u32, iph_len,
> + struct tcphdr *, th, u32, th_len)
> +{
> +#ifdef CONFIG_SYN_COOKIES
> + u32 cookie;
> + u16 mss;
> +
> + if (unlikely(th_len < sizeof(*th)))
> + return -EINVAL;
> +
> + if (sk->sk_protocol != IPPROTO_TCP || sk->sk_state != TCP_LISTEN)
> + return -EINVAL;
> +
> + if (!sock_net(sk)->ipv4.sysctl_tcp_syncookies)
> + return -EINVAL;
Should this be ENOENT instead?
> +
> + if (!th->syn || th->ack || th->fin || th->rst)
> + return -EINVAL;
> +
> + switch (sk->sk_family) {
> + case AF_INET:
> + if (unlikely(iph_len < sizeof(struct iphdr)))
> + return -EINVAL;
> + mss = tcp_v4_get_syncookie(sk, iph, th, &cookie);
> + break;
> +
> +#if IS_BUILTIN(CONFIG_IPV6)
> + case AF_INET6:
> + if (unlikely(iph_len < sizeof(struct ipv6hdr)))
> + return -EINVAL;
> + mss = tcp_v6_get_syncookie(sk, iph, th, &cookie);
> + break;
> +#endif /* CONFIG_IPV6 */
> +
> + default:
> + return -EPROTONOSUPPORT;
> + }
> + if (mss <= 0)
> + return -ENOENT;
> +
> + return htonl(cookie) | ((u64)mss << 32);
> +#else
> + return -ENOTSUPP;
> +#endif /* CONFIG_SYN_COOKIES */
> +}
> +
> +static const struct bpf_func_proto bpf_tcp_gen_syncookie_proto = {
> + .func = bpf_tcp_gen_syncookie,
> + .gpl_only = true, /* __cookie_v*_init_sequence() is GPL */
> + .pkt_access = true,
> + .ret_type = RET_INTEGER,
> + .arg1_type = ARG_PTR_TO_SOCK_COMMON,
> + .arg2_type = ARG_PTR_TO_MEM,
> + .arg3_type = ARG_CONST_SIZE,
> + .arg4_type = ARG_PTR_TO_MEM,
> + .arg5_type = ARG_CONST_SIZE,
> +};
> +
> #endif /* CONFIG_INET */
>
> bool bpf_helper_changes_pkt_data(void *func)
> @@ -6135,6 +6193,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> return &bpf_tcp_check_syncookie_proto;
> case BPF_FUNC_skb_ecn_set_ce:
> return &bpf_skb_ecn_set_ce_proto;
> + case BPF_FUNC_tcp_gen_syncookie:
> + return &bpf_tcp_gen_syncookie_proto;
> #endif
> default:
> return bpf_base_func_proto(func_id);
> @@ -6174,6 +6234,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> return &bpf_xdp_skc_lookup_tcp_proto;
> case BPF_FUNC_tcp_check_syncookie:
> return &bpf_tcp_check_syncookie_proto;
> + case BPF_FUNC_tcp_gen_syncookie:
> + return &bpf_tcp_gen_syncookie_proto;
> #endif
> default:
> return bpf_base_func_proto(func_id);
> --
> 2.22.0.510.g264f2c817a-goog
>
--
Lorenz Bauer | Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK
www.cloudflare.com
^ permalink raw reply
* Re: [bpf-next RFC 3/6] bpf: add bpf_tcp_gen_syncookie helper
From: Lorenz Bauer @ 2019-07-16 11:56 UTC (permalink / raw)
To: Eric Dumazet
Cc: Petar Penkov, Networking, bpf, davem, Alexei Starovoitov,
Daniel Borkmann, Eric Dumazet, sdf, Petar Penkov
In-Reply-To: <b6ef24b0-0415-c67d-5a66-21f1c2530414@gmail.com>
On Tue, 16 Jul 2019 at 08:59, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > + return -EINVAL;
> > +
> > + if (sk->sk_protocol != IPPROTO_TCP || sk->sk_state != TCP_LISTEN)
> > + return -EINVAL;
> > +
> > + if (!sock_net(sk)->ipv4.sysctl_tcp_syncookies)
> > + return -EINVAL;
> > +
> > + if (!th->syn || th->ack || th->fin || th->rst)
> > + return -EINVAL;
> > +
> > + switch (sk->sk_family) {
>
> This is strange, because a dual stack listener will have sk->sk_family set to AF_INET6.
>
> What really matters here is if the packet is IPv4 or IPv6.
>
> So you need to look at iph->version instead.
>
> Then look if the socket family allows this packet to be processed
> (For example AF_INET6 sockets might prevent IPv4 packets, see sk->sk_ipv6only )
Does this apply for (the existing) tcp_check_syn_cookie as well?
--
Lorenz Bauer | Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK
www.cloudflare.com
^ permalink raw reply
* [PATCH bpf] bpf: fix narrower loads on s390
From: Ilya Leoshkevich @ 2019-07-16 11:59 UTC (permalink / raw)
To: bpf, netdev; +Cc: gor, heiko.carstens, Ilya Leoshkevich
test_pkt_md_access is failing on s390, since the associated eBPF prog
returns TC_ACT_SHOT, which in turn happens because loading a part of a
struct __sk_buff field produces an incorrect result.
The problem is that when verifier emits the code to replace partial load
of a field with a full load, a shift and a bitwise AND, it assumes that
the machine is little endian.
Adjust shift count calculation to account for endianness.
Fixes: 31fd85816dbe ("bpf: permits narrower load from bpf program context fields")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
kernel/bpf/verifier.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5900cbb966b1..3f9353653558 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8616,8 +8616,12 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
}
if (is_narrower_load && size < target_size) {
- u8 shift = (off & (size_default - 1)) * 8;
-
+ u8 load_off = off & (size_default - 1);
+#ifdef __LITTLE_ENDIAN
+ u8 shift = load_off * 8;
+#else
+ u8 shift = (size_default - (load_off + size)) * 8;
+#endif
if (ctx_field_size <= 4) {
if (shift)
insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH,
--
2.21.0
^ permalink raw reply related
* [PATCH 00/14] pending doc patches for 5.3-rc
From: Mauro Carvalho Chehab @ 2019-07-16 12:10 UTC (permalink / raw)
Cc: Mauro Carvalho Chehab, linux-scsi, esc.storagedev, linuxppc-dev,
Jonathan Corbet, alsa-devel, kvm, linux-i2c, rcu, linux-pm,
linux-doc, devicetree, linux-arch, linux-arm-kernel,
linux-watchdog, x86, dri-devel, netdev, linux-crypto, linux-sh,
linux-input, linux-pci
Those are the pending documentation patches after my pull request
for this branch:
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media.git tags/docs/v5.3-1
Patches 1 to 13 were already submitted, but got rebased. Patch 14
is a new fixup one.
Patches 1 and 2 weren't submitted before due to merge conflicts
that are now solved upstream;
Patch 3 fixes a series of random Documentation/* references that
are pointing to the wrong places.
Patch 4 fix a longstanding issue: every time a new book is added,
conf.py need changes, in order to allow generating a PDF file.
After the patch, conf.py will automatically recognize new books,
saving the trouble of keeping adding documents to it.
Patches 5 to 11 are due to fonts support when building translations.pdf.
The main focus is to add xeCJK support. While doing it, I discovered
some bugs at sphinx-pre-install script after running it with 7 different
distributions.
Patch 12 improves support for partial doc building. Currently, each
subdir needs to have its own conf.py, in order to support partial
doc build. After it, any Documentation subdir can be used to
roduce html/pdf docs with:
make SPHINXDIRS="foo bar" htmldocs
(or pdfdocs, latexdocs, epubdocs, ...)
Patch 13 is a cleanup patch: it simply get rid of all those extra
conf.py files that aren't needed anymore. The only extra config
file after it is this one:
Documentation/media/conf_nitpick.py
With enables some extra optional Sphinx features.
Patch 14 adds Documentation/virtual to the main index.rst file
and add a new *.rst file that was orphaned there.
-
After this series, there's just one more patch meant to be applied
for 5.3, with is still waiting for some patches to be merged from
linux-next:
https://git.linuxtv.org/mchehab/experimental.git/commit/?id=b1b5dc7d7bbfbbfdace2a248c6458301c6e34100
Mauro Carvalho Chehab (14):
docs: powerpc: convert docs to ReST and rename to *.rst
docs: power: add it to to the main documentation index
docs: fix broken doc references due to renames
docs: pdf: add all Documentation/*/index.rst to PDF output
docs: conf.py: add CJK package needed by translations
docs: conf.py: only use CJK if the font is available
scripts/sphinx-pre-install: fix script for RHEL/CentOS
scripts/sphinx-pre-install: don't use LaTeX with CentOS 7
scripts/sphinx-pre-install: fix latexmk dependencies
scripts/sphinx-pre-install: cleanup Gentoo checks
scripts/sphinx-pre-install: seek for Noto CJK fonts for pdf output
docs: load_config.py: avoid needing a conf.py just due to LaTeX docs
docs: remove extra conf.py files
docs: virtual: add it to the documentation body
Documentation/PCI/pci-error-recovery.rst | 5 +-
Documentation/RCU/rculist_nulls.txt | 2 +-
Documentation/admin-guide/conf.py | 10 --
Documentation/conf.py | 30 +++-
Documentation/core-api/conf.py | 10 --
Documentation/crypto/conf.py | 10 --
Documentation/dev-tools/conf.py | 10 --
.../devicetree/bindings/arm/idle-states.txt | 2 +-
Documentation/doc-guide/conf.py | 10 --
Documentation/driver-api/80211/conf.py | 10 --
Documentation/driver-api/conf.py | 10 --
Documentation/driver-api/pm/conf.py | 10 --
Documentation/filesystems/conf.py | 10 --
Documentation/gpu/conf.py | 10 --
Documentation/index.rst | 3 +
Documentation/input/conf.py | 10 --
Documentation/kernel-hacking/conf.py | 10 --
Documentation/locking/spinlocks.rst | 4 +-
Documentation/maintainer/conf.py | 10 --
Documentation/media/conf.py | 12 --
Documentation/memory-barriers.txt | 2 +-
Documentation/networking/conf.py | 10 --
Documentation/power/index.rst | 2 +-
.../{bootwrapper.txt => bootwrapper.rst} | 28 +++-
.../{cpu_families.txt => cpu_families.rst} | 23 +--
.../{cpu_features.txt => cpu_features.rst} | 6 +-
Documentation/powerpc/{cxl.txt => cxl.rst} | 46 ++++--
.../powerpc/{cxlflash.txt => cxlflash.rst} | 10 +-
.../{DAWR-POWER9.txt => dawr-power9.rst} | 15 +-
Documentation/powerpc/{dscr.txt => dscr.rst} | 18 +-
...ecovery.txt => eeh-pci-error-recovery.rst} | 108 ++++++------
...ed-dump.txt => firmware-assisted-dump.rst} | 117 +++++++------
Documentation/powerpc/{hvcs.txt => hvcs.rst} | 108 ++++++------
Documentation/powerpc/index.rst | 34 ++++
Documentation/powerpc/isa-versions.rst | 15 +-
.../powerpc/{mpc52xx.txt => mpc52xx.rst} | 12 +-
...nv.txt => pci_iov_resource_on_powernv.rst} | 15 +-
.../powerpc/{pmu-ebb.txt => pmu-ebb.rst} | 1 +
Documentation/powerpc/ptrace.rst | 156 ++++++++++++++++++
Documentation/powerpc/ptrace.txt | 151 -----------------
.../{qe_firmware.txt => qe_firmware.rst} | 37 +++--
.../{syscall64-abi.txt => syscall64-abi.rst} | 29 ++--
...al_memory.txt => transactional_memory.rst} | 45 ++---
Documentation/process/conf.py | 10 --
Documentation/sh/conf.py | 10 --
Documentation/sound/conf.py | 10 --
Documentation/sphinx/load_config.py | 27 ++-
.../translations/ko_KR/memory-barriers.txt | 2 +-
Documentation/userspace-api/conf.py | 10 --
Documentation/virtual/kvm/index.rst | 1 +
Documentation/vm/conf.py | 10 --
Documentation/watchdog/hpwdt.rst | 2 +-
Documentation/x86/conf.py | 10 --
MAINTAINERS | 14 +-
arch/powerpc/kernel/exceptions-64s.S | 2 +-
drivers/gpu/drm/drm_modes.c | 2 +-
drivers/i2c/busses/i2c-nvidia-gpu.c | 2 +-
drivers/scsi/hpsa.c | 4 +-
drivers/soc/fsl/qe/qe.c | 2 +-
drivers/tty/hvc/hvcs.c | 2 +-
include/soc/fsl/qe/qe.h | 2 +-
scripts/sphinx-pre-install | 118 ++++++++++---
62 files changed, 738 insertions(+), 678 deletions(-)
delete mode 100644 Documentation/admin-guide/conf.py
delete mode 100644 Documentation/core-api/conf.py
delete mode 100644 Documentation/crypto/conf.py
delete mode 100644 Documentation/dev-tools/conf.py
delete mode 100644 Documentation/doc-guide/conf.py
delete mode 100644 Documentation/driver-api/80211/conf.py
delete mode 100644 Documentation/driver-api/conf.py
delete mode 100644 Documentation/driver-api/pm/conf.py
delete mode 100644 Documentation/filesystems/conf.py
delete mode 100644 Documentation/gpu/conf.py
delete mode 100644 Documentation/input/conf.py
delete mode 100644 Documentation/kernel-hacking/conf.py
delete mode 100644 Documentation/maintainer/conf.py
delete mode 100644 Documentation/media/conf.py
delete mode 100644 Documentation/networking/conf.py
rename Documentation/powerpc/{bootwrapper.txt => bootwrapper.rst} (93%)
rename Documentation/powerpc/{cpu_families.txt => cpu_families.rst} (95%)
rename Documentation/powerpc/{cpu_features.txt => cpu_features.rst} (97%)
rename Documentation/powerpc/{cxl.txt => cxl.rst} (95%)
rename Documentation/powerpc/{cxlflash.txt => cxlflash.rst} (98%)
rename Documentation/powerpc/{DAWR-POWER9.txt => dawr-power9.rst} (95%)
rename Documentation/powerpc/{dscr.txt => dscr.rst} (91%)
rename Documentation/powerpc/{eeh-pci-error-recovery.txt => eeh-pci-error-recovery.rst} (82%)
rename Documentation/powerpc/{firmware-assisted-dump.txt => firmware-assisted-dump.rst} (80%)
rename Documentation/powerpc/{hvcs.txt => hvcs.rst} (91%)
create mode 100644 Documentation/powerpc/index.rst
rename Documentation/powerpc/{mpc52xx.txt => mpc52xx.rst} (91%)
rename Documentation/powerpc/{pci_iov_resource_on_powernv.txt => pci_iov_resource_on_powernv.rst} (97%)
rename Documentation/powerpc/{pmu-ebb.txt => pmu-ebb.rst} (99%)
create mode 100644 Documentation/powerpc/ptrace.rst
delete mode 100644 Documentation/powerpc/ptrace.txt
rename Documentation/powerpc/{qe_firmware.txt => qe_firmware.rst} (95%)
rename Documentation/powerpc/{syscall64-abi.txt => syscall64-abi.rst} (82%)
rename Documentation/powerpc/{transactional_memory.txt => transactional_memory.rst} (93%)
delete mode 100644 Documentation/process/conf.py
delete mode 100644 Documentation/sh/conf.py
delete mode 100644 Documentation/sound/conf.py
delete mode 100644 Documentation/userspace-api/conf.py
delete mode 100644 Documentation/vm/conf.py
delete mode 100644 Documentation/x86/conf.py
--
2.21.0
^ permalink raw reply
* [PATCH 13/14] docs: remove extra conf.py files
From: Mauro Carvalho Chehab @ 2019-07-16 12:10 UTC (permalink / raw)
Cc: Mauro Carvalho Chehab, Jonathan Corbet, Herbert Xu,
David S. Miller, Maarten Lankhorst, Maxime Ripard, Sean Paul,
David Airlie, Daniel Vetter, Dmitry Torokhov, Yoshinori Sato,
Rich Felker, Jaroslav Kysela, Takashi Iwai, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, H. Peter Anvin, x86, linux-doc,
linux-crypto, dri-devel, linux-input, netdev, linux-sh,
alsa-devel
In-Reply-To: <cover.1563277838.git.mchehab+samsung@kernel.org>
Now that the latex_documents are handled automatically, we can
remove those extra conf.py files.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
Documentation/admin-guide/conf.py | 10 ----------
Documentation/core-api/conf.py | 10 ----------
Documentation/crypto/conf.py | 10 ----------
Documentation/dev-tools/conf.py | 10 ----------
Documentation/doc-guide/conf.py | 10 ----------
Documentation/driver-api/80211/conf.py | 10 ----------
Documentation/driver-api/conf.py | 10 ----------
Documentation/driver-api/pm/conf.py | 10 ----------
Documentation/filesystems/conf.py | 10 ----------
Documentation/gpu/conf.py | 10 ----------
Documentation/input/conf.py | 10 ----------
Documentation/kernel-hacking/conf.py | 10 ----------
Documentation/maintainer/conf.py | 10 ----------
Documentation/media/conf.py | 12 ------------
Documentation/networking/conf.py | 10 ----------
Documentation/process/conf.py | 10 ----------
Documentation/sh/conf.py | 10 ----------
Documentation/sound/conf.py | 10 ----------
Documentation/userspace-api/conf.py | 10 ----------
Documentation/vm/conf.py | 10 ----------
Documentation/x86/conf.py | 10 ----------
21 files changed, 212 deletions(-)
delete mode 100644 Documentation/admin-guide/conf.py
delete mode 100644 Documentation/core-api/conf.py
delete mode 100644 Documentation/crypto/conf.py
delete mode 100644 Documentation/dev-tools/conf.py
delete mode 100644 Documentation/doc-guide/conf.py
delete mode 100644 Documentation/driver-api/80211/conf.py
delete mode 100644 Documentation/driver-api/conf.py
delete mode 100644 Documentation/driver-api/pm/conf.py
delete mode 100644 Documentation/filesystems/conf.py
delete mode 100644 Documentation/gpu/conf.py
delete mode 100644 Documentation/input/conf.py
delete mode 100644 Documentation/kernel-hacking/conf.py
delete mode 100644 Documentation/maintainer/conf.py
delete mode 100644 Documentation/media/conf.py
delete mode 100644 Documentation/networking/conf.py
delete mode 100644 Documentation/process/conf.py
delete mode 100644 Documentation/sh/conf.py
delete mode 100644 Documentation/sound/conf.py
delete mode 100644 Documentation/userspace-api/conf.py
delete mode 100644 Documentation/vm/conf.py
delete mode 100644 Documentation/x86/conf.py
diff --git a/Documentation/admin-guide/conf.py b/Documentation/admin-guide/conf.py
deleted file mode 100644
index 86f738953799..000000000000
--- a/Documentation/admin-guide/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = 'Linux Kernel User Documentation'
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'linux-user.tex', 'Linux Kernel User Documentation',
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/core-api/conf.py b/Documentation/core-api/conf.py
deleted file mode 100644
index db1f7659f3da..000000000000
--- a/Documentation/core-api/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Core-API Documentation"
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'core-api.tex', project,
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/crypto/conf.py b/Documentation/crypto/conf.py
deleted file mode 100644
index 4335d251ddf3..000000000000
--- a/Documentation/crypto/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = 'Linux Kernel Crypto API'
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'crypto-api.tex', 'Linux Kernel Crypto API manual',
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/dev-tools/conf.py b/Documentation/dev-tools/conf.py
deleted file mode 100644
index 7faafa3f7888..000000000000
--- a/Documentation/dev-tools/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Development tools for the kernel"
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'dev-tools.tex', project,
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/doc-guide/conf.py b/Documentation/doc-guide/conf.py
deleted file mode 100644
index fd3731182d5a..000000000000
--- a/Documentation/doc-guide/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = 'Linux Kernel Documentation Guide'
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'kernel-doc-guide.tex', 'Linux Kernel Documentation Guide',
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/driver-api/80211/conf.py b/Documentation/driver-api/80211/conf.py
deleted file mode 100644
index 4424b4b0b9c3..000000000000
--- a/Documentation/driver-api/80211/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Linux 802.11 Driver Developer's Guide"
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', '80211.tex', project,
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/driver-api/conf.py b/Documentation/driver-api/conf.py
deleted file mode 100644
index 202726d20088..000000000000
--- a/Documentation/driver-api/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "The Linux driver implementer's API guide"
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'driver-api.tex', project,
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/driver-api/pm/conf.py b/Documentation/driver-api/pm/conf.py
deleted file mode 100644
index a89fac11272f..000000000000
--- a/Documentation/driver-api/pm/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Device Power Management"
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'pm.tex', project,
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/filesystems/conf.py b/Documentation/filesystems/conf.py
deleted file mode 100644
index ea44172af5c4..000000000000
--- a/Documentation/filesystems/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Linux Filesystems API"
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'filesystems.tex', project,
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/gpu/conf.py b/Documentation/gpu/conf.py
deleted file mode 100644
index 1757b040fb32..000000000000
--- a/Documentation/gpu/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Linux GPU Driver Developer's Guide"
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'gpu.tex', project,
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/input/conf.py b/Documentation/input/conf.py
deleted file mode 100644
index d2352fdc92ed..000000000000
--- a/Documentation/input/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "The Linux input driver subsystem"
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'linux-input.tex', project,
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/kernel-hacking/conf.py b/Documentation/kernel-hacking/conf.py
deleted file mode 100644
index 3d8acf0f33ad..000000000000
--- a/Documentation/kernel-hacking/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Kernel Hacking Guides"
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'kernel-hacking.tex', project,
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/maintainer/conf.py b/Documentation/maintainer/conf.py
deleted file mode 100644
index 81e9eb7a7884..000000000000
--- a/Documentation/maintainer/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = 'Linux Kernel Development Documentation'
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'maintainer.tex', 'Linux Kernel Development Documentation',
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/media/conf.py b/Documentation/media/conf.py
deleted file mode 100644
index 1f194fcd2cae..000000000000
--- a/Documentation/media/conf.py
+++ /dev/null
@@ -1,12 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-# SPDX-License-Identifier: GPL-2.0
-
-project = 'Linux Media Subsystem Documentation'
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'media.tex', 'Linux Media Subsystem Documentation',
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/networking/conf.py b/Documentation/networking/conf.py
deleted file mode 100644
index 40f69e67a883..000000000000
--- a/Documentation/networking/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Linux Networking Documentation"
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'networking.tex', project,
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/process/conf.py b/Documentation/process/conf.py
deleted file mode 100644
index 1b01a80ad9ce..000000000000
--- a/Documentation/process/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = 'Linux Kernel Development Documentation'
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'process.tex', 'Linux Kernel Development Documentation',
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/sh/conf.py b/Documentation/sh/conf.py
deleted file mode 100644
index 1eb684a13ac8..000000000000
--- a/Documentation/sh/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "SuperH architecture implementation manual"
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'sh.tex', project,
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/sound/conf.py b/Documentation/sound/conf.py
deleted file mode 100644
index 3f1fc5e74e7b..000000000000
--- a/Documentation/sound/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Linux Sound Subsystem Documentation"
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'sound.tex', project,
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/userspace-api/conf.py b/Documentation/userspace-api/conf.py
deleted file mode 100644
index 2eaf59f844e5..000000000000
--- a/Documentation/userspace-api/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "The Linux kernel user-space API guide"
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'userspace-api.tex', project,
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/vm/conf.py b/Documentation/vm/conf.py
deleted file mode 100644
index 3b0b601af558..000000000000
--- a/Documentation/vm/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "Linux Memory Management Documentation"
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'memory-management.tex', project,
- 'The kernel development community', 'manual'),
-]
diff --git a/Documentation/x86/conf.py b/Documentation/x86/conf.py
deleted file mode 100644
index 33c5c3142e20..000000000000
--- a/Documentation/x86/conf.py
+++ /dev/null
@@ -1,10 +0,0 @@
-# -*- coding: utf-8; mode: python -*-
-
-project = "X86 architecture specific documentation"
-
-tags.add("subproject")
-
-latex_documents = [
- ('index', 'x86.tex', project,
- 'The kernel development community', 'manual'),
-]
--
2.21.0
^ permalink raw reply related
* Re: [PATCH iproute2-rc 8/8] rdma: Document counter statistic
From: Leon Romanovsky @ 2019-07-16 12:19 UTC (permalink / raw)
To: Gal Pressman
Cc: Stephen Hemminger, netdev, David Ahern, Mark Zhang,
RDMA mailing list
In-Reply-To: <92db561d-e89c-0e09-ef2e-9eb9535d504f@amazon.com>
On Tue, Jul 16, 2019 at 11:19:26AM +0300, Gal Pressman wrote:
> On 10/07/2019 10:24, Leon Romanovsky wrote:
> > +.SH "EXAMPLES"
> > +.PP
> > +rdma statistic show
> > +.RS 4
> > +Shows the state of the default counter of all RDMA devices on the system.
> > +.RE
> > +.PP
> > +rdma statistic show link mlx5_2/1
> > +.RS 4
> > +Shows the state of the default counter of specified RDMA port
> > +.RE
> > +.PP
> > +rdma statistic qp show
> > +.RS 4
> > +Shows the state of all qp counters of all RDMA devices on the system.
> > +.RE
> > +.PP
> > +rdma statistic qp show link mlx5_2/1
> > +.RS 4
> > +Shows the state of all qp counters of specified RDMA port.
> > +.RE
> > +.PP
> > +rdma statistic qp show link mlx5_2 pid 30489
> > +.RS 4
> > +Shows the state of all qp counters of specified RDMA port and belonging to pid 30489
> > +.RE
> > +.PP
> > +rdma statistic qp mode
> > +.RS 4
> > +List current counter mode on all deivces
>
> "deivces" -> "devices".
Thanks,
Stephen,
Can you please fix this typo while you are applying the series?
Thanks
^ permalink raw reply
* Re: [PATCH net-next] net: openvswitch: do not update max_headroom if new headroom is equal to old headroom
From: Taehee Yoo @ 2019-07-16 12:28 UTC (permalink / raw)
To: Pravin Shelar; +Cc: David S. Miller, Linux Kernel Network Developers, ovs dev
In-Reply-To: <CAOrHB_DXSo37ZttVcW3fEuvB9_-2VtzZgf0JNq1ZhSfJJHSa7Q@mail.gmail.com>
On Tue, 16 Jul 2019 at 03:15, Pravin Shelar <pshelar@ovn.org> wrote:
>
Hi Pravin,
> On Fri, Jul 5, 2019 at 9:08 AM Taehee Yoo <ap420073@gmail.com> wrote:
> >
> > When a vport is deleted, the maximum headroom size would be changed.
> > If the vport which has the largest headroom is deleted,
> > the new max_headroom would be set.
> > But, if the new headroom size is equal to the old headroom size,
> > updating routine is unnecessary.
> >
> > Signed-off-by: Taehee Yoo <ap420073@gmail.com>
>
> Sorry for late reply. This patch looks good to me too.
>
> I am curious about reason to avoid adjustment to headroom. why are you
> trying to avoid unnecessary changes to headroom.
>
Thank you for the review!
The intention of this patch is to remove unnecessary overhead.
There is no other reason.
Thanks!
> Thanks,
> Pravin.
^ permalink raw reply
* [PATCH bpf] selftests/bpf: fix perf_buffer on s390
From: Ilya Leoshkevich @ 2019-07-16 12:58 UTC (permalink / raw)
To: bpf, netdev; +Cc: gor, heiko.carstens, Ilya Leoshkevich
perf_buffer test fails for exactly the same reason test_attach_probe
used to fail: different nanosleep syscall kprobe name.
Reuse the test_attach_probe fix.
Fixes: ee5cf82ce04a ("selftests/bpf: test perf buffer API")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
.../testing/selftests/bpf/prog_tests/attach_probe.c | 12 ++----------
tools/testing/selftests/bpf/prog_tests/perf_buffer.c | 8 +-------
tools/testing/selftests/bpf/test_progs.h | 8 ++++++++
3 files changed, 11 insertions(+), 17 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/attach_probe.c b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
index 47af4afc5013..5ecc267d98b0 100644
--- a/tools/testing/selftests/bpf/prog_tests/attach_probe.c
+++ b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
@@ -21,14 +21,6 @@ ssize_t get_base_addr() {
return -EINVAL;
}
-#ifdef __x86_64__
-#define SYS_KPROBE_NAME "__x64_sys_nanosleep"
-#elif defined(__s390x__)
-#define SYS_KPROBE_NAME "__s390x_sys_nanosleep"
-#else
-#define SYS_KPROBE_NAME "sys_nanosleep"
-#endif
-
void test_attach_probe(void)
{
const char *kprobe_name = "kprobe/sys_nanosleep";
@@ -86,7 +78,7 @@ void test_attach_probe(void)
kprobe_link = bpf_program__attach_kprobe(kprobe_prog,
false /* retprobe */,
- SYS_KPROBE_NAME);
+ SYS_NANOSLEEP_KPROBE_NAME);
if (CHECK(IS_ERR(kprobe_link), "attach_kprobe",
"err %ld\n", PTR_ERR(kprobe_link))) {
kprobe_link = NULL;
@@ -94,7 +86,7 @@ void test_attach_probe(void)
}
kretprobe_link = bpf_program__attach_kprobe(kretprobe_prog,
true /* retprobe */,
- SYS_KPROBE_NAME);
+ SYS_NANOSLEEP_KPROBE_NAME);
if (CHECK(IS_ERR(kretprobe_link), "attach_kretprobe",
"err %ld\n", PTR_ERR(kretprobe_link))) {
kretprobe_link = NULL;
diff --git a/tools/testing/selftests/bpf/prog_tests/perf_buffer.c b/tools/testing/selftests/bpf/prog_tests/perf_buffer.c
index 3f1ef95865ff..3003fddc0613 100644
--- a/tools/testing/selftests/bpf/prog_tests/perf_buffer.c
+++ b/tools/testing/selftests/bpf/prog_tests/perf_buffer.c
@@ -5,12 +5,6 @@
#include <sys/socket.h>
#include <test_progs.h>
-#ifdef __x86_64__
-#define SYS_KPROBE_NAME "__x64_sys_nanosleep"
-#else
-#define SYS_KPROBE_NAME "sys_nanosleep"
-#endif
-
static void on_sample(void *ctx, int cpu, void *data, __u32 size)
{
int cpu_data = *(int *)data, duration = 0;
@@ -56,7 +50,7 @@ void test_perf_buffer(void)
/* attach kprobe */
link = bpf_program__attach_kprobe(prog, false /* retprobe */,
- SYS_KPROBE_NAME);
+ SYS_NANOSLEEP_KPROBE_NAME);
if (CHECK(IS_ERR(link), "attach_kprobe", "err %ld\n", PTR_ERR(link)))
goto out_close;
diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
index f095e1d4c657..49e0f7d85643 100644
--- a/tools/testing/selftests/bpf/test_progs.h
+++ b/tools/testing/selftests/bpf/test_progs.h
@@ -92,3 +92,11 @@ int compare_map_keys(int map1_fd, int map2_fd);
int compare_stack_ips(int smap_fd, int amap_fd, int stack_trace_len);
int extract_build_id(char *build_id, size_t size);
void *spin_lock_thread(void *arg);
+
+#ifdef __x86_64__
+#define SYS_NANOSLEEP_KPROBE_NAME "__x64_sys_nanosleep"
+#elif defined(__s390x__)
+#define SYS_NANOSLEEP_KPROBE_NAME "__s390x_sys_nanosleep"
+#else
+#define SYS_NANOSLEEP_KPROBE_NAME "sys_nanosleep"
+#endif
--
2.21.0
^ permalink raw reply related
* Re: [EXT] [PATCH v1] net: fec: optionally reset PHY via a reset-controller
From: Sven Van Asbroeck @ 2019-07-16 13:19 UTC (permalink / raw)
To: Andy Duan
Cc: David S . Miller, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <VI1PR0402MB36009E99D7361583702B84DDFFCE0@VI1PR0402MB3600.eurprd04.prod.outlook.com>
Hi Andy,
On Mon, Jul 15, 2019 at 10:02 PM Andy Duan <fugang.duan@nxp.com> wrote:
>
> the phylib already can handle mii bus reset and phy device reset
That's a great suggestion, thank you !! I completely overlooked that code.
What will happen to the legacy phy reset code in fec? Are there many users left?
^ permalink raw reply
* [PATCH] libertas_tf: Use correct channel range in lbtf_geo_init
From: YueHaibing @ 2019-07-16 13:25 UTC (permalink / raw)
To: kvalo, davem, lkundrak, derosier
Cc: linux-kernel, netdev, linux-wireless, YueHaibing
It seems we should use 'range' instead of 'priv->range'
in lbtf_geo_init(), because 'range' is the corret one
related to current regioncode.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 691cdb49388b ("libertas_tf: command helper functions for libertas_tf")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
drivers/net/wireless/marvell/libertas_tf/cmd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/marvell/libertas_tf/cmd.c b/drivers/net/wireless/marvell/libertas_tf/cmd.c
index 1eacca0..a333172 100644
--- a/drivers/net/wireless/marvell/libertas_tf/cmd.c
+++ b/drivers/net/wireless/marvell/libertas_tf/cmd.c
@@ -65,7 +65,7 @@ static void lbtf_geo_init(struct lbtf_private *priv)
break;
}
- for (ch = priv->range.start; ch < priv->range.end; ch++)
+ for (ch = range.start; ch < range.end; ch++)
priv->channels[CHAN_TO_IDX(ch)].flags = 0;
}
--
2.7.4
^ permalink raw reply related
* Re: [PATCH] libertas_tf: Use correct channel range in lbtf_geo_init
From: Yuehaibing @ 2019-07-16 13:38 UTC (permalink / raw)
To: kvalo, davem, lkundrak, derosier; +Cc: linux-kernel, netdev, linux-wireless
In-Reply-To: <20190716132534.11256-1-yuehaibing@huawei.com>
Pls ignore this, sorry
On 2019/7/16 21:25, YueHaibing wrote:
> It seems we should use 'range' instead of 'priv->range'
> in lbtf_geo_init(), because 'range' is the corret one
> related to current regioncode.
>
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Fixes: 691cdb49388b ("libertas_tf: command helper functions for libertas_tf")
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> ---
> drivers/net/wireless/marvell/libertas_tf/cmd.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/marvell/libertas_tf/cmd.c b/drivers/net/wireless/marvell/libertas_tf/cmd.c
> index 1eacca0..a333172 100644
> --- a/drivers/net/wireless/marvell/libertas_tf/cmd.c
> +++ b/drivers/net/wireless/marvell/libertas_tf/cmd.c
> @@ -65,7 +65,7 @@ static void lbtf_geo_init(struct lbtf_private *priv)
> break;
> }
>
> - for (ch = priv->range.start; ch < priv->range.end; ch++)
> + for (ch = range.start; ch < range.end; ch++)
> priv->channels[CHAN_TO_IDX(ch)].flags = 0;
> }
>
>
^ permalink raw reply
* Re: IPv6 L2TP issues related to 93531c67
From: Paul Donohue @ 2019-07-16 13:56 UTC (permalink / raw)
To: David Ahern; +Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev
In-Reply-To: <d6db74f5-5add-7500-1b7a-fa62302a455f@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 2193 bytes --]
On Mon, Jul 15, 2019 at 12:55:48PM -0600, David Ahern wrote:
> As an FYI, gmail thinks your emails are spam.
Ugh. Thanks for letting me know. I'll look into it.
> On 7/15/19 10:18 AM, Paul Donohue wrote:
> > Reverting commit 93531c6743157d7e8c5792f8ed1a57641149d62c (identified by bisection) fixes this issue.
> That commit can not be reverted. It is a foundational piece for a lot of
> other changes. Did you mean the commit before it works and this commit
> fails?
Sorry, yes, I meant the commit before it works, and this one fails. I did not try reverting this commit on a more recent kernel.
> > It is not obvious to me how commit 93531c6743157d7e8c5792f8ed1a57641149d62c causes this issue, or how it should be fixed. Could someone take a look and point me in the right direction for further troubleshooting?
> Let's get a complete example that demonstrates the problem, and I can go
> from there. Can you take the attached script and update it so that it
> reflects the problem you are reporting? That script works on latest
> kernel as well as 4.14.133. It uses network namespaces for 2 hosts with
> a router between them.
>
> Also, check the return of the fib lookups using:
> perf record -e fib6:* -a
> <run test, ctrl-c on the record>
> perf script
>
> Checkout the fib lookup parameters and result. Do they look correct to
> you for your setup?
Unfortunately, I have a fairly complicated setup, so it took me a while to figure out which pieces were relevant ... But I think I've finally got it. The missing piece was IPsec.
After establishing an IPsec tunnel to carry the L2TP traffic, the first L2TP packet through the IPsec tunnel permanently breaks the associated L2TP tunnel. Tearing down the IPsec tunnel does not restore functionality of the L2TP tunnel - I have to tear down and re-create the L2TP tunnel before it will work again. In my real-world use case, I have two L2TP tunnels running over the same IPsec tunnel, and the first L2TP tunnel to send a packet after IPsec is established gets permanently broken, while the other L2TP tunnel works fine.
I've attached a modified version of the script which demonstrates this issue.
Thank you!
-Paul
[-- Attachment #2: l2tp.sh --]
[-- Type: application/x-sh, Size: 4977 bytes --]
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox