Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] ethernet: renesas: convert to SPDX identifiers
From: Sergei Shtylyov @ 2018-09-07  9:05 UTC (permalink / raw)
  To: Kuninori Morimoto, David S. Miller, Mark Brown
  Cc: Geert Uytterhoeven, Robin Murphy, netdev, linux-renesas-soc
In-Reply-To: <875zziqdem.wl-kuninori.morimoto.gx@renesas.com>

Hello!

On 9/7/2018 5:02 AM, Kuninori Morimoto wrote:

> From: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
> 
> This patch updates license to use SPDX-License-Identifier
> instead of verbose license text.
> 
> Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
> ---
>   drivers/net/ethernet/renesas/Kconfig    | 1 +
>   drivers/net/ethernet/renesas/Makefile   | 1 +
>   drivers/net/ethernet/renesas/ravb_ptp.c | 6 +-----
>   3 files changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/renesas/Kconfig b/drivers/net/ethernet/renesas/Kconfig
> index f3f7477..bb0ebdf 100644
> --- a/drivers/net/ethernet/renesas/Kconfig
> +++ b/drivers/net/ethernet/renesas/Kconfig
> @@ -1,3 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0
>   #
>   # Renesas device configuration
>   #

    There was no license at all on this file...

> diff --git a/drivers/net/ethernet/renesas/Makefile b/drivers/net/ethernet/renesas/Makefile
> index a05102a..f21ab8c 100644
> --- a/drivers/net/ethernet/renesas/Makefile
> +++ b/drivers/net/ethernet/renesas/Makefile
> @@ -1,3 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0
>   #
>   # Makefile for the Renesas device drivers.
>   #

    Likewise.

> diff --git a/drivers/net/ethernet/renesas/ravb_ptp.c b/drivers/net/ethernet/renesas/ravb_ptp.c
> index eede70e..0721b5c 100644
> --- a/drivers/net/ethernet/renesas/ravb_ptp.c
> +++ b/drivers/net/ethernet/renesas/ravb_ptp.c
> @@ -1,13 +1,9 @@
> +// SPDX-License-Identifier: GPL-2.0+
>   /* PTP 1588 clock using the Renesas Ethernet AVB
>    *
>    * Copyright (C) 2013-2015 Renesas Electronics Corporation
>    * Copyright (C) 2015 Renesas Solutions Corp.
>    * Copyright (C) 2015-2016 Cogent Embedded, Inc. <source@cogentembedded.com>
> - *
> - *  This program is free software; you can redistribute it and/or modify
> - *  it under the terms of the GNU General Public License as published by
> - *  the Free Software Foundation; either version 2 of the License, or
> - *  (at your option) any later version.
>    */
>   
>   #include "ravb.h"

    This part looks valid, don't know why this file was missed by Wolfram who 
did the previous patch...

MBR, Sergei

^ permalink raw reply

* Re: Re: [PATCH net-next v2 0/5] virtio: support packed ring
From: Michael S. Tsirkin @ 2018-09-07 13:00 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: jasowang, virtualization, linux-kernel, netdev, virtio-dev, wexu,
	jfreimann
In-Reply-To: <20180907012225.GA32677@debian>

On Fri, Sep 07, 2018 at 09:22:25AM +0800, Tiwei Bie wrote:
> On Mon, Aug 27, 2018 at 05:00:40PM +0300, Michael S. Tsirkin wrote:
> > Are there still plans to test the performance with vost pmd?
> > vhost doesn't seem to show a performance gain ...
> > 
> 
> I tried some performance tests with vhost PMD. In guest, the
> XDP program will return XDP_DROP directly. And in host, testpmd
> will do txonly fwd.
> 
> When burst size is 1 and packet size is 64 in testpmd and
> testpmd needs to iterate 5 Tx queues (but only the first two
> queues are enabled) to prepare and inject packets, I got ~12%
> performance boost (5.7Mpps -> 6.4Mpps). And if the vhost PMD
> is faster (e.g. just need to iterate the first two queues to
> prepare and inject packets), then I got similar performance
> for both rings (~9.9Mpps) (packed ring's performance can be
> lower, because it's more complicated in driver.)
> 
> I think packed ring makes vhost PMD faster, but it doesn't make
> the driver faster. In packed ring, the ring is simplified, and
> the handling of the ring in vhost (device) is also simplified,
> but things are not simplified in driver, e.g. although there is
> no desc table in the virtqueue anymore, driver still needs to
> maintain a private desc state table (which is still managed as
> a list in this patch set) to support the out-of-order desc
> processing in vhost (device).
> 
> I think this patch set is mainly to make the driver have a full
> functional support for the packed ring, which makes it possible
> to leverage the packed ring feature in vhost (device). But I'm
> not sure whether there is any other better idea, I'd like to
> hear your thoughts. Thanks!

Just this: Jens seems to report a nice gain with virtio and
vhost pmd across the board. Try to compare virtio and
virtio pmd to see what does pmd do better?


> 
> > 
> > On Wed, Jul 11, 2018 at 10:27:06AM +0800, Tiwei Bie wrote:
> > > Hello everyone,
> > > 
> > > This patch set implements packed ring support in virtio driver.
> > > 
> > > Some functional tests have been done with Jason's
> > > packed ring implementation in vhost:
> > > 
> > > https://lkml.org/lkml/2018/7/3/33
> > > 
> > > Both of ping and netperf worked as expected.
> > > 
> > > v1 -> v2:
> > > - Use READ_ONCE() to read event off_wrap and flags together (Jason);
> > > - Add comments related to ccw (Jason);
> > > 
> > > RFC (v6) -> v1:
> > > - Avoid extra virtio_wmb() in virtqueue_enable_cb_delayed_packed()
> > >   when event idx is off (Jason);
> > > - Fix bufs calculation in virtqueue_enable_cb_delayed_packed() (Jason);
> > > - Test the state of the desc at used_idx instead of last_used_idx
> > >   in virtqueue_enable_cb_delayed_packed() (Jason);
> > > - Save wrap counter (as part of queue state) in the return value
> > >   of virtqueue_enable_cb_prepare_packed();
> > > - Refine the packed ring definitions in uapi;
> > > - Rebase on the net-next tree;
> > > 
> > > RFC v5 -> RFC v6:
> > > - Avoid tracking addr/len/flags when DMA API isn't used (MST/Jason);
> > > - Define wrap counter as bool (Jason);
> > > - Use ALIGN() in vring_init_packed() (Jason);
> > > - Avoid using pointer to track `next` in detach_buf_packed() (Jason);
> > > - Add comments for barriers (Jason);
> > > - Don't enable RING_PACKED on ccw for now (noticed by Jason);
> > > - Refine the memory barrier in virtqueue_poll();
> > > - Add a missing memory barrier in virtqueue_enable_cb_delayed_packed();
> > > - Remove the hacks in virtqueue_enable_cb_prepare_packed();
> > > 
> > > RFC v4 -> RFC v5:
> > > - Save DMA addr, etc in desc state (Jason);
> > > - Track used wrap counter;
> > > 
> > > RFC v3 -> RFC v4:
> > > - Make ID allocation support out-of-order (Jason);
> > > - Various fixes for EVENT_IDX support;
> > > 
> > > RFC v2 -> RFC v3:
> > > - Split into small patches (Jason);
> > > - Add helper virtqueue_use_indirect() (Jason);
> > > - Just set id for the last descriptor of a list (Jason);
> > > - Calculate the prev in virtqueue_add_packed() (Jason);
> > > - Fix/improve desc suppression code (Jason/MST);
> > > - Refine the code layout for XXX_split/packed and wrappers (MST);
> > > - Fix the comments and API in uapi (MST);
> > > - Remove the BUG_ON() for indirect (Jason);
> > > - Some other refinements and bug fixes;
> > > 
> > > RFC v1 -> RFC v2:
> > > - Add indirect descriptor support - compile test only;
> > > - Add event suppression supprt - compile test only;
> > > - Move vring_packed_init() out of uapi (Jason, MST);
> > > - Merge two loops into one in virtqueue_add_packed() (Jason);
> > > - Split vring_unmap_one() for packed ring and split ring (Jason);
> > > - Avoid using '%' operator (Jason);
> > > - Rename free_head -> next_avail_idx (Jason);
> > > - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
> > > - Some other refinements and bug fixes;
> > > 
> > > Thanks!
> > > 
> > > Tiwei Bie (5):
> > >   virtio: add packed ring definitions
> > >   virtio_ring: support creating packed ring
> > >   virtio_ring: add packed ring support
> > >   virtio_ring: add event idx support in packed ring
> > >   virtio_ring: enable packed ring
> > > 
> > >  drivers/s390/virtio/virtio_ccw.c   |   14 +
> > >  drivers/virtio/virtio_ring.c       | 1365 ++++++++++++++++++++++------
> > >  include/linux/virtio_ring.h        |    8 +-
> > >  include/uapi/linux/virtio_config.h |    3 +
> > >  include/uapi/linux/virtio_ring.h   |   43 +
> > >  5 files changed, 1157 insertions(+), 276 deletions(-)
> > > 
> > > -- 
> > > 2.18.0
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > 

^ permalink raw reply

* [PATCH v2 4/4] i40e: disallow changing the number of descriptors when AF_XDP is on
From: Björn Töpel @ 2018-09-07  8:18 UTC (permalink / raw)
  To: ast, daniel, jeffrey.t.kirsher, intel-wired-lan, jakub.kicinski
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, netdev
In-Reply-To: <20180907081848.5438-1-bjorn.topel@gmail.com>

From: Björn Töpel <bjorn.topel@intel.com>

When an AF_XDP UMEM is attached to any of the Rx rings, we disallow a
user to change the number of descriptors via e.g. "ethtool -G IFNAME".

Otherwise, the size of the stash/reuse queue can grow unbounded, which
would result in OOM or leaking userspace buffers.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 .../net/ethernet/intel/i40e/i40e_ethtool.c    |  9 +++++++-
 .../ethernet/intel/i40e/i40e_txrx_common.h    |  1 +
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 22 +++++++++++++++++++
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index d7d3974beca2..3cd2c88c72f8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -5,7 +5,7 @@
 
 #include "i40e.h"
 #include "i40e_diag.h"
-
+#include "i40e_txrx_common.h"
 #include "i40e_ethtool_stats.h"
 
 #define I40E_PF_STAT(_name, _stat) \
@@ -1493,6 +1493,13 @@ static int i40e_set_ringparam(struct net_device *netdev,
 	    (new_rx_count == vsi->rx_rings[0]->count))
 		return 0;
 
+	/* If there is a AF_XDP UMEM attached to any of Rx rings,
+	 * disallow changing the number of descriptors -- regardless
+	 * if the netdev is running or not.
+	 */
+	if (i40e_xsk_any_rx_ring_enabled(vsi))
+		return -EBUSY;
+
 	while (test_and_set_bit(__I40E_CONFIG_BUSY, pf->state)) {
 		timeout--;
 		if (!timeout)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
index 8d46acff6f2e..09809dffe399 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
@@ -89,5 +89,6 @@ static inline void i40e_arm_wb(struct i40e_ring *tx_ring,
 
 void i40e_xsk_clean_rx_ring(struct i40e_ring *rx_ring);
 void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring);
+bool i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi);
 
 #endif /* I40E_TXRX_COMMON_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index e4b62e871afc..119f59ec7cc0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -944,3 +944,25 @@ void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring)
 	if (xsk_frames)
 		xsk_umem_complete_tx(umem, xsk_frames);
 }
+
+/**
+ * i40e_xsk_any_rx_ring_enabled - Checks whether any of the Rx rings
+ * has AF_XDP UMEM attached
+ * @vsi: vsi
+ *
+ * Returns true if any of the Rx rings has an AF_XDP UMEM attached
+ **/
+bool i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi)
+{
+	int i;
+
+	if (!vsi->xsk_umems)
+		return false;
+
+	for (i = 0; i < vsi->num_queue_pairs; i++) {
+		if (vsi->xsk_umems[i])
+			return true;
+	}
+
+	return false;
+}
-- 
2.17.1

^ permalink raw reply related

* [PATCH v2 3/4] i40e: clean zero-copy XDP Rx ring on shutdown/reset
From: Björn Töpel @ 2018-09-07  8:18 UTC (permalink / raw)
  To: ast, daniel, jeffrey.t.kirsher, intel-wired-lan, jakub.kicinski
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, netdev
In-Reply-To: <20180907081848.5438-1-bjorn.topel@gmail.com>

From: Björn Töpel <bjorn.topel@intel.com>

Outstanding Rx descriptors are temporarily stored on a stash/reuse
queue. When/if the HW rings comes up again, entries from the stash are
used to re-populate the ring.

The latter required some restructuring of the allocation scheme for
the AF_XDP zero-copy implementation. There is now a fast, and a slow
allocation. The "fast allocation" is used from the fast-path and
obtains free buffers from the fill ring and the internal recycle
mechanism. The "slow allocation" is only used in ring setup, and
obtains buffers from the fill ring and the stash (if any).

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   4 +-
 .../ethernet/intel/i40e/i40e_txrx_common.h    |   1 +
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 100 ++++++++++++++++--
 3 files changed, 96 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 7f85d4ba8b54..740ea58ba938 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1355,8 +1355,10 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
 		rx_ring->skb = NULL;
 	}
 
-	if (rx_ring->xsk_umem)
+	if (rx_ring->xsk_umem) {
+		i40e_xsk_clean_rx_ring(rx_ring);
 		goto skip_free;
+	}
 
 	/* Free all the Rx ring sk_buffs */
 	for (i = 0; i < rx_ring->count; i++) {
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
index 29c68b29d36f..8d46acff6f2e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
@@ -87,6 +87,7 @@ static inline void i40e_arm_wb(struct i40e_ring *tx_ring,
 	}
 }
 
+void i40e_xsk_clean_rx_ring(struct i40e_ring *rx_ring);
 void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring);
 
 #endif /* I40E_TXRX_COMMON_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 99116277c4d2..e4b62e871afc 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -140,6 +140,7 @@ static void i40e_xsk_umem_dma_unmap(struct i40e_vsi *vsi, struct xdp_umem *umem)
 static int i40e_xsk_umem_enable(struct i40e_vsi *vsi, struct xdp_umem *umem,
 				u16 qid)
 {
+	struct xdp_umem_fq_reuse *reuseq;
 	bool if_running;
 	int err;
 
@@ -156,6 +157,12 @@ static int i40e_xsk_umem_enable(struct i40e_vsi *vsi, struct xdp_umem *umem,
 			return -EBUSY;
 	}
 
+	reuseq = xsk_reuseq_prepare(vsi->rx_rings[0]->count);
+	if (!reuseq)
+		return -ENOMEM;
+
+	xsk_reuseq_free(xsk_reuseq_swap(umem, reuseq));
+
 	err = i40e_xsk_umem_dma_map(vsi, umem);
 	if (err)
 		return err;
@@ -353,16 +360,46 @@ static bool i40e_alloc_buffer_zc(struct i40e_ring *rx_ring,
 }
 
 /**
- * i40e_alloc_rx_buffers_zc - Allocates a number of Rx buffers
+ * i40e_alloc_buffer_slow_zc - Allocates an i40e_rx_buffer
  * @rx_ring: Rx ring
- * @count: The number of buffers to allocate
+ * @bi: Rx buffer to populate
  *
- * This function allocates a number of Rx buffers and places them on
- * the Rx ring.
+ * This function allocates an Rx buffer. The buffer can come from fill
+ * queue, or via the reuse queue.
  *
  * Returns true for a successful allocation, false otherwise
  **/
-bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count)
+static bool i40e_alloc_buffer_slow_zc(struct i40e_ring *rx_ring,
+				      struct i40e_rx_buffer *bi)
+{
+	struct xdp_umem *umem = rx_ring->xsk_umem;
+	u64 handle, hr;
+
+	if (!xsk_umem_peek_addr_rq(umem, &handle)) {
+		rx_ring->rx_stats.alloc_page_failed++;
+		return false;
+	}
+
+	handle &= rx_ring->xsk_umem->chunk_mask;
+
+	hr = umem->headroom + XDP_PACKET_HEADROOM;
+
+	bi->dma = xdp_umem_get_dma(umem, handle);
+	bi->dma += hr;
+
+	bi->addr = xdp_umem_get_data(umem, handle);
+	bi->addr += hr;
+
+	bi->handle = handle + umem->headroom;
+
+	xsk_umem_discard_addr_rq(umem);
+	return true;
+}
+
+static __always_inline bool __i40e_alloc_rx_buffers_zc(
+	struct i40e_ring *rx_ring, u16 count,
+	bool alloc(struct i40e_ring *rx_ring,
+		   struct i40e_rx_buffer *bi))
 {
 	u16 ntu = rx_ring->next_to_use;
 	union i40e_rx_desc *rx_desc;
@@ -372,7 +409,7 @@ bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count)
 	rx_desc = I40E_RX_DESC(rx_ring, ntu);
 	bi = &rx_ring->rx_bi[ntu];
 	do {
-		if (!i40e_alloc_buffer_zc(rx_ring, bi)) {
+		if (!alloc(rx_ring, bi)) {
 			ok = false;
 			goto no_buffers;
 		}
@@ -404,6 +441,38 @@ bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count)
 	return ok;
 }
 
+/**
+ * i40e_alloc_rx_buffers_zc - Allocates a number of Rx buffers
+ * @rx_ring: Rx ring
+ * @count: The number of buffers to allocate
+ *
+ * This function allocates a number of Rx buffers from the reuse queue
+ * or fill ring and places them on the Rx ring.
+ *
+ * Returns true for a successful allocation, false otherwise
+ **/
+bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count)
+{
+	return __i40e_alloc_rx_buffers_zc(rx_ring, count,
+					  i40e_alloc_buffer_slow_zc);
+}
+
+/**
+ * i40e_alloc_rx_buffers_fast_zc - Allocates a number of Rx buffers
+ * @rx_ring: Rx ring
+ * @count: The number of buffers to allocate
+ *
+ * This function allocates a number of Rx buffers from the fill ring
+ * or the internal recycle mechanism and places them on the Rx ring.
+ *
+ * Returns true for a successful allocation, false otherwise
+ **/
+static bool i40e_alloc_rx_buffers_fast_zc(struct i40e_ring *rx_ring, u16 count)
+{
+	return __i40e_alloc_rx_buffers_zc(rx_ring, count,
+					  i40e_alloc_buffer_zc);
+}
+
 /**
  * i40e_get_rx_buffer_zc - Return the current Rx buffer
  * @rx_ring: Rx ring
@@ -571,8 +640,8 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 
 		if (cleaned_count >= I40E_RX_BUFFER_WRITE) {
 			failure = failure ||
-				  !i40e_alloc_rx_buffers_zc(rx_ring,
-							    cleaned_count);
+				  !i40e_alloc_rx_buffers_fast_zc(rx_ring,
+								 cleaned_count);
 			cleaned_count = 0;
 		}
 
@@ -831,6 +900,21 @@ int i40e_xsk_async_xmit(struct net_device *dev, u32 queue_id)
 	return 0;
 }
 
+void i40e_xsk_clean_rx_ring(struct i40e_ring *rx_ring)
+{
+	u16 i;
+
+	for (i = 0; i < rx_ring->count; i++) {
+		struct i40e_rx_buffer *rx_bi = &rx_ring->rx_bi[i];
+
+		if (!rx_bi->addr)
+			continue;
+
+		xsk_umem_fq_reuse(rx_ring->xsk_umem, rx_bi->handle);
+		rx_bi->addr = NULL;
+	}
+}
+
 /**
  * i40e_xsk_clean_xdp_ring - Clean the XDP Tx ring on shutdown
  * @xdp_ring: XDP Tx ring
-- 
2.17.1

^ permalink raw reply related

* [PATCH v2 2/4] net: xsk: add a simple buffer reuse queue
From: Björn Töpel @ 2018-09-07  8:18 UTC (permalink / raw)
  To: ast, daniel, jeffrey.t.kirsher, intel-wired-lan, jakub.kicinski
  Cc: magnus.karlsson, magnus.karlsson, netdev
In-Reply-To: <20180907081848.5438-1-bjorn.topel@gmail.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>

XSK UMEM is strongly single producer single consumer so reuse of
frames is challenging.  Add a simple "stash" of FILL packets to
reuse for drivers to optionally make use of.  This is useful
when driver has to free (ndo_stop) or resize a ring with an active
AF_XDP ZC socket.

v2: Fixed build issues for !CONFIG_XDP_SOCKETS.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/xdp_sock.h | 69 ++++++++++++++++++++++++++++++++++++++++++
 net/xdp/xdp_umem.c     |  2 ++
 net/xdp/xsk_queue.c    | 55 +++++++++++++++++++++++++++++++++
 net/xdp/xsk_queue.h    |  3 ++
 4 files changed, 129 insertions(+)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 932ca0dad6f3..70a115bea4f4 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -21,6 +21,12 @@ struct xdp_umem_page {
 	dma_addr_t dma;
 };
 
+struct xdp_umem_fq_reuse {
+	u32 nentries;
+	u32 length;
+	u64 handles[];
+};
+
 struct xdp_umem {
 	struct xsk_queue *fq;
 	struct xsk_queue *cq;
@@ -37,6 +43,7 @@ struct xdp_umem {
 	struct page **pgs;
 	u32 npgs;
 	struct net_device *dev;
+	struct xdp_umem_fq_reuse *fq_reuse;
 	u16 queue_id;
 	bool zc;
 	spinlock_t xsk_list_lock;
@@ -75,6 +82,10 @@ void xsk_umem_discard_addr(struct xdp_umem *umem);
 void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries);
 bool xsk_umem_consume_tx(struct xdp_umem *umem, dma_addr_t *dma, u32 *len);
 void xsk_umem_consume_tx_done(struct xdp_umem *umem);
+struct xdp_umem_fq_reuse *xsk_reuseq_prepare(u32 nentries);
+struct xdp_umem_fq_reuse *xsk_reuseq_swap(struct xdp_umem *umem,
+					  struct xdp_umem_fq_reuse *newq);
+void xsk_reuseq_free(struct xdp_umem_fq_reuse *rq);
 
 static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
@@ -85,6 +96,35 @@ static inline dma_addr_t xdp_umem_get_dma(struct xdp_umem *umem, u64 addr)
 {
 	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & (PAGE_SIZE - 1));
 }
+
+/* Reuse-queue aware version of FILL queue helpers */
+static inline u64 *xsk_umem_peek_addr_rq(struct xdp_umem *umem, u64 *addr)
+{
+	struct xdp_umem_fq_reuse *rq = umem->fq_reuse;
+
+	if (!rq->length)
+		return xsk_umem_peek_addr(umem, addr);
+
+	*addr = rq->handles[rq->length - 1];
+	return addr;
+}
+
+static inline void xsk_umem_discard_addr_rq(struct xdp_umem *umem)
+{
+	struct xdp_umem_fq_reuse *rq = umem->fq_reuse;
+
+	if (!rq->length)
+		xsk_umem_discard_addr(umem);
+	else
+		rq->length--;
+}
+
+static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
+{
+	struct xdp_umem_fq_reuse *rq = umem->fq_reuse;
+
+	rq->handles[rq->length++] = addr;
+}
 #else
 static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
@@ -128,6 +168,21 @@ static inline void xsk_umem_consume_tx_done(struct xdp_umem *umem)
 {
 }
 
+static inline struct xdp_umem_fq_reuse *xsk_reuseq_prepare(u32 nentries)
+{
+	return NULL;
+}
+
+static inline struct xdp_umem_fq_reuse *xsk_reuseq_swap(
+	struct xdp_umem *umem,
+	struct xdp_umem_fq_reuse *newq)
+{
+	return NULL;
+}
+static inline void xsk_reuseq_free(struct xdp_umem_fq_reuse *rq)
+{
+}
+
 static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
 	return NULL;
@@ -137,6 +192,20 @@ static inline dma_addr_t xdp_umem_get_dma(struct xdp_umem *umem, u64 addr)
 {
 	return 0;
 }
+
+static inline u64 *xsk_umem_peek_addr_rq(struct xdp_umem *umem, u64 *addr)
+{
+	return NULL;
+}
+
+static inline void xsk_umem_discard_addr_rq(struct xdp_umem *umem)
+{
+}
+
+static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
+{
+}
+
 #endif /* CONFIG_XDP_SOCKETS */
 
 #endif /* _LINUX_XDP_SOCK_H */
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index b3b632c5aeae..555427b3e0fe 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -165,6 +165,8 @@ static void xdp_umem_release(struct xdp_umem *umem)
 		umem->cq = NULL;
 	}
 
+	xsk_reuseq_destroy(umem);
+
 	xdp_umem_unpin_pages(umem);
 
 	task = get_pid_task(umem->pid, PIDTYPE_PID);
diff --git a/net/xdp/xsk_queue.c b/net/xdp/xsk_queue.c
index 2dc1384d9f27..b66504592d9b 100644
--- a/net/xdp/xsk_queue.c
+++ b/net/xdp/xsk_queue.c
@@ -3,7 +3,9 @@
  * Copyright(c) 2018 Intel Corporation.
  */
 
+#include <linux/log2.h>
 #include <linux/slab.h>
+#include <linux/overflow.h>
 
 #include "xsk_queue.h"
 
@@ -62,3 +64,56 @@ void xskq_destroy(struct xsk_queue *q)
 	page_frag_free(q->ring);
 	kfree(q);
 }
+
+struct xdp_umem_fq_reuse *xsk_reuseq_prepare(u32 nentries)
+{
+	struct xdp_umem_fq_reuse *newq;
+
+	/* Check for overflow */
+	if (nentries > (u32)roundup_pow_of_two(nentries))
+		return NULL;
+	nentries = roundup_pow_of_two(nentries);
+
+	newq = kvmalloc(struct_size(newq, handles, nentries), GFP_KERNEL);
+	if (!newq)
+		return NULL;
+	memset(newq, 0, offsetof(typeof(*newq), handles));
+
+	newq->nentries = nentries;
+	return newq;
+}
+EXPORT_SYMBOL_GPL(xsk_reuseq_prepare);
+
+struct xdp_umem_fq_reuse *xsk_reuseq_swap(struct xdp_umem *umem,
+					  struct xdp_umem_fq_reuse *newq)
+{
+	struct xdp_umem_fq_reuse *oldq = umem->fq_reuse;
+
+	if (!oldq) {
+		umem->fq_reuse = newq;
+		return NULL;
+	}
+
+	if (newq->nentries < oldq->length)
+		return newq;
+
+	memcpy(newq->handles, oldq->handles,
+	       array_size(oldq->length, sizeof(u64)));
+	newq->length = oldq->length;
+
+	umem->fq_reuse = newq;
+	return oldq;
+}
+EXPORT_SYMBOL_GPL(xsk_reuseq_swap);
+
+void xsk_reuseq_free(struct xdp_umem_fq_reuse *rq)
+{
+	kvfree(rq);
+}
+EXPORT_SYMBOL_GPL(xsk_reuseq_free);
+
+void xsk_reuseq_destroy(struct xdp_umem *umem)
+{
+	xsk_reuseq_free(umem->fq_reuse);
+	umem->fq_reuse = NULL;
+}
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 82252cccb4e0..bcb5cbb40419 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -258,4 +258,7 @@ void xskq_set_umem(struct xsk_queue *q, u64 size, u64 chunk_mask);
 struct xsk_queue *xskq_create(u32 nentries, bool umem_queue);
 void xskq_destroy(struct xsk_queue *q_ops);
 
+/* Executed by the core when the entire UMEM gets freed */
+void xsk_reuseq_destroy(struct xdp_umem *umem);
+
 #endif /* _LINUX_XSK_QUEUE_H */
-- 
2.17.1

^ permalink raw reply related

* [PATCH v2 0/4] i40e AF_XDP zero-copy buffer leak fixes
From: Björn Töpel @ 2018-09-07  8:18 UTC (permalink / raw)
  To: ast, daniel, jeffrey.t.kirsher, intel-wired-lan, jakub.kicinski
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, netdev

From: Björn Töpel <bjorn.topel@intel.com>

NB! The v1 was sent via the bpf-next tree. This time the series is
routed via JeffK's Intel Wired tree to minimize the risk for i40e
merge conflicts.

This series addresses an AF_XDP zero-copy issue that buffers passed
from userspace to the kernel was leaked when the hardware descriptor
ring was torn down.

The patches fixes the i40e AF_XDP zero-copy implementation.

Thanks to Jakub Kicinski for pointing this out!

Some background for folks that don't know the details: A zero-copy
capable driver picks buffers off the fill ring and places them on the
hardware Rx ring to be completed at a later point when DMA is
complete. Similar on the Tx side; The driver picks buffers off the Tx
ring and places them on the Tx hardware ring.

In the typical flow, the Rx buffer will be placed onto an Rx ring
(completed to the user), and the Tx buffer will be placed on the
completion ring to notify the user that the transfer is done.

However, if the driver needs to tear down the hardware rings for some
reason (interface goes down, reconfiguration and such), the userspace
buffers cannot be leaked. They have to be reused or completed back to
userspace.

The implementation does the following:

* Outstanding Tx descriptors will be passed to the completion
  ring. The Tx code has back-pressure mechanism in place, so that
  enough empty space in the completion ring is guaranteed.

* Outstanding Rx descriptors are temporarily stored on a stash/reuse
  queue. The reuse queue is based on Jakub's RFC. When/if the HW rings
  comes up again, entries from the stash are used to re-populate the
  ring.

* When AF_XDP ZC is enabled, disallow changing the number of hardware
  descriptors via ethtool. Otherwise, the size of the stash/reuse
  queue can grow unbounded.

Going forward, introducing a "zero-copy allocator" analogous to Jesper
Brouer's page pool would be a more robust and reuseable solution.

v1->v2: Address kbuild "undefined symbols" error when building with
        !CONFIG_XDP_SOCKETS.

Thanks!
Björn


Björn Töpel (3):
  i40e: clean zero-copy XDP Tx ring on shutdown/reset
  i40e: clean zero-copy XDP Rx ring on shutdown/reset
  i40e: disallow changing the number of descriptors when AF_XDP is on

Jakub Kicinski (1):
  net: xsk: add a simple buffer reuse queue

 .../net/ethernet/intel/i40e/i40e_ethtool.c    |   9 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  21 ++-
 .../ethernet/intel/i40e/i40e_txrx_common.h    |   4 +
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 152 +++++++++++++++++-
 include/net/xdp_sock.h                        |  69 ++++++++
 net/xdp/xdp_umem.c                            |   2 +
 net/xdp/xsk_queue.c                           |  55 +++++++
 net/xdp/xsk_queue.h                           |   3 +
 8 files changed, 299 insertions(+), 16 deletions(-)

-- 
2.17.1

^ permalink raw reply

* [PATCH v2 1/4] i40e: clean zero-copy XDP Tx ring on shutdown/reset
From: Björn Töpel @ 2018-09-07  8:18 UTC (permalink / raw)
  To: ast, daniel, jeffrey.t.kirsher, intel-wired-lan, jakub.kicinski
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, netdev
In-Reply-To: <20180907081848.5438-1-bjorn.topel@gmail.com>

From: Björn Töpel <bjorn.topel@intel.com>

When the zero-copy enabled XDP Tx ring is torn down, due to
configuration changes, outstandning frames on the hardware descriptor
ring are queued on the completion ring.

The completion ring has a back-pressure mechanism that will guarantee
that there is sufficient space on the ring.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 17 +++++++----
 .../ethernet/intel/i40e/i40e_txrx_common.h    |  2 ++
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 30 +++++++++++++++++++
 3 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 37bd4e50ccde..7f85d4ba8b54 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -636,13 +636,18 @@ void i40e_clean_tx_ring(struct i40e_ring *tx_ring)
 	unsigned long bi_size;
 	u16 i;
 
-	/* ring already cleared, nothing to do */
-	if (!tx_ring->tx_bi)
-		return;
+	if (ring_is_xdp(tx_ring) && tx_ring->xsk_umem) {
+		i40e_xsk_clean_tx_ring(tx_ring);
+	} else {
+		/* ring already cleared, nothing to do */
+		if (!tx_ring->tx_bi)
+			return;
 
-	/* Free all the Tx ring sk_buffs */
-	for (i = 0; i < tx_ring->count; i++)
-		i40e_unmap_and_free_tx_resource(tx_ring, &tx_ring->tx_bi[i]);
+		/* Free all the Tx ring sk_buffs */
+		for (i = 0; i < tx_ring->count; i++)
+			i40e_unmap_and_free_tx_resource(tx_ring,
+							&tx_ring->tx_bi[i]);
+	}
 
 	bi_size = sizeof(struct i40e_tx_buffer) * tx_ring->count;
 	memset(tx_ring->tx_bi, 0, bi_size);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
index b5afd479a9c5..29c68b29d36f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
@@ -87,4 +87,6 @@ static inline void i40e_arm_wb(struct i40e_ring *tx_ring,
 	}
 }
 
+void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring);
+
 #endif /* I40E_TXRX_COMMON_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 2ebfc78bbd09..99116277c4d2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -830,3 +830,33 @@ int i40e_xsk_async_xmit(struct net_device *dev, u32 queue_id)
 
 	return 0;
 }
+
+/**
+ * i40e_xsk_clean_xdp_ring - Clean the XDP Tx ring on shutdown
+ * @xdp_ring: XDP Tx ring
+ **/
+void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring)
+{
+	u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
+	struct xdp_umem *umem = tx_ring->xsk_umem;
+	struct i40e_tx_buffer *tx_bi;
+	u32 xsk_frames = 0;
+
+	while (ntc != ntu) {
+		tx_bi = &tx_ring->tx_bi[ntc];
+
+		if (tx_bi->xdpf)
+			i40e_clean_xdp_tx_buffer(tx_ring, tx_bi);
+		else
+			xsk_frames++;
+
+		tx_bi->xdpf = NULL;
+
+		ntc++;
+		if (ntc > tx_ring->count)
+			ntc = 0;
+	}
+
+	if (xsk_frames)
+		xsk_umem_complete_tx(umem, xsk_frames);
+}
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH v3 1/2] net: ethernet: i40e: fix build error
From: Wang, Dongsheng @ 2018-09-07 12:40 UTC (permalink / raw)
  To: jeffrey.t.kirsher@intel.com, sergei.shtylyov@cogentembedded.com
  Cc: jacob.e.keller@intel.com, davem@davemloft.net,
	intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, carolyn.wyborny@intel.com
In-Reply-To: <1536319175-3660-2-git-send-email-dongsheng.wang@hxt-semitech.com>

Hello all,

The i40e_ethtool_stats.h is just included by i40e/i40e_ethtool.c. So the
static doesn't make any affect. And Carolyn's team is working rebuild
i40e and i40evf.

Cheers,
Dongsheng

On 9/7/2018 7:19 PM, Wang, Dongsheng wrote:
> Remove "inline" from __i40e_add_stat_strings and move the function.
>
> In file included from
> drivers/net/ethernet/intel/i40e/i40e_ethtool.c:9:0:
> drivers/net/ethernet/intel/i40e/i40e_ethtool.c: In function
> __i40e_add_stat_string:
> drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h:193:20: error:
> function __i40e_add_stat_strings can never be inlined because it uses
> variable argument lists
>  static inline void __i40e_add_stat_strings(u8 **p, const struct
> 					    i40e_stats stats[],
>
> Tested on: x86_64, make ARCH=i386
>
> Modules section .text:
> i40e: 00019380 <__i40e_add_stat_strings>:
> i40evf: 00006b00 <__i40e_add_stat_strings>:
>
> Buildin section .text:
> i40e: c351ca60 <__i40e_add_stat_strings>:
> i40evf: c354f2c0 <__i40e_add_stat_strings>:
>
> Signed-off-by: Wang Dongsheng <dongsheng.wang@hxt-semitech.com>
> ---
> V3: add static 
> V2: Move function
> ---
>  .../net/ethernet/intel/i40e/i40e_ethtool.c    | 24 ++++++++++++++++++
>  .../ethernet/intel/i40e/i40e_ethtool_stats.h  | 25 ++-----------------
>  2 files changed, 26 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> index d7d3974beca2..34121a72f2e7 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> @@ -1821,6 +1821,30 @@ static void i40e_get_ethtool_stats(struct net_device *netdev,
>  		  "ethtool stats count mismatch!");
>  }
>  
> +/**
> + * __i40e_add_stat_strings - copy stat strings into ethtool buffer
> + * @p: ethtool supplied buffer
> + * @stats: stat definitions array
> + * @size: size of the stats array
> + *
> + * Format and copy the strings described by stats into the buffer pointed at
> + * by p.
> + **/
> +static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
> +				    const unsigned int size, ...)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < size; i++) {
> +		va_list args;
> +
> +		va_start(args, size);
> +		vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
> +		*p += ETH_GSTRING_LEN;
> +		va_end(args);
> +	}
> +}
> +
>  /**
>   * i40e_get_stat_strings - copy stat strings into supplied buffer
>   * @netdev: the netdev to collect strings for
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h b/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
> index bba1cb0b658f..553b0d720839 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
> @@ -181,29 +181,8 @@ i40e_add_queue_stats(u64 **data, struct i40e_ring *ring)
>  	*data += size;
>  }
>  
> -/**
> - * __i40e_add_stat_strings - copy stat strings into ethtool buffer
> - * @p: ethtool supplied buffer
> - * @stats: stat definitions array
> - * @size: size of the stats array
> - *
> - * Format and copy the strings described by stats into the buffer pointed at
> - * by p.
> - **/
> -static inline void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
> -				    const unsigned int size, ...)
> -{
> -	unsigned int i;
> -
> -	for (i = 0; i < size; i++) {
> -		va_list args;
> -
> -		va_start(args, size);
> -		vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
> -		*p += ETH_GSTRING_LEN;
> -		va_end(args);
> -	}
> -}
> +static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
> +				    const unsigned int size, ...);
>  
>  /**
>   * 40e_add_stat_strings - copy stat strings into ethtool buffer

^ permalink raw reply

* Re: [PATCH v3 2/2] net: ethernet: i40evf: fix underlying build error
From: Wang, Dongsheng @ 2018-09-07 12:34 UTC (permalink / raw)
  To: jeffrey.t.kirsher@intel.com, sergei.shtylyov@cogentembedded.com
  Cc: jacob.e.keller@intel.com, davem@davemloft.net,
	intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, carolyn.wyborny@intel.com
In-Reply-To: <1536319175-3660-3-git-send-email-dongsheng.wang@hxt-semitech.com>

Hello Jacob,

Since Carolyn' team is working this, I think we don't need this patch
anymore because this header file is only for ethtool.c.

The cross-including scenario simply doesn't exist.

If you agree, please ignore this patch and just only review the [1/2].


Cheers,
Dongsheng

On 9/7/2018 7:19 PM, Wang, Dongsheng wrote:
> Can't have non-inline function in a header file.
> There is a risk of "Multiple definition" from cross-including.
>
> Tested on: x86_64, make ARCH=i386
>
> Modules section .text:
> i40e: 00019380 <__i40e_add_stat_strings>:
> i40evf: 00006b00 <__i40e_add_stat_strings>:
>
> Buildin section .text:
> i40e: c351ca60 <__i40e_add_stat_strings>:
> i40evf: c354f2c0 <__i40e_add_stat_strings>:
>
> Signed-off-by: Wang Dongsheng <dongsheng.wang@hxt-semitech.com>
> ---
> V3: add static 
> ---
>  .../intel/i40evf/i40e_ethtool_stats.h         | 23 +-----------------
>  .../ethernet/intel/i40evf/i40evf_ethtool.c    | 24 +++++++++++++++++++
>  2 files changed, 25 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h b/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h
> index 60b595dd8c39..62ab67a77753 100644
> --- a/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h
> +++ b/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h
> @@ -181,29 +181,8 @@ i40evf_add_queue_stats(u64 **data, struct i40e_ring *ring)
>  	*data += size;
>  }
>  
> -/**
> - * __i40e_add_stat_strings - copy stat strings into ethtool buffer
> - * @p: ethtool supplied buffer
> - * @stats: stat definitions array
> - * @size: size of the stats array
> - *
> - * Format and copy the strings described by stats into the buffer pointed at
> - * by p.
> - **/
>  static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
> -				    const unsigned int size, ...)
> -{
> -	unsigned int i;
> -
> -	for (i = 0; i < size; i++) {
> -		va_list args;
> -
> -		va_start(args, size);
> -		vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
> -		*p += ETH_GSTRING_LEN;
> -		va_end(args);
> -	}
> -}
> +				    const unsigned int size, ...);
>  
>  /**
>   * 40e_add_stat_strings - copy stat strings into ethtool buffer
> diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
> index 9319971c5c92..eb2e910bc3a1 100644
> --- a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
> +++ b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
> @@ -171,6 +171,30 @@ static void i40evf_get_priv_flag_strings(struct net_device *netdev, u8 *data)
>  	}
>  }
>  
> +/**
> + * __i40e_add_stat_strings - copy stat strings into ethtool buffer
> + * @p: ethtool supplied buffer
> + * @stats: stat definitions array
> + * @size: size of the stats array
> + *
> + * Format and copy the strings described by stats into the buffer pointed at
> + * by p.
> + **/
> +static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
> +				    const unsigned int size, ...)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < size; i++) {
> +		va_list args;
> +
> +		va_start(args, size);
> +		vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
> +		*p += ETH_GSTRING_LEN;
> +		va_end(args);
> +	}
> +}
> +
>  /**
>   * i40evf_get_stat_strings - Get stat strings
>   * @netdev: network interface device structure

^ permalink raw reply

* Re: [PATCH 1/7] fix hnode refcounting
From: Jamal Hadi Salim @ 2018-09-07 12:33 UTC (permalink / raw)
  To: Al Viro; +Cc: netdev, Cong Wang, Jiri Pirko, stable
In-Reply-To: <612ba054-370d-d118-b439-c68ea466eec9@mojatatu.com>

To clarify with an example i used to test
your patches:

#0 add ingress filter
$TC qdisc add dev $P ingress
#1 add filter
$TC filter add dev $P parent ffff: protocol ip prio 10 \
u32 match ip protocol 1 0xff
#2 display
$TC filter ls dev $P parent ffff:

#3 try to delete root
$TC filter delete dev $P parent ffff: protocol ip prio 10 \
handle 800: u32

#4 nothing changes..
$TC filter ls dev $P parent ffff:
#5 delete filter
$TC filter delete dev $P parent ffff: protocol ip prio 10 \
handle 800:0:800 u32
#6 filter gone but hash table still there..
$TC filter ls dev $P parent ffff:
#7 delete tp
$TC filter delete dev $P parent ffff: protocol ip prio 10 \
u32
#8 now it is gone..
$TC filter ls dev $P parent ffff:

your patches show #6 filter as still active.
We want it to look like #8

Hope this helps.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH 3/4] of: Convert to using %pOFn instead of device_node.name
From: Thierry Reding @ 2018-09-07 12:29 UTC (permalink / raw)
  To: Rob Herring
  Cc: Frank Rowand, devicetree, linux-kernel, Andrew Lunn,
	Florian Fainelli, netdev
In-Reply-To: <20180828155254.10709-4-robh@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 2503 bytes --]

On Tue, Aug 28, 2018 at 10:52:53AM -0500, Rob Herring wrote:
> In preparation to remove the node name pointer from struct device_node,
> convert printf users to use the %pOFn format specifier.
> 
> Cc: Frank Rowand <frowand.list@gmail.com>
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: Florian Fainelli <f.fainelli@gmail.com>
> Cc: devicetree@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Signed-off-by: Rob Herring <robh@kernel.org>
> ---
>  drivers/of/device.c   |  4 ++--
>  drivers/of/of_mdio.c  | 12 ++++++------
>  drivers/of/of_numa.c  |  4 ++--
>  drivers/of/overlay.c  |  4 ++--
>  drivers/of/platform.c |  8 ++++----
>  drivers/of/unittest.c | 12 ++++++------
>  6 files changed, 22 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/of/device.c b/drivers/of/device.c
> index 5957cd4fa262..daa075d87317 100644
> --- a/drivers/of/device.c
> +++ b/drivers/of/device.c
> @@ -219,7 +219,7 @@ static ssize_t of_device_get_modalias(struct device *dev, char *str, ssize_t len
>  		return -ENODEV;
>  
>  	/* Name & Type */
> -	csize = snprintf(str, len, "of:N%sT%s", dev->of_node->name,
> +	csize = snprintf(str, len, "of:N%pOFnT%s", dev->of_node,
>  			 dev->of_node->type);
>  	tsize = csize;
>  	len -= csize;

This seems to cause the modalias to be improperly constructed. As a
consequence, automatic module loading at boot time is now broken. I
think the reason why this fails is because vsnprintf() will skip all
alpha-numeric characters after a call to pointer(). Presumably this
is meant to be a generic way of skipping whatever specifiers we throw
at it.

Unfortunately for the case of OF modaliases, this means that the 'T'
character gets eaten, so we end up with something like this:

	# udevadm info /sys/bus/platform/devices/54200000.dc
	[...]
	E: MODALIAS=of:Ndc<NULL>Cnvidia,tegra124-dc
	[...]

instead of this:

	# udevadm info /sys/bus/platform/devices/54200000.dc
	[...]
	E: MODALIAS=of:NdcT<NULL>Cnvidia,tegra124-dc
	[...]

Everything is back to normal if I revert this patch. However, since
that's obviously not what we want, I think perhaps what we need is a
way for pointer() (and its implementations) to report back how many
characters in the format string it consumed so that we can support
these kinds of back-to-back strings.

If nobody else has the time I can look into coding up a fix, but in the
meantime it might be best to back this one out until we can handle the
OF modalias format string.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH] xen/netfront: fix waiting for xenbus state change
From: Juergen Gross @ 2018-09-07 12:21 UTC (permalink / raw)
  To: linux-kernel, xen-devel, netdev
  Cc: boris.ostrovsky, davem, Juergen Gross, stable

Commit 822fb18a82aba ("xen-netfront: wait xenbus state change when load
module manually") added a new wait queue to wait on for a state change
when the module is loaded manually. Unfortunately there is no wakeup
anywhere to stop that waiting.

Instead of introducing a new wait queue rename the existing
module_unload_q to module_wq and use it for both purposes (loading and
unloading).

As any state change of the backend might be intended to stop waiting
do the wake_up_all() in any case when netback_changed() is called.

Fixes: 822fb18a82aba ("xen-netfront: wait xenbus state change when load module manually")
Cc: <stable@vger.kernel.org> #4.18
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/net/xen-netfront.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 73f596a90c69..9407acbd19a9 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -87,8 +87,7 @@ struct netfront_cb {
 /* IRQ name is queue name with "-tx" or "-rx" appended */
 #define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3)
 
-static DECLARE_WAIT_QUEUE_HEAD(module_load_q);
-static DECLARE_WAIT_QUEUE_HEAD(module_unload_q);
+static DECLARE_WAIT_QUEUE_HEAD(module_wq);
 
 struct netfront_stats {
 	u64			packets;
@@ -1332,11 +1331,11 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 	netif_carrier_off(netdev);
 
 	xenbus_switch_state(dev, XenbusStateInitialising);
-	wait_event(module_load_q,
-			   xenbus_read_driver_state(dev->otherend) !=
-			   XenbusStateClosed &&
-			   xenbus_read_driver_state(dev->otherend) !=
-			   XenbusStateUnknown);
+	wait_event(module_wq,
+		   xenbus_read_driver_state(dev->otherend) !=
+		   XenbusStateClosed &&
+		   xenbus_read_driver_state(dev->otherend) !=
+		   XenbusStateUnknown);
 	return netdev;
 
  exit:
@@ -2010,15 +2009,14 @@ static void netback_changed(struct xenbus_device *dev,
 
 	dev_dbg(&dev->dev, "%s\n", xenbus_strstate(backend_state));
 
+	wake_up_all(&module_wq);
+
 	switch (backend_state) {
 	case XenbusStateInitialising:
 	case XenbusStateInitialised:
 	case XenbusStateReconfiguring:
 	case XenbusStateReconfigured:
-		break;
-
 	case XenbusStateUnknown:
-		wake_up_all(&module_unload_q);
 		break;
 
 	case XenbusStateInitWait:
@@ -2034,12 +2032,10 @@ static void netback_changed(struct xenbus_device *dev,
 		break;
 
 	case XenbusStateClosed:
-		wake_up_all(&module_unload_q);
 		if (dev->state == XenbusStateClosed)
 			break;
 		/* Missed the backend's CLOSING state -- fallthrough */
 	case XenbusStateClosing:
-		wake_up_all(&module_unload_q);
 		xenbus_frontend_closed(dev);
 		break;
 	}
@@ -2147,14 +2143,14 @@ static int xennet_remove(struct xenbus_device *dev)
 
 	if (xenbus_read_driver_state(dev->otherend) != XenbusStateClosed) {
 		xenbus_switch_state(dev, XenbusStateClosing);
-		wait_event(module_unload_q,
+		wait_event(module_wq,
 			   xenbus_read_driver_state(dev->otherend) ==
 			   XenbusStateClosing ||
 			   xenbus_read_driver_state(dev->otherend) ==
 			   XenbusStateUnknown);
 
 		xenbus_switch_state(dev, XenbusStateClosed);
-		wait_event(module_unload_q,
+		wait_event(module_wq,
 			   xenbus_read_driver_state(dev->otherend) ==
 			   XenbusStateClosed ||
 			   xenbus_read_driver_state(dev->otherend) ==
-- 
2.16.4

^ permalink raw reply related

* Re: [PATCH] net/sock: move memory_allocated over to percpu_counter variables
From: Eric Dumazet @ 2018-09-07  7:21 UTC (permalink / raw)
  To: Olof Johansson
  Cc: Herbert Xu, David Miller, Neil Horman, Marcelo Ricardo Leitner,
	Vladislav Yasevich, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	linux-crypto, LKML, linux-sctp, netdev, linux-decnet-user,
	kernel-team, Yuchung Cheng, Neal Cardwell
In-Reply-To: <CANn89iKJcgMWb2Kmk6L9k=NkfBUKZ6BwriWr3O+N5Y0u5dy=9g@mail.gmail.com>

On Fri, Sep 7, 2018 at 12:03 AM Eric Dumazet <edumazet@google.com> wrote:

> Problem is : we have platforms with more than 100 cpus, and
> sk_memory_allocated() cost will be too expensive,
> especially if the host is under memory pressure, since all cpus will
> touch their private counter.
>
> per cpu variables do not really scale, they were ok 10 years ago when
> no more than 16 cpus were the norm.
>
> I would prefer change TCP to not aggressively call
> __sk_mem_reduce_allocated() from tcp_write_timer()
>
> Ideally only tcp_retransmit_timer() should attempt to reduce forward
> allocations, after recurring timeout.
>
> Note that after 20c64d5cd5a2bdcdc8982a06cb05e5e1bd851a3d ("net: avoid
> sk_forward_alloc overflows")
> we have better control over sockets having huge forward allocations.
>
> Something like :

Or something less risky :

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 7fdf222a0bdfe9775970082f6b5dcdcc82b2ae1a..0aee80b6966cb2898e46350c761f9eb431ff1206
100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -604,7 +604,8 @@ void tcp_write_timer_handler(struct sock *sk)
        }

 out:
-       sk_mem_reclaim(sk);
+       if (tcp_under_memory_pressure(sk))
+               sk_mem_reclaim(sk);
 }

 static void tcp_write_timer(struct timer_list *t)

^ permalink raw reply

* Re: [Xen-devel] [PATCH] xen-netfront: wait xenbus state change when load module manually
From: Juergen Gross @ 2018-09-07 11:30 UTC (permalink / raw)
  To: Jiri Slaby, Boris Ostrovsky, Xiao Liang, David Miller
  Cc: netdev, linux-kernel, xen-devel
In-Reply-To: <172007cc-6a82-99dd-2d1e-5103a6c1b9b0@suse.cz>

On 07/09/18 13:06, Jiri Slaby wrote:
> On 08/24/2018, 04:26 PM, Boris Ostrovsky wrote:
>> On 08/24/2018 07:26 AM, Juergen Gross wrote:
>>> On 24/08/18 13:12, Jiri Slaby wrote:
>>>> On 07/30/2018, 10:18 AM, Xiao Liang wrote:
>>>>> On 07/29/2018 11:30 PM, David Miller wrote:
>>>>>> From: Xiao Liang <xiliang@redhat.com>
>>>>>> Date: Fri, 27 Jul 2018 17:56:08 +0800
>>>>>>
>>>>>>> @@ -1330,6 +1331,11 @@ static struct net_device
>>>>>>> *xennet_create_dev(struct xenbus_device *dev)
>>>>>>>       netif_carrier_off(netdev);
>>>>>>>         xenbus_switch_state(dev, XenbusStateInitialising);
>>>>>>> +    wait_event(module_load_q,
>>>>>>> +               xenbus_read_driver_state(dev->otherend) !=
>>>>>>> +               XenbusStateClosed &&
>>>>>>> +               xenbus_read_driver_state(dev->otherend) !=
>>>>>>> +               XenbusStateUnknown);
>>>>>>>       return netdev;
>>>>>>>      exit:
>>>>>> What performs the wakeups that will trigger for this sleep site?
>>>>> In my understanding, backend leaving closed/unknow state can trigger the
>>>>> wakeups. I mean to make sure both sides are ready for creating connection.
>>>> While backporting this to 4.12, I was surprised by the commit the same
>>>> as Boris and David.
>>>>
>>>> So I assume the explanation is that wake_up_all of module_unload_q in
>>>> netback_changed wakes also all the processes waiting on module_load_q?
>>>> If so, what makes sure that module_unload_q is queued and the process is
>>>> the same as for module_load_q?
>>> How could it? Either the thread is waiting on module_unload_q _or_ on
>>> module_load_q. It can't wait on two queues at the same time.
>>>
>>>> To me, it looks rather error-prone. Unless it is erroneous now, at least
>>>> for future changes. Wouldn't it make sense to wake up module_load_q
>>>> along with module_unload_q in netback_changed? Or drop module_load_q
>>>> completely and use only module_unload_q (i.e. in xennet_create_dev too)?
>>> To me this looks just wrong. A thread waiting on module_load_q won't be
>>> woken up again.
>>>
>>> I'd drop module_load_q in favor of module_unload_q.
>>
>>
>> Yes, use single queue, but rename it to something more neutral. module_wq?
> 
> Can somebody who is actually using the module fix this, please?

Already at it.


Juergen

^ permalink raw reply

* Re: [PATCH] PCI: Reprogram bridge prefetch registers on resume
From: Sinan Kaya @ 2018-09-07  6:40 UTC (permalink / raw)
  To: Daniel Drake, bhelgaas
  Cc: linux-pci, linux, nouveau, linux-pm, peter, kherbst,
	andriy.shevchenko, rafael.j.wysocki, keith.busch, mika.westerberg,
	jonathan.derrick, kugel, davem, hkallweit1, netdev, nic_swsd
In-Reply-To: <20180907053614.6540-1-drake@endlessm.com>

On 9/6/2018 10:36 PM, Daniel Drake wrote:
> +	if (pci_dev->class == PCI_CLASS_BRIDGE_PCI << 8)
> +		pci_setup_bridge_mmio_pref(pci_dev);

This should probably some kind of a quirk rather than default
for the listed card as it sounds like you are dealing with
broken hardware.

^ permalink raw reply

* [PATCH v3 2/2] net: ethernet: i40evf: fix underlying build error
From: Wang Dongsheng @ 2018-09-07 11:19 UTC (permalink / raw)
  To: jeffrey.t.kirsher, sergei.shtylyov
  Cc: jacob.e.keller, davem, intel-wired-lan, netdev, linux-kernel,
	Wang Dongsheng
In-Reply-To: <1536319175-3660-1-git-send-email-dongsheng.wang@hxt-semitech.com>

Can't have non-inline function in a header file.
There is a risk of "Multiple definition" from cross-including.

Tested on: x86_64, make ARCH=i386

Modules section .text:
i40e: 00019380 <__i40e_add_stat_strings>:
i40evf: 00006b00 <__i40e_add_stat_strings>:

Buildin section .text:
i40e: c351ca60 <__i40e_add_stat_strings>:
i40evf: c354f2c0 <__i40e_add_stat_strings>:

Signed-off-by: Wang Dongsheng <dongsheng.wang@hxt-semitech.com>
---
V3: add static 
---
 .../intel/i40evf/i40e_ethtool_stats.h         | 23 +-----------------
 .../ethernet/intel/i40evf/i40evf_ethtool.c    | 24 +++++++++++++++++++
 2 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h b/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h
index 60b595dd8c39..62ab67a77753 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h
@@ -181,29 +181,8 @@ i40evf_add_queue_stats(u64 **data, struct i40e_ring *ring)
 	*data += size;
 }
 
-/**
- * __i40e_add_stat_strings - copy stat strings into ethtool buffer
- * @p: ethtool supplied buffer
- * @stats: stat definitions array
- * @size: size of the stats array
- *
- * Format and copy the strings described by stats into the buffer pointed at
- * by p.
- **/
 static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
-				    const unsigned int size, ...)
-{
-	unsigned int i;
-
-	for (i = 0; i < size; i++) {
-		va_list args;
-
-		va_start(args, size);
-		vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
-		*p += ETH_GSTRING_LEN;
-		va_end(args);
-	}
-}
+				    const unsigned int size, ...);
 
 /**
  * 40e_add_stat_strings - copy stat strings into ethtool buffer
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
index 9319971c5c92..eb2e910bc3a1 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
@@ -171,6 +171,30 @@ static void i40evf_get_priv_flag_strings(struct net_device *netdev, u8 *data)
 	}
 }
 
+/**
+ * __i40e_add_stat_strings - copy stat strings into ethtool buffer
+ * @p: ethtool supplied buffer
+ * @stats: stat definitions array
+ * @size: size of the stats array
+ *
+ * Format and copy the strings described by stats into the buffer pointed at
+ * by p.
+ **/
+static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
+				    const unsigned int size, ...)
+{
+	unsigned int i;
+
+	for (i = 0; i < size; i++) {
+		va_list args;
+
+		va_start(args, size);
+		vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
+		*p += ETH_GSTRING_LEN;
+		va_end(args);
+	}
+}
+
 /**
  * i40evf_get_stat_strings - Get stat strings
  * @netdev: network interface device structure
-- 
2.18.0

^ permalink raw reply related

* [PATCH v3 0/2] net: ethernet: i40e/i40evf fix build error
From: Wang Dongsheng @ 2018-09-07 11:19 UTC (permalink / raw)
  To: jeffrey.t.kirsher, sergei.shtylyov
  Cc: jacob.e.keller, davem, intel-wired-lan, netdev, linux-kernel,
	Wang Dongsheng

i40e: function __i40e_add_stat_strings can never be inlined because it uses
variable argument lists.

i40evf: A risk of "Multiple definition" from cross-including.

I am working on aarch64 platform and unfortunately the config file is MODULE.
So I missed the buildin test... Sorry for pushing this patch to v3 version...

Now I tested this patch with buildin and modules.

Tested on: x86_64, make ARCH=i386

Modules section .text:
i40e: 00019380 <__i40e_add_stat_strings>:
i40evf: 00006b00 <__i40e_add_stat_strings>:

Buildin:
i40e: c351ca60 <__i40e_add_stat_strings>:
i40evf: c354f2c0 <__i40e_add_stat_strings>:


Wang Dongsheng (2):
  net: ethernet: i40e: fix build error
  net: ethernet: i40evf: fix underlying build error

 .../net/ethernet/intel/i40e/i40e_ethtool.c    | 24 ++++++++++++++++++
 .../ethernet/intel/i40e/i40e_ethtool_stats.h  | 25 ++-----------------
 .../intel/i40evf/i40e_ethtool_stats.h         | 23 +----------------
 .../ethernet/intel/i40evf/i40evf_ethtool.c    | 24 ++++++++++++++++++
 4 files changed, 51 insertions(+), 45 deletions(-)

-- 
2.18.0

^ permalink raw reply

* Re: [RFC PATCH bpf-next v2 0/4] Implement bpf queue/stack maps
From: Joe Stringer @ 2018-09-07  6:27 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: mauricio.vasquez, ast, daniel, netdev, Joe Stringer
In-Reply-To: <20180907001317.fj7f6fg6ihljompp@ast-mbp.dhcp.thefacebook.com>

On Thu, 6 Sep 2018 at 17:13, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> bpf_map_pop_elem() is trying to do lookup_and_delete and preserve
> validity of value without races.
> With pcpu_freelist I don't think there is a solution.
> We can have this queue/stack map without prealloc and use kmalloc/kfree
> back and forth. Performance will not be as great, but for your use case,
> I suspect, it will be good enough.
> The key issue with kmalloc/kfree is unbounded time of rcu callbacks.
> If somebody starts doing push/pop for every packet, the rcu subsystem
> will struggle and nothing we can do about it.
>
> The only way I could think of to resolve this problem is to reuse
> the logic that Joe is working on for socket lookups inside the program.
> Joe,
> how is that going? Could you repost the latest patches?

I can rebase & send them out. Was just wanting to get a little more testing in.

Cheers,
Joe

^ permalink raw reply

* Re: [Xen-devel] [PATCH] xen-netfront: wait xenbus state change when load module manually
From: Jiri Slaby @ 2018-09-07 11:06 UTC (permalink / raw)
  To: Boris Ostrovsky, Juergen Gross, Xiao Liang, David Miller
  Cc: netdev, linux-kernel, xen-devel
In-Reply-To: <1a6b2326-78b3-f7fa-fb3b-08c54f4f9761@oracle.com>

On 08/24/2018, 04:26 PM, Boris Ostrovsky wrote:
> On 08/24/2018 07:26 AM, Juergen Gross wrote:
>> On 24/08/18 13:12, Jiri Slaby wrote:
>>> On 07/30/2018, 10:18 AM, Xiao Liang wrote:
>>>> On 07/29/2018 11:30 PM, David Miller wrote:
>>>>> From: Xiao Liang <xiliang@redhat.com>
>>>>> Date: Fri, 27 Jul 2018 17:56:08 +0800
>>>>>
>>>>>> @@ -1330,6 +1331,11 @@ static struct net_device
>>>>>> *xennet_create_dev(struct xenbus_device *dev)
>>>>>>       netif_carrier_off(netdev);
>>>>>>         xenbus_switch_state(dev, XenbusStateInitialising);
>>>>>> +    wait_event(module_load_q,
>>>>>> +               xenbus_read_driver_state(dev->otherend) !=
>>>>>> +               XenbusStateClosed &&
>>>>>> +               xenbus_read_driver_state(dev->otherend) !=
>>>>>> +               XenbusStateUnknown);
>>>>>>       return netdev;
>>>>>>      exit:
>>>>> What performs the wakeups that will trigger for this sleep site?
>>>> In my understanding, backend leaving closed/unknow state can trigger the
>>>> wakeups. I mean to make sure both sides are ready for creating connection.
>>> While backporting this to 4.12, I was surprised by the commit the same
>>> as Boris and David.
>>>
>>> So I assume the explanation is that wake_up_all of module_unload_q in
>>> netback_changed wakes also all the processes waiting on module_load_q?
>>> If so, what makes sure that module_unload_q is queued and the process is
>>> the same as for module_load_q?
>> How could it? Either the thread is waiting on module_unload_q _or_ on
>> module_load_q. It can't wait on two queues at the same time.
>>
>>> To me, it looks rather error-prone. Unless it is erroneous now, at least
>>> for future changes. Wouldn't it make sense to wake up module_load_q
>>> along with module_unload_q in netback_changed? Or drop module_load_q
>>> completely and use only module_unload_q (i.e. in xennet_create_dev too)?
>> To me this looks just wrong. A thread waiting on module_load_q won't be
>> woken up again.
>>
>> I'd drop module_load_q in favor of module_unload_q.
> 
> 
> Yes, use single queue, but rename it to something more neutral. module_wq?

Can somebody who is actually using the module fix this, please?

I could fix it, but untested changes are "a bit" worse than tested changes.

thanks,
-- 
js
suse labs

^ permalink raw reply

* Re: [PATCH 09/12] iwlwifi: Remove local iwl_bcast_addr and use ether_broadcast_addr
From: Luciano Coelho @ 2018-09-07 10:46 UTC (permalink / raw)
  To: Joe Perches, linux-kernel
  Cc: Johannes Berg, Emmanuel Grumbach, Intel Linux Wireless,
	Kalle Valo, linux-wireless, netdev
In-Reply-To: <ebe5bbb03ca9c39e468d7f40d67d58733062a1f9.camel@intel.com>

On Sat, 2018-07-07 at 10:40 +0300, Luciano Coelho wrote:
> On Sat, 2018-03-31 at 00:05 -0700, Joe Perches wrote:
> > Use the new global to save a little bit of object code.
> > 
> > Signed-off-by: Joe Perches <joe@perches.com>
> > ---
> 
> I took this one to our internal tree and it will be applied as part
> of our normal upstreaming process.

It seems that the patch to etherdevice.h never got applied upstream, so
I'm actually dropping this patch.

Please resend if it becomes relevant again.

--
Cheers,
Luca.

^ permalink raw reply

* Re: [Nouveau] [PATCH] PCI: Reprogram bridge prefetch registers on resume
From: Lukas Wunner @ 2018-09-07  5:49 UTC (permalink / raw)
  To: Daniel Drake
  Cc: bhelgaas, mika.westerberg, linux-pm, linux-pci, rafael.j.wysocki,
	nic_swsd, keith.busch, netdev, nouveau, andriy.shevchenko, linux,
	davem, jonathan.derrick, hkallweit1
In-Reply-To: <20180907053614.6540-1-drake@endlessm.com>

On Fri, Sep 07, 2018 at 01:36:14PM +0800, Daniel Drake wrote:
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -934,6 +934,7 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn);
>  void pci_device_add(struct pci_dev *dev, struct pci_bus *bus);
>  unsigned int pci_scan_child_bus(struct pci_bus *bus);
>  void pci_bus_add_device(struct pci_dev *dev);
> +void pci_setup_bridge_mmio_pref(struct pci_dev *bridge);
>  void pci_read_bridge_bases(struct pci_bus *child);
>  struct resource *pci_find_parent_resource(const struct pci_dev *dev,
>  					  struct resource *res);

Since this is only used internally in the PCI core, the declaration
can live in drivers/pci/pci.h.

Thanks,

Lukas

^ permalink raw reply

* Re: [PATCH net-next] bnxt_en: remove set but not used variable 'addr_type'
From: David Miller @ 2018-09-07  4:55 UTC (permalink / raw)
  To: yuehaibing; +Cc: michael.chan, netdev, kernel-janitors
In-Reply-To: <1536147850-186781-1-git-send-email-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Wed, 5 Sep 2018 11:44:10 +0000

> Fixes gcc '-Wunused-but-set-variable' warning:
> 
> drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c: In function 'bnxt_tc_parse_flow':
> drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c:186:6: warning:
>  variable 'addr_type' set but not used [-Wunused-but-set-variable]
> 
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Looks correct to me.

Applied, thanks.

^ permalink raw reply

* Re: [Patch net v3] tipc: call start and done ops directly in __tipc_nl_compat_dumpit()
From: David Miller @ 2018-09-07  4:50 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, tipc-discussion, jon.maloy, ying.xue
In-Reply-To: <20180904215455.3985-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Tue,  4 Sep 2018 14:54:55 -0700

> __tipc_nl_compat_dumpit() uses a netlink_callback on stack,
> so the only way to align it with other ->dumpit() call path
> is calling tipc_dump_start() and tipc_dump_done() directly
> inside it. Otherwise ->dumpit() would always get NULL from
> cb->args[].
> 
> But tipc_dump_start() uses sock_net(cb->skb->sk) to retrieve
> net pointer, the cb->skb here doesn't set skb->sk, the net pointer
> is saved in msg->net instead, so introduce a helper function
> __tipc_dump_start() to pass in msg->net.
> 
> Ying pointed out cb->args[0...3] are already used by other
> callbacks on this call path, so we can't use cb->args[0] any
> more, use cb->args[4] instead.
> 
> Fixes: 9a07efa9aea2 ("tipc: switch to rhashtable iterator")
> Reported-and-tested-by: syzbot+e93a2c41f91b8e2c7d9b@syzkaller.appspotmail.com
> Cc: Jon Maloy <jon.maloy@ericsson.com>
> Cc: Ying Xue <ying.xue@windriver.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied, thanks Cong.

^ permalink raw reply

* Re: [PATCH net-next v2] openvswitch: Derive IP protocol number for IPv6 later frags
From: David Miller @ 2018-09-07  4:48 UTC (permalink / raw)
  To: yihung.wei; +Cc: netdev, pshelar, u9012063
In-Reply-To: <1536100421-16360-1-git-send-email-yihung.wei@gmail.com>

From: Yi-Hung Wei <yihung.wei@gmail.com>
Date: Tue,  4 Sep 2018 15:33:41 -0700

> Currently, OVS only parses the IP protocol number for the first
> IPv6 fragment, but sets the IP protocol number for the later fragments
> to be NEXTHDF_FRAGMENT.  This patch tries to derive the IP protocol
> number for the IPV6 later frags so that we can match that.
> 
> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 6/7] get rid of tc_u_common ->rcu
From: Al Viro @ 2018-09-07  4:28 UTC (permalink / raw)
  To: Cong Wang; +Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko
In-Reply-To: <CAM_iQpWnydTkLYwmZq2gQPnNBTS6E0iNNgy1zsmYFcXkKJAnTw@mail.gmail.com>

On Thu, Sep 06, 2018 at 09:18:47PM -0700, Cong Wang wrote:
> On Wed, Sep 5, 2018 at 12:04 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > From: Al Viro <viro@zeniv.linux.org.uk>
> >
> > unused
> >
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > ---
> >  net/sched/cls_u32.c | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
> > index 8a1a573487bd..be9240ae1417 100644
> > --- a/net/sched/cls_u32.c
> > +++ b/net/sched/cls_u32.c
> > @@ -98,7 +98,6 @@ struct tc_u_common {
> >         int                     refcnt;
> >         struct idr              handle_idr;
> >         struct hlist_node       hnode;
> > -       struct rcu_head         rcu;
> >  };
> 
> Just FYI:
> 
> This was added when RCU was introduced to u32 fast path,
> it looks like on fast path we never touch tc_u_common, we
> only use tp->root, all the rest are slow paths with RTNL lock,
> so it is probably fine to just remove it rather than converting
> that kfree() to kfree_rcu().

*nod*

In any case, if u32_classify() grows that dereference, we can always
re-add ->rcu at the same time.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox