Netdev List

Netdev List
 help / color / mirror / Atom feed

* [RFC PATCH] i40e: enable PCIe relax ordering for SPARC
From: Tushar Dave @ 2016-12-05 17:07 UTC (permalink / raw)
  To: jeffrey.t.kirsher, intel-wired-lan; +Cc: netdev

Unlike previous generation NIC (e.g. ixgbe) i40e doesn't seem to have
standard CSR where PCIe relaxed ordering can be set. Without PCIe relax
ordering enabled, i40e performance is significantly low on SPARC.

This patch sets PCIe relax ordering for SPARC arch by setting dma attr
DMA_ATTR_WEAK_ORDERING for every tx and rx DMA map/unmap.
This has shown 10x increase in performance numbers.

e.g.
iperf TCP test with 10 threads on SPARC S7

Test 1: Without this patch

[root@brm-snt1-03 net]# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40926
[  5] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40934
[  6] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40930
[  7] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40928
[  8] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40922
[  9] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40932
[ 10] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40920
[ 11] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40924
[ 14] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40982
[ 12] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40980
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-20.0 sec   566 MBytes   237 Mbits/sec
[  5]  0.0-20.0 sec   532 MBytes   223 Mbits/sec
[  6]  0.0-20.0 sec   537 MBytes   225 Mbits/sec
[  8]  0.0-20.0 sec   546 MBytes   229 Mbits/sec
[ 11]  0.0-20.0 sec   592 MBytes   248 Mbits/sec
[  7]  0.0-20.0 sec   539 MBytes   226 Mbits/sec
[  9]  0.0-20.0 sec   572 MBytes   240 Mbits/sec
[ 10]  0.0-20.0 sec   604 MBytes   253 Mbits/sec
[ 14]  0.0-20.0 sec   567 MBytes   238 Mbits/sec
[ 12]  0.0-20.0 sec   511 MBytes   214 Mbits/sec
[SUM]  0.0-20.0 sec  5.44 GBytes  2.33 Gbits/sec

Test 2: with this patch:

[root@brm-snt1-03 net]# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
TCP: request_sock_TCP: Possible SYN flooding on port 5001. Sending
cookies.  Check SNMP counters.
[  4] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46876
[  5] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46874
[  6] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46872
[  7] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46880
[  8] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46878
[  9] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46884
[ 10] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46886
[ 11] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46890
[ 12] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46888
[ 13] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46882
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-20.0 sec  7.45 GBytes  3.19 Gbits/sec
[  5]  0.0-20.0 sec  7.48 GBytes  3.21 Gbits/sec
[  7]  0.0-20.0 sec  7.34 GBytes  3.15 Gbits/sec
[  8]  0.0-20.0 sec  7.42 GBytes  3.18 Gbits/sec
[  9]  0.0-20.0 sec  7.24 GBytes  3.11 Gbits/sec
[ 10]  0.0-20.0 sec  7.40 GBytes  3.17 Gbits/sec
[ 12]  0.0-20.0 sec  7.49 GBytes  3.21 Gbits/sec
[  6]  0.0-20.0 sec  7.30 GBytes  3.13 Gbits/sec
[ 11]  0.0-20.0 sec  7.44 GBytes  3.19 Gbits/sec
[ 13]  0.0-20.0 sec  7.22 GBytes  3.10 Gbits/sec
[SUM]  0.0-20.0 sec  73.8 GBytes  31.6 Gbits/sec

NOTE: In my testing, this patch does _not_ show any harm to i40e
performance numbers on x86.

Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 69 ++++++++++++++++++++---------
 drivers/net/ethernet/intel/i40e/i40e_txrx.h |  1 +
 2 files changed, 49 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 6287bf6..800dca7 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -551,15 +551,17 @@ static void i40e_unmap_and_free_tx_resource(struct i40e_ring *ring,
 		else
 			dev_kfree_skb_any(tx_buffer->skb);
 		if (dma_unmap_len(tx_buffer, len))
-			dma_unmap_single(ring->dev,
-					 dma_unmap_addr(tx_buffer, dma),
-					 dma_unmap_len(tx_buffer, len),
-					 DMA_TO_DEVICE);
+			dma_unmap_single_attrs(ring->dev,
+					       dma_unmap_addr(tx_buffer, dma),
+					       dma_unmap_len(tx_buffer, len),
+					       DMA_TO_DEVICE,
+					       ring->dma_attrs);
 	} else if (dma_unmap_len(tx_buffer, len)) {
-		dma_unmap_page(ring->dev,
-			       dma_unmap_addr(tx_buffer, dma),
-			       dma_unmap_len(tx_buffer, len),
-			       DMA_TO_DEVICE);
+		dma_unmap_single_attrs(ring->dev,
+				       dma_unmap_addr(tx_buffer, dma),
+				       dma_unmap_len(tx_buffer, len),
+				       DMA_TO_DEVICE,
+				       ring->dma_attrs);
 	}
 
 	tx_buffer->next_to_watch = NULL;
@@ -662,6 +664,8 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
 	struct i40e_tx_buffer *tx_buf;
 	struct i40e_tx_desc *tx_head;
 	struct i40e_tx_desc *tx_desc;
+	dma_addr_t addr;
+	size_t size;
 	unsigned int total_bytes = 0, total_packets = 0;
 	unsigned int budget = vsi->work_limit;
 
@@ -696,10 +700,11 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
 		napi_consume_skb(tx_buf->skb, napi_budget);
 
 		/* unmap skb header data */
-		dma_unmap_single(tx_ring->dev,
-				 dma_unmap_addr(tx_buf, dma),
-				 dma_unmap_len(tx_buf, len),
-				 DMA_TO_DEVICE);
+		dma_unmap_single_attrs(tx_ring->dev,
+				       dma_unmap_addr(tx_buf, dma),
+				       dma_unmap_len(tx_buf, len),
+				       DMA_TO_DEVICE,
+				       tx_ring->dma_attrs);
 
 		/* clear tx_buffer data */
 		tx_buf->skb = NULL;
@@ -717,12 +722,15 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
 				tx_desc = I40E_TX_DESC(tx_ring, 0);
 			}
 
+			addr = dma_unmap_addr(tx_buf, dma);
+			size = dma_unmap_len(tx_buf, len);
 			/* unmap any remaining paged data */
 			if (dma_unmap_len(tx_buf, len)) {
-				dma_unmap_page(tx_ring->dev,
-					       dma_unmap_addr(tx_buf, dma),
-					       dma_unmap_len(tx_buf, len),
-					       DMA_TO_DEVICE);
+				dma_unmap_single_attrs(tx_ring->dev,
+						       addr,
+						       size,
+						       DMA_TO_DEVICE,
+						       tx_ring->dma_attrs);
 				dma_unmap_len_set(tx_buf, len, 0);
 			}
 		}
@@ -1010,6 +1018,11 @@ int i40e_setup_tx_descriptors(struct i40e_ring *tx_ring)
 	 */
 	tx_ring->size += sizeof(u32);
 	tx_ring->size = ALIGN(tx_ring->size, 4096);
+#ifdef CONFIG_SPARC
+	tx_ring->dma_attrs = DMA_ATTR_WEAK_ORDERING;
+#else
+	tx_ring->dma_attrs = 0;
+#endif
 	tx_ring->desc = dma_alloc_coherent(dev, tx_ring->size,
 					   &tx_ring->dma, GFP_KERNEL);
 	if (!tx_ring->desc) {
@@ -1053,7 +1066,11 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
 		if (!rx_bi->page)
 			continue;
 
-		dma_unmap_page(dev, rx_bi->dma, PAGE_SIZE, DMA_FROM_DEVICE);
+		dma_unmap_single_attrs(dev,
+				       rx_bi->dma,
+				       PAGE_SIZE,
+				       DMA_FROM_DEVICE,
+				       rx_ring->dma_attrs);
 		__free_pages(rx_bi->page, 0);
 
 		rx_bi->page = NULL;
@@ -1113,6 +1130,11 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
 	/* Round up to nearest 4K */
 	rx_ring->size = rx_ring->count * sizeof(union i40e_32byte_rx_desc);
 	rx_ring->size = ALIGN(rx_ring->size, 4096);
+#ifdef CONFIG_SPARC
+	rx_ring->dma_attrs = DMA_ATTR_WEAK_ORDERING;
+#else
+	rx_ring->dma_attrs = 0;
+#endif
 	rx_ring->desc = dma_alloc_coherent(dev, rx_ring->size,
 					   &rx_ring->dma, GFP_KERNEL);
 
@@ -1182,7 +1204,8 @@ static bool i40e_alloc_mapped_page(struct i40e_ring *rx_ring,
 	}
 
 	/* map page for use */
-	dma = dma_map_page(rx_ring->dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
+	dma = dma_map_single_attrs(rx_ring->dev, page_address(page), PAGE_SIZE,
+				   DMA_FROM_DEVICE, rx_ring->dma_attrs);
 
 	/* if mapping failed free memory back to system since
 	 * there isn't much point in holding memory we can't use
@@ -1695,8 +1718,11 @@ struct sk_buff *i40e_fetch_rx_buffer(struct i40e_ring *rx_ring,
 		rx_ring->rx_stats.page_reuse_count++;
 	} else {
 		/* we are not reusing the buffer so unmap it */
-		dma_unmap_page(rx_ring->dev, rx_buffer->dma, PAGE_SIZE,
-			       DMA_FROM_DEVICE);
+		dma_unmap_single_attrs(rx_ring->dev,
+				       rx_buffer->dma,
+				       PAGE_SIZE,
+				       DMA_FROM_DEVICE,
+				       rx_ring->dma_attrs);
 	}
 
 	/* clear contents of buffer_info */
@@ -2737,7 +2763,8 @@ static inline void i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 	first->skb = skb;
 	first->tx_flags = tx_flags;
 
-	dma = dma_map_single(tx_ring->dev, skb->data, size, DMA_TO_DEVICE);
+	dma = dma_map_single_attrs(tx_ring->dev, skb->data, size,
+				   DMA_TO_DEVICE, tx_ring->dma_attrs);
 
 	tx_desc = I40E_TX_DESC(tx_ring, i);
 	tx_bi = first;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index 5088405..9a86212 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -327,6 +327,7 @@ struct i40e_ring {
 
 	unsigned int size;		/* length of descriptor ring in bytes */
 	dma_addr_t dma;			/* physical address of ring */
+	unsigned long dma_attrs;	/* DMA attributes */
 
 	struct i40e_vsi *vsi;		/* Backreference to associated VSI */
 	struct i40e_q_vector *q_vector;	/* Backreference to associated vector */
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH 01/28] bnxt_en: Add bnxt_set_max_func_irqs().
From: Michael Chan @ 2016-12-05 17:10 UTC (permalink / raw)
  To: David Miller; +Cc: Selvin Xavier, dledford, linux-rdma, Netdev
In-Reply-To: <20161205.114704.1700305660817315475.davem@davemloft.net>

On Mon, Dec 5, 2016 at 8:47 AM, David Miller <davem@davemloft.net> wrote:
>
> It really doesn't make any sense to only send 7 out of 28 of these
> patches to the networking list.
>
> In fact I would say that you need to split this series into two
> components.

OK.  I will resend those bnxt_en patches later today as a separate series.

>
> One that goes into the networking tree, and once that's accepted you
> can submit the IB parts to that subsystem's maintainer.

^ permalink raw reply

* [PATCH] net: stmmac: clear reset value of snps,wr_osr_lmt/snps,rd_osr_lmt before writing
From: Niklas Cassel @ 2016-12-05 17:12 UTC (permalink / raw)
  To: Giuseppe Cavallaro, Alexandre Torgue; +Cc: Niklas Cassel, netdev, linux-kernel

From: Niklas Cassel <niklas.cassel@axis.com>

WR_OSR_LMT and RD_OSR_LMT have a reset value of 1.
Since the reset value wasn't cleared before writing, the value in the
register would be incorrect if specifying an uneven value for
snps,wr_osr_lmt/snps,rd_osr_lmt.

Zero is a valid value for the properties, since the databook specifies:
maximum outstanding requests = WR_OSR_LMT + 1.

We do not want to change the behavior for existing users when the
property is missing. Therefore, default to 1 if the property is missing,
since that is the same as the reset value.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
---
 drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c   | 2 ++
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c      | 2 ++
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 6 ++++--
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
index 990746955216..f35385266fbf 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
@@ -43,9 +43,11 @@ static void dwmac1000_dma_axi(void __iomem *ioaddr, struct stmmac_axi *axi)
 	if (axi->axi_xit_frm)
 		value |= DMA_AXI_LPI_XIT_FRM;
 
+	value &= ~DMA_AXI_WR_OSR_LMT;
 	value |= (axi->axi_wr_osr_lmt & DMA_AXI_WR_OSR_LMT_MASK) <<
 		 DMA_AXI_WR_OSR_LMT_SHIFT;
 
+	value &= ~DMA_AXI_RD_OSR_LMT;
 	value |= (axi->axi_rd_osr_lmt & DMA_AXI_RD_OSR_LMT_MASK) <<
 		 DMA_AXI_RD_OSR_LMT_SHIFT;
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
index 577316de6ba8..e81b6e565c29 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
@@ -30,9 +30,11 @@ static void dwmac4_dma_axi(void __iomem *ioaddr, struct stmmac_axi *axi)
 	if (axi->axi_xit_frm)
 		value |= DMA_AXI_LPI_XIT_FRM;
 
+	value &= ~DMA_AXI_WR_OSR_LMT;
 	value |= (axi->axi_wr_osr_lmt & DMA_AXI_OSR_MAX) <<
 		 DMA_AXI_WR_OSR_LMT_SHIFT;
 
+	value &= ~DMA_AXI_RD_OSR_LMT;
 	value |= (axi->axi_rd_osr_lmt & DMA_AXI_OSR_MAX) <<
 		 DMA_AXI_RD_OSR_LMT_SHIFT;
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 98bf86d64d96..e528e7126b65 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -126,8 +126,10 @@ static struct stmmac_axi *stmmac_axi_setup(struct platform_device *pdev)
 	axi->axi_mb = of_property_read_bool(np, "snps,axi_mb");
 	axi->axi_rb =  of_property_read_bool(np, "snps,axi_rb");
 
-	of_property_read_u32(np, "snps,wr_osr_lmt", &axi->axi_wr_osr_lmt);
-	of_property_read_u32(np, "snps,rd_osr_lmt", &axi->axi_rd_osr_lmt);
+	if (of_property_read_u32(np, "snps,wr_osr_lmt", &axi->axi_wr_osr_lmt))
+		axi->axi_wr_osr_lmt = 1;
+	if (of_property_read_u32(np, "snps,rd_osr_lmt", &axi->axi_rd_osr_lmt))
+		axi->axi_rd_osr_lmt = 1;
 	of_property_read_u32_array(np, "snps,blen", axi->axi_blen, AXI_BLEN);
 	of_node_put(np);
 
-- 
2.1.4

^ permalink raw reply related

* Re: [patch net v2] net: fec: fix compile with CONFIG_M5272
From: David Miller @ 2016-12-05 17:14 UTC (permalink / raw)
  To: nikita.yoush
  Cc: lkp, kbuild-all, fugang.duan, troy.kisky, andrew, eric, tremyfr,
	johannes, netdev, cphealy, fabio.estevam, linux-kernel
In-Reply-To: <51f44adf-486c-2e60-97c1-037d448fad49@cogentembedded.com>

From: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Date: Mon, 5 Dec 2016 16:55:04 +0300

> Aieee   I was typing too fast today, sorry...
> 
> send separate "fix for the fix", or re-send patch without that silly typo?

If the patch hasn't been applied yet, you resend a fixed version of the
patch, always.

^ permalink raw reply

* Re: [PATCH net-next] liquidio: 'imply' ptp instead of 'select'
From: David Miller @ 2016-12-05 17:20 UTC (permalink / raw)
  To: nicolas.pitre
  Cc: arnd, felix.manlunas, tglx, david.daney, satananda.burla,
	rvatsavayi, sgoutham, netdev, linux-kernel
In-Reply-To: <alpine.LFD.2.20.1612051039010.1657@knanqh.ubzr>

From: Nicolas Pitre <nicolas.pitre@linaro.org>
Date: Mon, 5 Dec 2016 10:44:32 -0500 (EST)

> On Sat, 3 Dec 2016, David Miller wrote:
> 
>> From: Arnd Bergmann <arnd@arndb.de>
>> Date: Sat,  3 Dec 2016 00:04:32 +0100
>> 
>> > ptp now depends on the optional POSIX_TIMERS setting and fails to build
>> > if we select it without that:
>> > 
>> > warning: (LIQUIDIO_VF && TI_CPTS) selects PTP_1588_CLOCK which has unmet direct dependencies (NET && POSIX_TIMERS)
>> > warning: (LIQUIDIO_VF && TI_CPTS) selects PTP_1588_CLOCK which has unmet direct dependencies (NET && POSIX_TIMERS)
>> > ERROR: "posix_clock_unregister" [drivers/ptp/ptp.ko] undefined!
>> > ERROR: "posix_clock_register" [drivers/ptp/ptp.ko] undefined!
>> > ERROR: "pps_unregister_source" [drivers/ptp/ptp.ko] undefined!
>> > ERROR: "pps_event" [drivers/ptp/ptp.ko] undefined!
>> > ERROR: "pps_register_source" [drivers/ptp/ptp.ko] undefined!
>> > 
>> > It seems that two patches have collided here, the build failure
>> > is a result of the combination. Changing the new option to 'imply'
>> > as well fixes it.
>> > 
>> > Fixes: 111fc64a237f ("liquidio CN23XX: VF registration")
>> > Fixes: d1cbfd771ce8 ("ptp_clock: Allow for it to be optional")
>> > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>> 
>> Like the kbuild robot, when I apply this it complains about 'imply' being
>> an unknown option.
>> 
>> I guess it worked for you because support for 'imply' exists in the -next
>> tree and gets pulled in from somewhere else.
> 
> Exact.
> 
>> In any event, as-is I cannot apply this.
> 
> It should be carried in linux-next for the time being, and suggested as 
> a probable "merge resolution" to Linus when submitting your tree for 
> merging.  I think that's the best that can be done.

This means the build of pure net-next is broken, which is unacceptble, as
this is the tree that many developers do their work and build tests against.

You cannot have the proper functioning of net-next depend upon something
not in that tree.

Therefore, you must resolve this without the 'imply' keyword somehow.

Thanks.

^ permalink raw reply

* Re: [iproute PATCH v2 08/18] ss: Turn generic_proc_open() wrappers into macros
From: Phil Sutter @ 2016-12-05 17:24 UTC (permalink / raw)
  To: David Laight; +Cc: Stephen Hemminger, netdev@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB0235052@AcuExch.aculab.com>

On Mon, Dec 05, 2016 at 05:02:20PM +0000, David Laight wrote:
> From: Phil Sutter
> > Sent: 02 December 2016 10:40
> > Signed-off-by: Phil Sutter <phil@nwl.cc>
> > ---
> >  misc/ss.c | 89 ++++++++++++++-------------------------------------------------
> >  1 file changed, 19 insertions(+), 70 deletions(-)
> > 
> > diff --git a/misc/ss.c b/misc/ss.c
> > index ad38eb97b0055..71040a82ca6b1 100644
> > --- a/misc/ss.c
> > +++ b/misc/ss.c
> > @@ -327,76 +327,25 @@ static FILE *generic_proc_open(const char *env, const char *name)
> > 
> >  	return fopen(p, "r");
> >  }
> > -
> > -static FILE *net_tcp_open(void)
> > -{
> > -	return generic_proc_open("PROC_NET_TCP", "net/tcp");
> > -}
> ...
> > +#define net_tcp_open()		generic_proc_open("PROC_NET_TCP", "net/tcp")
> ...
> 
> Does that make any difference at all?
> The compiler should inline the static functions.

Probably, yes. My motivation behind it was merely the reduced LoC (and
maybe the improved clarity how these are really just wrappers, but
that's personal preference I guess).

Cheers, Phil

^ permalink raw reply

* Re: [ovs-dev] [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit
From: Michał Mirosław @ 2016-12-05 17:24 UTC (permalink / raw)
  To: Ben Pfaff; +Cc: netdev, dev, bridge
In-Reply-To: <20161203232730.GT3129@ovn.org>

On Sat, Dec 03, 2016 at 03:27:30PM -0800, Ben Pfaff wrote:
> On Sat, Dec 03, 2016 at 10:22:28AM +0100, Michał Mirosław wrote:
> > This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> > intact through linux networking stack.
> > 
> > Signed-off-by: Michał Mirosław <michal.miroslaw@atendesoftware.pl>
> > ---
> > 
> > Dear NetDevs
> > 
> > I guess this needs to be split to the prep..convert[]..finish sequence,
> > but if you like it as is, then it's ready.
> > 
> > The biggest question is if the modified interface and vlan_present
> > is the way to go. This can be changed to use vlan_proto != 0 instead
> > of an extra flag bit.
> > 
> > As I can't test most of the driver changes, please look at them carefully.
> > OVS and bridge eyes are especially welcome.
> 
> This appears to change the established Open vSwitch userspace API.  You
> can see that simply from the way that it changes the documentation for
> the userspace API.  If I'm right about that, then this change will break
> all userspace programs that use the Open vSwitch kernel module,
> including Open vSwitch itself.

If I understood the code correctly, it does change expected meaning for
the (unlikely?) case of header truncated just before the VLAN TCI - it will
be impossible to differentiate this case from the VLAN TCI == 0.

I guess this is a problem with OVS API, because it doesn't directly show
the "missing" state of elements, but relies on an "invalid" value.

I can probably change the code to mask this change in the OVS code (by keeping
the CFI/DEI bit unusable). This somehow feels wrong, though.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [patch net v2] net: fec: fix compile with CONFIG_M5272
From: Nikita Yushchenko @ 2016-12-05 17:26 UTC (permalink / raw)
  To: David Miller
  Cc: lkp, kbuild-all, fugang.duan, troy.kisky, andrew, eric, tremyfr,
	johannes, netdev, cphealy, fabio.estevam, linux-kernel
In-Reply-To: <20161205.121406.1568673407424770643.davem@davemloft.net>

> From: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
> Date: Mon, 5 Dec 2016 16:55:04 +0300
> 
>> Aieee   I was typing too fast today, sorry...
>>
>> send separate "fix for the fix", or re-send patch without that silly typo?
> 
> If the patch hasn't been applied yet, you resend a fixed version of the
> patch, always.

Ok, will repost shortly.

What I don't understand is - how could test robot fetch it before it was
applied?

Nikita

^ permalink raw reply

* Re: [patch net v2] net: fec: fix compile with CONFIG_M5272
From: David Miller @ 2016-12-05 17:28 UTC (permalink / raw)
  To: nikita.yoush
  Cc: lkp, kbuild-all, fugang.duan, troy.kisky, andrew, eric, tremyfr,
	johannes, netdev, cphealy, fabio.estevam, linux-kernel
In-Reply-To: <90474527-926c-b66f-831b-3fb6f7128394@cogentembedded.com>

From: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Date: Mon, 5 Dec 2016 20:26:52 +0300

>> From: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
>> Date: Mon, 5 Dec 2016 16:55:04 +0300
>> 
>>> Aieee   I was typing too fast today, sorry...
>>>
>>> send separate "fix for the fix", or re-send patch without that silly typo?
>> 
>> If the patch hasn't been applied yet, you resend a fixed version of the
>> patch, always.
> 
> Ok, will repost shortly.
> 
> What I don't understand is - how could test robot fetch it before it was
> applied?

It takes them from the mailing list, and exactly this is the value of
the robot.  It can test patches before I add them to my tree.

^ permalink raw reply

* Re: [net PATCH 2/2] ipv4: Drop suffix update from resize code
From: David Miller @ 2016-12-05 17:28 UTC (permalink / raw)
  To: rshearma; +Cc: alexander.h.duyck, netdev
In-Reply-To: <46f96f3a-0e8a-5eff-0f2d-7aeb6aec8c23@brocade.com>

From: Robert Shearman <rshearma@brocade.com>
Date: Mon, 5 Dec 2016 15:05:18 +0000

> On 01/12/16 12:27, Alexander Duyck wrote:
>> It has been reported that update_suffix can be expensive when it is
>> called
>> on a large node in which most of the suffix lengths are the same.  The
>> time
>> required to add 200K entries had increased from around 3 seconds to
>> almost
>> 49 seconds.
>>
>> In order to address this we need to move the code for updating the
>> suffix
>> out of resize and instead just have it handled in the cases where we
>> are
>> pushing a node that increases the suffix length, or will decrease the
>> suffix length.
>>
>> Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length")
>> Reported-by: Robert Shearman <rshearma@brocade.com>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> $ time sudo ip route restore < ~/allroutes
> RTNETLINK answers: File exists
> RTNETLINK answers: File exists
> RTNETLINK answers: File exists
> RTNETLINK answers: File exists

What are these errors all about?

^ permalink raw reply

* Re: [PATCH v2 net-next 1/2] flow dissector: ICMP support
From: Tom Herbert @ 2016-12-05 17:29 UTC (permalink / raw)
  To: Simon Horman
  Cc: Jiri Pirko, David Miller, Linux Kernel Network Developers,
	Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, Jamal Hadi Salim,
	Jiri Pirko
In-Reply-To: <20161205142208.GA16425@penelope.horms.nl>

On Mon, Dec 5, 2016 at 6:23 AM, Simon Horman <simon.horman@netronome.com> wrote:
> On Sat, Dec 03, 2016 at 10:52:32AM -0800, Tom Herbert wrote:
>> On Sat, Dec 3, 2016 at 2:49 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> > Fri, Dec 02, 2016 at 09:31:41PM CET, simon.horman@netronome.com wrote:
>> >>Allow dissection of ICMP(V6) type and code. This re-uses transport layer
>> >>port dissection code as although ICMP is not a transport protocol and their
>> >>type and code are not ports this allows sharing of both code and storage.
>> >>
>> >>Signed-off-by: Simon Horman <simon.horman@netronome.com>
>
> ...
>
>> > Digging into this a bit more. I think it would be much nice not to mix
>> > up l4 ports and icmp stuff.
>> >
>> > How about to have FLOW_DISSECTOR_KEY_ICMP
>> > and
>> > struct flow_dissector_key_icmp {
>> >         u8 type;
>> >         u8 code;
>> > };
>> >
>> > The you can make this structure and struct flow_dissector_key_ports into
>> > an union in struct flow_keys.
>> >
>> > Looks much cleaner to me.
>
> Hi Jiri,
>
> I wondered about that too. The main reason that I didn't do this the first
> time around is that I wasn't sure what to do to differentiate between ports
> and icmp in fl_init_dissector() given that using a union would could to
> mask bits being set in both if they are set in either.
>
> I've taken a stab at addressing that below along with implementing your
> suggestion. If the treatment in fl_init_dissector() is acceptable
> perhaps something similar should be done for FLOW_DISSECTOR_KEY_IPV[46]_ADDRS
> too?
>
>> I agree, this patch adds to many conditionals into the fast path for
>> ICMP handling. Neither is there much point in using type and code as
>> input to the packet hash.
>
> Hi Tom,
>
> I'm not sure that I follow this. A packet may be ICMP or not regardless of
> if FLOW_DISSECTOR_KEY_PORT (and now FLOW_DISSECTOR_KEY_ICMP) port is set or
> not. I'd appreciate some guidance here.
>
>
> First-pass at implementing Jiri's suggestion follows:
>
> diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
> index 5540dfa18872..475cd121496f 100644
> --- a/include/net/flow_dissector.h
> +++ b/include/net/flow_dissector.h
> @@ -91,14 +91,10 @@ struct flow_dissector_key_addrs {
>
>  /**
>   * flow_dissector_key_ports:
> - *     @ports: port numbers of Transport header or
> - *             type and code of ICMP header
> + *     @ports: port numbers of Transport header
>   *             ports: source (high) and destination (low) port numbers
>   *             src: source port number
>   *             dst: destination port number
> - *             icmp: ICMP type (high) and code (low)
> - *             type: ICMP type
> - *             type: ICMP code
>   */
>  struct flow_dissector_key_ports {
>         union {
> @@ -107,6 +103,18 @@ struct flow_dissector_key_ports {
>                         __be16 src;
>                         __be16 dst;
>                 };
> +       };
> +};
> +
> +/**
> + * flow_dissector_key_icmp:
> + *     @ports: type and code of ICMP header
> + *             icmp: ICMP type (high) and code (low)
> + *             type: ICMP type
> + *             code: ICMP code
> + */
> +struct flow_dissector_key_icmp {
> +       union {
>                 __be16 icmp;
>                 struct {
>                         u8 type;
> @@ -133,6 +141,7 @@ enum flow_dissector_key_id {
>         FLOW_DISSECTOR_KEY_IPV4_ADDRS, /* struct flow_dissector_key_ipv4_addrs */
>         FLOW_DISSECTOR_KEY_IPV6_ADDRS, /* struct flow_dissector_key_ipv6_addrs */
>         FLOW_DISSECTOR_KEY_PORTS, /* struct flow_dissector_key_ports */
> +       FLOW_DISSECTOR_KEY_ICMP, /* struct flow_dissector_key_icmp */
>         FLOW_DISSECTOR_KEY_ETH_ADDRS, /* struct flow_dissector_key_eth_addrs */
>         FLOW_DISSECTOR_KEY_TIPC_ADDRS, /* struct flow_dissector_key_tipc_addrs */
>         FLOW_DISSECTOR_KEY_VLAN, /* struct flow_dissector_key_flow_vlan */
> @@ -171,7 +180,10 @@ struct flow_keys {
>         struct flow_dissector_key_tags tags;
>         struct flow_dissector_key_vlan vlan;
>         struct flow_dissector_key_keyid keyid;
> -       struct flow_dissector_key_ports ports;
> +       union {
> +               struct flow_dissector_key_ports ports;
> +               struct flow_dissector_key_icmp icmp;
> +       };
>         struct flow_dissector_key_addrs addrs;
>  };
>
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 0584b4bb4390..33e5fa2b3e87 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -139,6 +139,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>         struct flow_dissector_key_basic *key_basic;
>         struct flow_dissector_key_addrs *key_addrs;
>         struct flow_dissector_key_ports *key_ports;
> +       struct flow_dissector_key_icmp *key_icmp;
>         struct flow_dissector_key_tags *key_tags;
>         struct flow_dissector_key_vlan *key_vlan;
>         struct flow_dissector_key_keyid *key_keyid;
> @@ -559,18 +560,25 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>                 break;
>         }
>
> -       if (dissector_uses_key(flow_dissector,
> -                              FLOW_DISSECTOR_KEY_PORTS)) {
> -               key_ports = skb_flow_dissector_target(flow_dissector,
> -                                                     FLOW_DISSECTOR_KEY_PORTS,
> -                                                     target_container);
> -               if (flow_protos_are_icmp_any(proto, ip_proto))
> -                       key_ports->icmp = skb_flow_get_be16(skb, nhoff, data,
> -                                                           hlen);
> -               else
> +       if (flow_protos_are_icmp_any(proto, ip_proto)) {
> +               if (dissector_uses_key(flow_dissector,
> +                                      FLOW_DISSECTOR_KEY_ICMP)) {
> +                       key_icmp = skb_flow_dissector_target(flow_dissector,
> +                                                            FLOW_DISSECTOR_KEY_ICMP,
> +                                                            target_container);
> +                       key_icmp->icmp = skb_flow_get_be16(skb, nhoff, data,
> +                                                          hlen);
> +               }
> +       } else {
> +               if (dissector_uses_key(flow_dissector,
> +                                      FLOW_DISSECTOR_KEY_PORTS)) {
> +                       key_ports = skb_flow_dissector_target(flow_dissector,
> +                                                             FLOW_DISSECTOR_KEY_PORTS,
> +                                                             target_container);
>                         key_ports->ports = __skb_flow_get_ports(skb, nhoff,
>                                                                 ip_proto, data,
>                                                                 hlen);
> +               }
>         }
>
>  out_good:
> @@ -975,6 +983,10 @@ static const struct flow_dissector_key flow_keys_dissector_keys[] = {
>                 .offset = offsetof(struct flow_keys, ports),
>         },
>         {
> +               .key_id = FLOW_DISSECTOR_KEY_ICMP,
> +               .offset = offsetof(struct flow_keys, icmp),
> +       },

This is the problem I was referring to. This adds ICMP to the keys for
computing the skb hash and the ICMP type and code are in a union with
port numbers so they are in the range over which the hash is computed.
This means that we are treating type and code as some sort of flow and
and so different type/code between the same src/dst could follow
different paths in ECMP. For the purposes of computing a packet hash I
don't think we want ICMP information included. This might be a matter
of not putting FLOW_DISSECTOR_KEY_ICMP in flow_keys_dissector_keys,
where we need this information we could define another structure that
has all the same fields as in flow_keys_dissector_keys and include
FLOW_DISSECTOR_KEY_ICMP. Alternatively, the flow_dissector_key_icmp
could also be outside of the area that is used in the hash: that is no
in flow_keys_hash_start(keys) to flow_keys_hash_length(keys), keyval);

> +       {
>                 .key_id = FLOW_DISSECTOR_KEY_VLAN,
>                 .offset = offsetof(struct flow_keys, vlan),
>         },
> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
> index c96b2a349779..f4ba69bd362f 100644
> --- a/net/sched/cls_flower.c
> +++ b/net/sched/cls_flower.c
> @@ -38,7 +38,10 @@ struct fl_flow_key {
>                 struct flow_dissector_key_ipv4_addrs ipv4;
>                 struct flow_dissector_key_ipv6_addrs ipv6;
>         };
> -       struct flow_dissector_key_ports tp;
> +       union {
> +               struct flow_dissector_key_ports tp;
> +               struct flow_dissector_key_icmp icmp;
> +       };

When an ICMP packet is encapsulated within UDP then icmp overwrites
valid port information with is a stronger signal of the flow (unless
FLOW_DISSECTOR_F_STOP_AT_ENCAP is set). This is another reason not to
use a union with ports.

Tom

>         struct flow_dissector_key_keyid enc_key_id;
>         union {
>                 struct flow_dissector_key_ipv4_addrs enc_ipv4;
> @@ -511,19 +514,23 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
>                                &mask->tp.dst, TCA_FLOWER_KEY_SCTP_DST_MASK,
>                                sizeof(key->tp.dst));
>         } else if (flow_basic_key_is_icmpv4(&key->basic)) {
> -               fl_set_key_val(tb, &key->tp.type, TCA_FLOWER_KEY_ICMPV4_TYPE,
> -                              &mask->tp.type, TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,
> -                              sizeof(key->tp.type));
> -               fl_set_key_val(tb, &key->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
> -                              &mask->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
> -                              sizeof(key->tp.code));
> +               fl_set_key_val(tb, &key->icmp.type, TCA_FLOWER_KEY_ICMPV4_TYPE,
> +                              &mask->icmp.type,
> +                              TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,
> +                              sizeof(key->icmp.type));
> +               fl_set_key_val(tb, &key->icmp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
> +                              &mask->icmp.code,
> +                              TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
> +                              sizeof(key->icmp.code));
>         } else if (flow_basic_key_is_icmpv6(&key->basic)) {
> -               fl_set_key_val(tb, &key->tp.type, TCA_FLOWER_KEY_ICMPV6_TYPE,
> -                              &mask->tp.type, TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,
> -                              sizeof(key->tp.type));
> -               fl_set_key_val(tb, &key->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
> -                              &mask->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
> -                              sizeof(key->tp.code));
> +               fl_set_key_val(tb, &key->icmp.type, TCA_FLOWER_KEY_ICMPV6_TYPE,
> +                              &mask->icmp.type,
> +                              TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,
> +                              sizeof(key->icmp.type));
> +               fl_set_key_val(tb, &key->icmp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
> +                              &mask->icmp.code,
> +                              TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
> +                              sizeof(key->icmp.code));
>         }
>
>         if (tb[TCA_FLOWER_KEY_ENC_IPV4_SRC] ||
> @@ -609,15 +616,16 @@ static int fl_init_hashtable(struct cls_fl_head *head,
>                 keys[cnt].key_id = id;                                          \
>                 keys[cnt].offset = FL_KEY_MEMBER_OFFSET(member);                \
>                 cnt++;                                                          \
> -       } while(0);
> +       } while(0)
>
>  #define FL_KEY_SET_IF_MASKED(mask, keys, cnt, id, member)                      \
>         do {                                                                    \
>                 if (FL_KEY_IS_MASKED(mask, member))                             \
>                         FL_KEY_SET(keys, cnt, id, member);                      \
> -       } while(0);
> +       } while(0)
>
>  static void fl_init_dissector(struct cls_fl_head *head,
> +                             struct cls_fl_filter *f,
>                               struct fl_flow_mask *mask)
>  {
>         struct flow_dissector_key keys[FLOW_DISSECTOR_KEY_MAX];
> @@ -631,8 +639,12 @@ static void fl_init_dissector(struct cls_fl_head *head,
>                              FLOW_DISSECTOR_KEY_IPV4_ADDRS, ipv4);
>         FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
>                              FLOW_DISSECTOR_KEY_IPV6_ADDRS, ipv6);
> -       FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
> -                            FLOW_DISSECTOR_KEY_PORTS, tp);
> +       if (flow_basic_key_is_icmpv4(&f->key.basic))
> +               FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
> +                                    FLOW_DISSECTOR_KEY_ICMP, icmp);
> +       else
> +               FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
> +                                    FLOW_DISSECTOR_KEY_PORTS, tp);
>         FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
>                              FLOW_DISSECTOR_KEY_VLAN, vlan);
>         FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
> @@ -652,6 +664,7 @@ static void fl_init_dissector(struct cls_fl_head *head,
>  }
>
>  static int fl_check_assign_mask(struct cls_fl_head *head,
> +                               struct cls_fl_filter *f,
>                                 struct fl_flow_mask *mask)
>  {
>         int err;
> @@ -672,7 +685,7 @@ static int fl_check_assign_mask(struct cls_fl_head *head,
>         memcpy(&head->mask, mask, sizeof(head->mask));
>         head->mask_assigned = true;
>
> -       fl_init_dissector(head, mask);
> +       fl_init_dissector(head, f, mask);
>
>         return 0;
>  }
> @@ -785,7 +798,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>         if (err)
>                 goto errout;
>
> -       err = fl_check_assign_mask(head, &mask);
> +       err = fl_check_assign_mask(head, fnew, &mask);
>         if (err)
>                 goto errout;
>
> @@ -1000,24 +1013,24 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
>                                   sizeof(key->tp.dst))))
>                 goto nla_put_failure;
>         else if (flow_basic_key_is_icmpv4(&key->basic) &&
> -                (fl_dump_key_val(skb, &key->tp.type,
> -                                 TCA_FLOWER_KEY_ICMPV4_TYPE, &mask->tp.type,
> +                (fl_dump_key_val(skb, &key->icmp.type,
> +                                 TCA_FLOWER_KEY_ICMPV4_TYPE, &mask->icmp.type,
>                                   TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,
> -                                 sizeof(key->tp.type)) ||
> -                 fl_dump_key_val(skb, &key->tp.code,
> -                                 TCA_FLOWER_KEY_ICMPV4_CODE, &mask->tp.code,
> +                                 sizeof(key->icmp.type)) ||
> +                 fl_dump_key_val(skb, &key->icmp.code,
> +                                 TCA_FLOWER_KEY_ICMPV4_CODE, &mask->icmp.code,
>                                   TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
> -                                 sizeof(key->tp.code))))
> +                                 sizeof(key->icmp.code))))
>                 goto nla_put_failure;
>         else if (flow_basic_key_is_icmpv6(&key->basic) &&
> -                (fl_dump_key_val(skb, &key->tp.type,
> -                                 TCA_FLOWER_KEY_ICMPV6_TYPE, &mask->tp.type,
> +                (fl_dump_key_val(skb, &key->icmp.type,
> +                                 TCA_FLOWER_KEY_ICMPV6_TYPE, &mask->icmp.type,
>                                   TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,
> -                                 sizeof(key->tp.type)) ||
> -                 fl_dump_key_val(skb, &key->tp.code,
> -                                 TCA_FLOWER_KEY_ICMPV6_CODE, &mask->tp.code,
> +                                 sizeof(key->icmp.type)) ||
> +                 fl_dump_key_val(skb, &key->icmp.code,
> +                                 TCA_FLOWER_KEY_ICMPV6_CODE, &mask->icmp.code,
>                                   TCA_FLOWER_KEY_ICMPV6_CODE_MASK,
> -                                 sizeof(key->tp.code))))
> +                                 sizeof(key->icmp.code))))
>                 goto nla_put_failure;
>
>         if (key->enc_control.addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS &&

^ permalink raw reply

* Re: [net PATCH 2/2] ipv4: Drop suffix update from resize code
From: Duyck, Alexander H @ 2016-12-05 17:36 UTC (permalink / raw)
  To: davem@davemloft.net, rshearma@brocade.com; +Cc: netdev@vger.kernel.org
In-Reply-To: <20161205.122846.557360857895468724.davem@davemloft.net>

On Mon, 2016-12-05 at 12:28 -0500, David Miller wrote:
> From: Robert Shearman <rshearma@brocade.com>
> Date: Mon, 5 Dec 2016 15:05:18 +0000
> 
> > 
> > On 01/12/16 12:27, Alexander Duyck wrote:
> > > 
> > > It has been reported that update_suffix can be expensive when it is
> > > called
> > > on a large node in which most of the suffix lengths are the same.  The
> > > time
> > > required to add 200K entries had increased from around 3 seconds to
> > > almost
> > > 49 seconds.
> > > 
> > > In order to address this we need to move the code for updating the
> > > suffix
> > > out of resize and instead just have it handled in the cases where we
> > > are
> > > pushing a node that increases the suffix length, or will decrease the
> > > suffix length.
> > > 
> > > Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length")
> > > Reported-by: Robert Shearman <rshearma@brocade.com>
> > > Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> > 
> > $ time sudo ip route restore < ~/allroutes
> > RTNETLINK answers: File exists
> > RTNETLINK answers: File exists
> > RTNETLINK answers: File exists
> > RTNETLINK answers: File exists
> 
> What are these errors all about?

I think it is the fact that he is trying to restore "all routes" and
some of the routes already exist such as those associated with his
default network interface.

- Alex

^ permalink raw reply

* [patch net v3] net: fec: fix compile with CONFIG_M5272
From: Nikita Yushchenko @ 2016-12-05 17:41 UTC (permalink / raw)
  To: David S. Miller, Fugang Duan, Troy Kisky, Andrew Lunn,
	Eric Nelson, Philippe Reynes, Johannes Berg, netdev
  Cc: Chris Healy, Fabio Estevam, linux-kernel, Nikita Yushchenko

Commit 4dfb80d18d05 ("net: fec: cache statistics while device is down")
introduced unconditional statistics-related actions.

However, when driver is compiled with CONFIG_M5272, staticsics-related
definitions do not exist, which results into build errors.

Fix that by adding explicit handling of !defined(CONFIG_M5272) case.

Fixes: 4dfb80d18d05 ("net: fec: cache statistics while device is down")
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
---
Changes since v2:
- fix typo that caused compile problem, double-check that it compiles
  and works as expected.

 drivers/net/ethernet/freescale/fec_main.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 5f77caa59534..12aef1b15356 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2313,6 +2313,8 @@ static const struct fec_stat {
 	{ "IEEE_rx_octets_ok", IEEE_R_OCTETS_OK },
 };
 
+#define FEC_STATS_SIZE		(ARRAY_SIZE(fec_stats) * sizeof(u64))
+
 static void fec_enet_update_ethtool_stats(struct net_device *dev)
 {
 	struct fec_enet_private *fep = netdev_priv(dev);
@@ -2330,7 +2332,7 @@ static void fec_enet_get_ethtool_stats(struct net_device *dev,
 	if (netif_running(dev))
 		fec_enet_update_ethtool_stats(dev);
 
-	memcpy(data, fep->ethtool_stats, ARRAY_SIZE(fec_stats) * sizeof(u64));
+	memcpy(data, fep->ethtool_stats, FEC_STATS_SIZE);
 }
 
 static void fec_enet_get_strings(struct net_device *netdev,
@@ -2355,6 +2357,12 @@ static int fec_enet_get_sset_count(struct net_device *dev, int sset)
 		return -EOPNOTSUPP;
 	}
 }
+
+#else	/* !defined(CONFIG_M5272) */
+#define FEC_STATS_SIZE	0
+static inline void fec_enet_update_ethtool_stats(struct net_device *dev)
+{
+}
 #endif /* !defined(CONFIG_M5272) */
 
 static int fec_enet_nway_reset(struct net_device *dev)
@@ -3293,8 +3301,7 @@ fec_probe(struct platform_device *pdev)
 
 	/* Init network device */
 	ndev = alloc_etherdev_mqs(sizeof(struct fec_enet_private) +
-				  ARRAY_SIZE(fec_stats) * sizeof(u64),
-				  num_tx_qs, num_rx_qs);
+				  FEC_STATS_SIZE, num_tx_qs, num_rx_qs);
 	if (!ndev)
 		return -ENOMEM;
 
-- 
2.1.4

^ permalink raw reply related

* [PATCH 130/130] Fixed to checkpatch errors.
From: Ozgur Karatas @ 2016-12-05 17:42 UTC (permalink / raw)
  To: johannes, David Miller; +Cc: linux-wireless, netdev, linux-kernel

Hello,

Fixed to checkpatch errors.

ERROR: net/wireless/util.c:1787: ERROR: that open brace { should be on the previous line
ERROR: net/wireless/util.c:1792: ERROR: that open brace { should be on the previous line

Signed-off-by: Ozgur Karatas <okaratas@member.fsf.org>
---
 net/wireless/util.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/wireless/util.c b/net/wireless/util.c
index 659b507..f4ac755 100644
--- a/net/wireless/util.c
+++ b/net/wireless/util.c
@@ -1783,11 +1783,13 @@ EXPORT_SYMBOL(cfg80211_free_nan_func);
 
 /* See IEEE 802.1H for LLC/SNAP encapsulation/decapsulation */
 /* Ethernet-II snap header (RFC1042 for most EtherTypes) */
-const unsigned char rfc1042_header[] __aligned(2) =
-	{ 0xaa, 0xaa, 0x03, 0x00, 0x00, 0x00 };
+const unsigned char rfc1042_header[] __aligned(2) = {
+	0xaa, 0xaa, 0x03, 0x00, 0x00, 0x00
+};
 EXPORT_SYMBOL(rfc1042_header);
 
 /* Bridge-Tunnel header (for EtherTypes ETH_P_AARP and ETH_P_IPX) */
-const unsigned char bridge_tunnel_header[] __aligned(2) =
-	{ 0xaa, 0xaa, 0x03, 0x00, 0x00, 0xf8 };
+const unsigned char bridge_tunnel_header[] __aligned(2) = {
+	0xaa, 0xaa, 0x03, 0x00, 0x00, 0xf8
+};
 EXPORT_SYMBOL(bridge_tunnel_header);
-- 
2.1.4

^ permalink raw reply related

* Re: [PATCH net-next 2/4] mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs
From: Jakub Kicinski @ 2016-12-05 17:46 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Rick Jones, netdev, Alexei Starovoitov, Brenden Blanco,
	Daniel Borkmann, David Miller, Saeed Mahameed, Tariq Toukan,
	Kernel Team
In-Reply-To: <20161204031958.GD70461@kafai-mba.local>

On Sat, 3 Dec 2016 19:19:58 -0800, Martin KaFai Lau wrote:
> AFAIK, only mlx4/5 supports XDP now.

That is, unfortunately, not true.

$ git checkout net-next/master
$ git grep ndo_xdp drivers/
drivers/net/ethernet/mellanox/mlx4/en_netdev.c: .ndo_xdp                = mlx4_xdp,
drivers/net/ethernet/mellanox/mlx4/en_netdev.c: .ndo_xdp                = mlx4_xdp,
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:      .ndo_xdp                 = mlx5e_xdp,
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:      .ndo_xdp                 = mlx5e_xdp,
drivers/net/ethernet/netronome/nfp/nfp_net_common.c:    .ndo_xdp                = nfp_net_xdp,
drivers/net/ethernet/qlogic/qede/qede_main.c:   .ndo_xdp = qede_xdp,

^ permalink raw reply

* Re: [PATCH v2 net-next 0/4]: Allow head adjustment in XDP prog
From: Jakub Kicinski @ 2016-12-05 17:53 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, Alexei Starovoitov, Brenden Blanco, Daniel Borkmann,
	David Miller, Jesper Dangaard Brouer, Saeed Mahameed,
	Tariq Toukan, Kernel Team
In-Reply-To: <1480821446-4122277-1-git-send-email-kafai@fb.com>

On Sat, 3 Dec 2016 19:17:22 -0800, Martin KaFai Lau wrote:
> This series adds a helper to allow head adjusting in XDP prog.  mlx4
> driver has been modified to support this feature.  An example is written
> to encapsulate a packet with an IPv4/v6 header and then XDP_TX it
> out.

Can we just add a feature to one of four drivers which support XDP
today and AFAICT leave it completely broken on remaining tree?

I'm not seeing any way for the drivers to inform the stack about their
capabilities, would that not be a pre-requisite?

^ permalink raw reply

* Re: commit : ppp: add rtnetlink device creation support - breaks netcf on my machine.
From: Guillaume Nault @ 2016-12-05 17:53 UTC (permalink / raw)
  To: Brad Campbell; +Cc: netdev
In-Reply-To: <32208f70-4165-e79b-89ee-543eefaf40ef@fnarfbargle.com>

On Mon, Dec 05, 2016 at 05:06:31PM +0800, Brad Campbell wrote:
> Machine is an old Debian (7.11) box with a recent compile of netcf. Any
> kernel later than 4.6.7 prevents ncftool from running if the ppp interface
> is up.
> 
> I'm using netcf-0.2.8, and this commit breaks it :
> 
> 96d934c70db6e1bc135600c57da1285eaf7efb26 is the first bad commit
> commit 96d934c70db6e1bc135600c57da1285eaf7efb26
> Author: Guillaume Nault <g.nault@alphalink.fr>
> Date:   Thu Apr 28 17:55:30 2016 +0200
> 
>     ppp: add rtnetlink device creation support
> 
> The issue is netcf dies if the ppp interface is up with the informative
> message "Failed to initialize netcf". No debug output. Enabling debugging in
> libnl shows it dies when the data for the ppp interface is returned.
> 
> Take ppp down and ncftool comes up ok.
> 
> libnl appears to be version 3.2.24 (Debian libnl-3)
> 
> This has been manifesting on a production machine with a pppoe link to the
> world, but I managed to reproduce it on a test box using a pppoe server and
> a client on the test box which enabled me to bisect it.
> 
Can you send a minimal configuration file that triggers the bug?
I've set up a virtual machine (Linux 4.7.0, netcf 0.2.8 backported from
Debian Sid), but couldn't reproduce the issue so far.

> I don't know anything about netlink, nor the networking subsystem so I'm
> asking for a belt with a cluebat please. I'm using the latest *released*
> netcf, and a quick check of the git tree does not seem to indicate anything
> that might be related. Have I missed a kernel config option, or made some
> catastrophic blunder in setting this up? It works fine on 4.6.4, but
> anything 4.7 or later just dies if the ppp interface is up.
> 
Probably not a mistake on your side. I've started looking at netcf'
source code, but haven't found anything that could explain your issue.
It'd really help if you could provide steps to reproduce the bug.

Regards,

Guillaume

^ permalink raw reply

* Re: [PATCH 1/7] net: ethernet: ti: cpdma: am437x: allow descs to be plased in ddr
From: Grygorii Strashko @ 2016-12-05 17:57 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, mugunthanvnm, nsekhar, linux-kernel, linux-omap,
	ivan.khoronzhuk
In-Reply-To: <20161203.153419.482900037601991931.davem@davemloft.net>



On 12/03/2016 02:34 PM, David Miller wrote:
> From: Grygorii Strashko <grygorii.strashko@ti.com>
> Date: Thu, 1 Dec 2016 17:34:26 -0600
> 
>> @@ -167,10 +167,10 @@ static struct cpdma_control_info controls[] = {
>>  
>>  /* various accessors */
>>  #define dma_reg_read(ctlr, ofs)		__raw_readl((ctlr)->dmaregs + (ofs))
>> -#define chan_read(chan, fld)		__raw_readl((chan)->fld)
>> +#define chan_read(chan, fld)		readl((chan)->fld)
>>  #define desc_read(desc, fld)		__raw_readl(&(desc)->fld)
>>  #define dma_reg_write(ctlr, ofs, v)	__raw_writel(v, (ctlr)->dmaregs + (ofs))
>> -#define chan_write(chan, fld, v)	__raw_writel(v, (chan)->fld)
>> +#define chan_write(chan, fld, v)	writel(v, (chan)->fld)
>>  #define desc_write(desc, fld, v)	__raw_writel((u32)(v), &(desc)->fld)
> 
> Unless you want to keep running into subtle errors all over the
> place wrt. register vs. memory write ordering, I strong suggest
> you use strongly ordered readl/writel for all register accesses.
> 
> I see no tangible, worthwhile, advantage to using these relaxed
> ordering primitives.  The only result is potential bugs.
> 
> People who use the relaxed ordering primitives properly are only
> doing so in extremely carefully coded sequences where a series
> of writes has no dependency on main memory operations and is
> explicitly completed with a barrier operation such as a read
> back of a register in the same device.
> 
> That's not at all what is going on here, instead the driver is wildly
> using relaxed ordered register accesses for basically everything.
> This is extremely unwise and it's why you ran into this bug in the
> first place.
> 
> Therefore, I absolutely require that you fix this by eliminating
> any and all usese of relaxed ordering I/O accessors in this driver.

Thanks for your comments - that's what I've tried first, but that decided
to find out minimal change which still works.

I'll update it. 

-- 
regards,
-grygorii

^ permalink raw reply

* [PATCH] net/udp: do not touch skb->peeked unless really needed
From: Eric Dumazet @ 2016-12-05 17:57 UTC (permalink / raw)
  To: Paolo Abeni, David Miller; +Cc: netdev, Willem de Bruijn
In-Reply-To: <1480948133.18162.527.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <edumazet@google.com>

In UDP recvmsg() path we currently access 3 cache lines from an skb
while holding receive queue lock, plus another one if packet is
dequeued, since we need to change skb->next->prev

1st cache line (contains ->next/prev pointers, offsets 0x00 and 0x08)
2nd cache line (skb->len & skb->peeked, offsets 0x80 and 0x8e)
3rd cache line (skb->truesize/users, offsets 0xe0 and 0xe4)

skb->peeked is only needed to make sure 0-length packets are properly
handled while MSG_PEEK is operated.

I had first the intent to remove skb->peeked but the "MSG_PEEK at
non-zero offset" support added by Sam Kumar makes this not possible.

This patch avoids one cache line miss during the locked section, when
skb->len and skb->peeked do not have to be read.

It also avoids the skb_set_peeked() cost for non empty UDP datagrams.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/datagram.c |   19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index 49816af8586bb832e806972b486588041a99524c..9482037a5c8c64aec79e42c65bd2691bdd9450a3 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -214,6 +214,7 @@ struct sk_buff *__skb_try_recv_datagram(struct sock *sk, unsigned int flags,
 	if (error)
 		goto no_packet;
 
+	*peeked = 0;
 	do {
 		/* Again only user level code calls this function, so nothing
 		 * interrupt level will suddenly eat the receive_queue.
@@ -227,22 +228,22 @@ struct sk_buff *__skb_try_recv_datagram(struct sock *sk, unsigned int flags,
 		spin_lock_irqsave(&queue->lock, cpu_flags);
 		skb_queue_walk(queue, skb) {
 			*last = skb;
-			*peeked = skb->peeked;
 			if (flags & MSG_PEEK) {
 				if (_off >= skb->len && (skb->len || _off ||
 							 skb->peeked)) {
 					_off -= skb->len;
 					continue;
 				}
-
-				skb = skb_set_peeked(skb);
-				error = PTR_ERR(skb);
-				if (IS_ERR(skb)) {
-					spin_unlock_irqrestore(&queue->lock,
-							       cpu_flags);
-					goto no_packet;
+				if (!skb->len) {
+					skb = skb_set_peeked(skb);
+					if (IS_ERR(skb)) {
+						error = PTR_ERR(skb);
+						spin_unlock_irqrestore(&queue->lock,
+								       cpu_flags);
+						goto no_packet;
+					}
 				}
-
+				*peeked = 1;
 				atomic_inc(&skb->users);
 			} else {
 				__skb_unlink(skb, queue);

^ permalink raw reply related

* Re: [patch net v3] net: fec: fix compile with CONFIG_M5272
From: Nikita Yushchenko @ 2016-12-05 18:06 UTC (permalink / raw)
  To: Vitaly Wool
  Cc: Johannes Berg, Troy Kisky, linux-kernel, Chris Healy,
	David S. Miller, Fugang Duan, Eric Nelson, Andrew Lunn,
	Fabio Estevam, netdev, Philippe Reynes
In-Reply-To: <CAMJBoFNd==OugwCjwT-q1MkooFr+Ed7AF00ifdimTk_DFVJO9Q@mail.gmail.com>

>     +#define FEC_STATS_SIZE         (ARRAY_SIZE(fec_stats) * sizeof(u64))
> 
> 
> Do I take it right this actually translates to (sizeof(fec_stats) /
> sizeof(u64) * sizeof(u64))?

No.

fec_stats is an array of structs, each struct has car arrsy and integer,
and size of that is definitely not bytes.

ARRAY_SIZE(fec_stats) is number of stats, i.e. it is returned from
fec_enet_get_sset_count()

Each stat in blob is u64. Thus (ARRAY_SIZE(fec_stats) * sizeof(u64)) is
size of stats blob.

^ permalink raw reply

* Re: [PATCH v2] net: phy: dp83848: Support ethernet pause frames
From: David Miller @ 2016-12-05 18:14 UTC (permalink / raw)
  To: jesper.nilsson; +Cc: f.fainelli, rogerq, afd, dmurphy, netdev, linux-kernel
In-Reply-To: <20161202145748.GA19016@axis.com>

From: Jesper Nilsson <jesper.nilsson@axis.com>
Date: Fri, 2 Dec 2016 15:57:49 +0100

> According to the documentation, the PHYs supported by this driver
> can also support pause frames. Announce this to be so.
> Tested with a TI83822I.
> 
> Acked-by: Andrew F. Davis <afd@ti.com>
> Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>

Applied to net-next, thanks.

^ permalink raw reply

* Re: [PATCH iproute2/net-next v3 0/3] tc: flower: SCTP and other port fixes
From: Stephen Hemminger @ 2016-12-05 18:14 UTC (permalink / raw)
  To: Simon Horman; +Cc: netdev
In-Reply-To: <1480755160-27768-1-git-send-email-simon.horman@netronome.com>

On Sat,  3 Dec 2016 09:52:37 +0100
Simon Horman <simon.horman@netronome.com> wrote:

> Hi Stephen,
> 
> this short series:
> 
> * Makes some improvements to the documentation of flower;
>   A follow-up to recent work by Paul Blakey and myself.
> * Corrects some errors introduced when SCTP port matching support was
>   recently added.
> 
> Changes v2->v3:
> * Rebase
> * Dropped merged patch
> 
> Changes v1->v2:
> * Rebase
> 
> Simon Horman (3):
>   tc: flower: document SCTP ip_proto
>   tc: flower: correct name of ip_proto parameter to flower_parse_port()
>   tc: flower: make use of flower_port_attr_type() safe and silent
> 
>  man/man8/tc-flower.8 | 14 +++++++-------
>  tc/f_flower.c        | 32 +++++++++++++++++---------------
>  2 files changed, 24 insertions(+), 22 deletions(-)
> 

Applied to net-next

^ permalink raw reply

* Re: [net PATCH 0/2] IPv4 FIB suffix length fixes
From: David Miller @ 2016-12-05 18:16 UTC (permalink / raw)
  To: alexander.h.duyck; +Cc: netdev, rshearma
In-Reply-To: <20161201121605.15499.13176.stgit@ahduyck-blue-test.jf.intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>
Date: Thu, 01 Dec 2016 07:27:47 -0500

> In reviewing the patch from Robert Shearman and looking over the code I
> realized there were a few different bugs we were still carrying in the IPv4
> FIB lookup code.
> 
> These two patches are based off of Robert's original patch, but take things
> one step further by splitting them up to address two additional issues I
> found.
> 
> So first have Robert's original patch which was addressing the fact that
> us calling update_suffix in resize is expensive when it is called per add.
> To address that I incorporated the core bit of the patch which was us
> dropping the update_suffix call from resize.
> 
> The first patch in the series does a rename and fix on the push_suffix and
> pull_suffix code.  Specifically we drop the need to pass a leaf and
> secondly we fix things so we pull the suffix as long as the value of the
> suffix in the node is dropping.
> 
> The second patch addresses the original issue reported as well as
> optimizing the code for the fact that update_suffix is only really meant to
> go through and clean things up when we are decreasing a suffix.  I had
> originally added code for it to somehow cause an increase, but if we push
> the suffix when a new leaf is added we only ever have to handle pulling
> down the suffix with update_suffix so I updated the code to reflect that.
> 
> As far as side effects the only ones I think that will be obvious should be
> the fact that some routes may be able to be found earlier since before we
> relied on resize to update the suffix lengths, and now we are updating them
> before we add or remove the leaf.

Series applied and queued up for -stable, thanks Alex.

^ permalink raw reply

* Re: [PATCH 0/8] irda: w83977af_ir: Neatening
From: Joe Perches @ 2016-12-05 18:16 UTC (permalink / raw)
  To: netdev, David Miller; +Cc: Arnd Bergmann, Samuel Ortiz, linux-kernel
In-Reply-To: <cover.1480014088.git.joe@perches.com>

On Thu, 2016-11-24 at 11:10 -0800, Joe Perches wrote:
> On top of Arnd's overly long udelay patch because I noticed a
> misindented block.
> 
> Even though I haven't turned on the netwinder in a box in in the
> garage in who knows how long, if this device is still used somewhere,
> might as well neaten the code too.

Marked as "changes requested" in patchwork?

I didn't see any change requested.

^ permalink raw reply

* Re: [PATCH] net: ping: check minimum size on ICMP header length
From: David Miller @ 2016-12-05 18:19 UTC (permalink / raw)
  To: keescook
  Cc: netdev, mchong, i, kuznet, jmorris, yoshfuji, kaber, linux-kernel
In-Reply-To: <20161203005853.GA117599@beast>

From: Kees Cook <keescook@chromium.org>
Date: Fri, 2 Dec 2016 16:58:53 -0800

> diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
> index 205e2000d395..8257be3f032c 100644
> --- a/net/ipv4/ping.c
> +++ b/net/ipv4/ping.c
> @@ -654,7 +654,7 @@ int ping_common_sendmsg(int family, struct msghdr *msg, size_t len,
>  			void *user_icmph, size_t icmph_len) {
>  	u8 type, code;
>  
> -	if (len > 0xFFFF)
> +	if (len > 0xFFFF || len < icmph_len)
>  		return -EMSGSIZE;

As suggested by Lorenzo, please use -EINVAL here.

Thanks.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox