Netdev List


* [v7, 3/5] dt: move guts devicetree doc out of powerpc directory
From: Yangbo Lu @ 2016-04-01  3:07 UTC (permalink / raw)
  To: devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-clk-u79uwXL29TY76Z2rM5mHXA,
	linux-i2c-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-mmc-u79uwXL29TY76Z2rM5mHXA
  Cc: ulf.hansson-QSEj5FYQhm4dnm+yROfE0A, Zhao Qiang, Russell King,
	Yangbo Lu, Bhupesh Sharma, Santosh Shilimkar, Jochen Friedrich,
	scott.wood-3arQi8VN3Tc, Rob Herring, Claudiu Manoil, Kumar Gala,
	leoyang.li-3arQi8VN3Tc, xiaobo.xie-3arQi8VN3Tc
In-Reply-To: <1459480051-3701-1-git-send-email-yangbo.lu-3arQi8VN3Tc@public.gmane.org>

Move the guts devicetree doc to Documentation/devicetree/bindings/soc/fsl/
since it is used not only by PowerPC but also by ARM. Also add a
specification for the 'little-endian' property.

Signed-off-by: Yangbo Lu <yangbo.lu-3arQi8VN3Tc@public.gmane.org>
---
Changes for v2:
	- None
Changes for v3:
	- None
Changes for v4:
	- Added this patch
Changes for v5:
	- Modified the description for little-endian property
Changes for v6:
	- None
Changes for v7:
	- None
---
 Documentation/devicetree/bindings/{powerpc => soc}/fsl/guts.txt | 3 +++
 1 file changed, 3 insertions(+)
 rename Documentation/devicetree/bindings/{powerpc => soc}/fsl/guts.txt (91%)

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/guts.txt b/Documentation/devicetree/bindings/soc/fsl/guts.txt
similarity index 91%
rename from Documentation/devicetree/bindings/powerpc/fsl/guts.txt
rename to Documentation/devicetree/bindings/soc/fsl/guts.txt
index b71b203..07adca9 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/guts.txt
+++ b/Documentation/devicetree/bindings/soc/fsl/guts.txt
@@ -25,6 +25,9 @@ Recommended properties:
  - fsl,liodn-bits : Indicates the number of defined bits in the LIODN
    registers, for those SOCs that have a PAMU device.
 
+ - little-endian : Indicates that the global utilities block is little
+   endian. The default is big endian.
+
 Examples:
 	global-utilities@e0000 {	/* global utilities block */
 		compatible = "fsl,mpc8548-guts";
-- 
2.1.0.27.g96db324

^ permalink raw reply related
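
The 'little-endian' property in the binding above tells a driver which byte order to use when it reads the global utilities registers. A minimal userspace sketch of that choice follows; guts_read32() and its parameters are hypothetical names for illustration and are not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): assemble a 32-bit register
 * value from its four bytes in memory order. Big endian is the default,
 * as the binding text says; little_endian models the new property.
 */
static uint32_t guts_read32(const uint8_t reg[4], bool little_endian)
{
	if (little_endian)
		return (uint32_t)reg[0] | ((uint32_t)reg[1] << 8) |
		       ((uint32_t)reg[2] << 16) | ((uint32_t)reg[3] << 24);

	return ((uint32_t)reg[0] << 24) | ((uint32_t)reg[1] << 16) |
	       ((uint32_t)reg[2] << 8) | (uint32_t)reg[3];
}
```

The same four bytes decode to different values depending on the flag, which is why a wrong or missing property would make every register read look corrupted.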

* [v7, 4/5] powerpc/fsl: move mpc85xx.h to include/linux/fsl
From: Yangbo Lu @ 2016-04-01  3:07 UTC (permalink / raw)
  To: devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-clk-u79uwXL29TY76Z2rM5mHXA,
	linux-i2c-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-mmc-u79uwXL29TY76Z2rM5mHXA
  Cc: ulf.hansson-QSEj5FYQhm4dnm+yROfE0A, Zhao Qiang, Russell King,
	Yangbo Lu, Bhupesh Sharma, Santosh Shilimkar, Jochen Friedrich,
	scott.wood-3arQi8VN3Tc, Rob Herring, Claudiu Manoil, Kumar Gala,
	leoyang.li-3arQi8VN3Tc, xiaobo.xie-3arQi8VN3Tc
In-Reply-To: <1459480051-3701-1-git-send-email-yangbo.lu-3arQi8VN3Tc@public.gmane.org>

Move mpc85xx.h to include/linux/fsl and rename it to svr.h so that it
serves as a common header file. It has been used for mpc85xx and will
also be used for ARM-based SoCs.

Signed-off-by: Yangbo Lu <yangbo.lu-3arQi8VN3Tc@public.gmane.org>
Acked-by: Wolfram Sang <wsa-z923LK4zBo2bacvFa/9K2g@public.gmane.org>
---
Changes for v2:
	- None
Changes for v3:
	- None
Changes for v4:
	- None
Changes for v5:
	- Changed to Move mpc85xx.h to include/linux/fsl/
	- Adjusted '#include <linux/fsl/svr.h>' position in file
Changes for v6:
	- None
Changes for v7:
	- Added 'Acked-by: Wolfram Sang' for I2C part
	- Also applied to arch/powerpc/kernel/cpu_setup_fsl_booke.S
---
 arch/powerpc/kernel/cpu_setup_fsl_booke.S                     | 2 +-
 drivers/clk/clk-qoriq.c                                       | 3 +--
 drivers/i2c/busses/i2c-mpc.c                                  | 2 +-
 drivers/iommu/fsl_pamu.c                                      | 3 +--
 drivers/net/ethernet/freescale/gianfar.c                      | 2 +-
 arch/powerpc/include/asm/mpc85xx.h => include/linux/fsl/svr.h | 4 ++--
 6 files changed, 7 insertions(+), 9 deletions(-)
 rename arch/powerpc/include/asm/mpc85xx.h => include/linux/fsl/svr.h (97%)

diff --git a/arch/powerpc/kernel/cpu_setup_fsl_booke.S b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
index 462aed9..2b0284e 100644
--- a/arch/powerpc/kernel/cpu_setup_fsl_booke.S
+++ b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
@@ -13,13 +13,13 @@
  *
  */
 
+#include <linux/fsl/svr.h>
 #include <asm/page.h>
 #include <asm/processor.h>
 #include <asm/cputable.h>
 #include <asm/ppc_asm.h>
 #include <asm/mmu-book3e.h>
 #include <asm/asm-offsets.h>
-#include <asm/mpc85xx.h>
 
 _GLOBAL(__e500_icache_setup)
 	mfspr	r0, SPRN_L1CSR1
diff --git a/drivers/clk/clk-qoriq.c b/drivers/clk/clk-qoriq.c
index 7bc1c45..fc7f722 100644
--- a/drivers/clk/clk-qoriq.c
+++ b/drivers/clk/clk-qoriq.c
@@ -13,6 +13,7 @@
 #include <linux/clk.h>
 #include <linux/clk-provider.h>
 #include <linux/fsl/guts.h>
+#include <linux/fsl/svr.h>
 #include <linux/io.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
@@ -1148,8 +1149,6 @@ bad_args:
 }
 
 #ifdef CONFIG_PPC
-#include <asm/mpc85xx.h>
-
 static const u32 a4510_svrs[] __initconst = {
 	(SVR_P2040 << 8) | 0x10,	/* P2040 1.0 */
 	(SVR_P2040 << 8) | 0x11,	/* P2040 1.1 */
diff --git a/drivers/i2c/busses/i2c-mpc.c b/drivers/i2c/busses/i2c-mpc.c
index 48ecffe..600704c 100644
--- a/drivers/i2c/busses/i2c-mpc.c
+++ b/drivers/i2c/busses/i2c-mpc.c
@@ -27,9 +27,9 @@
 #include <linux/i2c.h>
 #include <linux/interrupt.h>
 #include <linux/delay.h>
+#include <linux/fsl/svr.h>
 
 #include <asm/mpc52xx.h>
-#include <asm/mpc85xx.h>
 #include <sysdev/fsl_soc.h>
 
 #define DRV_NAME "mpc-i2c"
diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
index a34355f..af8fb27 100644
--- a/drivers/iommu/fsl_pamu.c
+++ b/drivers/iommu/fsl_pamu.c
@@ -21,11 +21,10 @@
 #include "fsl_pamu.h"
 
 #include <linux/fsl/guts.h>
+#include <linux/fsl/svr.h>
 #include <linux/interrupt.h>
 #include <linux/genalloc.h>
 
-#include <asm/mpc85xx.h>
-
 /* define indexes for each operation mapping scenario */
 #define OMI_QMAN        0x00
 #define OMI_FMAN        0x01
diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index d2f917a..2224b10 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -86,11 +86,11 @@
 #include <linux/udp.h>
 #include <linux/in.h>
 #include <linux/net_tstamp.h>
+#include <linux/fsl/svr.h>
 
 #include <asm/io.h>
 #ifdef CONFIG_PPC
 #include <asm/reg.h>
-#include <asm/mpc85xx.h>
 #endif
 #include <asm/irq.h>
 #include <asm/uaccess.h>
diff --git a/arch/powerpc/include/asm/mpc85xx.h b/include/linux/fsl/svr.h
similarity index 97%
rename from arch/powerpc/include/asm/mpc85xx.h
rename to include/linux/fsl/svr.h
index 213f3a8..8d13836 100644
--- a/arch/powerpc/include/asm/mpc85xx.h
+++ b/include/linux/fsl/svr.h
@@ -9,8 +9,8 @@
  * (at your option) any later version.
  */
 
-#ifndef __ASM_PPC_MPC85XX_H
-#define __ASM_PPC_MPC85XX_H
+#ifndef FSL_SVR_H
+#define FSL_SVR_H
 
 #define SVR_REV(svr)	((svr) & 0xFF)		/* SOC design resision */
 #define SVR_MAJ(svr)	(((svr) >>  4) & 0xF)	/* Major revision field*/
-- 
2.1.0.27.g96db324

^ permalink raw reply related
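
The renamed header carries the SVR field accessors. A small sketch of how the two macros visible in the diff context decode a revision byte is given below; SKETCH_SVR_MIN, struct svr_rev and svr_decode_rev() are hypothetical names added for illustration, and the low-nibble minor field is an assumption since the diff does not show it.

```c
#include <stdint.h>

/* These two macros are copied from the header shown in the patch. */
#define SVR_REV(svr)	((svr) & 0xFF)		/* SoC design revision */
#define SVR_MAJ(svr)	(((svr) >>  4) & 0xF)	/* Major revision field */

/*
 * Assumption: the low nibble holds the minor revision. The patch
 * context does not show this macro, so it is named as a sketch.
 */
#define SKETCH_SVR_MIN(svr)	((svr) & 0xF)

/* Hypothetical container for a decoded revision. */
struct svr_rev {
	unsigned int major;
	unsigned int minor;
};

static struct svr_rev svr_decode_rev(uint32_t svr)
{
	struct svr_rev r = { SVR_MAJ(svr), SKETCH_SVR_MIN(svr) };

	return r;
}
```

Under these assumptions a revision byte of 0x20 decodes as major 2, minor 0, which is the "R2.0" notation used later in this series.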

* [v7, 5/5] mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0
From: Yangbo Lu @ 2016-04-01  3:07 UTC (permalink / raw)
  To: devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-clk-u79uwXL29TY76Z2rM5mHXA,
	linux-i2c-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-mmc-u79uwXL29TY76Z2rM5mHXA
  Cc: ulf.hansson-QSEj5FYQhm4dnm+yROfE0A, Zhao Qiang, Russell King,
	Yangbo Lu, Bhupesh Sharma, Santosh Shilimkar, Jochen Friedrich,
	scott.wood-3arQi8VN3Tc, Rob Herring, Claudiu Manoil, Kumar Gala,
	leoyang.li-3arQi8VN3Tc, xiaobo.xie-3arQi8VN3Tc
In-Reply-To: <1459480051-3701-1-git-send-email-yangbo.lu-3arQi8VN3Tc@public.gmane.org>

The eSDHC of T4240-R1.0-R2.0 reports an incorrect vendor version and spec
version. The correct version numbers are VVN=0x13 and SVN=0x1.
This patch adds GUTS driver support to the eSDHC driver so that it can read
the SVR (System Version Register), and fixes up the host version so that the
incorrect version numbers do not break ADMA data transfers.

Signed-off-by: Yangbo Lu <yangbo.lu-3arQi8VN3Tc@public.gmane.org>
Acked-by: Ulf Hansson <ulf.hansson-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
---
Changes for v2:
	- Got SVR through iomap instead of dts
Changes for v3:
	- Managed GUTS through syscon instead of iomap in eSDHC driver
Changes for v4:
	- Got SVR by GUTS driver instead of SYSCON
Changes for v5:
	- Changed to get SVR through API fsl_guts_get_svr()
	- Combined patch 4, patch 5 and patch 6 into one
Changes for v6:
	- Added 'Acked-by: Ulf Hansson'
Changes for v7:
	- None
---
 drivers/mmc/host/Kconfig          |  1 +
 drivers/mmc/host/sdhci-of-esdhc.c | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
index 04feea8..5743b05 100644
--- a/drivers/mmc/host/Kconfig
+++ b/drivers/mmc/host/Kconfig
@@ -142,6 +142,7 @@ config MMC_SDHCI_OF_ESDHC
 	depends on MMC_SDHCI_PLTFM
 	depends on PPC || ARCH_MXC || ARCH_LAYERSCAPE
 	select MMC_SDHCI_IO_ACCESSORS
+	select FSL_GUTS
 	help
 	  This selects the Freescale eSDHC controller support.
 
diff --git a/drivers/mmc/host/sdhci-of-esdhc.c b/drivers/mmc/host/sdhci-of-esdhc.c
index 3f34d35..68cc020 100644
--- a/drivers/mmc/host/sdhci-of-esdhc.c
+++ b/drivers/mmc/host/sdhci-of-esdhc.c
@@ -18,6 +18,8 @@
 #include <linux/of.h>
 #include <linux/delay.h>
 #include <linux/module.h>
+#include <linux/fsl/svr.h>
+#include <linux/fsl/guts.h>
 #include <linux/mmc/host.h>
 #include "sdhci-pltfm.h"
 #include "sdhci-esdhc.h"
@@ -28,6 +30,8 @@
 struct sdhci_esdhc {
 	u8 vendor_ver;
 	u8 spec_ver;
+	u32 soc_ver;
+	u8 soc_rev;
 };
 
 /**
@@ -73,6 +77,8 @@ static u32 esdhc_readl_fixup(struct sdhci_host *host,
 static u16 esdhc_readw_fixup(struct sdhci_host *host,
 				     int spec_reg, u32 value)
 {
+	struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
+	struct sdhci_esdhc *esdhc = sdhci_pltfm_priv(pltfm_host);
 	u16 ret;
 	int shift = (spec_reg & 0x2) * 8;
 
@@ -80,6 +86,13 @@ static u16 esdhc_readw_fixup(struct sdhci_host *host,
 		ret = value & 0xffff;
 	else
 		ret = (value >> shift) & 0xffff;
+
+	/* Workaround for T4240-R1.0-R2.0 eSDHC which has incorrect
+	 * vendor version and spec version information.
+	 */
+	if ((spec_reg == SDHCI_HOST_VERSION) &&
+	    (esdhc->soc_ver == SVR_T4240) && (esdhc->soc_rev <= 0x20))
+		ret = (VENDOR_V_23 << SDHCI_VENDOR_VER_SHIFT) | SDHCI_SPEC_200;
 	return ret;
 }
 
@@ -567,10 +580,20 @@ static void esdhc_init(struct platform_device *pdev, struct sdhci_host *host)
 	struct sdhci_pltfm_host *pltfm_host;
 	struct sdhci_esdhc *esdhc;
 	u16 host_ver;
+	u32 svr;
 
 	pltfm_host = sdhci_priv(host);
 	esdhc = sdhci_pltfm_priv(pltfm_host);
 
+	fsl_guts_init();
+	svr = fsl_guts_get_svr();
+	if (svr) {
+		esdhc->soc_ver = SVR_SOC_VER(svr);
+		esdhc->soc_rev = SVR_REV(svr);
+	} else {
+		dev_err(&pdev->dev, "Failed to get SVR value!\n");
+	}
+
 	host_ver = sdhci_readw(host, SDHCI_HOST_VERSION);
 	esdhc->vendor_ver = (host_ver & SDHCI_VENDOR_VER_MASK) >>
 			     SDHCI_VENDOR_VER_SHIFT;
-- 
2.1.0.27.g96db324

^ permalink raw reply related
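
The host version workaround in the patch above can be modelled in userspace as follows. This is only a sketch: every SKETCH_ constant is an assumed value chosen for illustration (the real definitions live in kernel headers that the patch does not show), and sketch_readw_fixup() is a hypothetical stand-in for esdhc_readw_fixup() that takes the SoC version and revision as plain arguments.

```c
#include <stdint.h>

/* Assumed values for illustration; not taken from the patch. */
#define SKETCH_HOST_VERSION	0xFE		/* host version register offset */
#define SKETCH_VENDOR_VER_SHIFT	8		/* vendor version in bits 15:8 */
#define SKETCH_VENDOR_V_23	0x13		/* VVN from the commit message */
#define SKETCH_SPEC_200		0x1		/* SVN from the commit message */
#define SKETCH_SVR_T4240	0x824000u	/* assumed SoC version code */
#define SKETCH_REV_2_0		0x20		/* revision bound used in the patch */

static uint16_t sketch_readw_fixup(int spec_reg, uint32_t value,
				   uint32_t soc_ver, uint8_t soc_rev)
{
	/* Select the 16-bit half of the 32-bit read, as the driver does. */
	int shift = (spec_reg & 0x2) * 8;
	uint16_t ret;

	if (spec_reg == SKETCH_HOST_VERSION)
		ret = value & 0xffff;
	else
		ret = (value >> shift) & 0xffff;

	/* Override the reported version on affected silicon only. */
	if (spec_reg == SKETCH_HOST_VERSION &&
	    soc_ver == SKETCH_SVR_T4240 && soc_rev <= SKETCH_REV_2_0)
		ret = (SKETCH_VENDOR_V_23 << SKETCH_VENDOR_VER_SHIFT) |
		      SKETCH_SPEC_200;

	return ret;
}
```

With these assumed constants the override yields 0x1301 for revisions up to 0x20, while later revisions and all other registers pass through unchanged.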

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
From: Eric Dumazet @ 2016-04-01  3:13 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: davem, netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek
In-Reply-To: <56FDD67E.2040904@stressinduktion.org>

On Fri, 2016-04-01 at 04:01 +0200, Hannes Frederic Sowa wrote:

> I thought so first, as well. But given the double check for the 
> spin_lock and the "mutex" we end up with the same result for the 
> lockdep_sock_is_held check.
> 
> Do you see other consequences?

Well, we release the spinlock in __release_sock()

So another thread could come and acquire the socket, then call
mutex_acquire() while the first thread did not call yet mutex_release()

So maybe lockdep will complain (but I do not know lockdep enough to
tell)

So maybe the following would be better :

(Absolutely untested, really I need to take a break)

diff --git a/include/net/sock.h b/include/net/sock.h
index 255d3e03727b..7d5dfa7e1918 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1327,7 +1327,13 @@ static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
 
 static inline void sock_release_ownership(struct sock *sk)
 {
-	sk->sk_lock.owned = 0;
+	if (sk->sk_lock.owned) {
+		/*
+		 * The sk_lock has mutex_unlock() semantics:
+		 */
+		mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);
+		sk->sk_lock.owned = 0;
+	}
 }
 
 /*
diff --git a/net/core/sock.c b/net/core/sock.c
index b67b9aedb230..c7ab98e72346 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2429,10 +2429,6 @@ EXPORT_SYMBOL(lock_sock_nested);
 
 void release_sock(struct sock *sk)
 {
-	/*
-	 * The sk_lock has mutex_unlock() semantics:
-	 */
-	mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);
 
 	spin_lock_bh(&sk->sk_lock.slock);
 	if (sk->sk_backlog.tail)

^ permalink raw reply related
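
The idea in the untested diff above is to tie the lockdep release to the moment the owned flag is cleared, so the two can never disagree. A toy userspace model is sketched below; the struct and function names are hypothetical, and the integer counter merely stands in for lockdep's bookkeeping.

```c
#include <stdbool.h>

/* Userspace model; field names are hypothetical stand-ins. */
struct sketch_sock_lock {
	bool owned;	/* models sk->sk_lock.owned */
	int dep_held;	/* models lockdep's view of the "mutex" */
};

static void sketch_take_ownership(struct sketch_sock_lock *l)
{
	l->owned = true;
	l->dep_held++;		/* models mutex_acquire() */
}

static void sketch_release_ownership(struct sketch_sock_lock *l)
{
	/* Release the tracking only together with the flag. */
	if (l->owned) {
		l->dep_held--;	/* models mutex_release() */
		l->owned = false;
	}
}
```

In this model a second release is a no-op, so the tracked count cannot drop below zero even if the release path is reached twice.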

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
From: Hannes Frederic Sowa @ 2016-04-01  3:31 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek
In-Reply-To: <1459480383.6473.270.camel@edumazet-glaptop3.roam.corp.google.com>



On Fri, Apr 1, 2016, at 05:13, Eric Dumazet wrote:
> On Fri, 2016-04-01 at 04:01 +0200, Hannes Frederic Sowa wrote:
> 
> > I thought so first, as well. But given the double check for the 
> > spin_lock and the "mutex" we end up with the same result for the 
> > lockdep_sock_is_held check.
> > 
> > Do you see other consequences?
> 
> Well, we release the spinlock in __release_sock()
> 
> So another thread could come and acquire the socket, then call
> mutex_acquire() while the first thread did not call yet mutex_release()
> 
> So maybe lockdep will complain (but I do not know lockdep enough to
> tell)
> 
> So maybe the following would be better :
> 
> (Absolutely untested, really I need to take a break)

I quickly tested the patch and my scripts didn't show any splats so far.
This patch seems more consistent albeit I don't think it is relevant for
lockdep_sock_is_held as we only flip owned while holding slock. But this
definitely needs more review.

Thanks a lot!

^ permalink raw reply

* Re: qdisc spin lock
From: John Fastabend @ 2016-04-01  3:44 UTC (permalink / raw)
  To: Michael Ma, Cong Wang; +Cc: Linux Kernel Network Developers
In-Reply-To: <CAAmHdhxagKnLP1_5ZW7HTsVBu0TSFYKCvNstAEWN-NHrdnvvVQ@mail.gmail.com>

On 16-03-31 04:48 PM, Michael Ma wrote:
> I didn't really know that multiple qdiscs can be isolated using MQ so
> that each txq can be associated with a particular qdisc. Also we don't
> really have multiple interfaces...

MQ will assign a default qdisc to each txq and the default qdisc can
be changed to htb or any other qdisc of your choice.

> 
> With this MQ solution we'll still need to assign transmit queues to
> different classes by doing some math on the bandwidth limit if I
> understand correctly, which seems to be less convenient compared with
> a solution purely within HTB.
> 

Agreed.

> I assume that with this solution I can still share qdisc among
> multiple transmit queues - please let me know if this is not the case.

Nope sorry doesn't work that way unless you employ some sort of stacked
netdevice strategy which does start to get a bit complex. The basic hint
would be to stack some type of virtual netdev on top of a device and
run the htb qdisc there. Push traffic onto the netdev depending on the
class it belongs to. It's ugly, yes.

Noting all that I posted an RFC patch some time back to allow writing
qdiscs that do not require taking the lock. I'll try to respin these
and submit them when net-next opens again. The next logical step is to
write a "better" HTB probably using a shared counter and dropping the
requirement that it be exact.

Sorry I didn't get a chance to look at the paper in your post so not
sure if they suggest something similar or not.

Thanks,
John

> 
> 2016-03-31 15:16 GMT-07:00 Cong Wang <xiyou.wangcong@gmail.com>:
>> On Wed, Mar 30, 2016 at 12:20 AM, Michael Ma <make0818@gmail.com> wrote:
>>> As far as I understand the design of TC is to simplify locking schema
>>> and minimize the work in __qdisc_run so that throughput won’t be
>>> affected, especially with large packets. However if the scenario is
>>> that multiple classes in the queueing discipline only have the shaping
>>> limit, there isn’t really a necessary correlation between different
>>> classes. The only synchronization point should be when the packet is
>>> dequeued from the qdisc queue and enqueued to the transmit queue of
>>> the device. My question is – is it worth investing on avoiding the
>>> locking contention by partitioning the queue/lock so that this
>>> scenario is addressed with relatively smaller latency?
>>
>> If your HTB classes don't share bandwidth, why do you still make them
>> under the same hierarchy? IOW, you can just isolate them either with some
>> other qdisc or just separated interfaces.

^ permalink raw reply

* Re: [PATCH RFC net-next] net: core: Pass XPS select queue decision to skb_tx_hash
From: John Fastabend @ 2016-04-01  3:49 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Saeed Mahameed, Linux Netdev List, Eric Dumazet, Tom Herbert,
	Jiri Pirko, David S. Miller, John Fastabend
In-Reply-To: <CALzJLG-xJe6_-2a=djpLxBR5xQY562m06eLLCP04GdTrzmWJuQ@mail.gmail.com>

On 16-03-30 11:30 AM, Saeed Mahameed wrote:
> On Wed, Mar 30, 2016 at 8:04 PM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>>
>> OK, so let me see if I get this right now. This was the precedence
>> before the patch in the normal no select queue case,
>>
>>         (1) socket mapping sk_tx_queue_mapping iff !ooo_okay
>>         (2) xps
>>         (3) skb->queue_mapping
>>         (4) qoffset/qcount (hash over tc queues)
>>         (5) hash over num_tx_queues
>>
>> With this patch the precedence is a bit changed because
>> skb_tx_hash is always called.
>>
>>         (1) socket mapping sk_tx_queue_mapping iff !ooo_okay
>>         (2) skb->queue_mapping
>>         (3) qoffset/qcount
>>            (hash over tc queues if xps choice is > qcount)
>>         (4) xps
>>         (5) hash over num_tx_queues
>>
>> Sound right? Nice thing about this with correct configuration
>> of tc with qcount = xps_queues it sort of works as at least
> 
> Yes!
> For qcount = xps_queues, which is how almost all drivers' default
> configurations are set up, it works like a charm: xps selects the
> exact TC TX queue at the correct offset without any need for further
> SKB hashing.
> And even if XPS was also configured on a TC TX queue by mistake,
> this patch will detect that the xps hash is out of this TC
> offset/qcount range and will re-hash. But I don't see why a user
> or driver would use such a strange configuration.
> 
>> I expect it to. I think the question is are people OK with
>> letting skb->queue_mapping take precedence. I am at least
>> because it makes the skb edit queue_mapping action from tc
>> easier to use.
>>
> 
> skb->queue_mapping took precedence also before this patch, the only
> thing this patch came to change is how to compute the txq when
> skb->queue_mapping is not present, so we don't need to worry about
> this.
> 

I don't believe that is correct in the general case. Perhaps
in the ndo_select_queue path though. See this line,

        if (queue_index < 0 || skb->ooo_okay ||
            queue_index >= dev->real_num_tx_queues) {
                int new_index = get_xps_queue(dev, skb);
                if (new_index < 0)
                        new_index = skb_tx_hash(dev, skb);

The skb_tx_hash() routine is never called if xps is enabled.
And so we never get into the call to do this,

        if (skb_rx_queue_recorded(skb)) {
                hash = skb_get_rx_queue(skb);
                while (unlikely(hash >= num_tx_queues))
                        hash -= num_tx_queues;
                return hash;
        }

Right? FWIW I think that using queue_mapping before xps is better
because we can then use tc to pick the queue_mapping programmatically
for these special cases if wanted.

>> And just a comment on the code why not just move get_xps_queue
>> into skb_tx_hash at this point if its always being called as the
>> "hint". Then we avoid calling it in the case queue_mapping is
>> set.
>>
> 
> Very good point, the only place that calls skb_tx_hash(dev, skb) other
> than __netdev_pick_tx is mlx4 driver and they did it there just
> because they wanted to bypass XPS configuration if TC QoS is
> configured, with this fix we don't have to bypass XPS at all for when
> TC is configured.
> 
> I will change it.
> 

Great thanks.

^ permalink raw reply
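
The two quoted fragments in this message can be modelled in userspace as below. This is a sketch under assumptions: the function names are hypothetical, xps_index and hash_index are passed in as plain integers instead of being computed from an skb, and the final assignment of new_index to queue_index is assumed because the quoted snippet stops before it.

```c
#include <stdbool.h>

/* Fold a recorded RX queue into the TX queue range, as in the quoted loop. */
static int sketch_fold_rx_queue(int rx_queue, int num_tx_queues)
{
	while (rx_queue >= num_tx_queues)
		rx_queue -= num_tx_queues;
	return rx_queue;
}

/*
 * Model of the quoted selection. xps_index < 0 means XPS made no
 * choice; hash_index stands in for the skb_tx_hash() result.
 */
static int sketch_pick_tx(int queue_index, bool ooo_okay,
			  int real_num_tx_queues, int xps_index,
			  int hash_index)
{
	if (queue_index < 0 || ooo_okay ||
	    queue_index >= real_num_tx_queues) {
		int new_index = xps_index;

		/* The hash is consulted only when XPS has no answer. */
		if (new_index < 0)
			new_index = hash_index;
		queue_index = new_index;	/* assumed continuation */
	}
	return queue_index;
}
```

The model makes the point of the reply visible: whenever xps_index is non-negative, the hash path (and with it the recorded RX queue) is never consulted.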

* Re: [PATCH] rds: rds-stress show all zeros after few minutes
From: santosh.shilimkar @ 2016-04-01  3:59 UTC (permalink / raw)
  To: shamir rabinovitch, rds-devel, netdev; +Cc: davem
In-Reply-To: <1459385402-28449-1-git-send-email-shamir.rabinovitch@oracle.com>

Hi Shamir,

Nice to see this one on the list so soon.
Just to make $subject more relevant, how about the below?

RDS: fix congestion map corruption for PAGE_SIZE > 8k

On 3/30/16 5:50 PM, shamir rabinovitch wrote:
> The issue can be seen on platforms that use a page size of 8K or more
> while the rds fragment size is 4K. On those platforms a single page is
> shared between 2 or more rds fragments. Each fragment has its own
> offset and the rds cong map code needs to take this offset into account.
> Not taking this offset into account leads to reading the data fragment
> as a congestion map fragment and to a hang of the rds transmit due to far
> cong map corruption.
>
> Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
> Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> Tested-by: Anand Bibhuti <anand.bibhuti@oracle.com>
>
> Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com>
> ---
>   net/rds/ib_recv.c |    2 +-
>   net/rds/iw_recv.c |    2 +-
>   net/rds/page.c    |    5 +++--
>   3 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
> index 977fb86..abc8cc8 100644
> --- a/net/rds/ib_recv.c
> +++ b/net/rds/ib_recv.c
> @@ -796,7 +796,7 @@ static void rds_ib_cong_recv(struct rds_connection *conn,
>
>   		addr = kmap_atomic(sg_page(&frag->f_sg));
>
> -		src = addr + frag_off;
> +		src = addr + frag->f_sg.offset + frag_off;
>   		dst = (void *)map->m_page_addrs[map_page] + map_off;
>   		for (k = 0; k < to_copy; k += 8) {
>   			/* Record ports that became uncongested, ie
> diff --git a/net/rds/iw_recv.c b/net/rds/iw_recv.c
If you refresh the patch against 4.6-rc1, you won't need to
patch iw_recv.c :-)


> diff --git a/net/rds/page.c b/net/rds/page.c
> index 5a14e6d..715cbaa 100644
> --- a/net/rds/page.c
> +++ b/net/rds/page.c
> @@ -135,8 +135,9 @@ int rds_page_remainder_alloc(struct scatterlist *scat, unsigned long bytes,
>   			if (rem->r_offset != 0)
>   				rds_stats_inc(s_page_remainder_hit);
>
> -			rem->r_offset += bytes;
> -			if (rem->r_offset == PAGE_SIZE) {
> +			/* some hw (e.g. sparc) require aligned memory */
> +			rem->r_offset += ALIGN(bytes, 8);
> +			if (rem->r_offset >= PAGE_SIZE) {
>   				__free_page(rem->r_page);
>   				rem->r_page = NULL;
>   			}
>
Looks like I missed this hunk earlier. It doesn't belong in the
$subject patch. Could you please put it in a separate patch? I
will need more than just "some hw (e.g. sparc) require aligned memory"

Once you fix these, please repost the updated version, and I will add
them to the 4.7 queue. Thanks !!

Regards,
Santosh

^ permalink raw reply
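
Both hunks discussed in this review can be illustrated with a short userspace sketch. The helper names are hypothetical and the 8K page with two 4K fragments is an assumed layout matching the commit message; SKETCH_ALIGN8 stands in for the kernel's ALIGN(bytes, 8).

```c
#include <stddef.h>
#include <stdint.h>

/* Round up to the next multiple of 8, like ALIGN(bytes, 8). */
#define SKETCH_ALIGN8(x)	(((x) + 7u) & ~(size_t)7u)

/*
 * Source pointer for a fragment that lives at sg_offset inside a
 * shared page. Passing sg_offset == 0 models the pre-patch behaviour,
 * which always read from the start of the page.
 */
static const uint8_t *sketch_frag_src(const uint8_t *page_addr,
				      size_t sg_offset, size_t frag_off)
{
	return page_addr + sg_offset + frag_off;
}

/* Advance the page remainder offset by an aligned amount. */
static size_t sketch_remainder_advance(size_t r_offset, size_t bytes)
{
	return r_offset + SKETCH_ALIGN8(bytes);
}
```

With aligned increments the offset can step past the end of the page instead of landing on it exactly (for example 4088 + 16 = 4104 on a 4096-byte page), which is presumably why the second hunk also changes the comparison from == to >=.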

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
From: Alexei Starovoitov @ 2016-04-01  4:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Hannes Frederic Sowa, davem, netdev, sasha.levin, daniel,
	mkubecek
In-Reply-To: <1459479818.6473.265.camel@edumazet-glaptop3.roam.corp.google.com>

On Thu, Mar 31, 2016 at 08:03:38PM -0700, Eric Dumazet wrote:
> On Thu, 2016-03-31 at 18:45 -0700, Alexei Starovoitov wrote:
> 
> > Eric, what's your take on Hannes's patch 2 ?
> > Is it more accurate to ask lockdep to check for actual lock
> > or lockdep can rely on owned flag?
> > Potentially there could be races between setting the flag and
> > actual lock... but that code is contained, so unlikely.
> > Will we find the real issues with this 'stronger' check or
> > just spend a ton of time adapting to new model like your other
> > patch for release_sock and whatever may need to come next...
> 
> More precise lockdep checks are certainly good, I only objected to 4/4
> trying to work around another bug.
> 
> But why do we rush for 'net' tree ?
> 
> This looks net-next material to me.
> 
> Locking changes are often subtle, lets take the time to do them
> properly.

completely agree. I think only first patch belongs in net.
Everything else is net-next material.

^ permalink raw reply

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
From: Hannes Frederic Sowa @ 2016-04-01  4:12 UTC (permalink / raw)
  To: Alexei Starovoitov, Eric Dumazet
  Cc: davem, netdev, sasha.levin, daniel, mkubecek
In-Reply-To: <20160401040442.GA14661@ast-mbp.thefacebook.com>

On 01.04.2016 06:04, Alexei Starovoitov wrote:
> On Thu, Mar 31, 2016 at 08:03:38PM -0700, Eric Dumazet wrote:
>> On Thu, 2016-03-31 at 18:45 -0700, Alexei Starovoitov wrote:
>>
>>> Eric, what's your take on Hannes's patch 2 ?
>>> Is it more accurate to ask lockdep to check for actual lock
>>> or lockdep can rely on owned flag?
>>> Potentially there could be races between setting the flag and
>>> actual lock... but that code is contained, so unlikely.
>>> Will we find the real issues with this 'stronger' check or
>>> just spend a ton of time adapting to new model like your other
>>> patch for release_sock and whatever may need to come next...
>>
>> More precise lockdep checks are certainly good, I only objected to 4/4
>> trying to work around another bug.
>>
>> But why do we rush for 'net' tree ?
>>
>> This looks net-next material to me.
>>
>> Locking changes are often subtle, lets take the time to do them
>> properly.
>
> completely agree. I think only first patch belongs in net.
> Everything else is net-next material.

Problem with first patch is that it uses lock_sock_fast, thus the
current sock_owned_by_user check doesn't get rid of the lockdep warning. :/

Thus we would need to go with the two first patches. Do you think it is 
acceptable? I actually didn't see a problem and testing showed no 
problems so far.

Bye,
Hannes

^ permalink raw reply

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
From: Alexei Starovoitov @ 2016-04-01  4:26 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Eric Dumazet, davem, netdev, sasha.levin, daniel, mkubecek
In-Reply-To: <56FDF541.6070106@stressinduktion.org>

On Fri, Apr 01, 2016 at 06:12:49AM +0200, Hannes Frederic Sowa wrote:
> On 01.04.2016 06:04, Alexei Starovoitov wrote:
> >On Thu, Mar 31, 2016 at 08:03:38PM -0700, Eric Dumazet wrote:
> >>On Thu, 2016-03-31 at 18:45 -0700, Alexei Starovoitov wrote:
> >>
> >>>Eric, what's your take on Hannes's patch 2 ?
> >>>Is it more accurate to ask lockdep to check for actual lock
> >>>or lockdep can rely on owned flag?
> >>>Potentially there could be races between setting the flag and
> >>>actual lock... but that code is contained, so unlikely.
> >>>Will we find the real issues with this 'stronger' check or
> >>>just spend a ton of time adapting to new model like your other
> >>>patch for release_sock and whatever may need to come next...
> >>
> >>More precise lockdep checks are certainly good, I only objected to 4/4
> >>trying to work around another bug.
> >>
> >>But why do we rush for 'net' tree ?
> >>
> >>This looks net-next material to me.
> >>
> >>Locking changes are often subtle, lets take the time to do them
> >>properly.
> >
> >completely agree. I think only first patch belongs in net.
> >Everything else is net-next material.
> 
> Problem with first patch is that it uses lock_sock_fast, thus the current
> sock_owned_by_user check doesn't get rid of the lockdep warning. :/
> 
> Thus we would need to go with the two first patches. Do you think it is
> acceptable? I actually didn't see a problem and testing showed no problems
> so far.

I see. right. the patch 1 only makes sense when coupled with 2.
but now I'm not so sure that lockdep_is_held(&sk->sk_lock.slock)
is a valid check, since current sock_owned_by_user() is equivalent
to lockdep_is_held(&sk->sk_lock) only.
I would go with Daniel's approach. Much simpler to reason about.

^ permalink raw reply

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
From: Hannes Frederic Sowa @ 2016-04-01  4:33 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric Dumazet, davem, netdev, sasha.levin, daniel, mkubecek
In-Reply-To: <20160401042655.GA16500@ast-mbp.thefacebook.com>

On 01.04.2016 06:26, Alexei Starovoitov wrote:
> On Fri, Apr 01, 2016 at 06:12:49AM +0200, Hannes Frederic Sowa wrote:
>> On 01.04.2016 06:04, Alexei Starovoitov wrote:
>>> On Thu, Mar 31, 2016 at 08:03:38PM -0700, Eric Dumazet wrote:
>>>> On Thu, 2016-03-31 at 18:45 -0700, Alexei Starovoitov wrote:
>>>>
>>>>> Eric, what's your take on Hannes's patch 2 ?
>>>>> Is it more accurate to ask lockdep to check for actual lock
>>>>> or lockdep can rely on owned flag?
>>>>> Potentially there could be races between setting the flag and
>>>>> actual lock... but that code is contained, so unlikely.
>>>>> Will we find the real issues with this 'stronger' check or
>>>>> just spend a ton of time adapting to new model like your other
>>>>> patch for release_sock and whatever may need to come next...
>>>>
>>>> More precise lockdep checks are certainly good, I only objected to 4/4
>>>> trying to work around another bug.
>>>>
>>>> But why do we rush for 'net' tree ?
>>>>
>>>> This looks net-next material to me.
>>>>
>>>> Locking changes are often subtle, lets take the time to do them
>>>> properly.
>>>
>>> completely agree. I think only first patch belongs in net.
>>> Everything else is net-next material.
>>
>> Problem with first patch is that it uses lock_sock_fast, thus the current
>> sock_owned_by_user check doesn't get rid of the lockdep warning. :/
>>
>> Thus we would need to go with the two first patches. Do you think it is
>> acceptable? I actually didn't see a problem and testing showed no problems
>> so far.
>
> I see. right. the patch 1 only makes sense when coupled with 2.
> but now I'm not so sure that lockdep_is_held(&sk->sk_lock.slock)
> is a valid check, since current sock_owned_by_user() is equivalent
> to lockdep_is_held(&sk->sk_lock) only.
> I would go with Daniel's approach. Much simpler to reason about.

IMHO we should treat sk_lock and sk_lock.slock the same as they are 
encapsulated by socket lock api.

I was rather afraid that we call those changed functions from within 
release_sock and thus would have the same problem again, where we get 
splats because of the time where we actually have user ownership but not 
the mark in the lockdep data structures. But this seems not to be the 
case as the functions are only directly called on behalf of user space.

Daniel, what do you think? I would be fine with your patch for net and 
we clean this up a bit in net-next then.

Bye,
Hannes

^ permalink raw reply

* Re: [PATCH net-next 1/6] net: skbuff: don't use union for napi_id and sender_cpu
From: Jason Wang @ 2016-04-01  4:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, mst, netdev, linux-kernel
In-Reply-To: <1459479325.6473.260.camel@edumazet-glaptop3.roam.corp.google.com>



On 04/01/2016 10:55 AM, Eric Dumazet wrote:
> On Fri, 2016-04-01 at 10:13 +0800, Jason Wang wrote:
>
>
>> The problem is we want to support busy polling for tun. This needs
>> napi_id to be passed to tun socket by sk_mark_napi_id() during
>> tun_net_xmit(). But before reaching this, XPS will set sender_cpu, which
>> means we can't see the correct napi_id.
>>
> Looks like napi_id should have precedence then ?

But then when busy polling is enabled, we may still hit the issue that
existed before commit 2bd82484bb4c5db1d5dc983ac7c409b2782e0154? So it looks
like sometimes (e.g. for tun) we need both fields.
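
To illustrate why both fields may be needed, here is a minimal user-space
sketch of two fields sharing storage in a union. The struct and function
names are hypothetical stand-ins and not the real struct sk_buff layout; the
sketch only shows that a write to one field replaces the value of the other.

```c
#include <assert.h>

/* Hypothetical stand-in for the two fields that share storage. */
struct toy_skb {
	union {
		unsigned int napi_id;
		unsigned int sender_cpu;
	};
};

/*
 * Store a napi_id, then store a sender_cpu value, and return what a
 * later reader of napi_id would see.
 */
unsigned int toy_read_napi_id_after_cpu_write(unsigned int napi_id,
					      unsigned int cpu_value)
{
	struct toy_skb skb;

	/* Both members occupy the same bytes. */
	assert(sizeof(struct toy_skb) == sizeof(unsigned int));

	skb.napi_id = napi_id;		/* receive path marks the skb */
	skb.sender_cpu = cpu_value;	/* a later write reuses the storage */

	return skb.napi_id;		/* the original napi_id is gone */
}
```

With separate fields instead of a union, the second store would leave
napi_id untouched, at the cost of extra bytes in the structure.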

>
> Only forwarding should allow the field to be cleared to allow XPS to do
> its job.
>
> Maybe skb_sender_cpu_clear() was removed too early (commit
> 64d4e3431e686dc37ce388ba531c4c4e866fb141)

Not sure I get you, but this will clear napi_id too.

> Look, it is 8pm here, I am pretty sure a solution can be found,
> but I also need to take a break, I started at 3am today...
>
>
>

^ permalink raw reply

* [PATCH (net.git) 0/3] stmmac MDIO and normal descr fixes
From: Giuseppe Cavallaro @ 2016-04-01  7:07 UTC (permalink / raw)
  To: netdev
  Cc: gabriel.fernandez, afaerber, fschaefer.oss, dinh.linux, davem,
	preid, rhgadsdon, linux-kernel, Giuseppe Cavallaro

This patch series fixes the problems below, which were recently debugged
on this mailing list:

o a problem on HW where the normal descriptor layout is used
o the mdio registration, according to the different
  platform configurations

I am resending all the patches, now built on top of the net.git repo.

Giuseppe Cavallaro (3):
  stmmac: fix TX normal DESC
  Revert "stmmac: Fix 'eth0: No PHY found' regression"
  stmmac: fix MDIO settings

 drivers/net/ethernet/stmicro/stmmac/norm_desc.c    |   16 ++--
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |   16 +---
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c  |   10 +--
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |   91 ++++++++++++++------
 include/linux/stmmac.h                             |    1 -
 5 files changed, 80 insertions(+), 54 deletions(-)

-- 
1.7.4.4

^ permalink raw reply

* [PATCH (net.git) 1/3] stmmac: fix TX normal DESC
From: Giuseppe Cavallaro @ 2016-04-01  7:07 UTC (permalink / raw)
  To: netdev
  Cc: gabriel.fernandez, afaerber, fschaefer.oss, dinh.linux, davem,
	preid, rhgadsdon, linux-kernel, Giuseppe Cavallaro,
	Fabrice Gasnier
In-Reply-To: <1459494436-27386-1-git-send-email-peppe.cavallaro@st.com>

This patch fixes a regression seen when testing on chips that use
the normal descriptor layout. In fact, no len bits were set in
TDES1 and no OWN bit was set in TDES0.
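
The ordering problem described above can be sketched in user space as
follows. This is only an illustration: the struct, the function name and the
bit positions are hypothetical and do not come from the driver headers. It
shows the control bits being written back to des1 before the length is
added, and the OWN bit being set in des0 rather than des1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define TOY_DES0_OWN		(1u << 31)
#define TOY_DES1_LAST_SEG	(1u << 30)
#define TOY_DES1_FIRST_SEG	(1u << 29)
#define TOY_DES1_LEN_MASK	0x7ffu

struct toy_desc {
	uint32_t des0;
	uint32_t des1;
};

void toy_prepare_tx_desc(struct toy_desc *p, int is_fs, int ls,
			 unsigned int len, int tx_own)
{
	uint32_t des1 = p->des1;

	assert(len <= TOY_DES1_LEN_MASK);

	if (is_fs)
		des1 |= TOY_DES1_FIRST_SEG;
	else
		des1 &= ~TOY_DES1_FIRST_SEG;

	if (ls)
		des1 |= TOY_DES1_LAST_SEG;

	/* Write the control bits back first ... */
	p->des1 = des1;

	/* ... then add the length, so the store above cannot wipe it. */
	p->des1 |= len & TOY_DES1_LEN_MASK;

	/* The OWN bit belongs to des0, not des1. */
	if (tx_own)
		p->des0 |= TOY_DES0_OWN;
}
```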

Signed-off-by: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Tested-by: Andreas Färber <afaerber@suse.de>
Cc: Fabrice Gasnier <fabrice.gasnier@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/norm_desc.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
index e13228f..011386f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
@@ -199,11 +199,6 @@ static void ndesc_prepare_tx_desc(struct dma_desc *p, int is_fs, int len,
 {
 	unsigned int tdes1 = p->des1;
 
-	if (mode == STMMAC_CHAIN_MODE)
-		norm_set_tx_desc_len_on_chain(p, len);
-	else
-		norm_set_tx_desc_len_on_ring(p, len);
-
 	if (is_fs)
 		tdes1 |= TDES1_FIRST_SEGMENT;
 	else
@@ -217,10 +212,15 @@ static void ndesc_prepare_tx_desc(struct dma_desc *p, int is_fs, int len,
 	if (ls)
 		tdes1 |= TDES1_LAST_SEGMENT;
 
-	if (tx_own)
-		tdes1 |= TDES0_OWN;
-
 	p->des1 = tdes1;
+
+	if (mode == STMMAC_CHAIN_MODE)
+		norm_set_tx_desc_len_on_chain(p, len);
+	else
+		norm_set_tx_desc_len_on_ring(p, len);
+
+	if (tx_own)
+		p->des0 |= TDES0_OWN;
 }
 
 static void ndesc_set_tx_ic(struct dma_desc *p)
-- 
1.7.4.4

^ permalink raw reply related

* [PATCH (net.git) 2/3] Revert "stmmac: Fix 'eth0: No PHY found' regression"
From: Giuseppe Cavallaro @ 2016-04-01  7:07 UTC (permalink / raw)
  To: netdev
  Cc: gabriel.fernandez, afaerber, fschaefer.oss, dinh.linux, davem,
	preid, rhgadsdon, linux-kernel, Giuseppe Cavallaro
In-Reply-To: <1459494436-27386-1-git-send-email-peppe.cavallaro@st.com>

This reverts commit 88f8b1bb41c6208f81b6a480244533ded7b59493
due to problems on the GeekBox and Banana Pi M1 boards when
connected to a real transceiver instead of a switch via
fixed-link.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Gabriel Fernandez <gabriel.fernandez@linaro.org>
Cc: Andreas Färber <afaerber@suse.de>
Cc: Frank Schäfer <fschaefer.oss@googlemail.com>
Cc: Dinh Nguyen <dinh.linux@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c  |   11 ++++++++++-
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |    9 +--------
 include/linux/stmmac.h                             |    1 -
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index ea76129..af09ced 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -199,12 +199,21 @@ int stmmac_mdio_register(struct net_device *ndev)
 	struct stmmac_priv *priv = netdev_priv(ndev);
 	struct stmmac_mdio_bus_data *mdio_bus_data = priv->plat->mdio_bus_data;
 	int addr, found;
-	struct device_node *mdio_node = priv->plat->mdio_node;
+	struct device_node *mdio_node = NULL;
+	struct device_node *child_node = NULL;
 
 	if (!mdio_bus_data)
 		return 0;
 
 	if (IS_ENABLED(CONFIG_OF)) {
+		for_each_child_of_node(priv->device->of_node, child_node) {
+			if (of_device_is_compatible(child_node,
+						    "snps,dwmac-mdio")) {
+				mdio_node = child_node;
+				break;
+			}
+		}
+
 		if (mdio_node) {
 			netdev_dbg(ndev, "FOUND MDIO subnode\n");
 		} else {
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index dcbd2a1..9cf181f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -146,7 +146,6 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
 	struct device_node *np = pdev->dev.of_node;
 	struct plat_stmmacenet_data *plat;
 	struct stmmac_dma_cfg *dma_cfg;
-	struct device_node *child_node = NULL;
 
 	plat = devm_kzalloc(&pdev->dev, sizeof(*plat), GFP_KERNEL);
 	if (!plat)
@@ -177,19 +176,13 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
 		plat->phy_node = of_node_get(np);
 	}
 
-	for_each_child_of_node(np, child_node)
-		if (of_device_is_compatible(child_node,	"snps,dwmac-mdio")) {
-			plat->mdio_node = child_node;
-			break;
-		}
-
 	/* "snps,phy-addr" is not a standard property. Mark it as deprecated
 	 * and warn of its use. Remove this when phy node support is added.
 	 */
 	if (of_property_read_u32(np, "snps,phy-addr", &plat->phy_addr) == 0)
 		dev_warn(&pdev->dev, "snps,phy-addr property is deprecated\n");
 
-	if ((plat->phy_node && !of_phy_is_fixed_link(np)) || !plat->mdio_node)
+	if ((plat->phy_node && !of_phy_is_fixed_link(np)) || plat->phy_bus_name)
 		plat->mdio_bus_data = NULL;
 	else
 		plat->mdio_bus_data =
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 4bcf5a6..6e53fa8 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -114,7 +114,6 @@ struct plat_stmmacenet_data {
 	int interface;
 	struct stmmac_mdio_bus_data *mdio_bus_data;
 	struct device_node *phy_node;
-	struct device_node *mdio_node;
 	struct stmmac_dma_cfg *dma_cfg;
 	int clk_csr;
 	int has_gmac;
-- 
1.7.4.4

^ permalink raw reply related

* [PATCH (net.git) 3/3] stmmac: fix MDIO settings
From: Giuseppe Cavallaro @ 2016-04-01  7:07 UTC (permalink / raw)
  To: netdev
  Cc: gabriel.fernandez, afaerber, fschaefer.oss, dinh.linux, davem,
	preid, rhgadsdon, linux-kernel, Giuseppe Cavallaro
In-Reply-To: <1459494436-27386-1-git-send-email-peppe.cavallaro@st.com>

Initially the phy_bus_name was added to manipulate the
driver name, but recently it was only used to manage the
fixed-link and to take some decisions at run-time.
So this patch uses is_pseudo_fixed_link instead and removes
the phy_bus_name variable, which is no longer necessary.

The driver can manage the mdio registration by using phy-handle,
dwmac-mdio and its own parameter, e.g. snps,phy-addr.
This patch takes care of all these possible configurations
and fixes the mdio registration when there is a real
transceiver or a switch (which needs to be managed by using
fixed-link).
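
As a rough user-space sketch of the decision described above (the struct and
function names are hypothetical; only the logic follows the stmmac_dt_phy()
helper added by this patch): the mdio bus is wanted by default, a fixed-link
without phy-handle turns it off, and a "snps,dwmac-mdio" sub-node turns it
back on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what was found in the device-tree node. */
struct toy_dt_props {
	bool has_phy_handle;	/* "phy-handle" property present */
	bool has_fixed_link;	/* fixed-link configured */
	bool has_mdio_subnode;	/* "snps,dwmac-mdio" child present */
};

/* Returns true when the mdio bus data should be allocated. */
bool toy_needs_mdio_bus(const struct toy_dt_props *props)
{
	bool mdio = true;

	assert(props != NULL);

	/* A fixed-link without phy-handle needs no mdio bus ... */
	if (!props->has_phy_handle && props->has_fixed_link)
		mdio = false;

	/* ... unless the mdio sub-node asks for it (e.g. for DSA). */
	if (props->has_mdio_subnode)
		mdio = true;

	return mdio;
}
```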

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Tested-by: Frank Schäfer <fschaefer.oss@googlemail.com>
Cc: Gabriel Fernandez <gabriel.fernandez@linaro.org>
Cc: Dinh Nguyen <dinh.linux@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Phil Reid <preid@electromag.com.au>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |   16 +---
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c  |   19 +----
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |   84 +++++++++++++++----
 include/linux/stmmac.h                             |    2 +-
 4 files changed, 73 insertions(+), 48 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 4c5ce98..78464fa 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -278,7 +278,6 @@ static void stmmac_eee_ctrl_timer(unsigned long arg)
  */
 bool stmmac_eee_init(struct stmmac_priv *priv)
 {
-	char *phy_bus_name = priv->plat->phy_bus_name;
 	unsigned long flags;
 	bool ret = false;
 
@@ -290,7 +289,7 @@ bool stmmac_eee_init(struct stmmac_priv *priv)
 		goto out;
 
 	/* Never init EEE in case of a switch is attached */
-	if (phy_bus_name && (!strcmp(phy_bus_name, "fixed")))
+	if (priv->phydev->is_pseudo_fixed_link)
 		goto out;
 
 	/* MAC core supports the EEE feature. */
@@ -827,12 +826,8 @@ static int stmmac_init_phy(struct net_device *dev)
 		phydev = of_phy_connect(dev, priv->plat->phy_node,
 					&stmmac_adjust_link, 0, interface);
 	} else {
-		if (priv->plat->phy_bus_name)
-			snprintf(bus_id, MII_BUS_ID_SIZE, "%s-%x",
-				 priv->plat->phy_bus_name, priv->plat->bus_id);
-		else
-			snprintf(bus_id, MII_BUS_ID_SIZE, "stmmac-%x",
-				 priv->plat->bus_id);
+		snprintf(bus_id, MII_BUS_ID_SIZE, "stmmac-%x",
+			 priv->plat->bus_id);
 
 		snprintf(phy_id_fmt, MII_BUS_ID_SIZE + 3, PHY_ID_FMT, bus_id,
 			 priv->plat->phy_addr);
@@ -871,9 +866,8 @@ static int stmmac_init_phy(struct net_device *dev)
 	}
 
 	/* If attached to a switch, there is no reason to poll phy handler */
-	if (priv->plat->phy_bus_name)
-		if (!strcmp(priv->plat->phy_bus_name, "fixed"))
-			phydev->irq = PHY_IGNORE_INTERRUPT;
+	if (phydev->is_pseudo_fixed_link)
+		phydev->irq = PHY_IGNORE_INTERRUPT;
 
 	pr_debug("stmmac_init_phy:  %s: attached to PHY (UID 0x%x)"
 		 " Link = %d\n", dev->name, phydev->phy_id, phydev->link);
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index af09ced..06704ca 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -198,29 +198,12 @@ int stmmac_mdio_register(struct net_device *ndev)
 	struct mii_bus *new_bus;
 	struct stmmac_priv *priv = netdev_priv(ndev);
 	struct stmmac_mdio_bus_data *mdio_bus_data = priv->plat->mdio_bus_data;
+	struct device_node *mdio_node = priv->plat->mdio_node;
 	int addr, found;
-	struct device_node *mdio_node = NULL;
-	struct device_node *child_node = NULL;
 
 	if (!mdio_bus_data)
 		return 0;
 
-	if (IS_ENABLED(CONFIG_OF)) {
-		for_each_child_of_node(priv->device->of_node, child_node) {
-			if (of_device_is_compatible(child_node,
-						    "snps,dwmac-mdio")) {
-				mdio_node = child_node;
-				break;
-			}
-		}
-
-		if (mdio_node) {
-			netdev_dbg(ndev, "FOUND MDIO subnode\n");
-		} else {
-			netdev_warn(ndev, "No MDIO subnode found\n");
-		}
-	}
-
 	new_bus = mdiobus_alloc();
 	if (new_bus == NULL)
 		return -ENOMEM;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 9cf181f..cf37ea5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -132,6 +132,69 @@ static struct stmmac_axi *stmmac_axi_setup(struct platform_device *pdev)
 }
 
 /**
+ * stmmac_dt_phy - parse device-tree driver parameters to allocate PHY resources
+ * @plat: driver data platform structure
+ * @np: device tree node
+ * @dev: device pointer
+ * Description:
+ * The mdio bus will be allocated in case of a phy transceiver is on board;
+ * it will be NULL if the fixed-link is configured.
+ * If there is the "snps,dwmac-mdio" sub-node the mdio will be allocated
+ * in any case (for DSA, mdio must be registered even if fixed-link).
+ * The table below sums the supported configurations:
+ *	-------------------------------
+ *	snps,phy-addr	|     Y
+ *	-------------------------------
+ *	phy-handle	|     Y
+ *	-------------------------------
+ *	fixed-link	|     N
+ *	-------------------------------
+ *	snps,dwmac-mdio	|
+ *	  even if	|     Y
+ *	fixed-link	|
+ *	-------------------------------
+ *
+ * It returns 0 in case of success otherwise -ENODEV.
+ */
+static int stmmac_dt_phy(struct plat_stmmacenet_data *plat,
+			 struct device_node *np, struct device *dev)
+{
+	bool mdio = true;
+
+	/* If phy-handle property is passed from DT, use it as the PHY */
+	plat->phy_node = of_parse_phandle(np, "phy-handle", 0);
+	if (plat->phy_node)
+		dev_dbg(dev, "Found phy-handle subnode\n");
+
+	/* If phy-handle is not specified, check if we have a fixed-phy */
+	if (!plat->phy_node && of_phy_is_fixed_link(np)) {
+		if ((of_phy_register_fixed_link(np) < 0))
+			return -ENODEV;
+
+		dev_dbg(dev, "Found fixed-link subnode\n");
+		plat->phy_node = of_node_get(np);
+		mdio = false;
+	}
+
+	/* If snps,dwmac-mdio is passed from DT, always register the MDIO */
+	for_each_child_of_node(np, plat->mdio_node) {
+		if (of_device_is_compatible(plat->mdio_node, "snps,dwmac-mdio"))
+			break;
+	}
+
+	if (plat->mdio_node) {
+		dev_dbg(dev, "Found MDIO subnode\n");
+		mdio = true;
+	}
+
+	if (mdio)
+		plat->mdio_bus_data =
+			devm_kzalloc(dev, sizeof(struct stmmac_mdio_bus_data),
+				     GFP_KERNEL);
+	return 0;
+}
+
+/**
  * stmmac_probe_config_dt - parse device-tree driver parameters
  * @pdev: platform_device structure
  * @plat: driver data platform structure
@@ -165,30 +228,15 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
 	/* Default to phy auto-detection */
 	plat->phy_addr = -1;
 
-	/* If we find a phy-handle property, use it as the PHY */
-	plat->phy_node = of_parse_phandle(np, "phy-handle", 0);
-
-	/* If phy-handle is not specified, check if we have a fixed-phy */
-	if (!plat->phy_node && of_phy_is_fixed_link(np)) {
-		if ((of_phy_register_fixed_link(np) < 0))
-			return ERR_PTR(-ENODEV);
-
-		plat->phy_node = of_node_get(np);
-	}
-
 	/* "snps,phy-addr" is not a standard property. Mark it as deprecated
 	 * and warn of its use. Remove this when phy node support is added.
 	 */
 	if (of_property_read_u32(np, "snps,phy-addr", &plat->phy_addr) == 0)
 		dev_warn(&pdev->dev, "snps,phy-addr property is deprecated\n");
 
-	if ((plat->phy_node && !of_phy_is_fixed_link(np)) || plat->phy_bus_name)
-		plat->mdio_bus_data = NULL;
-	else
-		plat->mdio_bus_data =
-			devm_kzalloc(&pdev->dev,
-				     sizeof(struct stmmac_mdio_bus_data),
-				     GFP_KERNEL);
+	/* To Configure PHY by using all device-tree supported properties */
+	if (stmmac_dt_phy(plat, np, &pdev->dev))
+		return ERR_PTR(-ENODEV);
 
 	of_property_read_u32(np, "tx-fifo-depth", &plat->tx_fifo_size);
 
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 6e53fa8..e6bc30a 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -108,12 +108,12 @@ struct stmmac_axi {
 };
 
 struct plat_stmmacenet_data {
-	char *phy_bus_name;
 	int bus_id;
 	int phy_addr;
 	int interface;
 	struct stmmac_mdio_bus_data *mdio_bus_data;
 	struct device_node *phy_node;
+	struct device_node *mdio_node;
 	struct stmmac_dma_cfg *dma_cfg;
 	int clk_csr;
 	int has_gmac;
-- 
1.7.4.4

^ permalink raw reply related

* Re: [PATCH] net: mvneta: replace MVNETA_CPU_D_CACHE_LINE_SIZE with L1_CACHE_BYTES
From: Jisheng Zhang @ 2016-04-01  7:15 UTC (permalink / raw)
  To: David Miller; +Cc: thomas.petazzoni, netdev, linux-kernel, linux-arm-kernel
In-Reply-To: <20160331.164710.1998871529914699937.davem@davemloft.net>

Hi David, Thomas,

On Thu, 31 Mar 2016 16:47:10 -0400 David Miller  wrote:

> From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> Date: Thu, 31 Mar 2016 22:37:35 +0200
> 
> > Hello,
> > 
> > On Thu, 31 Mar 2016 15:15:47 -0400 (EDT), David Miller wrote:  
> >> From: Jisheng Zhang <jszhang@marvell.com>
> >> Date: Wed, 30 Mar 2016 19:55:21 +0800
> >>   
> >> > The mvneta is also used in some Marvell berlin family SoCs which may
> >> > have 64bytes cacheline size. Replace the MVNETA_CPU_D_CACHE_LINE_SIZE
> >> > usage with L1_CACHE_BYTES.
> >> > 
> >> > And since dma_alloc_coherent() is always cacheline size aligned, so
> >> > remove the align checks.
> >> > 
> >> > Signed-off-by: Jisheng Zhang <jszhang@marvell.com>  
> >> 
> >> Applied.  
> > 
> > A new version of the patch was sent, which more rightfully uses
> > cache_line_size(), see:
> > 
> >  "[PATCH v2] net: mvneta: replace MVNETA_CPU_D_CACHE_LINE_SIZE with cache_line_size"  
> 
> Sorry about that.
> 
> Send me a relative fixup patch if you like.
> 

Sorry about the inconvenience, I'll send out a fixup patch.

Thanks,
Jisheng

^ permalink raw reply

* Section 4 No. 9,10 Failed was occurred by IPv6 Ready Logo Conformance Test
From: Yuki Machida @ 2016-04-01  7:31 UTC (permalink / raw)
  To: netdev

Hi all,

I tested 4.6-rc1 with the IPv6 Ready Logo Core Conformance Test.
4.6-rc1 has some FAILs in Section 4 (RFC 1981: Path MTU Discovery for IP version 6).
I confirmed that it PASSed on 3.14.28 and FAILed on 4.1.17.
I will look for the responsible patch between 3.14 and 4.1.

IPv6 Ready Logo
https://www.ipv6ready.org/
TAHI Project
http://www.tahi.org/

I ran the IPv6 Ready Logo Core Conformance Test on Intel D510MO (Atom D510).
It uses a userland built with the Yocto Project.

Test Environment
Test Specification          : 4.0.6
Tool Version                : REL_3_3_2
Test Program Version        : V6LC_5_0_0
Target Device               : Intel D510MO (Atom D510)

List of FAILs

Section 4: RFC 1981 - Path MTU Discovery for IPv6
- Test v6LC.4.1.6: Receiving MTU Below IPv6 Minimum Link MTU
  - No. 9 Part A: MTU equal to 56
  - No.10 Part B: MTU equal to 1279
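
For reference, the behaviour RFC 1981 describes for a Packet Too Big message
that reports an MTU below 1280 can be sketched as below. This is a user-space
illustration with hypothetical names, not kernel code: the path MTU estimate
is never reduced below the IPv6 minimum link MTU, and the node instead
remembers that later packets on that path must carry a Fragment header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_IPV6_MIN_MTU 1280u

/* Hypothetical per-destination state. */
struct toy_pmtu_state {
	unsigned int pmtu;	/* current path MTU estimate */
	bool frag_hdr;		/* add a Fragment header to later packets */
};

/* Process the MTU value reported by a Packet Too Big message. */
void toy_handle_ptb(struct toy_pmtu_state *st, unsigned int reported_mtu)
{
	assert(st != NULL);

	/* A Packet Too Big message never increases the estimate. */
	if (reported_mtu >= st->pmtu)
		return;

	if (reported_mtu < TOY_IPV6_MIN_MTU) {
		/* Do not go below the minimum link MTU. */
		st->pmtu = TOY_IPV6_MIN_MTU;
		st->frag_hdr = true;
	} else {
		st->pmtu = reported_mtu;
	}
}
```

Part A (MTU 56) and Part B (MTU 1279) both take the first branch in this
sketch.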

Regards,
Yuki Machida

^ permalink raw reply

* Re: Section 4 No. 9,10 Failed was occurred by IPv6 Ready Logo Conformance Test
From: Rongqing Li @ 2016-04-01  7:43 UTC (permalink / raw)
  To: Yuki Machida, netdev
In-Reply-To: <56FE23C4.2000706@jp.fujitsu.com>



On 2016年04月01日 15:31, Yuki Machida wrote:
> Hi all,
> 
> I tested 4.6-rc1 by IPv6 Ready Logo Core Conformance Test.
> 4.6-rc1 has some FAILs in Section 4 (RFC 1981: Path MTU Discovery for IP version 6).
> I confirmed that it was PASSed in 3.14.28 and it was FAILed in 4.1.17.
> I will find a patch between 3.14 and 4.1.
> 
> IPv6 Ready Logo
> https://www.ipv6ready.org/
> TAHI Project
> http://www.tahi.org/
> 
> I ran the IPv6 Ready Logo Core Conformance Test on Intel D510MO (Atom D510).
> It is using userland build with yocto project.
> 
> Test Environment
> Test Specification          : 4.0.6
> Tool Version                : REL_3_3_2
> Test Program Version        : V6LC_5_0_0
> Target Device               : Intel D510MO (Atom D510)
> 
> List of FAILs
> 
> Section 4: RFC 1981 - Path MTU Discovery for IPv6
> - Test v6LC.4.1.6: Receiving MTU Below IPv6 Minimum Link MTU
>    - No. 9 Part A: MTU equal to 56
>    - No.10 Part B: MTU equal to 1279
> 

apply this one

commit 8013d1d7eafb0589ca766db6b74026f76b7f5cb4
Author: Hangbin Liu <liuhangbin@gmail.com>
Date:   Thu Jul 30 14:28:42 2015 +0800

    net/ipv6: add sysctl option accept_ra_min_hop_limit

    Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
    disabled accept hop limit from RA if it is smaller than the current hop
    limit for security stuff. But this behavior kind of break the RFC
definition.

    RFC 4861, 6.3.4.  Processing Received Router Advertisements
       A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
       and Retrans Timer) may contain a value denoting that it is
       unspecified.  In such cases, the parameter should be ignored and the
       host should continue using whatever value it is already using.

       If the received Cur Hop Limit value is non-zero, the host SHOULD set
       its CurHopLimit variable to the received value.

    So add sysctl option accept_ra_min_hop_limit to let user choose the
minimum
    hop limit value they can accept from RA. And set default to 1 to
meet RFC
    standards.

    Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
    Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>





and revert the below one, the TAHI should be updated

commit 9d289715eb5c252ae15bd547cb252ca547a3c4f2
Author: Hagen Paul Pfeifer <hagen@jauu.net>
Date: Thu Jan 15 22:34:25 2015 +0100

    ipv6: stop sending PTB packets for MTU < 1280

    Reduce the attack vector and stop generating IPv6 Fragment Header for
    paths with an MTU smaller than the minimum required IPv6 MTU
    size (1280 byte) - called atomic fragments.

    See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1]
    for more information and how this "feature" can be misused.

    [1]
https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00

    Signed-off-by: Fernando Gont <fgont@si6networks.com>
    Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
    Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>



-Roy




> Regards,
> Yuki Machida
> 

-- 
Best Regards,
Roy | RongQing Li

^ permalink raw reply

* Re: Section 4 No. 9,10 Failed was occurred by IPv6 Ready Logo Conformance Test
From: Yuki Machida @ 2016-04-01  8:00 UTC (permalink / raw)
  To: Rongqing Li, netdev
In-Reply-To: <56FE26BE.2060209@windriver.com>

Hi Roy,

Thank you for your advice.
I am very glad.

Further comments below.

On 2016年04月01日 16:43, Rongqing Li wrote:
> 
> 
> On 2016年04月01日 15:31, Yuki Machida wrote:
>> Hi all,
>>
>> I tested 4.6-rc1 by IPv6 Ready Logo Core Conformance Test.
>> 4.6-rc1 has some FAILs in Section 4 (RFC 1981: Path MTU Discovery for IP version 6).
>> I confirmed that it was PASSed in 3.14.28 and it was FAILed in 4.1.17.
>> I will find a patch between 3.14 and 4.1.
>>
>> IPv6 Ready Logo
>> https://www.ipv6ready.org/
>> TAHI Project
>> http://www.tahi.org/
>>
>> I ran the IPv6 Ready Logo Core Conformance Test on Intel D510MO (Atom D510).
>> It is using userland build with yocto project.
>>
>> Test Environment
>> Test Specification          : 4.0.6
>> Tool Version                : REL_3_3_2
>> Test Program Version        : V6LC_5_0_0
>> Target Device               : Intel D510MO (Atom D510)
>>
>> List of FAILs
>>
>> Section 4: RFC 1981 - Path MTU Discovery for IPv6
>> - Test v6LC.4.1.6: Receiving MTU Below IPv6 Minimum Link MTU
>>     - No. 9 Part A: MTU equal to 56
>>     - No.10 Part B: MTU equal to 1279
>>
> 
> apply this one
> 
> commit 8013d1d7eafb0589ca766db6b74026f76b7f5cb4
> Author: Hangbin Liu <liuhangbin@gmail.com>
> Date:   Thu Jul 30 14:28:42 2015 +0800
> 
>      net/ipv6: add sysctl option accept_ra_min_hop_limit
> 
>      Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
>      disabled accept hop limit from RA if it is smaller than the current hop
>      limit for security stuff. But this behavior kind of break the RFC
> definition.
> 
>      RFC 4861, 6.3.4.  Processing Received Router Advertisements
>         A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
>         and Retrans Timer) may contain a value denoting that it is
>         unspecified.  In such cases, the parameter should be ignored and the
>         host should continue using whatever value it is already using.
> 
>         If the received Cur Hop Limit value is non-zero, the host SHOULD set
>         its CurHopLimit variable to the received value.
> 
>      So add sysctl option accept_ra_min_hop_limit to let user choose the
> minimum
>      hop limit value they can accept from RA. And set default to 1 to
> meet RFC
>      standards.
> 
>      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
>      Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
>      Signed-off-by: David S. Miller <davem@davemloft.net>

I confirmed that the above patch was applied in v4.3 in linux.git.

% git tag --contains=8013d1d7eafb0589ca766db6b74026f76b7f5cb4 | head
v4.3
v4.3-rc1
v4.3-rc2
v4.3-rc3
v4.3-rc4
v4.3-rc5
v4.3-rc6
v4.3-rc7
v4.4
v4.4-rc1

> 
> 
> 
> 
> 
> and revert the below one, the TAHI should be updated
> 
> commit 9d289715eb5c252ae15bd547cb252ca547a3c4f2
> Author: Hagen Paul Pfeifer <hagen@jauu.net>
> Date: Thu Jan 15 22:34:25 2015 +0100
> 
>      ipv6: stop sending PTB packets for MTU < 1280
> 
>      Reduce the attack vector and stop generating IPv6 Fragment Header for
>      paths with an MTU smaller than the minimum required IPv6 MTU
>      size (1280 byte) - called atomic fragments.
> 
>      See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1]
>      for more information and how this "feature" can be misused.
> 
>      [1]
> https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00
> 
>      Signed-off-by: Fernando Gont <fgont@si6networks.com>
>      Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
>      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
>      Signed-off-by: David S. Miller <davem@davemloft.net>

I will try.

> 
> 
> 
> -Roy
> 
> 
> 
> 
>> Regards,
>> Yuki Machida
>>
> 

^ permalink raw reply

* Re: [PATCH v2 net-next] net: ipv4: Consider unreachable nexthops in multipath routes
From: Julian Anastasov @ 2016-04-01  8:09 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev
In-Reply-To: <1459463081-20206-1-git-send-email-dsa@cumulusnetworks.com>


	Hello,

On Thu, 31 Mar 2016, David Ahern wrote:

> To maintain backward compatibility use of the neighbor information is
> based on a new sysctl, fib_multipath_use_neigh.
> 
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ---
> v2
> - use rcu locking to avoid refcnts per Eric's suggestion
> - only consider neighbor info for nh_scope == RT_SCOPE_LINK per Julian's
>   comment
> - drop the 'state == NUD_REACHABLE' from the state check since it is
>   part of NUD_VALID (comment from Julian)
> - wrapped the use of the neigh in a sysctl
> 
>  Documentation/networking/ip-sysctl.txt | 10 ++++++++++
>  include/net/netns/ipv4.h               |  3 +++
>  net/ipv4/fib_semantics.c               | 32 ++++++++++++++++++++++++++++++--
>  net/ipv4/sysctl_net_ipv4.c             | 11 +++++++++++
>  4 files changed, 54 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index b183e2b606c8..5b316d33a23f 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -63,6 +63,16 @@ fwmark_reflect - BOOLEAN
>  	fwmark of the packet they are replying to.
>  	Default: 0
>  
> +fib_multipath_use_neigh - BOOLEAN
> +	Use status of existing neighbor entry when determining nexthop for
> +	multipath routes. If disabled neighbor information is not used then
> +	packets could be directed to a dead nexthop. Only valid for kernels

	Some may associate "dead" with RTNH_F_DEAD while
in context of nexthop "failed" or "unreachable" can be
more suitable? Can we use something like this?:

	If disabled<COMMA_ALLOWED_HERE?> neighbor information is not used
and packets could be directed to a failed nexthop.

> +	built with CONFIG_IP_ROUTE_MULTIPATH enabled.
> +	Default: 0 (disabled)
> +	Possible values:
> +	0 - disabled
> +	1 - enabled
> +
>  route/max_size - INTEGER
>  	Maximum number of routes allowed in the kernel.  Increase
>  	this when using large numbers of interfaces and/or routes.
> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> index a69cde3ce460..d061ffeb1e71 100644
> --- a/include/net/netns/ipv4.h
> +++ b/include/net/netns/ipv4.h
> @@ -133,6 +133,9 @@ struct netns_ipv4 {
>  	struct fib_rules_ops	*mr_rules_ops;
>  #endif
>  #endif
> +#ifdef CONFIG_IP_ROUTE_MULTIPATH
> +	int sysctl_fib_multipath_use_neigh;
> +#endif
>  	atomic_t	rt_genid;
>  };
>  #endif
> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index d97268e8ff10..6d423faff0ce 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -1559,17 +1559,45 @@ int fib_sync_up(struct net_device *dev, unsigned int nh_flags)
>  }
>  
>  #ifdef CONFIG_IP_ROUTE_MULTIPATH
> +static bool fib_good_nh(const struct fib_nh *nh, struct net_device *dev)
> +{
> +	struct neighbour *n = NULL;
> +	int state = NUD_NONE;

	Looks like we can do it even better.
If we use NUD_REACHABLE here...

> +
> +	if (nh->nh_scope == RT_SCOPE_LINK) {
> +		rcu_read_lock_bh();
> +
> +		n = __neigh_lookup_noref(&arp_tbl, &nh->nh_gw, dev);
> +		if (n)
> +			state = n->nud_state;
> +
> +		rcu_read_unlock_bh();
> +	}
> +
> +	/* outside of rcu locking using n only as a boolean
> +	 * on whether a neighbor entry existed
> +	 */
> +	if (!n || (state & NUD_VALID))

	then check for '!n' is not needed. For bool type
'return state & NUD_VALID;' should work.

> +		return true;
> +
> +	return false;
> +}
>  
>  void fib_select_multipath(struct fib_result *res, int hash)
>  {
>  	struct fib_info *fi = res->fi;
> +	struct net_device *dev = fi->fib_dev;
> +	struct net *net = fi->fib_net;
>  
>  	for_nexthops(fi) {
>  		if (hash > atomic_read(&nh->nh_upper_bound))
>  			continue;
>  
> -		res->nh_sel = nhsel;
> -		return;
> +		if (!net->ipv4.sysctl_fib_multipath_use_neigh ||
> +		    fib_good_nh(nh, dev)) {

	This dev is from the first nexthop. It is better for
fib_good_nh to use nh->nh_dev instead; it is present for all
nexthops in a multipath route. We should not copy the bugs from
fib_detect_death.

> +			res->nh_sel = nhsel;
> +			return;
> +		}
>  	} endfor_nexthops(fi);

	So, you dropped the idea of giving a full chance for
fallback? Now if the last nexthop fails we do not fall back at all.
We promised to prefer reachable nexthops.
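
One possible reading of "full fallback", as a user-space sketch only (the
struct and function names are hypothetical, and this is not what the posted
patch does): prefer the hash-selected nexthop if it is usable, otherwise any
usable nexthop, and only then the hash-selected one regardless of state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified nexthop. */
struct toy_nh {
	int upper_bound;	/* upper hash bound for this nexthop */
	bool reachable;		/* neighbour entry looks usable */
};

/* Returns the selected index, or -1 if no nexthop matches at all. */
int toy_select_nh(const struct toy_nh *nh, int n, int hash, bool use_neigh)
{
	int hashed = -1;	/* first hash match, even if unreachable */
	int any_good = -1;	/* first usable nexthop anywhere */
	int i;

	assert(nh != NULL || n == 0);

	for (i = 0; i < n; i++) {
		bool good = !use_neigh || nh[i].reachable;

		if (good && any_good < 0)
			any_good = i;

		if (hash > nh[i].upper_bound)
			continue;

		if (good)
			return i;	/* hash match and usable */

		if (hashed < 0)
			hashed = i;
	}

	/* Prefer any usable nexthop over an unreachable hash match. */
	return any_good >= 0 ? any_good : hashed;
}
```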

Regards

^ permalink raw reply

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
From: Daniel Borkmann @ 2016-04-01  8:10 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Alexei Starovoitov
  Cc: Eric Dumazet, davem, netdev, sasha.levin, mkubecek
In-Reply-To: <56FDFA0F.5070104@stressinduktion.org>

On 04/01/2016 06:33 AM, Hannes Frederic Sowa wrote:
> On 01.04.2016 06:26, Alexei Starovoitov wrote:
>> On Fri, Apr 01, 2016 at 06:12:49AM +0200, Hannes Frederic Sowa wrote:
>>> On 01.04.2016 06:04, Alexei Starovoitov wrote:
>>>> On Thu, Mar 31, 2016 at 08:03:38PM -0700, Eric Dumazet wrote:
>>>>> On Thu, 2016-03-31 at 18:45 -0700, Alexei Starovoitov wrote:
>>>>>
>>>>>> Eric, what's your take on Hannes's patch 2 ?
>>>>>> Is it more accurate to ask lockdep to check for actual lock
>>>>>> or lockdep can rely on owned flag?
>>>>>> Potentially there could be races between setting the flag and
>>>>>> actual lock... but that code is contained, so unlikely.
>>>>>> Will we find the real issues with this 'stronger' check or
>>>>>> just spend a ton of time adapting to new model like your other
>>>>>> patch for release_sock and whatever may need to come next...
>>>>>
>>>>> More precise lockdep checks are certainly good, I only objected to 4/4
>>>>> trying to work around another bug.
>>>>>
>>>>> But why do we rush for 'net' tree ?
>>>>>
>>>>> This looks net-next material to me.
>>>>>
>>>>> Locking changes are often subtle, lets take the time to do them
>>>>> properly.
>>>>
>>>> completely agree. I think only first patch belongs in net.
>>>> Everything else is net-next material.
>>>
>>> Problem with first patch is that it uses lock_sock_fast, thus the current
>>> sock_owned_by_user check doesn't get rid of the lockdep warning. :/
>>>
>>> Thus we would need to go with the two first patches. Do you think it is
>>> acceptable? I actually didn't see a problem and testing showed no problems
>>> so far.
>>
>> I see. right. the patch 1 only makes sense when coupled with 2.
>> but now I'm not so sure that lockdep_is_held(&sk->sk_lock.slock)
>> is a valid check, since current sock_owned_by_user() is equivalent
>> to lockdep_is_held(&sk->sk_lock) only.
>> I would go with Daniel's approach. Much simpler to reason about.
>
> IMHO we should treat sk_lock and sk_lock.slock the same as they are encapsulated by socket lock api.
>
> I was rather afraid that we call those changed functions from within release_sock and thus would have the same problem again, where we get splats because of the time where we actually have user ownership but not the mark in the lockdep data structures. But this seems not to be the case as the functions are only directly called on behalf of user space.
>
> Daniel, what do you think? I would be fine with your patch for net and we clean this up a bit in net-next then.

Okay, that's fine by me.

Dave, do you need me to resubmit this one w/o changes: http://patchwork.ozlabs.org/patch/603903/ ?

Thanks,
Daniel

^ permalink raw reply

* [PATCH] bridge: remove br_dev_set_multicast_list
From: roy.qing.li @ 2016-04-01  8:16 UTC (permalink / raw)
  To: netdev

From: Li RongQing <roy.qing.li@gmail.com>

Remove br_dev_set_multicast_list, which does nothing.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
---
 net/bridge/br_device.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 2c8095a..75c7e00 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -123,10 +123,6 @@ static int br_dev_open(struct net_device *dev)
 	return 0;
 }
 
-static void br_dev_set_multicast_list(struct net_device *dev)
-{
-}
-
 static void br_dev_change_rx_flags(struct net_device *dev, int change)
 {
 	if (change & IFF_PROMISC)
@@ -329,7 +325,6 @@ static const struct net_device_ops br_netdev_ops = {
 	.ndo_start_xmit		 = br_dev_xmit,
 	.ndo_get_stats64	 = br_get_stats64,
 	.ndo_set_mac_address	 = br_set_mac_address,
-	.ndo_set_rx_mode	 = br_dev_set_multicast_list,
 	.ndo_change_rx_flags	 = br_dev_change_rx_flags,
 	.ndo_change_mtu		 = br_change_mtu,
 	.ndo_do_ioctl		 = br_dev_ioctl,
-- 
2.1.4
