Netdev List

* [PATCH v1 3/3] ARM: STi: Add STiH416 ethernet support.
From: srinivas.kandagatla @ 2014-02-03 12:02 UTC
  To: netdev
  Cc: Rob Herring, Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala,
	Rob Landley, Russell King, Srinivas Kandagatla, Stuart Menefy,
	Giuseppe Cavallaro, devicetree, linux-doc, linux-kernel,
	linux-arm-kernel, kernel, davem
In-Reply-To: <1391428787-27143-1-git-send-email-srinivas.kandagatla@st.com>

From: Srinivas Kandagatla <srinivas.kandagatla@st.com>

This patch adds support for the STiH416 SoC, which has two Ethernet
snps,dwmac controllers (version 3.710). With this patch, the B2000 and
B2020 boards can boot with Ethernet in MII and RGMII modes.

Tested on both B2020 and B2000.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
---
 arch/arm/boot/dts/stih416-clock.dtsi   |   14 ++++
 arch/arm/boot/dts/stih416-pinctrl.dtsi |  109 ++++++++++++++++++++++++++++++++
 arch/arm/boot/dts/stih416.dtsi         |   44 +++++++++++++
 3 files changed, 167 insertions(+)

diff --git a/arch/arm/boot/dts/stih416-clock.dtsi b/arch/arm/boot/dts/stih416-clock.dtsi
index 7026bf1..a6942c7 100644
--- a/arch/arm/boot/dts/stih416-clock.dtsi
+++ b/arch/arm/boot/dts/stih416-clock.dtsi
@@ -37,5 +37,19 @@
 			clock-frequency = <100000000>;
 			clock-output-names = "CLK_S_ICN_REG_0";
 		};
+
+		CLK_S_GMAC0_PHY: clockgenA1@7 {
+			#clock-cells = <0>;
+			compatible = "fixed-clock";
+			clock-frequency = <25000000>;
+			clock-output-names = "CLK_S_GMAC0_PHY";
+		};
+
+		CLK_S_ETH1_PHY: clockgenA0@7 {
+			#clock-cells = <0>;
+			compatible = "fixed-clock";
+			clock-frequency = <25000000>;
+			clock-output-names = "CLK_S_ETH1_PHY";
+		};
 	};
 };
diff --git a/arch/arm/boot/dts/stih416-pinctrl.dtsi b/arch/arm/boot/dts/stih416-pinctrl.dtsi
index 8863c38..c4beef2 100644
--- a/arch/arm/boot/dts/stih416-pinctrl.dtsi
+++ b/arch/arm/boot/dts/stih416-pinctrl.dtsi
@@ -132,6 +132,58 @@
 					};
 				};
 			};
+
+			gmac1 {
+				pinctrl_mii1: mii1 {
+					st,pins {
+						txd0 = <&PIO0 0 ALT1 OUT SE_NICLK_IO 0 CLK_A>;
+						txd1 = <&PIO0 1 ALT1 OUT SE_NICLK_IO 0 CLK_A>;
+						txd2 = <&PIO0 2 ALT1 OUT SE_NICLK_IO 0 CLK_A>;
+						txd3 = <&PIO0 3 ALT1 OUT SE_NICLK_IO 0 CLK_A>;
+						txer = <&PIO0 4 ALT1 OUT SE_NICLK_IO 0 CLK_A>;
+						txen = <&PIO0 5 ALT1 OUT SE_NICLK_IO 0 CLK_A>;
+						txclk = <&PIO0 6 ALT1 IN NICLK 0 CLK_A>;
+						col =   <&PIO0 7 ALT1 IN BYPASS 1000>;
+
+						mdio =  <&PIO1 0 ALT1 OUT BYPASS 1500>;
+						mdc =   <&PIO1 1 ALT1 OUT NICLK 0 CLK_A>;
+						crs =   <&PIO1 2 ALT1 IN BYPASS 1000>;
+						mdint = <&PIO1 3 ALT1 IN BYPASS 0>;
+						rxd0 =  <&PIO1 4 ALT1 IN SE_NICLK_IO 0 CLK_A>;
+						rxd1 =  <&PIO1 5 ALT1 IN SE_NICLK_IO 0 CLK_A>;
+						rxd2 =  <&PIO1 6 ALT1 IN SE_NICLK_IO 0 CLK_A>;
+						rxd3 =  <&PIO1 7 ALT1 IN SE_NICLK_IO 0 CLK_A>;
+
+						rxdv =  <&PIO2 0 ALT1 IN SE_NICLK_IO 0 CLK_A>;
+						rx_er = <&PIO2 1 ALT1 IN SE_NICLK_IO 0 CLK_A>;
+						rxclk = <&PIO2 2 ALT1 IN NICLK 0 CLK_A>;
+						phyclk = <&PIO2 3 ALT1 OUT NICLK 0 CLK_A>;
+					};
+				};
+				pinctrl_rgmii1: rgmii1-0 {
+					st,pins {
+						txd0 =  <&PIO0 0 ALT1 OUT DE_IO 500 CLK_A>;
+						txd1 =  <&PIO0 1 ALT1 OUT DE_IO 500 CLK_A>;
+						txd2 =  <&PIO0 2 ALT1 OUT DE_IO 500 CLK_A>;
+						txd3 =  <&PIO0 3 ALT1 OUT DE_IO 500 CLK_A>;
+						txen =  <&PIO0 5 ALT1 OUT DE_IO 0   CLK_A>;
+						txclk = <&PIO0 6 ALT1 IN  NICLK 0   CLK_A>;
+
+						mdio = <&PIO1 0 ALT1 OUT BYPASS 0>;
+						mdc  = <&PIO1 1 ALT1 OUT NICLK  0 CLK_A>;
+						rxd0 = <&PIO1 4 ALT1 IN DE_IO 500 CLK_A>;
+						rxd1 = <&PIO1 5 ALT1 IN DE_IO 500 CLK_A>;
+						rxd2 = <&PIO1 6 ALT1 IN DE_IO 500 CLK_A>;
+						rxd3 = <&PIO1 7 ALT1 IN DE_IO 500 CLK_A>;
+
+						rxdv   = <&PIO2 0 ALT1 IN  DE_IO 500 CLK_A>;
+						rxclk  = <&PIO2 2 ALT1 IN  NICLK 0   CLK_A>;
+						phyclk = <&PIO2 3 ALT4 OUT NICLK 0   CLK_B>;
+
+						clk125 = <&PIO3 7 ALT4 IN NICLK 0 CLK_A>;
+					};
+				};
+			};
 		};
 
 		pin-controller-front {
@@ -322,6 +374,63 @@
 					};
 				};
 			};
+
+			gmac0 {
+				pinctrl_mii0: mii0 {
+					st,pins {
+						mdint = <&PIO13 6 ALT2 IN  BYPASS      0>;
+						txen =  <&PIO13 7 ALT2 OUT SE_NICLK_IO 0 CLK_A>;
+						txd0 =  <&PIO14 0 ALT2 OUT SE_NICLK_IO 0 CLK_A>;
+						txd1 =  <&PIO14 1 ALT2 OUT SE_NICLK_IO 0 CLK_A>;
+						txd2 =  <&PIO14 2 ALT2 OUT SE_NICLK_IO 0 CLK_B>;
+						txd3 =  <&PIO14 3 ALT2 OUT SE_NICLK_IO 0 CLK_B>;
+
+						txclk = <&PIO15 0 ALT2 IN  NICLK       0 CLK_A>;
+						txer =  <&PIO15 1 ALT2 OUT SE_NICLK_IO 0 CLK_A>;
+						crs = <&PIO15 2 ALT2 IN  BYPASS 1000>;
+						col = <&PIO15 3 ALT2 IN  BYPASS 1000>;
+						mdio = <&PIO15 4 ALT2 OUT BYPASS 1500>;
+						mdc = <&PIO15 5 ALT2 OUT NICLK  0    CLK_B>;
+
+						rxd0 =  <&PIO16 0 ALT2 IN SE_NICLK_IO 0 CLK_A>;
+						rxd1 =  <&PIO16 1 ALT2 IN SE_NICLK_IO 0 CLK_A>;
+						rxd2 =  <&PIO16 2 ALT2 IN SE_NICLK_IO 0 CLK_A>;
+						rxd3 =  <&PIO16 3 ALT2 IN SE_NICLK_IO 0 CLK_A>;
+						rxdv =  <&PIO15 6 ALT2 IN SE_NICLK_IO 0 CLK_A>;
+						rx_er = <&PIO15 7 ALT2 IN SE_NICLK_IO 0 CLK_A>;
+						rxclk = <&PIO17 0 ALT2 IN NICLK 0 CLK_A>;
+						phyclk = <&PIO13 5 ALT2 OUT NICLK 0 CLK_B>;
+					};
+				};
+
+				pinctrl_gmii0: gmii0 {
+					st,pins {
+					};
+				};
+				pinctrl_rgmii0: rgmii0 {
+					st,pins {
+						phyclk = <&PIO13 5 ALT4 OUT NICLK 0 CLK_B>;
+						txen = <&PIO13 7 ALT2 OUT DE_IO 0 CLK_A>;
+						txd0 = <&PIO14 0 ALT2 OUT DE_IO 500 CLK_A>;
+						txd1 = <&PIO14 1 ALT2 OUT DE_IO 500 CLK_A>;
+						txd2 = <&PIO14 2 ALT2 OUT DE_IO 500 CLK_B>;
+						txd3 = <&PIO14 3 ALT2 OUT DE_IO 500 CLK_B>;
+						txclk = <&PIO15 0 ALT2 IN NICLK 0 CLK_A>;
+
+						mdio = <&PIO15 4 ALT2 OUT BYPASS 0>;
+						mdc = <&PIO15 5 ALT2 OUT NICLK 0 CLK_B>;
+
+						rxdv = <&PIO15 6 ALT2 IN DE_IO 500 CLK_A>;
+						rxd0 = <&PIO16 0 ALT2 IN DE_IO 500 CLK_A>;
+						rxd1 = <&PIO16 1 ALT2 IN DE_IO 500 CLK_A>;
+						rxd2 = <&PIO16 2 ALT2 IN DE_IO 500 CLK_A>;
+						rxd3 = <&PIO16 3 ALT2 IN DE_IO 500 CLK_A>;
+						rxclk = <&PIO17 0 ALT2 IN NICLK 0 CLK_A>;
+
+						clk125 = <&PIO17 6 ALT1 IN NICLK 0 CLK_A>;
+					};
+				};
+			};
 		};
 
 		pin-controller-fvdp-fe {
diff --git a/arch/arm/boot/dts/stih416.dtsi b/arch/arm/boot/dts/stih416.dtsi
index 788ba5b..a96055b 100644
--- a/arch/arm/boot/dts/stih416.dtsi
+++ b/arch/arm/boot/dts/stih416.dtsi
@@ -156,5 +156,49 @@
 
 			status		= "disabled";
 		};
+
+		ethernet0: dwmac@fe810000 {
+			device_type 	= "network";
+			compatible	= "st,stih416-dwmac", "snps,dwmac", "snps,dwmac-3.710";
+			status 		= "disabled";
+			reg 		= <0xfe810000 0x8000>, <0x8bc 0x4>;
+			reg-names	= "stmmaceth", "sti-ethconf";
+
+			interrupts = <0 133 0>, <0 134 0>, <0 135 0>;
+			interrupt-names = "macirq", "eth_wake_irq", "eth_lpi";
+
+			snps,pbl 	= <32>;
+			snps,mixed-burst;
+
+			st,syscon		= <&syscfg_rear>;
+			resets			= <&softreset STIH416_ETH0_SOFTRESET>;
+			reset-names		= "stmmaceth";
+			pinctrl-names 	= "default";
+			pinctrl-0	= <&pinctrl_mii0>;
+			clock-names	= "stmmaceth";
+			clocks		= <&CLK_S_GMAC0_PHY>;
+		};
+
+		ethernet1: dwmac@fef08000 {
+			device_type = "network";
+			compatible		= "st,stih416-dwmac", "snps,dwmac", "snps,dwmac-3.710";
+			status 		= "disabled";
+			reg		= <0xfef08000 0x8000>, <0x7f0 0x4>;
+			reg-names	= "stmmaceth", "sti-ethconf";
+			interrupts = <0 136 0>, <0 137 0>, <0 138 0>;
+			interrupt-names = "macirq", "eth_wake_irq", "eth_lpi";
+
+			snps,pbl	= <32>;
+			snps,mixed-burst;
+
+			st,syscon	= <&syscfg_sbc>;
+
+			resets		= <&softreset STIH416_ETH1_SOFTRESET>;
+			reset-names	= "stmmaceth";
+			pinctrl-names 	= "default";
+			pinctrl-0	= <&pinctrl_mii1>;
+			clock-names	= "stmmaceth";
+			clocks		= <&CLK_S_ETH1_PHY>;
+		};
 	};
 };
-- 
1.7.9.5
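
As a rough sanity check on the clock choices (assumption: standard MII/RGMII timing, not something stated in the patch), a 4-bit MII clocked at 25 MHz carries 100 Mbit/s, while RGMII needs a 125 MHz clock sampled on both edges to reach 1000 Mbit/s, which is presumably why the RGMII pin groups route a clk125 input. demo_mii_rate_mbps() is a hypothetical helper for the arithmetic only.

```c
#include <stdint.h>

/*
 * Data rate of a nibble-wide MII-family interface, in Mbit/s.
 * width_bits: data lines per direction (4 for both MII and RGMII).
 * clk_khz:    interface clock in kHz (25000 for 100 Mbit/s MII,
 *             125000 for 1000 Mbit/s RGMII).
 * ddr:        nonzero if data is sampled on both clock edges (RGMII).
 */
uint32_t demo_mii_rate_mbps(uint32_t width_bits, uint32_t clk_khz, int ddr)
{
	uint32_t edges = ddr ? 2 : 1;

	/* bits per edge * edges per cycle * cycles per ms -> kbit/s -> Mbit/s */
	return width_bits * clk_khz * edges / 1000;
}
```

For example, demo_mii_rate_mbps(4, 25000, 0) gives 100 and demo_mii_rate_mbps(4, 125000, 1) gives 1000.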

* Re: OOPS in nf_ct_unlink_expect_report using Polycom RealPresence Mobile
From: Pablo Neira Ayuso @ 2014-02-03 12:14 UTC
  To: astx; +Cc: linux-kernel, netdev, netfilter, Alexey Dobriyan, netfilter-devel
In-Reply-To: <20140131170402.Horde.yIFUeQVjLycuS_8PGQoKmg5@aws-it.at>

[-- Attachment #1: Type: text/plain, Size: 234 bytes --]

On Fri, Jan 31, 2014 at 05:04:02PM +0100, astx wrote:
> Dear Alexey,
> 
> seems to help. Thank you for your quick response. Kernel 3.10.28 is
> now stable using h323 / Polycom.

Thanks, if no objection, will pass this patch to David.

[-- Attachment #2: 0001-netfilter-nf_nat_h323-fix-crash-in-nf_ct_unlink_expe.patch --]
[-- Type: text/x-diff, Size: 2377 bytes --]

From d98506139d6e192705422ffba13bc2ff476ac513 Mon Sep 17 00:00:00 2001
From: Alexey Dobriyan <adobriyan@gmail.com>
Date: Mon, 3 Feb 2014 13:07:24 +0100
Subject: [PATCH] netfilter: nf_nat_h323: fix crash in
 nf_ct_unlink_expect_report()

Similar bug fixed in SIP module in 3f509c6 ("netfilter: nf_nat_sip: fix
incorrect handling of EBUSY for RTCP expectation").

BUG: unable to handle kernel paging request at 00100104
IP: [<f8214f07>] nf_ct_unlink_expect_report+0x57/0xf0 [nf_conntrack]
...
Call Trace:
  [<c0244bd8>] ? del_timer+0x48/0x70
  [<f8215687>] nf_ct_remove_expectations+0x47/0x60 [nf_conntrack]
  [<f8211c99>] nf_ct_delete_from_lists+0x59/0x90 [nf_conntrack]
  [<f8212e5e>] death_by_timeout+0x14e/0x1c0 [nf_conntrack]
  [<f8212d10>] ? nf_conntrack_set_hashsize+0x190/0x190 [nf_conntrack]
  [<c024442d>] call_timer_fn+0x1d/0x80
  [<c024461e>] run_timer_softirq+0x18e/0x1a0
  [<f8212d10>] ? nf_conntrack_set_hashsize+0x190/0x190 [nf_conntrack]
  [<c023e6f3>] __do_softirq+0xa3/0x170
  [<c023e650>] ? __local_bh_enable+0x70/0x70
  <IRQ>
  [<c023e587>] ? irq_exit+0x67/0xa0
  [<c0202af6>] ? do_IRQ+0x46/0xb0
  [<c027ad05>] ? clockevents_notify+0x35/0x110
  [<c066ac6c>] ? common_interrupt+0x2c/0x40
  [<c056e3c1>] ? cpuidle_enter_state+0x41/0xf0
  [<c056e6fb>] ? cpuidle_idle_call+0x8b/0x100
  [<c02085f8>] ? arch_cpu_idle+0x8/0x30
  [<c027314b>] ? cpu_idle_loop+0x4b/0x140
  [<c0273258>] ? cpu_startup_entry+0x18/0x20
  [<c066056d>] ? rest_init+0x5d/0x70
  [<c0813ac8>] ? start_kernel+0x2ec/0x2f2
  [<c081364f>] ? repair_env_string+0x5b/0x5b
  [<c0813269>] ? i386_start_kernel+0x33/0x35

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/nf_nat_h323.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/netfilter/nf_nat_h323.c b/net/ipv4/netfilter/nf_nat_h323.c
index 9eea059d..574f7eb 100644
--- a/net/ipv4/netfilter/nf_nat_h323.c
+++ b/net/ipv4/netfilter/nf_nat_h323.c
@@ -229,7 +229,10 @@ static int nat_rtp_rtcp(struct sk_buff *skb, struct nf_conn *ct,
 			ret = nf_ct_expect_related(rtcp_exp);
 			if (ret == 0)
 				break;
-			else if (ret != -EBUSY) {
+			else if (ret == -EBUSY) {
+				nf_ct_unexpect_related(rtp_exp);
+				continue;
+			} else if (ret < 0) {
 				nf_ct_unexpect_related(rtp_exp);
 				nated_port = 0;
 				break;
-- 
1.7.10.4
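
The control flow of the fix can be modelled in userspace. This is a simplified sketch, not the kernel code: the flag table and the demo_* helpers are hypothetical stand-ins for nf_ct_expect_related() and nf_ct_unexpect_related(), and real expectations are keyed on full tuples rather than bare ports. The point it illustrates is that when the RTCP expectation fails with -EBUSY, the RTP expectation that was already registered is released before the next port pair is tried, instead of being left behind.

```c
#include <errno.h>
#include <stdbool.h>

#define DEMO_NPORTS 65536u

/* Hypothetical stand-in for the expectation table: one flag per port. */
static bool demo_expected[DEMO_NPORTS];

/* Register an expectation for @port; -EBUSY if one is already registered. */
int demo_expect_related(unsigned int port)
{
	if (port >= DEMO_NPORTS)
		return -EINVAL;
	if (demo_expected[port])
		return -EBUSY;
	demo_expected[port] = true;
	return 0;
}

/* Drop the expectation for @port (no-op if none is registered). */
void demo_unexpect_related(unsigned int port)
{
	if (port < DEMO_NPORTS)
		demo_expected[port] = false;
}

/*
 * Reserve an RTP/RTCP pair (port, port + 1), trying start, start + 2, ...
 * Returns the RTP port, or 0 if no pair could be reserved.
 */
unsigned int demo_reserve_rtp_rtcp(unsigned int start)
{
	unsigned int port;

	for (port = start; port + 1 < DEMO_NPORTS; port += 2) {
		int ret = demo_expect_related(port);		/* RTP */

		if (ret == -EBUSY)
			continue;			/* RTP port taken: next pair */
		if (ret < 0)
			return 0;

		ret = demo_expect_related(port + 1);	/* RTCP */
		if (ret == 0)
			return port;			/* both registered */

		/*
		 * The fix: whatever the RTCP error, release the RTP
		 * expectation registered above before moving on.
		 */
		demo_unexpect_related(port);
		if (ret != -EBUSY)
			return 0;			/* hard error: give up */
	}
	return 0;
}
```

If port 10001 is already taken, demo_reserve_rtp_rtcp(10000) returns 10002 and port 10000 is free again afterwards.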


* Re: [Xen-devel] [PATCH v6] xen/grant-table: Avoid m2p_override during mapping
From: Zoltan Kiss @ 2014-02-03 13:27 UTC
  To: David Vrabel
  Cc: Julien Grall, Stefano Stabellini, jonathan.davies, wei.liu2,
	ian.campbell, netdev, linux-kernel, xen-devel
In-Reply-To: <52EF7618.7030402@citrix.com>

On 03/02/14 11:57, David Vrabel wrote:
>>
>> Hi,
>>
>> That's bad indeed. I think the best solution is to put those parts
>> behind an #ifdef x86. The ones moved from x86/p2m.c to grant-table.c.
>> David, Stefano, what do you think?
> I don't think we want (more) #ifdef CONFIG_X86 in grant-table.c, and the
> arch-specific bits will have to be factored out into their own functions
> with suitable stubs provided for ARM.
>
I've just sent v7 with stubs; I guess that's what you suggested.
Please review it. I'm especially curious about your thoughts on
the new function name.

Zoli
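
The "arch hook plus stub" pattern David describes can be sketched as follows. All names here (HYPOTHETICAL_HAVE_ARCH_GNTTAB_HOOK, demo_arch_gnttab_hook(), demo_map_grant()) are made up for illustration; in a real tree the two variants of the hook would live in separate per-architecture headers, so the common grant-table code itself carries no #ifdef.

```c
/*
 * Per-arch header contents, collapsed into one file for the sketch.
 * An architecture that needs extra work (e.g. x86) declares a real hook;
 * others (e.g. ARM) get an inline stub that does nothing.
 */
#ifdef HYPOTHETICAL_HAVE_ARCH_GNTTAB_HOOK
int demo_arch_gnttab_hook(unsigned long pfn);
#else
static inline int demo_arch_gnttab_hook(unsigned long pfn)
{
	(void)pfn;	/* nothing to do on this architecture */
	return 0;
}
#endif

/* Common code: no #ifdef, it always calls the hook. */
int demo_map_grant(unsigned long pfn)
{
	int ret = demo_arch_gnttab_hook(pfn);

	if (ret)
		return ret;
	/* ... architecture-independent mapping work would go here ... */
	return 0;
}
```

Built without the hypothetical macro, the stub is used and demo_map_grant() simply returns 0.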

* Re: [PATCH] [RFC] netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt
From: Andrew Vagin @ 2014-02-03 13:59 UTC
  To: Pablo Neira Ayuso
  Cc: Florian Westphal, Andrey Vagin, netfilter-devel, Eric Dumazet,
	netfilter, netdev, linux-kernel, vvs, Cyrill Gorcunov,
	Vasiliy Averin
In-Reply-To: <20140202233046.GA4137@localhost>

On Mon, Feb 03, 2014 at 12:30:46AM +0100, Pablo Neira Ayuso wrote:
> On Thu, Jan 16, 2014 at 10:23:01AM +0100, Florian Westphal wrote:
> > Andrew Vagin <avagin@parallels.com> wrote:
> > > > I think it would be nice if we could keep it that way.
> > > > If everything fails we could probably introduce a 'larval' dummy list
> > > > similar to the one used by template conntracks?
> > > 
> > > I'm not sure that this is required. Could you elaborate on when this can
> > > be useful?
> > 
> > You can dump the lists via ctnetlink.  It's meant as a debugging aid in
> > case one suspects refcnt leaks.
> > 
> > Granted, in this situation there should be no leak since we put the newly
> > allocated entry in the error case.
> > 
> > > Now I see only overhead, because we need to take the nf_conntrack_lock
> > > lock to add conntrack in a list.
> > 
> > True. I don't have any preference, I guess I'd just do the insertion into the
> > unconfirmed list when we know we cannot track to keep the "unhashed"
> > bug trap in the destroy function.
> > 
> > Pablo, any preference?
> 
> I think we can initially set the refcount to zero and bump it once the
> conntrack gets into any of the lists, so Eric's golden rule also holds
> for conntracks that are released via nf_conntrack_free() without ever
> being inserted into any list.
> 
> My idea was to use the dying list to detect possible runtime leaks (i.e.
> a missing nf_ct_put somewhere), not simple leaks in the initialization
> path; as you said, catching those with the dying list would add too
> much overhead, so we can skip them.
> 
> Please, let me know if you find any issue with this approach.

Hello Pablo,

I don't see any problem with this approach and I like it.
Thank you for the patch.

> diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
> index 01ea6ee..b2ac624 100644
> --- a/include/net/netfilter/nf_conntrack.h
> +++ b/include/net/netfilter/nf_conntrack.h
> @@ -284,6 +284,8 @@ extern unsigned int nf_conntrack_max;
>  extern unsigned int nf_conntrack_hash_rnd;
>  void init_nf_conntrack_hash_rnd(void);
>  
> +void nf_conntrack_tmpl_insert(struct net *net, struct nf_conn *tmpl);
> +
>  #define NF_CT_STAT_INC(net, count)	  __this_cpu_inc((net)->ct.stat->count)
>  #define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count)
>  
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 4d1fb5d..bd5ec5a 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -448,7 +448,8 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
>  			goto out;
>  
>  	add_timer(&ct->timeout);
> -	nf_conntrack_get(&ct->ct_general);
> +	/* The caller holds a reference to this object */
> +	atomic_set(&ct->ct_general.use, 2);
>  	__nf_conntrack_hash_insert(ct, hash, repl_hash);
>  	NF_CT_STAT_INC(net, insert);
>  	spin_unlock_bh(&nf_conntrack_lock);
> @@ -462,6 +463,21 @@ out:
>  }
>  EXPORT_SYMBOL_GPL(nf_conntrack_hash_check_insert);
>  
> +/* deletion from this larval template list happens via nf_ct_put() */
> +void nf_conntrack_tmpl_insert(struct net *net, struct nf_conn *tmpl)
> +{
> +	__set_bit(IPS_TEMPLATE_BIT, &tmpl->status);
> +	__set_bit(IPS_CONFIRMED_BIT, &tmpl->status);
> +	nf_conntrack_get(&tmpl->ct_general);
> +
> +	spin_lock_bh(&nf_conntrack_lock);
> +	/* Overload tuple linked list to put us in template list. */
> +	hlist_nulls_add_head_rcu(&tmpl->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
> +				 &net->ct.tmpl);
> +	spin_unlock_bh(&nf_conntrack_lock);
> +}
> +EXPORT_SYMBOL_GPL(nf_conntrack_tmpl_insert);
> +
>  /* Confirm a connection given skb; places it in hash table */
>  int
>  __nf_conntrack_confirm(struct sk_buff *skb)
> @@ -733,11 +749,11 @@ __nf_conntrack_alloc(struct net *net, u16 zone,
>  		nf_ct_zone->id = zone;
>  	}
>  #endif
> -	/*
> -	 * changes to lookup keys must be done before setting refcnt to 1
> +	/* Because we use RCU lookups, we set ct_general.use to zero before
> +	 * this is inserted in any list.
>  	 */
>  	smp_wmb();
> -	atomic_set(&ct->ct_general.use, 1);
> +	atomic_set(&ct->ct_general.use, 0);
>  	return ct;
>  
>  #ifdef CONFIG_NF_CONNTRACK_ZONES
> @@ -761,6 +777,11 @@ void nf_conntrack_free(struct nf_conn *ct)
>  {
>  	struct net *net = nf_ct_net(ct);
>  
> +	/* A freed object has refcnt == 0, thats
> +	 * the golden rule for SLAB_DESTROY_BY_RCU
> +	 */
> +	NF_CT_ASSERT(atomic_read(&ct->ct_general.use) == 0);
> +
>  	nf_ct_ext_destroy(ct);
>  	nf_ct_ext_free(ct);
>  	kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
> @@ -856,6 +877,9 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
>  		NF_CT_STAT_INC(net, new);
>  	}
>  
> +	/* Now it is inserted into the hashes, bump refcount */
> +	nf_conntrack_get(&ct->ct_general);
> +
>  	/* Overload tuple linked list to put us in unconfirmed list. */
>  	hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
>  		       &net->ct.unconfirmed);
> diff --git a/net/netfilter/nf_synproxy_core.c b/net/netfilter/nf_synproxy_core.c
> index 9858e3e..52e20c9 100644
> --- a/net/netfilter/nf_synproxy_core.c
> +++ b/net/netfilter/nf_synproxy_core.c
> @@ -363,9 +363,8 @@ static int __net_init synproxy_net_init(struct net *net)
>  		goto err2;
>  	if (!nfct_synproxy_ext_add(ct))
>  		goto err2;
> -	__set_bit(IPS_TEMPLATE_BIT, &ct->status);
> -	__set_bit(IPS_CONFIRMED_BIT, &ct->status);
>  
> +	nf_conntrack_tmpl_insert(net, ct);
>  	snet->tmpl = ct;
>  
>  	snet->stats = alloc_percpu(struct synproxy_stats);
> @@ -390,7 +389,7 @@ static void __net_exit synproxy_net_exit(struct net *net)
>  {
>  	struct synproxy_net *snet = synproxy_pernet(net);
>  
> -	nf_conntrack_free(snet->tmpl);
> +	nf_ct_put(snet->tmpl);
>  	synproxy_proc_exit(net);
>  	free_percpu(snet->stats);
>  }
> diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c
> index 5929be6..75747ae 100644
> --- a/net/netfilter/xt_CT.c
> +++ b/net/netfilter/xt_CT.c
> @@ -228,12 +228,7 @@ static int xt_ct_tg_check(const struct xt_tgchk_param *par,
>  			goto err3;
>  	}
>  
> -	__set_bit(IPS_TEMPLATE_BIT, &ct->status);
> -	__set_bit(IPS_CONFIRMED_BIT, &ct->status);
> -
> -	/* Overload tuple linked list to put us in template list. */
> -	hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
> -				 &par->net->ct.tmpl);
> +	nf_conntrack_tmpl_insert(par->net, ct);
>  out:
>  	info->ct = ct;
>  	return 0;
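
The refcount rule Pablo proposes (the count starts at zero, the first reference is taken when the object is put on a list, and the free path asserts that the count is zero) can be illustrated with a small userspace model. This is a hedged sketch using C11 atomics; struct demo_obj and the demo_* helpers are hypothetical, and the model ignores RCU, SLAB_DESTROY_BY_RCU and the list locking that the real conntrack code depends on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical object with a refcount and an "on a list" marker. */
struct demo_obj {
	atomic_int use;
	bool listed;
};

/* Allocation leaves the refcount at zero: nobody can find the object yet. */
struct demo_obj *demo_alloc(void)
{
	struct demo_obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	atomic_init(&o->use, 0);
	o->listed = false;
	return o;
}

/* Inserting into a list is what takes the first reference. */
void demo_insert(struct demo_obj *o)
{
	o->listed = true;
	atomic_fetch_add(&o->use, 1);
}

/* Freeing is only legal with refcnt == 0 (the "golden rule"). */
void demo_free(struct demo_obj *o)
{
	assert(atomic_load(&o->use) == 0);
	free(o);
}

/* Drop a reference; the last put unlinks and frees the object. */
void demo_put(struct demo_obj *o)
{
	if (atomic_fetch_sub(&o->use, 1) == 1) {
		o->listed = false;
		demo_free(o);
	}
}
```

An object that fails initialization before being listed can go straight to demo_free(); a listed object is released only through demo_put().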


* Requeues and ECN marking
From: Greg Kuperman @ 2014-02-03 14:50 UTC
  To: netdev

Hi all,

I am testing a new congestion control protocol that relies on explicit
congestion notifications (ECN) to notify the receiver of a congestion
event. I have a rate limited link of 1 Mbps, and I am using the RED
queuing discipline with ECN enabled. What I have noticed is that no
matter how small I set my queue size, or how low I set my minimum
marking level, the first ECN marked packet does not get sent out for
about 10 seconds after the input rate exceeds the output rate. Further
examination shows that ECN marking does not occur until the number of
requeues hits 1000. Below is the output of two runs of
tc -s -d qdisc ls dev eth1.

qdisc red 8028: root refcnt 2 limit 10000000b min 1b max 0b ecn ewma 30 Plog 21 Scell_log 31
 Sent 1307892 bytes 1247 pkt (dropped 0, overlimits 0 requeues 960)
 backlog 1052118b 962p requeues 960
  marked 0 early 0 pdrop 0 other 0

qdisc red 8028: root refcnt 2 limit 10000000b min 1b max 0b ecn ewma 30 Plog 21 Scell_log 31
 Sent 1379262 bytes 1312 pkt (dropped 0, overlimits 72 requeues 1024)
 backlog 1122468b 1027p requeues 1024
  marked 72 early 0 pdrop 0 other 0

The txqueuelen defaults to 1000 for the interface, so I figured that
packets may be buffering there, and then dequeuing, before any packets
are marked. I set txqueuelen to lower values (all the way down to 1),
but the exact same behavior occurs (no marked packets until the number
of requeues hits 1000). In contrast, if I set txqueuelen to something
very high, I get no requeues, drops, or marked packets.

My goal is for packets to be marked as soon as the ingress rate
exceeds the egress. Am I correct in thinking that the requeuing
operation is the culprit? Can I eliminate requeues? Is there something
else I can do to get the behavior I am looking for?

Thank you all for the help. And please cc me in your replies; I'm not
100% sure if I get all the messages from this mailing list.

Best,
Greg
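
For reference, RED decides whether to mark based on an exponentially weighted average of the queue length rather than the instantaneous backlog. Assuming that the "ewma 30" printed by tc -d is the Wlog parameter (weight 2^-30), the sketch below models that average in floating point; the kernel uses a fixed-point variant, so this is only an approximation, and demo_red_avg_update() / demo_red_updates_to_reach() are hypothetical helpers.

```c
/*
 * Simplified floating-point RED average: avg += (q - avg) * 2^-wlog.
 * wlog must be in 0..63.
 */
double demo_red_avg_update(double avg, double qlen, int wlog)
{
	double w = 1.0 / (double)(1ULL << wlog);

	return avg + (qlen - avg) * w;
}

/*
 * Starting from an empty average with a constant backlog @qlen, count the
 * updates needed for the average to reach @threshold; -1 if it does not
 * get there within @max_updates.
 */
long demo_red_updates_to_reach(double qlen, double threshold, int wlog,
			       long max_updates)
{
	double avg = 0.0;
	long n;

	for (n = 1; n <= max_updates; n++) {
		avg = demo_red_avg_update(avg, qlen, wlog);
		if (avg >= threshold)
			return n;
	}
	return -1;
}
```

In this model, a constant backlog of 1,000,000 bytes with Wlog = 30 needs roughly a thousand updates before the average reaches even 1 byte, whereas Wlog = 2 reaches half the backlog in 3 updates. Whether this accounts for the behaviour above is not established here.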

* [PATCH] net:phy:dp83640: Declare that TX timestamping is possible
From: Stefan Sørensen @ 2014-02-03 14:36 UTC
  To: richardcochran, netdev; +Cc: Stefan Sørensen

Set the SKBTX_IN_PROGRESS bit in tx_flags in dp83640_txtstamp() when
doing TX timestamping, as described in Documentation/networking/timestamping.txt.

Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
---
 drivers/net/phy/dp83640.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
index dfb132e..ae95a9a 100644
--- a/drivers/net/phy/dp83640.c
+++ b/drivers/net/phy/dp83640.c
@@ -1273,6 +1273,7 @@ static void dp83640_txtstamp(struct phy_device *phydev,
 		}
 		/* fall through */
 	case HWTSTAMP_TX_ON:
+		skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
 		skb_queue_tail(&dp83640->tx_queue, skb);
 		schedule_work(&dp83640->ts_work);
 		break;
-- 
1.8.5.3
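
On the application side, timestamping.txt describes requesting hardware TX timestamps with the SO_TIMESTAMPING socket option; the SKBTX_IN_PROGRESS flag set by this patch is how a driver signals that such a hardware timestamp is pending, so the stack does not generate a software one. A minimal sketch, assuming Linux UAPI headers; the demo_* names are hypothetical, and the device must additionally be configured with the SIOCSHWTSTAMP ioctl, which is not shown.

```c
#define _GNU_SOURCE
#include <linux/net_tstamp.h>
#include <sys/socket.h>

/* Flags asking for raw hardware TX timestamps. */
unsigned int demo_hw_tx_tstamp_flags(void)
{
	return SOF_TIMESTAMPING_TX_HARDWARE | SOF_TIMESTAMPING_RAW_HARDWARE;
}

/* Enable hardware TX timestamp reporting on @fd; 0 on success, -1 with errno set. */
int demo_enable_hw_tx_tstamp(int fd)
{
	int flags = (int)demo_hw_tx_tstamp_flags();

	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
}
```

The timestamps themselves are then read from the socket error queue (recvmsg() with MSG_ERRQUEUE).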

* [PATCH] net:phy:dp83640: Do not hardcode timestamping event edge
From: Stefan Sørensen @ 2014-02-03 14:36 UTC
  To: richardcochran, netdev; +Cc: Stefan Sørensen

Currently the external timestamping code is hardcoded to use the
rising edge, even though the hardware supports configurable event
edge detection. This patch changes the code to use falling-edge
detection if PTP_FALLING_EDGE is set in the user-supplied flags.

Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
---
 drivers/net/phy/dp83640.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
index baa1a75..80c5fc8 100644
--- a/drivers/net/phy/dp83640.c
+++ b/drivers/net/phy/dp83640.c
@@ -440,7 +440,10 @@ static int ptp_dp83640_enable(struct ptp_clock_info *ptp,
 		if (on) {
 			gpio_num = extts_gpio[index];
 			evnt |= (gpio_num & EVNT_GPIO_MASK) << EVNT_GPIO_SHIFT;
-			evnt |= EVNT_RISE;
+			if (rq->extts.flags & PTP_FALLING_EDGE)
+				evnt |= EVNT_FALL;
+			else
+				evnt |= EVNT_RISE;
 		}
 		ext_write(0, phydev, PAGE5, PTP_EVNT, evnt);
 		return 0;
-- 
1.8.5.3
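
From userspace, a falling-edge external timestamp can be requested through the same PTP_EXTTS_REQUEST ioctl that testptp uses, by setting PTP_FALLING_EDGE in the request flags. The sketch below assumes the <linux/ptp_clock.h> UAPI header; demo_extts_request() and demo_enable_extts() are hypothetical helpers, and the file descriptor would come from opening a PTP clock device such as /dev/ptp0. With this patch the driver treats any request without PTP_FALLING_EDGE as rising edge, so setting PTP_RISING_EDGE is harmless.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/ptp_clock.h>

/* Build an external-timestamp request for channel @index. */
struct ptp_extts_request demo_extts_request(unsigned int index, int falling)
{
	struct ptp_extts_request req;

	memset(&req, 0, sizeof(req));	/* reserved fields must be zero */
	req.index = index;
	req.flags = PTP_ENABLE_FEATURE |
		    (falling ? PTP_FALLING_EDGE : PTP_RISING_EDGE);
	return req;
}

/* Send the request to an open PTP clock device; 0 on success, -1 on error. */
int demo_enable_extts(int fd, unsigned int index, int falling)
{
	struct ptp_extts_request req = demo_extts_request(index, falling);

	return ioctl(fd, PTP_EXTTS_REQUEST, &req);
}
```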

* [PATCH] net:phy:dp83640: Initialize PTP clocks at device init.
From: Stefan Sørensen @ 2014-02-03 14:36 UTC
  To: richardcochran, netdev; +Cc: Stefan Sørensen

The trigger and event functionality can be useful even if packet
timestamping is not used, but the required PTP clock is only enabled
when packet timestamping is started. This patch moves the clock enable
into dp83640_config_init(), so that it happens when the PHY is configured.

Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
---
 drivers/net/phy/dp83640.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
index 80c5fc8..14616e4 100644
--- a/drivers/net/phy/dp83640.c
+++ b/drivers/net/phy/dp83640.c
@@ -1064,6 +1064,13 @@ static void dp83640_remove(struct phy_device *phydev)
 	kfree(dp83640);
 }
 
+static int dp83640_config_init(struct phy_device *phydev)
+{
+	enable_status_frames(phydev, true);
+	ext_write(0, phydev, PAGE4, PTP_CTL, PTP_ENABLE);
+	return 0;
+}
+
 static int dp83640_ack_interrupt(struct phy_device *phydev)
 {
 	int err = phy_read(phydev, MII_DP83640_MISR);
@@ -1201,11 +1208,6 @@ static int dp83640_hwtstamp(struct phy_device *phydev, struct ifreq *ifr)
 
 	mutex_lock(&dp83640->clock->extreg_lock);
 
-	if (dp83640->hwts_tx_en || dp83640->hwts_rx_en) {
-		enable_status_frames(phydev, true);
-		ext_write(0, phydev, PAGE4, PTP_CTL, PTP_ENABLE);
-	}
-
 	ext_write(0, phydev, PAGE5, PTP_TXCFG0, txcfg0);
 	ext_write(0, phydev, PAGE5, PTP_RXCFG0, rxcfg0);
 
@@ -1337,6 +1339,7 @@ static struct phy_driver dp83640_driver = {
 	.flags		= PHY_HAS_INTERRUPT,
 	.probe		= dp83640_probe,
 	.remove		= dp83640_remove,
+	.config_init	= dp83640_config_init,
 	.config_aneg	= genphy_config_aneg,
 	.read_status	= genphy_read_status,
 	.ack_interrupt  = dp83640_ack_interrupt,
-- 
1.8.5.3

* [PATCH] ptp: Allow selecting trigger/event index in testptp
From: Stefan Sørensen @ 2014-02-03 14:36 UTC
  To: richardcochran, netdev; +Cc: Stefan Sørensen

Currently the trigger/event index is hardcoded to 0. This patch adds
a new command-line argument, -i, to select an arbitrary trigger/event
index.

Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
---
 Documentation/ptp/testptp.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/Documentation/ptp/testptp.c b/Documentation/ptp/testptp.c
index a74d0a8..04b21cd 100644
--- a/Documentation/ptp/testptp.c
+++ b/Documentation/ptp/testptp.c
@@ -123,7 +123,8 @@ static void usage(char *progname)
 		" -P val     enable or disable (val=1|0) the system clock PPS\n"
 		" -s         set the ptp clock time from the system time\n"
 		" -S         set the system time from the ptp clock time\n"
-		" -t val     shift the ptp clock time by 'val' seconds\n",
+		" -t val     shift the ptp clock time by 'val' seconds\n"
+		" -i val     index for event/trigger\n",
 		progname);
 }
 
@@ -161,13 +162,14 @@ int main(int argc, char *argv[])
 	int perout = -1;
 	int pps = -1;
 	int settime = 0;
+	int index = 0;
 
 	int64_t t1, t2, tp;
 	int64_t interval, offset;
 
 	progname = strrchr(argv[0], '/');
 	progname = progname ? 1+progname : argv[0];
-	while (EOF != (c = getopt(argc, argv, "a:A:cd:e:f:ghk:p:P:sSt:v"))) {
+	while (EOF != (c = getopt(argc, argv, "a:A:cd:e:f:ghk:p:P:sSt:vi:"))) {
 		switch (c) {
 		case 'a':
 			oneshot = atoi(optarg);
@@ -209,6 +211,9 @@ int main(int argc, char *argv[])
 		case 't':
 			adjtime = atoi(optarg);
 			break;
+		case 'i':
+			index = atoi(optarg);
+			break;
 		case 'h':
 			usage(progname);
 			return 0;
@@ -301,7 +306,7 @@ int main(int argc, char *argv[])
 
 	if (extts) {
 		memset(&extts_request, 0, sizeof(extts_request));
-		extts_request.index = 0;
+		extts_request.index = index;
 		extts_request.flags = PTP_ENABLE_FEATURE;
 		if (ioctl(fd, PTP_EXTTS_REQUEST, &extts_request)) {
 			perror("PTP_EXTTS_REQUEST");
@@ -375,7 +380,7 @@ int main(int argc, char *argv[])
 			return -1;
 		}
 		memset(&perout_request, 0, sizeof(perout_request));
-		perout_request.index = 0;
+		perout_request.index = index;
 		perout_request.start.sec = ts.tv_sec + 2;
 		perout_request.start.nsec = 0;
 		perout_request.period.sec = 0;
-- 
1.8.5.3

* Re: [PATCH] ipv6: default route for link local address is not added while assigning a address
From: Nicolas Dichtel @ 2014-02-03 15:23 UTC
  To: Sohny Thomas, netdev, linux-kernel, yoshfuji, davem, kumuda,
	Hannes Frederic Sowa
In-Reply-To: <52EF42FF.60907@linux.vnet.ibm.com>

On 03/02/2014 08:19, Sohny Thomas wrote:
>
>> Actually I am not so sure, there is no defined semantic of flush. I would
>> be ok with all three solutions: leave it as is, always add link-local
>> address (it does not matter if we don't have a link-local address on
It matters. This address is required.
RFC 4291
Section 2.1:
    All interfaces are required to have at least one Link-Local unicast
    address (see Section 2.8 for additional required addresses).
Section 2.8:
       o Its required Link-Local address for each interface.

>> that interface, as a global scoped one is just fine enough) or make flush not
>> remove the link-local address (but this seems a bit too special cased for me).
>
> 1) If we leave it as it is, RFC 6724 rule 2 (previously RFC 3484) has to be
> considered.
>
> Rule 2: Prefer appropriate scope.
>     If Scope(SA) < Scope(SB): If Scope(SA) < Scope(D), then prefer SB and
>     otherwise prefer SA.  Similarly, if Scope(SB) < Scope(SA): If
>     Scope(SB) < Scope(D), then prefer SA and otherwise prefer SB.
>
> Test:
>
>     Destination: fe80::2(LS)
>      Candidate Source Addresses: 3ffe::1(GS) or fec0::1(SS) or LLA(LS)
>      Result: LLA(LS)
>      Scope(LLA) < Scope(fec0::1): If Scope(LLA) < Scope(fe80::2),  no, prefer LLA
>      Scope(LLA) < Scope(3ffe::1): If Scope(LLA) < Scope(fe80::2),  no, prefer LLA
>
>
> Now the above test fails because the route itself is not present: the test
> assumes that the route gets added, since the LLA is not removed during the test.
In your scenario, the link local route has been removed manually, not by the
kernel. What is your network manager?

>
> 2) having an LLA always helps in NDP, I think
A link-local address, yes, it's a MUST. But having only the link-local route
will not help.

>
> 3) making flush not remove the link-local address would change the behavior of
> the ip flush command
You can flush by specifying the protocol:
ip -6 route flush proto static


Regards,
Nicolas
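
Rule 2 as quoted above, applied to Sohny's test case, can be checked with a small C sketch. Assumptions: only Rule 2 is implemented (the other rules in RFC 6724 section 5 are ignored), scope values follow RFC 4291/RFC 6724 (link-local 0x2, site-local 0x5, global 0xe, loopback treated as link-local), and demo_ipv6_scope() / demo_rule2() are hypothetical helpers, not kernel functions.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

enum { SCOPE_LINK = 0x2, SCOPE_SITE = 0x5, SCOPE_GLOBAL = 0xe };

/* Scope of a unicast or multicast IPv6 address. */
int demo_ipv6_scope(const struct in6_addr *a)
{
	if (IN6_IS_ADDR_MULTICAST(a))
		return a->s6_addr[1] & 0x0f;	/* scope field of ff0X:: */
	if (IN6_IS_ADDR_LINKLOCAL(a) || IN6_IS_ADDR_LOOPBACK(a))
		return SCOPE_LINK;
	if (IN6_IS_ADDR_SITELOCAL(a))
		return SCOPE_SITE;
	return SCOPE_GLOBAL;
}

/*
 * RFC 6724 Rule 2 (prefer appropriate scope) for candidate sources sa, sb
 * and destination d. Returns -1 to prefer sa, 1 to prefer sb, and 0 if the
 * rule does not decide (or an address fails to parse).
 */
int demo_rule2(const char *sa, const char *sb, const char *d)
{
	struct in6_addr a, b, dst;
	int s_a, s_b, s_d;

	if (inet_pton(AF_INET6, sa, &a) != 1 ||
	    inet_pton(AF_INET6, sb, &b) != 1 ||
	    inet_pton(AF_INET6, d, &dst) != 1)
		return 0;

	s_a = demo_ipv6_scope(&a);
	s_b = demo_ipv6_scope(&b);
	s_d = demo_ipv6_scope(&dst);

	if (s_a < s_b)
		return s_a < s_d ? 1 : -1;
	if (s_b < s_a)
		return s_b < s_d ? -1 : 1;
	return 0;
}
```

For destination fe80::2, demo_rule2("fe80::1", "3ffe::1", "fe80::2") prefers the link-local source, matching the expected result in the test; for a global destination such as 2001:db8::1, the global source wins instead.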

* Re: OOPS in nf_ct_unlink_expect_report using Polycom RealPresence Mobile
From: astx @ 2014-02-03 15:46 UTC
  To: Pablo Neira Ayuso
  Cc: linux-kernel, netdev, netfilter, Alexey Dobriyan, netfilter-devel
In-Reply-To: <20140203121415.GA12777@localhost>

Test results / tested kernel versions:

3.2.54
3.8.13
3.10.28

Without the patch, all of the above kernel versions die with the same
error when trying to start H.323 connections using "Polycom
RealPresence Mobile".

I can confirm that with this patch all three kernel versions are
stable again.

Thank you all for your fast and competent help.

Best Regards,

Toni

Quoting Pablo Neira Ayuso <pablo@netfilter.org>:

> On Fri, Jan 31, 2014 at 05:04:02PM +0100, astx wrote:
>> Dear Alexey,
>>
>> seems to help. Thank you for your quick response. Kernel 3.10.28 is
>> now stable using h323 / Polycom.
>
> Thanks, if no objection, will pass this patch to David.

* Re: [PATCH] ptp: Allow selecting trigger/event index in testptp
From: Richard Cochran @ 2014-02-03 15:59 UTC
  To: Stefan Sørensen; +Cc: netdev
In-Reply-To: <1391438187-21834-1-git-send-email-stefan.sorensen@spectralink.com>

On Mon, Feb 03, 2014 at 03:36:27PM +0100, Stefan Sørensen wrote:
> Currently the trigger/event index is hardcoded to 0. This patch adds
> a new command-line argument, -i, to select an arbitrary trigger/event
> index.

This is a nice extension of the program, but ...

> diff --git a/Documentation/ptp/testptp.c b/Documentation/ptp/testptp.c
> index a74d0a8..04b21cd 100644
> --- a/Documentation/ptp/testptp.c
> +++ b/Documentation/ptp/testptp.c
> @@ -123,7 +123,8 @@ static void usage(char *progname)
>  		" -P val     enable or disable (val=1|0) the system clock PPS\n"
>  		" -s         set the ptp clock time from the system time\n"
>  		" -S         set the system time from the ptp clock time\n"
> -		" -t val     shift the ptp clock time by 'val' seconds\n",
> +		" -t val     shift the ptp clock time by 'val' seconds\n"
> +		" -i val     index for event/trigger\n",

Can we please keep the options in alphabetical order?

Thanks,
Richard

* Re: [PATCH] ipv6: default route for link local address is not added while assigning a address
From: Hannes Frederic Sowa @ 2014-02-03 16:08 UTC
  To: Nicolas Dichtel
  Cc: Sohny Thomas, netdev, linux-kernel, yoshfuji, davem, kumuda
In-Reply-To: <52EFB454.1040908@6wind.com>

Hello!

On Mon, Feb 03, 2014 at 04:23:00PM +0100, Nicolas Dichtel wrote:
> On 03/02/2014 08:19, Sohny Thomas wrote:
> >
> >>Actually I am not so sure, there is no defined semantic of flush. I would
> >>be ok with all three solutions: leave it as is, always add link-local
> >>address (it does not matter if we don't have a link-local address on
> It matters. This address is required.
> RFC 4291
> Section 2.1:
>    All interfaces are required to have at least one Link-Local unicast
>    address (see Section 2.8 for additional required addresses).
> Section 2.8:
>       o Its required Link-Local address for each interface.

Yes, sure, it is required. But you also can manually delete the LL address and
we don't guard against that.

> >>that interface, as a global scoped one is just fine enough) or make flush 
> >>not
> >>remove the link-local address (but this seems a bit too special cased for 
> >>me).
> >
> >1) In case if we leave it as it is, there is rfc 6724 rule 2 to be 
> >considered (
> >previously rfc 3484)
> >
> >Rule 2: Prefer appropriate scope.
> >    If Scope(SA) < Scope(SB): If Scope(SA) < Scope(D), then prefer SB and
> >    otherwise prefer SA.  Similarly, if Scope(SB) < Scope(SA): If
> >    Scope(SB) < Scope(D), then prefer SA and otherwise prefer SB.
> >
> >Test:
> >
> >    Destination: fe80::2(LS)
> >     Candidate Source Addresses: 3ffe::1(GS) or fec0::1(SS) or LLA(LS)
> >     Result: LLA(LS)
> >     Scope(LLA) < Scope(fec0::1): If Scope(LLA) < Scope(fe80::2),  no, 
> >     prefer LLA
> >     Scope(LLA) < Scope(3ffe::1): If Scope(LLA) < Scope(fe80::2),  no, 
> >     prefer LLA
> >
> >
> >Now the above test fails since the route itself is not present, and the 
> >test
> >assumes that the route gets added since the LLA is not removed during the 
> >test
> In your scenario, the link local route has been removed manually, not by the
> kernel. What is your network manager?

The test scenario is outlined here:
<https://bugzilla.kernel.org/show_bug.cgi?id=68511>

Basically, the command in question is this one:

	[root@localhost ~]# ip -6 -statistics -statistics route flush dev eth0

which removes the fe80::/64 route.

> >2) having a LLA always helps in NDP i think
> A link-local Address yes, it's a MUST. But having only the link local route 
> will
> not help.

Agreed, the LL address should be available, too. I currently don't know
what will break if LL address is not available. I guess MLD won't work
properly and thus even basic connectivity won't work with some switches.

> >3) making flush not remove link-local address will be chnaging 
> >functionality of
> >ip flush command
> You can flush by specifying the protocol:
> ip -6 route flush proto static

So we have four possibilities now:

1) leave it as is

	seems still acceptable to me

2) add fe80::/64 route unconditionally if any address gets added

	Sohny's patch already looks good in doing so at first look.

3) add fe80::/64 route in case LL address gets added via inet6_rtm_newaddr

	would be ok, too. I tend towards this solution somehow by now.

4) make flush not remove the fe80::/64 address

	Least favourable to me. I guess this would also need an iproute change
	and seems the most difficult to do.

Any opinions?

Greetings,

  Hannes

* Re: [PATCH] net:phy:dp83640: Declare that TX timestamping possible
From: Richard Cochran @ 2014-02-03 16:19 UTC (permalink / raw)
  To: Stefan Sørensen; +Cc: netdev
In-Reply-To: <1391438195-21888-1-git-send-email-stefan.sorensen@spectralink.com>

On Mon, Feb 03, 2014 at 03:36:35PM +0100, Stefan Sørensen wrote:
> Set the SKBTX_IN_PROGRESS bit in tx_flags dp83640_txtstamp when doing
> tx timestamps as per Documentation/networking/timestamping.txt.
> 
> Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
> ---

Acked-by: Richard Cochran <richardcochran@gmail.com>

* Re: [PATCH] net:phy:dp83640: Do not hardcode timestamping event edge
From: Richard Cochran @ 2014-02-03 16:20 UTC (permalink / raw)
  To: Stefan Sørensen; +Cc: netdev
In-Reply-To: <1391438210-21941-1-git-send-email-stefan.sorensen@spectralink.com>

On Mon, Feb 03, 2014 at 03:36:50PM +0100, Stefan Sørensen wrote:
> Currently the external timestamping code it hardcoded to use
> the rising edge even though the hardware has configurable event
> edge detection. This patch change the code to use falling edge
> detection if PTP_FALLING_EDGE is set in the user supplied flags.
> 
> Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
> ---

Acked-by: Richard Cochran <richardcochran@gmail.com>

* Re: [PATCH] [RFC] netfilter: nf_conntrack: don't relase a conntrack with non-zero refcnt
From: Eric Dumazet @ 2014-02-03 16:22 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Florian Westphal, Andrew Vagin, Andrey Vagin, netfilter-devel,
	netfilter, netdev, linux-kernel, vvs, Cyrill Gorcunov,
	Vasiliy Averin
In-Reply-To: <20140202233046.GA4137@localhost>

On Mon, 2014-02-03 at 00:30 +0100, Pablo Neira Ayuso wrote:
>          */
>         smp_wmb();
> -       atomic_set(&ct->ct_general.use, 1);
> +       atomic_set(&ct->ct_general.use, 0);
>         return ct; 

Hi Pablo !

I think your patch is the way to go, but it might need some extra care
with memory barriers.

I believe the smp_wmb() here is no longer needed.

If it's newly allocated memory, no other user can access ct;
if it's a recycled ct, its content is already 0 anyway.

After your patch, nf_conntrack_get(&tmpl->ct_general) should increment
an already non-zero refcnt, so no memory barrier is needed.

But one smp_wmb() is needed right before this point:

	/* The caller holds a reference to this object */
	atomic_set(&ct->ct_general.use, 2);
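A user-space analogue of the ordering described above, sketched with C11 atomics instead of the kernel's smp_wmb()/atomic_set(). `struct conn`, `conn_alloc_init()` and `conn_publish()` are hypothetical stand-ins for `struct nf_conn` and the allocation/insertion paths, not conntrack code. The only point is that the object's fields must be visible before the refcount store that publishes it; a release fence followed by a relaxed store is the closest C11 counterpart of smp_wmb() plus atomic_set(), and readers would still need a matching acquire (smp_rmb() or equivalent) on their side.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct nf_conn: a refcount plus one payload field. */
struct conn {
	atomic_int use;       /* plays the role of ct->ct_general.use */
	unsigned int status;  /* example field initialised before publication */
};

/* Allocation path: the refcount starts at 0.  Nobody else can reach a
 * freshly allocated object yet, so no barrier is needed here. */
static void conn_alloc_init(struct conn *ct)
{
	atomic_init(&ct->use, 0);
	ct->status = 0;
}

/* Insertion path: finish initialising the object, then publish it by
 * setting the refcount.  The release fence orders the plain store to
 * ct->status before the refcount store, which is the job smp_wmb() does. */
static void conn_publish(struct conn *ct, unsigned int status)
{
	/* Publishing an object that is already live would be a bug. */
	assert(atomic_load_explicit(&ct->use, memory_order_relaxed) == 0);

	ct->status = status;
	atomic_thread_fence(memory_order_release);
	/* Same value as the atomic_set(..., 2) quoted above. */
	atomic_store_explicit(&ct->use, 2, memory_order_relaxed);
}
```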

Thanks !

* Re: [PATCH] ipv6: default route for link local address is not added while assigning a address
From: Nicolas Dichtel @ 2014-02-03 16:26 UTC (permalink / raw)
  To: Sohny Thomas, netdev, linux-kernel, yoshfuji, davem, kumuda
In-Reply-To: <20140203160838.GA17999@order.stressinduktion.org>

On 03/02/2014 17:08, Hannes Frederic Sowa wrote:
> Hello!
>
> On Mon, Feb 03, 2014 at 04:23:00PM +0100, Nicolas Dichtel wrote:
>> On 03/02/2014 08:19, Sohny Thomas wrote:
>>>
>>>> Actually I am not so sure, there is no defined semantic of flush. I would
>>>> be ok with all three solutions: leave it as is, always add link-local
>>>> address (it does not matter if we don't have a link-local address on
>> It matters. This address is required.
>> RFC 4291
>> Section 2.1:
>>     All interfaces are required to have at least one Link-Local unicast
>>     address (see Section 2.8 for additional required addresses).
>> Section 2.8:
>>        o Its required Link-Local address for each interface.
>
> Yes, sure, it is required. But you also can manually delete the LL address and
> we don't guard against that.
Sure. That's why I don't like this patch: it fixes a user error.

>
>>>> that interface, as a global scoped one is just fine enough) or make flush
>>>> not
>>>> remove the link-local address (but this seems a bit too special cased for
>>>> me).
>>>
>>> 1) In case if we leave it as it is, there is rfc 6724 rule 2 to be
>>> considered (
>>> previously rfc 3484)
>>>
>>> Rule 2: Prefer appropriate scope.
>>>     If Scope(SA) < Scope(SB): If Scope(SA) < Scope(D), then prefer SB and
>>>     otherwise prefer SA.  Similarly, if Scope(SB) < Scope(SA): If
>>>     Scope(SB) < Scope(D), then prefer SA and otherwise prefer SB.
>>>
>>> Test:
>>>
>>>     Destination: fe80::2(LS)
>>>      Candidate Source Addresses: 3ffe::1(GS) or fec0::1(SS) or LLA(LS)
>>>      Result: LLA(LS)
>>>      Scope(LLA) < Scope(fec0::1): If Scope(LLA) < Scope(fe80::2),  no,
>>>      prefer LLA
>>>      Scope(LLA) < Scope(3ffe::1): If Scope(LLA) < Scope(fe80::2),  no,
>>>      prefer LLA
>>>
>>>
>>> Now the above test fails since the route itself is not present, and the
>>> test
>>> assumes that the route gets added since the LLA is not removed during the
>>> test
>> In your scenario, the link local route has been removed manually, not by the
>> kernel. What is your network manager?
>
> The test scenario is outlined here:
> <https://bugzilla.kernel.org/show_bug.cgi?id=68511>
>
> Basically, the command in question is this one:
>
> 	[root@localhost ~]# ip -6 -statistics -statistics route flush dev eth0
>
> which removes the fe80::/64 route.
>
>>> 2) having a LLA always helps in NDP i think
>> A link-local Address yes, it's a MUST. But having only the link local route
>> will
>> not help.
>
> Agreed, the LL address should be available, too. I currently don't know
> what will break if LL address is not available. I guess MLD won't work
> properly and thus even basic connectivity won't work with some switches.
>
>>> 3) making flush not remove link-local address will be chnaging
>>> functionality of
>>> ip flush command
>> You can flush by specifying the protocol:
>> ip -6 route flush proto static
>
> So we have four possibilities now:
>
> 1) leave it as is
>
> 	seems still acceptable to me
>
> 2) add fe80::/64 route unconditionally if any address gets added
>
> 	Sohny's patch already looks good in doing so at first look.
I don't like this solution, because it's a kernel patch to fix a configuration
problem.

>
> 3) add fe80::/64 route in case LL address gets added via inet6_rtm_newaddr
>
> 	would be ok, too. I tend towards this solution somehow by now.
This seems right too, but I'm not sure that this will fix Sohny's problem.

>
> 4) make flush not remove the fe80::/64 address
>
> 	Least favourable to me. I guess this would also need an iproute change
> 	and seems the most difficult to do.
Why isn't it possible to use 'ip -6 route flush proto static'?
I think we know what kind of routes are added for these TAHI tests, hence
it's better to remove only the routes added manually (or by a routing daemon,
if that's the case).
Removing kernel routes may hide bugs: imagine the kernel adds a wrong route;
TAHI will not detect it.


Regards,
Nicolas

* Re: [PATCH] net:phy:dp83640: Initialize PTP clocks at device init.
From: Richard Cochran @ 2014-02-03 16:33 UTC (permalink / raw)
  To: Stefan Sørensen; +Cc: netdev
In-Reply-To: <1391438218-21994-1-git-send-email-stefan.sorensen@spectralink.com>

On Mon, Feb 03, 2014 at 03:36:58PM +0100, Stefan Sørensen wrote:
> The trigger and events functionality can be useful even if packet
> timestamping is not used, but the required PTP clock is only enabled
> when packet timestamping is started. This patch moves the clock enable
> to when the interface is configured.

Hm, I vaguely recall that there might have been some reason not to enable
the clock too early. (Maybe this was related to multiple PHYs?)

Quickly looking at the code once again, I can't see anything wrong with
this now, but I'll look at it again tomorrow.

Thanks,
Richard

* Re: [PATCH net 4/5] openvswitch: Fix ovs_flow_free() ovs-lock assert.
From: Sergei Shtylyov @ 2014-02-03 16:42 UTC (permalink / raw)
  To: Jesse Gross, David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, dev-yBygre7rU0SM8Zsap4Y0gw
In-Reply-To: <1391389686-34303-5-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

Hello.

On 03-02-2014 5:08, Jesse Gross wrote:

> From: Pravin B Shelar <pshelar-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

> ovs_flow_free() is not called under ovs-lock during packet
> execute path (ovs_packet_cmd_execute()). Since packet execute
> does not touch flow->mask, there is no need to take that
> lock either. So move assert in case where flow->mask is checked.

> Found by code inspection.

> Signed-off-by: Pravin B Shelar <pshelar-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
> ---
>   net/openvswitch/flow_table.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)

> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> index bd14052..ad0bda0 100644
> --- a/net/openvswitch/flow_table.c
> +++ b/net/openvswitch/flow_table.c
> @@ -158,11 +158,12 @@ void ovs_flow_free(struct sw_flow *flow, bool deferred)
>   	if (!flow)
>   		return;
>
> -	ASSERT_OVSL();
> -
>   	if (flow->mask) {
>   		struct sw_flow_mask *mask = flow->mask;
>
> +		/* ovs-lock is required to protect mask-refcount and
> +		 * mask list. */

    Networking multi-line comment style is:

/* bla
 * bla
 */

WBR, Sergei

* Re: [PATCH net 1/5] openvswitch: Pad OVS_PACKET_ATTR_PACKET if linear copy was performed
From: Sergei Shtylyov @ 2014-02-03 16:43 UTC (permalink / raw)
  To: Jesse Gross, David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, dev-yBygre7rU0SM8Zsap4Y0gw
In-Reply-To: <1391389686-34303-2-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

Hello.

On 03-02-2014 5:08, Jesse Gross wrote:

> From: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>

> While the zerocopy method is correctly omitted if user space
> does not support unaligned Netlink messages. The attribute is
> still not padded correctly as skb_zerocopy() will not ensure
> padding and the attribute size is no longer pre calculated
> though nla_reserve() which ensured padding previously.

> This patch applies appropriate padding if a linear data copy
> was performed in skb_zerocopy().

> Signed-off-by: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>
> Acked-by: Zoltan Kiss <zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
> ---
>   net/openvswitch/datapath.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)

> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index df46928..3ca9121 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
[...]
> @@ -466,6 +466,11 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
>
>   	skb_zerocopy(user_skb, skb, skb->len, hlen);
>
> +	/* Pad OVS_PACKET_ATTR_PACKET if linear copy was performed */
> +	if (!(dp->user_features & OVS_DP_F_UNALIGNED) &&
> +	    (plen = (ALIGN(user_skb->len, NLA_ALIGNTO) - user_skb->len)) > 0)

    This shouldn't pass checkpatch.pl, which complains about assignments inside
*if* statements.
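One way to reshape the hunk so that checkpatch.pl accepts it is to compute `plen` on its own line and only then test it. The sketch below is a user-space model of that shape: `nla_pad_needed()` is a hypothetical helper, `NLA_ALIGNTO` and `ALIGN()` are redefined locally instead of coming from kernel headers, and the `unaligned_ok` flag stands in for the `OVS_DP_F_UNALIGNED` feature test.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins so the sketch builds in user space: NLA_ALIGNTO is 4 in
 * <linux/netlink.h>, and ALIGN() rounds x up to a multiple of the power of
 * two a. */
#define NLA_ALIGNTO 4u
#define ALIGN(x, a) (((x) + ((a) - 1u)) & ~((a) - 1u))

/* Zero bytes needed after an attribute of 'len' bytes so the next one starts
 * on an NLA_ALIGNTO boundary.  If user space accepts unaligned messages
 * (modelled by 'unaligned_ok'), no padding is needed. */
static unsigned int nla_pad_needed(unsigned int len, bool unaligned_ok)
{
	unsigned int plen;

	if (unaligned_ok)
		return 0;

	/* Assignment on its own line instead of inside the if condition. */
	plen = ALIGN(len, NLA_ALIGNTO) - len;
	assert(plen < NLA_ALIGNTO);
	return plen;
}
```

In the patch itself the same idea would be `plen = ALIGN(user_skb->len, NLA_ALIGNTO) - user_skb->len;` on a separate line, followed by `if (!(dp->user_features & OVS_DP_F_UNALIGNED) && plen > 0)`.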

WBR, Sergei

* usb interrupt storms from the ax88179 hardware
From: David Laight @ 2014-02-03 17:17 UTC (permalink / raw)
  To: netdev, linux-usb@vger.kernel.org, Freddy Xin

On one system (an AMD motherboard with the ASMedia xHCI controller)
I'm seeing almost back-to-back USB 'interrupt' packets (7 or 8 a second)
from an ax88179 GbE card.
It may be that other systems behave similarly.
I'm sure this didn't use to happen!

I don't know what the interrupt status means, the value (as LE u32)
is 0x900a1 0xe1cd6d79
The only bit the driver looks at is the 0x10000 bit in the first word.
This is the 'link up/down' flag.
The two halves of the second word are probably different fields, the
high bits appear after about a second.

Now I'd guess that the driver ought to be doing something about some
of these values. While in this mode, transmits are delayed by anything
up to 100ms; at least for some time, they do still get sent.

However the processing of the 'link up/down' flag is clearly wrong.
The code currently does:

	event = urb->transfer_buffer;
	le32_to_cpus((void *)&event->intdata1);

	link = (((__force u32)event->intdata1) & AX_INT_PPLS_LINK) >> 16;

	if (netif_carrier_ok(dev->net) != link) {
		usbnet_link_change(dev, link, 1);
		netdev_info(dev->net, "ax88179 - Link status is: %d\n", link);
	}
356         }

Which ends up doing repeated calls to usbnet_link_change() and confusing
that code a lot.

I presume there is some delay before the return value from netif_carrier_ok()
matches the set state.

I think the code should be remembering the link state locally.
It might also need to clear some other interrupt flags, only ASIX know
what they mean.
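A minimal sketch of remembering the link state locally, modelled in user space rather than taken from ax88179_178a.c. `struct ax_link_state` and `ax_status_update()` are hypothetical; only the link bit (0x10000, `AX_INT_PPLS_LINK`) is handled, and the meaning of the other status bits is left open. By this test, the 0x900a1 value quoted above reads as "link up".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 16 of intdata1 is the 'link up/down' flag. */
#define AX_INT_PPLS_LINK 0x10000u

/* Hypothetical per-device state: the last link state the driver reported,
 * used instead of comparing against netif_carrier_ok(). */
struct ax_link_state {
	bool link;                  /* last state that was reported */
	unsigned int notifications; /* times a link change would be reported */
};

/* Handle one status interrupt.  'intdata1' is assumed to be in CPU byte
 * order already (le32_to_cpus() done by the caller).  Returns true only when
 * the link bit differs from the remembered state, i.e. when the driver
 * would call usbnet_link_change(). */
static bool ax_status_update(struct ax_link_state *st, uint32_t intdata1)
{
	bool link = (intdata1 & AX_INT_PPLS_LINK) != 0;

	assert(st);
	if (link == st->link)
		return false;   /* repeated interrupt, no change: nothing to do */

	st->link = link;
	st->notifications++;
	return true;
}
```

Because the comparison is against the driver's own copy, a burst of identical status interrupts produces at most one link-change report, whatever netif_carrier_ok() returns in the meantime.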

	David

* Re: Requeues and ECN marking
From: Eric Dumazet @ 2014-02-03 17:28 UTC (permalink / raw)
  To: Greg Kuperman; +Cc: netdev
In-Reply-To: <CAMvx-beWMC8awScfEtHs8sSkzz0fGNqBH1fn7hC=k0iaFpJvSA@mail.gmail.com>

On Mon, 2014-02-03 at 09:50 -0500, Greg Kuperman wrote:
> Hi all,
> 
> I am testing a new congestion control protocol that relies on explicit
> congestion notifications (ECN) to notify the receiver of a congestion
> event. I have a rate limited link of 1 Mbps, and I am using the RED
> queuing discipline with ECN enabled. What I have noticed is that no
> matter how small I set my queue size, or how low I set my minimum
> marking level, the first ECN marked packet does not get sent out for
> about 10 seconds after the input rate exceeds the output rate. Further
> examination shows that ECN marking does not occur until the number or
> requeues hits 1000. Below are two queries of tc -s -d qdisc ls dev
> eth1.
> 
> qdisc red 8028: root refcnt 2 limit 10000000b min 1b max 0b ecn ewma
> 30 Plog 21 Scell_log 31
>  Sent 1307892 bytes 1247 pkt (dropped 0, overlimits 0 requeues 960)
>  backlog 1052118b 962p requeues 960
>   marked 0 early 0 pdrop 0 other 0
> 
> qdisc red 8028: root refcnt 2 limit 10000000b min 1b max 0b ecn ewma
> 30 Plog 21 Scell_log 31
>  Sent 1379262 bytes 1312 pkt (dropped 0, overlimits 72 requeues 1024)
>  backlog 1122468b 1027p requeues 1024
>   marked 72 early 0 pdrop 0 other 0
> 
> 
> The txqueuelen defaults to 1000 for the interface, so I figured that
> packets maybe buffering there, and then dequeuing, before any packets
> are marked. I set txqueuelen to lower values (all the way down to 1),
> but the exact same behavior occurs (no marked packets until number of
> dequeues hits 1000). In contrast, if I set txqueuele to something very
> high, I get no requeues, drops, or marked packets.
> 
> My goal is for packets to be marked as soon as the ingress rate
> exceeds the egress. Am I correct in thinking that the requeuing
> operation is the culprit? Can I eliminate requeues? Is there something
> else I can do to get the behavior I am looking for?
> 
> Thank you all for the help. And please cc me in your replies; I'm not
> 100% sure if I get all the messages from this mailing list.

requeues have nothing to do with ECN marking.

How is your rate limiting done?

Post the whole setup, not part of it, it will help to spot the problem
in one go, instead of many mail exchanges.

* Re: Requeues and ECN marking
From: Greg Kuperman @ 2014-02-03 17:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1391448503.28432.101.camel@edumazet-glaptop2.roam.corp.google.com>

Thanks for the response. I agree that requeues should have nothing to
do with ECN marking, and that is why I am confused about what is
happening.

The entire setup is as follows. I am using the kernel version 3.2.44.
I am running a network emulator (CORE
http://www.nrl.navy.mil/itd/ncs/products/core), within which I have
four nodes. Each node becomes its own linux container, running its own
network control on its interfaces. The four nodes are the sender node
s, receiver node r, and two intermediate nodes 1 and 2. Node s is
connected to node 1, which is connected to node 2, which is connected
to node r. Link (1,2) is rate-limited to 1 Mbps (this rate limiting is
handled by another application that applies back pressure to the node
when its buffers are full and it can no longer send packets; the
buffer for that application is variable, and I have set it to hold up
to 10 packets).

I am running the RED queuing discipline on the egress of node 1 with the
following setup:
tc qdisc add dev eth1 root red burst 1000000 limit 10000000 avpkt 1000 ecn bandwidth 125 probability 1

I also run it with the following (and see no change in behavior):
tc qdisc add dev eth1 root red min 2000 max 10000 probability 1.0 limit 1000000 burst 10 avpkt 1000 bandwidth 125 ecn probability 1

The odd thing is that I can see the backlog and requeues increasing,
and once they hit 1000, packet marking begins. This happens even though
I have the minimum in RED set to 1 byte and the max set to 0 (which, from
my understanding, means that packet marking should begin when the
backlog is 1 byte, at the maximum marking probability right away,
because the max is set to 0). The explanation I came up with is that it
has something to do with the requeues, but that may be entirely off
base. I have no idea why it does not begin marking packets right away
(which is the desired behavior).
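For reference, classic RED decides whether to mark from an exponentially weighted average of the queue length, not from the instantaneous backlog, so a 1-byte minimum does not by itself mean marking starts with the first queued byte. The sketch below is a simplified floating-point model of that average (the kernel uses fixed-point arithmetic and also handles idle periods); the function names are hypothetical. Assuming the `ewma 30` in the tc output is RED's Wlog, the averaging weight is 2^-30, and in this model a constant 1,000,000-byte backlog needs on the order of a thousand updates before the average reaches 1 byte.

```c
#include <assert.h>
#include <stdint.h>

/* One update of the classic RED average queue estimate:
 *     avg <- avg + (qlen - avg) * 2^-wlog
 * Floating-point model only, not the kernel's fixed-point code. */
static double red_update_avg(double avg, double qlen, unsigned int wlog)
{
	assert(wlog < 64);
	return avg + (qlen - avg) / (double)(UINT64_C(1) << wlog);
}

/* Number of updates, with a constant backlog of 'qlen' bytes and starting
 * from an empty average, before the average reaches 'qth_min' bytes.
 * Returns 0 if it does not get there within 'max_updates'. */
static unsigned long red_updates_to_min(double qlen, double qth_min,
					unsigned int wlog,
					unsigned long max_updates)
{
	double avg = 0.0;
	unsigned long n;

	for (n = 1; n <= max_updates; n++) {
		avg = red_update_avg(avg, qlen, wlog);
		if (avg >= qth_min)
			return n;
	}
	return 0;
}
```

With a small Wlog the average tracks the backlog almost immediately; with a large one it lags far behind it, which delays the first mark.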

Thank you again for all of your time, and please let me know if there
is any more info that you guys need.

Some more queue statistics (I'm not sure how helpful this will be):

qdisc red 8004: root refcnt 2 limit 10000000b min 1b max 0b ecn
 Sent 1044606 bytes 996 pkt (dropped 0, overlimits 0 requeues 905)
 backlog 993072b 913p requeues 905
  marked 0 early 0 pdrop 0 other 0

qdisc red 8004: root refcnt 2 limit 10000000b min 1b max 0b ecn
 Sent 1131390 bytes 1076 pkt (dropped 0, overlimits 0 requeues 984)
 backlog 1080870b 992p requeues 984
  marked 0 early 0 pdrop 0 other 0

qdisc red 8004: root refcnt 2 limit 10000000b min 1b max 0b ecn
 Sent 1231386 bytes 1168 pkt (dropped 0, overlimits 179 requeues 1075)
 backlog 1179690b 1082p requeues 1075
  marked 179 early 0 pdrop 0 other 0

qdisc red 8004: root refcnt 2 limit 10000000b min 1b max 0b ecn
 Sent 1334640 bytes 1263 pkt (dropped 0, overlimits 368 requeues 1169)
 backlog 1283958b 1176p requeues 1169
  marked 368 early 0 pdrop 0 other 0

On Mon, Feb 3, 2014 at 12:28 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> [...]

* Re: [PATCH RFC 1/1] usb: Tell xhci when usb data might be misaligned
From: Sarah Sharp @ 2014-02-03 17:55 UTC (permalink / raw)
  To: Mark Lord
  Cc: Ming Lei, Bjørn Mork, David Laight,
	linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Greg Kroah-Hartman, David Miller, Dan Williams, Nyman, Mathias,
	Alan Stern, Freddy Xin
In-Reply-To: <52ED5381.2010106-e+AXbWqSrlAAvxtiuMwx3w@public.gmane.org>

On Sat, Feb 01, 2014 at 03:05:21PM -0500, Mark Lord wrote:
> On 14-02-01 09:18 AM, Ming Lei wrote:
> >
> > Even real regressions are easily/often introduced, and we are discussing
> > how to fix that. I suggest to unset the flag only for the known buggy
> > controllers.

Ming, the regression cannot be easily fixed in this case.  We tried the
"easy, quick fix" and it broke USB storage and usbfs.  The patches to
paper over those issues started to creep into the upper layers, and I'm
not willing to add more code to hack around the issues caused by the
"quick fix".  We need to do this right, not wall-paper over the issues.

> It is not the controllers that are particularly "buggy" here.
> But rather the drivers and design of parts of the kernel.

As Mark mentioned, the host controllers aren't buggy.  The xHCI driver
simply doesn't handle a 1.0 host controller requirement, TD fragments,
very well.  Only the USB ethernet layer triggers this bug, because the
USB storage layer hands down scatter-gather lists in multiples of the
max packet size.

You tested on a 1.0 host controller, and it apparently didn't need the
TD fragments requirement.  It seems that Intel 1.0 xHCI host controllers
do need that requirement.  Perhaps we can add an xHCI driver quirk for
an exception so that your host can allow any kind of scatter-gather?

Sarah Sharp

* Re: [PATCH RFC 1/1] usb: Tell xhci when usb data might be misaligned
From: Sarah Sharp @ 2014-02-03 17:56 UTC (permalink / raw)
  To: David Laight
  Cc: 'Mark Lord', Ming Lei, Bjørn Mork,
	linux-usb@vger.kernel.org, netdev@vger.kernel.org,
	Greg Kroah-Hartman, David Miller, Dan Williams, Nyman, Mathias,
	Alan Stern, Freddy Xin
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D0F6B7707@AcuExch.aculab.com>

On Mon, Feb 03, 2014 at 09:54:09AM +0000, David Laight wrote:
> From: Mark Lord
> > On 14-02-01 09:18 AM, Ming Lei wrote:
> > >
> > > Even real regressions are easily/often introduced, and we are discussing
> > > how to fix that. I suggest to unset the flag only for the known buggy
> > > controllers.
> > 
> > It is not the controllers that are particularly "buggy" here.
> > But rather the drivers and design of parts of the kernel.
> 
> I suspect that the documentation is describing the actual implementation
> of a specific hardware implementation, not necessarily how the hardware was
> intended to behave.

You are speculating.  Please stop speculating without evidence.  It does
not add to this conversation.

Sarah Sharp
