Netdev List
 help / color / mirror / Atom feed
* Re: "xfrm: Fix the gc threshold value for ipv4" broke my IPSec connections
From: Eric Dumazet @ 2013-10-15 22:51 UTC (permalink / raw)
  To: Damian Pietras, Steffen Klassert; +Cc: netdev
In-Reply-To: <525DBE65.1070707@daper.net>

On Wed, 2013-10-16 at 00:15 +0200, Damian Pietras wrote:
> On 15.10.2013 23:02, Eric Dumazet wrote:
> >> 703fb94ec58e0e8769380c2877a8a34aeb5b6c97
> >> xfrm: Fix the gc threshold value for ipv4
> >>
> >> Reverting it on 3.10.15 fixes my issue. This seems to be there from 3.7
> >> and I don't really believe such simple case stayed broken for so long.
> >> Em I missing something or there is really a bug?
> >>
> >> If smeone is interested in details of this configuration and commands
> >> I'm running, just let me know. This was reproduced with few VMs under XEN.
> >>
> > 
> > It looks like you need to tune /proc/sys/net/ipv4/xfrm4_gc_thresh to a
> > sensible value given your workload.
> > 
> > try :
> > 
> > echo 65536 >/proc/sys/net/ipv4/xfrm4_gc_thresh
> > 
> > Presumably the 1024 default is really too small...
> 
> Now it's working in my test setup, I'm changing it on the production
> boxes, thanks!
> 
> 

Steffen, what do you think ?

1024 seems really small, given we had much higher values.

(256 K on a 1GB host)

This sysctl also needs an entry in
Documentation/networking/ip-sysctl.txt

^ permalink raw reply

* Re: [PATCH RFC 0/2] xfrm: Remove ancient sleeping code
From: David Miller @ 2013-10-15 23:14 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev
In-Reply-To: <20131015073020.GV7660@secunet.com>

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Tue, 15 Oct 2013 09:30:20 +0200

> We could fiddle something to get a terminating condition if the
> state is not resolved after some time, but my plan was to disable
> the larval_drop sysctl by default some day again. At best without
> any notable change to userspace. That's why I would prefer to
> remove the sleeping entirely.

Ok I have no problem with removing the sleeping stuff.

^ permalink raw reply

* Re: [PATCH RFC 0/2] xfrm: Remove ancient sleeping code
From: Eric Dumazet @ 2013-10-15 23:58 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David Miller, netdev
In-Reply-To: <20131015073020.GV7660@secunet.com>

On Tue, 2013-10-15 at 09:30 +0200, Steffen Klassert wrote:

> Right, that's why I've limited the queue to 100 packets. We can
> queue the SYNs of up to 100 tcp connestions that want to use
> this IPsec state. It surely can happen that we queue multiple
> retransmitted SYNs if the IPsec resolution is slow. But the
> queueing code tries at least to get the packets out before
> the first tcp retransmit. I think there is still room for
> optimizations, maybe reducing the queue lenght or the queue
> timeout to avoid queueing retransmitted SYNs as much as possible.

Note that its totally possible to avoid retransmitting SYN if original
SYN is still in a host queue.

We currently increment a SNMP counter when we detect this, we could
do something else (like not queuing a copy of the packet)

http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=0e280af026a5662ffd57c4e623b822df1f7f47ff

Another work in progress is to delay RTO arming at the time TCP
packet leaves the host queues, instead of at the enqueue time.

^ permalink raw reply

* [RFC net-next] ipv6: Use destination address determined by IPVS
From: Simon Horman @ 2013-10-16  0:02 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki / 吉藤英明
  Cc: lvs-devel, netdev, Julian Anastasov, Mark Brooks, Simon Horman

In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
such that it creates and uses a neigh entry if none is found.
Subsequently the 'n' field was removed from struct rt6_info.

Unfortunately my analysis is that in the case of IPVS direct routing this
change leads to incorrect behaviour as in this case packets may be output
to a destination other than where they would be output according to the
route table. In particular, the destination address may actually be a local
address and empirically a neighbour lookup seems to result in it becoming
unreachable.

This patch resolves the problem by providing the destination address
determined by IPVS to ip6_finish_output2() in the skb callback.  Although
this seems to work I can see several problems with this approach:

* It is rather ugly, stuffing an IPVS exception right in
  the middle of IPv6 code. The overhead could be eliminated for many users
  by using a staic key. But none the less it is not attractive.

* The use of the skb callback is may not be valid
  as it crosses from IPVS to IPv6 code. A possible, though unpleasant,
  alternative is to add a new field to struct sk_buff.

* This covers all IPv6 packets output by IPVS but actually
  only those output using IPVS Direct-Routing need this.  One way to
  resolve this would be to add a more fine-grained ipvs_property to
  struct sk_buff.

Reported-by: Mark Brooks <mark@loadbalancer.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 include/net/ip_vs.h             | 6 ++++++
 net/ipv6/ip6_output.c           | 9 +++++++--
 net/netfilter/ipvs/ip_vs_xmit.c | 2 ++
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 1c2e1b9..11d90a6 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1649,4 +1649,10 @@ ip_vs_dest_conn_overhead(struct ip_vs_dest *dest)
 		atomic_read(&dest->inactconns);
 }
 
+struct ipvs_skb_cb {
+	struct in6_addr *daddr;
+};
+
+#define IP_VS_SKB_CB(skb) ((struct ipvs_skb_cb *)&(skb)->cb)
+
 #endif	/* _NET_IP_VS_H */
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a54c45c..a340180 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -52,6 +52,7 @@
 #include <net/addrconf.h>
 #include <net/rawv6.h>
 #include <net/icmp.h>
+#include <net/ip_vs.h>
 #include <net/xfrm.h>
 #include <net/checksum.h>
 #include <linux/mroute6.h>
@@ -61,7 +62,7 @@ static int ip6_finish_output2(struct sk_buff *skb)
 	struct dst_entry *dst = skb_dst(skb);
 	struct net_device *dev = dst->dev;
 	struct neighbour *neigh;
-	struct in6_addr *nexthop;
+	struct in6_addr *nexthop, *daddr;
 	int ret;
 
 	skb->protocol = htons(ETH_P_IPV6);
@@ -105,7 +106,11 @@ static int ip6_finish_output2(struct sk_buff *skb)
 	}
 
 	rcu_read_lock_bh();
-	nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr);
+	if (unlikely(IS_ENABLED(CONFIG_IP_VS) && skb->ipvs_property))
+		daddr = IP_VS_SKB_CB(skb)->daddr;
+	else
+		daddr = &ipv6_hdr(skb)->daddr;
+	nexthop = rt6_nexthop((struct rt6_info *)dst, daddr);
 	neigh = __ipv6_neigh_lookup_noref(dst->dev, nexthop);
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&nd_tbl, nexthop, dst->dev, false);
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index c47444e..054b679 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -391,6 +391,8 @@ __ip_vs_get_out_rt_v6(struct sk_buff *skb, struct ip_vs_dest *dest,
 		rt = (struct rt6_info *) dst;
 	}
 
+	IP_VS_SKB_CB(skb)->daddr = daddr;
+
 	local = __ip_vs_is_local_route6(rt);
 	if (!((local ? IP_VS_RT_MODE_LOCAL : IP_VS_RT_MODE_NON_LOCAL) &
 	      rt_mode)) {
-- 
1.8.4


^ permalink raw reply related

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Eric Dumazet @ 2013-10-16  0:13 UTC (permalink / raw)
  To: Simon Horman
  Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Julian Anastasov, Mark Brooks
In-Reply-To: <1381881751-6719-1-git-send-email-horms@verge.net.au>

On Wed, 2013-10-16 at 09:02 +0900, Simon Horman wrote:
> In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
> ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
> such that it creates and uses a neigh entry if none is found.
> Subsequently the 'n' field was removed from struct rt6_info.
> 
> Unfortunately my analysis is that in the case of IPVS direct routing this
> change leads to incorrect behaviour as in this case packets may be output
> to a destination other than where they would be output according to the
> route table. In particular, the destination address may actually be a local
> address and empirically a neighbour lookup seems to result in it becoming
> unreachable.
> 
> This patch resolves the problem by providing the destination address
> determined by IPVS to ip6_finish_output2() in the skb callback.  Although
> this seems to work I can see several problems with this approach:
> 
> * It is rather ugly, stuffing an IPVS exception right in
>   the middle of IPv6 code. The overhead could be eliminated for many users
>   by using a staic key. But none the less it is not attractive.
> 
> * The use of the skb callback is may not be valid
>   as it crosses from IPVS to IPv6 code. A possible, though unpleasant,
>   alternative is to add a new field to struct sk_buff.

Please no ;)

> 
> * This covers all IPv6 packets output by IPVS but actually
>   only those output using IPVS Direct-Routing need this.  One way to
>   resolve this would be to add a more fine-grained ipvs_property to
>   struct sk_buff.
> 
> Reported-by: Mark Brooks <mark@loadbalancer.org>
> Signed-off-by: Simon Horman <horms@verge.net.au>
> ---
>  include/net/ip_vs.h             | 6 ++++++
>  net/ipv6/ip6_output.c           | 9 +++++++--
>  net/netfilter/ipvs/ip_vs_xmit.c | 2 ++
>  3 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
> index 1c2e1b9..11d90a6 100644
> --- a/include/net/ip_vs.h
> +++ b/include/net/ip_vs.h
> @@ -1649,4 +1649,10 @@ ip_vs_dest_conn_overhead(struct ip_vs_dest *dest)
>  		atomic_read(&dest->inactconns);
>  }
>  
> +struct ipvs_skb_cb {
> +	struct in6_addr *daddr;
> +};

So we pass a reference.

> +
> +#define IP_VS_SKB_CB(skb) ((struct ipvs_skb_cb *)&(skb)->cb)
> +
>  #endif	/* _NET_IP_VS_H */
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index a54c45c..a340180 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -52,6 +52,7 @@
>  #include <net/addrconf.h>
>  #include <net/rawv6.h>
>  #include <net/icmp.h>
> +#include <net/ip_vs.h>
>  #include <net/xfrm.h>
>  #include <net/checksum.h>
>  #include <linux/mroute6.h>
> @@ -61,7 +62,7 @@ static int ip6_finish_output2(struct sk_buff *skb)
>  	struct dst_entry *dst = skb_dst(skb);
>  	struct net_device *dev = dst->dev;
>  	struct neighbour *neigh;
> -	struct in6_addr *nexthop;
> +	struct in6_addr *nexthop, *daddr;
>  	int ret;
>  
>  	skb->protocol = htons(ETH_P_IPV6);
> @@ -105,7 +106,11 @@ static int ip6_finish_output2(struct sk_buff *skb)
>  	}
>  
>  	rcu_read_lock_bh();
> -	nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr);
> +	if (unlikely(IS_ENABLED(CONFIG_IP_VS) && skb->ipvs_property))
> +		daddr = IP_VS_SKB_CB(skb)->daddr;

What guarantee do we have daddr points to valid memory (not already
freed/reused) ?

I guess things like NFQUEUE could happen ?




^ permalink raw reply

* Re: DomU's network interface will hung when Dom0 running 32bit
From: jianhai luan @ 2013-10-16  0:15 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Campbell, xen-devel, netdev, ANNIE LI
In-Reply-To: <525D6C16.6010705@oracle.com>


On 2013-10-16 0:23, jianhai luan wrote:
>
> On 2013-10-16 0:03, Wei Liu wrote:
>> On Tue, Oct 15, 2013 at 11:19:42PM +0800, jianhai luan wrote:
>> [...]
>>>>>>> * time_after_eq(now, next_credit) -> false
>>>>>>> * time_before(now, expires) -> false
>>>>>> If now is placed in above environment, the result will be correct
>>>>>> (Sending package will be not allowed until next_credit).
>>>>> No, it is not necessarily correct. Keep in mind that "now" wraps 
>>>>> around,
>>>>> which is the issue you try to fix. You still have a window to 
>>>>> stall your
>>>>> frontend.
>>>> Remember that time_after_eq is supposed to work even with wraparound
>>>> occurring, so long as the two times are less than MAX_LONG/2 apart.
>>> Sorry for my misunderstand explanation. I mean that
>>>    * time_after_eq()/time_before_eq() fix the jiffies wraparound, so
>>> please think about  jiffies in line increasing.
>>>    * time_after_eq()/time_before_eq() have the range (0, MAX_LONG/2),
>>> the judge will be wrong if out of the range.
>>>
>>> So please think about three kind environment
>>>    -  expires        now        next_credit
>>>       --------time increases this direction ---------->
>>>
>>>    -  expires        [next_credit        now next_credit+MAX_LONG/2
>>>       --------time increase this direction ----------->
>>>
>>>    - expires        next_credit        next_credit+MAX_LONG/2 now
>>>       --------time increadse this direction ---------->
>>>
>>> The first environment should be netfront consume all credit_byte
>>> before next_credit, So we should pending one timer to calculator the
>>> new credit_byte, and don't transmit until next_credit.
>>>
>>> the second environment should be calculator the credit_byte because
>>> netfront don't consume all credit_byte before next_credit, and
>>> time_after_eq() do correct judge.
>>>
>>> the third environment should be calculator in time because netfront
>>> don't consume all credit_byte until next_credit.But time_after_eq do
>>> error judge (time_after_eq(now, next_credit) is false), so the
>>> remaining_byte isn't be increased.
>>>
>>> and I work on the third environment.  You know now >
>>> next_credit+MAX_LONG/2, time_before(now, expire) should be
>>> true(time_before(now, expire) is false in first environment)
>> Thanks for staighten this out for me. I'm just too dumb for this, please
>> be patient with me. :-)
>>
>> Could you prove that time_before(now, expire) is always true in third
>> case? That's where my main cencern lies. Is it because msecs_to_jiffies
>> always returns MAX_JIFFY_OFFSET (which is ((LONG_MAX >> 1)-1) ) at most?
>
> I have wrong judge in third environment. If now large than expires + 
> MAX_UNLONG, time_before(now, expires) will be false.
>   expires    next_credit    next_credit+MAX_UNLONG/2    expires + 
> MAX_UNLONG    now    next_credit+MAX_UNLONG
>   --------------------------------------------------------- time 
> increadse this direction  ---------------------------------->
>
>   In the above environment, time_before(now, expires) will return 
> false. But the jiffies elapsed more time and next_credit will be 
> reachable in soon(time_after_eq(now, next_credit) will be true).

After above talk, the window should be exist (expire+MAX_ULONG 
next_credit+MAX_ULONG, expire + 2MAX_ULONF next_credit+MAX_ULONG 
....,expire+<n>ULONG next_credit+<n>MAX_ULONG). Other time window should 
be not exist (maybe i don't think about).
  If so, please think about the below:
   * If no speed control, vif->credit_usec should be zero. expire = 
next_credit and the window is zero
   * If we have done speed control, the scenario should be very likely 
than first environment except the value is larger (the delta is 
<n>MAX_UNLONG)
      - If do speed control. the window should have been think about
                      speed            worse time
                      100M/s          40s
                      1000M/s       4s
      - the time should be acceptable.
>>
>> Wei.
>

^ permalink raw reply

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Simon Horman @ 2013-10-16  0:28 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Julian Anastasov, Mark Brooks
In-Reply-To: <1381882426.2045.85.camel@edumazet-glaptop.roam.corp.google.com>

On Tue, Oct 15, 2013 at 05:13:46PM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-16 at 09:02 +0900, Simon Horman wrote:
> > In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
> > ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
> > such that it creates and uses a neigh entry if none is found.
> > Subsequently the 'n' field was removed from struct rt6_info.
> > 
> > Unfortunately my analysis is that in the case of IPVS direct routing this
> > change leads to incorrect behaviour as in this case packets may be output
> > to a destination other than where they would be output according to the
> > route table. In particular, the destination address may actually be a local
> > address and empirically a neighbour lookup seems to result in it becoming
> > unreachable.
> > 
> > This patch resolves the problem by providing the destination address
> > determined by IPVS to ip6_finish_output2() in the skb callback.  Although
> > this seems to work I can see several problems with this approach:
> > 
> > * It is rather ugly, stuffing an IPVS exception right in
> >   the middle of IPv6 code. The overhead could be eliminated for many users
> >   by using a staic key. But none the less it is not attractive.
> > 
> > * The use of the skb callback is may not be valid
> >   as it crosses from IPVS to IPv6 code. A possible, though unpleasant,
> >   alternative is to add a new field to struct sk_buff.
> 
> Please no ;)

I thought of you when I wrote that comment :)

> > * This covers all IPv6 packets output by IPVS but actually
> >   only those output using IPVS Direct-Routing need this.  One way to
> >   resolve this would be to add a more fine-grained ipvs_property to
> >   struct sk_buff.
> > 
> > Reported-by: Mark Brooks <mark@loadbalancer.org>
> > Signed-off-by: Simon Horman <horms@verge.net.au>
> > ---
> >  include/net/ip_vs.h             | 6 ++++++
> >  net/ipv6/ip6_output.c           | 9 +++++++--
> >  net/netfilter/ipvs/ip_vs_xmit.c | 2 ++
> >  3 files changed, 15 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
> > index 1c2e1b9..11d90a6 100644
> > --- a/include/net/ip_vs.h
> > +++ b/include/net/ip_vs.h
> > @@ -1649,4 +1649,10 @@ ip_vs_dest_conn_overhead(struct ip_vs_dest *dest)
> >  		atomic_read(&dest->inactconns);
> >  }
> >  
> > +struct ipvs_skb_cb {
> > +	struct in6_addr *daddr;
> > +};
> 
> So we pass a reference.
>
> > +
> > +#define IP_VS_SKB_CB(skb) ((struct ipvs_skb_cb *)&(skb)->cb)
> > +
> >  #endif	/* _NET_IP_VS_H */
> > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> > index a54c45c..a340180 100644
> > --- a/net/ipv6/ip6_output.c
> > +++ b/net/ipv6/ip6_output.c
> > @@ -52,6 +52,7 @@
> >  #include <net/addrconf.h>
> >  #include <net/rawv6.h>
> >  #include <net/icmp.h>
> > +#include <net/ip_vs.h>
> >  #include <net/xfrm.h>
> >  #include <net/checksum.h>
> >  #include <linux/mroute6.h>
> > @@ -61,7 +62,7 @@ static int ip6_finish_output2(struct sk_buff *skb)
> >  	struct dst_entry *dst = skb_dst(skb);
> >  	struct net_device *dev = dst->dev;
> >  	struct neighbour *neigh;
> > -	struct in6_addr *nexthop;
> > +	struct in6_addr *nexthop, *daddr;
> >  	int ret;
> >  
> >  	skb->protocol = htons(ETH_P_IPV6);
> > @@ -105,7 +106,11 @@ static int ip6_finish_output2(struct sk_buff *skb)
> >  	}
> >  
> >  	rcu_read_lock_bh();
> > -	nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr);
> > +	if (unlikely(IS_ENABLED(CONFIG_IP_VS) && skb->ipvs_property))
> > +		daddr = IP_VS_SKB_CB(skb)->daddr;
> 
> What guarantee do we have daddr points to valid memory (not already
> freed/reused) ?

I can change that to passing a value if there is a risk
the reference could become invalid. To be honest I am more
worried that skb->cb might be clobbered entirely.

> I guess things like NFQUEUE could happen ?

Could you expand a little?

^ permalink raw reply

* Re: [PATCH] sh_eth: add/use RMCR.RNC bit
From: Simon Horman @ 2013-10-16  0:34 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: netdev, nobuhiro.iwamatsu.yj, linux-sh, davem
In-Reply-To: <201310160229.58735.sergei.shtylyov@cogentembedded.com>

On Wed, Oct 16, 2013 at 02:29:58AM +0400, Sergei Shtylyov wrote:
> Declare 'enum EMCR_BIT' containing the single member for the RMCR.RNC bit and
> replace bare numbers in the driver by  this mnemonic.
> 
> Suggested-by: David Miller <davem@davemloft.net>
> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

Thanks Sergei,

this seems to move things in the right direction.

Reviewed-by: Simon Horman <horms+renesas@verge.net.au>

> ---
> This patch is against DaveM's 'net.git' repo but it intended for 'net-next.git'
> repo -- it's  because 'net-next.git' doesn't contain the required Simon Horman's
> patch yet.
> 
>  drivers/net/ethernet/renesas/sh_eth.c |    6 +++---
>  drivers/net/ethernet/renesas/sh_eth.h |    3 +++
>  2 files changed, 6 insertions(+), 3 deletions(-)
> 
> Index: net/drivers/net/ethernet/renesas/sh_eth.c
> ===================================================================
> --- net.orig/drivers/net/ethernet/renesas/sh_eth.c
> +++ net/drivers/net/ethernet/renesas/sh_eth.c
> @@ -483,7 +483,7 @@ static struct sh_eth_cpu_data sh7757_dat
>  	.register_type	= SH_ETH_REG_FAST_SH4,
>  
>  	.eesipr_value	= DMAC_M_RFRMER | DMAC_M_ECI | 0x003fffff,
> -	.rmcr_value	= 0x00000001,
> +	.rmcr_value	= RMCR_RNC,
>  
>  	.tx_check	= EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_RTO,
>  	.eesr_err_check	= EESR_TWB | EESR_TABT | EESR_RABT | EESR_RFE |
> @@ -561,7 +561,7 @@ static struct sh_eth_cpu_data sh7757_dat
>  			  EESR_RFE | EESR_RDE | EESR_RFRMER | EESR_TFE |
>  			  EESR_TDE | EESR_ECI,
>  	.fdr_value	= 0x0000072f,
> -	.rmcr_value	= 0x00000001,
> +	.rmcr_value	= RMCR_RNC,
>  
>  	.irq_flags	= IRQF_SHARED,
>  	.apr		= 1,
> @@ -689,7 +689,7 @@ static struct sh_eth_cpu_data r8a7740_da
>  			  EESR_RFE | EESR_RDE | EESR_RFRMER | EESR_TFE |
>  			  EESR_TDE | EESR_ECI,
>  	.fdr_value	= 0x0000070f,
> -	.rmcr_value	= 0x00000001,
> +	.rmcr_value	= RMCR_RNC,
>  
>  	.apr		= 1,
>  	.mpr		= 1,
> Index: net/drivers/net/ethernet/renesas/sh_eth.h
> ===================================================================
> --- net.orig/drivers/net/ethernet/renesas/sh_eth.h
> +++ net/drivers/net/ethernet/renesas/sh_eth.h
> @@ -321,6 +321,9 @@ enum TD_STS_BIT {
>  #define TD_TFP	(TD_TFP1|TD_TFP0)
>  
>  /* RMCR */
> +enum RMCR_BIT {
> +	RMCR_RNC = 0x00000001,
> +};
>  #define DEFAULT_RMCR_VALUE	0x00000000
>  
>  /* ECMR */
> 

^ permalink raw reply

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Eric Dumazet @ 2013-10-16  0:39 UTC (permalink / raw)
  To: Simon Horman
  Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Julian Anastasov, Mark Brooks
In-Reply-To: <20131016002807.GM22321@verge.net.au>

On Wed, 2013-10-16 at 09:28 +0900, Simon Horman wrote:

> > I guess things like NFQUEUE could happen ?
> 
> Could you expand a little?

This was to point that between IPVS and ipv6 stack we might have a
delay, and daddr was maybe pointed to a freed memory.

IP6CB only uses 24 bytes, so I think you would be safe adding 16 bytes.




^ permalink raw reply

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Hannes Frederic Sowa @ 2013-10-16  0:53 UTC (permalink / raw)
  To: Simon Horman
  Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Julian Anastasov, Mark Brooks
In-Reply-To: <1381881751-6719-1-git-send-email-horms@verge.net.au>

On Wed, Oct 16, 2013 at 09:02:31AM +0900, Simon Horman wrote:
> In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
> ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
> such that it creates and uses a neigh entry if none is found.
> Subsequently the 'n' field was removed from struct rt6_info.
> 
> Unfortunately my analysis is that in the case of IPVS direct routing this
> change leads to incorrect behaviour as in this case packets may be output
> to a destination other than where they would be output according to the
> route table. In particular, the destination address may actually be a local
> address and empirically a neighbour lookup seems to result in it becoming
> unreachable.
> 
> This patch resolves the problem by providing the destination address
> determined by IPVS to ip6_finish_output2() in the skb callback.  Although
> this seems to work I can see several problems with this approach:
> 
> * It is rather ugly, stuffing an IPVS exception right in
>   the middle of IPv6 code. The overhead could be eliminated for many users
>   by using a staic key. But none the less it is not attractive.
> 
> * The use of the skb callback is may not be valid
>   as it crosses from IPVS to IPv6 code. A possible, though unpleasant,
>   alternative is to add a new field to struct sk_buff.
> 
> * This covers all IPv6 packets output by IPVS but actually
>   only those output using IPVS Direct-Routing need this.  One way to
>   resolve this would be to add a more fine-grained ipvs_property to
>   struct sk_buff.

Hmm, that reminds me on the following bug report which would be nice we could
solve in one go, too: http://www.spinics.net/lists/netdev/msg250785.html

Greetings,

  Hannes


^ permalink raw reply

* Re: [PATCH] bridge: Correctly clamp MAX forward_delay when enabling STP
From: Herbert Xu @ 2013-10-16  0:56 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, Stephen Hemminger
In-Reply-To: <1381863465-27304-1-git-send-email-vyasevic@redhat.com>

On Tue, Oct 15, 2013 at 02:57:45PM -0400, Vlad Yasevich wrote:
> Commit be4f154d5ef0ca147ab6bcd38857a774133f5450
> 	bridge: Clamp forward_delay when enabling STP
> had a typo when attempting to clamp maximum forward delay.
> 
> It is possible to set bridge_forward_delay to be higher then
> permitted maximum when STP is off.  When turning STP on, the
> higher then allowed delay has to be clamed down to max value.
> 
> CC: Herbert Xu <herbert@gondor.apana.org.au>
> CC: Stephen Hemminger <shemminger@vyatta.com>
> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>

Good catch!

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCHv3 net] {xfrm, sctp} Stick to software crc32 even if hardware is capable of that
From: Vlad Yasevich @ 2013-10-16  1:05 UTC (permalink / raw)
  To: Fan Du, nhorman; +Cc: steffen.klassert, davem, netdev
In-Reply-To: <1381828777-15894-1-git-send-email-fan.du@windriver.com>

On 10/15/2013 05:19 AM, Fan Du wrote:
> igb/ixgbe have hardware sctp checksum support, when this feature is enabled
> and also IPsec is armed to protect sctp traffic, ugly things happened as
> xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum every thing
> up and pack the 16bits result in the checksum field). The result is fail
> establishment of sctp communication.
>
> Signed-off-by: Fan Du <fan.du@windriver.com>
> Cc: Vlad Yasevich <vyasevich@gmail.com>
> Cc: Neil Horman <nhorman@tuxdriver.com>
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> Acked-by: Vlad Yasevich <vyasevich@gmail.com>

Looks good to me.

-vlad

> ---
> v3:
>    - Rename is_xfrm_armed by dst_xfrm
>    - Move this funtion in include/net/dst.h
>
> v2:
>    - Split v1 into two separate patches.
>
> ---
>   include/net/dst.h |   12 ++++++++++++
>   net/sctp/output.c |    3 ++-
>   2 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 211dcf1..44995c1 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -478,10 +478,22 @@ static inline struct dst_entry *xfrm_lookup(struct net *net,
>   {
>   	return dst_orig;
>   }
> +
> +static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
> +{
> +	return NULL;
> +}
> +
>   #else
>   struct dst_entry *xfrm_lookup(struct net *net, struct dst_entry *dst_orig,
>   			      const struct flowi *fl, struct sock *sk,
>   			      int flags);
> +
> +/* skb attached with this dst needs transformation if dst->xfrm is valid */
> +static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
> +{
> +	return dst->xfrm;
> +}
>   #endif
>
>   #endif /* _NET_DST_H */
> diff --git a/net/sctp/output.c b/net/sctp/output.c
> index 0ac3a65..24b3718 100644
> --- a/net/sctp/output.c
> +++ b/net/sctp/output.c
> @@ -536,7 +536,8 @@ int sctp_packet_transmit(struct sctp_packet *packet)
>   	 * by CRC32-C as described in <draft-ietf-tsvwg-sctpcsum-02.txt>.
>   	 */
>   	if (!sctp_checksum_disable) {
> -		if (!(dst->dev->features & NETIF_F_SCTP_CSUM)) {
> +		if (!(dst->dev->features & NETIF_F_SCTP_CSUM) ||
> +			(dst_xfrm(dst) != NULL)) {
>   			__u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len);
>
>   			/* 3) Put the resultant value into the checksum field in the
>

^ permalink raw reply

* Re: [PATCHv3 net] {xfrm, sctp} Stick to software crc32 even if hardware is capable of that
From: Vlad Yasevich @ 2013-10-16  1:52 UTC (permalink / raw)
  To: Fan Du, nhorman; +Cc: steffen.klassert, davem, netdev
In-Reply-To: <525DE666.10308@gmail.com>

On 10/15/2013 09:05 PM, Vlad Yasevich wrote:
> On 10/15/2013 05:19 AM, Fan Du wrote:
>> igb/ixgbe have hardware sctp checksum support, when this feature is
>> enabled
>> and also IPsec is armed to protect sctp traffic, ugly things happened as
>> xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum
>> every thing
>> up and pack the 16bits result in the checksum field). The result is fail
>> establishment of sctp communication.
>>
>> Signed-off-by: Fan Du <fan.du@windriver.com>
>> Cc: Vlad Yasevich <vyasevich@gmail.com>
>> Cc: Neil Horman <nhorman@tuxdriver.com>
>> Cc: Steffen Klassert <steffen.klassert@secunet.com>
>> Acked-by: Vlad Yasevich <vyasevich@gmail.com>
>
> Looks good to me.
>

Just tryied applying this and looks like you based this on net-next.
This fixes a rather ugly bug when checksum offloading is done.
I am going to rebase this and re-submit for net.

-vlad

> -vlad
>
>> ---
>> v3:
>>    - Rename is_xfrm_armed by dst_xfrm
>>    - Move this funtion in include/net/dst.h
>>
>> v2:
>>    - Split v1 into two separate patches.
>>
>> ---
>>   include/net/dst.h |   12 ++++++++++++
>>   net/sctp/output.c |    3 ++-
>>   2 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/dst.h b/include/net/dst.h
>> index 211dcf1..44995c1 100644
>> --- a/include/net/dst.h
>> +++ b/include/net/dst.h
>> @@ -478,10 +478,22 @@ static inline struct dst_entry
>> *xfrm_lookup(struct net *net,
>>   {
>>       return dst_orig;
>>   }
>> +
>> +static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
>> +{
>> +    return NULL;
>> +}
>> +
>>   #else
>>   struct dst_entry *xfrm_lookup(struct net *net, struct dst_entry
>> *dst_orig,
>>                     const struct flowi *fl, struct sock *sk,
>>                     int flags);
>> +
>> +/* skb attached with this dst needs transformation if dst->xfrm is
>> valid */
>> +static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
>> +{
>> +    return dst->xfrm;
>> +}
>>   #endif
>>
>>   #endif /* _NET_DST_H */
>> diff --git a/net/sctp/output.c b/net/sctp/output.c
>> index 0ac3a65..24b3718 100644
>> --- a/net/sctp/output.c
>> +++ b/net/sctp/output.c
>> @@ -536,7 +536,8 @@ int sctp_packet_transmit(struct sctp_packet *packet)
>>        * by CRC32-C as described in <draft-ietf-tsvwg-sctpcsum-02.txt>.
>>        */
>>       if (!sctp_checksum_disable) {
>> -        if (!(dst->dev->features & NETIF_F_SCTP_CSUM)) {
>> +        if (!(dst->dev->features & NETIF_F_SCTP_CSUM) ||
>> +            (dst_xfrm(dst) != NULL)) {
>>               __u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len);
>>
>>               /* 3) Put the resultant value into the checksum field in
>> the
>>
>

^ permalink raw reply

* Re: [PATCHv3 net] {xfrm, sctp} Stick to software crc32 even if hardware is capable of that
From: Fan Du @ 2013-10-16  1:58 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: nhorman, steffen.klassert, davem, netdev
In-Reply-To: <525DF141.6060302@gmail.com>



On 2013年10月16日 09:52, Vlad Yasevich wrote:
> On 10/15/2013 09:05 PM, Vlad Yasevich wrote:
>> On 10/15/2013 05:19 AM, Fan Du wrote:
>>> igb/ixgbe have hardware sctp checksum support, when this feature is
>>> enabled
>>> and also IPsec is armed to protect sctp traffic, ugly things happened as
>>> xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum
>>> every thing
>>> up and pack the 16bits result in the checksum field). The result is fail
>>> establishment of sctp communication.
>>>
>>> Signed-off-by: Fan Du <fan.du@windriver.com>
>>> Cc: Vlad Yasevich <vyasevich@gmail.com>
>>> Cc: Neil Horman <nhorman@tuxdriver.com>
>>> Cc: Steffen Klassert <steffen.klassert@secunet.com>
>>> Acked-by: Vlad Yasevich <vyasevich@gmail.com>
>>
>> Looks good to me.
>>
>
> Just tryied applying this and looks like you based this on net-next.
> This fixes a rather ugly bug when checksum offloading is done.
> I am going to rebase this and re-submit for net.

:( Sorry for the inconvenience. Stable tree might hurts also.

-- 
浮沉随浪只记今朝笑

--fan

^ permalink raw reply

* [PATCH v4 net 0/3] sctp: Use software checksum under certain
From: Vlad Yasevich @ 2013-10-16  2:01 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, fan.du, Vlad Yasevich

There are some cards that support SCTP checksum offloading.  When using
these cards with IPSec or forcing IP fragmentation of SCTP traffic,
the checksum is computed incorrectly due to the fact that xfrm and IP/IPv6
fragmentation code do not know that this is SCTP traffic and do not
know that checksum has to be computed differently.

To fix this, we let SCTP detect these conditions and perform software
checksum calculation.

Fan Du (1):
  sctp: Use software crc32 checksum when xfrm transform will happen.

Vlad Yasevich (2):
  net: dst: provide accessor function to dst->xfrm
  sctp: Perform software checksum if packet has to be fragmented.

 include/net/dst.h | 12 ++++++++++++
 net/sctp/output.c |  3 ++-
 2 files changed, 14 insertions(+), 1 deletion(-)

-- 
1.8.3.1

^ permalink raw reply

* [PATCH v4 net 1/3] net: dst: provide accessor function to dst->xfrm
From: Vlad Yasevich @ 2013-10-16  2:01 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, fan.du, Vlad Yasevich
In-Reply-To: <1381888891-31186-1-git-send-email-vyasevich@gmail.com>

dst->xfrm is conditionally defined.  Provide accessor funtion that
is always available.

Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
---
 include/net/dst.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/net/dst.h b/include/net/dst.h
index 3bc4865..3c4c944 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -479,10 +479,22 @@ static inline struct dst_entry *xfrm_lookup(struct net *net,
 {
 	return dst_orig;
 } 
+
+static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
+{
+	return NULL;
+}
+
 #else
 extern struct dst_entry *xfrm_lookup(struct net *net, struct dst_entry *dst_orig,
 				     const struct flowi *fl, struct sock *sk,
 				     int flags);
+
+/* skb attached with this dst needs transformation if dst->xfrm is valid */
+static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
+{
+	return dst->xfrm;
+}
 #endif
 
 #endif /* _NET_DST_H */
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH v4 net 2/3] sctp: Use software crc32 checksum when xfrm transform will happen.
From: Vlad Yasevich @ 2013-10-16  2:01 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, fan.du, Neil Horman, Steffen Klassert, Vlad Yasevich
In-Reply-To: <1381888891-31186-1-git-send-email-vyasevich@gmail.com>

From: Fan Du <fan.du@windriver.com>

igb/ixgbe have hardware sctp checksum support, when this feature is enabled
and also IPsec is armed to protect sctp traffic, ugly things happened as
xfrm_output checks CHECKSUM_PARTIAL to do checksum operation(sum every thing
up and pack the 16bits result in the checksum field). The result is fail
establishment of sctp communication.

Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
---
 net/sctp/output.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sctp/output.c b/net/sctp/output.c
index 0ac3a65..d35b54c 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -536,7 +536,8 @@ int sctp_packet_transmit(struct sctp_packet *packet)
 	 * by CRC32-C as described in <draft-ietf-tsvwg-sctpcsum-02.txt>.
 	 */
 	if (!sctp_checksum_disable) {
-		if (!(dst->dev->features & NETIF_F_SCTP_CSUM)) {
+		if (!(dst->dev->features & NETIF_F_SCTP_CSUM) ||
+		    (dst_xfrm(dst) != NULL)) {
 			__u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len);
 
 			/* 3) Put the resultant value into the checksum field in the
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH v4 net 3/3] sctp: Perform software checksum if packet has to be fragmented.
From: Vlad Yasevich @ 2013-10-16  2:01 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, fan.du, Vlad Yasevich
In-Reply-To: <1381888891-31186-1-git-send-email-vyasevich@gmail.com>

IP/IPv6 fragmentation knows how to compute only TCP/UDP checksum.
This causes problems if SCTP packets has to be fragmented and
ipsummed has been set to PARTIAL due to checksum offload support.
This condition can happen when retransmitting after MTU discover,
or when INIT or other control chunks are larger then MTU.
Check for the rare fragmentation condition in SCTP and use software
checksum calculation in this case.

CC: Fan Du <fan.du@windriver.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
---
 net/sctp/output.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/output.c b/net/sctp/output.c
index d35b54c..3191373 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -537,7 +537,7 @@ int sctp_packet_transmit(struct sctp_packet *packet)
 	 */
 	if (!sctp_checksum_disable) {
 		if (!(dst->dev->features & NETIF_F_SCTP_CSUM) ||
-		    (dst_xfrm(dst) != NULL)) {
+		    (dst_xfrm(dst) != NULL) || packet->ipfragok) {
 			__u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len);
 
 			/* 3) Put the resultant value into the checksum field in the
-- 
1.8.3.1

^ permalink raw reply related

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Simon Horman @ 2013-10-16  2:13 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Julian Anastasov, Mark Brooks
In-Reply-To: <1381883949.2045.97.camel@edumazet-glaptop.roam.corp.google.com>

On Tue, Oct 15, 2013 at 05:39:09PM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-16 at 09:28 +0900, Simon Horman wrote:
> 
> > > I guess things like NFQUEUE could happen ?
> > 
> > Could you expand a little?
> 
> This was to point that between IPVS and ipv6 stack we might have a
> delay, and daddr was maybe pointed to a freed memory.
> 
> IP6CB only uses 24 bytes, so I think you would be safe adding 16 bytes.

That does seem very promising but while implementing it
I hit a problem.

struct tcp_skb_cb includes a field of type struct inet6_skb_parm.  And
expanding struct inet6_skb_parm by 16 bytes means that struct tcp_skb_cb is
now larger than 48 bytes and no longer fits in skb->cb.

Is it appropriate to grow skb->cb as the comment above struct tcp_skb_cb
suggests?

^ permalink raw reply

* Re: [PATCH V2] For for each TSN t being newly acked (Not only cumulatively, but also SELECTIVELY) cacc_saw_newack should be set to 1.
From: Vlad Yasevich @ 2013-10-16  2:13 UTC (permalink / raw)
  To: Chang Xiangzhong; +Cc: nhorman, davem, linux-sctp, netdev, linux-kernel
In-Reply-To: <1381860784-16481-1-git-send-email-changxiangzhong@gmail.com>

On 10/15/2013 02:13 PM, Chang Xiangzhong wrote:
> Signed-off-by: Xiangzhong Chang <changxiangzhong@gmail.com>

Your proposed solution is very nice, but it does 2 things in one
patch.
  1) It fixes the bug
  2) It refactors the code to improve the flow.

While (2) is very nice, it needs a much more careful review.
Can you please split this into 2 patches?  First patch can
make the code look like this:

	if (sctp_acked(sack, tsn)) {
		...
		if (!tchunk->tsn_gap_acked) {
			tchunk->tsn_gap_acked = 1;
			*highest_new_tsn_in_sack = tsn;
			bytes_acked += sctp_data_size(tchunk);
                         if (!tchunk->transport)
				migrate_bytes += sctp_data_size(tchunk);
			forward_progress = true;

			/*
			 * SFR-CACC algorithm:
			 * 2) If the SACK contains gap acks
			 * and the flag CHANGEOVER_ACTIVE is
			 * set the receiver of the SACK MUST
			 * take the following action:
			...
		}
	}

Then you can file a second patch to improve the flow/refactor the 
function.  You have to be very careful here though and be sure to
run through all the regression tests since you would be modifying
a very critcal part of the code.

Thanks
-vlad

> ---
>   net/sctp/outqueue.c |   76 ++++++++++++++++++++++++---------------------------
>   1 file changed, 35 insertions(+), 41 deletions(-)
>
> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
> index 94df758..84ef3b8 100644
> --- a/net/sctp/outqueue.c
> +++ b/net/sctp/outqueue.c
> @@ -1357,13 +1357,13 @@ static void sctp_check_transmitted(struct sctp_outq *q,
>
>   		tsn = ntohl(tchunk->subh.data_hdr->tsn);
>   		if (sctp_acked(sack, tsn)) {
> -			/* If this queue is the retransmit queue, the
> -			 * retransmit timer has already reclaimed
> -			 * the outstanding bytes for this chunk, so only
> -			 * count bytes associated with a transport.
> -			 */
> -			if (transport) {
> -				/* If this chunk is being used for RTT
> +			if (!tchunk->tsn_gap_acked) {
> +				/* If this queue is the retransmit queue, the
> +				 * retransmit timer has already reclaimed
> +				 * the outstanding bytes for this chunk, so only
> +				 * count bytes associated with a transport.
> +				 *
> +				 * If this chunk is being used for RTT
>   				 * measurement, calculate the RTT and update
>   				 * the RTO using this value.
>   				 *
> @@ -1374,28 +1374,44 @@ static void sctp_check_transmitted(struct sctp_outq *q,
>   				 * first instance of the packet or a later
>   				 * instance).
>   				 */
> -				if (!tchunk->tsn_gap_acked &&
> -				    tchunk->rtt_in_progress) {
> +				if (transport && tchunk->rtt_in_progress) {
>   					tchunk->rtt_in_progress = 0;
>   					rtt = jiffies - tchunk->sent_at;
>   					sctp_transport_update_rto(transport,
> -								  rtt);
> +						rtt);
>   				}
> -			}
>
> -			/* If the chunk hasn't been marked as ACKED,
> -			 * mark it and account bytes_acked if the
> -			 * chunk had a valid transport (it will not
> -			 * have a transport if ASCONF had deleted it
> -			 * while DATA was outstanding).
> -			 */
> -			if (!tchunk->tsn_gap_acked) {
> +				/* If the chunk hasn't been marked as ACKED,
> +				 * mark it and account bytes_acked if the
> +				 * chunk had a valid transport (it will not
> +				 * have a transport if ASCONF had deleted it
> +				 * while DATA was outstanding).
> +				 */
>   				tchunk->tsn_gap_acked = 1;
>   				*highest_new_tsn_in_sack = tsn;
>   				bytes_acked += sctp_data_size(tchunk);
>   				if (!tchunk->transport)
>   					migrate_bytes += sctp_data_size(tchunk);
>   				forward_progress = true;
> +
> +				/* SFR-CACC algorithm:
> +				 * 2) If the SACK contains gap acks
> +				 * and the flag CHANGEOVER_ACTIVE is
> +				 * set the receiver of the SACK MUST
> +				 * take the following action:
> +				 *
> +				 * B) For each TSN t being acked that
> +				 * has not been acked in any SACK so
> +				 * far, set cacc_saw_newack to 1 for
> +				 * the destination that the TSN was
> +				 * sent to.
> +				 */
> +				if (transport &&
> +					sack->num_gap_ack_blocks &&
> +					q->asoc->peer.primary_path->cacc.
> +					changeover_active
> +				   )
> +					transport->cacc.cacc_saw_newack = 1;
>   			}
>
>   			if (TSN_lte(tsn, sack_ctsn)) {
> @@ -1411,30 +1427,8 @@ static void sctp_check_transmitted(struct sctp_outq *q,
>   				restart_timer = 1;
>   				forward_progress = true;
>
> -				if (!tchunk->tsn_gap_acked) {
> -					/*
> -					 * SFR-CACC algorithm:
> -					 * 2) If the SACK contains gap acks
> -					 * and the flag CHANGEOVER_ACTIVE is
> -					 * set the receiver of the SACK MUST
> -					 * take the following action:
> -					 *
> -					 * B) For each TSN t being acked that
> -					 * has not been acked in any SACK so
> -					 * far, set cacc_saw_newack to 1 for
> -					 * the destination that the TSN was
> -					 * sent to.
> -					 */
> -					if (transport &&
> -					    sack->num_gap_ack_blocks &&
> -					    q->asoc->peer.primary_path->cacc.
> -					    changeover_active)
> -						transport->cacc.cacc_saw_newack
> -							= 1;
> -				}
> -
>   				list_add_tail(&tchunk->transmitted_list,
> -					      &q->sacked);
> +					&q->sacked);
>   			} else {
>   				/* RFC2960 7.2.4, sctpimpguide-05 2.8.2
>   				 * M2) Each time a SACK arrives reporting
>

^ permalink raw reply

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Simon Horman @ 2013-10-16  2:14 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Julian Anastasov, Mark Brooks
In-Reply-To: <20131016005322.GA18135@order.stressinduktion.org>

On Wed, Oct 16, 2013 at 02:53:22AM +0200, Hannes Frederic Sowa wrote:
> On Wed, Oct 16, 2013 at 09:02:31AM +0900, Simon Horman wrote:
> > In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
> > ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
> > such that it creates and uses a neigh entry if none is found.
> > Subsequently the 'n' field was removed from struct rt6_info.
> > 
> > Unfortunately my analysis is that in the case of IPVS direct routing this
> > change leads to incorrect behaviour as in this case packets may be output
> > to a destination other than where they would be output according to the
> > route table. In particular, the destination address may actually be a local
> > address and empirically a neighbour lookup seems to result in it becoming
> > unreachable.
> > 
> > This patch resolves the problem by providing the destination address
> > determined by IPVS to ip6_finish_output2() in the skb callback.  Although
> > this seems to work I can see several problems with this approach:
> > 
> > * It is rather ugly, stuffing an IPVS exception right in
> >   the middle of IPv6 code. The overhead could be eliminated for many users
> >   by using a staic key. But none the less it is not attractive.
> > 
> > * The use of the skb callback is may not be valid
> >   as it crosses from IPVS to IPv6 code. A possible, though unpleasant,
> >   alternative is to add a new field to struct sk_buff.
> > 
> > * This covers all IPv6 packets output by IPVS but actually
> >   only those output using IPVS Direct-Routing need this.  One way to
> >   resolve this would be to add a more fine-grained ipvs_property to
> >   struct sk_buff.
> 
> Hmm, that reminds me on the following bug report which would be nice we could
> solve in one go, too: http://www.spinics.net/lists/netdev/msg250785.html

I think it should be possible to solve that using the
IP6CB() approach that Eric suggested. Hopefully we can make
that approach fly.

^ permalink raw reply

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Eric Dumazet @ 2013-10-16  2:40 UTC (permalink / raw)
  To: Simon Horman
  Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Julian Anastasov, Mark Brooks
In-Reply-To: <20131016021304.GA17801@verge.net.au>

On Wed, 2013-10-16 at 11:13 +0900, Simon Horman wrote:

> That does seem very promising but while implementing it
> I hit a problem.
> 
> struct tcp_skb_cb includes a field of type struct inet6_skb_parm.  And
> expanding struct inet6_skb_parm by 16 bytes means that struct tcp_skb_cb is
> now larger than 48 bytes and no longer fits in skb->cb.
> 
> Is it appropriate to grow skb->cb as the comment above struct tcp_skb_cb
> suggests?

Ah well...

No, its not appropriate to grow sk_buff by 16 bytes, sorry.

You could not change struct inet6_skb_parm, but define IP6CB as
a compound 

struct ip6cb {
   struct inet6_skb_parm foo;
   struct in6_addr       bar;
};




^ permalink raw reply

* Re: [PATCH 02/18] net: use wrapper functions of net_ratelimit() to simplify code
From: Kefeng Wang @ 2013-10-16  3:24 UTC (permalink / raw)
  To: Joe Perches
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Greg Kroah-Hartman,
	David S. Miller, Pablo Neira Ayuso, Stephen Hemminger,
	Johannes Berg, John W. Linville, Stanislaw Gruszka, Johannes Berg,
	Francois Romieu, Ben Hutchings, Chas Williams, Marc Kleine-Budde,
	Samuel Ortiz, Paul Mackerras, Oliver Neukum,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	Rusty Russell, Michael S. Tsirkin, netfilter
In-Reply-To: <1381854263.22110.19.camel@joe-AO722>

Thanks for your reply.

On 10/16 0:24, Joe Perches wrote:
> On Tue, 2013-10-15 at 19:44 +0800, Kefeng Wang wrote:
>> Wrapper functions net_ratelimited_function() and net_XXX_ratelimited()
>> are called to simplify code.
> []
>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> []
>> @@ -465,10 +465,8 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
>>  	if (likely(fdb)) {
>>  		/* attempt to update an entry for a local interface */
>>  		if (unlikely(fdb->is_local)) {
>> -			if (net_ratelimit())
>> -				br_warn(br, "received packet on %s with "
>> -					"own address as source address\n",
>> -					source->dev->name);
>> +			net_ratelimited_function(br_warn, br, "received packet on %s "
>> +				"with own address as source address\n", source->dev->name);
> 
> Hello Kefeng.
> 
> When these types of lines are changed, please coalesce the
> fragmented format pieces into a single string.
> 
> It makes grep a bit easier and 80 columns limits don't
> apply to formats.

Got it, I will coalesce them, but 80 columns limits will be
broken.

> I think using net_ratelimited_function is not particularly
> clarifying here.
> 
> Maybe net_ratelimited_function should be removed instead
> of its use sites expanded.
> 
> Perhaps adding macros like #define br_warn_ratelimited()
> would be better.

yes, I found dev_emerg_ratelimited already exists. I should
use them and will add some similar mcaros.

> This comment applies to the whole series.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> .
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 14/18] net: usb: use wrapper functions of net_ratelimit() to simplify code
From: Kefeng Wang @ 2013-10-16  3:27 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: linux-kernel, Greg Kroah-Hartman, David S. Miller,
	Pablo Neira Ayuso, Stephen Hemminger, Johannes Berg,
	John W. Linville, Stanislaw Gruszka, Johannes Berg,
	Francois Romieu, Ben Hutchings, Chas Williams, Marc Kleine-Budde,
	Samuel Ortiz, Paul Mackerras, Oliver Neukum,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	Rusty Russell, Michael S. Tsirkin, netfilter
In-Reply-To: <525D921C.2030709@cogentembedded.com>

Thanks for you reply.
On 10/16 3:06, Sergei Shtylyov wrote:
> Hello.
> 
> On 10/15/2013 03:45 PM, Kefeng Wang wrote:
> 
>> net_ratelimited_function() is called to simplify code.
> 
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>>   drivers/net/usb/usbnet.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
>> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
>> index bf94e10..edf81de 100644
>> --- a/drivers/net/usb/usbnet.c
>> +++ b/drivers/net/usb/usbnet.c
>> @@ -450,8 +450,8 @@ void usbnet_defer_kevent (struct usbnet *dev, int work)
>>   {
>>       set_bit (work, &dev->flags);
>>       if (!schedule_work (&dev->kevent)) {
>> -        if (net_ratelimit())
>> -            netdev_err(dev->net, "kevent %d may have been dropped\n", work);
>> +        net_ratelimited_function(netdev_err, dev->net,
>> +            "kevent %d may have been dropped\n", work);
> 
>    The continuation line should start under 'netdev_err'. Same about the other patches where you didn't change the indentation of the continuation lines though you should have.

Got it, indentation will be changed.

> WBR, Sergei
> 
> 
> .
> 

^ permalink raw reply

* Re: [PATCH v2.43 0/5] MPLS actions and matches
From: Ben Pfaff @ 2013-10-16  3:55 UTC (permalink / raw)
  To: Simon Horman
  Cc: dev, netdev, Jesse Gross, Pravin B Shelar, Ravi K, Isaku Yamahata,
	Joe Stringer
In-Reply-To: <20131010003139.GA20311@verge.net.au>

On Thu, Oct 10, 2013 at 09:31:41AM +0900, Simon Horman wrote:
> I believe this series addresses the feedback that each of you
> gave with regards to recent previous postings of this patch-set.
> I'm wondering if you could find some time to review it.

I'm waiting for an ack from Jesse, then I'm going to do a final pass and
I hope to commit this series at this point.  I see some ways that we can
improve MPLS support afterward but I don't see any blockers.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox