Netdev List
 help / color / mirror / Atom feed
* Re: [PATCHv2 net] xen-netback: count number required slots for an skb more carefully
From: David Miller @ 2013-09-13  3:23 UTC (permalink / raw)
  To: david.vrabel
  Cc: xen-devel, konrad.wilk, boris.ostrovsky, netdev, ian.campbell
In-Reply-To: <1378907568-8062-1-git-send-email-david.vrabel@citrix.com>

From: David Vrabel <david.vrabel@citrix.com>
Date: Wed, 11 Sep 2013 14:52:48 +0100

> From: David Vrabel <david.vrabel@citrix.com>
> 
> When a VM is providing an iSCSI target and the LUN is used by the
> backend domain, the generated skbs for direct I/O writes to the disk
> have large, multi-page skb->data but no frags.
> 
> With some lengths and starting offsets, xen_netbk_count_skb_slots()
> would be one short because the simple calculation of
> DIV_ROUND_UP(skb_headlen(), PAGE_SIZE) was not accounting for the
> decisions made by start_new_rx_buffer() which does not guarantee
> responses are fully packed.
 ...
> Fix this by counting the number of required slots more carefully.  In
> xen_netbk_count_skb_slots(), more closely follow the algorithm used by
> xen_netbk_gop_skb() by introducing xen_netbk_count_frag_slots() which
> is the dry-run equivalent of netbk_gop_frag_copy().
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
> --
> v2: xen_netbk_* -> xenvif_* to match new style

Thanks for resolving this into a final patch after so much back and
forth :-)

I assume you want this queued up for -stable, and can you check if
there is any non-trivial backporting for earlier kernels?

^ permalink raw reply

* [PATCH net-next] bridge: fix NULL pointer deref in br_handle_frame
From: Hong Zhiguo @ 2013-09-13  3:09 UTC (permalink / raw)
  To: netdev; +Cc: davem, zhiguohong, eric.dumazet, vyasevic
In-Reply-To: <1378988195-2710-1-git-send-email-zhiguohong@tencent.com>

From: Hong Zhiguo <zhiguohong@tencent.com>

I got an Oops on my box when br_handle_frame is called between these
2 lines of del_nbp:
	dev->priv_flags &= ~IFF_BRIDGE_PORT;
	/* --> br_handle_frame is called at this time */
	netdev_rx_handler_unregister(dev);

In br_handle_frame the return of br_port_get_rcu(dev) is dereferenced
without check but br_port_get_rcu(dev) returns NULL if:
	!(dev->priv_flags & IFF_BRIDGE_PORT)

In my first fix I moved netdev_rx_handler_unregister up. Eric Dumazet
pointed out the testing of IFF_BRIDGE_PORT is not necessary here since
we're in rcu_read_lock and we have synchronize_net() in
netdev_rx_handler_unregister. This fix removed the testing of
IFF_BRIDGE_PORT.

I tested the fix on my box with script doing "brctl addif" and "brctl
delif" repeatedly while a lot of broadcast frame present on the LAN.
I added msleep in del_nbp between setting of priv_flags and unregister
so it's easy to reproduce the oops without the fix.

I'll send another patch to net-next to take care of br_netfilter and
ebtable if necessary(seems there's NULL check following but I'll
have a look).

The Oops(some lines omitted):
BUG: unable to handle kernel NULL pointer dereference at 0000000000000021
IP: [<ffffffff8150901d>] br_handle_frame+0xed/0x230
Oops: 0000 [#1] PREEMPT SMP
RIP: 0010:[<ffffffff8150901d>]  [<ffffffff8150901d>] br_handle_frame+0xed/0x230
RSP: 0018:ffff880030403c10  EFLAGS: 00010286
Stack:
 ffff88002c945700 ffffffff81508f30 0000000000000000 ffff88002d41e000
 ffff880030403c98 ffffffff81477acb ffffffff81477821 ffff880030403c68
 ffffffff81090e10 00ff88002d545c80 ffff88002c945700 ffffffff81aa50c0
Call Trace:
 <IRQ>
 [<ffffffff81508f30>] ? br_handle_frame_finish+0x300/0x300
 [<ffffffff81477acb>] __netif_receive_skb_core+0x39b/0x880

Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com>
---
 net/bridge/br_input.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index a2fd37e..2244049 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -60,7 +60,7 @@ static int br_pass_frame_up(struct sk_buff *skb)
 int br_handle_frame_finish(struct sk_buff *skb)
 {
 	const unsigned char *dest = eth_hdr(skb)->h_dest;
-	struct net_bridge_port *p = br_port_get_rcu(skb->dev);
+	struct net_bridge_port *p = rcu_dereference(skb->dev->rx_handler_data);
 	struct net_bridge *br;
 	struct net_bridge_fdb_entry *dst;
 	struct net_bridge_mdb_entry *mdst;
@@ -143,7 +143,7 @@ drop:
 /* note: already called with rcu_read_lock */
 static int br_handle_local_finish(struct sk_buff *skb)
 {
-	struct net_bridge_port *p = br_port_get_rcu(skb->dev);
+	struct net_bridge_port *p = rcu_dereference(skb->dev->rx_handler_data);
 	u16 vid = 0;
 
 	br_vlan_get_tag(skb, &vid);
@@ -173,7 +173,7 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
 	if (!skb)
 		return RX_HANDLER_CONSUMED;
 
-	p = br_port_get_rcu(skb->dev);
+	p = rcu_dereference(skb->dev->rx_handler_data);
 
 	if (unlikely(is_link_local_ether_addr(dest))) {
 		/*
-- 
1.8.1.2

^ permalink raw reply related

* Re: [PATCH net] tg3: Expand led off fix to include 5720
From: David Miller @ 2013-09-13  3:11 UTC (permalink / raw)
  To: nsujir; +Cc: netdev, stable, mchan
In-Reply-To: <1379019691-4618-1-git-send-email-nsujir@broadcom.com>

From: "Nithin Nayak Sujir" <nsujir@broadcom.com>
Date: Thu, 12 Sep 2013 14:01:31 -0700

> Commit 989038e217e94161862a959e82f9a1ecf8dda152 ("tg3: Don't turn off
> led on 5719 serdes port 0") added code to skip turning led off on port
> 0 of the 5719 since it powered down other ports. This workaround needs
> to be enabled on the 5720 as well.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied and queued up for -stable.

^ permalink raw reply

* [PATCH v2 6/6] ipv6: Do route updating for redirect in ndisc layer
From: Duan Jiong @ 2013-09-13  3:03 UTC (permalink / raw)
  To: davem; +Cc: netdev, hannes
In-Reply-To: <52327F00.4040802@cn.fujitsu.com>

From: Duan Jiong <duanj.fnst@cn.fujitsu.com>

Do the whole verification and route updating in ndisc
lay and then just call into icmpv6_notify() to notify
the upper protocols.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
---
 include/net/ip6_route.h |  3 ---
 net/ipv6/ndisc.c        |  6 ++----
 net/ipv6/route.c        | 29 ++---------------------------
 3 files changed, 4 insertions(+), 34 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index f525e70..5db259e 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -133,9 +133,6 @@ extern void ip6_update_pmtu(struct sk_buff *skb, struct net *net, __be32 mtu,
 extern void ip6_sk_update_pmtu(struct sk_buff *skb, struct sock *sk,
 			       __be32 mtu);
 extern void ip6_redirect(struct sk_buff *skb, struct net *net, int oif, u32 mark);
-extern void ip6_redirect_no_header(struct sk_buff *skb, struct net *net, int oif,
-				   u32 mark);
-extern void ip6_sk_redirect(struct sk_buff *skb, struct sock *sk);
 
 struct netlink_callback;
 
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index f8a55ff..6bd1b41 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1368,11 +1368,9 @@ static void ndisc_redirect_rcv(struct sk_buff *skb)
 	if (!ndisc_parse_options(msg->opt, ndoptlen, &ndopts))
 		return;
 
-	if (!ndopts.nd_opts_rh) {
-		ip6_redirect_no_header(skb, dev_net(skb->dev),
-					skb->dev->ifindex, 0);
+	ip6_redirect(skb, dev_net(skb->dev), skb->dev->ifindex, 0);
+	if (!ndopts.nd_opts_rh)
 		return;
-	}
 
 	hdr = (u8 *)ndopts.nd_opts_rh;
 	hdr += 8;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c979dd9..151bd6c 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1227,27 +1227,7 @@ static struct dst_entry *ip6_route_redirect(struct net *net,
 				flags, __ip6_route_redirect);
 }
 
-void ip6_redirect(struct sk_buff *skb, struct net *net, int oif, u32 mark)
-{
-	const struct ipv6hdr *iph = (struct ipv6hdr *) skb->data;
-	struct dst_entry *dst;
-	struct flowi6 fl6;
-
-	memset(&fl6, 0, sizeof(fl6));
-	fl6.flowi6_oif = oif;
-	fl6.flowi6_mark = mark;
-	fl6.flowi6_flags = 0;
-	fl6.daddr = iph->daddr;
-	fl6.saddr = iph->saddr;
-	fl6.flowlabel = ip6_flowinfo(iph);
-
-	dst = ip6_route_redirect(net, &fl6, &ipv6_hdr(skb)->saddr);
-	rt6_do_redirect(dst, NULL, skb);
-	dst_release(dst);
-}
-EXPORT_SYMBOL_GPL(ip6_redirect);
-
-void ip6_redirect_no_header(struct sk_buff *skb, struct net *net, int oif,
+void ip6_redirect(struct sk_buff *skb, struct net *net, int oif,
 			    u32 mark)
 {
 	const struct ipv6hdr *iph = ipv6_hdr(skb);
@@ -1266,12 +1246,7 @@ void ip6_redirect_no_header(struct sk_buff *skb, struct net *net, int oif,
 	rt6_do_redirect(dst, NULL, skb);
 	dst_release(dst);
 }
-
-void ip6_sk_redirect(struct sk_buff *skb, struct sock *sk)
-{
-	ip6_redirect(skb, sock_net(sk), sk->sk_bound_dev_if, sk->sk_mark);
-}
-EXPORT_SYMBOL_GPL(ip6_sk_redirect);
+EXPORT_SYMBOL_GPL(ip6_redirect);
 
 static unsigned int ip6_default_advmss(const struct dst_entry *dst)
 {
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH v2 5/6] ipv6: modify the err to 0 when dealing with NDISC_REDIRECT
From: Duan Jiong @ 2013-09-13  3:02 UTC (permalink / raw)
  To: davem; +Cc: netdev, hannes
In-Reply-To: <52327F00.4040802@cn.fujitsu.com>

From: Duan Jiong <duanj.fnst@cn.fujitsu.com>

when dealing with redirect message, the err shoud
be assigned to 0, not EPROTO. And del the statements
for updating route.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
---
 net/ipv6/icmp.c | 3 +++
 net/ipv6/raw.c  | 3 +--
 net/ipv6/udp.c  | 2 --
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 4bde43c..6bcedcc 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -975,6 +975,9 @@ int icmpv6_err_convert(u8 type, u8 code, int *err)
 	case ICMPV6_TIME_EXCEED:
 		*err = EHOSTUNREACH;
 		break;
+	case NDISC_REDIRECT:
+		*err = 0;
+		break;
 	}
 
 	return fatal;
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 58916bb..baf86b8 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -335,8 +335,7 @@ static void rawv6_err(struct sock *sk, struct sk_buff *skb,
 		ip6_sk_update_pmtu(skb, sk, info);
 		harderr = (np->pmtudisc == IPV6_PMTUDISC_DO);
 	}
-	if (type == NDISC_REDIRECT)
-		ip6_sk_redirect(skb, sk);
+
 	if (np->recverr) {
 		u8 *payload = skb->data;
 		if (!inet->hdrincl)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index f405815..a40b392 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -525,8 +525,6 @@ void __udp6_lib_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 
 	if (type == ICMPV6_PKT_TOOBIG)
 		ip6_sk_update_pmtu(skb, sk, info);
-	if (type == NDISC_REDIRECT)
-		ip6_sk_redirect(skb, sk);
 
 	np = inet6_sk(sk);
 
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH v2 4/6] ip6tnl: move route updating for redirect to ndisc layer
From: Duan Jiong @ 2013-09-13  3:01 UTC (permalink / raw)
  To: davem; +Cc: netdev, hannes
In-Reply-To: <52327F00.4040802@cn.fujitsu.com>

From: Duan Jiong <duanj.fnst@cn.fujitsu.com>

In rfc2473, we can know that the tunnel ICMP redirect
message should not be reported to the source of the
original packet, so after calling ip6_tnl_err(), the
rel_msg is set to 0 in function ip4ip6_err(), and the
redirect will never be handled.

In order to deal with this, we move route updating for
redirect to ndisc layer.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
---
 net/ipv6/ip6_tunnel.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 61355f7..3ea834b 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -576,9 +576,6 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 		rel_type = ICMP_DEST_UNREACH;
 		rel_code = ICMP_FRAG_NEEDED;
 		break;
-	case NDISC_REDIRECT:
-		rel_type = ICMP_REDIRECT;
-		rel_code = ICMP_REDIR_HOST;
 	default:
 		return 0;
 	}
@@ -637,8 +634,6 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 
 		skb_dst(skb2)->ops->update_pmtu(skb_dst(skb2), NULL, skb2, rel_info);
 	}
-	if (rel_type == ICMP_REDIRECT)
-		skb_dst(skb2)->ops->redirect(skb_dst(skb2), NULL, skb2);
 
 	icmp_send(skb2, rel_type, rel_code, htonl(rel_info));
 
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH v2 3/6] ipv6: del statements for dealing with NDISC_REDIRECT
From: Duan Jiong @ 2013-09-13  3:00 UTC (permalink / raw)
  To: davem; +Cc: netdev, hannes
In-Reply-To: <52327F00.4040802@cn.fujitsu.com>

From: Duan Jiong <duanj.fnst@cn.fujitsu.com>

Now the route updating for redirect is done in ndisc
layer.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
---
 net/ipv6/icmp.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index eef8d94..4bde43c 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -91,8 +91,6 @@ static void icmpv6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 
 	if (type == ICMPV6_PKT_TOOBIG)
 		ip6_update_pmtu(skb, net, info, 0, 0);
-	else if (type == NDISC_REDIRECT)
-		ip6_redirect(skb, net, skb->dev->ifindex, 0);
 
 	if (!(type & ICMPV6_INFOMSG_MASK))
 		if (icmp6->icmp6_type == ICMPV6_ECHO_REQUEST)
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH v2] bonding: Make alb learning packet interval configurable
From: Neil Horman @ 2013-09-13  3:01 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: David Miller, netdev, vfalico, nhorman, fubar
In-Reply-To: <20130912210613.GA15224@gospo.rdu.redhat.com>

On Thu, Sep 12, 2013 at 05:06:13PM -0400, Andy Gospodarek wrote:
> On Thu, Sep 12, 2013 at 04:49:48PM -0400, David Miller wrote:
> > From: Neil Horman <nhorman@tuxdriver.com>
> > Date: Tue, 10 Sep 2013 16:39:03 -0400
> > 
> > > From: Neil Horman <nhorman@redhat.com>
> > > 
> > > running bonding in ALB mode requires that learning packets be sent periodically,
> > > so that the switch knows where to send responding traffic.  However, depending
> > > on switch configuration, there may not be any need to send traffic at the
> > > default rate of 3 packets per second, which represents little more than wasted
> > > data.  Allow the ALB learning packet interval to be made configurable via sysfs
> > > 
> > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > 
> > I hate to be a stickler, but I'd like you to make the default value
> > documented both in the code and in the documentation.
> > 
> > Use some macro for the code "#define BOND_ALB_DEFAULT_LP_INTERVAL 1" and
> > mention the default in the bonding.txt changes.
> > 
> 
> Agree with DaveM on this.  You can just keep the one that was there and
> it should be pretty easy.
> 
Yeah, sure I can do that. v3 in the AM
Neil

^ permalink raw reply

* [PATCH v2 2/6] ipv6: just match on ICMPV6_PKT_TOOBIG in those err_handle
From: Duan Jiong @ 2013-09-13  2:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, hannes
In-Reply-To: <52327F00.4040802@cn.fujitsu.com>

From: Duan Jiong <duanj.fnst@cn.fujitsu.com>

we don't handle NDISC_REDIRECTs here any more, and the match
on ICMPV6_DEST_UNREACH is meaningless.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
---
 net/ipv6/ah6.c     | 9 ++-------
 net/ipv6/esp6.c    | 9 ++-------
 net/ipv6/ipcomp6.c | 9 ++-------
 3 files changed, 6 insertions(+), 21 deletions(-)

diff --git a/net/ipv6/ah6.c b/net/ipv6/ah6.c
index 73784c3..79fb40f 100644
--- a/net/ipv6/ah6.c
+++ b/net/ipv6/ah6.c
@@ -618,19 +618,14 @@ static void ah6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	struct ip_auth_hdr *ah = (struct ip_auth_hdr*)(skb->data+offset);
 	struct xfrm_state *x;
 
-	if (type != ICMPV6_DEST_UNREACH &&
-	    type != ICMPV6_PKT_TOOBIG &&
-	    type != NDISC_REDIRECT)
+	if (type != ICMPV6_PKT_TOOBIG)
 		return;
 
 	x = xfrm_state_lookup(net, skb->mark, (xfrm_address_t *)&iph->daddr, ah->spi, IPPROTO_AH, AF_INET6);
 	if (!x)
 		return;
 
-	if (type == NDISC_REDIRECT)
-		ip6_redirect(skb, net, skb->dev->ifindex, 0);
-	else
-		ip6_update_pmtu(skb, net, info, 0, 0);
+	ip6_update_pmtu(skb, net, info, 0, 0);
 	xfrm_state_put(x);
 }
 
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index d3618a7..6aa64e1 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -436,9 +436,7 @@ static void esp6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	struct ip_esp_hdr *esph = (struct ip_esp_hdr *)(skb->data + offset);
 	struct xfrm_state *x;
 
-	if (type != ICMPV6_DEST_UNREACH &&
-	    type != ICMPV6_PKT_TOOBIG &&
-	    type != NDISC_REDIRECT)
+	if (type != ICMPV6_PKT_TOOBIG)
 		return;
 
 	x = xfrm_state_lookup(net, skb->mark, (const xfrm_address_t *)&iph->daddr,
@@ -446,10 +444,7 @@ static void esp6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	if (!x)
 		return;
 
-	if (type == NDISC_REDIRECT)
-		ip6_redirect(skb, net, skb->dev->ifindex, 0);
-	else
-		ip6_update_pmtu(skb, net, info, 0, 0);
+	ip6_update_pmtu(skb, net, info, 0, 0);
 	xfrm_state_put(x);
 }
 
diff --git a/net/ipv6/ipcomp6.c b/net/ipv6/ipcomp6.c
index 5636a91..e943158 100644
--- a/net/ipv6/ipcomp6.c
+++ b/net/ipv6/ipcomp6.c
@@ -64,9 +64,7 @@ static void ipcomp6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 		(struct ip_comp_hdr *)(skb->data + offset);
 	struct xfrm_state *x;
 
-	if (type != ICMPV6_DEST_UNREACH &&
-	    type != ICMPV6_PKT_TOOBIG &&
-	    type != NDISC_REDIRECT)
+	if (type != ICMPV6_PKT_TOOBIG)
 		return;
 
 	spi = htonl(ntohs(ipcomph->cpi));
@@ -75,10 +73,7 @@ static void ipcomp6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	if (!x)
 		return;
 
-	if (type == NDISC_REDIRECT)
-		ip6_redirect(skb, net, skb->dev->ifindex, 0);
-	else
-		ip6_update_pmtu(skb, net, info, 0, 0);
+	ip6_update_pmtu(skb, net, info, 0, 0);
 	xfrm_state_put(x);
 }
 
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH v2 1/6] ipv6: del the statements for updating route in  (dccp|tcp|sctp)_v6_err
From: Duan Jiong @ 2013-09-13  2:58 UTC (permalink / raw)
  To: davem; +Cc: netdev, hannes
In-Reply-To: <52327F00.4040802@cn.fujitsu.com>

From: Duan Jiong <duanj.fnst@cn.fujitsu.com>

Because we will do route updating for redirect in nidsc layer. And
when dealing with redirect message, the dccp and sctp should like
tcp return directly.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
---
 net/dccp/ipv6.c     | 10 +++-------
 net/ipv6/tcp_ipv6.c | 12 ++++--------
 net/sctp/input.c    | 12 ------------
 net/sctp/ipv6.c     |  6 +++---
 4 files changed, 10 insertions(+), 30 deletions(-)

diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 9c61f9c..300840c 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -98,6 +98,9 @@ static void dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 		return;
 	}
 
+	if (type == NDISC_REDIRECT)
+		return;
+
 	sk = inet6_lookup(net, &dccp_hashinfo,
 			&hdr->daddr, dh->dccph_dport,
 			&hdr->saddr, dh->dccph_sport, inet6_iif(skb));
@@ -130,13 +133,6 @@ static void dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 
 	np = inet6_sk(sk);
 
-	if (type == NDISC_REDIRECT) {
-		struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie);
-
-		if (dst)
-			dst->ops->redirect(dst, sk, skb);
-	}
-
 	if (type == ICMPV6_PKT_TOOBIG) {
 		struct dst_entry *dst = NULL;
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 5c71501..d3ca8a4 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -346,6 +346,10 @@ static void tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	__u32 seq;
 	struct net *net = dev_net(skb->dev);
 
+
+	if (type == NDISC_REDIRECT)
+		return;
+
 	sk = inet6_lookup(net, &tcp_hashinfo, &hdr->daddr,
 			th->dest, &hdr->saddr, th->source, skb->dev->ifindex);
 
@@ -382,14 +386,6 @@ static void tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 
 	np = inet6_sk(sk);
 
-	if (type == NDISC_REDIRECT) {
-		struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie);
-
-		if (dst)
-			dst->ops->redirect(dst, sk, skb);
-		goto out;
-	}
-
 	if (type == ICMPV6_PKT_TOOBIG) {
 		/* We are not interested in TCP_LISTEN and open_requests
 		 * (SYN-ACKs send out by Linux are always <576bytes so
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 5f20686..0d2d4b7 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -413,18 +413,6 @@ void sctp_icmp_frag_needed(struct sock *sk, struct sctp_association *asoc,
 	sctp_retransmit(&asoc->outqueue, t, SCTP_RTXR_PMTUD);
 }
 
-void sctp_icmp_redirect(struct sock *sk, struct sctp_transport *t,
-			struct sk_buff *skb)
-{
-	struct dst_entry *dst;
-
-	if (!t)
-		return;
-	dst = sctp_transport_dst_check(t);
-	if (dst)
-		dst->ops->redirect(dst, sk, skb);
-}
-
 /*
  * SCTP Implementer's Guide, 2.37 ICMP handling procedures
  *
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index da613ce..ee12d87 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -151,6 +151,9 @@ static void sctp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	int err;
 	struct net *net = dev_net(skb->dev);
 
+	if (type == NDISC_REDIRECT)
+		return;
+
 	idev = in6_dev_get(skb->dev);
 
 	/* Fix up skb to look at the embedded net header. */
@@ -181,9 +184,6 @@ static void sctp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 			goto out_unlock;
 		}
 		break;
-	case NDISC_REDIRECT:
-		sctp_icmp_redirect(sk, transport, skb);
-		break;
 	default:
 		break;
 	}
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH  v2 0/6] ipv6: Do route updating for redirect in ndisc layer
From: Duan Jiong @ 2013-09-13  2:57 UTC (permalink / raw)
  To: davem; +Cc: netdev, hannes

From: Duan Jiong <duanj.fnst@cn.fujitsu.com>

the ip6_redirect() could be replaced with
ip6_redirect_no_header(), we could always use ip6_redirect()
for route updating in ndisc layer and use the data of the
redirected header option just for finding the socket to be
notified and then notify user in protocols' err_handler.

---
 Changes for v2:
 1.handle the update of the NDISC_REDIRECT error code directly in
   icmpv6_err_convert.
 2.squash some patchs into one patch.
 3.modify the subject of those patchs.

Duan Jiong (6):
  ipv6: del the statements for updating route in (dccp|tcp|sctp)_v6_err
  ipv6: just match on ICMPV6_PKT_TOOBIG in those err_handle
  ipv6: del statements for dealing with NDISC_REDIRECT
  ip6tnl: move route updating for redirect to ndisc layer
  ipv6: modify the err to 0 when dealing with NDISC_REDIRECT
  ipv6: Do route updating for redirect in ndisc layer

 include/net/ip6_route.h |  3 ---
 net/dccp/ipv6.c         | 10 +++-------
 net/ipv6/ah6.c          |  9 ++-------
 net/ipv6/esp6.c         |  9 ++-------
 net/ipv6/icmp.c         |  5 +++--
 net/ipv6/ip6_tunnel.c   |  5 -----
 net/ipv6/ipcomp6.c      |  9 ++-------
 net/ipv6/ndisc.c        |  6 ++----
 net/ipv6/raw.c          |  3 +--
 net/ipv6/route.c        | 29 ++---------------------------
 net/ipv6/tcp_ipv6.c     | 12 ++++--------
 net/ipv6/udp.c          |  2 --
 net/sctp/input.c        | 12 ------------
 net/sctp/ipv6.c         |  6 +++---
 14 files changed, 24 insertions(+), 96 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* Re: [PATCHv3 linux-next] hrtimer: Add notifier when clock_was_set was called
From: Fan Du @ 2013-09-13  2:46 UTC (permalink / raw)
  To: David Miller, tglx
  Cc: herbert, steffen.klassert, dborkman, linux-kernel, netdev,
	John Stultz
In-Reply-To: <20130912.133252.425268707009916773.davem@davemloft.net>

Hi Dave/Thomas

On 2013年09月13日 01:32, David Miller wrote:
> From: Thomas Gleixner<tglx@linutronix.de>
> Date: Thu, 12 Sep 2013 16:43:37 +0200 (CEST)
>
>> So what about going back to timer_list timers and simply utilize
>> register_pm_notifier(), which will tell you that the system resumed?
>
> The thing to understand is that there are two timeouts for an IPSEC
> rule, a soft and a hard timeout.
>
> There is a gap between these two exactly so that we can negotiate a
> new encapsulation with the IPSEC gateway before communication ceases
> to be possible over the IPSEC protected path.
>
> So the idea is that the soft timeout triggers the re-negotiation,
> and after a hard timeout the IPSEC path is no longer usable and
> all communication will fail.
>
> Simply triggering a re-negoation after every suspend/resume makes
> no sense at all.  Spurious re-negotiations are undesirable.
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (*a*)

What's the differences between this with re-negotiation after every
system wall clock changing by using clock_was_set notifier?


 > On 2013年08月02日 06:35, David Miller wrote:
 >
 > I suspect the thing to do is to have system time changes generate a
 > notifier when clock_was_set() is called.
 >
 > The XFRM code would walk the rules and pretend that we hit the soft
 > timeout for every rule that we haven't hit the soft timeout yet
 > already.
 >
 > If a rule hit the soft timeout, force a hard timeout.
 >
 > When forcing a soft timeout, adjust the hard timeout to be
 > (hard_timeout - soft_timeout) into the future.



> What we want are real timers.  We want that rather than a "we
> suspended so just assume all timers expired" event which is not very
> useful for this kind of application.
>

Here we are facing two problems:)

(1) what kind timer should xfrm_state should employ, Two requirements here:
     First one, KEY lifetime should include suspend/resume time. Second one,
     system wall clock time changing(backward/forward) should *not* impact
     *timer* timeout event(not the soft/hard IPsec events fired to user space!)

     net-next commit 99565a6c471cbb66caa68347c195133017559943 ("xfrm: Make
     xfrm_state timer monotonic") by utilizing *CLOCK_BOOTTIME* has solved this problem.

(2) What I have been bugging you around here for this long time is really the second
     problem, I'm sorry I didn't make it clearly to you and others, which is below:

     Why using wall clock time to calculate soft/hard IPsec events when xfrm_state timer
     out happens in its timeout handler? Because even if xfrm_state using CLOCK_BOOTTIME,
     system wall clock time changing will surely disturb soft/hard IPsec events, which
     you raised your concern about in (*a*).

     The initial approach( http://marc.info/?l=linux-netdev&m=137534280429187&w=2) has
     tried to solve this second problem by eliminating depending system wall clock in
     xfrm_state timer timeout handler.

I think this time, I have made this situation crystal clear.

-- 
浮沉随浪只记今朝笑

--fan

^ permalink raw reply

* Re: [PATCH 10/11] ipv6: move route updating for redirect to ndisc layer
From: Duan Jiong @ 2013-09-13  1:38 UTC (permalink / raw)
  To: hannes; +Cc: davem, netdev
In-Reply-To: <20130912220435.GC2101@order.stressinduktion.org>

于 2013年09月13日 06:04, Hannes Frederic Sowa 写道:
> On Thu, Sep 12, 2013 at 06:51:49PM +0800, Duan Jiong wrote:
>> From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
>>
>> And when dealing with redirect message, the err shoud
>> be assigned to 0.
>>
>> Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
>> ---
>>  net/ipv6/raw.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
>> index 58916bb..6138199 100644
>> --- a/net/ipv6/raw.c
>> +++ b/net/ipv6/raw.c
>> @@ -336,7 +336,7 @@ static void rawv6_err(struct sock *sk, struct sk_buff *skb,
>>  		harderr = (np->pmtudisc == IPV6_PMTUDISC_DO);
>>  	}
>>  	if (type == NDISC_REDIRECT)
>> -		ip6_sk_redirect(skb, sk);
>> +		err = 0;
>>  	if (np->recverr) {
>>  		u8 *payload = skb->data;
>>  		if (!inet->hdrincl)
> 
> Hm, I don't like the cirumstance that we first call icmpv6_err_convert at
> first and get back a bogus error value and later on convert it to something
> meaningful.
> 
> Either:
> a) Don't call icmpv6_err_convert at all for redirects. Then we could
>    place a WARN_ON(type == NDISC_REDIRECT) into this function to find future
>    missuse of this function with redirects or
> 
> b) handle the update of the NDISC_REDIRECT error code directly in
>    icmpv6_err_convert.
> 
> Also that you used the same headings for some commits shows that you could
> perhaps squash them into one patch.
> 
> Otherwise I'm fine with the changes, thanks.
> 

Thanks for you help, i will modify my patch.

Thanks,
  Duan

^ permalink raw reply

* RE: [PATCH v2 1/1] net: fec: fix phy reset operation to let imx6sl evk work
From: Duan Fugang-B38611 @ 2013-09-13  1:37 UTC (permalink / raw)
  To: David Miller
  Cc: shawn.guo@linaro.org, netdev@vger.kernel.org,
	bhutchings@solarflare.com, stephen@networkplumber.org,
	Li Frank-B20596, s.hauer@pengutronix.de
In-Reply-To: <20130912.171251.883636431449078988.davem@davemloft.net>

From: David Miller [mailto:davem@davemloft.net]
Data: Friday, September 13, 2013 5:13 AM

> To: Duan Fugang-B38611
> Cc: shawn.guo@linaro.org; netdev@vger.kernel.org;
> bhutchings@solarflare.com; stephen@networkplumber.org; Li Frank-B20596;
> s.hauer@pengutronix.de
> Subject: Re: [PATCH v2 1/1] net: fec: fix phy reset operation to let
> imx6sl evk work
> 
> From: Fugang Duan <B38611@freescale.com>
> Date: Wed, 11 Sep 2013 10:18:57 +0800
> 
> > Current driver only do phy reset in probe function, which is not
> > right. Since some phy clock is disabled after module probe, the phy
> > enter abnormal status, which needs do reset to recovery the phy. And
> > do ifconfig ethx up/down test, the phy also enter abnormal status.
> 
> What would disable the PHY clock after the probe?
> 
> You have to explain these kinds of things in your changelog message.
> 
> Thanks.
David,  there have another patch to saving ethernet power, which will disable phy
And MAC clock after module probe. The patch also include the information to explain
Some cases.

Of course, current Linux-next ethernet cannot work for imx6sl evk platform just because the 
Reset operation is not correct, have no dependency on saving power patch. 

Thanks,
Andy

^ permalink raw reply

* Re: ip l2tp - suspected defect using IPv6 local/remote addresses
From: Jeff Loughridge @ 2013-09-13  0:46 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: James Chapman, netdev
In-Reply-To: <1378991941.1531.5.camel@bwh-desktop.uk.level5networks.com>

I confirmed IPv6 endpoints work using a 3.5 kernel. Thanks, guys.

On Thu, Sep 12, 2013 at 9:19 AM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Thu, 2013-09-12 at 11:20 +0100, James Chapman wrote:
>> On 11 September 2013 19:52, Jeff Loughridge <jeffl@alumni.duke.edu> wrote:
>> >
>> > Using IPv6 address as L2TPv3 endpoints doesn't seem to work in
>> > iproute2 3.11. I see a cosmetic defect in the output of 'ip l2tp show
>> > tunnel'. In addition, I can't get tunnels to function with UDP or IP
>> > encapsulation.
>> >
>> > root@debian:~# ip l2tp add tunnel tunnel_id 3000 peer_tunnel_id 4000
>> > encap udp local a::1 remote a::2 udp_sport 5000 udp_dport 6000
>> > root@debian:~# ip l2tp add session tunnel_id 3000 session_id 1000
>> > peer_session_id 2000
>> > root@debian:~# ip l2tp show tunnel
>> > Tunnel 3000, encap UDP
>> >   From 127.0.0.1 to 127.0.0.1
>> >   Peer tunnel 4000
>> >   UDP source / dest ports: 5000/6000
>> > root@debian:~#
>>
>> You'll need a 3.5 or later kernel for L2TP over IPv6. I see you are
>> using 3.2. Are you using a version of iproute2 which is not matched to
>> your kernel?
> [...]
>
> The iproute2 version number only indicates which kernel features it
> supports; it is supposed to be backward-compatible.  And it really
> should not silently fail like this, although this may well be a bug in
> the API that can't be fixed in userland...
>
> Ben.
>
> --
> Ben Hutchings, Staff Engineer, Solarflare
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
>

^ permalink raw reply

* Re: [PATCH v2.39 0/7] MPLS actions and matches
From: Simon Horman @ 2013-09-12 22:56 UTC (permalink / raw)
  To: Ben Pfaff
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA, Ravi K,
	Isaku Yamahata
In-Reply-To: <20130912190636.GL16044-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

On Thu, Sep 12, 2013 at 12:06:36PM -0700, Ben Pfaff wrote:
> I've totally lost track of the status of this patch series.  I assume it
> needs Jesse's review.  Jesse, if I'm wrong about that, let me know and
> I'll take a pass at it.

My understanding is that you have looked over the approach
taken for the non-datapath code and were happy with it in
the context that it needed review from Jesse along with the
datapath code.

I believe it was a few revisions ago that you looked over
the series but I don't believe the non-datapath code has changed
in a meaningful way since then.

^ permalink raw reply

* Re: [PATCH v2.39 0/7] MPLS actions and matches
From: Ben Pfaff @ 2013-09-12 22:54 UTC (permalink / raw)
  To: Simon Horman
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA, Ravi K,
	Isaku Yamahata
In-Reply-To: <20130912225614.GB30229-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>

On Fri, Sep 13, 2013 at 07:56:14AM +0900, Simon Horman wrote:
> On Thu, Sep 12, 2013 at 12:06:36PM -0700, Ben Pfaff wrote:
> > I've totally lost track of the status of this patch series.  I assume it
> > needs Jesse's review.  Jesse, if I'm wrong about that, let me know and
> > I'll take a pass at it.
> 
> My understanding is that you have looked over the approach
> taken for the non-datapath code and were happy with it in
> the context that it needed review from Jesse along with the
> datapath code.
> 
> I believe it was a few revisions ago that you looked over
> the series but I don't believe the non-datapath code has changed
> in a meaningful way since then.

That sounds plausible, thanks for refreshing my memory.

^ permalink raw reply

* Re: [PATCH RFC] net: neighbour: use source address of last enqueued packet for solicitation
From: Hannes Frederic Sowa @ 2013-09-12 22:20 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev, davem
In-Reply-To: <alpine.LFD.2.00.1309092208460.1655@ja.ssi.bg>

On Mon, Sep 09, 2013 at 11:17:45PM +0300, Julian Anastasov wrote:
> 	arp_queue has packets only in NUD_INCOMPLETE state 
> (mcast_solicit=3 secs by default). And __neigh_event_send()
> now can keep many packets, 64KB from recent changes. So the
> 1st place is not guaranteed but now it is more difficult
> to kick the first packet compared to the old limit of just
> 3 packets.
> 
> 	The change can give chance for 2nd and 3th
> probe if the 1st probe is not replied, so it should be
> better to apply it:
> 
> Reviewed-by: Julian Anastasov <ja@ssi.bg>

Thanks for the review. I resend the patch as soon as net-next opens.

> 
> 	Still, I think such problems should be addressed
> with conf/{DEV,all}/arp_announce=1 or 2.

*nod*

I use this knob if I have such problems. But this patch improves
connectivity in the default configuration and we actually don't care much
about the source address in either ipv4 or ipv6. So it seemed legitimate
and simple to me.

Greetings,

  Hannes

^ permalink raw reply

* Re: [PATCH 10/11] ipv6: move route updating for redirect to ndisc layer
From: Hannes Frederic Sowa @ 2013-09-12 22:04 UTC (permalink / raw)
  To: Duan Jiong; +Cc: davem, netdev
In-Reply-To: <52319CC5.10902@cn.fujitsu.com>

On Thu, Sep 12, 2013 at 06:51:49PM +0800, Duan Jiong wrote:
> From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
> 
> And when dealing with redirect message, the err shoud
> be assigned to 0.
> 
> Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
> ---
>  net/ipv6/raw.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
> index 58916bb..6138199 100644
> --- a/net/ipv6/raw.c
> +++ b/net/ipv6/raw.c
> @@ -336,7 +336,7 @@ static void rawv6_err(struct sock *sk, struct sk_buff *skb,
>  		harderr = (np->pmtudisc == IPV6_PMTUDISC_DO);
>  	}
>  	if (type == NDISC_REDIRECT)
> -		ip6_sk_redirect(skb, sk);
> +		err = 0;
>  	if (np->recverr) {
>  		u8 *payload = skb->data;
>  		if (!inet->hdrincl)

Hm, I don't like the cirumstance that we first call icmpv6_err_convert at
first and get back a bogus error value and later on convert it to something
meaningful.

Either:
a) Don't call icmpv6_err_convert at all for redirects. Then we could
   place a WARN_ON(type == NDISC_REDIRECT) into this function to find future
   missuse of this function with redirects or

b) handle the update of the NDISC_REDIRECT error code directly in
   icmpv6_err_convert.

Also that you used the same headings for some commits shows that you could
perhaps squash them into one patch.

Otherwise I'm fine with the changes, thanks.

Greetings,

  Hannes

^ permalink raw reply

* Re: [PATCH 1/1] net: race condition when removing virtual net_device
From: Francesco Ruggeri @ 2013-09-12 22:02 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev
In-Reply-To: <CA+HUmGgyCsc-er6iB_mqg9whMyaquGWinkHOQEuEHbFCYVrXBw@mail.gmail.com>

On Thu, Sep 12, 2013 at 2:48 PM, Francesco Ruggeri
<fruggeri@aristanetworks.com> wrote:
> On Thu, Sep 12, 2013 at 1:06 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> Francesco Ruggeri <fruggeri@aristanetworks.com> writes:
...
> netdev_run_todo. This guarantees that no deadlocks are introduced,
> since the relative order of net_devices within each namespace is
> preserved, and only non-loopback_devs take and release the extra
> reference.

I should have said "since in net_todo_list the relative order of
net_devices within each namespace is the same as before, ...".

Sorry for any confusion.

Francesco

^ permalink raw reply

* Re: [PATCH 01/11] ipv6: Do route updating for redirect in ndisc layer
From: Hannes Frederic Sowa @ 2013-09-12 21:54 UTC (permalink / raw)
  To: Duan Jiong; +Cc: davem, netdev
In-Reply-To: <52319AEC.1010206@cn.fujitsu.com>

On Thu, Sep 12, 2013 at 06:43:56PM +0800, Duan Jiong wrote:
> -
> -void ip6_sk_redirect(struct sk_buff *skb, struct sock *sk)
> -{
> -	ip6_redirect(skb, sock_net(sk), sk->sk_bound_dev_if, sk->sk_mark);
> -}
> -EXPORT_SYMBOL_GPL(ip6_sk_redirect);
> +EXPORT_SYMBOL_GPL(ip6_redirect);
>  
>  static unsigned int ip6_default_advmss(const struct dst_entry *dst)
>  {

This breaks bisectability. Please remove the function after you cleared up all
references to it. Otherwise git bisect would throw compiler errors.

Greetings,

  Hannes

^ permalink raw reply

* Re: [PATCH 1/1] net: race condition when removing virtual net_device
From: Francesco Ruggeri @ 2013-09-12 21:48 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David S. Miller, Eric Dumazet, Jiri Pirko, Alexander Duyck,
	Cong Wang, netdev
In-Reply-To: <87txhp249u.fsf@xmission.com>

On Thu, Sep 12, 2013 at 1:06 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Francesco Ruggeri <fruggeri@aristanetworks.com> writes:
>
>> Resending.
>
> To summarize:
>
> In netdev_run_todo, netdev_wait_allrefs, call_netdevice_notifiers you
> have observed a situation where dev_net(dev) was an invalid pointer.
>

Correct.


> My first impression is that we probably just want to remove the repeated
> call to call_netdevice_notifiers, that happens after 1 second of
> waiting.
>
> Your suggested patch below doesn't have a prayer of working, as it only
> decreases the device reference count after the loop to wait for the
> reference count to go to zero.
>

Actually it did work. In 3.4 it avoided the crash in
inetpeer_invalidate_tree and it did not cause any visible adverse
effects after running it several days on multiple servers that create
and destroy namespaces on a regular basis.
The extra refcount that is taken and released is on the loopback_dev
of the namespace that the net_device is in. The assumption is that
loopback_dev is always the last net_device to be destroyed in a
namespace. So each net_device in a namespace (except for the
loopback_dev itself) takes such reference when it is unlisted, and it
releases it after it is destroyed (after netdev_wait_allrefs).
The patch also makes sure that any loopback_devs are at the tail of
net_todo_list within every namespace in every instance of
netdev_run_todo. This guarantees that no deadlocks are introduced,
since the relative order of net_devices within each namespace is
preserved, and only non-loopback_devs take and release the extra
reference.

> Your patch modified to grab a count on the network namespace the device
> references and not the device itself might make sense, but that runs the
> risk of incrementing the network namespace counts after the network
> namespace is down.

Right.

>
> Simply not rerunning call_netdevice_notifiers seems like the proper
> approach to fix this.
>

That would be great. There would still be one scenario to take care of though:

- veth interfaces v0 and v1 are in namespaces ns0 and ns1.
- process p0 unregisters v0, which also causes v1 to be unregistered.
When p0 enters netdev_run_todo both v0 and v1 are in net_todo_list and
have been unlisted from their namespaces.
- then in p0's netdev_run_todo:

void netdev_run_todo(void)
{
        struct list_head list;

        /* Snapshot list, allow later requests */
        list_replace_init(&net_todo_list, &list);

        __rtnl_unlock();


        /* Wait for rcu callbacks to finish before next phase */
        if (!list_empty(&list))
                rcu_barrier();

********* Assume ns1 is destroyed by another process here ************

        while (!list_empty(&list)) {
                struct net_device *dev
                        = list_first_entry(&list, struct net_device, todo_list);
                list_del(&dev->todo_list);

                rtnl_lock();
                call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);

********* dst_dev_event/dst_ifdown is invoked here, and it tries to access
            dev_net(dev)->loopback_dev.
            In case of v1 this would be an invalid pointer. *****************

                __rtnl_unlock();

Similar scenarios apply if v1 is a vlan or macvlan interface, and v0
is its real device.

Francesco

^ permalink raw reply

* Re: [PATCH net v2] net: sctp: fix ipv6 ipsec encryption bug in sctp_v6_xmit
From: David Miller @ 2013-09-12 21:26 UTC (permalink / raw)
  To: vyasevich
  Cc: dborkman, netdev, linux-sctp, adobriyan, steffen.klassert, hannes
In-Reply-To: <52308C88.10604@gmail.com>

From: Vlad Yasevich <vyasevich@gmail.com>
Date: Wed, 11 Sep 2013 11:30:16 -0400

> On 09/11/2013 10:58 AM, Daniel Borkmann wrote:
>> Alan Chester reported an issue with IPv6 on SCTP that IPsec traffic is
>> not
>> being encrypted, whereas on IPv4 it is. Setting up an AH + ESP
>> transport
>> does not seem to have the desired effect:
...
>> Reported-by: Alan Chester <alan.chester@tekelec.com>
>> Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>> Cc: Steffen Klassert <steffen.klassert@secunet.com>
>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> 
> Acked-by: Vlad Yasevich <vyasevich@gmail.com>

Applied and queued up for -stable, thanks.

Yeah it is tricky to share a common transmit path with the
way SCTP works.

^ permalink raw reply

* Re: [PATCH net V2] tuntap: correctly handle error in tun_set_iff()
From: David Miller @ 2013-09-12 21:22 UTC (permalink / raw)
  To: mst; +Cc: jasowang, netdev, linux-kernel, wannes.rombouts
In-Reply-To: <20130911104350.GA25316@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Wed, 11 Sep 2013 13:43:50 +0300

> On Wed, Sep 11, 2013 at 06:09:48PM +0800, Jason Wang wrote:
>> Commit c8d68e6be1c3b242f1c598595830890b65cea64a
>> (tuntap: multiqueue support) only call free_netdev() on error in
>> tun_set_iff(). This causes several issues:
>> 
>> - memory of tun security were leaked
>> - use after free since the flow gc timer was not deleted and the tfile
>>   were not detached
>> 
>> This patch solves the above issues.
>> 
>> Reported-by: Wannes Rombouts <wannes.rombouts@epitech.eu>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> 
> Acked-by: Michael S. Tsirkin <mst@redhat.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH] xen-netback: fix possible format string flaw
From: David Miller @ 2013-09-12 21:20 UTC (permalink / raw)
  To: Ian.Campbell; +Cc: keescook, linux-kernel, xen-devel, netdev
In-Reply-To: <1378883893.3057.60.camel@dagon.hellion.org.uk>

From: Ian Campbell <Ian.Campbell@citrix.com>
Date: Wed, 11 Sep 2013 08:18:13 +0100

> On Tue, 2013-09-10 at 21:39 -0700, Kees Cook wrote:
>> This makes sure a format string cannot accidentally leak into the
>> kthread_run() call.
>> 
>> Signed-off-by: Kees Cook <keescook@chromium.org>
> 
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox