Netdev List
 help / color / mirror / Atom feed
* Re: [RFC PATCH net-next] net: ipv6: Add a sysctl to make optimistic addresses useful candidates
From: Erik Kline @ 2014-10-21  4:02 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, hannes, Lorenzo Colitti, Eric Dumazet
In-Reply-To: <1413844815.31953.32.camel@decadent.org.uk>

Thanks.  Patch v2 coming shortly.

On Tue, Oct 21, 2014 at 7:40 AM, Ben Hutchings <ben@decadent.org.uk> wrote:
> On Fri, 2014-10-17 at 13:31 +0900, Erik Kline wrote:
> [...]
>> --- a/include/uapi/linux/ipv6.h
>> +++ b/include/uapi/linux/ipv6.h
>> @@ -154,6 +154,7 @@ enum {
>>       DEVCONF_ACCEPT_RA_RT_INFO_MAX_PLEN,
>>       DEVCONF_PROXY_NDP,
>>       DEVCONF_OPTIMISTIC_DAD,
>> +     DEVCONF_USE_OPTIMISTIC,
>>       DEVCONF_ACCEPT_SOURCE_ROUTE,
>>       DEVCONF_MC_FORWARDING,
>>       DEVCONF_DISABLE_IPV6,
> [...]
>
> As this is UAPI, the new entry must be added at the end of the
> enumeration (well, before DEVCONF_MAX).
>
> Ben.
>
> --
> Ben Hutchings
> Reality is just a crutch for people who can't handle science fiction.

^ permalink raw reply

* [net-next] iptunnel: Fix iptunnel_xmit return code for stats maintenance
From: Andy Zhou @ 2014-10-21  2:22 UTC (permalink / raw)
  To: davem; +Cc: netdev, Andy Zhou

iptunnel_xmit() currently always return >= 0 instead of proper error
code, that is used to maintain stats. For example, current return code
conflicts with how iptunnel_xmit_stats() maintains stats.

Unfortunately, the return code can not be changed without readjusting
how SKB memory is managed through the call chain.  The following two
rules are adopted for this patch:

1) Proper error code are always propagate back through the call chain
   so that the caller can maintain stats.

2) Tunnel xmit functions always free resources, e.g. skb and route
   entry.

Signed-off-by: Andy Zhou <azhou@nicira.com>
---
 drivers/net/vxlan.c            |   21 +++++++++++++--------
 include/net/ip_tunnels.h       |    7 +++++++
 net/ipv4/geneve.c              |    8 ++++++--
 net/ipv4/ip_tunnel_core.c      |   14 +++++++++++---
 net/openvswitch/vport-geneve.c |    5 ++---
 net/openvswitch/vport-gre.c    |    1 +
 net/openvswitch/vport-vxlan.c  |    6 +++---
 net/openvswitch/vport.c        |    8 ++++----
 8 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index ca30982..93348cb 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1626,8 +1626,10 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
 	bool udp_sum = !vs->sock->sk->sk_no_check_tx;
 
 	skb = udp_tunnel_handle_offloads(skb, udp_sum);
-	if (IS_ERR(skb))
-		return -EINVAL;
+	if (IS_ERR(skb)) {
+		err = -EINVAL;
+		goto error;
+	}
 
 	min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
 			+ VXLAN_HLEN + sizeof(struct iphdr)
@@ -1636,13 +1638,15 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
 	/* Need space for new headers (invalidates iph ptr) */
 	err = skb_cow_head(skb, min_headroom);
 	if (unlikely(err))
-		return err;
+		goto error;
 
 	if (vlan_tx_tag_present(skb)) {
 		if (WARN_ON(!__vlan_put_tag(skb,
 					    skb->vlan_proto,
-					    vlan_tx_tag_get(skb))))
-			return -ENOMEM;
+					    vlan_tx_tag_get(skb)))) {
+			err = -ENOMEM;
+			goto error;
+		}
 
 		skb->vlan_tci = 0;
 	}
@@ -1655,6 +1659,10 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
 
 	return udp_tunnel_xmit_skb(vs->sock, rt, skb, src, dst, tos,
 				   ttl, df, src_port, dst_port, xnet);
+error:
+	kfree_skb(skb);
+	ip_rt_put(rt);
+	return err;
 }
 EXPORT_SYMBOL_GPL(vxlan_xmit_skb);
 
@@ -1786,9 +1794,6 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 				     tos, ttl, df, src_port, dst_port,
 				     htonl(vni << 8),
 				     !net_eq(vxlan->net, dev_net(vxlan->dev)));
-
-		if (err < 0)
-			goto rt_tx_error;
 		iptunnel_xmit_stats(err, &dev->stats, dev->tstats);
 #if IS_ENABLED(CONFIG_IPV6)
 	} else {
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 5bc6ede..80bcf2e 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -174,6 +174,13 @@ static inline u8 ip_tunnel_ecn_encap(u8 tos, const struct iphdr *iph,
 }
 
 int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto);
+
+/* Transmit a packet over IP tunnel
+ * Returns:
+ *	0 Congestion notification received
+ *	>0  Number of bytes in the packet successfully sent
+ *	<0 packet dropped due to error
+ */
 int iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 		  __be32 src, __be32 dst, __u8 proto,
 		  __u8 tos, __u8 ttl, __be16 df, bool xnet);
diff --git a/net/ipv4/geneve.c b/net/ipv4/geneve.c
index 065cd94..90fea48 100644
--- a/net/ipv4/geneve.c
+++ b/net/ipv4/geneve.c
@@ -129,14 +129,14 @@ int geneve_xmit_skb(struct geneve_sock *gs, struct rtable *rt,
 
 	err = skb_cow_head(skb, min_headroom);
 	if (unlikely(err))
-		return err;
+		goto error;
 
 	if (vlan_tx_tag_present(skb)) {
 		if (unlikely(!__vlan_put_tag(skb,
 					     skb->vlan_proto,
 					     vlan_tx_tag_get(skb)))) {
 			err = -ENOMEM;
-			return err;
+			goto error;
 		}
 		skb->vlan_tci = 0;
 	}
@@ -146,6 +146,10 @@ int geneve_xmit_skb(struct geneve_sock *gs, struct rtable *rt,
 
 	return udp_tunnel_xmit_skb(gs->sock, rt, skb, src, dst,
 				   tos, ttl, df, src_port, dst_port, xnet);
+error:
+	kfree_skb(skb);
+	ip_rt_put(rt);
+	return err;
 }
 EXPORT_SYMBOL_GPL(geneve_xmit_skb);
 
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 88c386c..b3ba4a3 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -77,9 +77,17 @@ int iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 	__ip_select_ident(iph, skb_shinfo(skb)->gso_segs ?: 1);
 
 	err = ip_local_out_sk(sk, skb);
-	if (unlikely(net_xmit_eval(err)))
-		pkt_len = 0;
-	return pkt_len;
+
+	/* Deal with positive error numbers. Filter out NET_XMIT_CN */
+	if (err > 0)
+		return net_xmit_errno(err);
+
+	/* Success, return number of bytes transmitted */
+	if (err == 0)
+		err = pkt_len;
+
+	/* Return pkt_len or an error code */
+	return err;
 }
 EXPORT_SYMBOL_GPL(iptunnel_xmit);
 
diff --git a/net/openvswitch/vport-geneve.c b/net/openvswitch/vport-geneve.c
index 106a9d8..34276fb 100644
--- a/net/openvswitch/vport-geneve.c
+++ b/net/openvswitch/vport-geneve.c
@@ -206,15 +206,14 @@ static int geneve_tnl_send(struct vport *vport, struct sk_buff *skb)
 	tunnel_id_to_vni(tun_key->tun_id, vni);
 	skb->ignore_df = 1;
 
-	err = geneve_xmit_skb(geneve_port->gs, rt, skb, fl.saddr,
+	return  geneve_xmit_skb(geneve_port->gs, rt, skb, fl.saddr,
 			      tun_key->ipv4_dst, tun_key->ipv4_tos,
 			      tun_key->ipv4_ttl, df, sport, dport,
 			      tun_key->tun_flags, vni,
 			      tun_info->options_len, (u8 *)tun_info->options,
 			      false);
-	if (err < 0)
-		ip_rt_put(rt);
 error:
+	kfree_skb(skb);
 	return err;
 }
 
diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index 108b82d..8721b30 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -200,6 +200,7 @@ static int gre_tnl_send(struct vport *vport, struct sk_buff *skb)
 err_free_rt:
 	ip_rt_put(rt);
 error:
+	kfree_skb(skb);
 	return err;
 }
 
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index 2735e01..ace849a 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -174,15 +174,15 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
 
 	src_port = udp_flow_src_port(net, skb, 0, 0, true);
 
-	err = vxlan_xmit_skb(vxlan_port->vs, rt, skb,
+	return vxlan_xmit_skb(vxlan_port->vs, rt, skb,
 			     fl.saddr, tun_key->ipv4_dst,
 			     tun_key->ipv4_tos, tun_key->ipv4_ttl, df,
 			     src_port, dst_port,
 			     htonl(be64_to_cpu(tun_key->tun_id) << 8),
 			     false);
-	if (err < 0)
-		ip_rt_put(rt);
+
 error:
+	kfree_skb(skb);
 	return err;
 }
 
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 6015802..eb0a72f 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -480,11 +480,11 @@ int ovs_vport_send(struct vport *vport, struct sk_buff *skb)
 		stats->tx_packets++;
 		stats->tx_bytes += sent;
 		u64_stats_update_end(&stats->syncp);
-	} else if (sent < 0) {
-		ovs_vport_record_error(vport, VPORT_E_TX_ERROR);
-		kfree_skb(skb);
-	} else
+	} else if (sent == -ENOBUFS || sent == -ENOMEM || sent == 0) {
 		ovs_vport_record_error(vport, VPORT_E_TX_DROPPED);
+	} else {
+		ovs_vport_record_error(vport, VPORT_E_TX_ERROR);
+	}
 
 	return sent;
 }
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 2/2][V2] xfrm6: fix a potential use after free in xfrm6_policy.c
From: roy.qing.li @ 2014-10-21  0:34 UTC (permalink / raw)
  To: netdev

From: Li RongQing <roy.qing.li@gmail.com>

pskb_may_pull() maybe change skb->data and make nh and exthdr pointer
oboslete, so recompute the nd and exthdr

V2: insert a space between date type(like __be16) and * as suggested by
Sergei Shtylyov

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
---
 net/ipv6/xfrm6_policy.c |   11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index ac49f84..115fd3b 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -170,8 +170,10 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int reverse)
 		case IPPROTO_DCCP:
 			if (!onlyproto && (nh + offset + 4 < skb->data ||
 			     pskb_may_pull(skb, nh + offset + 4 - skb->data))) {
-				__be16 *ports = (__be16 *)exthdr;
+				__be16 *ports;
 
+				nh = skb_network_header(skb);
+				ports = (__be16 *)(nh + offset);
 				fl6->fl6_sport = ports[!!reverse];
 				fl6->fl6_dport = ports[!reverse];
 			}
@@ -180,8 +182,10 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int reverse)
 
 		case IPPROTO_ICMPV6:
 			if (!onlyproto && pskb_may_pull(skb, nh + offset + 2 - skb->data)) {
-				u8 *icmp = (u8 *)exthdr;
+				u8 *icmp;
 
+				nh = skb_network_header(skb);
+				icmp = (u8 *)(nh + offset);
 				fl6->fl6_icmp_type = icmp[0];
 				fl6->fl6_icmp_code = icmp[1];
 			}
@@ -192,8 +196,9 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int reverse)
 		case IPPROTO_MH:
 			if (!onlyproto && pskb_may_pull(skb, nh + offset + 3 - skb->data)) {
 				struct ip6_mh *mh;
-				mh = (struct ip6_mh *)exthdr;
 
+				nh = skb_network_header(skb);
+				mh = (struct ip6_mh *)(nh + offset);
 				fl6->fl6_mh_type = mh->ip6mh_type;
 			}
 			fl6->flowi6_proto = nexthdr;
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH] netfilter: Fix wastful cleanup check for unconfirmed conn in get_next_corpse
From: Feng Gao @ 2014-10-21  0:31 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Patrick McHardy, kadlec, davem
  Cc: Netfilter Developer Mailing List, coreteam, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 788 bytes --]

Hi all,

I am sorry to send the patch commit again because the last email is
not plain text and is rejected by some servers.

This is the patch to branch master of kernel.

The function get_next_corpse is only invoked by nf_ct_iterate_cleanup
in one while loop, and it will check the per cpu unconfirmed conntrack
list every time.

I think the whole list of unconfirmed conntracks could be accessed by
one call, so the others are not necessary.

So I move the checks outside the get_next_corpse, and create one new
function clean_up_unconfirmed_conntracks.
Let the nf_ct_iterate_cleanup invokes the
clean_up_unconfirmed_conntracks after the while loop.

These codes have already exist for a long time. Firstly I think maybe
there is some reason, but I fail to get it.


Best Regards
Feng

[-- Attachment #2: 0001-netfilter-Fix-wastful-cleanup-check-for-unconfirmed-.patch --]
[-- Type: application/octet-stream, Size: 2964 bytes --]

From 52de457b3cfd1e94a52df1c3dfcd9dbf3511fa0d Mon Sep 17 00:00:00 2001
From: fgao <gfree.wind@gmail.com>
Date: Mon, 20 Oct 2014 07:18:05 -0700
Subject: [PATCH 1/1] netfilter: Fix wastful cleanup check for unconfirmed conn in get_next_corpse

The function get_next_corpse is used to iterate the conntracks.
It will check the per cpu unconfirmed list of every cpu too.
Now it is only invoked by nf_ct_iterate_cleanup in one while loop.
Actually the unconfirmed list could be accessed completely by one call, then the others are wastful.

So move the unconfirmed list check outside the function get_next_corpse and create one new function
Let the nf_ct_iterate_cleanup invokes the new function clean_up_unconfirmed_conntracks once after the loops.

Signed-off-by: fgao <gfree.wind@gmail.com>
---
 net/netfilter/nf_conntrack_core.c | 36 ++++++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 5016a69..ace7c2c2 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1348,6 +1348,28 @@ static void nf_conntrack_attach(struct sk_buff *nskb, const struct sk_buff *skb)
 	nf_conntrack_get(nskb->nfct);
 }
 
+static void clean_up_unconfirmed_conntracks(struct net *net,
+		int (*iter)(struct nf_conn *i, void *data),
+		void *data)
+{
+	struct nf_conntrack_tuple_hash *h;
+	struct nf_conn *ct;
+	struct hlist_nulls_node *n;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct ct_pcpu *pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu);
+
+		spin_lock_bh(&pcpu->lock);
+		hlist_nulls_for_each_entry(h, n, &pcpu->unconfirmed, hnnode) {
+			ct = nf_ct_tuplehash_to_ctrack(h);
+			if (iter(ct, data))
+				set_bit(IPS_DYING_BIT, &ct->status);
+		}
+		spin_unlock_bh(&pcpu->lock);
+	}
+}
+
 /* Bring out ya dead! */
 static struct nf_conn *
 get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data),
@@ -1356,7 +1378,6 @@ get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data),
 	struct nf_conntrack_tuple_hash *h;
 	struct nf_conn *ct;
 	struct hlist_nulls_node *n;
-	int cpu;
 	spinlock_t *lockp;
 
 	for (; *bucket < net->ct.htable_size; (*bucket)++) {
@@ -1376,17 +1397,6 @@ get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data),
 		local_bh_enable();
 	}
 
-	for_each_possible_cpu(cpu) {
-		struct ct_pcpu *pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu);
-
-		spin_lock_bh(&pcpu->lock);
-		hlist_nulls_for_each_entry(h, n, &pcpu->unconfirmed, hnnode) {
-			ct = nf_ct_tuplehash_to_ctrack(h);
-			if (iter(ct, data))
-				set_bit(IPS_DYING_BIT, &ct->status);
-		}
-		spin_unlock_bh(&pcpu->lock);
-	}
 	return NULL;
 found:
 	atomic_inc(&ct->ct_general.use);
@@ -1411,6 +1421,8 @@ void nf_ct_iterate_cleanup(struct net *net,
 
 		nf_ct_put(ct);
 	}
+
+	clean_up_unconfirmed_conntracks(net, iter, data);
 }
 EXPORT_SYMBOL_GPL(nf_ct_iterate_cleanup);
 
-- 
1.9.1


^ permalink raw reply related

* Re: Queue with wait-free enqueue, blocking dequeue, splice
From: Mathieu Desnoyers @ 2014-10-21  0:04 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: Paul E. McKenney, netdev, Jamal Hadi Salim
In-Reply-To: <20141020160237.302aa17c@redhat.com>

----- Original Message -----
> From: "Jesper Dangaard Brouer" <brouer@redhat.com>
> To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> Cc: brouer@redhat.com, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>, netdev@vger.kernel.org, "Jamal Hadi Salim"
> <jhs@mojatatu.com>
> Sent: Monday, October 20, 2014 10:02:37 AM
> Subject: Re: Queue with wait-free enqueue, blocking dequeue, splice
> 
> 
> On Sat, 18 Oct 2014 11:48:12 +0000 (UTC) Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
> 
> > Following our LPC discussion on lock-free queue algorithms
> > for qdisc, here is some info on the wfcqueue implementation
> > found in Userspace RCU. See http://urcu.so for info and
> > git repository.
> 
> Thank for following up on our very interesting discussions.
> 
> I've started with the more simple variant "urcu/static/wfqueue.h" to
> understand the concepts.  And I'm now reading wfcqueue.h, which I guess
> it replacing wfqueue.h.

Yep, this is correct.

> 
>  
> > Here is the wfcqueue ported to the Linux kernel I sent last
> > year as RFC:
> > https://lkml.org/lkml/2013/3/14/289
> > 
> > I'm very interested to learn if it fits well for your
> > use-case,
> 
> Does this wfcqueue API support bulk dequeue?  (A feature needed for the
> lock-less qdisc implementation, else it cannot compete with our new
> bulk dequeue strategy).

Yes, it does. See the "__wfcq_splice" API.

> 
> AFAIK your queue implementation is a CAS-based, Wait-Free on enqueue,
> but Lock-Free on dequeue with the potential for waiting/blocking on
> a enqueue processes.
>  I'm not 100% sure, that we want this behavior for the qdisc system.

It's actually xchg-based (not CAS-based). It is indeed wait-free
in the strictest sense of the term on enqueue (at least on x86,
some other arch implement xchg using ll/sc, which is not strictly
wait-free).

On dequeue, it can busy-wait for a short while that the enqueue
completes. Note that in kernel, since we disable preemption during
enqueue, the dequeue does not have to ever block, just busy-looping
is OK, since the longest thing that could nest over the enqueue
is possibly an interrupt and softirq. So yes, I guess the dequeue
would qualify as lock-free.

> 
> I can certainly use the wfcq_empty() check,

Not sure why you would want to use it, considering that the dequeue
operation implies it. If there is nothing to dequeue, we just return
immediately. Dequeue operation does not block on empty queue. It
just busy waits if it happen to see the queue in an intermediate
state of the enqueue operation (which is very short, few instructions
at most, with preemption disabled).

> but I guess I need to
> maintain a separate counter to maintain the qdisc limit, right?
> (I would use the approximate/split counter API percpu_counter to keep
> this scalable, and wfcq_empty() would provide an accurate empty check)

Yes for split counters, not sure why you need the empty check explicitly
in your use-case though.

> 
> 
> I think, we/I should start micro benchmarking the different approaches.
> As our time budget is only 67.2ns
>  http://netoptimizer.blogspot.dk/2014/05/the-calculations-10gbits-wirespeed.html
> (or bulking tricks artificially "increase" this budget)
> 
> 
> The motivation behind this lockless qdisc is, the current qdisc locking
> cost 48ns, see slide 9/16 titled "Qdisc locking is nasty":
>  http://people.netfilter.org/hawk/presentations/LinuxPlumbers2014/performance_tx_qdisc_bulk_LPC2014.pdf
> 

Chances are that the scheme:

__wfcq_enqueue() on enqueue

__wfcq_splice() for bulk dequeue

__wfcq_first()
__wfcq_next()  for iteration on the splice dest queue

Would be must faster than the ton of locks you use currently.
The nice part about using just enqueue and splice is that you
don't need locking on the queue at all. Iteration on the
destination queue can be done by a single thread, so no need
for explicit locking there neither.

By the way, you should look at the kernel wfcq implementation
I posted earlier. It's more compact that the one in userspace RCU
because we don't need to support blocking/nonblocking modes,
because we have the luxury of disabling preemption in the kernel.

I look forward to see the numbers,

Thanks!

Mathieu

> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Sr. Network Kernel Developer at Red Hat
>   Author of http://www.iptv-analyzer.org
>   LinkedIn: http://www.linkedin.com/in/brouer
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply

* Re: [RFC v2 2/6] driver-core: add driver async_probe support
From: Luis R. Rodriguez @ 2014-10-20 23:43 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Greg Kroah-Hartman, Wu Zhangjin, Takashi Iwai, Tejun Heo,
	Arjan van de Ven, linux-kernel@vger.kernel.org, Oleg Nesterov,
	hare, Andrew Morton, Tetsuo Handa, Joseph Salisbury,
	Benjamin Poirier, Santosh Rastapur, Kay Sievers,
	One Thousand Gnomes, Tim Gardner, Pierre Fersing,
	Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
	Abhijit Mahajan <abhi
In-Reply-To: <20140905221029.GA35667@core.coreip.homeip.net>

> diff --git a/drivers/base/bus.c b/drivers/base/bus.c
> index 83e910a..49fe573 100644
> --- a/drivers/base/bus.c
> +++ b/drivers/base/bus.c
> @@ -10,6 +10,7 @@
>   *
>   */
>
> +#include <linux/async.h>
>  #include <linux/device.h>
>  #include <linux/module.h>
>  #include <linux/errno.h>
> @@ -547,15 +548,12 @@ void bus_probe_device(struct device *dev)
>  {
>         struct bus_type *bus = dev->bus;
>         struct subsys_interface *sif;
> -       int ret;
>
>         if (!bus)
>                 return;
>
> -       if (bus->p->drivers_autoprobe) {
> -               ret = device_attach(dev);
> -               WARN_ON(ret < 0);
> -       }
> +       if (bus->p->drivers_autoprobe)
> +               device_initial_probe(dev);
>
>         mutex_lock(&bus->p->mutex);
>         list_for_each_entry(sif, &bus->p->interfaces, node)
> @@ -657,6 +655,17 @@ static ssize_t uevent_store(struct device_driver *drv, const char *buf,
>  }
>  static DRIVER_ATTR_WO(uevent);

Based on my review with my latest changes this is what I was missing,
I'll be sure to address this.

 Luis

^ permalink raw reply

* Re: [RFC PATCH net-next] net: ipv6: Add a sysctl to make optimistic addresses useful candidates
From: Ben Hutchings @ 2014-10-20 22:40 UTC (permalink / raw)
  To: Erik Kline; +Cc: netdev, hannes, lorenzo, edumazet
In-Reply-To: <1413520309-13814-1-git-send-email-ek@google.com>

[-- Attachment #1: Type: text/plain, Size: 588 bytes --]

On Fri, 2014-10-17 at 13:31 +0900, Erik Kline wrote:
[...]
> --- a/include/uapi/linux/ipv6.h
> +++ b/include/uapi/linux/ipv6.h
> @@ -154,6 +154,7 @@ enum {
>  	DEVCONF_ACCEPT_RA_RT_INFO_MAX_PLEN,
>  	DEVCONF_PROXY_NDP,
>  	DEVCONF_OPTIMISTIC_DAD,
> +	DEVCONF_USE_OPTIMISTIC,
>  	DEVCONF_ACCEPT_SOURCE_ROUTE,
>  	DEVCONF_MC_FORWARDING,
>  	DEVCONF_DISABLE_IPV6,
[...]

As this is UAPI, the new entry must be added at the end of the
enumeration (well, before DEVCONF_MAX).

Ben.

-- 
Ben Hutchings
Reality is just a crutch for people who can't handle science fiction.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply

* Re: [linux-nics] [PATCH 1/1] e1000: unset IFF_UNICAST_FLT on WMware 82545EM
From: Jeff Kirsher @ 2014-10-20 22:25 UTC (permalink / raw)
  To: Francesco Ruggeri
  Cc: netdev, linux.nics, pv-drivers, fruggeri, linux-kernel,
	e1000-devel, davem
In-Reply-To: <20141020221115.1FC9B480090@fruggeri-Arora18.sjc.aristanetworks.com>

[-- Attachment #1: Type: text/plain, Size: 551 bytes --]

On Mon, 2014-10-20 at 15:11 -0700, Francesco Ruggeri wrote:
> VMWare's e1000 implementation does not seem to support unicast
> filtering.
> This can be observed by configuring a macvlan interface on eth0 in a
> VM in
> VMWare Fusion 5.0.5, and trying to use that interface instead of eth0.
> Tested on 3.16.
> 
> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
> ---
>  drivers/net/ethernet/intel/e1000/e1000_main.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)

Thanks Francesco, I will add your patch to my queue.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: qdisc running
From: Jamal Hadi Salim @ 2014-10-20 22:17 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: john Fastabend, Herbert Xu, netdev@vger.kernel.org, eric Dumazet,
	Mathieu Desnoyers
In-Reply-To: <20141020181756.2c8f33b9@redhat.com>

On 10/20/14 12:17, Jesper Dangaard Brouer wrote:
>
> On Sun, 19 Oct 2014 15:24:42 -0400 Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>

> I guess it is good for our recent dequeue batching.

It is i think ;->

> But I think/hope we
> can come up with a scheme that does not requires 6 lock/unlock
> operations (as illustrated on slide 9).
>

To be clear:
2 locks + 2 unlock and 2 atomic ops.


> John and I have talked about doing a lockless qdisc, but maintaining
> this __QDISC___STATE_RUNNING in a lockless scenario, would cost us
> extra atomic ops...
>

In the animation this __QDISC___STATE_RUNNING is shown as "occupied"
flag. It is like someone is in the toilet and you cant come in;->
They have to finish dropping the packages into the toilet^Whardware ;->
If it is occupied, you put your package outside and go.

> Are we still sure, that this model of only allowing a single CPU in the
> dequeue path, is still the best solution?

For sure it is the best if you want to batch. Look at that last orange
guy picking all the packages (busylock.swf). This is where all the
batching would  happen.

>(The TXQ lock should already
> protect several CPUs in this code path).


Note:
Maybe for the orange guy (the dequeur) the tx lock could
be avoided? Double check the code. Important to note under
busy period contention is reduced to :
1 lock + 1 unlock + 2 atomic ops for N-1 CPUs.
The orange guy on the other hand is doing 2 lock/unlock.


> I can see that you really needed the budget/fairness in the dequeue
> loop, that we recently mangled with.
>

Yes, fairness is needed so the orange guy doesnt spend all his cycles
doing all the work (that was the basis of my presentation); unless
that is not an issue and the scheduler would move things away from
that cpu.


> What tool do I use to play these SWF files? (I tried VLC but no luck).
>

Firefox should work fine.

cheers,
jamal

^ permalink raw reply

* [PATCH 1/1] e1000: unset IFF_UNICAST_FLT on WMware 82545EM
From: Francesco Ruggeri @ 2014-10-20 22:11 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, davem, pv-drivers, linux.nics, fruggeri,
	e1000-devel

VMWare's e1000 implementation does not seem to support unicast filtering.
This can be observed by configuring a macvlan interface on eth0 in a VM in
VMWare Fusion 5.0.5, and trying to use that interface instead of eth0.
Tested on 3.16.

Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
---
 drivers/net/ethernet/intel/e1000/e1000_main.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 5f6aded..24f3986 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -1075,7 +1075,10 @@ static int e1000_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 				  NETIF_F_HW_CSUM |
 				  NETIF_F_SG);
 
-	netdev->priv_flags |= IFF_UNICAST_FLT;
+	/* Do not set IFF_UNICAST_FLT for VMWare's 82545EM */
+	if (hw->device_id != E1000_DEV_ID_82545EM_COPPER ||
+	    hw->subsystem_vendor_id != PCI_VENDOR_ID_VMWARE)
+		netdev->priv_flags |= IFF_UNICAST_FLT;
 
 	adapter->en_mng_pt = e1000_enable_mng_pass_thru(hw);
 
-- 
1.8.1.4

^ permalink raw reply related

* [PATCH] netfilter: log: protect nf_log_register against double registering
From: Marcelo Ricardo Leitner @ 2014-10-20 21:58 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev

Currently, despite the comment right before the function,
nf_log_register allows registering two loggers on with the same type and
end up overwriting the previous register.

Not a real issue today as current tree doesn't have two loggers for the
same type but it's better to get this protected.

Also make sure that all of its callers do error checking.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
---

Notes:
    Please let me know if you have any issues with the identation on
    nf_log_register. I just couldn't find a better one.
    
    Thanks

 net/ipv4/netfilter/nf_log_arp.c  |  8 +++++++-
 net/ipv4/netfilter/nf_log_ipv4.c |  8 +++++++-
 net/ipv6/netfilter/nf_log_ipv6.c |  8 +++++++-
 net/netfilter/nf_log.c           | 13 ++++++++++++-
 4 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/netfilter/nf_log_arp.c b/net/ipv4/netfilter/nf_log_arp.c
index ccfc78db12ee8acae68faf451f2cf6bc5597f2c1..8b39174b7be390397a110ec9d3ed497bf8ce6d26 100644
--- a/net/ipv4/netfilter/nf_log_arp.c
+++ b/net/ipv4/netfilter/nf_log_arp.c
@@ -130,7 +130,13 @@ static int __init nf_log_arp_init(void)
 	if (ret < 0)
 		return ret;
 
-	nf_log_register(NFPROTO_ARP, &nf_arp_logger);
+	ret = nf_log_register(NFPROTO_ARP, &nf_arp_logger);
+	if (ret < 0) {
+		pr_err("log: failed to register logger\n");
+		unregister_pernet_subsys(&nf_log_arp_net_ops);
+		return ret;
+	}
+
 	return 0;
 }
 
diff --git a/net/ipv4/netfilter/nf_log_ipv4.c b/net/ipv4/netfilter/nf_log_ipv4.c
index 078bdca1b607a167e05e7cf1bdfedccdd5aca92a..b3cb2ff6580343a9f7537aa2f48fd23858872b4d 100644
--- a/net/ipv4/netfilter/nf_log_ipv4.c
+++ b/net/ipv4/netfilter/nf_log_ipv4.c
@@ -366,7 +366,13 @@ static int __init nf_log_ipv4_init(void)
 	if (ret < 0)
 		return ret;
 
-	nf_log_register(NFPROTO_IPV4, &nf_ip_logger);
+	ret = nf_log_register(NFPROTO_IPV4, &nf_ip_logger);
+	if (ret < 0) {
+		pr_err("log: failed to register logger\n");
+		unregister_pernet_subsys(&nf_log_ipv4_net_ops);
+		return ret;
+	}
+
 	return 0;
 }
 
diff --git a/net/ipv6/netfilter/nf_log_ipv6.c b/net/ipv6/netfilter/nf_log_ipv6.c
index 7b17a0be93e7eccb2a26cd3294713d0f1112158d..b89576a5ca3e2b3964b8ce4aec09c8965496c2f2 100644
--- a/net/ipv6/netfilter/nf_log_ipv6.c
+++ b/net/ipv6/netfilter/nf_log_ipv6.c
@@ -398,7 +398,13 @@ static int __init nf_log_ipv6_init(void)
 	if (ret < 0)
 		return ret;
 
-	nf_log_register(NFPROTO_IPV6, &nf_ip6_logger);
+	ret = nf_log_register(NFPROTO_IPV6, &nf_ip6_logger);
+	if (ret < 0) {
+		pr_err("log: failed to register logger\n");
+		unregister_pernet_subsys(&nf_log_ipv6_net_ops);
+		return ret;
+	}
+
 	return 0;
 }
 
diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c
index daad6022c689c47a66a47e7a89a83c0c848c53d6..04495d9debe784827fd0cbcf5e541f10fa06839d 100644
--- a/net/netfilter/nf_log.c
+++ b/net/netfilter/nf_log.c
@@ -82,10 +82,21 @@ int nf_log_register(u_int8_t pf, struct nf_logger *logger)
 	mutex_lock(&nf_log_mutex);
 
 	if (pf == NFPROTO_UNSPEC) {
+		for (i = NFPROTO_UNSPEC; i < NFPROTO_NUMPROTO; i++) {
+			if (rcu_dereference_protected(loggers[i][logger->type],
+					lockdep_is_held(&nf_log_mutex))) {
+				mutex_unlock(&nf_log_mutex);
+				return -EEXIST;
+			}
+		}
 		for (i = NFPROTO_UNSPEC; i < NFPROTO_NUMPROTO; i++)
 			rcu_assign_pointer(loggers[i][logger->type], logger);
 	} else {
-		/* register at end of list to honor first register win */
+		if (rcu_dereference_protected(loggers[pf][logger->type],
+				lockdep_is_held(&nf_log_mutex))) {
+			mutex_unlock(&nf_log_mutex);
+			return -EEXIST;
+		}
 		rcu_assign_pointer(loggers[pf][logger->type], logger);
 	}
 
-- 
1.9.3


^ permalink raw reply related

* [PATCH] rtlwifi: prevent format string usage from leaking
From: Kees Cook @ 2014-10-20 21:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Larry Finger, Chaoming Li, John W. Linville, linux-wireless,
	netdev

Use "%s" in the workqueue allocation to make sure the rtl_hal_cfg name
can never accidentally leak information via a format string.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 drivers/net/wireless/rtlwifi/base.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/rtlwifi/base.c b/drivers/net/wireless/rtlwifi/base.c
index 58ba71830886..40b6d1d006d7 100644
--- a/drivers/net/wireless/rtlwifi/base.c
+++ b/drivers/net/wireless/rtlwifi/base.c
@@ -467,7 +467,7 @@ static void _rtl_init_deferred_work(struct ieee80211_hw *hw)
 		    rtl_easy_concurrent_retrytimer_callback, (unsigned long)hw);
 	/* <2> work queue */
 	rtlpriv->works.hw = hw;
-	rtlpriv->works.rtl_wq = alloc_workqueue(rtlpriv->cfg->name, 0, 0);
+	rtlpriv->works.rtl_wq = alloc_workqueue("%s", 0, 0, rtlpriv->cfg->name);
 	INIT_DELAYED_WORK(&rtlpriv->works.watchdog_wq,
 			  (void *)rtl_watchdog_wq_callback);
 	INIT_DELAYED_WORK(&rtlpriv->works.ips_nic_off_wq,
-- 
1.9.1


-- 
Kees Cook
Chrome OS Security

^ permalink raw reply related

* [PATCH net] bpf: fix bug in eBPF verifier
From: Alexei Starovoitov @ 2014-10-20 21:54 UTC (permalink / raw)
  To: David S. Miller
  Cc: Hannes Frederic Sowa, Daniel Borkmann, netdev, linux-kernel

while comparing for verifier state equivalency the comparison
was missing a check for uninitialized register.
Make sure it does so and add a testcase.

Fixes: f1bca824dabb ("bpf: add search pruning optimization to verifier")
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---

while we were staring at the verifier code with Hannes during LPC
something felt odd in this spot. Yes. It was a bug. Fix it.

 kernel/bpf/verifier.c       |    3 ++-
 samples/bpf/test_verifier.c |   11 +++++++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 801f5f3..9f81818 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1409,7 +1409,8 @@ static bool states_equal(struct verifier_state *old, struct verifier_state *cur)
 		if (memcmp(&old->regs[i], &cur->regs[i],
 			   sizeof(old->regs[0])) != 0) {
 			if (old->regs[i].type == NOT_INIT ||
-			    old->regs[i].type == UNKNOWN_VALUE)
+			    (old->regs[i].type == UNKNOWN_VALUE &&
+			     cur->regs[i].type != NOT_INIT))
 				continue;
 			return false;
 		}
diff --git a/samples/bpf/test_verifier.c b/samples/bpf/test_verifier.c
index f44ef11..eb4bec0 100644
--- a/samples/bpf/test_verifier.c
+++ b/samples/bpf/test_verifier.c
@@ -209,6 +209,17 @@ static struct bpf_test tests[] = {
 		.result = REJECT,
 	},
 	{
+		"program doesn't init R0 before exit in all branches",
+		.insns = {
+			BPF_JMP_IMM(BPF_JGE, BPF_REG_1, 0, 2),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 2),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "R0 !read_ok",
+		.result = REJECT,
+	},
+	{
 		"stack out of bounds",
 		.insns = {
 			BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
-- 
1.7.9.5

^ permalink raw reply related

* Re: localed stuck in recent 3.18 git in copy_net_ns?
From: Kevin Fenzi @ 2014-10-20 20:53 UTC (permalink / raw)
  To: Dave Jones; +Cc: netdev, linux-kernel
In-Reply-To: <20141020204326.GA25668@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 752 bytes --]

On Mon, 20 Oct 2014 16:43:26 -0400
Dave Jones <davej@redhat.com> wrote:

> I've seen similar soft lockup traces from the sys_unshare path when
> running my fuzz tester.  It seems that if you create enough network
> namespaces, it can take a huge amount of time for them to be iterated.
> (Running trinity with '-c unshare' you can see the slow down happen.
> In some cases, it takes so long that the watchdog process kills it --
>  though the SIGKILL won't get delivered until the unshare() completes)
> 
> Any idea what this machine had been doing prior to this that may have
> involved creating lots of namespaces ?

That was right after boot. ;) 

This is my main rawhide running laptop.

A 'ip netns list' shows nothing.

kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* [PATCH RFC v5 net 3/3] ipv6: Avoid redoing fib6_lookup() with reachable = 0 by saving fn
From: Martin KaFai Lau @ 2014-10-20 20:42 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Hannes Frederic Sowa
In-Reply-To: <1413837765-5446-1-git-send-email-kafai@fb.com>

This patch save the fn before doing rt6_backtrack.
Hence, without redo-ing the fib6_lookup(), saved_fn can be used
to redo rt6_select() with RT6_LOOKUP_F_REACHABLE off.

Some minor changes I think make sense to review as a single patch:
* Remove the 'out:' goto label.
* Remove the 'reachable' variable. Only use the 'strict' variable instead.

After this patch, "failing ip6_ins_rt()" should be the only case that
requires a redo of fib6_lookup().

Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 net/ipv6/route.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 98c523f..c910831 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -917,31 +917,40 @@ static struct rt6_info *rt6_alloc_clone(struct rt6_info *ort,
 static struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table, int oif,
 				      struct flowi6 *fl6, int flags)
 {
-	struct fib6_node *fn;
+	struct fib6_node *fn, *saved_fn;
 	struct rt6_info *rt, *nrt;
 	int strict = 0;
 	int attempts = 3;
 	int err;
-	int reachable = net->ipv6.devconf_all->forwarding ? 0 : RT6_LOOKUP_F_REACHABLE;
 
 	strict |= flags & RT6_LOOKUP_F_IFACE;
+	if (net->ipv6.devconf_all->forwarding == 0)
+		strict |= RT6_LOOKUP_F_REACHABLE;
 
 redo_fib6_lookup_lock:
 	read_lock_bh(&table->tb6_lock);
 
-redo_fib6_lookup:
 	fn = fib6_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
+	saved_fn = fn;
 
 redo_rt6_select:
-	rt = rt6_select(fn, oif, strict | reachable);
+	rt = rt6_select(fn, oif, strict);
 	if (rt->rt6i_nsiblings)
-		rt = rt6_multipath_select(rt, fl6, oif, strict | reachable);
+		rt = rt6_multipath_select(rt, fl6, oif, strict);
 	if (rt == net->ipv6.ip6_null_entry) {
 		fn = fib6_backtrack(fn, &fl6->saddr);
 		if (fn)
 			goto redo_rt6_select;
-		else
-			goto out;
+		else if (strict & RT6_LOOKUP_F_REACHABLE) {
+			/* also consider unreachable route */
+			strict &= ~RT6_LOOKUP_F_REACHABLE;
+			fn = saved_fn;
+			goto redo_rt6_select;
+		} else {
+			dst_hold(&rt->dst);
+			read_unlock_bh(&table->tb6_lock);
+			goto out2;
+		}
 	}
 
 	dst_hold(&rt->dst);
@@ -977,13 +986,6 @@ redo_rt6_select:
 	ip6_rt_put(rt);
 	goto redo_fib6_lookup_lock;
 
-out:
-	if (reachable) {
-		reachable = 0;
-		goto redo_fib6_lookup;
-	}
-	dst_hold(&rt->dst);
-	read_unlock_bh(&table->tb6_lock);
 out2:
 	rt->dst.lastuse = jiffies;
 	rt->dst.__use++;
-- 
1.8.1

^ permalink raw reply related

* [PATCH RFC v5 net 1/3] ipv6: Remove BACKTRACK macro
From: Martin KaFai Lau @ 2014-10-20 20:42 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Hannes Frederic Sowa
In-Reply-To: <1413837765-5446-1-git-send-email-kafai@fb.com>

It is the prep work to reduce the number of calls to fib6_lookup().

The BACKTRACK macro could be hard-to-read and error-prone due to
its side effects (mainly goto).

This patch is to:
1. Replace BACKTRACK macro with a function (fib6_backtrack) with the following
   return values:
   * If it is backtrack-able, returns next fn for retry.
   * If it reaches the root, returns NULL.
2. The caller needs to decide if a backtrack is needed (by testing
   rt == net->ipv6.ip6_null_entry).
3. Rename the goto labels in ip6_pol_route() to make the next few
   patches easier to read.

Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 net/ipv6/route.c | 70 ++++++++++++++++++++++++++++++++------------------------
 1 file changed, 40 insertions(+), 30 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index a318dd89..f1ab2f4 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -772,23 +772,22 @@ int rt6_route_rcv(struct net_device *dev, u8 *opt, int len,
 }
 #endif
 
-#define BACKTRACK(__net, saddr)			\
-do { \
-	if (rt == __net->ipv6.ip6_null_entry) {	\
-		struct fib6_node *pn; \
-		while (1) { \
-			if (fn->fn_flags & RTN_TL_ROOT) \
-				goto out; \
-			pn = fn->parent; \
-			if (FIB6_SUBTREE(pn) && FIB6_SUBTREE(pn) != fn) \
-				fn = fib6_lookup(FIB6_SUBTREE(pn), NULL, saddr); \
-			else \
-				fn = pn; \
-			if (fn->fn_flags & RTN_RTINFO) \
-				goto restart; \
-		} \
-	} \
-} while (0)
+static struct fib6_node* fib6_backtrack(struct fib6_node *fn,
+					struct in6_addr *saddr)
+{
+	struct fib6_node *pn;
+	while (1) {
+		if (fn->fn_flags & RTN_TL_ROOT)
+			return NULL;
+		pn = fn->parent;
+		if (FIB6_SUBTREE(pn) && FIB6_SUBTREE(pn) != fn)
+			fn = fib6_lookup(FIB6_SUBTREE(pn), NULL, saddr);
+		else
+			fn = pn;
+		if (fn->fn_flags & RTN_RTINFO)
+			return fn;
+	}
+}
 
 static struct rt6_info *ip6_pol_route_lookup(struct net *net,
 					     struct fib6_table *table,
@@ -804,8 +803,11 @@ restart:
 	rt = rt6_device_match(net, rt, &fl6->saddr, fl6->flowi6_oif, flags);
 	if (rt->rt6i_nsiblings && fl6->flowi6_oif == 0)
 		rt = rt6_multipath_select(rt, fl6, fl6->flowi6_oif, flags);
-	BACKTRACK(net, &fl6->saddr);
-out:
+	if (rt == net->ipv6.ip6_null_entry) {
+		fn = fib6_backtrack(fn, &fl6->saddr);
+		if (fn)
+			goto restart;
+	}
 	dst_use(&rt->dst, jiffies);
 	read_unlock_bh(&table->tb6_lock);
 	return rt;
@@ -924,19 +926,25 @@ static struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 
 	strict |= flags & RT6_LOOKUP_F_IFACE;
 
-relookup:
+redo_fib6_lookup_lock:
 	read_lock_bh(&table->tb6_lock);
 
-restart_2:
+redo_fib6_lookup:
 	fn = fib6_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
 
-restart:
+redo_rt6_select:
 	rt = rt6_select(fn, oif, strict | reachable);
 	if (rt->rt6i_nsiblings)
 		rt = rt6_multipath_select(rt, fl6, oif, strict | reachable);
-	BACKTRACK(net, &fl6->saddr);
-	if (rt == net->ipv6.ip6_null_entry ||
-	    rt->rt6i_flags & RTF_CACHE)
+	if (rt == net->ipv6.ip6_null_entry) {
+		fn = fib6_backtrack(fn, &fl6->saddr);
+		if (fn)
+			goto redo_rt6_select;
+		else
+			goto out;
+	}
+
+	if (rt->rt6i_flags & RTF_CACHE)
 		goto out;
 
 	dst_hold(&rt->dst);
@@ -967,12 +975,12 @@ restart:
 	 * released someone could insert this route.  Relookup.
 	 */
 	ip6_rt_put(rt);
-	goto relookup;
+	goto redo_fib6_lookup_lock;
 
 out:
 	if (reachable) {
 		reachable = 0;
-		goto restart_2;
+		goto redo_fib6_lookup;
 	}
 	dst_hold(&rt->dst);
 	read_unlock_bh(&table->tb6_lock);
@@ -1235,10 +1243,12 @@ restart:
 		rt = net->ipv6.ip6_null_entry;
 	else if (rt->dst.error) {
 		rt = net->ipv6.ip6_null_entry;
-		goto out;
+	} else if (rt == net->ipv6.ip6_null_entry) {
+		fn = fib6_backtrack(fn, &fl6->saddr);
+		if (fn)
+			goto restart;
 	}
-	BACKTRACK(net, &fl6->saddr);
-out:
+
 	dst_hold(&rt->dst);
 
 	read_unlock_bh(&table->tb6_lock);
-- 
1.8.1

^ permalink raw reply related

* [PATCH RFC v5 net 2/3] ipv6: Avoid redoing fib6_lookup() for RTF_CACHE hit case
From: Martin KaFai Lau @ 2014-10-20 20:42 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Hannes Frederic Sowa
In-Reply-To: <1413837765-5446-1-git-send-email-kafai@fb.com>

When there is a RTF_CACHE hit, no need to redo fib6_lookup()
with reachable=0.

Cc: David Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 net/ipv6/route.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f1ab2f4..98c523f 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -944,12 +944,12 @@ redo_rt6_select:
 			goto out;
 	}
 
-	if (rt->rt6i_flags & RTF_CACHE)
-		goto out;
-
 	dst_hold(&rt->dst);
 	read_unlock_bh(&table->tb6_lock);
 
+	if (rt->rt6i_flags & RTF_CACHE)
+		goto out2;
+
 	if (!(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY)))
 		nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
 	else if (!(rt->dst.flags & DST_HOST))
-- 
1.8.1

^ permalink raw reply related

* [PATCH RFC v5 net 0/3] ipv6: Reduce the number of fib6_lookup() calls from ip6_pol_route()
From: Martin KaFai Lau @ 2014-10-20 20:42 UTC (permalink / raw)
  To: netdev

Hi,

This patch set is trying to reduce the number of fib6_lookup()
calls from ip6_pol_route().

I have adapted davem's udpflooda and kbench_mod test
(https://git.kernel.org/pub/scm/linux/kernel/git/davem/net_test_tools.git) to
support IPv6 and here is the result:


Before:
[root]# for i in $(seq 1 3); do time ./udpflood -l 20000000 -c 250 2401:face:face:face::2; done

real    0m34.190s
user    0m3.047s
sys     0m31.108s

real    0m34.635s
user    0m3.125s
sys     0m31.475s

real    0m34.517s
user    0m3.034s
sys     0m31.449s

[root]# insmod ip6_route_kbench.ko oif=2 src=2401:face:face:face::1 dst=2401:face:face:face::2
[  660.160976] ip6_route_kbench: ip6_route_output tdiff: 933
[  660.207261] ip6_route_kbench: ip6_route_output tdiff: 988
[  660.253492] ip6_route_kbench: ip6_route_output tdiff: 896
[  660.298862] ip6_route_kbench: ip6_route_output tdiff: 898

After:
[root]# for i in $(seq 1 3); do time ./udpflood -l 20000000 -c 250 2401:face:face:face::2; done

real    0m32.695s
user    0m2.925s
sys     0m29.737s

real    0m32.636s
user    0m3.007s
sys     0m29.596s

real    0m32.797s
user    0m2.866s
sys     0m29.898s

[root]# insmod ip6_route_kbench.ko oif=2 src=2401:face:face:face::1 dst=2401:face:face:face::2
[  881.220793] ip6_route_kbench: ip6_route_output tdiff: 684
[  881.253477] ip6_route_kbench: ip6_route_output tdiff: 640
[  881.286867] ip6_route_kbench: ip6_route_output tdiff: 630
[  881.320749] ip6_route_kbench: ip6_route_output tdiff: 653


/****************************** udpflood.c ******************************/
/* It is an adaptation of the Eric Dumazet's and David Miller's
 * udpflood tool, by adding IPv6 support.
 */

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <malloc.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <stdint.h>
#include <assert.h>

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define _GNU_SOURCE
#include <getopt.h>

typedef uint32_t u32;

static int debug =3D 0;

/* Allow -fstrict-aliasing */
typedef union sa_u {
	struct sockaddr_storage a46;
	struct sockaddr_in a4;
	struct sockaddr_in6 a6;
} sa_u;

static int usage(void)
{
	printf("usage: udpflood [ -l count ] [ -m message_size ] [ -c num_ip_addrs=
 ] IP_ADDRESS\n");
	return -1;
}

static u32 get_last32h(const sa_u *sa)
{
	if (sa->a46.ss_family =3D=3D PF_INET)
		return ntohl(sa->a4.sin_addr.s_addr);
	else
		return ntohl(sa->a6.sin6_addr.s6_addr32[3]);
}

static void set_last32h(sa_u *sa, u32 last32h)
{
	if (sa->a46.ss_family =3D=3D PF_INET)
		sa->a4.sin_addr.s_addr =3D htonl(last32h);
	else
		sa->a6.sin6_addr.s6_addr32[3] =3D htonl(last32h);
}

static void print_saddr(const sa_u *sa, const char *msg)
{
	char buf[64];

	if (!debug)
		return;

	switch (sa->a46.ss_family) {
	case PF_INET:
		inet_ntop(PF_INET, &(sa->a4.sin_addr.s_addr), buf,
			  sizeof(buf));
		break;
	case PF_INET6:
		inet_ntop(PF_INET6, &(sa->a6.sin6_addr), buf, sizeof(buf));
		break;
	}

	printf("%s: %s\n", msg, buf);
}

static int send_packets(const sa_u *sa, size_t num_addrs, int count, int ms=
g_sz)
{
	char *msg =3D malloc(msg_sz);
	sa_u saddr;
	u32 start_addr32h, end_addr32h, cur_addr32h;
	int fd, i, err;

	if (!msg)
		return -ENOMEM;

	memset(msg, 0, msg_sz);

	memcpy(&saddr, sa, sizeof(saddr));
	cur_addr32h =3D start_addr32h =3D get_last32h(&saddr);
	end_addr32h =3D start_addr32h + num_addrs;

	fd =3D socket(saddr.a46.ss_family, SOCK_DGRAM, 0);
	if (fd < 0) {
		perror("socket");
		err =3D fd;
		goto out_nofd;
	}

	/* connect to avoid the kernel spending time in figuring
	 * out the source address (i.e pin the src address)
	 */
	err =3D connect(fd, (struct sockaddr *) &saddr, sizeof(saddr));
	if (err < 0) {
		perror("connect");
		goto out;
	}

	print_saddr(&saddr, "start_addr");
	for (i =3D 0; i < count; i++) {
		print_saddr(&saddr, "sendto");
		err =3D sendto(fd, msg, msg_sz, 0, (struct sockaddr *)&saddr,
			     sizeof(saddr));
		if (err < 0) {
			perror("sendto");
			goto out;
		}

		if (++cur_addr32h >=3D end_addr32h)
			cur_addr32h =3D start_addr32h;
		set_last32h(&saddr, cur_addr32h);
	}

	err =3D 0;
out:
	close(fd);
out_nofd:
	free(msg);
	return err;
}

int main(int argc, char **argv, char **envp)
{
	int port, msg_sz, count, num_addrs, ret;

	sa_u start_addr;

	port =3D 6000;
	msg_sz =3D 32;
	count =3D 10000000;
	num_addrs =3D 1;

	while ((ret =3D getopt(argc, argv, "dl:s:p:c:")) >=3D 0) {
		switch (ret) {
		case 'l':
			sscanf(optarg, "%d", &count);
			break;
		case 's':
			sscanf(optarg, "%d", &msg_sz);
			break;
		case 'p':
			sscanf(optarg, "%d", &port);
			break;
		case 'c':
			sscanf(optarg, "%d", &num_addrs);
			break;
		case 'd':
			debug =3D 1;
			break;
		case '?':
			return usage();
		}
	}

	if (num_addrs < 1)
		return usage();

	if (!argv[optind])
		return usage();

	start_addr.a4.sin_port =3D htons(port);
	if (inet_pton(PF_INET, argv[optind], &start_addr.a4.sin_addr))
		start_addr.a46.ss_family =3D PF_INET;
	else if (inet_pton(PF_INET6, argv[optind], &start_addr.a6.sin6_addr.s6_add=
r))
		start_addr.a46.ss_family =3D PF_INET6;
	else
		return usage();

	return send_packets(&start_addr, num_addrs, count, msg_sz);
}

/****************** ip6_route_kbench_mod.c ******************/
#define pr_fmt(fmt) "ip6_route_kbench: " fmt

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/inet.h>
#include <linux/in6.h>

#include <net/route.h>
#include <net/ip6_route.h>

#include <linux/timex.h>
#include <uapi/linux/icmpv6.h>

/* We can't just use "get_cycles()" as on some platforms, such
 * as sparc64, that gives system cycles rather than cpu clock
 * cycles.
 */

#ifdef CONFIG_SPARC64
static inline unsigned long long get_tick(void)
{
	unsigned long long t;

	__asm__ __volatile__("rd %%tick, %0" : "=r" (t));
	return t;
}
#elif defined(CONFIG_X86)
static inline unsigned long long get_tick(void)
{
	unsigned long long t;

	rdtscll(t);

	return t;
}
#elif defined(CONFIG_POWERPC)
static inline unsigned long long get_tick(void)
{
	return get_cycles();
}
#else
#error Unsupported architecture, please implement get_tick()
#endif

#define DEFAULT_WARMUP_COUNT 100000

#define DEFAULT_DST_IP_ADDR	0x4a800001
#define DEFAULT_SRC_IP_ADDR	0x00000000
#define DEFAULT_OIF		0
#define DEFAULT_IIF		0
#define DEFAULT_MARK		0x00000000
#define DEFAULT_TOS		0x00

static int flow_oif = DEFAULT_OIF;
static int flow_iif = DEFAULT_IIF;
static u32 flow_mark = DEFAULT_MARK;
static struct in6_addr flow_dst_ip_addr;
static struct in6_addr flow_src_ip_addr;
static int flow_tos = DEFAULT_TOS;

static char dst_string[64];
static char src_string[64];

module_param_string(dst, dst_string, sizeof(dst_string), 0);
module_param_string(src, src_string, sizeof(src_string), 0);

static int __init flow_setup(void)
{
	if (dst_string[0] &&
	    !in6_pton(dst_string, -1, &flow_dst_ip_addr.s6_addr[0], -1, NULL)) {
		pr_info("cannot parse \"%s\"\n", dst_string);
		return -1;
	}

	if (src_string[0] &&
	    !in6_pton(src_string, -1, &flow_src_ip_addr.s6_addr[0], -1, NULL)) {
		pr_info("cannot parse \"%s\"\n", dst_string);
		return -1;
	}

	return 0;
}

module_param_named(oif, flow_oif, int, 0);
module_param_named(iif, flow_iif, int, 0);
module_param_named(mark, flow_mark, uint, 0);
module_param_named(tos, flow_tos, int, 0);

static int warmup_count = DEFAULT_WARMUP_COUNT;
module_param_named(count, warmup_count, int, 0);

static void flow_init(struct flowi6 *fl6)
{
	memset(fl6, 0, sizeof(*fl6));
	fl6->flowi6_proto = IPPROTO_ICMPV6;
	fl6->flowi6_oif = flow_oif;
	fl6->flowi6_iif = flow_iif;
	fl6->flowi6_mark = flow_mark;
	fl6->flowi6_tos = flow_tos;
	fl6->daddr = flow_dst_ip_addr;
	fl6->saddr = flow_src_ip_addr;
}

static struct sk_buff * fake_skb_get(void)
{
	struct ipv6hdr *hdr;
	struct sk_buff *skb;

	skb = alloc_skb(4096, GFP_KERNEL);
	if (!skb) {
		pr_info("Cannot alloc SKB for test\n");
		return NULL;
	}
	skb->dev = __dev_get_by_index(&init_net, flow_iif);
	if (skb->dev == NULL) {
		pr_info("Input device (%d) does not exist\n", flow_iif);
		goto err;
	}

	skb_reset_mac_header(skb);
	skb_reset_network_header(skb);
	skb_reserve(skb, MAX_HEADER + sizeof(struct ipv6hdr));
	hdr = ipv6_hdr(skb);

	hdr->priority = 0;
	hdr->version = 6;
	memset(hdr->flow_lbl, 0, sizeof(hdr->flow_lbl));
	hdr->payload_len = htons(sizeof(struct icmp6hdr));
	hdr->nexthdr = IPPROTO_ICMPV6;
	hdr->saddr = flow_src_ip_addr;
	hdr->daddr = flow_dst_ip_addr;
	skb->protocol = htons(ETH_P_IPV6);
	skb->mark = flow_mark;

	return skb;
err:
	kfree_skb(skb);
	return NULL;
}

static void do_full_output_lookup_bench(void)
{
	unsigned long long t1, t2, tdiff;
	struct rt6_info *rt;
	struct flowi6 fl6;
	int i;

	rt = NULL;

	for (i = 0; i < warmup_count; i++) {
		flow_init(&fl6);

		rt = (struct rt6_info *)ip6_route_output(&init_net, NULL, &fl6);
		if (IS_ERR(rt))
			break;
		ip6_rt_put(rt);
	}
	if (IS_ERR(rt)) {
		pr_info("ip_route_output_key: err=%ld\n", PTR_ERR(rt));
		return;
	}

	flow_init(&fl6);

	t1 = get_tick();
	rt = (struct rt6_info *)ip6_route_output(&init_net, NULL, &fl6);
	t2 = get_tick();
	if (!IS_ERR(rt))
		ip6_rt_put(rt);

	tdiff = t2 - t1;
	pr_info("ip6_route_output tdiff: %llu\n", tdiff);
}

static void do_full_input_lookup_bench(void)
{
	unsigned long long t1, t2, tdiff;
	struct sk_buff *skb;
	struct rt6_info *rt;
	int err, i;

	skb = fake_skb_get();
	if (skb == NULL)
		goto out_free;

	err = 0;
	local_bh_disable();
	for (i = 0; i < warmup_count; i++) {
		ip6_route_input(skb);
		rt = (struct rt6_info *)skb_dst(skb);
		err = (!rt || rt == init_net.ipv6.ip6_null_entry);
		skb_dst_drop(skb);
		if (err)
			break;
	}
	local_bh_enable();

	if (err) {
		pr_info("Input route lookup fails\n");
		goto out_free;
	}

	local_bh_disable();
	t1 = get_tick();
	ip6_route_input(skb);
	t2 = get_tick();
	local_bh_enable();

	rt = (struct rt6_info *)skb_dst(skb);
	err = (!rt || rt == init_net.ipv6.ip6_null_entry);
	skb_dst_drop(skb);
	if (err) {
		pr_info("Input route lookup fails\n");
		goto out_free;
	}

	tdiff = t2 - t1;
	pr_info("ip6_route_input tdiff: %llu\n", tdiff);

out_free:
	kfree_skb(skb);
}

static void do_full_lookup_bench(void)
{
	if (!flow_iif)
		do_full_output_lookup_bench();
	else
		do_full_input_lookup_bench();
}

static void do_bench(void)
{
	do_full_lookup_bench();
	do_full_lookup_bench();
	do_full_lookup_bench();
	do_full_lookup_bench();
}

static int __init kbench_init(void)
{
	if (flow_setup())
		return -EINVAL;

	pr_info("flow [IIF(%d),OIF(%d),MARK(0x%08x),D("IP6_FMT"),"
		"S("IP6_FMT"),TOS(0x%02x)]\n",
		flow_iif, flow_oif, flow_mark,
		IP6_PRT(flow_dst_ip_addr),
		IP6_PRT(flow_src_ip_addr),
		flow_tos);

#if defined(CONFIG_X86)
	if (!cpu_has_tsc) {
		pr_err("X86 TSC is required, but is unavailable.\n");
		return -EINVAL;
	}
#endif

	pr_info("sizeof(struct rt6_info)==%zu\n", sizeof(struct rt6_info));

	do_bench();

	return -ENODEV;
}

static void __exit kbench_exit(void)
{
}

module_init(kbench_init);
module_exit(kbench_exit);
MODULE_LICENSE("GPL");

^ permalink raw reply

* Re: localed stuck in recent 3.18 git in copy_net_ns?
From: Dave Jones @ 2014-10-20 20:43 UTC (permalink / raw)
  To: Kevin Fenzi; +Cc: netdev, linux-kernel
In-Reply-To: <20141020141515.0688bf33@voldemort.scrye.com>

On Mon, Oct 20, 2014 at 02:15:15PM -0600, Kevin Fenzi wrote:
 
 > I'm seeing suspend/resume failures with recent 3.18 git kernels. 
 > 
 > Full dmesg at: http://paste.fedoraproject.org/143615/83287914/
 > 
 > The possibly interesting parts: 
 > 
 > [   78.373144] PM: Syncing filesystems ... done.
 > [   78.411180] PM: Preparing system for mem sleep
 > [   78.411995] Freezing user space processes ... 
 > [   98.429955] Freezing of tasks failed after 20.001 seconds (1 tasks refusing to freeze, wq_busy=0):
 > [   98.429971] (-localed)      D ffff88025f214c80     0  1866      1 0x00000084
 > [   98.429975]  ffff88024e777df8 0000000000000086 ffff88009b4444b0 0000000000014c80
 > [   98.429978]  ffff88024e777fd8 0000000000014c80 ffff880250ffb110 ffff88009b4444b0
 > [   98.429981]  0000000000000000 ffffffff81cec1a0 ffffffff81cec1a4 ffff88009b4444b0
 > [   98.429983] Call Trace:
 > [   98.429991]  [<ffffffff8175d619>] schedule_preempt_disabled+0x29/0x70
 > [   98.429994]  [<ffffffff8175f433>] __mutex_lock_slowpath+0xb3/0x120
 > [   98.429997]  [<ffffffff8175f4c3>] mutex_lock+0x23/0x40
 > [   98.430001]  [<ffffffff8163e325>] copy_net_ns+0x75/0x140
 > [   98.430005]  [<ffffffff810b8c2d>] create_new_namespaces+0xfd/0x1a0
 > [   98.430008]  [<ffffffff810b8e5a>] unshare_nsproxy_namespaces+0x5a/0xc0
 > [   98.430012]  [<ffffffff81098813>] SyS_unshare+0x193/0x340
 > [   98.430015]  [<ffffffff817617a9>] system_call_fastpath+0x12/0x17

I've seen similar soft lockup traces from the sys_unshare path when running my
fuzz tester.  It seems that if you create enough network namespaces,
it can take a huge amount of time for them to be iterated.
(Running trinity with '-c unshare' you can see the slow down happen. In
 some cases, it takes so long that the watchdog process kills it --
 though the SIGKILL won't get delivered until the unshare() completes)

Any idea what this machine had been doing prior to this that may have
involved creating lots of namespaces ?

	Dave

^ permalink raw reply

* Re: Routing BUG with ppp over l2tp
From: James Carlson @ 2014-10-20 20:22 UTC (permalink / raw)
  To: Alan Stern; +Cc: James Chapman, linux-ppp, netdev
In-Reply-To: <Pine.LNX.4.44L0.1410201414500.2403-100000@iolanthe.rowland.org>

On 10/20/14 15:45, Alan Stern wrote:
> On Mon, 20 Oct 2014, James Carlson wrote:
>> Indeed!  That's pretty darned lame behavior by that peer.  It would
>> probably be workable if you had a virtual router instance and were able
>> to put the L2TP connection in one routing instance and the PPP
>> connection in another routing instance, but that's likely not at all
>> simple to achieve.
> 
> I'd like to find the simplest solution.  Ideally it should "just work", 
> like the Windows and OS-X clients do.

I'm not an expert on Windows networking internals.  I assume OS X is BSD
+ whatever the folks in Cupertino have done to it.  :-/

At a guess, it's living on the edge.  It works because the L2TP
connection establishment caches a pointer to the output forwarding table
entry ("route") and just keeps living with it no matter what actually
happens down the line.

On Linux (and likely many other systems), the output computation is a
bit more dynamic, and the establishment of a direct point-to-point link
to a given IP address (as the PPP link represents) causes existing
cached pointers to get flushed away.  Future packets to that destination
(IP _always_ forwards based on destination, not source) go down the most
direct path.  Point-to-point is as direct as you can get.

It may be possible to modify the L2TP code to use flags to avoid the PPP
link (MSG_DONTROUTE?), but I suspect that's probably a bad rather than a
good thing to do.

>>> Unfortunately, I can't work around this problem by reconfiguring the
>>> VPN server -- there's no way to tell it to use a different IP address
>>> for its end of the VPN tunnel.  Furthermore, the server works just fine
>>> with clients running Windows or OS-X.
>>
>> Really?  That seems ... improbable.
> 
> I guess that depends on how you judge probabilities.  :-)

:-/

> Internet:
> Destination        Gateway            Flags    Refs      Use  Netif Expire
> default            140.247.233.37     UGSc        2        4   ppp0
> 10                 ppp0               USc         1        0   ppp0

That's *quite* interesting!  The PPP link doesn't have an interface
route as you'd find on most other systems.  Instead, it has what appears
to be an effectively unnumbered link.  Note the "ppp0" there instead of
an actual output address and the happy use of "10" for the local address
+ mask.

For what it's worth, the forced IP address option I've suggested is
morally equivalent to what's being done here on OS X, so that's a fair
reason to recommend it.

I checked out the pppd on Mac OS X (Darwin 13.4.0; Mavericks), and it
looks to be a variant of the SAMBA/ANU/CMU pppd, but I'm not sure what's
different with it, and I know of no contributions from them.  And the
BSD support is long gone from the main source base ...

> I don't understand (and can't be bothered to look up) those arcane
> symbols in the netstat output.  The IP address used for the ping test
> (10.160.0.2) is a system on the VPN's private network.

The flags aren't all that interesting.  "Up" "Gateway" "Static"
"cloning" are all expected in this context.

>> As long as you don't need to contact that specific remote server using
>> the badly-assigned "internal" VPN address and can live with the fact
>> that you'll either go through the regular Internet to that address or be
>> forced to use some other address configured on that server, you should
>> be good.
>>
>> (The address I used above is 10.160.0.2.  That was one of the internal
>> DNS server addresses provided in the log you posted.  It's not necessary
>> that the address used here is exactly that, but it may well be helpful.)
> 
> That might work.  But using a nonstandard version of pppd would be
> awkward, and I would prefer to avoid it.

What's "non-standard?"

Having the ability to force a given remote IP address looks to me like a
perfectly reasonable thing to do.  We allow the remote IP address to be
set arbitrarily when the peer (for whatever reason) refuses to divulge
its address, and this is just an extension of that idea.

>> If you can't do that for some reason, then I suppose it would be
>> possible to use IP Chains (or whatever the packet-modification tool du
>> jure is used in your Linux distribution) to nail up an exception so that
>> the outside packets go to the outside interface and the inside ones go
>> to the PPP interface.  Doing that likely requires selecting on (at
>> least!) source address, so it's messy and ugly and possibly error-prone,
>> but it might be doable.
> 
> That sounds like a fairly easy thing to try.  But it would still 
> require manual intervention instead of just working.  Fixing the kernel 
> would be preferable, IMO.

I don't quite agree that it's necessarily "broken."

I do agree that it's bad to crash due to this misconfiguration.  That's
certainly a bug of some sort.  But making the kernel "naturally" accept
that the same unicast remote IP address refers to different outputs
depending on phase-of-moon in order to make this weird server happy
sounds like adding a bug rather than fixing one.

Routing based on destination is a good thing.

>> Otherwise, contact the maintainer of that VPN server.  It's just plain
>> old broken, and life's too short for broken software.
> 
> It is an old Cisco security appliance, no doubt well past End-Of-Life.  
> I'm starting to think it might be preferable to throw the thing away 
> and start up a VPN server on the department's firewall (which is a 
> Linux box) instead.

That sounds like a good (and easier to support) solution.

-- 
James Carlson         42.703N 71.076W         <carlsonj@workingcode.com>

^ permalink raw reply

* localed stuck in recent 3.18 git in copy_net_ns?
From: Kevin Fenzi @ 2014-10-20 20:15 UTC (permalink / raw)
  To: netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2849 bytes --]

Greetings. 

I'm seeing suspend/resume failures with recent 3.18 git kernels. 

Full dmesg at: http://paste.fedoraproject.org/143615/83287914/

The possibly interesting parts: 

[   78.373144] PM: Syncing filesystems ... done.
[   78.411180] PM: Preparing system for mem sleep
[   78.411995] Freezing user space processes ... 
[   98.429955] Freezing of tasks failed after 20.001 seconds (1 tasks refusing to freeze, wq_busy=0):
[   98.429971] (-localed)      D ffff88025f214c80     0  1866      1 0x00000084
[   98.429975]  ffff88024e777df8 0000000000000086 ffff88009b4444b0 0000000000014c80
[   98.429978]  ffff88024e777fd8 0000000000014c80 ffff880250ffb110 ffff88009b4444b0
[   98.429981]  0000000000000000 ffffffff81cec1a0 ffffffff81cec1a4 ffff88009b4444b0
[   98.429983] Call Trace:
[   98.429991]  [<ffffffff8175d619>] schedule_preempt_disabled+0x29/0x70
[   98.429994]  [<ffffffff8175f433>] __mutex_lock_slowpath+0xb3/0x120
[   98.429997]  [<ffffffff8175f4c3>] mutex_lock+0x23/0x40
[   98.430001]  [<ffffffff8163e325>] copy_net_ns+0x75/0x140
[   98.430005]  [<ffffffff810b8c2d>] create_new_namespaces+0xfd/0x1a0
[   98.430008]  [<ffffffff810b8e5a>] unshare_nsproxy_namespaces+0x5a/0xc0
[   98.430012]  [<ffffffff81098813>] SyS_unshare+0x193/0x340
[   98.430015]  [<ffffffff817617a9>] system_call_fastpath+0x12/0x17

[   98.430032] Restarting tasks ... done.
[   98.480361] PM: Syncing filesystems ... done.
[   98.571645] PM: Preparing system for freeze sleep
[   98.571779] Freezing user space processes ... 
[  118.592086] Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
[  118.592102] (-localed)      D ffff88025f214c80     0  1866      1 0x00000084
[  118.592106]  ffff88024e777df8 0000000000000086 ffff88009b4444b0 0000000000014c80
[  118.592109]  ffff88024e777fd8 0000000000014c80 ffff880250ffb110 ffff88009b4444b0
[  118.592111]  0000000000000000 ffffffff81cec1a0 ffffffff81cec1a4 ffff88009b4444b0
[  118.592114] Call Trace:
[  118.592121]  [<ffffffff8175d619>] schedule_preempt_disabled+0x29/0x70
[  118.592125]  [<ffffffff8175f433>] __mutex_lock_slowpath+0xb3/0x120
[  118.592127]  [<ffffffff8175f4c3>] mutex_lock+0x23/0x40
[  118.592132]  [<ffffffff8163e325>] copy_net_ns+0x75/0x140
[  118.592136]  [<ffffffff810b8c2d>] create_new_namespaces+0xfd/0x1a0
[  118.592139]  [<ffffffff810b8e5a>] unshare_nsproxy_namespaces+0x5a/0xc0
[  118.592143]  [<ffffffff81098813>] SyS_unshare+0x193/0x340
[  118.592146]  [<ffffffff817617a9>] system_call_fastpath+0x12/0x17

[  118.592163] Restarting tasks ... done.

root         6  0.0  0.0      0     0 ?        D    13:49   0:00 [kworker/u16:0]
root      1876  0.0  0.0  41460  5784 ?        Ds   13:49   0:00 (-localed)

I'll try and bisect this, but perhaps it rings bells already for folks. 

kevin


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* [PATCH] netlink: Re-add locking to netlink_lookup() and seq walker
From: Thomas Graf @ 2014-10-20 19:53 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Eric Dumazet, Sasha Levin, paulmck, Nikolay Aleksandrov,
	David S. Miller, netdev, linux-kernel, Ursula Braun
In-Reply-To: <20141020082107.GB4268@osiris>

Heiko,

Can you test the following patch:

The synchronize_rcu() in netlink_release() introduces unacceptable
latency. Reintroduce minimal lookup so we can drop the
synchronize_rcu() until socket destruction has been RCUfied.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/netlink/af_netlink.c | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 7a186e7..f1de72d 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -96,6 +96,14 @@ static DECLARE_WAIT_QUEUE_HEAD(nl_table_wait);
 static int netlink_dump(struct sock *sk);
 static void netlink_skb_destructor(struct sk_buff *skb);
 
+/* nl_table locking explained:
+ * Lookup and traversal are protected with nl_sk_hash_lock or nl_table_lock
+ * combined with an RCU read-side lock. Insertion and removal are protected
+ * with nl_sk_hash_lock while using RCU list modification primitives and may
+ * run in parallel to nl_table_lock protected lookups. Destruction of the
+ * Netlink socket may only occur *after* nl_table_lock has been acquired
+ * either during or after the socket has been removed from the list.
+ */
 DEFINE_RWLOCK(nl_table_lock);
 EXPORT_SYMBOL_GPL(nl_table_lock);
 static atomic_t nl_table_users = ATOMIC_INIT(0);
@@ -109,10 +117,10 @@ EXPORT_SYMBOL_GPL(nl_sk_hash_lock);
 static int lockdep_nl_sk_hash_is_held(void)
 {
 #ifdef CONFIG_LOCKDEP
-	return (debug_locks) ? lockdep_is_held(&nl_sk_hash_lock) : 1;
-#else
-	return 1;
+	if (debug_locks)
+		return lockdep_is_held(&nl_sk_hash_lock) || lockdep_is_held(&nl_table_lock);
 #endif
+	return 1;
 }
 
 static ATOMIC_NOTIFIER_HEAD(netlink_chain);
@@ -1028,11 +1036,13 @@ static struct sock *netlink_lookup(struct net *net, int protocol, u32 portid)
 	struct netlink_table *table = &nl_table[protocol];
 	struct sock *sk;
 
+	read_lock(&nl_table_lock);
 	rcu_read_lock();
 	sk = __netlink_lookup(table, portid, net);
 	if (sk)
 		sock_hold(sk);
 	rcu_read_unlock();
+	read_unlock(&nl_table_lock);
 
 	return sk;
 }
@@ -1257,9 +1267,6 @@ static int netlink_release(struct socket *sock)
 	}
 	netlink_table_ungrab();
 
-	/* Wait for readers to complete */
-	synchronize_net();
-
 	kfree(nlk->groups);
 	nlk->groups = NULL;
 
@@ -1281,6 +1288,7 @@ static int netlink_autobind(struct socket *sock)
 
 retry:
 	cond_resched();
+	netlink_table_grab();
 	rcu_read_lock();
 	if (__netlink_lookup(table, portid, net)) {
 		/* Bind collision, search negative portid values. */
@@ -1288,9 +1296,11 @@ retry:
 		if (rover > -4097)
 			rover = -4097;
 		rcu_read_unlock();
+		netlink_table_ungrab();
 		goto retry;
 	}
 	rcu_read_unlock();
+	netlink_table_ungrab();
 
 	err = netlink_insert(sk, net, portid);
 	if (err == -EADDRINUSE)
@@ -2921,14 +2931,16 @@ static struct sock *netlink_seq_socket_idx(struct seq_file *seq, loff_t pos)
 }
 
 static void *netlink_seq_start(struct seq_file *seq, loff_t *pos)
-	__acquires(RCU)
+	__acquires(nl_table_lock) __acquires(RCU)
 {
+	read_lock(&nl_table_lock);
 	rcu_read_lock();
 	return *pos ? netlink_seq_socket_idx(seq, *pos - 1) : SEQ_START_TOKEN;
 }
 
 static void *netlink_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
+	struct rhashtable *ht;
 	struct netlink_sock *nlk;
 	struct nl_seq_iter *iter;
 	struct net *net;
@@ -2943,19 +2955,19 @@ static void *netlink_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 	iter = seq->private;
 	nlk = v;
 
-	rht_for_each_entry_rcu(nlk, nlk->node.next, node)
+	i = iter->link;
+	ht = &nl_table[i].hash;
+	rht_for_each_entry(nlk, nlk->node.next, ht, node)
 		if (net_eq(sock_net((struct sock *)nlk), net))
 			return nlk;
 
-	i = iter->link;
 	j = iter->hash_idx + 1;
 
 	do {
-		struct rhashtable *ht = &nl_table[i].hash;
 		const struct bucket_table *tbl = rht_dereference_rcu(ht->tbl, ht);
 
 		for (; j < tbl->size; j++) {
-			rht_for_each_entry_rcu(nlk, tbl->buckets[j], node) {
+			rht_for_each_entry(nlk, tbl->buckets[j], ht, node) {
 				if (net_eq(sock_net((struct sock *)nlk), net)) {
 					iter->link = i;
 					iter->hash_idx = j;
@@ -2971,9 +2983,10 @@ static void *netlink_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 }
 
 static void netlink_seq_stop(struct seq_file *seq, void *v)
-	__releases(RCU)
+	__releases(RCU) __releases(nl_table_lock)
 {
 	rcu_read_unlock();
+	read_unlock(&nl_table_lock);
 }
 
 
-- 
1.9.3

^ permalink raw reply related

* Re: Routing BUG with ppp over l2tp
From: Alan Stern @ 2014-10-20 19:45 UTC (permalink / raw)
  To: James Carlson; +Cc: James Chapman, linux-ppp, netdev
In-Reply-To: <5445442C.8080508@workingcode.com>

On Mon, 20 Oct 2014, James Carlson wrote:

> On 10/20/14 12:39, Alan Stern wrote:
> > As far as I can tell, the problem is caused by bad routing.  The kernel
> > gets confused because the IP address assigned by the VPN server to the
> > server's end of the ppp tunnel is the _same_ as the server's actual IP
> > address.
> 
> Indeed!  That's pretty darned lame behavior by that peer.  It would
> probably be workable if you had a virtual router instance and were able
> to put the L2TP connection in one routing instance and the PPP
> connection in another routing instance, but that's likely not at all
> simple to achieve.

I'd like to find the simplest solution.  Ideally it should "just work", 
like the Windows and OS-X clients do.

> > Unfortunately, I can't work around this problem by reconfiguring the
> > VPN server -- there's no way to tell it to use a different IP address
> > for its end of the VPN tunnel.  Furthermore, the server works just fine
> > with clients running Windows or OS-X.
> 
> Really?  That seems ... improbable.

I guess that depends on how you judge probabilities.  :-)

As evidence to convince you, here's a log of a session on a rather old
Mac Powerbook G4 running OS 10.4.11.  The situation isn't exactly the
same as with my Linux system, because for this test the client and the
VPN server are on the same subnet -- I don't think that should make any
difference.  The client's IP address is 140.247.233.41, the server's is
.37, and the router to the outside world is .33.  The client's ppp IP
address (assigned by the server) is 10.170.30.1.

The following commands were carried out while the VPN was connected:


------------------------------------------------------------------------
michael-burns-powerbook-g4:~ stern$ netstat -rn -f inet
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            140.247.233.37     UGSc        2        4   ppp0
10                 ppp0               USc         1        0   ppp0
127                127.0.0.1          UCS         0        0    lo0
127.0.0.1          127.0.0.1          UH         12     2278    lo0
140.247.233.32/27  link#4             UCS         2        0    en0
140.247.233.33     0:8:e3:ff:fc:b8    UHLW        0        0    en0   1198
140.247.233.37     0:1e:f7:15:53:a8   UHLW        3       10    en0   1153
140.247.233.37/32  link#4             UCS         1        0    en0
140.247.233.41     127.0.0.1          UHS         0        0    lo0
169.254            link#4             UCS         0        0    en0

michael-burns-powerbook-g4:~ stern$ ping -c1 -n 10.160.0.2
PING 10.160.0.2 (10.160.0.2): 56 data bytes
64 bytes from 10.160.0.2: icmp_seq=0 ttl=64 time=1.368 ms

--- 10.160.0.2 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.368/1.368/1.368/nan ms

michael-burns-powerbook-g4:~ stern$ ifconfig en0
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        inet 140.247.233.41 netmask 0xffffffe0 broadcast 140.247.233.63
        ether 00:03:93:12:da:48 
        media: autoselect (100baseTX <full-duplex>) status: active
        supported media: none autoselect 10baseT/UTP <half-duplex> 10baseT/UTP <full-duplex> 10baseT/UTP <full-duplex,hw-loopback> 100baseTX <half-duplex> 100baseTX <full-duplex> 100baseTX <full-duplex,hw-loopback>

michael-burns-powerbook-g4:~ stern$ ifconfig ppp0
ppp0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1280
        inet 10.170.30.1 --> 140.247.233.37 netmask 0xff000000 
------------------------------------------------------------------------


I don't understand (and can't be bothered to look up) those arcane
symbols in the netstat output.  The IP address used for the ping test
(10.160.0.2) is a system on the VPN's private network.

Here's comparable output for a connection from a computer running 
Windows 7 (same IP addresses as before):


------------------------------------------------------------------------
C:\Users\stern>netstat -rn
===========================================================================
Interface List
 20...........................Rowland VPN
 10...00 1a 6b 57 30 02 ......Intel(R) 82566DM Gigabit Network Connection
  1...........................Software Loopback Interface 1
 11...00 00 00 00 00 00 00 e0 Microsoft ISATAP Adapter
 12...00 00 00 00 00 00 00 e0 Teredo Tunneling Pseudo-Interface
 19...00 00 00 00 00 00 00 e0 Microsoft 6to4 Adapter
 21...00 00 00 00 00 00 00 e0 Microsoft ISATAP Adapter #2
===========================================================================

IPv4 Route Table
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0   140.247.233.33   140.247.233.41   4491
          0.0.0.0          0.0.0.0         On-link       10.170.30.1     11
      10.170.30.1  255.255.255.255         On-link       10.170.30.1    266
        127.0.0.0        255.0.0.0         On-link         127.0.0.1   4531
        127.0.0.1  255.255.255.255         On-link         127.0.0.1   4531
  127.255.255.255  255.255.255.255         On-link         127.0.0.1   4531
   140.247.233.32  255.255.255.224         On-link    140.247.233.41   4491
   140.247.233.37  255.255.255.255         On-link    140.247.233.41   4236
   140.247.233.41  255.255.255.255         On-link    140.247.233.41   4491
   140.247.233.63  255.255.255.255         On-link    140.247.233.41   4491
        224.0.0.0        240.0.0.0         On-link         127.0.0.1   4531
        224.0.0.0        240.0.0.0         On-link    140.247.233.41   4492
        224.0.0.0        240.0.0.0         On-link       10.170.30.1     11
  255.255.255.255  255.255.255.255         On-link         127.0.0.1   4531
  255.255.255.255  255.255.255.255         On-link    140.247.233.41   4491
  255.255.255.255  255.255.255.255         On-link       10.170.30.1    266
===========================================================================
Persistent Routes:
  Network Address          Netmask  Gateway Address  Metric
          0.0.0.0          0.0.0.0   140.247.233.33  Default
===========================================================================

C:\Users\stern>ping -n 1 10.160.0.2

Pinging 10.160.0.2 with 32 bytes of data:
Reply from 10.160.0.2: bytes=32 time<1ms TTL=64

Ping statistics for 10.160.0.2:
    Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms

C:\Users\stern>ipconfig /all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : Windows-test
   Primary Dns Suffix  . . . . . . . : rowland.org
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : rowland.org

PPP adapter Rowland VPN:

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Rowland VPN
   Physical Address. . . . . . . . . :
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   IPv4 Address. . . . . . . . . . . : 10.170.30.1(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.255
   Default Gateway . . . . . . . . . : 0.0.0.0
   DNS Servers . . . . . . . . . . . : 10.160.0.2
                                       10.160.0.3
   Primary WINS Server . . . . . . . : 10.160.0.2
   NetBIOS over Tcpip. . . . . . . . : Disabled

Ethernet adapter Local Area Connection:

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Intel(R) 82566DM Gigabit Network Connection
   Physical Address. . . . . . . . . : 00-1A-6B-57-30-02
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::1426:1891:bf83:3982%10(Preferred)
   IPv4 Address. . . . . . . . . . . : 140.247.233.41(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.224
   Default Gateway . . . . . . . . . : 140.247.233.33
   DHCPv6 IAID . . . . . . . . . . . : 234887787
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-1B-A2-3E-B1-00-1A-6B-57-30-02

   DNS Servers . . . . . . . . . . . : 8.8.8.8
   NetBIOS over Tcpip. . . . . . . . : Enabled
------------------------------------------------------------------------


Although the Windows ipconfig output doesn't show the IP address of the
server side of the ppp tunnel, it does show up in a Details window
under the Network control panel, and it is indeed set to
140.247.233.37.

> > So it looks like the problem has to be fixed either in the kernel or in 
> > the way pppd sets up its routing entry.  Can you guys help?
> 
> I think the easiest solution is to configure pppd to lie to the kernel
> about the remote address.  Who cares what the remote address is on a
> point-to-point link anyway?
> 
> There's currently no option to do this, but the code change in ipcp_up()
> in pppd/ipcp.c would be rather simple.  Just make the "noremoteip" code
> run all the time:
> 
> /* Deliberately falsify the remote address.  We don't care. */
> ho->hisaddr = htonl(0x0aa00002);
> 
> As long as you don't need to contact that specific remote server using
> the badly-assigned "internal" VPN address and can live with the fact
> that you'll either go through the regular Internet to that address or be
> forced to use some other address configured on that server, you should
> be good.
> 
> (The address I used above is 10.160.0.2.  That was one of the internal
> DNS server addresses provided in the log you posted.  It's not necessary
> that the address used here is exactly that, but it may well be helpful.)

That might work.  But using a nonstandard version of pppd would be
awkward, and I would prefer to avoid it.

> If you can't do that for some reason, then I suppose it would be
> possible to use IP Chains (or whatever the packet-modification tool du
> jure is used in your Linux distribution) to nail up an exception so that
> the outside packets go to the outside interface and the inside ones go
> to the PPP interface.  Doing that likely requires selecting on (at
> least!) source address, so it's messy and ugly and possibly error-prone,
> but it might be doable.

That sounds like a fairly easy thing to try.  But it would still 
require manual intervention instead of just working.  Fixing the kernel 
would be preferable, IMO.

> Otherwise, contact the maintainer of that VPN server.  It's just plain
> old broken, and life's too short for broken software.

It is an old Cisco security appliance, no doubt well past End-Of-Life.  
I'm starting to think it might be preferable to throw the thing away 
and start up a VPN server on the department's firewall (which is a 
Linux box) instead.

Alan Stern


^ permalink raw reply

* Re: [PATCH] drivers: net: xgene: Add missing initialization in xgene_enet_ecc_init()
From: Iyappan Subramanian @ 2014-10-20 18:20 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: David S. Miller, Keyur Chudgar, netdev,
	linux-kernel@vger.kernel.org
In-Reply-To: <1413792496-8558-1-git-send-email-geert@linux-m68k.org>

On Mon, Oct 20, 2014 at 1:08 AM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c: In function ‘xgene_enet_ecc_init’:
> drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c:126: warning: ‘data’ may be used uninitialized in this function
>
> Depending on the arbitrary value on the stack, the loop may terminate
> too early, and cause a bogus -ENODEV failure.
>
> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
> ---
>  drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
> index e6d24c2101982444..19e13583b4259cd4 100644
> --- a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
> +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
> @@ -123,7 +123,7 @@ static u32 xgene_enet_rd_mac(struct xgene_enet_pdata *p, u32 rd_addr)
>  static int xgene_enet_ecc_init(struct xgene_enet_pdata *p)
>  {
>         struct net_device *ndev = p->ndev;
> -       u32 data;
> +       u32 data = 0;
>         int i;
>
>         xgene_enet_wr_diag_csr(p, ENET_CFG_MEM_RAM_SHUTDOWN_ADDR, 0);
> --
> 1.9.1
>

Acked-by: Iyappan Subramanian <isubramanian@apm.com>

^ permalink raw reply

* Re: [PATCH v2 0/3] net: minor gso encapsulation fixes
From: David Miller @ 2014-10-20 18:04 UTC (permalink / raw)
  To: fw; +Cc: netdev, edumazet, therbert
In-Reply-To: <1413805758-30026-1-git-send-email-fw@strlen.de>

From: Florian Westphal <fw@strlen.de>
Date: Mon, 20 Oct 2014 13:49:15 +0200

> The following series fixes a minor bug in the gso segmentation handlers
> when encapsulation offload is used.
> 
> Theoretically this could cause kernel panic when the stack tries
> to software-segment such a GRE offload packet, but it looks like there
> is only one affected call site (tbf scheduler) and it handles NULL
> return value.
> 
> I've included a followup patch to add IS_ERR_OR_NULL checks where needed.
> 
> While looking into this, I also found that size computation of the individual
> segments is incorrect if skb->encapsulation is set.
> 
> Please see individual patches for delta vs. v1.

Series applied, thanks Florian.

Longer term I'd really like to see the ops->gso_segment() implementations not
return NULL, but rather locally determinned pointer error codes instead.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox