Netdev List

* Re: [PATCH 0/3] virtio-net: inline header support
From: Michael S. Tsirkin @ 2012-10-08 21:31 UTC
  To: Rusty Russell
  Cc: kvm, netdev, linux-kernel, Sasha Levin, virtualization, avi,
	Anthony Liguori, Thomas Lendacky
In-Reply-To: <87391u3o67.fsf@rustcorp.com.au>

On Thu, Oct 04, 2012 at 01:04:56PM +0930, Rusty Russell wrote:
> Anthony Liguori <anthony@codemonkey.ws> writes:
> > Rusty Russell <rusty@rustcorp.com.au> writes:
> >
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >>
> >>> Thinking about Sasha's patches, we can reduce ring usage
> >>> for virtio net small packets dramatically if we put
> >>> virtio net header inline with the data.
> > >>> This can be done for free in case the guest net stack allocated
> > >>> extra head room for the packet, and I don't see
> > >>> why this would have any downsides.
> >>
> >> I've been wanting to do this for the longest time... but...
> >>
> >>> Even though with my recent patches qemu
> >>> no longer requires header to be the first s/g element,
> >>> we need a new feature bit to detect this.
> >>> A trivial qemu patch will be sent separately.
> >>
> >> There's a reason I haven't done this.  I really, really dislike "my
> > >> implementation isn't broken" feature bits.  We could have an infinite
> >> number of them, for each bug in each device.
> >
> > This is a bug in the specification.
> >
> > The QEMU implementation pre-dates the specification.  All of the actual
> > implementations of virtio relied on the semantics of s/g elements and
> > still do.
> 
> lguest fix is pending in my queue.  lkvm and qemu are broken; lkvm isn't
> ever going to be merged, so I'm not sure what its status is?  But I'm
> determined to fix qemu, and hence my torture patch to make sure this
> doesn't creep in again.

If you look at my patch you'll notice there's also a
comment in virtio_net.h that seems to be broken in this respect:

/* This is the first element of the scatter-gather list.  If you don't
 * specify GSO or CSUM features, you can simply ignore the header. */

There is a similar comment in virtio-blk.
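
The problem with that virtio_net.h comment is that a device should not assume the header sits alone in the first s/g element. Below is a minimal userspace sketch of a device-side reader that gets the same header whether it is a separate element or inline with the packet data. It assumes a simplified mirror of struct virtio_net_hdr; the names vnet_hdr_sketch and sg_copy_prefix() are hypothetical and are not QEMU or kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical, simplified mirror of struct virtio_net_hdr (10 bytes,
 * no num_buffers). Endianness handling is ignored in this sketch. */
struct vnet_hdr_sketch {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Copy the first `len` bytes described by an s/g list into `dst`, no
 * matter how they are split across elements. Returns the number of bytes
 * copied, which is less than `len` if the list is too short. */
static size_t sg_copy_prefix(void *dst, size_t len,
			     const struct iovec *iov, size_t iovcnt)
{
	size_t done = 0;

	assert(dst != NULL || len == 0);
	for (size_t i = 0; i < iovcnt && done < len; i++) {
		size_t chunk = iov[i].iov_len;

		if (chunk > len - done)
			chunk = len - done;
		memcpy((char *)dst + done, iov[i].iov_base, chunk);
		done += chunk;
	}
	return done;
}
```

Because this reader never looks at element boundaries, a guest that puts the header inline in spare head room looks the same to it as a guest that uses a separate element.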

* [PATCH net v2 1/6] ipv4: fix sending of redirects
From: Julian Anastasov @ 2012-10-08 21:41 UTC
  To: netdev
In-Reply-To: <1349732480-19978-1-git-send-email-ja@ssi.bg>

	After "Cache input routes in fib_info nexthops" (commit
d2d68ba9fe) and "Elide fib_validate_source() completely when possible"
(commit 7a9bc9b81a) we cannot send ICMP redirects. It seems we
should not cache the RTCF_DOREDIRECT flag in nh_rth_input because
the same fib_info can be used for traffic that is not redirected,
e.g. from other input devices or from sources that are not in the same subnet.

	As a result, we have to disable caching of the RTCF_DOREDIRECT
flag and force source validation when forwarding traffic back out
through the input device. If traffic comes from a directly connected
source, we allow redirection as was done before both changes.

	Avoid setting RTCF_DOREDIRECT if IN_DEV_TX_REDIRECTS
is disabled; this can avoid source address validation and help
with caching the routes.

	After the change "Adjust semantics of rt->rt_gateway"
(commit f8126f1d51) we should make sure our ICMP_REDIR_HOST messages
contain daddr instead of 0.0.0.0 when the target is directly connected.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 net/ipv4/fib_frontend.c |    3 ++-
 net/ipv4/route.c        |   30 ++++++++++++++++--------------
 2 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 68c93d1..825c608 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -322,7 +322,8 @@ int fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
 {
 	int r = secpath_exists(skb) ? 0 : IN_DEV_RPFILTER(idev);
 
-	if (!r && !fib_num_tclassid_users(dev_net(dev))) {
+	if (!r && !fib_num_tclassid_users(dev_net(dev)) &&
+	    (dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev))) {
 		*itag = 0;
 		return 0;
 	}
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 132e0df..b90da1b 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -802,7 +802,8 @@ void ip_rt_send_redirect(struct sk_buff *skb)
 	net = dev_net(rt->dst.dev);
 	peer = inet_getpeer_v4(net->ipv4.peers, ip_hdr(skb)->saddr, 1);
 	if (!peer) {
-		icmp_send(skb, ICMP_REDIRECT, ICMP_REDIR_HOST, rt->rt_gateway);
+		icmp_send(skb, ICMP_REDIRECT, ICMP_REDIR_HOST,
+			  rt_nexthop(rt, ip_hdr(skb)->daddr));
 		return;
 	}
 
@@ -827,7 +828,9 @@ void ip_rt_send_redirect(struct sk_buff *skb)
 	    time_after(jiffies,
 		       (peer->rate_last +
 			(ip_rt_redirect_load << peer->rate_tokens)))) {
-		icmp_send(skb, ICMP_REDIRECT, ICMP_REDIR_HOST, rt->rt_gateway);
+		__be32 gw = rt_nexthop(rt, ip_hdr(skb)->daddr);
+
+		icmp_send(skb, ICMP_REDIRECT, ICMP_REDIR_HOST, gw);
 		peer->rate_last = jiffies;
 		++peer->rate_tokens;
 #ifdef CONFIG_IP_ROUTE_VERBOSE
@@ -835,7 +838,7 @@ void ip_rt_send_redirect(struct sk_buff *skb)
 		    peer->rate_tokens == ip_rt_redirect_number)
 			net_warn_ratelimited("host %pI4/if%d ignores redirects for %pI4 to %pI4\n",
 					     &ip_hdr(skb)->saddr, inet_iif(skb),
-					     &ip_hdr(skb)->daddr, &rt->rt_gateway);
+					     &ip_hdr(skb)->daddr, &gw);
 #endif
 	}
 out_put_peer:
@@ -1442,10 +1445,13 @@ static int __mkroute_input(struct sk_buff *skb,
 		goto cleanup;
 	}
 
-	if (out_dev == in_dev && err &&
+	do_cache = res->fi && !itag;
+	if (out_dev == in_dev && err && IN_DEV_TX_REDIRECTS(out_dev) &&
 	    (IN_DEV_SHARED_MEDIA(out_dev) ||
-	     inet_addr_onlink(out_dev, saddr, FIB_RES_GW(*res))))
+	     inet_addr_onlink(out_dev, saddr, FIB_RES_GW(*res)))) {
 		flags |= RTCF_DOREDIRECT;
+		do_cache = false;
+	}
 
 	if (skb->protocol != htons(ETH_P_IP)) {
 		/* Not IP (i.e. ARP). Do not create route, if it is
@@ -1462,15 +1468,11 @@ static int __mkroute_input(struct sk_buff *skb,
 		}
 	}
 
-	do_cache = false;
-	if (res->fi) {
-		if (!itag) {
-			rth = rcu_dereference(FIB_RES_NH(*res).nh_rth_input);
-			if (rt_cache_valid(rth)) {
-				skb_dst_set_noref(skb, &rth->dst);
-				goto out;
-			}
-			do_cache = true;
+	if (do_cache) {
+		rth = rcu_dereference(FIB_RES_NH(*res).nh_rth_input);
+		if (rt_cache_valid(rth)) {
+			skb_dst_set_noref(skb, &rth->dst);
+			goto out;
 		}
 	}
 
-- 
1.7.3.4
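
Below is a minimal sketch of the two decisions this patch changes. It uses hypothetical userspace stand-ins rather than the real struct rtable and __mkroute_input(). First, the redirect target now falls back to the packet's daddr when rt_gateway is 0, which is the fallback rt_nexthop() provides in the diff. Second, an input route that sets RTCF_DOREDIRECT is no longer taken from or stored in the nh_rth_input cache. The boolean parameters are assumptions standing in for the kernel checks named in the comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of struct rtable.
 * Addresses are plain uint32_t values in network byte order. */
struct rtable_sketch {
	uint32_t rt_gateway;	/* 0 when the destination is directly connected */
};

/* Same fallback as rt_nexthop(): the gateway if there is one,
 * otherwise the packet's own destination. */
static uint32_t rt_nexthop_sketch(const struct rtable_sketch *rt, uint32_t daddr)
{
	return rt->rt_gateway ? rt->rt_gateway : daddr;
}

struct input_route_decision {
	bool do_redirect;	/* set RTCF_DOREDIRECT */
	bool do_cache;		/* may use/store nh_rth_input */
};

static struct input_route_decision
decide_input_route(bool have_fi,		/* res->fi != NULL */
		   bool have_itag,		/* itag != 0 */
		   bool same_dev,		/* out_dev == in_dev */
		   bool validate_err,		/* err from fib_validate_source() is nonzero */
		   bool tx_redirects,		/* IN_DEV_TX_REDIRECTS(out_dev) */
		   bool shared_or_onlink)	/* shared media, or saddr on-link */
{
	struct input_route_decision d;

	d.do_cache = have_fi && !have_itag;
	d.do_redirect = same_dev && validate_err && tx_redirects &&
			shared_or_onlink;
	if (d.do_redirect)
		d.do_cache = false;	/* keep the redirect flag out of the shared cache */
	return d;
}
```

With tx_redirects off, the route stays cacheable. This matches the commit's point that disabling IN_DEV_TX_REDIRECTS helps route caching.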

* [PATCH net v2 2/6] ipv4: fix forwarding for strict source routes
From: Julian Anastasov @ 2012-10-08 21:41 UTC
  To: netdev
In-Reply-To: <1349732480-19978-1-git-send-email-ja@ssi.bg>

	After the change "Adjust semantics of rt->rt_gateway"
(commit f8126f1d51) rt_gateway can be 0, but ip_forward() compares
it directly with the nexthop. What we want here is to check whether traffic
goes to a directly connected nexthop and to fail if a gateway is used.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 net/ipv4/ip_forward.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index ab09b12..7f35ac2 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -85,7 +85,7 @@ int ip_forward(struct sk_buff *skb)
 
 	rt = skb_rtable(skb);
 
-	if (opt->is_strictroute && opt->nexthop != rt->rt_gateway)
+	if (opt->is_strictroute && rt->rt_gateway)
 		goto sr_failed;
 
 	if (unlikely(skb->len > dst_mtu(&rt->dst) && !skb_is_gso(skb) &&
-- 
1.7.3.4
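
Below is a minimal sketch of why the old comparison breaks. It uses plain integers instead of the real ip_options and rtable fields, and the function names are hypothetical. For an on-link hop, opt->nexthop is the address the route was looked up for. Because rt_gateway is now 0 for such routes, the old test rejects a valid strict route.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old test: fail unless the strict-route nexthop equals rt_gateway.
 * Under the new semantics rt_gateway is 0 for on-link destinations,
 * so this rejects directly connected hops. */
static bool strict_route_fails_old(uint32_t opt_nexthop, uint32_t rt_gateway)
{
	return opt_nexthop != rt_gateway;
}

/* New test from this patch: fail only when the route uses a gateway. */
static bool strict_route_fails_new(uint32_t rt_gateway)
{
	return rt_gateway != 0;
}
```

Patch 4 of this series later replaces the rt_gateway test with rt_uses_gateway, because rt_gateway may then hold the address of a directly connected host.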

* [PATCH net v2 0/6] ipv4: Changes for rt_gateway
From: Julian Anastasov @ 2012-10-08 21:41 UTC
  To: netdev

	This patchset fixes some routing problems caused
by the new rt_gateway semantics. What started as a fix for
IPVS-DR ended up as fixes for more problems. To solve the IPVS
problem I decided to name the flag FLOWI_FLAG_KNOWN_NH, so that
we can even get the route cached in the FNHE or FIB NH.

	A different flag, FLOWI_FLAG_RT_NOCACHE, could be equally good
for IPVS: we would again be able to use data from the FNHE, but working
with cached routes should be preferred. If there is no FNHE, the
common case is for IPVS to get an uncached route; of course, IPVS caches
it itself.

	Patches 1-3 are fixes unrelated to the IPVS problem;
patches 4 and 5 add code that will be used by IPVS in patch 6.

Julian Anastasov (6):
  ipv4: fix sending of redirects
  ipv4: fix forwarding for strict source routes
  ipv4: make sure nh_pcpu_rth_output is always allocated
  ipv4: introduce rt_uses_gateway
  ipv4: Add FLOWI_FLAG_KNOWN_NH
  ipvs: fix ARP resolving for direct routing mode

 include/net/flow.h              |    1 +
 include/net/route.h             |    3 +-
 net/ipv4/fib_frontend.c         |    3 +-
 net/ipv4/fib_semantics.c        |    2 +
 net/ipv4/inet_connection_sock.c |    4 +-
 net/ipv4/ip_forward.c           |    2 +-
 net/ipv4/ip_output.c            |    4 +-
 net/ipv4/route.c                |  102 ++++++++++++++++++++++----------------
 net/ipv4/xfrm4_policy.c         |    1 +
 net/netfilter/ipvs/ip_vs_xmit.c |    6 ++-
 10 files changed, 77 insertions(+), 51 deletions(-)

-- 
1.7.3.4

* [PATCH net v2 3/6] ipv4: make sure nh_pcpu_rth_output is always allocated
From: Julian Anastasov @ 2012-10-08 21:41 UTC
  To: netdev
In-Reply-To: <1349732480-19978-1-git-send-email-ja@ssi.bg>

	Avoid checking nh_pcpu_rth_output in the fast path by
aborting fib_info creation on alloc_percpu() failure.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 net/ipv4/fib_semantics.c |    2 ++
 net/ipv4/route.c         |    3 ---
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 2677530..71b125c 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -840,6 +840,8 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
 	change_nexthops(fi) {
 		nexthop_nh->nh_parent = fi;
 		nexthop_nh->nh_pcpu_rth_output = alloc_percpu(struct rtable __rcu *);
+		if (!nexthop_nh->nh_pcpu_rth_output)
+			goto failure;
 	} endfor_nexthops(fi)
 
 	if (cfg->fc_mx) {
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index b90da1b..5b0180f 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1207,8 +1207,6 @@ static bool rt_cache_route(struct fib_nh *nh, struct rtable *rt)
 	if (rt_is_input_route(rt)) {
 		p = (struct rtable **)&nh->nh_rth_input;
 	} else {
-		if (!nh->nh_pcpu_rth_output)
-			goto nocache;
 		p = (struct rtable **)__this_cpu_ptr(nh->nh_pcpu_rth_output);
 	}
 	orig = *p;
@@ -1223,7 +1221,6 @@ static bool rt_cache_route(struct fib_nh *nh, struct rtable *rt)
 		 * unsuccessful at storing this route into the cache
 		 * we really need to set it.
 		 */
-nocache:
 		rt->dst.flags |= DST_NOCACHE;
 		ret = false;
 	}
-- 
1.7.3.4
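
The pattern here is to make allocation failure abort object creation. Every fully created object is then guaranteed to have its cache pointer, so the fast path can drop the NULL check. Below is a minimal userspace sketch of that pattern. It assumes hypothetical fi_sketch/nh_sketch types and an injected allocator in place of fib_info, fib_nh and alloc_percpu().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for fib_nh / fib_info. */
struct nh_sketch {
	void *pcpu_cache;	/* models nh_pcpu_rth_output */
};

#define NH_SKETCH_MAX 4

struct fi_sketch {
	size_t nhs;
	struct nh_sketch nh[NH_SKETCH_MAX];
};

/* Injected allocator (models alloc_percpu()) so failures can be tested. */
typedef void *(*alloc_fn)(size_t);

/* Test helper: succeeds alloc_budget times, then returns NULL. */
static int alloc_budget;
static void *alloc_limited(size_t n)
{
	if (alloc_budget-- <= 0)
		return NULL;
	return malloc(n);
}

static void fi_sketch_free(struct fi_sketch *fi)
{
	for (size_t i = 0; i < fi->nhs; i++) {
		free(fi->nh[i].pcpu_cache);	/* free(NULL) is a no-op */
		fi->nh[i].pcpu_cache = NULL;
	}
}

/* Either every nexthop gets its cache, or creation fails and nothing is
 * left allocated, so users of a created object never see a NULL cache. */
static bool fi_sketch_create(struct fi_sketch *fi, size_t nhs, alloc_fn alloc)
{
	assert(nhs <= NH_SKETCH_MAX);
	fi->nhs = nhs;
	for (size_t i = 0; i < nhs; i++)
		fi->nh[i].pcpu_cache = NULL;

	for (size_t i = 0; i < nhs; i++) {
		fi->nh[i].pcpu_cache = alloc(sizeof(void *));
		if (!fi->nh[i].pcpu_cache)
			goto failure;
	}
	return true;

failure:
	fi_sketch_free(fi);
	return false;
}

/* Fast path: no NULL check, as in rt_cache_route() after this patch. */
static void **fi_sketch_cache_slot(struct fi_sketch *fi, size_t idx)
{
	return (void **)fi->nh[idx].pcpu_cache;
}
```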

* [PATCH net v2 4/6] ipv4: introduce rt_uses_gateway
From: Julian Anastasov @ 2012-10-08 21:41 UTC
  To: netdev
In-Reply-To: <1349732480-19978-1-git-send-email-ja@ssi.bg>

	Add a new flag to remember when a route goes via a gateway.
We will use it to allow rt_gateway to contain the address of a
directly connected host in the cases when DST_NOCACHE is
used or when the NH exception caches a per-destination route
without the DST_NOCACHE flag, i.e. when routes are not used for
other destinations. This way we force neighbour resolution
to work with the routed destination while still being able
to use a different address in the packet. This feature is needed
for IPVS-DR, where the original packet for the virtual IP is routed
via the route to the real IP.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/route.h             |    3 +-
 net/ipv4/inet_connection_sock.c |    4 +-
 net/ipv4/ip_forward.c           |    2 +-
 net/ipv4/ip_output.c            |    4 +-
 net/ipv4/route.c                |   48 +++++++++++++++++++++-----------------
 net/ipv4/xfrm4_policy.c         |    1 +
 6 files changed, 34 insertions(+), 28 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index da22243..bc40b63 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -48,7 +48,8 @@ struct rtable {
 	int			rt_genid;
 	unsigned int		rt_flags;
 	__u16			rt_type;
-	__u16			rt_is_input;
+	__u8			rt_is_input;
+	__u8			rt_uses_gateway;
 
 	int			rt_iif;
 
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index f0c5b9c..d34ce29 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -406,7 +406,7 @@ struct dst_entry *inet_csk_route_req(struct sock *sk,
 	rt = ip_route_output_flow(net, fl4, sk);
 	if (IS_ERR(rt))
 		goto no_route;
-	if (opt && opt->opt.is_strictroute && rt->rt_gateway)
+	if (opt && opt->opt.is_strictroute && rt->rt_uses_gateway)
 		goto route_err;
 	return &rt->dst;
 
@@ -442,7 +442,7 @@ struct dst_entry *inet_csk_route_child_sock(struct sock *sk,
 	rt = ip_route_output_flow(net, fl4, sk);
 	if (IS_ERR(rt))
 		goto no_route;
-	if (opt && opt->opt.is_strictroute && rt->rt_gateway)
+	if (opt && opt->opt.is_strictroute && rt->rt_uses_gateway)
 		goto route_err;
 	rcu_read_unlock();
 	return &rt->dst;
diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index 7f35ac2..694de3b 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -85,7 +85,7 @@ int ip_forward(struct sk_buff *skb)
 
 	rt = skb_rtable(skb);
 
-	if (opt->is_strictroute && rt->rt_gateway)
+	if (opt->is_strictroute && rt->rt_uses_gateway)
 		goto sr_failed;
 
 	if (unlikely(skb->len > dst_mtu(&rt->dst) && !skb_is_gso(skb) &&
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 24a29a3..6537a40 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -193,7 +193,7 @@ static inline int ip_finish_output2(struct sk_buff *skb)
 	}
 
 	rcu_read_lock_bh();
-	nexthop = rt->rt_gateway ? rt->rt_gateway : ip_hdr(skb)->daddr;
+	nexthop = (__force u32) rt_nexthop(rt, ip_hdr(skb)->daddr);
 	neigh = __ipv4_neigh_lookup_noref(dev, nexthop);
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
@@ -371,7 +371,7 @@ int ip_queue_xmit(struct sk_buff *skb, struct flowi *fl)
 	skb_dst_set_noref(skb, &rt->dst);
 
 packet_routed:
-	if (inet_opt && inet_opt->opt.is_strictroute && rt->rt_gateway)
+	if (inet_opt && inet_opt->opt.is_strictroute && rt->rt_uses_gateway)
 		goto no_route;
 
 	/* OK, we know where to send it, allocate and build IP header. */
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 5b0180f..3a116cb 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1126,7 +1126,7 @@ static unsigned int ipv4_mtu(const struct dst_entry *dst)
 	mtu = dst->dev->mtu;
 
 	if (unlikely(dst_metric_locked(dst, RTAX_MTU))) {
-		if (rt->rt_gateway && mtu > 576)
+		if (rt->rt_uses_gateway && mtu > 576)
 			mtu = 576;
 	}
 
@@ -1177,7 +1177,9 @@ static bool rt_bind_exception(struct rtable *rt, struct fib_nh_exception *fnhe,
 		if (fnhe->fnhe_gw) {
 			rt->rt_flags |= RTCF_REDIRECTED;
 			rt->rt_gateway = fnhe->fnhe_gw;
-		}
+			rt->rt_uses_gateway = 1;
+		} else if (!rt->rt_gateway)
+			rt->rt_gateway = daddr;
 
 		orig = rcu_dereference(fnhe->fnhe_rth);
 		rcu_assign_pointer(fnhe->fnhe_rth, rt);
@@ -1186,13 +1188,6 @@ static bool rt_bind_exception(struct rtable *rt, struct fib_nh_exception *fnhe,
 
 		fnhe->fnhe_stamp = jiffies;
 		ret = true;
-	} else {
-		/* Routes we intend to cache in nexthop exception have
-		 * the DST_NOCACHE bit clear.  However, if we are
-		 * unsuccessful at storing this route into the cache
-		 * we really need to set it.
-		 */
-		rt->dst.flags |= DST_NOCACHE;
 	}
 	spin_unlock_bh(&fnhe_lock);
 
@@ -1215,15 +1210,8 @@ static bool rt_cache_route(struct fib_nh *nh, struct rtable *rt)
 	if (prev == orig) {
 		if (orig)
 			rt_free(orig);
-	} else {
-		/* Routes we intend to cache in the FIB nexthop have
-		 * the DST_NOCACHE bit clear.  However, if we are
-		 * unsuccessful at storing this route into the cache
-		 * we really need to set it.
-		 */
-		rt->dst.flags |= DST_NOCACHE;
+	} else
 		ret = false;
-	}
 
 	return ret;
 }
@@ -1284,8 +1272,10 @@ static void rt_set_nexthop(struct rtable *rt, __be32 daddr,
 	if (fi) {
 		struct fib_nh *nh = &FIB_RES_NH(*res);
 
-		if (nh->nh_gw && nh->nh_scope == RT_SCOPE_LINK)
+		if (nh->nh_gw && nh->nh_scope == RT_SCOPE_LINK) {
 			rt->rt_gateway = nh->nh_gw;
+			rt->rt_uses_gateway = 1;
+		}
 		dst_init_metrics(&rt->dst, fi->fib_metrics, true);
 #ifdef CONFIG_IP_ROUTE_CLASSID
 		rt->dst.tclassid = nh->nh_tclassid;
@@ -1294,8 +1284,18 @@ static void rt_set_nexthop(struct rtable *rt, __be32 daddr,
 			cached = rt_bind_exception(rt, fnhe, daddr);
 		else if (!(rt->dst.flags & DST_NOCACHE))
 			cached = rt_cache_route(nh, rt);
-	}
-	if (unlikely(!cached))
+		if (unlikely(!cached)) {
+			/* Routes we intend to cache in nexthop exception or
+			 * FIB nexthop have the DST_NOCACHE bit clear.
+			 * However, if we are unsuccessful at storing this
+			 * route into the cache we really need to set it.
+			 */
+			rt->dst.flags |= DST_NOCACHE;
+			if (!rt->rt_gateway)
+				rt->rt_gateway = daddr;
+			rt_add_uncached_list(rt);
+		}
+	} else
 		rt_add_uncached_list(rt);
 
 #ifdef CONFIG_IP_ROUTE_CLASSID
@@ -1363,6 +1363,7 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	rth->rt_iif	= 0;
 	rth->rt_pmtu	= 0;
 	rth->rt_gateway	= 0;
+	rth->rt_uses_gateway = 0;
 	INIT_LIST_HEAD(&rth->rt_uncached);
 	if (our) {
 		rth->dst.input= ip_local_deliver;
@@ -1432,7 +1433,6 @@ static int __mkroute_input(struct sk_buff *skb,
 		return -EINVAL;
 	}
 
-
 	err = fib_validate_source(skb, saddr, daddr, tos, FIB_RES_OIF(*res),
 				  in_dev->dev, in_dev, &itag);
 	if (err < 0) {
@@ -1488,6 +1488,7 @@ static int __mkroute_input(struct sk_buff *skb,
 	rth->rt_iif 	= 0;
 	rth->rt_pmtu	= 0;
 	rth->rt_gateway	= 0;
+	rth->rt_uses_gateway = 0;
 	INIT_LIST_HEAD(&rth->rt_uncached);
 
 	rth->dst.input = ip_forward;
@@ -1658,6 +1659,7 @@ local_input:
 	rth->rt_iif	= 0;
 	rth->rt_pmtu	= 0;
 	rth->rt_gateway	= 0;
+	rth->rt_uses_gateway = 0;
 	INIT_LIST_HEAD(&rth->rt_uncached);
 	if (res.type == RTN_UNREACHABLE) {
 		rth->dst.input= ip_error;
@@ -1826,6 +1828,7 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 	rth->rt_iif	= orig_oif ? : 0;
 	rth->rt_pmtu	= 0;
 	rth->rt_gateway = 0;
+	rth->rt_uses_gateway = 0;
 	INIT_LIST_HEAD(&rth->rt_uncached);
 
 	RT_CACHE_STAT_INC(out_slow_tot);
@@ -2104,6 +2107,7 @@ struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_or
 		rt->rt_flags = ort->rt_flags;
 		rt->rt_type = ort->rt_type;
 		rt->rt_gateway = ort->rt_gateway;
+		rt->rt_uses_gateway = ort->rt_uses_gateway;
 
 		INIT_LIST_HEAD(&rt->rt_uncached);
 
@@ -2182,7 +2186,7 @@ static int rt_fill_info(struct net *net,  __be32 dst, __be32 src,
 		if (nla_put_be32(skb, RTA_PREFSRC, fl4->saddr))
 			goto nla_put_failure;
 	}
-	if (rt->rt_gateway &&
+	if (rt->rt_uses_gateway &&
 	    nla_put_be32(skb, RTA_GATEWAY, rt->rt_gateway))
 		goto nla_put_failure;
 
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 681ea2f..05c5ab8 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -91,6 +91,7 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
 					      RTCF_LOCAL);
 	xdst->u.rt.rt_type = rt->rt_type;
 	xdst->u.rt.rt_gateway = rt->rt_gateway;
+	xdst->u.rt.rt_uses_gateway = rt->rt_uses_gateway;
 	xdst->u.rt.rt_pmtu = rt->rt_pmtu;
 	INIT_LIST_HEAD(&xdst->u.rt.rt_uncached);
 
-- 
1.7.3.4
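
Below is a minimal sketch of how the new flag separates two questions that rt_gateway used to answer together. It uses hypothetical stand-in types and covers only the FIB nexthop case of rt_set_nexthop(); nexthop exceptions are left out. The question "which address do we resolve?" still uses the rt_nexthop()-style fallback. The question "does this route go via a gateway?" now reads rt_uses_gateway; that is the question asked by the strict source route checks, ipv4_mtu() and RTA_GATEWAY reporting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; only the fields this patch touches. */
struct rt_sketch {
	uint32_t rt_gateway;
	uint8_t  rt_uses_gateway;
	bool     nocache;	/* models DST_NOCACHE */
};

struct nh_info {
	uint32_t nh_gw;		/* 0 if no gateway */
	bool     scope_link;	/* nh_scope == RT_SCOPE_LINK */
};

/* Roughly follows rt_set_nexthop() after this patch, FIB case only. */
static void set_nexthop_sketch(struct rt_sketch *rt, const struct nh_info *nh,
			       uint32_t daddr, bool cached)
{
	if (nh->nh_gw && nh->scope_link) {
		rt->rt_gateway = nh->nh_gw;
		rt->rt_uses_gateway = 1;
	}
	if (!cached) {
		/* Not shared with other destinations: safe to record daddr. */
		rt->nocache = true;
		if (!rt->rt_gateway)
			rt->rt_gateway = daddr;
	}
}

/* Which address neighbour resolution targets. */
static uint32_t nexthop_sketch(const struct rt_sketch *rt, uint32_t daddr)
{
	return rt->rt_gateway ? rt->rt_gateway : daddr;
}

/* Strict source routing now asks "via a gateway?", not "rt_gateway != 0?". */
static bool strict_route_fails(const struct rt_sketch *rt)
{
	return rt->rt_uses_gateway != 0;
}
```

An uncached on-link route now carries a nonzero rt_gateway but is still accepted for strict source routing. Under the old rt_gateway-based checks it would have been rejected.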

* [PATCH net v2 5/6] ipv4: Add FLOWI_FLAG_KNOWN_NH
From: Julian Anastasov @ 2012-10-08 21:41 UTC
  To: netdev
In-Reply-To: <1349732480-19978-1-git-send-email-ja@ssi.bg>

	Add a flag to request that the output route be
returned with a known rt_gateway, in case we want to use
it as the nexthop for neighbour resolution.

	The returned route can be cached as follows:

- in the NH exception: because the cached routes are not shared
	with other destinations
- in the FIB NH: when using a gateway, because all destinations for
	the NH share the same gateway

	As a last option, to return rt_gateway!=0 we have to
set DST_NOCACHE.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/flow.h |    1 +
 net/ipv4/route.c   |   21 +++++++++++++++++----
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/include/net/flow.h b/include/net/flow.h
index e1dd508..628e11b 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -21,6 +21,7 @@ struct flowi_common {
 	__u8	flowic_flags;
 #define FLOWI_FLAG_ANYSRC		0x01
 #define FLOWI_FLAG_CAN_SLEEP		0x02
+#define FLOWI_FLAG_KNOWN_NH		0x04
 	__u32	flowic_secid;
 };
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 3a116cb..1a0da8d 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1762,6 +1762,7 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 	struct in_device *in_dev;
 	u16 type = res->type;
 	struct rtable *rth;
+	bool do_cache;
 
 	in_dev = __in_dev_get_rcu(dev_out);
 	if (!in_dev)
@@ -1798,24 +1799,36 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 	}
 
 	fnhe = NULL;
+	do_cache = fi != NULL;
 	if (fi) {
 		struct rtable __rcu **prth;
+		struct fib_nh *nh = &FIB_RES_NH(*res);
 
-		fnhe = find_exception(&FIB_RES_NH(*res), fl4->daddr);
+		fnhe = find_exception(nh, fl4->daddr);
 		if (fnhe)
 			prth = &fnhe->fnhe_rth;
-		else
-			prth = __this_cpu_ptr(FIB_RES_NH(*res).nh_pcpu_rth_output);
+		else {
+			if (unlikely(fl4->flowi4_flags &
+				     FLOWI_FLAG_KNOWN_NH &&
+				     !(nh->nh_gw &&
+				       nh->nh_scope == RT_SCOPE_LINK))) {
+				do_cache = false;
+				goto add;
+			}
+			prth = __this_cpu_ptr(nh->nh_pcpu_rth_output);
+		}
 		rth = rcu_dereference(*prth);
 		if (rt_cache_valid(rth)) {
 			dst_hold(&rth->dst);
 			return rth;
 		}
 	}
+
+add:
 	rth = rt_dst_alloc(dev_out,
 			   IN_DEV_CONF_GET(in_dev, NOPOLICY),
 			   IN_DEV_CONF_GET(in_dev, NOXFRM),
-			   fi);
+			   do_cache);
 	if (!rth)
 		return ERR_PTR(-ENOBUFS);
 
-- 
1.7.3.4
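
Below is a minimal sketch of the cache-slot choice __mkroute_output() makes after this patch. Booleans stand in for the kernel tests: fi != NULL, find_exception() returning an fnhe, FLOWI_FLAG_KNOWN_NH set in flowi4_flags, and nh_gw with RT_SCOPE_LINK. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum cache_slot {
	SLOT_NONE,	/* uncached route (DST_NOCACHE) */
	SLOT_FNHE,	/* nexthop exception: per-destination */
	SLOT_FIB_NH,	/* per-CPU cache on the FIB nexthop */
};

static enum cache_slot output_cache_slot(bool have_fi,		/* fi != NULL */
					 bool have_fnhe,	/* exception found */
					 bool known_nh,		/* FLOWI_FLAG_KNOWN_NH */
					 bool nh_has_link_gw)	/* nh_gw && RT_SCOPE_LINK */
{
	if (!have_fi)
		return SLOT_NONE;
	if (have_fnhe)
		return SLOT_FNHE;	/* not shared, may hold daddr */
	if (known_nh && !nh_has_link_gw)
		return SLOT_NONE;	/* FIB NH route would be shared */
	return SLOT_FIB_NH;
}
```

The only new uncached case is a KNOWN_NH request for a directly connected nexthop. The per-CPU FIB NH route is shared by every destination behind that nexthop, so it cannot carry one destination's address in rt_gateway.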

* [PATCH net v2 6/6] ipvs: fix ARP resolving for direct routing mode
From: Julian Anastasov @ 2012-10-08 21:41 UTC
  To: netdev
In-Reply-To: <1349732480-19978-1-git-send-email-ja@ssi.bg>

	After the change "Make neigh lookups directly in output packet path"
(commit a263b30936) IPVS cannot reach the real server in DR mode
because we resolve the destination address from the IP header, not from
the route's neighbour. Use the new FLOWI_FLAG_KNOWN_NH flag to request
output routes with a known nexthop, so that the nexthop takes precedence
during resolution.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 net/netfilter/ipvs/ip_vs_xmit.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 56f6d5d..cc4c809 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -50,6 +50,7 @@ enum {
 				      * local
 				      */
 	IP_VS_RT_MODE_CONNECT	= 8, /* Always bind route to saddr */
+	IP_VS_RT_MODE_KNOWN_NH	= 16,/* Route via remote addr */
 };
 
 /*
@@ -113,6 +114,8 @@ static struct rtable *do_output_route4(struct net *net, __be32 daddr,
 	fl4.daddr = daddr;
 	fl4.saddr = (rt_mode & IP_VS_RT_MODE_CONNECT) ? *saddr : 0;
 	fl4.flowi4_tos = rtos;
+	fl4.flowi4_flags = (rt_mode & IP_VS_RT_MODE_KNOWN_NH) ?
+			   FLOWI_FLAG_KNOWN_NH : 0;
 
 retry:
 	rt = ip_route_output_key(net, &fl4);
@@ -1061,7 +1064,8 @@ ip_vs_dr_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	if (!(rt = __ip_vs_get_out_rt(skb, cp->dest, cp->daddr.ip,
 				      RT_TOS(iph->tos),
 				      IP_VS_RT_MODE_LOCAL |
-					IP_VS_RT_MODE_NON_LOCAL, NULL)))
+				      IP_VS_RT_MODE_NON_LOCAL |
+				      IP_VS_RT_MODE_KNOWN_NH, NULL)))
 		goto tx_error_icmp;
 	if (rt->rt_flags & RTCF_LOCAL) {
 		ip_rt_put(rt);
-- 
1.7.3.4
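
Below is a minimal sketch of the two pieces involved. Constants are copied from the patch under a sketch_ prefix, and the helper names are hypothetical. IP_VS_RT_MODE_KNOWN_NH is translated into FLOWI_FLAG_KNOWN_NH for the route lookup. With a known nexthop, the neighbour lookup targets the real server even though the IP header still carries the virtual IP.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of values that appear in the patch. */
#define SKETCH_FLOWI_FLAG_KNOWN_NH	0x04
enum {
	SKETCH_RT_MODE_CONNECT	= 8,
	SKETCH_RT_MODE_KNOWN_NH	= 16,
};

/* Mirrors the flowi4_flags assignment added to do_output_route4(). */
static uint8_t rt_mode_to_flowi_flags(unsigned int rt_mode)
{
	return (rt_mode & SKETCH_RT_MODE_KNOWN_NH) ?
	       SKETCH_FLOWI_FLAG_KNOWN_NH : 0;
}

/* In DR mode the packet keeps the virtual IP as daddr, while the route
 * was looked up for the real server. Resolution follows rt_nexthop():
 * the route's gateway field if set, else the header's daddr. */
static uint32_t dr_resolve_target(uint32_t rt_gateway, uint32_t hdr_daddr)
{
	return rt_gateway ? rt_gateway : hdr_daddr;
}
```

If rt_gateway is 0, the fallback picks the header's daddr, which is the virtual IP. That is the DR-mode failure described above, and the reason the route must come back with a known rt_gateway.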

* Re: [PATCH net v2 0/6] ipv4: Changes for rt_gateway
From: David Miller @ 2012-10-08 21:43 UTC
  To: ja; +Cc: netdev
In-Reply-To: <1349732480-19978-1-git-send-email-ja@ssi.bg>

From: Julian Anastasov <ja@ssi.bg>
Date: Tue,  9 Oct 2012 00:41:14 +0300

> 	This patchset fixes some problems for the routing caused
> by the new rt_gateway semantics. What started as a fix for
> IPVS-DR ended as fixes for more problems. To solve the IPVS
> problem I decided to name the flag FLOWI_FLAG_KNOWN_NH, so that
> we can even get route cached in FNHE or FIB NH.
> 
> 	Different flag FLOWI_FLAG_RT_NOCACHE could be equally good
> for IPVS, we again would be able to use data from fnhe but working
> with cached routes should be preferred. If there is no FNHE, the
> common case is IPVS to get uncached route, of course, IPVS caches
> it itself.
> 
> 	Patches 1-3 are fixes not related to IPVS problem,
> 4 and 5 add code that will be used by IPVS in patch 6.

This series looks great, applied, thanks Julian.

* [PATCH] vxlan: fix more sparse warnings
From: Stephen Hemminger @ 2012-10-08 21:55 UTC
  To: Fengguang Wu, David Miller; +Cc: kernel-janitors, netdev
In-Reply-To: <20121007123635.GB24374@localhost>

Fix a couple of harmless sparse warnings reported by Fengguang Wu.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
Later patches will fix the other warnings as a side effect of more substantial
changes.

--- a/drivers/net/vxlan.c	2012-10-08 13:41:36.920210233 -0700
+++ b/drivers/net/vxlan.c	2012-10-08 13:44:05.630729809 -0700
@@ -1084,13 +1084,13 @@ static int vxlan_fill_info(struct sk_buf
 	if (nla_put_u32(skb, IFLA_VXLAN_ID, vxlan->vni))
 		goto nla_put_failure;
 
-	if (vxlan->gaddr && nla_put_u32(skb, IFLA_VXLAN_GROUP, vxlan->gaddr))
+	if (vxlan->gaddr && nla_put_be32(skb, IFLA_VXLAN_GROUP, vxlan->gaddr))
 		goto nla_put_failure;
 
 	if (vxlan->link && nla_put_u32(skb, IFLA_VXLAN_LINK, vxlan->link))
 		goto nla_put_failure;
 
-	if (vxlan->saddr && nla_put_u32(skb, IFLA_VXLAN_LOCAL, vxlan->saddr))
+	if (vxlan->saddr && nla_put_be32(skb, IFLA_VXLAN_LOCAL, vxlan->saddr))
 		goto nla_put_failure;
 
 	if (nla_put_u8(skb, IFLA_VXLAN_TTL, vxlan->ttl) ||
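
nla_put_be32() writes the same 4 bytes as nla_put_u32(). The change only gives sparse the right type for the __be32 gaddr and saddr values. Below is a minimal userspace sketch of that idea. It assumes simplified copies of the kernel's __bitwise/__force annotations, active only when __CHECKER__ is defined as it is under sparse, and hypothetical sketch_put_u32()/sketch_put_be32() writers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Simplified copies of the kernel annotations: under sparse, be32 is a
 * distinct type; with a normal compiler the macros expand to nothing. */
#ifdef __CHECKER__
#define sketch_bitwise __attribute__((bitwise))
#define sketch_force   __attribute__((force))
#else
#define sketch_bitwise
#define sketch_force
#endif

typedef uint32_t sketch_bitwise sketch_be32;

/* Hypothetical attribute writers: both copy the 4 bytes verbatim, so
 * only the declared type of the argument differs. */
static void sketch_put_u32(unsigned char *buf, uint32_t v)
{
	memcpy(buf, &v, sizeof(v));
}

static void sketch_put_be32(unsigned char *buf, sketch_be32 v)
{
	memcpy(buf, &v, sizeof(v));
}
```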

* Re: [PATCH] vxlan: fix more sparse warnings
From: David Miller @ 2012-10-08 21:58 UTC
  To: shemminger; +Cc: fengguang.wu, kernel-janitors, netdev
In-Reply-To: <20121008145530.54a56962@nehalam.linuxnetplumber.net>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Mon, 8 Oct 2012 14:55:30 -0700

> Fix a couple harmless sparse warnings reported by Fengguang Wu.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied, thanks Stephen.

* Re: [PATCH] compat-drivers: update ethernet driver alx in crap dir
From: Luis R. Rodriguez @ 2012-10-08 22:25 UTC
  To: xiong
  Cc: mcgrof, backports, nic-devel, Ren Cloud, Greg Kroah-Hartman,
	linux-kernel, netdev, linux-wireless, qca_vkondrat
In-Reply-To: <1349400892-25315-1-git-send-email-xiong@qca.qualcomm.com>

On Thu, Oct 4, 2012 at 6:34 PM,  <xiong@qca.qualcomm.com> wrote:
> From: xiong <xiong@qca.qualcomm.com>
>
> 1. support new device id (0x10A0/0x10A1).
> 2. add DEBUG_FS interface for diag/swoi functions.
>
> Signed-off-by: Ren Cloud <cjren@qca.qualcomm.com>
> Signed-off-by: xiong <xiong@qca.qualcomm.com>

Xiong,

 -- Vladimir, just a heads up -- this applies to you as well for the
802.11ad wil6210 driver
 -- Greg, some review on your preference on this would be appreciated

The original alx crap patch was added into compat-wireless on the
linux-3.5.y branch. It's been two kernel releases, alx is not yet
upstream, and users can only get alx via compat-drivers (technically
compat-wireless, as that was pre-v3.7). v3.7 would be the *third*
release in which this would happen... This is unfair to users and
consumers of the Linux kernel and derails expectations and our
arrangements for Linux kernel development. I realize that the goal was
to get alx upstream ASAP, but regardless of the reason, it's not
yet upstream. If you cannot work on getting alx upstream in a timely
manner, then please submit the driver to the staging area of the
Linux kernel that Greg maintains, so that other developers who may be
able to help can submit patches. Under staging your driver
should be accepted so long as it compiles.

I will update the documentation for crap/ patches for compat-drivers
to make it clear that crap/ patches can be used for adding
components or pieces of code not yet ready for upstream, but as far as
full new drivers are concerned you only get one kernel release cycle
for a driver to linger in crap/ under compat-drivers; if you haven't
addressed upstreaming by then, it should go to drivers/staging/. That
is, crap/ should only be used as a shortcut when users exist who
can use the driver and you *do* have a team properly resourced to
address upstreaming in a timely manner.

Linus should release v3.7-rc1 soon, and new drivers are allowed to be
merged during the RC cycles. As such, my recommendation is that instead of
getting users to consume alx only through compat-drivers, you now
submit alx into staging to Greg in the hope that we can get it into
v3.7-rcX some time; at that point we can remove the crap/ patch
from compat-drivers.

Users should be able to consume new drivers through kernel.org, and
compat-drivers should only provide the framework for backporting and
for categorizing quick fixes. It should not be used for ongoing
updates to new drivers that users need.

We must draw the line with crap/ patches somewhere.

I'll take this patch for now, but I do expect you to get alx into
either staging or proper upstream for v3.7-rcX. I welcome feedback
from other folks on the proposed arrangement for crap/ patches for
compat-drivers.

https://backports.wiki.kernel.org/index.php/Documentation/compat-drivers/additional-patches#crap_patches

  Luis

* [ANNOUNCE] libnetfilter_acct 1.0.1 release
From: Pablo Neira Ayuso @ 2012-10-08 22:49 UTC
  To: netfilter-devel; +Cc: netdev, netfilter, netfilter-announce, lwn

[-- Attachment #1: Type: text/plain, Size: 394 bytes --]

Hi!

The Netfilter project proudly presents:

        libnetfilter_acct 1.0.1

This release adds the new flag NFACCT_SNPRINTF_T_XML for
nfacct_snprintf and several cleanups.

See ChangeLog that comes attached to this email for more details.

You can download it from:

http://www.netfilter.org/projects/libnetfilter_acct/downloads.html
ftp://ftp.netfilter.org/pub/libnetfilter_acct/

Have fun!

[-- Attachment #2: changes-libnetfilter_acct-1.0.1.txt --]
[-- Type: text/plain, Size: 317 bytes --]

Jan Engelhardt (1):
      build: remove unnecessary pkgconfig->config.status dependency

Pablo Neira Ayuso (4):
      src: NFACCT_SNPRINTF_T_XML flag for nfacct_snprintf to output time
      build: bump version to 1.0.1
      src: remove unnecessary castings
      src: NFACCT_PKTS and NFACCT_BYTES are MNL_TYPE_U64


* Re: [GIT PULL nf-next] IPVS for 3.7 #2
From: Pablo Neira Ayuso @ 2012-10-08 23:17 UTC
  To: Jesper Dangaard Brouer
  Cc: David Miller, Simon Horman, lvs-devel, netdev, netfilter-devel,
	Wensong Zhang, Julian Anastasov, Hans Schillstrom,
	Hans Schillstrom
In-Reply-To: <1349721865.22194.53.camel@localhost>

Hi Jesper,

On Mon, Oct 08, 2012 at 08:44:25PM +0200, Jesper Dangaard Brouer wrote:
> Hey Pablo,
> 
> These changes were intended for 3.7, but I just checked your git tree...
> and it looks like you didn't pull in Simon's changes, and thus they have
> not hit DaveM's tree for the merge window :-(

Sorry, you sent this pull request on Friday 28/09, and net-next was closed
on Tuesday 02/10. That was tight. I don't like to push things too hard
to David at the last minute.

My intention is still to pass this to net-next once it gets opened
again, of course. So don't worry, we still have the chance to get this
in.

I see at least one fix in this patchset; we can still pass that to
3.7 if you want.

Let me know what you prefer.

> --Jesper
> 
> 
> On Fri, 2012-09-28 at 11:54 +0900, Simon Horman wrote:
> > Hi Pablo,
> > 
> > please consider the following enhancements to IPVS for inclusion in 3.7.
> > 
> > ----------------------------------------------------------------
> > The following changes since commit 82c93fcc2e1737fede2752520f1bf8f4de6304d8:
> > 
> >   x86: bpf_jit_comp: add XOR instruction for BPF JIT (2012-09-24 16:54:35 -0400)
> > 
> > are available in the git repository at:
> > 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git master
> > 
> > for you to fetch changes up to 92eec78d25aee6bbc9bd295f51c022ddfa80cdd9:
> > 
> >   ipvs: SIP fragment handling (2012-09-28 11:37:16 +0900)
> > 
> > ----------------------------------------------------------------
> > Jesper Dangaard Brouer (7):
> >       ipvs: Trivial changes, use compressed IPv6 address in output
> >       ipvs: IPv6 extend ICMPv6 handling for future types
> >       ipvs: Use config macro IS_ENABLED()
> >       ipvs: Fix faulty IPv6 extension header handling in IPVS
> >       ipvs: Complete IPv6 fragment handling for IPVS
> >       ipvs: API change to avoid rescan of IPv6 exthdr
> >       ipvs: SIP fragment handling
> > 
> >  include/net/ip_vs.h                     |  194 +++++++++++----
> >  net/netfilter/ipvs/Kconfig              |    7 +-
> >  net/netfilter/ipvs/ip_vs_conn.c         |   15 +-
> >  net/netfilter/ipvs/ip_vs_core.c         |  404 +++++++++++++++++--------------
> >  net/netfilter/ipvs/ip_vs_dh.c           |    2 +-
> >  net/netfilter/ipvs/ip_vs_lblc.c         |    2 +-
> >  net/netfilter/ipvs/ip_vs_lblcr.c        |    2 +-
> >  net/netfilter/ipvs/ip_vs_pe_sip.c       |   18 +-
> >  net/netfilter/ipvs/ip_vs_proto.c        |    6 +-
> >  net/netfilter/ipvs/ip_vs_proto_ah_esp.c |    9 +-
> >  net/netfilter/ipvs/ip_vs_proto_sctp.c   |   42 ++--
> >  net/netfilter/ipvs/ip_vs_proto_tcp.c    |   40 ++-
> >  net/netfilter/ipvs/ip_vs_proto_udp.c    |   41 ++--
> >  net/netfilter/ipvs/ip_vs_sched.c        |    2 +-
> >  net/netfilter/ipvs/ip_vs_sh.c           |    2 +-
> >  net/netfilter/ipvs/ip_vs_xmit.c         |   73 +++---
> >  net/netfilter/xt_ipvs.c                 |    4 +-
> >  17 files changed, 501 insertions(+), 362 deletions(-)
> 
> 

* Re: [PATCH net] e1000e: Change wthresh to 1 to avoid possible Tx stalls.
From: Frank Reppin @ 2012-10-09  0:25 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1339044752.2075.14.camel@jtkirshe-mobl>

Jeff Kirsher <jeffrey.t.kirsher@intel.com> writes:
> 
> On Thu, 2012-06-07 at 06:24 +0200, Eric Dumazet wrote:
> > On Wed, 2012-06-06 at 17:59 -0700, Jeff Kirsher wrote:
> > 
> > > After further internal review, NACK.
> > > 
> > > This patch will cause unacceptable performance issues with non-ESB2
> > > parts.
> > > 
> > > I am dropping this patch from my queue.
> > > 
> > 
> > I'd like you share your performance numbers before NACKing this patch.
> > 
> > What is the alternative patch you guys have ?
> > 
> 
> Jesse did not share any performance numbers with me, I am sure he can
> give some background tomorrow when he is back online.
> 
> I am working on an alternative patch now and should have something to
> share tomorrow.

May I ask whether there has been any progress here?

I tried 3.5.4 a couple of days ago on a SuperMicro X8SIE-LN4 (82574L)
and could still observe severe latency spikes (up to 3000 ms).

Applying Hiroaki's suggested patch fixed this for me as well.
[Please also note that I did not have this issue with any 3.4.x kernel
before, so +1 for fixing the regression.]

Thank you!
Frank Reppin

-- 

* Re: [PATCH] compat-drivers: update ethernet driver alx in crap dir
From: Greg Kroah-Hartman @ 2012-10-09  0:42 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: xiong, mcgrof, backports, nic-devel, Ren Cloud, linux-kernel,
	netdev, linux-wireless, qca_vkondrat
In-Reply-To: <CAB=NE6VotAu-cugsn4==3TPX0OGPEFtdn1J+5+Rd_YfLcg9YDQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Mon, Oct 08, 2012 at 03:25:08PM -0700, Luis R. Rodriguez wrote:
> On Thu, Oct 4, 2012 at 6:34 PM,  <xiong@qca.qualcomm.com> wrote:
> > From: xiong <xiong@qca.qualcomm.com>
> >
> > 1. support new device id (0x10A0/0x10A1).
> > 2. add DEBUG_FS interface for diag/swoi functions.
> >
> > Signed-off-by: Ren Cloud <cjren@qca.qualcomm.com>
> > Signed-off-by: xiong <xiong@qca.qualcomm.com>
> 
> Xiong,
> 
>  -- Vladimir, just a heads up -- this applies to you as well for the
> 802.11ad wil6210 driver
>  -- Greg, some review on your preference on this would be appreciated

Preference on what?  I've never seen this driver before, why wasn't it
submitted to be in the staging tree in the first place?

> The original alx crap patch was added into compat-wireless on the
> linux-3.5.y branch.

What is "crap patch"?  And is this the old driver with the dubious
history keeping it from ever being merged anywhere, or is this something
new?

> Its been two kernel releases and alx is not yet
> upstream and users can only get alx via compat-drivers (technically
> compat-wireless as that was pre v3.7). v3.7 would be the *third*
> release in which this would happen... This is unfair to users and
> consumers of the Linux kernel and derails expectations and our
> arrangements for Linux kernel development. I realize that the goal was
> to get alx upstream ASAP but regardless of what the reason is, its not
> yet upstream. If you cannot work on alx on a timely manner to get
> upstream then please submit the driver to the staging area of the
> Linux kernel that Greg maintains so that other developers who may be
> able to help can submit patches to help you. Under staging your driver
> should be accepted so long as it compiles.
> 
> I will update the documentation for crap/ patches for compat-drivers
> to make it clear now that crap/ patches can be used for adding
> components / pieces of code not yet ready for upstream but as far as
> full new drivers are concerned you only get one kernel release cycle
> for it to linger on crap/ under compat-drivers, if you haven't
> addressed upstreaming yet then it should go to drivers/staging/. That
> is crap/ should only be used as a shortcut because users exist that
> can use the driver but you *do* have a team properly resourced to
> address upstreaming properly in a timely manner.

Why do you even need crap/ at all?  What is keeping drivers like this
(if it isn't the legally dubious driver) from being merged into staging
today?  And if it is the legally dubious driver, well, you had better
not be taking it in your tree either...

confused,

greg k-h

* Re: [PATCH] compat-drivers: update ethernet driver alx in crap dir
From: Luis R. Rodriguez @ 2012-10-09  1:14 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: xiong, mcgrof, backports, nic-devel, Ren Cloud, linux-kernel,
	netdev, linux-wireless, qca_vkondrat
In-Reply-To: <20121009004209.GC9068@kroah.com>

On Mon, Oct 8, 2012 at 5:42 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Mon, Oct 08, 2012 at 03:25:08PM -0700, Luis R. Rodriguez wrote:
>> On Thu, Oct 4, 2012 at 6:34 PM,  <xiong@qca.qualcomm.com> wrote:
>> > From: xiong <xiong@qca.qualcomm.com>
>> >
>> > 1. support new device id (0x10A0/0x10A1).
>> > 2. add DEBUG_FS interface for diag/swoi functions.
>> >
>> > Signed-off-by: Ren Cloud <cjren@qca.qualcomm.com>
>> > Signed-off-by: xiong <xiong@qca.qualcomm.com>
>>
>> Xiong,
>>
>>  -- Vladimir, just a heads up -- this applies to you as well for the
>> 802.11ad wil6210 driver
>>  -- Greg, some review on your preference on this would be appreciated
>
> Preference on what?  I've never seen this driver before, why wasn't it
> submitted to be in the staging tree in the first place?

Because the goal was to jump straight to proper upstream and this was
expected to happen rather easily.

>> The original alx crap patch was added into compat-wireless on the
>> linux-3.5.y branch.
>
> What is "crap patch"?  And is this the old driver with the dubios
> history keeping it from ever being merged anywhere, or is this something
> new?

The driver never had any dubious legal issues. The issues with the alx
driver were purely technical.

>> Its been two kernel releases and alx is not yet
>> upstream and users can only get alx via compat-drivers (technically
>> compat-wireless as that was pre v3.7). v3.7 would be the *third*
>> release in which this would happen... This is unfair to users and
>> consumers of the Linux kernel and derails expectations and our
>> arrangements for Linux kernel development. I realize that the goal was
>> to get alx upstream ASAP but regardless of what the reason is, its not
>> yet upstream. If you cannot work on alx on a timely manner to get
>> upstream then please submit the driver to the staging area of the
>> Linux kernel that Greg maintains so that other developers who may be
>> able to help can submit patches to help you. Under staging your driver
>> should be accepted so long as it compiles.
>>
>> I will update the documentation for crap/ patches for compat-drivers
>> to make it clear now that crap/ patches can be used for adding
>> components / pieces of code not yet ready for upstream but as far as
>> full new drivers are concerned you only get one kernel release cycle
>> for it to linger on crap/ under compat-drivers, if you haven't
>> addressed upstreaming yet then it should go to drivers/staging/. That
>> is crap/ should only be used as a shortcut because users exist that
>> can use the driver but you *do* have a team properly resourced to
>> address upstreaming properly in a timely manner.
>
> Why do you even need crap/ at all?  What is keeping drivers like this
> (if it isn't the legally dubious driver) from being merged into staging
> today?  And if it is the legally dubious driver, well, you had better
> not be taking it in your tree either...

crap/ was invented for patches to existing code that, for whatever
reason, their authors did not think would ever get upstream, yet which
they *needed* to support for customer deliveries. Consider a feature
that won't be acceptable upstream but that a customer *needs* today. In
such a case we know that if the developer posts it, it will get rejected.
The developer may also know that they have to adjust the code to meet
upstream criteria, and they might do that in 1 or 2 future kernel
releases. Without a mechanism to allow these types of patches, folks
jump on the forking bandwagon, which at times creates a slippery slope
toward never going back upstream.

crap/ then was used later for alx as a full driver rather than
drivers/staging/ given that the developers believed they could address
upstream concerns within a release cycle and they needed a release
ASAP. The alx driver needed to be changed to remove atl1c device
support as per review and only support new generation devices. It was
a lot easier for the developers to address that internally rather than
working on drivers/staging. In the end alx is still not upstream as
more technical issues have not yet been addressed.

I'm considering perhaps just not allowing full drivers through crap/
on compat-drivers and simply having developers bite the bullet and
have to go through staging if they really need a release ASAP and are
not yet ready for upstream.

At this point I am revisiting the policy of whether, and when, to allow
drivers through crap/ at all, and that is what I was asking for review on.

  Luis

* RE: [PATCH] compat-drivers: update ethernet driver alx in crap dir
From: Huang, Xiong @ 2012-10-09  1:24 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: mcgrof@kernel.org, backports@vger.kernel.org, nic-devel,
	Ren, Cloud, Greg Kroah-Hartman, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-wireless, qca_vkondrat
In-Reply-To: <CAB=NE6VotAu-cugsn4==3TPX0OGPEFtdn1J+5+Rd_YfLcg9YDQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Hi Luis

    I'm refining the code and will do my best to get it upstream ASAP. Thanks!

-Xiong

> -----Original Message-----
> From: mcgrof@gmail.com [mailto:mcgrof@gmail.com] On Behalf Of Luis R.
> Rodriguez
> Sent: Tuesday, October 09, 2012 6:25
> To: Huang, Xiong
> Cc: mcgrof@kernel.org; backports@vger.kernel.org; nic-devel; Ren, Cloud;
> Greg Kroah-Hartman; linux-kernel@vger.kernel.org; netdev@vger.kernel.org;
> linux-wireless; qca_vkondrat
> Subject: Re: [PATCH] compat-drivers: update ethernet driver alx in crap dir
> 
> On Thu, Oct 4, 2012 at 6:34 PM,  <xiong@qca.qualcomm.com> wrote:
> > From: xiong <xiong@qca.qualcomm.com>
> >
> > 1. support new device id (0x10A0/0x10A1).
> > 2. add DEBUG_FS interface for diag/swoi functions.
> >
> > Signed-off-by: Ren Cloud <cjren@qca.qualcomm.com>
> > Signed-off-by: xiong <xiong@qca.qualcomm.com>
> 
> Xiong,
> 
>  -- Vladimir, just a heads up -- this applies to you as well for the 802.11ad
> wil6210 driver
>  -- Greg, some review on your preference on this would be appreciated
> 
> The original alx crap patch was added into compat-wireless on the linux-3.5.y
> branch. Its been two kernel releases and alx is not yet upstream and users can
> only get alx via compat-drivers (technically compat-wireless as that was pre
> v3.7). v3.7 would be the *third* release in which this would happen... This is
> unfair to users and consumers of the Linux kernel and derails expectations and
> our arrangements for Linux kernel development. I realize that the goal was to
> get alx upstream ASAP but regardless of what the reason is, its not yet
> upstream. If you cannot work on alx on a timely manner to get upstream then
> please submit the driver to the staging area of the Linux kernel that Greg
> maintains so that other developers who may be able to help can submit
> patches to help you. Under staging your driver should be accepted so long as it
> compiles.
> 
> I will update the documentation for crap/ patches for compat-drivers to make
> it clear now that crap/ patches can be used for adding components / pieces of
> code not yet ready for upstream but as far as full new drivers are concerned
> you only get one kernel release cycle for it to linger on crap/ under compat-
> drivers, if you haven't addressed upstreaming yet then it should go to
> drivers/staging/. That is crap/ should only be used as a shortcut because users
> exist that can use the driver but you *do* have a team properly resourced to
> address upstreaming properly in a timely manner.
> 
> Linus should soon release v3.7-rc1 and new drivers are allowed to be merged
> during the RC cycles, as such my recommendation is instead of getting users to
> consume alx only through compat-drivers you now submit alx into staging to
> Greg in hopes that we can get it into v3.7-rcX some time, and at that time we
> can remove the crap/ patch from compat-drivers.
> 
> Users should be able to consume new drivers through kernel.org and compat-
> drivers should only provide the framework for backporting and also
> categorizing quick fixes. It should not be used for ongoing updates for new
> drivers that users need.
> 
> We must draw the line with crap/ patches somewhere.
> 
> I'll then take this patch for now but do expect you to get alx into either staging
> or proper upstream for the v3.7-rcX. I welcome feedback from other folks on
> the proposed arrangement for crap/ patches for compat-drivers.
> 
> https://backports.wiki.kernel.org/index.php/Documentation/compat-
> drivers/additional-patches#crap_patches
> 
>   Luis

* Re: [PATCH 08/16] ipvs: fix ip_vs_set_timeout debug messages
From: Simon Horman @ 2012-10-09  1:48 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Julian Anastasov, Krzysztof Halasa, linux-arm-kernel,
	linux-kernel, arm, David S. Miller, netdev, netfilter-devel,
	netfilter, coreteam
In-Reply-To: <201210060954.15831.arnd@arndb.de>

On Sat, Oct 06, 2012 at 09:54:15AM +0000, Arnd Bergmann wrote:
> On Saturday 06 October 2012, Julian Anastasov wrote:
> > On Sat, 6 Oct 2012, Arnd Bergmann wrote:
> > > > 	Are there any CONFIG_IP_VS_PROTO_xxx options in this
> > > > default config? It is a waste of memory if IPVS is compiled
> > > > without any protocols.
> > > 
> > > They all appear to be turned off:
> > > 
> > > $ grep CONFIG_IP_VS obj-tmp/.config
> > > CONFIG_IP_VS=m
> > > CONFIG_IP_VS_DEBUG=y
> > > CONFIG_IP_VS_TAB_BITS=12
> > > # CONFIG_IP_VS_PROTO_TCP is not set
> > > # CONFIG_IP_VS_PROTO_UDP is not set
> > > # CONFIG_IP_VS_PROTO_AH_ESP is not set
> > > # CONFIG_IP_VS_PROTO_ESP is not set
> > > # CONFIG_IP_VS_PROTO_AH is not set
> > > # CONFIG_IP_VS_PROTO_SCTP is not set
> > 
> > 	Something should be changed here, may be at least
> > TCP/UDP, who knows.
> 
> I don't try to read too much into our defconfigs. We have 140 of them
> on ARM, and they are mainly useful to give a reasonable build coverage,
> but I wouldn't expect them to be actually used on that hardware.
> 
> I'll leave it up to Krzysztof to send a patch for this if he wants.
> 
> > > --- a/net/netfilter/ipvs/ip_vs_ctl.c
> > > +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> > > @@ -2590,6 +2588,7 @@ __ip_vs_get_timeouts(struct net *net, struct ip_vs_timeout_user *u)
> > >  #if defined(CONFIG_IP_VS_PROTO_TCP) || defined(CONFIG_IP_VS_PROTO_UDP)
> > >  	struct ip_vs_proto_data *pd;
> > >  #endif
> > 
> > 	That is what we want. If you plan another submission
> > you can add empty line before this memset and to replace
> > the __ip_vs_get_timeouts call in ip_vs_genl_set_config with
> > memset but they are cosmetic changes. Or may be Simon will
> > take care about the coding style when applying the change.
> > 
> > Acked-by: Julian Anastasov <ja@ssi.bg>
> 
> I'd prefer Simon to pick up the patch. He should also decide whether he wants
> to add it to stable. In theory, this is a small leak of kernel stack data
> to user space, but as you say in practice it should not happen because it
> only exists for silly configurations that nobody should be using.
> 
> AFAICT, removing the call to __ip_vs_get_timeouts in do_ip_vs_get_ctl would
> be a semantic change for the case where a user sends a IPVS_CMD_SET_CONFIG
> message without without the complete set of attributes inside it. The current
> behavior is to leave the timeouts alone, replacing the __ip_vs_get_timeouts
> with a memset would zero them. I left this part alone then.
> 
> 	Arnd

Hi,

Sorry for being a bit slow; it was a long weekend here.
This patch looks reasonable and I think it is appropriate for stable.
I'll see about getting it merged accordingly.

> 
> 8<-----
> ipvs: initialize returned data in do_ip_vs_get_ctl
> 
> As reported by a gcc warning, do_ip_vs_get_ctl does not initialize
> all the members of the ip_vs_timeout_user structure it returns if
> at least one of the TCP or UDP protocols is disabled for ipvs. 
> 
> This makes sure that the data is always initialized, before it is
> returned as a response to IPVS_CMD_GET_CONFIG or printed as a
> debug message in IPVS_CMD_SET_CONFIG.
> 
> Without this patch, building ARM ixp4xx_defconfig results in:
> 
> net/netfilter/ipvs/ip_vs_ctl.c: In function 'ip_vs_genl_set_cmd':
> net/netfilter/ipvs/ip_vs_ctl.c:2238:47: warning: 't.udp_timeout' may be used uninitialized in this function [-Wuninitialized]
> net/netfilter/ipvs/ip_vs_ctl.c:3322:28: note: 't.udp_timeout' was declared here
> net/netfilter/ipvs/ip_vs_ctl.c:2238:47: warning: 't.tcp_fin_timeout' may be used uninitialized in this function [-Wuninitialized]
> net/netfilter/ipvs/ip_vs_ctl.c:3322:28: note: 't.tcp_fin_timeout' was declared here
> net/netfilter/ipvs/ip_vs_ctl.c:2238:47: warning: 't.tcp_timeout' may be used uninitialized in this function [-Wuninitialized]
> net/netfilter/ipvs/ip_vs_ctl.c:3322:28: note: 't.tcp_timeout' was declared here
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> Acked-by: Julian Anastasov <ja@ssi.bg>
> ---
> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> index 2770f85..c4ee437 100644
> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> @@ -2591,6 +2589,8 @@ __ip_vs_get_timeouts(struct net *net, struct ip_vs_timeout_user *u)
>  	struct ip_vs_proto_data *pd;
>  #endif
>  
> +	memset(u, 0, sizeof (*u));
> +
>  #ifdef CONFIG_IP_VS_PROTO_TCP
>  	pd = ip_vs_proto_data_get(net, IPPROTO_TCP);
>  	u->tcp_timeout = pd->timeout_table[IP_VS_TCP_S_ESTABLISHED] / HZ;
> @@ -2768,7 +2768,6 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
>  	{
>  		struct ip_vs_timeout_user t;
>  
> -		memset(&t, 0, sizeof(t));
>  		__ip_vs_get_timeouts(net, &t);
>  		if (copy_to_user(user, &t, sizeof(t)) != 0)
>  			ret = -EFAULT;
> 

* Re: [RFC PATCH net-next] tcp: introduce tcp_tw_interval to specifiy the time of TIME-WAIT
From: Cong Wang @ 2012-10-09  3:42 UTC (permalink / raw)
  To: Neil Horman
  Cc: netdev, David S. Miller, Alexey Kuznetsov, Patrick McHardy,
	Eric Dumazet
In-Reply-To: <20121008140743.GC22939@hmsreliant.think-freely.org>

On Mon, 2012-10-08 at 10:07 -0400, Neil Horman wrote:
> On Mon, Oct 08, 2012 at 11:17:37AM +0800, Cong Wang wrote:
> > On Tue, 2012-10-02 at 08:09 -0400, Neil Horman wrote:
> > > No, its not very friendly, but the people using this are violating the RFC,
> > > which isn't very friendly. :)
> > 
> > Could you be more specific? In RFC 793, AFAIK, it is allowed to be
> > changed:
> > 
> > http://tools.ietf.org/html/rfc793
> > 
> > " To be sure that a TCP does not create a segment that carries a
> >   sequence number which may be duplicated by an old segment remaining in
> >   the network, the TCP must keep quiet for a maximum segment lifetime
> >   (MSL) before assigning any sequence numbers upon starting up or
> >   recovering from a crash in which memory of sequence numbers in use was
> >   lost.  For this specification the MSL is taken to be 2 minutes.  This
> >   is an engineering choice, and may be changed if experience indicates
> >   it is desirable to do so."
> > 
> Its the length of time that represents an MSL that was the choice, not the fact
> that reusing a TCP before the expiration of the MSL is a bad idea.
> 
> > or I must still be missing something here... :)
> > 
> Next paragraph down:
> 	This specification provides that hosts which "crash" without
>     retaining any knowledge of the last sequence numbers transmitted on
>     each active (i.e., not closed) connection shall delay emitting any
>     TCP segments for at least the agreed Maximum Segment Lifetime (MSL)
>     in the internet system of which the host is a part.  In the
>     paragraphs below, an explanation for this specification is given.
>     TCP implementors may violate the "quiet time" restriction, but only
>     at the risk of causing some old data to be accepted as new or new
>     data rejected as old duplicated by some receivers in the internet
>     system. .... etc.
> 
> 

Ah, ok. Thanks for the detailed answer!
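For reference, RFC 793 has a connection in TIME-WAIT wait for twice the MSL before its state is deleted, so with the RFC's example MSL of 2 minutes the arithmetic works out as below. Linux instead uses a fixed TIME-WAIT length (TCP_TIMEWAIT_LEN, 60 seconds), which is presumably the value the proposed tcp_tw_interval would make tunable; the block covers only the RFC arithmetic.

```latex
T_{\text{TIME-WAIT}} = 2 \cdot \mathrm{MSL} = 2 \times 120\ \mathrm{s} = 240\ \mathrm{s}
```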

* Re: [GIT PULL nf-next] IPVS for 3.7 #2
From: Simon Horman @ 2012-10-09  3:52 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Jesper Dangaard Brouer, David Miller, lvs-devel, netdev,
	netfilter-devel, Wensong Zhang, Julian Anastasov,
	Hans Schillstrom, Hans Schillstrom
In-Reply-To: <20121008231704.GA7974@1984>

On Tue, Oct 09, 2012 at 01:17:04AM +0200, Pablo Neira Ayuso wrote:
> Hi Jesper,
> 
> On Mon, Oct 08, 2012 at 08:44:25PM +0200, Jesper Dangaard Brouer wrote:
> > Hey Pablo,
> > 
> > These changes were intended for 3.7, but just checked you git tree...
> > and it looks like you didn't pull in Simon's changes, and thus they have
> > not hit DaveM's tree for the merge window :-(
> 
> Sorry, you sent this pull request by friday 28/09. Net-next was closed
> on tuesday 02/10. That was tight. I don't like to push things too hard
> to David by last time.

I understand.

> My intention is still to pass this to net-next once it gets opened
> again, of course. So don't worry, we still have the chance to get this
> in.
> 
> I see at least one fix in this patchset, we can still pass it that to
> 3.7 if you want.

There is a fix from Arnd that I would like incorporated in 3.7.
I will send a pull-request for that a little later.

> Let me know what you prefer.
> 
> > --Jesper
> > 
> > 
> > On Fri, 2012-09-28 at 11:54 +0900, Simon Horman wrote:
> > > Hi Pablo,
> > > 
> > > please consider the following enhancements to IPVS for inclusion in 3.7.
> > > 
> > > ----------------------------------------------------------------
> > > The following changes since commit 82c93fcc2e1737fede2752520f1bf8f4de6304d8:
> > > 
> > >   x86: bpf_jit_comp: add XOR instruction for BPF JIT (2012-09-24 16:54:35 -0400)
> > > 
> > > are available in the git repository at:
> > > 
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git master
> > > 
> > > for you to fetch changes up to 92eec78d25aee6bbc9bd295f51c022ddfa80cdd9:
> > > 
> > >   ipvs: SIP fragment handling (2012-09-28 11:37:16 +0900)
> > > 
> > > ----------------------------------------------------------------
> > > Jesper Dangaard Brouer (7):
> > >       ipvs: Trivial changes, use compressed IPv6 address in output
> > >       ipvs: IPv6 extend ICMPv6 handling for future types
> > >       ipvs: Use config macro IS_ENABLED()
> > >       ipvs: Fix faulty IPv6 extension header handling in IPVS
> > >       ipvs: Complete IPv6 fragment handling for IPVS
> > >       ipvs: API change to avoid rescan of IPv6 exthdr
> > >       ipvs: SIP fragment handling
> > > 
> > >  include/net/ip_vs.h                     |  194 +++++++++++----
> > >  net/netfilter/ipvs/Kconfig              |    7 +-
> > >  net/netfilter/ipvs/ip_vs_conn.c         |   15 +-
> > >  net/netfilter/ipvs/ip_vs_core.c         |  404 +++++++++++++++++--------------
> > >  net/netfilter/ipvs/ip_vs_dh.c           |    2 +-
> > >  net/netfilter/ipvs/ip_vs_lblc.c         |    2 +-
> > >  net/netfilter/ipvs/ip_vs_lblcr.c        |    2 +-
> > >  net/netfilter/ipvs/ip_vs_pe_sip.c       |   18 +-
> > >  net/netfilter/ipvs/ip_vs_proto.c        |    6 +-
> > >  net/netfilter/ipvs/ip_vs_proto_ah_esp.c |    9 +-
> > >  net/netfilter/ipvs/ip_vs_proto_sctp.c   |   42 ++--
> > >  net/netfilter/ipvs/ip_vs_proto_tcp.c    |   40 ++-
> > >  net/netfilter/ipvs/ip_vs_proto_udp.c    |   41 ++--
> > >  net/netfilter/ipvs/ip_vs_sched.c        |    2 +-
> > >  net/netfilter/ipvs/ip_vs_sh.c           |    2 +-
> > >  net/netfilter/ipvs/ip_vs_xmit.c         |   73 +++---
> > >  net/netfilter/xt_ipvs.c                 |    4 +-
> > >  17 files changed, 501 insertions(+), 362 deletions(-)
> > 
> > 

* [PATCH] be2net: Remove code that stops further access to BE NIC based on UE bits
From: Ajit Khaparde @ 2012-10-09  4:18 UTC (permalink / raw)
  To: netdev, davem

On certain platforms, BE hardware can falsely indicate a UE (unrecoverable error).
For the BE family of NICs, do not set hw_error based on the UE bits.
If there is a real fatal error, the corresponding h/w block will
automatically go offline and stop traffic.

Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
---
 drivers/net/ethernet/emulex/benet/be_main.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index eb3f2cb..d1b6cc5 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -2129,8 +2129,11 @@ void be_detect_error(struct be_adapter *adapter)
 		ue_hi = (ue_hi & ~ue_hi_mask);
 	}
 
-	if (ue_lo || ue_hi ||
-		sliport_status & SLIPORT_STATUS_ERR_MASK) {
+	/* On certain platforms BE hardware can indicate spurious UEs.
+	 * Allow the h/w to stop working completely in case of a real UE.
+	 * Hence not setting the hw_error for UE detection.
+	 */
+	if (sliport_status & SLIPORT_STATUS_ERR_MASK) {
 		adapter->hw_error = true;
 		dev_err(&adapter->pdev->dev,
 			"Error detected in the card\n");
-- 
1.7.9.5

* Re: [PATCH] RDS: Fix spinlock recursion for rds over tcp transmit
From: Jie Liu @ 2012-10-09  4:40 UTC (permalink / raw)
  To: Venkat Venkatsubra; +Cc: rds-devel, Dan Carpenter, davem, James Morris, netdev
In-Reply-To: <50730396.1040303@oracle.com>

Hi Venkat,

On 10/09/12 00:47, Venkat Venkatsubra wrote:
> On 10/6/2012 12:42 AM, Jeff Liu wrote:
>> Hello,
>>
>> RDS ping/pong over TCP has been broken for years (2.6.39 to 3.6.0)
>> because we have to set TCP cork and call kernel_sendmsg() to reply
>> to a ping request, and both need to lock "struct sock *sk".
>> However, this lock is already held before our rds_tcp_data_ready()
>> callback is triggered. As a result, we always hit spinlock recursion,
>> which results in a system panic...
>>
>> Given that RDS ping is a special kind of message, we don't need to
>> reply to it as soon as possible. IMHO, we can schedule it on a work
>> queue as a delayed response to make the TCP transport work. Also, I
>> think we can use the system default work queue to serve it, to reduce
>> the possible impact on general TCP transmit.
>>
> Hi Jeff,
>
> I was looking at the history of changes to rds_send_pong.
> At one time rds_send_pong did this to transmit the pong message:
>        queue_delayed_work(rds_wq, &conn->c_send_w, 0);
> instead of the current
>        ret = rds_send_xmit(conn);
> i.e. the older versions did not have the deadlock problem and used to
> work once. ;-)
>
> I have suggestions for fixing it in a couple of other ways which you
> may want to consider
> to reduce the amount of code changes in a transport independent layer
> such as "send.c" for a specific underlying transport (tcp in this case).
Thanks for the feedback!

>
> 1. One option is to move back to the old way for all transports (IB,
> tcp, loopback) since
> queuing delay shouldn't be an issue for a diagnostic tool like
> rds-ping which is typically used just to test the connectivity
> and not for serious performance measurements.
So I prefer your first suggestion, since I had also tried to fix
this issue that way at the time, and it really does work fine. :)
It also keeps the code change as small as possible.
I changed my mind and queued the pong message to the system default
queue because it might reduce the impact on general RDS TCP transmits,
but that turned out to be a bit of overkill and needed more code changes.

I'll send out the V2 patch for your review after a little while.

Thanks,
-Jeff
>
> 2. The underlying transport such as IB, loopback,TCP tells which
> method it wants to send the pong: queued way or send immediately.
>      And the code change in rds_send_pong could then simply be:
>    if (conn->c_flags & QUEUE_PONG)
>       queue_delayed_work(rds_wq, &conn->c_send_w,0);
>    else
>       ret = rds_send_xmit(conn);
> (The above example codes are not complete. You will need to propagate
> this new flag to "conn" from "rds_transport", etc. at connection setup
> time)
>
> Venkat

* [GIT PULL nf] IPVS fix for 3.7
From: Simon Horman @ 2012-10-09  4:47 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Hans Schillstrom, Hans Schillstrom,
	Jesper Dangaard Brouer, Arnd Bergmann

Hi Pablo,

please consider the following fix for IPVS from Arnd Bergmann for
inclusion in 3.7. I would also like it considered for 3.6, 3.5, 3.4, 3.3
and 3.0 stable.

----------------------------------------------------------------
The following changes since commit 6825a26c2dc21eb4f8df9c06d3786ddec97cf53b:

  ipv6: release reference of ip6_null_entry's dst entry in __ip6_del_rt (2012-10-04 16:00:07 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master

for you to fetch changes up to b61a602ee6730150f4d0df730d9312ac4d820ceb:

  ipvs: initialize returned data in do_ip_vs_get_ctl (2012-10-09 13:04:34 +0900)

----------------------------------------------------------------
Arnd Bergmann (1):
      ipvs: initialize returned data in do_ip_vs_get_ctl

 net/netfilter/ipvs/ip_vs_ctl.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

* [PATCH] ipvs: initialize returned data in do_ip_vs_get_ctl
From: Simon Horman @ 2012-10-09  4:47 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Hans Schillstrom, Hans Schillstrom,
	Jesper Dangaard Brouer, Arnd Bergmann, Simon Horman
In-Reply-To: <1349758037-25317-1-git-send-email-horms@verge.net.au>

From: Arnd Bergmann <arnd@arndb.de>

As reported by a gcc warning, do_ip_vs_get_ctl does not initialize
all the members of the ip_vs_timeout_user structure it returns if
at least one of the TCP or UDP protocols is disabled for IPVS.

This makes sure that the data is always initialized, before it is
returned as a response to IPVS_CMD_GET_CONFIG or printed as a
debug message in IPVS_CMD_SET_CONFIG.

Without this patch, building ARM ixp4xx_defconfig results in:

net/netfilter/ipvs/ip_vs_ctl.c: In function 'ip_vs_genl_set_cmd':
net/netfilter/ipvs/ip_vs_ctl.c:2238:47: warning: 't.udp_timeout' may be used uninitialized in this function [-Wuninitialized]
net/netfilter/ipvs/ip_vs_ctl.c:3322:28: note: 't.udp_timeout' was declared here
net/netfilter/ipvs/ip_vs_ctl.c:2238:47: warning: 't.tcp_fin_timeout' may be used uninitialized in this function [-Wuninitialized]
net/netfilter/ipvs/ip_vs_ctl.c:3322:28: note: 't.tcp_fin_timeout' was declared here
net/netfilter/ipvs/ip_vs_ctl.c:2238:47: warning: 't.tcp_timeout' may be used uninitialized in this function [-Wuninitialized]
net/netfilter/ipvs/ip_vs_ctl.c:3322:28: note: 't.tcp_timeout' was declared here

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_ctl.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 7e7198b..c4ee437 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -2589,6 +2589,8 @@ __ip_vs_get_timeouts(struct net *net, struct ip_vs_timeout_user *u)
 	struct ip_vs_proto_data *pd;
 #endif
 
+	memset(u, 0, sizeof (*u));
+
 #ifdef CONFIG_IP_VS_PROTO_TCP
 	pd = ip_vs_proto_data_get(net, IPPROTO_TCP);
 	u->tcp_timeout = pd->timeout_table[IP_VS_TCP_S_ESTABLISHED] / HZ;
@@ -2766,7 +2768,6 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 	{
 		struct ip_vs_timeout_user t;
 
-		memset(&t, 0, sizeof(t));
 		__ip_vs_get_timeouts(net, &t);
 		if (copy_to_user(user, &t, sizeof(t)) != 0)
 			ret = -EFAULT;
-- 
1.7.10.4

