Netdev List
* [PATCHv3 net-next 1/5] {IPv4,xfrm} Add ESN support for AH egress part
From: Fan Du @ 2014-01-14  1:39 UTC (permalink / raw)
  To: steffen.klassert; +Cc: davem, netdev
In-Reply-To: <1389663552-29638-1-git-send-email-fan.du@windriver.com>

This patch adds ESN support to the AH output stage by attaching the upper
32 bits of the sequence number right after the packet payload, as specified
by RFC 4302.

The ICV then also covers the upper 32 bits of the sequence number when the
packet goes out.

Signed-off-by: Fan Du <fan.du@windriver.com>
---
 net/ipv4/ah4.c |   25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index 7179026..759e489 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -12,6 +12,7 @@
 #include <linux/scatterlist.h>
 #include <net/icmp.h>
 #include <net/protocol.h>
+#include <crypto/scatterwalk.h>
 
 struct ah_skb_cb {
 	struct xfrm_skb_cb xfrm;
@@ -155,6 +156,10 @@ static int ah_output(struct xfrm_state *x, struct sk_buff *skb)
 	struct iphdr *iph, *top_iph;
 	struct ip_auth_hdr *ah;
 	struct ah_data *ahp;
+	int seqhi_len = 0;
+	__be32 *seqhi;
+	int sglists = 0;
+	struct scatterlist *seqhisg;
 
 	ahp = x->data;
 	ahash = ahp->ahash;
@@ -167,14 +172,19 @@ static int ah_output(struct xfrm_state *x, struct sk_buff *skb)
 	ah = ip_auth_hdr(skb);
 	ihl = ip_hdrlen(skb);
 
+	if (x->props.flags & XFRM_STATE_ESN) {
+		sglists = 1;
+		seqhi_len = sizeof(*seqhi);
+	}
 	err = -ENOMEM;
-	iph = ah_alloc_tmp(ahash, nfrags, ihl);
+	iph = ah_alloc_tmp(ahash, nfrags + sglists, ihl + seqhi_len);
 	if (!iph)
 		goto out;
-
-	icv = ah_tmp_icv(ahash, iph, ihl);
+	seqhi = (__be32 *)((char *)iph + ihl);
+	icv = ah_tmp_icv(ahash, seqhi, seqhi_len);
 	req = ah_tmp_req(ahash, icv);
 	sg = ah_req_sg(ahash, req);
+	seqhisg = sg + nfrags;
 
 	memset(ah->auth_data, 0, ahp->icv_trunc_len);
 
@@ -213,7 +223,14 @@ static int ah_output(struct xfrm_state *x, struct sk_buff *skb)
 	sg_init_table(sg, nfrags);
 	skb_to_sgvec(skb, sg, 0, skb->len);
 
-	ahash_request_set_crypt(req, sg, icv, skb->len);
+	if (x->props.flags & XFRM_STATE_ESN) {
+		sg_unmark_end(&sg[nfrags - 1]);
+		/* Attach seqhi sg right after packet payload */
+		*seqhi = htonl(XFRM_SKB_CB(skb)->seq.output.hi);
+		sg_init_table(seqhisg, sglists);
+		sg_set_buf(seqhisg, seqhi, seqhi_len);
+	}
+	ahash_request_set_crypt(req, sg, icv, skb->len + seqhi_len);
 	ahash_request_set_callback(req, 0, ah_output_done, skb);
 
 	AH_SKB_CB(skb)->tmp = iph;
-- 
1.7.9.5
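
Editorial sketch, not part of the patch: the idea in 1/5 (the ICV is computed
over the packet followed by the upper 32 bits of the sequence number) can be
illustrated in userspace C. Everything below is an assumption made for
illustration: `struct seg` stands in for a scatterlist entry, `toy_digest()`
is an FNV-1a hash rather than the real AH hash, and `icv_input_digest()` is a
hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one scatterlist entry: a buffer and its length. */
struct seg {
	const void *buf;
	size_t len;
};

/* Toy FNV-1a digest over a list of segments; NOT the real AH hash. */
static uint32_t toy_digest(const struct seg *sg, size_t nsegs)
{
	uint32_t h = 2166136261u;
	size_t i, j;

	for (i = 0; i < nsegs; i++) {
		const unsigned char *p = sg[i].buf;

		assert(p != NULL || sg[i].len == 0);
		for (j = 0; j < sg[i].len; j++) {
			h ^= p[j];
			h *= 16777619u;
		}
	}
	return h;
}

/* Store the upper 32 bits of a 64-bit sequence number in network byte order. */
static void put_seqhi(unsigned char out[4], uint64_t seq)
{
	uint32_t hi = (uint32_t)(seq >> 32);

	out[0] = (unsigned char)(hi >> 24);
	out[1] = (unsigned char)(hi >> 16);
	out[2] = (unsigned char)(hi >> 8);
	out[3] = (unsigned char)hi;
}

/* Digest the packet, then the seq-hi trailer when ESN is enabled. */
static uint32_t icv_input_digest(const void *pkt, size_t len,
				 uint64_t seq, int esn)
{
	unsigned char seqhi[4];
	struct seg sg[2];
	size_t nsegs = 1;

	sg[0].buf = pkt;
	sg[0].len = len;
	if (esn) {
		/* Attach the seq-hi segment right after the packet payload. */
		put_seqhi(seqhi, seq);
		sg[1].buf = seqhi;
		sg[1].len = sizeof(seqhi);
		nsegs = 2;
	}
	return toy_digest(sg, nsegs);
}
```

With ESN enabled, two packets that differ only in the upper half of the
sequence number produce different digests; without ESN the upper half has no
influence on the result.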


* [PATCHv3 net-next 2/5] {IPv4,xfrm} Add ESN support for AH ingress part
From: Fan Du @ 2014-01-14  1:39 UTC (permalink / raw)
  To: steffen.klassert; +Cc: davem, netdev
In-Reply-To: <1389663552-29638-1-git-send-email-fan.du@windriver.com>

This patch adds ESN support to the AH input stage by attaching the upper
32 bits of the sequence number right after the packet payload, as specified
by RFC 4302.

The ICV then also covers the upper 32 bits of the sequence number when the
packet comes in.

Signed-off-by: Fan Du <fan.du@windriver.com>
---
 net/ipv4/ah4.c |   25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index 759e489..7ca2fe7 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -312,6 +312,10 @@ static int ah_input(struct xfrm_state *x, struct sk_buff *skb)
 	struct ip_auth_hdr *ah;
 	struct ah_data *ahp;
 	int err = -ENOMEM;
+	int seqhi_len = 0;
+	__be32 *seqhi;
+	int sglists = 0;
+	struct scatterlist *seqhisg;
 
 	if (!pskb_may_pull(skb, sizeof(*ah)))
 		goto out;
@@ -352,14 +356,22 @@ static int ah_input(struct xfrm_state *x, struct sk_buff *skb)
 	iph = ip_hdr(skb);
 	ihl = ip_hdrlen(skb);
 
-	work_iph = ah_alloc_tmp(ahash, nfrags, ihl + ahp->icv_trunc_len);
+	if (x->props.flags & XFRM_STATE_ESN) {
+		sglists = 1;
+		seqhi_len = sizeof(*seqhi);
+	}
+
+	work_iph = ah_alloc_tmp(ahash, nfrags + sglists, ihl +
+				ahp->icv_trunc_len + seqhi_len);
 	if (!work_iph)
 		goto out;
 
-	auth_data = ah_tmp_auth(work_iph, ihl);
+	seqhi = (__be32 *)((char *)work_iph + ihl);
+	auth_data = ah_tmp_auth(seqhi, seqhi_len);
 	icv = ah_tmp_icv(ahash, auth_data, ahp->icv_trunc_len);
 	req = ah_tmp_req(ahash, icv);
 	sg = ah_req_sg(ahash, req);
+	seqhisg = sg + nfrags;
 
 	memcpy(work_iph, iph, ihl);
 	memcpy(auth_data, ah->auth_data, ahp->icv_trunc_len);
@@ -381,7 +393,14 @@ static int ah_input(struct xfrm_state *x, struct sk_buff *skb)
 	sg_init_table(sg, nfrags);
 	skb_to_sgvec(skb, sg, 0, skb->len);
 
-	ahash_request_set_crypt(req, sg, icv, skb->len);
+	if (x->props.flags & XFRM_STATE_ESN) {
+		sg_unmark_end(&sg[nfrags - 1]);
+		/* Attach seqhi sg right after packet payload */
+		*seqhi = XFRM_SKB_CB(skb)->seq.input.hi;
+		sg_init_table(seqhisg, sglists);
+		sg_set_buf(seqhisg, seqhi, seqhi_len);
+	}
+	ahash_request_set_crypt(req, sg, icv, skb->len + seqhi_len);
 	ahash_request_set_callback(req, 0, ah_input_done, skb);
 
 	AH_SKB_CB(skb)->tmp = work_iph;
-- 
1.7.9.5


* [PATCHv3 net-next 3/5] {IPv6,xfrm} Add ESN support for AH egress part
From: Fan Du @ 2014-01-14  1:39 UTC (permalink / raw)
  To: steffen.klassert; +Cc: davem, netdev
In-Reply-To: <1389663552-29638-1-git-send-email-fan.du@windriver.com>

This patch adds ESN support to the AH output stage by attaching the upper
32 bits of the sequence number right after the packet payload, as specified
by RFC 4302.

The ICV then also covers the upper 32 bits of the sequence number when the
packet goes out.

Signed-off-by: Fan Du <fan.du@windriver.com>
---
 net/ipv6/ah6.c |   24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/ah6.c b/net/ipv6/ah6.c
index 81e496a..3053a01 100644
--- a/net/ipv6/ah6.c
+++ b/net/ipv6/ah6.c
@@ -346,6 +346,10 @@ static int ah6_output(struct xfrm_state *x, struct sk_buff *skb)
 	struct ip_auth_hdr *ah;
 	struct ah_data *ahp;
 	struct tmp_ext *iph_ext;
+	int seqhi_len = 0;
+	__be32 *seqhi;
+	int sglists = 0;
+	struct scatterlist *seqhisg;
 
 	ahp = x->data;
 	ahash = ahp->ahash;
@@ -359,15 +363,22 @@ static int ah6_output(struct xfrm_state *x, struct sk_buff *skb)
 	if (extlen)
 		extlen += sizeof(*iph_ext);
 
+	if (x->props.flags & XFRM_STATE_ESN) {
+		sglists = 1;
+		seqhi_len = sizeof(*seqhi);
+	}
 	err = -ENOMEM;
-	iph_base = ah_alloc_tmp(ahash, nfrags, IPV6HDR_BASELEN + extlen);
+	iph_base = ah_alloc_tmp(ahash, nfrags + sglists, IPV6HDR_BASELEN +
+				extlen + seqhi_len);
 	if (!iph_base)
 		goto out;
 
 	iph_ext = ah_tmp_ext(iph_base);
-	icv = ah_tmp_icv(ahash, iph_ext, extlen);
+	seqhi = (__be32 *)((char *)iph_ext + extlen);
+	icv = ah_tmp_icv(ahash, seqhi, seqhi_len);
 	req = ah_tmp_req(ahash, icv);
 	sg = ah_req_sg(ahash, req);
+	seqhisg = sg + nfrags;
 
 	ah = ip_auth_hdr(skb);
 	memset(ah->auth_data, 0, ahp->icv_trunc_len);
@@ -414,7 +425,14 @@ static int ah6_output(struct xfrm_state *x, struct sk_buff *skb)
 	sg_init_table(sg, nfrags);
 	skb_to_sgvec(skb, sg, 0, skb->len);
 
-	ahash_request_set_crypt(req, sg, icv, skb->len);
+	if (x->props.flags & XFRM_STATE_ESN) {
+		sg_unmark_end(&sg[nfrags - 1]);
+		/* Attach seqhi sg right after packet payload */
+		*seqhi = htonl(XFRM_SKB_CB(skb)->seq.output.hi);
+		sg_init_table(seqhisg, sglists);
+		sg_set_buf(seqhisg, seqhi, seqhi_len);
+	}
+	ahash_request_set_crypt(req, sg, icv, skb->len + seqhi_len);
 	ahash_request_set_callback(req, 0, ah6_output_done, skb);
 
 	AH_SKB_CB(skb)->tmp = iph_base;
-- 
1.7.9.5


* [PATCHv3 net-next 4/5] {IPv6,xfrm} Add ESN support for AH ingress part
From: Fan Du @ 2014-01-14  1:39 UTC (permalink / raw)
  To: steffen.klassert; +Cc: davem, netdev
In-Reply-To: <1389663552-29638-1-git-send-email-fan.du@windriver.com>

This patch adds ESN support to the AH input stage by attaching the upper
32 bits of the sequence number right after the packet payload, as specified
by RFC 4302.

The ICV then also covers the upper 32 bits of the sequence number when the
packet comes in.

Signed-off-by: Fan Du <fan.du@windriver.com>
---
 net/ipv6/ah6.c |   28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/ah6.c b/net/ipv6/ah6.c
index 3053a01..8119256 100644
--- a/net/ipv6/ah6.c
+++ b/net/ipv6/ah6.c
@@ -532,6 +532,10 @@ static int ah6_input(struct xfrm_state *x, struct sk_buff *skb)
 	int nexthdr;
 	int nfrags;
 	int err = -ENOMEM;
+	int seqhi_len = 0;
+	__be32 *seqhi;
+	int sglists = 0;
+	struct scatterlist *seqhisg;
 
 	if (!pskb_may_pull(skb, sizeof(struct ip_auth_hdr)))
 		goto out;
@@ -568,14 +572,22 @@ static int ah6_input(struct xfrm_state *x, struct sk_buff *skb)
 
 	skb_push(skb, hdr_len);
 
-	work_iph = ah_alloc_tmp(ahash, nfrags, hdr_len + ahp->icv_trunc_len);
+	if (x->props.flags & XFRM_STATE_ESN) {
+		sglists = 1;
+		seqhi_len = sizeof(*seqhi);
+	}
+
+	work_iph = ah_alloc_tmp(ahash, nfrags + sglists, hdr_len +
+				ahp->icv_trunc_len + seqhi_len);
 	if (!work_iph)
 		goto out;
 
-	auth_data = ah_tmp_auth(work_iph, hdr_len);
-	icv = ah_tmp_icv(ahash, auth_data, ahp->icv_trunc_len);
+	auth_data = ah_tmp_auth((u8 *)work_iph, hdr_len);
+	seqhi = (__be32 *)(auth_data + ahp->icv_trunc_len);
+	icv = ah_tmp_icv(ahash, seqhi, seqhi_len);
 	req = ah_tmp_req(ahash, icv);
 	sg = ah_req_sg(ahash, req);
+	seqhisg = sg + nfrags;
 
 	memcpy(work_iph, ip6h, hdr_len);
 	memcpy(auth_data, ah->auth_data, ahp->icv_trunc_len);
@@ -593,7 +605,15 @@ static int ah6_input(struct xfrm_state *x, struct sk_buff *skb)
 	sg_init_table(sg, nfrags);
 	skb_to_sgvec(skb, sg, 0, skb->len);
 
-	ahash_request_set_crypt(req, sg, icv, skb->len);
+	if (x->props.flags & XFRM_STATE_ESN) {
+		sg_unmark_end(&sg[nfrags - 1]);
+		/* Attach seqhi sg right after packet payload */
+		*seqhi = XFRM_SKB_CB(skb)->seq.input.hi;
+		sg_init_table(seqhisg, sglists);
+		sg_set_buf(seqhisg, seqhi, seqhi_len);
+	}
+
+	ahash_request_set_crypt(req, sg, icv, skb->len + seqhi_len);
 	ahash_request_set_callback(req, 0, ah6_input_done, skb);
 
 	AH_SKB_CB(skb)->tmp = work_iph;
-- 
1.7.9.5


* [PATCHv3 net-next 5/5] xfrm: Don't prohibit AH from using ESN feature
From: Fan Du @ 2014-01-14  1:39 UTC (permalink / raw)
  To: steffen.klassert; +Cc: davem, netdev
In-Reply-To: <1389663552-29638-1-git-send-email-fan.du@windriver.com>

Relax the check that rejects ESN for AH when the user configures it through
the netlink key manager, since both ESP and AH support the ESN feature
according to the RFC.

Signed-off-by: Fan Du <fan.du@windriver.com>
---
 net/xfrm/xfrm_user.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 97681a3..dbd287d 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -142,7 +142,8 @@ static inline int verify_replay(struct xfrm_usersa_info *p,
 	if (!rt)
 		return 0;
 
-	if (p->id.proto != IPPROTO_ESP)
+	/* Only ESP and AH support the ESN feature. */
+	if ((p->id.proto != IPPROTO_ESP) && (p->id.proto != IPPROTO_AH))
 		return -EINVAL;
 
 	if (p->replay_window != 0)
-- 
1.7.9.5
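
Editorial sketch, not part of the patch: the relaxed check in 5/5 amounts to
"if ESN was requested, the protocol must be ESP or AH". A minimal userspace
version follows. The names `esn_proto_check()` and the `SKETCH_*` macros are
hypothetical; the numeric values are the IANA protocol numbers for ESP (50),
AH (51) and IPComp (108), redefined locally so that no networking headers are
needed.

```c
#include <assert.h>
#include <errno.h>

/* IANA protocol numbers, redefined locally for this sketch. */
#define SKETCH_IPPROTO_ESP	50
#define SKETCH_IPPROTO_AH	51
#define SKETCH_IPPROTO_COMP	108

/*
 * Returns 0 if the request is acceptable, -EINVAL otherwise.
 * esn_requested plays the role of "the ESN replay attribute is present".
 */
static int esn_proto_check(int proto, int esn_requested)
{
	/* IP protocol numbers are 8 bits wide. */
	assert(proto >= 0 && proto <= 255);

	/* Nothing to verify when ESN was not asked for. */
	if (!esn_requested)
		return 0;

	/* Only ESP and AH may use ESN. */
	if (proto != SKETCH_IPPROTO_ESP && proto != SKETCH_IPPROTO_AH)
		return -EINVAL;

	return 0;
}
```

Before the change, the equivalent of the second test accepted ESP only, so an
AH state with ESN would have been rejected with -EINVAL.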


* [PATCHv2 net-next 0/4] Make flow cache name space aware
From: Fan Du @ 2014-01-14  1:39 UTC (permalink / raw)
  To: steffen.klassert; +Cc: davem, netdev

Hi,

This patch set makes the flow cache operate on a per-net basis when
inserting a flow cache entry or flushing the flow cache. The motivation
is not compelling, but it is reasonable: in the original implementation
a flow cache flush has global effect, so as collateral damage a netns
with only a few flow cache entries loses them as well.

So this patch set gives the flow cache a per-net scope. Operations from
different netns no longer interfere with each other, and a flush only
affects the netns it is meant for.

v2:
  - Pick up newly created file include/net/flowcache.h missed in v1.

Fan Du (4):
  flowcache: Namespacify flowcache global parameters with xfrm
  flowcache: Make flowcache entry inserting/flushing in per-net style
  flowcache: Fixup flow cache part in xfrm policy
  flowcache: Bring net/core/flow.c under IPsec maintain scope

 MAINTAINERS              |    1 +
 include/net/flow.h       |    5 +-
 include/net/flowcache.h  |   25 +++++++++
 include/net/netns/xfrm.h |   11 ++++
 net/core/flow.c          |  127 +++++++++++++++++++++-------------------------
 net/xfrm/xfrm_policy.c   |    7 +--
 6 files changed, 101 insertions(+), 75 deletions(-)
 create mode 100644 include/net/flowcache.h

-- 
1.7.9.5
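
Editorial sketch, not part of the series: the effect of moving the generation
counter into the namespace can be shown with two miniature namespaces. All
names here (`struct mini_netns`, `struct mini_entry`, the `mini_*()` helpers)
are hypothetical and only mimic the roles of `netns_xfrm.flow_cache_genid`
and a cached flow entry; C11 atomics replace the kernel's `atomic_t`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical miniature of the per-namespace flow cache state. */
struct mini_netns {
	atomic_int flow_cache_genid;
};

/* Hypothetical cached entry: remembers its namespace and the genid it saw. */
struct mini_entry {
	struct mini_netns *net;
	int genid;
};

static void mini_netns_init(struct mini_netns *net)
{
	atomic_init(&net->flow_cache_genid, 0);
}

/* Record the namespace's current generation in the entry. */
static void mini_entry_fill(struct mini_entry *e, struct mini_netns *net)
{
	assert(net != NULL);
	e->net = net;
	e->genid = atomic_load(&net->flow_cache_genid);
}

/* An entry is valid only while its namespace's generation is unchanged. */
static int mini_entry_valid(const struct mini_entry *e)
{
	return atomic_load(&e->net->flow_cache_genid) == e->genid;
}

/* A policy insert bumps the generation of one namespace only. */
static void mini_policy_insert(struct mini_netns *net)
{
	atomic_fetch_add(&net->flow_cache_genid, 1);
}
```

With a single global counter, a policy insert in one namespace would
invalidate the entries of every other namespace too; with one counter per
namespace, only the entries of the namespace that changed go stale.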


* [PATCHv2 net-next 1/4] flowcache: Namespacify flowcache global parameters with xfrm
From: Fan Du @ 2014-01-14  1:39 UTC (permalink / raw)
  To: steffen.klassert; +Cc: davem, netdev
In-Reply-To: <1389663588-29678-1-git-send-email-fan.du@windriver.com>

Since the flow cache is tightly coupled with IPsec, it is easiest to
put the flow cache global parameters into the xfrm part of the
namespace.

Signed-off-by: Fan Du <fan.du@windriver.com>
---
 include/net/netns/xfrm.h |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/net/netns/xfrm.h b/include/net/netns/xfrm.h
index 1006a26..52d0086 100644
--- a/include/net/netns/xfrm.h
+++ b/include/net/netns/xfrm.h
@@ -6,6 +6,7 @@
 #include <linux/workqueue.h>
 #include <linux/xfrm.h>
 #include <net/dst_ops.h>
+#include <net/flowcache.h>
 
 struct ctl_table_header;
 
@@ -61,6 +62,16 @@ struct netns_xfrm {
 	spinlock_t xfrm_policy_sk_bundle_lock;
 	rwlock_t xfrm_policy_lock;
 	struct mutex xfrm_cfg_mutex;
+
+	/* flow cache part */
+	struct flow_cache	flow_cache_global;
+	struct kmem_cache	*flow_cachep;
+	atomic_t		flow_cache_genid;
+	struct list_head	flow_cache_gc_list;
+	spinlock_t		flow_cache_gc_lock;
+	struct work_struct	flow_cache_gc_work;
+	struct work_struct	flow_cache_flush_work;
+	struct mutex		flow_flush_sem;
 };
 
 #endif
-- 
1.7.9.5


* [PATCHv2 net-next 2/4] flowcache: Make flowcache entry inserting/flushing in per-net style
From: Fan Du @ 2014-01-14  1:39 UTC (permalink / raw)
  To: steffen.klassert; +Cc: davem, netdev
In-Reply-To: <1389663588-29678-1-git-send-email-fan.du@windriver.com>

Inserting an entry into the flow cache, or flushing the flow cache, should
be done per net namespace. The reason is that, in the original
implementation, a flush triggered from a fat netns crammed with flow entries
also wipes the entries of a slim netns that holds only a few.

Signed-off-by: Fan Du <fan.du@windriver.com>
---
 include/net/flow.h      |    5 +-
 include/net/flowcache.h |   25 ++++++++++
 net/core/flow.c         |  127 +++++++++++++++++++++--------------------------
 3 files changed, 85 insertions(+), 72 deletions(-)
 create mode 100644 include/net/flowcache.h

diff --git a/include/net/flow.h b/include/net/flow.h
index d23e7fa..bee3741 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -218,9 +218,10 @@ struct flow_cache_object *flow_cache_lookup(struct net *net,
 					    const struct flowi *key, u16 family,
 					    u8 dir, flow_resolve_t resolver,
 					    void *ctx);
+int flow_cache_init(struct net *net);
 
-void flow_cache_flush(void);
-void flow_cache_flush_deferred(void);
+void flow_cache_flush(struct net *net);
+void flow_cache_flush_deferred(struct net *net);
 extern atomic_t flow_cache_genid;
 
 #endif
diff --git a/include/net/flowcache.h b/include/net/flowcache.h
new file mode 100644
index 0000000..c8f665e
--- /dev/null
+++ b/include/net/flowcache.h
@@ -0,0 +1,25 @@
+#ifndef _NET_FLOWCACHE_H
+#define _NET_FLOWCACHE_H
+
+#include <linux/interrupt.h>
+#include <linux/types.h>
+#include <linux/timer.h>
+#include <linux/notifier.h>
+
+struct flow_cache_percpu {
+	struct hlist_head		*hash_table;
+	int				hash_count;
+	u32				hash_rnd;
+	int				hash_rnd_recalc;
+	struct tasklet_struct		flush_tasklet;
+};
+
+struct flow_cache {
+	u32				hash_shift;
+	struct flow_cache_percpu __percpu *percpu;
+	struct notifier_block		hotcpu_notifier;
+	int				low_watermark;
+	int				high_watermark;
+	struct timer_list		rnd_timer;
+};
+#endif	/* _NET_FLOWCACHE_H */
diff --git a/net/core/flow.c b/net/core/flow.c
index dfa602c..344a184 100644
--- a/net/core/flow.c
+++ b/net/core/flow.c
@@ -24,6 +24,7 @@
 #include <net/flow.h>
 #include <linux/atomic.h>
 #include <linux/security.h>
+#include <net/net_namespace.h>
 
 struct flow_cache_entry {
 	union {
@@ -38,37 +39,12 @@ struct flow_cache_entry {
 	struct flow_cache_object	*object;
 };
 
-struct flow_cache_percpu {
-	struct hlist_head		*hash_table;
-	int				hash_count;
-	u32				hash_rnd;
-	int				hash_rnd_recalc;
-	struct tasklet_struct		flush_tasklet;
-};
-
 struct flow_flush_info {
 	struct flow_cache		*cache;
 	atomic_t			cpuleft;
 	struct completion		completion;
 };
 
-struct flow_cache {
-	u32				hash_shift;
-	struct flow_cache_percpu __percpu *percpu;
-	struct notifier_block		hotcpu_notifier;
-	int				low_watermark;
-	int				high_watermark;
-	struct timer_list		rnd_timer;
-};
-
-atomic_t flow_cache_genid = ATOMIC_INIT(0);
-EXPORT_SYMBOL(flow_cache_genid);
-static struct flow_cache flow_cache_global;
-static struct kmem_cache *flow_cachep __read_mostly;
-
-static DEFINE_SPINLOCK(flow_cache_gc_lock);
-static LIST_HEAD(flow_cache_gc_list);
-
 #define flow_cache_hash_size(cache)	(1 << (cache)->hash_shift)
 #define FLOW_HASH_RND_PERIOD		(10 * 60 * HZ)
 
@@ -84,46 +60,50 @@ static void flow_cache_new_hashrnd(unsigned long arg)
 	add_timer(&fc->rnd_timer);
 }
 
-static int flow_entry_valid(struct flow_cache_entry *fle)
+static int flow_entry_valid(struct flow_cache_entry *fle,
+				struct netns_xfrm *xfrm)
 {
-	if (atomic_read(&flow_cache_genid) != fle->genid)
+	if (atomic_read(&xfrm->flow_cache_genid) != fle->genid)
 		return 0;
 	if (fle->object && !fle->object->ops->check(fle->object))
 		return 0;
 	return 1;
 }
 
-static void flow_entry_kill(struct flow_cache_entry *fle)
+static void flow_entry_kill(struct flow_cache_entry *fle,
+				struct netns_xfrm *xfrm)
 {
 	if (fle->object)
 		fle->object->ops->delete(fle->object);
-	kmem_cache_free(flow_cachep, fle);
+	kmem_cache_free(xfrm->flow_cachep, fle);
 }
 
 static void flow_cache_gc_task(struct work_struct *work)
 {
 	struct list_head gc_list;
 	struct flow_cache_entry *fce, *n;
+	struct netns_xfrm *xfrm = container_of(work, struct netns_xfrm,
+						flow_cache_gc_work);
 
 	INIT_LIST_HEAD(&gc_list);
-	spin_lock_bh(&flow_cache_gc_lock);
-	list_splice_tail_init(&flow_cache_gc_list, &gc_list);
-	spin_unlock_bh(&flow_cache_gc_lock);
+	spin_lock_bh(&xfrm->flow_cache_gc_lock);
+	list_splice_tail_init(&xfrm->flow_cache_gc_list, &gc_list);
+	spin_unlock_bh(&xfrm->flow_cache_gc_lock);
 
 	list_for_each_entry_safe(fce, n, &gc_list, u.gc_list)
-		flow_entry_kill(fce);
+		flow_entry_kill(fce, xfrm);
 }
-static DECLARE_WORK(flow_cache_gc_work, flow_cache_gc_task);
 
 static void flow_cache_queue_garbage(struct flow_cache_percpu *fcp,
-				     int deleted, struct list_head *gc_list)
+				     int deleted, struct list_head *gc_list,
+				     struct netns_xfrm *xfrm)
 {
 	if (deleted) {
 		fcp->hash_count -= deleted;
-		spin_lock_bh(&flow_cache_gc_lock);
-		list_splice_tail(gc_list, &flow_cache_gc_list);
-		spin_unlock_bh(&flow_cache_gc_lock);
-		schedule_work(&flow_cache_gc_work);
+		spin_lock_bh(&xfrm->flow_cache_gc_lock);
+		list_splice_tail(gc_list, &xfrm->flow_cache_gc_list);
+		spin_unlock_bh(&xfrm->flow_cache_gc_lock);
+		schedule_work(&xfrm->flow_cache_gc_work);
 	}
 }
 
@@ -135,6 +115,8 @@ static void __flow_cache_shrink(struct flow_cache *fc,
 	struct hlist_node *tmp;
 	LIST_HEAD(gc_list);
 	int i, deleted = 0;
+	struct netns_xfrm *xfrm = container_of(fc, struct netns_xfrm,
+						flow_cache_global);
 
 	for (i = 0; i < flow_cache_hash_size(fc); i++) {
 		int saved = 0;
@@ -142,7 +124,7 @@ static void __flow_cache_shrink(struct flow_cache *fc,
 		hlist_for_each_entry_safe(fle, tmp,
 					  &fcp->hash_table[i], u.hlist) {
 			if (saved < shrink_to &&
-			    flow_entry_valid(fle)) {
+			    flow_entry_valid(fle, xfrm)) {
 				saved++;
 			} else {
 				deleted++;
@@ -152,7 +134,7 @@ static void __flow_cache_shrink(struct flow_cache *fc,
 		}
 	}
 
-	flow_cache_queue_garbage(fcp, deleted, &gc_list);
+	flow_cache_queue_garbage(fcp, deleted, &gc_list, xfrm);
 }
 
 static void flow_cache_shrink(struct flow_cache *fc,
@@ -208,7 +190,7 @@ struct flow_cache_object *
 flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir,
 		  flow_resolve_t resolver, void *ctx)
 {
-	struct flow_cache *fc = &flow_cache_global;
+	struct flow_cache *fc = &net->xfrm.flow_cache_global;
 	struct flow_cache_percpu *fcp;
 	struct flow_cache_entry *fle, *tfle;
 	struct flow_cache_object *flo;
@@ -248,7 +230,7 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir,
 		if (fcp->hash_count > fc->high_watermark)
 			flow_cache_shrink(fc, fcp);
 
-		fle = kmem_cache_alloc(flow_cachep, GFP_ATOMIC);
+		fle = kmem_cache_alloc(net->xfrm.flow_cachep, GFP_ATOMIC);
 		if (fle) {
 			fle->net = net;
 			fle->family = family;
@@ -258,7 +240,7 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir,
 			hlist_add_head(&fle->u.hlist, &fcp->hash_table[hash]);
 			fcp->hash_count++;
 		}
-	} else if (likely(fle->genid == atomic_read(&flow_cache_genid))) {
+	} else if (likely(fle->genid == atomic_read(&net->xfrm.flow_cache_genid))) {
 		flo = fle->object;
 		if (!flo)
 			goto ret_object;
@@ -279,7 +261,7 @@ nocache:
 	}
 	flo = resolver(net, key, family, dir, flo, ctx);
 	if (fle) {
-		fle->genid = atomic_read(&flow_cache_genid);
+		fle->genid = atomic_read(&net->xfrm.flow_cache_genid);
 		if (!IS_ERR(flo))
 			fle->object = flo;
 		else
@@ -303,12 +285,14 @@ static void flow_cache_flush_tasklet(unsigned long data)
 	struct hlist_node *tmp;
 	LIST_HEAD(gc_list);
 	int i, deleted = 0;
+	struct netns_xfrm *xfrm = container_of(fc, struct netns_xfrm,
+						flow_cache_global);
 
 	fcp = this_cpu_ptr(fc->percpu);
 	for (i = 0; i < flow_cache_hash_size(fc); i++) {
 		hlist_for_each_entry_safe(fle, tmp,
 					  &fcp->hash_table[i], u.hlist) {
-			if (flow_entry_valid(fle))
+			if (flow_entry_valid(fle, xfrm))
 				continue;
 
 			deleted++;
@@ -317,7 +301,7 @@ static void flow_cache_flush_tasklet(unsigned long data)
 		}
 	}
 
-	flow_cache_queue_garbage(fcp, deleted, &gc_list);
+	flow_cache_queue_garbage(fcp, deleted, &gc_list, xfrm);
 
 	if (atomic_dec_and_test(&info->cpuleft))
 		complete(&info->completion);
@@ -351,10 +335,9 @@ static void flow_cache_flush_per_cpu(void *data)
 	tasklet_schedule(tasklet);
 }
 
-void flow_cache_flush(void)
+void flow_cache_flush(struct net *net)
 {
 	struct flow_flush_info info;
-	static DEFINE_MUTEX(flow_flush_sem);
 	cpumask_var_t mask;
 	int i, self;
 
@@ -365,8 +348,8 @@ void flow_cache_flush(void)
 
 	/* Don't want cpus going down or up during this. */
 	get_online_cpus();
-	mutex_lock(&flow_flush_sem);
-	info.cache = &flow_cache_global;
+	mutex_lock(&net->xfrm.flow_flush_sem);
+	info.cache = &net->xfrm.flow_cache_global;
 	for_each_online_cpu(i)
 		if (!flow_cache_percpu_empty(info.cache, i))
 			cpumask_set_cpu(i, mask);
@@ -386,21 +369,23 @@ void flow_cache_flush(void)
 	wait_for_completion(&info.completion);
 
 done:
-	mutex_unlock(&flow_flush_sem);
+	mutex_unlock(&net->xfrm.flow_flush_sem);
 	put_online_cpus();
 	free_cpumask_var(mask);
 }
 
 static void flow_cache_flush_task(struct work_struct *work)
 {
-	flow_cache_flush();
-}
+	struct netns_xfrm *xfrm = container_of(work, struct netns_xfrm,
+						flow_cache_flush_work);
+	struct net *net = container_of(xfrm, struct net, xfrm);
 
-static DECLARE_WORK(flow_cache_flush_work, flow_cache_flush_task);
+	flow_cache_flush(net);
+}
 
-void flow_cache_flush_deferred(void)
+void flow_cache_flush_deferred(struct net *net)
 {
-	schedule_work(&flow_cache_flush_work);
+	schedule_work(&net->xfrm.flow_cache_flush_work);
 }
 
 static int flow_cache_cpu_prepare(struct flow_cache *fc, int cpu)
@@ -425,7 +410,8 @@ static int flow_cache_cpu(struct notifier_block *nfb,
 			  unsigned long action,
 			  void *hcpu)
 {
-	struct flow_cache *fc = container_of(nfb, struct flow_cache, hotcpu_notifier);
+	struct flow_cache *fc = container_of(nfb, struct flow_cache,
+						hotcpu_notifier);
 	int res, cpu = (unsigned long) hcpu;
 	struct flow_cache_percpu *fcp = per_cpu_ptr(fc->percpu, cpu);
 
@@ -444,9 +430,20 @@ static int flow_cache_cpu(struct notifier_block *nfb,
 	return NOTIFY_OK;
 }
 
-static int __init flow_cache_init(struct flow_cache *fc)
+int flow_cache_init(struct net *net)
 {
 	int i;
+	struct flow_cache *fc = &net->xfrm.flow_cache_global;
+
+	/* Initialize per-net flow cache global variables here */
+	net->xfrm.flow_cachep = kmem_cache_create("flow_cache",
+					sizeof(struct flow_cache_entry),
+					0, SLAB_PANIC, NULL);
+	spin_lock_init(&net->xfrm.flow_cache_gc_lock);
+	INIT_LIST_HEAD(&net->xfrm.flow_cache_gc_list);
+	INIT_WORK(&net->xfrm.flow_cache_gc_work, flow_cache_gc_task);
+	INIT_WORK(&net->xfrm.flow_cache_flush_work, flow_cache_flush_task);
+	mutex_init(&net->xfrm.flow_flush_sem);
 
 	fc->hash_shift = 10;
 	fc->low_watermark = 2 * flow_cache_hash_size(fc);
@@ -484,14 +481,4 @@ err:
 
 	return -ENOMEM;
 }
-
-static int __init flow_cache_init_global(void)
-{
-	flow_cachep = kmem_cache_create("flow_cache",
-					sizeof(struct flow_cache_entry),
-					0, SLAB_PANIC, NULL);
-
-	return flow_cache_init(&flow_cache_global);
-}
-
-module_init(flow_cache_init_global);
+EXPORT_SYMBOL(flow_cache_init);
-- 
1.7.9.5
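
Editorial sketch, not part of the patch: the work handlers in 2/4 recover the
owning `struct netns_xfrm` from the `struct work_struct` pointer with
`container_of()`. That only yields the right address when the member named in
`container_of()` is the one that was actually queued. The userspace
illustration below uses hypothetical names (`sketch_container_of`,
`struct sketch_work`, `struct sketch_netns_xfrm`) and a plain `offsetof()`
based macro instead of the kernel one.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of container_of(), without the kernel's type check. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct work_struct. */
struct sketch_work {
	int pending;
};

/* Hypothetical stand-in for struct netns_xfrm with two embedded work items. */
struct sketch_netns_xfrm {
	int id;
	struct sketch_work gc_work;
	struct sketch_work flush_work;
};

/* Handler side for the flush work item: name the flush member. */
static struct sketch_netns_xfrm *from_flush_work(struct sketch_work *work)
{
	assert(work != NULL);
	return sketch_container_of(work, struct sketch_netns_xfrm, flush_work);
}

/* Handler side for the gc work item: name the gc member. */
static struct sketch_netns_xfrm *from_gc_work(struct sketch_work *work)
{
	assert(work != NULL);
	return sketch_container_of(work, struct sketch_netns_xfrm, gc_work);
}
```

Passing a pointer to `flush_work` into `from_gc_work()` returns an address
that is offset from the real structure by the distance between the two
members, so any field read through it would be wrong.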


* [PATCHv2 net-next 3/4] flowcache: Fixup flow cache part in xfrm policy
From: Fan Du @ 2014-01-14  1:39 UTC (permalink / raw)
  To: steffen.klassert; +Cc: davem, netdev
In-Reply-To: <1389663588-29678-1-git-send-email-fan.du@windriver.com>

Bumping the flow cache genid and flushing the flow cache should also be
done in per-net style.

Signed-off-by: Fan Du <fan.du@windriver.com>
---
 net/xfrm/xfrm_policy.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index e205c4b..d39c90f 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -661,7 +661,7 @@ int xfrm_policy_insert(int dir, struct xfrm_policy *policy, int excl)
 		hlist_add_head(&policy->bydst, chain);
 	xfrm_pol_hold(policy);
 	net->xfrm.policy_count[dir]++;
-	atomic_inc(&flow_cache_genid);
+	atomic_inc(&net->xfrm.flow_cache_genid);
 
 	/* After previous checking, family can either be AF_INET or AF_INET6 */
 	if (policy->family == AF_INET)
@@ -2567,14 +2567,14 @@ static void __xfrm_garbage_collect(struct net *net)
 
 void xfrm_garbage_collect(struct net *net)
 {
-	flow_cache_flush();
+	flow_cache_flush(net);
 	__xfrm_garbage_collect(net);
 }
 EXPORT_SYMBOL(xfrm_garbage_collect);
 
 static void xfrm_garbage_collect_deferred(struct net *net)
 {
-	flow_cache_flush_deferred();
+	flow_cache_flush_deferred(net);
 	__xfrm_garbage_collect(net);
 }
 
@@ -2947,6 +2947,7 @@ static int __net_init xfrm_net_init(struct net *net)
 	spin_lock_init(&net->xfrm.xfrm_policy_sk_bundle_lock);
 	mutex_init(&net->xfrm.xfrm_cfg_mutex);
 
+	flow_cache_init(net);
 	return 0;
 
 out_sysctl:
-- 
1.7.9.5


* [PATCHv2 net-next 4/4] flowcache: Bring net/core/flow.c under IPsec maintain scope
From: Fan Du @ 2014-01-14  1:39 UTC (permalink / raw)
  To: steffen.klassert; +Cc: davem, netdev
In-Reply-To: <1389663588-29678-1-git-send-email-fan.du@windriver.com>

The flow cache is mainly manipulated from IPsec, so list net/core/flow.c
under the IPsec maintainer entry.

Signed-off-by: Fan Du <fan.du@windriver.com>
---
 MAINTAINERS |    1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e11d495..14ad385 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5916,6 +5916,7 @@ L:	netdev@vger.kernel.org
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git
 S:	Maintained
+F:	net/core/flow.c
 F:	net/xfrm/
 F:	net/key/
 F:	net/ipv4/xfrm*
-- 
1.7.9.5


* Re: [PATCH net 0/2] bonding: fix the sysfs warning when change the master's name
From: Ding Tianhong @ 2014-01-14  1:49 UTC (permalink / raw)
  To: Veaceslav Falico; +Cc: Jay Vosburgh, David S. Miller, Netdev, Eric Dumazet
In-Reply-To: <20140113172446.GA2081@redhat.com>

On 2014/1/14 1:24, Veaceslav Falico wrote:
> On Mon, Jan 13, 2014 at 09:08:06PM +0800, Ding Tianhong wrote:
>> When I change the master's name, and then rebuild the master and enslave a nic again,
>> then I get the calltrace:
>>
>> [329215.749344] WARNING: CPU: 0 PID: 4778 at fs/sysfs/dir.c:486 sysfs_warn_dup+0x87/0xa0()
>> [329215.749347] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:03.0/0000:02:00.0/net/eth100/upper_bond0'
> ...snip...
>> [329215.749494]  [<ffffffff81205b27>] sysfs_warn_dup+0x87/0xa0
>> [329215.749500]  [<ffffffff81205eed>] sysfs_add_one+0x4d/0x50
>> [329215.749505]  [<ffffffff81206f9e>] sysfs_do_create_link_sd+0xbe/0x210
>> [329215.749511]  [<ffffffff812951a0>] ? sprintf+0x40/0x50
>> [329215.749516]  [<ffffffff8120714b>] sysfs_create_link+0x2b/0x30
>> [329215.749523]  [<ffffffff8140a708>] __netdev_adjacent_dev_insert+0x1b8/0x270
>> [329215.749528]  [<ffffffff8140a7f8>] __netdev_adjacent_dev_link_lists+0x38/0x90
>> [329215.749533]  [<ffffffff8140a98b>] __netdev_upper_dev_link+0x13b/0x470
>> [329215.749538]  [<ffffffff8141319c>] ? __ethtool_get_settings+0x5c/0x90
>> [329215.749547]  [<ffffffffa0722179>] ? bond_update_speed_duplex+0x29/0x70 [bonding]
>> [329215.749552]  [<ffffffff8140acd1>] netdev_master_upper_dev_link_private+0x11/0x20
>> [329215.749561]  [<ffffffffa0729246>] bond_enslave+0x806/0xe40 [bonding]
>> [329215.749570]  [<ffffffffa073241f>] bonding_store_slaves+0x18f/0x1c0 [bonding]
>> [329215.749576]  [<ffffffff813757ab>] dev_attr_store+0x1b/0x20
>> [329215.749581]  [<ffffffff812049cc>] sysfs_write_file+0x15c/0x1f0
>> [329215.749587]  [<ffffffff81188897>] vfs_write+0xc7/0x1e0
> 
> It's unrelated to bonding, as it touches any device that uses netdev_adjacent
> logic.

Yes, it is a problem for every device that uses netdev_adjacent.
> 
> This case (renaming stale sysfs links) should be properly handled in
> dev_change_name().
> 

Ok, I will try and fix it in dev_change_name().

> .
> 


* Re: [PATCH net 2/2] bonding: rename the dev upper link if the master's, name changed
From: Ding Tianhong @ 2014-01-14  1:49 UTC (permalink / raw)
  To: Sergei Shtylyov, Jay Vosburgh, Veaceslav Falico, Eric Dumazet,
	David S. Miller, Netdev
In-Reply-To: <52D42842.5010709@cogentembedded.com>

On 2014/1/14 1:54, Sergei Shtylyov wrote:
> On 13-01-2014 17:08, Ding Tianhong wrote:
> 
>> The bond_maste_rename() will rename the links for slave dev's upper dev link,
> 
>    s/maste/master/.
> 
>> if faild, it will rollback and rename the new name to old name for slave dev.
> 
>    s/faild/failed/.
> 
>> Add a new parameter called name to save the old bonding name in struct bonding.
> 
>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>> ---
>>   drivers/net/bonding/bond_main.c | 35 +++++++++++++++++++++++++++++++++++
>>   drivers/net/bonding/bonding.h   |  1 +
>>   2 files changed, 36 insertions(+)
> 
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 4b8c58b..8c044c0 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -2799,11 +2799,41 @@ re_arm:
>>
>>   /*-------------------------- netdev event handling --------------------------*/
>>
>> +static int bond_master_rename(struct bonding *bond)
>> +{
>> +    struct slave *slave;
>> +    struct list_head *iter;
>> +    char ori_linkname[IFNAMSIZ + 7], new_linkname[IFNAMSIZ + 7];
> 
>    Perhaps s/ori/old/ is better?
> 
>> +    int err = 0;
>> +
>> +    sprintf(ori_linkname, "upper_%s", bond->name);
>> +    sprintf(new_linkname, "upper_%s", bond->dev->name);
>> +
>> +    bond_for_each_slave(bond, slave, iter) {
>> +
> 
>    No need for this empty line, I think.
> 
>> +        err = netdev_upper_dev_rename(slave->dev, bond->dev, ori_linkname,
>> +                    new_linkname);
> 
>    The continuation line should start right under 'slave' on the broken up line.
> 
> WBR, Sergei
> 
> 
Yes, thanks.

Regards
Ding

> 


* Re: [PATCH net 0/2] bonding: ensure that the TSO being set on bond master
From: Ding Tianhong @ 2014-01-14  2:06 UTC (permalink / raw)
  To: David Miller; +Cc: fubar, vfalico, edumazet, netdev
In-Reply-To: <20140113.111511.1204304092822372663.davem@davemloft.net>

On 2014/1/14 3:15, David Miller wrote:
> From: Ding Tianhong <dingtianhong@huawei.com>
> Date: Wed, 8 Jan 2014 15:28:21 +0800
> 
>> The commit b0ce3508(bonding: allow TSO being set on bonding master)
>> has make the TSO being set for bond dev, but in some situation, if
>> the slave did not have the NETIF_F_SG features, the bond master will
>> miss the TSO features in netdev_fix_features because the TSO is
>> depended on SG. So I have to add SG and TSO features on bond master
>> together.
>>
>> The function netdev_add_tso_features() was only be used for bonding,
>> so no need to export it in netdevice.h, remove it and add it to bonding.
> 
> As far as I can tell from the discussion, there is some issue wrt. TSO
> about what happens if SG is not supported by some of the slaves.
> 
> From my perspective it appears that some changes to these patches are
> necessary to handle that correctly.
> 
> So I am going to mark them as "Changes Requested" in patchwork.
> 
> If this is not the case, please resubmit these changes with appropriate
> explanations added to the commit message(s).
> 
> Thanks.
> 
> 

Ok, thanks.

Regards
Ding


* Re: [PATCH net] bonding: reset the slave's mtu when its be changed
From: Ding Tianhong @ 2014-01-14  2:11 UTC (permalink / raw)
  To: Veaceslav Falico; +Cc: Jay Vosburgh, Netdev, David S. Miller
In-Reply-To: <52D225A2.3070208@huawei.com>

On 2014/1/12 13:18, Ding Tianhong wrote:
> On 2014/1/10 20:19, Veaceslav Falico wrote:
>> On Fri, Jan 10, 2014 at 07:32:51PM +0800, Ding Tianhong wrote:
>>> All slave should have the same mtu with mastet's, and the bond do it when
>>> enslave the slave, but the user could change the slave's mtu, it will cause
>>> the master and slave have different mtu, althrough in AB mode, it does not
>>> matter if the slave is not the current slave, but in other mode, it is incorrect,
>>> so reset the slave's mtu like the master set.
>>
>> Why "net"? It's not a bugfix, it's a feature, and really discussable.
>>
>> Also, wrt the actual change - why do you think it's incorrect for slaves in
>> bonding mode other than AB to have different MTU values? I don't see any
>> reason for it, from the top of the head.
>>
> 
> Ok, I will test more situation for every mode when slave's mtu changed, I am not sure
> what will happened yet, if some links was interrupt, I thinks it is a bug. 
> 
>>>

I have tested several bonding modes with the slave's MTU changed:

Mode            Slave MTU       Result
RR (0)          0<mtu<1500      ok
AB (1)          0<mtu<1500      packet loss
XOR (2)         0<mtu<1500      ok
Broadcast (3)   0<mtu<1500      ok
LACP            0<mtu<1500      packet loss


So I think we should not allow the MTU to be set on a slave.


* Re: [PATCH net-next 1/2] net: vxlan: when lower dev unregisters remove vxlan dev as well
From: Stephen Hemminger @ 2014-01-14  2:22 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, netdev
In-Reply-To: <1389634880-4138-2-git-send-email-dborkman@redhat.com>

On Mon, 13 Jan 2014 18:41:19 +0100
Daniel Borkmann <dborkman@redhat.com> wrote:

> We can create a vxlan device with an explicit underlying carrier.
> In that case, when the carrier link is being deleted from the
> system (e.g. due to module unload) we should also clean up all
> created vxlan devices on top of it since otherwise we're in an
> inconsistent state in vxlan device. In that case, the user needs
> to remove all such devices, while in case of other virtual devs
> that sit on top of physical ones, it is usually the case that
> these devices do unregister automatically as well and do not
> leave the burden on the user.
> 
> This work is not necessary when vxlan device was not created with
> a real underlying device, as connections can resume in that case
> when driver is plugged again. But at least for the other cases,
> we should go ahead and do the cleanup on removal.
> 
> We don't register the notifier during vxlan_newlink() here since
> I consider this event rather rare, and therefore we should not
> bloat vxlan's core structure unecessary. Also, we can simply make
> use of unregister_netdevice_many() to batch that. fdb is flushed
> upon ndo_stop().
> 
> E.g. `ip -d link show vxlan13` after carrier removal before
> this patch:
> 
> 5: vxlan13: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default
>     link/ether 1e:47:da:6d:4d:99 brd ff:ff:ff:ff:ff:ff promiscuity 0
>     vxlan id 13 group 239.0.0.10 dev 2 port 32768 61000 ageing 300
>                                  ^^^^^
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>

Since vxlan runs over a UDP socket, I wonder if this could be
done better by implementing something equivalent to SO_BINDTODEVICE.

What happens to a userland application which has a UDP socket, has
done SO_BINDTODEVICE, and then the device is removed? Is there an asynchronous
error? Can the application recover? Why can't vxlan use the same mechanism?


* Re: [PATCH 09/15] net: ixgbe calls skb_set_hash
From: Brown, Aaron F @ 2014-01-14  2:26 UTC (permalink / raw)
  To: Kirsher, Jeffrey T, therbert@google.com, netdev@vger.kernel.org,
	davem@davemloft.net
In-Reply-To: <CAL3LdT7HC0EcYdbTxYe6OEy_cy6EbL1udafhwJA1VWCFv5gG1w@mail.gmail.com>

On Wed, 2013-12-18 at 00:41 -0800, Jeff Kirsher wrote:
> On Tue, Dec 17, 2013 at 11:28 PM, Tom Herbert <therbert@google.com> wrote:
> > Drivers should call skb_set_hash to set the hash and its type
> > in an skbuff.
> >
> > Signed-off-by: Tom Herbert <therbert@google.com>
> > ---
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> 
> I have added this patch to my queue, thanks Tom.

Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>


* [PATCH net-next 0/3] bonding: fix primary problem for bonding
From: Ding Tianhong @ 2014-01-14  2:36 UTC (permalink / raw)
  To: Jay Vosburgh, Veaceslav Falico, Netdev, David S. Miller

If a slave's name changes while the bond's primary parameter is set,
the bond should handle it in one of two ways:

1) If the slave is currently the primary slave, clear the primary slave
   and reselect the active slave.
2) If the slave's new name matches the bond's primary parameter, set the
   slave as the primary slave and reselect the active slave.

If a newly set primary does not match any slave in the bond, the bond should
record it in params, clear the primary slave and select a new active slave.

Update bonding.txt for the primary description.
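
The rename handling described above can be modelled in user space as
follows. This is only a sketch of the decision logic: the model_* types
and names are invented for illustration, the USES_PRIMARY() mode check
and the curr_slave_lock locking are left out, and "reselect the active
slave" is reduced to a counter.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_IFNAMSIZ 16

/* Hypothetical stand-ins for struct slave and struct bonding. */
struct model_slave {
	char name[MODEL_IFNAMSIZ];
};

struct model_bond {
	char primary[MODEL_IFNAMSIZ];            /* configured primary name */
	const struct model_slave *primary_slave; /* current primary, or NULL */
	int reselects;                           /* active-slave reselections */
};

enum rename_action { RENAME_NOOP, RENAME_CLEARED, RENAME_PROMOTED };

/* Called after 'slave' has already been given its new name. */
static enum rename_action model_slave_renamed(struct model_bond *bond,
					       const struct model_slave *slave)
{
	/* Nothing to do unless a primary name is configured. */
	if (!bond->primary[0])
		return RENAME_NOOP;

	if (bond->primary_slave == slave) {
		/* Case 1: the primary slave was renamed, so drop it. */
		bond->primary_slave = NULL;
		bond->reselects++;
		return RENAME_CLEARED;
	}

	if (!bond->primary_slave && !strcmp(bond->primary, slave->name)) {
		/* Case 2: the new name matches the configured primary. */
		bond->primary_slave = slave;
		bond->reselects++;
		return RENAME_PROMOTED;
	}

	return RENAME_NOOP;
}
```

In the model, renaming the current primary always clears it, and a slave
is promoted only when no primary slave is set at that moment.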

Ding Tianhong (3):
  bonding: update the primary slave when slave's name changed
  bonding: clean the primary slave if there is no slave matching new
    primary
  bonding: update bonding.txt for primary description.

 Documentation/networking/bonding.txt |  3 ++-
 drivers/net/bonding/bond_main.c      | 30 ++++++++++++++++++++++++++++--
 drivers/net/bonding/bond_options.c   |  6 ++++++
 3 files changed, 36 insertions(+), 3 deletions(-)

-- 
1.8.0


* [PATCH net-next 1/3] bonding: update the primary slave when slave's name changed
From: Ding Tianhong @ 2014-01-14  2:36 UTC (permalink / raw)
  To: Jay Vosburgh, Veaceslav Falico, David S. Miller, Netdev

If a slave's name changes while the bond's primary parameter is set,
the bond should handle it in one of two ways:

1) If the slave is currently the primary slave, clear the primary slave
   and reselect the active slave.
2) If the slave's new name matches the bond's primary parameter, set the
   slave as the primary slave and reselect the active slave.

Thanks to Veaceslav for the suggestion.

Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/bonding/bond_main.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index e06c445..63d6533 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2860,9 +2860,35 @@ static int bond_slave_netdev_event(unsigned long event,
 		 */
 		break;
 	case NETDEV_CHANGENAME:
-		/*
-		 * TODO: handle changing the primary's name
+		/* Handle changing the slave's name:
+		 * 1) If the slave is the primary slave,
+		 * clean the primary slave and reselect
+		 * active slave.
+		 * 2) If the slave's new name is bond
+		 * primary, set the slave as primary
+		 * slave and reselect active slave.
 		 */
+		if (USES_PRIMARY(bond->params.mode) &&
+		    bond->params.primary[0]) {
+			if (bond->primary_slave &&
+			    slave == bond->primary_slave) {
+				pr_info("%s: Setting primary slave to None.\n",
+					bond->dev->name);
+				bond->primary_slave = NULL;
+				write_lock_bh(&bond->curr_slave_lock);
+				bond_select_active_slave(bond);
+				write_unlock_bh(&bond->curr_slave_lock);
+			} else if (!bond->primary_slave &&
+				   !strcmp(bond->params.primary,
+					  slave_dev->name)) {
+				pr_info("%s: Setting %s as primary slave.\n",
+					bond->dev->name, slave_dev->name);
+				bond->primary_slave = slave;
+				write_lock_bh(&bond->curr_slave_lock);
+				bond_select_active_slave(bond);
+				write_unlock_bh(&bond->curr_slave_lock);
+			}
+		}
 		break;
 	case NETDEV_FEAT_CHANGE:
 		bond_compute_features(bond);
-- 
1.8.0


* [PATCH net-next 2/3] bonding: clean the primary slave if there is no slave matching new primary
From: Ding Tianhong @ 2014-01-14  2:37 UTC (permalink / raw)
  To: Jay Vosburgh, Veaceslav Falico, David S. Miller, Netdev

If the new primary does not match any slave in the bond, the bond should
record it in params, clear the primary slave and select a new active slave.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/bonding/bond_options.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 945a666..0ee0bfe 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -512,6 +512,12 @@ int bond_option_primary_set(struct bonding *bond, const char *primary)
 		}
 	}
 
+	if (bond->primary_slave) {
+		pr_info("%s: Setting primary slave to None.\n",
+			bond->dev->name);
+		bond->primary_slave = NULL;
+		bond_select_active_slave(bond);
+	}
 	strncpy(bond->params.primary, primary, IFNAMSIZ);
 	bond->params.primary[IFNAMSIZ - 1] = 0;
 
-- 
1.8.0


* [PATCH net-next 3/3] bonding: update bonding.txt for primary description
From: Ding Tianhong @ 2014-01-14  2:37 UTC (permalink / raw)
  To: Jay Vosburgh, Veaceslav Falico, David S. Miller, Netdev

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 Documentation/networking/bonding.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index a4d925e..5cdb229 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -657,7 +657,8 @@ primary
 	one slave is preferred over another, e.g., when one slave has
 	higher throughput than another.
 
-	The primary option is only valid for active-backup mode.
+	The primary option is only valid for active-backup (1),
+	balance-tlb (5) and balance-alb (6) modes.
 
 primary_reselect
 
-- 
1.8.0


* Re: [PATCH v3 4/4] dma debug: introduce debug_dma_assert_idle()
From: Dan Williams @ 2014-01-14  2:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: dmaengine@vger.kernel.org, Vinod Koul, Netdev, Joerg Roedel,
	linux-kernel@vger.kernel.org, James Bottomley, Russell King
In-Reply-To: <20140113171412.dd90c020b103f4a686f8dc34@linux-foundation.org>

On Mon, Jan 13, 2014 at 5:14 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Mon, 13 Jan 2014 16:48:47 -0800 Dan Williams <dan.j.williams@intel.com> wrote:
>
>> Record actively mapped pages and provide an api for asserting a given
>> page is dma inactive before execution proceeds.  Placing
>> debug_dma_assert_idle() in cow_user_page() flagged the violation of the
>> dma-api in the NET_DMA implementation (see commit 77873803363c "net_dma:
>> mark broken").
>
> Some discussion of the overlap counter thing would be useful.

Ok, will add:

"The implementation also has the ability to count repeat mappings of
the same page without an intervening unmap.  This counter is limited
to the few bits of tag space in a radix tree.  This mechanism is added
to mitigate false negative cases where, for example, a page is dma
mapped twice and debug_dma_assert_idle() is called after the page is
un-mapped once."

>> --- a/include/linux/dma-debug.h
>> +++ b/include/linux/dma-debug.h
>>
>> ...
>>
>> +static void __active_pfn_inc_overlap(struct dma_debug_entry *entry)
>> +{
>> +     unsigned long pfn = entry->pfn;
>> +     int i;
>> +
>> +     for (i = 0; i < RADIX_TREE_MAX_TAGS; i++)
>> +             if (radix_tree_tag_get(&dma_active_pfn, pfn, i) == 0) {
>> +                     radix_tree_tag_set(&dma_active_pfn, pfn, i);
>> +                     return;
>> +             }
>> +     pr_debug("DMA-API: max overlap count (%d) reached for pfn 0x%lx\n",
>> +              RADIX_TREE_MAX_TAGS, pfn);
>> +}
>> +
>> +static void __active_pfn_dec_overlap(struct dma_debug_entry *entry)
>> +{
>> +     unsigned long pfn = entry->pfn;
>> +     int i;
>> +
>> +     for (i = RADIX_TREE_MAX_TAGS - 1; i >= 0; i--)
>> +             if (radix_tree_tag_get(&dma_active_pfn, pfn, i)) {
>> +                     radix_tree_tag_clear(&dma_active_pfn, pfn, i);
>> +                     return;
>> +             }
>> +     radix_tree_delete(&dma_active_pfn, pfn);
>> +}
>> +
>> +static int active_pfn_insert(struct dma_debug_entry *entry)
>> +{
>> +     unsigned long flags;
>> +     int rc;
>> +
>> +     spin_lock_irqsave(&radix_lock, flags);
>> +     rc = radix_tree_insert(&dma_active_pfn, entry->pfn, entry);
>> +     if (rc == -EEXIST)
>> +             __active_pfn_inc_overlap(entry);
>> +     spin_unlock_irqrestore(&radix_lock, flags);
>> +
>> +     return rc;
>> +}
>> +
>> +static void active_pfn_remove(struct dma_debug_entry *entry)
>> +{
>> +     unsigned long flags;
>> +
>> +     spin_lock_irqsave(&radix_lock, flags);
>> +     __active_pfn_dec_overlap(entry);
>> +     spin_unlock_irqrestore(&radix_lock, flags);
>> +}
>
> OK, I think I see what's happening.  The tags thing acts as a crude
> counter and if the map/unmap count ends up imbalanced, we deliberately
> leak an entry in the radix-tree so it can later be reported via undescribed
> means.  Thoughts:

Certainly the leak will be noticed by debug_dma_assert_idle(), but
there's no guarantee that we trigger that check at the time of the
leak.  Hmm, dma_debug_entries would also leak in that case...

> - RADIX_TREE_MAX_TAGS=3 so the code could count to 7, with a bit of
>   futzing around.

Yes, if we are going to count might as well leverage the full number
space to help debug implementations that overlap severely.  I should
flesh out the error reporting to say that debug_dma_assert_idle() may
give false positives in the case where the overlap counter overflows.

> - from a style/readability point of view it is unexpected that
>   __active_pfn_dec_overlap() actually removes radix-tree items.  It
>   would be better to do:
>
>         spin_lock_irqsave(&radix_lock, flags);
>         if (__active_pfn_dec_overlap(entry) == something) {
>                 /*
>                  * Nice comment goes here
>                  */
>                 radix_tree_delete(...);
>         }
>         spin_unlock_irqrestore(&radix_lock, flags);
>

Yes, I should have noticed the asymmetry with the insert case, will fix.
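
As a rough user-space model of both points (a binary count held in the
tag bits, and deletion decided by the caller), something like the sketch
below might work. All model_* names are invented for illustration; the
bool array stands in for the per-slot radix-tree tags, and the value 3
simply mirrors the RADIX_TREE_MAX_TAGS=3 mentioned above. It is not the
code that will go into the next revision.

```c
#include <stdbool.h>

#define MODEL_MAX_TAGS    3	/* mirrors RADIX_TREE_MAX_TAGS=3 */
#define MODEL_MAX_OVERLAP ((1 << MODEL_MAX_TAGS) - 1)

/* Stand-in for the tag bits of one radix-tree slot. */
struct model_slot {
	bool tag[MODEL_MAX_TAGS];
};

/* Read the tags as a binary number, tag[0] being the low bit. */
static int model_overlap_read(const struct model_slot *s)
{
	int i, overlap = 0;

	for (i = MODEL_MAX_TAGS - 1; i >= 0; i--)
		overlap = (overlap << 1) | (s->tag[i] ? 1 : 0);
	return overlap;
}

/* Store a value in 0..MODEL_MAX_OVERLAP back into the tags. */
static void model_overlap_write(struct model_slot *s, int overlap)
{
	int i;

	for (i = 0; i < MODEL_MAX_TAGS; i++)
		s->tag[i] = (overlap >> i) & 1;
}

/* Returns false when the counter is saturated and nothing was counted. */
static bool model_overlap_inc(struct model_slot *s)
{
	int overlap = model_overlap_read(s);

	if (overlap == MODEL_MAX_OVERLAP)
		return false;
	model_overlap_write(s, overlap + 1);
	return true;
}

/* Returns the count seen before the decrement; never deletes anything. */
static int model_overlap_dec(struct model_slot *s)
{
	int overlap = model_overlap_read(s);

	if (overlap > 0)
		model_overlap_write(s, overlap - 1);
	return overlap;
}

/* Caller-side removal: true means the entry itself should be deleted. */
static bool model_active_pfn_remove(struct model_slot *s)
{
	/* No overlap was recorded, so this unmap drops the last mapping. */
	return model_overlap_dec(s) == 0;
}
```

With three tag bits the model counts up to seven overlapping mappings
before model_overlap_inc() reports saturation, and the decision to delete
sits next to the place where the lock would be held.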


* Re: linux-next: manual merge of the tip tree with the net-next tree
From: Stephen Rothwell @ 2014-01-14  3:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Peter Zijlstra,
	David Miller, netdev
  Cc: linux-next, linux-kernel
In-Reply-To: <20140113142059.f9f1c58c391b2c8bbd05699f@canb.auug.org.au>


Hi all,

On Mon, 13 Jan 2014 14:20:59 +1100 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> On Mon, 13 Jan 2014 14:18:24 +1100 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >
> > Today's linux-next merge of the tip tree got conflicts in
> > arch/arc/include/asm/Kbuild, arch/cris/include/asm/Kbuild,
> > arch/hexagon/include/asm/Kbuild, arch/microblaze/include/asm/Kbuild,
> > arch/parisc/include/asm/Kbuild and arch/score/include/asm/Kbuild between
> > commit e3fec2f74f7f ("lib: Add missing arch generic-y entries for
> > asm-generic/hash.h") from the net-next tree and commit 93ea02bb8435
> > ("arch: Clean up asm/barrier.h implementations using
> > asm-generic/barrier.h") from the tip tree.
> > 
> > I fixed it up (see below) and can carry the fix as necessary (no action
> > is required).
> > 
> > BTW: thanks for not keeping the Kbuild files sorted :-(
> 
> I missed arch/mn10300/include/asm/Kbuild the first time round.

And ... git rerere does not work well here.  It stores resolutions by a
hash of the (sanitised) conflict and since most of these files have
exactly the same conflict, I am going to have to edit 5 of them by hand
every day.

Not happy :-(

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au



* [PATCH net-next] bonding: don't permit slaves to change their mtu
From: Ding Tianhong @ 2014-01-14  3:01 UTC (permalink / raw)
  To: Jay Vosburgh, Veaceslav Falico, David S. Miller, Netdev

Commit 2315dc91a5059d7da9a8b9b9daf78d695c11383e
(net: make dev_set_mtu() honor notification return code)
makes dev_set_mtu() honor the return value of the NETDEV_CHANGEMTU
notification. Slaves should not change their MTU independently, so
return NOTIFY_BAD from the bonding slave event handler to prevent it.

Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/bonding/bond_main.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index e06c445..af4e678 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2846,19 +2846,11 @@ static int bond_slave_netdev_event(unsigned long event,
 		 */
 		break;
 	case NETDEV_CHANGEMTU:
-		/*
-		 * TODO: Should slaves be allowed to
-		 * independently alter their MTU?  For
-		 * an active-backup bond, slaves need
-		 * not be the same type of device, so
-		 * MTUs may vary.  For other modes,
-		 * slaves arguably should have the
-		 * same MTUs. To do this, we'd need to
-		 * take over the slave's change_mtu
-		 * function for the duration of their
-		 * servitude.
+		/* The master and slaves should have the
+		 * same mtu, so don't permit slaves
+		 * to change their mtu independently.
 		 */
-		break;
+		return NOTIFY_BAD;
 	case NETDEV_CHANGENAME:
 		/*
 		 * TODO: handle changing the primary's name
-- 
1.8.0


* Re: [PATCH 0/2] tun: add the RFS support
From: Zhi Yong Wu @ 2014-01-14  3:28 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, Linux Netdev List, Eric Dumazet, Zhi Yong Wu
In-Reply-To: <CA+mtBx9hpzKctKQBHDQUiG_ekSrXyEP4ALKf1h4Ox0fL4g04TQ@mail.gmail.com>

On Tue, Jan 14, 2014 at 12:49 AM, Tom Herbert <therbert@google.com> wrote:
> On Mon, Jan 13, 2014 at 5:29 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
>> On Wed, Jan 1, 2014 at 6:33 AM, Tom Herbert <therbert@google.com> wrote:
>>> Zhi,
>> HI, Tom
>>>
>>> Thanks for following up on these patches. It would still be nice to
>>> have performance numbers to show the impact. These will be helpful for
>>> the next task of integrating RFS into virtio-net.
>> I don't get why RFS need to be integrated into virtio-net since
>> virtio-net has supported mq. Can you give some clue? thanks.
>>
> In this case RFS would be the mechanism for selecting the RX queue. We
> can either use the tun approach which is to match the RX queue to the
> TX queue for a flow, or expose it as accelerated RFS to the guest.
> Fine grained control of the queue selection should have value.
Thanks for your explanation.

>
>>>
>>> Tom
>>>
>>> On Tue, Dec 31, 2013 at 10:32 AM, David Miller <davem@davemloft.net> wrote:
>>>> From: Zhi Yong Wu <zwu.kernel@gmail.com>
>>>> Date: Sun, 22 Dec 2013 18:54:30 +0800
>>>>
>>>>> Since Tom Herbert's hash related patchset was modified and got merged,
>>>>> his pachset about adding support for RFS on tun flows also need to get
>>>>> adjusted accordingly. I tried to update them, and before i will start
>>>>> to do some perf tests, i hope to get one correct code base, so it's time
>>>>> to post them out now. Any constructive comments are welcome, thanks.
>>>>
>>>> Series applied to net-next, thanks.
>>
>>
>>
>> --
>> Regards,
>>
>> Zhi Yong Wu



-- 
Regards,

Zhi Yong Wu


* Re: linux-next: manual merge of the tip tree with the net-next tree
From: H. Peter Anvin @ 2014-01-14  4:51 UTC (permalink / raw)
  To: Stephen Rothwell, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	David Miller, netdev
  Cc: linux-next, linux-kernel
In-Reply-To: <20140114140214.7279bcaf88d7a4514d889186@canb.auug.org.au>

On 01/13/2014 07:02 PM, Stephen Rothwell wrote:
> Hi all,
> 
> On Mon, 13 Jan 2014 14:20:59 +1100 Stephen Rothwell
> <sfr@canb.auug.org.au> wrote:
>> 
>> On Mon, 13 Jan 2014 14:18:24 +1100 Stephen Rothwell
>> <sfr@canb.auug.org.au> wrote:
>>> 
>>> Today's linux-next merge of the tip tree got conflicts in 
>>> arch/arc/include/asm/Kbuild, arch/cris/include/asm/Kbuild, 
>>> arch/hexagon/include/asm/Kbuild,
>>> arch/microblaze/include/asm/Kbuild, 
>>> arch/parisc/include/asm/Kbuild and
>>> arch/score/include/asm/Kbuild between commit e3fec2f74f7f
>>> ("lib: Add missing arch generic-y entries for 
>>> asm-generic/hash.h") from the net-next tree and commit
>>> 93ea02bb8435 ("arch: Clean up asm/barrier.h implementations
>>> using asm-generic/barrier.h") from the tip tree.
>>> 
>>> I fixed it up (see below) and can carry the fix as necessary
>>> (no action is required).
>>> 
>>> BTW: thanks for not keeping the Kbuild files sorted :-(
>> 
>> I missed arch/mn10300/include/asm/Kbuild the first time round.
> 
> And ... git rerere does not work well here.  It stores resolutions
> by a hash of the (sanitised) conflict and since most of these files
> have exactly the same conflict, I am going to have to edit 5 of
> them by hand every day.
> 

Well, you probably can keep a diff from the conflict-merge tree to the
fix, but still.

Is there a sensible way we can fix this in either net-next or tip?

	-hpa


