Netdev List

* Re: [PATCH] asix: Support DLink DUB-E100 H/W Ver C1
From: Søren Holm @ 2012-09-18  8:40 UTC (permalink / raw)
  To: Christian Riesch; +Cc: netdev, stable
In-Reply-To: <CABkLObq3jQG_KMRrNpYGsseJrCYQ08cJfQFD5fpTmYDyDMGROA@mail.gmail.com>

On Tuesday, 18 September 2012 10:21:17, Christian Riesch wrote:
> IIRC this is not how it works. Your patch must first go into the current
> mainline kernel (via net or net-next); only then will it be backported
> to the stable series. Therefore you must always send patches that apply
> to net or net-next...
> Christian

I'm aware of that now, thanks :)

-- 
Søren Holm

^ permalink raw reply

* HTB vs CoDel performance
From: Lin Ming @ 2012-09-18  9:28 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: networking

Hi,

I'm testing HTB performance on a gigabit router running a 2.6.32 kernel.
Packet path: PC1 ---> Router LAN port ---> Router WAN port ---> PC2

pfifo_fast: 920Mbps
htb: 750Mbps, a ~20% drop compared to pfifo_fast

The htb tc commands are as follows:
# tc qdisc add dev eth10 root handle 20: htb default 1
# tc class add dev eth10 parent 20 classid 20:1 htb prio 2 rate \
    1024Mbit ceil 1024Mbit burst 1281408b cburst 1281408b

The performance drop seems to be caused by the complex htb enqueue/dequeue algorithm.

I had a quick look at the CoDel code; its data structures do not seem
as complex as HTB's.
I'm going to backport CoDel. Is this a good choice?
Can I get performance similar to pfifo_fast?

Thanks,
Lin Ming

^ permalink raw reply

* Re: HTB vs CoDel performance
From: Eric Dumazet @ 2012-09-18  9:45 UTC (permalink / raw)
  To: Lin Ming; +Cc: networking
In-Reply-To: <CAF1ivSZPpBkKk6mfhEu01bz6yP4KgJU8kK6cmeZ9e+kwH=EtiQ@mail.gmail.com>

On Tue, 2012-09-18 at 17:28 +0800, Lin Ming wrote:
> Hi,
> 
> I'm testing htb performance on a gigabit router running 2.6.32 kernel.
> Packet path: PC1 ---> Router LAN port ---> Router WAN port ---> PC2
> 
> pfifo_fast: 920Mbps
> htb: 750Mbps, ~20% drops compared to pfifo_fast
> 
> htb tc commands as below,
> # tc qdisc add dev eth10 root handle 20: htb default 1
> # tc class add dev eth10 parent 20 classid 20:1 htb prio 2 rate
> 1024Mbit ceil 1024Mbit burst 1281408b cburst 1281408b
> 
> The performance drop seems caused by the complex htb enqueue/dequeue algorithm.
> 
> I had a quick look at CoDel code, seems it does not have so complex
> data structure as HTB.
> I'm going to backport CoDel. Is this a good choice?
> Can I gain similar performance as pfifo_fast?

codel is quite different from HTB: it has no rate control, so it's very
fast. (But it has no priority differentiation, unlike pfifo_fast with its
3 bands.)

So what are your exact needs?
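
To illustrate the "3 bands" remark: the sketch below is a userspace model of
strict-priority dequeueing across three FIFO bands, which is the behaviour
pfifo_fast is known for. It is not kernel code; the names prio_fifo,
prio_enqueue and prio_dequeue and the capacity BAND_CAP are made up for this
example.

```c
#include <stddef.h>

#define BANDS    3	/* pfifo_fast uses three bands */
#define BAND_CAP 8	/* arbitrary small capacity, assumption for this sketch */

struct band {
	int pkts[BAND_CAP];	/* ring buffer of packet ids */
	size_t head, len;
};

struct prio_fifo {
	struct band bands[BANDS];
};

/* Append a packet id to a band. Returns 0 on success, -1 if the band
 * number is invalid or the band is full. */
int prio_enqueue(struct prio_fifo *q, unsigned int band, int pkt)
{
	struct band *b;

	if (band >= BANDS)
		return -1;
	b = &q->bands[band];
	if (b->len == BAND_CAP)
		return -1;
	b->pkts[(b->head + b->len) % BAND_CAP] = pkt;
	b->len++;
	return 0;
}

/* Take the oldest packet from the lowest-numbered non-empty band.
 * Returns -1 when every band is empty. */
int prio_dequeue(struct prio_fifo *q)
{
	for (unsigned int i = 0; i < BANDS; i++) {
		struct band *b = &q->bands[i];

		if (b->len) {
			int pkt = b->pkts[b->head];

			b->head = (b->head + 1) % BAND_CAP;
			b->len--;
			return pkt;
		}
	}
	return -1;
}
```

A zero-initialised struct prio_fifo is an empty queue. Packets queued in
band 0 always leave before anything in bands 1 and 2, which is the
differentiation a single codel queue does not provide.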

^ permalink raw reply

* Re: HTB vs CoDel performance
From: Lin Ming @ 2012-09-18  9:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: networking
In-Reply-To: <1347961511.26523.216.camel@edumazet-glaptop>

On Tue, Sep 18, 2012 at 5:45 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2012-09-18 at 17:28 +0800, Lin Ming wrote:
>> Hi,
>>
>> I'm testing htb performance on a gigabit router running 2.6.32 kernel.
>> Packet path: PC1 ---> Router LAN port ---> Router WAN port ---> PC2
>>
>> pfifo_fast: 920Mbps
>> htb: 750Mbps, ~20% drops compared to pfifo_fast
>>
>> htb tc commands as below,
>> # tc qdisc add dev eth10 root handle 20: htb default 1
>> # tc class add dev eth10 parent 20 classid 20:1 htb prio 2 rate
>> 1024Mbit ceil 1024Mbit burst 1281408b cburst 1281408b
>>
>> The performance drop seems caused by the complex htb enqueue/dequeue algorithm.
>>
>> I had a quick look at CoDel code, seems it does not have so complex
>> data structure as HTB.
>> I'm going to backport CoDel. Is this a good choice?
>> Can I gain similar performance as pfifo_fast?
>
> codel is quite different than HTB : It has no rate control, so its very
> fast. (But it has no prio differentiation as pfifo_fast with its 3
> bands)
>
> So what are your exact needs ?

I need traffic priority, traffic shaping, rate control ... actually all
QoS features on the router.
And if I just set the rate to gigabit (no other settings), for example,

# tc qdisc add dev eth10 root handle 20: htb default 1
# tc class add dev eth10 parent 20 classid 20:1 htb prio 2 rate \
    1024Mbit ceil 1024Mbit burst 1281408b cburst 1281408b

it should get performance similar to pfifo_fast.

codel has no rate control, so it seems I have to find a way to optimize htb?

^ permalink raw reply

* Re: HTB vs CoDel performance
From: Eric Dumazet @ 2012-09-18 10:15 UTC (permalink / raw)
  To: Lin Ming; +Cc: networking
In-Reply-To: <CAF1ivSYweQgxbCd_ejHmDi5w7puRkE7MbpV_hczhXXDca5DJ7A@mail.gmail.com>

On Tue, 2012-09-18 at 17:56 +0800, Lin Ming wrote:

> I need traffic priority/traffic shaping/rate control ... actually all
> QoS features on the router.
> And if I just set the rate to gigabit(no other settings), for example,
> 
> # tc qdisc add dev eth10 root handle 20: htb default 1
> # tc class add dev eth10 parent 20 classid 20:1 htb prio 2 rate
> 1024Mbit ceil 1024Mbit burst 1281408b cburst 1281408b
> 
> it should gain similar performance as pfifo_fast.
> 
> codel has no rate control. So seems I have to find way to optimize htb?

Are you really CPU limited? You might be hitting some clock artifacts.

Rate limiting to 1Gbps probably needs high-resolution timers.

HTB is not the only way to rate limit.
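
To put rough numbers on the timer-resolution point, here is a small
arithmetic sketch. It assumes 1500-byte frames and a shaper that can only
wake up once per timer tick; the tick rates used below are illustrative
assumptions, not measurements from the router in this thread, and the
function names are made up for the example.

```c
#include <stdint.h>

/* Nanoseconds needed to serialise one frame of `bytes` bytes onto a
 * link running at `bits_per_sec`. */
uint64_t frame_time_ns(uint64_t bytes, uint64_t bits_per_sec)
{
	return bytes * 8u * 1000000000ull / bits_per_sec;
}

/* Bytes a link running at `bits_per_sec` carries during one timer tick,
 * for a timer firing `hz` times per second. */
uint64_t bytes_per_tick(uint64_t bits_per_sec, uint64_t hz)
{
	return bits_per_sec / 8u / hz;
}
```

At 1 Gbit/s, frame_time_ns(1500, 1000000000) is 12000 ns, while
bytes_per_tick(1000000000, 1000) is 125000 bytes, i.e. roughly 83 full-size
frames per 1 ms tick. A shaper limited to tick granularity would have to
release traffic in bursts of that size, which is why timer resolution
matters at this rate.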

^ permalink raw reply

* Re: [PATCH 1/4] ipv6: add a new namespace for nf_conntrack_reasm
From: Cong Wang @ 2012-09-18 10:44 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netdev, netfilter-devel, Herbert Xu, Michal Kubeček,
	David Miller, Patrick McHardy
In-Reply-To: <20120918073756.GA18206@1984>

On Tue, 2012-09-18 at 09:37 +0200, Pablo Neira Ayuso wrote:
> > +#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
> > +	struct netns_nf_ct	nf_ct;
> > +#endif
> 
> There's above one "struct netns_ct" that already encapsulates
> netfilter conntrack netns parameters.
> 
> However, I'd prefer if, while at it, you define some struct
> netns_nf_frag instead.
> 
> In net/ipv6/netfilter/Makefile, it says:
> 
> # defrag
> nf_defrag_ipv6-y := nf_defrag_ipv6_hooks.o nf_conntrack_reasm.o
> 
> Note that nf defragmentation is not glued to conntrack anymore. So I'd
> go for one netns_nf_frag for this in include/net/net_namespace.h
> 

Sure, I will rename that struct to 'struct netns_nf_frag'.

Thanks for the review!

^ permalink raw reply

* xt_hashlimit.c race?
From: "Oleg A. Arkhangelsky" @ 2012-09-18 13:22 UTC (permalink / raw)
  To: netdev

Hello,

Looking at net/netfilter/xt_hashlimit.c raised a question. As far as
I can understand, the hashlimit_mt() code under rcu_read_lock_bh() can be
executed simultaneously by more than one CPU. So what if we have two
packets with the same new dst value that are processed in parallel by
different CPUs? In both cases dh is NULL and both CPUs try to create a new
entry in the hash table. This is not what we want and can lead to undefined
behavior later on.

Or maybe I'm wrong? Could anyone tell me whether this situation is possible?
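
The pattern in question can be modelled in userspace as below. This is only
a sketch of the general "look up, and create if missing" problem, not code
from xt_hashlimit.c: the table, entry and find_or_create names are
hypothetical, and a C11 atomic_flag spinlock stands in for whatever locking
the real code uses. The point is that the lookup and the insert happen under
the same lock, so two callers with the same key cannot both see "not found".

```c
#include <stdatomic.h>
#include <stdlib.h>

#define NBUCKETS 16

struct entry {
	unsigned int key;	/* stands in for the dst tuple */
	struct entry *next;
};

struct table {
	atomic_flag lock;	/* simple spinlock guarding the buckets */
	struct entry *buckets[NBUCKETS];
	unsigned int count;
};

void table_init(struct table *t)
{
	for (int i = 0; i < NBUCKETS; i++)
		t->buckets[i] = NULL;
	t->count = 0;
	atomic_flag_clear(&t->lock);
}

static struct entry *lookup(struct table *t, unsigned int key)
{
	struct entry *e;

	for (e = t->buckets[key % NBUCKETS]; e; e = e->next)
		if (e->key == key)
			return e;
	return NULL;
}

/* Return the entry for `key`, creating it if needed. The lookup is done
 * while holding the lock, so a concurrent caller with the same key finds
 * the entry created here instead of inserting a duplicate. */
struct entry *find_or_create(struct table *t, unsigned int key)
{
	struct entry *e;

	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;	/* spin until the lock is free */

	e = lookup(t, key);
	if (!e) {
		e = malloc(sizeof(*e));
		if (e) {
			e->key = key;
			e->next = t->buckets[key % NBUCKETS];
			t->buckets[key % NBUCKETS] = e;
			t->count++;
		}
	}

	atomic_flag_clear_explicit(&t->lock, memory_order_release);
	return e;
}
```

If the lookup were done before taking the lock and not repeated afterwards,
two CPUs could both get NULL and both insert, which is the scenario
described above.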

Thank you!

-- 
wbr, Oleg.

^ permalink raw reply

* loop back address question
From: ratheesh kannoth @ 2012-09-18 13:42 UTC (permalink / raw)
  To: netdev

Hi,

I have two Linux machines (2.6.29), A and B, both connected to the same LAN.

1) If I assign 127.0.2.1 and 127.0.2.2 to their interfaces, can they ping
each other? Is there any way to send such packets out of an interface?
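
For context, both addresses fall inside 127.0.0.0/8, the block reserved for
loopback, so a stock configuration would not be expected to put packets for
them on the wire. The check can be sketched as below; is_ipv4_loopback is a
hypothetical helper written for this example, not an existing library
function.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Returns 1 if `addr` is inside 127.0.0.0/8, 0 if it is a valid IPv4
 * address outside that block, and -1 if it is not a dotted-quad IPv4
 * address at all. */
int is_ipv4_loopback(const char *addr)
{
	struct in_addr in;

	if (inet_pton(AF_INET, addr, &in) != 1)
		return -1;

	/* s_addr is in network byte order; convert before masking. */
	return (ntohl(in.s_addr) & 0xff000000u) == 0x7f000000u;
}
```

Both is_ipv4_loopback("127.0.2.1") and is_ipv4_loopback("127.0.2.2")
return 1, whereas an ordinary LAN address such as 192.168.1.10 returns 0.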

Thanks,
Ratheesh

^ permalink raw reply

* [PATCH 1/4] ipv6: add a new namespace for nf_conntrack_reasm
From: Cong Wang @ 2012-09-18 13:45 UTC (permalink / raw)
  To: netdev
  Cc: netfilter-devel, Cong Wang, Herbert Xu, Michal Kubeček,
	David Miller, Patrick McHardy, Pablo Neira Ayuso
In-Reply-To: <1347975911-5655-1-git-send-email-amwang@redhat.com>

As pointed out by Michal, it is necessary to add a new
namespace for the nf_conntrack_reasm code. This prepares
for the second patch.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 include/net/net_namespace.h             |    3 +
 include/net/netns/ipv6.h                |    8 ++
 net/ipv6/netfilter/nf_conntrack_reasm.c |  135 +++++++++++++++++++++----------
 3 files changed, 104 insertions(+), 42 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 5ae57f1..d61e2b3 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -93,6 +93,9 @@ struct net {
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 	struct netns_ct		ct;
 #endif
+#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
+	struct netns_nf_frag	nf_frag;
+#endif
 	struct sock		*nfnl;
 	struct sock		*nfnl_stash;
 #endif
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 0318104..214cb0a 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -71,4 +71,12 @@ struct netns_ipv6 {
 #endif
 #endif
 };
+
+#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
+struct netns_nf_frag {
+	struct netns_sysctl_ipv6 sysctl;
+	struct netns_frags	frags;
+};
+#endif
+
 #endif
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index f94fb3a..d28c067 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -71,27 +71,26 @@ struct nf_ct_frag6_queue
 };
 
 static struct inet_frags nf_frags;
-static struct netns_frags nf_init_frags;
 
 #ifdef CONFIG_SYSCTL
 static struct ctl_table nf_ct_frag6_sysctl_table[] = {
 	{
 		.procname	= "nf_conntrack_frag6_timeout",
-		.data		= &nf_init_frags.timeout,
+		.data		= &init_net.nf_frag.frags.timeout,
 		.maxlen		= sizeof(unsigned int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_jiffies,
 	},
 	{
 		.procname	= "nf_conntrack_frag6_low_thresh",
-		.data		= &nf_init_frags.low_thresh,
+		.data		= &init_net.nf_frag.frags.low_thresh,
 		.maxlen		= sizeof(unsigned int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
 	{
 		.procname	= "nf_conntrack_frag6_high_thresh",
-		.data		= &nf_init_frags.high_thresh,
+		.data		= &init_net.nf_frag.frags.high_thresh,
 		.maxlen		= sizeof(unsigned int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
@@ -99,7 +98,54 @@ static struct ctl_table nf_ct_frag6_sysctl_table[] = {
 	{ }
 };
 
-static struct ctl_table_header *nf_ct_frag6_sysctl_header;
+static int __net_init nf_ct_frag6_sysctl_register(struct net *net)
+{
+	struct ctl_table *table;
+	struct ctl_table_header *hdr;
+
+	table = nf_ct_frag6_sysctl_table;
+	if (!net_eq(net, &init_net)) {
+		table = kmemdup(table, sizeof(nf_ct_frag6_sysctl_table), GFP_KERNEL);
+		if (table == NULL)
+			goto err_alloc;
+
+		table[0].data = &net->ipv6.frags.high_thresh;
+		table[1].data = &net->ipv6.frags.low_thresh;
+		table[2].data = &net->ipv6.frags.timeout;
+	}
+
+	hdr = register_net_sysctl(net, "net/netfilter", table);
+	if (hdr == NULL)
+		goto err_reg;
+
+	net->ipv6.sysctl.frags_hdr = hdr;
+	return 0;
+
+err_reg:
+	if (!net_eq(net, &init_net))
+		kfree(table);
+err_alloc:
+	return -ENOMEM;
+}
+
+static void __net_exit nf_ct_frags6_sysctl_unregister(struct net *net)
+{
+	struct ctl_table *table;
+
+	table = net->nf_frag.sysctl.frags_hdr->ctl_table_arg;
+	unregister_net_sysctl_table(net->nf_frag.sysctl.frags_hdr);
+	if (!net_eq(net, &init_net))
+		kfree(table);
+}
+
+#else
+static int __net_init nf_ct_frag6_sysctl_register(struct net *net)
+{
+	return 0;
+}
+static void __net_exit nf_ct_frags6_sysctl_unregister(struct net *net)
+{
+}
 #endif
 
 static unsigned int nf_hashfn(struct inet_frag_queue *q)
@@ -131,13 +177,6 @@ static __inline__ void fq_kill(struct nf_ct_frag6_queue *fq)
 	inet_frag_kill(&fq->q, &nf_frags);
 }
 
-static void nf_ct_frag6_evictor(void)
-{
-	local_bh_disable();
-	inet_frag_evictor(&nf_init_frags, &nf_frags);
-	local_bh_enable();
-}
-
 static void nf_ct_frag6_expire(unsigned long data)
 {
 	struct nf_ct_frag6_queue *fq;
@@ -159,8 +198,8 @@ out:
 
 /* Creation primitives. */
 
-static __inline__ struct nf_ct_frag6_queue *
-fq_find(__be32 id, u32 user, struct in6_addr *src, struct in6_addr *dst)
+static __inline__ struct nf_ct_frag6_queue*
+fq_find(struct net *net, __be32 id, u32 user, struct in6_addr *src, struct in6_addr *dst)
 {
 	struct inet_frag_queue *q;
 	struct ip6_create_arg arg;
@@ -174,7 +213,7 @@ fq_find(__be32 id, u32 user, struct in6_addr *src, struct in6_addr *dst)
 	read_lock_bh(&nf_frags.lock);
 	hash = inet6_hash_frag(id, src, dst, nf_frags.rnd);
 
-	q = inet_frag_find(&nf_init_frags, &nf_frags, &arg, hash);
+	q = inet_frag_find(&net->nf_frag.frags, &nf_frags, &arg, hash);
 	local_bh_enable();
 	if (q == NULL)
 		goto oom;
@@ -186,7 +225,7 @@ oom:
 }
 
 
-static int nf_ct_frag6_queue(struct nf_ct_frag6_queue *fq, struct sk_buff *skb,
+static int nf_ct_frag6_queue(struct nf_ct_frag6_queue*fq, struct sk_buff *skb,
 			     const struct frag_hdr *fhdr, int nhoff)
 {
 	struct sk_buff *prev, *next;
@@ -312,7 +351,7 @@ found:
 	fq->q.meat += skb->len;
 	if (payload_len > fq->q.max_size)
 		fq->q.max_size = payload_len;
-	atomic_add(skb->truesize, &nf_init_frags.mem);
+	atomic_add(skb->truesize, &fq->q.net->mem);
 
 	/* The first fragment.
 	 * nhoffset is obtained from the first fragment, of course.
@@ -322,7 +361,7 @@ found:
 		fq->q.last_in |= INET_FRAG_FIRST_IN;
 	}
 	write_lock(&nf_frags.lock);
-	list_move_tail(&fq->q.lru_list, &nf_init_frags.lru_list);
+	list_move_tail(&fq->q.lru_list, &fq->q.net->lru_list);
 	write_unlock(&nf_frags.lock);
 	return 0;
 
@@ -391,7 +430,7 @@ nf_ct_frag6_reasm(struct nf_ct_frag6_queue *fq, struct net_device *dev)
 		clone->ip_summed = head->ip_summed;
 
 		NFCT_FRAG6_CB(clone)->orig = NULL;
-		atomic_add(clone->truesize, &nf_init_frags.mem);
+		atomic_add(clone->truesize, &fq->q.net->mem);
 	}
 
 	/* We have to remove fragment header from datagram and to relocate
@@ -415,7 +454,7 @@ nf_ct_frag6_reasm(struct nf_ct_frag6_queue *fq, struct net_device *dev)
 			head->csum = csum_add(head->csum, fp->csum);
 		head->truesize += fp->truesize;
 	}
-	atomic_sub(head->truesize, &nf_init_frags.mem);
+	atomic_sub(head->truesize, &fq->q.net->mem);
 
 	head->local_df = 1;
 	head->next = NULL;
@@ -527,6 +566,7 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
 {
 	struct sk_buff *clone;
 	struct net_device *dev = skb->dev;
+	struct net *net = skb_dst(skb) ? dev_net(skb_dst(skb)->dev) : dev_net(skb->dev);
 	struct frag_hdr *fhdr;
 	struct nf_ct_frag6_queue *fq;
 	struct ipv6hdr *hdr;
@@ -560,10 +600,13 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
 	hdr = ipv6_hdr(clone);
 	fhdr = (struct frag_hdr *)skb_transport_header(clone);
 
-	if (atomic_read(&nf_init_frags.mem) > nf_init_frags.high_thresh)
-		nf_ct_frag6_evictor();
+	if (atomic_read(&net->nf_frag.frags.mem) > net->nf_frag.frags.high_thresh) {
+		local_bh_disable();
+		inet_frag_evictor(&net->nf_frag.frags, &nf_frags);
+		local_bh_enable();
+	}
 
-	fq = fq_find(fhdr->identification, user, &hdr->saddr, &hdr->daddr);
+	fq = fq_find(net, fhdr->identification, user, &hdr->saddr, &hdr->daddr);
 	if (fq == NULL) {
 		pr_debug("Can't find and can't create new queue\n");
 		goto ret_orig;
@@ -621,8 +664,31 @@ void nf_ct_frag6_output(unsigned int hooknum, struct sk_buff *skb,
 	nf_conntrack_put_reasm(skb);
 }
 
+static int nf_ct_net_init(struct net *net)
+{
+	net->nf_frag.frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
+	net->nf_frag.frags.low_thresh = IPV6_FRAG_LOW_THRESH;
+	net->nf_frag.frags.timeout = IPV6_FRAG_TIMEOUT;
+	inet_frags_init_net(&net->nf_frag.frags);
+
+	return nf_ct_frag6_sysctl_register(net);
+}
+
+static void nf_ct_net_exit(struct net *net)
+{
+	nf_ct_frags6_sysctl_unregister(net);
+	inet_frags_exit_net(&net->nf_frag.frags, &nf_frags);
+}
+
+static struct pernet_operations nf_ct_net_ops = {
+	.init = nf_ct_net_init,
+	.exit = nf_ct_net_exit,
+};
+
 int nf_ct_frag6_init(void)
 {
+	int ret = 0;
+
 	nf_frags.hashfn = nf_hashfn;
 	nf_frags.constructor = ip6_frag_init;
 	nf_frags.destructor = NULL;
@@ -631,32 +697,17 @@ int nf_ct_frag6_init(void)
 	nf_frags.match = ip6_frag_match;
 	nf_frags.frag_expire = nf_ct_frag6_expire;
 	nf_frags.secret_interval = 10 * 60 * HZ;
-	nf_init_frags.timeout = IPV6_FRAG_TIMEOUT;
-	nf_init_frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
-	nf_init_frags.low_thresh = IPV6_FRAG_LOW_THRESH;
-	inet_frags_init_net(&nf_init_frags);
 	inet_frags_init(&nf_frags);
 
-#ifdef CONFIG_SYSCTL
-	nf_ct_frag6_sysctl_header = register_net_sysctl(&init_net, "net/netfilter",
-							nf_ct_frag6_sysctl_table);
-	if (!nf_ct_frag6_sysctl_header) {
+	ret = register_pernet_subsys(&nf_ct_net_ops);
+	if (ret)
 		inet_frags_fini(&nf_frags);
-		return -ENOMEM;
-	}
-#endif
 
-	return 0;
+	return ret;
 }
 
 void nf_ct_frag6_cleanup(void)
 {
-#ifdef CONFIG_SYSCTL
-	unregister_net_sysctl_table(nf_ct_frag6_sysctl_header);
-	nf_ct_frag6_sysctl_header = NULL;
-#endif
+	unregister_pernet_subsys(&nf_ct_net_ops);
 	inet_frags_fini(&nf_frags);
-
-	nf_init_frags.low_thresh = 0;
-	nf_ct_frag6_evictor();
 }
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH 3/4] ipv6: make ip6_frag_nqueues() and ip6_frag_mem() static inline
From: Cong Wang @ 2012-09-18 13:45 UTC (permalink / raw)
  To: netdev
  Cc: netfilter-devel, Cong Wang, Herbert Xu, Michal Kubeček,
	David Miller
In-Reply-To: <1347975911-5655-1-git-send-email-amwang@redhat.com>

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 include/net/ipv6.h    |   13 +++++++++++--
 net/ipv6/reassembly.c |   10 ----------
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 81d4455..979bf6c 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -271,8 +271,17 @@ struct ipv6_txoptions *ipv6_fixup_options(struct ipv6_txoptions *opt_space,
 
 extern bool ipv6_opt_accepted(const struct sock *sk, const struct sk_buff *skb);
 
-int ip6_frag_nqueues(struct net *net);
-int ip6_frag_mem(struct net *net);
+#if IS_ENABLED(CONFIG_IPV6)
+static inline int ip6_frag_nqueues(struct net *net)
+{
+	return net->ipv6.frags.nqueues;
+}
+
+static inline int ip6_frag_mem(struct net *net)
+{
+	return atomic_read(&net->ipv6.frags.mem);
+}
+#endif
 
 #define IPV6_FRAG_HIGH_THRESH	(256 * 1024)	/* 262144 */
 #define IPV6_FRAG_LOW_THRESH	(192 * 1024)	/* 196608 */
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 8508c8c..cac690c 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -67,16 +67,6 @@ struct ip6frag_skb_cb
 
 static struct inet_frags ip6_frags;
 
-int ip6_frag_nqueues(struct net *net)
-{
-	return net->ipv6.frags.nqueues;
-}
-
-int ip6_frag_mem(struct net *net)
-{
-	return atomic_read(&net->ipv6.frags.mem);
-}
-
 static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
 			  struct net_device *dev);
 
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH v3 net-next 0/4] ipv6: fix the reassembly expire code in nf_conntrack
From: Cong Wang @ 2012-09-18 13:45 UTC (permalink / raw)
  To: netdev; +Cc: netfilter-devel, Herbert Xu, David S. Miller, Cong Wang

V3: rename struct netns_nf_ct to struct netns_nf_frag

V2: use IS_ENABLED(CONFIG_IPV6) to fix a build error
    rebase to latest net-next

ipv6: add a new namespace for nf_conntrack_reasm
ipv6: unify conntrack reassembly expire code with standard one
ipv6: make ip6_frag_nqueues() and ip6_frag_mem() static inline
ipv6: unify fragment thresh handling code
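
The last patch in the series moves the high-threshold test out of the
callers and into inet_frag_evictor(), behind a new force argument. The
following userspace model sketches that control flow only. The struct
frag_mem type, the frag_evictor name and the fixed queue_size per evicted
queue are simplifications invented for this example; the real function
walks an LRU list and frees queues of varying size.

```c
#include <stdbool.h>

struct frag_mem {
	int mem;		/* bytes currently held in fragment queues */
	int high_thresh;	/* eviction starts above this */
	int low_thresh;		/* eviction aims to get down to this */
};

/* Model of the unified evictor. Without `force`, nothing happens unless
 * memory use exceeds high_thresh. Each evicted queue is assumed to free
 * `queue_size` bytes. Returns the number of queues evicted. */
int frag_evictor(struct frag_mem *nf, int queue_size, bool force)
{
	int work, evicted = 0;

	if (queue_size <= 0)
		return 0;
	if (!force && nf->mem <= nf->high_thresh)
		return 0;

	work = nf->mem - nf->low_thresh;
	while (work > 0 && nf->mem > 0) {
		int freed = queue_size < nf->mem ? queue_size : nf->mem;

		nf->mem -= freed;
		work -= freed;
		evicted++;
	}
	return evicted;
}
```

With the default thresholds (256 * 1024 and 192 * 1024), a non-forced call
at 200000 bytes does nothing, while a forced call with low_thresh set to 0
drains everything, which matches how inet_frags_exit_net() uses it in the
patch.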

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>

-----

 include/net/inet_frag.h                 |    2 +-
 include/net/ipv6.h                      |   32 +++++-
 include/net/net_namespace.h             |    3 +
 include/net/netns/ipv6.h                |    8 ++
 net/ipv4/inet_fragment.c                |    9 +-
 net/ipv4/ip_fragment.c                  |    5 +-
 net/ipv6/netfilter/nf_conntrack_reasm.c |  196 ++++++++++++++++---------------
 net/ipv6/reassembly.c                   |   88 ++++----------
 8 files changed, 178 insertions(+), 165 deletions(-)

^ permalink raw reply

* [PATCH 4/4] ipv6: unify fragment thresh handling code
From: Cong Wang @ 2012-09-18 13:45 UTC (permalink / raw)
  To: netdev
  Cc: netfilter-devel, Cong Wang, Herbert Xu, Michal Kubeček,
	David Miller
In-Reply-To: <1347975911-5655-1-git-send-email-amwang@redhat.com>

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 include/net/inet_frag.h                 |    2 +-
 net/ipv4/inet_fragment.c                |    9 +++++++--
 net/ipv4/ip_fragment.c                  |    5 ++---
 net/ipv6/netfilter/nf_conntrack_reasm.c |    8 +++-----
 net/ipv6/reassembly.c                   |   16 +++++-----------
 5 files changed, 18 insertions(+), 22 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 5098ee7..32786a0 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -61,7 +61,7 @@ void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f);
 void inet_frag_kill(struct inet_frag_queue *q, struct inet_frags *f);
 void inet_frag_destroy(struct inet_frag_queue *q,
 				struct inet_frags *f, int *work);
-int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f);
+int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f, bool force);
 struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
 		struct inet_frags *f, void *key, unsigned int hash)
 	__releases(&f->lock);
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 85190e6..4750d2b 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -89,7 +89,7 @@ void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f)
 	nf->low_thresh = 0;
 
 	local_bh_disable();
-	inet_frag_evictor(nf, f);
+	inet_frag_evictor(nf, f, true);
 	local_bh_enable();
 }
 EXPORT_SYMBOL(inet_frags_exit_net);
@@ -158,11 +158,16 @@ void inet_frag_destroy(struct inet_frag_queue *q, struct inet_frags *f,
 }
 EXPORT_SYMBOL(inet_frag_destroy);
 
-int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f)
+int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f, bool force)
 {
 	struct inet_frag_queue *q;
 	int work, evicted = 0;
 
+	if (!force) {
+		if (atomic_read(&nf->mem) <= nf->high_thresh)
+			return 0;
+	}
+
 	work = atomic_read(&nf->mem) - nf->low_thresh;
 	while (work > 0) {
 		read_lock(&f->lock);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index fa6a12c..448e685 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -219,7 +219,7 @@ static void ip_evictor(struct net *net)
 {
 	int evicted;
 
-	evicted = inet_frag_evictor(&net->ipv4.frags, &ip4_frags);
+	evicted = inet_frag_evictor(&net->ipv4.frags, &ip4_frags, false);
 	if (evicted)
 		IP_ADD_STATS_BH(net, IPSTATS_MIB_REASMFAILS, evicted);
 }
@@ -684,8 +684,7 @@ int ip_defrag(struct sk_buff *skb, u32 user)
 	IP_INC_STATS_BH(net, IPSTATS_MIB_REASMREQDS);
 
 	/* Start by cleaning up the memory. */
-	if (atomic_read(&net->ipv4.frags.mem) > net->ipv4.frags.high_thresh)
-		ip_evictor(net);
+	ip_evictor(net);
 
 	/* Lookup (or create) queue header */
 	if ((qp = ip_find(net, ip_hdr(skb), user)) != NULL) {
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 5a1307b..07e4fb0 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -563,11 +563,9 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
 	hdr = ipv6_hdr(clone);
 	fhdr = (struct frag_hdr *)skb_transport_header(clone);
 
-	if (atomic_read(&net->nf_frag.frags.mem) > net->nf_frag.frags.high_thresh) {
-		local_bh_disable();
-		inet_frag_evictor(&net->nf_frag.frags, &nf_frags);
-		local_bh_enable();
-	}
+	local_bh_disable();
+	inet_frag_evictor(&net->nf_frag.frags, &nf_frags, false);
+	local_bh_enable();
 
 	fq = fq_find(net, fhdr->identification, user, &hdr->saddr, &hdr->daddr);
 	if (fq == NULL) {
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index cac690c..a1610ac 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -131,15 +131,6 @@ void ip6_frag_init(struct inet_frag_queue *q, void *a)
 }
 EXPORT_SYMBOL(ip6_frag_init);
 
-static void ip6_evictor(struct net *net, struct inet6_dev *idev)
-{
-	int evicted;
-
-	evicted = inet_frag_evictor(&net->ipv6.frags, &ip6_frags);
-	if (evicted)
-		IP6_ADD_STATS_BH(net, idev, IPSTATS_MIB_REASMFAILS, evicted);
-}
-
 void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq, struct inet_frags *frags)
 {
 	struct net_device *dev = NULL;
@@ -514,6 +505,7 @@ static int ipv6_frag_rcv(struct sk_buff *skb)
 	struct frag_queue *fq;
 	const struct ipv6hdr *hdr = ipv6_hdr(skb);
 	struct net *net = dev_net(skb_dst(skb)->dev);
+	int evicted;
 
 	IP6_INC_STATS_BH(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_REASMREQDS);
 
@@ -538,8 +530,10 @@ static int ipv6_frag_rcv(struct sk_buff *skb)
 		return 1;
 	}
 
-	if (atomic_read(&net->ipv6.frags.mem) > net->ipv6.frags.high_thresh)
-		ip6_evictor(net, ip6_dst_idev(skb_dst(skb)));
+	evicted = inet_frag_evictor(&net->ipv6.frags, &ip6_frags, false);
+	if (evicted)
+		IP6_ADD_STATS_BH(net, ip6_dst_idev(skb_dst(skb)),
+				 IPSTATS_MIB_REASMFAILS, evicted);
 
 	fq = fq_find(net, fhdr->identification, &hdr->saddr, &hdr->daddr);
 	if (fq != NULL) {
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH 2/4] ipv6: unify conntrack reassembly expire code with standard one
From: Cong Wang @ 2012-09-18 13:45 UTC (permalink / raw)
  To: netdev
  Cc: netfilter-devel, Cong Wang, Herbert Xu, Michal Kubeček,
	David Miller, Hideaki YOSHIFUJI, Patrick McHardy,
	Pablo Neira Ayuso
In-Reply-To: <1347975911-5655-1-git-send-email-amwang@redhat.com>

Two years ago, Shan Wei tried to fix this:
http://patchwork.ozlabs.org/patch/43905/

The problem is that RFC2460 says an ICMP Time
Exceeded -- Fragment Reassembly Time Exceeded message should be
sent to the source of that fragment if the defragmentation
times out.

"
   If insufficient fragments are received to complete reassembly of a
   packet within 60 seconds of the reception of the first-arriving
   fragment of that packet, reassembly of that packet must be
   abandoned and all the fragments that have been received for that
   packet must be discarded.  If the first fragment (i.e., the one
   with a Fragment Offset of zero) has been received, an ICMP Time
   Exceeded -- Fragment Reassembly Time Exceeded message should be
   sent to the source of that fragment.
"

As Herbert suggested, we could actually use the standard IPv6
reassembly code which follows RFC2460.

With this patch applied, I can see ICMP Time Exceeded sent
from the receiver when the sender sends out only 3 of the 4
fragments of an IPv6 UDP packet.
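
The rule quoted from RFC2460 above boils down to a small decision, sketched
here as a standalone function. This is an illustration of the RFC text
only, not the kernel's ip6_expire_frag_queue(); the enum and function names
are made up for the example.

```c
#include <stdbool.h>

#define REASM_TIMEOUT_SECS 60	/* reassembly time limit from RFC2460 */

enum reasm_action {
	REASM_KEEP_WAITING,	/* timer has not expired yet */
	REASM_DISCARD,		/* drop all received fragments silently */
	REASM_DISCARD_AND_ICMP,	/* drop them and send ICMP Time Exceeded */
};

/* Decide what to do with an incomplete reassembly queue.
 * `elapsed_secs` is the time since the first-arriving fragment and
 * `have_offset_zero` says whether the fragment with Fragment Offset 0
 * has been received. */
enum reasm_action reasm_expire(unsigned int elapsed_secs, bool have_offset_zero)
{
	if (elapsed_secs < REASM_TIMEOUT_SECS)
		return REASM_KEEP_WAITING;

	return have_offset_zero ? REASM_DISCARD_AND_ICMP : REASM_DISCARD;
}
```

In the test described above, the first fragment is among those received, so
the expected outcome after the timeout is REASM_DISCARD_AND_ICMP.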

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 include/net/ipv6.h                      |   19 ++++++++
 net/ipv6/netfilter/nf_conntrack_reasm.c |   71 +++++++-----------------------
 net/ipv6/reassembly.c                   |   62 ++++++++-------------------
 3 files changed, 54 insertions(+), 98 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 9bed5d4..81d4455 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -411,6 +411,25 @@ struct ip6_create_arg {
 void ip6_frag_init(struct inet_frag_queue *q, void *a);
 bool ip6_frag_match(struct inet_frag_queue *q, void *a);
 
+/*
+ *	Equivalent of ipv4 struct ip
+ */
+struct frag_queue {
+	struct inet_frag_queue	q;
+
+	__be32			id;		/* fragment id		*/
+	u32			user;
+	struct in6_addr		saddr;
+	struct in6_addr		daddr;
+
+	int			iif;
+	unsigned int		csum;
+	__u16			nhoffset;
+};
+
+void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq,
+			   struct inet_frags *frags);
+
 static inline bool ipv6_addr_any(const struct in6_addr *a)
 {
 #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index d28c067..5a1307b 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -57,19 +57,6 @@ struct nf_ct_frag6_skb_cb
 
 #define NFCT_FRAG6_CB(skb)	((struct nf_ct_frag6_skb_cb*)((skb)->cb))
 
-struct nf_ct_frag6_queue
-{
-	struct inet_frag_queue	q;
-
-	__be32			id;		/* fragment id		*/
-	u32			user;
-	struct in6_addr		saddr;
-	struct in6_addr		daddr;
-
-	unsigned int		csum;
-	__u16			nhoffset;
-};
-
 static struct inet_frags nf_frags;
 
 #ifdef CONFIG_SYSCTL
@@ -150,9 +137,9 @@ static void __net_exit nf_ct_frags6_sysctl_unregister(struct net *net)
 
 static unsigned int nf_hashfn(struct inet_frag_queue *q)
 {
-	const struct nf_ct_frag6_queue *nq;
+	const struct frag_queue *nq;
 
-	nq = container_of(q, struct nf_ct_frag6_queue, q);
+	nq = container_of(q, struct frag_queue, q);
 	return inet6_hash_frag(nq->id, &nq->saddr, &nq->daddr, nf_frags.rnd);
 }
 
@@ -162,43 +149,19 @@ static void nf_skb_free(struct sk_buff *skb)
 		kfree_skb(NFCT_FRAG6_CB(skb)->orig);
 }
 
-/* Destruction primitives. */
-
-static __inline__ void fq_put(struct nf_ct_frag6_queue *fq)
-{
-	inet_frag_put(&fq->q, &nf_frags);
-}
-
-/* Kill fq entry. It is not destroyed immediately,
- * because caller (and someone more) holds reference count.
- */
-static __inline__ void fq_kill(struct nf_ct_frag6_queue *fq)
-{
-	inet_frag_kill(&fq->q, &nf_frags);
-}
-
 static void nf_ct_frag6_expire(unsigned long data)
 {
-	struct nf_ct_frag6_queue *fq;
-
-	fq = container_of((struct inet_frag_queue *)data,
-			struct nf_ct_frag6_queue, q);
-
-	spin_lock(&fq->q.lock);
+	struct frag_queue *fq;
+	struct net *net;
 
-	if (fq->q.last_in & INET_FRAG_COMPLETE)
-		goto out;
+	fq = container_of((struct inet_frag_queue *)data, struct frag_queue, q);
+	net = container_of(fq->q.net, struct net, nf_frag.frags);
 
-	fq_kill(fq);
-
-out:
-	spin_unlock(&fq->q.lock);
-	fq_put(fq);
+	ip6_expire_frag_queue(net, fq, &nf_frags);
 }
 
 /* Creation primitives. */
-
-static __inline__ struct nf_ct_frag6_queue*
+static __inline__ struct frag_queue *
 fq_find(struct net *net, __be32 id, u32 user, struct in6_addr *src, struct in6_addr *dst)
 {
 	struct inet_frag_queue *q;
@@ -218,14 +181,14 @@ fq_find(struct net *net, __be32 id, u32 user, struct in6_addr *src, struct in6_a
 	if (q == NULL)
 		goto oom;
 
-	return container_of(q, struct nf_ct_frag6_queue, q);
+	return container_of(q, struct frag_queue, q);
 
 oom:
 	return NULL;
 }
 
 
-static int nf_ct_frag6_queue(struct nf_ct_frag6_queue*fq, struct sk_buff *skb,
+static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
 			     const struct frag_hdr *fhdr, int nhoff)
 {
 	struct sk_buff *prev, *next;
@@ -366,7 +329,7 @@ found:
 	return 0;
 
 discard_fq:
-	fq_kill(fq);
+	inet_frag_kill(&fq->q, &nf_frags);
 err:
 	return -1;
 }
@@ -381,12 +344,12 @@ err:
  *	the last and the first frames arrived and all the bits are here.
  */
 static struct sk_buff *
-nf_ct_frag6_reasm(struct nf_ct_frag6_queue *fq, struct net_device *dev)
+nf_ct_frag6_reasm(struct frag_queue *fq, struct net_device *dev)
 {
 	struct sk_buff *fp, *op, *head = fq->q.fragments;
 	int    payload_len;
 
-	fq_kill(fq);
+	inet_frag_kill(&fq->q, &nf_frags);
 
 	WARN_ON(head == NULL);
 	WARN_ON(NFCT_FRAG6_CB(head)->offset != 0);
@@ -568,7 +531,7 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
 	struct net_device *dev = skb->dev;
 	struct net *net = skb_dst(skb) ? dev_net(skb_dst(skb)->dev) : dev_net(skb->dev);
 	struct frag_hdr *fhdr;
-	struct nf_ct_frag6_queue *fq;
+	struct frag_queue *fq;
 	struct ipv6hdr *hdr;
 	int fhoff, nhoff;
 	u8 prevhdr;
@@ -617,7 +580,7 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
 	if (nf_ct_frag6_queue(fq, clone, fhdr, nhoff) < 0) {
 		spin_unlock_bh(&fq->q.lock);
 		pr_debug("Can't insert skb to queue\n");
-		fq_put(fq);
+		inet_frag_put(&fq->q, &nf_frags);
 		goto ret_orig;
 	}
 
@@ -629,7 +592,7 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
 	}
 	spin_unlock_bh(&fq->q.lock);
 
-	fq_put(fq);
+	inet_frag_put(&fq->q, &nf_frags);
 	return ret_skb;
 
 ret_orig:
@@ -693,7 +656,7 @@ int nf_ct_frag6_init(void)
 	nf_frags.constructor = ip6_frag_init;
 	nf_frags.destructor = NULL;
 	nf_frags.skb_free = nf_skb_free;
-	nf_frags.qsize = sizeof(struct nf_ct_frag6_queue);
+	nf_frags.qsize = sizeof(struct frag_queue);
 	nf_frags.match = ip6_frag_match;
 	nf_frags.frag_expire = nf_ct_frag6_expire;
 	nf_frags.secret_interval = 10 * 60 * HZ;
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 4ff9af6..8508c8c 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -65,24 +65,6 @@ struct ip6frag_skb_cb
 #define FRAG6_CB(skb)	((struct ip6frag_skb_cb*)((skb)->cb))
 
 
-/*
- *	Equivalent of ipv4 struct ipq
- */
-
-struct frag_queue
-{
-	struct inet_frag_queue	q;
-
-	__be32			id;		/* fragment id		*/
-	u32			user;
-	struct in6_addr		saddr;
-	struct in6_addr		daddr;
-
-	int			iif;
-	unsigned int		csum;
-	__u16			nhoffset;
-};
-
 static struct inet_frags ip6_frags;
 
 int ip6_frag_nqueues(struct net *net)
@@ -159,21 +141,6 @@ void ip6_frag_init(struct inet_frag_queue *q, void *a)
 }
 EXPORT_SYMBOL(ip6_frag_init);
 
-/* Destruction primitives. */
-
-static __inline__ void fq_put(struct frag_queue *fq)
-{
-	inet_frag_put(&fq->q, &ip6_frags);
-}
-
-/* Kill fq entry. It is not destroyed immediately,
- * because caller (and someone more) holds reference count.
- */
-static __inline__ void fq_kill(struct frag_queue *fq)
-{
-	inet_frag_kill(&fq->q, &ip6_frags);
-}
-
 static void ip6_evictor(struct net *net, struct inet6_dev *idev)
 {
 	int evicted;
@@ -183,22 +150,17 @@ static void ip6_evictor(struct net *net, struct inet6_dev *idev)
 		IP6_ADD_STATS_BH(net, idev, IPSTATS_MIB_REASMFAILS, evicted);
 }
 
-static void ip6_frag_expire(unsigned long data)
+void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq, struct inet_frags *frags)
 {
-	struct frag_queue *fq;
 	struct net_device *dev = NULL;
-	struct net *net;
-
-	fq = container_of((struct inet_frag_queue *)data, struct frag_queue, q);
 
 	spin_lock(&fq->q.lock);
 
 	if (fq->q.last_in & INET_FRAG_COMPLETE)
 		goto out;
 
-	fq_kill(fq);
+	inet_frag_kill(&fq->q, frags);
 
-	net = container_of(fq->q.net, struct net, ipv6.frags);
 	rcu_read_lock();
 	dev = dev_get_by_index_rcu(net, fq->iif);
 	if (!dev)
@@ -222,7 +184,19 @@ out_rcu_unlock:
 	rcu_read_unlock();
 out:
 	spin_unlock(&fq->q.lock);
-	fq_put(fq);
+	inet_frag_put(&fq->q, frags);
+}
+EXPORT_SYMBOL(ip6_expire_frag_queue);
+
+static void ip6_frag_expire(unsigned long data)
+{
+	struct frag_queue *fq;
+	struct net *net;
+
+	fq = container_of((struct inet_frag_queue *)data, struct frag_queue, q);
+	net = container_of(fq->q.net, struct net, ipv6.frags);
+
+	ip6_expire_frag_queue(net, fq, &ip6_frags);
 }
 
 static __inline__ struct frag_queue *
@@ -391,7 +365,7 @@ found:
 	return -1;
 
 discard_fq:
-	fq_kill(fq);
+	inet_frag_kill(&fq->q, &ip6_frags);
 err:
 	IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
 		      IPSTATS_MIB_REASMFAILS);
@@ -417,7 +391,7 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
 	unsigned int nhoff;
 	int sum_truesize;
 
-	fq_kill(fq);
+	inet_frag_kill(&fq->q, &ip6_frags);
 
 	/* Make the one we just received the head. */
 	if (prev) {
@@ -586,7 +560,7 @@ static int ipv6_frag_rcv(struct sk_buff *skb)
 		ret = ip6_frag_queue(fq, skb, fhdr, IP6CB(skb)->nhoff);
 
 		spin_unlock(&fq->q.lock);
-		fq_put(fq);
+		inet_frag_put(&fq->q, &ip6_frags);
 		return ret;
 	}
 
-- 
1.7.7.6

^ permalink raw reply related

* Re: [PATCH NEXT V2] rtlwifi: rtl8192c: rtl8192ce: Add support for B-CUT version of RTL8188CE
From: Anisse Astier @ 2012-09-18 14:09 UTC (permalink / raw)
  To: Larry Finger; +Cc: linville, linux-wireless, netdev, Li Chaoming
In-Reply-To: <1347914143-27698-1-git-send-email-Larry.Finger@lwfinger.net>

On Mon, 17 Sep 2012 15:35:43 -0500, Larry Finger <Larry.Finger@lwfinger.net> wrote :

> Realtek devices with designation RTL8188CE-VL have the so-called B-cut
> of the wireless chip. This patch adds the special programming needed by
> these devices.
> 
> Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
> Cc: Anisse Astier <anisse@astier.eu>
> Cc: Li Chaoming <chaoming_li@realsil.com.cn>
> ---
>  drivers/net/wireless/rtlwifi/rtl8192c/phy_common.c |   21 +++++++
>  drivers/net/wireless/rtlwifi/rtl8192ce/def.h       |    3 +
>  drivers/net/wireless/rtlwifi/rtl8192ce/hw.c        |   61 ++++++++++++++++++--
>  drivers/net/wireless/rtlwifi/rtl8192ce/phy.c       |    4 +-
>  drivers/net/wireless/rtlwifi/rtl8192ce/sw.c        |    6 +-
>  drivers/net/wireless/rtlwifi/rtl8192ce/trx.c       |    4 +-
>  6 files changed, 87 insertions(+), 12 deletions(-)
> ---
> V1 => V2	Remove extraneous white space.
> 
> 
> John,
> 
> This patch is too invasive to backport to the stable kernels, thus it should
> be applied to 3.7.
> 
> Thanks,
> 
> Larry
> ---
> 

[snip]

> Index: wireless-testing-new/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c
> ===================================================================
> --- wireless-testing-new.orig/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c
> +++ wireless-testing-new/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c
> @@ -896,7 +896,6 @@ int rtl92ce_hw_init(struct ieee80211_hw
>  	struct rtl_phy *rtlphy = &(rtlpriv->phy);
>  	struct rtl_pci *rtlpci = rtl_pcidev(rtl_pcipriv(hw));
>  	struct rtl_ps_ctl *ppsc = rtl_psc(rtl_priv(hw));
> -	static bool iqk_initialized; /* initialized to false */
>  	bool rtstatus = true;
>  	bool is92c;
>  	int err;
> @@ -921,9 +920,28 @@ int rtl92ce_hw_init(struct ieee80211_hw
>  
>  	rtlhal->last_hmeboxnum = 0;
>  	rtl92c_phy_mac_config(hw);
> +	/* because last function modify RCR, so we update
> +	 * rcr var here, or TP will unstable for receive_config
> +	 * is wrong, RX RCR_ACRC32 will cause TP unstabel & Rx
> +	 * RCR_APP_ICV will cause mac80211 unassoc for cisco 1252*/
> +	rtlpci->receive_config = rtl_read_dword(rtlpriv, REG_RCR);
> +	rtlpci->receive_config &= ~(RCR_ACRC32 | RCR_AICV);
> +	rtl_write_dword(rtlpriv, REG_RCR, rtlpci->receive_config);
>  	rtl92c_phy_bb_config(hw);
>  	rtlphy->rf_mode = RF_OP_BY_SW_3WIRE;
>  	rtl92c_phy_rf_config(hw);
> +	if (IS_VENDOR_UMC_A_CUT(rtlhal->version) &&
> +	    !IS_92C_SERIAL(rtlhal->version)) {
> +		rtl_set_rfreg(hw, RF90_PATH_A, RF_RX_G1, MASKDWORD, 0x30255);
> +		rtl_set_rfreg(hw, RF90_PATH_A, RF_RX_G2, MASKDWORD, 0x50a00);
> +	} else if (IS_81xxC_VENDOR_UMC_B_CUT(rtlhal->version)) {
> +		rtl_set_rfreg(hw, RF90_PATH_A, 0x0C, MASKDWORD, 0x894AE);
> +		rtl_set_rfreg(hw, RF90_PATH_A, 0x0A, MASKDWORD, 0x1AF31);
> +		rtl_set_rfreg(hw, RF90_PATH_A, RF_IPA, MASKDWORD, 0x8F425);
> +		rtl_set_rfreg(hw, RF90_PATH_A, RF_SYN_G2, MASKDWORD, 0x4F200);
> +		rtl_set_rfreg(hw, RF90_PATH_A, RF_RCK1, MASKDWORD, 0x44053);
> +		rtl_set_rfreg(hw, RF90_PATH_A, RF_RCK2, MASKDWORD, 0x80201);
> +	}
>  	rtlphy->rfreg_chnlval[0] = rtl_get_rfreg(hw, (enum radio_path)0,
>  						 RF_CHNLBW, RFREG_OFFSET_MASK);
>  	rtlphy->rfreg_chnlval[1] = rtl_get_rfreg(hw, (enum radio_path)1,
> @@ -945,11 +963,11 @@ int rtl92ce_hw_init(struct ieee80211_hw
>  
>  	if (ppsc->rfpwr_state == ERFON) {
>  		rtl92c_phy_set_rfpath_switch(hw, 1);
> -		if (iqk_initialized) {
> +		if (rtlphy->iqk_initialized) {
>  			rtl92c_phy_iq_calibrate(hw, true);
>  		} else {
>  			rtl92c_phy_iq_calibrate(hw, false);
> -			iqk_initialized = true;
> +			rtlphy->iqk_initialized = true;
>  		}
>  
>  		rtl92c_dm_check_txpower_tracking(hw);
> @@ -1004,6 +1022,13 @@ static enum version_8192c _rtl92ce_read_
>  				   ? CHIP_VENDOR_UMC_B_CUT : CHIP_UNKNOWN) |
>  				   CHIP_VENDOR_UMC));
>  		}
> +		if (IS_92C_SERIAL(version)) {
> +			value32 = rtl_read_dword(rtlpriv, REG_HPON_FSM);
> +			version = (enum version_8192c)(version |
> +				   ((CHIP_BONDING_IDENTIFIER(value32)
> +				   == CHIP_BONDING_92C_1T2R) ?
> +				   RF_TYPE_1T2R : 0));
> +		}
>  	}
>  
>  	switch (version) {
> @@ -1019,12 +1044,30 @@ static enum version_8192c _rtl92ce_read_
>  	case VERSION_A_CHIP_88C:
>  		versionid = "A_CHIP_88C";
>  		break;
> +	case VERSION_NORMAL_UMC_CHIP_92C_1T2R_A_CUT:
> +		versionid = "A_CUT_92C_1T2R";
> +		break;
> +	case VERSION_NORMAL_UMC_CHIP_92C_A_CUT:
> +		versionid = "A_CUT_92C";
> +		break;
> +	case VERSION_NORMAL_UMC_CHIP_88C_A_CUT:
> +		versionid = "A_CUT_88C";
> +		break;
> +	case VERSION_NORMAL_UMC_CHIP_92C_1T2R_B_CUT:
> +		versionid = "B_CUT_92C_1T2R";
> +		break;
> +	case VERSION_NORMAL_UMC_CHIP_92C_B_CUT:
> +		versionid = "B_CUT_92C";
> +		break;
> +	case VERSION_NORMAL_UMC_CHIP_88C_B_CUT:
> +		versionid = "B_CUT_88C";
> +		break;
>  	default:
>  		versionid = "Unknown. Bug?";
>  		break;
>  	}
>  
> -	RT_TRACE(rtlpriv, COMP_INIT, DBG_TRACE,
> +	RT_TRACE(rtlpriv, COMP_INIT, DBG_EMERG,
>  		 "Chip Version ID: %s\n", versionid);
>  
>  	switch (version & 0x3) {
> @@ -1197,6 +1240,7 @@ static void _rtl92ce_poweroff_adapter(st
>  {
>  	struct rtl_priv *rtlpriv = rtl_priv(hw);
>  	struct rtl_pci_priv *rtlpcipriv = rtl_pcipriv(hw);
> +	struct rtl_hal *rtlhal = rtl_hal(rtlpriv);
>  	u8 u1b_tmp;
>  	u32 u4b_tmp;
>  
> @@ -1225,7 +1269,8 @@ static void _rtl92ce_poweroff_adapter(st
>  	rtl_write_word(rtlpriv, REG_GPIO_IO_SEL, 0x0790);
>  	rtl_write_word(rtlpriv, REG_LEDCFG0, 0x8080);
>  	rtl_write_byte(rtlpriv, REG_AFE_PLL_CTRL, 0x80);
> -	rtl_write_byte(rtlpriv, REG_SPS0_CTRL, 0x23);
> +	if (!IS_81xxC_VENDOR_UMC_B_CUT(rtlhal->version))
> +		rtl_write_byte(rtlpriv, REG_SPS0_CTRL, 0x23);
>  	if (rtlpcipriv->bt_coexist.bt_coexistence) {
>  		u4b_tmp = rtl_read_dword(rtlpriv, REG_AFE_XTAL_CTRL);
>  		u4b_tmp |= 0x03824800;
> @@ -1254,6 +1299,9 @@ void rtl92ce_card_disable(struct ieee802
>  		rtlpriv->cfg->ops->led_control(hw, LED_CTL_POWER_OFF);
>  	RT_SET_PS_LEVEL(ppsc, RT_RF_OFF_LEVL_HALT_NIC);
>  	_rtl92ce_poweroff_adapter(hw);
> +
> +	/* after power off we should do iqk again */
> +	rtlpriv->phy.iqk_initialized = false;
>  }
>  
>  void rtl92ce_interrupt_recognized(struct ieee80211_hw *hw,

This part:
> @@ -1355,9 +1403,9 @@ static void _rtl92ce_read_txpower_info_f
>  			tempval = hwinfo[EEPROM_TXPOWERHT40_2SDIFF + i];
>  		else
>  			tempval = EEPROM_DEFAULT_HT40_2SDIFF;
> -		rtlefuse->eeprom_chnlarea_txpwr_ht40_2sdiif[RF90_PATH_A][i] =
> +		rtlefuse->eprom_chnl_txpwr_ht40_2sdf[RF90_PATH_A][i] =
>  		    (tempval & 0xf);
> -		rtlefuse->eeprom_chnlarea_txpwr_ht40_2sdiif[RF90_PATH_B][i] =
> +		rtlefuse->eprom_chnl_txpwr_ht40_2sdf[RF90_PATH_B][i] =
>  		    ((tempval & 0xf0) >> 4);
>  	}
>  
> @@ -1381,7 +1429,7 @@ static void _rtl92ce_read_txpower_info_f
>  				"RF(%d) EEPROM HT40 2S Diff Area(%d) = 0x%x\n",
>  				rf_path, i,
>  				rtlefuse->
> -				eeprom_chnlarea_txpwr_ht40_2sdiif[rf_path][i]);
> +				eprom_chnl_txpwr_ht40_2sdf[rf_path][i]);
>  
>  	for (rf_path = 0; rf_path < 2; rf_path++) {
>  		for (i = 0; i < 14; i++) {
> @@ -1396,14 +1444,14 @@ static void _rtl92ce_read_txpower_info_f
>  			if ((rtlefuse->
>  			     eeprom_chnlarea_txpwr_ht40_1s[rf_path][index] -
>  			     rtlefuse->
> -			     eeprom_chnlarea_txpwr_ht40_2sdiif[rf_path][index])
> +			     eprom_chnl_txpwr_ht40_2sdf[rf_path][index])
>  			    > 0) {
>  				rtlefuse->txpwrlevel_ht40_2s[rf_path][i] =
>  				    rtlefuse->
>  				    eeprom_chnlarea_txpwr_ht40_1s[rf_path]
>  				    [index] -
>  				    rtlefuse->
> -				    eeprom_chnlarea_txpwr_ht40_2sdiif[rf_path]
> +				    eprom_chnl_txpwr_ht40_2sdf[rf_path]
>  				    [index];
>  			} else {
>  				rtlefuse->txpwrlevel_ht40_2s[rf_path][i] = 0;

wasn't in V1 of the patch. Is the rename intentional?


Anisse

^ permalink raw reply

* Re: Possible networking regression in 3.6.0
From: Chris Clayton @ 2012-09-18 14:21 UTC (permalink / raw)
  To: netdev
In-Reply-To: <5057455A.7050108@googlemail.com>

On 09/17/12 16:44, Chris Clayton wrote:
> Hi,
>
> I'm having a problem with networking. I'm running Windows XP as a KVM
> guest on a laptop running kernel 3.6.0-rc6. The identical configuration
> works fine with kernels 3.5.4 and 3.4.11 (and has done so, largely
> unchanged, since KVM was introduced in 2.6.<whatever>.)
>
> The configuration is:
>
> XP guest:    192.168.200.1 (gateway 192.168.200.254)
> tap0:        192.168.200.254
> host:        192.168.0.40 (gateway 192.168.0.1)
> router:        192.168.0.1
>
> The script that starts up the firewall includes the following commands:
>
> # Load the connection-sharing for qemu/kvm guests
> echo 1 > /proc/sys/net/ipv4/ip_forward
> iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
> ...
> # allow traffic to and from the qemu/kvm virtual networks
> NETS="200 201"
> for net in $NETS; do
>    iptables -A INPUT -s 192.168.$net.0/24 -j ACCEPT
>    iptables -A OUTPUT -d 192.168.$net.0/24 -j ACCEPT
> done
> ...
>
> The network-related modules that are loaded are:
>
> $ lsmod
> Module                  Size  Used by
> tun                    12412  0
> xt_state                 891  1
> iptable_filter           852  1
> ipt_MASQUERADE          1222  1
> iptable_nat             3087  1
> nf_nat                 10901  2 ipt_MASQUERADE,iptable_nat
> nf_conntrack_ipv4       4942  4 nf_nat,iptable_nat
> nf_defrag_ipv4           815  1 nf_conntrack_ipv4
> nf_conntrack           37644  5
> ipt_MASQUERADE,nf_nat,xt_state,iptable_nat,nf_conntrack_ipv4
> ...
> r8169                  47159  0
>
> From the host I can successfully ping the guest, tap0 and the router as
> you would expect, but from the guest, although I can ping the host and
> tap0, I cannot ping the router. In practice, this means I have no
> internet access from the guest. As I say, this configuration works
> perfectly under 3.5.x and 3.4.x kernels.
>
> I'll do a coarse-grained "bisect" of Linus' 3.6 release candidates and
> report back, but does anyone have any prime-suspect patches that may be
> at the cause of this problem?
>

-rc1 turned out to have the problem so I've bisected between 3.5 and 
3.6-rc1. I arrived at:

$ git bisect bad
d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5 is the first bad commit
commit d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5
Author: David S. Miller <davem@davemloft.net>
Date:   Tue Jul 17 12:58:50 2012 -0700

     ipv4: Cache input routes in fib_info nexthops.

     Caching input routes is slightly simpler than output routes, since we
     don't need to be concerned with nexthop exceptions.  (locally
     destined, and routed packets, never trigger PMTU events or redirects
     that will be processed by us).

     However, we have to elide caching for the DIRECTSRC and non-zero itag
     cases.

     Signed-off-by: David S. Miller <davem@davemloft.net>

:040000 040000 6bbc75c1cbe62bf84ea412d3b98adf2b614779cd 3ad7256b4a71e63ca4530977c0550121ea803d35 M      include
:040000 040000 18c2a950a53c4eec9bfa12185d1e382dfed74af8 a2ab6157d6cd54930da395758c6ded3a225d1f04 M      net

The bisect log:
git bisect start
# bad: [0d7614f09c1ebdbaa1599a5aba7593f147bf96ee] Linux 3.6-rc1
git bisect bad 0d7614f09c1ebdbaa1599a5aba7593f147bf96ee
# good: [28a33cbc24e4256c143dce96c7d93bf423229f92] Linux 3.5
git bisect good 28a33cbc24e4256c143dce96c7d93bf423229f92
# bad: [614a6d4341b3760ca98a1c2c09141b71db5d1e90] Merge branch 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
git bisect bad 614a6d4341b3760ca98a1c2c09141b71db5d1e90
# bad: [320f5ea0cedc08ef65d67e056bcb9d181386ef2c] genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEP
git bisect bad 320f5ea0cedc08ef65d67e056bcb9d181386ef2c
# good: [0cd06647b7c24f6633e32a505930a9aa70138c22] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
git bisect good 0cd06647b7c24f6633e32a505930a9aa70138c22
# good: [dbfa600148a25903976910863c75dae185f8d187] cxgb3: set maximal number of default RSS queues
git bisect good dbfa600148a25903976910863c75dae185f8d187
# good: [efdfad3205403e1d1c5c0bdcbdb647ddd89bfaa3] bnx2: Try to recover from PCI block reset
git bisect good efdfad3205403e1d1c5c0bdcbdb647ddd89bfaa3
# good: [1bf91cdc1bba94ea062a9147d924815c13f029f2] ixgbe: Drop references to deprecated pci_ DMA api and instead use dma_ API
git bisect good 1bf91cdc1bba94ea062a9147d924815c13f029f2
# good: [b6dfd939fdc249fcf8cd7b8006f76239b33eb581] ixgbe: add support for new 82599 device
git bisect good b6dfd939fdc249fcf8cd7b8006f76239b33eb581
# good: [3ba97381343b271296487bf073eb670d5465a8b8] net: ethernet: davinci_emac: add pm_runtime support
git bisect good 3ba97381343b271296487bf073eb670d5465a8b8
# bad: [5e9965c15ba88319500284e590733f4a4629a288] Merge branch 'kill_rtcache'
git bisect bad 5e9965c15ba88319500284e590733f4a4629a288
# good: [f5b0a8743601a4477419171f5046bd07d1c080a0] net: Document dst->obsolete better.
git bisect good f5b0a8743601a4477419171f5046bd07d1c080a0
# bad: [ba3f7f04ef2b19aace38f855aedd17fe43035d50] ipv4: Kill FLOWI_FLAG_RT_NOCACHE and associated code.
git bisect bad ba3f7f04ef2b19aace38f855aedd17fe43035d50
# good: [f2bb4bedf35d5167a073dcdddf16543f351ef3ae] ipv4: Cache output routes in fib_info nexthops.
git bisect good f2bb4bedf35d5167a073dcdddf16543f351ef3ae
# bad: [d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5] ipv4: Cache input routes in fib_info nexthops.
git bisect bad d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5

Checking out the parent commit 
(f2bb4bedf35d5167a073dcdddf16543f351ef3ae) and building and installing 
the kernel gives a working configuration, so I'm pretty confident in the 
outcome of the bisect. Reverting the patch on master gives errors, so I've
not tested master with the patch reverted.
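
For readers unfamiliar with the procedure, the search above can be sketched
as a plain binary search. This is only an illustration under the assumption
of a linear history (git bisect itself also handles merges); the names
first_bad_commit(), is_bad_fn and bad_from() are hypothetical and not part
of git or the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test callback: returns true if commit `idx` shows the bug. */
typedef bool (*is_bad_fn)(size_t idx, void *ctx);

/*
 * Commits are numbered in history order. Precondition, as with
 * "git bisect good/bad": commit `good` is known good, commit `bad` is
 * known bad, and good < bad. Returns the index of the first bad commit.
 */
static size_t first_bad_commit(size_t good, size_t bad,
			       is_bad_fn is_bad, void *ctx)
{
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;	/* culprit is mid or earlier */
		else
			good = mid;	/* culprit is after mid */
	}
	return bad;
}

/* Example predicate: every commit from index *ctx onward is bad. */
static bool bad_from(size_t idx, void *ctx)
{
	return idx >= *(size_t *)ctx;
}
```

When the loop ends, the commit just before the result is the last one that
tested good, which is the same sanity check as building the parent commit
by hand.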

Let me know if I can help in any way to identify a fix.

Chris

> Let me know if there are any other diagnostics I can provide. Also, as
> I'm not subscribed to netdev, please cc me to any reply.
>
> Thanks,
>
> Chris

^ permalink raw reply

* [RFC PATCH v1 0/3] usbnet: runtime suspend when link becomes down
From: Ming Lei @ 2012-09-18 14:23 UTC (permalink / raw)
  To: David S. Miller, Greg Kroah-Hartman
  Cc: Oliver Neukum, Fink Dmitry, Rafael Wysocki, Alan Stern,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA

Hi,

Currently only a few usbnet devices support traffic-based runtime PM,
e.g. waking up the device when there are packets to be transmitted.

In the following situation it makes sense to runtime suspend a usbnet
device to save power:

	- after the network link goes down

This patch set implements runtime PM triggered by network link change
events. It basically works on an asix usbnet device in a simple
runtime PM test.

Change Log:
v1:
	-- use system_freezable_wq to fix deadlock by calling
	usbnet_link_updated in same workqueue
	-- use system_freezable_wq to stop link check work during
	system sleep
	-- fix bug of possible change of previous autosuspend_delay
	-- set/clear 'needs_remote_wakeup' for devices which support
	remote wakeup on link change
	-- convert EXPORT_SYMBOL_GPL to EXPORT_SYMBOL
	-- introduce module parameter of link_autocheck_time
	-- introduce link_rpm_supported to not start link
	runtime PM for devices which can't detect link change
	(such as smsc95xx)

	Thanks to Oliver for the review.

 drivers/net/usb/asix_devices.c |    6 +-
 drivers/net/usb/cdc_ether.c    |    5 +-
 drivers/net/usb/cdc_ncm.c      |    9 +-
 drivers/net/usb/dm9601.c       |    7 +-
 drivers/net/usb/mcs7830.c      |    6 +-
 drivers/net/usb/sierra_net.c   |    6 +-
 drivers/net/usb/usbnet.c       |  280 +++++++++++++++++++++++++++++++++++++++-
 include/linux/usb/usbnet.h     |   22 +++-
 8 files changed, 306 insertions(+), 35 deletions(-)


Thanks,
--
Ming Lei

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [RFC PATCH v1 2/3] usbnet: apply usbnet_link_change
From: Ming Lei @ 2012-09-18 14:23 UTC (permalink / raw)
  To: David S. Miller, Greg Kroah-Hartman
  Cc: Oliver Neukum, Fink Dmitry, Rafael Wysocki, Alan Stern,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Ming Lei
In-Reply-To: <1347978201-6219-1-git-send-email-ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

This patch applies the newly introduced usbnet_link_change API.
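
As a rough illustration of what the conversions below rely on, here is a
userspace model of the helper's behaviour, inferred only from the open-coded
sequences this patch removes. The real helper is defined in patch 1/3 and
may differ; struct model_dev and model_link_change() are stand-ins, not
kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the parts of struct usbnet that matter here. */
struct model_dev {
	bool carrier;		/* models netif_carrier_ok(dev->net) */
	int  link_resets;	/* counts deferred EVENT_LINK_RESET requests */
};

/*
 * Models the pattern the drivers used to open-code:
 *   link up   -> carrier on, plus a deferred link reset if requested
 *   link down -> carrier off
 */
static void model_link_change(struct model_dev *dev, int link, int need_reset)
{
	if (link) {
		dev->carrier = true;
		if (need_reset)
			dev->link_resets++;
	} else {
		dev->carrier = false;
	}
}
```

In this model asix, dm9601 and mcs7830 pass need_reset = 1 because they
previously deferred EVENT_LINK_RESET on link up, while the CDC and sierra
drivers pass 0 because they only toggled the carrier.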

Signed-off-by: Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
 drivers/net/usb/asix_devices.c |    6 +-----
 drivers/net/usb/cdc_ether.c    |    5 +----
 drivers/net/usb/cdc_ncm.c      |    9 +++------
 drivers/net/usb/dm9601.c       |    7 +------
 drivers/net/usb/mcs7830.c      |    6 +-----
 drivers/net/usb/sierra_net.c   |    3 +--
 drivers/net/usb/usbnet.c       |    2 +-
 7 files changed, 9 insertions(+), 29 deletions(-)

diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c
index 4fd48df..c354bb1 100644
--- a/drivers/net/usb/asix_devices.c
+++ b/drivers/net/usb/asix_devices.c
@@ -55,11 +55,7 @@ static void asix_status(struct usbnet *dev, struct urb *urb)
 	event = urb->transfer_buffer;
 	link = event->link & 0x01;
 	if (netif_carrier_ok(dev->net) != link) {
-		if (link) {
-			netif_carrier_on(dev->net);
-			usbnet_defer_kevent (dev, EVENT_LINK_RESET );
-		} else
-			netif_carrier_off(dev->net);
+		usbnet_link_change(dev, link, 1);
 		netdev_dbg(dev->net, "Link Status is: %d\n", link);
 	}
 }
diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index a03de71..c6e4be5 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -406,10 +406,7 @@ void usbnet_cdc_status(struct usbnet *dev, struct urb *urb)
 	case USB_CDC_NOTIFY_NETWORK_CONNECTION:
 		netif_dbg(dev, timer, dev->net, "CDC: carrier %s\n",
 			  event->wValue ? "on" : "off");
-		if (event->wValue)
-			netif_carrier_on(dev->net);
-		else
-			netif_carrier_off(dev->net);
+		usbnet_link_change(dev, event->wValue, 0);
 		break;
 	case USB_CDC_NOTIFY_SPEED_CHANGE:	/* tx/rx rates */
 		netif_dbg(dev, timer, dev->net, "CDC: speed change (len %d)\n",
diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index 4cd582a..f425c2c 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -593,7 +593,7 @@ advance:
 	 * (carrier is OFF) during attach, so the IP network stack does not
 	 * start IPv6 negotiation and more.
 	 */
-	netif_carrier_off(dev->net);
+	usbnet_link_change(dev, 0, 0);
 	ctx->tx_speed = ctx->rx_speed = 0;
 	return 0;
 
@@ -1131,12 +1131,9 @@ static void cdc_ncm_status(struct usbnet *dev, struct urb *urb)
 			" %sconnected\n",
 			ctx->netdev->name, ctx->connected ? "" : "dis");
 
-		if (ctx->connected)
-			netif_carrier_on(dev->net);
-		else {
-			netif_carrier_off(dev->net);
+		usbnet_link_change(dev, ctx->connected, 0);
+		if (!ctx->connected)
 			ctx->tx_speed = ctx->rx_speed = 0;
-		}
 		break;
 
 	case USB_CDC_NOTIFY_SPEED_CHANGE:
diff --git a/drivers/net/usb/dm9601.c b/drivers/net/usb/dm9601.c
index e0433ce..7422d5a 100644
--- a/drivers/net/usb/dm9601.c
+++ b/drivers/net/usb/dm9601.c
@@ -587,12 +587,7 @@ static void dm9601_status(struct usbnet *dev, struct urb *urb)
 
 	link = !!(buf[0] & 0x40);
 	if (netif_carrier_ok(dev->net) != link) {
-		if (link) {
-			netif_carrier_on(dev->net);
-			usbnet_defer_kevent (dev, EVENT_LINK_RESET);
-		}
-		else
-			netif_carrier_off(dev->net);
+		usbnet_link_change(dev, link, 1);
 		netdev_dbg(dev->net, "Link Status is: %d\n", link);
 	}
 }
diff --git a/drivers/net/usb/mcs7830.c b/drivers/net/usb/mcs7830.c
index 03c2d8d..49a98b7 100644
--- a/drivers/net/usb/mcs7830.c
+++ b/drivers/net/usb/mcs7830.c
@@ -639,11 +639,7 @@ static void mcs7830_status(struct usbnet *dev, struct urb *urb)
 
 	link = !(buf[1] & 0x20);
 	if (netif_carrier_ok(dev->net) != link) {
-		if (link) {
-			netif_carrier_on(dev->net);
-			usbnet_defer_kevent(dev, EVENT_LINK_RESET);
-		} else
-			netif_carrier_off(dev->net);
+		usbnet_link_change(dev, link, 1);
 		netdev_dbg(dev->net, "Link Status is: %d\n", link);
 	}
 }
diff --git a/drivers/net/usb/sierra_net.c b/drivers/net/usb/sierra_net.c
index 7ae70e9..08ed9e5 100644
--- a/drivers/net/usb/sierra_net.c
+++ b/drivers/net/usb/sierra_net.c
@@ -414,11 +414,10 @@ static void sierra_net_handle_lsi(struct usbnet *dev, char *data,
 	if (link_up) {
 		sierra_net_set_ctx_index(priv, hh->msgspecific.byte);
 		priv->link_up = 1;
-		netif_carrier_on(dev->net);
 	} else {
 		priv->link_up = 0;
-		netif_carrier_off(dev->net);
 	}
+	usbnet_link_change(dev, link_up, 0);
 }
 
 static void sierra_net_dosync(struct usbnet *dev)
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index e986e4b..dc4ff47 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1499,7 +1499,7 @@ usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod)
 	netif_device_attach (net);
 
 	if (dev->driver_info->flags & FLAG_LINK_INTR)
-		netif_carrier_off(net);
+		usbnet_link_change(dev, 0, 0);
 
 	return 0;
 
-- 
1.7.9.5


^ permalink raw reply related

* [RFC PATCH v1 3/3] usbnet: support runtime PM triggered by link change
From: Ming Lei @ 2012-09-18 14:23 UTC (permalink / raw)
  To: David S. Miller, Greg Kroah-Hartman
  Cc: Oliver Neukum, Fink Dmitry, Rafael Wysocki, Alan Stern,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Ming Lei
In-Reply-To: <1347978201-6219-1-git-send-email-ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

This patch implements runtime PM triggered by link change events
for devices which do not define a manage_power() callback, based
on the following considerations:

- this kind of runtime PM is already supported by some PCI network
interfaces, and it makes sense to suspend the USB device to save
power if no link is detected

- link-down triggered runtime PM need not be implemented for devices
which already support traffic-based runtime PM via .manage_power,
because runtime suspend can be triggered once no tx frames are left
to transmit after the link goes down.

Unfortunately, some usbnet devices don't support remote wakeup, and
others may support it but remote wakeup can't be enabled for link
change events for some reason (no public documentation, not
supported ...).

If remote wakeup on link change can't be supported, this patch uses
a periodic timer to wake up the device and detect link change events.
If the link is found to be down, the device is put back into suspend
immediately.

For devices which support remote wakeup on link change but not
remote wakeup on incoming packets (no manage_power callback), the
patch still makes link change triggered runtime PM work.
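
The cases described above can be summarised in a small userspace sketch.
It is only a model of the decision: enum link_pm_mode, struct model_caps
and pick_link_pm_mode() are illustrative names that do not appear in the
patch, and the has_status check mirrors the test on the driver's .status
callback in need_link_runtime_pm() below.

```c
#include <assert.h>
#include <stdbool.h>

enum link_pm_mode {
	LINK_PM_NONE,		/* link-triggered runtime PM is not used */
	LINK_PM_REMOTE_WAKEUP,	/* device wakes the host on link change */
	LINK_PM_POLL,		/* host wakes the device periodically */
};

/* Stand-in for the driver/device properties the patch looks at. */
struct model_caps {
	bool has_manage_power;	 /* driver has traffic-based runtime PM */
	bool has_status;	 /* driver can report link state at all */
	bool link_remote_wakeup; /* remote wakeup works for link change */
};

static enum link_pm_mode pick_link_pm_mode(const struct model_caps *c)
{
	if (c->has_manage_power)
		return LINK_PM_NONE;	/* traffic-based PM covers it */
	if (!c->has_status)
		return LINK_PM_NONE;	/* no way to learn the link state */
	return c->link_remote_wakeup ? LINK_PM_REMOTE_WAKEUP : LINK_PM_POLL;
}
```

Only the LINK_PM_POLL case needs the periodic link detect work; with
remote wakeup the device can stay suspended until the link comes back.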

Signed-off-by: Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
 drivers/net/usb/sierra_net.c |    3 +-
 drivers/net/usb/usbnet.c     |  269 +++++++++++++++++++++++++++++++++++++++++-
 include/linux/usb/usbnet.h   |   20 ++++
 3 files changed, 286 insertions(+), 6 deletions(-)

diff --git a/drivers/net/usb/sierra_net.c b/drivers/net/usb/sierra_net.c
index 08ed9e5..0993f2d 100644
--- a/drivers/net/usb/sierra_net.c
+++ b/drivers/net/usb/sierra_net.c
@@ -418,6 +418,7 @@ static void sierra_net_handle_lsi(struct usbnet *dev, char *data,
 		priv->link_up = 0;
 	}
 	usbnet_link_change(dev, link_up, 0);
+	usbnet_link_updated(dev);
 }
 
 static void sierra_net_dosync(struct usbnet *dev)
@@ -915,7 +916,7 @@ static struct sk_buff *sierra_net_tx_fixup(struct usbnet *dev,
 
 static const struct driver_info sierra_net_info_direct_ip = {
 	.description = "Sierra Wireless USB-to-WWAN Modem",
-	.flags = FLAG_WWAN | FLAG_SEND_ZLP,
+	.flags = FLAG_WWAN | FLAG_SEND_ZLP | FLAG_LINK_UPDATE_BY_DRIVER,
 	.bind = sierra_net_bind,
 	.unbind = sierra_net_unbind,
 	.status = sierra_net_status,
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index dc4ff47..14aa39e 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -91,6 +91,12 @@ static int msg_level = -1;
 module_param (msg_level, int, 0);
 MODULE_PARM_DESC (msg_level, "Override default message level");
 
+/* link runtime PM: auto check time */
+static int link_autocheck_time = 3;
+module_param_named(autocheck, link_autocheck_time, int, 0644);
+MODULE_PARM_DESC(autocheck, "default link auto check time in seconds");
+
+
 /*-------------------------------------------------------------------------*/
 
 /* handles CDC Ethernet and many other network "bulk data" interfaces */
@@ -677,8 +683,228 @@ static void usbnet_terminate_urbs(struct usbnet *dev)
 	remove_wait_queue(&unlink_wakeup, &wait);
 }
 
-void usbnet_link_change(struct usbnet *dev, int link, int need_reset)
+void usbnet_link_updated(struct usbnet *dev)
+{
+	complete(&dev->link_update_completion);
+}
+EXPORT_SYMBOL(usbnet_link_updated);
+
+#define usbnet_link_suspend(dev) do { \
+	dev_dbg(&dev->intf->dev, "%s:link suspend", __func__); \
+	usb_autopm_put_interface_async(dev->intf); \
+} while(0)
+
+#define usbnet_link_resume(dev) do { \
+	dev_dbg(&dev->intf->dev, "%s:link resume", __func__); \
+	usb_autopm_get_interface_async(dev->intf); \
+} while(0)
+
+static int need_link_runtime_pm(struct usbnet *dev)
+{
+	if (dev->driver_info->manage_power)
+		return 0;
+
+	if (!dev->driver_info->status)
+		return 0;
+
+	return 1;
+}
+
+/* called by usbnet_suspend */
+static void start_link_detect(struct usbnet *dev)
 {
+	if (!dev->link_rpm_enabled)
+		return;
+
+	if (dev->link_remote_wakeup)
+		return;
+
+	if (dev->link_check_started)
+		return;
+
+	dev->link_check_started = 1;
+	queue_delayed_work(system_freezable_wq, &dev->link_detect_work,
+			   link_autocheck_time * HZ);
+}
+
+/* called by usbnet_resume */
+static void end_link_detect(struct usbnet *dev, int force_cancel)
+{
+	if (!dev->link_rpm_enabled)
+		return;
+
+	if (!dev->link_check_started)
+		return;
+
+	/*
+	 * cancel the link detect work if usbnet is resumed
+	 * not by link detect work
+	 */
+	if (!dev->link_checking || force_cancel)
+		cancel_delayed_work_sync(&dev->link_detect_work);
+
+	dev->link_check_started = 0;
+}
+
+/* called by usbnet_open */
+static void enable_link_runtime_pm(struct usbnet *dev)
+{
+	dev->link_rpm_enabled = 1;
+
+	if (!dev->link_remote_wakeup) {
+		spin_lock_irq(&dev->udev->dev.power.lock);
+		dev->old_autosuspend_delay =
+			dev->udev->dev.power.autosuspend_delay;
+		spin_unlock_irq(&dev->udev->dev.power.lock);
+		pm_runtime_set_autosuspend_delay(&dev->udev->dev, 1);
+	} else {
+		dev->intf->needs_remote_wakeup = 1;
+	}
+
+	if (!netif_carrier_ok(dev->net) && dev->link_rpm_supported) {
+		dev->link_open_suspend = 1;
+		usbnet_link_suspend(dev);
+	}
+}
+
+/* called by usbnet_stop */
+static void disable_link_runtime_pm(struct usbnet *dev)
+{
+	int delay;
+
+	if (!dev->link_rpm_enabled)
+		return;
+
+	dev->link_rpm_enabled = 0;
+	end_link_detect(dev, 1);
+
+	if (dev->link_open_suspend) {
+		usbnet_link_resume(dev);
+		dev->link_open_suspend = 0;
+	}
+
+	if (dev->link_remote_wakeup) {
+		dev->intf->needs_remote_wakeup = 0;
+		return;
+	}
+
+	spin_lock_irq(&dev->udev->dev.power.lock);
+	delay = dev->udev->dev.power.autosuspend_delay;
+	spin_unlock_irq(&dev->udev->dev.power.lock);
+
+	/*
+	 * If autosuspend delay has been changed after
+	 * enable_link_runtime_pm(), just keep the latest delay.
+	 *
+	 * FIXME: the delay might be changed after the above
+	 * lock is released and before the lock is taken in
+	 * pm_runtime_set_autosuspend_delay(); this looks to
+	 * have no big effect.
+	 */
+	if (delay == 1) {
+		delay = dev->old_autosuspend_delay;
+		pm_runtime_set_autosuspend_delay(&dev->udev->dev,
+						 delay);
+	}
+}
+
+static void update_link_state(struct usbnet *dev)
+{
+	char		*buf = NULL;
+	unsigned	pipe = 0;
+	unsigned	maxp;
+	int		ret, act_len, timeout;
+	struct urb	urb;
+
+	pipe = usb_rcvintpipe(dev->udev,
+			      dev->status->desc.bEndpointAddress
+				& USB_ENDPOINT_NUMBER_MASK);
+	maxp = usb_maxpacket(dev->udev, pipe, 0);
+
+	/*
+	 * Use twice the polling period as the default timeout.
+	 * It has been observed that ASIX devices can update their
+	 * link state during one period (128ms). A low level driver
+	 * can set its own link update timeout in its bind() callback.
+	 */
+	if (!dev->link_update_timeout) {
+		timeout = max((int) dev->status->desc.bInterval,
+			(dev->udev->speed == USB_SPEED_HIGH) ? 7 : 3);
+		timeout = 1 << timeout;
+		if (dev->udev->speed == USB_SPEED_HIGH)
+			timeout /= 8;
+		if (timeout < 128)
+			timeout = 128;
+	} else
+		timeout = dev->link_update_timeout;
+
+	buf = kmalloc(maxp, GFP_KERNEL);
+	if (!buf)
+		return;
+
+	dev_dbg(&dev->intf->dev, "%s: timeout %dms\n", __func__, timeout);
+	ret = usb_interrupt_msg(dev->udev, pipe, buf, maxp,
+			&act_len, timeout);
+	if (!ret) {
+		urb.status = 0;
+		urb.actual_length = act_len;
+		urb.transfer_buffer = buf;
+		urb.transfer_buffer_length = maxp;
+		dev->driver_info->status(dev, &urb);
+		if (dev->driver_info->flags &
+		    FLAG_LINK_UPDATE_BY_DRIVER)
+			wait_for_completion(&dev->link_update_completion);
+		dev_dbg(&dev->intf->dev, "%s: link updated\n", __func__);
+	} else
+		dev_dbg(&dev->intf->dev, "%s: link update failed %d\n",
+				__func__, ret);
+	kfree(buf);
+}
+
+static void link_detect_work(struct work_struct *work)
+{
+	struct usbnet	*dev = container_of(work, struct usbnet,
+					    link_detect_work.work);
+
+	dev_dbg(&dev->intf->dev, "%s: link resume\n", __func__);
+
+	dev->link_checking = 1;
+	usb_autopm_get_interface(dev->intf);
+	update_link_state(dev);
+	dev->link_checking = 0;
+
+	dev_dbg(&dev->intf->dev, "%s: link state %d\n",
+		__func__, netif_carrier_ok(dev->net));
+
+	if (!netif_carrier_ok(dev->net))
+		usb_autopm_put_interface(dev->intf);
+	else
+		usb_submit_urb(dev->interrupt, GFP_NOIO);
+}
+
+static void init_link_rpm(struct usbnet *dev)
+{
+	INIT_DELAYED_WORK(&dev->link_detect_work, link_detect_work);
+	init_completion(&dev->link_update_completion);
+
+	dev->link_remote_wakeup = !!(dev->driver_info->flags &
+				  FLAG_LINK_SUPPORT_REMOTE_WAKEUP);
+	if (dev->link_remote_wakeup &&
+	    !device_can_wakeup(&dev->udev->dev)) {
+		dev_err(&dev->udev->dev,
+			"I don't support remote wakeup\n");
+		dev->link_remote_wakeup = 0;
+	}
+
+	dev->link_state = 1;
+}
+
+static void __usbnet_link_change(struct usbnet *dev, int link,
+				 int need_reset)
+{
+	dev_dbg(&dev->intf->dev, "%s: old_link=%d link=%d\n", __func__,
+		dev->link_state, link);
+
 	if (link)
 		netif_carrier_on(dev->net);
 	else
@@ -686,6 +912,25 @@ void usbnet_link_change(struct usbnet *dev, int link, int need_reset)
 
 	if (need_reset && link)
 		usbnet_defer_kevent(dev, EVENT_LINK_RESET);
+
+	if (dev->link_rpm_enabled) {
+		if (!link && dev->link_state)
+			usbnet_link_suspend(dev);
+		else if (link && !dev->link_state && dev->link_remote_wakeup)
+			usbnet_link_resume(dev);
+	}
+	dev->link_state = link;
+}
+
+void usbnet_link_change(struct usbnet *dev, int link, int need_reset)
+{
+	/*
+	 * Assume the low level driver can support link runtime PM
+	 * if it reports link changes via usbnet_link_change().
+	 */
+	dev->link_rpm_supported = 1;
+
+	__usbnet_link_change(dev, link, need_reset);
 }
 EXPORT_SYMBOL(usbnet_link_change);
 
@@ -731,8 +976,10 @@ int usbnet_stop (struct net_device *net)
 	tasklet_kill (&dev->bh);
 	if (info->manage_power)
 		info->manage_power(dev, 0);
-	else
+	else {
+		disable_link_runtime_pm(dev);
 		usb_autopm_put_interface(dev->intf);
+	}
 
 	return 0;
 }
@@ -807,6 +1054,9 @@ int usbnet_open (struct net_device *net)
 		if (retval < 0)
 			goto done_manage_power_error;
 		usb_autopm_put_interface(dev->intf);
+	} else {
+		if (need_link_runtime_pm(dev))
+			enable_link_runtime_pm(dev);
 	}
 	return retval;
 
@@ -1499,7 +1749,9 @@ usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod)
 	netif_device_attach (net);
 
 	if (dev->driver_info->flags & FLAG_LINK_INTR)
-		usbnet_link_change(dev, 0, 0);
+		__usbnet_link_change(dev, 0, 0);
+
+	init_link_rpm(dev);
 
 	return 0;
 
@@ -1550,6 +1802,9 @@ int usbnet_suspend (struct usb_interface *intf, pm_message_t message)
 		 * wake the device
 		 */
 		netif_device_attach (dev->net);
+
+		if (PMSG_IS_AUTO(message))
+			start_link_detect(dev);
 	}
 	return 0;
 }
@@ -1564,8 +1819,10 @@ int usbnet_resume (struct usb_interface *intf)
 
 	if (!--dev->suspend_count) {
 		/* resume interrupt URBs */
-		if (dev->interrupt && test_bit(EVENT_DEV_OPEN, &dev->flags))
-			usb_submit_urb(dev->interrupt, GFP_NOIO);
+		if (dev->interrupt && test_bit(EVENT_DEV_OPEN, &dev->flags)) {
+			if (!dev->link_checking)
+				usb_submit_urb(dev->interrupt, GFP_NOIO);
+		}
 
 		spin_lock_irq(&dev->txq.lock);
 		while ((res = usb_get_from_anchor(&dev->deferred))) {
@@ -1598,6 +1855,8 @@ int usbnet_resume (struct usb_interface *intf)
 				netif_tx_wake_all_queues(dev->net);
 			tasklet_schedule (&dev->bh);
 		}
+
+		end_link_detect(dev, 0);
 	}
 	return 0;
 }
diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
index 1937b74..d23dae5 100644
--- a/include/linux/usb/usbnet.h
+++ b/include/linux/usb/usbnet.h
@@ -68,6 +68,19 @@ struct usbnet {
 #		define EVENT_RX_PAUSED	5
 #		define EVENT_DEV_ASLEEP 6
 #		define EVENT_DEV_OPEN	7
+
+	/* link down triggered runtime PM */
+	struct delayed_work	link_detect_work;
+	struct completion	link_update_completion;
+	int			link_update_timeout;
+	int			old_autosuspend_delay;
+	unsigned int		link_rpm_supported:1;
+	unsigned int		link_rpm_enabled:1;
+	unsigned int		link_check_started:1;
+	unsigned int		link_checking:1;
+	unsigned int		link_open_suspend:1;
+	unsigned int		link_state:1;
+	unsigned int		link_remote_wakeup:1;
 };
 
 static inline struct usb_driver *driver_of(struct usb_interface *intf)
@@ -106,6 +119,12 @@ struct driver_info {
 #define FLAG_MULTI_PACKET	0x2000
 #define FLAG_RX_ASSEMBLE	0x4000	/* rx packets may span >1 frames */
 
+/* some drivers may not update link state in .status */
+#define FLAG_LINK_UPDATE_BY_DRIVER	0x8000
+
+/* device supports remote wakeup on link change */
+#define FLAG_LINK_SUPPORT_REMOTE_WAKEUP	0x10000
+
 	/* init device ... can sleep, or cause probe() failure */
 	int	(*bind)(struct usbnet *, struct usb_interface *);
 
@@ -161,6 +180,7 @@ extern int usbnet_suspend(struct usb_interface *, pm_message_t);
 extern int usbnet_resume(struct usb_interface *);
 extern void usbnet_disconnect(struct usb_interface *);
 extern void usbnet_link_change(struct usbnet *dev, int link, int need_reset);
+extern void usbnet_link_updated(struct usbnet *dev);
 
 /* Drivers that reuse some of the standard USB CDC infrastructure
  * (notably, using multiple interfaces according to the CDC
-- 
1.7.9.5
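
For reference, the default timeout computed in update_link_state() above
can be checked in user space. The following is only a minimal sketch of
that arithmetic: default_link_timeout_ms() and enum mock_speed are
hypothetical names, not part of the patch, and the range guard on the
shift exists only in this sketch.

```c
#include <assert.h>

/* Hypothetical stand-in for the USB_SPEED_* values used by the patch. */
enum mock_speed { MOCK_SPEED_FULL, MOCK_SPEED_HIGH };

/* Returns the default link update timeout in milliseconds. */
static int default_link_timeout_ms(int binterval, enum mock_speed speed)
{
	int floor_exp = (speed == MOCK_SPEED_HIGH) ? 7 : 3;
	int exp = binterval > floor_exp ? binterval : floor_exp;
	int timeout;

	/* Guard added for this sketch only: keep the shift well defined. */
	assert(exp < 31);

	timeout = 1 << exp;
	if (speed == MOCK_SPEED_HIGH)
		timeout /= 8;	/* microframes (125us) to milliseconds */
	if (timeout < 128)
		timeout = 128;	/* never wait less than 128ms */
	return timeout;
}
```

Assuming a high-speed interrupt endpoint with bInterval 11 (a 128ms
period, which is an assumption about the device, not something stated in
the patch), the result is 256ms, i.e. twice the period. Small bInterval
values are clamped to the 128ms minimum.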

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [RFC PATCH v1 1/3] usbnet: introduce usbnet_link_change API
From: Ming Lei @ 2012-09-18 14:23 UTC (permalink / raw)
  To: David S. Miller, Greg Kroah-Hartman
  Cc: Oliver Neukum, Fink Dmitry, Rafael Wysocki, Alan Stern, netdev,
	linux-usb, Ming Lei
In-Reply-To: <1347978201-6219-1-git-send-email-ming.lei@canonical.com>

This patch introduces the usbnet_link_change() API so that
usbnet can track link changes, which helps to implement
runtime PM triggered by USB ethernet link changes in a later patch.
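
As an illustration of the intended use, a low level driver could report
link changes from its status handler roughly as below. This is a user
space model, not driver code: struct mock_usbnet, mock_link_change() and
mock_status() are hypothetical stand-ins, and the "bit 0 of the first
byte is the link state" layout is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct usbnet; not the real structure. */
struct mock_usbnet {
	bool carrier;		/* models netif_carrier_ok(dev->net) */
	int  reset_requests;	/* counts deferred EVENT_LINK_RESET events */
};

/* Mirrors the logic of usbnet_link_change() from this patch. */
static void mock_link_change(struct mock_usbnet *dev, int link, int need_reset)
{
	assert(dev);

	dev->carrier = link != 0;

	if (need_reset && link)
		dev->reset_requests++;
}

/* Hypothetical status handler: bit 0 of the first byte is the link state. */
static void mock_status(struct mock_usbnet *dev, const unsigned char *buf,
			int len)
{
	int link;

	if (len < 1)
		return;

	link = buf[0] & 0x01;
	/* Only report real transitions, and ask for a reset on link up. */
	if (dev->carrier != (link != 0))
		mock_link_change(dev, link, 1);
}
```

The point of routing every carrier update through one function is that
usbnet gets a single place to observe link transitions.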

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 drivers/net/usb/usbnet.c   |   13 ++++++++++++-
 include/linux/usb/usbnet.h |    2 +-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index e944109..e986e4b 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -677,6 +677,18 @@ static void usbnet_terminate_urbs(struct usbnet *dev)
 	remove_wait_queue(&unlink_wakeup, &wait);
 }
 
+void usbnet_link_change(struct usbnet *dev, int link, int need_reset)
+{
+	if (link)
+		netif_carrier_on(dev->net);
+	else
+		netif_carrier_off(dev->net);
+
+	if (need_reset && link)
+		usbnet_defer_kevent(dev, EVENT_LINK_RESET);
+}
+EXPORT_SYMBOL(usbnet_link_change);
+
 int usbnet_stop (struct net_device *net)
 {
 	struct usbnet		*dev = netdev_priv(net);
@@ -1591,7 +1603,6 @@ int usbnet_resume (struct usb_interface *intf)
 }
 EXPORT_SYMBOL_GPL(usbnet_resume);
 
-
 /*-------------------------------------------------------------------------*/
 
 static int __init usbnet_init(void)
diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
index f87cf62..1937b74 100644
--- a/include/linux/usb/usbnet.h
+++ b/include/linux/usb/usbnet.h
@@ -160,7 +160,7 @@ extern int usbnet_probe(struct usb_interface *, const struct usb_device_id *);
 extern int usbnet_suspend(struct usb_interface *, pm_message_t);
 extern int usbnet_resume(struct usb_interface *);
 extern void usbnet_disconnect(struct usb_interface *);
-
+extern void usbnet_link_change(struct usbnet *dev, int link, int need_reset);
 
 /* Drivers that reuse some of the standard USB CDC infrastructure
  * (notably, using multiple interfaces according to the CDC
-- 
1.7.9.5

^ permalink raw reply related

* Re: xt_hashlimit.c race?
From: Eric Dumazet @ 2012-09-18 14:26 UTC (permalink / raw)
  To: "Oleg A. Arkhangelsky"; +Cc: netdev
In-Reply-To: <20201347974535@web11g.yandex.ru>

On Tue, 2012-09-18 at 17:22 +0400, "Oleg A. Arkhangelsky" wrote:
> Hello,
> 
> Looking at the net/netfilter/xt_hashlimit.c revealed one question. As far as
> I can understand hashlimit_mt() code under rcu_read_lock_bh() can be
> executed simultaneously by more than one CPU. So what if we have two
> packets with the same new dst value that processed in parallel by different
> CPUs? In both cases dh is NULL and both CPUs tries to create new
> entry in hash table. This is not what we want and can lead to undefined
> behavior in the future.
> 
> Or maybe I'm wrong? Could anyone tell me is this situation possible?
> 

It's absolutely possible, but it should not have a big impact.

One of the newly inserted entries will never be reached again and will
expire.
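
To make the race concrete, here is a minimal single-threaded model (not
kernel code). struct table, lookup(), insert_unchecked() and
insert_rechecked() are hypothetical names used only for this sketch; the
locked section is modelled simply by calling the functions one after the
other.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENTRIES 8

/* Hypothetical toy table standing in for one hash bucket. */
struct table {
	int keys[MAX_ENTRIES];
	size_t count;
};

/* Returns the index of key, or -1 if it is not present. */
static int lookup(const struct table *t, int key)
{
	for (size_t i = 0; i < t->count; i++)
		if (t->keys[i] == key)
			return (int)i;
	return -1;
}

/* Old scheme: the caller did its lookup earlier, so insert blindly. */
static int insert_unchecked(struct table *t, int key)
{
	assert(t->count < MAX_ENTRIES);
	t->keys[t->count] = key;
	return (int)t->count++;
}

/* New scheme: repeat the lookup inside the (modelled) locked section. */
static int insert_rechecked(struct table *t, int key)
{
	int idx = lookup(t, key);

	if (idx >= 0)
		return idx;	/* another CPU inserted it first */
	return insert_unchecked(t, key);
}
```

If two CPUs both miss in the lockless lookup and then each call
insert_unchecked(), the table ends up with two entries for the same key.
With insert_rechecked() the second caller finds the first caller's entry
instead of adding a duplicate.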

The following (untested) patch should remove the race.

 net/netfilter/xt_hashlimit.c |  125 ++++++++++++++++-----------------
 1 file changed, 62 insertions(+), 63 deletions(-)

diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
index 26a668a..246bc92 100644
--- a/net/netfilter/xt_hashlimit.c
+++ b/net/netfilter/xt_hashlimit.c
@@ -136,56 +136,6 @@ hash_dst(const struct xt_hashlimit_htable *ht, const struct dsthash_dst *dst)
 	return ((u64)hash * ht->cfg.size) >> 32;
 }
 
-static struct dsthash_ent *
-dsthash_find(const struct xt_hashlimit_htable *ht,
-	     const struct dsthash_dst *dst)
-{
-	struct dsthash_ent *ent;
-	struct hlist_node *pos;
-	u_int32_t hash = hash_dst(ht, dst);
-
-	if (!hlist_empty(&ht->hash[hash])) {
-		hlist_for_each_entry_rcu(ent, pos, &ht->hash[hash], node)
-			if (dst_cmp(ent, dst)) {
-				spin_lock(&ent->lock);
-				return ent;
-			}
-	}
-	return NULL;
-}
-
-/* allocate dsthash_ent, initialize dst, put in htable and lock it */
-static struct dsthash_ent *
-dsthash_alloc_init(struct xt_hashlimit_htable *ht,
-		   const struct dsthash_dst *dst)
-{
-	struct dsthash_ent *ent;
-
-	spin_lock(&ht->lock);
-	/* initialize hash with random val at the time we allocate
-	 * the first hashtable entry */
-	if (unlikely(!ht->rnd_initialized)) {
-		get_random_bytes(&ht->rnd, sizeof(ht->rnd));
-		ht->rnd_initialized = true;
-	}
-
-	if (ht->cfg.max && ht->count >= ht->cfg.max) {
-		/* FIXME: do something. question is what.. */
-		net_err_ratelimited("max count of %u reached\n", ht->cfg.max);
-		ent = NULL;
-	} else
-		ent = kmem_cache_alloc(hashlimit_cachep, GFP_ATOMIC);
-	if (ent) {
-		memcpy(&ent->dst, dst, sizeof(ent->dst));
-		spin_lock_init(&ent->lock);
-
-		spin_lock(&ent->lock);
-		hlist_add_head_rcu(&ent->node, &ht->hash[hash_dst(ht, dst)]);
-		ht->count++;
-	}
-	spin_unlock(&ht->lock);
-	return ent;
-}
 
 static void dsthash_free_rcu(struct rcu_head *head)
 {
@@ -577,12 +527,70 @@ static u32 hashlimit_byte_cost(unsigned int len, struct dsthash_ent *dh)
 	return (u32) tmp;
 }
 
+static struct dsthash_ent *
+dsthash_find(struct xt_hashlimit_htable *ht,
+	     const struct dsthash_dst *dst)
+{
+	struct dsthash_ent *dh;
+	struct hlist_node *pos;
+	u_int32_t hash = hash_dst(ht, dst);
+	unsigned long now = jiffies;
+
+	hlist_for_each_entry_rcu(dh, pos, &ht->hash[hash], node) {
+		if (dst_cmp(dh, dst)) {
+found:
+			spin_lock(&dh->lock);
+			/* update expiration timeout */
+			dh->expires = now + msecs_to_jiffies(ht->cfg.expire);
+			rateinfo_recalc(dh, now, ht->cfg.mode);
+			return dh;
+		}
+	}
+
+	/* slow path */
+	spin_lock(&ht->lock);
+
+	/* initialize hash with random val at the time we allocate
+	 * the first hashtable entry
+	 */
+	if (unlikely(!ht->rnd_initialized)) {
+		get_random_bytes(&ht->rnd, sizeof(ht->rnd));
+		ht->rnd_initialized = true;
+	}
+	hash = hash_dst(ht, dst);
+	hlist_for_each_entry_rcu(dh, pos, &ht->hash[hash], node) {
+		if (dst_cmp(dh, dst)) {
+			spin_unlock(&ht->lock);
+			goto found;
+		}
+	}
+
+	if (ht->cfg.max && ht->count >= ht->cfg.max) {
+		/* FIXME: do something. question is what.. */
+		net_err_ratelimited("max count of %u reached\n", ht->cfg.max);
+		dh = NULL;
+	} else {
+		dh = kmem_cache_alloc(hashlimit_cachep, GFP_ATOMIC);
+	}
+	if (dh) {
+		memcpy(&dh->dst, dst, sizeof(dh->dst));
+		spin_lock_init(&dh->lock);
+
+		spin_lock(&dh->lock);
+		hlist_add_head_rcu(&dh->node, &ht->hash[hash_dst(ht, dst)]);
+		ht->count++;
+		dh->expires = now + msecs_to_jiffies(ht->cfg.expire);
+		rateinfo_init(dh, ht);
+	}
+	spin_unlock(&ht->lock);
+	return dh;
+}
+
 static bool
 hashlimit_mt(const struct sk_buff *skb, struct xt_action_param *par)
 {
 	const struct xt_hashlimit_mtinfo1 *info = par->matchinfo;
 	struct xt_hashlimit_htable *hinfo = info->hinfo;
-	unsigned long now = jiffies;
 	struct dsthash_ent *dh;
 	struct dsthash_dst dst;
 	u32 cost;
@@ -593,17 +601,8 @@ hashlimit_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	rcu_read_lock_bh();
 	dh = dsthash_find(hinfo, &dst);
 	if (dh == NULL) {
-		dh = dsthash_alloc_init(hinfo, &dst);
-		if (dh == NULL) {
-			rcu_read_unlock_bh();
-			goto hotdrop;
-		}
-		dh->expires = jiffies + msecs_to_jiffies(hinfo->cfg.expire);
-		rateinfo_init(dh, hinfo);
-	} else {
-		/* update expiration timeout */
-		dh->expires = now + msecs_to_jiffies(hinfo->cfg.expire);
-		rateinfo_recalc(dh, now, hinfo->cfg.mode);
+		rcu_read_unlock_bh();
+		goto hotdrop;
 	}
 
 	if (info->cfg.mode & XT_HASHLIMIT_BYTES)
@@ -624,7 +623,7 @@ hashlimit_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	/* default match is underlimit - so over the limit, we need to invert */
 	return info->cfg.mode & XT_HASHLIMIT_INVERT;
 
- hotdrop:
+hotdrop:
 	par->hotdrop = true;
 	return false;
 }

^ permalink raw reply related

* Re: [RFC PATCH v1 0/3] usbnet: runtime suspend when link becomes down
From: Oliver Neukum @ 2012-09-18 14:28 UTC (permalink / raw)
  To: Ming Lei
  Cc: David S. Miller, Greg Kroah-Hartman, Fink Dmitry, Rafael Wysocki,
	Alan Stern, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1347978201-6219-1-git-send-email-ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

On Tuesday 18 September 2012 22:23:18 Ming Lei wrote:
> Hi,
> 
> Currently only very few usbnet devices support the traffic based
> runtime PM, eg. wake up devices if there are packets to be transmitted.

Hi,

independent of the rest it seems to me that the first two patches in your
series are a useful cleanup by themselves. Could you submit them separately?

	Regards
		Oliver

^ permalink raw reply

* Re: Possible networking regression in 3.6.0
From: Chris Clayton @ 2012-09-18 14:31 UTC (permalink / raw)
  To: netdev
In-Reply-To: <50588371.40103@googlemail.com>

>> ...
>> r8169                  47159  0
>>
>>  From the host I can successfully ping the guest, tap0 and the router as
>> you would expect, but from the guest, although I can ping the host and
>> tap0, I cannot ping the router. In practice, this means I have no
>> internet access from the guest. As I say, this configuration works
>> perfectly under 3.5.x and 3.4.x kernels.
>>
>> I'll do a coarse-grained "bisect" of Linus' 3.6 release candidates and
>> report back, but does anyone have any prime-suspect patches that may be
>> at the cause of this problem?
>>
>
> -rc1 turned out to have the problem so I've bisected between 3.5 and
> 3.6-rc1. I arrived at:
>
> $ git bisect bad
> d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5 is the first bad commit
> commit d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5
> Author: David S. Miller <davem@davemloft.net>
> Date:   Tue Jul 17 12:58:50 2012 -0700
>
>      ipv4: Cache input routes in fib_info nexthops.
>
>      Caching input routes is slightly simpler than output routes, since we
>      don't need to be concerned with nexthop exceptions.  (locally
>      destined, and routed packets, never trigger PMTU events or redirects
>      that will be processed by us).
>
>      However, we have to elide caching for the DIRECTSRC and non-zero itag
>      cases.
>
>      Signed-off-by: David S. Miller <davem@davemloft.net>
>
> :040000 040000 6bbc75c1cbe62bf84ea412d3b98adf2b614779cd
> 3ad7256b4a71e63ca4530977c0550121ea803d35 M      include
> :040000 040000 18c2a950a53c4eec9bfa12185d1e382dfed74af8
> a2ab6157d6cd54930da395758c6ded3a225d1f04 M      net
>
> The bisect log:
> git bisect start
> # bad: [0d7614f09c1ebdbaa1599a5aba7593f147bf96ee] Linux 3.6-rc1
> git bisect bad 0d7614f09c1ebdbaa1599a5aba7593f147bf96ee
> # good: [28a33cbc24e4256c143dce96c7d93bf423229f92] Linux 3.5
> git bisect good 28a33cbc24e4256c143dce96c7d93bf423229f92
> # bad: [614a6d4341b3760ca98a1c2c09141b71db5d1e90] Merge branch 'for-3.6'
> of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
> git bisect bad 614a6d4341b3760ca98a1c2c09141b71db5d1e90
> # bad: [320f5ea0cedc08ef65d67e056bcb9d181386ef2c] genetlink: define
> lockdep_genl_is_held() when CONFIG_LOCKDEP
> git bisect bad 320f5ea0cedc08ef65d67e056bcb9d181386ef2c
> # good: [0cd06647b7c24f6633e32a505930a9aa70138c22] Merge branch 'master'
> of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
> git bisect good 0cd06647b7c24f6633e32a505930a9aa70138c22
> # good: [dbfa600148a25903976910863c75dae185f8d187] cxgb3: set maximal
> number of default RSS queues
> git bisect good dbfa600148a25903976910863c75dae185f8d187
> # good: [efdfad3205403e1d1c5c0bdcbdb647ddd89bfaa3] bnx2: Try to recover
> from PCI block reset
> git bisect good efdfad3205403e1d1c5c0bdcbdb647ddd89bfaa3
> # good: [1bf91cdc1bba94ea062a9147d924815c13f029f2] ixgbe: Drop
> references to deprecated pci_ DMA api and instead use dma_ API
> git bisect good 1bf91cdc1bba94ea062a9147d924815c13f029f2
> # good: [b6dfd939fdc249fcf8cd7b8006f76239b33eb581] ixgbe: add support
> for new 82599 device
> git bisect good b6dfd939fdc249fcf8cd7b8006f76239b33eb581
> # good: [3ba97381343b271296487bf073eb670d5465a8b8] net: ethernet:
> davinci_emac: add pm_runtime support
> git bisect good 3ba97381343b271296487bf073eb670d5465a8b8
> # bad: [5e9965c15ba88319500284e590733f4a4629a288] Merge branch
> 'kill_rtcache'
> git bisect bad 5e9965c15ba88319500284e590733f4a4629a288
> # good: [f5b0a8743601a4477419171f5046bd07d1c080a0] net: Document
> dst->obsolete better.
> git bisect good f5b0a8743601a4477419171f5046bd07d1c080a0
> # bad: [ba3f7f04ef2b19aace38f855aedd17fe43035d50] ipv4: Kill
> FLOWI_FLAG_RT_NOCACHE and associated code.
> git bisect bad ba3f7f04ef2b19aace38f855aedd17fe43035d50
> # good: [f2bb4bedf35d5167a073dcdddf16543f351ef3ae] ipv4: Cache output
> routes in fib_info nexthops.
> git bisect good f2bb4bedf35d5167a073dcdddf16543f351ef3ae
> # bad: [d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5] ipv4: Cache input
> routes in fib_info nexthops.
> git bisect bad d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5
>
> Checking out the parent commit
> (f2bb4bedf35d5167a073dcdddf16543f351ef3ae) and building and installing
> the kernel gives a working configuration, so I'm pretty confident in the
> outcome of the bisect. Reversing the patch gives errors, so I've not
> tested master with the patch reversed.
>
> Let me know if I can help in any way to identify a fix.
>
Sorry, I forgot to say that I have also tried running TinyCore Linux as
a KVM guest on a 3.6.0-rc6 kernel, and I can ping the router fine, so
the problem seems to be specifically related to running
Windows XP as the guest. I don't have any other guests installed, so
that's as much as I can say, although I could maybe install a Win7 guest
tomorrow if that would help.

> Chris
>
>> Let me know if there are any other diagnostics I can provide. Also, as
>> I'm not subscribed to netdev, please cc me to any reply.
>>
>> Thanks,
>>
>> Chris

^ permalink raw reply

* Re: [RFC PATCH v1 0/3] usbnet: runtime suspend when link becomes down
From: Ming Lei @ 2012-09-18 14:36 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: David S. Miller, Greg Kroah-Hartman, Fink Dmitry, Rafael Wysocki,
	Alan Stern, netdev, linux-usb
In-Reply-To: <29958246.BU7m2dD7VT@linux-lqwf.site>

On Tue, Sep 18, 2012 at 10:28 PM, Oliver Neukum <oneukum@suse.de> wrote:
> On Tuesday 18 September 2012 22:23:18 Ming Lei wrote:
>> Hi,
>>
>> Currently only very few usbnet devices support the traffic based
>> runtime PM, eg. wake up devices if there are packets to be transmitted.
>
> Hi,
>
> independent of the rest it seems to me that the first two patches in your
> series are a useful cleanup by themselves. Could you submit them separately?

IMO, if the first two are OK now, David may commit the first two only.

Thanks,
--
Ming Lei

^ permalink raw reply

* Re: Possible networking regression in 3.6.0
From: Eric Dumazet @ 2012-09-18 14:40 UTC (permalink / raw)
  To: Chris Clayton; +Cc: netdev
In-Reply-To: <505885DC.1060006@googlemail.com>

On Tue, 2012-09-18 at 15:31 +0100, Chris Clayton wrote:
> >> ...
> >> r8169                  47159  0
> >>
> >>  From the host I can successfully ping the guest, tap0 and the router as
> >> you would expect, but from the guest, although I can ping the host and
> >> tap0, I cannot ping the router. In practice, this means I have no
> >> internet access from the guest. As I say, this configuration works
> >> perfectly under 3.5.x and 3.4.x kernels.
> >>
> >> I'll do a coarse-grained "bisect" of Linus' 3.6 release candidates and
> >> report back, but does anyone have any prime-suspect patches that may be
> >> at the cause of this problem?
> >>
> >
> > -rc1 turned out to have the problem so I've bisected between 3.5 and
> > 3.6-rc1. I arrived at:
> >
> > $ git bisect bad
> > d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5 is the first bad commit
> > commit d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5
> > Author: David S. Miller <davem@davemloft.net>
> > Date:   Tue Jul 17 12:58:50 2012 -0700
> >
> >      ipv4: Cache input routes in fib_info nexthops.
> >
> >      Caching input routes is slightly simpler than output routes, since we
> >      don't need to be concerned with nexthop exceptions.  (locally
> >      destined, and routed packets, never trigger PMTU events or redirects
> >      that will be processed by us).
> >
> >      However, we have to elide caching for the DIRECTSRC and non-zero itag
> >      cases.
> >
> >      Signed-off-by: David S. Miller <davem@davemloft.net>
> >
> > :040000 040000 6bbc75c1cbe62bf84ea412d3b98adf2b614779cd
> > 3ad7256b4a71e63ca4530977c0550121ea803d35 M      include
> > :040000 040000 18c2a950a53c4eec9bfa12185d1e382dfed74af8
> > a2ab6157d6cd54930da395758c6ded3a225d1f04 M      net
> >
> > The bisect log:
> > git bisect start
> > # bad: [0d7614f09c1ebdbaa1599a5aba7593f147bf96ee] Linux 3.6-rc1
> > git bisect bad 0d7614f09c1ebdbaa1599a5aba7593f147bf96ee
> > # good: [28a33cbc24e4256c143dce96c7d93bf423229f92] Linux 3.5
> > git bisect good 28a33cbc24e4256c143dce96c7d93bf423229f92
> > # bad: [614a6d4341b3760ca98a1c2c09141b71db5d1e90] Merge branch 'for-3.6'
> > of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
> > git bisect bad 614a6d4341b3760ca98a1c2c09141b71db5d1e90
> > # bad: [320f5ea0cedc08ef65d67e056bcb9d181386ef2c] genetlink: define
> > lockdep_genl_is_held() when CONFIG_LOCKDEP
> > git bisect bad 320f5ea0cedc08ef65d67e056bcb9d181386ef2c
> > # good: [0cd06647b7c24f6633e32a505930a9aa70138c22] Merge branch 'master'
> > of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
> > git bisect good 0cd06647b7c24f6633e32a505930a9aa70138c22
> > # good: [dbfa600148a25903976910863c75dae185f8d187] cxgb3: set maximal
> > number of default RSS queues
> > git bisect good dbfa600148a25903976910863c75dae185f8d187
> > # good: [efdfad3205403e1d1c5c0bdcbdb647ddd89bfaa3] bnx2: Try to recover
> > from PCI block reset
> > git bisect good efdfad3205403e1d1c5c0bdcbdb647ddd89bfaa3
> > # good: [1bf91cdc1bba94ea062a9147d924815c13f029f2] ixgbe: Drop
> > references to deprecated pci_ DMA api and instead use dma_ API
> > git bisect good 1bf91cdc1bba94ea062a9147d924815c13f029f2
> > # good: [b6dfd939fdc249fcf8cd7b8006f76239b33eb581] ixgbe: add support
> > for new 82599 device
> > git bisect good b6dfd939fdc249fcf8cd7b8006f76239b33eb581
> > # good: [3ba97381343b271296487bf073eb670d5465a8b8] net: ethernet:
> > davinci_emac: add pm_runtime support
> > git bisect good 3ba97381343b271296487bf073eb670d5465a8b8
> > # bad: [5e9965c15ba88319500284e590733f4a4629a288] Merge branch
> > 'kill_rtcache'
> > git bisect bad 5e9965c15ba88319500284e590733f4a4629a288
> > # good: [f5b0a8743601a4477419171f5046bd07d1c080a0] net: Document
> > dst->obsolete better.
> > git bisect good f5b0a8743601a4477419171f5046bd07d1c080a0
> > # bad: [ba3f7f04ef2b19aace38f855aedd17fe43035d50] ipv4: Kill
> > FLOWI_FLAG_RT_NOCACHE and associated code.
> > git bisect bad ba3f7f04ef2b19aace38f855aedd17fe43035d50
> > # good: [f2bb4bedf35d5167a073dcdddf16543f351ef3ae] ipv4: Cache output
> > routes in fib_info nexthops.
> > git bisect good f2bb4bedf35d5167a073dcdddf16543f351ef3ae
> > # bad: [d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5] ipv4: Cache input
> > routes in fib_info nexthops.
> > git bisect bad d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5
> >
> > Checking out the parent commit
> > (f2bb4bedf35d5167a073dcdddf16543f351ef3ae) and building and installing
> > the kernel gives a working configuration, so I'm pretty confident in the
> > outcome of the bisect. Reversing the patch gives errors, so I've not
> > tested master with the patch reversed.
> >
> > Let me know if I can help in any way to identify a fix.
> >
> Sorry, I forgot to say that I also have tried running TinyCore Linux as 
> a KVM guest on a 3.6.0-rc6 kernel, and I can ping the router fine, so 
> the problem seems to be something specifically related to running 
> Windows XP as the guest. I don't have any other guests installed so 
> that's as much as I can say, although I could maybe install a Win7 guest 
> tomorrow if that would help.

It would help to have a traffic sample, maybe.

Especially if the problem is not easily reproducible for us.

(I don't have Windows XP or Win7)

Also, the bisect might point to a commit with an already fixed bug:

commit 4331debc51ee1ce319f4a389484e0e8e05de2aca
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Jul 25 05:11:23 2012 +0000

    ipv4: rt_cache_valid must check expired routes
    
    commit d2d68ba9fe8 (ipv4: Cache input routes in fib_info nexthops.)
    introduced rt_cache_valid() helper. It unfortunately doesn't check if
    route is expired before caching it.
    
    I noticed sk_setup_caps() was constantly called on a tcp workload.
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
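
The idea of that fix, reduced to a toy model: a cached route must be
rejected not only when it has been invalidated but also when it has
expired. Everything below is hypothetical (struct mock_route, both
helpers, and modelling expiry with a generation counter are assumptions
of this sketch, not the kernel code).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached route. */
struct mock_route {
	bool obsolete;	/* route was explicitly invalidated */
	int  genid;	/* generation the route was created in */
};

/* Models the check without the fix: expiry is ignored. */
static bool valid_without_expiry(const struct mock_route *rt)
{
	return rt != NULL && !rt->obsolete;
}

/* Models the check with the fix: an expired route is not valid. */
static bool valid_with_expiry(const struct mock_route *rt, int current_genid)
{
	return valid_without_expiry(rt) && rt->genid == current_genid;
}
```

With a route created in generation 1 and a current generation of 2, the
first helper still accepts the stale route while the second rejects it.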

^ permalink raw reply

* Re: Possible networking regression in 3.6.0
From: Chris Clayton @ 2012-09-18 14:44 UTC (permalink / raw)
  To: netdev
In-Reply-To: <505885DC.1060006@googlemail.com>

>>
> Sorry, I forgot to say that I also have tried running TinyCore Linux as
> a KVM guest on a 3.6.0-rc6 kernel, and I can ping the router fine, so
> the problem seems to be something specifically related to running
> Windows XP as the guest. I don't have any other guests installed so
> that's as much as I can say, although I could maybe install a Win7 guest
> tomorrow if that would help.
>

Sorry again, but please ignore the message above. I used the wrong kernel
in that test. In fact, I get the same failure to ping the router when
running a 3.6.0-rc6 kernel.

Apologies for the noise.

Chris

^ permalink raw reply
