Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: Fwd: Re: RTL 8169  linux driver question
From: Stéphane ANCELOT @ 2012-11-26  7:35 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, sancelot, Hayes Wang
In-Reply-To: <20121123192056.GA30290@electric-eye.fr.zoreil.com>

On 23/11/2012 20:20, Francois Romieu wrote:
> Stéphane ANCELOT <sancelot@free.fr> :
> [...]
>> I have an adapted version of this driver for a realtime linux kernel
>> and a 8168/811B rev 2 component (as listed by lspci).
> You should grep for the XID line in the kernel dmesg to identify
> the chipset version.
XID 1c4000c0
>> I had problem with it, my application sends a frame that is
>> immediately transmitted back by some slaves, there was abnormally
>> 100us lost between the send and receive call.
>>
>> Finally I found it was coming from the following register setup in
>> the driver :
>>
>> RTL_W16(IntrMitigate, 0x5151);
>>
>> Can you give me some details about it, since I do not have the
>> RTL8169 programming guide.
> "Reserved" in my 2007 8168c rev1.0 datasheet.
>
> I merged it long ago from Realtek's driver. It has now changed to 0x5f51.
>
> On the old PCI 8169, bits 15..8 relate to Tx and 7..0 to Rx. Bits 7..4
> count in units of 125 us and bits 0..3 in packet units. You may give
> 0x..00 a try.
>
> Hayes knows better for the 8168 line.
>
>> /100us is important since this component acts as an Ethercat Master
>> running at 1ms./
> Which realtime kernel is it ?
xenomai.

^ permalink raw reply

* [PATCH net-next v2] net: clean up locking in inet_frag_find()
From: Cong Wang @ 2012-11-26  7:26 UTC (permalink / raw)
  To: netdev
  Cc: Eric Dumazet, Patrick McHardy, Pablo Neira Ayuso, David S. Miller,
	Cong Wang
In-Reply-To: <1353909184.11282.3.camel@cr0>

It is weird to take the read lock outside of inet_frag_find()
but release it inside...  This can be improved by refactoring
the code, that is, introducing inet{4,6}_frag_find() which call
the their own hash function, inet{4,6}_hash_frag(), hiding the
details from their callers.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>

---
 include/net/inet_frag.h                 |   17 +++++-
 include/net/ipv6.h                      |    3 -
 net/ipv4/inet_fragment.c                |   82 +++++++++++++++++++++++++++++--
 net/ipv4/ip_fragment.c                  |   16 +-----
 net/ipv6/netfilter/nf_conntrack_reasm.c |    7 +--
 net/ipv6/reassembly.c                   |   34 +------------
 6 files changed, 97 insertions(+), 62 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 32786a0..7314ec5 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -1,6 +1,8 @@
 #ifndef __NET_FRAG_H__
 #define __NET_FRAG_H__
 
+#include <linux/in6.h>
+
 struct netns_frags {
 	int			nqueues;
 	atomic_t		mem;
@@ -62,9 +64,18 @@ void inet_frag_kill(struct inet_frag_queue *q, struct inet_frags *f);
 void inet_frag_destroy(struct inet_frag_queue *q,
 				struct inet_frags *f, int *work);
 int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f, bool force);
-struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
-		struct inet_frags *f, void *key, unsigned int hash)
-	__releases(&f->lock);
+
+extern unsigned int inet4_hash_frag(__be16 id, __be32 saddr, __be32 daddr,
+				    u8 prot, u32 rnd);
+struct inet_frag_queue *inet4_frag_find(struct netns_frags *nf,
+		struct inet_frags *f, void *key);
+
+#if IS_ENABLED(CONFIG_IPV6)
+extern unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
+				    const struct in6_addr *daddr, u32 rnd);
+struct inet_frag_queue *inet6_frag_find(struct netns_frags *nf,
+		struct inet_frags *f, void *key);
+#endif
 
 static inline void inet_frag_put(struct inet_frag_queue *q, struct inet_frags *f)
 {
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 979bf6c..ba19ebc 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -695,9 +695,6 @@ extern int ip6_mc_msfilter(struct sock *sk, struct group_filter *gsf);
 extern int ip6_mc_msfget(struct sock *sk, struct group_filter *gsf,
 			 struct group_filter __user *optval,
 			 int __user *optlen);
-extern unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
-				    const struct in6_addr *daddr, u32 rnd);
-
 #ifdef CONFIG_PROC_FS
 extern int  ac6_proc_init(struct net *net);
 extern void ac6_proc_exit(struct net *net);
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 4750d2b..7290238 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -20,7 +20,10 @@
 #include <linux/skbuff.h>
 #include <linux/rtnetlink.h>
 #include <linux/slab.h>
+#include <linux/jhash.h>
+#include <linux/ip.h>
 
+#include <net/ipv6.h>
 #include <net/inet_frag.h>
 
 static void inet_frag_secret_rebuild(unsigned long dummy)
@@ -270,9 +273,8 @@ static struct inet_frag_queue *inet_frag_create(struct netns_frags *nf,
 	return inet_frag_intern(nf, q, f, arg);
 }
 
-struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
+static struct inet_frag_queue *__inet_frag_find(struct netns_frags *nf,
 		struct inet_frags *f, void *key, unsigned int hash)
-	__releases(&f->lock)
 {
 	struct inet_frag_queue *q;
 	struct hlist_node *n;
@@ -280,12 +282,84 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
 	hlist_for_each_entry(q, n, &f->hash[hash], list) {
 		if (q->net == nf && f->match(q, key)) {
 			atomic_inc(&q->refcnt);
-			read_unlock(&f->lock);
 			return q;
 		}
 	}
+
+	return NULL;
+}
+
+unsigned int inet4_hash_frag(__be16 id, __be32 saddr, __be32 daddr, u8 prot, u32 rnd)
+{
+	return jhash_3words((__force u32)id << 16 | prot,
+			    (__force u32)saddr, (__force u32)daddr,
+			    rnd) & (INETFRAGS_HASHSZ - 1);
+}
+EXPORT_SYMBOL_GPL(inet4_hash_frag);
+
+struct inet_frag_queue *inet4_frag_find(struct netns_frags *nf,
+		struct inet_frags *f, void *key)
+{
+	struct inet_frag_queue *q;
+	struct iphdr *iph;
+	unsigned int hash;
+
+	iph = *(struct iphdr **)key;
+
+	read_lock(&f->lock);
+	hash = inet4_hash_frag(iph->id, iph->saddr, iph->daddr, iph->protocol, f->rnd);
+	q = __inet_frag_find(nf, f, key, hash);
 	read_unlock(&f->lock);
+	if (q)
+		return q;
+	return inet_frag_create(nf, f, key);
+}
+EXPORT_SYMBOL(inet4_frag_find);
+
+#if IS_ENABLED(CONFIG_IPV6)
+/*
+ * callers should be careful not to use the hash value outside the ipfrag_lock
+ * as doing so could race with ipfrag_hash_rnd being recalculated.
+ */
+unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
+			     const struct in6_addr *daddr, u32 rnd)
+{
+	u32 c;
+
+	c = jhash_3words((__force u32)saddr->s6_addr32[0],
+			 (__force u32)saddr->s6_addr32[1],
+			 (__force u32)saddr->s6_addr32[2],
+			 rnd);
+
+	c = jhash_3words((__force u32)saddr->s6_addr32[3],
+			 (__force u32)daddr->s6_addr32[0],
+			 (__force u32)daddr->s6_addr32[1],
+			 c);
+
+	c =  jhash_3words((__force u32)daddr->s6_addr32[2],
+			  (__force u32)daddr->s6_addr32[3],
+			  (__force u32)id,
+			  c);
+
+	return c & (INETFRAGS_HASHSZ - 1);
+}
+EXPORT_SYMBOL_GPL(inet6_hash_frag);
 
+
+struct inet_frag_queue *inet6_frag_find(struct netns_frags *nf,
+		struct inet_frags *f, void *key)
+{
+	struct inet_frag_queue *q;
+	struct ip6_create_arg *arg = key;
+	unsigned int hash;
+
+	read_lock(&f->lock);
+	hash = inet6_hash_frag(arg->id, arg->src, arg->dst, f->rnd);
+	q = __inet_frag_find(nf, f, key, hash);
+	read_unlock(&f->lock);
+	if (q)
+		return q;
 	return inet_frag_create(nf, f, key);
 }
-EXPORT_SYMBOL(inet_frag_find);
+EXPORT_SYMBOL(inet6_frag_find);
+#endif
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 1cf6a76..0a02037 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -32,7 +32,6 @@
 #include <linux/ip.h>
 #include <linux/icmp.h>
 #include <linux/netdevice.h>
-#include <linux/jhash.h>
 #include <linux/random.h>
 #include <linux/slab.h>
 #include <net/route.h>
@@ -133,19 +132,12 @@ struct ip4_create_arg {
 	u32 user;
 };
 
-static unsigned int ipqhashfn(__be16 id, __be32 saddr, __be32 daddr, u8 prot)
-{
-	return jhash_3words((__force u32)id << 16 | prot,
-			    (__force u32)saddr, (__force u32)daddr,
-			    ip4_frags.rnd) & (INETFRAGS_HASHSZ - 1);
-}
-
 static unsigned int ip4_hashfn(struct inet_frag_queue *q)
 {
 	struct ipq *ipq;
 
 	ipq = container_of(q, struct ipq, q);
-	return ipqhashfn(ipq->id, ipq->saddr, ipq->daddr, ipq->protocol);
+	return inet4_hash_frag(ipq->id, ipq->saddr, ipq->daddr, ipq->protocol, ip4_frags.rnd);
 }
 
 static bool ip4_frag_match(struct inet_frag_queue *q, void *a)
@@ -290,15 +282,11 @@ static inline struct ipq *ip_find(struct net *net, struct iphdr *iph, u32 user)
 {
 	struct inet_frag_queue *q;
 	struct ip4_create_arg arg;
-	unsigned int hash;
 
 	arg.iph = iph;
 	arg.user = user;
 
-	read_lock(&ip4_frags.lock);
-	hash = ipqhashfn(iph->id, iph->saddr, iph->daddr, iph->protocol);
-
-	q = inet_frag_find(&net->ipv4.frags, &ip4_frags, &arg, hash);
+	q = inet4_frag_find(&net->ipv4.frags, &ip4_frags, &arg);
 	if (q == NULL)
 		goto out_nomem;
 
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 22c8ea9..2d51584 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -168,17 +168,14 @@ static inline struct frag_queue *fq_find(struct net *net, __be32 id,
 {
 	struct inet_frag_queue *q;
 	struct ip6_create_arg arg;
-	unsigned int hash;
 
 	arg.id = id;
 	arg.user = user;
 	arg.src = src;
 	arg.dst = dst;
 
-	read_lock_bh(&nf_frags.lock);
-	hash = inet6_hash_frag(id, src, dst, nf_frags.rnd);
-
-	q = inet_frag_find(&net->nf_frag.frags, &nf_frags, &arg, hash);
+	local_bh_disable();
+	q = inet6_frag_find(&net->nf_frag.frags, &nf_frags, &arg);
 	local_bh_enable();
 	if (q == NULL)
 		goto oom;
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index e5253ec..08c5063 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -70,34 +70,6 @@ static struct inet_frags ip6_frags;
 static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
 			  struct net_device *dev);
 
-/*
- * callers should be careful not to use the hash value outside the ipfrag_lock
- * as doing so could race with ipfrag_hash_rnd being recalculated.
- */
-unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
-			     const struct in6_addr *daddr, u32 rnd)
-{
-	u32 c;
-
-	c = jhash_3words((__force u32)saddr->s6_addr32[0],
-			 (__force u32)saddr->s6_addr32[1],
-			 (__force u32)saddr->s6_addr32[2],
-			 rnd);
-
-	c = jhash_3words((__force u32)saddr->s6_addr32[3],
-			 (__force u32)daddr->s6_addr32[0],
-			 (__force u32)daddr->s6_addr32[1],
-			 c);
-
-	c =  jhash_3words((__force u32)daddr->s6_addr32[2],
-			  (__force u32)daddr->s6_addr32[3],
-			  (__force u32)id,
-			  c);
-
-	return c & (INETFRAGS_HASHSZ - 1);
-}
-EXPORT_SYMBOL_GPL(inet6_hash_frag);
-
 static unsigned int ip6_hashfn(struct inet_frag_queue *q)
 {
 	struct frag_queue *fq;
@@ -186,17 +158,13 @@ fq_find(struct net *net, __be32 id, const struct in6_addr *src, const struct in6
 {
 	struct inet_frag_queue *q;
 	struct ip6_create_arg arg;
-	unsigned int hash;
 
 	arg.id = id;
 	arg.user = IP6_DEFRAG_LOCAL_DELIVER;
 	arg.src = src;
 	arg.dst = dst;
 
-	read_lock(&ip6_frags.lock);
-	hash = inet6_hash_frag(id, src, dst, ip6_frags.rnd);
-
-	q = inet_frag_find(&net->ipv6.frags, &ip6_frags, &arg, hash);
+	q = inet6_frag_find(&net->ipv6.frags, &ip6_frags, &arg);
 	if (q == NULL)
 		return NULL;
 

^ permalink raw reply related

* Re: [PATCH RFC] [INET]: Get cirtical word in first 64bit of cache line
From: Eric Dumazet @ 2012-11-26  6:44 UTC (permalink / raw)
  To: ling.ma.program; +Cc: linux-kernel, netdev
In-Reply-To: <1353900555-5966-1-git-send-email-ling.ma.program@gmail.com>

On Mon, 2012-11-26 at 11:29 +0800, ling.ma.program@gmail.com wrote:
> From: Ma Ling <ling.ma.program@gmail.com>
> 
> In order to reduce memory latency when last level cache miss occurs,
> modern CPUs i.e. x86 and arm introduced Critical Word First(CWF) or
> Early Restart(ER) to get data ASAP. For CWF if critical word is first member
> in cache line, memory feed CPU with critical word, then fill others
> data in cache line one by one, otherwise after critical word it must
> cost more cycle to fill the remaining cache line. For Early First CPU will
> restart until critical word in cache line reaches.
> 
> Hash value is critical word, so in this patch we place it as first member
> in cache line(sock address is cache-line aligned), and it is also good for
> Early Restart platform as well .
> 
> Thanks
> Ling

networking patches should be sent to netdev.

(I understand this patch is more a generic one, but at least CC netdev)

You give no performance numbers for this change...

I never heard of this CWF/ER, where are the official Intel documents
about this, and what models really benefit from it ?

Also, why not moving skc_net as well ?

BTW, skc_daddr & skc_rcv_saddr are 'critical' as well, we use them in
INET_MATCH()

It seems we have a 32bit hole on 64bit arches, so we probably should
move inet_dport/inet_num in it. It could well remove a full cache line
miss (I'll send a patch for this after tests)

Thanks

^ permalink raw reply

* Re: [net-next:master 20/25] warning: (NET_DSA_TAG_DSA && NET_DSA_TAG_EDSA && NET_DSA_TAG_TRAILER) selects NET_DSA which has unmet direct dependencies (NET && EXPERIMENTAL && NETDEVICES && !S390)
From: Viresh Kumar @ 2012-11-26  6:13 UTC (permalink / raw)
  To: kbuild test robot, Ben Hutchings, davem; +Cc: netdev
In-Reply-To: <50b2f66a.czw+uqRm3Q/zfbER%fengguang.wu@intel.com>

Hi Ben/Dave,

On 26 November 2012 10:26, kbuild test robot <fengguang.wu@intel.com> wrote:
> tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
> head:   53d6841d225b93c20d561878637c3cd307c11648
> commit: 82167cb8c6b2f8166d5c7532e5ef4b5e0cc46a72 [20/25] net: dsa/slave: Fix compilation warnings
> config: make ARCH=s390 allmodconfig
>
> All warnings:
>
> warning: (NET_DSA_TAG_DSA && NET_DSA_TAG_EDSA && NET_DSA_TAG_TRAILER) selects NET_DSA which has unmet direct dependencies (NET && EXPERIMENTAL && NETDEVICES && !S390)
> warning: (NET_DSA_TAG_DSA && NET_DSA_TAG_EDSA && NET_DSA_TAG_TRAILER) selects NET_DSA which has unmet direct dependencies (NET && EXPERIMENTAL && NETDEVICES && !S390)
> --
> warning: (NET_DSA_TAG_DSA && NET_DSA_TAG_EDSA && NET_DSA_TAG_TRAILER) selects NET_DSA which has unmet direct dependencies (NET && EXPERIMENTAL && NETDEVICES && !S390)

I am not really sure how to fix this up. Can you help?

--
viresh

^ permalink raw reply

* Re: [RFC net-next PATCH V1 9/9] net: frag remove readers-writer lock (hack)
From: Stephen Hemminger @ 2012-11-26  6:03 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Eric Dumazet, David S. Miller, Florian Westphal, netdev,
	Pablo Neira Ayuso, Thomas Graf, Cong Wang, Patrick McHardy,
	Paul E. McKenney, Herbert Xu
In-Reply-To: <20121123130847.18764.87682.stgit@dragon>

On Fri, 23 Nov 2012 14:08:47 +0100
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> Do NOT APPLY this patch.
> 
> After all the other patches, the rw_lock is now the contention point.

Reader-writer locks are significantly slower than a simple spinlock.
Unless the reader holds the lock for "significant" number of cycles,
a spin lock will be faster. 

Did you try just changing it to a spinlock?

^ permalink raw reply

* Re: [PATCH net-next] net: clean up locking in inet_frag_find()
From: Cong Wang @ 2012-11-26  5:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Patrick McHardy, Pablo Neira Ayuso, David S. Miller
In-Reply-To: <1353908592.30446.1136.camel@edumazet-glaptop>

On Sun, 2012-11-25 at 21:43 -0800, Eric Dumazet wrote:
> On Mon, 2012-11-26 at 13:18 +0800, Cong Wang wrote:
> > It is weird to take the read lock outside of inet_frag_find()
> > but release it inside... Also, the hash function doesn't
> > need this lock.
> 
> This patch is wrong. hash needs the lock.
> 
> 

Indeed, I missed .rnd...

I am cooking a better patch now.

Thanks!

^ permalink raw reply

* Re: [PATCH net-next] net: clean up locking in inet_frag_find()
From: Eric Dumazet @ 2012-11-26  5:43 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, Patrick McHardy, Pablo Neira Ayuso, David S. Miller
In-Reply-To: <1353907085-30824-1-git-send-email-amwang@redhat.com>

On Mon, 2012-11-26 at 13:18 +0800, Cong Wang wrote:
> It is weird to take the read lock outside of inet_frag_find()
> but release it inside... Also, the hash function doesn't
> need this lock.

This patch is wrong. hash needs the lock.

^ permalink raw reply

* [PATCH net-next] net: clean up locking in inet_frag_find()
From: Cong Wang @ 2012-11-26  5:18 UTC (permalink / raw)
  To: netdev; +Cc: Patrick McHardy, Pablo Neira Ayuso, David S. Miller, Cong Wang

It is weird to take the read lock outside of inet_frag_find()
but release it inside... Also, the hash function doesn't
need this lock.

Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>

---
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 4750d2b..caf1807 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -272,11 +272,11 @@ static struct inet_frag_queue *inet_frag_create(struct netns_frags *nf,
 
 struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
 		struct inet_frags *f, void *key, unsigned int hash)
-	__releases(&f->lock)
 {
 	struct inet_frag_queue *q;
 	struct hlist_node *n;
 
+	read_lock(&f->lock);
 	hlist_for_each_entry(q, n, &f->hash[hash], list) {
 		if (q->net == nf && f->match(q, key)) {
 			atomic_inc(&q->refcnt);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 1cf6a76..d9541f4 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -295,9 +295,7 @@ static inline struct ipq *ip_find(struct net *net, struct iphdr *iph, u32 user)
 	arg.iph = iph;
 	arg.user = user;
 
-	read_lock(&ip4_frags.lock);
 	hash = ipqhashfn(iph->id, iph->saddr, iph->daddr, iph->protocol);
-
 	q = inet_frag_find(&net->ipv4.frags, &ip4_frags, &arg, hash);
 	if (q == NULL)
 		goto out_nomem;
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 22c8ea9..b778423 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -175,9 +175,8 @@ static inline struct frag_queue *fq_find(struct net *net, __be32 id,
 	arg.src = src;
 	arg.dst = dst;
 
-	read_lock_bh(&nf_frags.lock);
 	hash = inet6_hash_frag(id, src, dst, nf_frags.rnd);
-
+	local_bh_disable();
 	q = inet_frag_find(&net->nf_frag.frags, &nf_frags, &arg, hash);
 	local_bh_enable();
 	if (q == NULL)
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index e5253ec..15e09a6 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -193,9 +193,7 @@ fq_find(struct net *net, __be32 id, const struct in6_addr *src, const struct in6
 	arg.src = src;
 	arg.dst = dst;
 
-	read_lock(&ip6_frags.lock);
 	hash = inet6_hash_frag(id, src, dst, ip6_frags.rnd);
-
 	q = inet_frag_find(&net->ipv6.frags, &ip6_frags, &arg, hash);
 	if (q == NULL)
 		return NULL;

^ permalink raw reply related

* [net-next:master 20/25] warning: (NET_DSA_TAG_DSA && NET_DSA_TAG_EDSA && NET_DSA_TAG_TRAILER) selects NET_DSA which has unmet direct dependencies (NET && EXPERIMENTAL && NETDEVICES && !S390)
From: kbuild test robot @ 2012-11-26  4:56 UTC (permalink / raw)
  To: viresh kumar; +Cc: netdev

tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   53d6841d225b93c20d561878637c3cd307c11648
commit: 82167cb8c6b2f8166d5c7532e5ef4b5e0cc46a72 [20/25] net: dsa/slave: Fix compilation warnings
config: make ARCH=s390 allmodconfig

All warnings:

warning: (NET_DSA_TAG_DSA && NET_DSA_TAG_EDSA && NET_DSA_TAG_TRAILER) selects NET_DSA which has unmet direct dependencies (NET && EXPERIMENTAL && NETDEVICES && !S390)
warning: (NET_DSA_TAG_DSA && NET_DSA_TAG_EDSA && NET_DSA_TAG_TRAILER) selects NET_DSA which has unmet direct dependencies (NET && EXPERIMENTAL && NETDEVICES && !S390)
--
warning: (NET_DSA_TAG_DSA && NET_DSA_TAG_EDSA && NET_DSA_TAG_TRAILER) selects NET_DSA which has unmet direct dependencies (NET && EXPERIMENTAL && NETDEVICES && !S390)

---
0-DAY kernel build testing backend         Open Source Technology Center
Fengguang Wu, Yuanhan Liu                              Intel Corporation

^ permalink raw reply

* Re: [PATCH 080/493] fddi: remove use of __devexit_p
From: Maciej W. Rozycki @ 2012-11-26  4:35 UTC (permalink / raw)
  To: Greg KH; +Cc: Bill Pemberton, netdev
In-Reply-To: <20121122002337.GA16151@kroah.com>

On Wed, 21 Nov 2012, Greg KH wrote:

> >  TURBOchannel, although valid, is an old exotic case that might not be 
> > worth arguing for, except for purity maybe.  But there are surely many 
> > contemporary systems out there that are known they are never going to 
> > support hot device replacement.  Consider most of the embedded systems for 
> > example, where devices may even physically be cast into a single SOC (with 
> > no prospect of chipping off any pieces ever ;) ), that certainly could not 
> > care less of device replacement, but they do care a lot about memory 
> > consumption.
> 
> Even those don't care about less than 5k of memory, do they?

 I guess you're right.  As long as it's not 5k per driver + 
who_knows_how_much per platform for some generic stuff that is.  Of course 
this is still a waste, but I can accept it as a compromise between the use 
of machine resources and the cost of maintenance.

> >  Was this implication considered, discussed and diregarded as not 
> > important enough compared to benefits from hardcoding HOTPLUG support?  
> 
> Yes, I don't know of any modern system that does not enable
> CONFIG_HOTPLUG, do you?

 I've never enabled it outside x86 to be honest.  None of the platforms I 
care of supports it in hardware.

> > I'm seriously asking for a pointer, not trying to cause any stir-up -- 
> > regrettably I fail to follow most discussions these days, but I would like 
> > to know what the background behind this decision was.  Thanks a lot!
> 
> See Russell's response in this thread for details if you are curious.

 Hmm, not in my inbox for some reason (unless I'm blind), but thanks for 
the pointer, I'll see if I can chase it online in some mailing list 
archive.

  Maciej

^ permalink raw reply

* linux-next: manual merge of the cgroup tree with the net-next tree
From: Stephen Rothwell @ 2012-11-26  4:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-next-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Daniel Wagner, David Miller,
	netdev-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 1203 bytes --]

Hi Tejun,

Today's linux-next merge of the cgroup tree got a conflict in
net/sched/cls_cgroup.c between commit 6a328d8c6f03 ("cgroup: net_cls:
Rework update socket logic") from the net-next tree and commits
92fb97487a7e ("cgroup: rename ->create/post_create/pre_destroy/destroy()
to ->css_alloc/online/offline/free()") and 0ba18f7a5e26 ("netcls_cgroup:
move config inheritance to ->css_online() and remove .broken_hierarchy
marking") from the cgroup tree.

I fixed it up (I think - see below) and can carry the fix as necessary
(no action is required).

-- 
Cheers,
Stephen Rothwell                    sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org

diff --cc net/sched/cls_cgroup.c
index 709b0fb,31f06b6..0000000
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@@ -98,9 -79,9 +102,10 @@@ static struct cftype ss_files[] = 

  struct cgroup_subsys net_cls_subsys = {
  	.name		= "net_cls",
- 	.create		= cgrp_create,
- 	.destroy	= cgrp_destroy,
+ 	.css_alloc	= cgrp_css_alloc,
+ 	.css_online	= cgrp_css_online,
+ 	.css_free	= cgrp_css_free,
 +	.attach		= cgrp_attach,
  	.subsys_id	= net_cls_subsys_id,
  	.base_cftypes	= ss_files,
  	.module		= THIS_MODULE,

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* [PATCH] ip6mr: Add sizeof verification to MRT6_ASSERT and MT6_PIM
From: Joe Perches @ 2012-11-26  4:26 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev
In-Reply-To: <20121125.164036.1283541826012498001.davem@davemloft.net>

Verify the length of the user-space arguments.

Signed-off-by: Joe Perches <joe@perches.com>
---
 net/ipv6/ip6mr.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 79bb490..926ea54 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1646,6 +1646,9 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, uns
 	case MRT6_ASSERT:
 	{
 		int v;
+
+		if (optlen != sizeof(v))
+			return -EINVAL;
 		if (get_user(v, (int __user *)optval))
 			return -EFAULT;
 		mrt->mroute_do_assert = v;
@@ -1656,6 +1659,9 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, uns
 	case MRT6_PIM:
 	{
 		int v;
+
+		if (optlen != sizeof(v))
+			return -EINVAL;
 		if (get_user(v, (int __user *)optval))
 			return -EFAULT;
 		v = !!v;

^ permalink raw reply related

* Re: [PATCH] net: ipmr: limit MRT_TABLE identifiers
From: Chen Gang @ 2012-11-26  3:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1353898520.30446.868.camel@edumazet-glaptop>

于 2012年11月26日 10:55, Eric Dumazet 写道:
> On Mon, 2012-11-26 at 10:30 +0800, Chen Gang wrote:
> 
>>   maybe a hacker can do ?  (I just guess).
>>
>>   for my experience:
>>     It is necessary to think of more coding ways, when we have to give comments inside a function.
> 
> I have absolutely no idea of what you are trying to say.
> 

excuse me, my English is not quite well.
I will try to say it as clear as I can.

what I want to say are:
  for os api, we need describe them as full as we can (including limitations, although it seems minor).
  I do not suggest to give comments inside a fuction (especially coding with hard code number).
  they are just my suggestion, not mean, it should be regression.

at least, I think this patch is valuable.

-- 
Chen Gang

Asianux Corporation

^ permalink raw reply

* Re: [RFC net-next PATCH V1 5/9] net: frag per CPU mem limit and LRU list accounting
From: Cong Wang @ 2012-11-26  2:54 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Eric Dumazet, David S. Miller, Florian Westphal, netdev,
	Pablo Neira Ayuso, Thomas Graf, Patrick McHardy, Paul E. McKenney,
	Herbert Xu
In-Reply-To: <20121123130826.18764.66507.stgit@dragon>

On Fri, 2012-11-23 at 14:08 +0100, Jesper Dangaard Brouer wrote:
> +       // QUESTION: Please advice in a better alloc solution than
> NR_CPUS
> +       struct cpu_resource     percpu[NR_CPUS]; 

alloc_percpu(struct cpu_resource) ?

BTW, struct cpu_resource is not a good name here, it is too generic.

^ permalink raw reply

* Re: [PATCH] net: ipmr: limit MRT_TABLE identifiers
From: Eric Dumazet @ 2012-11-26  2:55 UTC (permalink / raw)
  To: Chen Gang; +Cc: David Miller, netdev
In-Reply-To: <50B2D42A.6090804@asianux.com>

On Mon, 2012-11-26 at 10:30 +0800, Chen Gang wrote:

>   maybe a hacker can do ?  (I just guess).
> 
>   for my experience:
>     It is necessary to think of more coding ways, when we have to give comments inside a function.

I have absolutely no idea of what you are trying to say.

^ permalink raw reply

* Re: [PATCH] net: ipmr: limit MRT_TABLE identifiers
From: Chen Gang @ 2012-11-26  2:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1353896372.30446.866.camel@edumazet-glaptop>

于 2012年11月26日 10:19, Eric Dumazet 写道:
> On Mon, 2012-11-26 at 09:34 +0800, Chen Gang wrote:
> 
>>     if "pimre%u" (or another format), will not hurt the functional features, I suggest to use it
>>       since, we need try our best to not touch the OS API. 
>>       ("pimreg%u" seems an internal format, not OS API Level)
> 
> Have you taken a look at user code base before suggesting such a
> change ?

  maybe a hacker can do ?  (I just guess).

  for my experience:
    It is necessary to think of more coding ways, when we have to give comments inside a function.


> 
> My patch is the safest change: Just make sure machine doesnt crash if
> user ask stupid things.
> 
> No possible regression.
> 
> 
> 
> 
> 


-- 
Chen Gang

Asianux Corporation

^ permalink raw reply

* Re: [PATCH] net: ipmr: limit MRT_TABLE identifiers
From: Eric Dumazet @ 2012-11-26  2:19 UTC (permalink / raw)
  To: Chen Gang; +Cc: David Miller, netdev
In-Reply-To: <50B2C72F.9000100@asianux.com>

On Mon, 2012-11-26 at 09:34 +0800, Chen Gang wrote:

>     if "pimre%u" (or another format), will not hurt the functional features, I suggest to use it
>       since, we need try our best to not touch the OS API. 
>       ("pimreg%u" seems an internal format, not OS API Level)

Have you taken a look at user code base before suggesting such a
change ?

My patch is the safest change: Just make sure machine doesnt crash if
user ask stupid things.

No possible regression.

^ permalink raw reply

* Re: [PATCH] net: ipmr: limit MRT_TABLE identifiers
From: Chen Gang @ 2012-11-26  1:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <50B2C470.5090802@asianux.com>

于 2012年11月26日 09:22, Chen Gang 写道:
> 于 2012年11月26日 03:44, Eric Dumazet 写道:
>> From: Eric Dumazet <edumazet@google.com>
>>
>> Name of pimreg devices are built from following format :
>>
>> char name[IFNAMSIZ]; // IFNAMSIZ == 16
>>
>> sprintf(name, "pimreg%u", mrt->id);
>>
>> We must therefore limit mrt->id to 9 decimal digits
>> or risk a buffer overflow and a crash.
>>
>> Restrict table identifiers in [0 ... 999999999] interval.
>>

    if "pimre%u" (or another format), will not hurt the functional features, I suggest to use it
      since, we need try our best to not touch the OS API. 
      ("pimreg%u" seems an internal format, not OS API Level)

> 
>   if we have to stick to "pimreg%u" (or will hurt the functional features)
>   suggest to let user mode know this limitation. 
>     define a macro in public header (user mode can know it) and give comments.
>     use macro instead of number.
>     remove the comments which is inside internal function.
> 
>   thanks.
> 
> gchen.
> 
> 
>> Reported-by: Chen Gang <gang.chen@asianux.com>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> ---
>>  net/ipv4/ipmr.c |    4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
>> index 6168c4d..3eab2b2 100644
>> --- a/net/ipv4/ipmr.c
>> +++ b/net/ipv4/ipmr.c
>> @@ -1318,6 +1318,10 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
>>  		if (get_user(v, (u32 __user *)optval))
>>  			return -EFAULT;
>>  
>> +		/* "pimreg%u" should not exceed 16 bytes (IFNAMSIZ) */
>> +		if (v != RT_TABLE_DEFAULT && v >= 1000000000)
>> +			return -EINVAL;
>> +
>>  		rtnl_lock();
>>  		ret = 0;
>>  		if (sk == rtnl_dereference(mrt->mroute_sk)) {
>>
>>
>>
>>
> 
> 


-- 
Chen Gang

Asianux Corporation

^ permalink raw reply

* Re: [PATCH] net: ipmr: limit MRT_TABLE identifiers
From: Chen Gang @ 2012-11-26  1:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1353872669.30446.863.camel@edumazet-glaptop>

于 2012年11月26日 03:44, Eric Dumazet 写道:
> From: Eric Dumazet <edumazet@google.com>
> 
> Name of pimreg devices are built from following format :
> 
> char name[IFNAMSIZ]; // IFNAMSIZ == 16
> 
> sprintf(name, "pimreg%u", mrt->id);
> 
> We must therefore limit mrt->id to 9 decimal digits
> or risk a buffer overflow and a crash.
> 
> Restrict table identifiers in [0 ... 999999999] interval.
> 

  if we have to stick to "pimreg%u" (or will hurt the functional features)
  suggest to let user mode know this limitation. 
    define a macro in public header (user mode can know it) and give comments.
    use macro instead of number.
    remove the comments which is inside internal function.

  thanks.

gchen.


> Reported-by: Chen Gang <gang.chen@asianux.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/ipv4/ipmr.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> index 6168c4d..3eab2b2 100644
> --- a/net/ipv4/ipmr.c
> +++ b/net/ipv4/ipmr.c
> @@ -1318,6 +1318,10 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
>  		if (get_user(v, (u32 __user *)optval))
>  			return -EFAULT;
>  
> +		/* "pimreg%u" should not exceed 16 bytes (IFNAMSIZ) */
> +		if (v != RT_TABLE_DEFAULT && v >= 1000000000)
> +			return -EINVAL;
> +
>  		rtnl_lock();
>  		ret = 0;
>  		if (sk == rtnl_dereference(mrt->mroute_sk)) {
> 
> 
> 
> 


-- 
Chen Gang

Asianux Corporation

^ permalink raw reply

* Re: [PATCH] atm: br2684: Fix excessive queue bloat
From: Krzysztof Mazur @ 2012-11-25 23:52 UTC (permalink / raw)
  To: David Woodhouse
  Cc: netdev, John Crispin, Dave Täht, Chas Williams (CONTRACTOR)
In-Reply-To: <1353880892.26346.300.camel@shinybook.infradead.org>

On Sun, Nov 25, 2012 at 10:01:32PM +0000, David Woodhouse wrote:
> 
> Yeah. This is fairly much the same conversation I ended up having when I
> did the same for PPPoATM.
> 
> Some day *perhaps* we might look at doing something adaptive, so it'll
> detect a TX underrun and increase the amount of buffering. But this
> seems perfectly good for now.

The biggest problem is the existence of additional, mostly unnecessary
huge software queue that we have between us and the hardware on "ATM sockets".
Most of ATM drivers have some hardware queue, but accept and queue
additional frames in software and it just adds unnecessary delay
- it's much better to queue those frames in netdev queue. With this patch
we limit this queue by limiting the number of frames in this queue and
hardware queue to 2 frames, but the hardware may require larger *hardware*
queue to saturate the link. If we had only the hardware queue, the queue
would be probably sized adequately in ATM driver or could be tuned by user.

I think that the 2 frame queue is much better than the huge queue,
that we have now.

> 
> > Maybe this magic "2" and the comment should be moved to some #define.
> 
> Doesn't make it any less magic. I'm happier with it as it is, with a
> clear comment describing why it's done that way.
> 

In PPPoATM we have NONE_INFLIGHT.

Krzysiek

^ permalink raw reply

* Re: [PATCH] 8139cp: re-enable interrupts after tx timeout
From: Francois Romieu @ 2012-11-25 22:43 UTC (permalink / raw)
  To: David Woodhouse; +Cc: netdev, jasowang, gilboad, jgarzik
In-Reply-To: <1353799650.26346.289.camel@shinybook.infradead.org>

David Woodhouse <dwmw2@infradead.org> :
> On Sat, 2012-11-24 at 23:58 +0100, Francois Romieu wrote:
> > While you are at it, do you see what would prevent the watchdog
> > timer and the rx napi handler to race on two different CPUs
> 
> Hm, good point. Is it sufficient just to stick napi_disable() and
> napi_enable() in there?

No, napi_disable() may sleep.

> And should we try to reset the TX state machine without stomping on
> RX at all ?

Either that or the watchdog timer will have to queue some user context
work. Things may change later when you go lockless (hint, hint).

-- 
Ueimor

^ permalink raw reply

* [PATCH v2] atm: br2684: Fix excessive queue bloat
From: David Woodhouse @ 2012-11-25 22:06 UTC (permalink / raw)
  To: netdev, John Crispin, Dave Täht, Chas Williams (CONTRACTOR)

[-- Attachment #1: Type: text/plain, Size: 5330 bytes --]

There's really no excuse for an additional wmem_default of buffering
between the netdev queue and the ATM device. Two packets (one in-flight,
and one ready to send) ought to be fine. It's not as if it should take
long to get another from the netdev queue when we need it.

If necessary we can make the queue space configurable later, but I don't
think it's likely to be necessary.

cf. commit 9d02daf754238adac48fa075ee79e7edd3d79ed3 (pppoatm: Fix
excessive queue bloat) which did something very similar for PPPoATM.

Note that there is a tremendously unlikely race condition which may
result in qspace temporarily going negative. If a CPU running the
br2684_pop() function goes off into the weeds for a long period of time
after incrementing qspace to 1, but before calling netdev_wake_queue()...
and another CPU ends up calling br2684_start_xmit() and *stopping* the
queue again before the first CPU comes back, the netdev queue could
end up being woken when qspace has already reached zero.

An alternative approach to coping with this race would be to check in
br2684_start_xmit() for qspace==0 and return NETDEV_TX_BUSY, but just
using '> 0' and '< 1' for comparison instead of '== 0' and '!= 0' is
simpler. It just warranted a mention of *why* we do it that way...

Move the call to atmvcc->send() to happen *after* the accounting and
potentially stopping the netdev queue, in br2684_xmit_vcc(). This matters
if the ->send() call suffers an immediate failure, because it'll call
br2684_pop() with the offending skb before returning. We want that to
happen *after* we've done the initial accounting for the packet in
question. Also make it return an appropriate success/failure indication
while we're at it.

Tested by running 'ping -l 1000 bottomless.aaisp.net.uk' from within my
network, with only a single PPPoE-over-BR2684 link running. And after
setting txqueuelen on the nas0 interface to something low (5, in fact).
Before the patch, we'd see about 15 packets being queued and a resulting
latency of ~56ms being reached. After the patch, we see only about 8,
which is fairly much what we expect. And a max latency of ~36ms. On this
OpenWRT box, wmem_default is 163840.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Reviewed-by: Krzysztof Mazur <krzysiek@podlesie.net>
---
[v2: Comment format fixed]

 net/atm/br2684.c | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/net/atm/br2684.c b/net/atm/br2684.c
index 4819d315..8eb6fbe 100644
--- a/net/atm/br2684.c
+++ b/net/atm/br2684.c
@@ -74,6 +74,7 @@ struct br2684_vcc {
 	struct br2684_filter filter;
 #endif /* CONFIG_ATM_BR2684_IPFILTER */
 	unsigned int copies_needed, copies_failed;
+	atomic_t qspace;
 };

 struct br2684_dev {
@@ -181,18 +182,15 @@ static struct notifier_block atm_dev_notifier = {
 static void br2684_pop(struct atm_vcc *vcc, struct sk_buff *skb)
 {
 	struct br2684_vcc *brvcc = BR2684_VCC(vcc);
-	struct net_device *net_dev = skb->dev;

-	pr_debug("(vcc %p ; net_dev %p )\n", vcc, net_dev);
+	pr_debug("(vcc %p ; net_dev %p )\n", vcc, brvcc->device);
 	brvcc->old_pop(vcc, skb);

-	if (!net_dev)
-		return;
-
-	if (atm_may_send(vcc, 0))
-		netif_wake_queue(net_dev);
-
+	/* If the queue space just went up from zero, wake */
+	if (atomic_inc_return(&brvcc->qspace) == 1)
+		netif_wake_queue(brvcc->device);
 }
+
 /*
  * Send a packet out a particular vcc.  Not to useful right now, but paves
  * the way for multiple vcc's per itf.  Returns true if we can send,
@@ -256,16 +254,19 @@ static int br2684_xmit_vcc(struct sk_buff *skb, struct net_device *dev,
 	ATM_SKB(skb)->atm_options = atmvcc->atm_options;
 	dev->stats.tx_packets++;
 	dev->stats.tx_bytes += skb->len;
-	atmvcc->send(atmvcc, skb);

-	if (!atm_may_send(atmvcc, 0)) {
+	if (atomic_dec_return(&brvcc->qspace) < 1) {
+		/* No more please! */
 		netif_stop_queue(brvcc->device);
-		/*check for race with br2684_pop*/
-		if (atm_may_send(atmvcc, 0))
-			netif_start_queue(brvcc->device);
+		/* We might have raced with br2684_pop() */
+		if (unlikely(atomic_read(&brvcc->qspace) > 0))
+			netif_wake_queue(brvcc->device);
 	}

-	return 1;
+	/* If this fails immediately, the skb will be freed and br2684_pop()
+	   will wake the queue if appropriate. Just return an error so that
+	   the stats are updated correctly */
+	return !atmvcc->send(atmvcc, skb);
 }

 static inline struct br2684_vcc *pick_outgoing_vcc(const struct sk_buff *skb,
@@ -504,6 +505,13 @@ static int br2684_regvcc(struct atm_vcc *atmvcc, void __user * arg)
 	brvcc = kzalloc(sizeof(struct br2684_vcc), GFP_KERNEL);
 	if (!brvcc)
 		return -ENOMEM;
+	/*
+	 * Allow two packets in the ATM queue. One actually being sent, and one
+	 * for the ATM 'TX done' handler to send. It shouldn't take long to get
+	 * the next one from the netdev queue, when we need it. More than that
+	 * would be bufferbloat.
+	 */
+	atomic_set(&brvcc->qspace, 2);
 	write_lock_irq(&devs_lock);
 	net_dev = br2684_find_dev(&be.ifspec);
 	if (net_dev == NULL) {
-- 
1.8.0

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

^ permalink raw reply related

* Re: [PATCH] atm: br2684: Fix excessive queue bloat
From: David Woodhouse @ 2012-11-25 22:01 UTC (permalink / raw)
  To: Krzysztof Mazur
  Cc: netdev, John Crispin, Dave Täht, Chas Williams (CONTRACTOR)
In-Reply-To: <20121125214332.GA2722@shrek.podlesie.net>

[-- Attachment #1: Type: text/plain, Size: 1193 bytes --]

On Sun, 2012-11-25 at 22:43 +0100, Krzysztof Mazur wrote:
> On Sat, Nov 24, 2012 at 12:01:32AM +0000, David Woodhouse wrote:
> > There's really no excuse for an additional wmem_default of buffering
> > between the netdev queue and the ATM device. Two packets (one in-flight,
> > and one ready to send) ought to be fine. It's not as if it should take
> > long to get another from the netdev queue when we need it.
> > 
> > If necessary we can make the queue space configurable later, but I don't
> > think it's likely to be necessary.
> 
> Maybe some high-speed devices will require larger queue, especially for
> smaller packets, but 2 packet queue should be sufficient in almost all cases.

Yeah. This is fairly much the same conversation I ended up having when I
did the same for PPPoATM.

Some day *perhaps* we might look at doing something adaptive, so it'll
detect a TX underrun and increase the amount of buffering. But this
seems perfectly good for now.

> Maybe this magic "2" and the comment should be moved to some #define.

Doesn't make it any less magic. I'm happier with it as it is, with a
clear comment describing why it's done that way.

-- 
dwmw2


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

^ permalink raw reply

* Re: [PATCH] atm: br2684: Fix excessive queue bloat
From: Krzysztof Mazur @ 2012-11-25 21:43 UTC (permalink / raw)
  To: David Woodhouse
  Cc: netdev, John Crispin, Dave Täht, Chas Williams (CONTRACTOR)
In-Reply-To: <1353715292.26346.237.camel@shinybook.infradead.org>

On Sat, Nov 24, 2012 at 12:01:32AM +0000, David Woodhouse wrote:
> There's really no excuse for an additional wmem_default of buffering
> between the netdev queue and the ATM device. Two packets (one in-flight,
> and one ready to send) ought to be fine. It's not as if it should take
> long to get another from the netdev queue when we need it.
> 
> If necessary we can make the queue space configurable later, but I don't
> think it's likely to be necessary.

Maybe some high-speed devices will require larger queue, especially for
smaller packets, but 2 packet queue should be sufficient in almost all cases.

>  static inline struct br2684_vcc *pick_outgoing_vcc(const struct sk_buff *skb,
> @@ -504,6 +505,11 @@ static int br2684_regvcc(struct atm_vcc *atmvcc, void __user * arg)
>  	brvcc = kzalloc(sizeof(struct br2684_vcc), GFP_KERNEL);
>  	if (!brvcc)
>  		return -ENOMEM;
> +	/* Allow two packets in the ATM queue. One actually being sent, and one
> +	   for the ATM 'TX done' handler to send. It shouldn't take long to get
> +	   the next one from the netdev queue, when we need it. More than that
> +	   would be bufferbloat. */
> +	atomic_set(&brvcc->qspace, 2);

Maybe this magic "2" and the comment should be moved to some #define.

>  	write_lock_irq(&devs_lock);
>  	net_dev = br2684_find_dev(&be.ifspec);
>  	if (net_dev == NULL) {

Looks good,

Reviewed-by: Krzysztof Mazur <krzysiek@podlesie.net>

Krzysiek

^ permalink raw reply

* Re: [PATCH] ipv4/ipmr and ipv6/ip6mr: Convert int mroute_do_<foo> to bool
From: David Miller @ 2012-11-25 21:40 UTC (permalink / raw)
  To: joe; +Cc: eric.dumazet, netdev
In-Reply-To: <1353872130.6558.17.camel@joe-AO722>

From: Joe Perches <joe@perches.com>
Date: Sun, 25 Nov 2012 11:35:30 -0800

> Save a few bytes per table by convert mroute_do_assert and
> mroute_do_pim from int to bool.
> 
> Remove !! as the compiler does that when assigning int to bool.
> 
> Signed-off-by: Joe Perches <joe@perches.com>

Applied.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox