Netdev List
* Re: cxgb4: Make unnecessarily global functions static
From: David Miller @ 2010-04-22  6:01 UTC (permalink / raw)
  To: rdreier; +Cc: dm, netdev
In-Reply-To: <adapr1spr7e.fsf@roland-alpha.cisco.com>

From: Roland Dreier <rdreier@cisco.com>
Date: Wed, 21 Apr 2010 11:59:17 -0700

> Also put t4_write_indirect() inside "#if 0" to avoid a "defined but not
> used" compile warning.
> 
> Signed-off-by: Roland Dreier <rolandd@cisco.com>

Also applied to net-next-2.6, thanks Roland.

* Re: cxgb4: Use ntohs() on __be16 value instead of htons()
From: David Miller @ 2010-04-22  6:00 UTC (permalink / raw)
  To: dm; +Cc: rdreier, netdev
In-Reply-To: <4BCF4F57.4050802@chelsio.com>

From: Dimitris Michailidis <dm@chelsio.com>
Date: Wed, 21 Apr 2010 12:17:43 -0700

> On 04/21/2010 11:09 AM, Roland Dreier wrote:
>> Use the correct direction of byte-swapping function to fix a mistake
>> shown by sparse endianness checking -- c.fl0id is __be16.
>>
>> Signed-off-by: Roland Dreier <rolandd@cisco.com>
> 
> Yes, thanks.
> 
> Acked-by: Dimitris Michailidis <dm@chelsio.com>

Applied to net-next-2.6
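
A minimal user-space sketch of the conversion direction discussed above,
assuming POSIX <arpa/inet.h>. wire_id_to_host() is a hypothetical helper and
is not part of cxgb4. htons() and ntohs() perform the same byte swap on
common platforms, so using the wrong one usually still yields the right bits;
the difference is the declared direction, which is what sparse checks.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 16-bit big-endian field, such as the
 * fl0id value discussed above, from raw bytes into host order. */
static uint16_t wire_id_to_host(const unsigned char *wire)
{
	uint16_t be;

	memcpy(&be, wire, sizeof(be));	/* bytes as they appear on the wire */
	return ntohs(be);		/* network (big-endian) -> host */
}
```

Going the other way, from a host-order value to a wire field, would use
htons() instead.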

* Re: [PATCH] net: ipv6 bind to device issue
From: David Miller @ 2010-04-22  5:58 UTC (permalink / raw)
  To: brian.haley
  Cc: jolsa, kuznet, pekkas, jmorris, yoshfuji, kaber, eric.dumazet,
	netdev
In-Reply-To: <20100421.225015.137831360.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Wed, 21 Apr 2010 22:50:15 -0700 (PDT)

> Jiri please respin your patch with the argument order
> reversed so that we can make the inexpensive check before
> the expensive one.

Never mind, I see you posted an updated version already,
which I've applied, thanks!

* Re: [PATCH] ethernet: print protocol in host byte order
From: David Miller @ 2010-04-22  5:57 UTC (permalink / raw)
  To: johannes; +Cc: netdev, eric.dumazet
In-Reply-To: <1271833567.3627.12.camel@jlt3.sipsolutions.net>

From: Johannes Berg <johannes@sipsolutions.net>
Date: Wed, 21 Apr 2010 09:06:07 +0200

> Eric's recent patch added __force, but this
> place would seem to require actually doing
> a byte order conversion so the printk is
> consistent across architectures.
> 
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Johannes Berg <johannes@sipsolutions.net>

Applied to net-next-2.6, thanks a lot Johannes.

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
From: David Miller @ 2010-04-22  5:56 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1271830116.7895.1316.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 21 Apr 2010 08:08:36 +0200

> On Sunday, 18 April 2010 at 02:46 -0700, David Miller wrote:
> 
>> Looks good, applied, thanks Eric.
> 
> Hmm, looking at the GSO stuff, I believe we should not call
> skb_orphan_try() on gso skbs ?

Right, I've applied this, thanks.

What we should probably do instead is call and NULL out the
DEV_GSO_CB() destructor.  Right?

* [PATCH v4] net: batch skb dequeueing from softnet input_pkt_queue
From: Changli Gao @ 2010-04-22  5:51 UTC (permalink / raw)
  To: David S. Miller; +Cc: jamal, Tom Herbert, Eric Dumazet, netdev, Changli Gao

batch skb dequeueing from softnet input_pkt_queue

Batch skb dequeueing from softnet input_pkt_queue to reduce potential lock
contention when RPS is enabled. input_pkt_queue is reimplemented as a singly
linked list (FIFO), and input_pkt_queue_lock is moved into the RPS section, so
struct softnet_data becomes smaller than before when RPS is disabled.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/linux/netdevice.h |   14 +++++--
 net/core/dev.c            |   92 +++++++++++++++++++++++++++++++++++-----------
 2 files changed, 82 insertions(+), 24 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3c5ed5f..5ccb92b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1387,6 +1387,7 @@ struct softnet_data {
 	struct Qdisc		*output_queue;
 	struct list_head	poll_list;
 	struct sk_buff		*completion_queue;
+	struct sk_buff		*process_queue;
 
 #ifdef CONFIG_RPS
 	struct softnet_data	*rps_ipi_list;
@@ -1396,15 +1397,22 @@ struct softnet_data {
 	struct softnet_data	*rps_ipi_next;
 	unsigned int		cpu;
 	unsigned int		input_queue_head;
+	spinlock_t		input_pkt_queue_lock;
+	/* 4 bytes hole on 64bits machine */
 #endif
-	struct sk_buff_head	input_pkt_queue;
+	struct sk_buff		*input_pkt_queue_head;
+	struct sk_buff		**input_pkt_queue_tailp;
+	unsigned int		input_pkt_queue_len;
+	unsigned int		process_queue_len;
+
 	struct napi_struct	backlog;
 };
 
-static inline void input_queue_head_incr(struct softnet_data *sd)
+static inline void input_queue_head_add(struct softnet_data *sd,
+					unsigned int len)
 {
 #ifdef CONFIG_RPS
-	sd->input_queue_head++;
+	sd->input_queue_head += len;
 #endif
 }
 
diff --git a/net/core/dev.c b/net/core/dev.c
index e904c47..81c7877 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -211,14 +211,14 @@ static inline struct hlist_head *dev_index_hash(struct net *net, int ifindex)
 static inline void rps_lock(struct softnet_data *sd)
 {
 #ifdef CONFIG_RPS
-	spin_lock(&sd->input_pkt_queue.lock);
+	spin_lock(&sd->input_pkt_queue_lock);
 #endif
 }
 
 static inline void rps_unlock(struct softnet_data *sd)
 {
 #ifdef CONFIG_RPS
-	spin_unlock(&sd->input_pkt_queue.lock);
+	spin_unlock(&sd->input_pkt_queue_lock);
 #endif
 }
 
@@ -2402,6 +2402,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 {
 	struct softnet_data *sd;
 	unsigned long flags;
+	unsigned int qlen;
 
 	sd = &per_cpu(softnet_data, cpu);
 
@@ -2409,12 +2410,16 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 	__get_cpu_var(netdev_rx_stat).total++;
 
 	rps_lock(sd);
-	if (sd->input_pkt_queue.qlen <= netdev_max_backlog) {
-		if (sd->input_pkt_queue.qlen) {
+	qlen = sd->input_pkt_queue_len + sd->process_queue_len;
+	if (qlen <= netdev_max_backlog) {
+		if (qlen) {
 enqueue:
-			__skb_queue_tail(&sd->input_pkt_queue, skb);
+			skb->next = NULL;
+			*sd->input_pkt_queue_tailp = skb;
+			sd->input_pkt_queue_tailp = &skb->next;
+			sd->input_pkt_queue_len++;
 #ifdef CONFIG_RPS
-			*qtail = sd->input_queue_head + sd->input_pkt_queue.qlen;
+			*qtail = sd->input_queue_head + sd->input_pkt_queue_len;
 #endif
 			rps_unlock(sd);
 			local_irq_restore(flags);
@@ -2931,16 +2936,33 @@ static void flush_backlog(void *arg)
 {
 	struct net_device *dev = arg;
 	struct softnet_data *sd = &__get_cpu_var(softnet_data);
-	struct sk_buff *skb, *tmp;
+	struct sk_buff **pskb, *skb;
 
 	rps_lock(sd);
-	skb_queue_walk_safe(&sd->input_pkt_queue, skb, tmp)
+	for (pskb = &sd->input_pkt_queue_head; *pskb; ) {
+		skb = *pskb;
 		if (skb->dev == dev) {
-			__skb_unlink(skb, &sd->input_pkt_queue);
+			*pskb = skb->next;
 			kfree_skb(skb);
-			input_queue_head_incr(sd);
+			input_queue_head_add(sd, 1);
+			sd->input_pkt_queue_len--;
+		} else {
+			pskb = &skb->next;
 		}
+	}
+	sd->input_pkt_queue_tailp = pskb;
 	rps_unlock(sd);
+
+	for (pskb = &sd->process_queue; *pskb; ) {
+		skb = *pskb;
+		if (skb->dev == dev) {
+			*pskb = skb->next;
+			kfree_skb(skb);
+			sd->process_queue_len--;
+		} else {
+			pskb = &skb->next;
+		}
+	}
 }
 
 static int napi_gro_complete(struct sk_buff *skb)
@@ -3249,25 +3271,39 @@ static int process_backlog(struct napi_struct *napi, int quota)
 	struct softnet_data *sd = &__get_cpu_var(softnet_data);
 
 	napi->weight = weight_p;
+	local_irq_disable();
 	do {
 		struct sk_buff *skb;
 
-		local_irq_disable();
+		while (sd->process_queue) {
+			skb = sd->process_queue;
+			sd->process_queue = skb->next;
+			sd->process_queue_len--;
+			local_irq_enable();
+			__netif_receive_skb(skb);
+			if (++work >= quota)
+				goto out;
+			local_irq_disable();
+		}
+
 		rps_lock(sd);
-		skb = __skb_dequeue(&sd->input_pkt_queue);
-		if (!skb) {
+		if (sd->input_pkt_queue_head == NULL) {
 			__napi_complete(napi);
 			rps_unlock(sd);
 			local_irq_enable();
 			break;
 		}
-		input_queue_head_incr(sd);
-		rps_unlock(sd);
-		local_irq_enable();
 
-		__netif_receive_skb(skb);
-	} while (++work < quota);
+		sd->process_queue = sd->input_pkt_queue_head;
+		sd->process_queue_len = sd->input_pkt_queue_len;
+		input_queue_head_add(sd, sd->input_pkt_queue_len);
+		sd->input_pkt_queue_head = NULL;
+		sd->input_pkt_queue_tailp = &sd->input_pkt_queue_head;
+		sd->input_pkt_queue_len = 0;
+		rps_unlock(sd);
+	} while (1);
 
+out:
 	return work;
 }
 
@@ -5621,10 +5657,19 @@ static int dev_cpu_callback(struct notifier_block *nfb,
 	local_irq_enable();
 
 	/* Process offline CPU's input_pkt_queue */
-	while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) {
+	while ((skb = oldsd->input_pkt_queue_head)) {
+		oldsd->input_pkt_queue_head = skb->next;
+		netif_rx(skb);
+	}
+	oldsd->input_pkt_queue_tailp = &oldsd->input_pkt_queue_head;
+	input_queue_head_add(oldsd, oldsd->input_pkt_queue_len);
+	oldsd->input_pkt_queue_len = 0;
+
+	while ((skb = oldsd->process_queue)) {
+		oldsd->process_queue = skb->next;
 		netif_rx(skb);
-		input_queue_head_incr(oldsd);
 	}
+	oldsd->process_queue_len = 0;
 
 	return NOTIFY_OK;
 }
@@ -5842,11 +5887,16 @@ static int __init net_dev_init(void)
 	for_each_possible_cpu(i) {
 		struct softnet_data *sd = &per_cpu(softnet_data, i);
 
-		skb_queue_head_init(&sd->input_pkt_queue);
+		sd->input_pkt_queue_head = NULL;
+		sd->input_pkt_queue_tailp = &sd->input_pkt_queue_head;
+		sd->input_pkt_queue_len = 0;
+		sd->process_queue = NULL;
+		sd->process_queue_len = 0;
 		sd->completion_queue = NULL;
 		INIT_LIST_HEAD(&sd->poll_list);
 
 #ifdef CONFIG_RPS
+		spin_lock_init(&sd->input_pkt_queue_lock);
 		sd->csd.func = rps_trigger_softirq;
 		sd->csd.info = sd;
 		sd->csd.flags = 0;
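
The queue layout used by the patch above can be sketched in user space as
follows. This is a minimal illustration and not the kernel code: struct pkt,
struct fifo and the fifo_* helpers are hypothetical names, locking is
omitted, and only the tail-pointer append and the O(1) batch detach are
shown.

```c
#include <stddef.h>

struct pkt {			/* hypothetical stand-in for struct sk_buff */
	struct pkt *next;
	int id;
};

struct fifo {
	struct pkt *head;
	struct pkt **tailp;	/* points at head, or at the last node's next */
	unsigned int len;
};

static void fifo_init(struct fifo *q)
{
	q->head = NULL;
	q->tailp = &q->head;
	q->len = 0;
}

/* O(1) append through the tail pointer, as in enqueue_to_backlog(). */
static void fifo_enqueue(struct fifo *q, struct pkt *p)
{
	p->next = NULL;
	*q->tailp = p;
	q->tailp = &p->next;
	q->len++;
}

/* Detach the whole list in O(1); the caller can then walk the returned
 * chain without holding the queue lock. */
static struct pkt *fifo_take_all(struct fifo *q, unsigned int *n)
{
	struct pkt *batch = q->head;

	*n = q->len;
	fifo_init(q);
	return batch;
}
```

The consumer pays for the lock once per batch instead of once per packet,
which appears to be the contention the patch description is aiming at.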

* Re: [PATCH] net: ipv6 bind to device issue
From: David Miller @ 2010-04-22  5:50 UTC (permalink / raw)
  To: brian.haley
  Cc: jolsa, kuznet, pekkas, jmorris, yoshfuji, kaber, eric.dumazet,
	netdev
In-Reply-To: <4BCDEED3.7040901@hp.com>

From: Brian Haley <brian.haley@hp.com>
Date: Tue, 20 Apr 2010 14:13:39 -0400

> Actually, looking at this again, we might want to swap the order
> here since fl->oif should be filled-in for most link-local and
> multicast requests calling this:
> 
> 	if (fl->oif || rt6_need_strict(&fl->fl6_dst))
> 
> Just a thought, but it potentially saves a call to determine
> the scope of the address.

Yes I think we should make this change.

Jiri please respin your patch with the argument order
reversed so that we can make the inexpensive check before
the expensive one.

Thanks.
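
The effect of the reordering can be sketched in plain C. need_strict() below
is a hypothetical stand-in for rt6_need_strict(), and the counter exists only
to make the skipped call visible; the real functions and structures are not
reproduced here.

```c
#include <stdbool.h>

static int scope_checks;	/* counts calls to the expensive test */

/* Hypothetical stand-in for rt6_need_strict(): pretend it is costly. */
static bool need_strict(int addr_scope)
{
	scope_checks++;
	return addr_scope != 0;
}

/* Cheap field test first; || skips need_strict() when oif is set. */
static bool use_strict_lookup(int oif, int addr_scope)
{
	return oif || need_strict(addr_scope);
}
```

Because || short-circuits, need_strict() runs only when oif is zero.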

* Re: [PATCH BUG-FIX] ipv6: allow to send packet after receiving ICMPv6 Too Big message with MTU field less than IPV6_MIN_MTU
From: David Miller @ 2010-04-22  5:48 UTC (permalink / raw)
  To: herbert
  Cc: shanwei, yoshfuji, yjwei, vladislav.yasevich, kuznet, pekkas,
	jmorris, kaber, eric.dumazet, sri, netdev, linux-sctp
In-Reply-To: <20100419035535.GA7011@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 19 Apr 2010 11:55:35 +0800

> On Mon, Apr 19, 2010 at 10:58:22AM +0800, Shan Wei wrote:
>> 
>> According to RFC 2460, after a node receives a Too Big message
>> reporting a PMTU less than the IPv6 Minimum Link MTU, the PMTU
>> is set to the IPv6 Minimum Link MTU (1280) and a fragment header
>> should always be included.
>> 
>> After receiving an ICMPv6 Too Big message reporting a PMTU
>> less than the IPv6 Minimum Link MTU, sctp *can't* send any
>> data/control chunk whose total length, including the IPv6 header
>> and IPv6 extension headers, is less than IPV6_MIN_MTU (1280 bytes).
>> 
>> The failure occurs in ip6_fragment(); the call chain is shown
>> below (taking a SHUTDOWN chunk as an example):
>> sctp_packet_transmit (SHUTDOWN chunk, len=16 byte)
>> |------sctp_v6_xmit (local_df=0)
>>    |------ip6_xmit
>>        |------ip6_output (dst_allfrag is true)
>>            |------ip6_fragment
>> 
>> In ip6_fragment(), because local_df=0, the packet is dropped
>> and EMSGSIZE is returned.
>> 
>> The patch fixes this by adding a check of skb->len.
>> In this case, IPv6 does not fragment the upper-protocol data;
>> it only adds a fragment header before it.
>> 
>> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
> 
> The patch looks good to me.
> 
> If we wanted to optimise the allfrags case it may be better
> to reserve the space beforehand and generate the fragment header
> at the same time as we're doing the IPv6 header.
> 
> But it can't be all that important as it's been broken for so
> many years.

Right, I've applied Shan's patch, thanks.
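
The rule quoted from RFC 2460 can be sketched as follows. This is an
illustration under stated assumptions, not the kernel's route or dst code:
struct pmtu_state, pmtu_update() and needs_split() are hypothetical names,
and allfrag only mirrors the role of dst_allfrag in the call chain above.

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef IPV6_MIN_MTU
#define IPV6_MIN_MTU 1280	/* IPv6 minimum link MTU, RFC 2460 */
#endif

struct pmtu_state {		/* hypothetical, not a kernel structure */
	uint32_t pmtu;
	bool allfrag;		/* always add a fragment header */
};

/* Apply the MTU reported by an ICMPv6 Packet Too Big message. */
static void pmtu_update(struct pmtu_state *st, uint32_t reported)
{
	if (reported < IPV6_MIN_MTU) {
		st->pmtu = IPV6_MIN_MTU;	/* never go below the minimum */
		st->allfrag = true;
	} else if (reported < st->pmtu) {
		st->pmtu = reported;
	}
}

/* A packet that fits the PMTU is not split, even when allfrag means it
 * still gets a fragment header. */
static bool needs_split(const struct pmtu_state *st, uint32_t len)
{
	return len > st->pmtu;
}
```

In this model a 16-byte SHUTDOWN chunk plus headers stays far below 1280
bytes, so it needs a fragment header but no splitting, which is the case the
patch description says was being dropped.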

* Re: [PATCH] drivers/net/usb: Add new driver ipheth
From: David Miller @ 2010-04-22  5:44 UTC (permalink / raw)
  To: diego-KR0zwsgql1HQT0dZR+AlfA
  Cc: agimenez-lqZFv/KUvpAxAGwisGp4zA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dborca-/E1597aS9LQAvxtiuMwx3w,
	James.Bottomley-JuX6DAaQMKPCXq6kfMZ53/egYHeGw8Jk,
	ralf-6z/3iImG2C8G8FEW9MqTrA, gregkh-l3A5Bk7waGM,
	jonas.sjoquist-IzeFyvvaP7pWk0Htik3J/w,
	torgny.johansson-Re5JQEeQqe8AvxtiuMwx3w,
	steve.glendinning-sdUf+H5yV5I,
	dbrownell-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f,
	omar.oberthur-Re5JQEeQqe8AvxtiuMwx3w,
	remi.denis-courmont-xNZwKgViW5gAvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <x2g1b0798831004210715h37253bc5y74baf86556aea7c5-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

From: Diego Giagio <diego-KR0zwsgql1HQT0dZR+AlfA@public.gmane.org>
Date: Wed, 21 Apr 2010 11:15:15 -0300

> On Sun, Apr 18, 2010 at 3:35 PM, L. Alberto Giménez
> <agimenez-lqZFv/KUvpAxAGwisGp4zA@public.gmane.org> wrote:
>> From: Diego Giagio <diego-KR0zwsgql1HQT0dZR+AlfA@public.gmane.org>
>>
>> Add new driver to use tethering with an iPhone device. After initial submission,
>> apply fixes to fit the new driver into the kernel standards.
>>
>> There are still a couple of minor (almost cosmetic-level) issues, but the driver
>> is fully functional right now.
>>
> 
> Signed-off-by: Diego Giagio <diego-KR0zwsgql1HQT0dZR+AlfA@public.gmane.org>
> Cc: Daniel Borca <dborca-/E1597aS9LQAvxtiuMwx3w@public.gmane.org>

Applied, thanks everyone.

* Re: [PATCH v2] TCP: avoid to send keepalive probes if it is receiving data
From: David Miller @ 2010-04-22  5:42 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: eric.dumazet, fleitner, netdev
In-Reply-To: <alpine.DEB.2.00.1004182328370.19304@melkinpaasi.cs.helsinki.fi>

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Sun, 18 Apr 2010 23:34:15 +0300 (EEST)

> I fail to see why the addition of this new variable is necessary at all,
> could either of you enlighten me why exactly it's necessary and rcv_tstamp
> will not suffice?

I agree, the existing rcv_tstamp should serve this purpose just
fine.

* Re: [PATCH v2] rps: optimize rps_get_cpu()
From: David Miller @ 2010-04-22  5:40 UTC (permalink / raw)
  To: xiaosuo; +Cc: therbert, eric.dumazet, netdev
In-Reply-To: <1271772160-28177-1-git-send-email-xiaosuo@gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Tue, 20 Apr 2010 22:02:40 +0800

> Use ihl in bytes to eliminate later multiplications.

I'll buy you a cookie if you can find a multiply generated by the
compiler for "x * 4".  It's going to use shifts and those are
basically free.

Please just change one thing at a time.  It would have helped you
here.  I was willing to apply the port dereference part of your
change, but not necessarily the 'ihl' changes.  But because you've
combined them, I have no choice but to reject everything.
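
For reference, the two spellings of the header-length computation are shown
below as a small sketch. The function names are hypothetical; the only fact
relied on is that the low nibble of the first IPv4 header byte is ihl,
counted in 32-bit words. Optimizing compilers typically emit the same shift
for both forms.

```c
#include <stdint.h>

/* IPv4 header length in bytes from the first header byte: the low
 * nibble is ihl, counted in 32-bit words. */
static unsigned int ip_hdr_bytes_mul(uint8_t ver_ihl)
{
	return (ver_ihl & 0x0f) * 4;
}

/* The same value written as an explicit shift. */
static unsigned int ip_hdr_bytes_shift(uint8_t ver_ihl)
{
	return (unsigned int)(ver_ihl & 0x0f) << 2;
}
```

A first byte of 0x45 (version 4, ihl 5) gives 20 bytes with either function.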

* Re: [PATCH 1/2][RESEND] ehea: error handling improvement
From: David Miller @ 2010-04-22  5:36 UTC (permalink / raw)
  To: tklein; +Cc: netdev, linuxppc-dev, linux-kernel, themann
In-Reply-To: <201004211110.55986.tklein@de.ibm.com>

From: Thomas Klein <tklein@de.ibm.com>
Date: Wed, 21 Apr 2010 11:10:55 +0200

> Reset a port's resources only if they're actually in an error state
> 
> Signed-off-by: Thomas Klein <tklein@de.ibm.com>
> ---
> 
> Patch created against net-2.6

I thought you were sorry for wasting my time and that you were going
to follow the directions I gave you last time, and I quote:

--------------------
3) These are not appropriate for net-2.6 as we are deep in
   the -rcX series at this point and only the most diabolical
   bug fixes are appropriate.  Therefore, please generate these
   against net-next-2.6, thanks.
--------------------

And here you are generating your patches against net-2.6.  Heck, you
even feel it's worth mentioning explicitly.

Lucky for you the patches happen to apply cleanly to net-next-2.6 so
I've put them there.

* Re: IPv6 duplicate address detection erroneously marking address as duplicate when a host receives its own multicast packets?
From: David Miller @ 2010-04-22  5:30 UTC (permalink / raw)
  To: herbert; +Cc: brian.haley, sam.cannell, netdev
In-Reply-To: <20100422024140.GA7215@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 22 Apr 2010 10:41:40 +0800

> Brian Haley <brian.haley@hp.com> wrote:
>> 
>> Well, my initial reaction is XVM is doing the wrong thing looping-back
>> multicast packets.  You can try the following (untested) patch, I can
>> only confirm it compiles.
> 
> I agree, whatever is looping the packet back should be fixed.

Ethernet does not send multicasts to itself, so we're definitely not
going to cater to this XVM behavior.

* Re: [PATCH] tcp: fix outsegs stat for TSO segments
From: David Miller @ 2010-04-22  5:28 UTC (permalink / raw)
  To: therbert; +Cc: netdev
In-Reply-To: <alpine.DEB.1.00.1004212214110.14731@pokey.mtv.corp.google.com>

From: Tom Herbert <therbert@google.com>
Date: Wed, 21 Apr 2010 22:17:24 -0700 (PDT)

>  	if (after(tcb->end_seq, tp->snd_nxt) || tcb->seq == tcb->end_seq)
> -		TCP_INC_STATS(sock_net(sk), TCP_MIB_OUTSEGS);
> +		TCP_ADD_STATS(sock_net(sk), TCP_MIB_OUTSEGS,
> +		    tcp_skb_pcount(skb));

Please follow proper coding style and make the new line
with the 'tcp_skb_pcount(skb)' argument line up with
the start of the macro arguments on the previous line.

* Re: [PATCH] net: change recvfrom to return same address length as getsockname on unnamed unix sockets
From: David Miller @ 2010-04-22  5:26 UTC (permalink / raw)
  To: ppergame; +Cc: netdev, linux-kernel
In-Reply-To: <v2x7447a0ac1004212029qd1866eaekc769fee5b13ac09d@mail.gmail.com>

From: Pavel Pergamenshchik <ppergame@gmail.com>
Date: Wed, 21 Apr 2010 20:29:25 -0700

> unix_*_recvmsg() returns zero-length sockaddr if the sender is an
> unnamed AF_UNIX socket. Change it to return a two-byte sockaddr with
> just the address family, to be consistent with unix_getname().
> 
> Signed-off-by: Pavel Pergamenshchik <ppergame@gmail.com>

Since we've behaved this way for at least 10 years, the existing
behavior is the user visible ABI and the risk of breaking things by
making the change is too great.

I'm not applying this, sorry.

* [PATCH] tcp: fix outsegs stat for TSO segments
From: Tom Herbert @ 2010-04-22  5:17 UTC (permalink / raw)
  To: davem, netdev

Account for TSO segments of an skb in TCP_MIB_OUTSEGS counter.  Without
this, the counter can be off by orders of magnitude from the
actual number of segments sent.

Signed-off-by: Tom Herbert <therbert@google.com>
---
diff --git a/include/net/snmp.h b/include/net/snmp.h
index 884fdbb..92456f1 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -133,6 +133,8 @@ struct linux_xfrm_mib {
 			__this_cpu_add(mib[0]->mibs[field], addend)
 #define SNMP_ADD_STATS_USER(mib, field, addend)	\
 			this_cpu_add(mib[1]->mibs[field], addend)
+#define SNMP_ADD_STATS(mib, field, addend)	\
+			this_cpu_add(mib[0]->mibs[field], addend)
 /*
  * Use "__typeof__(*mib[0]) *ptr" instead of "__typeof__(mib[0]) ptr"
  * to make @ptr a non-percpu pointer.
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 70c5159..91640fe 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -294,6 +294,7 @@ extern struct proto tcp_prot;
 #define TCP_INC_STATS_BH(net, field)	SNMP_INC_STATS_BH((net)->mib.tcp_statistics, field)
 #define TCP_DEC_STATS(net, field)	SNMP_DEC_STATS((net)->mib.tcp_statistics, field)
 #define TCP_ADD_STATS_USER(net, field, val) SNMP_ADD_STATS_USER((net)->mib.tcp_statistics, field, val)
+#define TCP_ADD_STATS(net, field, val)	SNMP_ADD_STATS((net)->mib.tcp_statistics, field, val)
 
 extern void			tcp_v4_err(struct sk_buff *skb, u32);
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2b7d71f..f89fadc 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -888,7 +888,8 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 		tcp_event_data_sent(tp, skb, sk);
 
 	if (after(tcb->end_seq, tp->snd_nxt) || tcb->seq == tcb->end_seq)
-		TCP_INC_STATS(sock_net(sk), TCP_MIB_OUTSEGS);
+		TCP_ADD_STATS(sock_net(sk), TCP_MIB_OUTSEGS,
+		    tcp_skb_pcount(skb));
 
 	err = icsk->icsk_af_ops->queue_xmit(skb);
 	if (likely(err <= 0))
@@ -2503,7 +2504,7 @@ struct sk_buff *tcp_make_synack(struct sock *sk, struct dst_entry *dst,
 	th->window = htons(min(req->rcv_wnd, 65535U));
 	tcp_options_write((__be32 *)(th + 1), tp, &opts);
 	th->doff = (tcp_header_size >> 2);
-	TCP_INC_STATS(sock_net(sk), TCP_MIB_OUTSEGS);
+	TCP_ADD_STATS(sock_net(sk), TCP_MIB_OUTSEGS, tcp_skb_pcount(skb));
 
 #ifdef CONFIG_TCP_MD5SIG
 	/* Okay, we have all we need - do the md5 hash if needed */
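
Why the counter drifts can be seen with a little arithmetic. The sketch below
is an assumption-laden illustration: seg_count() is a hypothetical helper
that computes the number of on-wire segments from a length and an MSS,
whereas the kernel's tcp_skb_pcount() reads a count already stored with the
skb.

```c
#include <assert.h>

/* Number of on-wire segments a payload of len bytes is split into for a
 * given MSS; a zero-length skb (for example a pure ACK) counts as one. */
static unsigned int seg_count(unsigned int len, unsigned int mss)
{
	assert(mss > 0);
	if (len <= mss)
		return 1;
	return (len + mss - 1) / mss;	/* round up */
}
```

With an MSS of 1448 bytes, a single 65160-byte TSO skb leaves the host as 45
segments, so incrementing the counter once per skb undercounts by a factor of
45 for that skb.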

* [PATCH] net: change recvfrom to return same address length as getsockname on unnamed unix sockets
From: Pavel Pergamenshchik @ 2010-04-22  3:29 UTC (permalink / raw)
  To: netdev, davem; +Cc: linux-kernel

unix_*_recvmsg() returns zero-length sockaddr if the sender is an
unnamed AF_UNIX socket. Change it to return a two-byte sockaddr with
just the address family, to be consistent with unix_getname().

Signed-off-by: Pavel Pergamenshchik <ppergame@gmail.com>

---
Minimal example at http://xzrq.net/uaddrwtf.c
Solaris/OS X print 16 and 16. Linux prints 0 and 2 as described above.

--- a/net/unix/af_unix.c	2010-04-01 16:02:33.000000000 -0700
+++ b/net/unix/af_unix.c	2010-04-21 20:17:43.564703748 -0700
@@ -1634,9 +1634,13 @@
 static void unix_copy_addr(struct msghdr *msg, struct sock *sk)
 {
 	struct unix_sock *u = unix_sk(sk);
+	struct sockaddr_un *sunaddr;

-	msg->msg_namelen = 0;
-	if (u->addr) {
+	if (!u->addr) {
+		msg->msg_namelen = sizeof(short);
+		sunaddr = msg->msg_name;
+		sunaddr->sun_family = AF_UNIX;
+	} else {
 		msg->msg_namelen = u->addr->len;
 		memcpy(msg->msg_name, u->addr->name, u->addr->len);
 	}
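
The two lengths being compared can be observed from user space with a sketch
like the one below. It is not the program linked above, whose contents are
not reproduced here; it assumes a POSIX system and uses socketpair() so that
both sockets are unnamed. unnamed_addr_lens() is a hypothetical helper name.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Sends one datagram between two unnamed AF_UNIX sockets and reports the
 * address lengths returned by recvfrom() and getsockname().
 * Returns 0 on success, -1 on any socket error. */
static int unnamed_addr_lens(socklen_t *recv_len, socklen_t *name_len)
{
	struct sockaddr_un from, self;
	char buf[8];
	int sv[2], ret = -1;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	*recv_len = sizeof(from);
	*name_len = sizeof(self);
	memset(&from, 0, sizeof(from));
	memset(&self, 0, sizeof(self));

	if (send(sv[0], "x", 1, 0) == 1 &&
	    recvfrom(sv[1], buf, sizeof(buf), 0,
		     (struct sockaddr *)&from, recv_len) == 1 &&
	    getsockname(sv[0], (struct sockaddr *)&self, name_len) == 0)
		ret = 0;

	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

Printing *recv_len and *name_len after a successful call gives the pair of
numbers the message above refers to; the values depend on the operating
system.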

* Re: IPv6 duplicate address detection erroneously marking address as duplicate when a host receives its own multicast packets?
From: Herbert Xu @ 2010-04-22  2:41 UTC (permalink / raw)
  To: Brian Haley; +Cc: sam.cannell, netdev
In-Reply-To: <4BCFA615.8060205@hp.com>

Brian Haley <brian.haley@hp.com> wrote:
> 
> Well, my initial reaction is XVM is doing the wrong thing looping-back
> multicast packets.  You can try the following (untested) patch, I can
> only confirm it compiles.

I agree, whatever is looping the packet back should be fixed.

And if we are going to filter them out at our end, then it should
be done below IP.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: Bug#577640: linux-image-2.6.33-2-amd64: Kernel warnings in netns  thread
From: Eric W. Biederman @ 2010-04-22  2:38 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Martín Ferrari, 577640, netdev, Eric W. Biederman,
	Alexey Dobriyan, Mathieu Lacage
In-Reply-To: <1271895278.2582.3.camel@localhost>

Ben Hutchings <ben@decadent.org.uk> writes:

> On Wed, 2010-04-21 at 12:36 -0700, Eric W. Biederman wrote:
>> Martín Ferrari <martin.ferrari@gmail.com> writes:
>> 
>> > I'm not starting a new thread/bug, as this is probably related...
>> >
>> > I just discovered that in 2.6.33, if I create a veth inside a
>> > namespace and then move one of the halves into the main namespace,
>> > when I kill the namespace, I get one of these warnings followed by an
>> > oops. This does not happen if the veth is created from the main ns and
>> > then moved, nor in 2.6.32. This happens both in Qemu and on real
>> > hardware (both amd64)
>> >
>> > To reproduce:
>> >
>> > $ sudo ./startns bash
>> > # ip l a type veth
>> > # ip l s veth0 netns 1
>> > # exit
>> 
>> Nasty weird. I did a quick test here, and I'm not seeing that.
>> Does the 2.6.33 experimental kernel have any patches applied?
>
> Yes, but not many beyond the stable updates, and nothing in this area.
> You can see the list at:
> http://svn.debian.org/wsvn/kernel/dists/trunk/linux-2.6/debian/patches/series/base

Then I should ask what is startns?

Either that is doing something different from my equivalent program, or I have
patches to fix this, that just haven't been merged yet.

Eric

* Re: IPv6: race condition in __ipv6_ifa_notify() and dst_free() ?
From: Herbert Xu @ 2010-04-22  2:32 UTC (permalink / raw)
  To: Jiri Bohac; +Cc: Hideaki YOSHIFUJI, netdev, David Miller, Stephen Hemminger
In-Reply-To: <20100421213429.GA2799@midget.suse.cz>

On Wed, Apr 21, 2010 at 11:34:29PM +0200, Jiri Bohac wrote:
> Hi,
> 
> On Tue, Apr 20, 2010 at 07:44:01PM +0200, Jiri Bohac wrote:
> > What is the reason __ipv6_ifa_notify() calls dst_free() when
> > ip6_del_rt() fails? I don't see a way ip6_del_rt() could fail
> > with the dst still needing to be freed.
> 
> checked again and I still think that if ip6_del_rt() fails,
> ifp->rt must have been freed already. Anybody with a
> counterexample?

I agree with your diagnosis and the two duplicate NDISC messages
scenario sounds plausible.

Anyway, I think the root of the issue is the fact that NDISC is
calling addrconf_dad_failure with no locking whatsoever.  The
latter is not idempotent so some form of locking is needed.

This bug appears to have been around since the very start.

I'll dig deeper to see where we might be able to add some locks.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [PATCH v3] net: batch skb dequeueing from softnet input_pkt_queue
From: Changli Gao @ 2010-04-22  1:35 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Eric Dumazet, David S. Miller, netdev, jamal
In-Reply-To: <k2n65634d661004211623k3ce51c95o2c329529ce402eda@mail.gmail.com>

On Thu, Apr 22, 2010 at 7:23 AM, Tom Herbert <therbert@google.com> wrote:
>
> How about just using two input_pkt_queue's (define
> input_pkt_queue[2])?  One that is used to enqueue from RPS, and one
> that is being processed by process_backlog.  Then the only thing that
> needs to be done under lock in process_backlog is to switch the
> queues;  something like sd->current_input_pkt_queue ^= 1
>

It is a better idea, IMO.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
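
The two-queue idea quoted above can be sketched like this. Everything here is
hypothetical: the fixed-size arrays, the names, and the assumption that the
consumer has fully drained its private queue before the next flip. Locking is
left out; in the proposal only the flip would need the lock on the consumer
side.

```c
#define TQ_CAP 16

struct two_queues {		/* hypothetical layout */
	int items[2][TQ_CAP];	/* two fixed-size queues of packet ids */
	unsigned int len[2];
	unsigned int cur;	/* index of the queue producers fill */
};

/* Producer side: append to the current input queue. */
static int tq_enqueue(struct two_queues *q, int id)
{
	if (q->len[q->cur] == TQ_CAP)
		return -1;	/* full: the caller would drop the packet */
	q->items[q->cur][q->len[q->cur]++] = id;
	return 0;
}

/* Consumer side: flip the queues and return the index of the queue that
 * is now private to the consumer. */
static unsigned int tq_flip(struct two_queues *q)
{
	unsigned int done = q->cur;

	q->cur ^= 1;
	q->len[q->cur] = 0;	/* assumed already drained by the consumer */
	return done;
}
```

After tq_flip(), the consumer reads items[done][0..len[done]-1] at leisure
while producers fill the other queue.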

* Re: IPv6 duplicate address detection erroneously marking address as duplicate when a host receives its own multicast packets?
From: Brian Haley @ 2010-04-22  1:27 UTC (permalink / raw)
  To: Sam Cannell; +Cc: netdev
In-Reply-To: <1271880831.6685.6.camel@spathi>

Sam Cannell wrote:
> I've been having some trouble with ip6 duplicate address detection in a
> Linux VM (under XVM on OpenSolaris).  It seems that the ethernet bridge
> in XVM sends a host's own multicast packets back to it, which makes the
> duplicate address detection code in Linux decide that another host on
> the network is using the same address.
<snip>
>
> I'd happily put this down to a failing in XVM, however the stateless
> autoconfiguration RFC (4862) states that the stack shouldn't decide an
> address is duplicate based on receipt of a neighbor solicitation message
> that it sent itself:
<snip>
> 
> Assuming my understanding of the RFC is correct, this suggests to me
> that duplicate address detection in Linux is being a little too hasty to
> mark the address as invalid.  Thoughts?

Well, my initial reaction is XVM is doing the wrong thing looping-back
multicast packets.  You can try the following (untested) patch, I can
only confirm it compiles.

-Brian


Add a check for looped-back DAD packets on Ethernet interfaces.

Signed-off-by: Brian Haley <brian.haley@hp.com>

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index da0a4d2..33a7212 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -57,6 +57,7 @@
 #include <linux/net.h>
 #include <linux/in6.h>
 #include <linux/route.h>
+#include <linux/etherdevice.h>
 #include <linux/init.h>
 #include <linux/rcupdate.h>
 #include <linux/slab.h>
@@ -800,6 +801,16 @@ static void ndisc_recv_ns(struct sk_buff *skb)
 					}
 				}
 
+				if (dev->type == ARPHRD_ETHER) {
+					struct ethhdr *eth = eth_hdr(skb);
+					if (!compare_ether_addr_64bits(
+								dev->dev_addr,
+								eth->h_source)){
+						/* looped-back to us */
+						goto out;
+					}
+				}
+
 				/*
 				 * We are colliding with another node
 				 * who is doing DAD
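
The test added by the patch above boils down to comparing two 6-byte
addresses. A user-space sketch, with is_looped_back() as a hypothetical name
and memcmp() standing in for the kernel's compare_ether_addr_64bits():

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#ifndef ETH_ALEN
#define ETH_ALEN 6	/* bytes in an Ethernet MAC address */
#endif

/* True when a received frame's source MAC equals our own interface
 * address, i.e. the frame looks like our own transmission coming back. */
static bool is_looped_back(const uint8_t own[ETH_ALEN],
			   const uint8_t src[ETH_ALEN])
{
	return memcmp(own, src, ETH_ALEN) == 0;
}
```

A neighbor solicitation whose source MAC matches would be ignored for DAD
purposes; one from any other MAC would still count as a collision.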

* Re: rps performance WAS(Re: rps: question
From: Changli Gao @ 2010-04-22  1:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: hadi, Rick Jones, David Miller, therbert, netdev, robert, andi
In-Reply-To: <1271876480.7895.3106.camel@edumazet-laptop>

On Thu, Apr 22, 2010 at 3:01 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Thanks a lot Jamal, this is really useful
>
> Drawback of using a fixed src ip from your generator is that all flows
> share the same struct dst entry on SUT. This might explain some glitches
> you noticed (ip_route_input + ip_rcv at high level on slave/application
> cpus)
> Also note your test is one way. If some data was replied we would see
> much use of the 'flows'
>
> I notice epoll_ctl() used a lot, are you re-arming epoll each time you
> receive a datagram ?
>
> I see slave/application cpus hit _raw_spin_lock_irqsave() and
> _raw_spin_unlock_irqrestore().
>
> Maybe a ring buffer could help (instead of a doubly linked queue) for
> backlog, or the double queue trick, if Changli wants to respin his
> patch.
>
>

OK, I'll post a new patch against the current tree, so Jamal can give it
a try. I am sorry, but I don't have a suitable machine for benchmarking.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

* Re: [PATCH] mac8390: change an error return code and some cleanup, take 3
From: Finn Thain @ 2010-04-22  1:13 UTC (permalink / raw)
  To: David Miller; +Cc: joe, p_gortmaker, netdev, linux-kernel, linux-m68k
In-Reply-To: <20100421.163041.158540277.davem@davemloft.net>


On Wed, 21 Apr 2010, David Miller wrote:

> From: Finn Thain <fthain@telegraphics.com.au>
> Date: Sat, 17 Apr 2010 13:16:04 +1000 (EST)
> 
> > 
> > Change an error return code from -EAGAIN to -EBUSY since the former is 
> > misleading.
> > 
> > Nubus slots are geographically addressed and their irqs are equally 
> > inflexible. -EAGAIN is misleading because retrying will not help fix 
> > whatever bug it was that made the irq unavailable.
> 
> request_irq() itself returns an appropriate error code, so the
> correct change is to do:
> 
> 	err = request_irq( ... );
> 	if (err) {
> 	...
> 
> and return 'err'.

OK. I'll send a new patch once 2.6.34 is out and I have time to test this 
and some other patches.

Finn
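
The pattern David describes, returning the callee's own error code instead of
a hard-coded one, looks roughly like this. fake_request_irq() and fake_open()
are hypothetical stand-ins; the real request_irq() takes more arguments and
is not reproduced here.

```c
#include <errno.h>

/* Hypothetical stand-in for request_irq(): 0 on success, a negative
 * errno value on failure. IRQ 7 is pretended to be taken already. */
static int fake_request_irq(int irq)
{
	if (irq < 0)
		return -EINVAL;
	if (irq == 7)
		return -EBUSY;
	return 0;
}

/* Return whatever the callee reported instead of a fixed code. */
static int fake_open(int irq)
{
	int err;

	err = fake_request_irq(irq);
	if (err)
		return err;
	return 0;
}
```

The caller then sees -EBUSY or -EINVAL as appropriate, without the driver
having to guess which code fits.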

* Re: Bug#577640: linux-image-2.6.33-2-amd64: Kernel warnings in netns  thread
From: Ben Hutchings @ 2010-04-22  0:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Martín Ferrari, 577640, netdev, Eric W. Biederman,
	Alexey Dobriyan, Mathieu Lacage
In-Reply-To: <m1ljcgzjh4.fsf@fess.ebiederm.org>

On Wed, 2010-04-21 at 12:36 -0700, Eric W. Biederman wrote:
> Martín Ferrari <martin.ferrari@gmail.com> writes:
> 
> > I'm not starting a new thread/bug, as this is probably related...
> >
> > I just discovered that in 2.6.33, if I create a veth inside a
> > namespace and then move one of the halves into the main namespace,
> > when I kill the namespace, I get one of these warnings followed by an
> > oops. This does not happen if the veth is created from the main ns and
> > then moved, nor in 2.6.32. This happens both in Qemu and on real
> > hardware (both amd64)
> >
> > To reproduce:
> >
> > $ sudo ./startns bash
> > # ip l a type veth
> > # ip l s veth0 netns 1
> > # exit
> 
> Nasty weird. I did a quick test here, and I'm not seeing that.
> Does the 2.6.33 experimental kernel have any patches applied?

Yes, but not many beyond the stable updates, and nothing in this area.
You can see the list at:
http://svn.debian.org/wsvn/kernel/dists/trunk/linux-2.6/debian/patches/series/base

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
