Netdev List


* Re: [PATCH net-next-2.6] net: sk_dst_cache RCUification
From: Eric Dumazet @ 2010-04-14  5:47 UTC
  To: David Miller; +Cc: netdev, paulmck
In-Reply-To: <1271223325.16881.600.camel@edumazet-laptop>

On Wednesday 14 April 2010 at 07:35 +0200, Eric Dumazet wrote:
> On Tuesday 13 April 2010 at 16:11 -0700, David Miller wrote:
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Wed, 14 Apr 2010 01:04:05 +0200
> > 
> > > Instead of using RCU on the whole "struct socket", my plan is to use a small
> > > structure:
> > > 
> > > struct wait_queue_head_rcu {
> > > 	wait_queue_head_t wait;
> > > 	struct rcu_head	  rcu;
> > > } ____cacheline_aligned_in_smp;
> > > 
> > > and make sk->sk_sleep point to this 'wait' field.
> > 
> > So you're relying upon the fact that in the non-FASYNC case
> > the struct socket's wait queue is never actually used?
> 
> Yes, for the first phase of my work, but async handling might be RCUified
> too in a second phase :)

Oh well, I did not really understand your question, David; please ignore
the answer (I need to fully wake up first...)

* Re: [PATCH net-next-2.6] net: sk_dst_cache RCUification
From: Eric Dumazet @ 2010-04-14  5:35 UTC
  To: David Miller; +Cc: netdev, paulmck
In-Reply-To: <20100413.161153.135265420.davem@davemloft.net>

On Tuesday 13 April 2010 at 16:11 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 14 Apr 2010 01:04:05 +0200
> 
> > Instead of using RCU on the whole "struct socket", my plan is to use a small
> > structure:
> > 
> > struct wait_queue_head_rcu {
> > 	wait_queue_head_t wait;
> > 	struct rcu_head	  rcu;
> > } ____cacheline_aligned_in_smp;
> > 
> > and make sk->sk_sleep point to this 'wait' field.
> 
> So you're relying upon the fact that in the non-FASYNC case
> the struct socket's wait queue is never actually used?

Yes, for the first phase of my work, but async handling might be RCUified
too in a second phase :)

* Re: [PATCH] fix potential wild pointer when NIC is dying
From: Eric Dumazet @ 2010-04-14  5:33 UTC
  To: Changli Gao; +Cc: David S. Miller, Tom Herbert, Herbert Xu, netdev
In-Reply-To: <1271247503-2973-1-git-send-email-xiaosuo@gmail.com>

On Wednesday 14 April 2010 at 20:18 +0800, Changli Gao wrote:
> Fix a potential wild pointer when the NIC is dying.
> 
> flush_backlog() relies on the assumption that the NIC no longer enqueues
> packets to the kernel, so a packet can only be in one of two places: the
> softnet queue, or being processed in the net-rx softirq. flush_backlog()
> drops packets of the first kind; for the latter, a grace period is used to
> wait for their processing to finish.
> 
> This always works without RPS. With RPS, even though the NIC no longer
> enqueues packets, RPS may still do so. A grace period may pass (because the
> softirq hit its running-time limit) while packets that refer to the dead NIC
> are still pending, and RPS can then enqueue them after flush_backlog()
> returns.
> 

I don't see how the problem can happen, or how RPS is involved.

Did you get even a single panic? Could you provide us with a stack trace?

Maybe you are referring to NAPI?

NAPI processes packets delivered by the NIC and, through RPS, delivers them
to a (possibly) remote CPU queue.

But at device dismantle time, we should stop NAPI on this device and the
packet delivery machinery. RPS on or not, NAPI won't deliver new packets.
The fact that NAPI can be throttled doesn't change that the napi instance
is disabled at this point. No more packets will be delivered (RPS or not).

Only after this point do we call flush_backlog() to make sure we don't have
any queued packets in any CPU's input_pkt_queue pointing to the device we
are dismantling.

RPS doesn't change this at all.

Hmm ???
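The flush step described here, dropping every queued packet that still points at the departing device, is a "walk safely while unlinking" loop: the next element is remembered before the current one may be freed, which is what the extra tmp cursor of skb_queue_walk_safe() is for. A minimal single-threaded sketch in plain C, assuming a simple singly linked queue; struct pkt and drop_packets_for_dev() are hypothetical names, and the locking that rps_lock() provides in the kernel is deliberately left out.

```c
#include <stdlib.h>

/* Hypothetical queued packet: only the fields the walk needs. */
struct pkt {
	struct pkt *next;
	const void *dev;	/* device this packet refers to */
};

/* Unlink and free every packet whose dev matches, keeping the others in
 * order. Returns the number of packets dropped. No locking: the caller is
 * assumed to have exclusive access to the queue. */
static int drop_packets_for_dev(struct pkt **head, const void *dev)
{
	struct pkt **link = head;	/* pointer to the slot that points at cur */
	struct pkt *cur, *next;
	int dropped = 0;

	for (cur = *head; cur != NULL; cur = next) {
		next = cur->next;	/* saved before cur may be freed */
		if (cur->dev == dev) {
			*link = next;	/* unlink cur */
			free(cur);
			dropped++;
		} else {
			link = &cur->next;
		}
	}
	return dropped;
}
```

Reading cur->next before free(cur) is the whole point; a plain walk would dereference freed memory on the next iteration.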



> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ----
>  net/core/dev.c |   24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
> diff --git a/net/core/dev.c b/net/core/dev.c
> index a10a216..fe4a821 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -131,6 +131,7 @@
>  #include <linux/random.h>
>  #include <trace/events/napi.h>
>  #include <linux/pci.h>
> +#include <linux/stop_machine.h>
>  
>  #include "net-sysfs.h"
>  
> @@ -2791,19 +2792,24 @@ int netif_receive_skb(struct sk_buff *skb)
>  EXPORT_SYMBOL(netif_receive_skb);
>  
>  /* Network device is going away, flush any packets still pending  */
> -static void flush_backlog(void *arg)
> +static int flush_backlog(void *arg)
>  {
>  	struct net_device *dev = arg;
> -	struct softnet_data *queue = &__get_cpu_var(softnet_data);
>  	struct sk_buff *skb, *tmp;
> +	struct softnet_data *queue;
> +	int cpu;
>  
> -	rps_lock(queue);
> -	skb_queue_walk_safe(&queue->input_pkt_queue, skb, tmp)
> -		if (skb->dev == dev) {
> -			__skb_unlink(skb, &queue->input_pkt_queue);
> -			kfree_skb(skb);
> +	for_each_online_cpu(cpu) {
> +		queue = &per_cpu(softnet_data, cpu);
> +		skb_queue_walk_safe(&queue->input_pkt_queue, skb, tmp) {
> +			if (skb->dev == dev) {
> +				__skb_unlink(skb, &queue->input_pkt_queue);
> +				kfree_skb(skb);
> +			}
>  		}
> -	rps_unlock(queue);
> +	}
> +
> +	return 0;
>  }
>  
>  static int napi_gro_complete(struct sk_buff *skb)
> @@ -5027,7 +5033,7 @@ void netdev_run_todo(void)
>  
>  		dev->reg_state = NETREG_UNREGISTERED;
>  
> -		on_each_cpu(flush_backlog, dev, 1);
> +		stop_machine(flush_backlog, dev, NULL);
>  
>  		netdev_wait_allrefs(dev);
>  
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

* Bug#572201: forcedeth driver hangs under heavy load
From: Ayaz Abdulla @ 2010-04-14  5:33 UTC
  To: Eric Dumazet
  Cc: David Miller, smulcahy@gmail.com, bhutchings@solarflare.com,
	netdev@vger.kernel.org, ben@decadent.org.uk,
	572201@bugs.debian.org
In-Reply-To: <1271195179.16881.575.camel@edumazet-laptop>

[-- Attachment #1: Type: text/plain, Size: 484 bytes --]

Attached fix has been submitted to netdev.

Ayaz


Eric Dumazet wrote:
> On Tuesday 13 April 2010 at 14:43 -0700, David Miller wrote:
> 
>>Do you really come to the conclusion that TSO is broken with the above
>>test results?
>>
>>I would conclude that there is a TX checksumming issue, since merely
>>turning TSO off does not fix the problem whereas turning TX
>>checksumming off does.
> 
> 
> Indeed, we clarified the point and it is a TX checksum issue.
> 
> 

[-- Attachment #2: patch-forcedeth-tx-limit2-fix --]
[-- Type: text/plain, Size: 462 bytes --]

--- old/drivers/net/forcedeth.c	2010-04-14 01:18:51.000000000 -0400
+++ new/drivers/net/forcedeth.c	2010-04-14 01:20:40.000000000 -0400
@@ -5901,7 +5901,7 @@
 	/* Limit the number of tx's outstanding for hw bug */
 	if (id->driver_data & DEV_NEED_TX_LIMIT) {
 		np->tx_limit = 1;
-		if ((id->driver_data & DEV_NEED_TX_LIMIT2) &&
+		if (((id->driver_data & DEV_NEED_TX_LIMIT2) == DEV_NEED_TX_LIMIT2) &&
 		    pci_dev->revision >= 0xA2)
 			np->tx_limit = 0;
 	}


* Bug#572201: [PATCH] forcedeth: fix tx limit2 flag check
From: Ayaz Abdulla @ 2010-04-14  5:31 UTC
  To: David Miller
  Cc: eric.dumazet@gmail.com, smulcahy@gmail.com,
	bhutchings@solarflare.com, netdev@vger.kernel.org,
	ben@decadent.org.uk, 572201@bugs.debian.org
In-Reply-To: <20100413.144340.138717714.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 247 bytes --]

This patch fixes the TX_LIMIT2 feature flag check. The previous check for
TX_LIMIT2 also matched a device that had only TX_LIMIT set.

Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>

This is a fix for bug 572201 @ bugs.debian.org





[-- Attachment #2: patch-forcedeth-tx-limit2-fix --]
[-- Type: text/plain, Size: 462 bytes --]

--- old/drivers/net/forcedeth.c	2010-04-14 01:18:51.000000000 -0400
+++ new/drivers/net/forcedeth.c	2010-04-14 01:20:40.000000000 -0400
@@ -5901,7 +5901,7 @@
 	/* Limit the number of tx's outstanding for hw bug */
 	if (id->driver_data & DEV_NEED_TX_LIMIT) {
 		np->tx_limit = 1;
-		if ((id->driver_data & DEV_NEED_TX_LIMIT2) &&
+		if (((id->driver_data & DEV_NEED_TX_LIMIT2) == DEV_NEED_TX_LIMIT2) &&
 		    pci_dev->revision >= 0xA2)
 			np->tx_limit = 0;
 	}
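The difference between the two checks can be shown with a minimal standalone C sketch. It assumes, as the patch description implies, that DEV_NEED_TX_LIMIT2 contains the DEV_NEED_TX_LIMIT bit plus at least one more bit; the numeric values below are illustrative assumptions, not copied from forcedeth.c. `flags & MASK` is true when any bit of MASK is set, whereas `(flags & MASK) == MASK` requires all of them.

```c
/* Illustrative values (assumption): LIMIT2 = the LIMIT bit + one extra bit. */
#define DEV_NEED_TX_LIMIT  0x080000UL
#define DEV_NEED_TX_LIMIT2 0x180000UL

/* Old check: true if ANY bit of DEV_NEED_TX_LIMIT2 is set, so a device
 * flagged only with DEV_NEED_TX_LIMIT matches too. */
static int has_tx_limit2_old(unsigned long driver_data)
{
	return (driver_data & DEV_NEED_TX_LIMIT2) != 0;
}

/* Fixed check: true only if ALL bits of DEV_NEED_TX_LIMIT2 are set. */
static int has_tx_limit2_new(unsigned long driver_data)
{
	return (driver_data & DEV_NEED_TX_LIMIT2) == DEV_NEED_TX_LIMIT2;
}
```

With these values, a device flagged only with DEV_NEED_TX_LIMIT passes the old test but not the new one, which is the misclassification the patch removes.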


* Re: [PATCH] Infiniband: Randomize local port allocation.
From: Cong Wang @ 2010-04-14  4:38 UTC
  To: penguin-kernel
  Cc: rolandd, sean.hefty, opurdila, eric.dumazet, netdev, nhorman,
	davem, ebiederm, linux-kernel
In-Reply-To: <201004140201.o3E21Aqn075978@www262.sakura.ne.jp>

penguin-kernel@i-love.sakura.ne.jp wrote:
> Sean Hefty wrote:
>> Sean and Roland, is below patch correct?
>>> inet_is_reserved_local_port() is the new function proposed in this patchset.
>> It looks correct to me.  I didn't test the patch series, but if I comment out
>> the call to inet_is_reserved_local_port() in the patch provided below, the changes
>> worked fine for me.
>>
>> Acked-by: Sean Hefty <sean.hefty@intel.com>
>>
> Thank you for testing.
> 
> I think it is better to split this patch into
> 
> Part 1: Make cma_alloc_any_port() use cma_alloc_port().
> 
> Part 2: Insert "!inet_is_reserved_local_port(rover) &&" line.
> 
> for future "git bisect".
> 

Right, thanks a lot for your work!

So I will rebase my patch 3/3 on top of this patch. I hope someone
can pick this one up ASAP.


* Re: [PATCH] fix potential wild pointer when NIC is dying
From: Changli Gao @ 2010-04-14  4:24 UTC
  To: Joe Perches
  Cc: David S. Miller, Tom Herbert, Eric Dumazet, Herbert Xu, netdev
In-Reply-To: <1271218986.1555.29.camel@Joe-Laptop.home>

On Wed, Apr 14, 2010 at 12:23 PM, Joe Perches <joe@perches.com> wrote:
> On Wed, 2010-04-14 at 20:18 +0800, Changli Gao wrote:
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index a10a216..fe4a821 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
> []
>> -static void flush_backlog(void *arg)
>> +static int flush_backlog(void *arg)
>
> Why change this to return int?
>
>> +     return 0;
>
> It seems to always return 0.
>
>

It is there to keep stop_machine() happy, which expects a callback returning int:
int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
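The point is a signature mismatch: on_each_cpu() takes a callback returning void, stop_machine() takes one returning int. A minimal sketch of that difference in plain C; run_callback() is a hypothetical stand-in for stop_machine(), not the kernel function, and the patch itself changes flush_backlog() directly rather than wrapping it as done here.

```c
/* Hypothetical stand-in for stop_machine(): accepts only callbacks of type
 * int (*)(void *) and returns whatever the callback returns. */
typedef int (*machine_fn_t)(void *);

static int run_callback(machine_fn_t fn, void *data)
{
	return fn(data);
}

/* Old-style callback, shaped like an on_each_cpu() function: void result. */
static void drain_queue(void *arg)
{
	int *queued = arg;

	*queued = 0;	/* pretend every queued packet was dropped */
}

/* Same work with the int-returning signature; 0 means success. */
static int drain_queue_for_stop_machine(void *arg)
{
	drain_queue(arg);
	return 0;
}
```

Passing drain_queue itself to run_callback() would be a type error, which is why the constant `return 0;` exists.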

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)


* Re: [PATCH] fix potential wild pointer when NIC is dying
From: Joe Perches @ 2010-04-14  4:23 UTC
  To: Changli Gao
  Cc: David S. Miller, Tom Herbert, Eric Dumazet, Herbert Xu, netdev
In-Reply-To: <1271247503-2973-1-git-send-email-xiaosuo@gmail.com>

On Wed, 2010-04-14 at 20:18 +0800, Changli Gao wrote:
> diff --git a/net/core/dev.c b/net/core/dev.c
> index a10a216..fe4a821 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
[]
> -static void flush_backlog(void *arg)
> +static int flush_backlog(void *arg)

Why change this to return int?

> +	return 0;

It seems to always return 0.


* [PATCH] fix potential wild pointer when NIC is dying
From: Changli Gao @ 2010-04-14 12:18 UTC
  To: David S. Miller
  Cc: Tom Herbert, Eric Dumazet, Herbert Xu, netdev, Changli Gao

Fix a potential wild pointer when the NIC is dying.

flush_backlog() relies on the assumption that the NIC no longer enqueues
packets to the kernel, so a packet can only be in one of two places: the
softnet queue, or being processed in the net-rx softirq. flush_backlog()
drops packets of the first kind; for the latter, a grace period is used to
wait for their processing to finish.

This always works without RPS. With RPS, even though the NIC no longer
enqueues packets, RPS may still do so. A grace period may pass (because the
softirq hit its running-time limit) while packets that refer to the dead NIC
are still pending, and RPS can then enqueue them after flush_backlog()
returns.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 net/core/dev.c |   24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index a10a216..fe4a821 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -131,6 +131,7 @@
 #include <linux/random.h>
 #include <trace/events/napi.h>
 #include <linux/pci.h>
+#include <linux/stop_machine.h>

 #include "net-sysfs.h"

@@ -2791,19 +2792,24 @@ int netif_receive_skb(struct sk_buff *skb)
 EXPORT_SYMBOL(netif_receive_skb);

 /* Network device is going away, flush any packets still pending  */
-static void flush_backlog(void *arg)
+static int flush_backlog(void *arg)
 {
 	struct net_device *dev = arg;
-	struct softnet_data *queue = &__get_cpu_var(softnet_data);
 	struct sk_buff *skb, *tmp;
+	struct softnet_data *queue;
+	int cpu;

-	rps_lock(queue);
-	skb_queue_walk_safe(&queue->input_pkt_queue, skb, tmp)
-		if (skb->dev == dev) {
-			__skb_unlink(skb, &queue->input_pkt_queue);
-			kfree_skb(skb);
+	for_each_online_cpu(cpu) {
+		queue = &per_cpu(softnet_data, cpu);
+		skb_queue_walk_safe(&queue->input_pkt_queue, skb, tmp) {
+			if (skb->dev == dev) {
+				__skb_unlink(skb, &queue->input_pkt_queue);
+				kfree_skb(skb);
+			}
 		}
-	rps_unlock(queue);
+	}
+
+	return 0;
 }

 static int napi_gro_complete(struct sk_buff *skb)
@@ -5027,7 +5033,7 @@ void netdev_run_todo(void)

 		dev->reg_state = NETREG_UNREGISTERED;

-		on_each_cpu(flush_backlog, dev, 1);
+		stop_machine(flush_backlog, dev, NULL);

 		netdev_wait_allrefs(dev);


* kernel never returns rtm_protocol of RTPROT_RA?
From: Jeff Haran @ 2010-04-14  2:04 UTC
  To: netdev@vger.kernel.org

Hi,

Perhaps I am misreading the kernel sources, but it looks to me like an IPv6 default gateway that is discovered via receipt of a Router Advertisement will not be reported in a netlink RTM_NEWROUTE message with the rtm_protocol field == RTPROT_RA, even though that seems to be the defined purpose of RTPROT_RA. It seems such routes are reported with rtm_protocol == RTPROT_KERNEL instead.

This is what I observe reading the sources of our old kernel (2.6.14) and actually running the data through it, and later sources seem to do the same thing, though I haven't studied them as closely.

Is this the expected behavior?

Does anybody see any problems if I change rt6_fill_node() to put RTPROT_RA into rtm_protocol when rt6i_flags has both the RTF_ADDRCONF and RTF_DEFAULT bits set?

I am writing an application that distinguishes different sources of IPv6 routes and would like to be able to count on RTPROT_RA meaning the default router address originated from a Router Advertisement.
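The proposed rule can be sketched as a small helper. pick_rtm_protocol() is hypothetical and is not the actual rt6_fill_node() code. The constants are redefined locally, assuming the values used by the uapi headers <linux/rtnetlink.h> and <linux/ipv6_route.h>; real code should take them from those headers instead.

```c
/* Assumed to match <linux/rtnetlink.h> and <linux/ipv6_route.h>; redefined
 * here only so the sketch is self-contained. */
#define RTPROT_KERNEL 2
#define RTPROT_RA     9
#define RTF_DEFAULT   0x00010000U
#define RTF_ADDRCONF  0x00040000U

/* Hypothetical helper: choose the rtm_protocol to report for a route,
 * given its rt6i_flags and the protocol that would otherwise be used. */
static unsigned char pick_rtm_protocol(unsigned int rt6i_flags,
				       unsigned char fallback)
{
	const unsigned int ra_default = RTF_ADDRCONF | RTF_DEFAULT;

	/* Both bits set: a default router learned from a Router Advertisement. */
	if ((rt6i_flags & ra_default) == ra_default)
		return RTPROT_RA;
	return fallback;
}
```

Requiring both bits means an addrconf prefix route (RTF_ADDRCONF alone) keeps its original protocol value.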

Please include my email address in response as I do not subscribe to this list.

Thanks,

Jeff Haran
Brocade


* [PATCH] Infiniband: Randomize local port allocation.
From: penguin-kernel @ 2010-04-14  2:01 UTC
  To: rolandd, sean.hefty
  Cc: amwang, opurdila, eric.dumazet, netdev, nhorman, davem, ebiederm,
	linux-kernel
In-Reply-To: <21DAC78125424ED291B5D6477CFF9657@amr.corp.intel.com>

Sean Hefty wrote:
> Sean and Roland, is below patch correct?
> >inet_is_reserved_local_port() is the new function proposed in this patchset.
> 
> It looks correct to me.  I didn't test the patch series, but if I comment out
> the call to inet_is_reserved_local_port() in the patch provided below, the changes
> worked fine for me.
> 
> Acked-by: Sean Hefty <sean.hefty@intel.com>
> 
Thank you for testing.

I think it is better to split this patch into

Part 1: Make cma_alloc_any_port() use cma_alloc_port().

Part 2: Insert "!inet_is_reserved_local_port(rover) &&" line.

for future "git bisect".

Roland, will you review below patch for part 1?
--------------------
[PATCH] Infiniband: Randomize local port allocation.

Randomize local port allocation in the same way sctp_get_port_local() does.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/infiniband/core/cma.c |   69 ++++++++++++++----------------------------
 1 file changed, 24 insertions(+), 45 deletions(-)

--- linux-2.6.34-rc4.orig/drivers/infiniband/core/cma.c
+++ linux-2.6.34-rc4/drivers/infiniband/core/cma.c
@@ -79,7 +79,6 @@ static DEFINE_IDR(sdp_ps);
 static DEFINE_IDR(tcp_ps);
 static DEFINE_IDR(udp_ps);
 static DEFINE_IDR(ipoib_ps);
-static int next_port;
 
 struct cma_device {
 	struct list_head	list;
@@ -1970,47 +1969,32 @@ err1:
 
 static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
 {
-	struct rdma_bind_list *bind_list;
-	int port, ret, low, high;
-
-	bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
-	if (!bind_list)
-		return -ENOMEM;
-
-retry:
-	/* FIXME: add proper port randomization per like inet_csk_get_port */
-	do {
-		ret = idr_get_new_above(ps, bind_list, next_port, &port);
-	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
-
-	if (ret)
-		goto err1;
+	static unsigned int last_used_port;
+	int low, high, remaining;
+	unsigned int rover;
 
 	inet_get_local_port_range(&low, &high);
-	if (port > high) {
-		if (next_port != low) {
-			idr_remove(ps, port);
-			next_port = low;
-			goto retry;
+	remaining = (high - low) + 1;
+	rover = net_random() % remaining + low;
+	do {
+		rover++;
+		if ((rover < low) || (rover > high))
+			rover = low;
+		if (last_used_port != rover &&
+		    !idr_find(ps, (unsigned short) rover)) {
+			int ret = cma_alloc_port(ps, id_priv, rover);
+			/*
+			 * Remember previously used port number in order to
+			 * avoid re-using same port immediately after it is
+			 * closed.
+			 */
+			if (!ret)
+				last_used_port = rover;
+			if (ret != -EADDRNOTAVAIL)
+				return ret;
 		}
-		ret = -EADDRNOTAVAIL;
-		goto err2;
-	}
-
-	if (port == high)
-		next_port = low;
-	else
-		next_port = port + 1;
-
-	bind_list->ps = ps;
-	bind_list->port = (unsigned short) port;
-	cma_bind_port(bind_list, id_priv);
-	return 0;
-err2:
-	idr_remove(ps, port);
-err1:
-	kfree(bind_list);
-	return ret;
+	} while (--remaining > 0);
+	return -EADDRNOTAVAIL;
 }
 
 static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
@@ -2995,12 +2979,7 @@ static void cma_remove_one(struct ib_dev
 
 static int __init cma_init(void)
 {
-	int ret, low, high, remaining;
-
-	get_random_bytes(&next_port, sizeof next_port);
-	inet_get_local_port_range(&low, &high);
-	remaining = (high - low) + 1;
-	next_port = ((unsigned int) next_port % remaining) + low;
+	int ret;
 
 	cma_wq = create_singlethread_workqueue("rdma_cm");
 	if (!cma_wq)
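The search loop in the new cma_alloc_any_port() can be modeled in userspace as below. This is a sketch under stated assumptions: rand() stands in for net_random(), a bool array stands in for the idr_find() lookup, and -1 replaces -EADDRNOTAVAIL. The retry when cma_alloc_port() itself returns -EADDRNOTAVAIL is not modeled, and updating last_used is left to the caller.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Userspace model of the patched search loop. used[p - low] is true when
 * port p is taken (stand-in for idr_find()). Returns the chosen port, or
 * -1 when no usable port exists. */
static int pick_random_port(int low, int high, unsigned int last_used,
			    const bool *used)
{
	int remaining = (high - low) + 1;
	/* Random starting point in [low, high]; rand() replaces net_random(). */
	unsigned int rover = (unsigned int)(rand() % remaining) + (unsigned int)low;

	do {
		rover++;	/* advance first, as the patch does */
		if (rover < (unsigned int)low || rover > (unsigned int)high)
			rover = (unsigned int)low;	/* wrap around */
		/* Skip the port handed out last time and ports already in use. */
		if (rover != last_used && !used[rover - (unsigned int)low])
			return (int)rover;
	} while (--remaining > 0);
	return -1;
}
```

Because the loop runs (high - low) + 1 times with wrap-around, every port in the range is examined exactly once, whatever the random starting point is.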


* Re: linux-next: manual merge of the net tree with Linus' tree
From: David Miller @ 2010-04-14  2:00 UTC
  To: sfr; +Cc: netdev, linux-next, linux-kernel, ken_kawasaki, jpirko
In-Reply-To: <20100414115244.f97ca080.sfr@canb.auug.org.au>

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Wed, 14 Apr 2010 11:52:44 +1000

> Hi Dave,
> 
> On Tue, 13 Apr 2010 18:47:24 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
>>
>> Thanks a lot Stephen, I'll merge net-2.6 into net-next-2.6 to
>> fix this up for you.
> 
> Thanks.
> 
> There was another conflict in drivers/net/virtio_net.c (because there is
> a patch in both your tree and Linus' (via net-current)) that git did not
> quite resolve correctly.  The sg_init_table() in add_recvbuf_small() was
> reinserted by the automatic merge ... I removed it in my merge.

Yes, I expected that: the cherry-picked fix gets changed by a subsequent
commit in net-next-2.6 that changes where the scatterlist entries are
stored in that driver.

Anyway, thanks for the heads-up.


* Re: linux-next: manual merge of the net tree with Linus' tree
From: Stephen Rothwell @ 2010-04-14  1:52 UTC
  To: David Miller; +Cc: netdev, linux-next, linux-kernel, ken_kawasaki, jpirko
In-Reply-To: <20100413.184724.112842393.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 597 bytes --]

Hi Dave,

On Tue, 13 Apr 2010 18:47:24 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
>
> Thanks a lot Stephen, I'll merge net-2.6 into net-next-2.6 to
> fix this up for you.

Thanks.

There was another conflict in drivers/net/virtio_net.c (because there is
a patch in both your tree and Linus' (via net-current)) that git did not
quite resolve correctly.  The sg_init_table() in add_recvbuf_small() was
reinserted by the automatic merge ... I removed it in my merge.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]


* Re: linux-next: manual merge of the net tree with Linus' tree
From: David Miller @ 2010-04-14  1:47 UTC
  To: sfr; +Cc: netdev, linux-next, linux-kernel, ken_kawasaki, jpirko
In-Reply-To: <20100414114556.97d7583d.sfr@canb.auug.org.au>

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Wed, 14 Apr 2010 11:45:56 +1000

> Hi all,
> 
> Today's linux-next merge of the net tree got a conflict in
> drivers/net/pcmcia/smc91c92_cs.c between commit
> a6d37024de02e7cb2b2333e438e71355a9c32a0a ("smc91c92_cs: define
> multicast_table as unsigned char") from Linus' tree and commit
> 22bedad3ce112d5ca1eaf043d4990fa2ed698c87 ("net: convert multicast list to
> list_head") from the net tree.
> 
> I fixed it up (see below) and can carry the fix for a while.

Thanks a lot Stephen, I'll merge net-2.6 into net-next-2.6 to
fix this up for you.


* linux-next: manual merge of the net tree with Linus' tree
From: Stephen Rothwell @ 2010-04-14  1:45 UTC
  To: David Miller, netdev; +Cc: linux-next, linux-kernel, Ken Kawasaki, Jiri Pirko

Hi all,

Today's linux-next merge of the net tree got a conflict in
drivers/net/pcmcia/smc91c92_cs.c between commit
a6d37024de02e7cb2b2333e438e71355a9c32a0a ("smc91c92_cs: define
multicast_table as unsigned char") from Linus' tree and commit
22bedad3ce112d5ca1eaf043d4990fa2ed698c87 ("net: convert multicast list to
list_head") from the net tree.

I fixed it up (see below) and can carry the fix for a while.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc drivers/net/pcmcia/smc91c92_cs.c
index fd9d6e3,ad22676..0000000
--- a/drivers/net/pcmcia/smc91c92_cs.c
+++ b/drivers/net/pcmcia/smc91c92_cs.c
@@@ -1621,10 -1618,14 +1621,10 @@@ static void set_rx_mode(struct net_devi
  	rx_cfg_setting = RxStripCRC | RxEnable | RxAllMulti;
      else {
  	if (!netdev_mc_empty(dev)) {
- 	    struct dev_mc_list *mc_addr;
+ 	    struct netdev_hw_addr *ha;
  
- 	    netdev_for_each_mc_addr(mc_addr, dev) {
- 		u_int position = ether_crc(6, mc_addr->dmi_addr);
+ 	    netdev_for_each_mc_addr(ha, dev) {
+ 		u_int position = ether_crc(6, ha->addr);
 -#ifndef final_version		/* Verify multicast address. */
 -		if ((ha->addr[0] & 1) == 0)
 -		    continue;
 -#endif
  		multicast_table[position >> 29] |= 1 << ((position >> 26) & 7);
  	    }
  	}


* Re: forcedeth driver hangs under heavy load
From: David Miller @ 2010-04-14  1:41 UTC
  To: aabdulla; +Cc: eric.dumazet, smulcahy, bhutchings, netdev, ben, 572201
In-Reply-To: <4BC5539B.6050908@nvidia.com>

From: Ayaz Abdulla <aabdulla@nvidia.com>
Date: Wed, 14 Apr 2010 01:33:15 -0400

> Attached fix has been submitted to netdev.

Thanks!

I'll apply this soon.


* Re: [PATCH] tun: orphan an skb on tx
From: Herbert Xu @ 2010-04-14  0:58 UTC
  To: Eric Dumazet
  Cc: Michael S. Tsirkin, Jan Kiszka, David S. Miller, Paul Moore,
	David Woodhouse, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, qemu-devel
In-Reply-To: <1271183463.16881.545.camel@edumazet-laptop>

On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote:
>
> Herbert Acked your patch, so I guess it's OK, but I think it can be
> dangerous.

The tun socket accounting was never designed to stop it from
flooding another tun interface.  It's there to stop it from
transmitting above a destination interface's TX bandwidth and
causing unnecessary packet drops.  It also limits the total amount
of kernel memory that can be pinned down by a single tun interface.

In this case, all we're doing is shifting the accounting from the
"hardware" queue to the qdisc queue.

So your ability to flood a tun interface is essentially unchanged.

BTW we do the same thing in a number of hardware drivers, as well
as virtio-net.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH] Fix SCTP failure with ipv6 source address routing
From: Vlad Yasevich @ 2010-04-14  0:47 UTC
  To: Paul Gortmaker; +Cc: netdev
In-Reply-To: <1271198256-20477-1-git-send-email-paul.gortmaker@windriver.com>



Paul Gortmaker wrote:
> From: Weixing Shi <Weixing.Shi@windriver.com>
> 
> Given the below test case, using source address routing, SCTP
> does not work.
> 
> Node-A:
>   1)ifconfig eth0 inet6 add 2001:1::1/64
>   2)ip -6 rule add from 2001:1::1 table 100 pref 100
>   3)ip -6 route add 2001:2::1 dev eth0 table 100
>   4)sctp_darn -H 2001:1::1 -P 250 -l &
> 
> Node-B:
>   1)ifconfig eth0 inet6 add 2001:2::1/64
>   2)ip -6 rule add from 2001:2::1 table 100 pref 100
>   3)ip -6 route add 2001:1::1 dev eth0 table 100
>   4)sctp_darn -H 2001:2::1 -P 250 -h 2001:1::1 -p 250 -s
> 
> Root cause:
>   Node-A and Node-B use source address routing, and in the
>   beginning, the source address will be NULL.  So SCTP will search
>   the routing table by the destination address (because it is using
>   the source address routing table), and hence the resulting dst_entry
>   will be NULL.
> 
> Solution:
>   After SCTP gets the correct source address, then we search for
>   dst_entry again, and then we will get the correct value.

The problem here is that ipv6 route lookup code in sctp doesn't bother
searching for the source address, unlike the v4 route lookup code.

Compare sctp_v4_get_dst() and sctp_v6_get_dst().  The v4 version bends over
backwards trying to get the correct route, while the v6 version simply does
a single lookup and returns the result.

The v6 route lookup code needs to be fixed to take into account the bound
address list.
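One way to read "take into account the bound address list" is shown below as a toy model in C: a source-less lookup is tried first, and if it finds nothing, the lookup is retried once per bound address, as a source-routing rule would require. struct route, lookup() and get_dst() are hypothetical and much simpler than sctp_v4_get_dst(); the model only illustrates why a lookup without a source address misses a source-specific route.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical policy-routing entry: src == NULL matches any source,
 * otherwise the route only applies to packets from that source address. */
struct route {
	const char *src;
	const char *dst;
	const char *dev;
};

static const struct route *lookup(const struct route *tbl, size_t n,
				  const char *src, const char *dst)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(tbl[i].dst, dst) != 0)
			continue;
		if (tbl[i].src == NULL ||
		    (src != NULL && strcmp(tbl[i].src, src) == 0))
			return &tbl[i];
	}
	return NULL;
}

/* Try without a source first, then once per bound address. On success,
 * *saddr_out is the bound address that matched (or NULL if none needed). */
static const struct route *get_dst(const struct route *tbl, size_t n,
				   const char *dst,
				   const char *const *bound, size_t nbound,
				   const char **saddr_out)
{
	const struct route *rt = lookup(tbl, n, NULL, dst);

	*saddr_out = NULL;
	if (rt != NULL)
		return rt;
	for (size_t i = 0; i < nbound; i++) {
		rt = lookup(tbl, n, bound[i], dst);
		if (rt != NULL) {
			*saddr_out = bound[i];
			return rt;
		}
	}
	return NULL;
}
```

With only a "from 2001:1::1" route in the table, as in the test case from the patch, the first lookup fails and the route is found only once 2001:1::1 is tried as the source.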

-vlad
	
> 
> Signed-off-by: Weixing Shi <Weixing.Shi@windriver.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
>  net/sctp/transport.c |   11 +++++++++--
>  1 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sctp/transport.c b/net/sctp/transport.c
> index be4d63d..b5ae18c 100644
> --- a/net/sctp/transport.c
> +++ b/net/sctp/transport.c
> @@ -295,9 +295,16 @@ void sctp_transport_route(struct sctp_transport *transport,
>  
>  	if (saddr)
>  		memcpy(&transport->saddr, saddr, sizeof(union sctp_addr));
> -	else
> +	else {
>  		af->get_saddr(opt, asoc, dst, daddr, &transport->saddr);
> -
> +		/* When using source address routing, since dst was
> +		 * looked up prior to filling in the source address, dst
> +		 * needs to be looked up again to get the correct dst
> +		 */
> +		if (dst)
> +			dst_release(dst);
> +		dst = af->get_dst(asoc, daddr, &transport->saddr);
> +	}
>  	transport->dst = dst;
>  	if ((transport->param_flags & SPP_PMTUD_DISABLE) && transport->pathmtu) {
>  		return;


* Phylib polling when doing mdio_read will cause system response and transfer speed drop
From: Bryan Wu @ 2010-04-14  0:27 UTC
  To: afleming, davem; +Cc: netdev, LKML

Hi Andy and David,

After I posted a patch adding phylib support to drivers/net/fec.c, we found a
performance regression on the Freescale i.MX51 babbage board.

The patch is
http://git.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git;a=commitdiff;h=e6b043d512fa8d9a3801bf5d72bfa3b8fc3b3cc8.

The bug tracker entry is here:
https://bugs.launchpad.net/ubuntu/+source/linux-fsl-imx51/+bug/546649

I found that the root cause is the polling operation in the mdio_read function.
When we transfer large files, the polling times out many times. So I have
several questions:
1. Should I return -ETIMEDOUT when polling times out? If I don't return
-ETIMEDOUT, performance improves a lot. Looking at other drivers, some don't
return anything, some return 0, and some return a negative value. What's the
rule for this mdio_read polling timeout case?

2. How should the busy-wait polling be done? Normally we won't busy-wait very
long when polling, but hardware doesn't always behave perfectly. Running
cpu_relax() 10000 times in the polling loop makes our system very unresponsive
when the hardware doesn't set the flag as we expect. Maybe udelay(25) 10 times
or msleep(1) 10 times would be better than that.
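The bounded polling suggested in question 2 might look like the sketch below. It is a userspace model, not kernel code: delay_us() uses nanosleep() in place of udelay(), and struct fake_mii with fake_mii_done() is a hypothetical register that reports completion after a set number of reads. The loop gives up after max_tries and returns -ETIMEDOUT; whether mdio_read should propagate that value is exactly question 1, which this sketch does not settle.

```c
#define _POSIX_C_SOURCE 199309L
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical status callback: true once the MDIO transfer is complete. */
typedef bool (*mdio_done_fn)(void *ctx);

/* Userspace stand-in for udelay(): sleep for roughly us microseconds. */
static void delay_us(long us)
{
	struct timespec ts = { us / 1000000, (us % 1000000) * 1000 };

	nanosleep(&ts, NULL);
}

/* Check the status at most max_tries times, sleeping delay microseconds
 * between checks. Returns 0 on completion, -ETIMEDOUT otherwise. */
static int mdio_poll_done(mdio_done_fn done, void *ctx, int max_tries, long delay)
{
	for (int i = 0; i < max_tries; i++) {
		if (done(ctx))
			return 0;
		delay_us(delay);
	}
	return -ETIMEDOUT;
}

/* Hypothetical fake hardware: reports completion after ready_after reads. */
struct fake_mii {
	int reads;
	int ready_after;
};

static bool fake_mii_done(void *ctx)
{
	struct fake_mii *m = ctx;

	return ++m->reads > m->ready_after;
}
```

With max_tries = 10 and delay = 25 the worst case is bounded at about 250 microseconds of waiting, instead of an unbounded number of cpu_relax() spins.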

I have a patch that fixes this issue:
http://kernel.ubuntu.com/git?p=roc/ubuntu-lucid.git;a=commitdiff;h=5d77e3409b319ca84183bf1d2fd158a9c864e03f.

Thanks a lot,
-- 
Bryan Wu <bryan.wu@canonical.com>
Kernel Developer    +86.138-1617-6545 Mobile
Ubuntu Kernel Team | Hardware Enablement Team
Canonical Ltd.      www.canonical.com
Ubuntu - Linux for human beings | www.ubuntu.com


* Re: [PATCH] Add somaxconn to Documentation/sysctl/net.txt
From: Rob Landley @ 2010-04-13 23:54 UTC
  To: Eric Dumazet; +Cc: linux-kernel, linux-doc, netdev
In-Reply-To: <1271184012.16881.549.camel@edumazet-laptop>

On Tuesday 13 April 2010 13:40:12 Eric Dumazet wrote:
> On Tuesday 13 April 2010 at 13:25 -0500, Rob Landley wrote:
> > From: Rob Landley <rob@landley.net>
> >
> > Add somaxconn to Documentation/sysctl/net.txt
> >
> > Signed-off-by: Rob Landley <rob@landley.net>
> > ---
> >
> >  Documentation/sysctl/net.txt |    6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
> > index df38ef0..2740085 100644
> > --- a/Documentation/sysctl/net.txt
> > +++ b/Documentation/sysctl/net.txt
> > @@ -90,6 +90,12 @@ optmem_max
> >  Maximum ancillary buffer size allowed per socket. Ancillary data is a
> >  sequence of struct cmsghdr structures with appended data.
> >
> > +somaxconn
> > +---------
> > +
> > +Maximum backlog of unanswered connections for a listening socket.
> > +Provides an upper bound on the "backlog" parameter of the listen()
> > +syscall.
> > +
> >  2. /proc/sys/net/unix - Parameters for Unix domain sockets
> >  -------------------------------------------------------
>
> Please cc netdev for such patches
>
> Extract of Documentation/networking/ip-sysctl.txt
>
> somaxconn - INTEGER
> 	Limit of socket listen() backlog, known in userspace as SOMAXCONN.
> 	Defaults to 128.  See also tcp_max_syn_backlog for additional tuning
> 	for TCP sockets.
>
> I guess you need to change both files ?

Dunno.  I just got a question on the busybox mailing list:

  http://lists.busybox.net/pipermail/busybox/2010-April/072090.html

Looked in Documentation to see what /proc/sys/net/core/somaxconn actually 
_did_, found it was undocumented, grepped the kernel source for somaxconn, 
found just one chunk of code actually using it, replied to the guy's question:

  http://lists.busybox.net/pipermail/busybox/2010-April/072096.html

And then tweaked the documentation with what I'd found, and sent in a doc 
patch so I wouldn't have to do that twice.
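For reference, the clamp that somaxconn applies can be modeled as below. effective_backlog() is a hypothetical userspace model, assumed to mirror the check in net/socket.c (the requested backlog is compared as unsigned, so a negative value is also clamped); read_somaxconn() simply reads the current value from procfs and returns -1 if that fails.

```c
#include <stdio.h>

/* Model (assumption) of the clamp listen() applies: the requested backlog
 * is compared as unsigned, so values above somaxconn, and negative values,
 * are reduced to somaxconn. */
static int effective_backlog(int requested, int somaxconn)
{
	if ((unsigned int)requested > (unsigned int)somaxconn)
		return somaxconn;
	return requested;
}

/* Read the current limit; returns -1 if the file can't be read or parsed. */
static int read_somaxconn(void)
{
	FILE *f = fopen("/proc/sys/net/core/somaxconn", "r");
	int val = -1;

	if (f == NULL)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}
```

So with the default of 128, listen(fd, 1024) behaves like listen(fd, 128).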

It's quite possible I got it wrong.  Maybe it's per interface or something?

Rob
-- 
Latency is more important than throughput. It's that simple. - Linus Torvalds


* Re: [PATCH net-next-2.6] net: sk_dst_cache RCUification
From: David Miller @ 2010-04-13 23:11 UTC
  To: eric.dumazet; +Cc: netdev, paulmck
In-Reply-To: <1271199845.16881.586.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 14 Apr 2010 01:04:05 +0200

> Instead of using RCU on the whole "struct socket", my plan is to use a small
> structure:
> 
> struct wait_queue_head_rcu {
> 	wait_queue_head_t wait;
> 	struct rcu_head	  rcu;
> } ____cacheline_aligned_in_smp;
> 
> and make sk->sk_sleep point to this 'wait' field.

So you're relying upon the fact that in the non-FASYNC case
the struct socket's wait queue is never actually used?


* Re: [PATCH net-next-2.6] net: sk_dst_cache RCUification
From: Eric Dumazet @ 2010-04-13 23:04 UTC
  To: David Miller; +Cc: netdev, paulmck
In-Reply-To: <20100413.015232.67916764.davem@davemloft.net>

On Tuesday, 13 April 2010 at 01:52 -0700, David Miller wrote:

> Applied, thanks for doing this work Eric.

Thanks David :)

I am now working on the sk_callback_lock case, to speed up
sock_def_readable() and sock_def_write_space() in the typical case
(SOCK_FASYNC not set).

Instead of using RCU on the whole "struct socket", my plan is to use a small
structure:

struct wait_queue_head_rcu {
	wait_queue_head_t wait;
	struct rcu_head	  rcu;
} ____cacheline_aligned_in_smp;

and make sk->sk_sleep point to this 'wait' field.
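The point of embedding the wait queue next to an rcu_head is that code holding only sk->sk_sleep can still get back to the enclosing allocation and free it after a grace period. Below is a minimal userspace sketch of that pointer arithmetic, assuming stand-in definitions for wait_queue_head_t and struct rcu_head (the real kernel types differ) and dropping ____cacheline_aligned_in_smp, which is kernel-only. struct fake_sock, wq_free_rcu(), fake_sock_attach_wq() and fake_sock_release_wq() are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel types (assumption, not the real layout). */
typedef struct { int dummy; } wait_queue_head_t;
struct rcu_head { struct rcu_head *next; void (*func)(struct rcu_head *); };

struct wait_queue_head_rcu {
    wait_queue_head_t wait;
    struct rcu_head   rcu;
};

/* Same idea as the kernel's container_of(): step back from a member
 * pointer to the structure that embeds it. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical socket reduced to the one field that matters here. */
struct fake_sock {
    wait_queue_head_t *sk_sleep;
};

/* Callback a grace-period mechanism would run; here it only frees. */
static void wq_free_rcu(struct rcu_head *head)
{
    free(container_of(head, struct wait_queue_head_rcu, rcu));
}

/* Allocate the small wrapper and point sk_sleep at its 'wait' field. */
static int fake_sock_attach_wq(struct fake_sock *sk)
{
    struct wait_queue_head_rcu *wq = calloc(1, sizeof(*wq));

    if (wq == NULL)
        return -1;
    sk->sk_sleep = &wq->wait;
    return 0;
}

/* Recover the wrapper from sk_sleep. In the kernel the free would be
 * deferred with call_rcu(&wq->rcu, ...); this single-threaded sketch
 * invokes the callback directly because there are no concurrent readers. */
static void fake_sock_release_wq(struct fake_sock *sk)
{
    struct wait_queue_head_rcu *wq =
        container_of(sk->sk_sleep, struct wait_queue_head_rcu, wait);

    sk->sk_sleep = NULL;
    wq_free_rcu(&wq->rcu);
}
```

Keeping the structure small (rather than RCU-freeing the whole struct socket) means only this wrapper has to outlive a grace period.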




* Re: [PATCH v2] net: batch skb dequeueing from softnet input_pkt_queue
From: Changli Gao @ 2010-04-13 22:43 UTC
  To: paulmck; +Cc: Eric Dumazet, David S. Miller, netdev
In-Reply-To: <20100413155227.GC2538@linux.vnet.ibm.com>

On Tue, Apr 13, 2010 at 11:52 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Tue, Apr 13, 2010 at 05:50:29PM +0800, Changli Gao wrote:
>> On Tue, Apr 13, 2010 at 4:08 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> >
>> >        Probably not necessary.
>> >
>> >> +     volatile bool           flush_processing_queue;
>> >
>> > Use of 'volatile' is strongly discouraged, I would say, forbidden.
>>
>> volatile is used to avoid compiler optimization.
>
> Would it be reasonable to use ACCESS_ONCE() where this variable is used?

Oh, thanks. ACCESS_ONCE() is just what I need.
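ACCESS_ONCE() makes only the individual accesses volatile, so the field itself can stay a plain bool. A minimal userspace sketch, assuming GCC/Clang (the kernel's include/linux/compiler.h definition uses typeof; __typeof__ is used here so it also builds in strict ISO modes). struct softnet_sketch, request_flush() and process_until_flush() are hypothetical and only loosely modelled on the flag in the patch.

```c
#include <stdbool.h>

/* Same shape as the kernel's ACCESS_ONCE(): cast the lvalue to a volatile
 * type so this particular load or store is emitted exactly as written. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical per-CPU state; the flag is not declared volatile. */
struct softnet_sketch {
    bool flush_processing_queue;
    int  processed;
};

/* Writer side: the compiler must actually perform this store. */
static void request_flush(struct softnet_sketch *sd)
{
    ACCESS_ONCE(sd->flush_processing_queue) = true;
}

/* Reader side: the flag is re-read on every iteration instead of the
 * load being hoisted out of the loop. */
static int process_until_flush(struct softnet_sketch *sd, int budget)
{
    while (budget-- > 0) {
        if (ACCESS_ONCE(sd->flush_processing_queue))
            break;
        sd->processed++;
    }
    return sd->processed;
}
```

Note that ACCESS_ONCE() only constrains the compiler; it provides no ordering between CPUs, so memory barriers are still needed wherever the flag must be ordered against other data.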

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* [PATCH] Fix SCTP failure with ipv6 source address routing
From: Paul Gortmaker @ 2010-04-13 22:37 UTC
  To: netdev; +Cc: vladislav.yasevich

From: Weixing Shi <Weixing.Shi@windriver.com>

With source address routing, as in the test case below, SCTP
does not work.

Node-A:
  1)ifconfig eth0 inet6 add 2001:1::1/64
  2)ip -6 rule add from 2001:1::1 table 100 pref 100
  3)ip -6 route add 2001:2::1 dev eth0 table 100
  4)sctp_darn -H 2001:1::1 -P 250 -l &

Node-B:
  1)ifconfig eth0 inet6 add 2001:2::1/64
  2)ip -6 rule add from 2001:2::1 table 100 pref 100
  3)ip -6 route add 2001:1::1 dev eth0 table 100
  4)sctp_darn -H 2001:2::1 -P 250 -h 2001:1::1 -p 250 -s

Root cause:
  Node-A and Node-B use source address routing. At the beginning
  of sctp_transport_route() the source address is not yet known, so
  SCTP looks up the route by destination address only. Since the
  route to the peer exists only in the table selected by the
  source address rule (table 100), that lookup fails and the
  resulting dst_entry is NULL.

Solution:
  Once SCTP has filled in the source address, look up the
  dst_entry again; with the source address set, the rule matches
  and the lookup returns the correct route.
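The failure and the fix can be illustrated with a toy policy-routing model of Node-A. This is not the kernel's IPv6 FIB code: it uses exact-match string destinations, has no reference counting (the patch's dst_release() has no counterpart here), and all names (toy_route, toy_get_dst(), toy_transport_route()) are hypothetical. The source address returned after the first lookup is simply assumed to be 2001:1::1.

```c
#include <stddef.h>
#include <string.h>

/* Toy route: exact-match destination and an output device. */
struct toy_route {
    const char *dst;
    const char *dev;
};

struct toy_table {
    const struct toy_route *routes;
    size_t n;
};

/* "ip -6 route add 2001:2::1 dev eth0 table 100" */
static const struct toy_route t100_routes[] = { { "2001:2::1", "eth0" } };
static const struct toy_table table100 = { t100_routes, 1 };
/* Main table: no route to 2001:2::1 in this model. */
static const struct toy_table table_main = { NULL, 0 };

/* "ip -6 rule add from 2001:1::1 table 100 pref 100": only a lookup that
 * already carries this source address is steered to table 100. */
static const struct toy_table *select_table(const char *saddr)
{
    if (saddr != NULL && strcmp(saddr, "2001:1::1") == 0)
        return &table100;
    return &table_main;
}

static const struct toy_route *toy_get_dst(const char *daddr, const char *saddr)
{
    const struct toy_table *t = select_table(saddr);

    for (size_t i = 0; i < t->n; i++)
        if (strcmp(t->routes[i].dst, daddr) == 0)
            return &t->routes[i];
    return NULL;
}

/* Mirrors the order of operations: look up without a source, pick a
 * source, then (only when relookup is set, as in the fix) look up again. */
static const struct toy_route *toy_transport_route(const char *daddr,
                                                   int relookup)
{
    const struct toy_route *dst = toy_get_dst(daddr, NULL);
    const char *saddr = "2001:1::1"; /* assumed result of source selection */

    if (relookup)
        dst = toy_get_dst(daddr, saddr);
    return dst;
}
```

Without the second lookup the transport keeps the NULL result from the source-less lookup; with it, the rule matches and the route via eth0 is found.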

Signed-off-by: Weixing Shi <Weixing.Shi@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/sctp/transport.c |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index be4d63d..b5ae18c 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -295,9 +295,16 @@ void sctp_transport_route(struct sctp_transport *transport,
 
 	if (saddr)
 		memcpy(&transport->saddr, saddr, sizeof(union sctp_addr));
-	else
+	else {
 		af->get_saddr(opt, asoc, dst, daddr, &transport->saddr);
-
+		/* When using source address routing, since dst was
+		 * looked up prior to filling in the source address, dst
+		 * needs to be looked up again to get the correct dst
+		 */
+		if (dst)
+			dst_release(dst);
+		dst = af->get_dst(asoc, daddr, &transport->saddr);
+	}
 	transport->dst = dst;
 	if ((transport->param_flags & SPP_PMTUD_DISABLE) && transport->pathmtu) {
 		return;
-- 
1.6.5.2



* Re: [PATCH 0/9] net: support multiple independant multicast routing instances
From: David Miller @ 2010-04-13 21:51 UTC
  To: kaber; +Cc: netdev
In-Reply-To: <1271171003-11901-1-git-send-email-kaber@trash.net>

From: Patrick McHardy <kaber@trash.net>
Date: Tue, 13 Apr 2010 17:03:14 +0200

> this is an updated patchset of my patches to support multiple independent
> multicast routing instances. Changes since the last posting are:
> 
> - rebase to the current net-next-2.6.git tree
> - fix up patch subjects to consistently refer to "ipv4: ipmr:"
> - fix up list_head conversion patch to add new elements at the head of
>   the list instead of at the tail
> 
> Please apply or pull from:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/kaber/ipmr-2.6.git master

I applied the patches instead of pulling just to check your email
patch submission format, and it was perfect! :-)

I'll do a git pull next time.

All applied to net-next-2.6, thanks!
