Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next-2.6] drivers: net: use skb_headlen()
From: David Miller @ 2010-04-14 23:15 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <20100414.155949.210580528.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Wed, 14 Apr 2010 15:59:49 -0700 (PDT)

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 13 Apr 2010 22:48:20 +0200
> 
>> replaces (skb->len - skb->data_len) occurrences by skb_headlen(skb)
>> 
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Applied, thanks Eric.

Eric, there were several typos in this patch and it is clear
that even e1000 wasn't build tested.

You used the mystical "skb_headsize()" instead of "skb_headlen()"
in several locations.

I fixed this up, but please build test your changes no matter
how seemingly trivial next time.

Thanks.

^ permalink raw reply

* RE: [PATCH] Infiniband: Randomize local port allocation.
From: Sean Hefty @ 2010-04-15  0:01 UTC (permalink / raw)
  To: penguin-kernel, rolandd
  Cc: amwang, opurdila, eric.dumazet, netdev, nhorman, davem, ebiederm,
	linux-kernel
In-Reply-To: <201004140201.o3E21Aqn075978@www262.sakura.ne.jp>

>[PATCH] Infiniband: Randomize local port allocation.
>
>Randomize local port allocation in a way sctp_get_port_local() does.
>
>Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

Thanks for fixing this long outstanding issue.  :)  The latest patch looks
correct and passed some simple tests that I ran against it.  One comment below,
which I didn't catch before:

>---
> drivers/infiniband/core/cma.c |   69 ++++++++++++++---------------------------
>-
> 1 file changed, 24 insertions(+), 45 deletions(-)
>
>--- linux-2.6.34-rc4.orig/drivers/infiniband/core/cma.c
>+++ linux-2.6.34-rc4/drivers/infiniband/core/cma.c
>@@ -79,7 +79,6 @@ static DEFINE_IDR(sdp_ps);
> static DEFINE_IDR(tcp_ps);
> static DEFINE_IDR(udp_ps);
> static DEFINE_IDR(ipoib_ps);
>-static int next_port;
>
> struct cma_device {
> 	struct list_head	list;
>@@ -1970,47 +1969,32 @@ err1:
>
> static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
> {
>-	struct rdma_bind_list *bind_list;
>-	int port, ret, low, high;
>-
>-	bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
>-	if (!bind_list)
>-		return -ENOMEM;
>-
>-retry:
>-	/* FIXME: add proper port randomization per like inet_csk_get_port */
>-	do {
>-		ret = idr_get_new_above(ps, bind_list, next_port, &port);
>-	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
>-
>-	if (ret)
>-		goto err1;
>+	static unsigned int last_used_port;
>+	int low, high, remaining;
>+	unsigned int rover;
>
> 	inet_get_local_port_range(&low, &high);
>-	if (port > high) {
>-		if (next_port != low) {
>-			idr_remove(ps, port);
>-			next_port = low;
>-			goto retry;
>+	remaining = (high - low) + 1;
>+	rover = net_random() % remaining + low;
>+	do {
>+		rover++;
>+		if ((rover < low) || (rover > high))
>+			rover = low;

Assuming that we're likely to pick a valid port on the first try, it would be
more efficient to move the above 3 lines to the end of the while loop.

>+		if (last_used_port != rover &&
>+		    !idr_find(ps, (unsigned short) rover)) {
>+			int ret = cma_alloc_port(ps, id_priv, rover);
>+			/*
>+			 * Remember previously used port number in order to
>+			 * avoid re-using same port immediately after it is
>+			 * closed.
>+			 */
>+			if (!ret)
>+				last_used_port = rover;
>+			if (ret != -EADDRNOTAVAIL)
>+				return ret;
> 		}
>-		ret = -EADDRNOTAVAIL;
>-		goto err2;
>-	}
>-
>-	if (port == high)
>-		next_port = low;
>-	else
>-		next_port = port + 1;
>-
>-	bind_list->ps = ps;
>-	bind_list->port = (unsigned short) port;
>-	cma_bind_port(bind_list, id_priv);
>-	return 0;
>-err2:
>-	idr_remove(ps, port);
>-err1:
>-	kfree(bind_list);
>-	return ret;
>+	} while (--remaining > 0);
>+	return -EADDRNOTAVAIL;
> }

^ permalink raw reply

* Re: hdlc_ppp: why no detach()?
From: Krzysztof Halasa @ 2010-04-15  0:02 UTC (permalink / raw)
  To: Michael Barkowski
  Cc: David S. Miller, Julia Lawall, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <m3633todl2.fsf@intrepid.localdomain>

> 1. The SCR packet will be delayed by 2 seconds (both first and second
>    SCR will be sent the same time). Perhaps we delay only a little
>    (instead of full 2 seconds) and only then send the initial packet.

BTW can I sleep in a netdevice_notifier callback, no?
-- 
Krzysztof Halasa

^ permalink raw reply

* [PATCH] WAN: flush tx_queue in hdlc_ppp to prevent panic on rmmod hw_driver.
From: Krzysztof Halasa @ 2010-04-15  0:09 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

tx_queue is used as a temporary queue when not allowed to queue skb
directly to the hw device driver (which may sleep). Most paths flush
it before returning, but ppp_start() currently cannot. Make sure we
don't leave skbs pointing to a non-existent device.

Thanks to Michael Barkowski for reporting this problem.

Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>

diff --git a/drivers/net/wan/hdlc_ppp.c b/drivers/net/wan/hdlc_ppp.c
index b9b9d6b..941f053 100644
--- a/drivers/net/wan/hdlc_ppp.c
+++ b/drivers/net/wan/hdlc_ppp.c
@@ -628,9 +628,15 @@ static void ppp_stop(struct net_device *dev)
 	ppp_cp_event(dev, PID_LCP, STOP, 0, 0, 0, NULL);
 }

+static void ppp_close(struct net_device *dev)
+{
+	ppp_tx_flush();
+}
+
 static struct hdlc_proto proto = {
 	.start		= ppp_start,
 	.stop		= ppp_stop,
+	.close		= ppp_close,
 	.type_trans	= ppp_type_trans,
 	.ioctl		= ppp_ioctl,
 	.netif_rx	= ppp_rx,

^ permalink raw reply related

* RE: [PATCH] Infiniband: Randomize local port allocation.
From: Tetsuo Handa @ 2010-04-15  2:29 UTC (permalink / raw)
  To: sean.hefty
  Cc: amwang, opurdila, eric.dumazet, netdev, nhorman, davem, ebiederm,
	linux-kernel, rolandd
In-Reply-To: <195EB1AD388F455AA386EB214FCD09F4@amr.corp.intel.com>

Sean Hefty wrote:
> >+	remaining = (high - low) + 1;
> >+	rover = net_random() % remaining + low;
> >+	do {
> >+		rover++;
> >+		if ((rover < low) || (rover > high))
> >+			rover = low;
> 
> Assuming that we're likely to pick a valid port on the first try, it would be
> more efficient to move the above 3 lines to the end of the while loop.
> 
Indeed. I moved these lines to "if (--remaining) { ... }" block.
--------------------
[PATCH] Infiniband: Randomize local port allocation.

Randomize local port allocation in a way sctp_get_port_local() does.
Update rover at the end of loop since we're likely to pick a valid port
on the first try.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/infiniband/core/cma.c |   70 +++++++++++++++---------------------------
 1 file changed, 25 insertions(+), 45 deletions(-)

--- linux-2.6.34-rc4.orig/drivers/infiniband/core/cma.c
+++ linux-2.6.34-rc4/drivers/infiniband/core/cma.c
@@ -79,7 +79,6 @@ static DEFINE_IDR(sdp_ps);
 static DEFINE_IDR(tcp_ps);
 static DEFINE_IDR(udp_ps);
 static DEFINE_IDR(ipoib_ps);
-static int next_port;
 
 struct cma_device {
 	struct list_head	list;
@@ -1970,47 +1969,33 @@ err1:
 
 static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
 {
-	struct rdma_bind_list *bind_list;
-	int port, ret, low, high;
-
-	bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
-	if (!bind_list)
-		return -ENOMEM;
-
-retry:
-	/* FIXME: add proper port randomization per like inet_csk_get_port */
-	do {
-		ret = idr_get_new_above(ps, bind_list, next_port, &port);
-	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
-
-	if (ret)
-		goto err1;
+	static unsigned int last_used_port;
+	int low, high, remaining;
+	unsigned int rover;
 
 	inet_get_local_port_range(&low, &high);
-	if (port > high) {
-		if (next_port != low) {
-			idr_remove(ps, port);
-			next_port = low;
-			goto retry;
-		}
-		ret = -EADDRNOTAVAIL;
-		goto err2;
+	remaining = (high - low) + 1;
+	rover = net_random() % remaining + low;
+retry:
+	if (last_used_port != rover &&
+	    !idr_find(ps, (unsigned short) rover)) {
+		int ret = cma_alloc_port(ps, id_priv, rover);
+		/*
+		 * Remember previously used port number in order to avoid
+		 * re-using same port immediately after it is closed.
+		 */
+		if (!ret)
+			last_used_port = rover;
+		if (ret != -EADDRNOTAVAIL)
+			return ret;
 	}
-
-	if (port == high)
-		next_port = low;
-	else
-		next_port = port + 1;
-
-	bind_list->ps = ps;
-	bind_list->port = (unsigned short) port;
-	cma_bind_port(bind_list, id_priv);
-	return 0;
-err2:
-	idr_remove(ps, port);
-err1:
-	kfree(bind_list);
-	return ret;
+	if (--remaining) {
+		rover++;
+		if ((rover < low) || (rover > high))
+			rover = low;
+		goto retry;
+	}
+	return -EADDRNOTAVAIL;
 }
 
 static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
@@ -2995,12 +2980,7 @@ static void cma_remove_one(struct ib_dev
 
 static int __init cma_init(void)
 {
-	int ret, low, high, remaining;
-
-	get_random_bytes(&next_port, sizeof next_port);
-	inet_get_local_port_range(&low, &high);
-	remaining = (high - low) + 1;
-	next_port = ((unsigned int) next_port % remaining) + low;
+	int ret;
 
 	cma_wq = create_singlethread_workqueue("rdma_cm");
 	if (!cma_wq)

^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: Eric Dumazet @ 2010-04-15  2:40 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Changli Gao, Tom Herbert, hadi, netdev, robert, David Miller,
	Andi Kleen
In-Reply-To: <20100414160220.43a62999@nehalam>

Le mercredi 14 avril 2010 à 16:02 -0700, Stephen Hemminger a écrit :
> On Thu, 15 Apr 2010 06:51:29 +0800
> Changli Gao <xiaosuo@gmail.com> wrote:
> 
> > On Thu, Apr 15, 2010 at 4:57 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > >
> > > RPS can be tuned (Changli wants a finer tuning...), it would be
> > > intereting to tune multiqueue devices too. I dont know if its possible
> > > right now.
> > >
> > > On my Nehalem machine (16 logical cpus), its NetXtreme II BCM57711E
> > > 10Gigabit has 16 queues. It might be good to use less queues according
> > > to your results on some workloads, and eventually use RPS on a second
> > > layering.
> > 
> > My idear is: run a daemon in userland to monitor the softnet
> > statistics, and tun the RPS setting if necessary. It seems that the
> > current softnet statistics data isn't correct.
> > 
> > Long time ago, I did a test, and the conclution was
> > call_function_single IPI was more expensive than resched IPI, so I
> > moved to kernel thread from softirq for packet processing. I'll redo
> > the test later.
> > 
> 
> The big thing is data, data, data... Performance can only be examined
> with real hard data with multiple different kind of hardware.  Also, check for
> regressions in lmbench and TPC benchmarks. Yes this is hard, but papers
> on this would allow for rational rather than speculative choices.
> 
> Adding more tuning knobs is not the answer unless you can show when
> the tuning helps.
> 

Agree 100%, and irqbalance is the existing daemon. It should be used and
changed if necessary.

Changli, my stronges argument about your patches is that our scheduler
and memory affinity api (numactl driven) is bitmask oriented, giving the
same weight to individual cpu or individual memory node.




^ permalink raw reply

* Re: [PATCH net-next-2.6] drivers: net: use skb_headlen()
From: Eric Dumazet @ 2010-04-15  2:44 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100414.161504.149014877.davem@davemloft.net>

Le mercredi 14 avril 2010 à 16:15 -0700, David Miller a écrit :

> Eric, there were several typos in this patch and it is clear
> that even e1000 wasn't build tested.
> 
> You used the mystical "skb_headsize()" instead of "skb_headlen()"
> in several locations.
> 
> I fixed this up, but please build test your changes no matter
> how seemingly trivial next time.
> 

Oh well, I was sure I build tested it on a modallconfig, but now I
realize it was on a custom config, sorry David, thanks !



^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: Changli Gao @ 2010-04-15  2:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stephen Hemminger, Tom Herbert, hadi, netdev, robert,
	David Miller, Andi Kleen
In-Reply-To: <1271299206.16881.1764.camel@edumazet-laptop>

On Thu, Apr 15, 2010 at 10:40 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Agree 100%, and irqbalance is the existing daemon. It should be used and
> changed if necessary.
>
> Changli, my stronges argument about your patches is that our scheduler
> and memory affinity api (numactl driven) is bitmask oriented, giving the
> same weight to individual cpu or individual memory node.
>

It works with the assumption: the workloads handled in non-schedulable
context are less than the others. If most of work is done in
non-schedulable(softirq) context, scheduler can't keep load balance.


-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* [PATCH] net: fix softnet_stat
From: Changli Gao @ 2010-04-15  3:01 UTC (permalink / raw)
  To: David S. Miller; +Cc: Tom Herbert, Eric Dumazet, netdev, Changli Gao

fix softnet_stat

Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context
without any protection. And enqueue_to_backlog should update the netdev_rx_stat
of the target CPU.

This patch changes the meaning of softnet_data.total to the number of packets
processed, not includes dropped, in order to avoid it sharing between IRQ and
SoftIRQ. And softnet_data.dropped is protected in the corresponding
input_pkt_queue.lock, if RPS is enabled.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 net/core/dev.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index a10a216..b38a991 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2336,7 +2336,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
 	queue = &per_cpu(softnet_data, cpu);

 	local_irq_save(flags);
-	__get_cpu_var(netdev_rx_stat).total++;

 	rps_lock(queue);
 	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
@@ -2366,9 +2365,9 @@ enqueue:
 		goto enqueue;
 	}

+	per_cpu(netdev_rx_stat, cpu).dropped++;
 	rps_unlock(queue);

-	__get_cpu_var(netdev_rx_stat).dropped++;
 	local_irq_restore(flags);

 	kfree_skb(skb);
@@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
 	struct netif_rx_stats *s = v;

 	seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
-		   s->total, s->dropped, s->time_squeeze, 0,
+		   s->total + s->dropped, s->dropped, s->time_squeeze, 0,
 		   0, 0, 0, 0, /* was fastroute */
 		   s->cpu_collision, s->received_rps);
 	return 0;

^ permalink raw reply related

* Re: [PATCH] net: fix softnet_stat
From: Changli Gao @ 2010-04-15  3:08 UTC (permalink / raw)
  To: David S. Miller; +Cc: Tom Herbert, Eric Dumazet, netdev, Changli Gao
In-Reply-To: <1271300513-9995-1-git-send-email-xiaosuo@gmail.com>

On Thu, Apr 15, 2010 at 11:01 AM, Changli Gao <xiaosuo@gmail.com> wrote:
> fix softnet_stat
>
> Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context
> without any protection. And enqueue_to_backlog should update the netdev_rx_stat
> of the target CPU.
>
> This patch changes the meaning of softnet_data.total to the number of packets
> processed, not includes dropped, in order to avoid it sharing between IRQ and
> SoftIRQ. And softnet_data.dropped is protected in the corresponding
> input_pkt_queue.lock, if RPS is enabled.
>

I forget a comment:

This patch also fixed a bug: packets received by enqueue_to_backlog()
was counted twice: one in enqueue_to_backlog(), and one in
__netif_receive_skb(), although they may not on the same CPUs.

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH] net: fix softnet_stat
From: Eric Dumazet @ 2010-04-15  3:28 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, Tom Herbert, netdev
In-Reply-To: <1271300513-9995-1-git-send-email-xiaosuo@gmail.com>

Le jeudi 15 avril 2010 à 11:01 +0800, Changli Gao a écrit :
> fix softnet_stat
> 
> Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context
> without any protection. And enqueue_to_backlog should update the netdev_rx_stat
> of the target CPU.
> 
> This patch changes the meaning of softnet_data.total to the number of packets
> processed, not includes dropped, in order to avoid it sharing between IRQ and
> SoftIRQ. And softnet_data.dropped is protected in the corresponding
> input_pkt_queue.lock, if RPS is enabled.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ----
>  net/core/dev.c |    5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> diff --git a/net/core/dev.c b/net/core/dev.c
> index a10a216..b38a991 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2336,7 +2336,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
>  	queue = &per_cpu(softnet_data, cpu);
>  
>  	local_irq_save(flags);
> -	__get_cpu_var(netdev_rx_stat).total++;

I think this was nice, because we can see number of frames received by
this cpu, before giving them to another cpu.

Maybe we want a new counter ?

	__get_cpu_var(netdev_rx_stat).inpackets++;


>  
>  	rps_lock(queue);
>  	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
> @@ -2366,9 +2365,9 @@ enqueue:
>  		goto enqueue;
>  	}
>  
> +	per_cpu(netdev_rx_stat, cpu).dropped++;

This is a bit expensive, and could be a queue->rx_stat.dropped++
if netdev_rx_stat is moved inside queue structure.


>  	rps_unlock(queue);
>  
> -	__get_cpu_var(netdev_rx_stat).dropped++;
>  	local_irq_restore(flags);
>  
>  	kfree_skb(skb);
> @@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
>  	struct netif_rx_stats *s = v;
>  
>  	seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
> -		   s->total, s->dropped, s->time_squeeze, 0,
> +		   s->total + s->dropped, s->dropped, s->time_squeeze, 0,

This is wrong I believe... I prefer to add a new column, s->inpackets ?

>  		   0, 0, 0, 0, /* was fastroute */
>  		   s->cpu_collision, s->received_rps);
>  	return 0;
> --




^ permalink raw reply

* Re: [PATCH] net: fix softnet_stat
From: Eric Dumazet @ 2010-04-15  3:40 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, Tom Herbert, netdev
In-Reply-To: <1271302133.16881.1814.camel@edumazet-laptop>

Le jeudi 15 avril 2010 à 05:28 +0200, Eric Dumazet a écrit :
> >  
> > +	per_cpu(netdev_rx_stat, cpu).dropped++;
> 
> This is a bit expensive, and could be a queue->rx_stat.dropped++
> if netdev_rx_stat is moved inside queue structure.

Hmm, this is not really needed, this event is the slow path and should
almost never happen. This part of your fix is fine.

^ permalink raw reply

* Re: [PATCH] net: fix softnet_stat
From: Changli Gao @ 2010-04-15  3:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Tom Herbert, netdev
In-Reply-To: <1271302133.16881.1814.camel@edumazet-laptop>

On Thu, Apr 15, 2010 at 11:28 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> ----
>>  net/core/dev.c |    5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index a10a216..b38a991 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -2336,7 +2336,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
>>       queue = &per_cpu(softnet_data, cpu);
>>
>>       local_irq_save(flags);
>> -     __get_cpu_var(netdev_rx_stat).total++;
>
> I think this was nice, because we can see number of frames received by
> this cpu, before giving them to another cpu.
>

:( Packets will be counted again in __net_receive_skb(). Now the
meaning of total is confusing.

> Maybe we want a new counter ?
>
>        __get_cpu_var(netdev_rx_stat).inpackets++;
>

How about replacing total with received?

>>
>>       rps_lock(queue);
>>       if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
>> @@ -2366,9 +2365,9 @@ enqueue:
>>               goto enqueue;
>>       }
>>
>> +     per_cpu(netdev_rx_stat, cpu).dropped++;
>
> This is a bit expensive, and could be a queue->rx_stat.dropped++
> if netdev_rx_stat is moved inside queue structure.
>

It will make softnet_data larger.?

>
>>       rps_unlock(queue);
>>
>> -     __get_cpu_var(netdev_rx_stat).dropped++;
>>       local_irq_restore(flags);
>>
>>       kfree_skb(skb);
>> @@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
>>       struct netif_rx_stats *s = v;
>>
>>       seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
>> -                s->total, s->dropped, s->time_squeeze, 0,
>> +                s->total + s->dropped, s->dropped, s->time_squeeze, 0,
>
> This is wrong I believe... I prefer to add a new column, s->inpackets ?

If I rename total to received, is it still confusing?



-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH] net: fix softnet_stat
From: Changli Gao @ 2010-04-15  3:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Tom Herbert, netdev
In-Reply-To: <1271302846.16881.1828.camel@edumazet-laptop>

On Thu, Apr 15, 2010 at 11:40 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 15 avril 2010 à 05:28 +0200, Eric Dumazet a écrit :
>> >
>> > +   per_cpu(netdev_rx_stat, cpu).dropped++;
>>
>> This is a bit expensive, and could be a queue->rx_stat.dropped++
>> if netdev_rx_stat is moved inside queue structure.
>
> Hmm, this is not really needed, this event is the slow path and should
> almost never happen. This part of your fix is fine.
>

It there any document about /proc/net/softnet_stat?

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH] net: fix softnet_stat
From: Changli Gao @ 2010-04-15  4:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Tom Herbert, netdev
In-Reply-To: <1271302133.16881.1814.camel@edumazet-laptop>

On Thu, Apr 15, 2010 at 11:28 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> I think this was nice, because we can see number of frames received by
> this cpu, before giving them to another cpu.
>
> Maybe we want a new counter ?
>
>        __get_cpu_var(netdev_rx_stat).inpackets++;
>
>
>>
>>       rps_lock(queue);
>>       if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
>> @@ -2366,9 +2365,9 @@ enqueue:
>>               goto enqueue;
>>       }
>>
>> +     per_cpu(netdev_rx_stat, cpu).dropped++;
>
> This is a bit expensive, and could be a queue->rx_stat.dropped++
> if netdev_rx_stat is moved inside queue structure.
>
>
>>       rps_unlock(queue);
>>
>> -     __get_cpu_var(netdev_rx_stat).dropped++;
>>       local_irq_restore(flags);
>>
>>       kfree_skb(skb);
>> @@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
>>       struct netif_rx_stats *s = v;
>>
>>       seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
>> -                s->total, s->dropped, s->time_squeeze, 0,
>> +                s->total + s->dropped, s->dropped, s->time_squeeze, 0,
>
> This is wrong I believe... I prefer to add a new column, s->inpackets ?
>

the first column total should be the number of the packets, which are
processed by this CPU before RPS involved. If RPS is enabled, it
should keep the old meaning. If new statistics data is needed by RPS,
we would add it in another patch.


-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH] net: fix softnet_stat
From: Eric Dumazet @ 2010-04-15  4:03 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, Tom Herbert, netdev
In-Reply-To: <h2q412e6f7f1004142041ld9c760aw5dfd192da506e92c@mail.gmail.com>

Le jeudi 15 avril 2010 à 11:41 +0800, Changli Gao a écrit :
> On Thu, Apr 15, 2010 at 11:28 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> ----
> >>  net/core/dev.c |    5 ++---
> >>  1 file changed, 2 insertions(+), 3 deletions(-)
> >> diff --git a/net/core/dev.c b/net/core/dev.c
> >> index a10a216..b38a991 100644
> >> --- a/net/core/dev.c
> >> +++ b/net/core/dev.c
> >> @@ -2336,7 +2336,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
> >>       queue = &per_cpu(softnet_data, cpu);
> >>
> >>       local_irq_save(flags);
> >> -     __get_cpu_var(netdev_rx_stat).total++;
> >
> > I think this was nice, because we can see number of frames received by
> > this cpu, before giving them to another cpu.
> >
> 
> :( Packets will be counted again in __net_receive_skb(). Now the
> meaning of total is confusing.

Yes, this should be documented somewhere :)

> 
> > Maybe we want a new counter ?
> >
> >        __get_cpu_var(netdev_rx_stat).inpackets++;
> >
> 
> How about replacing total with received?
> 
> >>
> >>       rps_lock(queue);
> >>       if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
> >> @@ -2366,9 +2365,9 @@ enqueue:
> >>               goto enqueue;
> >>       }
> >>
> >> +     per_cpu(netdev_rx_stat, cpu).dropped++;
> >
> > This is a bit expensive, and could be a queue->rx_stat.dropped++
> > if netdev_rx_stat is moved inside queue structure.
> >
> 
> It will make softnet_data larger.?

We move a per_cpu struture inside another one. No extra bytes needed.

> 
> >
> >>       rps_unlock(queue);
> >>
> >> -     __get_cpu_var(netdev_rx_stat).dropped++;
> >>       local_irq_restore(flags);
> >>
> >>       kfree_skb(skb);
> >> @@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
> >>       struct netif_rx_stats *s = v;
> >>
> >>       seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
> >> -                s->total, s->dropped, s->time_squeeze, 0,
> >> +                s->total + s->dropped, s->dropped, s->time_squeeze, 0,
> >
> > This is wrong I believe... I prefer to add a new column, s->inpackets ?
> 
> If I rename total to received, is it still confusing?

Keep in mind /proc/net/softnet_stat is an ABI, and first column had the
meaning : number of packets processed by this cpu.

But this was pre RPS time. Now we have a two phases process.

This makes sense to :

let 'total' be the counter of packets processed in upper levels (IP/TCP
stack). As /proc/net/softnet_stat has no header, the 'total' name is not
important and could be changed.

add a new column (after the existing ones in /proc file to keep ABI
compatibility) with the meaning : number of packets pre-processed before
network stacks.

total : number of packets processed in upper layers (IP stack for
example)
dropped: number of packets dropped because this cpu queue was full
time_squeeze: number of time squeeze of this cpu
cpu_collision:...
received_rps: number of time this cpu received a RPS signal from another
cpu
inpackets : number of packets processed in layer 1 by this cpu (RPS
level)

^ permalink raw reply

* Re: [GIT PULL] vhost-net fix for 2.6.34-rc4
From: Michael S. Tsirkin @ 2010-04-15  4:05 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, hch
In-Reply-To: <20100414.155715.210946931.davem@davemloft.net>

On Wed, Apr 14, 2010 at 03:57:15PM -0700, David Miller wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Wed, 14 Apr 2010 17:17:33 +0300
> 
> > Christoph Hellwig (1):
> >       vhost: fix sparse warnings
> 
> Do the sparse warnings show actual real bugs, or are we just
> adding annotations to make the tool happy?
> 
> Unless the bugs being fixed are earth shattering, this belongs
> in net-next-2.6 not net-2.6

No, these are just annotations to make the tool happy.
Pls queue for net-next-2.6. Thanks!

-- 
MST

^ permalink raw reply

* Re: [PATCH] net: fix softnet_stat
From: Eric Dumazet @ 2010-04-15  4:14 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, Tom Herbert, netdev
In-Reply-To: <1271302846.16881.1828.camel@edumazet-laptop>

Le jeudi 15 avril 2010 à 05:40 +0200, Eric Dumazet a écrit :
> Le jeudi 15 avril 2010 à 05:28 +0200, Eric Dumazet a écrit :
> > >  
> > > +	per_cpu(netdev_rx_stat, cpu).dropped++;
> > 
> > This is a bit expensive, and could be a queue->rx_stat.dropped++
> > if netdev_rx_stat is moved inside queue structure.
> 
> Hmm, this is not really needed, this event is the slow path and should
> almost never happen. This part of your fix is fine.
> 
> 
> -

Oh well, I revert this comment, we need a mutual exclusion between cpus
if we dont want to miss some increments, even in a slow path.

__get_cpu_var(netdev_rx_stat).field++; is safe because updated only by
this cpu.

Once you use per_cpu(netdev_rx_stat, cpu).field++; you need a
synchronization since this is not atomic.

so queue->dropped++ is safer and faster too (inside the rps_lock
section)

^ permalink raw reply

* Re: [PATCH] net: fix softnet_stat
From: Changli Gao @ 2010-04-15  4:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Tom Herbert, netdev
In-Reply-To: <1271304846.16881.1881.camel@edumazet-laptop>

On Thu, Apr 15, 2010 at 12:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 15 avril 2010 à 05:40 +0200, Eric Dumazet a écrit :
>
> Oh well, I revert this comment, we need a mutual exclusion between cpus
> if we dont want to miss some increments, even in a slow path.
>
> __get_cpu_var(netdev_rx_stat).field++; is safe because updated only by
> this cpu.
>
> Once you use per_cpu(netdev_rx_stat, cpu).field++; you need a
> synchronization since this is not atomic.
>

rps_unlock() is a synchronization if needed.

> so queue->dropped++ is safer and faster too (inside the rps_lock
> section)
>

It involves more lines of code, moving netdev_rx_stat into softdata.

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* [PATCH v2] net: fix softnet_stat
From: Changli Gao @ 2010-04-15  5:30 UTC (permalink / raw)
  To: David S. Miller; +Cc: Tom Herbert, Eric Dumazet, netdev, Changli Gao

fix softnet_stat

Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context
without any protection. And enqueue_to_backlog should update the netdev_rx_stat
of the target CPU.

This patch splits softnet_data.total into softnet_data.received and
softnet_data.dropped internally, in order to avoid it sharing between IRQ and
SoftIRQ. The old ABI softnet_data.total is kept by summing softnet_data.received
and softnet_data.dropped when exporting it to userland. And softnet_data.dropped
is protected in the corresponding input_pkt_queue.lock, if RPS is enabled.

This patch also fixed a bug: packets received by enqueue_to_backlog() were
counted twice: one in enqueue_to_backlog(), and the other in
__netif_receive_skb(), although they maybe not on the same CPUs, if RPS is
involved.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/linux/netdevice.h |    2 +-
 net/core/dev.c            |    7 +++----
 2 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d1a21b5..394e850 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -219,7 +219,7 @@ struct neigh_parms;
 struct sk_buff;

 struct netif_rx_stats {
-	unsigned total;
+	unsigned received;
 	unsigned dropped;
 	unsigned time_squeeze;
 	unsigned cpu_collision;
diff --git a/net/core/dev.c b/net/core/dev.c
index a10a216..5817f0e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2336,7 +2336,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
 	queue = &per_cpu(softnet_data, cpu);

 	local_irq_save(flags);
-	__get_cpu_var(netdev_rx_stat).total++;

 	rps_lock(queue);
 	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
@@ -2366,9 +2365,9 @@ enqueue:
 		goto enqueue;
 	}

+	per_cpu(netdev_rx_stat, cpu).dropped++;
 	rps_unlock(queue);

-	__get_cpu_var(netdev_rx_stat).dropped++;
 	local_irq_restore(flags);

 	kfree_skb(skb);
@@ -2679,7 +2678,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
 			skb->dev = master;
 	}

-	__get_cpu_var(netdev_rx_stat).total++;
+	__get_cpu_var(netdev_rx_stat).received++;

 	skb_reset_network_header(skb);
 	skb_reset_transport_header(skb);
@@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
 	struct netif_rx_stats *s = v;

 	seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
-		   s->total, s->dropped, s->time_squeeze, 0,
+		   s->received + s->dropped, s->dropped, s->time_squeeze, 0,
 		   0, 0, 0, 0, /* was fastroute */
 		   s->cpu_collision, s->received_rps);
 	return 0;

^ permalink raw reply related

* RE: [PATCH] Infiniband: Randomize local port allocation.
From: Tetsuo Handa @ 2010-04-15  5:46 UTC (permalink / raw)
  To: sean.hefty; +Cc: netdev, rolandd, amwang
In-Reply-To: <195EB1AD388F455AA386EB214FCD09F4@amr.corp.intel.com>

Tetsuo Handa wrote:
> Sean Hefty wrote:
> > >+	remaining = (high - low) + 1;
> > >+	rover = net_random() % remaining + low;
> > >+	do {
> > >+		rover++;
> > >+		if ((rover < low) || (rover > high))
> > >+			rover = low;
> > 
> > Assuming that we're likely to pick a valid port on the first try, it would be
> > more efficient to move the above 3 lines to the end of the while loop.
> > 
> Indeed. I moved these lines to "if (--remaining) { ... }" block.

Oh, I made more changes than needed.
Adding "goto start;" and "start:" are sufficient.
Please use whichever you like.
--------------------
[PATCH] Infiniband: Randomize local port allocation.

Randomize local port allocation in a way sctp_get_port_local() does.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/infiniband/core/cma.c |   71 +++++++++++++++---------------------------
 1 file changed, 26 insertions(+), 45 deletions(-)

--- linux-2.6.34-rc4.orig/drivers/infiniband/core/cma.c
+++ linux-2.6.34-rc4/drivers/infiniband/core/cma.c
@@ -79,7 +79,6 @@ static DEFINE_IDR(sdp_ps);
 static DEFINE_IDR(tcp_ps);
 static DEFINE_IDR(udp_ps);
 static DEFINE_IDR(ipoib_ps);
-static int next_port;
 
 struct cma_device {
 	struct list_head	list;
@@ -1970,47 +1969,34 @@ err1:
 
 static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
 {
-	struct rdma_bind_list *bind_list;
-	int port, ret, low, high;
-
-	bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
-	if (!bind_list)
-		return -ENOMEM;
-
-retry:
-	/* FIXME: add proper port randomization per like inet_csk_get_port */
-	do {
-		ret = idr_get_new_above(ps, bind_list, next_port, &port);
-	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
-
-	if (ret)
-		goto err1;
+	static unsigned int last_used_port;
+	int low, high, remaining;
+	unsigned int rover;
 
 	inet_get_local_port_range(&low, &high);
-	if (port > high) {
-		if (next_port != low) {
-			idr_remove(ps, port);
-			next_port = low;
-			goto retry;
+	remaining = (high - low) + 1;
+	rover = net_random() % remaining + low;
+	goto start;
+	do {
+		rover++;
+		if ((rover < low) || (rover > high))
+			rover = low;
+start:
+		if (last_used_port != rover &&
+		    !idr_find(ps, (unsigned short) rover)) {
+			int ret = cma_alloc_port(ps, id_priv, rover);
+			/*
+			 * Remember previously used port number in order to
+			 * avoid re-using same port immediately after it is
+			 * closed.
+			 */
+			if (!ret)
+				last_used_port = rover;
+			if (ret != -EADDRNOTAVAIL)
+				return ret;
 		}
-		ret = -EADDRNOTAVAIL;
-		goto err2;
-	}
-
-	if (port == high)
-		next_port = low;
-	else
-		next_port = port + 1;
-
-	bind_list->ps = ps;
-	bind_list->port = (unsigned short) port;
-	cma_bind_port(bind_list, id_priv);
-	return 0;
-err2:
-	idr_remove(ps, port);
-err1:
-	kfree(bind_list);
-	return ret;
+	} while (--remaining);
+	return -EADDRNOTAVAIL;
 }
 
 static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
@@ -2995,12 +2981,7 @@ static void cma_remove_one(struct ib_dev
 
 static int __init cma_init(void)
 {
-	int ret, low, high, remaining;
-
-	get_random_bytes(&next_port, sizeof next_port);
-	inet_get_local_port_range(&low, &high);
-	remaining = (high - low) + 1;
-	next_port = ((unsigned int) next_port % remaining) + low;
+	int ret;
 
 	cma_wq = create_singlethread_workqueue("rdma_cm");
 	if (!cma_wq)

^ permalink raw reply

* Re: [GIT PULL] vhost-net fix for 2.6.34-rc4
From: David Miller @ 2010-04-15  5:53 UTC (permalink / raw)
  To: mst; +Cc: netdev, hch
In-Reply-To: <20100415040557.GA12108@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Thu, 15 Apr 2010 07:05:57 +0300

> On Wed, Apr 14, 2010 at 03:57:15PM -0700, David Miller wrote:
>> From: "Michael S. Tsirkin" <mst@redhat.com>
>> Date: Wed, 14 Apr 2010 17:17:33 +0300
>> 
>> > Christoph Hellwig (1):
>> >       vhost: fix sparse warnings
>> 
>> Do the sparse warnings show actual real bugs, or are we just
>> adding annotations to make the tool happy?
>> 
>> Unless the bugs being fixed are earth shattering, this belongs
>> in net-next-2.6 not net-2.6
> 
> No, these are just annotations to make the tool happy.
> Pls queue for net-next-2.6. Thanks!

Done, thanks Michael.

^ permalink raw reply

* Re: [linux-2.6.34-rc3] drivers/net: ks8851 MLL ethernet network driver
From: David Miller @ 2010-04-15  6:38 UTC (permalink / raw)
  To: David.Choi; +Cc: netdev

I cannot apply this, there are too many issues:

@@ -378,34 +384,34 @@ union ks_tx_hdr {

 /**
  * struct ks_net - KS8851 driver private data
- * @net_device 	: The network device we're bound to
+ * @net_device	: The network device we're bound to
  * @hw_addr	: start address of data register.

Do not mix lots of coding style and functional changes.

If you want to fix coding style, do it later in a seperate
patch after you implement your functional change.

+	if (!reg_data & CCR_ENDIAN) {

This test will never do what you want it to do.

First the compiler will evaluate "!reg_data" which will be either "0"
or "1", then it will and it with the CCR_ENDIAN mask, which is (1 <<
10).  And this can never be true.

I also do not like all of the endian ifdefs peppered all over the
driver.  Try consolidate this by creating a header file that contains
the structure and macro definitions, and in there create helper inline
functions which do the ifdefs.  This way the ks8851_mll.c file can
stay ifdef clean.

Thanks.

^ permalink raw reply

* Re: [PATCH] drivers/net: remove network drivers' last few uses of IRQF_SAMPLE_RANDOM
From: David Miller @ 2010-04-15  6:42 UTC (permalink / raw)
  To: cpeterso; +Cc: netdev, linux-kernel, mpm
In-Reply-To: <1270877380-23813-1-git-send-email-cpeterso@cpeterso.com>

From: Chris Peterson <cpeterso@cpeterso.com>
Date: Sat, 10 Apr 2010 01:29:40 -0400

> feature-removal-schedule.txt says a new /dev/random entropy model is planned 
> (for July 2009). Until then, this patch removes the few remaining uses of 
> IRQF_SAMPLE_RANDOM from drivers/net/*. 12 network drivers are affected, one 
> more than last year (netxen_nic_main.c). If idle servers need more entropy, 
> they ought to use a hardware RNG or an entropy-gathering userspace daemon.
> 
> Signed-off-by: Chris Peterson <cpeterso@cpeterso.com>

Well, that feature-removal-schedule.txt entry states that new
add_*_randomness() interfaces will be created such that things
like network devices can indicate what kind of entropy they
are providing.

Those interfaces have not yet been created as far as I can tell, and
the idea is that network drivers could use those not-yet-written new
interfaces.

Until that time I feel it does more harm than good to remove these
annotations since frankly there never ever was good agreement about
whether we should do this or not.

It may seem stupid and pointless to just do nothing, but at this time
I think that's still the best thing to do.

Therefore, I'm not applying your patch, sorry.

^ permalink raw reply

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
From: David Miller @ 2010-04-15  6:52 UTC (permalink / raw)
  To: eric.dumazet; +Cc: krkumar2, netdev, nuclearcat
In-Reply-To: <1271056697.16881.7.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 12 Apr 2010 09:18:17 +0200

> [PATCH] net: dev_pick_tx() fix
> 
> When dev_pick_tx() caches tx queue_index on a socket, we must check
> socket dst_entry matches skb one, or risk a crash later, as reported by
> Denys Fedorysychenko, if old packets are in flight during a route
> change, involving devices with different number of queues.
> 
> Bug introduced by commit a4ee3ce3
> (net: Use sk_tx_queue_mapping for connected sockets)
> 
> Reported-by: Denys Fedorysychenko <nuclearcat@nuclearcat.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

It looks like Denys is still getting crashes even with this patch
applied.  And I also think there is some meric to some of Krishna's
analysis.

To me it seems to make more sense to validate the SKB's queue against
the real actual choosen device's range.

The socket queue index will catch up and eventually become valid
because the dst reset will invalidate the queue setting, and we'll
thus recompute it as needed, as Krishna stated.

So I'm tossing this patch for now since it doesn't even aparently
fix the bug.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox