Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [linux-2.6.34-rc3] drivers/net: ks8851 MLL ethernet network driver
From: David Miller @ 2010-04-15  6:38 UTC (permalink / raw)
  To: David.Choi; +Cc: netdev

I cannot apply this, there are too many issues:

@@ -378,34 +384,34 @@ union ks_tx_hdr {

 /**
  * struct ks_net - KS8851 driver private data
- * @net_device 	: The network device we're bound to
+ * @net_device	: The network device we're bound to
  * @hw_addr	: start address of data register.

Do not mix lots of coding style and functional changes.

If you want to fix coding style, do it later in a seperate
patch after you implement your functional change.

+	if (!reg_data & CCR_ENDIAN) {

This test will never do what you want it to do.

First the compiler will evaluate "!reg_data" which will be either "0"
or "1", then it will and it with the CCR_ENDIAN mask, which is (1 <<
10).  And this can never be true.

I also do not like all of the endian ifdefs peppered all over the
driver.  Try consolidate this by creating a header file that contains
the structure and macro definitions, and in there create helper inline
functions which do the ifdefs.  This way the ks8851_mll.c file can
stay ifdef clean.

Thanks.

^ permalink raw reply

* Re: [GIT PULL] vhost-net fix for 2.6.34-rc4
From: David Miller @ 2010-04-15  5:53 UTC (permalink / raw)
  To: mst; +Cc: netdev, hch
In-Reply-To: <20100415040557.GA12108@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Thu, 15 Apr 2010 07:05:57 +0300

> On Wed, Apr 14, 2010 at 03:57:15PM -0700, David Miller wrote:
>> From: "Michael S. Tsirkin" <mst@redhat.com>
>> Date: Wed, 14 Apr 2010 17:17:33 +0300
>> 
>> > Christoph Hellwig (1):
>> >       vhost: fix sparse warnings
>> 
>> Do the sparse warnings show actual real bugs, or are we just
>> adding annotations to make the tool happy?
>> 
>> Unless the bugs being fixed are earth shattering, this belongs
>> in net-next-2.6 not net-2.6
> 
> No, these are just annotations to make the tool happy.
> Pls queue for net-next-2.6. Thanks!

Done, thanks Michael.

^ permalink raw reply

* RE: [PATCH] Infiniband: Randomize local port allocation.
From: Tetsuo Handa @ 2010-04-15  5:46 UTC (permalink / raw)
  To: sean.hefty; +Cc: netdev, rolandd, amwang
In-Reply-To: <195EB1AD388F455AA386EB214FCD09F4@amr.corp.intel.com>

Tetsuo Handa wrote:
> Sean Hefty wrote:
> > >+	remaining = (high - low) + 1;
> > >+	rover = net_random() % remaining + low;
> > >+	do {
> > >+		rover++;
> > >+		if ((rover < low) || (rover > high))
> > >+			rover = low;
> > 
> > Assuming that we're likely to pick a valid port on the first try, it would be
> > more efficient to move the above 3 lines to the end of the while loop.
> > 
> Indeed. I moved these lines to "if (--remaining) { ... }" block.

Oh, I made more changes than needed.
Adding "goto start;" and "start:" are sufficient.
Please use whichever you like.
--------------------
[PATCH] Infiniband: Randomize local port allocation.

Randomize local port allocation in a way sctp_get_port_local() does.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/infiniband/core/cma.c |   71 +++++++++++++++---------------------------
 1 file changed, 26 insertions(+), 45 deletions(-)

--- linux-2.6.34-rc4.orig/drivers/infiniband/core/cma.c
+++ linux-2.6.34-rc4/drivers/infiniband/core/cma.c
@@ -79,7 +79,6 @@ static DEFINE_IDR(sdp_ps);
 static DEFINE_IDR(tcp_ps);
 static DEFINE_IDR(udp_ps);
 static DEFINE_IDR(ipoib_ps);
-static int next_port;
 
 struct cma_device {
 	struct list_head	list;
@@ -1970,47 +1969,34 @@ err1:
 
 static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
 {
-	struct rdma_bind_list *bind_list;
-	int port, ret, low, high;
-
-	bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
-	if (!bind_list)
-		return -ENOMEM;
-
-retry:
-	/* FIXME: add proper port randomization per like inet_csk_get_port */
-	do {
-		ret = idr_get_new_above(ps, bind_list, next_port, &port);
-	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
-
-	if (ret)
-		goto err1;
+	static unsigned int last_used_port;
+	int low, high, remaining;
+	unsigned int rover;
 
 	inet_get_local_port_range(&low, &high);
-	if (port > high) {
-		if (next_port != low) {
-			idr_remove(ps, port);
-			next_port = low;
-			goto retry;
+	remaining = (high - low) + 1;
+	rover = net_random() % remaining + low;
+	goto start;
+	do {
+		rover++;
+		if ((rover < low) || (rover > high))
+			rover = low;
+start:
+		if (last_used_port != rover &&
+		    !idr_find(ps, (unsigned short) rover)) {
+			int ret = cma_alloc_port(ps, id_priv, rover);
+			/*
+			 * Remember previously used port number in order to
+			 * avoid re-using same port immediately after it is
+			 * closed.
+			 */
+			if (!ret)
+				last_used_port = rover;
+			if (ret != -EADDRNOTAVAIL)
+				return ret;
 		}
-		ret = -EADDRNOTAVAIL;
-		goto err2;
-	}
-
-	if (port == high)
-		next_port = low;
-	else
-		next_port = port + 1;
-
-	bind_list->ps = ps;
-	bind_list->port = (unsigned short) port;
-	cma_bind_port(bind_list, id_priv);
-	return 0;
-err2:
-	idr_remove(ps, port);
-err1:
-	kfree(bind_list);
-	return ret;
+	} while (--remaining);
+	return -EADDRNOTAVAIL;
 }
 
 static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
@@ -2995,12 +2981,7 @@ static void cma_remove_one(struct ib_dev
 
 static int __init cma_init(void)
 {
-	int ret, low, high, remaining;
-
-	get_random_bytes(&next_port, sizeof next_port);
-	inet_get_local_port_range(&low, &high);
-	remaining = (high - low) + 1;
-	next_port = ((unsigned int) next_port % remaining) + low;
+	int ret;
 
 	cma_wq = create_singlethread_workqueue("rdma_cm");
 	if (!cma_wq)

^ permalink raw reply

* [PATCH v2] net: fix softnet_stat
From: Changli Gao @ 2010-04-15  5:30 UTC (permalink / raw)
  To: David S. Miller; +Cc: Tom Herbert, Eric Dumazet, netdev, Changli Gao

fix softnet_stat

Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context
without any protection. And enqueue_to_backlog should update the netdev_rx_stat
of the target CPU.

This patch splits softnet_data.total into softnet_data.received and
softnet_data.dropped internally, in order to avoid it sharing between IRQ and
SoftIRQ. The old ABI softnet_data.total is kept by summing softnet_data.received
and softnet_data.dropped when exporting it to userland. And softnet_data.dropped
is protected in the corresponding input_pkt_queue.lock, if RPS is enabled.

This patch also fixed a bug: packets received by enqueue_to_backlog() were
counted twice: one in enqueue_to_backlog(), and the other in
__netif_receive_skb(), although they maybe not on the same CPUs, if RPS is
involved.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/linux/netdevice.h |    2 +-
 net/core/dev.c            |    7 +++----
 2 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d1a21b5..394e850 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -219,7 +219,7 @@ struct neigh_parms;
 struct sk_buff;

 struct netif_rx_stats {
-	unsigned total;
+	unsigned received;
 	unsigned dropped;
 	unsigned time_squeeze;
 	unsigned cpu_collision;
diff --git a/net/core/dev.c b/net/core/dev.c
index a10a216..5817f0e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2336,7 +2336,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
 	queue = &per_cpu(softnet_data, cpu);

 	local_irq_save(flags);
-	__get_cpu_var(netdev_rx_stat).total++;

 	rps_lock(queue);
 	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
@@ -2366,9 +2365,9 @@ enqueue:
 		goto enqueue;
 	}

+	per_cpu(netdev_rx_stat, cpu).dropped++;
 	rps_unlock(queue);

-	__get_cpu_var(netdev_rx_stat).dropped++;
 	local_irq_restore(flags);

 	kfree_skb(skb);
@@ -2679,7 +2678,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
 			skb->dev = master;
 	}

-	__get_cpu_var(netdev_rx_stat).total++;
+	__get_cpu_var(netdev_rx_stat).received++;

 	skb_reset_network_header(skb);
 	skb_reset_transport_header(skb);
@@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
 	struct netif_rx_stats *s = v;

 	seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
-		   s->total, s->dropped, s->time_squeeze, 0,
+		   s->received + s->dropped, s->dropped, s->time_squeeze, 0,
 		   0, 0, 0, 0, /* was fastroute */
 		   s->cpu_collision, s->received_rps);
 	return 0;

^ permalink raw reply related

* Re: [PATCH] net: fix softnet_stat
From: Changli Gao @ 2010-04-15  4:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Tom Herbert, netdev
In-Reply-To: <1271304846.16881.1881.camel@edumazet-laptop>

On Thu, Apr 15, 2010 at 12:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 15 avril 2010 à 05:40 +0200, Eric Dumazet a écrit :
>
> Oh well, I revert this comment, we need a mutual exclusion between cpus
> if we dont want to miss some increments, even in a slow path.
>
> __get_cpu_var(netdev_rx_stat).field++; is safe because updated only by
> this cpu.
>
> Once you use per_cpu(netdev_rx_stat, cpu).field++; you need a
> synchronization since this is not atomic.
>

rps_unlock() is a synchronization if needed.

> so queue->dropped++ is safer and faster too (inside the rps_lock
> section)
>

It involves more lines of code, moving netdev_rx_stat into softdata.

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH] net: fix softnet_stat
From: Eric Dumazet @ 2010-04-15  4:14 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, Tom Herbert, netdev
In-Reply-To: <1271302846.16881.1828.camel@edumazet-laptop>

Le jeudi 15 avril 2010 à 05:40 +0200, Eric Dumazet a écrit :
> Le jeudi 15 avril 2010 à 05:28 +0200, Eric Dumazet a écrit :
> > >  
> > > +	per_cpu(netdev_rx_stat, cpu).dropped++;
> > 
> > This is a bit expensive, and could be a queue->rx_stat.dropped++
> > if netdev_rx_stat is moved inside queue structure.
> 
> Hmm, this is not really needed, this event is the slow path and should
> almost never happen. This part of your fix is fine.
> 
> 
> -

Oh well, I revert this comment, we need a mutual exclusion between cpus
if we dont want to miss some increments, even in a slow path.

__get_cpu_var(netdev_rx_stat).field++; is safe because updated only by
this cpu.

Once you use per_cpu(netdev_rx_stat, cpu).field++; you need a
synchronization since this is not atomic.

so queue->dropped++ is safer and faster too (inside the rps_lock
section)

^ permalink raw reply

* Re: [GIT PULL] vhost-net fix for 2.6.34-rc4
From: Michael S. Tsirkin @ 2010-04-15  4:05 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, hch
In-Reply-To: <20100414.155715.210946931.davem@davemloft.net>

On Wed, Apr 14, 2010 at 03:57:15PM -0700, David Miller wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Wed, 14 Apr 2010 17:17:33 +0300
> 
> > Christoph Hellwig (1):
> >       vhost: fix sparse warnings
> 
> Do the sparse warnings show actual real bugs, or are we just
> adding annotations to make the tool happy?
> 
> Unless the bugs being fixed are earth shattering, this belongs
> in net-next-2.6 not net-2.6

No, these are just annotations to make the tool happy.
Pls queue for net-next-2.6. Thanks!

-- 
MST

^ permalink raw reply

* Re: [PATCH] net: fix softnet_stat
From: Eric Dumazet @ 2010-04-15  4:03 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, Tom Herbert, netdev
In-Reply-To: <h2q412e6f7f1004142041ld9c760aw5dfd192da506e92c@mail.gmail.com>

Le jeudi 15 avril 2010 à 11:41 +0800, Changli Gao a écrit :
> On Thu, Apr 15, 2010 at 11:28 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> ----
> >>  net/core/dev.c |    5 ++---
> >>  1 file changed, 2 insertions(+), 3 deletions(-)
> >> diff --git a/net/core/dev.c b/net/core/dev.c
> >> index a10a216..b38a991 100644
> >> --- a/net/core/dev.c
> >> +++ b/net/core/dev.c
> >> @@ -2336,7 +2336,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
> >>       queue = &per_cpu(softnet_data, cpu);
> >>
> >>       local_irq_save(flags);
> >> -     __get_cpu_var(netdev_rx_stat).total++;
> >
> > I think this was nice, because we can see number of frames received by
> > this cpu, before giving them to another cpu.
> >
> 
> :( Packets will be counted again in __net_receive_skb(). Now the
> meaning of total is confusing.

Yes, this should be documented somewhere :)

> 
> > Maybe we want a new counter ?
> >
> >        __get_cpu_var(netdev_rx_stat).inpackets++;
> >
> 
> How about replacing total with received?
> 
> >>
> >>       rps_lock(queue);
> >>       if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
> >> @@ -2366,9 +2365,9 @@ enqueue:
> >>               goto enqueue;
> >>       }
> >>
> >> +     per_cpu(netdev_rx_stat, cpu).dropped++;
> >
> > This is a bit expensive, and could be a queue->rx_stat.dropped++
> > if netdev_rx_stat is moved inside queue structure.
> >
> 
> It will make softnet_data larger.?

We move a per_cpu struture inside another one. No extra bytes needed.

> 
> >
> >>       rps_unlock(queue);
> >>
> >> -     __get_cpu_var(netdev_rx_stat).dropped++;
> >>       local_irq_restore(flags);
> >>
> >>       kfree_skb(skb);
> >> @@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
> >>       struct netif_rx_stats *s = v;
> >>
> >>       seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
> >> -                s->total, s->dropped, s->time_squeeze, 0,
> >> +                s->total + s->dropped, s->dropped, s->time_squeeze, 0,
> >
> > This is wrong I believe... I prefer to add a new column, s->inpackets ?
> 
> If I rename total to received, is it still confusing?

Keep in mind /proc/net/softnet_stat is an ABI, and first column had the
meaning : number of packets processed by this cpu.

But this was pre RPS time. Now we have a two phases process.

This makes sense to :

let 'total' be the counter of packets processed in upper levels (IP/TCP
stack). As /proc/net/softnet_stat has no header, the 'total' name is not
important and could be changed.

add a new column (after the existing ones in /proc file to keep ABI
compatibility) with the meaning : number of packets pre-processed before
network stacks.

total : number of packets processed in upper layers (IP stack for
example)
dropped: number of packets dropped because this cpu queue was full
time_squeeze: number of time squeeze of this cpu
cpu_collision:...
received_rps: number of time this cpu received a RPS signal from another
cpu
inpackets : number of packets processed in layer 1 by this cpu (RPS
level)

^ permalink raw reply

* Re: [PATCH] net: fix softnet_stat
From: Changli Gao @ 2010-04-15  4:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Tom Herbert, netdev
In-Reply-To: <1271302133.16881.1814.camel@edumazet-laptop>

On Thu, Apr 15, 2010 at 11:28 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> I think this was nice, because we can see number of frames received by
> this cpu, before giving them to another cpu.
>
> Maybe we want a new counter ?
>
>        __get_cpu_var(netdev_rx_stat).inpackets++;
>
>
>>
>>       rps_lock(queue);
>>       if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
>> @@ -2366,9 +2365,9 @@ enqueue:
>>               goto enqueue;
>>       }
>>
>> +     per_cpu(netdev_rx_stat, cpu).dropped++;
>
> This is a bit expensive, and could be a queue->rx_stat.dropped++
> if netdev_rx_stat is moved inside queue structure.
>
>
>>       rps_unlock(queue);
>>
>> -     __get_cpu_var(netdev_rx_stat).dropped++;
>>       local_irq_restore(flags);
>>
>>       kfree_skb(skb);
>> @@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
>>       struct netif_rx_stats *s = v;
>>
>>       seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
>> -                s->total, s->dropped, s->time_squeeze, 0,
>> +                s->total + s->dropped, s->dropped, s->time_squeeze, 0,
>
> This is wrong I believe... I prefer to add a new column, s->inpackets ?
>

the first column total should be the number of the packets, which are
processed by this CPU before RPS involved. If RPS is enabled, it
should keep the old meaning. If new statistics data is needed by RPS,
we would add it in another patch.


-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH] net: fix softnet_stat
From: Changli Gao @ 2010-04-15  3:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Tom Herbert, netdev
In-Reply-To: <1271302846.16881.1828.camel@edumazet-laptop>

On Thu, Apr 15, 2010 at 11:40 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 15 avril 2010 à 05:28 +0200, Eric Dumazet a écrit :
>> >
>> > +   per_cpu(netdev_rx_stat, cpu).dropped++;
>>
>> This is a bit expensive, and could be a queue->rx_stat.dropped++
>> if netdev_rx_stat is moved inside queue structure.
>
> Hmm, this is not really needed, this event is the slow path and should
> almost never happen. This part of your fix is fine.
>

It there any document about /proc/net/softnet_stat?

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH] net: fix softnet_stat
From: Changli Gao @ 2010-04-15  3:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Tom Herbert, netdev
In-Reply-To: <1271302133.16881.1814.camel@edumazet-laptop>

On Thu, Apr 15, 2010 at 11:28 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> ----
>>  net/core/dev.c |    5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index a10a216..b38a991 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -2336,7 +2336,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
>>       queue = &per_cpu(softnet_data, cpu);
>>
>>       local_irq_save(flags);
>> -     __get_cpu_var(netdev_rx_stat).total++;
>
> I think this was nice, because we can see number of frames received by
> this cpu, before giving them to another cpu.
>

:( Packets will be counted again in __net_receive_skb(). Now the
meaning of total is confusing.

> Maybe we want a new counter ?
>
>        __get_cpu_var(netdev_rx_stat).inpackets++;
>

How about replacing total with received?

>>
>>       rps_lock(queue);
>>       if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
>> @@ -2366,9 +2365,9 @@ enqueue:
>>               goto enqueue;
>>       }
>>
>> +     per_cpu(netdev_rx_stat, cpu).dropped++;
>
> This is a bit expensive, and could be a queue->rx_stat.dropped++
> if netdev_rx_stat is moved inside queue structure.
>

It will make softnet_data larger.?

>
>>       rps_unlock(queue);
>>
>> -     __get_cpu_var(netdev_rx_stat).dropped++;
>>       local_irq_restore(flags);
>>
>>       kfree_skb(skb);
>> @@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
>>       struct netif_rx_stats *s = v;
>>
>>       seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
>> -                s->total, s->dropped, s->time_squeeze, 0,
>> +                s->total + s->dropped, s->dropped, s->time_squeeze, 0,
>
> This is wrong I believe... I prefer to add a new column, s->inpackets ?

If I rename total to received, is it still confusing?



-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH] net: fix softnet_stat
From: Eric Dumazet @ 2010-04-15  3:40 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, Tom Herbert, netdev
In-Reply-To: <1271302133.16881.1814.camel@edumazet-laptop>

Le jeudi 15 avril 2010 à 05:28 +0200, Eric Dumazet a écrit :
> >  
> > +	per_cpu(netdev_rx_stat, cpu).dropped++;
> 
> This is a bit expensive, and could be a queue->rx_stat.dropped++
> if netdev_rx_stat is moved inside queue structure.

Hmm, this is not really needed, this event is the slow path and should
almost never happen. This part of your fix is fine.

^ permalink raw reply

* Re: [PATCH] net: fix softnet_stat
From: Eric Dumazet @ 2010-04-15  3:28 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, Tom Herbert, netdev
In-Reply-To: <1271300513-9995-1-git-send-email-xiaosuo@gmail.com>

Le jeudi 15 avril 2010 à 11:01 +0800, Changli Gao a écrit :
> fix softnet_stat
> 
> Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context
> without any protection. And enqueue_to_backlog should update the netdev_rx_stat
> of the target CPU.
> 
> This patch changes the meaning of softnet_data.total to the number of packets
> processed, not includes dropped, in order to avoid it sharing between IRQ and
> SoftIRQ. And softnet_data.dropped is protected in the corresponding
> input_pkt_queue.lock, if RPS is enabled.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ----
>  net/core/dev.c |    5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> diff --git a/net/core/dev.c b/net/core/dev.c
> index a10a216..b38a991 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2336,7 +2336,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
>  	queue = &per_cpu(softnet_data, cpu);
>  
>  	local_irq_save(flags);
> -	__get_cpu_var(netdev_rx_stat).total++;

I think this was nice, because we can see number of frames received by
this cpu, before giving them to another cpu.

Maybe we want a new counter ?

	__get_cpu_var(netdev_rx_stat).inpackets++;


>  
>  	rps_lock(queue);
>  	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
> @@ -2366,9 +2365,9 @@ enqueue:
>  		goto enqueue;
>  	}
>  
> +	per_cpu(netdev_rx_stat, cpu).dropped++;

This is a bit expensive, and could be a queue->rx_stat.dropped++
if netdev_rx_stat is moved inside queue structure.


>  	rps_unlock(queue);
>  
> -	__get_cpu_var(netdev_rx_stat).dropped++;
>  	local_irq_restore(flags);
>  
>  	kfree_skb(skb);
> @@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
>  	struct netif_rx_stats *s = v;
>  
>  	seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
> -		   s->total, s->dropped, s->time_squeeze, 0,
> +		   s->total + s->dropped, s->dropped, s->time_squeeze, 0,

This is wrong I believe... I prefer to add a new column, s->inpackets ?

>  		   0, 0, 0, 0, /* was fastroute */
>  		   s->cpu_collision, s->received_rps);
>  	return 0;
> --




^ permalink raw reply

* Re: [PATCH] net: fix softnet_stat
From: Changli Gao @ 2010-04-15  3:08 UTC (permalink / raw)
  To: David S. Miller; +Cc: Tom Herbert, Eric Dumazet, netdev, Changli Gao
In-Reply-To: <1271300513-9995-1-git-send-email-xiaosuo@gmail.com>

On Thu, Apr 15, 2010 at 11:01 AM, Changli Gao <xiaosuo@gmail.com> wrote:
> fix softnet_stat
>
> Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context
> without any protection. And enqueue_to_backlog should update the netdev_rx_stat
> of the target CPU.
>
> This patch changes the meaning of softnet_data.total to the number of packets
> processed, not includes dropped, in order to avoid it sharing between IRQ and
> SoftIRQ. And softnet_data.dropped is protected in the corresponding
> input_pkt_queue.lock, if RPS is enabled.
>

I forget a comment:

This patch also fixed a bug: packets received by enqueue_to_backlog()
was counted twice: one in enqueue_to_backlog(), and one in
__netif_receive_skb(), although they may not on the same CPUs.

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* [PATCH] net: fix softnet_stat
From: Changli Gao @ 2010-04-15  3:01 UTC (permalink / raw)
  To: David S. Miller; +Cc: Tom Herbert, Eric Dumazet, netdev, Changli Gao

fix softnet_stat

Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context
without any protection. And enqueue_to_backlog should update the netdev_rx_stat
of the target CPU.

This patch changes the meaning of softnet_data.total to the number of packets
processed, not includes dropped, in order to avoid it sharing between IRQ and
SoftIRQ. And softnet_data.dropped is protected in the corresponding
input_pkt_queue.lock, if RPS is enabled.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 net/core/dev.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index a10a216..b38a991 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2336,7 +2336,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
 	queue = &per_cpu(softnet_data, cpu);

 	local_irq_save(flags);
-	__get_cpu_var(netdev_rx_stat).total++;

 	rps_lock(queue);
 	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
@@ -2366,9 +2365,9 @@ enqueue:
 		goto enqueue;
 	}

+	per_cpu(netdev_rx_stat, cpu).dropped++;
 	rps_unlock(queue);

-	__get_cpu_var(netdev_rx_stat).dropped++;
 	local_irq_restore(flags);

 	kfree_skb(skb);
@@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
 	struct netif_rx_stats *s = v;

 	seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
-		   s->total, s->dropped, s->time_squeeze, 0,
+		   s->total + s->dropped, s->dropped, s->time_squeeze, 0,
 		   0, 0, 0, 0, /* was fastroute */
 		   s->cpu_collision, s->received_rps);
 	return 0;

^ permalink raw reply related

* Re: rps perfomance WAS(Re: rps: question
From: Changli Gao @ 2010-04-15  2:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stephen Hemminger, Tom Herbert, hadi, netdev, robert,
	David Miller, Andi Kleen
In-Reply-To: <1271299206.16881.1764.camel@edumazet-laptop>

On Thu, Apr 15, 2010 at 10:40 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Agree 100%, and irqbalance is the existing daemon. It should be used and
> changed if necessary.
>
> Changli, my stronges argument about your patches is that our scheduler
> and memory affinity api (numactl driven) is bitmask oriented, giving the
> same weight to individual cpu or individual memory node.
>

It works with the assumption: the workloads handled in non-schedulable
context are less than the others. If most of work is done in
non-schedulable(softirq) context, scheduler can't keep load balance.


-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH net-next-2.6] drivers: net: use skb_headlen()
From: Eric Dumazet @ 2010-04-15  2:44 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100414.161504.149014877.davem@davemloft.net>

Le mercredi 14 avril 2010 à 16:15 -0700, David Miller a écrit :

> Eric, there were several typos in this patch and it is clear
> that even e1000 wasn't build tested.
> 
> You used the mystical "skb_headsize()" instead of "skb_headlen()"
> in several locations.
> 
> I fixed this up, but please build test your changes no matter
> how seemingly trivial next time.
> 

Oh well, I was sure I build tested it on a modallconfig, but now I
realize it was on a custom config, sorry David, thanks !



^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: Eric Dumazet @ 2010-04-15  2:40 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Changli Gao, Tom Herbert, hadi, netdev, robert, David Miller,
	Andi Kleen
In-Reply-To: <20100414160220.43a62999@nehalam>

Le mercredi 14 avril 2010 à 16:02 -0700, Stephen Hemminger a écrit :
> On Thu, 15 Apr 2010 06:51:29 +0800
> Changli Gao <xiaosuo@gmail.com> wrote:
> 
> > On Thu, Apr 15, 2010 at 4:57 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > >
> > > RPS can be tuned (Changli wants a finer tuning...), it would be
> > > intereting to tune multiqueue devices too. I dont know if its possible
> > > right now.
> > >
> > > On my Nehalem machine (16 logical cpus), its NetXtreme II BCM57711E
> > > 10Gigabit has 16 queues. It might be good to use less queues according
> > > to your results on some workloads, and eventually use RPS on a second
> > > layering.
> > 
> > My idear is: run a daemon in userland to monitor the softnet
> > statistics, and tun the RPS setting if necessary. It seems that the
> > current softnet statistics data isn't correct.
> > 
> > Long time ago, I did a test, and the conclution was
> > call_function_single IPI was more expensive than resched IPI, so I
> > moved to kernel thread from softirq for packet processing. I'll redo
> > the test later.
> > 
> 
> The big thing is data, data, data... Performance can only be examined
> with real hard data with multiple different kind of hardware.  Also, check for
> regressions in lmbench and TPC benchmarks. Yes this is hard, but papers
> on this would allow for rational rather than speculative choices.
> 
> Adding more tuning knobs is not the answer unless you can show when
> the tuning helps.
> 

Agree 100%, and irqbalance is the existing daemon. It should be used and
changed if necessary.

Changli, my stronges argument about your patches is that our scheduler
and memory affinity api (numactl driven) is bitmask oriented, giving the
same weight to individual cpu or individual memory node.




^ permalink raw reply

* RE: [PATCH] Infiniband: Randomize local port allocation.
From: Tetsuo Handa @ 2010-04-15  2:29 UTC (permalink / raw)
  To: sean.hefty
  Cc: amwang, opurdila, eric.dumazet, netdev, nhorman, davem, ebiederm,
	linux-kernel, rolandd
In-Reply-To: <195EB1AD388F455AA386EB214FCD09F4@amr.corp.intel.com>

Sean Hefty wrote:
> >+	remaining = (high - low) + 1;
> >+	rover = net_random() % remaining + low;
> >+	do {
> >+		rover++;
> >+		if ((rover < low) || (rover > high))
> >+			rover = low;
> 
> Assuming that we're likely to pick a valid port on the first try, it would be
> more efficient to move the above 3 lines to the end of the while loop.
> 
Indeed. I moved these lines to "if (--remaining) { ... }" block.
--------------------
[PATCH] Infiniband: Randomize local port allocation.

Randomize local port allocation in a way sctp_get_port_local() does.
Update rover at the end of loop since we're likely to pick a valid port
on the first try.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/infiniband/core/cma.c |   70 +++++++++++++++---------------------------
 1 file changed, 25 insertions(+), 45 deletions(-)

--- linux-2.6.34-rc4.orig/drivers/infiniband/core/cma.c
+++ linux-2.6.34-rc4/drivers/infiniband/core/cma.c
@@ -79,7 +79,6 @@ static DEFINE_IDR(sdp_ps);
 static DEFINE_IDR(tcp_ps);
 static DEFINE_IDR(udp_ps);
 static DEFINE_IDR(ipoib_ps);
-static int next_port;
 
 struct cma_device {
 	struct list_head	list;
@@ -1970,47 +1969,33 @@ err1:
 
 static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
 {
-	struct rdma_bind_list *bind_list;
-	int port, ret, low, high;
-
-	bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
-	if (!bind_list)
-		return -ENOMEM;
-
-retry:
-	/* FIXME: add proper port randomization per like inet_csk_get_port */
-	do {
-		ret = idr_get_new_above(ps, bind_list, next_port, &port);
-	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
-
-	if (ret)
-		goto err1;
+	static unsigned int last_used_port;
+	int low, high, remaining;
+	unsigned int rover;
 
 	inet_get_local_port_range(&low, &high);
-	if (port > high) {
-		if (next_port != low) {
-			idr_remove(ps, port);
-			next_port = low;
-			goto retry;
-		}
-		ret = -EADDRNOTAVAIL;
-		goto err2;
+	remaining = (high - low) + 1;
+	rover = net_random() % remaining + low;
+retry:
+	if (last_used_port != rover &&
+	    !idr_find(ps, (unsigned short) rover)) {
+		int ret = cma_alloc_port(ps, id_priv, rover);
+		/*
+		 * Remember previously used port number in order to avoid
+		 * re-using same port immediately after it is closed.
+		 */
+		if (!ret)
+			last_used_port = rover;
+		if (ret != -EADDRNOTAVAIL)
+			return ret;
 	}
-
-	if (port == high)
-		next_port = low;
-	else
-		next_port = port + 1;
-
-	bind_list->ps = ps;
-	bind_list->port = (unsigned short) port;
-	cma_bind_port(bind_list, id_priv);
-	return 0;
-err2:
-	idr_remove(ps, port);
-err1:
-	kfree(bind_list);
-	return ret;
+	if (--remaining) {
+		rover++;
+		if ((rover < low) || (rover > high))
+			rover = low;
+		goto retry;
+	}
+	return -EADDRNOTAVAIL;
 }
 
 static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
@@ -2995,12 +2980,7 @@ static void cma_remove_one(struct ib_dev
 
 static int __init cma_init(void)
 {
-	int ret, low, high, remaining;
-
-	get_random_bytes(&next_port, sizeof next_port);
-	inet_get_local_port_range(&low, &high);
-	remaining = (high - low) + 1;
-	next_port = ((unsigned int) next_port % remaining) + low;
+	int ret;
 
 	cma_wq = create_singlethread_workqueue("rdma_cm");
 	if (!cma_wq)

^ permalink raw reply

* [PATCH] WAN: flush tx_queue in hdlc_ppp to prevent panic on rmmod hw_driver.
From: Krzysztof Halasa @ 2010-04-15  0:09 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

tx_queue is used as a temporary queue when not allowed to queue skb
directly to the hw device driver (which may sleep). Most paths flush
it before returning, but ppp_start() currently cannot. Make sure we
don't leave skbs pointing to a non-existent device.

Thanks to Michael Barkowski for reporting this problem.

Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>

diff --git a/drivers/net/wan/hdlc_ppp.c b/drivers/net/wan/hdlc_ppp.c
index b9b9d6b..941f053 100644
--- a/drivers/net/wan/hdlc_ppp.c
+++ b/drivers/net/wan/hdlc_ppp.c
@@ -628,9 +628,15 @@ static void ppp_stop(struct net_device *dev)
 	ppp_cp_event(dev, PID_LCP, STOP, 0, 0, 0, NULL);
 }

+static void ppp_close(struct net_device *dev)
+{
+	ppp_tx_flush();
+}
+
 static struct hdlc_proto proto = {
 	.start		= ppp_start,
 	.stop		= ppp_stop,
+	.close		= ppp_close,
 	.type_trans	= ppp_type_trans,
 	.ioctl		= ppp_ioctl,
 	.netif_rx	= ppp_rx,

^ permalink raw reply related

* Re: hdlc_ppp: why no detach()?
From: Krzysztof Halasa @ 2010-04-15  0:02 UTC (permalink / raw)
  To: Michael Barkowski
  Cc: David S. Miller, Julia Lawall, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <m3633todl2.fsf@intrepid.localdomain>

> 1. The SCR packet will be delayed by 2 seconds (both first and second
>    SCR will be sent the same time). Perhaps we delay only a little
>    (instead of full 2 seconds) and only then send the initial packet.

BTW can I sleep in a netdevice_notifier callback, no?
-- 
Krzysztof Halasa

^ permalink raw reply

* RE: [PATCH] Infiniband: Randomize local port allocation.
From: Sean Hefty @ 2010-04-15  0:01 UTC (permalink / raw)
  To: penguin-kernel, rolandd
  Cc: amwang, opurdila, eric.dumazet, netdev, nhorman, davem, ebiederm,
	linux-kernel
In-Reply-To: <201004140201.o3E21Aqn075978@www262.sakura.ne.jp>

>[PATCH] Infiniband: Randomize local port allocation.
>
>Randomize local port allocation in a way sctp_get_port_local() does.
>
>Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

Thanks for fixing this long outstanding issue.  :)  The latest patch looks
correct and passed some simple tests that I ran against it.  One comment below,
which I didn't catch before:

>---
> drivers/infiniband/core/cma.c |   69 ++++++++++++++---------------------------
>-
> 1 file changed, 24 insertions(+), 45 deletions(-)
>
>--- linux-2.6.34-rc4.orig/drivers/infiniband/core/cma.c
>+++ linux-2.6.34-rc4/drivers/infiniband/core/cma.c
>@@ -79,7 +79,6 @@ static DEFINE_IDR(sdp_ps);
> static DEFINE_IDR(tcp_ps);
> static DEFINE_IDR(udp_ps);
> static DEFINE_IDR(ipoib_ps);
>-static int next_port;
>
> struct cma_device {
> 	struct list_head	list;
>@@ -1970,47 +1969,32 @@ err1:
>
> static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
> {
>-	struct rdma_bind_list *bind_list;
>-	int port, ret, low, high;
>-
>-	bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
>-	if (!bind_list)
>-		return -ENOMEM;
>-
>-retry:
>-	/* FIXME: add proper port randomization per like inet_csk_get_port */
>-	do {
>-		ret = idr_get_new_above(ps, bind_list, next_port, &port);
>-	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
>-
>-	if (ret)
>-		goto err1;
>+	static unsigned int last_used_port;
>+	int low, high, remaining;
>+	unsigned int rover;
>
> 	inet_get_local_port_range(&low, &high);
>-	if (port > high) {
>-		if (next_port != low) {
>-			idr_remove(ps, port);
>-			next_port = low;
>-			goto retry;
>+	remaining = (high - low) + 1;
>+	rover = net_random() % remaining + low;
>+	do {
>+		rover++;
>+		if ((rover < low) || (rover > high))
>+			rover = low;

Assuming that we're likely to pick a valid port on the first try, it would be
more efficient to move the above 3 lines to the end of the while loop.

>+		if (last_used_port != rover &&
>+		    !idr_find(ps, (unsigned short) rover)) {
>+			int ret = cma_alloc_port(ps, id_priv, rover);
>+			/*
>+			 * Remember previously used port number in order to
>+			 * avoid re-using same port immediately after it is
>+			 * closed.
>+			 */
>+			if (!ret)
>+				last_used_port = rover;
>+			if (ret != -EADDRNOTAVAIL)
>+				return ret;
> 		}
>-		ret = -EADDRNOTAVAIL;
>-		goto err2;
>-	}
>-
>-	if (port == high)
>-		next_port = low;
>-	else
>-		next_port = port + 1;
>-
>-	bind_list->ps = ps;
>-	bind_list->port = (unsigned short) port;
>-	cma_bind_port(bind_list, id_priv);
>-	return 0;
>-err2:
>-	idr_remove(ps, port);
>-err1:
>-	kfree(bind_list);
>-	return ret;
>+	} while (--remaining > 0);
>+	return -EADDRNOTAVAIL;
> }

^ permalink raw reply

* Re: [PATCH net-next-2.6] drivers: net: use skb_headlen()
From: David Miller @ 2010-04-14 23:15 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <20100414.155949.210580528.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Wed, 14 Apr 2010 15:59:49 -0700 (PDT)

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 13 Apr 2010 22:48:20 +0200
> 
>> replaces (skb->len - skb->data_len) occurrences by skb_headlen(skb)
>> 
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Applied, thanks Eric.

Eric, there were several typos in this patch and it is clear
that even e1000 wasn't build tested.

You used the mystical "skb_headsize()" instead of "skb_headlen()"
in several locations.

I fixed this up, but please build test your changes no matter
how seemingly trivial next time.

Thanks.

^ permalink raw reply

* Re: [PATCH v3] net: batch skb dequeueing from softnet input_pkt_queue
From: Changli Gao @ 2010-04-14 23:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev
In-Reply-To: <1271258409.16881.1701.camel@edumazet-laptop>

On Wed, Apr 14, 2010 at 11:20 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mercredi 14 avril 2010 à 17:52 +0800, Changli Gao a écrit :
>> batch skb dequeueing from softnet input_pkt_queue
>>
>> batch skb dequeueing from softnet input_pkt_queue to reduce potential lock
>> contention and irq disabling/enabling.
>>
>> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
>
> Adding stop_machine() with no explanation ?

stop_machine() is added to flush the processing_queue. Because the old
flush_backlog() runs in IRQ context, it can't touch the things, which
are only valid in softirq context.

>
> No ack from my previous comments, suggestions, and still same logic ?

In this patch, the volatile variable flush_processing_queue is
removed, and the corresponding lines are removed too.

Oh, I should splice the old message back, as stop_machine is used
instead. So, the processing queue will be removed from softnet_data,
and a single int counter will be used instead to count the packets
which are being processing.

one issue you concern is potential cache miss when summing in enqueue
function. It won't happen all the time, in fact, I think it should
happen seldom, and it is the responsibility of hardware to cache the
data frequently used.

>
> Are we supposed to read patch, test it, make some benches, correct bugs,
> say Amen ?
>

OK, I'll test it.

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [BUG net-next-2.6] fib: Some rcu warning
From: David Miller @ 2010-04-14 23:12 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, paulmck
In-Reply-To: <1271275859.16881.1753.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 14 Apr 2010 22:10:59 +0200

> [PATCH] fib: suppress lockdep-RCU false positive in FIB trie.
> 
> Followup of commit 634a4b20
> 
> Allow tnode_get_child_rcu() to be called either under rcu_read_lock()
> protection or with RTNL held.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Since both your and Paul both posted an identical patch I'll
just add both of your signoffs to this one :-)

Since 634a4b20 is in net-2.6, this should probably go into
net-2.6 as well.

Thanks guys.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox