Netdev List

* Re: rps performance WAS(Re: rps: question
From: jamal @ 2010-04-20 12:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Changli Gao, Rick Jones, David Miller, therbert, netdev, robert,
	andi
In-Reply-To: <1271590476.16881.4925.camel@edumazet-laptop>

folks,

Thanks to everybody (Eric stands out) for your patience.
I ended up mostly validating what's already been said. I have a lot of data
and can describe in detail how I tested etc., but it would require
patience in reading, so I will spare you ;-> If you are interested, let me
know and I will be happy to share.

Summary is:
- rps good: gives higher throughput for apps
- rps not so good: latency is worse, but gets better with a higher input rate
or an increasing number of flows (which translates to higher pps)
- rps works well with newer hardware that has better cache structures.
[Gives great results on my test machine, a single-processor Nehalem with 4
cores, each with two SMT threads, that has a shared L2 between threads and
a shared L3 between cores.]
Your selection of the demux cpu and of where the target cpus are is
an influencing factor in the latency results. If you have a system with
multiple sockets, you should get better numbers if you stay within the
same socket than if you go across sockets.
- rps does a better job at helping schedule apps on the same cpu, thus
localizing the app. The throughput results with rps are very consistent
and better, whereas in the non-rps case variance is _high_.
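
To make the demux/target selection concrete: the target cpus for a receive
queue are given as a hexadecimal bitmask written to
/sys/class/net/<dev>/queues/rx-<n>/rps_cpus. Below is a minimal sketch of
building such a mask. The helper names are made up for illustration, and
which cpu numbers share a core or a socket depends on the machine's
topology, so the cpu list in the usage note is only an assumption.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build a bitmask for a set of target cpus (cpu n -> bit n).
 * Sketch only: handles cpus 0..31 and silently ignores anything else.
 */
unsigned int rps_cpu_mask(const int *cpus, size_t n)
{
	unsigned int mask = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (cpus[i] >= 0 && cpus[i] < 32)
			mask |= 1u << cpus[i];
	return mask;
}

/* Format the mask as the hex string one would write to rps_cpus. */
int rps_format_mask(char *buf, size_t len, unsigned int mask)
{
	return snprintf(buf, len, "%x", mask);
}
```

For example, steering to cpus 1, 2 and 3 while cpu 0 does the demux gives
the mask "e".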

My next step is to do some forwarding tests, probably next week. I am
concerned here because I expect the cache misses to be higher than in the
app scenario (the netdev structure and its attributes could be touched by many
cpus).

cheers,
jamal


* Re: A possible bug in reqsk_queue_hash_req()
From: Eric Dumazet @ 2010-04-20 11:06 UTC (permalink / raw)
  To: Li Yu; +Cc: netdev, linux-kernel
In-Reply-To: <i2x9b948ee41004200335vc229a59cvc0a08c35c949d7dd@mail.gmail.com>

On Tuesday 20 April 2010 at 18:35 +0800, Li Yu wrote:
> Hi,
> 
>      I found a possible bug in reqsk_queue_hash_req(): it seems
> that we should move the "req->dl_next = lopt->syn_table[hash];" statement
> into the write-lock-protected scope that follows it.
> 
>      As far as I can tell from the source code, this function can only be
> called on the rx code path, which is protected by a spin lock over struct
> sock, but its caller (inet_csk_reqsk_queue_hash_add()) is a GPL-exported
> symbol, so I think we had best move this statement into the
> write-lock-protected scope below.
> 
>      Below is a patch illustrating this change; please do not apply it to
> the source code, it's just for show.
> 
>     Thanks.
> 
> Yu
> 
> --- include/net/request_sock.h  2010-04-09 15:27:14.000000000 +0800
> +++ include/net/request_sock.h        2010-04-20 18:11:32.000000000 +0800
> @@ -247,9 +247,9 @@ static inline void reqsk_queue_hash_req(
>         req->expires = jiffies + timeout;
>         req->retrans = 0;
>         req->sk = NULL;
> -       req->dl_next = lopt->syn_table[hash];
> 
>         write_lock(&queue->syn_wait_lock);
> +       req->dl_next = lopt->syn_table[hash];
>         lopt->syn_table[hash] = req;
>         write_unlock(&queue->syn_wait_lock);
>  }

I believe it's not really necessary, because we are the only possible
writer at this stage.

The write_lock() ... write_unlock() is there only to enforce
synchronisation with readers.

All callers of reqsk_queue_hash_req() must have the socket locked.
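
The single-writer argument can be sketched in userspace as follows. This is
only an illustration with a POSIX rwlock standing in for syn_wait_lock; the
struct and function names are hypothetical, and it assumes (as stated above)
that some outer lock already serializes all callers of the add function.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>

struct req {
	int id;
	struct req *dl_next;
};

struct table {
	pthread_rwlock_t lock;	/* plays the role of syn_wait_lock */
	struct req *head;	/* plays the role of one syn_table[] bucket */
};

void table_init(struct table *t)
{
	pthread_rwlock_init(&t->lock, NULL);
	t->head = NULL;
}

/* Assumption: callers are serialized, so there is only one writer. */
void table_add(struct table *t, struct req *r)
{
	/*
	 * Done outside the lock: nobody else can modify t->head,
	 * and readers cannot reach r before it is published.
	 */
	r->dl_next = t->head;

	pthread_rwlock_wrlock(&t->lock);
	t->head = r;		/* publish the new entry to readers */
	pthread_rwlock_unlock(&t->lock);
}

/* Reader side: walks the chain under the read lock. */
int table_count(struct table *t)
{
	struct req *r;
	int n = 0;

	pthread_rwlock_rdlock(&t->lock);
	for (r = t->head; r; r = r->dl_next)
		n++;
	pthread_rwlock_unlock(&t->lock);
	return n;
}
```

If the single-writer assumption did not hold, two writers could both read
the same head before taking the lock and one entry would be lost, which is
the case the proposed change guards against.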





* A possible bug in reqsk_queue_hash_req()
From: Li Yu @ 2010-04-20 10:35 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel

Hi,

     I found a possible bug in reqsk_queue_hash_req(): it seems
that we should move the "req->dl_next = lopt->syn_table[hash];" statement
into the write-lock-protected scope that follows it.

     As far as I can tell from the source code, this function can only be
called on the rx code path, which is protected by a spin lock over struct
sock, but its caller (inet_csk_reqsk_queue_hash_add()) is a GPL-exported
symbol, so I think we had best move this statement into the
write-lock-protected scope below.

     Below is a patch illustrating this change; please do not apply it to
the source code, it's just for show.

    Thanks.

Yu

--- include/net/request_sock.h  2010-04-09 15:27:14.000000000 +0800
+++ include/net/request_sock.h        2010-04-20 18:11:32.000000000 +0800
@@ -247,9 +247,9 @@ static inline void reqsk_queue_hash_req(
        req->expires = jiffies + timeout;
        req->retrans = 0;
        req->sk = NULL;
-       req->dl_next = lopt->syn_table[hash];

        write_lock(&queue->syn_wait_lock);
+       req->dl_next = lopt->syn_table[hash];
        lopt->syn_table[hash] = req;
        write_unlock(&queue->syn_wait_lock);
 }


* [PATCH] NET: Fix an RCU warning in dev_pick_tx()
From: David Howells @ 2010-04-20 10:25 UTC (permalink / raw)
  To: netdev; +Cc: dhowells

Fix the following RCU warning in dev_pick_tx():

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
net/core/dev.c:1993 invoked rcu_dereference_check() without protection!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 0
2 locks held by swapper/0:
 #0:  (&idev->mc_ifc_timer){+.-...}, at: [<ffffffff81039e65>] run_timer_softirq+0x17b/0x278
 #1:  (rcu_read_lock_bh){.+....}, at: [<ffffffff812ea3eb>] dev_queue_xmit+0x14e/0x4dc

stack backtrace:
Pid: 0, comm: swapper Not tainted 2.6.34-rc5-cachefs #4
Call Trace:
 <IRQ>  [<ffffffff810516c4>] lockdep_rcu_dereference+0xaa/0xb2
 [<ffffffff812ea4f6>] dev_queue_xmit+0x259/0x4dc
 [<ffffffff812ea3eb>] ? dev_queue_xmit+0x14e/0x4dc
 [<ffffffff81052324>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff81035362>] ? local_bh_enable_ip+0xbc/0xc1
 [<ffffffff812f0954>] neigh_resolve_output+0x24b/0x27c
 [<ffffffff8134f673>] ip6_output_finish+0x7c/0xb4
 [<ffffffff81350c34>] ip6_output2+0x256/0x261
 [<ffffffff81052324>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff813517fb>] ip6_output+0xbbc/0xbcb
 [<ffffffff8135bc5d>] ? fib6_force_start_gc+0x2b/0x2d
 [<ffffffff81368acb>] mld_sendpack+0x273/0x39d
 [<ffffffff81368858>] ? mld_sendpack+0x0/0x39d
 [<ffffffff81052099>] ? mark_held_locks+0x52/0x70
 [<ffffffff813692fc>] mld_ifc_timer_expire+0x24f/0x288
 [<ffffffff81039ed6>] run_timer_softirq+0x1ec/0x278
 [<ffffffff81039e65>] ? run_timer_softirq+0x17b/0x278
 [<ffffffff813690ad>] ? mld_ifc_timer_expire+0x0/0x288
 [<ffffffff81035531>] ? __do_softirq+0x69/0x140
 [<ffffffff8103556a>] __do_softirq+0xa2/0x140
 [<ffffffff81002e0c>] call_softirq+0x1c/0x28
 [<ffffffff81004b54>] do_softirq+0x38/0x80
 [<ffffffff81034f06>] irq_exit+0x45/0x47
 [<ffffffff810177c3>] smp_apic_timer_interrupt+0x88/0x96
 [<ffffffff810028d3>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff810488dd>] ? __atomic_notifier_call_chain+0x0/0x86
 [<ffffffff810096bf>] ? mwait_idle+0x6e/0x78
 [<ffffffff810096b6>] ? mwait_idle+0x65/0x78
 [<ffffffff810011cb>] cpu_idle+0x4d/0x83
 [<ffffffff81380b05>] rest_init+0xb9/0xc0
 [<ffffffff81380a4c>] ? rest_init+0x0/0xc0
 [<ffffffff8168dcf0>] start_kernel+0x392/0x39d
 [<ffffffff8168d2a3>] x86_64_start_reservations+0xb3/0xb7
 [<ffffffff8168d38b>] x86_64_start_kernel+0xe4/0xeb

An rcu_dereference() should be an rcu_dereference_bh().

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/core/dev.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 92584bf..f769098 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1990,7 +1990,7 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev,
 				queue_index = skb_tx_hash(dev, skb);
 
 			if (sk) {
-				struct dst_entry *dst = rcu_dereference(sk->sk_dst_cache);
+				struct dst_entry *dst = rcu_dereference_bh(sk->sk_dst_cache);
 
 				if (dst && skb_dst(skb) == dst)
 					sk_tx_queue_set(sk, queue_index);



* [PATCH] rps: optimize rps_get_cpu()
From: Changli Gao @ 2010-04-20  9:55 UTC (permalink / raw)
  To: David S. Miller; +Cc: Tom Herbert, Eric Dumazet, netdev, Changli Gao

Optimize get_rps_cpu().

Don't zero-initialize ports when we can read them from the packet, and use
one memory access to load the ports rather than two.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 net/core/dev.c |   24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index b31d5d6..a281727 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2225,7 +2225,11 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 	int cpu = -1;
 	u8 ip_proto;
 	u16 tcpu;
-	u32 addr1, addr2, ports, ihl;
+	u32 addr1, addr2, ihl;
+	union {
+		u32 v32;
+		u16 v16[2];
+	} ports;
 
 	if (skb_rx_queue_recorded(skb)) {
 		u16 index = skb_get_rx_queue(skb);
@@ -2271,7 +2275,6 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 	default:
 		goto done;
 	}
-	ports = 0;
 	switch (ip_proto) {
 	case IPPROTO_TCP:
 	case IPPROTO_UDP:
@@ -2281,25 +2284,20 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 	case IPPROTO_SCTP:
 	case IPPROTO_UDPLITE:
 		if (pskb_may_pull(skb, (ihl * 4) + 4)) {
-			__be16 *hports = (__be16 *) (skb->data + (ihl * 4));
-			u32 sport, dport;
-
-			sport = (__force u16) hports[0];
-			dport = (__force u16) hports[1];
-			if (dport < sport)
-				swap(sport, dport);
-			ports = (sport << 16) + dport;
+			ports.v32 = * (__force u32 *) (skb->data + (ihl * 4));
+			if (ports.v16[0] < ports.v16[1])
+				swap(ports.v16[0], ports.v16[1]);
+			break;
 		}
-		break;
-
 	default:
+		ports.v32 = 0;
 		break;
 	}
 
 	/* get a consistent hash (same value on both flow directions) */
 	if (addr2 < addr1)
 		swap(addr1, addr2);
-	skb->rxhash = jhash_3words(addr1, addr2, ports, hashrnd);
+	skb->rxhash = jhash_3words(addr1, addr2, ports.v32, hashrnd);
 	if (!skb->rxhash)
 		skb->rxhash = 1;
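
The union trick used in this patch can be tried in userspace. The following
is a minimal sketch, not the kernel code: the function name is hypothetical,
and memcpy() is used for the 32-bit load so that the example does not depend
on the buffer being aligned.

```c
#include <stdint.h>
#include <string.h>

union ports {
	uint32_t v32;
	uint16_t v16[2];
};

/*
 * l4 points at the first four bytes of the transport header:
 * source port followed by destination port, in network byte order.
 * Returns a value that is the same for both directions of a flow.
 */
uint32_t ports_key(const unsigned char *l4)
{
	union ports p;
	uint16_t tmp;

	memcpy(&p.v32, l4, sizeof(p.v32));	/* one 32-bit load */

	/* Order the two halves; only consistency matters, not byte order. */
	if (p.v16[0] < p.v16[1]) {
		tmp = p.v16[0];
		p.v16[0] = p.v16[1];
		p.v16[1] = tmp;
	}
	return p.v32;
}
```

The comparison is done on the raw network-order halves, so the result
differs between little-endian and big-endian hosts, but on any one host both
directions of a flow map to the same value.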
 


* Re: [PATCH] rps: send IPIs ASAP
From: Changli Gao @ 2010-04-20  9:17 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tom Herbert, David S. Miller, netdev
In-Reply-To: <1271742351.3845.106.camel@edumazet-laptop>

On Tue, Apr 20, 2010 at 1:45 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Monday 19 April 2010 at 22:15 -0700, Tom Herbert wrote:
>> On Mon, Apr 19, 2010 at 9:08 PM, Changli Gao <xiaosuo@gmail.com> wrote:
>> > rps: send IPIs ASAP
>> >
>> > In order to reduce latency, we'd better send IPIs ASAP to schedule the
>> > corresponding NAPIs.
>> >
>> A design point of RPS is that we generate at most one IPI per CPU per
>> device interrupt, which at least offers some predictable coalescing.
>> With your changes, we would get at most one IPI per packet-- that
>> could represent a lot more of them.  Did you test this to see what the
>> impact is in this regard?
>>
>
> I agree with you, Tom. Coalescing IPIs is probably better.
>
> If the receiver CPU got a single packet in its RX handling, latency will
> be the same anyway.

I did the "ping -f" test again, and found that the RTT differences
I got before were noise. It seems your "shortcut net_rps_action()"
patch eliminates the differences.

>
> If the receiver CPU got many packets, chances are high that we are in a
> stress situation, and coalescing is a win in this case.
>
> I am currently testing a patch to call net_rps_action() at the beginning
> of process_backlog() (if we have a non-null rps_ipi_list pointer)
>
> Will post a patch with bench results
>

It sounds like a better idea.


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: [PATCH net-next-2.6] net: emphasize rtnl lock required in call_netdevice_notifiers
From: David Miller @ 2010-04-20  8:45 UTC (permalink / raw)
  To: jpirko; +Cc: netdev, eric.dumazet
In-Reply-To: <20100420083729.GA2810@psychotron.lab.eng.brq.redhat.com>

From: Jiri Pirko <jpirko@redhat.com>
Date: Tue, 20 Apr 2010 10:37:30 +0200

> Since netdev_chain is guarded by rtnl_lock, ASSERT_RTNL should be present here
> to make sure that all callers of call_netdevice_notifiers do the locking
> properly.
> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>

That seems right, applied, thanks Jiri!


* [PATCH net-next-2.6] net: emphasize rtnl lock required in call_netdevice_notifiers
From: Jiri Pirko @ 2010-04-20  8:37 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet

Since netdev_chain is guarded by rtnl_lock, ASSERT_RTNL should be present here
to make sure that all callers of call_netdevice_notifiers do the locking
properly.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

diff --git a/net/core/dev.c b/net/core/dev.c
index 05a2b29..a7f13a5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1435,6 +1435,7 @@ EXPORT_SYMBOL(unregister_netdevice_notifier);
 
 int call_netdevice_notifiers(unsigned long val, struct net_device *dev)
 {
+	ASSERT_RTNL();
 	return raw_notifier_call_chain(&netdev_chain, val, dev);
 }
 


* Re: [PATCH] gianfar: Wait for both RX and TX to stop
From: David Miller @ 2010-04-20  8:18 UTC (permalink / raw)
  To: galak; +Cc: netdev, afleming
In-Reply-To: <68CA249E-C6E9-43EA-A132-C48DB9E2384D@kernel.crashing.org>

From: Kumar Gala <galak@kernel.crashing.org>
Date: Mon, 19 Apr 2010 23:44:49 -0500

> 
> On Apr 18, 2010, at 6:13 PM, Andy Fleming wrote:
> 
>> When gracefully stopping the controller, the driver was continuing if
>> *either* RX or TX had stopped.  We need to wait for both, or the
>> controller could get into an invalid state.
>> 
>> Signed-off-by: Andy Fleming <afleming@freescale.com>
>> ---
>> drivers/net/gianfar.c |    5 +++--
>> 1 files changed, 3 insertions(+), 2 deletions(-)
> 
> Acked-by: Kumar Gala <galak@kernel.crashing.org>
> 
> (please pick this up for 2.6.34, fixes an annoying bug).

I will do this tomorrow, thanks!
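
The "either" versus "both" distinction described in the quoted changelog is
easy to get wrong with bit tests. A minimal sketch follows; the bit names and
values are hypothetical and are not the gianfar register definitions.

```c
#include <stdint.h>

/* Hypothetical status bits, for illustration only. */
#define STOP_RX 0x1u
#define STOP_TX 0x2u

/* True as soon as one of the two bits is set: the too-early condition. */
int either_stopped(uint32_t status)
{
	return (status & (STOP_RX | STOP_TX)) != 0;
}

/* True only when both bits are set: the condition to wait for. */
int both_stopped(uint32_t status)
{
	return (status & (STOP_RX | STOP_TX)) == (STOP_RX | STOP_TX);
}
```

A wait loop that exits on either_stopped() can continue while one direction
is still running; exiting on both_stopped() avoids that.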


* Re: [PATCH net-next-2.6] rps: consistent rxhash
From: David Miller @ 2010-04-20  8:18 UTC (permalink / raw)
  To: eric.dumazet; +Cc: xiaosuo, therbert, netdev
In-Reply-To: <1271750198.3845.216.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 20 Apr 2010 09:56:38 +0200

> In case we compute a software skb->rxhash, we can generate a consistent
> hash : Its value will be the same in both flow directions.
> 
> This helps some workloads, like conntracking, since the same state needs
> to be accessed in both directions.
> 
> tbench + RFS + this patch gives better results than tbench with default
> kernel configuration (no RPS, no RFS)
> 
> Also fixed some sparse warnings.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.


* Re: [PATCH net-next-2.6] rps: cleanups
From: David Miller @ 2010-04-20  8:18 UTC (permalink / raw)
  To: eric.dumazet; +Cc: xiaosuo, therbert, netdev
In-Reply-To: <1271747834.3845.206.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 20 Apr 2010 09:17:14 +0200

> On Monday 19 April 2010 at 13:21 -0700, David Miller wrote:
> 
>> 
>> It is getting increasingly complicated to follow who enables and
>> disables local cpu irqs in these code paths.  We could combat
>> this by adding something like "_irq_enable()" to the function
>> names.
> 
> Yes I agree, we need a general cleanup in this file
> 
> Thanks David !
> 
> [PATCH net-next-2.6] rps: cleanups
> 

Applied.


* [PATCH net-next-2.6] rps: consistent rxhash
From: Eric Dumazet @ 2010-04-20  7:56 UTC (permalink / raw)
  To: Changli Gao, David Miller; +Cc: therbert, netdev
In-Reply-To: <1271743164.3845.128.camel@edumazet-laptop>

In case we compute a software skb->rxhash, we can generate a consistent
hash : Its value will be the same in both flow directions.

This helps some workloads, like conntracking, since the same state needs
to be accessed in both directions.

tbench + RFS + this patch gives better results than tbench with default
kernel configuration (no RPS, no RFS)

Also fixed some sparse warnings.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/dev.c |   25 ++++++++++++++++++-------
 1 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 05a2b29..cb150ec 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1974,7 +1974,7 @@ u16 skb_tx_hash(const struct net_device *dev, const struct sk_buff *skb)
 	if (skb->sk && skb->sk->sk_hash)
 		hash = skb->sk->sk_hash;
 	else
-		hash = skb->protocol;
+		hash = (__force u16) skb->protocol;
 
 	hash = jhash_1word(hash, hashrnd);
 
@@ -2253,8 +2253,8 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 
 		ip = (struct iphdr *) skb->data;
 		ip_proto = ip->protocol;
-		addr1 = ip->saddr;
-		addr2 = ip->daddr;
+		addr1 = (__force u32) ip->saddr;
+		addr2 = (__force u32) ip->daddr;
 		ihl = ip->ihl;
 		break;
 	case __constant_htons(ETH_P_IPV6):
@@ -2263,8 +2263,8 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 
 		ip6 = (struct ipv6hdr *) skb->data;
 		ip_proto = ip6->nexthdr;
-		addr1 = ip6->saddr.s6_addr32[3];
-		addr2 = ip6->daddr.s6_addr32[3];
+		addr1 = (__force u32) ip6->saddr.s6_addr32[3];
+		addr2 = (__force u32) ip6->daddr.s6_addr32[3];
 		ihl = (40 >> 2);
 		break;
 	default:
@@ -2279,14 +2279,25 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 	case IPPROTO_AH:
 	case IPPROTO_SCTP:
 	case IPPROTO_UDPLITE:
-		if (pskb_may_pull(skb, (ihl * 4) + 4))
-			ports = *((u32 *) (skb->data + (ihl * 4)));
+		if (pskb_may_pull(skb, (ihl * 4) + 4)) {
+			__be16 *hports = (__be16 *) (skb->data + (ihl * 4));
+			u32 sport, dport;
+
+			sport = (__force u16) hports[0];
+			dport = (__force u16) hports[1];
+			if (dport < sport)
+				swap(sport, dport);
+			ports = (sport << 16) + dport;
+		}
 		break;
 
 	default:
 		break;
 	}
 
+	/* get a consistent hash (same value on both flow directions) */
+	if (addr2 < addr1)
+		swap(addr1, addr2);
 	skb->rxhash = jhash_3words(addr1, addr2, ports, hashrnd);
 	if (!skb->rxhash)
 		skb->rxhash = 1;
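
The idea of a direction-independent hash can be checked in userspace with a
small sketch. Note the assumptions: mix3() below is a simple stand-in mixer
and is NOT the kernel's jhash_3words(), and flow_hash() is a hypothetical
name. Only the ordering of addresses and ports mirrors the patch.

```c
#include <stdint.h>

/* Stand-in mixing function, not jhash_3words(). */
static uint32_t mix3(uint32_t a, uint32_t b, uint32_t c, uint32_t seed)
{
	uint32_t h = seed;

	h = (h ^ a) * 0x9e3779b1u;
	h = (h ^ b) * 0x85ebca6bu;
	h = (h ^ c) * 0xc2b2ae35u;
	return h ^ (h >> 16);
}

uint32_t flow_hash(uint32_t saddr, uint32_t daddr,
		   uint16_t sport, uint16_t dport, uint32_t seed)
{
	uint32_t addr1 = saddr, addr2 = daddr;
	uint32_t p1 = sport, p2 = dport;
	uint32_t tmp, h;

	/* Sort each pair so both directions feed identical inputs. */
	if (addr2 < addr1) {
		tmp = addr1;
		addr1 = addr2;
		addr2 = tmp;
	}
	if (p2 < p1) {
		tmp = p1;
		p1 = p2;
		p2 = tmp;
	}

	h = mix3(addr1, addr2, (p1 << 16) + p2, seed);
	return h ? h : 1;	/* like the patch, never return 0 */
}
```

Swapping source and destination (addresses and ports together) yields the
same value, which is what lets both directions of a connection land on the
same state.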




* [PATCH net-next-2.6] rps: cleanups
From: Eric Dumazet @ 2010-04-20  7:17 UTC (permalink / raw)
  To: David Miller; +Cc: xiaosuo, therbert, netdev
In-Reply-To: <20100419.132158.143863746.davem@davemloft.net>

On Monday 19 April 2010 at 13:21 -0700, David Miller wrote:

> 
> It is getting increasingly complicated to follow who enables and
> disables local cpu irqs in these code paths.  We could combat
> this by adding something like "_irq_enable()" to the function
> names.

Yes I agree, we need a general cleanup in this file

Thanks David !

[PATCH net-next-2.6] rps: cleanups

struct softnet_data holds many queues, so consistently using the name "sd"
instead of "queue" is better.

Add a rps_ipi_queued() helper to clean up enqueue_to_backlog().

Add an _and_irq_disable suffix to the net_rps_action() name, as David
suggested.

incr_input_queue_head() becomes input_queue_head_incr().

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/netdevice.h |    4 
 net/core/dev.c            |  149 +++++++++++++++++++-----------------
 2 files changed, 82 insertions(+), 71 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 83ab3da..3c5ed5f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1401,10 +1401,10 @@ struct softnet_data {
 	struct napi_struct	backlog;
 };
 
-static inline void incr_input_queue_head(struct softnet_data *queue)
+static inline void input_queue_head_incr(struct softnet_data *sd)
 {
 #ifdef CONFIG_RPS
-	queue->input_queue_head++;
+	sd->input_queue_head++;
 #endif
 }
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 05a2b29..70df048 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -208,17 +208,17 @@ static inline struct hlist_head *dev_index_hash(struct net *net, int ifindex)
 	return &net->dev_index_head[ifindex & (NETDEV_HASHENTRIES - 1)];
 }
 
-static inline void rps_lock(struct softnet_data *queue)
+static inline void rps_lock(struct softnet_data *sd)
 {
 #ifdef CONFIG_RPS
-	spin_lock(&queue->input_pkt_queue.lock);
+	spin_lock(&sd->input_pkt_queue.lock);
 #endif
 }
 
-static inline void rps_unlock(struct softnet_data *queue)
+static inline void rps_unlock(struct softnet_data *sd)
 {
 #ifdef CONFIG_RPS
-	spin_unlock(&queue->input_pkt_queue.lock);
+	spin_unlock(&sd->input_pkt_queue.lock);
 #endif
 }
 
@@ -2346,63 +2346,74 @@ done:
 }
 
 /* Called from hardirq (IPI) context */
-static void trigger_softirq(void *data)
+static void rps_trigger_softirq(void *data)
 {
-	struct softnet_data *queue = data;
-	__napi_schedule(&queue->backlog);
+	struct softnet_data *sd = data;
+
+	__napi_schedule(&sd->backlog);
 	__get_cpu_var(netdev_rx_stat).received_rps++;
 }
+
 #endif /* CONFIG_RPS */
 
 /*
+ * Check if this softnet_data structure is another cpu one
+ * If yes, queue it to our IPI list and return 1
+ * If no, return 0
+ */
+static int rps_ipi_queued(struct softnet_data *sd)
+{
+#ifdef CONFIG_RPS
+	struct softnet_data *mysd = &__get_cpu_var(softnet_data);
+
+	if (sd != mysd) {
+		sd->rps_ipi_next = mysd->rps_ipi_list;
+		mysd->rps_ipi_list = sd;
+
+		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
+		return 1;
+	}
+#endif /* CONFIG_RPS */
+	return 0;
+}
+
+/*
  * enqueue_to_backlog is called to queue an skb to a per CPU backlog
  * queue (may be a remote CPU queue).
  */
 static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 			      unsigned int *qtail)
 {
-	struct softnet_data *queue;
+	struct softnet_data *sd;
 	unsigned long flags;
 
-	queue = &per_cpu(softnet_data, cpu);
+	sd = &per_cpu(softnet_data, cpu);
 
 	local_irq_save(flags);
 	__get_cpu_var(netdev_rx_stat).total++;
 
-	rps_lock(queue);
-	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
-		if (queue->input_pkt_queue.qlen) {
+	rps_lock(sd);
+	if (sd->input_pkt_queue.qlen <= netdev_max_backlog) {
+		if (sd->input_pkt_queue.qlen) {
 enqueue:
-			__skb_queue_tail(&queue->input_pkt_queue, skb);
+			__skb_queue_tail(&sd->input_pkt_queue, skb);
 #ifdef CONFIG_RPS
-			*qtail = queue->input_queue_head +
-			    queue->input_pkt_queue.qlen;
+			*qtail = sd->input_queue_head + sd->input_pkt_queue.qlen;
 #endif
-			rps_unlock(queue);
+			rps_unlock(sd);
 			local_irq_restore(flags);
 			return NET_RX_SUCCESS;
 		}
 
 		/* Schedule NAPI for backlog device */
-		if (napi_schedule_prep(&queue->backlog)) {
-#ifdef CONFIG_RPS
-			if (cpu != smp_processor_id()) {
-				struct softnet_data *myqueue;
-
-				myqueue = &__get_cpu_var(softnet_data);
-				queue->rps_ipi_next = myqueue->rps_ipi_list;
-				myqueue->rps_ipi_list = queue;
-
-				__raise_softirq_irqoff(NET_RX_SOFTIRQ);
-				goto enqueue;
-			}
-#endif
-			__napi_schedule(&queue->backlog);
+		if (napi_schedule_prep(&sd->backlog)) {
+			if (!rps_ipi_queued(sd))
+				__napi_schedule(&sd->backlog);
 		}
 		goto enqueue;
 	}
 
-	rps_unlock(queue);
+	rps_unlock(sd);
 
 	__get_cpu_var(netdev_rx_stat).dropped++;
 	local_irq_restore(flags);
@@ -2903,17 +2914,17 @@ EXPORT_SYMBOL(netif_receive_skb);
 static void flush_backlog(void *arg)
 {
 	struct net_device *dev = arg;
-	struct softnet_data *queue = &__get_cpu_var(softnet_data);
+	struct softnet_data *sd = &__get_cpu_var(softnet_data);
 	struct sk_buff *skb, *tmp;
 
-	rps_lock(queue);
-	skb_queue_walk_safe(&queue->input_pkt_queue, skb, tmp)
+	rps_lock(sd);
+	skb_queue_walk_safe(&sd->input_pkt_queue, skb, tmp)
 		if (skb->dev == dev) {
-			__skb_unlink(skb, &queue->input_pkt_queue);
+			__skb_unlink(skb, &sd->input_pkt_queue);
 			kfree_skb(skb);
-			incr_input_queue_head(queue);
+			input_queue_head_incr(sd);
 		}
-	rps_unlock(queue);
+	rps_unlock(sd);
 }
 
 static int napi_gro_complete(struct sk_buff *skb)
@@ -3219,23 +3230,23 @@ EXPORT_SYMBOL(napi_gro_frags);
 static int process_backlog(struct napi_struct *napi, int quota)
 {
 	int work = 0;
-	struct softnet_data *queue = &__get_cpu_var(softnet_data);
+	struct softnet_data *sd = &__get_cpu_var(softnet_data);
 
 	napi->weight = weight_p;
 	do {
 		struct sk_buff *skb;
 
 		local_irq_disable();
-		rps_lock(queue);
-		skb = __skb_dequeue(&queue->input_pkt_queue);
+		rps_lock(sd);
+		skb = __skb_dequeue(&sd->input_pkt_queue);
 		if (!skb) {
 			__napi_complete(napi);
-			rps_unlock(queue);
+			rps_unlock(sd);
 			local_irq_enable();
 			break;
 		}
-		incr_input_queue_head(queue);
-		rps_unlock(queue);
+		input_queue_head_incr(sd);
+		rps_unlock(sd);
 		local_irq_enable();
 
 		__netif_receive_skb(skb);
@@ -3331,24 +3342,25 @@ EXPORT_SYMBOL(netif_napi_del);
  * net_rps_action sends any pending IPI's for rps.
  * Note: called with local irq disabled, but exits with local irq enabled.
  */
-static void net_rps_action(void)
+static void net_rps_action_and_irq_disable(void)
 {
 #ifdef CONFIG_RPS
-	struct softnet_data *locqueue = &__get_cpu_var(softnet_data);
-	struct softnet_data *remqueue = locqueue->rps_ipi_list;
+	struct softnet_data *sd = &__get_cpu_var(softnet_data);
+	struct softnet_data *remsd = sd->rps_ipi_list;
 
-	if (remqueue) {
-		locqueue->rps_ipi_list = NULL;
+	if (remsd) {
+		sd->rps_ipi_list = NULL;
 
 		local_irq_enable();
 
 		/* Send pending IPI's to kick RPS processing on remote cpus. */
-		while (remqueue) {
-			struct softnet_data *next = remqueue->rps_ipi_next;
-			if (cpu_online(remqueue->cpu))
-				__smp_call_function_single(remqueue->cpu,
-							   &remqueue->csd, 0);
-			remqueue = next;
+		while (remsd) {
+			struct softnet_data *next = remsd->rps_ipi_next;
+
+			if (cpu_online(remsd->cpu))
+				__smp_call_function_single(remsd->cpu,
+							   &remsd->csd, 0);
+			remsd = next;
 		}
 	} else
 #endif
@@ -3423,7 +3435,7 @@ static void net_rx_action(struct softirq_action *h)
 		netpoll_poll_unlock(have);
 	}
 out:
-	net_rps_action();
+	net_rps_action_and_irq_disable();
 
 #ifdef CONFIG_NET_DMA
 	/*
@@ -5595,7 +5607,7 @@ static int dev_cpu_callback(struct notifier_block *nfb,
 	/* Process offline CPU's input_pkt_queue */
 	while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) {
 		netif_rx(skb);
-		incr_input_queue_head(oldsd);
+		input_queue_head_incr(oldsd);
 	}
 
 	return NOTIFY_OK;
@@ -5812,24 +5824,23 @@ static int __init net_dev_init(void)
 	 */
 
 	for_each_possible_cpu(i) {
-		struct softnet_data *queue;
+		struct softnet_data *sd = &per_cpu(softnet_data, i);
 
-		queue = &per_cpu(softnet_data, i);
-		skb_queue_head_init(&queue->input_pkt_queue);
-		queue->completion_queue = NULL;
-		INIT_LIST_HEAD(&queue->poll_list);
+		skb_queue_head_init(&sd->input_pkt_queue);
+		sd->completion_queue = NULL;
+		INIT_LIST_HEAD(&sd->poll_list);
 
 #ifdef CONFIG_RPS
-		queue->csd.func = trigger_softirq;
-		queue->csd.info = queue;
-		queue->csd.flags = 0;
-		queue->cpu = i;
+		sd->csd.func = rps_trigger_softirq;
+		sd->csd.info = sd;
+		sd->csd.flags = 0;
+		sd->cpu = i;
 #endif
 
-		queue->backlog.poll = process_backlog;
-		queue->backlog.weight = weight_p;
-		queue->backlog.gro_list = NULL;
-		queue->backlog.gro_count = 0;
+		sd->backlog.poll = process_backlog;
+		sd->backlog.weight = weight_p;
+		sd->backlog.gro_list = NULL;
+		sd->backlog.gro_count = 0;
 	}
 
 	dev_boot_phase = 0;
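
The pending-IPI list handled by rps_ipi_queued() and the flush loop can be
modelled in userspace. This is a simplified sketch: the struct is cut down to
the fields involved, the function names are hypothetical, a counter stands in
for actually sending an IPI, and it assumes the caller queues each remote
entry at most once per flush (in the patch, napi_schedule_prep() plays that
role).

```c
#include <stddef.h>

struct sd {
	int cpu;
	int kicked;			/* counts simulated IPIs */
	struct sd *rps_ipi_next;	/* link in some cpu's pending list */
	struct sd *rps_ipi_list;	/* pending list head (local cpu only) */
};

/* Returns 1 if sd belongs to another cpu and was queued, else 0. */
int ipi_queue(struct sd *mysd, struct sd *sd)
{
	if (sd != mysd) {
		sd->rps_ipi_next = mysd->rps_ipi_list;
		mysd->rps_ipi_list = sd;
		return 1;
	}
	return 0;
}

/* Detach the pending list and kick every entry once. */
int ipi_flush(struct sd *mysd)
{
	struct sd *rem = mysd->rps_ipi_list;
	int n = 0;

	mysd->rps_ipi_list = NULL;
	while (rem) {
		/* Read the link first: after the kick, rem may be requeued. */
		struct sd *next = rem->rps_ipi_next;

		rem->kicked++;		/* stands in for sending the IPI */
		n++;
		rem = next;
	}
	return n;
}
```

Queueing the local entry is a no-op that returns 0, which is the case where
the caller schedules the backlog NAPI directly.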




* Re: [PATCH v5] rfs: Receive Flow Steering
From: Eric Dumazet @ 2010-04-20  5:59 UTC (permalink / raw)
  To: Changli Gao; +Cc: David Miller, therbert, netdev
In-Reply-To: <y2u412e6f7f1004191638mee9206dfoab7482bbff83e38d@mail.gmail.com>

On Tuesday 20 April 2010 at 07:38 +0800, Changli Gao wrote:

> Does this problem have any relationship with your patch? No. If the rxhash
> isn't provided by hardware, we can get more throughput from your patch,
> and on the other side, we don't lose anything except potentially more hash
> collisions.
> 

I am not sure what you call a hash collision. There is no hash chain here.

This 32-bit hash is a jhash one, and we only need 1 to 12 bits of it, so I
am pretty sure it's OK.
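
As an illustration of using only a few bits of a 32-bit hash, here is a
minimal sketch. It assumes the consumer keeps the low-order bits with a
power-of-two mask; the helper name is hypothetical and the actual steering
code may reduce the hash differently.

```c
#include <stdint.h>

/* Keep the low `bits` bits of the hash (valid for bits in 1..31). */
uint32_t hash_to_index(uint32_t hash, unsigned int bits)
{
	return hash & ((1u << bits) - 1u);
}
```

With a well-mixed hash, any subset of bits is expected to be about equally
well distributed, which is why truncating it this way is acceptable.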





* Re: [PATCH net-next-2.6] rps: shortcut net_rps_action()
From: Eric Dumazet @ 2010-04-20  5:55 UTC (permalink / raw)
  To: Changli Gao; +Cc: David Miller, Tom Herbert, netdev
In-Reply-To: <p2j412e6f7f1004191732p41094214yf82d2755f9081d1d@mail.gmail.com>

On Tuesday 20 April 2010 at 08:32 +0800, Changli Gao wrote:

> Oh, I read the code again and got the answer. After the IPI is sent,
> this softnet will be queued by the other CPUs. We prefetch the pointer
> rps_ipi_next to avoid this race condition.
> 

Speaking of the prefetch business,

I partly tested the following patch; I will submit it if it turns out to be
a clear win.

diff --git a/net/core/dev.c b/net/core/dev.c
index 05a2b29..fe6fc9f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2349,7 +2349,9 @@ done:
 static void trigger_softirq(void *data)
 {
 	struct softnet_data *queue = data;
+
 	__napi_schedule(&queue->backlog);
+	prefetch(queue->input_pkt_queue.next);
 	__get_cpu_var(netdev_rx_stat).received_rps++;
 }
 #endif /* CONFIG_RPS */




* Re: [PATCH] rps: send IPIs ASAP
From: Eric Dumazet @ 2010-04-20  5:45 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Changli Gao, David S. Miller, netdev
In-Reply-To: <g2y65634d661004192215oc06d9c43xb2a7c7d946149cf5@mail.gmail.com>

On Monday 19 April 2010 at 22:15 -0700, Tom Herbert wrote:
> On Mon, Apr 19, 2010 at 9:08 PM, Changli Gao <xiaosuo@gmail.com> wrote:
> > rps: send IPIs ASAP
> >
> > In order to reduce latency, we'd better send IPIs ASAP to schedule the
> > corresponding NAPIs.
> >
> A design point of RPS is that we generate at most one IPI per CPU per
> device interrupt, which at least offers some predictable coalescing.
> With your changes, we would get at most one IPI per packet-- that
> could represent a lot more of them.  Did you test this to see what the
> impact is in this regard?
> 

I agree with you, Tom. Coalescing IPIs is probably better.

If the receiver CPU got a single packet in its RX handling, latency will
be the same anyway.

If the receiver CPU got many packets, chances are high that we are in a
stress situation, and coalescing is a win in this case.

I am currently testing a patch to call net_rps_action() at the beginning
of process_backlog() (if we have a non-null rps_ipi_list pointer)

Will post a patch with bench results


* Re: [PATCH] rps: send IPIs ASAP
From: David Miller @ 2010-04-20  5:39 UTC (permalink / raw)
  To: therbert; +Cc: xiaosuo, eric.dumazet, netdev
In-Reply-To: <g2y65634d661004192215oc06d9c43xb2a7c7d946149cf5@mail.gmail.com>

From: Tom Herbert <therbert@google.com>
Date: Mon, 19 Apr 2010 22:15:58 -0700

> Did you test this to see what the impact is in this regard?

Changli, it would help immensely if you posted performance
test results along with your changes.

I can see quite obviously that this change will completely undo the
intentional batching of IPIs done by RPS.  IPIs are expensive and we
should batch things as much as possible here.


* Re: [PATCH] rps: send IPIs ASAP
From: Tom Herbert @ 2010-04-20  5:15 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, Eric Dumazet, netdev
In-Reply-To: <1271736519-2991-1-git-send-email-xiaosuo@gmail.com>

On Mon, Apr 19, 2010 at 9:08 PM, Changli Gao <xiaosuo@gmail.com> wrote:
> rps: send IPIs ASAP
>
> In order to reduce latency, we'd better send IPIs ASAP to schedule the
> corresponding NAPIs.
>
A design point of RPS is that we generate at most one IPI per CPU per
device interrupt, which at least offers some predictable coalescing.
With your changes, we would get at most one IPI per packet-- that
could represent a lot more of them.  Did you test this to see what the
impact is in this regard?
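
The difference in IPI counts can be illustrated with a small userspace
model. This is only a sketch under simplifying assumptions: target[] holds
the cpu chosen for each packet received in one device interrupt, NCPUS and
the function names are made up, and the per-packet count is the worst-case
upper bound rather than what any particular patch would send.

```c
#include <stddef.h>

#define NCPUS 8

/* Upper bound without coalescing: one IPI per remotely steered packet. */
int ipis_per_packet(const int *target, size_t n, int self)
{
	int count = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (target[i] != self && target[i] >= 0 && target[i] < NCPUS)
			count++;
	return count;
}

/* With coalescing: at most one IPI per remote cpu for the whole batch. */
int ipis_coalesced(const int *target, size_t n, int self)
{
	int pending[NCPUS] = {0};
	int count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int cpu = target[i];

		if (cpu == self || cpu < 0 || cpu >= NCPUS)
			continue;
		if (!pending[cpu]) {
			pending[cpu] = 1;
			count++;
		}
	}
	return count;
}
```

The gap between the two numbers grows with the number of packets handled
per interrupt, which is the regime where the cost of IPIs matters most.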


> For NAPI drivers, we send IPIs immediately, and for the others, we defer them
> to NET_RX_SOFTIRQ. In this patch, we move net_rps_action() to the beginning of
> net_rx_action() to emulate a softirq with the higher priority than
> NET_RX_SOFTIRQ.
>
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ----
>  net/core/dev.c |   23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 05a2b29..d8fca21 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2363,6 +2363,10 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
>  {
>        struct softnet_data *queue;
>        unsigned long flags;
> +#ifdef CONFIG_RPS
> +       bool ni = !in_irq();
> +       bool ipi = false;
> +#endif
>
>        queue = &per_cpu(softnet_data, cpu);
>
> @@ -2380,6 +2384,10 @@ enqueue:
>  #endif
>                        rps_unlock(queue);
>                        local_irq_restore(flags);
> +#ifdef CONFIG_RPS
> +                       if (ipi)
> +                               __smp_call_function_single(cpu, &queue->csd, 0);
> +#endif
>                        return NET_RX_SUCCESS;
>                }
>
> @@ -2389,6 +2397,11 @@ enqueue:
>                        if (cpu != smp_processor_id()) {
>                                struct softnet_data *myqueue;
>
> +                               if (ni) {
> +                                       ipi = true;
> +                                       goto enqueue;
> +                               }
> +
>                                myqueue = &__get_cpu_var(softnet_data);
>                                queue->rps_ipi_next = myqueue->rps_ipi_list;
>                                myqueue->rps_ipi_list = queue;
> @@ -3337,6 +3350,7 @@ static void net_rps_action(void)
>        struct softnet_data *locqueue = &__get_cpu_var(softnet_data);
>        struct softnet_data *remqueue = locqueue->rps_ipi_list;
>
> +       local_irq_disable();
>        if (remqueue) {
>                locqueue->rps_ipi_list = NULL;
>
> @@ -3350,9 +3364,10 @@ static void net_rps_action(void)
>                                                           &remqueue->csd, 0);
>                        remqueue = next;
>                }
> -       } else
> -#endif
> +       } else {
>                local_irq_enable();
> +       }
> +#endif
>  }
>
>  static void net_rx_action(struct softirq_action *h)
> @@ -3362,6 +3377,8 @@ static void net_rx_action(struct softirq_action *h)
>        int budget = netdev_budget;
>        void *have;
>
> +       net_rps_action();
> +
>        local_irq_disable();
>
>        while (!list_empty(list)) {
> @@ -3423,7 +3440,7 @@ static void net_rx_action(struct softirq_action *h)
>                netpoll_poll_unlock(have);
>        }
>  out:
> -       net_rps_action();
> +       local_irq_enable();
>
>  #ifdef CONFIG_NET_DMA
>        /*
>

^ permalink raw reply

* Re: [PATCH] gianfar: Wait for both RX and TX to stop
From: Kumar Gala @ 2010-04-20  4:44 UTC (permalink / raw)
  To: David Miller; +Cc: Netdev, Andy Fleming
In-Reply-To: <1271632401-2472-1-git-send-email-afleming@freescale.com>


On Apr 18, 2010, at 6:13 PM, Andy Fleming wrote:

> When gracefully stopping the controller, the driver was continuing if
> *either* RX or TX had stopped.  We need to wait for both, or the
> controller could get into an invalid state.
> 
> Signed-off-by: Andy Fleming <afleming@freescale.com>
> ---
> drivers/net/gianfar.c |    5 +++--
> 1 files changed, 3 insertions(+), 2 deletions(-)

Acked-by: Kumar Gala <galak@kernel.crashing.org>

(please pick this up for 2.6.34, fixes an annoying bug).

- k
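
The condition change in Andy's patch can be illustrated in isolation. This is a minimal sketch: the EX_GRSC and EX_GTSC bit values are made up for the example (the real IEVENT_GRSC and IEVENT_GTSC definitions live in the gianfar headers), and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only, not the real gianfar values. */
#define EX_GRSC 0x00000100u	/* graceful RX stop complete */
#define EX_GTSC 0x00000200u	/* graceful TX stop complete */

/* Old loop condition: keeps waiting only while neither bit is set,
 * so the loop exits as soon as either RX or TX has stopped. */
static bool old_keeps_waiting(uint32_t ievent)
{
	return !(ievent & (EX_GRSC | EX_GTSC));
}

/* New loop condition: keeps waiting until both bits are set. */
static bool new_keeps_waiting(uint32_t ievent)
{
	return (ievent & (EX_GRSC | EX_GTSC)) != (EX_GRSC | EX_GTSC);
}
```

With only the RX bit set, the old condition stops waiting while the new one keeps polling, which is the case the patch description calls out.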
 
> 
> diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
> index 032073d..6038397 100644
> --- a/drivers/net/gianfar.c
> +++ b/drivers/net/gianfar.c
> @@ -1571,8 +1571,9 @@ static void gfar_halt_nodisable(struct net_device *dev)
> 		tempval |= (DMACTRL_GRS | DMACTRL_GTS);
> 		gfar_write(&regs->dmactrl, tempval);
> 
> -		while (!(gfar_read(&regs->ievent) &
> -			 (IEVENT_GRSC | IEVENT_GTSC)))
> +		while ((gfar_read(&regs->ievent) &
> +			 (IEVENT_GRSC | IEVENT_GTSC)) !=
> +			 (IEVENT_GRSC | IEVENT_GTSC))
> 			cpu_relax();
> 	}
> }
> -- 
> 1.6.5.2.g6ff9a
> 


^ permalink raw reply

* Re: [PATCH] gianfar: Wait for both RX and TX to stop
From: Kumar Gala @ 2010-04-20  4:43 UTC (permalink / raw)
  To: Timur Tabi; +Cc: Andy Fleming, davem, netdev
In-Reply-To: <m2ged82fe3e1004191408p53f19073x106ef56677c8a5df@mail.gmail.com>


On Apr 19, 2010, at 4:08 PM, Timur Tabi wrote:

> On Sun, Apr 18, 2010 at 6:13 PM, Andy Fleming <afleming@freescale.com> wrote:
> 
>> -               while (!(gfar_read(&regs->ievent) &
>> -                        (IEVENT_GRSC | IEVENT_GTSC)))
>> +               while ((gfar_read(&regs->ievent) &
>> +                        (IEVENT_GRSC | IEVENT_GTSC)) !=
>> +                        (IEVENT_GRSC | IEVENT_GTSC))
>>                        cpu_relax();
> 
> How about using spin_event_timeout()?  It streamlines this process and
> includes a timeout.
> 
> The U-Boot version of this code doesn't have a timeout either, but
> spin_event_timeout() is not available in U-Boot.

spin_event_timeout doesn't make sense for this.  The patch is fine.

- k

^ permalink raw reply

* Re: [PATCH] rdma/cm: Randomize local port allocation.
From: Cong Wang @ 2010-04-20  4:34 UTC (permalink / raw)
  To: David Miller
  Cc: penguin-kernel, sean.hefty, opurdila, eric.dumazet, netdev,
	nhorman, ebiederm, linux-kernel, rolandd, linux-rdma
In-Reply-To: <20100416.133001.262206466.davem@davemloft.net>

David Miller wrote:
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Fri, 16 Apr 2010 22:54:22 +0900
> 
>> Cong Wang wrote:
>>> Sean Hefty wrote:
>>>> I like this version, thanks!  I'm not sure which tree to merge it through.
>>>> Are you needing this for 2.6.34, or is 2.6.35 okay?
>>>>
>>> As soon as possible, so 2.6.34. :)
>>>
>> Cong, merge window for 2.6.34 was already closed.
>> You need to make your patchset towards 2.6.35 (using net-next-2.6 tree)
>> rather than 2.6.34 (using linux-2.6 tree). Therefore, this patch being
>> queued for 2.6.35 (through net-next-2.6 tree) should be okay for you.
> 
> I don't take RDMA patches into net-next-2.6; the less I touch this
> stack the better, and Roland has been taking this stuff
> into his own tree for some time now.

I was away for a few days.

Ok, so I will wait for this to be merged.

Thanks, David and Tetsuo!


^ permalink raw reply

* Re: 2.6.34-rc5: Reported regressions from 2.6.33
From: Rafael J. Wysocki @ 2010-04-20  4:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Kernel Mailing List, Maciej Rutecki, Linus Torvalds,
	Kernel Testers List, Network Development, Linux ACPI,
	Linux PM List, Linux SCSI List, Linux Wireless List, DRI
In-Reply-To: <20100419205723.8724338c.akpm@linux-foundation.org>

On Tuesday 20 April 2010, Andrew Morton wrote:
> On Tue, 20 Apr 2010 05:15:57 +0200 (CEST) "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15812
> > Subject		: utsname.domainname not set in x86_32 processes (causing "YPBINDPROC_DOMAIN: domain not bound" errors)
> > Submitter	:  <adi@hexapodia.org>
> > Date		: 2010-04-19 21:28 (1 days old)
> 
> I merged hch's fix for this twelve seconds ago.

I updated the entry.

^ permalink raw reply

* [PATCH] rps: send IPIs ASAP
From: Changli Gao @ 2010-04-20  4:08 UTC (permalink / raw)
  To: David S. Miller; +Cc: Tom Herbert, Eric Dumazet, netdev, Changli Gao

rps: send IPIs ASAP

In order to reduce latency, we'd better send IPIs ASAP to schedule the
corresponding NAPIs.

For NAPI drivers, we send IPIs immediately, and for the others, we defer them
to NET_RX_SOFTIRQ. In this patch, we move net_rps_action() to the beginning of
net_rx_action() to emulate a softirq with the higher priority than
NET_RX_SOFTIRQ.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 net/core/dev.c |   23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 05a2b29..d8fca21 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2363,6 +2363,10 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 {
 	struct softnet_data *queue;
 	unsigned long flags;
+#ifdef CONFIG_RPS
+	bool ni = !in_irq();
+	bool ipi = false;
+#endif
 
 	queue = &per_cpu(softnet_data, cpu);
 
@@ -2380,6 +2384,10 @@ enqueue:
 #endif
 			rps_unlock(queue);
 			local_irq_restore(flags);
+#ifdef CONFIG_RPS
+			if (ipi)
+				__smp_call_function_single(cpu, &queue->csd, 0);
+#endif
 			return NET_RX_SUCCESS;
 		}
 
@@ -2389,6 +2397,11 @@ enqueue:
 			if (cpu != smp_processor_id()) {
 				struct softnet_data *myqueue;
 
+				if (ni) {
+					ipi = true;
+					goto enqueue;
+				}
+
 				myqueue = &__get_cpu_var(softnet_data);
 				queue->rps_ipi_next = myqueue->rps_ipi_list;
 				myqueue->rps_ipi_list = queue;
@@ -3337,6 +3350,7 @@ static void net_rps_action(void)
 	struct softnet_data *locqueue = &__get_cpu_var(softnet_data);
 	struct softnet_data *remqueue = locqueue->rps_ipi_list;
 
+	local_irq_disable();
 	if (remqueue) {
 		locqueue->rps_ipi_list = NULL;
 
@@ -3350,9 +3364,10 @@ static void net_rps_action(void)
 							   &remqueue->csd, 0);
 			remqueue = next;
 		}
-	} else
-#endif
+	} else {
 		local_irq_enable();
+	}
+#endif
 }
 
 static void net_rx_action(struct softirq_action *h)
@@ -3362,6 +3377,8 @@ static void net_rx_action(struct softirq_action *h)
 	int budget = netdev_budget;
 	void *have;
 
+	net_rps_action();
+
 	local_irq_disable();
 
 	while (!list_empty(list)) {
@@ -3423,7 +3440,7 @@ static void net_rx_action(struct softirq_action *h)
 		netpoll_poll_unlock(have);
 	}
 out:
-	net_rps_action();
+	local_irq_enable();
 
 #ifdef CONFIG_NET_DMA
 	/*

^ permalink raw reply related

* 2.6.34-rc5: Reported regressions from 2.6.33
From: Rafael J. Wysocki @ 2010-04-20  3:15 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Maciej Rutecki, Andrew Morton, Linus Torvalds,
	Kernel Testers List, Network Development, Linux ACPI,
	Linux PM List, Linux SCSI List, Linux Wireless List, DRI

This message contains a list of some regressions from 2.6.33,
for which there are no fixes in the mainline known to the tracking team.
If any of them have been fixed already, please let us know.

If you know of any other unresolved regressions from 2.6.33, please let us
know as well and we'll add them to the list.  Also, please let us know
if any of the entries below are invalid.

Each entry from the list will be sent additionally in an automatic reply
to this message with CCs to the people involved in reporting and handling
the issue.


Listed regressions statistics:

  Date          Total  Pending  Unresolved
  ----------------------------------------
  2010-04-20       64       35          34
  2010-04-07       48       35          33
  2010-03-21       15       13          10


Unresolved regressions
----------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15812
Subject		: utsname.domainname not set in x86_32 processes (causing "YPBINDPROC_DOMAIN: domain not bound" errors)
Submitter	:  <adi-3HqRAUrWAWyGglJvpFV4uA@public.gmane.org>
Date		: 2010-04-19 21:28 (1 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15805
Subject		: reiserfs locking
Submitter	: Alexander Beregalov <a.beregalov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2010-04-15 21:02 (5 days old)
Message-ID	: <t2ka4423d671004151402n7b2dc425mdc9c6bb9640d63fb-JsoAwUIsXov1KXRcyAk9cg@public.gmane.orgl.com>
References	: http://marc.info/?l=linux-kernel&m=127136535323933&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15796
Subject		: [REGRESSION bisected] Sound goes too fast due to commit 7b3a177b0
Submitter	: Éric Piel <Eric.Piel-VkQ1JFuSMpfAbQlEx87xDw@public.gmane.org>
Date		: 2010-04-13 21:54 (7 days old)
First-Bad-Commit: http://kernel.org/git/linus/7b3a177b0d4f92b3431b8dca777313a07533a710
Message-ID	: <4BC4E812.6050602-VkQ1JFuSMpfAbQlEx87xDw@public.gmane.org>
References	: http://marc.info/?l=linux-kernel&m=127119569009790&w=2
Handled-By	: Takashi Iwai <tiwai-l3A5Bk7waGM@public.gmane.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15795
Subject		: 2.6.34-rc4 : OOPS in unmap_vma
Submitter	: Parag Warudkar <parag.lkml-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2010-04-14 (6 days old)
Message-ID	: <alpine.DEB.2.00.1004132147260.1881@parag-laptop>
References	: http://marc.info/?l=linux-kernel&m=127121006625429&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15790
Subject		: Meta-Bug: Regressions
Submitter	: Florian Mickler <fmickler-Mmb7MZpHnFY@public.gmane.org>
Date		: 2010-04-15 18:21 (5 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15788
Subject		: external usb sound card doesn't work after resume
Submitter	: François Valenduc <francois.valenduc-bmtTS95sd5BUM80lpFwj4w@public.gmane.org>
Date		: 2010-04-15 10:16 (5 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15774
Subject		: 2.6.34-rc3: eth0 (8139too): transmit queue 0 timed out
Submitter	: Németh Márton <nm127-Y8qEzhMunLyT9ig0jae3mg@public.gmane.org>
Date		: 2010-04-10 12:33 (10 days old)
Message-ID	: <4BC07022.6000708-Y8qEzhMunLyT9ig0jae3mg@public.gmane.org>
References	: http://marc.info/?l=linux-kernel&m=127090287021976&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15768
Subject		: Incorrectly calculated free blocks result in ENOSPC from writepage
Submitter	: Dmitry Monakhov <dmonakhov-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
Date		: 2010-04-12 11:24 (8 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15744
Subject		: [2.6.34-rc1 REGRESSION] ahci 0000:00:1f.2: controller reset failed (0xffffffff)
Submitter	: Andy Isaacson <adi-3HqRAUrWAWyGglJvpFV4uA@public.gmane.org>
Date		: 2010-04-06 22:54 (14 days old)
Message-ID	: <4BC51312.6080302-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
References	: http://marc.info/?l=linux-kernel&m=127059449031511&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15730
Subject		: Ugly rmap NULL ptr deref oopsie on hibernate (was Linux 2.6.34-rc3)
Submitter	: Borislav Petkov <bp-Gina5bIWoIWzQB+pC5nmwQ@public.gmane.org>
Date		: 2010-04-02 17:59 (18 days old)
Message-ID	: <20100402175937.GA19690-f9CnO7I+Q6zU6FkGJEIX5A@public.gmane.org>
References	: http://marc.info/?l=linux-kernel&m=127023173329741&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15729
Subject		: BUG: physmap modprobe & rmmod
Submitter	: Randy Dunlap <randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Date		: 2010-04-02 20:40 (18 days old)
Message-ID	: <20100402134058.c4682716.randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
References	: http://marc.info/?l=linux-kernel&m=127024096210230&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15719
Subject		: virtio_net causing kernel BUG when running under VirtualBox
Submitter	: Thomas Müller <thomas-5bHTHlrcoh6zQB+pC5nmwQ@public.gmane.org>
Date		: 2010-03-27 14:32 (24 days old)
First-Bad-Commit: http://kernel.org/git/linus/9ab86bbcf8be755256f0a5e994e0b38af6b4d399
Message-ID	: <4BAE1707.2050803-5bHTHlrcoh6zQB+pC5nmwQ@public.gmane.org>
References	: http://marc.info/?l=linux-kernel&m=126970039227740&w=4
Handled-By	: Shirley Ma <mashirle-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15717
Subject		: bluetooth oops
Submitter	: Pavel Machek <pavel-+ZI9xUNit7I@public.gmane.org>
Date		: 2010-03-14 20:14 (37 days old)
Message-ID	: <20100314201434.GE22059-I/5MKhXcvmPrBKCeMvbIDA@public.gmane.org>
References	: http://marc.info/?l=linux-kernel&m=126859771528426&w=4
Handled-By	: Marcel Holtmann <marcel-kz+m5ild9QBg9hUCZPvPmw@public.gmane.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15713
Subject		: hackbench regression due to commit 9dfc6e68bfe6e
Submitter	: Alex Shi <alex.shi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Date		: 2010-03-25 8:40 (26 days old)
First-Bad-Commit: http://kernel.org/git/linus/9dfc6e68bfe6ee452efb1a4e9ca26a9007f2b864
Message-ID	: <1269506457.4513.141.camel-c8rhgrCDLIED0+JXs3kMbRL4W9x8LtSr@public.gmane.org>
References	: http://marc.info/?l=linux-kernel&m=126950632920682&w=4
Handled-By	: Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
		  Pekka Enberg <penberg-bbCR+/B0CizivPeTLB3BmA@public.gmane.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15712
Subject		: [regression] 2.6.34-rc1 to -rc3 on zaurus: no longer boots
Submitter	: Pavel Machek <pavel-+ZI9xUNit7I@public.gmane.org>
Date		: 2010-04-01 6:06 (19 days old)
Message-ID	: <20100401060624.GA1329-+ZI9xUNit7I@public.gmane.org>
References	: http://marc.info/?l=linux-kernel&m=127010200817402&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15711
Subject		: 2.6.34-rc3, BUG at mm/slab.c:2989
Submitter	: Heinz Diehl <htd-HjJ2MNWy62to6+H+lsi3Gti2O/JbrIOy@public.gmane.org>
Date		: 2010-04-01 17:52 (19 days old)
Message-ID	: <20100401175225.GA6581-HjJ2MNWy62to6+H+lsi3Gti2O/JbrIOy@public.gmane.org>
References	: http://marc.info/?l=linux-kernel&m=127014437406250&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15704
Subject		: [r8169] WARNING: at net/sched/sch_generic.c
Submitter	: Sergey Senozhatsky <sergey.senozhatsky-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2010-03-31 10:21 (20 days old)
Message-ID	: <20100331102142.GA3294-dY8u8AhHFaWtd10JCjopabkcH5ONE+aC@public.gmane.org>
References	: http://marc.info/?l=linux-kernel&m=127003090406108&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15698
Subject		: Freeze on power-off / suspend to ram
Submitter	: arond <hector1987-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2010-04-05 13:53 (15 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15673
Subject		: 2.6.34-rc2: "ima_dec_counts: open/free imbalance"?
Submitter	: Thomas Meyer <thomas-VsYtu1Qij5c@public.gmane.org>
Date		: 2010-03-28 11:31 (23 days old)
Message-ID	: <1269775909.5301.4.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
References	: http://marc.info/?l=linux-kernel&m=126977593326800&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15672
Subject		: KVM bug, git bisected
Submitter	: Kent Overstreet <kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2010-03-27 12:43 (24 days old)
First-Bad-Commit: http://kernel.org/git/linus/5beb49305251e5669852ed541e8e2f2f7696c53e
Message-ID	: <4BADFD74.8060904-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
References	: http://marc.info/?l=linux-kernel&m=126969385121711&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15671
Subject		: intel graphic card hanging (Hangcheck timer elapsed... GPU hung)
Submitter	: Norbert Preining <preining-DX+603jRYB8@public.gmane.org>
Date		: 2010-03-27 16:11 (24 days old)
Message-ID	: <20100327161104.GA12043-DqSSrKF0TaySnEC3TeqHn5dqbFPxfnh/@public.gmane.org>
References	: http://marc.info/?l=linux-kernel&m=126970883105262&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15669
Subject		: INFO: suspicious rcu_dereference_check()
Submitter	: Zdenek Kabelac <zdenek.kabelac-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2010-03-08 1:26 (43 days old)
Message-ID	: <c4e36d111003250348q678eb2e6w4f3e8133e7fd6e58-JsoAwUIsXounXO2b/Sh1tA@public.gmane.orgom>
References	: http://marc.info/?l=linux-kernel&m=126801163107713&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15668
Subject		: start_kernel(): bug: interrupts were enabled early
Submitter	: Rabin Vincent <rabin-66gdRtMMWGc@public.gmane.org>
Date		: 2010-03-25 19:53 (26 days old)
First-Bad-Commit: http://kernel.org/git/linus/773e3eb7b81e5ba13b5155dfb3bb75b8ce37f8f9
Message-ID	: <20100325194100.GA2364@debian>
References	: http://marc.info/?l=linux-kernel&m=126954607216519&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15664
Subject		: Graphics hang and kernel backtrace when starting Azureus with Compiz enabled
Submitter	: Alex Villacis Lasso <avillaci-x0m+Mc+nT7uljOmnV8AmnkElSqmLX1BE@public.gmane.org>
Date		: 2010-04-01 01:09 (19 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15661
Subject		: PROBLEM: crash on halt with 2.6.34-0.16.rc2.git0.fc14.x86_64
Submitter	: Jon Masters <jonathan-Zp4isUonpHBD60Wz+7aTrA@public.gmane.org>
Date		: 2010-03-26 15:29 (25 days old)
Message-ID	: <1269617372.3779.234.camel@localhost>
References	: http://marc.info/?l=linux-kernel&m=126961739803949&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15659
Subject		: [Regresion] [2.6.34-rc1] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Submitter	: Maciej Rutecki <maciej.rutecki-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2010-03-25 20:04 (26 days old)
Message-ID	: <201003252104.24965.maciej.rutecki-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
References	: http://marc.info/?l=linux-kernel&m=126954749618319&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15625
Subject		: BUG: 2.6.34-rc1, RIP is (null)
Submitter	: Randy Dunlap <randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Date		: 2010-03-18 22:22 (33 days old)
Message-ID	: <4BA2A7A9.4080503-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
References	: http://marc.info/?l=linux-kernel&m=126895098217351&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15611
Subject		: Failure with the 2.6.34-rc1 kernel
Submitter	: Rupjyoti Sarmah <rsarmah-6mNVq6Owofk@public.gmane.org>
Date		: 2010-03-16 15:45 (35 days old)
Message-ID	: <AC311A8E81420D4EBC1F26E6479848FE065B7D3D-oUPhqDSr77q+n3Z1v9ZxkQ@public.gmane.org.amcc.com>
References	: http://marc.info/?l=linux-kernel&m=126875435718396&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15610
Subject		: fsck leads to swapper - BUG: unable to handle kernel NULL pointer dereference & panic
Submitter	: Ozgur Yuksel <ozgur.yuksel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Date		: 2010-03-22 15:59 (29 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15601
Subject		: [BUG] SLOB breaks Crypto
Submitter	: michael-dev-1SGGS//iJ+Y38rf8aCqVIw@public.gmane.org
Date		: 2010-03-15 13:39 (36 days old)
Message-ID	: <4B9E38AF.70309-1SGGS//iJ+Y38rf8aCqVIw@public.gmane.org>
References	: http://marc.info/?l=linux-kernel&m=126866044724539&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15590
Subject		: 2.6.34-rc1: regression: ^Z no longer stops sound
Submitter	: Pavel Machek <pavel-+ZI9xUNit7I@public.gmane.org>
Date		: 2010-03-14 7:58 (37 days old)
Message-ID	: <20100314075831.GA13457-I/5MKhXcvmPrBKCeMvbIDA@public.gmane.org>
References	: http://marc.info/?l=linux-kernel&m=126855353122623&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15589
Subject		: 2.6.34-rc1: Badness at fs/proc/generic.c:316
Submitter	: Christian Kujau <lists-AanptEQQ3TL9uQeqpI+JUg@public.gmane.org>
Date		: 2010-03-13 23:53 (38 days old)
Message-ID	: <alpine.DEB.2.01.1003131544340.5493-uKsf7x9sgtqQ/Pez2Lbyp4QuADTiUCJX@public.gmane.org>
References	: http://marc.info/?l=linux-kernel&m=126852442903680&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15553
Subject		: Screen backlight doesn't come back on after lid was closed (GM45)
Submitter	:  <bugs-fbdoOxCsnNob1SvskN2V4Q@public.gmane.org>
Date		: 2010-03-17 14:35 (34 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15551
Subject		: WARNING: at net/mac80211/work.c:811 ieee80211_work_work+0x7f/0xde8 [mac80211]()
Submitter	: Alex Zhavnerchik <alex.vizor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2010-03-16 22:03 (35 days old)


Regressions with patches
------------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=15505
Subject		: No more b43 wireless interface since 2.6.34-rc1
Submitter	: Christian Casteyde <casteyde.christian-GANU6spQydw@public.gmane.org>
Date		: 2010-03-10 06:59 (41 days old)
Handled-By	: Yinghai Lu <yinghai-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Patch		: https://bugzilla.kernel.org/show_bug.cgi?id=15505#c11


For details, please visit the bug entries and follow the links given in
references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.33,
unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=15310

Please let the tracking team know if there are any Bugzilla entries that
should be added to the list in there.

Thanks!

^ permalink raw reply

* RE: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.
From: Xin, Xiaohui @ 2010-04-20  2:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mingo@elte.hu,
	jdike@linux.intel.com, davem@davemloft.net
In-Reply-To: <20100419102118.GA16198@redhat.com>

Michael,

>>>>>> What we have not done yet:
>>>>>> 	packet split support
>>>>>> 
>>>>>What does this mean, exactly?
>>>> We can support a 1500 MTU, but for jumbo frames, since the vhost driver did not
>>>> previously support mergeable buffers, we cannot try it with multiple sg.
>>>> 
>>>I do not see why, vhost currently supports 64K buffers with indirect
>>>descriptors.
>>> 
>> The receive_skb() in the guest virtio-net driver will merge the multiple sg into skb frags; how can indirect descriptors do that?

>See add_recvbuf_big.

I don't mean that; add_recvbuf_big is for buffer submission. I mean that when a packet is received, in receive_buf(), the mergeable-buffer path knows which received pages can be hooked into skb frags; it is receive_mergeable() that does this.

When a NIC driver supports packet split mode, each ring descriptor contains an skb and a page. When a packet is received, if the status is not EOP, the page of the next descriptor is hooked to the previous skb. We don't know how many frags belong to one skb. So when the guest submits buffers, it should submit multiple pages, and on receive, the guest should know which pages belong to one skb and hook them together. I think receive_mergeable() can do this, but I don't see how big->packets handles this. Am I missing something here?
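
To make the packet split receive path concrete, here is a rough userspace sketch of the grouping described above. It is not the virtio-net or NIC driver code: struct rx_desc, struct packet, and assemble_packets() are hypothetical names, and a real driver would chain pages into skb frags rather than just count them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical RX descriptor: one page of data plus an EOP flag. */
struct rx_desc {
	size_t len;	/* bytes the NIC wrote into this descriptor's page */
	bool eop;	/* true on the last descriptor of a packet */
};

/* Hypothetical summary of one reassembled packet. */
struct packet {
	size_t nr_frags;	/* descriptors (pages) chained into the packet */
	size_t total_len;	/* sum of the descriptor lengths */
};

/* Walk the completed descriptors and group them into packets.  A packet
 * ends at the first descriptor with eop set, so the number of frags per
 * packet is only known at receive time.  Returns the number of complete
 * packets written to out[]; a trailing run without EOP is not reported. */
static size_t assemble_packets(const struct rx_desc *ring, size_t n,
			       struct packet *out, size_t max_out)
{
	struct packet cur = { 0, 0 };
	size_t done = 0;

	for (size_t i = 0; i < n && done < max_out; i++) {
		cur.nr_frags++;
		cur.total_len += ring[i].len;
		if (ring[i].eop) {
			out[done++] = cur;
			cur = (struct packet){ 0, 0 };
		}
	}
	return done;
}
```

The point of the sketch is that the receiver, not the submitter, decides which pages belong together, which is why the submission side has to post multiple pages without knowing how they will be grouped.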

Thanks
Xiaohui 

^ permalink raw reply
