Netdev List
* Re: [PATCH v5] rfs: Receive Flow Steering
From: David Miller @ 2010-04-17  0:58 UTC (permalink / raw)
  To: therbert; +Cc: eric.dumazet, netdev
In-Reply-To: <i2s65634d661004161722hece6f9d4naf528c37b63fffbc@mail.gmail.com>

From: Tom Herbert <therbert@google.com>
Date: Fri, 16 Apr 2010 17:22:49 -0700

> Ugh, vmalloc.h must be sneaking in through some other header file for
> me :-(  Sorry about that.  Do you need me to respin the patch?

No, I took care of it and am about to push things out to net-next-2.6
on kernel.org.


* [PATCH] KS8851: NULL pointer dereference if list is empty
From: Abraham Arce @ 2010-04-17  0:48 UTC (permalink / raw)
  To: netdev

Fix NULL pointer dereference in ks8851_tx_work by checking whether the
dequeued skb is NULL (the queue is already empty) before writing the
packet to the TX FIFO.

 Unable to handle kernel NULL pointer dereference at virtual address 00000050
 PC is at ks8851_tx_work+0xdc/0x1b0
 LR is at wait_for_common+0x148/0x164
 pc : [<c01c0df4>]    lr : [<c025a980>]    psr: 20000013
 Backtrace:
  ks8851_tx_work+0x0/0x1b0
  worker_thread+0x0/0x190
  kthread+0x0/0x90

Signed-off-by: Abraham Arce <x0066660@ti.com>
---
 drivers/net/ks8851.c |   12 +++++++-----
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ks8851.c b/drivers/net/ks8851.c
index 13cc1ca..9e9f9b3 100644
--- a/drivers/net/ks8851.c
+++ b/drivers/net/ks8851.c
@@ -722,12 +722,14 @@ static void ks8851_tx_work(struct work_struct *work)
 		txb = skb_dequeue(&ks->txq);
 		last = skb_queue_empty(&ks->txq);

-		ks8851_wrreg16(ks, KS_RXQCR, ks->rc_rxqcr | RXQCR_SDA);
-		ks8851_wrpkt(ks, txb, last);
-		ks8851_wrreg16(ks, KS_RXQCR, ks->rc_rxqcr);
-		ks8851_wrreg16(ks, KS_TXQCR, TXQCR_METFE);
+		if (txb != NULL) {
+			ks8851_wrreg16(ks, KS_RXQCR, ks->rc_rxqcr | RXQCR_SDA);
+			ks8851_wrpkt(ks, txb, last);
+			ks8851_wrreg16(ks, KS_RXQCR, ks->rc_rxqcr);
+			ks8851_wrreg16(ks, KS_TXQCR, TXQCR_METFE);

-		ks8851_done_tx(ks, txb);
+			ks8851_done_tx(ks, txb);
+		}
 	}

 	mutex_unlock(&ks->lock);
-- 
1.5.4.3
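
The guard added by this patch can be sketched in plain userspace C. The
names below (struct pkt, struct pkt_queue, pkt_dequeue, drain) are
hypothetical stand-ins and not the driver's API; the sketch only assumes
that a dequeue on an empty queue returns NULL, as skb_dequeue() does, so
the result has to be checked before it is used.

```c
#include <stddef.h>

/* Hypothetical stand-in for an skb; only the fields the sketch needs. */
struct pkt {
	struct pkt *next;
	unsigned int len;
};

/* Hypothetical singly linked FIFO, playing the role of ks->txq. */
struct pkt_queue {
	struct pkt *head;
	struct pkt *tail;
};

static void pkt_enqueue(struct pkt_queue *q, struct pkt *p)
{
	p->next = NULL;
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
}

/* Like skb_dequeue(): returns NULL when the queue is empty. */
static struct pkt *pkt_dequeue(struct pkt_queue *q)
{
	struct pkt *p = q->head;

	if (!p)
		return NULL;
	q->head = p->next;
	if (!q->head)
		q->tail = NULL;
	return p;
}

/*
 * Attempt `rounds` dequeues and "transmit" each packet obtained.
 * Returns the total number of bytes sent.
 */
static unsigned int drain(struct pkt_queue *q, int rounds)
{
	unsigned int sent = 0;

	while (rounds-- > 0) {
		struct pkt *p = pkt_dequeue(q);

		if (p != NULL)		/* the check the patch adds */
			sent += p->len;	/* would dereference NULL without it */
	}
	return sent;
}
```

Calling drain() with more rounds than queued packets is safe here only
because of the NULL check; without it the extra rounds would fault the
same way the oops above does.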


* Re: [PATCH v5] rfs: Receive Flow Steering
From: Tom Herbert @ 2010-04-17  0:22 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev
In-Reply-To: <20100416.155707.66748057.davem@davemloft.net>

Ugh, vmalloc.h must be sneaking in through some other header file for
me :-(  Sorry about that.  Do you need me to respin the patch?

Tom

On Fri, Apr 16, 2010 at 3:57 PM, David Miller <davem@davemloft.net> wrote:
> From: David Miller <davem@davemloft.net>
> Date: Fri, 16 Apr 2010 15:53:40 -0700 (PDT)
>
>> From: David Miller <davem@davemloft.net>
>> Date: Fri, 16 Apr 2010 15:49:32 -0700 (PDT)
>>
>>> Great, I'll add this to net-next-2.6 right now.
>>
>> I had to add an include of linux/vmalloc.h to net/core/sysctl_net_core.c
>> to fix the build while committing this.
>
> net/core/net-sysfs.c needed it too :-/
>


* Re: Network protocol (IP,IPv6,...) and TC actions (ACT_CSUM)
From: Grégoire Baron @ 2010-04-16 23:10 UTC (permalink / raw)
  To: netdev; +Cc: Jan Ceuleers
In-Reply-To: <4BC4AA5B.8040901@computer.org>

Thanks Jan, for the suggestion.

Hi,

I will re-explain my situation.

I started to write a new TC action (ACT_CSUM) in order to be able to
force, especially when ACT_PEDIT is used, the update of common checksums:
 * the IPv4 header checksum,
 * the ICMP/IGMP and ICMPv6 checksums,
 * the TCP/UDP checksums,
 * and why not, more ...
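
For the first item in that list, a minimal userspace sketch of the
Internet checksum (RFC 1071) applied to an IPv4 header may help. The
function names inet_checksum and ipv4_fix_header_csum are hypothetical,
the header is assumed to be 20 bytes with no options, and this is not the
code of the proposed action.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Internet checksum (RFC 1071) over `len` bytes: one's-complement sum of
 * 16-bit big-endian words, folded to 16 bits and inverted.
 */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)			/* odd trailing byte is padded with zero */
		sum += (uint32_t)data[len - 1] << 8;

	while (sum >> 16)		/* fold carries back into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* Recompute and store the checksum of a 20-byte (option-less) IPv4 header. */
static uint16_t ipv4_fix_header_csum(uint8_t hdr[20])
{
	uint16_t csum;

	hdr[10] = 0;			/* checksum field must be zero on input */
	hdr[11] = 0;
	csum = inet_checksum(hdr, 20);
	hdr[10] = (uint8_t)(csum >> 8);
	hdr[11] = (uint8_t)(csum & 0xff);
	return csum;
}
```

After an edit such as a pedit rewrite of an address byte, calling
ipv4_fix_header_csum() again brings the header back to a state where
inet_checksum() over the whole header returns 0.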

Also, the idea is to support IPv4 and IPv6 directly.
The best user interface (via iproute2/tc) would not ask the end user
to assume a specific network protocol, but would let the action
discover it.

With this aim, I would like to know if someone could confirm that the
struct sk_buff .protocol member is the right candidate to discover whether
I have an IPv4 packet, an IPv6 packet or any other network protocol in the
skb received by the TC action code (supporting INGRESS and EGRESS).

Indeed, this struct sk_buff member could contain something like
ETH_P_8021Q, which isn't a network protocol Id ...

I think this kind of content isn't seen by the TC actions, which work
at the network level (even if their filter protocol flag accepts all).
If someone could confirm, thanks in advance.
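
Until that is confirmed, one defensive option is to dispatch on the
protocol value and treat anything unexpected, including an 802.1Q tag, as
"not handled" instead of assuming it cannot occur. The sketch below is a
userspace illustration with hypothetical names (enum l3_kind,
classify_protocol); the EtherType constants are the standard values, given
in host byte order for simplicity, whereas skb->protocol in the kernel is
in network byte order.

```c
#include <stdint.h>

/* Standard EtherType values (host byte order here for simplicity). */
#define ETHERTYPE_IPV4	0x0800
#define ETHERTYPE_IPV6	0x86DD
#define ETHERTYPE_VLAN	0x8100	/* 802.1Q tag, not a network protocol */

enum l3_kind { L3_IPV4, L3_IPV6, L3_VLAN_TAGGED, L3_OTHER };

/* Decide which checksum path, if any, applies to this protocol value. */
static enum l3_kind classify_protocol(uint16_t proto)
{
	switch (proto) {
	case ETHERTYPE_IPV4:
		return L3_IPV4;
	case ETHERTYPE_IPV6:
		return L3_IPV6;
	case ETHERTYPE_VLAN:
		return L3_VLAN_TAGGED;	/* caller should skip or look deeper */
	default:
		return L3_OTHER;	/* leave the packet untouched */
	}
}
```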

In the same way, I've wondered if the struct sk_buff .len member could
be used to avoid having to "discover" the network packet length in the TC
action code, especially in the case of IPv6 packets (and jumbograms ;-).
But I think not, because in my view it may not hold during INGRESS TC
action execution, since the packet hasn't been delivered to the network
protocol yet. Is my analysis right?

Thanks again for your help.

Best Regards,

Grégoire Baron

On Tue, Apr 13, 2010 at 07:31:07PM +0200, Jan Ceuleers wrote:
> Grégoire Baron wrote:
> > As this .protocol member seems to be used at different moments when a
> > packet is received, forwarded or sent, and could contain something like
> > ETH_P_8021Q which isn't a network protocol Id, can we say the struct
> > sk_buff .protocol member is guaranteed to contain a network protocol Id
> > in the struct sk_buff used in the TC action executions?
> 
> Grégoire,
> 
> I suggest that you ask your question on the netdev mailing list (netdev@vger.kernel.org).
> 
> Cheers, Jan


* Re: [PATCH v5] rfs: Receive Flow Steering
From: David Miller @ 2010-04-16 22:57 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, netdev
In-Reply-To: <20100416.155340.256882855.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Fri, 16 Apr 2010 15:53:40 -0700 (PDT)

> From: David Miller <davem@davemloft.net>
> Date: Fri, 16 Apr 2010 15:49:32 -0700 (PDT)
> 
>> Great, I'll add this to net-next-2.6 right now.
> 
> I had to add an include of linux/vmalloc.h to net/core/sysctl_net_core.c
> to fix the build while committing this.

net/core/net-sysfs.c needed it too :-/


* Re: [PATCH v5] rfs: Receive Flow Steering
From: David Miller @ 2010-04-16 22:53 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, netdev
In-Reply-To: <20100416.154932.147279343.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Fri, 16 Apr 2010 15:49:32 -0700 (PDT)

> Great, I'll add this to net-next-2.6 right now.

I had to add an include of linux/vmalloc.h to net/core/sysctl_net_core.c
to fix the build while committing this.


* Re: [PATCH v5] rfs: Receive Flow Steering
From: David Miller @ 2010-04-16 22:49 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, netdev
In-Reply-To: <1271446679.16881.4298.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 16 Apr 2010 21:37:59 +0200

> On Thursday 15 April 2010 at 23:33 -0700, David Miller wrote:
>> From: Tom Herbert <therbert@google.com>
>> Date: Thu, 15 Apr 2010 22:47:08 -0700 (PDT)
>> 
>> > Version 5 of RFS:
>> > - Moved rps_sock_flow_sysctl into net/core/sysctl_net_core.c as a
>> > static function.
>> > - Apply limits to rps_sock_flow_entries sysctl and rps_flow_count
>> > sysfs variable.
>> 
>> I've read this over a few times and I think it's ready to go into
>> net-next-2.6, we can tweak things as-needed from here on out.
>> 
>> Eric, what do you think?
> 
> I think I can give my Sob, and we have time to fully test it and tweak
> it if necessary.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Great, I'll add this to net-next-2.6 right now.

Thanks!


* Re: [PATCH net-2.6] packet : remove init_net restriction
From: David Miller @ 2010-04-16 22:41 UTC (permalink / raw)
  To: daniel.lezcano; +Cc: netdev
In-Reply-To: <1271322674-21726-1-git-send-email-daniel.lezcano@free.fr>

From: Daniel Lezcano <daniel.lezcano@free.fr>
Date: Thu, 15 Apr 2010 11:11:14 +0200

> The af_packet protocol is used by Perl to do ioctls as reported by
> Stephane Riviere:
> 
> "Net::RawIP relies on SIOCGIFADDR and SIOCGIFHWADDR to get the IP and MAC
> addresses of the network interface."
> 
> But in a new network namespace these ioctls fail because they are disabled
> for a namespace different from the init_net_ns.
> 
> These two lines should not be there, as af_inet and af_packet have been
> namespace aware for a long time now. I suppose we forgot to remove these
> lines because we sent af_packet first, before af_inet was supported.
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
> Reported-by: Stephane Riviere <stephane.riviere@regis-dgac.net>

Applied, thanks!


* Re: [PATCH] WAN: flush tx_queue in hdlc_ppp to prevent panic on rmmod hw_driver.
From: David Miller @ 2010-04-16 22:41 UTC (permalink / raw)
  To: khc; +Cc: netdev
In-Reply-To: <m3mxx5mv8v.fsf@intrepid.localdomain>

From: Krzysztof Halasa <khc@pm.waw.pl>
Date: Thu, 15 Apr 2010 02:09:52 +0200

> tx_queue is used as a temporary queue when not allowed to queue skb
> directly to the hw device driver (which may sleep). Most paths flush
> it before returning, but ppp_start() currently cannot. Make sure we
> don't leave skbs pointing to a non-existent device.
> 
> Thanks to Michael Barkowski for reporting this problem.
> 
> Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>

Applied, thank you.


* [PATCH net-next-2.6] net: Introduce skb_orphan_try()
From: Eric Dumazet @ 2010-04-16 22:18 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100415.143321.200497785.davem@davemloft.net>

On Thursday 15 April 2010 at 14:33 -0700, David Miller wrote:

> If it's not legal to skb_orphan() here then it would not be legal for
> the drivers to unconditionally skb_orphan(), which they do.
> 
> So either your test is unnecessary, or we have a big existing problem
> :-)

I cooked up the following patch, introducing a skb_orphan_try() helper, to
document all known exceptions.

I have a possible followup for this patch:

Orphaning skbs earlier could also make dev_kfree_skb_irq() faster.
Instead of queueing the skb into completion_queue and triggering
NET_TX_SOFTIRQ, we could directly free an orphaned skb?



[PATCH net-next-2.6] net: Introduce skb_orphan_try()

Transmitted skb might be attached to a socket and a destructor, for
memory accounting purposes.

Traditionally, this destructor is called at tx completion time, when skb
is freed.

When tx completion is performed by a different cpu than the sender, this
forces some cache lines to change ownership. XPS was an attempt to give
tx completion to the initial cpu.

David's idea is to call the destructor right before giving the skb to the
device (call to ndo_start_xmit()). Because device queues are usually small,
orphaning the skb before tx completion is not a big deal. Some drivers
already do this; we could do it at the upper level.

There is one known exception to this early orphaning, called tx
timestamping. It needs to keep a reference to socket until device can
give a hardware or software timestamp.

This patch adds a skb_orphan_try() helper, to centralize all exceptions
to early orphaning in one spot, and use it in dev_hard_start_xmit().

"tbench 16" results on a Nehalem machine (2 X5570  @ 2.93GHz)
before: Throughput 4428.9 MB/sec 16 procs
after: Throughput 4448.14 MB/sec 16 procs

UDP should get even better results, its destructor being more complex,
since SOCK_USE_WRITE_QUEUE is not set (four atomic ops instead of one)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/core/dev.c b/net/core/dev.c
index e8041eb..acae5fe 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1880,6 +1880,17 @@ static int dev_gso_segment(struct sk_buff *skb)
 	return 0;
 }
 
+/*
+ * Try to orphan skb early, right before transmission by the device.
+ * We cannot orphan skb if tx timestamp is requested, since
+ * drivers need to call skb_tstamp_tx() to send the timestamp.
+ */
+static inline void skb_orphan_try(struct sk_buff *skb)
+{
+	if (!skb_tx(skb)->flags)
+		skb_orphan(skb);
+}
+
 int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			struct netdev_queue *txq)
 {
@@ -1904,23 +1915,10 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(skb);
 
+		skb_orphan_try(skb);
 		rc = ops->ndo_start_xmit(skb, dev);
 		if (rc == NETDEV_TX_OK)
 			txq_trans_update(txq);
-		/*
-		 * TODO: if skb_orphan() was called by
-		 * dev->hard_start_xmit() (for example, the unmodified
-		 * igb driver does that; bnx2 doesn't), then
-		 * skb_tx_software_timestamp() will be unable to send
-		 * back the time stamp.
-		 *
-		 * How can this be prevented? Always create another
-		 * reference to the socket before calling
-		 * dev->hard_start_xmit()? Prevent that skb_orphan()
-		 * does anything in dev->hard_start_xmit() by clearing
-		 * the skb destructor before the call and restoring it
-		 * afterwards, then doing the skb_orphan() ourselves?
-		 */
 		return rc;
 	}
 
@@ -1938,6 +1936,7 @@ gso:
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(nskb);
 
+		skb_orphan_try(nskb);
 		rc = ops->ndo_start_xmit(nskb, dev);
 		if (unlikely(rc != NETDEV_TX_OK)) {
 			if (rc & ~NETDEV_TX_MASK)
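
The effect of the helper can be modelled in userspace. Everything below
(struct sock_model, struct skb_model, the *_model functions) is a
hypothetical simplification and not kernel code: the destructor returns the
skb's memory charge to its socket, and a non-zero tx_flags stands for "a tx
timestamp was requested", which is the one exception named above.

```c
#include <stddef.h>

/* Hypothetical socket: tracks bytes currently charged to the sender. */
struct sock_model {
	int wmem_charged;
};

/* Hypothetical skb: just the fields early orphaning cares about. */
struct skb_model {
	struct sock_model *sk;
	void (*destructor)(struct skb_model *skb);
	unsigned int truesize;
	unsigned int tx_flags;	/* non-zero: tx timestamp requested */
};

/* Destructor: give the memory charge back to the owning socket. */
static void sock_wfree_model(struct skb_model *skb)
{
	skb->sk->wmem_charged -= (int)skb->truesize;
}

/* Run the destructor now and detach the skb from its socket. */
static void skb_orphan_model(struct skb_model *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
	skb->destructor = NULL;
	skb->sk = NULL;
}

/* Orphan early unless the socket reference is still needed. */
static void skb_orphan_try_model(struct skb_model *skb)
{
	if (!skb->tx_flags)
		skb_orphan_model(skb);
}
```

In this model a plain skb is uncharged on the sending cpu before the
device sees it, while a timestamped skb keeps its socket until completion.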




* [PATCH] gigaset: include cleanup cleanup
From: Tilman Schmidt @ 2010-04-16 22:08 UTC (permalink / raw)
  To: Karsten Keil, David Miller
  Cc: Tejun Heo, Hansjoerg Lipp, i4ldeveloper, netdev, linux-kernel

Commit 5a0e3ad causes slab.h to be included twice in many of the
Gigaset driver's source files, first via the common include file
gigaset.h and then a second time directly. Drop the spares, and
use the opportunity to clean up a few more similar cases.

Impact: cleanup, no functional change
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
CC: Tejun Heo <tj@kernel.org>
---
Seeing that the "include cleanup" patch triggering this was accepted
after the merge window, I have hopes this one will be accepted, too.

 drivers/isdn/gigaset/bas-gigaset.c |    5 -----
 drivers/isdn/gigaset/capi.c        |    2 --
 drivers/isdn/gigaset/common.c      |    2 --
 drivers/isdn/gigaset/gigaset.h     |    2 +-
 drivers/isdn/gigaset/i4l.c         |    1 -
 drivers/isdn/gigaset/interface.c   |    1 -
 drivers/isdn/gigaset/proc.c        |    1 -
 drivers/isdn/gigaset/ser-gigaset.c |    3 ---
 drivers/isdn/gigaset/usb-gigaset.c |    4 ----
 9 files changed, 1 insertions(+), 20 deletions(-)

diff --git a/drivers/isdn/gigaset/bas-gigaset.c b/drivers/isdn/gigaset/bas-gigaset.c
index 0be15c7..47a5ffe 100644
--- a/drivers/isdn/gigaset/bas-gigaset.c
+++ b/drivers/isdn/gigaset/bas-gigaset.c
@@ -14,11 +14,6 @@
  */
 
 #include "gigaset.h"
-
-#include <linux/errno.h>
-#include <linux/init.h>
-#include <linux/slab.h>
-#include <linux/timer.h>
 #include <linux/usb.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
diff --git a/drivers/isdn/gigaset/capi.c b/drivers/isdn/gigaset/capi.c
index eb7e271..964a55f 100644
--- a/drivers/isdn/gigaset/capi.c
+++ b/drivers/isdn/gigaset/capi.c
@@ -12,8 +12,6 @@
  */
 
 #include "gigaset.h"
-#include <linux/slab.h>
-#include <linux/ctype.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 #include <linux/isdn/capilli.h>
diff --git a/drivers/isdn/gigaset/common.c b/drivers/isdn/gigaset/common.c
index 0b39b38..f6f45f2 100644
--- a/drivers/isdn/gigaset/common.c
+++ b/drivers/isdn/gigaset/common.c
@@ -14,10 +14,8 @@
  */
 
 #include "gigaset.h"
-#include <linux/ctype.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
-#include <linux/slab.h>
 
 /* Version Information */
 #define DRIVER_AUTHOR "Hansjoerg Lipp <hjlipp@web.de>, Tilman Schmidt <tilman@imap.cc>, Stefan Eilers"
diff --git a/drivers/isdn/gigaset/gigaset.h b/drivers/isdn/gigaset/gigaset.h
index 9ef5b04..d32efb6 100644
--- a/drivers/isdn/gigaset/gigaset.h
+++ b/drivers/isdn/gigaset/gigaset.h
@@ -22,9 +22,9 @@
 #include <linux/kernel.h>
 #include <linux/compiler.h>
 #include <linux/types.h>
+#include <linux/ctype.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
-#include <linux/usb.h>
 #include <linux/skbuff.h>
 #include <linux/netdevice.h>
 #include <linux/ppp_defs.h>
diff --git a/drivers/isdn/gigaset/i4l.c b/drivers/isdn/gigaset/i4l.c
index c99fb97..c22e5ac 100644
--- a/drivers/isdn/gigaset/i4l.c
+++ b/drivers/isdn/gigaset/i4l.c
@@ -15,7 +15,6 @@
 
 #include "gigaset.h"
 #include <linux/isdnif.h>
-#include <linux/slab.h>
 
 #define HW_HDR_LEN	2	/* Header size used to store ack info */
 
diff --git a/drivers/isdn/gigaset/interface.c b/drivers/isdn/gigaset/interface.c
index f0dc6c9..c9f28dd 100644
--- a/drivers/isdn/gigaset/interface.c
+++ b/drivers/isdn/gigaset/interface.c
@@ -13,7 +13,6 @@
 
 #include "gigaset.h"
 #include <linux/gigaset_dev.h>
-#include <linux/tty.h>
 #include <linux/tty_flip.h>
 
 /*** our ioctls ***/
diff --git a/drivers/isdn/gigaset/proc.c b/drivers/isdn/gigaset/proc.c
index b69f73a..b943efb 100644
--- a/drivers/isdn/gigaset/proc.c
+++ b/drivers/isdn/gigaset/proc.c
@@ -14,7 +14,6 @@
  */
 
 #include "gigaset.h"
-#include <linux/ctype.h>
 
 static ssize_t show_cidmode(struct device *dev,
 			    struct device_attribute *attr, char *buf)
diff --git a/drivers/isdn/gigaset/ser-gigaset.c b/drivers/isdn/gigaset/ser-gigaset.c
index 8b0afd2..e96c058 100644
--- a/drivers/isdn/gigaset/ser-gigaset.c
+++ b/drivers/isdn/gigaset/ser-gigaset.c
@@ -11,13 +11,10 @@
  */
 
 #include "gigaset.h"
-
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/platform_device.h>
-#include <linux/tty.h>
 #include <linux/completion.h>
-#include <linux/slab.h>
 
 /* Version Information */
 #define DRIVER_AUTHOR "Tilman Schmidt"
diff --git a/drivers/isdn/gigaset/usb-gigaset.c b/drivers/isdn/gigaset/usb-gigaset.c
index 9430a2b..76dbb20 100644
--- a/drivers/isdn/gigaset/usb-gigaset.c
+++ b/drivers/isdn/gigaset/usb-gigaset.c
@@ -16,10 +16,6 @@
  */
 
 #include "gigaset.h"
-
-#include <linux/errno.h>
-#include <linux/init.h>
-#include <linux/slab.h>
 #include <linux/usb.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
-- 
1.6.5.3.298.g39add


* Re: [PATCH v5] rfs: Receive Flow Steering
From: Eric Dumazet @ 2010-04-16 21:25 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, netdev
In-Reply-To: <1271452358.16881.4486.camel@edumazet-laptop>

On Friday 16 April 2010 at 23:12 +0200, Eric Dumazet wrote:
> On Friday 16 April 2010 at 13:42 -0700, Tom Herbert wrote:
> > On Fri, Apr 16, 2010 at 11:53 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > On Friday 16 April 2010 at 11:35 -0700, Tom Herbert wrote:
> > >> Results with "tbench 16" on an 8 core Intel machine.
> > >>
> > >> No RPS/RFS:  2155 MB/sec
> > >> RPS (0ff mask): 1700 MB/sec
> > >> RFS: 1097
> > >>
> > 
> > Blah, I mistakenly reported that... should have been:
> > 
> > No RPS/RFS:  2155 MB/sec
> > RPS (0ff mask): 1097 MB/sec
> > RFS: 1700 MB/sec
> > 
> > Sorry about that!
> 
> > This was my expectation too, and what my "corrected" numbers show :-)
> > But, I take it this is different in your results?
> 
> 
> My results are on a "tbench 16" on a dual X5570  @ 2.93GHz.
> (16 logical cpus)
> 
> No RPS , no RFS : 4448.14 MB/sec 
> RPS : 2298.00 MB/sec (but a lot of variation)
> RFS : 2600 MB/sec
> 
> Maybe my RFS setup is bad ?
> (8192 flows)
> 

Very strange, a second tbench-16 RFS=y run gave me 2134.08 MB/sec 

A third run gave me 1813.21 MB/sec 
A fourth run gave me 2472.91 MB/sec 

Hmm...






* Re: [PATCH v5] rfs: Receive Flow Steering
From: Eric Dumazet @ 2010-04-16 21:12 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, netdev
In-Reply-To: <u2t65634d661004161342zeadb5602w73c369ec717dc6e1@mail.gmail.com>

On Friday 16 April 2010 at 13:42 -0700, Tom Herbert wrote:
> On Fri, Apr 16, 2010 at 11:53 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Friday 16 April 2010 at 11:35 -0700, Tom Herbert wrote:
> >> Results with "tbench 16" on an 8 core Intel machine.
> >>
> >> No RPS/RFS:  2155 MB/sec
> >> RPS (0ff mask): 1700 MB/sec
> >> RFS: 1097
> >>
> 
> Blah, I mistakenly reported that... should have been:
> 
> No RPS/RFS:  2155 MB/sec
> RPS (0ff mask): 1097 MB/sec
> RFS: 1700 MB/sec
> 
> Sorry about that!

> This was my expectation too, and what my "corrected" numbers show :-)
> But, I take it this is different in your results?


My results are on a "tbench 16" on a dual X5570  @ 2.93GHz.
(16 logical cpus)

No RPS , no RFS : 4448.14 MB/sec 
RPS : 2298.00 MB/sec (but a lot of variation)
RFS : 2600 MB/sec

Maybe my RFS setup is bad ?
(8192 flows)




* Re: [PATCH v5] rfs: Receive Flow Steering
From: Tom Herbert @ 2010-04-16 20:42 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1271443994.16881.4249.camel@edumazet-laptop>

On Fri, Apr 16, 2010 at 11:53 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Friday 16 April 2010 at 11:35 -0700, Tom Herbert wrote:
>> Results with "tbench 16" on an 8 core Intel machine.
>>
>> No RPS/RFS:  2155 MB/sec
>> RPS (0ff mask): 1700 MB/sec
>> RFS: 1097
>>

Blah, I mistakenly reported that... should have been:

No RPS/RFS:  2155 MB/sec
RPS (0ff mask): 1097 MB/sec
RFS: 1700 MB/sec

Sorry about that!

>> I am not particularly surprised by the results, using loopback
>> interface already provides good parallelism and RPS/RFS really would
>> only add overhead and more trips between CPUs (last part is why RPS <
>> RFS I suspect)-- I guess this is why we've never enabled RPS on
>> loopback :-)
>>
>> Eric, do you have a particular concern that this could affect a real workload?
>>
>
> I was expecting RFS to be better than RPS at least, for this particular
> workload (tcp over loopback)
>
This was my expectation too, and what my "corrected" numbers show :-)
But, I take it this is different in your results?

Tom


* Re: [PATCH] rdma/cm: Randomize local port allocation.
From: David Miller @ 2010-04-16 20:30 UTC (permalink / raw)
  To: penguin-kernel-JPay3/Yim36HaxMnTkn67Xf5DAMn2ifp
  Cc: amwang-H+wXaHxf7aLQT0dZR+AlfA, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	opurdila-+zzKsuq53OdBDgjK7y7TUQ,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	rolandd-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <201004162254.FJF73478.SHOOMOFtQFVJLF-JPay3/Yim36HaxMnTkn67Xf5DAMn2ifp@public.gmane.org>

From: Tetsuo Handa <penguin-kernel-JPay3/Yim36HaxMnTkn67Xf5DAMn2ifp@public.gmane.org>
Date: Fri, 16 Apr 2010 22:54:22 +0900

> Cong Wang wrote:
>> Sean Hefty wrote:
>> > I like this version, thanks!  I'm not sure which tree to merge it through.
>> > Are you needing this for 2.6.34, or is 2.6.35 okay?
>> > 
>> 
>> As soon as possible, so 2.6.34. :)
>> 
> Cong, merge window for 2.6.34 was already closed.
> You need to make your patchset towards 2.6.35 (using net-next-2.6 tree)
> rather than 2.6.34 (using linux-2.6 tree). Therefore, this patch being
> queued for 2.6.35 (through net-next-2.6 tree) should be okay for you.

I don't take RDMA patches into net-next-2.6; the less I touch this
stack-avoiding stuff the better, and Roland has been taking this stuff
into his own tree for some time now.


* Re: [PATCH] mac8390: fix pr_info() calls and change return code
From: David Miller @ 2010-04-16 20:28 UTC (permalink / raw)
  To: fthain; +Cc: joe, p_gortmaker, netdev, linux-kernel, linux-m68k
In-Reply-To: <alpine.OSX.2.00.1004162340370.271@localhost>

From: Finn Thain <fthain@telegraphics.com.au>
Date: Fri, 16 Apr 2010 23:57:34 +1000 (EST)

> 
> On Thu, 15 Apr 2010, Joe Perches wrote:
> 
>> ...Why is it better to use -EBUSY?
> 
> Nubus slots are geographically addressed and their irqs are equally 
> inflexible. -EAGAIN is misleading because retrying will not help fix 
> whatever bug caused the irq to be unavailable.

This is exactly the kind of background information and verbose
explanation that belongs in the commit message.

Yet in your recent version of the patch, you're still being extremely
terse about the reasoning for using -EBUSY.

Just saying it's "misleading" doesn't tell anyone anything if they
have to go back in the commit history and try to figure out why this
change was made if it's causing problems later.

Please put the verbose and complete explanation in your commit
message, and resubmit your patch.

I just want to point out that with all the trouble you gave about
Joe's work, you're having one heck of a time even submitting your
changes properly. :-)

Thanks.


* Re: [PATCH net-2.6] packet : remove init_net restriction
From: David Miller @ 2010-04-16 20:23 UTC (permalink / raw)
  To: daniel.lezcano; +Cc: netdev
In-Reply-To: <4BC87C7C.4060407@free.fr>

From: Daniel Lezcano <daniel.lezcano@free.fr>
Date: Fri, 16 Apr 2010 17:04:28 +0200

> Shall I send it against net-next-2.6 ?

No, I'll likely add it to net-2.6, I just haven't gotten around
to it yet.

Thanks.


* Re: [PATCH v5] rfs: Receive Flow Steering
From: Eric Dumazet @ 2010-04-16 19:37 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, netdev
In-Reply-To: <20100415.233334.242114544.davem@davemloft.net>

On Thursday 15 April 2010 at 23:33 -0700, David Miller wrote:
> From: Tom Herbert <therbert@google.com>
> Date: Thu, 15 Apr 2010 22:47:08 -0700 (PDT)
> 
> > Version 5 of RFS:
> > - Moved rps_sock_flow_sysctl into net/core/sysctl_net_core.c as a
> > static function.
> > - Apply limits to rps_sock_flow_entries sysctl and rps_flow_count
> > sysfs variable.
> 
> I've read this over a few times and I think it's ready to go into
> net-next-2.6, we can tweak things as-needed from here on out.
> 
> Eric, what do you think?

I think I can give my Sob, and we have time to fully test it and tweak
it if necessary.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Thanks Tom !




* Re: [PATCH v5] rfs: Receive Flow Steering
From: Eric Dumazet @ 2010-04-16 18:53 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, netdev
In-Reply-To: <l2m65634d661004161135h1c1466afi54787022bfc2ce12@mail.gmail.com>

On Friday 16 April 2010 at 11:35 -0700, Tom Herbert wrote:
> Results with "tbench 16" on an 8 core Intel machine.
> 
> No RPS/RFS:  2155 MB/sec
> RPS (0ff mask): 1700 MB/sec
> RFS: 1097
> 
> I am not particularly surprised by the results, using loopback
> interface already provides good parallelism and RPS/RFS really would
> only add overhead and more trips between CPUs (last part is why RPS <
> RFS I suspect)-- I guess this is why we've never enabled RPS on
> loopback :-)
> 
> Eric, do you have a particular concern that this could affect a real workload?
> 

I was expecting RFS to be better than RPS at least, for this particular
workload (tcp over loopback).

With RPS, the hash of (127.0.0.1, port1, 127.0.0.1, port2) is different
from the hash of (127.0.0.1, port2, 127.0.0.1, port1), so basically we
force the server to run on a different processor than the client.

However, I was expecting that with RFS, client and server would run on
the same cpu.

Maybe we could change (for a test) hash function to use  (sport ^ dport)
instead of (sport << 16) + dport 
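
The difference between the two port terms can be checked in isolation.
The sketch below is a userspace illustration with hypothetical function
names; it covers only the port mixing quoted above and assumes a plain
modulo for CPU selection, whereas the real receive hash also mixes in the
addresses and goes through the configured CPU map.

```c
#include <stdint.h>

/* Port term as quoted in the thread: depends on argument order. */
static uint32_t ports_hash_shift(uint16_t sport, uint16_t dport)
{
	return ((uint32_t)sport << 16) + dport;
}

/* Proposed test variant: XOR is commutative, so both directions match. */
static uint32_t ports_hash_xor(uint16_t sport, uint16_t dport)
{
	return (uint32_t)(sport ^ dport);
}

/* Hypothetical CPU selection: plain modulo over the CPU count. */
static unsigned int pick_cpu(uint32_t hash, unsigned int ncpus)
{
	return hash % ncpus;
}
```

With the XOR term, the two directions of one loopback connection always
select the same CPU in this model; with the shifted term they generally
do not, which matches the behaviour described for RPS above.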





* Re: [PATCH v5] rfs: Receive Flow Steering
From: Tom Herbert @ 2010-04-16 18:35 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1271401007.16881.3762.camel@edumazet-laptop>

Results with "tbench 16" on an 8 core Intel machine.

No RPS/RFS:  2155 MB/sec
RPS (0ff mask): 1700 MB/sec
RFS: 1097

I am not particularly surprised by the results, using loopback
interface already provides good parallelism and RPS/RFS really would
only add overhead and more trips between CPUs (last part is why RPS <
RFS I suspect)-- I guess this is why we've never enabled RPS on
loopback :-)

Eric, do you have a particular concern that this could affect a real workload?

Tom


On Thu, Apr 15, 2010 at 11:56 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thursday 15 April 2010 at 23:33 -0700, David Miller wrote:
>> From: Tom Herbert <therbert@google.com>
>> Date: Thu, 15 Apr 2010 22:47:08 -0700 (PDT)
>>
>> > Version 5 of RFS:
>> > - Moved rps_sock_flow_sysctl into net/core/sysctl_net_core.c as a
>> > static function.
>> > - Apply limits to rps_sock_flow_entries sysctl and rps_flow_count
>> > sysfs variable.
>>
>> I've read this over a few times and I think it's ready to go into
>> net-next-2.6, we can tweak things as-needed from here on out.
>>
>> Eric, what do you think?
>
> I read the patch and found no error.
>
> I booted a test machine and performed some tests.
>
> I am a bit worried about a tbench regression I am looking at right now.
>
> if RFS disabled , tbench 16   ->  4408.63 MB/sec
>
>
> # grep . /sys/class/net/lo/queues/rx-0/*
> /sys/class/net/lo/queues/rx-0/rps_cpus:00000000
> /sys/class/net/lo/queues/rx-0/rps_flow_cnt:8192
> # cat /proc/sys/net/core/rps_sock_flow_entries
> 8192
>
>
> echo ffff >/sys/class/net/lo/queues/rx-0/rps_cpus
>
> tbench 16 -> 2336.32 MB/sec
>
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>   PerfTop:   14561 irqs/sec  kernel:86.3% [1000Hz cycles],  (all, 16 CPUs)
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>             samples  pcnt function                       DSO
>             _______ _____ ______________________________ __________________________________________________________
>
>             2664.00  5.1% copy_user_generic_string       /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>             2323.00  4.4% acpi_os_read_port              /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>             1641.00  3.1% _raw_spin_lock_irqsave         /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>             1260.00  2.4% schedule                       /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>             1159.00  2.2% _raw_spin_lock                 /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>             1051.00  2.0% tcp_ack                        /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              991.00  1.9% tcp_sendmsg                    /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              922.00  1.8% tcp_recvmsg                    /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              821.00  1.6% child_run                      /usr/bin/tbench
>              766.00  1.5% all_string_sub                 /usr/bin/tbench
>              630.00  1.2% __switch_to                    /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              608.00  1.2% __GI_strchr                    /lib/tls/libc-2.3.4.so
>              606.00  1.2% ipt_do_table                   /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              600.00  1.1% __GI_strstr                    /lib/tls/libc-2.3.4.so
>              556.00  1.1% __netif_receive_skb            /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              504.00  1.0% tcp_transmit_skb               /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              502.00  1.0% tick_nohz_stop_sched_tick      /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              481.00  0.9% _raw_spin_unlock_irqrestore    /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              473.00  0.9% next_token                     /usr/bin/tbench
>              449.00  0.9% ip_rcv                         /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              423.00  0.8% call_function_single_interrupt /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              422.00  0.8% ia32_sysenter_target           /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              420.00  0.8% compat_sys_socketcall          /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              401.00  0.8% mod_timer                      /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              400.00  0.8% process_backlog                /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              399.00  0.8% ip_queue_xmit                  /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              387.00  0.7% select_task_rq_fair            /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              377.00  0.7% _raw_spin_lock_bh              /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              360.00  0.7% tcp_v4_rcv                     /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>
> But if RFS is on, why does activating rps_cpus change the tbench result?
>
>
>
>

^ permalink raw reply

* Re: [PATCH v4] rfs: Receive Flow Steering
From: Rick Jones @ 2010-04-16 18:32 UTC (permalink / raw)
  To: Paul Turner
  Cc: Tom Herbert, Stephen Hemminger, davem, netdev, eric.dumazet,
	Ingo Molnar
In-Reply-To: <i2oed628a921004161059z65a3cf1aq5f3cd2194f40a811@mail.gmail.com>

> Even under a hybrid model I think phrasing it as networking leading
> the scheduler here is a little strong.  The scheduler is in both cases
> the most 'informed' place to make these decisions, but I think it
> could benefit from more knowledge.  In the 'virgin' single flow case
> without any steering the network stack is currently able to implicitly
> hint to the scheduler where flows could be most efficiently served due
> to wake-affine balancing behaviors.  This is a natural side-effect of
> wake-ups being sourced by the networking cpus.

Hinting to the scheduler is fine, so long as the scheduler has the final say.
Presumably it is the thing that knows about the other forces tugging at where to
run the thread: where its memory is allocated, what other flows are coming to
it, etc.

rick jones

^ permalink raw reply

* Re: [PATCH v5] rfs: Receive Flow Steering
From: Eric Dumazet @ 2010-04-16 18:15 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, netdev
In-Reply-To: <w2t65634d661004160835z4a604ee7pb5f9d395fe61b5db@mail.gmail.com>

On Friday, 16 April 2010 at 08:35 -0700, Tom Herbert wrote:
> Eric, thanks for testing that.  Admittedly, we have looked at enabling
> RFS/RPS over loopback.   I'll look at that today also.
> 
> 

Hi Tom

I am sorry, but I could not work on this today. I hope I can find some
time a bit later.



> On Thu, Apr 15, 2010 at 11:56 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Thursday, 15 April 2010 at 23:33 -0700, David Miller wrote:
> >> From: Tom Herbert <therbert@google.com>
> >> Date: Thu, 15 Apr 2010 22:47:08 -0700 (PDT)
> >>
> >> > Version 5 of RFS:
> >> > - Moved rps_sock_flow_sysctl into net/core/sysctl_net_core.c as a
> >> > static function.
> >> > - Apply limits to rps_sock_flow_entries sysctl and rps_flow_cnt
> >> > sysfs variable.
> >>
> >> I've read this over a few times and I think it's ready to go into
> >> net-next-2.6, we can tweak things as-needed from here on out.
> >>
> >> Eric, what do you think?
> >
> > I read the patch and found no error.
> >
> > I booted a test machine and performed some tests
> >
> > I am a bit worried about a tbench regression I am looking at right now.
> >
> > With RFS disabled, tbench 16   ->  4408.63 MB/sec
> >
> >
> > # grep . /sys/class/net/lo/queues/rx-0/*
> > /sys/class/net/lo/queues/rx-0/rps_cpus:00000000
> > /sys/class/net/lo/queues/rx-0/rps_flow_cnt:8192
> > # cat /proc/sys/net/core/rps_sock_flow_entries
> > 8192
> >
> >
> > echo ffff >/sys/class/net/lo/queues/rx-0/rps_cpus
> >
> > tbench 16 -> 2336.32 MB/sec
> >
> >
> > -----------------------------------------------------------------------------------------------------------------------------------------------------
> >   PerfTop:   14561 irqs/sec  kernel:86.3% [1000Hz cycles],  (all, 16 CPUs)
> > -----------------------------------------------------------------------------------------------------------------------------------------------------
> >
> >             samples  pcnt function                       DSO
> >             _______ _____ ______________________________ __________________________________________________________
> >
> >             2664.00  5.1% copy_user_generic_string       /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >             2323.00  4.4% acpi_os_read_port              /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >             1641.00  3.1% _raw_spin_lock_irqsave         /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >             1260.00  2.4% schedule                       /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >             1159.00  2.2% _raw_spin_lock                 /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >             1051.00  2.0% tcp_ack                        /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >              991.00  1.9% tcp_sendmsg                    /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >              922.00  1.8% tcp_recvmsg                    /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >              821.00  1.6% child_run                      /usr/bin/tbench
> >              766.00  1.5% all_string_sub                 /usr/bin/tbench
> >              630.00  1.2% __switch_to                    /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >              608.00  1.2% __GI_strchr                    /lib/tls/libc-2.3.4.so
> >              606.00  1.2% ipt_do_table                   /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >              600.00  1.1% __GI_strstr                    /lib/tls/libc-2.3.4.so
> >              556.00  1.1% __netif_receive_skb            /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >              504.00  1.0% tcp_transmit_skb               /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >              502.00  1.0% tick_nohz_stop_sched_tick      /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >              481.00  0.9% _raw_spin_unlock_irqrestore    /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >              473.00  0.9% next_token                     /usr/bin/tbench
> >              449.00  0.9% ip_rcv                         /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >              423.00  0.8% call_function_single_interrupt /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >              422.00  0.8% ia32_sysenter_target           /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >              420.00  0.8% compat_sys_socketcall          /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >              401.00  0.8% mod_timer                      /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >              400.00  0.8% process_backlog                /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >              399.00  0.8% ip_queue_xmit                  /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >              387.00  0.7% select_task_rq_fair            /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >              377.00  0.7% _raw_spin_lock_bh              /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >              360.00  0.7% tcp_v4_rcv                     /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
> >
> > But if RFS is on, why does activating rps_cpus change the tbench result?
> >
> >
> >
> >
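
For readers following the numbers quoted in this message, here is a minimal
Python sketch (not part of the original mail) that decodes the rps_cpus hex
masks shown above and works out how large the reported tbench drop is. The
helper names cpus_in_mask and relative_drop are hypothetical, chosen only for
this illustration; the mask strings and throughput figures are the ones quoted
in the thread.

```python
from typing import List


def cpus_in_mask(mask_hex: str) -> List[int]:
    """Return the CPU indices whose bits are set in a hex cpumask string."""
    # sysfs prints wide masks as comma-separated 32-bit groups; drop the commas.
    value = int(mask_hex.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def relative_drop(before: float, after: float) -> float:
    """Fractional throughput loss going from `before` to `after`."""
    return (before - after) / before


# Mask read back before the change: no CPUs selected for this queue.
print(cpus_in_mask("00000000"))
# Mask written with `echo ffff`: CPUs 0 through 15, matching the 16-CPU box.
print(cpus_in_mask("ffff"))
# tbench 16 figures quoted above, in MB/sec; prints 47.0%
print(f"{relative_drop(4408.63, 2336.32):.1%}")
```

Read this way, writing ffff selects all 16 CPUs for the loopback rx queue, and
the reported throughput falls by a little under half.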



^ permalink raw reply

* Re: [PATCH v4] rfs: Receive Flow Steering
From: Paul Turner @ 2010-04-16 17:59 UTC (permalink / raw)
  To: Rick Jones
  Cc: Tom Herbert, Stephen Hemminger, davem, netdev, eric.dumazet,
	Ingo Molnar
In-Reply-To: <4BC89F6D.2080604@hp.com>

On Fri, Apr 16, 2010 at 10:33 AM, Rick Jones <rick.jones2@hp.com> wrote:
>>
>> This is true.  There is a fundamental question of whether the scheduler
>> should lead networking or vice versa.  The advantages of networking
>> following the scheduler seem to become more apparent on heavily loaded
>> systems or with threads that handle more than one flow.
>
> I will confess to being in the networking should follow the scheduler camp
> :)
>
>> I'm not sure these two models have to be mutually exclusive, we are
>> looking at some ways to make a hybrid model.
>
> It is perhaps too speculative on my part, but if the host has no control
> over the remote addressing of the connections to/from it, doesn't that
> suggest that allowing networking to lead the scheduler gives "external
> forces" more say in intra-system resource consumption than we might want
> them to have?
>
> rick jones
>

Even under a hybrid model I think phrasing it as networking leading
the scheduler here is a little strong.  The scheduler is in both cases
the most 'informed' place to make these decisions, but I think it
could benefit from more knowledge.  In the 'virgin' single flow case
without any steering the network stack is currently able to implicitly
hint to the scheduler where flows could be most efficiently served due
to wake-affine balancing behaviors.  This is a natural side-effect of
wake-ups being sourced by the networking cpus.

I think the win here would be allowing this (naturally existing)
hinting to be a little more explicit so that the scheduler and
load-balancer are able to gracefully 'collapse' back down onto the
network cpu socket under low stress conditions, even if previous
processing was balanced away from it due to load.

This would actually then look very much like today's model under loads
where you don't need scaling via parallelism.  One way to think about
making it an explicit hint could be: should the rx cpu sourcing the
wake-up in this case be the target for wake-affine as opposed to the
current bottom-half delegate?

- Paul

^ permalink raw reply

* Re: [PATCH v4] rfs: Receive Flow Steering
From: Rick Jones @ 2010-04-16 17:33 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Stephen Hemminger, davem, netdev, eric.dumazet, Ingo Molnar,
	Paul Turner
In-Reply-To: <o2k65634d661004160851wc00c609p7136a22fd07503c1@mail.gmail.com>

> 
> This is true.  There is a fundamental question of whether the scheduler
> should lead networking or vice versa.  The advantages of networking
> following the scheduler seem to become more apparent on heavily loaded
> systems or with threads that handle more than one flow.

I will confess to being in the "networking should follow the scheduler" camp :)

> I'm not sure these two models have to be mutually exclusive, we are
> looking at some ways to make a hybrid model.

It is perhaps too speculative on my part, but if the host has no control over 
the remote addressing of the connections to/from it, doesn't that suggest that 
allowing networking to lead the scheduler gives "external forces" more say in 
intra-system resource consumption than we might want them to have?

rick jones
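
The concern above can be made concrete with a small Python sketch (not part of
the original mail). It assumes a toy flow hash built on zlib.crc32 and a
multiply-and-shift mapping from hash to CPU; both are stand-ins chosen for
illustration and are not claimed to be what the kernel does. The function
names, addresses and ports are hypothetical.

```python
import zlib
from typing import List


def toy_flow_hash(saddr: str, sport: int, daddr: str, dport: int) -> int:
    """Stand-in 32-bit flow hash over the connection 4-tuple (illustration only)."""
    key = f"{saddr}:{sport}>{daddr}:{dport}".encode()
    return zlib.crc32(key) & 0xFFFFFFFF


def pick_cpu(flow_hash: int, cpus: List[int]) -> int:
    """Scale a 32-bit hash onto the CPU list without using a modulo."""
    return cpus[(flow_hash * len(cpus)) >> 32]


cpus = list(range(16))

# The local address and port stay fixed; only the remote source port varies,
# which is something the receiving host does not choose.
chosen = {
    pick_cpu(toy_flow_hash("192.0.2.7", sport, "198.51.100.1", 445), cpus)
    for sport in range(40000, 40256)
}
print(sorted(chosen))
```

In this sketch the CPU that handles a flow is a pure function of the 4-tuple,
so a remote peer that varies its source port also varies where the receive
processing lands. If the thread is then placed by following the packet, the
same external input influences scheduling as well.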

^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: Tom Herbert @ 2010-04-16 15:57 UTC (permalink / raw)
  To: hadi; +Cc: Eric Dumazet, netdev, robert, David Miller, Changli Gao,
	Andi Kleen
In-Reply-To: <1271271222.4567.51.camel@bigi>

> It would be valuable to have something like Documentation/networking/rps
> to detail things a little more.
>

Working on it.  Will try to post data for several platforms soon.

> cheers,
> jamal
>
>

^ permalink raw reply

