* RFC: NAPI packet weighting patch
@ 2005-05-26 21:36 Mitch Williams
2005-05-27 8:21 ` Robert Olsson
` (4 more replies)
0 siblings, 5 replies; 18+ messages in thread
From: Mitch Williams @ 2005-05-26 21:36 UTC (permalink / raw)
To: netdev; +Cc: john.ronciak, ganesh.venkatesan, jesse.brandeburg
The following patch (which applies to 2.6.12rc4) adds a new sysctl
parameter called 'netdev_packet_weight'. This parameter controls how many
backlog work units each RX packet is worth.
With the parameter set to 0 (the default), NAPI polling works exactly as
it does today: each packet is worth one backlog work unit, and the
maximum number of received packets that will be processed in any given
softirq is controlled by the 'netdev_max_backlog' parameter.
By setting the netdev_packet_weight to a nonzero value, we make each
packet worth more than one backlog work unit. Since it's a shift value, a
setting of 1 makes each packet worth 2 work units, a setting of 2 makes
each packet worth 4 units, etc. Under normal circumstances you would
never use a value higher than 3, though 4 might work for Gigabit and 10
Gigabit networks.
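To make the shift arithmetic concrete, here is the effective per-poll
quota for a driver using the usual weight of 64 (the same value as
weight_p in dev.c; the numbers below are purely illustrative):

    /*
     * Effective per-poll quota with dev->weight == 64, where
     * netdev_packet_weight acts as a shift count:
     *
     *   shift 0:  64 >> 0 = 64 packets per poll  (today's behavior)
     *   shift 1:  64 >> 1 = 32 packets per poll
     *   shift 2:  64 >> 2 = 16 packets per poll
     *   shift 3:  64 >> 3 =  8 packets per poll
     */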
By increasing the packet weight, we accomplish two things: first, we
cause the individual NAPI RX loops in each driver to process fewer
packets. This means that they will free up RX resources to the hardware
more often, which reduces the possibility of dropped packets. Second, it
shortens the total time spent in the NAPI softirq, which can free the CPU
to handle other tasks more often, thus reducing overall latency.
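In net_rx_action() terms, the accounting looks like this (a sketch of
what the patch below implements; budget_temp counts the packets one
poll call consumes):

    int budget = netdev_max_backlog;      /* e.g. the default of 300 */
    int budget_temp = budget;

    /* The driver's poll routine decrements budget_temp once per
     * packet; each packet is then charged against the overall softirq
     * budget at the shifted rate. */
    dev->poll(dev, &budget_temp);
    budget -= (budget - budget_temp) << netdev_packet_weight;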
Performance tests in our lab have shown that tweaking this parameter,
along with the netdev_max_backlog parameter, can provide a significant
performance increase -- greater than 100 Mbps -- over the default
settings. I tested with both e1000 and tg3 drivers and saw improvement in
both cases. I did not see higher CPU utilization, even with the increased
throughput.
The caveat, of course, is that different systems and network
configurations require different settings. On the other hand, that's
really no different than what we see with the max_backlog parameter today.
On some systems neither parameter makes any difference.
Still, we feel that there is value to having this in the kernel. Please
test and comment as you have time available.
Thanks!
-Mitch Williams
mitch.a.williams@intel.com
diff -urpN -x dontdiff rc4-clean/Documentation/filesystems/proc.txt linux-2.6.12-rc4/Documentation/filesystems/proc.txt
--- rc4-clean/Documentation/filesystems/proc.txt 2005-05-18 16:35:43.000000000 -0700
+++ linux-2.6.12-rc4/Documentation/filesystems/proc.txt 2005-05-19 11:16:10.000000000 -0700
@@ -1378,7 +1378,13 @@ netdev_max_backlog
------------------
Maximum number of packets, queued on the INPUT side, when the interface
-receives packets faster than kernel can process them.
+receives packets faster than kernel can process them. This is also the
+maximum number of packets handled in a single softirq under NAPI.
+
+netdev_packet_weight
+--------------------
+The value, in netdev_max_backlog units, of each received packet. This is a
+shift value, and should be set no higher than 3.
optmem_max
----------
diff -urpN -x dontdiff rc4-clean/include/linux/sysctl.h linux-2.6.12-rc4/include/linux/sysctl.h
--- rc4-clean/include/linux/sysctl.h 2005-05-18 16:36:06.000000000 -0700
+++ linux-2.6.12-rc4/include/linux/sysctl.h 2005-05-18 16:44:07.000000000 -0700
@@ -242,6 +242,7 @@ enum
NET_CORE_MOD_CONG=16,
NET_CORE_DEV_WEIGHT=17,
NET_CORE_SOMAXCONN=18,
+ NET_CORE_PACKET_WEIGHT=19,
};
/* /proc/sys/net/ethernet */
diff -urpN -x dontdiff rc4-clean/net/core/dev.c linux-2.6.12-rc4/net/core/dev.c
--- rc4-clean/net/core/dev.c 2005-05-18 16:36:07.000000000 -0700
+++ linux-2.6.12-rc4/net/core/dev.c 2005-05-19 11:16:57.000000000 -0700
@@ -1352,6 +1352,7 @@ out:
=======================================================================*/
int netdev_max_backlog = 300;
+int netdev_packet_weight = 0; /* each packet is worth 1 backlog unit */
int weight_p = 64; /* old backlog weight */
/* These numbers are selected based on intuition and some
* experimentatiom, if you have more scientific way of doing this
@@ -1778,6 +1779,7 @@ static void net_rx_action(struct softirq
struct softnet_data *queue = &__get_cpu_var(softnet_data);
unsigned long start_time = jiffies;
int budget = netdev_max_backlog;
+ int budget_temp;
local_irq_disable();
@@ -1793,21 +1795,22 @@ static void net_rx_action(struct softirq
dev = list_entry(queue->poll_list.next,
struct net_device, poll_list);
netpoll_poll_lock(dev);
-
- if (dev->quota <= 0 || dev->poll(dev, &budget)) {
+ budget_temp = budget;
+ if (dev->quota <= 0 || dev->poll(dev, &budget_temp)) {
netpoll_poll_unlock(dev);
local_irq_disable();
list_del(&dev->poll_list);
list_add_tail(&dev->poll_list, &queue->poll_list);
if (dev->quota < 0)
- dev->quota += dev->weight;
+ dev->quota += dev->weight >> netdev_packet_weight;
else
- dev->quota = dev->weight;
+ dev->quota = dev->weight >> netdev_packet_weight;
} else {
netpoll_poll_unlock(dev);
dev_put(dev);
local_irq_disable();
}
+ budget -= (budget - budget_temp) << netdev_packet_weight;
}
out:
local_irq_enable();
diff -urpN -x dontdiff rc4-clean/net/core/sysctl_net_core.c linux-2.6.12-rc4/net/core/sysctl_net_core.c
--- rc4-clean/net/core/sysctl_net_core.c 2005-03-01 23:38:03.000000000 -0800
+++ linux-2.6.12-rc4/net/core/sysctl_net_core.c 2005-05-18 16:44:09.000000000 -0700
@@ -13,6 +13,7 @@
#ifdef CONFIG_SYSCTL
extern int netdev_max_backlog;
+extern int netdev_packet_weight;
extern int weight_p;
extern int no_cong_thresh;
extern int no_cong;
@@ -91,6 +92,14 @@ ctl_table core_table[] = {
.proc_handler = &proc_dointvec
},
{
+ .ctl_name = NET_CORE_PACKET_WEIGHT,
+ .procname = "netdev_packet_weight",
+ .data = &netdev_packet_weight,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec
+ },
+ {
.ctl_name = NET_CORE_MAX_BACKLOG,
.procname = "netdev_max_backlog",
.data = &netdev_max_backlog,
* RFC: NAPI packet weighting patch
2005-05-26 21:36 RFC: NAPI packet weighting patch Mitch Williams
@ 2005-05-27 8:21 ` Robert Olsson
2005-05-27 11:18 ` jamal
` (3 subsequent siblings)
4 siblings, 0 replies; 18+ messages in thread
From: Robert Olsson @ 2005-05-27 8:21 UTC (permalink / raw)
To: Mitch Williams; +Cc: netdev, john.ronciak, ganesh.venkatesan, jesse.brandeburg
Hello!
Some comments below.
Mitch Williams writes:
> With the parameter set to 0 (the default), NAPI polling works exactly as
> it does today: each packet is worth one backlog work unit, and the
> maximum number of received packets that will be processed in any given
> softirq is controlled by the 'netdev_max_backlog' parameter.
You should be able to accomplish this on a per-device basis with dev->weight.
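For comparison, that per-device knob is a compile-time assignment in each
driver's init path today (the value below is illustrative):

    /* e.g. in a driver's probe/init routine: */
    netdev->weight = 16;    /* handle at most 16 RX packets per poll */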
> By increasing the packet weight, we accomplish two things: first, we
> cause the individual NAPI RX loops in each driver to process fewer
> packets. This means that they will free up RX resources to the hardware
> more often, which reduces the possibility of dropped packets.
It's a kind of interesting area, and complex, as weight setting should
consider coalescing etc. as we try to find an acceptable balance of
interrupts, polls, and packets per poll. Again, to me this indicates that
this should be done at the driver level.
Do you have more details about the cases you were able to improve, and what
your thinking was here? It's a kind of unresearched area.
> Second, it shortens the total time spent in the NAPI softirq, which can
> free the CPU to handle other tasks more often, thus reducing overall latency.
At high packet load from several devices we still only break the RX
softirq when the total budget is exhausted or a jiffy has passed.
Generally the RX softirq is very well-behaved due to this.
Cheers.
--ro
* Re: RFC: NAPI packet weighting patch
2005-05-26 21:36 RFC: NAPI packet weighting patch Mitch Williams
2005-05-27 8:21 ` Robert Olsson
@ 2005-05-27 11:18 ` jamal
2005-05-27 15:50 ` Stephen Hemminger
` (2 subsequent siblings)
4 siblings, 0 replies; 18+ messages in thread
From: jamal @ 2005-05-27 11:18 UTC (permalink / raw)
To: Mitch Williams; +Cc: netdev, john.ronciak, ganesh.venkatesan, jesse.brandeburg
On Thu, 2005-26-05 at 14:36 -0700, Mitch Williams wrote:
> The following patch (which applies to 2.6.12rc4) adds a new sysctl
> parameter called 'netdev_packet_weight'. This parameter controls how many
> backlog work units each RX packet is worth.
>
> With the parameter set to 0 (the default), NAPI polling works exactly as
> it does today: each packet is worth one backlog work unit, and the
> maximum number of received packets that will be processed in any given
> softirq is controlled by the 'netdev_max_backlog' parameter.
>
NAPI is already using a Weighted Round Robin scheduling scheme known as
DRR.
I am not sure that providing a scale factor on the weight is enhancing
anything.
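For reference, the DRR deficit accounting in net_rx_action() already
looks like this before your patch (quoting the code it touches):

    if (dev->quota < 0)
            dev->quota += dev->weight;    /* carry the deficit forward */
    else
            dev->quota = dev->weight;     /* start a fresh quantum */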
Did you try just reducing the weight instead, to make it smaller? I.e.,
take the resultant weight from using a shift and set that as the weight.
cheers,
jamal
* Re: RFC: NAPI packet weighting patch
2005-05-26 21:36 RFC: NAPI packet weighting patch Mitch Williams
2005-05-27 8:21 ` Robert Olsson
2005-05-27 11:18 ` jamal
@ 2005-05-27 15:50 ` Stephen Hemminger
2005-05-27 20:27 ` Mitch Williams
2005-06-02 18:14 ` [PATCH] net: allow controlling NAPI weight with sysfs Stephen Hemminger
2005-06-02 18:19 ` [PATCH] net: fix sysctl_ Stephen Hemminger
4 siblings, 1 reply; 18+ messages in thread
From: Stephen Hemminger @ 2005-05-27 15:50 UTC (permalink / raw)
To: Mitch Williams; +Cc: netdev, john.ronciak, ganesh.venkatesan, jesse.brandeburg
On Thu, 26 May 2005 14:36:22 -0700
Mitch Williams <mitch.a.williams@intel.com> wrote:
> The following patch (which applies to 2.6.12rc4) adds a new sysctl
> parameter called 'netdev_packet_weight'. This parameter controls how many
> backlog work units each RX packet is worth.
>
> With the parameter set to 0 (the default), NAPI polling works exactly as
> it does today: each packet is worth one backlog work unit, and the
> maximum number of received packets that will be processed in any given
> softirq is controlled by the 'netdev_max_backlog' parameter.
>
> By setting the netdev_packet_weight to a nonzero value, we make each
> packet worth more than one backlog work unit. Since it's a shift value, a
> setting of 1 makes each packet worth 2 work units, a setting of 2 makes
> each packet worth 4 units, etc. Under normal circumstances you would
> never use a value higher than 3, though 4 might work for Gigabit and 10
> Gigabit networks.
>
> By increasing the packet weight, we accomplish two things: first, we
> cause the individual NAPI RX loops in each driver to process fewer
> packets. This means that they will free up RX resources to the hardware
> more often, which reduces the possibility of dropped packets. Second, it
> shortens the total time spent in the NAPI softirq, which can free the CPU
> to handle other tasks more often, thus reducing overall latency.
Rather than weighting each packet differently, why not just reduce the
upper bound on the number of packets? There are several patches in my
2.6.12-rc5-tcp3 tree that make this easier:
----
http://developer.osdl.org/shemminger/patches/2.6.12-rc5-tcp3/patches/bigger-backlog.patch
Separate out the two uses of netdev_max_backlog. One controls the upper
bound on packets processed per softirq; the new name for this is
netdev_max_weight. The other controls the limit on packets queued via
netif_rx.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Index: linux-2.6.12-rc4-tcp2/net/core/sysctl_net_core.c
===================================================================
--- linux-2.6.12-rc4-tcp2.orig/net/core/sysctl_net_core.c
+++ linux-2.6.12-rc4-tcp2/net/core/sysctl_net_core.c
@@ -13,6 +13,7 @@
#ifdef CONFIG_SYSCTL
extern int netdev_max_backlog;
+extern int netdev_max_weight;
extern int weight_p;
extern int net_msg_cost;
extern int net_msg_burst;
@@ -137,6 +138,14 @@ ctl_table core_table[] = {
.mode = 0644,
.proc_handler = &proc_dointvec
},
+ {
+ .ctl_name = NET_CORE_MAX_WEIGHT,
+ .procname = "netdev_max_weight",
+ .data = &netdev_max_weight,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec
+ },
{ .ctl_name = 0 }
};
Index: linux-2.6.12-rc4-tcp2/net/core/dev.c
===================================================================
--- linux-2.6.12-rc4-tcp2.orig/net/core/dev.c
+++ linux-2.6.12-rc4-tcp2/net/core/dev.c
@@ -1334,7 +1334,8 @@ out:
Receiver routines
=======================================================================*/
-int netdev_max_backlog = 300;
+int netdev_max_backlog = 10000;
+int netdev_max_weight = 500;
int weight_p = 64; /* old backlog weight */
DEFINE_PER_CPU(struct netif_rx_stats, netdev_rx_stat) = { 0, };
@@ -1682,8 +1683,7 @@ static void net_rx_action(struct softirq
{
struct softnet_data *queue = &__get_cpu_var(softnet_data);
unsigned long start_time = jiffies;
- int budget = netdev_max_backlog;
-
+ int budget = netdev_max_weight;
local_irq_disable();
Index: linux-2.6.12-rc4-tcp2/include/linux/sysctl.h
===================================================================
--- linux-2.6.12-rc4-tcp2.orig/include/linux/sysctl.h
+++ linux-2.6.12-rc4-tcp2/include/linux/sysctl.h
@@ -242,6 +242,7 @@ enum
NET_CORE_MOD_CONG=16,
NET_CORE_DEV_WEIGHT=17,
NET_CORE_SOMAXCONN=18,
+ NET_CORE_MAX_WEIGHT=19,
};
/* /proc/sys/net/ethernet */
----
http://developer.osdl.org/shemminger/patches/2.6.12-rc5-tcp3/patches/fix-weightp.patch
Changing the dev_weight sysctl parameter has no effect because the weight
of the backlog devices is set during initialization and never changed.
Fix this by propagating changes.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Index: linux-2.6.12-rc4-tcp2/net/core/dev.c
===================================================================
--- linux-2.6.12-rc4-tcp2.orig/net/core/dev.c
+++ linux-2.6.12-rc4-tcp2/net/core/dev.c
@@ -1636,6 +1636,7 @@ static int process_backlog(struct net_de
struct softnet_data *queue = &__get_cpu_var(softnet_data);
unsigned long start_time = jiffies;
+ backlog_dev->weight = weight_p;
for (;;) {
struct sk_buff *skb;
struct net_device *dev;
* Re: RFC: NAPI packet weighting patch
2005-05-27 15:50 ` Stephen Hemminger
@ 2005-05-27 20:27 ` Mitch Williams
2005-05-27 21:01 ` Stephen Hemminger
0 siblings, 1 reply; 18+ messages in thread
From: Mitch Williams @ 2005-05-27 20:27 UTC (permalink / raw)
To: netdev, Stephen Hemminger, hadi, Robert.Olsson
Cc: Ronciak, John, Venkatesan, Ganesh, Brandeburg, Jesse
Stephen, Robert, and Jamal all replied to my original message, and all
said approximately the same thing: "Why don't you just reduce the weight
in the driver? It does the same thing."
To which I reply, respectfully, I know that. And no it doesn't, not
exactly.
My primary reason for adding this setting is to allow for runtime tweaking
-- just like max_backlog has right now. Driver weight is a compile-time
setting, and has to be changed for every driver that you run.
This setting allows you to scale the weight of all your drivers, at
runtime, in one place. It's complementary to Stephen's max_weight idea --
his patch affects how long you spend in any individual softirq; my patch
affects how long you spend in any driver's individual NAPI poll routine,
as well as how long the softirq lasts.
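For concreteness, the runtime tweak is just a write to procfs; a minimal
userspace sketch, assuming the sysctl path from my patch (the value 2,
i.e. 4 work units per packet, is only an example):

    #include <stdio.h>

    int main(void)
    {
            /* Equivalent to: sysctl -w net.core.netdev_packet_weight=2 */
            FILE *f = fopen("/proc/sys/net/core/netdev_packet_weight", "w");

            if (!f) {
                    perror("netdev_packet_weight");
                    return 1;
            }
            fprintf(f, "2\n");
            return fclose(f) ? 1 : 0;
    }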
Perhaps we can merge the two patches to come up with some Ultimate
Tweakable Network Goodness. I'd be happy to do that (on Tuesday; I'm
heading home early today) if anybody's interested.
-Mitch
NB:
I've got a white paper that I wrote up for internal consumption. I plan
to post it to our Sourceforge archive, but I need to do a bunch of
scrubbing first, lest our lawyers go into convulsions.
Meanwhile, I can give away the ending: for my performance tests, on a
pure Gigabit network, I saw consistently better performance by a) using my
patch, b) reducing max_backlog, and c) using a nonzero value for
packet_weight.
* Re: RFC: NAPI packet weighting patch
2005-05-27 20:27 ` Mitch Williams
@ 2005-05-27 21:01 ` Stephen Hemminger
2005-05-28 0:56 ` jamal
0 siblings, 1 reply; 18+ messages in thread
From: Stephen Hemminger @ 2005-05-27 21:01 UTC (permalink / raw)
To: Mitch Williams
Cc: netdev, hadi, Robert.Olsson, Ronciak, John, Venkatesan, Ganesh,
Brandeburg, Jesse
On Fri, 27 May 2005 13:27:04 -0700
Mitch Williams <mitch.a.williams@intel.com> wrote:
>
> Stephen, Robert, and Jamal all replied to my original message, and all
> said approximately the same thing: "Why don't you just reduce the weight
> in the driver? It does the same thing."
>
> To which I reply, respectfully, I know that. And no it doesn't, not
> exactly.
>
> My primary reason for adding this setting is to allow for runtime tweaking
> -- just like max_backlog has right now. Driver weight is a compile-time
> setting, and has to be changed for every driver that you run.
>
> This setting allows you to scale the weight of all your drivers, at
> runtime, in one place. It's complementary to Stephen's max_weight idea --
> his patch affects how long you spend in any individual softirq; my patch
> affects how long you spend in any driver's individual NAPI poll routine,
> as well as how long the softirq lasts.
>
Why not just allow adjusting dev->weight via sysfs?
* Re: RFC: NAPI packet weighting patch
2005-05-27 21:01 ` Stephen Hemminger
@ 2005-05-28 0:56 ` jamal
2005-05-31 17:35 ` Mitch Williams
0 siblings, 1 reply; 18+ messages in thread
From: jamal @ 2005-05-28 0:56 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Mitch Williams, netdev, Robert.Olsson, Ronciak, John,
Venkatesan, Ganesh, Brandeburg, Jesse
On Fri, 2005-27-05 at 14:01 -0700, Stephen Hemminger wrote:
>
> Why not just allow adjusting dev->weight via sysfs?
I think that should be good enough - and I thought your patch already
did this.
Adding a shift to the weight in a _weighted_ RR algorithm does sound
odd.
cheers,
jamal
* Re: RFC: NAPI packet weighting patch
2005-05-28 0:56 ` jamal
@ 2005-05-31 17:35 ` Mitch Williams
2005-05-31 17:40 ` Stephen Hemminger
2005-05-31 22:07 ` Jon Mason
0 siblings, 2 replies; 18+ messages in thread
From: Mitch Williams @ 2005-05-31 17:35 UTC (permalink / raw)
To: jamal
Cc: Stephen Hemminger, Williams, Mitch A, netdev, Robert.Olsson,
Ronciak, John, Venkatesan, Ganesh, Brandeburg, Jesse
On Fri, 27 May 2005, jamal wrote:
>
> On Fri, 2005-27-05 at 14:01 -0700, Stephen Hemminger wrote:
>
> >
> > Why not just allow adjusting dev->weight via sysfs?
>
> I think that should be good enough - and I thought your patch already
> did this.
> Adding a shift to the weight in a _weighted_ RR algorithm does sound
> odd.
>
Stephen's patch only affects the weight for the backlog device. Exporting
dev->weight to sysfs will allow the weight to be set for any network
device, which makes perfect sense.
I'll work on getting this done and verifying performance this week.
Expect a patch in a few days.
Thanks, guys.
-Mitch
* Re: RFC: NAPI packet weighting patch
2005-05-31 17:35 ` Mitch Williams
@ 2005-05-31 17:40 ` Stephen Hemminger
2005-05-31 17:43 ` Mitch Williams
2005-05-31 22:07 ` Jon Mason
1 sibling, 1 reply; 18+ messages in thread
From: Stephen Hemminger @ 2005-05-31 17:40 UTC (permalink / raw)
To: Mitch Williams
Cc: jamal, Williams, Mitch A, netdev, Robert.Olsson, Ronciak, John,
Venkatesan, Ganesh, Brandeburg, Jesse
Like this (untested) patch:
Index: napi-sysfs/net/core/net-sysfs.c
===================================================================
--- napi-sysfs.orig/net/core/net-sysfs.c
+++ napi-sysfs/net/core/net-sysfs.c
@@ -184,6 +184,22 @@ static ssize_t store_tx_queue_len(struct
static CLASS_DEVICE_ATTR(tx_queue_len, S_IRUGO | S_IWUSR, show_tx_queue_len,
store_tx_queue_len);
+NETDEVICE_SHOW(weight, fmt_ulong);
+
+static int change_weight(struct net_device *net, unsigned long new_weight)
+{
+ net->weight = new_weight;
+ return 0;
+}
+
+static ssize_t store_weight(struct class_device *dev, const char *buf, size_t len)
+{
+ return netdev_store(dev, buf, len, change_weight);
+}
+
+static CLASS_DEVICE_ATTR(weight, S_IRUGO | S_IWUSR, show_weight,
+ store_weight);
+
static struct class_device_attribute *net_class_attributes[] = {
&class_device_attr_ifindex,
@@ -193,6 +209,7 @@ static struct class_device_attribute *ne
&class_device_attr_features,
&class_device_attr_mtu,
&class_device_attr_flags,
+ &class_device_attr_weight,
&class_device_attr_type,
&class_device_attr_address,
&class_device_attr_broadcast,
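If it works, usage would presumably be a plain write to the new
attribute (path inferred from the attribute name; untested, like the
patch itself):

    /* Presumed sysfs usage once the 'weight' attribute exists: */
    FILE *f = fopen("/sys/class/net/eth0/weight", "w");

    if (f) {
            fprintf(f, "16\n");    /* lower eth0's per-poll budget */
            fclose(f);
    }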
* Re: RFC: NAPI packet weighting patch
2005-05-31 17:40 ` Stephen Hemminger
@ 2005-05-31 17:43 ` Mitch Williams
0 siblings, 0 replies; 18+ messages in thread
From: Mitch Williams @ 2005-05-31 17:43 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Williams, Mitch A, jamal, netdev, Robert.Olsson, Ronciak, John,
Venkatesan, Ganesh, Brandeburg, Jesse
On Tue, 31 May 2005, Stephen Hemminger wrote:
>
> Like this (untested) patch:
>
Gosh, you're making my life too easy. Thanks!
I'll apply this, give it a spin, and let you know what I see.
-Mitch
* Re: RFC: NAPI packet weighting patch
2005-05-31 17:35 ` Mitch Williams
2005-05-31 17:40 ` Stephen Hemminger
@ 2005-05-31 22:07 ` Jon Mason
2005-05-31 22:14 ` David S. Miller
1 sibling, 1 reply; 18+ messages in thread
From: Jon Mason @ 2005-05-31 22:07 UTC (permalink / raw)
To: Mitch Williams
Cc: jamal, Stephen Hemminger, netdev, Robert.Olsson, Ronciak, John,
Venkatesan, Ganesh, Brandeburg, Jesse
On Tuesday 31 May 2005 12:35 pm, Mitch Williams wrote:
> On Fri, 27 May 2005, jamal wrote:
> > On Fri, 2005-27-05 at 14:01 -0700, Stephen Hemminger wrote:
> > > Why not just allow adjusting dev->weight via sysfs?
> >
> > I think that should be good enough - and I thought your patch already
> > did this.
> > Adding a shift to the weight in a _weighted_ RR algorithm does sound
> > odd.
>
> Stephen's patch only affects the weight for the backlog device. Exporting
> dev->weight to sysfs will allow the weight to be set for any network
> device, which makes perfect sense.
>
> I'll work on getting this done and verifying performance this week.
> Expect a patch in a few days.
>
> Thanks, guys.
>
> -Mitch
It seems to me that the drivers should adjust dev->weight depending on the
media speed/duplex of the current link. A 10 Mbps link will be constantly
re-enabling interrupts, as the incoming traffic is too slow. Why not have
it be 1/4 the weight of the gigabit NAPI weight, and set it when the link
speed is detected (or forced)?
Of course some performance analysis would have to be done to determine the
optimal numbers for each speed/duplex setting per driver; a rough sketch of
the idea is below.
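Something like this hypothetical hook, called from the driver's
link-state handling (function name and values are illustrative, not from
any real driver):

    static void scale_napi_weight(struct net_device *netdev, int speed_mbps)
    {
            if (speed_mbps >= 1000)
                    netdev->weight = 64;
            else if (speed_mbps >= 100)
                    netdev->weight = 32;
            else
                    netdev->weight = 16;    /* ~1/4 of the gigabit weight */
    }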
Thanks,
Jon
* Re: RFC: NAPI packet weighting patch
2005-05-31 22:07 ` Jon Mason
@ 2005-05-31 22:14 ` David S. Miller
2005-05-31 23:28 ` Jon Mason
0 siblings, 1 reply; 18+ messages in thread
From: David S. Miller @ 2005-05-31 22:14 UTC (permalink / raw)
To: jdmason
Cc: mitch.a.williams, hadi, shemminger, netdev, Robert.Olsson,
john.ronciak, ganesh.venkatesan, jesse.brandeburg
From: Jon Mason <jdmason@us.ibm.com>
Date: Tue, 31 May 2005 17:07:54 -0500
> Of course some performance analysis would have to be done to determine the
> optimal numbers for each speed/duplex setting per driver.
per cpu speed, per memory bus speed, per I/O bus speed, and add in other
complications such as NUMA
My point is that whatever experimental number you come up with will be
good for that driver on your systems, not necessarily for others.
Even within a system, whatever number you select will be the wrong
thing to use if one starts a continuous I/O stream to the SATA
controller in the next PCI slot, for example.
We keep getting bitten by this, as the Altix perf data continually shows,
and we need to absolutely stop thinking this way.
The way to go is to make selections based upon observed events and
measurements.
* Re: RFC: NAPI packet weighting patch
2005-05-31 22:14 ` David S. Miller
@ 2005-05-31 23:28 ` Jon Mason
2005-06-02 12:26 ` jamal
0 siblings, 1 reply; 18+ messages in thread
From: Jon Mason @ 2005-05-31 23:28 UTC (permalink / raw)
To: David S. Miller
Cc: mitch.a.williams, hadi, shemminger, netdev, Robert.Olsson,
john.ronciak, ganesh.venkatesan, jesse.brandeburg
On Tuesday 31 May 2005 05:14 pm, David S. Miller wrote:
> From: Jon Mason <jdmason@us.ibm.com>
> Date: Tue, 31 May 2005 17:07:54 -0500
>
> > Of course some performance analysis would have to be done to determine the
> > optimal numbers for each speed/duplex setting per driver.
>
> per cpu speed, per memory bus speed, per I/O bus speed, and add in other
> complications such as NUMA
>
> My point is that whatever experimental number you come up with will be
> good for that driver on your systems, not necessarily for others.
>
> Even within a system, whatever number you select will be the wrong
> thing to use if one starts a continuous I/O stream to the SATA
> controller in the next PCI slot, for example.
>
> We keep getting bitten by this, as the Altix perf data continually shows,
> and we need to absolutely stop thinking this way.
>
> The way to go is to make selections based upon observed events and
> measurements.
I'm not arguing against a /proc entry to tune dev->weight for those sysadmins
advanced enough to do that. I am arguing that we can make the driver smarter
(at little/no cost) for "out of the box" users.
* Re: RFC: NAPI packet weighting patch
2005-05-31 23:28 ` Jon Mason
@ 2005-06-02 12:26 ` jamal
2005-06-02 17:30 ` Stephen Hemminger
0 siblings, 1 reply; 18+ messages in thread
From: jamal @ 2005-06-02 12:26 UTC (permalink / raw)
To: Jon Mason
Cc: David S. Miller, mitch.a.williams, shemminger, netdev,
Robert.Olsson, john.ronciak, ganesh.venkatesan, jesse.brandeburg
On Tue, 2005-31-05 at 18:28 -0500, Jon Mason wrote:
> On Tuesday 31 May 2005 05:14 pm, David S. Miller wrote:
> > From: Jon Mason <jdmason@us.ibm.com>
> > Date: Tue, 31 May 2005 17:07:54 -0500
> >
> > > Of course some performance analysis would have to be done to determine the
> > > optimal numbers for each speed/duplex setting per driver.
> >
> > per cpu speed, per memory bus speed, per I/O bus speed, and add in other
> > complications such as NUMA
> >
> > My point is that whatever experimental number you come up with will be
> > good for that driver on your systems, not necessarily for others.
> >
> > Even within a system, whatever number you select will be the wrong
> > thing to use if one starts a continuous I/O stream to the SATA
> > controller in the next PCI slot, for example.
> >
> > We keep getting bitten by this, as the Altix perf data continually shows,
> > and we need to absolutely stop thinking this way.
> >
> > The way to go is to make selections based upon observed events and
> > measurements.
>
> I'm not arguing against a /proc entry to tune dev->weight for those sysadmins
> advanced enough to do that. I am arguing that we can make the driver smarter
> (at little/no cost) for "out of the box" users.
>
What is the point of making the driver "smarter"?
Recall, the algorithm used to schedule the netdevices is based on an
extension of Weighted Round Robin from Varghese et al. known as DRR (ask
Google for details).
The idea is to provide fairness amongst many drivers. As an example, if
you have a gige driver, it shouldn't be taking all the resources at the
expense of starving the fastether driver.
If the admin wants one driver to be more "important" than the other,
s/he will make sure it has a higher weight.
cheers,
jamal
* Re: RFC: NAPI packet weighting patch
2005-06-02 12:26 ` jamal
@ 2005-06-02 17:30 ` Stephen Hemminger
0 siblings, 0 replies; 18+ messages in thread
From: Stephen Hemminger @ 2005-06-02 17:30 UTC (permalink / raw)
To: hadi
Cc: Jon Mason, David S. Miller, mitch.a.williams, netdev,
Robert.Olsson, john.ronciak, ganesh.venkatesan, jesse.brandeburg
On Thu, 02 Jun 2005 08:26:46 -0400
jamal <hadi@cyberus.ca> wrote:
> On Tue, 2005-31-05 at 18:28 -0500, Jon Mason wrote:
> > On Tuesday 31 May 2005 05:14 pm, David S. Miller wrote:
> > > From: Jon Mason <jdmason@us.ibm.com>
> > > Date: Tue, 31 May 2005 17:07:54 -0500
> > >
> > > > Of course some performance analysis would have to be done to determine the
> > > > optimal numbers for each speed/duplex setting per driver.
> > >
> > > per cpu speed, per memory bus speed, per I/O bus speed, and add in other
> > > complications such as NUMA
> > >
> > > My point is that whatever experimental number you come up with will be
> > > good for that driver on your systems, not necessarily for others.
> > >
> > > Even within a system, whatever number you select will be the wrong
> > > thing to use if one starts a continuous I/O stream to the SATA
> > > controller in the next PCI slot, for example.
> > >
> > > We keep getting bitten by this, as the Altix perf data continually shows,
> > > and we need to absolutely stop thinking this way.
> > >
> > > The way to go is to make selections based upon observed events and
> > > measurements.
> >
> > I'm not arguing against a /proc entry to tune dev->weight for those sysadmins
> > advanced enough to do that. I am arguing that we can make the driver smarter
> > (at little/no cost) for "out of the box" users.
> >
>
> What is the point of making the driver "smarter"?
> Recall, the algorithm used to schedule the netdevices is based on an
> extension of Weighted Round Robin from Varghese et al. known as DRR (ask
> Google for details).
> The idea is to provide fairness amongst many drivers. As an example, if
> you have a gige driver, it shouldn't be taking all the resources at the
> expense of starving the fastether driver.
> If the admin wants one driver to be more "important" than the other,
> s/he will make sure it has a higher weight.
>
In fact, the default weighting should be based on the amount of CPU time
expended per frame rather than link speed. The point is that a more
"heavy-weight" driver shouldn't starve out all the others.
* [PATCH] net: allow controlling NAPI weight with sysfs
2005-05-26 21:36 RFC: NAPI packet weighting patch Mitch Williams
` (2 preceding siblings ...)
2005-05-27 15:50 ` Stephen Hemminger
@ 2005-06-02 18:14 ` Stephen Hemminger
2005-06-08 21:24 ` David S. Miller
2005-06-02 18:19 ` [PATCH] net: fix sysctl_ Stephen Hemminger
4 siblings, 1 reply; 18+ messages in thread
From: Stephen Hemminger @ 2005-06-02 18:14 UTC (permalink / raw)
To: David S. Miller
Cc: Mitch Williams, netdev, john.ronciak, ganesh.venkatesan,
jesse.brandeburg
Simple interface to allow changing network device scheduling weight
with sysfs. Please consider this for 2.6.12, since risk/impact is small.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Index: napi-sysfs/net/core/net-sysfs.c
===================================================================
--- napi-sysfs.orig/net/core/net-sysfs.c
+++ napi-sysfs/net/core/net-sysfs.c
@@ -184,6 +184,22 @@ static ssize_t store_tx_queue_len(struct
static CLASS_DEVICE_ATTR(tx_queue_len, S_IRUGO | S_IWUSR, show_tx_queue_len,
store_tx_queue_len);
+NETDEVICE_SHOW(weight, fmt_ulong);
+
+static int change_weight(struct net_device *net, unsigned long new_weight)
+{
+ net->weight = new_weight;
+ return 0;
+}
+
+static ssize_t store_weight(struct class_device *dev, const char *buf, size_t len)
+{
+ return netdev_store(dev, buf, len, change_weight);
+}
+
+static CLASS_DEVICE_ATTR(weight, S_IRUGO | S_IWUSR, show_weight,
+ store_weight);
+
static struct class_device_attribute *net_class_attributes[] = {
&class_device_attr_ifindex,
@@ -193,6 +209,7 @@ static struct class_device_attribute *ne
&class_device_attr_features,
&class_device_attr_mtu,
&class_device_attr_flags,
+ &class_device_attr_weight,
&class_device_attr_type,
&class_device_attr_address,
&class_device_attr_broadcast,
* [PATCH] net: fix sysctl_
2005-05-26 21:36 RFC: NAPI packet weighting patch Mitch Williams
` (3 preceding siblings ...)
2005-06-02 18:14 ` [PATCH] net: allow controlling NAPI weight with sysfs Stephen Hemminger
@ 2005-06-02 18:19 ` Stephen Hemminger
4 siblings, 0 replies; 18+ messages in thread
From: Stephen Hemminger @ 2005-06-02 18:19 UTC (permalink / raw)
To: David S. Miller; +Cc: Mitch Williams, netdev
Changing the sysctl net.core.dev_weight has no effect because the weight
of the backlog devices is set during initialization and never changed.
This patch propagates any changes to the global value affected by sysctl
to the per-cpu devices. It is done every time the packet handler
function is run.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Index: skge-0.8/net/core/dev.c
===================================================================
--- skge-0.8.orig/net/core/dev.c
+++ skge-0.8/net/core/dev.c
@@ -1732,6 +1732,7 @@ static int process_backlog(struct net_de
struct softnet_data *queue = &__get_cpu_var(softnet_data);
unsigned long start_time = jiffies;
+ backlog_dev->weight = weight_p;
for (;;) {
struct sk_buff *skb;
struct net_device *dev;
Thread overview: 18+ messages
2005-05-26 21:36 RFC: NAPI packet weighting patch Mitch Williams
2005-05-27 8:21 ` Robert Olsson
2005-05-27 11:18 ` jamal
2005-05-27 15:50 ` Stephen Hemminger
2005-05-27 20:27 ` Mitch Williams
2005-05-27 21:01 ` Stephen Hemminger
2005-05-28 0:56 ` jamal
2005-05-31 17:35 ` Mitch Williams
2005-05-31 17:40 ` Stephen Hemminger
2005-05-31 17:43 ` Mitch Williams
2005-05-31 22:07 ` Jon Mason
2005-05-31 22:14 ` David S. Miller
2005-05-31 23:28 ` Jon Mason
2005-06-02 12:26 ` jamal
2005-06-02 17:30 ` Stephen Hemminger
2005-06-02 18:14 ` [PATCH] net: allow controlling NAPI weight with sysfs Stephen Hemminger
2005-06-08 21:24 ` David S. Miller
2005-06-02 18:19 ` [PATCH] net: fix sysctl_ Stephen Hemminger