* RE: RFC: NAPI packet weighting patch
@ 2005-06-03 18:19 Ronciak, John
  2005-06-03 18:33 ` Ben Greear
  2005-06-03 20:17 ` Robert Olsson
  0 siblings, 2 replies; 121+ messages in thread
From: Ronciak, John @ 2005-06-03 18:19 UTC (permalink / raw)
  To: Robert Olsson
  Cc: David S. Miller, jdmason, shemminger, hadi, Williams, Mitch A,
	netdev, Venkatesan, Ganesh, Brandeburg, Jesse

>  It's not obvious that weight is to blame for frames dropped. I would 
>  look into RX ring size in relation to HW mitigation.
>  And of course if your system is very loaded the RX softirq gives room
>  for other jobs and frames get dropped.
> 
With the same system (fairly high end, with nothing major running on it)
we got rid of the dropped frames just by reducing the weight from 64.  So
the weight did have something to do with the dropped frames.  Other
factors may be involved as well, but in static tests like this it sure
looks like the default value of 64 is wrong in some cases.


Cheers,
John

* RE: RFC: NAPI packet weighting patch
@ 2005-06-07 16:23 Ronciak, John
  2005-06-07 20:21 ` David S. Miller
  0 siblings, 1 reply; 121+ messages in thread
From: Ronciak, John @ 2005-06-07 16:23 UTC (permalink / raw)
  To: hadi, Stephen Hemminger
  Cc: Williams, Mitch A, David S. Miller, mchan, buytenh, jdmason,
	netdev, Robert.Olsson, Venkatesan, Ganesh, Brandeburg, Jesse

>> 
> To the intel folks: shouldnt someone be investigating why this is so?

This is why we started all of this.  We have data showing that even
though our overall performance is best in class, we can still make it
better by changing things like the weight value.

There also seem to be some misconceptions about changing the weight
value.  It actually improves the performance of other drivers as well;
not as much as it improves e1000 performance, but it does seem to help
the others too.  We (Intel) have to be careful talking about competitors'
performance, so we just refer to them as competitors in these threads.
So it is not just e1000 that benefits from the lower weight values.  One
thing the lower weight is doing for e1000 right now is stopping it from
dropping frames, which is part of why it helps e1000 more (I think).

I agree that we need to get to the bottom of this, which is why we are
dedicating the time and resources to this effort.  We also appreciate
all the help in resolving it.  This should result in a better-performing
2.6 stack and drivers.  The new TSO code is a big step in that direction
as well.

Cheers,
John

* RE: RFC: NAPI packet weighting patch
@ 2005-06-06 20:29 Ronciak, John
  2005-06-06 23:55 ` Mitch Williams
  0 siblings, 1 reply; 121+ messages in thread
From: Ronciak, John @ 2005-06-06 20:29 UTC (permalink / raw)
  To: David S. Miller
  Cc: mchan, hadi, buytenh, Williams, Mitch A, jdmason, shemminger,
	netdev, Robert.Olsson, Venkatesan, Ganesh, Brandeburg, Jesse

> 	If you force the e1000 driver to do RX replenishment every N
> 	packets it should reduce the packet drops the same (in the
> 	single NIC case) as if you reduced the dev->weight to that
> 	same value N.

But this isn't what we are seeing.  Even if we just reduce the weight
value from 64 to 32, all of the drops go away.  So there seem to be
other things affecting this.

We are only talking about single-NIC testing at this point.  I agree
that single-NIC and multi-NIC setups raise different issues, and we will
need to test whatever we come up with in the multi-NIC case as well.

I also like your idea about the weight value being adjusted based on
real work done using some measurable metric.  This seems like a good
path to explore as well.
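
To make that concrete, a rough sketch of what a feedback-driven
adjustment might look like is below.  This is only an illustration: the
function, the thresholds, and the cycle budget are made-up values, not
anything taken from a real driver.

#define CYCLE_BUDGET	200000UL	/* assumed per-poll cycle budget */
#define WEIGHT_MIN	16
#define WEIGHT_MAX	64

/* weight: current dev->weight; work_done: packets processed by the last
 * poll; cycles: time spent in ->poll(), measured by the caller (e.g.
 * with get_cycles(), as in the tg3 instrumentation patch in this
 * thread). */
static int adjust_weight(int weight, int work_done, unsigned long cycles)
{
	if (work_done == weight && cycles > CYCLE_BUDGET) {
		/* Used the whole quota and ran long: poll less per round. */
		weight /= 2;
		if (weight < WEIGHT_MIN)
			weight = WEIGHT_MIN;
	} else if (work_done < weight / 4) {
		/* Mostly idle polls: creep back toward the default. */
		weight *= 2;
		if (weight > WEIGHT_MAX)
			weight = WEIGHT_MAX;
	}
	return weight;
}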

Cheers,
John

* RE: RFC: NAPI packet weighting patch
@ 2005-06-06 15:35 Ronciak, John
  2005-06-06 19:47 ` David S. Miller
  0 siblings, 1 reply; 121+ messages in thread
From: Ronciak, John @ 2005-06-06 15:35 UTC (permalink / raw)
  To: David S. Miller, mchan
  Cc: hadi, buytenh, Williams, Mitch A, jdmason, shemminger, netdev,
	Robert.Olsson, Venkatesan, Ganesh, Brandeburg, Jesse

We are dropping packets at the HW level (FIFO errors) with 256
descriptors and the default weight of 64.  As we said, reducing the
weight eliminates this, which is understandable since the driver is
being serviced more frequently.  We also hacked the driver to do a
buffer allocation per packet sent up the stack.  This reduced the number
of dropped packets by about 80%, but a significant number of drops
remained (from 190K down to 39K).  So I don't think this is where the
problem is.  This is also confirmed by the tg3 driver, which does its
buffer update to the HW every 25 descriptors.
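
As a sketch of the general pattern being discussed (give buffers back to
the hardware every N packets instead of only once per poll), it looks
something like the following.  The helper names are placeholders, not
the real e1000 or tg3 functions, and the interval of 25 just mirrors
what tg3 does.

#define REFILL_EVERY	25	/* tg3 pushes buffers back every 25 descriptors */

static int rx_one_packet(void *adapter);	/* placeholder: pull one frame */
static void refill_rx_ring(void *adapter);	/* placeholder: give buffers to HW */

static int rx_poll(void *adapter, int budget)
{
	int done = 0;

	while (done < budget && rx_one_packet(adapter)) {
		done++;
		/* Return descriptors to the NIC as we go, so its FIFO is
		 * less likely to run dry between polls. */
		if ((done % REFILL_EVERY) == 0)
			refill_rx_ring(adapter);
	}
	refill_rx_ring(adapter);	/* catch the remainder */
	return done;
}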

We did not increase the descriptor ring size with the default weight,
but we will try that today and report back.

Cheers,
John


> -----Original Message-----
> From: David S. Miller [mailto:davem@davemloft.net] 
> Sent: Sunday, June 05, 2005 11:43 PM
> To: mchan@broadcom.com
> Cc: hadi@cyberus.ca; buytenh@wantstofly.org; Williams, Mitch 
> A; Ronciak, John; jdmason@us.ibm.com; shemminger@osdl.org; 
> netdev@oss.sgi.com; Robert.Olsson@data.slu.se; Venkatesan, 
> Ganesh; Brandeburg, Jesse
> Subject: Re: RFC: NAPI packet weighting patch
> 
> 
> From: "David S. Miller" <davem@davemloft.net>
> Date: Sun, 05 Jun 2005 14:36:53 -0700 (PDT)
> 
> > BTW, here is the patch implementing this stuff.
> 
> A new patch and some more data.
> 
> When we go to gigabit, and NAPI kicks in, the first RX
> packet costs a lot (cache misses etc.) but the rest are
> very efficient to process.  I suspect this only holds
> for the single socket case, and on a real system processing
> many connections the cost drop might not be so clean.
> 
> The log output format is:
> 
> (TX_TICKS:RX_TICKS[ RX_TICK1 RX_TICK2 RX_TICK3 ... ])
> 
> Here is an example trace from a single socket TCP stream
> send over gigabit:
> 
> (9:112[ 26 8 7 8 7 ])
> (6:110[ 23 8 8 8 7 ])
> (7:57[ 26 8 ])
> (6:117[ 25 8 9 7 7 ])
> (5:37[ 26 ])
> (6:113[ 28 8 7 8 7 ])
> (0:20[ 9 ])
> (8:111[ 27 7 7 8 7 ])
> (5:109[ 25 8 8 8 7 ])
> (8:113[ 25 7 8 9 7 ])
> (6:108[ 25 8 7 7 7 ])
> (8:88[ 26 8 8 7 ])
> (6:109[ 25 7 7 7 7 ])
> (6:111[ 25 9 8 7 7 ])
> (0:48[ 9 5 ])
> 
> This kind of trace reiterates some things we already know.
> For example, mitigation (HW, SW, or a combination of both)
> helps because processing multiple packets lets us "reuse" the
> CPU cache priming that handling the first packet achieves for us.
> 
> It would be great to stick something like this into the e1000
> driver, and get some output from it with Intel's single NIC
> performance degradation test case.
> 
> It is also necessary for the Intel folks to say whether the
> NIC is running out of RX descriptors in the single NIC
> case with dev->weight set to the default of 64.  If so, does
> increasing the RX ring size to a larger value via ethtool
> help?  If not, then why in the world are things running more
> slowly?
> 
> I've got a crappy 1.5GHZ sparc64 box in my tg3 tests here, and it can
> handle gigabit line rate with much CPU to spare.  So either Intel is
> doing something other than TCP stream tests, or something else is out
> of whack.
> 
> I even tried to do things like having a memory touching program
> run in parallel with the TCP stream test, and this did not make
> the timing numbers in the logs increase much at all.
> 
> --- ./drivers/net/tg3.c.~1~	2005-06-03 11:13:14.000000000 -0700
> +++ ./drivers/net/tg3.c	2005-06-05 23:21:11.000000000 -0700
> @@ -2836,7 +2836,22 @@ static int tg3_rx(struct tg3 *tp, int bu
>  				    desc->err_vlan & RXD_VLAN_MASK);
>  		} else
>  #endif
> +		{
> +			unsigned long t = get_cycles();
> +			struct tg3_poll_log_ent *lp;
> +			unsigned int ent;
> +
>  			netif_receive_skb(skb);
> +			t = get_cycles() - t;
> +
> +			ent = tp->poll_log_ent;
> +			lp = &tp->poll_log[ent];
> +			ent = lp->rx_cur_ent;
> +			if (ent < POLL_RX_SIZE) {
> +				lp->rx_ents[ent] = (u16) t;
> +				lp->rx_cur_ent = ent + 1;
> +			}
> +		}
>  
>  		tp->dev->last_rx = jiffies;
>  		received++;
> @@ -2897,9 +2912,15 @@ static int tg3_poll(struct net_device *n
>  
>  	/* run TX completion thread */
>  	if (sblk->idx[0].tx_consumer != tp->tx_cons) {
> +		unsigned long t;
> +
>  		spin_lock(&tp->tx_lock);
> +		t = get_cycles();		
>  		tg3_tx(tp);
> +		t = get_cycles() - t;
>  		spin_unlock(&tp->tx_lock);
> +
> +		tp->poll_log[tp->poll_log_ent].tx_ticks = (u16) t;
>  	}
>  
>  	spin_unlock_irqrestore(&tp->lock, flags);
> @@ -2911,16 +2932,28 @@ static int tg3_poll(struct net_device *n
>  	if (sblk->idx[0].rx_producer != tp->rx_rcb_ptr) {
>  		int orig_budget = *budget;
>  		int work_done;
> +		unsigned long t;
> +		unsigned int ent;
>  
>  		if (orig_budget > netdev->quota)
>  			orig_budget = netdev->quota;
>  
> +		t = get_cycles();
>  		work_done = tg3_rx(tp, orig_budget);
> +		t = get_cycles() - t;
> +
> +		ent = tp->poll_log_ent;
> +		tp->poll_log[ent].rx_ticks = (u16) t;
>  
>  		*budget -= work_done;
>  		netdev->quota -= work_done;
>  	}
>  
> +	tp->poll_log_ent = (tp->poll_log_ent + 1) & POLL_LOG_MASK;
> +	tp->poll_log[tp->poll_log_ent].tx_ticks = 0;
> +	tp->poll_log[tp->poll_log_ent].rx_ticks = 0;
> +	tp->poll_log[tp->poll_log_ent].rx_cur_ent = 0;
> +
>  	if (tp->tg3_flags & TG3_FLAG_TAGGED_STATUS)
>  		tp->last_tag = sblk->status_tag;
>  	rmb();
> @@ -6609,6 +6642,27 @@ static struct net_device_stats *tg3_get_
>  	stats->rx_crc_errors = old_stats->rx_crc_errors +
>  		calc_crc_errors(tp);
>  
> +	/* XXX Yes, I know, do this right. :-)  */
> +	{
> +		unsigned int ent;
> +
> +		printk("TG3: POLL LOG, current ent[%d]\n", 
> tp->poll_log_ent);
> +		ent = tp->poll_log_ent - (POLL_LOG_SIZE - 1);
> +		ent &= POLL_LOG_MASK;
> +		while (ent != tp->poll_log_ent) {
> +			struct tg3_poll_log_ent *lp = 
> &tp->poll_log[ent];
> +			int i;
> +
> +			printk("(%u:%u[ ",
> +			       lp->tx_ticks, lp->rx_ticks);
> +			for (i = 0; i < lp->rx_cur_ent; i++)
> +				printk("%d ", lp->rx_ents[i]);
> +			printk("])\n");
> +
> +			ent = (ent + 1) & POLL_LOG_MASK;
> +		}
> +	}
> +
>  	return stats;
>  }
>  
> --- ./drivers/net/tg3.h.~1~	2005-06-03 11:13:14.000000000 -0700
> +++ ./drivers/net/tg3.h	2005-06-05 23:21:05.000000000 -0700
> @@ -2003,6 +2003,15 @@ struct tg3_ethtool_stats {
>  	u64		nic_tx_threshold_hit;
>  };
>  
> +struct tg3_poll_log_ent {
> +	u16 tx_ticks;
> +	u16 rx_ticks;
> +#define POLL_RX_SIZE	8
> +#define POLL_RX_MASK	(POLL_RX_SIZE - 1)
> +	u16 rx_cur_ent;
> +	u16 rx_ents[POLL_RX_SIZE];
> +};
> +
>  struct tg3 {
>  	/* begin "general, frequently-used members" cacheline section */
>  
> @@ -2232,6 +2241,11 @@ struct tg3 {
>  #define SST_25VF0X0_PAGE_SIZE		4098
>  
>  	struct ethtool_coalesce		coal;
> +
> +#define POLL_LOG_SIZE	(1 << 7)
> +#define POLL_LOG_MASK	(POLL_LOG_SIZE - 1)
> +	unsigned int			poll_log_ent;
> +	struct tg3_poll_log_ent		poll_log[POLL_LOG_SIZE];
>  };
>  
>  #endif /* !(_T3_H) */
> 

* RE: RFC: NAPI packet weighting patch
@ 2005-06-03 17:40 Ronciak, John
  2005-06-03 18:08 ` Robert Olsson
  0 siblings, 1 reply; 121+ messages in thread
From: Ronciak, John @ 2005-06-03 17:40 UTC (permalink / raw)
  To: David S. Miller
  Cc: jdmason, shemminger, hadi, Williams, Mitch A, netdev,
	Robert.Olsson, Venkatesan, Ganesh, Brandeburg, Jesse

> What more do you need other than checking the statistics counter?  The
> drop statistics (the ones we care about) are incremented in real time
> by the ->poll() code, so it's not like we have to trigger some
> asynchronous event to get a current version of the number.
> 

I think there is some more confusion here.  I'm talking about frames
dropped by the Ethernet controller at the hardware level (no descriptor
available).  This, for example, is happening now with our driver with
the weight set to 64, and it is what started us looking into what was
going on with the weight.  I don't see how NAPI code that dynamically
adjusts the weight could easily get at the hardware stats to know
whether frames are being dropped or not.  Sorry if I caused the
confusion here.

Mitch is working on a response to Jamal's last mail trying to level set
what we are seeing and doing.

Cheers,
John

* RE: RFC: NAPI packet weighting patch
@ 2005-06-03  0:11 Ronciak, John
  2005-06-03  0:18 ` David S. Miller
  0 siblings, 1 reply; 121+ messages in thread
From: Ronciak, John @ 2005-06-03  0:11 UTC (permalink / raw)
  To: Jon Mason, David S. Miller
  Cc: shemminger, hadi, Williams, Mitch A, netdev, Robert.Olsson,
	Venkatesan, Ganesh, Brandeburg, Jesse

I like this idea as well, but I do see an issue with it.  How would this
stack code find out that the weight is too high and packets are being
dropped (i.e. the device is not being polled fast enough)?  It would
have to check the controller stats to see the error count increasing
over some period.  I'm not sure this is workable unless we have some
sort of feedback which the driver could send up (or set) saying that
this is happening, which the dynamic weight code could then take into
account.
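
One possible shape for that feedback, purely as a sketch (the structure
and function here are hypothetical, not an existing interface): the
driver samples its own hardware drop counter each poll and reports
whether new drops appeared, so the dynamic weight code can react without
having to read device statistics itself.

struct drop_watch {
	unsigned long last_fifo_errors;	/* HW drop counter at the last poll */
};

/* Returns nonzero if the hardware dropped frames since the previous
 * call.  fifo_errors_now would come from the device's own stats (e.g.
 * the counter behind rx_fifo_errors). */
static int hw_dropped_since_last_poll(struct drop_watch *w,
				      unsigned long fifo_errors_now)
{
	int dropped = (fifo_errors_now != w->last_fifo_errors);

	w->last_fifo_errors = fifo_errors_now;
	return dropped;
}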

Comments?

Cheers,
John


> -----Original Message-----
> From: Jon Mason [mailto:jdmason@us.ibm.com] 
> Sent: Thursday, June 02, 2005 3:20 PM
> To: David S. Miller
> Cc: shemminger@osdl.org; Ronciak, John; hadi@cyberus.ca; 
> Williams, Mitch A; netdev@oss.sgi.com; 
> Robert.Olsson@data.slu.se; Venkatesan, Ganesh; Brandeburg, Jesse
> Subject: Re: RFC: NAPI packet weighting patch
> 
> 
> On Thursday 02 June 2005 05:12 pm, David S. Miller wrote:
> > From: Jon Mason <jdmason@us.ibm.com>
> > Date: Thu, 2 Jun 2005 16:51:48 -0500
> >
> > > Why not have the driver set the weight to 16/32 
> respectively for the
> > > weight (or better yet, have someone run numbers to find 
> weight that
> > > are closer to what the adapter can actually use)?  While these
> > > numbers may not be optimal for every system, this is much better
> > > that the current system, and would only require 5 or so 
> extra lines
> > > of code per NAPI enabled driver.
> >
> > Why do this when we can adjust the weight in one spot,
> > namely the upper level NAPI ->poll() running loop?
> >
> > It can measure the overhead, how many packets processed, etc.
> > and make intelligent decisions based upon that.  This is a CPU
> > speed, memory speed, I/O bus speed, and link speed agnostic
> > solution.
> >
> > The driver need not take any part in this, and the scheme will
> > dynamically adjust to resource usage changes in the system.
> 
> Yes, a much better idea to do this generically.  I 100% agree 
> with you.
> 

* RE: RFC: NAPI packet weighting patch
@ 2005-06-02 21:19 Ronciak, John
  2005-06-02 21:31 ` Stephen Hemminger
  0 siblings, 1 reply; 121+ messages in thread
From: Ronciak, John @ 2005-06-02 21:19 UTC (permalink / raw)
  To: hadi, Jon Mason
  Cc: David S. Miller, Williams, Mitch A, shemminger, netdev,
	Robert.Olsson, Venkatesan, Ganesh, Brandeburg, Jesse

The DRR algorithm assumes a perfect world, where hardware resources are
infinite, packets arrive continuously (or separated by very long
delays), there are no bus latencies, and CPU speed is infinite.

The real world is much messier: hardware starves for resources if it's
not serviced quickly enough, packets arrive at inconvenient intervals
(especially at 10 and 100 Mbps speeds), and buses and CPUs are slow.

Thus, the driver should have enough intelligence built in to make a
sensible choice about what the weight should be for that particular
driver/hardware combination.  The calculation should take into account
all the factors the driver has access to: link speed, bus type and
speed, processor speed, and some knowledge of the actual device FIFO
size and latency.  The driver would use all of these factors to come up
with a weight that prevents it from dropping frames without starving out
other devices in the system or hindering performance.  It seems to us
that the driver knows its hardware best and should try to come up with a
reasonable weight value based on that knowledge.
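
Purely as an illustration, such a heuristic might look like the sketch
below.  The inputs and the numbers are made up to show the shape of the
calculation; they are not values we have measured.

enum bus_class { BUS_PCI, BUS_PCIX, BUS_PCIE };

/* Hypothetical per-driver weight heuristic: pick a smaller weight where
 * the hardware is more likely to starve (small FIFOs, slow bus). */
static int pick_rx_weight(int link_mbps, enum bus_class bus)
{
	if (link_mbps <= 100)
		return 16;	/* 10/100: small FIFOs, frames trickle in */
	if (bus == BUS_PCI)
		return 32;	/* gigabit behind a slow bus: service often */
	return 64;		/* fast bus can afford bigger batches */
}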

This has been showing up in our NAPI test data, which Mitch is currently
scrubbing for release.  It shows a need for either better default static
weight values or for weights calculated from dynamic system variables.
We would like to see the latter tried, but the problem is that each
driver would have to make its own calculation, and it may not have
access to all of the system-wide data it would need to do that properly.

Even with a more intelligent driver, we still would like to see some
mechanism for the weight to be changed at runtime, such as with
Stephen's sysfs patch.  This would allow a sysadmin (or user-space app)
to tune the system based on statistical data that isn't available to the
individual driver.

Cheers,
John


> -----Original Message-----
> From: jamal [mailto:hadi@cyberus.ca] 
> Sent: Thursday, June 02, 2005 5:27 AM
> To: Jon Mason
> Cc: David S. Miller; Williams, Mitch A; shemminger@osdl.org; 
> netdev@oss.sgi.com; Robert.Olsson@data.slu.se; Ronciak, John; 
> Venkatesan, Ganesh; Brandeburg, Jesse
> Subject: Re: RFC: NAPI packet weighting patch
> 
> 
> On Tue, 2005-31-05 at 18:28 -0500, Jon Mason wrote:
> > On Tuesday 31 May 2005 05:14 pm, David S. Miller wrote:
> > > From: Jon Mason <jdmason@us.ibm.com>
> > > Date: Tue, 31 May 2005 17:07:54 -0500
> > >
> > > > Of course some performace analysis would have to be 
> done to determine the
> > > > optimal numbers for each speed/duplexity setting per driver.
> > >
> > > per cpu speed, per memory bus speed, per I/O bus speed, 
> and add in other
> > > complications such as NUMA
> > >
> > > My point is that whatever experimental number you come up 
> with will be
> > > good for that driver on your systems, not necessarily for others.
> > >
> > > Even within a system, whatever number you select will be the wrong
> > > thing to use if one starts a continuous I/O stream to the SATA
> > > controller in the next PCI slot, for example.
> > >
> > > We keep getting bitten by this, as the Altix perf data 
> continually shows,
> > > and we need to absolutely stop thinking this way.
> > >
> > > The way to go is to make selections based upon observed events and
> > > measurements.
> > 
> > I'm not arguing against a /proc entry to tune dev->weight 
> for those sysadmins 
> > advanced enough to do that.  I am arguing that we can make 
> the driver smarter 
> > (at little/no cost)  for "out of the box" users.
> > 
> 
> What is the point of making the driver "smarter"? 
> Recall, the algorithm used to schedule the netdevices is based on an
> extension of Weighted Round Robin from Varghese et al, known as DRR
> (ask google for details).
> The idea is to provide fairness amongst many drivers.  As an example,
> if you have a gige driver it shouldn't be taking all the resources at
> the expense of starving the fastether driver.
> If the admin wants one driver to be more "important" than the other,
> s/he will make sure it has a higher weight.
> 
> cheers,
> jamal
> 
> 

* RFC: NAPI packet weighting patch
@ 2005-05-26 21:36 Mitch Williams
  2005-05-27  8:21 ` Robert Olsson
                   ` (2 more replies)
  0 siblings, 3 replies; 121+ messages in thread
From: Mitch Williams @ 2005-05-26 21:36 UTC (permalink / raw)
  To: netdev; +Cc: john.ronciak, ganesh.venkatesan, jesse.brandeburg

The following patch (which applies to 2.6.12rc4) adds a new sysctl
parameter called 'netdev_packet_weight'.  This parameter controls how many
backlog work units each RX packet is worth.

With the parameter set to 0 (the default), NAPI polling works exactly as
it does today:  each packet is worth one backlog work unit, and the
maximum number of received packets that will be processed in any given
softirq is controlled by the 'netdev_max_backlog' parameter.

By setting the netdev_packet_weight to a nonzero value, we make each
packet worth more than one backlog work unit.  Since it's a shift value, a
setting of 1 makes each packet worth 2 work units, a setting of 2 makes
each packet worth 4 units, etc.  Under normal circumstances you would
never use a value higher than 3, though 4 might work for Gigabit and 10
Gigabit networks.
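
In other words (a sketch of the arithmetic, not code taken from the
patch): the per-device quota becomes dev->weight >> netdev_packet_weight,
and every packet processed is charged (1 << netdev_packet_weight) units
against netdev_max_backlog.

/* Worked example of the shift semantics described above. */
static int packets_per_poll(int dev_weight, int packet_weight_shift)
{
	/* default weight 64, shift 2  ->  at most 16 packets per poll */
	return dev_weight >> packet_weight_shift;
}

static int backlog_cost(int packets, int packet_weight_shift)
{
	/* 16 packets at shift 2 still consume 64 backlog units */
	return packets << packet_weight_shift;
}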

By increasing the packet weight, we accomplish two things:  first, we
cause the individual NAPI RX loops in each driver to process fewer
packets.  This means that they will free up RX resources to the hardware
more often, which reduces the possibility of dropped packets.  Second, it
shortens the total time spent in the NAPI softirq, which can free the CPU
to handle other tasks more often, thus reducing overall latency.

Performance tests in our lab have shown that tweaking this parameter,
along with the netdev_max_backlog parameter, can provide a significant
performance increase -- greater than 100 Mbps of additional throughput --
over the default settings.  I tested with both the e1000 and tg3 drivers
and saw improvement in both cases.  I did not see higher CPU
utilization, even with the increased throughput.

The caveat, of course, is that different systems and network
configurations require different settings.  On the other hand, that's
really no different than what we see with the max_backlog parameter today.
On some systems neither parameter makes any difference.

Still, we feel that there is value to having this in the kernel.  Please
test and comment as you have time available.

Thanks!
-Mitch Williams
mitch.a.williams@intel.com




diff -urpN -x dontdiff rc4-clean/Documentation/filesystems/proc.txt linux-2.6.12-rc4/Documentation/filesystems/proc.txt
--- rc4-clean/Documentation/filesystems/proc.txt	2005-05-18 16:35:43.000000000 -0700
+++ linux-2.6.12-rc4/Documentation/filesystems/proc.txt	2005-05-19 11:16:10.000000000 -0700
@@ -1378,7 +1378,13 @@ netdev_max_backlog
 ------------------

 Maximum number  of  packets,  queued  on  the  INPUT  side, when the interface
-receives packets faster than kernel can process them.
+receives packets faster than kernel can process them.  This is also the
+maximum number of packets handled in a single softirq under NAPI.
+
+netdev_packet_weight
+--------------------
+The value, in netdev_max_backlog unit, of each received packet.  This is a
+shift value, and should be set no higher than 3.

 optmem_max
 ----------
diff -urpN -x dontdiff rc4-clean/include/linux/sysctl.h linux-2.6.12-rc4/include/linux/sysctl.h
--- rc4-clean/include/linux/sysctl.h	2005-05-18 16:36:06.000000000 -0700
+++ linux-2.6.12-rc4/include/linux/sysctl.h	2005-05-18 16:44:07.000000000 -0700
@@ -242,6 +242,7 @@ enum
 	NET_CORE_MOD_CONG=16,
 	NET_CORE_DEV_WEIGHT=17,
 	NET_CORE_SOMAXCONN=18,
+	NET_CORE_PACKET_WEIGHT=19,
 };

 /* /proc/sys/net/ethernet */
diff -urpN -x dontdiff rc4-clean/net/core/dev.c linux-2.6.12-rc4/net/core/dev.c
--- rc4-clean/net/core/dev.c	2005-05-18 16:36:07.000000000 -0700
+++ linux-2.6.12-rc4/net/core/dev.c	2005-05-19 11:16:57.000000000 -0700
@@ -1352,6 +1352,7 @@ out:
   =======================================================================*/

 int netdev_max_backlog = 300;
+int netdev_packet_weight = 0; /* each packet is worth 1 backlog unit */
 int weight_p = 64;            /* old backlog weight */
 /* These numbers are selected based on intuition and some
  * experimentatiom, if you have more scientific way of doing this
@@ -1778,6 +1779,7 @@ static void net_rx_action(struct softirq
 	struct softnet_data *queue = &__get_cpu_var(softnet_data);
 	unsigned long start_time = jiffies;
 	int budget = netdev_max_backlog;
+	int budget_temp;


 	local_irq_disable();
@@ -1793,21 +1795,22 @@ static void net_rx_action(struct softirq
 		dev = list_entry(queue->poll_list.next,
 				 struct net_device, poll_list);
 		netpoll_poll_lock(dev);
-
-		if (dev->quota <= 0 || dev->poll(dev, &budget)) {
+		budget_temp = budget;
+		if (dev->quota <= 0 || dev->poll(dev, &budget_temp)) {
 			netpoll_poll_unlock(dev);
 			local_irq_disable();
 			list_del(&dev->poll_list);
 			list_add_tail(&dev->poll_list, &queue->poll_list);
 			if (dev->quota < 0)
-				dev->quota += dev->weight;
+				dev->quota += dev->weight >> netdev_packet_weight;
 			else
-				dev->quota = dev->weight;
+				dev->quota = dev->weight >> netdev_packet_weight;
 		} else {
 			netpoll_poll_unlock(dev);
 			dev_put(dev);
 			local_irq_disable();
 		}
+		budget -= (budget - budget_temp) << netdev_packet_weight;
 	}
 out:
 	local_irq_enable();
diff -urpN -x dontdiff rc4-clean/net/core/sysctl_net_core.c linux-2.6.12-rc4/net/core/sysctl_net_core.c
--- rc4-clean/net/core/sysctl_net_core.c	2005-03-01 23:38:03.000000000 -0800
+++ linux-2.6.12-rc4/net/core/sysctl_net_core.c	2005-05-18 16:44:09.000000000 -0700
@@ -13,6 +13,7 @@
 #ifdef CONFIG_SYSCTL

 extern int netdev_max_backlog;
+extern int netdev_packet_weight;
 extern int weight_p;
 extern int no_cong_thresh;
 extern int no_cong;
@@ -91,6 +92,14 @@ ctl_table core_table[] = {
 		.proc_handler	= &proc_dointvec
 	},
 	{
+		.ctl_name	= NET_CORE_PACKET_WEIGHT,
+		.procname	= "netdev_packet_weight",
+		.data		= &netdev_packet_weight,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec
+	},
+	{
 		.ctl_name	= NET_CORE_MAX_BACKLOG,
 		.procname	= "netdev_max_backlog",
 		.data		= &netdev_max_backlog,



Thread overview: 121+ messages
2005-06-03 18:19 RFC: NAPI packet weighting patch Ronciak, John
2005-06-03 18:33 ` Ben Greear
2005-06-03 18:49   ` David S. Miller
2005-06-03 18:59     ` Ben Greear
2005-06-03 19:02       ` David S. Miller
2005-06-03 20:17 ` Robert Olsson
2005-06-03 20:30   ` David S. Miller
  -- strict thread matches above, loose matches on Subject: below --
2005-06-07 16:23 Ronciak, John
2005-06-07 20:21 ` David S. Miller
2005-06-08  2:20   ` Jesse Brandeburg
2005-06-08  3:31     ` David S. Miller
2005-06-08  3:43     ` David S. Miller
2005-06-08 13:36       ` jamal
2005-06-09 21:37         ` Jesse Brandeburg
2005-06-09 22:05           ` Stephen Hemminger
2005-06-09 22:12             ` Jesse Brandeburg
2005-06-09 22:21               ` David S. Miller
2005-06-09 22:21               ` jamal
2005-06-09 22:22             ` David S. Miller
2005-06-09 22:20           ` jamal
2005-06-06 20:29 Ronciak, John
2005-06-06 23:55 ` Mitch Williams
2005-06-07  0:08   ` Ben Greear
2005-06-08  1:50     ` Jesse Brandeburg
2005-06-07  4:53   ` Stephen Hemminger
2005-06-07 12:38     ` jamal
2005-06-07 12:06       ` Martin Josefsson
2005-06-07 13:29         ` jamal
2005-06-07 12:36           ` Martin Josefsson
2005-06-07 16:34             ` Robert Olsson
2005-06-07 23:19               ` Rick Jones
2005-06-21 20:37         ` David S. Miller
2005-06-22  7:27           ` Eric Dumazet
2005-06-22  8:42           ` P
2005-06-22 19:37             ` jamal
2005-06-23  8:56               ` P
2005-06-21 20:20     ` David S. Miller
2005-06-21 20:38       ` Rick Jones
2005-06-21 20:55         ` David S. Miller
2005-06-21 21:47         ` Andi Kleen
2005-06-21 22:22           ` Donald Becker
2005-06-21 22:34             ` Andi Kleen
2005-06-22  0:08               ` Donald Becker
2005-06-22  4:44                 ` Chris Friesen
2005-06-22 11:31                   ` Andi Kleen
2005-06-22 16:23                 ` Leonid Grossman
2005-06-22 16:37                   ` jamal
2005-06-22 18:00                     ` Leonid Grossman
2005-06-22 18:06                       ` Andi Kleen
2005-06-22 20:22                         ` David S. Miller
2005-06-22 20:35                           ` Rick Jones
2005-06-22 20:43                             ` David S. Miller
2005-06-22 21:10                           ` Andi Kleen
2005-06-22 21:16                             ` David S. Miller
2005-06-22 21:53                             ` Chris Friesen
2005-06-22 22:11                               ` Andi Kleen
2005-06-22 21:38                           ` Eric Dumazet
2005-06-22 22:13                             ` Eric Dumazet
2005-06-22 22:30                               ` David S. Miller
2005-06-22 22:23                             ` David S. Miller
2005-06-23 12:14                               ` jamal
2005-06-23 17:36                                 ` David Mosberger
2005-06-22 22:42                           ` Leonid Grossman
2005-06-22 23:13                             ` Andi Kleen
2005-06-22 23:19                               ` David S. Miller
2005-06-22 23:23                                 ` Andi Kleen
2005-06-22 17:05                   ` Andi Kleen
2005-06-06 15:35 Ronciak, John
2005-06-06 19:47 ` David S. Miller
2005-06-03 17:40 Ronciak, John
2005-06-03 18:08 ` Robert Olsson
2005-06-03  0:11 Ronciak, John
2005-06-03  0:18 ` David S. Miller
2005-06-03  2:32   ` jamal
2005-06-03 17:43     ` Mitch Williams
2005-06-03 18:38       ` David S. Miller
2005-06-03 18:42       ` jamal
2005-06-03 19:01         ` David S. Miller
2005-06-03 19:28           ` Mitch Williams
2005-06-03 19:59             ` jamal
2005-06-03 20:31               ` David S. Miller
2005-06-03 21:12                 ` Jon Mason
2005-06-03 20:22             ` David S. Miller
2005-06-03 20:29               ` David S. Miller
2005-06-03 19:49                 ` Michael Chan
2005-06-03 20:59                   ` Lennert Buytenhek
2005-06-03 20:35                     ` Michael Chan
2005-06-03 22:29                       ` jamal
2005-06-04  0:25                         ` Michael Chan
2005-06-05 21:36                           ` David S. Miller
2005-06-06  6:43                             ` David S. Miller
2005-06-03 23:26                       ` Lennert Buytenhek
2005-06-05 20:11                       ` David S. Miller
2005-06-03 21:07                     ` Edgar E Iglesias
2005-06-03 23:30                       ` Lennert Buytenhek
2005-06-03 20:30             ` Ben Greear
2005-06-03 19:40           ` jamal
2005-06-03 20:23             ` jamal
2005-06-03 20:28               ` Mitch Williams
2005-06-02 21:19 Ronciak, John
2005-06-02 21:31 ` Stephen Hemminger
2005-06-02 21:40   ` David S. Miller
2005-06-02 21:51   ` Jon Mason
2005-06-02 22:12     ` David S. Miller
2005-06-02 22:19       ` Jon Mason
2005-06-02 22:15     ` Robert Olsson
2005-05-26 21:36 Mitch Williams
2005-05-27  8:21 ` Robert Olsson
2005-05-27 11:18 ` jamal
2005-05-27 15:50 ` Stephen Hemminger
2005-05-27 20:27   ` Mitch Williams
2005-05-27 21:01     ` Stephen Hemminger
2005-05-28  0:56       ` jamal
2005-05-31 17:35         ` Mitch Williams
2005-05-31 17:40           ` Stephen Hemminger
2005-05-31 17:43             ` Mitch Williams
2005-05-31 22:07           ` Jon Mason
2005-05-31 22:14             ` David S. Miller
2005-05-31 23:28               ` Jon Mason
2005-06-02 12:26                 ` jamal
2005-06-02 17:30                   ` Stephen Hemminger
