* [PATCH 2.6] e100: use NAPI mode all the time
@ 2004-06-05 0:35 Scott Feldman
From: Scott Feldman @ 2004-06-05 0:35 UTC (permalink / raw)
To: jgarzik; +Cc: netdev, scott.feldman
I see no reason to keep the non-NAPI option for e100. This patch removes
the CONFIG_E100_NAPI option and puts the driver in NAPI mode all the time.
Matches the way tg3 works.
Unless someone has a really good reason to keep the non-NAPI mode, this
should go in for 2.6.7.
-scott
----------------
diff -Naurp linux-2.6.7-rc2-bk5/drivers/net/e100.c linux-2.6.7-rc2-bk5.mod/drivers/net/e100.c
--- linux-2.6.7-rc2-bk5/drivers/net/e100.c 2004-06-04 15:58:07.000000000 -0700
+++ linux-2.6.7-rc2-bk5.mod/drivers/net/e100.c 2004-06-04 16:02:04.000000000 -0700
@@ -87,9 +87,8 @@
* cb_to_use is the next CB to use for queuing a command; cb_to_clean
* is the next CB to check for completion; cb_to_send is the first
* CB to start on in case of a previous failure to resume. CB clean
- * up happens in interrupt context in response to a CU interrupt, or
- * in dev->poll in the case where NAPI is enabled. cbs_avail keeps
- * track of number of free CB resources available.
+ * up happens in interrupt context in response to a CU interrupt.
+ * cbs_avail keeps track of number of free CB resources available.
*
* Hardware padding of short packets to minimum packet size is
* enabled. 82557 pads with 7Eh, while the later controllers pad
@@ -112,9 +111,8 @@
* replacement RFDs cannot be allocated, or the RU goes non-active,
* the RU must be restarted. Frame arrival generates an interrupt,
* and Rx indication and re-allocation happen in the same context,
- * therefore no locking is required. If NAPI is enabled, this work
- * happens in dev->poll. A software-generated interrupt is gen-
- * erated from the watchdog to recover from a failed allocation
+ * therefore no locking is required. A software-generated interrupt
+ * is generated from the watchdog to recover from a failed allocation
* senario where all Rx resources have been indicated and none re-
* placed.
*
@@ -126,8 +124,6 @@
* supported. Tx Scatter/Gather is not supported. Jumbo Frames is
* not supported (hardware limitation).
*
- * NAPI support is enabled with CONFIG_E100_NAPI.
- *
* MagicPacket(tm) WoL support is enabled/disabled via ethtool.
*
* Thanks to JC (jchapman@katalix.com) for helping with
@@ -158,7 +154,7 @@
#define DRV_NAME "e100"
-#define DRV_VERSION "3.0.18"
+#define DRV_VERSION "3.0.22-NAPI"
#define DRV_DESCRIPTION "Intel(R) PRO/100 Network Driver"
#define DRV_COPYRIGHT "Copyright(c) 1999-2004 Intel Corporation"
#define PFX DRV_NAME ": "
@@ -1463,11 +1459,7 @@ static inline int e100_rx_indicate(struc
nic->net_stats.rx_packets++;
nic->net_stats.rx_bytes += actual_size;
nic->netdev->last_rx = jiffies;
-#ifdef CONFIG_E100_NAPI
netif_receive_skb(skb);
-#else
- netif_rx(skb);
-#endif
if(work_done)
(*work_done)++;
}
@@ -1562,20 +1554,12 @@ static irqreturn_t e100_intr(int irq, vo
if(stat_ack & stat_ack_rnr)
nic->ru_running = 0;
-#ifdef CONFIG_E100_NAPI
e100_disable_irq(nic);
netif_rx_schedule(netdev);
-#else
- if(stat_ack & stat_ack_rx)
- e100_rx_clean(nic, NULL, 0);
- if(stat_ack & stat_ack_tx)
- e100_tx_clean(nic);
-#endif
return IRQ_HANDLED;
}
-#ifdef CONFIG_E100_NAPI
static int e100_poll(struct net_device *netdev, int *budget)
{
struct nic *nic = netdev_priv(netdev);
@@ -1598,7 +1582,6 @@ static int e100_poll(struct net_device *
return 1;
}
-#endif
#ifdef CONFIG_NET_POLL_CONTROLLER
static void e100_netpoll(struct net_device *netdev)
@@ -2137,10 +2120,8 @@ static int __devinit e100_probe(struct p
SET_ETHTOOL_OPS(netdev, &e100_ethtool_ops);
netdev->tx_timeout = e100_tx_timeout;
netdev->watchdog_timeo = E100_WATCHDOG_PERIOD;
-#ifdef CONFIG_E100_NAPI
netdev->poll = e100_poll;
netdev->weight = E100_NAPI_WEIGHT;
-#endif
#ifdef CONFIG_NET_POLL_CONTROLLER
netdev->poll_controller = e100_netpoll;
#endif
diff -Naurp linux-2.6.7-rc2-bk5/drivers/net/Kconfig linux-2.6.7-rc2-bk5.mod/drivers/net/Kconfig
--- linux-2.6.7-rc2-bk5/drivers/net/Kconfig 2004-06-04 15:58:26.000000000 -0700
+++ linux-2.6.7-rc2-bk5.mod/drivers/net/Kconfig 2004-06-04 16:02:34.000000000 -0700
@@ -1498,10 +1498,6 @@ config E100
<file:Documentation/networking/net-modules.txt>. The module
will be called e100.
-config E100_NAPI
- bool "Use Rx Polling (NAPI)"
- depends on E100
-
config LNE390
tristate "Mylex EISA LNE390A/B support (EXPERIMENTAL)"
depends on NET_PCI && EISA && EXPERIMENTAL
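
For reference, the dev->poll interface this patch commits e100 to is the
2.6-era NAPI API (netdev->poll plus netdev->weight, as set in e100_probe
above).  The sketch below shows the general shape such a poll handler
takes; it is illustrative only, not the driver's actual e100_poll body.
e100_rx_clean(), e100_tx_clean() and e100_disable_irq() appear in the
patch; e100_enable_irq() and the exact budget accounting are assumed
counterparts.

static int e100_poll(struct net_device *netdev, int *budget)
{
	struct nic *nic = netdev_priv(netdev);
	unsigned int work_to_do = min(netdev->quota, *budget);
	unsigned int work_done = 0;
	int tx_cleaned;

	/* Indicate received frames and reclaim completed Tx CBs. */
	e100_rx_clean(nic, &work_done, work_to_do);
	tx_cleaned = e100_tx_clean(nic);

	/* Nothing left to do: leave polling mode and re-enable the
	 * device interrupt. */
	if((!tx_cleaned && (work_done == 0)) || !netif_running(netdev)) {
		netif_rx_complete(netdev);
		e100_enable_irq(nic);
		return 0;
	}

	/* More work remains: charge what was done against the budget
	 * and stay on the poll list. */
	*budget -= work_done;
	netdev->quota -= work_done;

	return 1;
}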

* Re: [PATCH 2.6] e100: use NAPI mode all the time
From: Tim Mattox @ 2004-06-06 22:57 UTC (permalink / raw)
To: Scott Feldman; +Cc: netdev, bonding-devel, jgarzik

Scott,
Have you considered how this interacts with multiple e100's bonded
together with Linux channel bonding?  I've CC'd the bonding developer
mailing list to flush out any more opinions on this.

I have yet to set up a good test system, but my impression has been
that NAPI and channel bonding would lead to lots of packet re-ordering
load for the CPU that could outweigh the interrupt load savings.
Does anyone have experience with this?

Also, depending on the setting of /proc/sys/net/ipv4/tcp_reordering,
the TCP stack might do aggressive NACKs because of a false-positive on
dropped packets due to the large reordering that could occur with
NAPI and bonding combined.

In short, unless there has been study on this, I would suggest not yet
removing support for non-NAPI mode on any network driver.

On Jun 4, 2004, at 8:35 PM, Scott Feldman wrote:
> I see no reason to keep the non-NAPI option for e100.  This patch removes
> the CONFIG_E100_NAPI option and puts the driver in NAPI mode all the time.
> Matches the way tg3 works.
>
> Unless someone has a really good reason to keep the non-NAPI mode, this
> should go in for 2.6.7.
>
> -scott

--
Tim Mattox - tmattox@engr.uky.edu - http://homepage.mac.com/tmattox/
 http://aggregate.org/KAOS/ - http://advogato.org/person/tmattox/

* Re: [PATCH 2.6] e100: use NAPI mode all the time
From: Scott Feldman @ 2004-06-07 0:03 UTC (permalink / raw)
To: Tim Mattox; +Cc: Scott Feldman, netdev, bonding-devel, jgarzik

> Have you considered how this interacts with multiple e100's bonded
> together with Linux channel bonding?
> I've CC'd the bonding developer mailing list to flush out any more
> opinions on this.

No.  But if there is an issue between NAPI and bonding, that's something
to solve between NAPI and bonding but not the nic driver.

> I have yet to set up a good test system, but my impression has been
> that NAPI and channel bonding would lead to lots of packet re-ordering
> load for the CPU that could outweigh the interrupt load savings.
> Does anyone have experience with this?

re-ordering or dropped?

> Also, depending on the setting of /proc/sys/net/ipv4/tcp_reordering
> the TCP stack might do aggressive NACKs because of a false-positive on
> dropped packets due to the large reordering that could occur with
> NAPI and bonding combined.

I guess I don't see the bonding angle.  How does inserting a SW FIFO
between the nic HW and the softirq thread make things better for
bonding?

> In short, unless there has been study on this, I would suggest not yet
> removing support for non-NAPI mode on any network driver.

fedora core 2's default is e100-NAPI, so we're getting good test
coverage there without bonding.  tg3 has used NAPI only for some time,
and I'm sure it's used with bonding.

-scott

* Re: [PATCH 2.6] e100: use NAPI mode all the time
From: Tim Mattox @ 2004-06-07 1:51 UTC (permalink / raw)
To: sfeldma; +Cc: netdev, bonding-devel, Scott Feldman, jgarzik

Please excuse the length of this e-mail.  I will attempt to explain the
potential problem between NAPI and bonding with an example below.  And
the only reason I say "potential" is that I have deliberately avoided
building clusters with this configuration and have not seen it "in the
wild" personally.  I've read about this problem on the beowulf mailing
list, usually in conjunction with people trying to bond GigE NICs.
I will soon have a cluster that can be easily switched to various modes
on its network, including simple bonding, and I should be able to
directly test this myself in my lab.

The problem is caused by the order packets are delivered to the TCP
stack on the receiving machine.  In normal round-robin bonding mode,
the packets are sent out one per NIC in the bond.  For simplicity's
sake, let's say we have two NICs in a bond, eth0 and eth1.  When
sending packets, eth0 will handle all the even packets, and eth1 all
the odd packets.  Similarly when receiving, eth0 would get all
the even packets, and eth1 all the odd packets from a particular
TCP stream.

With NAPI (or other interrupt mitigation techniques) the receiving
machine will process multiple packets in a row from a single NIC,
before getting packets from another NIC.  In the above example, eth0
would receive packets 0, 2, 4, 6, etc. and pass them to the TCP layer,
followed by eth1's packets 1, 3, 5, 7, etc.  The specific number of
out-of-order packets received in a row would depend on many factors.

The TCP layer would need to reorder the packets from something
like 0, 2, 4, 6, 1, 3, 5, 7 or something like 0, 2, 4, 1, 3, 5, 6, 7,
with many possible variations.

Before NAPI (and hardware interrupt mitigation schemes), bonding would
work without causing this re-ordering, since each packet would arrive
and be enqueued to the TCP stack in the order of arrival, which in a
well designed network would match the transmission order.  Sure, if
your network randomly delayed packets then things would get out of
order, but in the HPC community which uses bonding, the two network
paths would normally be made identical, possibly with only a single
switch between source and destination NICs.  If there were congestion
delays in one path and not in another, then the HPC network/program had
more serious problems.

I don't want to slow the progress of Linux networking development.
I was objecting to the removal of a feature from e100 that already has
working code and that was, AFAIK, necessary for the performance
enhancement of bonding.  If the overhead of re-ordering the packets is
not significant, and if simply increasing the value of
/proc/sys/net/ipv4/tcp_reordering will allow TCP to "chill" and not
send negative ACKs when it sees packets this much out of order, then
sure, remove the non-NAPI support.

I will attempt to re-locate the specific examples discussed on the
beowulf mailing list, but I don't have those URLs handy.

On Jun 6, 2004, at 8:03 PM, Scott Feldman wrote:
>> Have you considered how this interacts with multiple e100's bonded
>> together with Linux channel bonding?
>> I've CC'd the bonding developer mailing list to flush out any more
>> opinions on this.
>
> No.  But if there is an issue between NAPI and bonding, that's something
> to solve between NAPI and bonding but not the nic driver.

There may yet need to be more bonding code put in the receive path to
deal with this re-ordering problem.  Or possibly a configuration option
to NAPI that works across various NIC drivers.  But I hope not.
Any bonding developers have ideas on how to mitigate this problem?

>> I have yet to set up a good test system, but my impression has been
>> that NAPI and channel bonding would lead to lots of packet re-ordering
>> load for the CPU that could outweigh the interrupt load savings.
>> Does anyone have experience with this?
>
> re-ordering or dropped?

This re-ordering problem will show up without any actual packet loss.

>> Also, depending on the setting of /proc/sys/net/ipv4/tcp_reordering
>> the TCP stack might do aggressive NACKs because of a false-positive on
>> dropped packets due to the large reordering that could occur with
>> NAPI and bonding combined.
>
> I guess I don't see the bonding angle.  How does inserting a SW FIFO
> between the nic HW and the softirq thread make things better for
> bonding?

I'm not sure I understand your question.  The tcp_reordering parameter
is supposed to control the amount of out-of-order packets the receiving
TCP stack sees before issuing pre-emptive negative ACKs to the sender
(to avoid waiting for the TCP resend timer to expire).  This was an
optimization that works well in most situations where packet
re-ordering was a strong indication of a dropped packet.  Such extra
NACKs, and the resulting unnecessary retransmits, would be quite
detrimental to performance in a bonded network setup that was not
actually dropping packets.

>> In short, unless there has been study on this, I would suggest not yet
>> removing support for non-NAPI mode on any network driver.
>
> fedora core 2's default is e100-NAPI, so we're getting good test
> coverage there without bonding.  tg3 has used NAPI only for some time,
> and I'm sure it's used with bonding.
>
> -scott

I have NO problems with NAPI itself; I think it's a wonderful
development.  I would even advocate for making NAPI the default across
the board.  But for bonding, until I see otherwise, I want to be able
to not use NAPI.  As I indicated, I will have a new cluster that I can
directly test this NAPI vs. bonding issue on very soon.

--
Tim Mattox - tmattox@engr.uky.edu - http://homepage.mac.com/tmattox/
 http://aggregate.org/KAOS/ - http://advogato.org/person/tmattox/
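
The interleaving described above is easy to see with a throwaway
user-space model.  The program below is not kernel, driver, or bonding
code; NPKTS and BURST are made-up parameters, with BURST standing in
for however many packets one NIC hands the stack per NAPI poll or
mitigated interrupt.  It simply stripes a burst of packets round-robin
across two "NICs" and then drains each NIC in bursts on the receive
side, printing the order the TCP layer would see.

/* Toy model of round-robin bonding over two NICs with burst delivery
 * on the receive side. */
#include <stdio.h>

#define NPKTS 8   /* packets in one TCP burst */
#define BURST 4   /* packets indicated per NIC per poll (assumed) */

int main(void)
{
	int nic[2][NPKTS / 2], count[2] = { 0, 0 };
	int seq, i, n, j;

	/* Transmit side: round-robin striping, even packets on eth0,
	 * odd packets on eth1. */
	for (seq = 0; seq < NPKTS; seq++)
		nic[seq & 1][count[seq & 1]++] = seq;

	/* Receive side: each poll drains up to BURST packets from one
	 * NIC before the other NIC gets a turn. */
	printf("delivery order to TCP:");
	for (i = 0; i < NPKTS / 2; i += BURST)
		for (n = 0; n < 2; n++)
			for (j = i; j < i + BURST && j < count[n]; j++)
				printf(" %d", nic[n][j]);
	printf("\n");	/* prints: 0 2 4 6 1 3 5 7 */
	return 0;
}

With BURST set to 4 the output is exactly the 0, 2, 4, 6, 1, 3, 5, 7
pattern described above; smaller bursts give the milder variations.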

* Re: [PATCH 2.6] e100: use NAPI mode all the time
From: Jeff Garzik @ 2004-06-07 2:33 UTC (permalink / raw)
To: Tim Mattox; +Cc: sfeldma, netdev, bonding-devel, Scott Feldman

Tim Mattox wrote:
> The problem is caused by the order packets are delivered to the TCP
> stack on the receiving machine.  In normal round-robin bonding mode,
> the packets are sent out one per NIC in the bond.  For simplicity's
> sake, let's say we have two NICs in a bond, eth0 and eth1.  When
> sending packets, eth0 will handle all the even packets, and eth1 all
> the odd packets.  Similarly when receiving, eth0 would get all
> the even packets, and eth1 all the odd packets from a particular
> TCP stream.
>
> With NAPI (or other interrupt mitigation techniques) the receiving
> machine will process multiple packets in a row from a single NIC,
> before getting packets from another NIC.  In the above example, eth0
> would receive packets 0, 2, 4, 6, etc. and pass them to the TCP layer,
> followed by eth1's packets 1, 3, 5, 7, etc.  The specific number of
> out-of-order packets received in a row would depend on many factors.
>
> The TCP layer would need to reorder the packets from something
> like 0, 2, 4, 6, 1, 3, 5, 7 or something like 0, 2, 4, 1, 3, 5, 6, 7,
> with many possible variations.

Ethernet drivers have _always_ processed multiple packets per
interrupt, since before the days of NAPI, and before the days of
hardware mitigation.

Therefore, this is mainly an argument against using overly simplistic
load balancing schemes that _create_ this problem :)  It's much
smarter to load balance based on flows, for example.  I think the ALB
mode does this?

You appear to be making the incorrect assumption that packets sent in
this simplistic, round-robin manner could ever _hope_ to arrive
in-order at the destination.  Any number of things serve to gather
packets into bursts:  net stack TX queue, hardware DMA ring, hardware
FIFO, remote h/w FIFO, remote hardware DMA ring, remote softirq.

> I don't want to slow the progress of Linux networking development.
> I was objecting to the removal of a feature from e100 that already has
> working code and that was, AFAIK, necessary for the performance
> enhancement of bonding.

No, just don't use a bonding mode that kills performance.  It has
nothing to do with NAPI.  As I said, ethernet drivers have been
processing runs of packets per irq / softirq for ages and ages.  This
isn't new with NAPI, to be sure.

> I have NO problems with NAPI itself; I think it's a wonderful
> development.  I would even advocate for making NAPI the default across
> the board.  But for bonding, until I see otherwise, I want to be able
> to not use NAPI.  As I indicated, I will have a new cluster that I can
> directly test this NAPI vs. bonding issue on very soon.

As Scott indicated, people use bonding with tg3 (unconditional NAPI)
all the time.

Further, I hope you're not doing something silly like trying to load
balance on the _same_ ethernet.  If you are, that's a signal that
deeper problems exist -- you should be able to do wire speed with one
NIC.

	Jeff

* Re: [Bonding-devel] Re: [PATCH 2.6] e100: use NAPI mode all the time
From: Jay Vosburgh @ 2004-06-07 6:39 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Tim Mattox, sfeldma, netdev, bonding-devel, Scott Feldman

Jeff Garzik <jgarzik@pobox.com> wrote:
>Tim Mattox wrote:
>> The problem is caused by the order packets are delivered to the TCP
>> stack on the receiving machine.  In normal round-robin bonding mode,
>> the packets are sent out one per NIC in the bond.  For simplicity's
>> sake, let's say we have two NICs in a bond, eth0 and eth1.  When
>> sending packets, eth0 will handle all the even packets, and eth1 all
>> the odd packets.  Similarly when receiving, eth0 would get all
>> the even packets, and eth1 all the odd packets from a particular
>> TCP stream.

>Ethernet drivers have _always_ processed multiple packets per
>interrupt, since before the days of NAPI, and before the days of
>hardware mitigation.

	There was a discussion about this behavior (round-robin mode out
of order delivery) on bonding-devel in February 2003.  The archives can
be found here:

http://sourceforge.net/mailarchive/forum.php?forum_id=2094&max_rows=25&style=ultimate&viewmonth=200302

	The messages on Feb 19 relate to the effects of packet
coalescing, and Feb 17 to general out of order delivery problems.
Somewhere in there are the results of some testing I did, and analysis
of how tcp_reordering affects things.  As I recall, I even used e100s
for my testing, so it may be a fair apples to apples comparison.

	When I tested this (on 4 100Mbps ethernets), even after
adjusting tcp_reordering I could only get TCP single stream throughput
of about 235 Mb/sec out of a theoretical 375 or so (400 minus about 6%
for headers and whatnot).  UDP would run in the mid to upper 300's,
depending upon datagram size.  The tests did not examine UDP delivery
order.

	The round-robin mode will, for all practical purposes, always
deliver some large percentage of packets out of order.  You can fiddle
with the tcp_reordering parameter to mitigate the effects to some
degree, but there's no way it's going away entirely.

	I'm curious as to what types of systems the beowulf / HPC people
(mentioned by Tim in an earlier message) are using that they don't see
out of order problems with round robin, even without NAPI.

>Therefore, this is mainly an argument against using overly simplistic
>load balancing schemes that _create_ this problem :)  It's much
>smarter to load balance based on flows, for example.  I think the ALB
>mode does this?

	The round robin mode is unique in that it is the only mode that
will attempt (however stupidly) to stripe single connections (flows)
across multiple interfaces.  The other (smarter) modes, 802.3ad, alb,
and tlb, will try to keep particular connections generally on a
particular interface (for 802.3ad, it's required by the standard to
behave that way).  This means that a given single TCP/IP connection
won't get more than one interface worth of throughput.  With
round-robin, you can get more than one interface worth, but not very
efficiently.

>> I have NO problems with NAPI itself; I think it's a wonderful
>> development.  I would even advocate for making NAPI the default across
>> the board.  But for bonding, until I see otherwise, I want to be able
>> to not use NAPI.  As I indicated, I will have a new cluster that I can
>> directly test this NAPI vs. bonding issue on very soon.

	After taking into account the effects of delivering multiple
packets per interrupt and the scheduling order of network device
interrupts (potentially on different CPUs), I'm not really sure there's
much room for NAPI to make round-robin any worse than it already is.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
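
Since the thread keeps coming back to adjusting
/proc/sys/net/ipv4/tcp_reordering, here is a minimal user-space sketch
of reading and raising that sysctl.  The path is the standard procfs
sysctl file; the value 127 is an arbitrary example, not a
recommendation, and whether raising it actually helps a round-robin
bond is exactly what Jay's numbers above call into question.

/* Read and (as root) raise net.ipv4.tcp_reordering via procfs. */
#include <stdio.h>

int main(void)
{
	const char *path = "/proc/sys/net/ipv4/tcp_reordering";
	int cur = 0;
	FILE *f = fopen(path, "r");

	if (!f) {
		perror(path);
		return 1;
	}
	if (fscanf(f, "%d", &cur) == 1)
		printf("tcp_reordering is %d\n", cur);
	fclose(f);

	f = fopen(path, "w");		/* needs root */
	if (!f) {
		perror(path);
		return 1;
	}
	fprintf(f, "%d\n", 127);	/* example value only */
	fclose(f);
	return 0;
}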

* Re: [Bonding-devel] Re: [PATCH 2.6] e100: use NAPI mode all the time
From: jamal @ 2004-06-07 11:17 UTC (permalink / raw)
To: Jay Vosburgh
Cc: Jeff Garzik, Tim Mattox, sfeldma, netdev, bonding-devel, Scott Feldman

Hi,

I don't have time to go through all of that thread, but let's understand
your problem and setup.  Starting with the setup:

You have 4 ethx ports on PC1 x-connected to 4 on PC2.  You have bonding
on PC1 but not on PC2.  You have NAPI on both PC1 and PC2.  Is either of
them a multiprocessor?

Let's get the setup straight, then we can continue the discussion.

cheers,
jamal

* Re: [PATCH 2.6] e100: use NAPI mode all the time
From: Christopher Chan @ 2004-06-08 9:53 UTC (permalink / raw)
To: Scott Feldman; +Cc: jgarzik, netdev

Scott Feldman wrote:
> I see no reason to keep the non-NAPI option for e100.  This patch removes
> the CONFIG_E100_NAPI option and puts the driver in NAPI mode all the time.
> Matches the way tg3 works.
>
> Unless someone has a really good reason to keep the non-NAPI mode, this
> should go in for 2.6.7.

I for one need to test 2.6.6 e100 with NAPI on.  Under 2.6.3/4 I had
problems with NAPI mode turned on.  Turning NAPI off and then also doing

net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.route.gc_thresh = 65536
net.ipv4.route.max_size = 1048576

was the only way to keep the machines I run available via the network.

I would get dst cache overflows, and sometimes the kernel would log
garbled messages; when that happens the box requires a reboot.

* Re: [PATCH 2.6] e100: use NAPI mode all the time
From: Christopher Chan @ 2004-06-15 18:04 UTC (permalink / raw)
To: Christopher Chan; +Cc: Scott Feldman, jgarzik, netdev

Christopher Chan wrote:
> Scott Feldman wrote:
>
>> I see no reason to keep the non-NAPI option for e100.  This patch removes
>> the CONFIG_E100_NAPI option and puts the driver in NAPI mode all the
>> time.
>> Matches the way tg3 works.
>>
>> Unless someone has a really good reason to keep the non-NAPI mode, this
>> should go in for 2.6.7.
>
> I for one need to test 2.6.6 e100 with NAPI on.  Under 2.6.3/4 I had
> problems with NAPI mode turned on.  Turning NAPI off and then also doing
>
> net.ipv4.tcp_max_syn_backlog = 2048
> net.ipv4.route.gc_thresh = 65536
> net.ipv4.route.max_size = 1048576
>
> was the only way to keep the machines I run available via the network.
>
> I would get dst cache overflows, and sometimes the kernel would log
> garbled messages; when that happens the box requires a reboot.

KERNEL: assertion (tp->copied_seq == tp->rcv_nxt || (flags & (MSG_PEEK | MSG_TRUNC))) failed at net/ipv4/tcp.c (1632)
KERNEL: assertion (flags & MSG_PEEK) failed at net/ipv4/tcp.c (1568)
KERNEL: assertion (tp->copied_seq == tp->rcv_nxt || (flags & (MSG_PEEK | MSG_TRUNC))) failed at net/ipv4/tcp.c (1632)
KERNEL: assertion (flags & MSG_PEEK) failed at net/ipv4/tcp.c (1568)
printk: 4253 messages suppressed.
dst cache overflow
KERNEL: assertion (tp->copied_seq == tp->rcv_nxt || (flags & (MSG_PEEK | MSG_TRUNC))) failed at net/ipv4/tcp.c (1632)
KERNEL: assertion (flags & MSG_PEEK) failed at net/ipv4/tcp.c (1568)
KERNEL: assertion (tp->copied_seq == tp->rcv_nxt || (flags & (MSG_PEEK | MSG_TRUNC))) failed at net/ipv4/tcp.c (1632)
KERNEL: assertion (flags & MSG_PEEK) failed at net/ipv4/tcp.c (1568)
KERNEL: assertion (tp->copied_seq == tp->rcv_nxt || (flags & (MSG_PEEK | MSG_TRUNC))) failed at net/ipv4/tcp.c (1632)
KERNEL: assertion (flags & MSG_PEEK) failed at net/ipv4/tcp.c (1568)
KERNEL: assertion (tp->copied_seq == tp->rcv_nxt || (flags & (MSG_PEEK | MSG_TRUNC))) failed at net/ipv4/tcp.c (1632)

I get loads of this now on the only box where I have NAPI enabled in
the e100 driver.  This is on a 2.6.6 kernel.

* Re: [PATCH 2.6] e100: use NAPI mode all the time
From: Jeff Garzik @ 2004-06-11 0:16 UTC (permalink / raw)
To: Scott Feldman; +Cc: netdev

applied to netdev-2.6 queue (and thus Andrew's -mm tree automatically).
We'll let it stew in there for a while and get testing feedback.

Your 3 recently-sent bugfixes will go straight upstream, of course.

	Jeff

Thread overview: 10+ messages
2004-06-05  0:35 [PATCH 2.6] e100: use NAPI mode all the time Scott Feldman
2004-06-06 22:57 ` Tim Mattox
2004-06-07  0:03   ` Scott Feldman
2004-06-07  1:51     ` Tim Mattox
2004-06-07  2:33       ` Jeff Garzik
2004-06-07  6:39         ` [Bonding-devel] " Jay Vosburgh
2004-06-07 11:17           ` jamal
2004-06-08  9:53 ` Christopher Chan
2004-06-15 18:04   ` Christopher Chan
2004-06-11  0:16 ` Jeff Garzik