* [e1000 2.6 10/11] TxDescriptors -> 1024 default
@ 2003-09-09 3:14 Feldman, Scott
2003-09-11 19:18 ` Jeff Garzik
0 siblings, 1 reply; 35+ messages in thread
From: Feldman, Scott @ 2003-09-09 3:14 UTC (permalink / raw)
To: Jeff Garzik; +Cc: netdev, ricardoz
* Change the default number of Tx descriptors from 256 to 1024.
Data from [ricardoz@us.ibm.com] shows it's easy to overrun
the Tx desc queue.
-------------
diff -Nuarp linux-2.6.0-test4/drivers/net/e1000/e1000_param.c linux-2.6.0-test4/drivers/net/e1000.new/e1000_param.c
--- linux-2.6.0-test4/drivers/net/e1000/e1000_param.c 2003-08-22 16:57:59.000000000 -0700
+++ linux-2.6.0-test4/drivers/net/e1000.new/e1000_param.c 2003-09-08 09:13:12.000000000 -0700
@@ -63,9 +63,10 @@ MODULE_PARM_DESC(X, S);
/* Transmit Descriptor Count
*
* Valid Range: 80-256 for 82542 and 82543 gigabit ethernet controllers
- * Valid Range: 80-4096 for 82544
+ * Valid Range: 80-4096 for 82544 and newer
*
- * Default Value: 256
+ * Default Value: 256 for 82542 and 82543 gigabit ethernet controllers
+ * Default Value: 1024 for 82544 and newer
*/
E1000_PARAM(TxDescriptors, "Number of transmit descriptors");
@@ -73,7 +74,7 @@ E1000_PARAM(TxDescriptors, "Number of tr
/* Receive Descriptor Count
*
* Valid Range: 80-256 for 82542 and 82543 gigabit ethernet controllers
- * Valid Range: 80-4096 for 82544
+ * Valid Range: 80-4096 for 82544 and newer
*
* Default Value: 256
*/
@@ -200,6 +201,7 @@ E1000_PARAM(InterruptThrottleRate, "Inte
#define MAX_TXD 256
#define MIN_TXD 80
#define MAX_82544_TXD 4096
+#define DEFAULT_82544_TXD 1024
#define DEFAULT_RXD 256
#define MAX_RXD 256
@@ -320,12 +322,15 @@ e1000_check_options(struct e1000_adapter
struct e1000_option opt = {
.type = range_option,
.name = "Transmit Descriptors",
- .err = "using default of " __MODULE_STRING(DEFAULT_TXD),
- .def = DEFAULT_TXD,
.arg = { .r = { .min = MIN_TXD }}
};
struct e1000_desc_ring *tx_ring = &adapter->tx_ring;
e1000_mac_type mac_type = adapter->hw.mac_type;
+ opt.err = mac_type < e1000_82544 ?
+ "using default of " __MODULE_STRING(DEFAULT_TXD) :
+ "using default of " __MODULE_STRING(DEFAULT_82544_TXD);
+ opt.def = mac_type < e1000_82544 ?
+ DEFAULT_TXD : DEFAULT_82544_TXD;
opt.arg.r.max = mac_type < e1000_82544 ?
MAX_TXD : MAX_82544_TXD;

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-09  3:14 [e1000 2.6 10/11] TxDescriptors -> 1024 default Feldman, Scott
@ 2003-09-11 19:18 ` Jeff Garzik
  2003-09-11 19:45   ` Ben Greear
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff Garzik @ 2003-09-11 19:18 UTC (permalink / raw)
To: Feldman, Scott; +Cc: netdev, ricardoz

Feldman, Scott wrote:
> * Change the default number of Tx descriptors from 256 to 1024.
>   Data from [ricardoz@us.ibm.com] shows it's easy to overrun
>   the Tx desc queue.

All e1000 patches applied except this one.

Of _course_ it's easy to overrun the Tx desc queue.  That's why we have
a TX queue sitting on top of the NIC's hardware queue.  And TCP socket
buffers on top of that.  And similar things.

Descriptor increases like this are usually the result of some sillyhead
blasting out UDP packets, and then wondering why he sees packet loss on
the local computer (the "blast out packets" side).

You're just wasting memory.

	Jeff

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-11 19:18 ` Jeff Garzik
@ 2003-09-11 19:45   ` Ben Greear
  2003-09-11 19:59     ` Jeff Garzik
  2003-09-11 20:12     ` David S. Miller
  0 siblings, 2 replies; 35+ messages in thread
From: Ben Greear @ 2003-09-11 19:45 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Feldman, Scott, netdev, ricardoz

Jeff Garzik wrote:
> Feldman, Scott wrote:
>
>> * Change the default number of Tx descriptors from 256 to 1024.
>>   Data from [ricardoz@us.ibm.com] shows it's easy to overrun
>>   the Tx desc queue.
>
> All e1000 patches applied except this one.
>
> Of _course_ it's easy to overrun the Tx desc queue.  That's why we have
> a TX queue sitting on top of the NIC's hardware queue.  And TCP socket
> buffers on top of that.  And similar things.
>
> Descriptor increases like this are usually the result of some sillyhead
> blasting out UDP packets, and then wondering why he sees packet loss on
> the local computer (the "blast out packets" side).

Erm, shouldn't the local machine back itself off if the various
queues are full?  Some time back I looked through the code and it
appeared to.  If not, I think it should.

> You're just wasting memory.
>
> Jeff

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-11 19:45 ` Ben Greear
@ 2003-09-11 19:59   ` Jeff Garzik
  2003-09-11 20:12   ` David S. Miller
  1 sibling, 0 replies; 35+ messages in thread
From: Jeff Garzik @ 2003-09-11 19:59 UTC (permalink / raw)
To: Ben Greear; +Cc: Feldman, Scott, netdev, ricardoz

Ben Greear wrote:
> Jeff Garzik wrote:
>
>> Feldman, Scott wrote:
>>
>>> * Change the default number of Tx descriptors from 256 to 1024.
>>>   Data from [ricardoz@us.ibm.com] shows it's easy to overrun
>>>   the Tx desc queue.
>>
>> All e1000 patches applied except this one.
>>
>> Of _course_ it's easy to overrun the Tx desc queue.  That's why we
>> have a TX queue sitting on top of the NIC's hardware queue.  And TCP
>> socket buffers on top of that.  And similar things.
>>
>> Descriptor increases like this are usually the result of some
>> sillyhead blasting out UDP packets, and then wondering why he sees
>> packet loss on the local computer (the "blast out packets" side).
>
> Erm, shouldn't the local machine back itself off if the various
> queues are full?  Some time back I looked through the code and it
> appeared to.  If not, I think it should.

Given the guarantees of the protocol, the net stack has the freedom to
drop UDP packets, for example at times when (for TCP) one would
otherwise queue a packet for retransmit.

	Jeff

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-11 19:45 ` Ben Greear
  2003-09-11 19:59   ` Jeff Garzik
@ 2003-09-11 20:12   ` David S. Miller
  2003-09-11 20:40     ` Ben Greear
  1 sibling, 1 reply; 35+ messages in thread
From: David S. Miller @ 2003-09-11 20:12 UTC (permalink / raw)
To: Ben Greear; +Cc: jgarzik, scott.feldman, netdev, ricardoz

On Thu, 11 Sep 2003 12:45:55 -0700
Ben Greear <greearb@candelatech.com> wrote:

> Erm, shouldn't the local machine back itself off if the various
> queues are full?  Some time back I looked through the code and it
> appeared to.  If not, I think it should.

Generic networking device queues drop when they overflow.

Whatever dev->tx_queue_len is set to, the device driver needs
to be prepared to be able to queue successfully.

Most people run into problems when they run stupid UDP applications
that send a stream of tinygrams (<~64 bytes).  The solutions are to
either fix the UDP app or restrict its socket send buffer size.

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-11 20:12 ` David S. Miller
@ 2003-09-11 20:40   ` Ben Greear
  2003-09-11 21:07     ` David S. Miller
  0 siblings, 1 reply; 35+ messages in thread
From: Ben Greear @ 2003-09-11 20:40 UTC (permalink / raw)
To: David S. Miller; +Cc: jgarzik, scott.feldman, netdev, ricardoz

David S. Miller wrote:
> Generic networking device queues drop when they overflow.
>
> Whatever dev->tx_queue_len is set to, the device driver needs
> to be prepared to be able to queue successfully.
>
> Most people run into problems when they run stupid UDP applications
> that send a stream of tinygrams (<~64 bytes).  The solutions are to
> either fix the UDP app or restrict its socket send buffer size.

Is this close to how it works?

So, assume we configure a 10MB socket send queue on our UDP socket...

Select says it's writable up to at least 5MB.

We write 5MB of 64-byte packets "right now".

Did we just drop a large number of packets?

I would expect that the packets, up to 10MB, are buffered in some
list/fifo in the socket code, and that as the underlying device queue
empties itself, the socket will feed it more packets.  The device
queue, in turn, is emptied as the driver is able to fill its
TxDescriptors, and the hardware empties the TxDescriptors.

Obviously, I'm confused somewhere....

Ben

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-11 20:40 ` Ben Greear
@ 2003-09-11 21:07   ` David S. Miller
  2003-09-11 21:29     ` Ben Greear
  0 siblings, 1 reply; 35+ messages in thread
From: David S. Miller @ 2003-09-11 21:07 UTC (permalink / raw)
To: Ben Greear; +Cc: jgarzik, scott.feldman, netdev, ricardoz

On Thu, 11 Sep 2003 13:40:44 -0700
Ben Greear <greearb@candelatech.com> wrote:

> So, assume we configure a 10MB socket send queue on our UDP socket...
>
> Select says it's writable up to at least 5MB.
>
> We write 5MB of 64-byte packets "right now".
>
> Did we just drop a large number of packets?

Yes, we did _iff_ dev->tx_queue_len is less than or equal
to (5MB / (64 + sizeof(udp_ip_headers))).

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-11 21:07 ` David S. Miller
@ 2003-09-11 21:29   ` Ben Greear
  2003-09-11 21:29     ` David S. Miller
  0 siblings, 1 reply; 35+ messages in thread
From: Ben Greear @ 2003-09-11 21:29 UTC (permalink / raw)
To: David S. Miller; +Cc: jgarzik, scott.feldman, netdev, ricardoz

David S. Miller wrote:
> On Thu, 11 Sep 2003 13:40:44 -0700
> Ben Greear <greearb@candelatech.com> wrote:
>
>> So, assume we configure a 10MB socket send queue on our UDP socket...
>>
>> Select says it's writable up to at least 5MB.
>>
>> We write 5MB of 64-byte packets "right now".
>>
>> Did we just drop a large number of packets?
>
> Yes, we did _iff_ dev->tx_queue_len is less than or equal
> to (5MB / (64 + sizeof(udp_ip_headers))).

Thanks for that clarification.  Is there no way to tell
at 'sendto' time that the buffers are over-full, and either
block or return -EBUSY or something like that?

Perhaps the poll logic should also take the underlying buffer into
account and not show the socket as writable in this case?

Supposing in the above example, I set tx_queue_len to
(5MB / (64 + sizeof(udp_ip_headers))), will the packets now be dropped
in the driver instead, or will there be no more (local) drops?

Ben

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-11 21:29 ` Ben Greear
@ 2003-09-11 21:29   ` David S. Miller
  2003-09-11 21:47     ` Ricardo C Gonzalez
  2003-09-11 22:15     ` Ben Greear
  0 siblings, 2 replies; 35+ messages in thread
From: David S. Miller @ 2003-09-11 21:29 UTC (permalink / raw)
To: Ben Greear; +Cc: jgarzik, scott.feldman, netdev, ricardoz

On Thu, 11 Sep 2003 14:29:43 -0700
Ben Greear <greearb@candelatech.com> wrote:

> Thanks for that clarification.  Is there no way to tell
> at 'sendto' time that the buffers are over-full, and either
> block or return -EBUSY or something like that?

The TX queue state can change by hundreds of packets by the time we
are finished making the "decision", and also: how would you like to
"wake" up sockets when the TX queue is liberated?  That extra overhead
and logic would be wonderful for performance.

No, this is all nonsense.  Packet scheduling and queueing is an opaque
layer to all the upper layers.  It is the only sensible design.

IP transmit is a black hole that may drop packets at any moment; any
datagram application not prepared for this should be prepared for
troubles, or choose to move over to something like TCP.

I even listed a workaround for such stupid UDP apps: simply limit
their socket send queue limits.

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-11 21:29 ` David S. Miller
@ 2003-09-11 21:47   ` Ricardo C Gonzalez
  2003-09-11 22:00     ` Jeff Garzik
  2003-09-11 22:15   ` Ben Greear
  1 sibling, 1 reply; 35+ messages in thread
From: Ricardo C Gonzalez @ 2003-09-11 21:47 UTC (permalink / raw)
To: David S. Miller; +Cc: greearb, jgarzik, scott.feldman, netdev

> IP transmit is a black hole that may drop packets at any moment; any
> datagram application not prepared for this should be prepared for
> troubles, or choose to move over to something like TCP.

As I said before, please do not make this a UDP issue.  The data I sent
out was taken using a TCP_STREAM test case.  Please review it.

regards,
----------------------------------------------------------------------------------
*** ALWAYS THINK POSITIVE ***
Rick Gonzalez
IBM Linux Performance Group
Building: 905   Office: 7G019   Phone: (512) 838-0623

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-11 21:47 ` Ricardo C Gonzalez
@ 2003-09-11 22:00   ` Jeff Garzik
  0 siblings, 0 replies; 35+ messages in thread
From: Jeff Garzik @ 2003-09-11 22:00 UTC (permalink / raw)
To: Ricardo C Gonzalez; +Cc: David S. Miller, greearb, scott.feldman, netdev

Ricardo C Gonzalez wrote:
>> IP transmit is a black hole that may drop packets at any moment; any
>> datagram application not prepared for this should be prepared for
>> troubles, or choose to move over to something like TCP.
>
> As I said before, please do not make this a UDP issue.  The data I sent
> out was taken using a TCP_STREAM test case.  Please review it.

Your own words say "CPUs can fill TX queue".  We already know this.
CPUs have been doing wire speed for ages.

	Jeff

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-11 21:29 ` David S. Miller
  2003-09-11 21:47   ` Ricardo C Gonzalez
@ 2003-09-11 22:15   ` Ben Greear
  2003-09-11 23:02     ` David S. Miller
  1 sibling, 1 reply; 35+ messages in thread
From: Ben Greear @ 2003-09-11 22:15 UTC (permalink / raw)
To: David S. Miller; +Cc: jgarzik, scott.feldman, netdev, ricardoz

David S. Miller wrote:
> On Thu, 11 Sep 2003 14:29:43 -0700
> Ben Greear <greearb@candelatech.com> wrote:
>
>> Thanks for that clarification.  Is there no way to tell
>> at 'sendto' time that the buffers are over-full, and either
>> block or return -EBUSY or something like that?
>
> The TX queue state can change by hundreds of packets by the time we
> are finished making the "decision", and also: how would you like to
> "wake" up sockets when the TX queue is liberated?

So, at some point the decision is already made that we must drop the
packet, or that we can enqueue it.  This is where I would propose we
block the thing trying to enqueue, or at least propagate a failure code
back up the stack(s) so that the packet can be retried by the calling
layer.  Preferably, one would propagate the error all the way to
userspace and let them deal with it, just like we currently deal with
socket-queue-full issues.

> That extra overhead and logic would be wonderful for performance.

The cost of a retransmit is also expensive, whether it is some hacked-up
UDP protocol or for TCP.  Even if one had to implement callbacks from
the device queue to the interested sockets, this should not be a large
performance hit.

> No, this is all nonsense.  Packet scheduling and queueing is an opaque
> layer to all the upper layers.  It is the only sensible design.

This is possible, but it does not seem cut and dried to me.  If there
is any documentation or research that supports this assertion, please
do let us know.

> IP transmit is a black hole that may drop packets at any moment; any
> datagram application not prepared for this should be prepared for
> troubles, or choose to move over to something like TCP.
>
> I even listed a workaround for such stupid UDP apps: simply limit
> their socket send queue limits.

And the original poster shows how a similar problem slows down TCP
as well due to local dropped packets.  Don't you think we'd get better
TCP throughput if we instead had the calling code wait 1us for the
buffers to clear?

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-11 22:15 ` Ben Greear
@ 2003-09-11 23:02   ` David S. Miller
  2003-09-11 23:22     ` Ben Greear
  0 siblings, 1 reply; 35+ messages in thread
From: David S. Miller @ 2003-09-11 23:02 UTC (permalink / raw)
To: Ben Greear; +Cc: jgarzik, scott.feldman, netdev, ricardoz

On Thu, 11 Sep 2003 15:15:19 -0700
Ben Greear <greearb@candelatech.com> wrote:

> And the original poster shows how a similar problem slows down TCP
> as well due to local dropped packets.

So, again, dampen the per-socket send queue sizes.

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-11 23:02 ` David S. Miller
@ 2003-09-11 23:22   ` Ben Greear
  2003-09-11 23:29     ` David S. Miller
  2003-09-12  1:34     ` jamal
  0 siblings, 2 replies; 35+ messages in thread
From: Ben Greear @ 2003-09-11 23:22 UTC (permalink / raw)
Cc: jgarzik, scott.feldman, netdev, ricardoz

David S. Miller wrote:
> On Thu, 11 Sep 2003 15:15:19 -0700
> Ben Greear <greearb@candelatech.com> wrote:
>
>> And the original poster shows how a similar problem slows down TCP
>> as well due to local dropped packets.
>
> So, again, dampen the per-socket send queue sizes.

That's just a band-aid to cover up the flaw with the lack
of queue-pressure feedback to the higher stacks, as would be increasing
the TxDescriptors for that matter.

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-11 23:22 ` Ben Greear
@ 2003-09-11 23:29   ` David S. Miller
  2003-09-12  1:34   ` jamal
  1 sibling, 0 replies; 35+ messages in thread
From: David S. Miller @ 2003-09-11 23:29 UTC (permalink / raw)
To: Ben Greear; +Cc: jgarzik, scott.feldman, netdev, ricardoz

On Thu, 11 Sep 2003 16:22:35 -0700
Ben Greear <greearb@candelatech.com> wrote:

> David S. Miller wrote:
>> So, again, dampen the per-socket send queue sizes.
>
> That's just a band-aid to cover up the flaw with the lack
> of queue-pressure feedback to the higher stacks, as would be increasing
> the TxDescriptors for that matter.

The whole point of the various packet scheduler algorithms is foregone
if we're just going to queue up and send the crap again.

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-11 23:22 ` Ben Greear
  2003-09-11 23:29   ` David S. Miller
@ 2003-09-12  1:34   ` jamal
  2003-09-12  2:20     ` Ricardo C Gonzalez
  2003-09-13  3:49     ` David S. Miller
  1 sibling, 2 replies; 35+ messages in thread
From: jamal @ 2003-09-12  1:34 UTC (permalink / raw)
To: Ben Greear; +Cc: jgarzik, scott.feldman, netdev, ricardoz

Scott,

Don't increase the TX descriptor ring size - that would truly be
wasting memory; 256 is pretty adequate.

* Increase instead the txqueuelen (as suggested by Davem); user space
tools like ip or ifconfig could do it.  The standard size has been
around 100 for 100Mbps; I suppose it is fair to say that GigE can move
data out at 10x that, so set it to 1000.  Maybe you can do this from
the driver based on what negotiated speed is detected?

--------
[root@jzny root]# ip link ls eth0
4: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:b0:d0:05:ae:81 brd ff:ff:ff:ff:ff:ff
[root@jzny root]# ip link set eth0 txqueuelen 1000
[root@jzny root]# ip link ls eth0
4: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:b0:d0:05:ae:81 brd ff:ff:ff:ff:ff:ff
-------

TCP already reacts to packets dropped at the scheduler level.  UDP
would be too hard to enforce, since the logic is typically in an app
above UDP, so just control it via the socket queue size.

cheers,
jamal

On Thu, 2003-09-11 at 19:22, Ben Greear wrote:
> David S. Miller wrote:
>> On Thu, 11 Sep 2003 15:15:19 -0700
>> Ben Greear <greearb@candelatech.com> wrote:
>>
>>> And the original poster shows how a similar problem slows down TCP
>>> as well due to local dropped packets.
>>
>> So, again, dampen the per-socket send queue sizes.
>
> That's just a band-aid to cover up the flaw with the lack
> of queue-pressure feedback to the higher stacks, as would be increasing
> the TxDescriptors for that matter.

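(As an illustration of jamal's last suggestion - letting the driver pick
dev->tx_queue_len from the negotiated speed - a sketch of what that
could look like follows.  It is hypothetical: the helper and its
link_speed_mbps parameter are made up for this example and are not
taken from e1000 or any other in-tree driver.)

#include <linux/netdevice.h>

/* Illustrative sketch only: scale the soft TX queue to the negotiated
 * link speed.  100 has been the long-standing default at 100 Mb/s;
 * give gigabit roughly 10x that, as suggested above.  A driver would
 * call something like this from its link-up / watchdog path. */
static void example_scale_tx_queue_len(struct net_device *netdev,
                                       unsigned int link_speed_mbps)
{
	netdev->tx_queue_len = (link_speed_mbps >= 1000) ? 1000 : 100;
}
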
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-12  1:34 ` jamal
@ 2003-09-12  2:20   ` Ricardo C Gonzalez
  2003-09-12  3:05     ` jamal
  2003-09-13  3:49   ` David S. Miller
  1 sibling, 1 reply; 35+ messages in thread
From: Ricardo C Gonzalez @ 2003-09-12  2:20 UTC (permalink / raw)
To: hadi; +Cc: greearb, jgarzik, scott.feldman, netdev

Jamal wrote:
> * Increase instead the txqueuelen (as suggested by Davem); user space
> tools like ip or ifconfig could do it.  The standard size has been
> around 100 for 100Mbps; I suppose it is fair to say that GigE can move
> data out at 10x that, so set it to 1000.  Maybe you can do this from
> the driver based on what negotiated speed is detected?

This is also another way to do it, as long as we make it harder for
users to drop packets and get up to date with Gigabit speeds.  We would
also have to think about the upcoming 10GigE adapters and their queue
sizes, but that is a separate issue.  Anyway, the driver can easily set
the txqueuelen to 1000.

We should care about counting the packets being dropped on the
transmit side.  Would it be the responsibility of the driver to account
for these drops?  Because each driver has a dedicated software queue,
in my opinion the driver should account for these packets.

regards,
----------------------------------------------------------------------------------
*** ALWAYS THINK POSITIVE ***

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-12  2:20 ` Ricardo C Gonzalez
@ 2003-09-12  3:05   ` jamal
  0 siblings, 0 replies; 35+ messages in thread
From: jamal @ 2003-09-12  3:05 UTC (permalink / raw)
To: Ricardo C Gonzalez; +Cc: greearb, jgarzik, scott.feldman, netdev

On Thu, 2003-09-11 at 22:20, Ricardo C Gonzalez wrote:
> Jamal wrote:
> We should care about counting the packets being dropped on the
> transmit side.  Would it be the responsibility of the driver to account
> for these drops?  Because each driver has a dedicated software queue,
> in my opinion the driver should account for these packets.

This is really the scheduler's responsibility.  It's hard for the
driver to keep track of why a packet was dropped.  Example: it could be
dropped to make room for a higher priority packet that's anticipated to
show up soon.

The simple default 3-band scheduler unfortunately doesn't quite show
its stats ... so a simple way to see drops is:

- install the prio qdisc
------
[root@jzny root]# tc qdisc add dev eth0 root prio
[root@jzny root]# tc -s qdisc
qdisc prio 8001: dev eth0 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 42 bytes 1 pkts (dropped 0, overlimits 0)
-----

or you may wanna install a single pfifo queue with a size of 1000
(although this is a little too medieval), for example:

#tc qdisc add dev eth0 root pfifo limit 1000
#tc -s qdisc
qdisc pfifo 8002: dev eth0 limit 1000p
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)

etc.

cheers,
jamal

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-12  1:34 ` jamal
  2003-09-12  2:20   ` Ricardo C Gonzalez
@ 2003-09-13  3:49   ` David S. Miller
  2003-09-13 11:52     ` Robert Olsson
  2003-09-14 19:08     ` Ricardo C Gonzalez
  1 sibling, 2 replies; 35+ messages in thread
From: David S. Miller @ 2003-09-13  3:49 UTC (permalink / raw)
To: hadi; +Cc: greearb, jgarzik, scott.feldman, netdev, ricardoz

On 11 Sep 2003 21:34:23 -0400
jamal <hadi@cyberus.ca> wrote:

> Don't increase the TX descriptor ring size - that would truly be
> wasting memory; 256 is pretty adequate.
> * Increase instead the txqueuelen (as suggested by Davem); user space
> tools like ip or ifconfig could do it.  The standard size has been
> around 100 for 100Mbps; I suppose it is fair to say that GigE can move
> data out at 10x that, so set it to 1000.  Maybe you can do this from
> the driver based on what negotiated speed is detected?

I spoke with Alexey once about this; actually tx_queue_len can
be arbitrarily large, but it should be reasonable nonetheless.

Our preliminary conclusions were that values of 1000 for 100Mbit and
faster were probably appropriate.  Maybe something larger for 1Gbit,
who knows.

We also determined that the only connection between TX descriptor
ring size and dev->tx_queue_len was that the latter should be large
enough to handle, at a minimum, the number of TX descriptor ACKs that
can be pending considering mitigation et al.

So if TX irq mitigation can defer up to N TX descriptor completions,
then dev->tx_queue_len must be at least that large.

Back to the main topic, maybe we should set dev->tx_queue_len to
1000 by default for all ethernet devices.

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-13  3:49 ` David S. Miller
@ 2003-09-13 11:52   ` Robert Olsson
  2003-09-15 12:12     ` jamal
  2003-09-14 19:08   ` Ricardo C Gonzalez
  1 sibling, 1 reply; 35+ messages in thread
From: Robert Olsson @ 2003-09-13 11:52 UTC (permalink / raw)
To: David S. Miller; +Cc: hadi, greearb, jgarzik, scott.feldman, netdev, ricardoz

David S. Miller writes:
> I spoke with Alexey once about this; actually tx_queue_len can
> be arbitrarily large, but it should be reasonable nonetheless.
>
> Our preliminary conclusions were that values of 1000 for 100Mbit and
> faster were probably appropriate.  Maybe something larger for 1Gbit,
> who knows.
>
> We also determined that the only connection between TX descriptor
> ring size and dev->tx_queue_len was that the latter should be large
> enough to handle, at a minimum, the number of TX descriptor ACKs that
> can be pending considering mitigation et al.
>
> So if TX irq mitigation can defer up to N TX descriptor completions,
> then dev->tx_queue_len must be at least that large.
>
> Back to the main topic, maybe we should set dev->tx_queue_len to
> 1000 by default for all ethernet devices.

Hello!

Yes, that sounds like an adequate setting for GigE.  This is what we
use for production and lab, but rather than increasing
dev->tx_queue_len to 1000 we replace the pfifo_fast with the pfifo
qdisc, setting a qlen of 1000.

And with this we have a tx_descriptor_ring_size of 256, which is tuned
to the NIC's "TX service interval" with respect to interrupt mitigation
etc.  This seems good enough even for small packets.

For routers this setting is even more crucial, as we need to serialize
several flows and we know the flows are bursty.

Cheers.
						--ro

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-13 11:52 ` Robert Olsson
@ 2003-09-15 12:12   ` jamal
  2003-09-15 13:45     ` Robert Olsson
  0 siblings, 1 reply; 35+ messages in thread
From: jamal @ 2003-09-15 12:12 UTC (permalink / raw)
To: Robert Olsson
Cc: David S. Miller, greearb, jgarzik, scott.feldman, netdev, ricardoz

On Sat, 2003-09-13 at 07:52, Robert Olsson wrote:
>> I spoke with Alexey once about this; actually tx_queue_len can
>> be arbitrarily large, but it should be reasonable nonetheless.
>>
>> Our preliminary conclusions were that values of 1000 for 100Mbit and
>> faster were probably appropriate.  Maybe something larger for 1Gbit,
>> who knows.

If you recall, we saw that even for the gent who was trying to do 100K
TCP sockets on a 4-way SMP, 1000 was sufficient and no packets were
dropped.

>> We also determined that the only connection between TX descriptor
>> ring size and dev->tx_queue_len was that the latter should be large
>> enough to handle, at a minimum, the number of TX descriptor ACKs that
>> can be pending considering mitigation et al.
>>
>> So if TX irq mitigation can defer up to N TX descriptor completions,
>> then dev->tx_queue_len must be at least that large.
>>
>> Back to the main topic, maybe we should set dev->tx_queue_len to
>> 1000 by default for all ethernet devices.
>
> Hello!
>
> Yes, that sounds like an adequate setting for GigE.  This is what we
> use for production and lab, but rather than increasing
> dev->tx_queue_len to 1000 we replace the pfifo_fast with the pfifo
> qdisc, setting a qlen of 1000.

I think this may not be good for QoS reasons.  You want BGP packets
to be given priority over ftp.  A single queue kills that.

The current default 3-band queue is good enough, the only challenge
being that no one sees stats for it.  I have a patch for the kernel at:
http://www.cyberus.ca/~hadi/patches/restore.pfifo.kernel
and for tc at:
http://www.cyberus.ca/~hadi/patches/restore.pfifo.tc

cheers,
jamal

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-15 12:12 ` jamal
@ 2003-09-15 13:45   ` Robert Olsson
  2003-09-15 23:15     ` David S. Miller
  0 siblings, 1 reply; 35+ messages in thread
From: Robert Olsson @ 2003-09-15 13:45 UTC (permalink / raw)
To: hadi
Cc: Robert Olsson, David S. Miller, greearb, jgarzik, scott.feldman,
    netdev, ricardoz

jamal writes:
> I think this may not be good for QoS reasons.  You want BGP packets
> to be given priority over ftp.  A single queue kills that.

Well, so far a single queue has been robust enough for BGP sessions.
Talking from my own experience...

> The current default 3-band queue is good enough, the only challenge
> being that no one sees stats for it.  I have a patch for the kernel at:
> http://www.cyberus.ca/~hadi/patches/restore.pfifo.kernel
> and for tc at:
> http://www.cyberus.ca/~hadi/patches/restore.pfifo.tc

Yes.
I've missed this.  Our lazy work-around for the missing stats is to
install the pfifo qdisc as said.  IMO it should be included.

Cheers.
						--ro

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-15 13:45 ` Robert Olsson
@ 2003-09-15 23:15   ` David S. Miller
  2003-09-16  9:28     ` Robert Olsson
  0 siblings, 1 reply; 35+ messages in thread
From: David S. Miller @ 2003-09-15 23:15 UTC (permalink / raw)
To: Robert Olsson
Cc: hadi, Robert.Olsson, greearb, jgarzik, scott.feldman, netdev, ricardoz

On Mon, 15 Sep 2003 15:45:42 +0200
Robert Olsson <Robert.Olsson@data.slu.se> wrote:

>> The current default 3-band queue is good enough, the only challenge
>> being that no one sees stats for it.  I have a patch for the kernel at:
>> http://www.cyberus.ca/~hadi/patches/restore.pfifo.kernel
>> and for tc at:
>> http://www.cyberus.ca/~hadi/patches/restore.pfifo.tc
>
> Yes.
> I've missed this.  Our lazy work-around for the missing stats is to
> install the pfifo qdisc as said.  IMO it should be included.

I've included Jamal's pfifo_fast statistic patch, and the
change to increase ethernet's tx_queue_len to 1000 in all
of my trees.  Thanks.

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-15 23:15 ` David S. Miller
@ 2003-09-16  9:28   ` Robert Olsson
  0 siblings, 0 replies; 35+ messages in thread
From: Robert Olsson @ 2003-09-16  9:28 UTC (permalink / raw)
To: David S. Miller
Cc: kuznet, Robert Olsson, hadi, greearb, jgarzik, scott.feldman,
    netdev, ricardoz

David S. Miller writes:
>>> http://www.cyberus.ca/~hadi/patches/restore.pfifo.kernel
>>> and for tc at:
>>> http://www.cyberus.ca/~hadi/patches/restore.pfifo.tc
>
> I've included Jamal's pfifo_fast statistic patch, and the
> change to increase ethernet's tx_queue_len to 1000 in all
> of my trees.  Thanks.

We ask Alexey to include the tc part too.

Cheers.
						--ro

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-13  3:49 ` David S. Miller
  2003-09-13 11:52   ` Robert Olsson
@ 2003-09-14 19:08   ` Ricardo C Gonzalez
  2003-09-15  2:50     ` David Brownell
  2004-05-15 12:14     ` TxDescriptors -> 1024 default. Please not for every NIC! Marc Herbert
  1 sibling, 2 replies; 35+ messages in thread
From: Ricardo C Gonzalez @ 2003-09-14 19:08 UTC (permalink / raw)
To: David S. Miller; +Cc: hadi, greearb, jgarzik, scott.feldman, netdev

David Miller wrote:
> Back to the main topic, maybe we should set dev->tx_queue_len to
> 1000 by default for all ethernet devices.

I definitely agree with setting the dev->tx_queue_len to 1000 as a
default for all ethernet adapters.  All adapters will benefit from this
change.

regards,
----------------------------------------------------------------------------------
*** ALWAYS THINK POSITIVE ***

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-14 19:08 ` Ricardo C Gonzalez
@ 2003-09-15  2:50   ` David Brownell
  2003-09-15  8:17     ` David S. Miller
  2004-05-15 12:14   ` TxDescriptors -> 1024 default. Please not for every NIC! Marc Herbert
  1 sibling, 1 reply; 35+ messages in thread
From: David Brownell @ 2003-09-15  2:50 UTC (permalink / raw)
To: Ricardo C Gonzalez, David S. Miller
Cc: hadi, greearb, jgarzik, scott.feldman, netdev

Ricardo C Gonzalez wrote:
>
> David Miller wrote:
>
>> Back to the main topic, maybe we should set dev->tx_queue_len to
>> 1000 by default for all ethernet devices.
>
> I definitely agree with setting the dev->tx_queue_len to 1000 as a
> default for all ethernet adapters.  All adapters will benefit from this
> change.

Except ones where CONFIG_EMBEDDED, maybe?  Not everyone wants
to spend that much memory, even when it's available...

* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
  2003-09-15  2:50 ` David Brownell
@ 2003-09-15  8:17   ` David S. Miller
  0 siblings, 0 replies; 35+ messages in thread
From: David S. Miller @ 2003-09-15  8:17 UTC (permalink / raw)
To: David Brownell; +Cc: ricardoz, hadi, greearb, jgarzik, scott.feldman, netdev

On Sun, 14 Sep 2003 19:50:56 -0700
David Brownell <david-b@pacbell.net> wrote:

> Except ones where CONFIG_EMBEDDED, maybe?  Not everyone wants
> to spend that much memory, even when it's available...

Dropping the packet between the network stack and the driver
does waste memory for _LONGER_ periods of time.

When we drop, TCP still hangs onto the buffer, and we'll send it
again and again until it makes it and we get an ACK back or the
connection completely times out.

* TxDescriptors -> 1024 default. Please not for every NIC!
  2003-09-14 19:08 ` Ricardo C Gonzalez
  2003-09-15  2:50   ` David Brownell
@ 2004-05-15 12:14   ` TxDescriptors -> 1024 default. Please not for every NIC! Marc Herbert
  2004-05-19  9:30     ` Marc Herbert
  1 sibling, 1 reply; 35+ messages in thread
From: Marc Herbert @ 2004-05-15 12:14 UTC (permalink / raw)
To: netdev

On Sun, 14 Sep 2003, Ricardo C Gonzalez wrote:
> David Miller wrote:
>
>> Back to the main topic, maybe we should set dev->tx_queue_len to
>> 1000 by default for all ethernet devices.
>
> I definitely agree with setting the dev->tx_queue_len to 1000 as a
> default for all ethernet adapters.  All adapters will benefit from this
> change.
> <http://oss.sgi.com/projects/netdev/archive/2003-09/threads.html#00247>

Sorry to exhume this discussion, but I only recently discovered this
change, the hard way.

I carefully read this old thread and did not grasp _every_ detail, but
there is one thing that I am sure of: 1000 packets @ 1 Gb/s looks good,
but on the other hand, 1000 full-size Ethernet packets @ 10 Mb/s are
about 1.2 seconds long!

Too little buffering means not enough damping effect, which is very
important for performance in asynchronous systems, granted.  However,
_too much_ buffering means too big and too variable latencies.

When discussing buffers, duration is very often more important than
size.  Applications, TCP's dynamics (and kernel dynamics too?) do not
care much about buffer sizes; they more often care about latencies
(and throughput, of course).  Buffer sizes are often "just a small
matter of implementation" :-)  For instance, people designing routers
talk about buffers in _milliseconds_ much more often than in _bytes_
(despite the fact that their memories cost more than in hosts,
considering the throughputs involved).

100 packets @ 100 Mb/s was 12 ms.  1000 packets @ 1 Gb/s is still
12 ms.  12 ms is great.  It's a "good" latency because it is the order
of magnitude of real-world constants like: comfortable interactive
applications, operating system scheduler granularity, or propagation
time in 2000 km of cable.

But 1000 packets @ 100 Mb/s is 120 ms and is neither very good nor very
useful anymore.  1000 packets @ 10 Mb/s is 1.2 s, which is ridiculous.
It does mean that, when joe user is uploading some big file through his
cheap Ethernet card, and there is no other bottleneck/drop further in
the network, every concurrent application will have to wait 1.2 s
before accessing the network!  If this is hard to believe, just make
the test yourself, it's very easy: force one of your NICs to 10Mb/s
full duplex, txqueuelen 1000, and send a continuous flow to a nearby
machine.  Then try to ping anything.

Imagine now that some packet is lost for whatever reason on some
_other_ TCP connection going through this terrible 1.2 s queue.  Then
you need one SACK/RTX extra round trip time to recover from it: so it's
now _2.4 s_ to deliver the data sent just after the dropped packet...
Assuming of course TCP timers do not become confused by this huge
latency and probably huge jitter.

And I don't think you want to make fiddling with "tc" mandatory for joe
user.  Or tell him: "oh, please just 'ifconfig txqueuelen 10', or buy a
new Ethernet card".

I am unfortunately not familiar with this part of the linux kernel, but
I really think that, if possible, txqueuelen should be initialized at
some "constant 12 ms" and not at the "1000 packets" highly variable
latency setting.
I can imagine there are some corner cases, like for instance when some
GEth NIC is hot-plugged into a 100 Mb/s link, or jumbo frames, but hey,
those are corner cases: as a first step, even a simple
constant-per-model txqueuelen initialization would already be great.

Cheers,

	Marc.

PS: one workaround for joe user against this 1.2s latency would be to
keep his SND_BUF and number of sockets small.  But this is poor.

--
"I have made this letter longer than usual, only because I have not had
the time to make it shorter."  -- Blaise Pascal

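(For illustration only, not part of the original mail: the "constant
12 ms" idea boils down to deriving the queue length from link speed and
MTU instead of hard-coding a packet count.  The helper below is
hypothetical - it exists in no kernel tree - but its numbers reproduce
the figures quoted above: about 10 packets at 10 Mb/s, 100 at 100 Mb/s
and 1000 at 1 Gb/s with a 1500-byte MTU.)

/* Hypothetical helper: pick a tx_queue_len that holds roughly 12 ms
 * worth of full-size frames at the given link rate. */
static unsigned long example_txqueuelen_for_12ms(unsigned int link_speed_mbps,
                                                 unsigned int mtu_bytes)
{
	const unsigned int budget_ms = 12;
	/* 1 Mb/s drains 125 bytes per millisecond */
	unsigned long bytes = (unsigned long)link_speed_mbps * 125 * budget_ms;

	return bytes / mtu_bytes;  /* 10 Mb/s, 1500 B -> 10; 1 Gb/s -> 1000 */
}
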
* Re: TxDescriptors -> 1024 default. Please not for every NIC!
  2004-05-15 12:14 ` TxDescriptors -> 1024 default. Please not for every NIC! Marc Herbert
@ 2004-05-19  9:30   ` Marc Herbert
  2004-05-19 10:27     ` Pekka Pietikainen
  2004-05-19 11:54     ` Andi Kleen
  0 siblings, 2 replies; 35+ messages in thread
From: Marc Herbert @ 2004-05-19  9:30 UTC (permalink / raw)
To: netdev

On Sat, 15 May 2004, Marc Herbert wrote:

> <http://oss.sgi.com/projects/netdev/archive/2003-09/threads.html#00247>
>
> Sorry to exhume this discussion, but I only recently discovered this
> change, the hard way.
>
> I am unfortunately not familiar with this part of the linux kernel, but
> I really think that, if possible, txqueuelen should be initialized at
> some "constant 12 ms" and not at the "1000 packets" highly variable
> latency setting.  I can imagine there are some corner cases, like for
> instance when some GEth NIC is hot-plugged into a 100 Mb/s link, or
> jumbo frames, but hey, those are corner cases: as a first step, even a
> simple constant-per-model txqueuelen initialization would already be
> great.

After some further study, I was glad to discover my suggestion above to
be both easy and short to implement.  See the patch below.

Trying to sum it up:

- Ricardo asks (among other things) for a new 1000-packet default
  txqueuelen for Intel's e1000, based on some data (I could not find
  this data; please send me the pointer if you have it, thanks).

- Me argues that we all lived happy for ages with this default
  setting of 100 packets @ 100 Mb/s (and lived approximately happy @
  10 Mb/s), but we'll soon see doom and gloom with this new and
  brutal change to 1000 packets for all this _legacy_ 10-100 Mb/s
  hardware.  e1000 data only is not enough to justify this radical
  shift.

If you are convinced by _both_ items above, then the patch below
contains _both_, and we're done.

If you are not, then... wait for further discussion, including answers
to Ricardo's latest post.

PS: several people seem to think TCP "drops" packets when the qdisc is
full.  My analysis of the code _and_ my experiments make me think they
are wrong: TCP rather "blocks" when the qdisc is full.  See the
explanation here:
<http://oss.sgi.com/archives/netdev/2004-05/msg00151.html>
(Subject: Re: TcpOutSegs way too optimistic (netstat -s))

===== drivers/net/net_init.c 1.11 vs edited =====
--- 1.11/drivers/net/net_init.c	Tue Sep 16 01:12:25 2003
+++ edited/drivers/net/net_init.c	Wed May 19 11:05:34 2004
@@ -420,7 +420,10 @@
dev->hard_header_len = ETH_HLEN;
dev->mtu = 1500; /* eth_mtu */
dev->addr_len = ETH_ALEN;
- dev->tx_queue_len = 1000; /* Ethernet wants good queues */
+ dev->tx_queue_len = 100; /* This is a sensible generic default for
+    100 Mb/s: about 12ms with 1500 full size packets.
+    Drivers should tune this depending on interface
+    specificities and settings */
memset(dev->broadcast,0xFF, ETH_ALEN);

===== drivers/net/e1000/e1000_main.c 1.56 vs edited =====
--- 1.56/drivers/net/e1000/e1000_main.c	Tue Feb  3 01:43:42 2004
+++ edited/drivers/net/e1000/e1000_main.c	Wed May 19 03:14:32 2004
@@ -400,6 +400,8 @@
err = -ENOMEM;
goto err_alloc_etherdev;
}
+
+ netdev->tx_queue_len = 1000;
SET_MODULE_OWNER(netdev);

* Re: TxDescriptors -> 1024 default. Please not for every NIC!
  2004-05-19  9:30 ` Marc Herbert
@ 2004-05-19 10:27   ` Pekka Pietikainen
  2004-05-20 14:11     ` Luis R. Rodriguez
  2004-05-19 11:54   ` Andi Kleen
  1 sibling, 1 reply; 35+ messages in thread
From: Pekka Pietikainen @ 2004-05-19 10:27 UTC (permalink / raw)
To: Marc Herbert; +Cc: netdev, prism54-devel

On Wed, May 19, 2004 at 11:30:28AM +0200, Marc Herbert wrote:
> - Me argues that we all lived happy for ages with this default
>   setting of 100 packets @ 100 Mb/s (and lived approximately happy @
>   10 Mb/s), but we'll soon see doom and gloom with this new and
>   brutal change to 1000 packets for all this _legacy_ 10-100 Mb/s
>   hardware.  e1000 data only is not enough to justify this radical
>   shift.
>
> If you are convinced by _both_ items above, then the patch below
> contains _both_, and we're done.
>
> If you are not, then... wait for further discussion, including answers
> to Ricardo's latest post.

Not to mention that not all modern hardware is gigabit; current 2.6
seems to be setting a txqueuelen of 1000 for 802.11 devices too (at
least my prism54), which might be causing major problems for me.

Well, I'm still trying to figure out whether it's the txqueue or WEP
that causes all traffic to stop (with rx invalid crypt packets showing
up in iwconfig afterwards; the AP is a Linksys wrt54g in case it makes
a difference) every now and then until an ifdown/ifup.  Tried both the
vanilla 2.6 prism54 and CVS (which seems to have a reset-on-tx-timeout
thing added), but if the txqueue is 1000 that won't easily get
triggered, will it?

It's been running for a few days just fine with txqueue = 100 and no
WEP; if it stays like that I'll start tweaking to find what exactly
triggers it.

* Re: Re: TxDescriptors -> 1024 default. Please not for every NIC!
  2004-05-19 10:27 ` Pekka Pietikainen
@ 2004-05-20 14:11   ` Luis R. Rodriguez
  2004-05-20 16:38     ` [Prism54-devel] " Jean Tourrilhes
  0 siblings, 1 reply; 35+ messages in thread
From: Luis R. Rodriguez @ 2004-05-20 14:11 UTC (permalink / raw)
To: Pekka Pietikainen; +Cc: Marc Herbert, netdev, prism54-devel, Jean Tourrilhes

On Wed, May 19, 2004 at 01:27:00PM +0300, Pekka Pietikainen wrote:
> On Wed, May 19, 2004 at 11:30:28AM +0200, Marc Herbert wrote:
>> - Me argues that we all lived happy for ages with this default
>>   setting of 100 packets @ 100 Mb/s (and lived approximately happy @
>>   10 Mb/s), but we'll soon see doom and gloom with this new and
>>   brutal change to 1000 packets for all this _legacy_ 10-100 Mb/s
>>   hardware.  e1000 data only is not enough to justify this radical
>>   shift.
>>
>> If you are convinced by _both_ items above, then the patch below
>> contains _both_, and we're done.
>>
>> If you are not, then... wait for further discussion, including answers
>> to Ricardo's latest post.
>
> Not to mention that not all modern hardware is gigabit; current 2.6
> seems to be setting a txqueuelen of 1000 for 802.11 devices too (at
> least my prism54), which might be causing major problems for me.

Considering 802.11b's peak is at 11Mbit and standard 802.11g is at
54Mbit (some manufacturers are using two channels and getting 108Mbit
now), I'd think we should stick at 100, as the patch proposes.  Jean?

	Luis

--
GnuPG Key fingerprint = 113F B290 C6D2 0251 4D84  A34A 6ADD 4937 E20A 525E

* Re: [Prism54-devel] Re: TxDescriptors -> 1024 default. Please not for every NIC!
  2004-05-20 14:11 ` Luis R. Rodriguez
@ 2004-05-20 16:38   ` Jean Tourrilhes
  2004-05-20 16:45     ` Tomasz Torcz
  0 siblings, 1 reply; 35+ messages in thread
From: Jean Tourrilhes @ 2004-05-20 16:38 UTC (permalink / raw)
To: Pekka Pietikainen, Marc Herbert, netdev, prism54-devel

On Thu, May 20, 2004 at 10:11:11AM -0400, Luis R. Rodriguez wrote:
> On Wed, May 19, 2004 at 01:27:00PM +0300, Pekka Pietikainen wrote:
>> On Wed, May 19, 2004 at 11:30:28AM +0200, Marc Herbert wrote:
>>> - Me argues that we all lived happy for ages with this default
>>>   setting of 100 packets @ 100 Mb/s (and lived approximately happy @
>>>   10 Mb/s), but we'll soon see doom and gloom with this new and
>>>   brutal change to 1000 packets for all this _legacy_ 10-100 Mb/s
>>>   hardware.  e1000 data only is not enough to justify this radical
>>>   shift.
>>>
>>> If you are convinced by _both_ items above, then the patch below
>>> contains _both_, and we're done.
>>>
>>> If you are not, then... wait for further discussion, including answers
>>> to Ricardo's latest post.
>>
>> Not to mention that not all modern hardware is gigabit; current 2.6
>> seems to be setting a txqueuelen of 1000 for 802.11 devices too (at
>> least my prism54), which might be causing major problems for me.
>
> Considering 802.11b's peak is at 11Mbit and standard 802.11g is at
> 54Mbit (some manufacturers are using two channels and getting 108Mbit
> now), I'd think we should stick at 100, as the patch proposes.  Jean?
>
> Luis

I never like to have huge queues of buffers.  It wastes memory and
degrades the latency, especially with competing sockets.  In a
theoretical stable system, you don't need buffers (you run everything
synchronously); buffers are only needed to take care of the jitter in
real networks.

The real throughput of 802.11g is more around 30Mb/s (at the TCP/IP
level).  However, wireless networks tend to have more jitter
(interference and contention).  But wireless cards tend to have a fair
number of buffers in the hardware.

I personally would stick with 100.  The IrDA stack runs perfectly fine
with 15 buffers at 4 Mb/s.  If 100 is not enough, I think the problem
is not the number of buffers, but somewhere else.  For example, we
might want to think about explicit socket callbacks (like I did in
IrDA).

But that's only personal opinion ;-)

Have fun...

	Jean

* Re: [Prism54-devel] Re: TxDescriptors -> 1024 default. Please not for every NIC!
  2004-05-20 16:38 ` [Prism54-devel] " Jean Tourrilhes
@ 2004-05-20 16:45   ` Tomasz Torcz
  2004-05-20 17:13     ` zero copy TX in benchmarks was " Andi Kleen
  0 siblings, 1 reply; 35+ messages in thread
From: Tomasz Torcz @ 2004-05-20 16:45 UTC (permalink / raw)
To: netdev

On Thu, May 20, 2004 at 09:38:11AM -0700, Jean Tourrilhes wrote:
> I personally would stick with 100.  The IrDA stack runs perfectly fine
> with 15 buffers at 4 Mb/s.  If 100 is not enough, I think the problem
> is not the number of buffers, but somewhere else.

I don't know how trollish or true that comment is:
http://bsd.slashdot.org/comments.pl?sid=106258&cid=9049422
but it suggests that Linux's stack, having no BSD-like mbuf
functionality, is not perfect for fast transmission.  Maybe some
network guru can comment?

--
Tomasz Torcz      ,,(...) today's high-end is tomorrow's embedded processor.''
zdzichu@irc.-nie.spam-.pl                      -- Mitchell Blank on LKML

* zero copy TX in benchmarks was Re: [Prism54-devel] Re: TxDescriptors -> 1024 default. Please not for every NIC!
  2004-05-20 16:45 ` Tomasz Torcz
@ 2004-05-20 17:13   ` Andi Kleen
  0 siblings, 0 replies; 35+ messages in thread
From: Andi Kleen @ 2004-05-20 17:13 UTC (permalink / raw)
To: Tomasz Torcz; +Cc: netdev

On Thu, May 20, 2004 at 06:45:16PM +0200, Tomasz Torcz wrote:
> On Thu, May 20, 2004 at 09:38:11AM -0700, Jean Tourrilhes wrote:
>> I personally would stick with 100.  The IrDA stack runs perfectly fine
>> with 15 buffers at 4 Mb/s.  If 100 is not enough, I think the problem
>> is not the number of buffers, but somewhere else.

Not sure why you post this to this thread?  It has nothing to do with
the previous message.

> I don't know how trollish or true that comment is:
> http://bsd.slashdot.org/comments.pl?sid=106258&cid=9049422

Linux sk_buffs and BSD mbufs are not very different anymore today.
The BSD mbufs have been getting more sk_buff'ish over time, and
sk_buffs have grown some properties of mbufs.  They both have changed
to optionally pass references to memory around instead of always
copying, which is what counts here.

> but it suggests that Linux's stack, having no BSD-like mbuf
> functionality, is not perfect for fast transmission.  Maybe some
> network guru can comment?

I have not read all the details, but I suppose they used sendmsg()
instead of sendfile() for this test.  NetBSD can use zero copy TX in
this case; Linux can only do so with sendfile, and sendmsg will copy.
Obviously Linux will be slower then, because a copy can cost quite a
lot of CPU.  Or rather, it is not really the CPU cost that is the
problem here, but the bandwidth usage - very high speed networking is
essentially memory bandwidth limited, and copying over the CPU adds
additional bandwidth requirements to the memory subsystem.

There was an implementation of zero copy sendmsg() for Linux long ago,
but it was removed because it was fundamentally incompatible with good
SMP scaling: it would require remote TLB flushes over possibly many
CPUs (if you search the archives of this list you will find long
threads about it).  It would not be very hard to re-add (Linux has all
the low level infrastructure needed for it), but it doesn't make sense.
NetBSD may have the luxury of not caring about MP scaling, but Linux
doesn't.

The disadvantage of sendfile is that you can only transmit files
directly; if you want to transmit data directly out of a process's
address space, you have to put it into a file mmap and sendfile from
there.  This may be a bit inconvenient if the basic unit of data in
your program isn't files.

There was a plan suggested to fix that (implement zero copy TX for
POSIX AIO instead of BSD sockets), which would not have this problem.
POSIX AIO has all the infrastructure to do zero copy IO without
problematic and slow TLB flushes.  Just so far nobody has implemented
that.

In practice it is not too big an issue because many tuned servers (your
typical ftpd, httpd or samba server) use sendfile already.

-Andi

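(A minimal user-space illustration of the sendfile() path Andi
describes, not taken from the thread: the whole payload goes from the
page cache to an already-connected TCP socket without being copied
through user space.  sock_fd and the file path are placeholders, and
error handling is kept deliberately short.)

#include <sys/sendfile.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Send an entire file on a connected TCP socket using zero-copy TX.
 * Returns 0 on success, -1 on error. */
static int send_file_zero_copy(int sock_fd, const char *path)
{
	struct stat st;
	off_t offset = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}

	while (offset < st.st_size) {
		/* the kernel feeds pages straight from the page cache,
		 * so the payload never crosses into user space */
		ssize_t n = sendfile(sock_fd, fd, &offset,
				     st.st_size - offset);
		if (n <= 0)
			break;
	}

	close(fd);
	return offset == st.st_size ? 0 : -1;
}
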
* Re: TxDescriptors -> 1024 default. Please not for every NIC!
  2004-05-19  9:30 ` Marc Herbert
  2004-05-19 10:27   ` Pekka Pietikainen
@ 2004-05-19 11:54   ` Andi Kleen
  1 sibling, 0 replies; 35+ messages in thread
From: Andi Kleen @ 2004-05-19 11:54 UTC (permalink / raw)
To: Marc Herbert; +Cc: netdev

Marc Herbert <marc.herbert@free.fr> writes:
>
> PS: several people seem to think TCP "drops" packets when the qdisc is
> full.  My analysis of the code _and_ my experiments make me think they
> are wrong: TCP rather "blocks" when the qdisc is full.  See the
> explanation here:
> <http://oss.sgi.com/archives/netdev/2004-05/msg00151.html>
> (Subject: Re: TcpOutSegs way too optimistic (netstat -s))

This behaviour was only added relatively recently (in the late 2.3.x
timeframe); I believe all the default queue length tunings were done
before that.

So it would probably make sense to reevaluate/rebenchmark the default
queue lengths for various devices with the newer code.

-Andi
