* [e1000 2.6 10/11] TxDescriptors -> 1024 default
@ 2003-09-09 3:14 Feldman, Scott
2003-09-11 19:18 ` Jeff Garzik
0 siblings, 1 reply; 35+ messages in thread
From: Feldman, Scott @ 2003-09-09 3:14 UTC (permalink / raw)
To: Jeff Garzik; +Cc: netdev, ricardoz
* Change the default number of Tx descriptors from 256 to 1024.
Data from [ricardoz@us.ibm.com] shows it's easy to overrun
the Tx desc queue.
-------------
diff -Nuarp linux-2.6.0-test4/drivers/net/e1000/e1000_param.c linux-2.6.0-test4/drivers/net/e1000.new/e1000_param.c
--- linux-2.6.0-test4/drivers/net/e1000/e1000_param.c 2003-08-22 16:57:59.000000000 -0700
+++ linux-2.6.0-test4/drivers/net/e1000.new/e1000_param.c 2003-09-08 09:13:12.000000000 -0700
@@ -63,9 +63,10 @@ MODULE_PARM_DESC(X, S);
/* Transmit Descriptor Count
*
* Valid Range: 80-256 for 82542 and 82543 gigabit ethernet controllers
- * Valid Range: 80-4096 for 82544
+ * Valid Range: 80-4096 for 82544 and newer
*
- * Default Value: 256
+ * Default Value: 256 for 82542 and 82543 gigabit ethernet controllers
+ * Default Value: 1024 for 82544 and newer
*/
E1000_PARAM(TxDescriptors, "Number of transmit descriptors");
@@ -73,7 +74,7 @@ E1000_PARAM(TxDescriptors, "Number of tr
/* Receive Descriptor Count
*
* Valid Range: 80-256 for 82542 and 82543 gigabit ethernet controllers
- * Valid Range: 80-4096 for 82544
+ * Valid Range: 80-4096 for 82544 and newer
*
* Default Value: 256
*/
@@ -200,6 +201,7 @@ E1000_PARAM(InterruptThrottleRate, "Inte
#define MAX_TXD 256
#define MIN_TXD 80
#define MAX_82544_TXD 4096
+#define DEFAULT_82544_TXD 1024
#define DEFAULT_RXD 256
#define MAX_RXD 256
@@ -320,12 +322,15 @@ e1000_check_options(struct e1000_adapter
struct e1000_option opt = {
.type = range_option,
.name = "Transmit Descriptors",
- .err = "using default of " __MODULE_STRING(DEFAULT_TXD),
- .def = DEFAULT_TXD,
.arg = { .r = { .min = MIN_TXD }}
};
struct e1000_desc_ring *tx_ring = &adapter->tx_ring;
e1000_mac_type mac_type = adapter->hw.mac_type;
+ opt.err = mac_type < e1000_82544 ?
+ "using default of " __MODULE_STRING(DEFAULT_TXD) :
+ "using default of " __MODULE_STRING(DEFAULT_82544_TXD);
+ opt.def = mac_type < e1000_82544 ?
+ DEFAULT_TXD : DEFAULT_82544_TXD;
opt.arg.r.max = mac_type < e1000_82544 ?
MAX_TXD : MAX_82544_TXD;
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-09 3:14 [e1000 2.6 10/11] TxDescriptors -> 1024 default Feldman, Scott
@ 2003-09-11 19:18 ` Jeff Garzik
2003-09-11 19:45 ` Ben Greear
0 siblings, 1 reply; 35+ messages in thread
From: Jeff Garzik @ 2003-09-11 19:18 UTC (permalink / raw)
To: Feldman, Scott; +Cc: netdev, ricardoz
Feldman, Scott wrote:
> * Change the default number of Tx descriptors from 256 to 1024.
> Data from [ricardoz@us.ibm.com] shows it's easy to overrun
> the Tx desc queue.
All e1000 patches applied except this one.
Of _course_ it's easy to overrun the Tx desc queue. That's why we have
a TX queue sitting on top of the NIC's hardware queue. And TCP socket
buffers on top of that. And similar things.
Descriptor increases like this are usually the result of some sillyhead
blasting out UDP packets, and then wondering why he sees packet loss on
the local computer (the "blast out packets" side).
You're just wasting memory.
Jeff
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-11 19:18 ` Jeff Garzik
@ 2003-09-11 19:45 ` Ben Greear
2003-09-11 19:59 ` Jeff Garzik
2003-09-11 20:12 ` David S. Miller
0 siblings, 2 replies; 35+ messages in thread
From: Ben Greear @ 2003-09-11 19:45 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Feldman, Scott, netdev, ricardoz
Jeff Garzik wrote:
> Feldman, Scott wrote:
>
>> * Change the default number of Tx descriptors from 256 to 1024.
>> Data from [ricardoz@us.ibm.com] shows it's easy to overrun
>> the Tx desc queue.
>
>
>
> All e1000 patches applied except this one.
>
> Of _course_ it's easy to overrun the Tx desc queue. That's why we have
> a TX queue sitting on top of the NIC's hardware queue. And TCP socket
> buffers on top of that. And similar things.
>
> Descriptor increases like this are usually the result of some sillyhead
> blasting out UDP packets, and then wondering why he sees packet loss on
> the local computer (the "blast out packets" side).
Erm, shouldn't the local machine back itself off if the various
queues are full? Some time back I looked through the code and it
appeared to. If not, I think it should.
>
> You're just wasting memory.
>
> Jeff
>
>
>
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-11 19:45 ` Ben Greear
@ 2003-09-11 19:59 ` Jeff Garzik
2003-09-11 20:12 ` David S. Miller
1 sibling, 0 replies; 35+ messages in thread
From: Jeff Garzik @ 2003-09-11 19:59 UTC (permalink / raw)
To: Ben Greear; +Cc: Feldman, Scott, netdev, ricardoz
Ben Greear wrote:
> Jeff Garzik wrote:
>
>> Feldman, Scott wrote:
>>
>>> * Change the default number of Tx descriptors from 256 to 1024.
>>> Data from [ricardoz@us.ibm.com] shows it's easy to overrun
>>> the Tx desc queue.
>>
>>
>>
>>
>> All e1000 patches applied except this one.
>>
>> Of _course_ it's easy to overrun the Tx desc queue. That's why we
>> have a TX queue sitting on top of the NIC's hardware queue. And TCP
>> socket buffers on top of that. And similar things.
>>
>> Descriptor increases like this are usually the result of some
>> sillyhead blasting out UDP packets, and then wondering why he sees
>> packet loss on the local computer (the "blast out packets" side).
>
>
> Erm, shouldn't the local machine back itself off if the various
> queues are full? Some time back I looked through the code and it
> appeared to. If not, I think it should.
Given the guarantees of the protocol, the net stack has the freedom to
drop UDP packets, for example at times when (for TCP) one would
otherwise queue a packet for retransmit.
Jeff
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-11 19:45 ` Ben Greear
2003-09-11 19:59 ` Jeff Garzik
@ 2003-09-11 20:12 ` David S. Miller
2003-09-11 20:40 ` Ben Greear
1 sibling, 1 reply; 35+ messages in thread
From: David S. Miller @ 2003-09-11 20:12 UTC (permalink / raw)
To: Ben Greear; +Cc: jgarzik, scott.feldman, netdev, ricardoz
On Thu, 11 Sep 2003 12:45:55 -0700
Ben Greear <greearb@candelatech.com> wrote:
> Erm, shouldn't the local machine back itself off if the various
> queues are full? Some time back I looked through the code and it
> appeared to. If not, I think it should.
Generic networking device queues drop when they overflow.
Whatever dev->tx_queue_len is set to, the device driver needs
to be prepared to be able to queue successfully.
Most people run into problems when they run stupid UDP applications
that send a stream of tinygrams (<~64 bytes). The solutions are to
either fix the UDP app or restrict its socket send buffer size.
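For illustration, capping the send buffer from the application looks roughly
like this (a minimal userspace sketch, not from the original mail; the 64 KB
value is an arbitrary example):

#include <stdio.h>
#include <sys/socket.h>

/* Minimal sketch: cap a UDP socket's send buffer so a bursty sender is
 * throttled at the socket instead of flooding the queues below it. */
int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int sndbuf = 64 * 1024;		/* arbitrary example value */

	if (fd < 0 ||
	    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf)) < 0)
		perror("socket/setsockopt");
	return 0;
}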
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-11 20:12 ` David S. Miller
@ 2003-09-11 20:40 ` Ben Greear
2003-09-11 21:07 ` David S. Miller
0 siblings, 1 reply; 35+ messages in thread
From: Ben Greear @ 2003-09-11 20:40 UTC (permalink / raw)
To: David S. Miller; +Cc: jgarzik, scott.feldman, netdev, ricardoz
David S. Miller wrote:
> Generic networking device queues drop when they overflow.
>
> Whatever dev->tx_queue_len is set to, the device driver needs
> to be prepared to be able to queue successfully.
>
> Most people run into problems when they run stupid UDP applications
> that send a stream of tinygrams (<~64 bytes). The solutions are to
> either fix the UDP app or restrict its socket send buffer size.
Is this close to how it works?
So, assume we configure a 10MB socket send queue on our UDP socket...
Select says it's writable up to at least 5MB.
We write 5MB of 64byte packets "right now".
Did we just drop a large number of packets?
I would expect that the packets, up to 10MB, are buffered in some
list/fifo in the socket code, and that as the underlying device queue
empties itself, the socket will feed it more packets.
The device queue, in turn, is emptied as the driver is able to fill its
TxDescriptors, and the hardware empties the TxDescriptors.
Obviously, I'm confused somewhere....
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-11 20:40 ` Ben Greear
@ 2003-09-11 21:07 ` David S. Miller
2003-09-11 21:29 ` Ben Greear
0 siblings, 1 reply; 35+ messages in thread
From: David S. Miller @ 2003-09-11 21:07 UTC (permalink / raw)
To: Ben Greear; +Cc: jgarzik, scott.feldman, netdev, ricardoz
On Thu, 11 Sep 2003 13:40:44 -0700
Ben Greear <greearb@candelatech.com> wrote:
> So, assume we configure a 10MB socket send queue on our UDP socket...
>
> Select says it's writable up to at least 5MB.
>
> We write 5MB of 64byte packets "right now".
>
> Did we just drop a large number of packets?
Yes, we did _iff_ dev->tx_queue_len is less than or equal
to (5MB / (64 + sizeof(udp_id_headers))).
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-11 21:29 ` Ben Greear
@ 2003-09-11 21:29 ` David S. Miller
2003-09-11 21:47 ` Ricardo C Gonzalez
2003-09-11 22:15 ` Ben Greear
0 siblings, 2 replies; 35+ messages in thread
From: David S. Miller @ 2003-09-11 21:29 UTC (permalink / raw)
To: Ben Greear; +Cc: jgarzik, scott.feldman, netdev, ricardoz
On Thu, 11 Sep 2003 14:29:43 -0700
Ben Greear <greearb@candelatech.com> wrote:
> Thanks for that clarification. Is there no way to tell
> at 'sendto' time that the buffers are over-full, and either
> block or return -EBUSY or something like that?
The TX queue state can change by hundreds of packets by
the time we are finished making the "decision"; also, how would
you like to "wake" up sockets when the TX queue is liberated?
That extra overhead and logic would be wonderful for performance.
No, this is all nonsense. Packet scheduling and queueing is
an opaque layer to all the upper layers. It is the only sensible
design.
IP transmit is a black hole that may drop packets at any moment;
any datagram application not prepared for this should be prepared
for trouble or choose to move over to something like TCP.
I even listed a workaround for such stupid UDP apps: simply limit
their socket send queue sizes.
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-11 21:07 ` David S. Miller
@ 2003-09-11 21:29 ` Ben Greear
2003-09-11 21:29 ` David S. Miller
0 siblings, 1 reply; 35+ messages in thread
From: Ben Greear @ 2003-09-11 21:29 UTC (permalink / raw)
To: David S. Miller; +Cc: jgarzik, scott.feldman, netdev, ricardoz
David S. Miller wrote:
> On Thu, 11 Sep 2003 13:40:44 -0700
> Ben Greear <greearb@candelatech.com> wrote:
>
>
>>So, assume we configure a 10MB socket send queue on our UDP socket...
>>
>>Select says it's writable up to at least 5MB.
>>
>>We write 5MB of 64byte packets "right now".
>>
>>Did we just drop a large number of packets?
>
>
> Yes, we did _iff_ dev->tx_queue_len is less than or equal
> to (5MB / (64 + sizeof(udp_id_headers))).
Thanks for that clarification. Is there no way to tell
at 'sendto' time that the buffers are over-full, and either
block or return -EBUSY or something like that?
Perhaps the poll logic should also take the underlying buffer
into account and not show the socket as writable in this case?
Supposing in the above example, I set tx_queue_len to
(5MB / (64 + sizeof(udp_id_headers))), will
the packets now be dropped in the driver instead, or will there
be no more (local) drops?
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-11 21:29 ` David S. Miller
@ 2003-09-11 21:47 ` Ricardo C Gonzalez
2003-09-11 22:00 ` Jeff Garzik
2003-09-11 22:15 ` Ben Greear
1 sibling, 1 reply; 35+ messages in thread
From: Ricardo C Gonzalez @ 2003-09-11 21:47 UTC (permalink / raw)
To: David S. Miller; +Cc: greearb, jgarzik, scott.feldman, netdev
>IP transmit is a black hole that may drop packets at any moment;
>any datagram application not prepared for this should be prepared
>for trouble or choose to move over to something like TCP.
As I said before, please do not make this a UDP issue. The data I sent out
was taken using a TCP_STREAM test case. Please review it.
regards,
----------------------------------------------------------------------------------
*** ALWAYS THINK POSITIVE ***
Rick Gonzalez
IBM Linux Performance Group
Building: 905 Office: 7G019
Phone: (512) 838-0623
"David S. Miller" <davem@redhat.com> on 09/11/2003 04:29:06 PM
To: Ben Greear <greearb@candelatech.com>
cc: jgarzik@pobox.com, scott.feldman@intel.com, netdev@oss.sgi.com,
Ricardo C Gonzalez/Austin/IBM@ibmus
Subject: Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
On Thu, 11 Sep 2003 14:29:43 -0700
Ben Greear <greearb@candelatech.com> wrote:
> Thanks for that clarification. Is there no way to tell
> at 'sendto' time that the buffers are over-full, and either
> block or return -EBUSY or something like that?
The TX queue state can change by hundreds of packets by
the time we are finished making the "decision", also how would
you like to "wake" up sockets when the TX queue is liberated.
That extra overhead and logic would be wonderful for performance.
No, this is all nonsense. Packet scheduling and queueing is
an opaque layer to all the upper layers. It is the only sensible
design.
IP transmit is black hole that may drop packets at any moment,
any datagram application not prepared for this should be prepared
for troubles or choose to move over to something like TCP.
I listed even a workaround for such stupid UDP apps, simply limit
their socket send queue limits.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-11 21:47 ` Ricardo C Gonzalez
@ 2003-09-11 22:00 ` Jeff Garzik
0 siblings, 0 replies; 35+ messages in thread
From: Jeff Garzik @ 2003-09-11 22:00 UTC (permalink / raw)
To: Ricardo C Gonzalez; +Cc: David S. Miller, greearb, scott.feldman, netdev
Ricardo C Gonzalez wrote:
>
>
>>IP transmit is a black hole that may drop packets at any moment;
>>any datagram application not prepared for this should be prepared
>>for trouble or choose to move over to something like TCP.
>
>
>
> As I said before, please do not make this a UDP issue. The data I sent out
> was taken using a TCP_STREAM test case. Please review it.
Your own words say "CPUs can fill TX queue". We already know this.
CPUs have been doing wire speed for ages.
Jeff
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-11 21:29 ` David S. Miller
2003-09-11 21:47 ` Ricardo C Gonzalez
@ 2003-09-11 22:15 ` Ben Greear
2003-09-11 23:02 ` David S. Miller
1 sibling, 1 reply; 35+ messages in thread
From: Ben Greear @ 2003-09-11 22:15 UTC (permalink / raw)
To: David S. Miller; +Cc: jgarzik, scott.feldman, netdev, ricardoz
David S. Miller wrote:
> On Thu, 11 Sep 2003 14:29:43 -0700
> Ben Greear <greearb@candelatech.com> wrote:
>
>
>>Thanks for that clarification. Is there no way to tell
>>at 'sendto' time that the buffers are over-full, and either
>>block or return -EBUSY or something like that?
>
>
> The TX queue state can change by hundreds of packets by
> the time we are finished making the "decision"; also, how would
> you like to "wake" up sockets when the TX queue is liberated?
So, at some point the decision is already made that we must drop
the packet, or that we can enqueue it. This is where I would propose
we block the thing trying to enqueue, or at least propagate a failure
code back up the stack(s) so that the packet can be retried by the
calling layer.
Preferably, one would propagate the error all the way to userspace
and let them deal with it, just like we currently deal with socket
queue full issues.
> That extra overhead and logic would be wonderful for performance.
A retransmit is also expensive, whether it is for some hacked-up
UDP protocol or for TCP. Even if one had to implement callbacks
from the device queue to the interested sockets, this should not
be a large performance hit.
>
> No, this is all nonsense. Packet scheduling and queueing is
> an opaque layer to all the upper layers. It is the only sensible
> design.
This is possible, but it does not seem cut and dried to me. If there
is any documentation or research that supports this assertion, please
do let us know.
>
> IP transmit is a black hole that may drop packets at any moment;
> any datagram application not prepared for this should be prepared
> for trouble or choose to move over to something like TCP.
>
> I even listed a workaround for such stupid UDP apps: simply limit
> their socket send queue sizes.
And the original poster shows how a similar problem slows down TCP
as well due to local dropped packets. Don't you think we'd get better
TCP throughput if we instead had the calling code wait 1us for the buffers
to clear?
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-11 22:15 ` Ben Greear
@ 2003-09-11 23:02 ` David S. Miller
2003-09-11 23:22 ` Ben Greear
0 siblings, 1 reply; 35+ messages in thread
From: David S. Miller @ 2003-09-11 23:02 UTC (permalink / raw)
To: Ben Greear; +Cc: jgarzik, scott.feldman, netdev, ricardoz
On Thu, 11 Sep 2003 15:15:19 -0700
Ben Greear <greearb@candelatech.com> wrote:
> And the original poster shows how a similar problem slows down TCP
> as well due to local dropped packets.
So, again, dampen the per-socket send queue sizes.
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-11 23:02 ` David S. Miller
@ 2003-09-11 23:22 ` Ben Greear
2003-09-11 23:29 ` David S. Miller
2003-09-12 1:34 ` jamal
0 siblings, 2 replies; 35+ messages in thread
From: Ben Greear @ 2003-09-11 23:22 UTC (permalink / raw)
Cc: jgarzik, scott.feldman, netdev, ricardoz
David S. Miller wrote:
> On Thu, 11 Sep 2003 15:15:19 -0700
> Ben Greear <greearb@candelatech.com> wrote:
>
>
>>And the original poster shows how a similar problem slows down TCP
>>as well due to local dropped packets.
>
>
> So, again, dampen the per-socket send queue sizes.
That's just a band-aid to cover up the flaw with the lack
of queue-pressure feedback to the higher stacks, as would be increasing the
TxDescriptors for that matter.
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-11 23:22 ` Ben Greear
@ 2003-09-11 23:29 ` David S. Miller
2003-09-12 1:34 ` jamal
1 sibling, 0 replies; 35+ messages in thread
From: David S. Miller @ 2003-09-11 23:29 UTC (permalink / raw)
To: Ben Greear; +Cc: jgarzik, scott.feldman, netdev, ricardoz
On Thu, 11 Sep 2003 16:22:35 -0700
Ben Greear <greearb@candelatech.com> wrote:
> David S. Miller wrote:
> > So, again, dampen the per-socket send queue sizes.
>
> That's just a band-aid to cover up the flaw with the lack
> of queue-pressure feedback to the higher stacks, as would be increasing the
> TxDescriptors for that matter.
The whole point of the various packet scheduler algorithms
is forgone if we're just going to queue up and send the
crap again.
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-11 23:22 ` Ben Greear
2003-09-11 23:29 ` David S. Miller
@ 2003-09-12 1:34 ` jamal
2003-09-12 2:20 ` Ricardo C Gonzalez
2003-09-13 3:49 ` David S. Miller
1 sibling, 2 replies; 35+ messages in thread
From: jamal @ 2003-09-12 1:34 UTC (permalink / raw)
To: Ben Greear; +Cc: jgarzik, scott.feldman, netdev, ricardoz
Scott,
don't increase the tx descriptor ring size - that would truly be wasting
memory; 256 is pretty adequate.
* increase instead the txqueuelen (as suggested by Davem); user space
tools like ip or ifconfig could do it. The standard size has been around
100 for 100Mbps; I suppose it is fair to say that GigE can move data out
at 10x that; so set it to 1000. Maybe you can do this from the driver
based on what negotiated speed is detected?
--------
[root@jzny root]# ip link ls eth0
4: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
link/ether 00:b0:d0:05:ae:81 brd ff:ff:ff:ff:ff:ff
[root@jzny root]# ip link set eth0 txqueuelen 1000
[root@jzny root]# ip link ls eth0
4: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:b0:d0:05:ae:81 brd ff:ff:ff:ff:ff:ff
-------
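As a rough sketch of that driver-side idea (hypothetical helper, not an
actual e1000 patch; the name, call site and thresholds are all illustrative):

#include <linux/netdevice.h>

/* Hypothetical sketch: pick dev->tx_queue_len from the negotiated link
 * speed, roughly 100 packets per 100 Mb/s as suggested above.  A real
 * driver would call something like this from its link-up handling. */
static void example_set_tx_queue_len(struct net_device *dev,
				     unsigned int speed_mbps)
{
	if (speed_mbps >= 1000)
		dev->tx_queue_len = 1000;
	else if (speed_mbps >= 100)
		dev->tx_queue_len = 100;
	else
		dev->tx_queue_len = 10;		/* 10 Mb/s and below */
}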
TCP already reacts to packets dropped at the scheduler level; UDP would
be too hard to enforce since the logic is typically in an app above UDP.
So just control it via the socket queue size.
cheers,
jamal
On Thu, 2003-09-11 at 19:22, Ben Greear wrote:
> David S. Miller wrote:
> > On Thu, 11 Sep 2003 15:15:19 -0700
> > Ben Greear <greearb@candelatech.com> wrote:
> >
> >
> >>And the original poster shows how a similar problem slows down TCP
> >>as well due to local dropped packets.
> >
> >
> > So, again, dampen the per-socket send queue sizes.
>
> That's just a band-aid to cover up the flaw with the lack
> of queue-pressure feedback to the higher stacks, as would be increasing the
> TxDescriptors for that matter.
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-12 1:34 ` jamal
@ 2003-09-12 2:20 ` Ricardo C Gonzalez
2003-09-12 3:05 ` jamal
2003-09-13 3:49 ` David S. Miller
1 sibling, 1 reply; 35+ messages in thread
From: Ricardo C Gonzalez @ 2003-09-12 2:20 UTC (permalink / raw)
To: hadi; +Cc: greearb, jgarzik, scott.feldman, netdev
Jamal wrote:
>* increase instead the txquelen (as suggested by Davem); user space
>tools like ip or ifconfig could do it. The standard size has been around
>100 for 100Mbps; i suppose it is fair to say that Gige can move data out
>at 10x that; so set it to 1000. Maybe you can do this from the driver
>based on what negotiated speed is detected?
This is also another way to do it, as long as we make it harder for
users to drop packets and get up to date with Gigabit speeds. We would also
have to think about the upcoming 10GigE adapters and their queue sizes,
but that is a separate issue. Anyway, the driver can easily set the
txqueuelen to 1000.
We should care about counting the packets being dropped on the
transmit side. Would it be the responsibility of the driver to account for
these drops? Because each driver has a dedicated software queue, and in my
opinion, the driver should account for these packets.
regards,
----------------------------------------------------------------------------------
*** ALWAYS THINK POSITIVE ***
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-12 2:20 ` Ricardo C Gonzalez
@ 2003-09-12 3:05 ` jamal
0 siblings, 0 replies; 35+ messages in thread
From: jamal @ 2003-09-12 3:05 UTC (permalink / raw)
To: Ricardo C Gonzalez; +Cc: greearb, jgarzik, scott.feldman, netdev
On Thu, 2003-09-11 at 22:20, Ricardo C Gonzalez wrote:
> Jamal wrote:
> We should care about counting the packets being dropped on the
> transmit side. Would it be the responsibility of the driver to account for
> these drops? Because each driver has a dedicated software queue, and in my
> opinion, the driver should account for these packets.
This is really the scheduler's responsibility. It's hard for the driver to
keep track of why a packet was dropped. For example, a packet could be dropped to
make room for a higher-priority packet that's anticipated to show
up soon.
The simple default 3-band scheduler unfortunately doesn't quite show its
stats... so a simple way to see drops is:
- install the prio qdisc
------
[root@jzny root]# tc qdisc add dev eth0 root prio
[root@jzny root]# tc -s qdisc
qdisc prio 8001: dev eth0 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 42 bytes 1 pkts (dropped 0, overlimits 0)
-----
or you may want to install a single pfifo queue with a size of 1000
(although this is a little too medieval)
example:
#tc qdisc add dev eth0 root pfifo limit 1000
#tc -s qdisc
qdisc pfifo 8002: dev eth0 limit 1000p
Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
etc
cheers,
jamal
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-12 1:34 ` jamal
2003-09-12 2:20 ` Ricardo C Gonzalez
@ 2003-09-13 3:49 ` David S. Miller
2003-09-13 11:52 ` Robert Olsson
2003-09-14 19:08 ` Ricardo C Gonzalez
1 sibling, 2 replies; 35+ messages in thread
From: David S. Miller @ 2003-09-13 3:49 UTC (permalink / raw)
To: hadi; +Cc: greearb, jgarzik, scott.feldman, netdev, ricardoz
On 11 Sep 2003 21:34:23 -0400
jamal <hadi@cyberus.ca> wrote:
> don't increase the tx descriptor ring size - that would truly be wasting
> memory; 256 is pretty adequate.
> * increase instead the txqueuelen (as suggested by Davem); user space
> tools like ip or ifconfig could do it. The standard size has been around
> 100 for 100Mbps; I suppose it is fair to say that GigE can move data out
> at 10x that; so set it to 1000. Maybe you can do this from the driver
> based on what negotiated speed is detected?
I spoke with Alexey once about this, actually tx_queue_len can
be arbitrarily large but it should be reasonable nonetheless.
Our preliminary conclusions were that values of 1000 for 100Mbit and
faster were probably appropriate. Maybe something larger for 1Gbit,
who knows.
We also determined that the only connection between TX descriptor
ring size and dev->tx_queue_len was that the latter should be large
enough to handle, at a minimum, the amount of pending TX descriptor
ACKs that can be pending considering mitigation et al.
So if TX irq mitigation can defer up to N TX descriptor completions
then dev->tx_queue_len must be at least that large.
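Expressed as a fragment (illustrative only; max_deferred_tx_completions is a
made-up name standing in for that N):

	/* Illustrative only: the software queue must be able to absorb at
	 * least as many packets as TX interrupt mitigation can leave
	 * uncompleted in the descriptor ring. */
	if (dev->tx_queue_len < max_deferred_tx_completions)
		dev->tx_queue_len = max_deferred_tx_completions;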
Back to the main topic, maybe we should set dev->tx_queue_len to
1000 by default for all ethernet devices.
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-13 3:49 ` David S. Miller
@ 2003-09-13 11:52 ` Robert Olsson
2003-09-15 12:12 ` jamal
2003-09-14 19:08 ` Ricardo C Gonzalez
1 sibling, 1 reply; 35+ messages in thread
From: Robert Olsson @ 2003-09-13 11:52 UTC (permalink / raw)
To: David S. Miller; +Cc: hadi, greearb, jgarzik, scott.feldman, netdev, ricardoz
David S. Miller writes:
> On 11 Sep 2003 21:34:23 -0400
> jamal <hadi@cyberus.ca> wrote:
>
> > don't increase the tx descriptor ring size - that would truly be wasting
> > memory; 256 is pretty adequate.
> > * increase instead the txqueuelen (as suggested by Davem); user space
> > tools like ip or ifconfig could do it. The standard size has been around
> > 100 for 100Mbps; I suppose it is fair to say that GigE can move data out
> > at 10x that; so set it to 1000. Maybe you can do this from the driver
> > based on what negotiated speed is detected?
>
> I spoke with Alexey once about this, actually tx_queue_len can
> be arbitrarily large but it should be reasonable nonetheless.
>
> Our preliminary conclusions were that values of 1000 for 100Mbit and
> faster were probably appropriate. Maybe something larger for 1Gbit,
> who knows.
>
> We also determined that the only connection between TX descriptor
> ring size and dev->tx_queue_len was that the latter should be large
> enough to handle, at a minimum, the amount of pending TX descriptor
> ACKs that can be pending considering mitigation et al.
>
> So if TX irq mitigation can defer up to N TX descriptor completions
> then dev->tx_queue_len must be at least that large.
>
> Back to the main topic, maybe we should set dev->tx_queue_len to
> 1000 by default for all ethernet devices.
Hello!
Yes, sounds like an adequate setting for GigE. This is what we use for production
and lab, but rather than increasing dev->tx_queue_len to 1000 we replace the
pfifo_fast with the pfifo qdisc, setting a qlen of 1000.
And with this we have tx_descriptor_ring_size 256, which is tuned to the NIC's
"TX service interval" with respect to interrupt mitigation etc. This seems
good enough even for small packets.
For routers this setting is even more crucial as we need to serialize
several flows and we know the flows are bursty.
Cheers.
--ro
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-13 3:49 ` David S. Miller
2003-09-13 11:52 ` Robert Olsson
@ 2003-09-14 19:08 ` Ricardo C Gonzalez
2003-09-15 2:50 ` David Brownell
2004-05-15 12:14 ` TxDescriptors -> 1024 default. Please not for every NIC! Marc Herbert
1 sibling, 2 replies; 35+ messages in thread
From: Ricardo C Gonzalez @ 2003-09-14 19:08 UTC (permalink / raw)
To: David S. Miller; +Cc: hadi, greearb, jgarzik, scott.feldman, netdev
David Miller wrote:
>Back to the main topic, maybe we should set dev->tx_queue_len to
>1000 by default for all ethernet devices.
I definitely agree with setting the dev->tx_queue_len to 1000 as a default
for all ethernet adapters. All adapters will benefit from this change.
regards,
----------------------------------------------------------------------------------
*** ALWAYS THINK POSITIVE ***
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-14 19:08 ` Ricardo C Gonzalez
@ 2003-09-15 2:50 ` David Brownell
2003-09-15 8:17 ` David S. Miller
2004-05-15 12:14 ` TxDescriptors -> 1024 default. Please not for every NIC! Marc Herbert
1 sibling, 1 reply; 35+ messages in thread
From: David Brownell @ 2003-09-15 2:50 UTC (permalink / raw)
To: Ricardo C Gonzalez, David S. Miller
Cc: hadi, greearb, jgarzik, scott.feldman, netdev
Ricardo C Gonzalez wrote:
>
> David Miller wrote:
>
>
>>Back to the main topic, maybe we should set dev->tx_queue_len to
>>1000 by default for all ethernet devices.
>
>
>
> I definitely agree with setting the dev->tx_queue_len to 1000 as a default
> for all ethernet adapters. All adapters will benefit from this change.
Except ones where CONFIG_EMBEDDED, maybe? Not everyone wants
to spend that much memory, even when it's available...
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-15 2:50 ` David Brownell
@ 2003-09-15 8:17 ` David S. Miller
0 siblings, 0 replies; 35+ messages in thread
From: David S. Miller @ 2003-09-15 8:17 UTC (permalink / raw)
To: David Brownell; +Cc: ricardoz, hadi, greearb, jgarzik, scott.feldman, netdev
On Sun, 14 Sep 2003 19:50:56 -0700
David Brownell <david-b@pacbell.net> wrote:
> Except ones where CONFIG_EMBEDDED, maybe? Not everyone wants
> to spend that much memory, even when it's available...
Dropping the packet between the network stack and the driver
does waste memory for _LONGER_ periods of time.
When we drop, TCP still hangs onto the buffer, and we'll send
it again and again until it makes it and we get an ACK back
or the connection completely times out.
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-13 11:52 ` Robert Olsson
@ 2003-09-15 12:12 ` jamal
2003-09-15 13:45 ` Robert Olsson
0 siblings, 1 reply; 35+ messages in thread
From: jamal @ 2003-09-15 12:12 UTC (permalink / raw)
To: Robert Olsson
Cc: David S. Miller, greearb, jgarzik, scott.feldman, netdev,
ricardoz
On Sat, 2003-09-13 at 07:52, Robert Olsson wrote:
> >
> > I spoke with Alexey once about this, actually tx_queue_len can
> > be arbitrarily large but it should be reasonable nonetheless.
> >
> > Our preliminary conclusions were that values of 1000 for 100Mbit and
> > faster were probably appropriate. Maybe something larger for 1Gbit,
> > who knows.
If you recall we saw that even for the gent who was trying to do 100K
TCP sockets on a 4 way SMP, 1000 was sufficient and no packets were
dropped.
> >
> > We also determined that the only connection between TX descriptor
> > ring size and dev->tx_queue_len was that the latter should be large
> > enough to handle, at a minimum, the amount of pending TX descriptor
> > ACKs that can be pending considering mitigation et al.
> >
> > So if TX irq mitigation can defer up to N TX descriptor completions
> > then dev->tx_queue_len must be at least that large.
> >
> > Back to the main topic, maybe we should set dev->tx_queue_len to
> > 1000 by default for all ethernet devices.
>
> Hello!
>
> Yes, sounds like an adequate setting for GigE. This is what we use for production
> and lab, but rather than increasing dev->tx_queue_len to 1000 we replace the
> pfifo_fast with the pfifo qdisc, setting a qlen of 1000.
>
I think this may not be good for QoS reasons. You want BGP packets
to be given priority over ftp. A single queue kills that.
The current default 3-band queue is good enough, the only challenge
being that no one sees stats for it. I have a patch for the kernel at:
http://www.cyberus.ca/~hadi/patches/restore.pfifo.kernel
and for tc at:
http://www.cyberus.ca/~hadi/patches/restore.pfifo.tc
cheers,
jamal
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-15 12:12 ` jamal
@ 2003-09-15 13:45 ` Robert Olsson
2003-09-15 23:15 ` David S. Miller
0 siblings, 1 reply; 35+ messages in thread
From: Robert Olsson @ 2003-09-15 13:45 UTC (permalink / raw)
To: hadi
Cc: Robert Olsson, David S. Miller, greearb, jgarzik, scott.feldman,
netdev, ricardoz
jamal writes:
> I think this may not be good for QoS reasons. You want BGP packets
> to be given priority over ftp. A single queue kills that.
Well, so far a single queue has been robust enough for BGP sessions. Talking
from my own experience...
> The current default 3-band queue is good enough, the only challenge
> being that no one sees stats for it. I have a patch for the kernel at:
> http://www.cyberus.ca/~hadi/patches/restore.pfifo.kernel
> and for tc at:
> http://www.cyberus.ca/~hadi/patches/restore.pfifo.tc
Yes.
I've missed this. Our lazy work-around for the missing stats is to install
pfifo qdisc as said. IMO it should be included.
Cheers.
--ro
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-15 13:45 ` Robert Olsson
@ 2003-09-15 23:15 ` David S. Miller
2003-09-16 9:28 ` Robert Olsson
0 siblings, 1 reply; 35+ messages in thread
From: David S. Miller @ 2003-09-15 23:15 UTC (permalink / raw)
To: Robert Olsson
Cc: hadi, Robert.Olsson, greearb, jgarzik, scott.feldman, netdev,
ricardoz
On Mon, 15 Sep 2003 15:45:42 +0200
Robert Olsson <Robert.Olsson@data.slu.se> wrote:
> > The current default 3-band queue is good enough, the only challenge
> > being that no one sees stats for it. I have a patch for the kernel at:
> > http://www.cyberus.ca/~hadi/patches/restore.pfifo.kernel
> > and for tc at:
> > http://www.cyberus.ca/~hadi/patches/restore.pfifo.tc
>
> Yes.
> I've missed this. Our lazy work-around for the missing stats is to install
> pfifo qdisc as said. IMO it should be included.
I've included Jamal's pfifo_fast statistic patch, and the
change to increase ethernet's tx_queue_len to 1000 in all
of my trees.
Thanks.
* Re: [e1000 2.6 10/11] TxDescriptors -> 1024 default
2003-09-15 23:15 ` David S. Miller
@ 2003-09-16 9:28 ` Robert Olsson
0 siblings, 0 replies; 35+ messages in thread
From: Robert Olsson @ 2003-09-16 9:28 UTC (permalink / raw)
To: David S. Miller
Cc: kuznet, Robert Olsson, hadi, greearb, jgarzik, scott.feldman,
netdev, ricardoz
David S. Miller writes:
> > > http://www.cyberus.ca/~hadi/patches/restore.pfifo.kernel
> > > and for tc at:
> > > http://www.cyberus.ca/~hadi/patches/restore.pfifo.tc
>
> I've included Jamal's pfifo_fast statistic patch, and the
> change to increase ethernet's tx_queue_len to 1000 in all
> of my trees.
Thanks. We ask Alexey to include the tc part too.
Cheers.
--ro
* TxDescriptors -> 1024 default. Please not for every NIC!
2003-09-14 19:08 ` Ricardo C Gonzalez
2003-09-15 2:50 ` David Brownell
@ 2004-05-15 12:14 ` Marc Herbert
2004-05-19 9:30 ` Marc Herbert
1 sibling, 1 reply; 35+ messages in thread
From: Marc Herbert @ 2004-05-15 12:14 UTC (permalink / raw)
To: netdev
On Sun, 14 Sep 2003, Ricardo C Gonzalez wrote:
> David Miller wrote:
>
> >Back to the main topic, maybe we should set dev->tx_queue_len to
> >1000 by default for all ethernet devices.
>
>
> I definitely agree with setting the dev->tx_queue_len to 1000 as a default
> for all ethernet adapters. All adapters will benefit from this change.
>
<http://oss.sgi.com/projects/netdev/archive/2003-09/threads.html#00247>
Sorry to exhume this discussion but I only recently discovered this
change, the hard way.
I carefully read this old thread and did not grasp _every_ detail, but
there is one thing that I am sure of: 1000 packets @ 1 Gb/s looks
good, but on the other hand, 1000 full-size Ethernet packets @ 10 Mb/s
are about 1.2 seconds long!
Too little buffering means not enough damping effect, which is very
important for performance in asynchronous systems, granted. However,
_too much_ buffering means too big and too variable latencies. When
discussing buffers, duration is very often more important than size.
Applications, TCP's dynamics (and kernel dynamics too?) do not care
much about buffer sizes; they more often care about latencies (and
throughput, of course). Buffer sizes are often "just a small matter of
implementation" :-) For instance, people designing routers talk about
buffers in _milliseconds_ much more often than in _bytes_ (despite the
fact that their memories cost more than in hosts, considering the
throughputs involved).
100 packets @ 100 Mb/s was 12 ms. 1000 packets @ 1 Gb/s is still
12 ms. 12 ms is great. It's a "good" latency because it is the
order of magnitude of real-world constants like: comfortable
interactive applications, operating system scheduler granularity or
propagation time in 2000 km of cable.
But 1000 packets @ 100 Mb/s is 120 ms and is neither very good nor
very useful anymore. 1000 packets @ 10 Mb/s is 1.2 s, which is
ridiculous. It does mean that, when joe user is uploading some big
file through his cheap Ethernet card, and that there are no other
bottleneck/drops further in the network, every concurrent application
will have to wait 1.2 s before accessing the network!
If this is hard for you to believe, just make the test yourself, it's
very easy: force one of your NICs to 10Mb/s full duplex, txqueuelen
1000, and send a continuous flow to a nearby machine. Then try to ping
anything.
Imagine now that some packet is lost for whatever reason on some
_other_ TCP connection going through this terrible 1.2 s queue. Then
you need one SACK/RTX extra round trip time to recover from it: so
it's now _2.4 s_ to deliver the data sent just after the dropped
packet... Assuming of course TCP timers do not become confused by
this huge latency and probably huge jitter.
And I don't think you want to make fiddling with "tc" mandatory for
joe user. Or tell him: "oh, please just 'ifconfig txqueuelen 10', or
buy a new Ethernet card".
I am unfortunately not familiar with this part of the linux kernel,
but I really think that, if possible, txqueuelen should be initialized
at some "constant 12 ms" and not at the "1000 packets" highly variable
latency setting. I can imagine there are some corner cases, like for
instance when some GEth NIC is hot-plugged into a 100 Mb/s, or jumbo
frames, but hey, those are corner cases : as a first step, even a
simple constant-per-model txqueuelen initialization would be already
great.
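Something along these lines, perhaps (purely illustrative sketch, not a
tested patch; the helper name is made up):

/* Illustrative sketch: size txqueuelen for a ~12 ms latency budget
 * instead of a fixed packet count.  speed_mbps is the negotiated link
 * speed, mtu the interface MTU. */
static unsigned int example_txqueuelen_for_12ms(unsigned int speed_mbps,
						unsigned int mtu)
{
	unsigned int bytes_per_ms = speed_mbps * 125;	/* Mb/s -> bytes per ms */
	unsigned int pkts = (bytes_per_ms * 12) / mtu;	/* ~12 ms of full-size frames */

	return pkts ? pkts : 1;
}

With mtu = 1500 this gives 10, 100 and 1000 packets at 10 Mb/s, 100 Mb/s and
1 Gb/s respectively, i.e. the figures discussed above.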
Cheers,
Marc.
PS: one workaround for joe user against this 1.2s latency would be to
keep his SND_BUF and number of sockets small. But this is poor.
--
"Je n'ai fait cette lettre-ci plus longue que parce que je n'ai pas eu
le loisir de la faire plus courte." -- Blaise Pascal
* Re: TxDescriptors -> 1024 default. Please not for every NIC!
2004-05-15 12:14 ` TxDescriptors -> 1024 default. Please not for every NIC! Marc Herbert
@ 2004-05-19 9:30 ` Marc Herbert
2004-05-19 10:27 ` Pekka Pietikainen
2004-05-19 11:54 ` Andi Kleen
0 siblings, 2 replies; 35+ messages in thread
From: Marc Herbert @ 2004-05-19 9:30 UTC (permalink / raw)
To: netdev
On Sat, 15 May 2004, Marc Herbert wrote:
> <http://oss.sgi.com/projects/netdev/archive/2003-09/threads.html#00247>
>
> Sorry to exhume this discussion but I only recently discovered this
> change, the hard way.
>
> I am unfortunately not familiar with this part of the linux kernel,
> but I really think that, if possible, txqueuelen should be initialized
> at some "constant 12 ms" and not at the "1000 packets" highly variable
> latency setting. I can imagine there are some corner cases, like for
> instance when some GEth NIC is hot-plugged into a 100 Mb/s, or jumbo
> frames, but hey, those are corner cases : as a first step, even a
> simple constant-per-model txqueuelen initialization would be already
> great.
After some further study, I was glad to discover my suggestion above
both easy and short to implement. See patch below.
Trying to sum-it up:
- Ricardo asks (among others) for a new 1000-packet default
txqueuelen for Intel's e1000, based on some data (could not find
this data, please send me the pointer if you have it, thanks).
- Me argues that we all lived happy for ages with this default
setting of 100 packets @ 100 Mb/s (and lived approximately happy @
10 Mb/s), but we'll soon see doom and gloom with this new and
brutal change to 1000 packets for all this _legacy_ 10-100 Mb/s
hardware. e1000 data only is not enough to justify this radical
shift.
If you are convinced by _both_ items above, then the patch below
covers _both_, and we're done.
If you are not, then... wait for further discussion, including answers
to latest Ricardo's post.
PS: several people seem to think TCP "drops" packets when the qdisc is
full. My analysis of the code _and_ my experiments make me think they
are wrong: TCP rather "blocks" when the qdisc is full. See explanation
here: <http://oss.sgi.com/archives/netdev/2004-05/msg00151.html>
(Subject: Re: TcpOutSegs way too optimistic (netstat -s))
===== drivers/net/net_init.c 1.11 vs edited =====
--- 1.11/drivers/net/net_init.c Tue Sep 16 01:12:25 2003
+++ edited/drivers/net/net_init.c Wed May 19 11:05:34 2004
@@ -420,7 +420,10 @@
dev->hard_header_len = ETH_HLEN;
dev->mtu = 1500; /* eth_mtu */
dev->addr_len = ETH_ALEN;
- dev->tx_queue_len = 1000; /* Ethernet wants good queues */
+ dev->tx_queue_len = 100; /* This is a sensible generic default for
+ 100 Mb/s: about 12ms with 1500 full size packets.
+ Drivers should tune this depending on interface
+ specificities and settings */
memset(dev->broadcast,0xFF, ETH_ALEN);
===== drivers/net/e1000/e1000_main.c 1.56 vs edited =====
--- 1.56/drivers/net/e1000/e1000_main.c Tue Feb 3 01:43:42 2004
+++ edited/drivers/net/e1000/e1000_main.c Wed May 19 03:14:32 2004
@@ -400,6 +400,8 @@
err = -ENOMEM;
goto err_alloc_etherdev;
}
+
+ netdev->tx_queue_len = 1000;
SET_MODULE_OWNER(netdev);
* Re: TxDescriptors -> 1024 default. Please not for every NIC!
2004-05-19 9:30 ` Marc Herbert
@ 2004-05-19 10:27 ` Pekka Pietikainen
2004-05-20 14:11 ` Luis R. Rodriguez
2004-05-19 11:54 ` Andi Kleen
1 sibling, 1 reply; 35+ messages in thread
From: Pekka Pietikainen @ 2004-05-19 10:27 UTC (permalink / raw)
To: Marc Herbert; +Cc: netdev, prism54-devel
On Wed, May 19, 2004 at 11:30:28AM +0200, Marc Herbert wrote:
> - Me argues that we all lived happy for ages with this default
> setting of 100 packets @ 100 Mb/s (and lived approximately happy @
> 10 Mb/s), but we'll soon see doom and gloom with this new and
> brutal change to 1000 packets for all this _legacy_ 10-100 Mb/s
> hardware. e1000 data only is not enough to justify this radical
> shift.
>
> If you are convinced by _both_ items above, then the patch below
> covers _both_, and we're done.
>
> If you are not, then... wait for further discussion, including answers
> to latest Ricardo's post.
Not to mention that not all modern hardware is gigabit, current
2.6 seems to be setting txqueuelen of 1000 for 802.11 devices too (at least
my prism54), which might be causing major problems for me.
Well, I'm still trying to figure out whether it's txqueue or WEP that causes
all traffic to stop (with rx invalid crypt packets showing up in iwconfig
afterwards; the AP is a Linksys WRT54G in case it makes a difference) every now
and then until an ifdown / ifup. Tried both vanilla 2.6 prism54 and CVS
(which seems to have a reset-on-tx-timeout thing added), but if txqueue is
1000 that won't easily get triggered, will it?
It's been running for a few days just fine with txqueue = 100 and no WEP; if
it stays like that I'll start tweaking to find what exactly triggers it.
* Re: TxDescriptors -> 1024 default. Please not for every NIC!
2004-05-19 9:30 ` Marc Herbert
2004-05-19 10:27 ` Pekka Pietikainen
@ 2004-05-19 11:54 ` Andi Kleen
1 sibling, 0 replies; 35+ messages in thread
From: Andi Kleen @ 2004-05-19 11:54 UTC (permalink / raw)
To: Marc Herbert; +Cc: netdev
Marc Herbert <marc.herbert@free.fr> writes:
>
> PS: several people seem to think TCP "drops" packets when the qdisc is
> full. My analysis of the code _and_ my experiments makes me think they
> are wrong: TCP rather "blocks" when the qdisc is full. See explanation
> here: <http://oss.sgi.com/archives/netdev/2004-05/msg00151.html>
> (Subject: Re: TcpOutSegs way too optimistic (netstat -s))
This behaviour was only added relatively recently (in late 2.3.x timeframe)
I believe all the default queue lengths tunings were done before that.
So it would probably make sense to reevaluate/rebenchmark the default
queue lengths for various devices with the newer code.
-Andi
* Re: Re: TxDescriptors -> 1024 default. Please not for every NIC!
2004-05-19 10:27 ` Pekka Pietikainen
@ 2004-05-20 14:11 ` Luis R. Rodriguez
2004-05-20 16:38 ` [Prism54-devel] " Jean Tourrilhes
0 siblings, 1 reply; 35+ messages in thread
From: Luis R. Rodriguez @ 2004-05-20 14:11 UTC (permalink / raw)
To: Pekka Pietikainen; +Cc: Marc Herbert, netdev, prism54-devel, Jean Tourrilhes
On Wed, May 19, 2004 at 01:27:00PM +0300, Pekka Pietikainen wrote:
> On Wed, May 19, 2004 at 11:30:28AM +0200, Marc Herbert wrote:
> > - Me argues that we all lived happy for ages with this default
> > setting of 100 packets @ 100 Mb/s (and lived approximately happy @
> > 10 Mb/s), but we'll soon see doom and gloom with this new and
> > brutal change to 1000 packets for all this _legacy_ 10-100 Mb/s
> > hardware. e1000 data only is not enough to justify this radical
> > shift.
> >
> > If you are convinced by _both_ items above, then the patch below
> > covers _both_, and we're done.
> >
> > If you are not, then... wait for further discussion, including answers
> > to latest Ricardo's post.
>
> Not to mention that not all modern hardware is gigabit, current
> 2.6 seems to be setting txqueuelen of 1000 for 802.11 devices too (at least
> my prism54), which might be causing major problems for me.
Considering 802.11b's peak is at 11Mbit and standard 802.11g is at 54Mbit
(some manufacturers are using two channels and getting 108Mbit now) I'd
think we should stick at 100, as the patch proposes. Jean?
Luis
--
GnuPG Key fingerprint = 113F B290 C6D2 0251 4D84 A34A 6ADD 4937 E20A 525E
* Re: [Prism54-devel] Re: TxDescriptors -> 1024 default. Please not for every NIC!
2004-05-20 14:11 ` Luis R. Rodriguez
@ 2004-05-20 16:38 ` Jean Tourrilhes
2004-05-20 16:45 ` Tomasz Torcz
0 siblings, 1 reply; 35+ messages in thread
From: Jean Tourrilhes @ 2004-05-20 16:38 UTC (permalink / raw)
To: Pekka Pietikainen, Marc Herbert, netdev, prism54-devel
On Thu, May 20, 2004 at 10:11:11AM -0400, Luis R. Rodriguez wrote:
> On Wed, May 19, 2004 at 01:27:00PM +0300, Pekka Pietikainen wrote:
> > On Wed, May 19, 2004 at 11:30:28AM +0200, Marc Herbert wrote:
> > > - Me argues that we all lived happy for ages with this default
> > > setting of 100 packets @ 100 Mb/s (and lived approximately happy @
> > > 10 Mb/s), but we'll soon see doom and gloom with this new and
> > > brutal change to 1000 packets for all this _legacy_ 10-100 Mb/s
> > > hardware. e1000 data only is not enough to justify this radical
> > > shift.
> > >
> > > If you are convinced by _both_ items above, then the patch below
> > > covers _both_, and we're done.
> > >
> > > If you are not, then... wait for further discussion, including answers
> > > to latest Ricardo's post.
> >
> > Not to mention that not all modern hardware is gigabit, current
> > 2.6 seems to be setting txqueuelen of 1000 for 802.11 devices too (at least
> > my prism54), which might be causing major problems for me.
>
> Considering 802.11b's peak is at 11Mbit and standard 802.11g is at 54Mbit
> (some manufacturers are using two channels and getting 108Mbit now) I'd
> think we should stick at 100, as the patch proposes. Jean?
>
> Luis
I never like to have huge queues of buffers. It wastes memory
and degrades the latency, especially with competing sockets. In a
theoretical stable system, you don't need buffers (you run everything
synchronously); buffers are only needed to take care of the jitter in
real networks.
The real throughput of 802.11g is more like 30Mb/s (at
the TCP/IP level). However, wireless networks tend to have more jitter
(interference and contention). But wireless cards tend to have a fair
number of buffers in the hardware.
I personally would stick with 100. The IrDA stack runs
perfectly fine with 15 buffers at 4 Mb/s. If 100 is not enough, I
think the problem is not the number of buffers, but somewhere else.
For example, we might want to think about explicit socket callbacks
(like I did in IrDA).
But that's only personal opinions ;-)
Have fun...
Jean
* Re: [Prism54-devel] Re: TxDescriptors -> 1024 default. Please not for every NIC!
2004-05-20 16:38 ` [Prism54-devel] " Jean Tourrilhes
@ 2004-05-20 16:45 ` Tomasz Torcz
2004-05-20 17:13 ` zero copy TX in benchmarks was " Andi Kleen
0 siblings, 1 reply; 35+ messages in thread
From: Tomasz Torcz @ 2004-05-20 16:45 UTC (permalink / raw)
To: netdev
On Thu, May 20, 2004 at 09:38:11AM -0700, Jean Tourrilhes wrote:
> I personally would stick with 100. The IrDA stack runs
> perfectly fine with 15 buffers at 4 Mb/s. If 100 is not enough, I
> think the problem is not the number of buffers, but somewhere else.
I don't know how trollish or true that comment is:
http://bsd.slashdot.org/comments.pl?sid=106258&cid=9049422
but it suggests that Linux's stack, having no BSD-like mbuf functionality,
is not perfect for fast transmission. Maybe some network guru
can comment?
--
Tomasz Torcz ,,(...) today's high-end is tomorrow's embedded processor.''
zdzichu@irc.-nie.spam-.pl -- Mitchell Blank on LKML
* zero copy TX in benchmarks was Re: [Prism54-devel] Re: TxDescriptors -> 1024 default. Please not for every NIC!
2004-05-20 16:45 ` Tomasz Torcz
@ 2004-05-20 17:13 ` Andi Kleen
0 siblings, 0 replies; 35+ messages in thread
From: Andi Kleen @ 2004-05-20 17:13 UTC (permalink / raw)
To: Tomasz Torcz; +Cc: netdev
On Thu, May 20, 2004 at 06:45:16PM +0200, Tomasz Torcz wrote:
> On Thu, May 20, 2004 at 09:38:11AM -0700, Jean Tourrilhes wrote:
> > I personally would stick with 100. The IrDA stack runs
> > perfectly fine with 15 buffers at 4 Mb/s. If 100 is not enough, I
> > think the problem is not the number of buffers, but somewhere else.
Not sure why you post this to this thread? It has nothing to do
with the previous message.
>
> I don't know how trollish or true that comment is:
> http://bsd.slashdot.org/comments.pl?sid=106258&cid=9049422
Linux sk_buffs and BSD mbufs are not very different anymore today.
The BSD mbufs have been getting more sk_buff'ish over time,
and sk_buffs have grown some properties of mbufs. They both
have changed to optionally pass references to memory around instead of
always copying, which is what counts here.
> but it suggests that Linux's stack, having no BSD-like mbuf functionality,
> is not perfect for fast transmission. Maybe some network guru
> can comment?
I have not read all the details, but I suppose they used sendmsg()
instead of sendfile() for this test. NetBSD can use zero copy TX
in this case; Linux can only do so with sendfile, and sendmsg will copy.
Obviously Linux will be slower then, because a copy can cost quite
a lot of CPU. Or rather, it is not really the CPU cost that is the
problem here, but the bandwidth usage - very high speed networking is
essentially memory bandwidth limited, and copying over the CPU
adds additional bandwidth requirements to the memory subsystem.
There was an implementation of zero copy sendmsg() for Linux long ago,
but it was removed because it was fundamentally incompatible with good
SMP scaling, because it would require remote TLB flushes over possibly
many CPUs (if you search the archives of this list you will find
long threads about it). It would not be very hard to re-add (Linux
has all the low level infrastructure needed for it), but
it doesn't make sense. NetBSD may have the luxury of not caring
about MP scaling, but Linux doesn't.
The disadvantage of sendfile is that you can only transmit files
directly; if you want to transmit data directly out of a process's
address space you have to put it into a file mmap and sendfile
from there. This may be a bit inconvenient if the basic unit
of data in your program isn't files.
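Roughly, the two paths compare like this (a sketch, not from the original
mail; error handling omitted):

#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Sketch: the copying path vs. the zero-copy path described above.
 * 'sock' is a connected TCP socket, 'fd' an open file descriptor for
 * the data to transmit. */
static ssize_t tx_copying(int sock, const void *buf, size_t len)
{
	return send(sock, buf, len, 0);		/* user data is copied into the kernel */
}

static ssize_t tx_zero_copy(int sock, int fd, size_t count)
{
	off_t offset = 0;
	return sendfile(sock, fd, &offset, count);	/* page-cache pages go out by reference */
}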
There was a plan suggested to fix that (implement zero copy TX for
POSIX AIO instead of BSD sockets), which would not have this problem.
POSIX AIO has all the infrastructure to do zero copy IO without
problematic and slow TLB flushes. Just so far nobody implemented that.
In practice it is not too big an issue because many tuned servers
(your typical ftpd, httpd or samba server) use sendfile already.
-Andi