netdev.vger.kernel.org archive mirror
* Re: [ofa-general] Re: IPoIB forwarding
       [not found]           ` <6.1.2.0.2.20070427115435.13ea5ec0@mail.llnl.gov>
@ 2007-04-27 20:32             ` Rick Jones
  2007-04-27 22:26               ` Bryan Lawver
  0 siblings, 1 reply; 20+ messages in thread
From: Rick Jones @ 2007-04-27 20:32 UTC (permalink / raw)
  To: Bryan Lawver; +Cc: Linux Network Development list, Michael S. Tsirkin, general

Bryan Lawver wrote:
> You're right about the ipoib module not combining packets (I believed you
> without checking), but I checked nevertheless.  The ipoib_start_xmit
> routine is definitely handed a "double packet", which means that the IP
> NIC driver or the kernel is combining two packets into a single super
> jumbo packet.  This issue is irrespective of the IP MTU setting because
> I have set all interfaces to 9000, yet ipoib accepts and forwards this
> 17964-byte packet to the next IB node and on to the TCP stack, where it is
> never acknowledged.  This may not have come up in prior testing because
> I am using some of the fastest IP NICs, which have no trouble keeping up
> with or exceeding the bandwidth of the IB side.  This issue arises
> exactly every 8 packets...(ring buffer overrun??)
> 
> I will be at Sonoma for the next few days as many on this list will be.


Some NICs (esp 10G) support large receive offload - they coalesce TCP segments 
from the wire/fiber into larger ones they pass up the stack.  Perhaps that is 
happening here?
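
The sizes reported in this thread are at least consistent with two segments
being merged into one.  A minimal arithmetic sketch, assuming the 40 bytes
quoted for the IP header cover IP plus TCP headers without options, and
using the 4-byte IB header figure from the tcpdump description (the
function name coalesced_size is made up for illustration):

```shell
#!/bin/sh
# Sizes taken from the tcpdump description quoted in this thread.
mss=8960        # TCP MSS reported by Bryan
ip_tcp_hdr=40   # IP + TCP headers, no options (assumption)
ipoib_hdr=4     # IPoIB encapsulation header

# coalesced_size N: size of one packet carrying N merged segments.
# (Hypothetical helper, for illustration only.)
coalesced_size() {
    echo $(( $1 * mss + ip_tcp_hdr + ipoib_hdr ))
}

coalesced_size 1   # prints 9004
coalesced_size 2   # prints 17964
```

The second figure matches the 17964-byte packet handed to ipoib_start_xmit,
which is what one would expect if the merge happened before the packet
reached ipoib rather than inside it.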

I'm going to go out a bit on a limb, cross the streams, and include netdev, 
because I suspect that if a system is acting as an IP router, one doesn't want 
large receive offload enabled.  That may need some discussion in netdev - it may 
then require some changes to default settings or some documentation 
enhancements.  That or I'll learn that the stack is already dealing with the 
issue...

rick jones

> bryan
> 
> 
> 
> At 11:06 AM 4/26/2007, Michael S. Tsirkin wrote:
> 
>> > Quoting Bryan Lawver <lawver1@llnl.gov>:
>> > Subject: Re: IPoIB forwarding
>> >
>> > Here's a tcpdump of the same sequence.  The TCP MSS is 8960 and it 
>> appears
>> > that two payloads are queued at ipoib which combines them into a single
>> > 17920 payload with assumingly correct IP header (40) and IB header
>> > (4).  The application or TCP stack does not acknowledge this double 
>> packet
>> > ie. it does not ACK until each of the 8960 packets are resent
>> > individually.  Being an IB newbie, I am guessing this combining is
>> > allowable but may violate TCP protocol.
>>
>> IPoIB does nothing like this - it's just a network device so
>> it sends all packets out as is.
>>
>> -- 
>> MST
> 
> 

* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-27 20:32             ` [ofa-general] Re: IPoIB forwarding Rick Jones
@ 2007-04-27 22:26               ` Bryan Lawver
  2007-04-27 22:32                 ` Rick Jones
  2007-04-28  2:35                 ` parks
  0 siblings, 2 replies; 20+ messages in thread
From: Bryan Lawver @ 2007-04-27 22:26 UTC (permalink / raw)
  To: Rick Jones; +Cc: Linux Network Development list, Michael S. Tsirkin, general

I hit the IP NIC over the head with a hammer and turned off all offload 
features and I no longer get the super jumbo packet and I have symmetric 
performance.  This NIC supported "ethtool -K ethx tso/tx/rx/sg on/off" and 
I am not sure at this time which one I needed to whack but all off solved 
the problem.
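
One way to find out which offload mattered would be to switch them off one
at a time and retest after each.  The sketch below only prints the ethtool
commands rather than running them, so it is safe on any machine; the
function name offload_cmds and the interface name eth2 are placeholders,
and the feature keywords are the ones listed above.  Note that on some
drivers turning one feature off may switch off others that depend on it.

```shell
#!/bin/sh
# Print one "ethtool -K" command per offload feature so each can be
# tried on its own.  Nothing is executed; copy the lines as needed.
# (offload_cmds is a hypothetical helper, not part of ethtool.)
offload_cmds() {
    iface=$1
    for feature in rx tx sg tso; do
        echo "ethtool -K $iface $feature off"
    done
}

offload_cmds eth2
```

After each command, rerun the forwarding test, then turn the feature back
on with the same command and "on" before trying the next one.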

Thanks for listening and reinforcing my search process.

bryan


* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-27 22:26               ` Bryan Lawver
@ 2007-04-27 22:32                 ` Rick Jones
  2007-04-27 22:43                   ` Bryan Lawver
  2007-04-28  2:35                 ` parks
  1 sibling, 1 reply; 20+ messages in thread
From: Rick Jones @ 2007-04-27 22:32 UTC (permalink / raw)
  To: Bryan Lawver; +Cc: Michael S. Tsirkin, general, Linux Network Development list

Bryan Lawver wrote:
> I hit the IP NIC over the head with a hammer and turned off all offload 
> features and I no longer get the super jumbo packet and I have symmetric 
> performance.  This NIC supported "ethtool -K ethx tso/tx/rx/sg on/off" 
> and I am not sure at this time which one I needed to whack but all off 
> solved the problem.

Yeah, that does seem like a rather broad remedy, but I guess if it works... :)
And I suppose most of those offloads don't matter for a NIC being used in a router.

The only problem is we don't know if it worked because it slowed down the 10G
side or because it disabled LRO as a side effect. If I were to guess, of those
things listed, I'd guess that receive checksum offload (cko) would have that as
a side effect.

Just what sort of 10G NIC was this anyway?  With that knowledge we could 
probably narrow things down to a more specific modprobe setting, or maybe even 
an ethtool command, for some suitable revision of ethtool.

rick jones


* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-27 22:32                 ` Rick Jones
@ 2007-04-27 22:43                   ` Bryan Lawver
  2007-04-27 23:37                     ` Rick Jones
  0 siblings, 1 reply; 20+ messages in thread
From: Bryan Lawver @ 2007-04-27 22:43 UTC (permalink / raw)
  To: Rick Jones; +Cc: Linux Network Development list, Michael S. Tsirkin, general

I had so much debugging turned on that it was not the "slowing of the
traffic" but the "non-coalescing" that was the remedy.  The NIC is a
Myricom NIC and these are easy options to set.



* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-27 22:43                   ` Bryan Lawver
@ 2007-04-27 23:37                     ` Rick Jones
  2007-04-27 23:39                       ` David Miller
  2007-04-28  6:51                       ` [ofa-general] Re: IPoIB forwarding Bill Fink
  0 siblings, 2 replies; 20+ messages in thread
From: Rick Jones @ 2007-04-27 23:37 UTC (permalink / raw)
  To: Bryan Lawver; +Cc: Linux Network Development list, Michael S. Tsirkin, general

Bryan Lawver wrote:
> I had so much debugging turned on that it was not the "slowing of the
> traffic" but the "non-coalescing" that was the remedy.  The NIC is a
> Myricom NIC and these are easy options to set.

As chance would have it, I've played with some Myricom myri10ge NICs recently, 
and even disabled large receive offload during some netperf tests :)  It is a 
modprobe option.  Going back now to the driver source and the README I see :-)


<excerpt>
Troubleshooting
===============

Large Receive Offload (LRO) is enabled by default.  This will
interfere with forwarding TCP traffic.  If you plan to forward TCP
traffic (using the host with the Myri10GE NIC as a router or bridge),
you must disable LRO.  To disable LRO, load the myri10ge driver
with myri10ge_lro set to 0:

          # modprobe myri10ge myri10ge_lro=0

Alternatively, you can disable LRO at runtime by disabling
receive checksum offloading via ethtool:

    # ethtool -K eth2 rx off

</excerpt>

rick jones


* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-27 23:37                     ` Rick Jones
@ 2007-04-27 23:39                       ` David Miller
  2007-04-27 23:48                         ` Rick Jones
  2007-04-28  6:51                       ` [ofa-general] Re: IPoIB forwarding Bill Fink
  1 sibling, 1 reply; 20+ messages in thread
From: David Miller @ 2007-04-27 23:39 UTC (permalink / raw)
  To: rick.jones2; +Cc: lawver1, netdev, mst, general

From: Rick Jones <rick.jones2@hp.com>
Date: Fri, 27 Apr 2007 16:37:49 -0700

> Large Receive Offload (LRO) is enabled by default.  This will
> interfere with forwarding TCP traffic.  If you plan to forward TCP
> traffic (using the host with the Myri10GE NIC as a router or bridge),
> you must disable LRO.  To disable LRO, load the myri10ge driver
> with myri10ge_lro set to 0:

LRO should be disabled by default if the driver does this.  This is a
major and unacceptable bug.

Thanks for pointing this out Rick.


* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-27 23:39                       ` David Miller
@ 2007-04-27 23:48                         ` Rick Jones
  2007-04-27 23:52                           ` David Miller
  0 siblings, 1 reply; 20+ messages in thread
From: Rick Jones @ 2007-04-27 23:48 UTC (permalink / raw)
  To: David Miller; +Cc: lawver1, netdev, mst, general

David Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
> Date: Fri, 27 Apr 2007 16:37:49 -0700
> 
> 
>>Large Receive Offload (LRO) is enabled by default.  This will
>>interfere with forwarding TCP traffic.  If you plan to forward TCP
>>traffic (using the host with the Myri10GE NIC as a router or bridge),
>>you must disable LRO.  To disable LRO, load the myri10ge driver
>>with myri10ge_lro set to 0:
> 
> 
> LRO should be disabled by default if the driver does this.  This is a
> major and unacceptable bug.
> 
> Thanks for pointing this out Rick.

No problem - just to play whatif/devil's advocate for a bit though... is there 
any way to tie that in with the setting of net.ipv4.ip_forward (and/or its IPv6 
counterpart)?
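
To make the idea concrete, here is a rough sketch of a user-space check for
the combination in question: forwarding enabled while LRO is on.  It
assumes an ethtool recent enough that "ethtool -k" prints a
"large-receive-offload:" line, which may well not be true of the tools
discussed in this thread; the function name lro_conflict is made up.

```shell
#!/bin/sh
# lro_conflict FORWARD
#   FORWARD = value of net.ipv4.ip_forward (0 or 1)
#   stdin   = output of "ethtool -k IFACE" (assumed to contain a
#             "large-receive-offload: on|off" line)
# Prints "conflict" when forwarding is on and LRO is on, else "ok".
# (Hypothetical helper, for illustration only.)
lro_conflict() {
    forward=$1
    if [ "$forward" = "1" ] && grep -q '^large-receive-offload: on'; then
        echo "conflict"
    else
        echo "ok"
    fi
}

# Self-contained demonstration with canned input:
printf 'large-receive-offload: on\n' | lro_conflict 1    # prints "conflict"

# On a real system one might run (eth2 is a placeholder):
#   ethtool -k eth2 | lro_conflict "$(cat /proc/sys/net/ipv4/ip_forward)"
```

This only reports the situation; whether the kernel or the driver should
react to it is the open question.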

rick jones


* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-27 23:48                         ` Rick Jones
@ 2007-04-27 23:52                           ` David Miller
  2007-04-30 17:16                             ` Rick Jones
  0 siblings, 1 reply; 20+ messages in thread
From: David Miller @ 2007-04-27 23:52 UTC (permalink / raw)
  To: rick.jones2; +Cc: lawver1, mst, general, netdev

From: Rick Jones <rick.jones2@hp.com>
Date: Fri, 27 Apr 2007 16:48:00 -0700

> No problem - just to play whatif/devil's advocate for a bit
> though... is there any way to tie that in with the setting of
> net.ipv4.ip_forward (and/or its IPv6 counterpart)?

Even ignoring that, consider the potential issues this
kind of problem could be causing netfilter.


* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-27 22:26               ` Bryan Lawver
  2007-04-27 22:32                 ` Rick Jones
@ 2007-04-28  2:35                 ` parks
  1 sibling, 0 replies; 20+ messages in thread
From: parks @ 2007-04-28  2:35 UTC (permalink / raw)
  To: Bryan Lawver, Rick Jones
  Cc: Linux Network Development list, Michael S. Tsirkin, general




If you are using the node as a router with the Myrinet NIC, then
there is something we had to turn off. It was causing panics on
Roadrunner. It is spelled out explicitly in the Myrinet README.
It combines packets.

I can tell you more Monday.



* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-27 23:37                     ` Rick Jones
  2007-04-27 23:39                       ` David Miller
@ 2007-04-28  6:51                       ` Bill Fink
  2007-04-29 19:40                         ` Loic Prylli
  2007-04-30 17:07                         ` Rick Jones
  1 sibling, 2 replies; 20+ messages in thread
From: Bill Fink @ 2007-04-28  6:51 UTC (permalink / raw)
  To: Rick Jones
  Cc: Bryan Lawver, Linux Network Development list, Michael S. Tsirkin,
	general

On Fri, 27 Apr 2007, Rick Jones wrote:

> Bryan Lawver wrote:
> > I had so much debugging turned on that it was not the "slowing of the
> > traffic" but the "non-coalescing" that was the remedy.  The NIC is a
> > Myricom NIC and these are easy options to set.
> 
> As chance would have it, I've played with some Myricom myri10ge NICs recently, 
> and even disabled large receive offload during some netperf tests :)  It is a 
> modprobe option.  Going back now to the driver source and the README I see :-)
> 
> 
> <excerpt>
> Troubleshooting
> ===============
> 
> Large Receive Offload (LRO) is enabled by default.  This will
> interfere with forwarding TCP traffic.  If you plan to forward TCP
> traffic (using the host with the Myri10GE NIC as a router or bridge),
> you must disable LRO.  To disable LRO, load the myri10ge driver
> with myri10ge_lro set to 0:
> 
>           # modprobe myri10ge myri10ge_lro=0
> 
> Alternatively, you can disable LRO at runtime by disabling
> receive checksum offloading via ethtool:
> 
>     # ethtool -K eth2 rx off
> 
> </excerpt>
> 
> rick jones

What version of the myri10ge driver is this?  With the 1.2.0 version
that comes with the 2.6.20.7 kernel, there is no myri10ge_lro module
parameter.

[root@lang2 ~]# modinfo myri10ge | grep -i lro
[root@lang2 ~]# 

And I've been testing IP forwarding using two Myricom 10-GigE NICs
without setting any special modprobe parameters.
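
The modinfo check above can be wrapped so a script can test for a
parameter before passing it to modprobe.  This is a small sketch, assuming
modinfo prints parameters as "parm:" followed by whitespace and
"name:description"; the helper name has_module_param and the sample
parameter are invented for the example.

```shell
#!/bin/sh
# has_module_param NAME
#   stdin = output of "modinfo MODULE"
# Succeeds (exit 0) if a "parm:" line for NAME is present.
# (Hypothetical helper, for illustration only.)
has_module_param() {
    grep -q "^parm:[[:space:]]*$1:"
}

# Canned input with a made-up parameter, so this runs anywhere:
if printf 'parm:           example_param:hypothetical description (int)\n' |
        has_module_param example_param; then
    echo "parameter present"
fi

# On a real system one might run:
#   modinfo myri10ge | has_module_param myri10ge_lro || echo "no such parameter"
```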

						-Bill


* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-28  6:51                       ` [ofa-general] Re: IPoIB forwarding Bill Fink
@ 2007-04-29 19:40                         ` Loic Prylli
  2007-04-30 21:12                           ` Rick Jones
  2007-04-30 17:07                         ` Rick Jones
  1 sibling, 1 reply; 20+ messages in thread
From: Loic Prylli @ 2007-04-29 19:40 UTC (permalink / raw)
  To: Bill Fink
  Cc: Bryan Lawver, Linux Network Development list, Michael S. Tsirkin,
	general



On 4/28/2007 2:51 AM, Bill Fink wrote:
> On Fri, 27 Apr 2007, Rick Jones wrote:
>
>   
>> Bryan Lawver wrote:
>>     
>>> I had so much debugging turned on that it was not the "slowing of the
>>> traffic" but the "non-coalescing" that was the remedy.  The NIC is a
>>> Myricom NIC and these are easy options to set.
>>>       
>> As chance would have it, I've played with some Myricom myri10ge NICs recently, 
>> and even disabled large receive offload during some netperf tests :)  It is a 
>> modprobe option.  Going back now to the driver source and the README I see :-)
>>
>>
>> [..]
>>
>> rick jones
>>     
>
> What version of the myri10ge driver is this?  With the 1.2.0 version
> that comes with the 2.6.20.7 kernel, there is no myri10ge_lro module
> parameter.
>
>   


The myri10ge_lro parameter does not exist in the kernel tree. The
option and the corresponding LRO code are available only in the externally
distributed version of myri10ge. That code was submitted to the netdev
list, but wasn't taken into the kernel tree because of the reasonable
concern that the driver might not be the right place for that code (if
nobody else proposes something equivalent in the meantime, we might at some
point resubmit it as a driver-independent add-on, but it might not be
that soon for manpower reasons).

Only the 1.2.0 version of the external driver makes LRO incompatible
with forwarding. The problem should be fixed in version 1.3.0, released a
few weeks ago (forwarding with myri10ge_lro enabled should then work);
let us know otherwise.

Anyway, following David Miller's remark about netfilter, for the next
version we might ask the user to explicitly enable LRO rather than
making it the default.

Sorry for the inconvenience.


Loic


* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-28  6:51                       ` [ofa-general] Re: IPoIB forwarding Bill Fink
  2007-04-29 19:40                         ` Loic Prylli
@ 2007-04-30 17:07                         ` Rick Jones
  2007-05-01  5:57                           ` Bill Fink
  1 sibling, 1 reply; 20+ messages in thread
From: Rick Jones @ 2007-04-30 17:07 UTC (permalink / raw)
  To: Bill Fink
  Cc: Bryan Lawver, Linux Network Development list, Michael S. Tsirkin,
	general

> What version of the myri10ge driver is this?  With the 1.2.0 version
> that comes with the 2.6.20.7 kernel, there is no myri10ge_lro module
> parameter.
> 
> [root@lang2 ~]# modinfo myri10ge | grep -i lro
> [root@lang2 ~]# 
> 
> And I've been testing IP forwarding using two Myricom 10-GigE NICs
> without setting any special modprobe parameters.


Running ethtool -i on the interface reports 1.2.0 as the driver version.


* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-27 23:52                           ` David Miller
@ 2007-04-30 17:16                             ` Rick Jones
  2007-05-01 22:43                               ` [PATCH] make myri10ge use default MTU of 1500 bytes Loic Prylli
  0 siblings, 1 reply; 20+ messages in thread
From: Rick Jones @ 2007-04-30 17:16 UTC (permalink / raw)
  To: David Miller; +Cc: lawver1, netdev, mst, general

David Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
> Date: Fri, 27 Apr 2007 16:48:00 -0700
> 
> 
>>No problem - just to play whatif/devil's advocate for a bit
>>though... is there any way to tie that in with the setting of
>>net.ipv4.ip_forward (and/or its IPv6 counterpart)?
> 
> 
> Even ignoring that, consider the potential issues this
> kind of problem could be causing netfilter.


OK, I'll show my ignorance and bite - what sort of issues with 
netfilter?  Is it tied to link-local MTUs?

rick jones


* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-29 19:40                         ` Loic Prylli
@ 2007-04-30 21:12                           ` Rick Jones
  2007-05-01 22:05                             ` Loic Prylli
  0 siblings, 1 reply; 20+ messages in thread
From: Rick Jones @ 2007-04-30 21:12 UTC (permalink / raw)
  To: Loic Prylli
  Cc: Bryan Lawver, Linux Network Development list, Bill Fink,
	Michael S. Tsirkin, general

> Only the 1.2.0 version of the external driver makes LRO incompatible
> with forwarding. The problem should be fixed in version 1.3.0, released a
> few weeks ago (forwarding with myri10ge_lro enabled should then work);
> let us know otherwise.
>
> Anyway, following David Miller's remark about netfilter, for the next
> version we might ask the user to explicitly enable LRO rather than
> making it the default.

Speaking of defaults, it would seem that the external 1.2.0 driver comes 
with 9000 bytes as the default MTU?  At least I think that is what I am 
seeing now that I've started looking more closely.

rick jones


* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-30 17:07                         ` Rick Jones
@ 2007-05-01  5:57                           ` Bill Fink
  2007-05-01 16:26                             ` Loic Prylli
  0 siblings, 1 reply; 20+ messages in thread
From: Bill Fink @ 2007-05-01  5:57 UTC (permalink / raw)
  To: Rick Jones
  Cc: Bryan Lawver, Michael S. Tsirkin, general,
	Linux Network Development list

On Mon, 30 Apr 2007, Rick Jones wrote:

> > What version of the myri10ge driver is this?  With the 1.2.0 version
> > that comes with the 2.6.20.7 kernel, there is no myri10ge_lro module
> > parameter.
> > 
> > [root@lang2 ~]# modinfo myri10ge | grep -i lro
> > [root@lang2 ~]# 
> > 
> > And I've been testing IP forwarding using two Myricom 10-GigE NICs
> > without setting any special modprobe parameters.
> 
> 
> Ethtool -i on the interface reports 1.2.0 as the driver version.

Perhaps it would be useful to have different version strings for
the in-kernel Linux version and the Myricom externally provided
version.  Just a thought.

						-Bill


* Re: [ofa-general] Re: IPoIB forwarding
  2007-05-01  5:57                           ` Bill Fink
@ 2007-05-01 16:26                             ` Loic Prylli
  0 siblings, 0 replies; 20+ messages in thread
From: Loic Prylli @ 2007-05-01 16:26 UTC (permalink / raw)
  To: Bill Fink
  Cc: Bryan Lawver, Linux Network Development list, Michael S. Tsirkin,
	general

On 5/1/2007 1:57 AM, Bill Fink wrote:
> On Mon, 30 Apr 2007, Rick Jones wrote:
>
>   
>> Ethtool -i on the interface reports 1.2.0 as the driver version.
>>     
>
> Perhaps it would be useful to have different version strings for
> the in-kernel Linux version and the Myricom externally provided
> version.  Just a thought.
>   


Indeed, and that is the case as of the March-21 git tree (or any myri10ge 
version >= 1.3.0). The in-kernel version will show something like 
1.3.0-1.226; the external version will show only 1.3.0.
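
As an illustration of the naming convention described above, a small C
sketch (not taken from the driver; the helper name is hypothetical) might
tell the two builds apart by looking for a suffix after the base version.
It assumes the in-kernel string always has the form base-suffix, as in the
1.3.0-1.226 example, and it cannot distinguish the two 1.2.0 builds, which
both report plain 1.2.0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the myri10ge driver.
 *
 * Guess whether a driver version string (as reported by `ethtool -i`)
 * comes from the in-kernel driver. Assumed convention, based only on the
 * example in this thread: in-kernel versions carry a "-<suffix>" after
 * the base version ("1.3.0-1.226"); external versions do not ("1.3.0").
 */
bool version_looks_in_kernel(const char *version)
{
    const char *dash;

    if (version == NULL)
        return false;

    dash = strchr(version, '-');

    /* Require a non-empty base before the dash and a non-empty suffix. */
    return dash != NULL && dash != version && dash[1] != '\0';
}
```

Under these assumptions the helper returns true for "1.3.0-1.226" and
false for "1.3.0"; for "1.2.0" it returns false whichever driver is loaded.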


Loic


* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-30 21:12                           ` Rick Jones
@ 2007-05-01 22:05                             ` Loic Prylli
  2007-05-01 22:12                               ` Rick Jones
  2007-05-03 23:37                               ` Bryan Lawver
  0 siblings, 2 replies; 20+ messages in thread
From: Loic Prylli @ 2007-05-01 22:05 UTC (permalink / raw)
  To: Rick Jones
  Cc: Bryan Lawver, Linux Network Development list, Bill Fink, mst,
	general

On 4/30/2007 2:12 PM, Rick Jones wrote:
>
> Speaking of defaults, it would seem that the external 1.2.0 driver 
> comes with 9000 bytes as the default MTU?  At least I think that is 
> what I am seeing now that I've started looking more closely.
>
> rick jones


That's the same for the in-kernel-tree code (9K MTU by default). 
Assuming this is not wanted, I will submit a patch for that.


Loic


* Re: [ofa-general] Re: IPoIB forwarding
  2007-05-01 22:05                             ` Loic Prylli
@ 2007-05-01 22:12                               ` Rick Jones
  2007-05-03 23:37                               ` Bryan Lawver
  1 sibling, 0 replies; 20+ messages in thread
From: Rick Jones @ 2007-05-01 22:12 UTC (permalink / raw)
  To: Loic Prylli
  Cc: Bryan Lawver, Linux Network Development list, Bill Fink, mst,
	general

Loic Prylli wrote:
> On 4/30/2007 2:12 PM, Rick Jones wrote:
> 
>>
>> Speaking of defaults, it would seem that the external 1.2.0 driver 
>> comes with 9000 bytes as the default MTU?  At least I think that is 
>> what I am seeing now that I've started looking more closely.
>>
>> rick jones
> 
> 
> 
> That's the same for the in-kernel-tree code (9K MTU by default). 
> Assuming this is not wanted, I will submit a patch for that.

While I like what that does for performance, and at the risk of putting 
words into the mouths of netdev, I suspect that 1500 bytes is indeed the 
desired default.  It matches the IEEE specs, I've yet to see a switch 
that enables "Jumbo Frames" by default, and not everything out there even 
agrees that Jumbo Frames means a 9000 byte MTU, etc etc etc.  I think 
that 1500 bytes for an "Ethernet" device remains in line with the 
principle of least surprise.

rick jones


* [PATCH] make myri10ge use default MTU of 1500 bytes
  2007-04-30 17:16                             ` Rick Jones
@ 2007-05-01 22:43                               ` Loic Prylli
  0 siblings, 0 replies; 20+ messages in thread
From: Loic Prylli @ 2007-05-01 22:43 UTC (permalink / raw)
  To: netdev; +Cc: Rick Jones, David Miller

Change default MTU from jumbo (9000) to standard (1500) for myri10ge


Signed-off-by: Loic Prylli <loic@myri.com>
---
 drivers/net/myri10ge/myri10ge.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
index 16e3c43..0e9cc17 100644
--- a/drivers/net/myri10ge/myri10ge.c
+++ b/drivers/net/myri10ge/myri10ge.c
@@ -252,7 +252,7 @@ module_param(myri10ge_force_firmware, int, S_IRUGO);
 MODULE_PARM_DESC(myri10ge_force_firmware,
                 "Force firmware to assume aligned completions\n");
 
-static int myri10ge_initial_mtu = MYRI10GE_MAX_ETHER_MTU - ETH_HLEN;
+static int myri10ge_initial_mtu  = 1500;
 module_param(myri10ge_initial_mtu, int, S_IRUGO);
 MODULE_PARM_DESC(myri10ge_initial_mtu, "Initial MTU\n");
 
-- 
1.5.0.1
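
For reference, the arithmetic behind the old and new defaults can be
sketched in standalone C. ETH_HLEN is 14 in the kernel headers; the value
9014 for MYRI10GE_MAX_ETHER_MTU is an assumption here, inferred from the
9000-byte default reported earlier in the thread. The SKETCH_ names and
the helper functions are hypothetical and exist only for this example.

```c
/* Ethernet header length: 6 (dst) + 6 (src) + 2 (ethertype) bytes. */
#define SKETCH_ETH_HLEN       14

/*
 * ASSUMPTION: stands in for MYRI10GE_MAX_ETHER_MTU. 9014 is inferred from
 * the 9000-byte default MTU discussed in the thread, not read from the
 * driver source.
 */
#define SKETCH_MAX_ETHER_MTU  9014

/* Default before the patch: largest frame minus the Ethernet header. */
int old_default_mtu(void)
{
    return SKETCH_MAX_ETHER_MTU - SKETCH_ETH_HLEN;
}

/* Default after the patch: the standard Ethernet MTU. */
int new_default_mtu(void)
{
    return 1500;
}

/* On-wire frame size for a given MTU, header included, FCS excluded. */
int frame_bytes_for_mtu(int mtu)
{
    return mtu + SKETCH_ETH_HLEN;
}
```

Since the patch leaves myri10ge_initial_mtu as a module parameter, a jumbo
MTU can presumably still be requested at module load time where the rest
of the network supports it.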





* Re: [ofa-general] Re: IPoIB forwarding
  2007-05-01 22:05                             ` Loic Prylli
  2007-05-01 22:12                               ` Rick Jones
@ 2007-05-03 23:37                               ` Bryan Lawver
  1 sibling, 0 replies; 20+ messages in thread
From: Bryan Lawver @ 2007-05-03 23:37 UTC (permalink / raw)
  To: Loic Prylli, Rick Jones
  Cc: Bill Fink, Linux Network Development list, mst, general

I have been able to install and use the 1.3.0 Myricom driver, and everything 
works as I expected; performance is pretty decent.  Interesting little 
side tour through various drivers... The router node sees almost no load, 
which is really encouraging.

Thanks,
bryan

At 03:05 PM 5/1/2007, Loic Prylli wrote:
>On 4/30/2007 2:12 PM, Rick Jones wrote:
>>
>>Speaking of defaults, it would seem that the external 1.2.0 driver comes 
>>with 9000 bytes as the default MTU?  At least I think that is what I am 
>>seeing now that I've started looking more closely.
>>
>>rick jones
>
>
>That's the same for the in-kernel-tree code (9K MTU by default). Assuming 
>this is not wanted, I will submit a patch for that.
>
>
>Loic



