* Re: [ofa-general] Re: IPoIB forwarding
       [not found] ` <6.1.2.0.2.20070427115435.13ea5ec0@mail.llnl.gov>
@ 2007-04-27 20:32 ` Rick Jones
  2007-04-27 22:26   ` Bryan Lawver
  0 siblings, 1 reply; 20+ messages in thread
From: Rick Jones @ 2007-04-27 20:32 UTC (permalink / raw)
  To: Bryan Lawver; +Cc: Linux Network Development list, Michael S. Tsirkin, general

Bryan Lawver wrote:
> You're right about the ipoib module not combining packets (I believed you
> without checking) but I checked nevertheless.  The ipoib_start_xmit
> routine is definitely handed a "double packet" which means that the IP
> NIC driver or the kernel is combining two packets into a single super
> jumbo packet.  This issue is irrespective of the IP MTU setting because
> I have set all interfaces to 9000 yet ipoib accepts and forwards this
> 17964 packet to the next IB node and onto the TCP stack where it is
> never acknowledged.  This may not have come up in prior testing because
> I am using some of the fastest IP NICs which have no trouble keeping up
> with or exceeding the bandwidth of the IB side.  This issue arises
> exactly every 8 packets...(ring buffer overrun??)
>
> I will be at Sonoma for the next few days as many on this list will be.

Some NICs (esp 10G) support large receive offload - they coalesce TCP
segments from the wire/fiber into larger ones they pass up the stack.
Perhaps that is happening here?

I'm going to go out a bit on a limb, cross the streams, and include netdev,
because I suspect that if a system is acting as an IP router, one doesn't
want large receive offload enabled.  That may need some discussion in
netdev - it may then require some changes to default settings or some
documentation enhancements.  That or I'll learn that the stack is already
dealing with the issue...

rick jones

> bryan
>
> At 11:06 AM 4/26/2007, Michael S. Tsirkin wrote:
>
>> > Quoting Bryan Lawver <lawver1@llnl.gov>:
>> > Subject: Re: IPoIB forwarding
>> >
>> > Here's a tcpdump of the same sequence.  The TCP MSS is 8960 and it
>> > appears that two payloads are queued at ipoib which combines them into
>> > a single 17920 payload with assumingly correct IP header (40) and IB
>> > header (4).  The application or TCP stack does not acknowledge this
>> > double packet, i.e. it does not ACK until each of the 8960 packets is
>> > resent individually.  Being an IB newbie, I am guessing this combining
>> > is allowable but may violate TCP protocol.
>>
>> IPoIB does nothing like this - it's just a network device so
>> it sends all packets out as is.
>>
>> --
>> MST
>
> _______________________________________________
> general mailing list
> general@lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general

^ permalink raw reply	[flat|nested] 20+ messages in thread
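The sizes quoted in this message are consistent with two full segments merged under one set of headers. The sketch below only redoes that arithmetic; reading the "IP header (40)" as 20 bytes of IPv4 plus 20 bytes of TCP is an assumption, and the function name is made up for illustration.

```python
# Sizes quoted in the thread; the breakdown is an interpretation, not a capture.
MSS = 8960            # TCP payload per segment, from the tcpdump
IP_TCP_HEADERS = 40   # the "IP header (40)": assumed 20 bytes IPv4 + 20 bytes TCP
IPOIB_HEADER = 4      # the "IB header (4)" mentioned in the thread


def coalesced_size(segments: int) -> int:
    """Bytes handed to the IPoIB transmit routine if `segments` TCP
    segments are merged under a single set of headers."""
    return segments * MSS + IP_TCP_HEADERS + IPOIB_HEADER


print(coalesced_size(1))  # 9004
print(coalesced_size(2))  # 17964
```

With two segments this gives the 17964-byte packet seen at ipoib_start_xmit, and the payload part (2 * 8960) is the 17920 shown by tcpdump.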
* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-27 20:32 ` [ofa-general] Re: IPoIB forwarding Rick Jones
@ 2007-04-27 22:26   ` Bryan Lawver
  2007-04-27 22:32     ` Rick Jones
  2007-04-28 2:35     ` parks
  0 siblings, 2 replies; 20+ messages in thread
From: Bryan Lawver @ 2007-04-27 22:26 UTC (permalink / raw)
  To: Rick Jones; +Cc: Linux Network Development list, Michael S. Tsirkin, general

I hit the IP NIC over the head with a hammer and turned off all offload
features and I no longer get the super jumbo packet and I have symmetric
performance.  This NIC supported "ethtool -K ethx tso/tx/rx/sg on/off" and I
am not sure at this time which one I needed to whack but all off solved the
problem.

Thanks for listening and reinforcing my search process.

bryan

At 01:32 PM 4/27/2007, Rick Jones wrote:
>Some NICs (esp 10G) support large receive offload - they coalesce TCP
>segments from the wire/fiber into larger ones they pass up the
>stack.  Perhaps that is happening here?
>
>I'm going to go out a bit on a limb, cross the streams, and include
>netdev, because I suspect that if a system is acting as an IP router, one
>doesn't want large receive offload enabled.  That may need some discussion
>in netdev - it may then require some changes to default settings or some
>documentation enhancements.  That or I'll learn that the stack is already
>dealing with the issue...
>
>rick jones

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-27 22:26   ` Bryan Lawver
@ 2007-04-27 22:32     ` Rick Jones
  2007-04-27 22:43       ` Bryan Lawver
  2007-04-28 2:35     ` parks
  1 sibling, 1 reply; 20+ messages in thread
From: Rick Jones @ 2007-04-27 22:32 UTC (permalink / raw)
  To: Bryan Lawver; +Cc: Michael S. Tsirkin, general, Linux Network Development list

Bryan Lawver wrote:
> I hit the IP NIC over the head with a hammer and turned off all offload
> features and I no longer get the super jumbo packet and I have symmetric
> performance.  This NIC supported "ethtool -K ethx tso/tx/rx/sg on/off"
> and I am not sure at this time which one I needed to whack but all off
> solved the problem.

Yeah, that does seem like a rather broad remedy, but I guess if it
works... :)  And I suppose most of those offloads don't matter for a NIC
being used in a router.

Only problem is we don't know if it worked because it slowed down the 10G
side or because it had LRO disabling as a side effect.  If I were to guess,
of those things listed, I'd guess that receive cko would have that as a
side effect.

Just what sort of 10G NIC was this anyway?  With that knowledge we could
probably narrow things down to a more specific modprobe setting, or maybe
even an ethtool command, for some suitable revision of ethtool.

rick jones

>
> Thanks for listening and reinforcing my search process.
>
> bryan

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-27 22:32     ` Rick Jones
@ 2007-04-27 22:43       ` Bryan Lawver
  2007-04-27 23:37         ` Rick Jones
  0 siblings, 1 reply; 20+ messages in thread
From: Bryan Lawver @ 2007-04-27 22:43 UTC (permalink / raw)
  To: Rick Jones; +Cc: Linux Network Development list, Michael S. Tsirkin, general

I had so much debugging turned on that it was not the "slowing of the
traffic" but the "non-coalescing" that was the remedy.  The NIC is a
Myricom NIC and these are easy options to set.

At 03:32 PM 4/27/2007, Rick Jones wrote:
>Yeah, that does seem like a rather broad remedy, but I guess if it
>works... :)  And I suppose most of those offloads don't matter for a NIC
>being used in a router.
>
>Only problem is we don't know if it worked because it slowed down the 10G
>side or because it had LRO disabling as a side effect.  If I were to guess,
>of those things listed, I'd guess that receive cko would have that as a
>side effect.
>
>Just what sort of 10G NIC was this anyway?  With that knowledge we could
>probably narrow things down to a more specific modprobe setting, or maybe
>even an ethtool command, for some suitable revision of ethtool.
>
>rick jones

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-27 22:43       ` Bryan Lawver
@ 2007-04-27 23:37         ` Rick Jones
  2007-04-27 23:39           ` David Miller
  2007-04-28 6:51           ` [ofa-general] Re: IPoIB forwarding Bill Fink
  0 siblings, 2 replies; 20+ messages in thread
From: Rick Jones @ 2007-04-27 23:37 UTC (permalink / raw)
  To: Bryan Lawver; +Cc: Linux Network Development list, Michael S. Tsirkin, general

Bryan Lawver wrote:
> I had so much debugging turned on that it was not the "slowing of the
> traffic" but the "non-coalescing" that was the remedy.  The NIC is a
> Myricom NIC and these are easy options to set.

As chance would have it, I've played with some Myricom myri10ge NICs
recently, and even disabled large receive offload during some netperf
tests :)  It is a modprobe option.  Going back now to the driver source and
the README I see :-)

<excerpt>
Troubleshooting
===============

Large Receive Offload (LRO) is enabled by default.  This will
interfere with forwarding TCP traffic.  If you plan to forward TCP
traffic (using the host with the Myri10GE NIC as a router or bridge),
you must disable LRO.  To disable LRO, load the myri10ge driver
with myri10ge_lro set to 0:

    # modprobe myri10ge myri10ge_lro=0

Alternatively, you can disable LRO at runtime by disabling
receive checksum offloading via ethtool:

    # ethtool -K eth2 rx off

</excerpt>

rick jones

^ permalink raw reply	[flat|nested] 20+ messages in thread
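To confirm which offloads are actually on after following the README, one option is to read back the feature list that `ethtool -k <iface>` prints. The sketch below parses such a listing; the sample text is an assumption modelled on the output of recent ethtool releases (the ethtool of 2007 may not report LRO at all), and `eth2` is simply the interface name used in the README excerpt.

```python
def offload_settings(ethtool_k_output: str) -> dict[str, bool]:
    """Map each feature name in `ethtool -k` style output to on/off."""
    settings = {}
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue  # skips the "Features for ethX:" header and blank lines
        # Values look like "on", "off" or "off [fixed]"; keep the first word.
        settings[name.strip()] = value.split()[0] == "on"
    return settings


# Assumed sample output, not captured from the hardware in this thread.
SAMPLE = """\
Features for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
large-receive-offload: on
"""

features = offload_settings(SAMPLE)
print(features["large-receive-offload"])  # True
```

On a real router the text would come from running `ethtool -k` on the receiving interface; a `True` for large-receive-offload is the condition the README warns about.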
* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-27 23:37         ` Rick Jones
@ 2007-04-27 23:39           ` David Miller
  2007-04-27 23:48             ` Rick Jones
  2007-04-28 6:51           ` [ofa-general] Re: IPoIB forwarding Bill Fink
  1 sibling, 1 reply; 20+ messages in thread
From: David Miller @ 2007-04-27 23:39 UTC (permalink / raw)
  To: rick.jones2; +Cc: lawver1, netdev, mst, general

From: Rick Jones <rick.jones2@hp.com>
Date: Fri, 27 Apr 2007 16:37:49 -0700

> Large Receive Offload (LRO) is enabled by default.  This will
> interfere with forwarding TCP traffic.  If you plan to forward TCP
> traffic (using the host with the Myri10GE NIC as a router or bridge),
> you must disable LRO.  To disable LRO, load the myri10ge driver
> with myri10ge_lro set to 0:

LRO should be disabled by default if the driver does this.  This is a
major and unacceptable bug.

Thanks for pointing this out Rick.

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-27 23:39           ` David Miller
@ 2007-04-27 23:48             ` Rick Jones
  2007-04-27 23:52               ` David Miller
  0 siblings, 1 reply; 20+ messages in thread
From: Rick Jones @ 2007-04-27 23:48 UTC (permalink / raw)
  To: David Miller; +Cc: lawver1, netdev, mst, general

David Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
> Date: Fri, 27 Apr 2007 16:37:49 -0700
>
>>Large Receive Offload (LRO) is enabled by default.  This will
>>interfere with forwarding TCP traffic.  If you plan to forward TCP
>>traffic (using the host with the Myri10GE NIC as a router or bridge),
>>you must disable LRO.  To disable LRO, load the myri10ge driver
>>with myri10ge_lro set to 0:
>
> LRO should be disabled by default if the driver does this.  This is a
> major and unacceptable bug.
>
> Thanks for pointing this out Rick.

No problem - just to play whatif/devil's advocate for a bit though... is
there any way to tie that in with the setting of net.ipv4.ip_forward
(and/or its IPv6 counterpart)?

rick jones

^ permalink raw reply	[flat|nested] 20+ messages in thread
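The question above is about an in-kernel tie between LRO and the forwarding sysctls. Short of that, an administrator can at least check the combination from user space. The following is a rough sketch of such a check, not anything proposed in the thread; the helper names are hypothetical, and the two paths are the standard procfs locations of net.ipv4.ip_forward and net.ipv6.conf.all.forwarding.

```python
from pathlib import Path

# procfs views of the forwarding sysctls named in the message above.
FORWARDING_KNOBS = {
    "ipv4": Path("/proc/sys/net/ipv4/ip_forward"),
    "ipv6": Path("/proc/sys/net/ipv6/conf/all/forwarding"),
}


def is_enabled(sysctl_text: str) -> bool:
    """Treat any non-empty value other than "0" as enabled."""
    return sysctl_text.strip() not in ("", "0")


def forwarding_state() -> dict[str, bool]:
    """Read both knobs; a missing or unreadable file counts as disabled."""
    state = {}
    for family, path in FORWARDING_KNOBS.items():
        try:
            state[family] = is_enabled(path.read_text())
        except OSError:
            state[family] = False
    return state


def lro_conflict(lro_on: bool, forwarding: dict[str, bool]) -> bool:
    """True when LRO is on while the host forwards for any address family."""
    return lro_on and any(forwarding.values())


print(lro_conflict(True, {"ipv4": True, "ipv6": False}))  # True
```

Whether LRO is on would have to come from the driver or from ethtool; here it is just passed in as a boolean.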
* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-27 23:48             ` Rick Jones
@ 2007-04-27 23:52               ` David Miller
  2007-04-30 17:16                 ` Rick Jones
  0 siblings, 1 reply; 20+ messages in thread
From: David Miller @ 2007-04-27 23:52 UTC (permalink / raw)
  To: rick.jones2; +Cc: lawver1, mst, general, netdev

From: Rick Jones <rick.jones2@hp.com>
Date: Fri, 27 Apr 2007 16:48:00 -0700

> No problem - just to play whatif/devil's advocate for a bit
> though... is there any way to tie that in with the setting of
> net.ipv4.ip_forward (and/or its IPv6 counterpart)?

Even ignoring that, consider the potential issues this
kind of problem could be causing netfilter.

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-27 23:52               ` David Miller
@ 2007-04-30 17:16                 ` Rick Jones
  2007-05-01 22:43                   ` [PATCH] make myri10ge use default MTU of 1500 bytes Loic Prylli
  0 siblings, 1 reply; 20+ messages in thread
From: Rick Jones @ 2007-04-30 17:16 UTC (permalink / raw)
  To: David Miller; +Cc: lawver1, netdev, mst, general

David Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
> Date: Fri, 27 Apr 2007 16:48:00 -0700
>
>>No problem - just to play whatif/devil's advocate for a bit
>>though... is there any way to tie that in with the setting of
>>net.ipv4.ip_forward (and/or its IPv6 counterpart)?
>
> Even ignoring that, consider the potential issues this
> kind of problem could be causing netfilter.

OK, I'll show my ignorance and bite - what sort of issues with netfilter?
Is it tied to link-local MTUs?

rick jones

^ permalink raw reply	[flat|nested] 20+ messages in thread
* [PATCH] make myri10ge use default MTU of 1500 bytes
  2007-04-30 17:16                 ` Rick Jones
@ 2007-05-01 22:43                   ` Loic Prylli
  0 siblings, 0 replies; 20+ messages in thread
From: Loic Prylli @ 2007-05-01 22:43 UTC (permalink / raw)
  To: netdev; +Cc: Rick Jones, David Miller

Change default MTU from jumbo (9000) to standard (1500) for myri10ge

Signed-off-by: Loic Prylli <loic@myri.com>
---
 drivers/net/myri10ge/myri10ge.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
index 16e3c43..0e9cc17 100644
--- a/drivers/net/myri10ge/myri10ge.c
+++ b/drivers/net/myri10ge/myri10ge.c
@@ -252,7 +252,7 @@ module_param(myri10ge_force_firmware, int, S_IRUGO);
 MODULE_PARM_DESC(myri10ge_force_firmware,
 		 "Force firmware to assume aligned completions\n");
 
-static int myri10ge_initial_mtu = MYRI10GE_MAX_ETHER_MTU - ETH_HLEN;
+static int myri10ge_initial_mtu = 1500;
 module_param(myri10ge_initial_mtu, int, S_IRUGO);
 MODULE_PARM_DESC(myri10ge_initial_mtu, "Initial MTU\n");
 
-- 
1.5.0.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread
* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-27 23:37         ` Rick Jones
  2007-04-27 23:39           ` David Miller
@ 2007-04-28 6:51           ` Bill Fink
  2007-04-29 19:40             ` Loic Prylli
  2007-04-30 17:07             ` Rick Jones
  1 sibling, 2 replies; 20+ messages in thread
From: Bill Fink @ 2007-04-28 6:51 UTC (permalink / raw)
  To: Rick Jones
  Cc: Bryan Lawver, Linux Network Development list, Michael S. Tsirkin,
	general

On Fri, 27 Apr 2007, Rick Jones wrote:

> Bryan Lawver wrote:
> > I had so much debugging turned on that it was not the "slowing of the
> > traffic" but the "non-coalescing" that was the remedy.  The NIC is a
> > Myricom NIC and these are easy options to set.
>
> As chance would have it, I've played with some Myricom myri10ge NICs
> recently, and even disabled large receive offload during some netperf
> tests :)  It is a modprobe option.  Going back now to the driver source
> and the README I see :-)
>
> <excerpt>
> Troubleshooting
> ===============
>
> Large Receive Offload (LRO) is enabled by default.  This will
> interfere with forwarding TCP traffic.  If you plan to forward TCP
> traffic (using the host with the Myri10GE NIC as a router or bridge),
> you must disable LRO.  To disable LRO, load the myri10ge driver
> with myri10ge_lro set to 0:
>
>     # modprobe myri10ge myri10ge_lro=0
>
> Alternatively, you can disable LRO at runtime by disabling
> receive checksum offloading via ethtool:
>
>     # ethtool -K eth2 rx off
>
> </excerpt>
>
> rick jones

What version of the myri10ge driver is this?  With the 1.2.0 version
that comes with the 2.6.20.7 kernel, there is no myri10ge_lro module
parameter.

[root@lang2 ~]# modinfo myri10ge | grep -i lro
[root@lang2 ~]#

And I've been testing IP forwarding using two Myricom 10-GigE NICs
without setting any special modprobe parameters.

						-Bill

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-28 6:51           ` [ofa-general] Re: IPoIB forwarding Bill Fink
@ 2007-04-29 19:40             ` Loic Prylli
  2007-04-30 21:12               ` Rick Jones
  2007-04-30 17:07             ` Rick Jones
  1 sibling, 1 reply; 20+ messages in thread
From: Loic Prylli @ 2007-04-29 19:40 UTC (permalink / raw)
  To: Bill Fink
  Cc: Bryan Lawver, Linux Network Development list, Michael S. Tsirkin,
	general

On 4/28/2007 2:51 AM, Bill Fink wrote:
> What version of the myri10ge driver is this?  With the 1.2.0 version
> that comes with the 2.6.20.7 kernel, there is no myri10ge_lro module
> parameter.

The myri10ge_lro parameter does not exist in the kernel tree.  The option
and the corresponding LRO code are available only in the externally
distributed version of myri10ge.  That code was submitted to the netdev
list, but wasn't taken into the kernel tree because of the reasonable
concern that the driver might not be the right place for it (if nobody
else proposes something equivalent in the meantime, we might at some point
resubmit it as a driver-independent add-on, but it might not be that soon
for manpower reasons).

Only the 1.2.0 version of the external driver makes LRO incompatible
with forwarding.  The problem should be fixed in version 1.3.0, released a
few weeks ago (forwarding with myri10ge_lro enabled should then work);
let us know otherwise.

Anyway, following David Miller's remark about netfilter, for the next
version we might ask the user to explicitly enable LRO rather than
making it the default.

Sorry for the inconvenience.

Loic

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-29 19:40             ` Loic Prylli
@ 2007-04-30 21:12               ` Rick Jones
  2007-05-01 22:05                 ` Loic Prylli
  0 siblings, 1 reply; 20+ messages in thread
From: Rick Jones @ 2007-04-30 21:12 UTC (permalink / raw)
  To: Loic Prylli
  Cc: Bryan Lawver, Linux Network Development list, Bill Fink,
	Michael S. Tsirkin, general

> Only the 1.2.0 version of the external driver makes LRO incompatible
> with forwarding.  The problem should be fixed in version 1.3.0, released a
> few weeks ago (forwarding with myri10ge_lro enabled should then work);
> let us know otherwise.
>
> Anyway, following David Miller's remark about netfilter, for the next
> version we might ask the user to explicitly enable LRO rather than
> making it the default.

Speaking of defaults, it would seem that the external 1.2.0 driver comes
with 9000 bytes as the default MTU?  At least I think that is what I am
seeing now that I've started looking more closely.

rick jones

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-30 21:12               ` Rick Jones
@ 2007-05-01 22:05                 ` Loic Prylli
  2007-05-01 22:12                   ` Rick Jones
  2007-05-03 23:37                   ` Bryan Lawver
  0 siblings, 2 replies; 20+ messages in thread
From: Loic Prylli @ 2007-05-01 22:05 UTC (permalink / raw)
  To: Rick Jones
  Cc: Bryan Lawver, Linux Network Development list, Bill Fink, mst, general

On 4/30/2007 2:12 PM, Rick Jones wrote:
>
> Speaking of defaults, it would seem that the external 1.2.0 driver
> comes with 9000 bytes as the default MTU?  At least I think that is
> what I am seeing now that I've started looking more closely.
>
> rick jones

That's the same for the in-kernel-tree code (9K MTU by default).
Assuming this is not wanted, I will submit a patch for that.

Loic

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [ofa-general] Re: IPoIB forwarding
  2007-05-01 22:05                 ` Loic Prylli
@ 2007-05-01 22:12                   ` Rick Jones
  2007-05-03 23:37                   ` Bryan Lawver
  1 sibling, 0 replies; 20+ messages in thread
From: Rick Jones @ 2007-05-01 22:12 UTC (permalink / raw)
  To: Loic Prylli
  Cc: Bryan Lawver, Linux Network Development list, Bill Fink, mst, general

Loic Prylli wrote:
> On 4/30/2007 2:12 PM, Rick Jones wrote:
>
>>
>> Speaking of defaults, it would seem that the external 1.2.0 driver
>> comes with 9000 bytes as the default MTU?  At least I think that is
>> what I am seeing now that I've started looking more closely.
>>
>> rick jones
>
> That's the same for the in-kernel-tree code (9K MTU by default).
> Assuming this is not wanted, I will submit a patch for that.

While I like what that does for performance, and at the risk of putting
words into the mouths of netdev, I suspect that 1500 bytes is indeed the
desired default.  It matches the IEEE specs, I've yet to see a switch which
enabled "Jumbo Frames" by default, not everything out there even believes
that Jumbo Frames means 9000 byte MTU, etc etc etc.

I think that 1500 bytes for an "Ethernet" device remains in line with the
principle of least surprise.

rick jones

^ permalink raw reply	[flat|nested] 20+ messages in thread
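For scale, the two MTU defaults under discussion translate into TCP segment sizes as follows. This is a small worked example assuming plain IPv4 and TCP headers of 20 bytes each with no options; the function name is illustrative only.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one IPv4 packet of size `mtu`."""
    return mtu - IPV4_HEADER - TCP_HEADER


print(tcp_mss(1500))  # 1460
print(tcp_mss(9000))  # 8960
```

The 9000-byte case gives the MSS of 8960 reported at the start of this thread, which fits the interfaces having been set to a jumbo MTU at that point.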
* Re: [ofa-general] Re: IPoIB forwarding
  2007-05-01 22:05                 ` Loic Prylli
  2007-05-01 22:12                   ` Rick Jones
@ 2007-05-03 23:37                   ` Bryan Lawver
  1 sibling, 0 replies; 20+ messages in thread
From: Bryan Lawver @ 2007-05-03 23:37 UTC (permalink / raw)
  To: Loic Prylli, Rick Jones
  Cc: Bill Fink, Linux Network Development list, mst, general

I have been able to install and use the 1.3.0 myricom driver and everything
works as I expected and performance is pretty decent.  Interesting little
side tour through various drivers...  The router node sees almost no load
which is really encouraging.

Thanks,
bryan

At 03:05 PM 5/1/2007, Loic Prylli wrote:
>On 4/30/2007 2:12 PM, Rick Jones wrote:
>>
>>Speaking of defaults, it would seem that the external 1.2.0 driver comes
>>with 9000 bytes as the default MTU?  At least I think that is what I am
>>seeing now that I've started looking more closely.
>>
>>rick jones
>
>That's the same for the in-kernel-tree code (9K MTU by default).  Assuming
>this is not wanted, I will submit a patch for that.
>
>Loic

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-28 6:51           ` [ofa-general] Re: IPoIB forwarding Bill Fink
  2007-04-29 19:40             ` Loic Prylli
@ 2007-04-30 17:07             ` Rick Jones
  2007-05-01 5:57               ` Bill Fink
  1 sibling, 1 reply; 20+ messages in thread
From: Rick Jones @ 2007-04-30 17:07 UTC (permalink / raw)
  To: Bill Fink
  Cc: Bryan Lawver, Linux Network Development list, Michael S. Tsirkin,
	general

> What version of the myri10ge driver is this?  With the 1.2.0 version
> that comes with the 2.6.20.7 kernel, there is no myri10ge_lro module
> parameter.
>
> [root@lang2 ~]# modinfo myri10ge | grep -i lro
> [root@lang2 ~]#
>
> And I've been testing IP forwarding using two Myricom 10-GigE NICs
> without setting any special modprobe parameters.

Ethtool -i on the interface reports 1.2.0 as the driver version.

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [ofa-general] Re: IPoIB forwarding
  2007-04-30 17:07             ` Rick Jones
@ 2007-05-01 5:57               ` Bill Fink
  2007-05-01 16:26                 ` Loic Prylli
  0 siblings, 1 reply; 20+ messages in thread
From: Bill Fink @ 2007-05-01 5:57 UTC (permalink / raw)
  To: Rick Jones
  Cc: Bryan Lawver, Michael S. Tsirkin, general,
	Linux Network Development list

On Mon, 30 Apr 2007, Rick Jones wrote:

> > What version of the myri10ge driver is this?  With the 1.2.0 version
> > that comes with the 2.6.20.7 kernel, there is no myri10ge_lro module
> > parameter.
> >
> > [root@lang2 ~]# modinfo myri10ge | grep -i lro
> > [root@lang2 ~]#
> >
> > And I've been testing IP forwarding using two Myricom 10-GigE NICs
> > without setting any special modprobe parameters.
>
> Ethtool -i on the interface reports 1.2.0 as the driver version.

Perhaps it would be useful to have different version strings for
the in-kernel Linux version and the Myricom externally provided
version.  Just a thought.

						-Bill

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [ofa-general] Re: IPoIB forwarding
  2007-05-01 5:57               ` Bill Fink
@ 2007-05-01 16:26                 ` Loic Prylli
  0 siblings, 0 replies; 20+ messages in thread
From: Loic Prylli @ 2007-05-01 16:26 UTC (permalink / raw)
  To: Bill Fink
  Cc: Bryan Lawver, Linux Network Development list, Michael S. Tsirkin,
	general

On 5/1/2007 1:57 AM, Bill Fink wrote:
> On Mon, 30 Apr 2007, Rick Jones wrote:
>
>> Ethtool -i on the interface reports 1.2.0 as the driver version.
>
> Perhaps it would be useful to have different version strings for
> the in-kernel Linux version and the Myricom externally provided
> version.  Just a thought.

Indeed, and it is the case as of March-21 git (or any myri10ge version
>= 1.3.0).  The in-kernel version will show something like 1.3.0-1.226;
the external version will only show 1.3.0.

Loic

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [ofa-general] Re: IPoIB forwarding 2007-04-27 22:26 ` Bryan Lawver 2007-04-27 22:32 ` Rick Jones @ 2007-04-28 2:35 ` parks 1 sibling, 0 replies; 20+ messages in thread From: parks @ 2007-04-28 2:35 UTC (permalink / raw) To: Bryan Lawver, Rick Jones Cc: Linux Network Development list, Michael S. Tsirkin, general [-- Attachment #1.1: Type: text/plain, Size: 3560 bytes --] If you are using the node as a router and using the myrinet nic then there is something we had to turn off. It was causing panics on Roadrunner. It is spelled out explicitly in the Myrinet readme.... It combines packerts. I can tell you more monday. At 04:26 PM 4/27/2007, Bryan Lawver wrote: >I hit the IP NIC over the head with a hammer and turned off all >offload features and I no longer get the super jumbo packet and I >have symmetric performance. This NIC supported "ethtool -K ethx >tso/tx/rx/sg on/off" and I am not sure at this time which one I >needed to whack but all off solved the problem. > >Thanks for listening and re enforcing my search process. > >bryan > >At 01:32 PM 4/27/2007, Rick Jones wrote: >>Bryan Lawver wrote: >>>Your right about the ipoib module not combining packets (I >>>believed you without checking) but I did never the less. The >>>ipoib_start_xmit routine is definitely handed a "double >>>packet" which means that the IP NIC driver or the kernel is >>>combining two packets into a single super jumbo packet. This >>>issue is irrespective of the IP MTU setting because I have set all >>>interfaces to 9000k yet ipoib accepts and forwards this 17964 >>>packet to the next IB node and onto the TCP stack where it is >>>never acknowledged. This may not have come up in prior testing >>>because I am using some of the fastest IP NICs which have no >>>trouble keeping up with or exceeding the bandwidth of the IB >>>side. This issue arises exactly every 8 packets...(ring buffer overrun??) >>>I will be at Sonoma for the next few days as many on this list will be. 
>> >> >>Some NICs (esp 10G) support large receive offload - they coalesce >>TCP segments from the wire/fiber into larger ones they pass up the >>stack. Perhaps that is happening here? >> >>I'm going to go out a bit on a limb, cross the streams, and include >>netdev, because I suspect that if a system is acting as an IP >>router, one doesn't want large receive offload enabled. That may >>need some discussion in netdev - it may then require some changes >>to default settings or some documentation enhancements. That or >>I'll learn that the stack is already dealing with the issue... >> >>rick jones >> >>>bryan >>> >>>At 11:06 AM 4/26/2007, Michael S. Tsirkin wrote: >>> >>>> > Quoting Bryan Lawver <lawver1@llnl.gov>: >>>> > Subject: Re: IPoIB forwarding >>>> > >>>> > Here's a tcpdump of the same sequence. The TCP MSS is 8960 >>>> and it appears >>>> > that two payloads are queued at ipoib which combines them into a single >>>> > 17920 payload with assumingly correct IP header (40) and IB header >>>> > (4). The application or TCP stack does not acknowledge this >>>> double packet >>>> > ie. it does not ACK until each of the 8960 packets are resent >>>> > individually. Being an IB newbie, I am guessing this combining is >>>> > allowable but may violate TCP protocol. >>>> >>>>IPoIB does nothing like this - it's just a network device so >>>>it sends all packets out as is. 
>>>> >>>>-- >>>>MST ^ permalink raw reply [flat|nested] 20+ messages in thread
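The figures quoted in this message can be checked with a little arithmetic, and the workaround Bryan describes can be spelled out as explicit commands. The Python sketch below does both under stated assumptions: the function names are hypothetical, the interface name `eth2` is a placeholder, and the commands are only built as argument lists, never executed. The thread does not establish which of the four `ethtool -K` features was responsible, so the sketch simply mirrors Bryan's "all off" approach.

```python
def coalesced_packet_size(mss, segments, header_bytes=40, ib_header_bytes=4):
    """Size of a packet formed by merging `segments` TCP payloads of `mss` bytes.

    header_bytes=40 and ib_header_bytes=4 are the figures quoted in the
    thread ("IP header (40) and IB header (4)").
    """
    return mss * segments + header_bytes + ib_header_bytes


def offload_commands(iface, features=("tso", "tx", "rx", "sg"), state="off"):
    """Build one `ethtool -K` argument list per feature (not executed here).

    The feature names are the ones Bryan lists; `iface` is whatever the
    IP NIC is called on the router node.
    """
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    return [["ethtool", "-K", iface, feature, state] for feature in features]


if __name__ == "__main__":
    # Two 8960-byte payloads plus the quoted headers.
    print(coalesced_packet_size(8960, 2))
    # "eth2" is a placeholder interface name.
    for argv in offload_commands("eth2"):
        print(" ".join(argv))
```

Two merged 8960-byte payloads plus 44 bytes of headers give 17964, the size Bryan reports seeing in `ipoib_start_xmit`. Re-enabling the features one at a time (`state="on"`) would be one way to narrow down which offload is doing the coalescing.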
end of thread, other threads:[~2007-05-03 23:37 UTC | newest]
Thread overview: 20+ messages
[not found] <6.1.2.0.2.20070423160212.12db6400@mail.llnl.gov>
[not found] ` <20070425124652.GG1624@mellanox.co.il>
[not found] ` <6.1.2.0.2.20070426083410.1389d9e0@mail.llnl.gov>
[not found] ` <20070426161409.GF15540@mellanox.co.il>
[not found] ` <6.1.2.0.2.20070426095112.138e9a68@mail.llnl.gov>
[not found] ` <20070426180618.GJ15540@mellanox.co.il>
[not found] ` <6.1.2.0.2.20070427115435.13ea5ec0@mail.llnl.gov>
2007-04-27 20:32 ` [ofa-general] Re: IPoIB forwarding Rick Jones
2007-04-27 22:26 ` Bryan Lawver
2007-04-27 22:32 ` Rick Jones
2007-04-27 22:43 ` Bryan Lawver
2007-04-27 23:37 ` Rick Jones
2007-04-27 23:39 ` David Miller
2007-04-27 23:48 ` Rick Jones
2007-04-27 23:52 ` David Miller
2007-04-30 17:16 ` Rick Jones
2007-05-01 22:43 ` [PATCH] make myri10ge use default MTU of 1500 bytes Loic Prylli
2007-04-28 6:51 ` [ofa-general] Re: IPoIB forwarding Bill Fink
2007-04-29 19:40 ` Loic Prylli
2007-04-30 21:12 ` Rick Jones
2007-05-01 22:05 ` Loic Prylli
2007-05-01 22:12 ` Rick Jones
2007-05-03 23:37 ` Bryan Lawver
2007-04-30 17:07 ` Rick Jones
2007-05-01 5:57 ` Bill Fink
2007-05-01 16:26 ` Loic Prylli
2007-04-28 2:35 ` parks