* rsockets and other performance
@ 2012-06-14 15:24 Pradeep Satyanarayana
From: Pradeep Satyanarayana @ 2012-06-14 15:24 UTC
To: Hefty, Sean
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
kashyapv-r/Jw6+rmf7HQT0dZR+AlfA, sri-r/Jw6+rmf7HQT0dZR+AlfA
Traditional sockets-based applications wanting high throughput could use
rsockets. Since it is layered on top of uverbs, we expected to see good
throughput numbers.
So, we started to run netperf and iperf. We observed that throughput tops
off at about 20 Gb/s with QDR adapters. A quick "perf top" revealed a lot
of cycles spent in memcpy().
We had hoped these numbers would be somewhat higher, since we did not
expect the memcpy() to have such a large overhead.
Given the copy overhead, we wanted to revisit IPoIB and SDP performance.
Hence we installed OFED-1.5.4.1 on RHEL 6.2. We found that for small
packets SDP starts with low throughput, but seems to catch up with
rsockets at about 16 KB packet sizes. On the other hand, IPoIB CM tops
off at about 10 Gb/s.
Since SDP does in-kernel RDMA, we expected the IPoIB CM and SDP numbers
to be much closer. Again, "perf top" revealed that IPoIB was spending a
large number of cycles in checksum computation. Out of curiosity, Sridhar
made the following changes:
--- ipoib_cm.c.orig	2012-06-10 15:27:10.589325138 -0400
+++ ipoib_cm.c	2012-06-12 11:29:49.073262516 -0400
@@ -670,6 +670,7 @@ copied:
 	skb->dev = dev;
 	/* XXX get correct PACKET_ type here */
 	skb->pkt_type = PACKET_HOST;
+	skb->ip_summed = CHECKSUM_UNNECESSARY;
 	netif_receive_skb(skb);

@@ -1464,7 +1464,8 @@ static ssize_t set_mode(struct device *d
 			   "will cause multicast packet drops\n");

 	rtnl_lock();
-	dev->features &= ~(NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO);
+	dev->features &= ~(NETIF_F_SG | NETIF_F_TSO);
 	priv->tx_wr.send_flags &= ~IB_SEND_IP_CSUM;

 	if (ipoib_cm_max_mtu(dev) > priv->mcast_mtu)
With these minimal changes, IPoIB throughput reached 19-20 Gb/s with just
2 threads. This was really unexpected. Given that, we wanted to revisit
the usage of checksums in IPoIB.
So, it looks worthwhile to allow for 'checksum-less' IPoIB-CM within a
cluster on a single subnet. From a checksum perspective, this would be no
different from RDMA. What are your thoughts?
Thanks
Pradeep
* RE: rsockets and other performance
@ 2012-06-14 16:02 Hefty, Sean
From: Hefty, Sean @ 2012-06-14 16:02 UTC
To: Pradeep Satyanarayana
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    kashyapv-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org,
    sri-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org

> Traditional sockets-based applications wanting high throughput could use
> rsockets. Since it is layered on top of uverbs, we expected to see good
> throughput numbers.
> So, we started to run netperf and iperf. We observed that throughput tops
> off at about 20 Gb/s with QDR adapters. A quick "perf top" revealed a lot
> of cycles spent in memcpy().
> We had hoped these numbers would be somewhat higher, since we did not
> expect the memcpy() to have such a large overhead.

Someone more familiar with ipoib needs to respond regarding those changes.

For rsockets, please make sure that you've pulled the latest code. You can
improve the performance by adjusting the QP size and send/receive buffers,
which can be done through config files. I started an rsocket man page that
describes these files, and just pushed it out to my git tree.

- Sean
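The knobs Sean mentions can also be set per socket. Below is a minimal,
untested C sketch, assuming librdmacm's <rdma/rsocket.h> with the SOL_RDMA
option level and the RDMA_SQSIZE/RDMA_RQSIZE options; the address, port,
and queue depths are placeholders, not recommended values.

/* Hedged sketch: per-socket QP sizing with rsockets. Assumes the
 * rsocket() API from librdmacm; compile with -lrdmacm. */
#include <stdio.h>
#include <arpa/inet.h>
#include <rdma/rsocket.h>

int main(void)
{
	int val, fd = rsocket(AF_INET, SOCK_STREAM, 0);
	struct sockaddr_in dst = { .sin_family = AF_INET,
				   .sin_port = htons(7471) };   /* placeholder */

	inet_pton(AF_INET, "192.168.1.1", &dst.sin_addr);       /* placeholder */

	val = 512;	/* deeper send queue than the default */
	rsetsockopt(fd, SOL_RDMA, RDMA_SQSIZE, &val, sizeof(val));
	val = 512;	/* matching receive queue depth */
	rsetsockopt(fd, SOL_RDMA, RDMA_RQSIZE, &val, sizeof(val));

	/* Options must be set before rconnect(), which creates the QP. */
	if (rconnect(fd, (struct sockaddr *) &dst, sizeof(dst)) == 0)
		rsend(fd, "ping", 4, 0);
	rclose(fd);
	return 0;
}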
* Re: rsockets and other performance
@ 2012-06-14 16:09 Sridhar Samudrala
From: Sridhar Samudrala @ 2012-06-14 16:09 UTC
To: Pradeep Satyanarayana
Cc: Hefty, Sean, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    kashyapv-r/Jw6+rmf7HQT0dZR+AlfA

On Thu, 2012-06-14 at 08:24 -0700, Pradeep Satyanarayana wrote:
> [...]
> Since SDP does in-kernel RDMA, we expected the IPoIB CM and SDP numbers
> to be much closer. Again, "perf top" revealed that IPoIB was spending a
> large number of cycles in checksum computation. Out of curiosity, Sridhar
> made the following changes:
>
> --- ipoib_cm.c.orig	2012-06-10 15:27:10.589325138 -0400
> +++ ipoib_cm.c	2012-06-12 11:29:49.073262516 -0400
> @@ -670,6 +670,7 @@ copied:
>  	skb->dev = dev;
>  	/* XXX get correct PACKET_ type here */
>  	skb->pkt_type = PACKET_HOST;
> +	skb->ip_summed = CHECKSUM_UNNECESSARY;
>  	netif_receive_skb(skb);
>
> @@ -1464,7 +1464,8 @@ static ssize_t set_mode(struct device *d
>  			   "will cause multicast packet drops\n");
>
>  	rtnl_lock();
> -	dev->features &= ~(NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO);
> +	dev->features &= ~(NETIF_F_SG | NETIF_F_TSO);

Enabling NETIF_F_SG improves the throughput further by avoiding an
additional kernel memcpy caused by skb_linearize() in dev_queue_xmit().

Thanks
Sridhar

>  	priv->tx_wr.send_flags &= ~IB_SEND_IP_CSUM;
>
>  	if (ipoib_cm_max_mtu(dev) > priv->mcast_mtu)
>
> With these minimal changes, IPoIB throughput reached 19-20 Gb/s with just
> 2 threads. This was really unexpected. Given that, we wanted to revisit
> the usage of checksums in IPoIB.
> So, it looks worthwhile to allow for 'checksum-less' IPoIB-CM within a
> cluster on a single subnet. From a checksum perspective, this would be no
> different from RDMA. What are your thoughts?
>
> Thanks
> Pradeep
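To make the skb_linearize() point concrete, here is an illustrative C
fragment, not a patch: the function name is invented, and the real check
lives in the core TX path. When NETIF_F_SG is clear and the skb carries
page fragments, the stack flattens the skb with an extra allocation and
copy before the driver ever sees it.

/* Illustrative only. Roughly what the core TX path does for a device
 * that does not advertise NETIF_F_SG: a fragmented skb is linearized
 * (kmalloc of a flat buffer plus a memcpy of every fragment) before
 * ndo_start_xmit() is called. */
#include <linux/netdevice.h>
#include <linux/skbuff.h>

static int illustrate_sg_check(struct sk_buff *skb, struct net_device *dev)
{
	if (skb_shinfo(skb)->nr_frags && !(dev->features & NETIF_F_SG))
		return skb_linearize(skb);	/* the extra copy in question */
	return 0;	/* SG-capable: fragments go to the driver as-is */
}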
* Re: rsockets and other performance
@ 2012-06-17 7:22 Or Gerlitz
From: Or Gerlitz @ 2012-06-17 7:22 UTC
To: Sridhar Samudrala
Cc: Pradeep Satyanarayana, Hefty, Sean,
    linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    kashyapv-r/Jw6+rmf7HQT0dZR+AlfA, Shlomo Pongratz

On 6/14/2012 7:09 PM, Sridhar Samudrala wrote:
> Enabling NETIF_F_SG improves the throughput further by avoiding an
> additional kernel memcpy caused by skb_linearize() in dev_queue_xmit().

Hi Sridhar,

If you **only** enable NETIF_F_SG for CM, does this yield any gain?
Did you have to patch the code for that?

Or.
* Re: rsockets and other performance
@ 2012-06-17 21:20 Pradeep Satyanarayana
From: Pradeep Satyanarayana @ 2012-06-17 21:20 UTC
To: Or Gerlitz
Cc: Sridhar Samudrala, Hefty, Sean,
    linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    kashyapv-r/Jw6+rmf7HQT0dZR+AlfA, Shlomo Pongratz

On 06/17/2012 12:22 AM, Or Gerlitz wrote:
> On 6/14/2012 7:09 PM, Sridhar Samudrala wrote:
>> Enabling NETIF_F_SG improves the throughput further by avoiding an
>> additional kernel memcpy caused by skb_linearize() in dev_queue_xmit().
>
> Hi Sridhar,
>
> If you **only** enable NETIF_F_SG for CM, does this yield any gain?
> Did you have to patch the code for that?

Hi Or,

We did not try NETIF_F_SG alone for IPoIB CM; we tried it along with
NETIF_F_IP_CSUM. We did run into issues with netperf, i.e. netperf
errors out with NETIF_F_SG enabled. However, iperf worked fine. We did
not investigate that further.

Thanks
Pradeep
* Re: rsockets and other performance
@ 2012-06-14 17:22 Jason Gunthorpe
From: Jason Gunthorpe @ 2012-06-14 17:22 UTC
To: Pradeep Satyanarayana
Cc: Hefty, Sean, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    kashyapv-r/Jw6+rmf7HQT0dZR+AlfA, sri-r/Jw6+rmf7HQT0dZR+AlfA

On Thu, Jun 14, 2012 at 08:24:35AM -0700, Pradeep Satyanarayana wrote:

> With these minimal changes, IPoIB throughput reached 19-20 Gb/s with
> just 2 threads. This was really unexpected. Given that, we wanted to
> revisit the usage of checksums in IPoIB.
> So, it looks worthwhile to allow for 'checksum-less' IPoIB-CM within
> a cluster on a single subnet. From a checksum perspective, this
> would be no different from RDMA. What are your thoughts?

There have been discussions around a 'checksum-less' IPoIB operation
for a little while.

The basic notion was to enable the checksum offload mechanism, pass the
offload information from Linux straight through to the other side (e.g.
via an extra header or something), have the other side reconstruct the
offload indication on RX, and inject it back into the net stack.

This would be similar to the way checksum bypass works in
virtualization (Xen/KVM), where the virtualized net TX just packages
the offload data and sends it to the hypervisor kernel, which then RX's
it and restores the very same checksum offload information.

During the CM process this feature would be negotiated.

I don't think anyone ever made patches for this, but considering the
performance delta you see it really seems worthwhile.

Jason
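As a purely hypothetical illustration of the 'extra header' Jason
describes (every name below is invented; nothing like this exists in the
tree): a few bytes prefixed to each IPoIB-CM payload, modeled on struct
virtio_net_hdr, could carry the sender's checksum offload state for the
receiver to reconstruct.

/* Invented wire format, for illustration only; modeled on
 * struct virtio_net_hdr from the virtio spec. */
#include <linux/types.h>

#define IPOIB_CM_F_NEEDS_CSUM	0x01	/* csum not computed; fields below valid */
#define IPOIB_CM_F_DATA_VALID	0x02	/* sender verified the csum already */

struct ipoib_cm_offload_hdr {
	__u8	flags;		/* IPOIB_CM_F_* above */
	__u8	reserved;
	__le16	csum_start;	/* where checksumming would begin (skb->csum_start) */
	__le16	csum_offset;	/* where the result would be stored (skb->csum_offset) */
} __attribute__((packed));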
* Re: rsockets and other performance
@ 2012-06-15 6:48 Vivek Kashyap
From: Vivek Kashyap @ 2012-06-15 6:48 UTC
To: Jason Gunthorpe
Cc: Pradeep Satyanarayana, Hefty, Sean,
    linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    sri-r/Jw6+rmf7HQT0dZR+AlfA

On Thu, 14 Jun 2012, Jason Gunthorpe wrote:

> There have been discussions around a 'checksum-less' IPoIB operation
> for a little while.
>
> The basic notion was to enable the checksum offload mechanism, pass the
> offload information from Linux straight through to the other side (e.g.
> via an extra header or something), have the other side reconstruct the
> offload indication on RX, and inject it back into the net stack.
>
> This would be similar to the way checksum bypass works in
> virtualization (Xen/KVM), where the virtualized net TX just packages
> the offload data and sends it to the hypervisor kernel, which then RX's
> it and restores the very same checksum offload information.
>
> During the CM process this feature would be negotiated.
>
> I don't think anyone ever made patches for this, but considering the
> performance delta you see it really seems worthwhile.

How about something like below? Basically, the 'checksum-less' operation
is enabled only between hosts that both support it, by extending the
existing IB connection setup mechanism. The following also keeps the
changes confined to the ipoib-cm module.

- Add a sysctl variable, csum_simulate.

- In the ipoib-cm module:
      if (csum_simulate)
          advertise hardware checksum offload capabilities

- When a QP is created to a remote host, check csum_simulate:
      if (csum_simulate)
          include a CSUM_SIMULATE field in the RC private data when
          setting up the connection
  Note: RFC 4755 already utilizes this private data to exchange the
  receive MTU and UD QP; we just add another parameter to it. If it is
  accepted by the other end during connection negotiation, then set
  csum_simulate_on = 1 for the QP.

- When a QP connection request is received:
      if (csum_simulate)
          look for the CSUM_SIMULATE field in the private data;
          if present, respond with CSUM_SIMULATE and set the
          csum_simulate_on flag to 1 for the QP,
          else zero it in the response
  In the above two steps one would also want to check that the peer is
  a directly connected host.

- When sending data (see the sketch after this list):
      if (csum_simulate_on)
          send the data over the ipoib-cm link normally (no data
          checksum added)
      else  /* sending over a QP not enabled for checksum offload */
          calculate the overall checksum, then send the data

- When receiving data:
      if (csum_simulate_on)
          set CHECKSUM_UNNECESSARY, indicating the csum has been
          validated

thanks
Vivek
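A rough C rendering of the data-path half of this proposal follows;
csum_simulate_on and the cm_conn struct are hypothetical placeholders,
not the real ipoib_cm types, and the negotiation that sets the flag is
the one described above.

/* Sketch only: the per-QP data path of the csum_simulate proposal. */
#include <linux/netdevice.h>
#include <linux/skbuff.h>

struct cm_conn {
	bool csum_simulate_on;	/* negotiated during CM connection setup */
};

/* TX: fall back to a software checksum only on non-negotiated QPs. */
static int cm_tx_csum(struct cm_conn *conn, struct sk_buff *skb)
{
	if (!conn->csum_simulate_on && skb->ip_summed == CHECKSUM_PARTIAL)
		return skb_checksum_help(skb);	/* compute csum in software */
	return 0;	/* trusted link: send with no data checksum */
}

/* RX: on a negotiated QP, tell the stack not to re-verify. */
static void cm_rx_csum(struct cm_conn *conn, struct sk_buff *skb)
{
	if (conn->csum_simulate_on)
		skb->ip_summed = CHECKSUM_UNNECESSARY;
}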
* Re: rsockets and other performance
@ 2012-06-15 17:19 Jason Gunthorpe
From: Jason Gunthorpe @ 2012-06-15 17:19 UTC
To: Vivek Kashyap
Cc: Pradeep Satyanarayana, Hefty, Sean,
    linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    sri-r/Jw6+rmf7HQT0dZR+AlfA

On Thu, Jun 14, 2012 at 11:48:09PM -0700, Vivek Kashyap wrote:

> > I don't think anyone ever made patches for this, but considering the
> > performance delta you see it really seems worthwhile.
>
> How about something like below? Basically, the 'checksum-less'
> operation is enabled only between hosts that both support it, by
> extending the existing IB connection setup mechanism. The following
> also keeps the changes confined to the ipoib-cm module.

It is much better to do things properly (as I described) so you don't
have a gotcha when a packet gets routed.

We don't want to fake that we are doing csum; we want to forward the
csum offload data to the other side, which might discard it or might
use it to forward the packet out another interface. This is not hard,
just fiddly work.

--
Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
Chief Technology Officer, Obsidian Research Corp    Edmonton, Canada
* Re: rsockets and other performance
@ 2012-06-18 6:37 Vivek Kashyap
From: Vivek Kashyap @ 2012-06-18 6:37 UTC
To: Jason Gunthorpe
Cc: Pradeep Satyanarayana, Hefty, Sean,
    linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    sri-r/Jw6+rmf7HQT0dZR+AlfA

On Fri, 15 Jun 2012, Jason Gunthorpe wrote:

> It is much better to do things properly (as I described) so you don't
> have a gotcha when a packet gets routed.
>
> We don't want to fake that we are doing csum; we want to forward the
> csum offload data to the other side, which might discard it or might
> use it to forward the packet out another interface. This is not hard,
> just fiddly work.

We certainly need to ensure that a routed packet is not left without a
checksum. Either one does not enable a 'checksum-less' link on routers,
or the router must be able to take care of it when it forwards the
packet.

By 'csum offload data' do you mean something akin to
csum_start/csum_offset? How would one transmit this information over
IPoIB-CM from the sending host to the router? If we can do that, it
will certainly be useful.

Otherwise, what I was proposing was that a router is either configured
not to accept a checksum-less configuration on IPoIB-CM, or, if
checksum-less links are supported, then when forwarding to a
'checksum-less' interface it does nothing, but when forwarding to any
other interface the checksum is added. This addition is really done in
the ipoib-cm driver, to insulate it from the rest of the stack.

The 'faking' of checksum offload is only to the local IP stack, so that
it does not do the calculation but leaves it to the IPoIB-CM driver.
That is what we essentially do with virtio or hardware offload
mechanisms.

We are working on a test patch. That should make it easier to discuss
the gaps, or whether the solution is complete.

thanks
Vivek
* Re: rsockets and other performance
@ 2012-06-18 17:53 Jason Gunthorpe
From: Jason Gunthorpe @ 2012-06-18 17:53 UTC
To: Vivek Kashyap
Cc: Pradeep Satyanarayana, Hefty, Sean,
    linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    sri-r/Jw6+rmf7HQT0dZR+AlfA

On Sun, Jun 17, 2012 at 11:37:33PM -0700, Vivek Kashyap wrote:

> We certainly need to ensure that a routed packet is not left without a
> checksum. Either one does not enable a 'checksum-less' link on routers,
> or the router must be able to take care of it when it forwards the
> packet.

The latter is vastly preferred.

> By 'csum offload data' do you mean something akin to
> csum_start/csum_offset? How would one transmit this information over
> IPoIB-CM from the sending host to the router? If we can do that, it
> will certainly be useful.

That, plus an ip_summed 'needed' flag, seems to match what virtio_net
is doing. Review virtio_net.c:receive_buf and xmit_skb to see how it
handles the packet.

Also, I am just reminded: there has been some interest in also
forwarding GSO through the CM connection. This would speed things up as
well and avoid the need for the wonky 64k MTU, i.e. rework the CM mode
to follow more closely how virtio-net works. Like virtio_net, you'd
have to prefix a small header, or, if you are lucky, maybe encode
things in the 32 bits of immediate data.

> Otherwise, what I was proposing was that a router is either configured
> not to accept a checksum-less configuration on IPoIB-CM, or [...]

This sort of solution is something I'd like to avoid; it is not
necessary to make something so fragile.

> This addition is really done in the ipoib-cm driver, to insulate it
> from the rest of the stack. The 'faking' of checksum offload is only
> to the local IP stack, so that it does not do the calculation but
> leaves it to the IPoIB-CM driver. That is what we essentially do with
> virtio or hardware offload mechanisms.

I mean 'faking' in the sense that you tell the stack you will compute
the checksum, throw away the information needed to do that calculation,
and then send a packet on the wire that can never be checksummed
properly.

I don't see the point of keeping things contained to the CM part of the
ipoib driver; if the ipoib packet handling needs to be changed, then
change it.

Jason
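For reference, the RX-side pattern Jason points at, lightly simplified
from that era's drivers/net/virtio_net.c (receive_buf); the exact header
access differs in the real driver, and the wrapper function here is
invented for the sake of a self-contained example.

/* Simplified from virtio_net's RX path: the prepended virtio_net_hdr
 * either re-arms partial-checksum state or marks the packet as already
 * validated. Returns 0 on success, -EINVAL on a bad header. */
#include <linux/errno.h>
#include <linux/skbuff.h>
#include <linux/virtio_net.h>

static int restore_csum_offload(struct sk_buff *skb,
				const struct virtio_net_hdr *hdr)
{
	if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
		/* Re-arm CHECKSUM_PARTIAL from the fields the TX side sent. */
		if (!skb_partial_csum_set(skb, hdr->csum_start,
					  hdr->csum_offset))
			return -EINVAL;
	} else if (hdr->flags & VIRTIO_NET_HDR_F_DATA_VALID) {
		/* TX side asserts the checksum was already verified. */
		skb->ip_summed = CHECKSUM_UNNECESSARY;
	}
	return 0;
}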