From: Tom Talpey
Subject: Re: NFS over RDMA benchmark
Date: Tue, 30 Apr 2013 09:05:06 -0400
Message-ID: <517FC182.3030703@talpey.com>
References: <0EE9A1CDC8D6434DB00095CD7DB873462CF96C65@MTLDAG01.mtl.com>
 <62745258-4F3B-4C05-BFFD-03EA604576E4@ornl.gov>
 <0EE9A1CDC8D6434DB00095CD7DB873462CF9715B@MTLDAG01.mtl.com>
 <20130423210607.GJ3676@fieldses.org>
 <0EE9A1CDC8D6434DB00095CD7DB873462CF988C9@MTLDAG01.mtl.com>
 <20130424150540.GB20275@fieldses.org>
 <20130424152631.GC20275@fieldses.org>
 <0EE9A1CDC8D6434DB00095CD7DB873462CF9A820@MTLDAG01.mtl.com>
 <20130428144248.GA2037@fieldses.org>
 <0EE9A1CDC8D6434DB00095CD7DB873462CF9C90C@MTLDAG01.mtl.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <0EE9A1CDC8D6434DB00095CD7DB873462CF9C90C-fViJhHBwANKuSA5JZHE7gA@public.gmane.org>
Sender: linux-nfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Yan Burman
Cc: "J. Bruce Fields", Wendy Cheng, "Atchley, Scott", Tom Tucker,
 "linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org",
 "linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org", Or Gerlitz
List-Id: linux-rdma@vger.kernel.org

On 4/30/2013 1:09 AM, Yan Burman wrote:
>
>> -----Original Message-----
>> From: J. Bruce Fields [mailto:bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org]
>> Sent: Sunday, April 28, 2013 17:43
>> To: Yan Burman
>> Cc: Wendy Cheng; Atchley, Scott; Tom Tucker;
>> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
>> linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Or Gerlitz
>> Subject: Re: NFS over RDMA benchmark
>>
>> On Sun, Apr 28, 2013 at 06:28:16AM +0000, Yan Burman wrote:
>>>>>>>>>>> On Wed, Apr 17, 2013 at 7:36 AM, Yan Burman
>>>>>>>>>>>
>>>>>>>>>>>> I've been trying to do some benchmarks for NFS over RDMA and
>>>>>>>>>>>> I seem to only get about half of the bandwidth that the HW
>>>>>>>>>>>> can give me.
>>>>>>>>>>>> My setup consists of 2 servers, each with 16 cores, 32GB of
>>>>>>>>>>>> memory, and a Mellanox ConnectX-3 QDR card over PCIe gen3.
>>>>>>>>>>>> These servers are connected to a QDR IB switch. The backing
>>>>>>>>>>>> storage on the server is tmpfs mounted with noatime.
>>>>>>>>>>>> I am running kernel 3.5.7.
>>>>>>>>>>>>
>>>>>>>>>>>> When running ib_send_bw, I get 4.3-4.5 GB/sec for block
>>>>>>>>>>>> sizes 4-512K.
>>>>>>>>>>>> When I run fio over rdma mounted nfs, I get 260-2200MB/sec
>>>>>>>>>>>> for the same block sizes (4-512K). Running over IPoIB-CM,
>>>>>>>>>>>> I get 200-980MB/sec.
>> ...
>>>>>>>> I am trying to get maximum performance from a single server - I
>>>>>>>> used 2 processes in the fio test - more than 2 did not show any
>>>>>>>> performance boost.
>>>>>>>> I tried running fio from 2 different PCs on 2 different files,
>>>>>>>> but the sum of the two is more or less the same as running from
>>>>>>>> a single client PC.
>
> I finally got up to 4.1GB/sec bandwidth with RDMA (IPoIB-CM bandwidth
> is also way higher now).
> For some reason when I had the Intel IOMMU enabled, the performance
> dropped significantly.
> I now get up to ~95K IOPS and 4.1GB/sec bandwidth.

Excellent, but is that 95K IOPS a typo? At 4KB, that's less than
400MB/s. What client CPU percentage do you see under this workload, and
how different are the NFS/RDMA and NFS/IPoIB overheads?

> Now I will take care of the issue that I am running at only 40Gbit/s
> instead of 56Gbit/s, but that is another, unrelated problem (I suspect
> I have a cable issue).
>
> This is still strange, since ib_send_bw with the Intel IOMMU enabled
> did get up to 4.5GB/sec, so why did the IOMMU affect only the NFS code?

You'll need to do more profiling to track that down. I would suspect
that ib_send_bw is using some sort of direct hardware access, bypassing
the IOMMU management and possibly performing no dynamic memory
registration.
The NFS/RDMA code goes through the standard kernel DMA API, and
correctly registers/deregisters memory on a per-I/O basis in order to
provide storage data integrity. Perhaps there are overheads in the IOMMU
management which can be addressed.
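
The question about the 95K IOPS figure is simple arithmetic; a quick
back-of-envelope sketch (the numbers come from the thread, the script is
purely illustrative):

```python
# Sanity-check the reported numbers: 95K IOPS vs. 4.1 GB/s.
iops = 95_000
block = 4096                      # assume 4 KiB per I/O
bw = iops * block                 # bytes/sec at that block size
print(f"{bw / 1e6:.0f} MB/s at 4 KiB")            # ~389 MB/s, well under 4.1 GB/s

# Average I/O size needed for 95K IOPS to actually sustain 4.1 GB/s:
print(f"{4.1e9 / iops / 1024:.0f} KiB per I/O")   # ~42 KiB
```

So either the 95K IOPS figure was measured at a much larger block size,
or one of the two numbers is off - which is exactly the question posed
in the reply above.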
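
The per-I/O registration point can be illustrated with a toy model:
charge every I/O a fixed map/unmap overhead and watch small blocks
suffer disproportionately. This is a hypothetical sketch, not measured
data - `wire_gbps` and `map_us` are made-up parameters chosen only to
show the shape of the effect:

```python
# Toy model: fixed per-I/O mapping overhead vs. block size.
# wire_gbps and map_us are assumed values for illustration only.
def effective_mbps(block_bytes, wire_gbps=32.0, map_us=10.0):
    """Effective throughput when each I/O pays a fixed map/unmap cost."""
    xfer_s = block_bytes * 8 / (wire_gbps * 1e9)   # time on the wire
    total_s = xfer_s + map_us * 1e-6               # plus per-I/O mapping cost
    return block_bytes / total_s / 1e6             # MB/s

for bs in (4096, 65536, 524288):
    print(f"{bs // 1024:4d} KiB: {effective_mbps(bs):6.0f} MB/s")
```

With these made-up parameters, 4 KiB I/Os land under 400 MB/s while
512 KiB I/Os approach wire speed - the same qualitative spread as the
fio numbers in the thread, and consistent with a fixed per-I/O cost
(such as IOMMU map/unmap) dominating at small block sizes.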