From: Shirley Ma
Subject: Re: IB_CQ_VECTOR_LEAST_ATTACHED
Date: Sun, 07 Dec 2014 16:46:01 -0800
Message-ID: <5484F4C9.6010304@oracle.com>
References: <54809030.6090107@oracle.com> <5480AB49.1080209@acm.org> <5480B8CE.3080704@oracle.com> <54842A05.9070207@dev.mellanox.co.il>
In-Reply-To: 
To: Chuck Lever, Sagi Grimberg
Cc: Bart Van Assche, linux-rdma, Or Gerlitz, eli-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, Matan Barak
List-Id: linux-rdma@vger.kernel.org

On 12/07/2014 12:08 PM, Chuck Lever wrote:
>
> On Dec 7, 2014, at 5:20 AM, Sagi Grimberg wrote:
>
>> On 12/4/2014 9:41 PM, Shirley Ma wrote:
>>> On 12/04/2014 10:43 AM, Bart Van Assche wrote:
>>>> On 12/04/14 17:47, Shirley Ma wrote:
>>>>> What's the history of this patch?
>>>>> http://lists.openfabrics.org/pipermail/general/2008-May/050813.html
>>>>>
>>>>> I am working on a multiple-QP workload, and I created an approach
>>>>> similar to IB_CQ_VECTOR_LEAST_ATTACHED, which brings roughly a 17%
>>>>> small-I/O performance improvement. I think this completion-vector
>>>>> load balancing should be maintained in the provider, not the caller.
>>>>> I didn't see this patch submitted to the mainline kernel; was there
>>>>> a reason for that?
>>>>
>>>> My interpretation is that an approach similar to
>>>> IB_CQ_VECTOR_LEAST_ATTACHED is useful on single-socket systems but
>>>> suboptimal on multi-socket systems. Hence the code for associating
>>>> CQ sets with CPU sockets in the SRP initiator. Those changes have
>>>> been queued for kernel 3.19; see also the drivers-for-3.19 branch
>>>> in the git repo git://git.infradead.org/users/hch/scsi-queue.git.
>>>
>>> What I did was manually pin the IRQ and the working thread to the
>>> same socket.
>>> The CQ is created when the file system is mounted in NFS/RDMA, but
>>> the workload thread might start on a different socket, so a per-CPU
>>> based implementation might not apply. I will look at the SRP
>>> implementation.
>>
>> Hey Shirley,
>>
>> Bart is correct: in general, the LEAST_ATTACHED approach might not be
>> optimal in the NUMA case. The thread <-> QP/CQ/CPU assignment is
>> addressed by the multi-channel approach, which to my understanding
>> won't be implemented in NFSoRDMA in the near future (right, Chuck?)
>
> As I understand it, the preference of the Linux NFS community is that
> any multi-pathing solution should be transparent to the ULP (NFS and
> RPC, in this case). MPTCP is ideal in that the ULP is presented with
> a single virtual transport instance, but under the covers that
> instance can be backed by multiple active paths.
>
> Alternatively, pNFS can be deployed. This allows a dataset to be
> striped across multiple servers (and networks). There is a rather
> high bar to entering this arena, however.
>
> Speculating aloud, multiple QPs per transport instance may require
> implementation changes on the server as well as the client. Any
> interoperability dependencies should be documented via a standards
> process.
>
> Note also that an RPC transport (at least in the kernel) is shared
> across many user applications and mount points. I find it difficult
> to visualize an intuitive and comprehensive administrative interface
> that provides enough guidance to place a set of NFS applications and
> an RPC transport in the same resource domain (maybe cgroups?).
>
> So for the time being I prefer staying with a single QP per
> client-server pair.
>
> A large NFS client can actively use many NFS servers, however. Each
> client-server pair would benefit from finding "least-used" resources
> when QPs and CQs are created. That is something we can leverage today.

Yes, that's something I am evaluating now for one NFS client talking
to different destination servers.
I can see a more than 15% bandwidth increase when simulating multiple
server mount points through different IPoIB child interfaces and
changing the create_cq completion vector from a fixed 0 to
"least-used", so that completion vectors are balanced among the QPs.

>> However, the LEAST_ATTACHED vector hint will be revived in the
>> future, as there is a need to spread applications across different
>> interrupt vectors (especially for user space).
>>
>> CC'ing Matan, who is working on this; perhaps he can comment on this
>> as well.