From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sagi Grimberg
Subject: Re: IB_CQ_VECTOR_LEAST_ATTACHED
Date: Sun, 07 Dec 2014 12:20:53 +0200
Message-ID: <54842A05.9070207@dev.mellanox.co.il>
References: <54809030.6090107@oracle.com> <5480AB49.1080209@acm.org> <5480B8CE.3080704@oracle.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <5480B8CE.3080704-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Sender: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Shirley Ma , Bart Van Assche , linux-rdma , Or Gerlitz , eli-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org
Cc: Matan Barak
List-Id: linux-rdma@vger.kernel.org

On 12/4/2014 9:41 PM, Shirley Ma wrote:
> On 12/04/2014 10:43 AM, Bart Van Assche wrote:
>> On 12/04/14 17:47, Shirley Ma wrote:
>>> What's the history of this patch?
>>> http://lists.openfabrics.org/pipermail/general/2008-May/050813.html
>>>
>>> I am working on multiple QPs workload. And I created a similar approach
>>> with IB_CQ_VECTOR_LEAST_ATTACHED, which can bring up about 17% small I/O
>>> performance. I think this CQ_VECTOR loading balance should be maintained
>>> in provider not the caller. I didn't see this patch was submitted to
>>> mainline kernel, wonder any reason behind?
>>
>> My interpretation is that an approach similar to
>> IB_CQ_VECTOR_LEAST_ATTACHED is useful on single-socket systems but
>> suboptimal on multi-socket systems. Hence the code for associating CQ
>> sets with CPU sockets in the SRP initiator. These changes have been
>> queued for kernel 3.19. See also branch drivers-for-3.19 in git repo
>> git://git.infradead.org/users/hch/scsi-queue.git.
>
> What I did is that I manually controlled IRQ and working thread on the
> same socket. The CQ is created when mounting the file system in
> NFS/RDMA, but the workload thread might start from different socket, so
> per-cpu based implementation might not apply. I will look at SRP
> implementation.
>

Hey Shirley,

Bart is correct: in general, the LEAST_ATTACHED approach might not be
optimal in the NUMA case. The thread <-> QP/CQ/CPU assignment is
addressed by the multi-channel approach, which to my understanding won't
be implemented in NFSoRDMA in the near future (right, Chuck?).

However, the LEAST_ATTACHED vector hint will be revived in the future,
as there is a need to spread applications across different interrupt
vectors (especially for user-space).

CC'ing Matan, who is working on this; perhaps he can comment on this as
well.

Sagi.