From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bart Van Assche
Subject: Re: [PATCH v2 12/12] IB/srp: Add multichannel support
Date: Mon, 20 Oct 2014 14:56:48 +0200
Message-ID: <54450690.709@acm.org>
References: <5433E43D.3010107@acm.org> <5433E585.607@acm.org> <5443F69F.40606@dev.mellanox.co.il>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from baptiste.telenet-ops.be ([195.130.132.51]:55163 "EHLO baptiste.telenet-ops.be" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752121AbaJTM46 (ORCPT ); Mon, 20 Oct 2014 08:56:58 -0400
In-Reply-To: <5443F69F.40606@dev.mellanox.co.il>
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: Sagi Grimberg, Christoph Hellwig
Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott, Ming Lei, "linux-scsi@vger.kernel.org", linux-rdma

On 10/19/14 19:36, Sagi Grimberg wrote:
> On 10/7/2014 4:07 PM, Bart Van Assche wrote:
>>   * comp_vector, a number in the range 0..n-1 specifying the
>> -   MSI-X completion vector. Some HCA's allocate multiple (n)
>> -   MSI-X vectors per HCA port. If the IRQ affinity masks of
>> -   these interrupts have been configured such that each MSI-X
>> -   interrupt is handled by a different CPU then the comp_vector
>> -   parameter can be used to spread the SRP completion workload
>> -   over multiple CPU's.
>> +   MSI-X completion vector of the first RDMA channel. Some
>> +   HCA's allocate multiple (n) MSI-X vectors per HCA port. If
>> +   the IRQ affinity masks of these interrupts have been
>> +   configured such that each MSI-X interrupt is handled by a
>> +   different CPU then the comp_vector parameter can be used to
>> +   spread the SRP completion workload over multiple CPU's.
>
> This is fairly not trivial for the user...
>
> Aren't we requesting a bit too much awareness here?
> Can't we just "make it work"? The user hands out ch_count - why can't
> you do some least-used logic here?
>
> Maybe we can even go with per-cpu QPs and discard comp_vector argument?
> this would probably bring the best performance, wouldn't it?
> (fallback to least-used logic in case HW support less vectors)

Hello Sagi,

The only reason the comp_vector parameter is still supported is
backwards compatibility. What I expect is that users will set the
ch_count parameter but not the comp_vector parameter. Using one QP per
CPU thread does not necessarily result in the best performance. In the
tests I ran, performance was about 4% better when using one QP for each
pair of CPU threads (with hyperthreading enabled).

>> +static unsigned ch_count;
>> +module_param(ch_count, uint, 0444);
>> +MODULE_PARM_DESC(ch_count,
>> +         "Number of RDMA channels to use for communication with an
>> SRP target. Using more than one channel improves performance if the
>> HCA supports multiple completion vectors. The default value is the
>> minimum of four times the number of online CPU sockets and the number
>> of completion vectors supported by the HCA.");
>
> Why? how did you get to this magic equation?

On the systems I have access to, measurements have shown that this
choice for the ch_count parameter results in a significant performance
improvement without consuming too many system resources. The
performance difference when using more than four channels was small.
This means that the exact value of this parameter is not that
important. What matters to me is that users can benefit from improved
performance even if the ch_count kernel module parameter has been left
at its default value.

Bart.
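
To make the default described in the MODULE_PARM_DESC text concrete,
here is a minimal userspace sketch of that rule. It is not code from the
patch: the helper names srp_pick_ch_count() and min_uint() are
hypothetical, and treating ch_count == 0 as "use the default" is an
assumption based on the parameter being declared as an uninitialized
static.

```c
#include <assert.h>

/* Smaller of two unsigned values. */
static unsigned min_uint(unsigned a, unsigned b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper, not taken from the patch: pick the number of
 * RDMA channels. A nonzero 'requested' value (the ch_count module
 * parameter) is used as-is. Zero is ASSUMED to mean "use the default",
 * i.e. min(4 * online CPU sockets, completion vectors of the HCA).
 */
static unsigned srp_pick_ch_count(unsigned requested,
				  unsigned online_sockets,
				  unsigned comp_vectors)
{
	/* Precondition of this sketch: both counts are at least 1. */
	assert(online_sockets > 0 && comp_vectors > 0);

	if (requested)
		return requested;
	return min_uint(4 * online_sockets, comp_vectors);
}
```

With two sockets and an HCA that exposes 16 completion vectors, this
sketch yields 8 channels; with only 4 completion vectors it yields 4.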