public inbox for linux-rdma@vger.kernel.org
From: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
To: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org>,
	Vu Pham <vuhuong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Sebastian Riemer
	<sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>,
	Jinpu Wang <jinpu.wang-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>,
	linux-rdma <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable
Date: Mon, 15 Jul 2013 16:29:21 +0300
Message-ID: <51E3F931.9080903@mellanox.com>
In-Reply-To: <51E3D79D.9070808-HInyCGIudOg@public.gmane.org>

On 7/15/2013 2:06 PM, Bart Van Assche wrote:
> On 14/07/2013 3:43, Sagi Grimberg wrote:
>> On 7/3/2013 3:58 PM, Bart Van Assche wrote:
>>> Several InfiniBand HCAs allow the completion vector to be configured
>>> per queue pair. This makes it possible to spread the workload created
>>> by IB completion interrupts over multiple MSI-X vectors and hence over
>>> multiple CPU cores. In other words, configuring the completion
>>> vector properly not only reduces latency on an initiator
>>> connected to multiple SRP targets but also improves
>>> throughput.
>>
>> Hey Bart,
>> I just wrote a small patch that lets srp_daemon spread connections
>> across the HCA's completion vectors.
>> But on second thought, is it really a good idea to give the user
>> control over completion vectors for CQs he doesn't really own? This way
>> the user must retrieve the maximum number of completion vectors from
>> the ib_device, take it into account when adding a connection, and in
>> addition set proper IRQ affinity.
>>
>> Perhaps the driver can manage this on its own without involving the
>> user. Take the mlx4_en driver, for example: it spreads its CQs across
>> the HCA's completion vectors without involving the user. A user who
>> opens a socket has no influence on the underlying cq<->comp-vector
>> assignment.
>>
>> The only use case I can think of is where the user wants to use only
>> a subset of the completion vectors, for example to reserve some
>> completion vectors for native IB applications, but I don't know
>> how common that is.
>>
>> Other than that, I think it is always better to spread the CQs across
>> the HCA's completion vectors, so perhaps the driver should just assign
>> connection CQs across comp-vecs without getting arguments from the
>> user, simply iterating over comp_vectors.
>>
>> What do you think?
>
> Hello Sagi,
>
> Sorry but I do not think it is a good idea to let srp_daemon assign 
> the completion vector. While this might work well on single-socket 
> systems this will result in suboptimal results on NUMA systems. For 
> certain workloads on NUMA systems, and when a NUMA initiator system is 
> connected to multiple target systems, the optimal configuration is to 
> make sure that all processing that is associated with a single SCSI 
> host occurs on the same NUMA node. This means configuring the 
> completion vector value such that IB interrupts are generated on the 
> same NUMA node where the associated SCSI host and applications are 
> running.
>
> More generally, performance tuning on NUMA systems requires
> system-wide knowledge of all applications that are running and also of 
> which interrupt is processed by which NUMA node. So choosing a proper 
> value for the completion vector is only possible once the system 
> topology and the IRQ affinity masks are known. I don't think we should 
> build knowledge of all this in srp_daemon.
>
> Bart.
>

Hey Bart,

Thanks for your quick response to my question.
srp_daemon is a package intended to let the customer automatically
detect targets in the IB fabric. From our experience here at Mellanox,
customers/users like automatic "plug&play" tools.
They are reluctant to build their own scriptology to enhance performance
and settle for srp_daemon, which they prefer over using ibsrpdm and
adding new targets manually.
Regardless, completion vector assignment is meaningless without
proper IRQ affinity settings, so in the worst case, where the user didn't
set his IRQ affinity,
this assignment will perform like the default completion vector
assignment, since without any affinity masking all IRQs are directed to
the same core, i.e. core 0.

From my experiments on NUMA systems, optimal performance is gained
when all IRQs are directed to half of the cores on the NUMA node close
to the HCA, and all traffic generators share the other half of the cores
on the same NUMA node. Based on that knowledge, I thought that the
srp_daemon/srp driver would spread its CQs across the HCA's completion
vectors, and the user would be encouraged to set the IRQ affinity as
described above to gain optimal performance.
Adding connections over the far NUMA node doesn't seem to benefit
performance much...
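
To make the idea concrete, here is a rough model of the policy described above, written as a small Python sketch rather than driver code. All names (assign_comp_vectors, split_node_cores, cpu_mask) are hypothetical and not part of srp_daemon or ib_srp. It assumes the number of completion vectors and the list of cores on the HCA-local NUMA node are already known, and that fewer than 32 cores are involved so a single hex word is enough for a mask like the one written to /proc/irq/<N>/smp_affinity.

```python
def assign_comp_vectors(num_connections, num_comp_vectors):
    """Round-robin: connection i uses completion vector i % num_comp_vectors."""
    if num_comp_vectors < 1:
        raise ValueError("the HCA must expose at least one completion vector")
    return [i % num_comp_vectors for i in range(num_connections)]


def split_node_cores(node_cores):
    """Split the HCA-local NUMA node's cores into two halves.

    The first half is meant for IRQ handling, the second half for the
    traffic generators (assumption: any even split is acceptable).
    """
    half = len(node_cores) // 2
    return node_cores[:half], node_cores[half:]


def cpu_mask(cores):
    """Build a hex CPU bitmask (no 0x prefix) with one bit set per core."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return format(mask, "x")


if __name__ == "__main__":
    # Hypothetical system: 4 completion vectors, cores 0-7 close to the HCA.
    irq_cores, app_cores = split_node_cores(list(range(8)))
    print("comp vectors per connection:", assign_comp_vectors(6, 4))
    print("IRQ affinity mask:", cpu_mask(irq_cores))
    print("application cores:", app_cores)
```

The mask would still have to be applied by the user to each of the HCA's IRQs; the sketch only shows how the two halves could be derived.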

As I mentioned, one use case that may raise a problem here is when
the user would like to maintain multiple SRP connections and reserve
some completion vectors for other IB applications on the system.
In this case the user will be able to disable the srp_daemon/srp driver
completion vector assignment.
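
One possible variation, short of switching the automatic assignment off entirely, would be to let the user name the vectors to keep free. The Python sketch below only models that policy; assign_from_allowed and its reserved parameter are hypothetical and do not correspond to any existing srp_daemon option.

```python
def assign_from_allowed(num_connections, num_comp_vectors, reserved=()):
    """Round-robin over the completion vectors that are not reserved."""
    reserved_set = set(reserved)
    allowed = [v for v in range(num_comp_vectors) if v not in reserved_set]
    if not allowed:
        raise ValueError("every completion vector is reserved")
    return [allowed[i % len(allowed)] for i in range(num_connections)]


if __name__ == "__main__":
    # Hypothetical: 4 vectors, vectors 0 and 1 kept for other IB applications.
    print(assign_from_allowed(5, 4, reserved=(0, 1)))
```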

So, this was just an idea, with an easy implementation, that would
potentially give the user a semi-automatic, performance-optimized
configuration...

-Sagi

