* IB_CQ_VECTOR_LEAST_ATTACHED
@ 2014-12-04 16:47 Shirley Ma
From: Shirley Ma @ 2014-12-04 16:47 UTC (permalink / raw)
To: linux-rdma, Or Gerlitz, eli-VPRAkNaXOzVWk0Htik3J/w
Hello Or, Eli,
What's the history of this patch?
http://lists.openfabrics.org/pipermail/general/2008-May/050813.html
I am working on a multiple-QP workload, and I created a similar approach
with IB_CQ_VECTOR_LEAST_ATTACHED, which brings roughly a 17% improvement in
small I/O performance. I think this completion-vector load balancing should
be maintained in the provider, not the caller. I didn't see this patch
submitted to the mainline kernel; is there a reason why not?
Thanks
Shirley
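
The idea behind the hint is that the provider keeps a per-device count of
how many CQs are attached to each completion vector and hands out the least
loaded vector at CQ creation time. A minimal stand-alone C model of that
policy (the array and helper names below are illustrative only, not taken
from the 2008 patch or from any driver):

#include <limits.h>
#include <stdio.h>

#define NUM_COMP_VECTORS 8

/* Per-device count of CQs currently attached to each completion vector. */
static unsigned int cq_count[NUM_COMP_VECTORS];

/*
 * Model of a "least attached" policy: return the vector with the fewest
 * CQs bound to it and account for the new attachment.
 */
static int pick_least_attached_vector(void)
{
	unsigned int best_count = UINT_MAX;
	int best = 0, i;

	for (i = 0; i < NUM_COMP_VECTORS; i++) {
		if (cq_count[i] < best_count) {
			best_count = cq_count[i];
			best = i;
		}
	}
	cq_count[best]++;
	return best;
}

int main(void)
{
	int i;

	/* Ten CQs spread 2-2-1-1-1-1-1-1 across the eight vectors. */
	for (i = 0; i < 10; i++)
		printf("CQ %d -> comp_vector %d\n", i,
		       pick_least_attached_vector());
	return 0;
}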
* Re: IB_CQ_VECTOR_LEAST_ATTACHED
@ 2014-12-04 18:43 Bart Van Assche
From: Bart Van Assche @ 2014-12-04 18:43 UTC (permalink / raw)
To: Shirley Ma, linux-rdma, Or Gerlitz, eli-VPRAkNaXOzVWk0Htik3J/w

On 12/04/14 17:47, Shirley Ma wrote:
> What's the history of this patch?
> http://lists.openfabrics.org/pipermail/general/2008-May/050813.html
>
> I am working on a multiple-QP workload, and I created a similar approach
> with IB_CQ_VECTOR_LEAST_ATTACHED, which brings roughly a 17% improvement
> in small I/O performance. I think this completion-vector load balancing
> should be maintained in the provider, not the caller. I didn't see this
> patch submitted to the mainline kernel; is there a reason why not?

My interpretation is that an approach similar to IB_CQ_VECTOR_LEAST_ATTACHED
is useful on single-socket systems but suboptimal on multi-socket systems.
Hence the code for associating CQ sets with CPU sockets in the SRP
initiator. These changes have been queued for kernel 3.19. See also branch
drivers-for-3.19 in git repo git://git.infradead.org/users/hch/scsi-queue.git.

Bart.
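
A rough sketch of the per-socket association Bart describes, assuming the
driver affinitizes a fixed group of completion vectors to each CPU socket
(the constants and helper below are purely illustrative and are not the
actual SRP initiator code):

#include <stdio.h>

#define NUM_SOCKETS      2
#define VECTORS_PER_NODE 4   /* assumed: vectors are grouped per socket */

/*
 * Model of a per-socket CQ set: a ULP that knows which socket its worker
 * threads run on picks a vector owned by that socket, rather than a
 * globally "least attached" one.
 */
static int vector_for_socket(int socket, int index)
{
	return socket * VECTORS_PER_NODE + (index % VECTORS_PER_NODE);
}

int main(void)
{
	int s, i;

	for (s = 0; s < NUM_SOCKETS; s++)
		for (i = 0; i < 2; i++)
			printf("socket %d, CQ %d -> comp_vector %d\n",
			       s, i, vector_for_socket(s, i));
	return 0;
}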
* Re: IB_CQ_VECTOR_LEAST_ATTACHED
@ 2014-12-04 19:41 Shirley Ma
From: Shirley Ma @ 2014-12-04 19:41 UTC (permalink / raw)
To: Bart Van Assche, linux-rdma, Or Gerlitz, eli-VPRAkNaXOzVWk0Htik3J/w

On 12/04/2014 10:43 AM, Bart Van Assche wrote:
> On 12/04/14 17:47, Shirley Ma wrote:
>> What's the history of this patch?
>> http://lists.openfabrics.org/pipermail/general/2008-May/050813.html
>>
>> I am working on a multiple-QP workload, and I created a similar approach
>> with IB_CQ_VECTOR_LEAST_ATTACHED, which brings roughly a 17% improvement
>> in small I/O performance. I think this completion-vector load balancing
>> should be maintained in the provider, not the caller. I didn't see this
>> patch submitted to the mainline kernel; is there a reason why not?
>
> My interpretation is that an approach similar to
> IB_CQ_VECTOR_LEAST_ATTACHED is useful on single-socket systems but
> suboptimal on multi-socket systems. Hence the code for associating CQ
> sets with CPU sockets in the SRP initiator. These changes have been
> queued for kernel 3.19. See also branch drivers-for-3.19 in git repo
> git://git.infradead.org/users/hch/scsi-queue.git.

What I did was manually control the IRQ and the working thread on the same
socket. The CQ is created when mounting the file system in NFS/RDMA, but
the workload thread might start on a different socket, so a per-CPU based
implementation might not apply. I will look at the SRP implementation.

Thanks,
Shirley
* Re: IB_CQ_VECTOR_LEAST_ATTACHED
@ 2014-12-07 10:20 Sagi Grimberg
From: Sagi Grimberg @ 2014-12-07 10:20 UTC (permalink / raw)
To: Shirley Ma, Bart Van Assche, linux-rdma, Or Gerlitz, eli-VPRAkNaXOzVWk0Htik3J/w
Cc: Matan Barak

On 12/4/2014 9:41 PM, Shirley Ma wrote:
> On 12/04/2014 10:43 AM, Bart Van Assche wrote:
>> On 12/04/14 17:47, Shirley Ma wrote:
>>> What's the history of this patch?
>>> http://lists.openfabrics.org/pipermail/general/2008-May/050813.html
>>>
>>> I am working on a multiple-QP workload, and I created a similar
>>> approach with IB_CQ_VECTOR_LEAST_ATTACHED, which brings roughly a 17%
>>> improvement in small I/O performance. I think this completion-vector
>>> load balancing should be maintained in the provider, not the caller.
>>> I didn't see this patch submitted to the mainline kernel; is there a
>>> reason why not?
>>
>> My interpretation is that an approach similar to
>> IB_CQ_VECTOR_LEAST_ATTACHED is useful on single-socket systems but
>> suboptimal on multi-socket systems. Hence the code for associating CQ
>> sets with CPU sockets in the SRP initiator. These changes have been
>> queued for kernel 3.19. See also branch drivers-for-3.19 in git repo
>> git://git.infradead.org/users/hch/scsi-queue.git.
>
> What I did was manually control the IRQ and the working thread on the
> same socket. The CQ is created when mounting the file system in NFS/RDMA,
> but the workload thread might start on a different socket, so a per-CPU
> based implementation might not apply. I will look at the SRP
> implementation.

Hey Shirley,

Bart is correct; in general the LEAST_ATTACHED approach might not be
optimal in the NUMA case. The thread <-> QP/CQ/CPU assignment is
addressed by the multi-channel approach, which to my understanding won't
be implemented in NFSoRDMA in the near future (right, Chuck?).

However, the LEAST_ATTACHED vector hint will come up again in the
future, as there is a need to spread applications across different
interrupt vectors (especially for user space).

CC'ing Matan, who is working on this; perhaps he can comment on this as
well.

Sagi.
* Re: IB_CQ_VECTOR_LEAST_ATTACHED
@ 2014-12-07 12:22 Matan Barak
From: Matan Barak @ 2014-12-07 12:22 UTC (permalink / raw)
To: Sagi Grimberg, Shirley Ma, Bart Van Assche, linux-rdma, Or Gerlitz, eli-VPRAkNaXOzVWk0Htik3J/w

On 12/7/2014 12:20 PM, Sagi Grimberg wrote:
> On 12/4/2014 9:41 PM, Shirley Ma wrote:
>> On 12/04/2014 10:43 AM, Bart Van Assche wrote:
>>> On 12/04/14 17:47, Shirley Ma wrote:
>>>> What's the history of this patch?
>>>> http://lists.openfabrics.org/pipermail/general/2008-May/050813.html
>>>>
>>>> I am working on a multiple-QP workload, and I created a similar
>>>> approach with IB_CQ_VECTOR_LEAST_ATTACHED, which brings roughly a 17%
>>>> improvement in small I/O performance. I think this completion-vector
>>>> load balancing should be maintained in the provider, not the caller.
>>>> I didn't see this patch submitted to the mainline kernel; is there a
>>>> reason why not?
>>>
>>> My interpretation is that an approach similar to
>>> IB_CQ_VECTOR_LEAST_ATTACHED is useful on single-socket systems but
>>> suboptimal on multi-socket systems. Hence the code for associating CQ
>>> sets with CPU sockets in the SRP initiator. These changes have been
>>> queued for kernel 3.19. See also branch drivers-for-3.19 in git repo
>>> git://git.infradead.org/users/hch/scsi-queue.git.
>>
>> What I did was manually control the IRQ and the working thread on the
>> same socket. The CQ is created when mounting the file system in
>> NFS/RDMA, but the workload thread might start on a different socket, so
>> a per-CPU based implementation might not apply. I will look at the SRP
>> implementation.
>
> Hey Shirley,
>
> Bart is correct; in general the LEAST_ATTACHED approach might not be
> optimal in the NUMA case. The thread <-> QP/CQ/CPU assignment is
> addressed by the multi-channel approach, which to my understanding won't
> be implemented in NFSoRDMA in the near future (right, Chuck?).
>
> However, the LEAST_ATTACHED vector hint will come up again in the
> future, as there is a need to spread applications across different
> interrupt vectors (especially for user space).
>
> CC'ing Matan, who is working on this; perhaps he can comment on this as
> well.
>
> Sagi.

Hi,

I'm not sure LEAST_ATTACHED is the best practice here. Applications might
want to create a CQ on n different cores, and you can't guarantee that
with a LEAST_ATTACHED policy. Anything smarter would probably require an
API change or some tricks. We might, for example, add an API like "give me
the least attached CQ vector which isn't in the following list
{a, b, c, ...}". Another option might be that when several CQs are
registered with LEAST_ATTACHED on the same PD, we try to give them
different vectors. Anyway, these are rough ideas; we should think about
this thoroughly before implementing.

Regards,
Matan
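
The exclusion-list variant Matan mentions could look roughly like the
following stand-alone model (the mask-based helper is a hypothetical
illustration, not a proposed verbs API):

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_COMP_VECTORS 8

static unsigned int cq_count[NUM_COMP_VECTORS];

/*
 * Hypothetical variant of the least-attached hint: skip vectors the caller
 * already holds (exclude_mask), so an application asking for n CQs ends up
 * on n different vectors.
 */
static int least_attached_excluding(uint32_t exclude_mask)
{
	unsigned int best_count = UINT_MAX;
	int best = -1, i;

	for (i = 0; i < NUM_COMP_VECTORS; i++) {
		if (exclude_mask & (1u << i))
			continue;
		if (cq_count[i] < best_count) {
			best_count = cq_count[i];
			best = i;
		}
	}
	if (best >= 0)
		cq_count[best]++;
	return best;	/* -1 if every vector was excluded */
}

int main(void)
{
	uint32_t mine = 0;
	int i, v;

	/* Ask for three CQs, each on a vector we do not already use. */
	for (i = 0; i < 3; i++) {
		v = least_attached_excluding(mine);
		mine |= 1u << v;
		printf("CQ %d -> comp_vector %d\n", i, v);
	}
	return 0;
}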
* Re: IB_CQ_VECTOR_LEAST_ATTACHED
@ 2014-12-07 12:59 Or Gerlitz
From: Or Gerlitz @ 2014-12-07 12:59 UTC (permalink / raw)
To: Matan Barak, Sagi Grimberg, Shirley Ma, Bart Van Assche, linux-rdma, eli-VPRAkNaXOzVWk0Htik3J/w

On 12/7/2014 2:22 PM, Matan Barak wrote:
> Applications might want to create a CQ on n different cores

You mean like an IRQ can flush on a mask potentially made of multiple CPUs?

Or.
* Re: IB_CQ_VECTOR_LEAST_ATTACHED
@ 2014-12-07 16:58 Matan Barak
From: Matan Barak @ 2014-12-07 16:58 UTC (permalink / raw)
To: Or Gerlitz, Sagi Grimberg, Shirley Ma, Bart Van Assche, linux-rdma, eli-VPRAkNaXOzVWk0Htik3J/w

On 12/7/2014 2:59 PM, Or Gerlitz wrote:
> On 12/7/2014 2:22 PM, Matan Barak wrote:
>> Applications might want to create a CQ on n different cores
>
> You mean like an IRQ can flush on a mask potentially made of multiple
> CPUs?

Sort of. In both cases you try to spread the resources such that you get
the best performance (and that should be done by the device driver
itself). The user needs to somehow get n different least-used resources.
Hopefully, if the device driver does a decent job, the user potentially
gets those resources on multiple CPUs.

Matan
* Re: IB_CQ_VECTOR_LEAST_ATTACHED
@ 2014-12-09 20:11 Or Gerlitz
From: Or Gerlitz @ 2014-12-09 20:11 UTC (permalink / raw)
To: Matan Barak
Cc: Sagi Grimberg, Shirley Ma, Bart Van Assche, linux-rdma, Eli Cohen, Eyal Salomon

On Sun, Dec 7, 2014 at 6:58 PM, Matan Barak
<matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> On 12/7/2014 2:59 PM, Or Gerlitz wrote:
>> On 12/7/2014 2:22 PM, Matan Barak wrote:
>>> Applications might want to create a CQ on n different cores
>> You mean like an IRQ can flush on a mask potentially made of multiple
>> CPUs?
> Sort of. In both cases you try to spread the resources such that you get
> the best performance (and that should be done by the device driver
> itself). The user needs to somehow get n different least-used resources.
> Hopefully, if the device driver does a decent job, the user potentially
> gets those resources on multiple CPUs.

I am not sure I follow the "n different least-used resources" part.
Thinking about this a little further, what user-space (and maybe kernel,
too?) apps would want follows the rmap (reverse map where cpu --> set of
IRQs) used by the kernel aRFS logic, where we let the app choose some
primitive that causes the interrupt to be raised on the CPU it wants. In
this context, the parameter to the CQ creation verb need not be the vector
number, but rather the CPU number (maybe with a nice default of THIS_CPU)
or a set of CPUs?
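
A toy model of the reverse mapping Or describes, assuming the driver simply
spread vector IRQ affinities round-robin across CPUs (the names and layout
are illustrative only, not an existing verbs interface):

#include <stdio.h>

#define NUM_COMP_VECTORS 8
#define NUM_CPUS         16

/*
 * Rough model of a cpu --> completion-vector reverse map.  A CQ creation
 * verb that took a CPU (with a default of the calling CPU) could do this
 * lookup internally instead of exposing raw vector numbers to the caller.
 */
static int vector_for_cpu(int cpu)
{
	return cpu % NUM_COMP_VECTORS;
}

int main(void)
{
	int cpu;

	for (cpu = 0; cpu < NUM_CPUS; cpu += 5)
		printf("cpu %d -> comp_vector %d\n", cpu, vector_for_cpu(cpu));
	return 0;
}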
* Re: IB_CQ_VECTOR_LEAST_ATTACHED
@ 2014-12-07 20:08 Chuck Lever
From: Chuck Lever @ 2014-12-07 20:08 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Shirley Ma, Bart Van Assche, linux-rdma, Or Gerlitz, eli-VPRAkNaXOzVWk0Htik3J/w, Matan Barak

On Dec 7, 2014, at 5:20 AM, Sagi Grimberg
<sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:

> On 12/4/2014 9:41 PM, Shirley Ma wrote:
>> On 12/04/2014 10:43 AM, Bart Van Assche wrote:
>>> On 12/04/14 17:47, Shirley Ma wrote:
>>>> What's the history of this patch?
>>>> http://lists.openfabrics.org/pipermail/general/2008-May/050813.html
>>>>
>>>> I am working on a multiple-QP workload, and I created a similar
>>>> approach with IB_CQ_VECTOR_LEAST_ATTACHED, which brings roughly a 17%
>>>> improvement in small I/O performance. I think this completion-vector
>>>> load balancing should be maintained in the provider, not the caller.
>>>> I didn't see this patch submitted to the mainline kernel; is there a
>>>> reason why not?
>>>
>>> My interpretation is that an approach similar to
>>> IB_CQ_VECTOR_LEAST_ATTACHED is useful on single-socket systems but
>>> suboptimal on multi-socket systems. Hence the code for associating CQ
>>> sets with CPU sockets in the SRP initiator. These changes have been
>>> queued for kernel 3.19. See also branch drivers-for-3.19 in git repo
>>> git://git.infradead.org/users/hch/scsi-queue.git.
>>
>> What I did was manually control the IRQ and the working thread on the
>> same socket. The CQ is created when mounting the file system in
>> NFS/RDMA, but the workload thread might start on a different socket, so
>> a per-CPU based implementation might not apply. I will look at the SRP
>> implementation.
>
> Hey Shirley,
>
> Bart is correct; in general the LEAST_ATTACHED approach might not be
> optimal in the NUMA case. The thread <-> QP/CQ/CPU assignment is
> addressed by the multi-channel approach, which to my understanding won't
> be implemented in NFSoRDMA in the near future (right, Chuck?).

As I understand it, the preference of the Linux NFS community is that
any multi-pathing solution should be transparent to the ULP (NFS and
RPC, in this case). mp-tcp is ideal in that the ULP is presented with
a single virtual transport instance, but under the covers, that instance
can be backed by multiple active paths.

Alternately, pNFS can be deployed. This allows a dataset to be striped
across multiple servers (and networks). There is a rather high bar to
entering this arena, however.

Speculating aloud, multiple QPs per transport instance may require
implementation changes on the server as well as the client. Any
interoperability dependencies should be documented via a standards
process.

And note that an RPC transport (at least in kernel) is shared across
many user applications and mount points. I find it difficult to visualize
an intuitive and comprehensive administrative interface where enough
guidance is provided to place a set of NFS applications and an RPC
transport in the same resource domain (maybe cgroups?).

So for the time being I prefer staying with a single QP per client-
server pair.

A large NFS client can actively use many NFS servers, however. Each
client-server pair would benefit from finding "least-used" resources
when QPs and CQs are created. That is something we can leverage today.

> However, the LEAST_ATTACHED vector hint will come up again in the
> future, as there is a need to spread applications across different
> interrupt vectors (especially for user space).
>
> CC'ing Matan, who is working on this; perhaps he can comment on this as
> well.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
* Re: IB_CQ_VECTOR_LEAST_ATTACHED
@ 2014-12-08 0:46 Shirley Ma
From: Shirley Ma @ 2014-12-08 0:46 UTC (permalink / raw)
To: Chuck Lever, Sagi Grimberg
Cc: Bart Van Assche, linux-rdma, Or Gerlitz, eli-VPRAkNaXOzVWk0Htik3J/w, Matan Barak

On 12/07/2014 12:08 PM, Chuck Lever wrote:
> On Dec 7, 2014, at 5:20 AM, Sagi Grimberg
> <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
>> On 12/4/2014 9:41 PM, Shirley Ma wrote:
>>> On 12/04/2014 10:43 AM, Bart Van Assche wrote:
>>>> On 12/04/14 17:47, Shirley Ma wrote:
>>>>> What's the history of this patch?
>>>>> http://lists.openfabrics.org/pipermail/general/2008-May/050813.html
>>>>>
>>>>> I am working on a multiple-QP workload, and I created a similar
>>>>> approach with IB_CQ_VECTOR_LEAST_ATTACHED, which brings roughly a
>>>>> 17% improvement in small I/O performance. I think this
>>>>> completion-vector load balancing should be maintained in the
>>>>> provider, not the caller. I didn't see this patch submitted to the
>>>>> mainline kernel; is there a reason why not?
>>>>
>>>> My interpretation is that an approach similar to
>>>> IB_CQ_VECTOR_LEAST_ATTACHED is useful on single-socket systems but
>>>> suboptimal on multi-socket systems. Hence the code for associating
>>>> CQ sets with CPU sockets in the SRP initiator. These changes have
>>>> been queued for kernel 3.19. See also branch drivers-for-3.19 in git
>>>> repo git://git.infradead.org/users/hch/scsi-queue.git.
>>>
>>> What I did was manually control the IRQ and the working thread on the
>>> same socket. The CQ is created when mounting the file system in
>>> NFS/RDMA, but the workload thread might start on a different socket,
>>> so a per-CPU based implementation might not apply. I will look at the
>>> SRP implementation.
>>
>> Hey Shirley,
>>
>> Bart is correct; in general the LEAST_ATTACHED approach might not be
>> optimal in the NUMA case. The thread <-> QP/CQ/CPU assignment is
>> addressed by the multi-channel approach, which to my understanding
>> won't be implemented in NFSoRDMA in the near future (right, Chuck?).
>
> As I understand it, the preference of the Linux NFS community is that
> any multi-pathing solution should be transparent to the ULP (NFS and
> RPC, in this case). mp-tcp is ideal in that the ULP is presented with
> a single virtual transport instance, but under the covers, that
> instance can be backed by multiple active paths.
>
> Alternately, pNFS can be deployed. This allows a dataset to be striped
> across multiple servers (and networks). There is a rather high bar to
> entering this arena, however.
>
> Speculating aloud, multiple QPs per transport instance may require
> implementation changes on the server as well as the client. Any
> interoperability dependencies should be documented via a standards
> process.
>
> And note that an RPC transport (at least in kernel) is shared across
> many user applications and mount points. I find it difficult to
> visualize an intuitive and comprehensive administrative interface where
> enough guidance is provided to place a set of NFS applications and an
> RPC transport in the same resource domain (maybe cgroups?).
>
> So for the time being I prefer staying with a single QP per client-
> server pair.
>
> A large NFS client can actively use many NFS servers, however. Each
> client-server pair would benefit from finding "least-used" resources
> when QPs and CQs are created. That is something we can leverage today.

Yes, that's something I am evaluating now for one NFS client talking to
different destination servers. I can see a more than 15% bandwidth
increase when simulating multiple server mount points through different
IPoIB child interfaces and changing the create_cq completion vector from
0 to "least-used", so that completion vectors are balanced among the QPs.

>> However, the LEAST_ATTACHED vector hint will come up again in the
>> future, as there is a need to spread applications across different
>> interrupt vectors (especially for user space).
>>
>> CC'ing Matan, who is working on this; perhaps he can comment on this
>> as well.
* Re: IB_CQ_VECTOR_LEAST_ATTACHED
@ 2014-12-09 11:29 Sagi Grimberg
From: Sagi Grimberg @ 2014-12-09 11:29 UTC (permalink / raw)
To: Chuck Lever
Cc: Shirley Ma, Bart Van Assche, linux-rdma, Or Gerlitz, eli-VPRAkNaXOzVWk0Htik3J/w, Matan Barak

On 12/7/2014 10:08 PM, Chuck Lever wrote:
> On Dec 7, 2014, at 5:20 AM, Sagi Grimberg
> <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
>> On 12/4/2014 9:41 PM, Shirley Ma wrote:
>>> On 12/04/2014 10:43 AM, Bart Van Assche wrote:
>>>> On 12/04/14 17:47, Shirley Ma wrote:
>>>>> What's the history of this patch?
>>>>> http://lists.openfabrics.org/pipermail/general/2008-May/050813.html
>>>>>
>>>>> I am working on a multiple-QP workload, and I created a similar
>>>>> approach with IB_CQ_VECTOR_LEAST_ATTACHED, which brings roughly a
>>>>> 17% improvement in small I/O performance. I think this
>>>>> completion-vector load balancing should be maintained in the
>>>>> provider, not the caller. I didn't see this patch submitted to the
>>>>> mainline kernel; is there a reason why not?
>>>>
>>>> My interpretation is that an approach similar to
>>>> IB_CQ_VECTOR_LEAST_ATTACHED is useful on single-socket systems but
>>>> suboptimal on multi-socket systems. Hence the code for associating
>>>> CQ sets with CPU sockets in the SRP initiator. These changes have
>>>> been queued for kernel 3.19. See also branch drivers-for-3.19 in git
>>>> repo git://git.infradead.org/users/hch/scsi-queue.git.
>>>
>>> What I did was manually control the IRQ and the working thread on the
>>> same socket. The CQ is created when mounting the file system in
>>> NFS/RDMA, but the workload thread might start on a different socket,
>>> so a per-CPU based implementation might not apply. I will look at the
>>> SRP implementation.
>>
>> Hey Shirley,
>>
>> Bart is correct; in general the LEAST_ATTACHED approach might not be
>> optimal in the NUMA case. The thread <-> QP/CQ/CPU assignment is
>> addressed by the multi-channel approach, which to my understanding
>> won't be implemented in NFSoRDMA in the near future (right, Chuck?).
>
> As I understand it, the preference of the Linux NFS community is that
> any multi-pathing solution should be transparent to the ULP (NFS and
> RPC, in this case).

Agree.

> mp-tcp is ideal in that the ULP is presented with a single virtual
> transport instance, but under the covers, that instance can be backed
> by multiple active paths.
>
> Alternately, pNFS can be deployed. This allows a dataset to be striped
> across multiple servers (and networks). There is a rather high bar to
> entering this arena, however.
>
> Speculating aloud, multiple QPs per transport instance may require
> implementation changes on the server as well as the client. Any
> interoperability dependencies should be documented via a standards
> process.

Correct, this obviously needs negotiation, but that is specific to the
NFSoRDMA standard.

> And note that an RPC transport (at least in kernel) is shared across
> many user applications and mount points. I find it difficult to
> visualize an intuitive and comprehensive administrative interface where
> enough guidance is provided to place a set of NFS applications and an
> RPC transport in the same resource domain (maybe cgroups?).

This is why a multi-channel approach will solve the problem. Each I/O
operation selects a channel by best fit (for example, the running CPU
id). This gives a *very* high gain and can possibly max out HW
performance even over a single mount. Having said that, I think this
discussion is ahead of its time...

> So for the time being I prefer staying with a single QP per client-
> server pair.
>
> A large NFS client can actively use many NFS servers, however. Each
> client-server pair would benefit from finding "least-used" resources
> when QPs and CQs are created. That is something we can leverage today.

I agree that for the current state, least-used can give some benefit by
separating interrupt vectors for each client-server pair.

Sagi.
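
A stand-alone model of the per-CPU channel selection Sagi describes,
assuming channels are pinned to contiguous CPU groups (the constants and
function below are hypothetical, not NFS or SRP code):

#include <stdio.h>

#define NUM_CHANNELS 4

/*
 * Model of multi-channel selection: each channel is a QP/CQ pair pinned to
 * a group of CPUs, and an I/O issued on CPU c uses the channel covering c,
 * so submission and completion processing stay local.
 */
static int channel_for_cpu(int cpu, int num_cpus)
{
	int cpus_per_channel = num_cpus / NUM_CHANNELS;

	if (cpus_per_channel == 0)
		cpus_per_channel = 1;
	return (cpu / cpus_per_channel) % NUM_CHANNELS;
}

int main(void)
{
	int cpu, num_cpus = 16;

	for (cpu = 0; cpu < num_cpus; cpu += 3)
		printf("cpu %d -> channel %d\n",
		       cpu, channel_for_cpu(cpu, num_cpus));
	return 0;
}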
Thread overview: 11+ messages

2014-12-04 16:47 IB_CQ_VECTOR_LEAST_ATTACHED Shirley Ma
2014-12-04 18:43 ` IB_CQ_VECTOR_LEAST_ATTACHED Bart Van Assche
2014-12-04 19:41   ` IB_CQ_VECTOR_LEAST_ATTACHED Shirley Ma
2014-12-07 10:20     ` IB_CQ_VECTOR_LEAST_ATTACHED Sagi Grimberg
2014-12-07 12:22       ` IB_CQ_VECTOR_LEAST_ATTACHED Matan Barak
2014-12-07 12:59         ` IB_CQ_VECTOR_LEAST_ATTACHED Or Gerlitz
2014-12-07 16:58           ` IB_CQ_VECTOR_LEAST_ATTACHED Matan Barak
2014-12-09 20:11             ` IB_CQ_VECTOR_LEAST_ATTACHED Or Gerlitz
2014-12-07 20:08       ` IB_CQ_VECTOR_LEAST_ATTACHED Chuck Lever
2014-12-08  0:46         ` IB_CQ_VECTOR_LEAST_ATTACHED Shirley Ma
2014-12-09 11:29         ` IB_CQ_VECTOR_LEAST_ATTACHED Sagi Grimberg