* RDMA/CM and multiple QPs
@ 2015-09-06 6:45 Christoph Hellwig
[not found] ` <20150906064550.GA30683-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2015-09-06 6:45 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi All,
right now RDMA/CM works on a per-QP basis, which is very awkward if you
want multiple QPs as part of a single logical device, something that
will be useful for a lot of modern protocols. For example we will need to check
in the CM handler that we're not getting a different ib_device if we
want to apply the device limit in any sort of global scope, and it's
generally very hard to get a struct ib_device that can be used as
a driver model parent.
Is there any interest in trying to add an API to the CM to do a single
address resolution and allocate multiple QPs with these checks in
place?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <20150906064550.GA30683-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>]
* Re: RDMA/CM and multiple QPs
[not found] ` <20150906064550.GA30683-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-09-06 7:42 ` Parav Pandit
[not found] ` <CAG53R5VZDZKiuR-jLybS1PhrT9K4GG6xTr8bOG-L0VaQgqEXSA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-08 12:32 ` Sagi Grimberg
2015-09-10 16:30 ` Hefty, Sean
2 siblings, 1 reply; 15+ messages in thread
From: Parav Pandit @ 2015-09-06 7:42 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi Christoph,
Establishing multiple QPs is just one part of it.
The bigger challenge is how to distribute work requests among multiple
QPs, especially since the Verbs layer is agnostic to STAG advertisement
and invalidation (these are not part of the IB spec, and every ULP has
its own method, possibly for good reason).
A few months back, when I was working on this problem, the solution we
considered was similar to what the networking stack currently does:
1. Instead of only the raw ib_send, write, read and invalidate verbs, we
need higher-level verbs for data transport, such as send_data,
receive_data, advertise data_buffers etc., of course keeping zero-copy
semantics in mind.
2. Perform device aggregation similar to Ethernet netdev link
aggregation, so that two ib_devices form a pair on which one or more QPs
are created. This virtual device provides higher-level data transfer
APIs than raw IB semantics. This layer then decides how to advertise
memory, when to invalidate, and which QP to use for transport (load
balancing or failover).
3. I have not thought through how existing ULPs whose specifications are
IB driven could be ported to this newly defined interface.
4. Accelio is one framework that comes close to this design philosophy;
however, its current implementation brings resource overhead for MRs,
and as we go along there is scope to optimize it.
5. Since this layer sits above the raw IB verbs layer and above RDMA-CM,
the core is untouched by this functionality. Once we have it, many of
the migration-related issues can be solved, where a node can disconnect
and reconnect in a stateful way.
6. This way the pure hardware resource is detached from transport
acceleration, which gives flexibility to implement services that are
often difficult to do at the raw IB verbs level.
Parav
On Sun, Sep 6, 2015 at 12:15 PM, Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:
> Hi All,
>
> right now RDMA/CM works on a QP basis, but seems very awkward if you
> want multiple QPs as part of a single logical device, which will be
> useful for a lot of modern protocols. For example we will need to check
> in the CM handler that we're not getting a different ib_device if we
> want to apply the device limit in any sort of global scope, and it's
> generally very hard to get a struct ib_device that can be used as
> a driver model parent.
>
> Is there any interest in trying to add an API to the CM to do a single
> address resolution and allocate multiple QPs with these checks in
> place?
^ permalink raw reply [flat|nested] 15+ messages in thread
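The QP selection described in point 2 of the message above (load
balancing or failover inside an aggregating layer) could be sketched
roughly as follows. This is only an illustration under stated
assumptions: struct sketch_qp, struct sketch_vdev and sketch_pick_qp()
are hypothetical names that do not exist in the kernel or in Accelio,
and a plain round-robin policy is assumed since the thread does not
specify one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_QPS 8

/* Hypothetical stand-in for a QP; "up" is cleared when the QP fails. */
struct sketch_qp {
	bool up;
};

/* Hypothetical aggregate "virtual device" that owns several QPs. */
struct sketch_vdev {
	struct sketch_qp qps[SKETCH_MAX_QPS];
	size_t nr_qps;
	size_t next;	/* round-robin cursor */
};

/*
 * Pick the QP for the next transfer: round-robin across the QPs (load
 * balancing), skipping any QP that is marked down (failover).
 * Returns the QP index, or -1 if no QP is usable.
 */
int sketch_pick_qp(struct sketch_vdev *vdev)
{
	assert(vdev != NULL && vdev->nr_qps <= SKETCH_MAX_QPS);

	for (size_t i = 0; i < vdev->nr_qps; i++) {
		size_t idx = (vdev->next + i) % vdev->nr_qps;

		if (vdev->qps[idx].up) {
			vdev->next = (idx + 1) % vdev->nr_qps;
			return (int)idx;
		}
	}
	return -1;
}
```

A caller would mark a QP as down from its error handler and keep
calling sketch_pick_qp(); memory advertisement and invalidation, the
harder part raised in the message, are deliberately left out here.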
[parent not found: <CAG53R5VZDZKiuR-jLybS1PhrT9K4GG6xTr8bOG-L0VaQgqEXSA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: RDMA/CM and multiple QPs
[not found] ` <CAG53R5VZDZKiuR-jLybS1PhrT9K4GG6xTr8bOG-L0VaQgqEXSA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-09-06 7:50 ` Christoph Hellwig
[not found] ` <20150906075024.GA7845-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2015-09-06 7:50 UTC (permalink / raw)
To: Parav Pandit; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On Sun, Sep 06, 2015 at 01:12:56PM +0530, Parav Pandit wrote:
> Hi Christoph,
>
> Establishing multiple QP is just one part of it.
> Bigger challenge is how do we distribute the work request among
> multiple QPs
For my case I simply rely on the blk-mq layer to have cpu-local queues,
so that's a somewhat solved issue as long as you are fine with the
usage model. If your usage is skewed heavily towards certain CPUs
it might be a little suboptimal.
Note that the SRP driver already in tree is a good example for this,
although it doesn't use RDMA/CM and thus already operates on a
per-ib_device level.
^ permalink raw reply [flat|nested] 15+ messages in thread
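The cpu-local queue model mentioned above, and why skewed CPU usage
makes it suboptimal, can be illustrated with a small sketch. The
modulo mapping is an assumption made for brevity; the real blk-mq
mapping is more involved, and sketch_cpu_to_queue() and
sketch_count_per_queue() are hypothetical helpers, not kernel
functions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical mapping from the submitting CPU to a queue (and thus to
 * the QP behind it). A plain modulo is assumed here for illustration.
 */
unsigned int sketch_cpu_to_queue(unsigned int cpu, unsigned int nr_queues)
{
	assert(nr_queues > 0);
	return cpu % nr_queues;
}

/*
 * Count how many submissions land on each queue, given the CPU each
 * submission was issued from. counts[] must hold nr_queues entries.
 */
void sketch_count_per_queue(const unsigned int *cpus, size_t nr,
			    unsigned int nr_queues, unsigned int *counts)
{
	for (unsigned int q = 0; q < nr_queues; q++)
		counts[q] = 0;
	for (size_t i = 0; i < nr; i++)
		counts[sketch_cpu_to_queue(cpus[i], nr_queues)]++;
}
```

If most submissions come from one CPU, one entry of counts[] grows
while the others stay near zero, which is the skew the message warns
about.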
[parent not found: <20150906075024.GA7845-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>]
* Re: RDMA/CM and multiple QPs
[not found] ` <20150906075024.GA7845-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-09-06 7:54 ` Parav Pandit
[not found] ` <CAG53R5UsH3aEmRf2EgNYydJ=cMZCFG19ZQjHcLn=NjQxsnwf-g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-06 15:15 ` Bart Van Assche
1 sibling, 1 reply; 15+ messages in thread
From: Parav Pandit @ 2015-09-06 7:54 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On Sun, Sep 6, 2015 at 1:20 PM, Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:
> On Sun, Sep 06, 2015 at 01:12:56PM +0530, Parav Pandit wrote:
>> Hi Christoph,
>>
>> Establishing multiple QP is just one part of it.
>> Bigger challenge is how do we distribute the work request among
>> multiple QPs
>
> For my case I simply rely on the blk-mq layer to have cpu-local queues,
> so that's a somewhat solved issue as long as you are fine with the
> usage model. If your usage is skewed heavily towards certain CPUs
> it might be a little suboptimal.
>
> Note that the SRP driver already in tree is a good example for this,
> although it doesn't use RDMA/CM and thus already operates on a
> per-ib_device level.
Yes, SRP is a good example. The point I am trying to make is that SRP
implements failover and request spreading: when one QP fails, it
delivers to another QP. So one session spans multiple transport QP
connections. Every ULP similarly needs to implement such functionality.
Instead, there could be a single transport mid-layer that does it.
^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <CAG53R5UsH3aEmRf2EgNYydJ=cMZCFG19ZQjHcLn=NjQxsnwf-g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: RDMA/CM and multiple QPs
[not found] ` <CAG53R5UsH3aEmRf2EgNYydJ=cMZCFG19ZQjHcLn=NjQxsnwf-g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-09-07 5:08 ` Christoph Hellwig
0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2015-09-07 5:08 UTC (permalink / raw)
To: Parav Pandit; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On Sun, Sep 06, 2015 at 01:24:52PM +0530, Parav Pandit wrote:
> Yes. SRP is good example. The point I am trying to make is, SRP
> implements failover and request spreading where one QP fails it
> delivers to other QP.
But SRP doesn't implement that. There are no failover capabilities in
a single SRP session even with multiple QPs, and the spreading is
implemented by a higher layer, namely blk-mq, which is common code for
all block drivers.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: RDMA/CM and multiple QPs
[not found] ` <20150906075024.GA7845-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-09-06 7:54 ` Parav Pandit
@ 2015-09-06 15:15 ` Bart Van Assche
[not found] ` <55EC5879.202-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
1 sibling, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2015-09-06 15:15 UTC (permalink / raw)
To: Christoph Hellwig, Parav Pandit; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On 09/06/15 00:50, Christoph Hellwig wrote:
> Note that the SRP driver already in tree is a good example for this,
> although it doesn't use RDMA/CM and thus already operates on a
> per-ib_device level.
The challenges with regard to adding RDMA/CM support to the SRP
initiator and target drivers are:
- IANA has not yet assigned a port number to the SRP protocol (see e.g.
  http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml).
- The login request (struct srp_login_req) is too large for the RDMA/CM.
  A format for the login parameters for the RDMA/CM has not yet been
  standardized.
Bart.
^ permalink raw reply [flat|nested] 15+ messages in thread
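The size problem in the second bullet above can be made concrete with
a short sketch. Two assumptions are baked in and should be checked
against the actual sources: the struct below only mirrors what I
understand the layout of struct srp_login_req to be (it is a
hypothetical copy, not the kernel definition), and the 56-byte figure
is the private data limit I believe rdma_connect documents for
RDMA_PS_TCP over InfiniBand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed RDMA/CM private data limit for RDMA_PS_TCP over IB. */
#define SKETCH_RDMA_CM_IB_PRIV_DATA_MAX 56

/* Hypothetical mirror of the SRP login request layout. */
struct sketch_srp_login_req {
	uint8_t  opcode;
	uint8_t  reserved1[7];
	uint64_t tag;
	uint32_t req_it_iu_len;
	uint8_t  reserved2[4];
	uint16_t req_buf_fmt;
	uint8_t  req_flags;
	uint8_t  reserved3[5];
	uint8_t  initiator_port_id[16];
	uint8_t  target_port_id[16];
};

/* The fields add up to 64 bytes with no compiler padding. */
static_assert(sizeof(struct sketch_srp_login_req) == 64,
	      "unexpected padding in login request mirror");

/* True if a payload of payload_len bytes fits within the given limit. */
bool sketch_fits_in_private_data(size_t payload_len, size_t limit)
{
	return payload_len <= limit;
}
```

Under these assumptions the 64-byte request exceeds the 56-byte limit
by 8 bytes, which is why a separate login format for the RDMA/CM would
be needed.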
[parent not found: <55EC5879.202-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>]
* Re: RDMA/CM and multiple QPs
[not found] ` <55EC5879.202-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2015-09-08 13:57 ` Tom Talpey
[not found] ` <55EEE936.3060702-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Tom Talpey @ 2015-09-08 13:57 UTC (permalink / raw)
To: Bart Van Assche, Christoph Hellwig, Parav Pandit
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On 9/6/2015 11:15 AM, Bart Van Assche wrote:
> On 09/06/15 00:50, Christoph Hellwig wrote:
>> Note that the SRP driver already in tree is a good example for this,
>> although it doesn't use RDMA/CM and thus already operates on a
>> per-ib_device level.
>
> The challenges with regard to adding RDMA/CM support to the SRP
> initiator and target drivers are:
> - IANA has not yet assigned a port number to the SRP protocol (see e.g.
>   http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml).
IANA doesn't do this automatically. Has anyone made the request?
You might want to think through why it needs a dedicated port number
though. iSER reuses the iSCSI port, by negotiating RDMA during login.
> - The login request (struct srp_login_req) is too large for the RDMA/CM.
>   A format for the login parameters for the RDMA/CM has not yet been
>   standardized.
Are you suggesting that RDMA/CM perform the login? That seems
like a layering issue.
Tom.
>
> Bart.
^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <55EEE936.3060702-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>]
* Re: RDMA/CM and multiple QPs
[not found] ` <55EEE936.3060702-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
@ 2015-09-08 15:07 ` Bart Van Assche
0 siblings, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2015-09-08 15:07 UTC (permalink / raw)
To: Tom Talpey, Christoph Hellwig, Parav Pandit
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On 09/08/2015 06:57 AM, Tom Talpey wrote:
> On 9/6/2015 11:15 AM, Bart Van Assche wrote:
>> On 09/06/15 00:50, Christoph Hellwig wrote:
>>> Note that the SRP driver already in tree is a good example for this,
>>> although it doesn't use RDMA/CM and thus already operates on a
>>> per-ib_device level.
>>
>> The challenges with regard to adding RDMA/CM support to the SRP
>> initiator and target drivers are:
>> - IANA has not yet assigned a port number to the SRP protocol (see e.g.
>>   http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml).
>
> IANA doesn't do this automatically. Has anyone made the request?
>
> You might want to think through why it needs a dedicated port number
> though. iSER reuses the iSCSI port, by negotiating RDMA during login.
iSER is an iSCSI transport and that is why iSER reuses the iSCSI port
number. SRP is a SCSI transport protocol by itself and that is why a
new port number is needed for the SRP protocol.
>> - The login request (struct srp_login_req) is too large for the RDMA/CM.
>>   A format for the login parameters for the RDMA/CM has not yet been
>>   standardized.
>
> Are you suggesting that RDMA/CM perform the login? That seems
> like a layering issue.
Sorry but I don't see why this would be a layering issue.
Bart.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: RDMA/CM and multiple QPs
[not found] ` <20150906064550.GA30683-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-09-06 7:42 ` Parav Pandit
@ 2015-09-08 12:32 ` Sagi Grimberg
[not found] ` <55EED54B.7090608-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-09-10 16:30 ` Hefty, Sean
2 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2015-09-08 12:32 UTC (permalink / raw)
To: Christoph Hellwig, linux-rdma-u79uwXL29TY76Z2rM5mHXA
On 9/6/2015 9:45 AM, Christoph Hellwig wrote:
> Hi All,
>
> right now RDMA/CM works on a QP basis, but seems very awkward if you
> want multiple QPs as part of a single logical device, which will be
> useful for a lot of modern protocols. For example we will need to check
> in the CM handler that we're not getting a different ib_device if we
> want to apply the device limit in any sort of global scope, and it's
> generally very hard to get a struct ib_device that can be used as
> a driver model parent.
>
> Is there any interest in trying to add an API to the CM to do a single
> address resolution and allocate multiple QPs with these checks in
> place?
Hi Christoph,
The CM is responsible for establishing an RDMA channel. What you are
referring to is the concept of a session. I'm not entirely sure how we
can fit a model where the CM establishes a multi-channel session, as the
CM request contains a (single) source QPN. So there is a 1-1
relationship between a cm_id and a queue-pair. The device handle depends
on the address resolution to the end-node.
I assume we can think of some form of an rdma_session which will manage
multiple cm_ids (that belong to a single address resolution), call
the ULP to allocate their corresponding queue-pairs and send a connect
request for each one. Such an rdma_session can verify the same ib_device
handle on all the cm_ids. But I'm not sure how such a concept would
impact aspects such as event handling etc...
Sagi.
^ permalink raw reply [flat|nested] 15+ messages in thread
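The rdma_session idea floated above, where a session tracks several
cm_ids and verifies that all of them resolved to the same ib_device,
might look roughly like the sketch below. Everything here is
hypothetical: struct sketch_device, struct sketch_cm_id and struct
sketch_session merely stand in for struct ib_device, struct rdma_cm_id
and the proposed rdma_session, and no such API exists in the RDMA/CM
as discussed in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ib_device. */
struct sketch_device {
	const char *name;
};

/* Hypothetical stand-in for a cm_id; device is set once the address
 * has been resolved, and is NULL before that. */
struct sketch_cm_id {
	const struct sketch_device *device;
};

/* Hypothetical session that groups several channels on one device. */
struct sketch_session {
	const struct sketch_device *device;	/* pinned by first channel */
	size_t nr_channels;
};

/*
 * Add a resolved cm_id to the session. The first channel pins the
 * device; later channels must have resolved to the same one.
 * Returns 0 on success or a negative errno value.
 */
int sketch_session_add_channel(struct sketch_session *s,
			       const struct sketch_cm_id *id)
{
	assert(s != NULL && id != NULL);

	if (id->device == NULL)
		return -EINVAL;		/* address not resolved yet */
	if (s->device == NULL)
		s->device = id->device;
	else if (s->device != id->device)
		return -EXDEV;		/* resolved to another device */
	s->nr_channels++;
	return 0;
}
```

A ULP could call this from its address-resolution handler and tear
down the whole session on -EXDEV; event handling, the open question in
the message, is not modelled.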
[parent not found: <55EED54B.7090608-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>]
* Re: RDMA/CM and multiple QPs
[not found] ` <55EED54B.7090608-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-09-08 13:14 ` Christoph Hellwig
[not found] ` <20150908131407.GB5316-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2015-09-08 13:14 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On Tue, Sep 08, 2015 at 03:32:11PM +0300, Sagi Grimberg wrote:
> The CM is responsible of establishing an RDMA channel. What you are
> referring to is a concept of a session. I'm not entirely sure how we can
> fit a model where the CM establishes a multi-channel session as the
> CM request contains a (single) source QPN. So there is a 1-1
> relationship between a cm_id and a queue-pair. The device handle depends
> on the address resolution to the end-node.
>
> I assume we can think of some form of an rdma_session which will manage
> multiple cm_id's (that belongs to a single address resolution), call
> the ULP to allocate their corresponding queue-pairs and send a connect
> request for each one. Such an rdma_session can verify the same ib_device
> handle on all the cm_id's. But I'm not sure how such a concept would
> impact on aspects such as event handling etc...
What I'm more interested in is a way to tell the CM that I only
want routes that are using this ib_device that I got from the first
lookup as all others are useless for me.
^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <20150908131407.GB5316-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>]
* Re: RDMA/CM and multiple QPs
[not found] ` <20150908131407.GB5316-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-09-10 9:50 ` Sagi Grimberg
[not found] ` <55F15269.2060200-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2015-09-10 9:50 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
> What I'm more interested in is a way to tell the CM that I only
> want routes that are using this ib_device that I got from the first
> lookup as all others are useless for me.
>
I'm not sure I understand what you are aiming for. If you connect to
a single address multiple times you will get the same device, because
it is the same route, right?
^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <55F15269.2060200-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>]
* Re: RDMA/CM and multiple QPs
[not found] ` <55F15269.2060200-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-09-10 13:29 ` Christoph Hellwig
[not found] ` <20150910132927.GA6440-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2015-09-10 13:29 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On Thu, Sep 10, 2015 at 12:50:33PM +0300, Sagi Grimberg wrote:
> >What I'm more interested in is a way to tell the CM that I only
> >want routes that are using this ib_device that I got from the first
> >lookup as all others are useless for me.
> >
>
> I'm not sure I understand what you are aiming for? if you connect to
> a single address multiple times you will get the same device because
> it is the same route right?
In testing I do get the same device all the time, but I don't see
anything that guarantees that in code or documentation.
Think about the case where the routing changes between the calls,
or we're using multipath TCP for example.
^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <20150910132927.GA6440-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>]
* Re: RDMA/CM and multiple QPs
[not found] ` <20150910132927.GA6440-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-09-10 13:52 ` Sagi Grimberg
0 siblings, 0 replies; 15+ messages in thread
From: Sagi Grimberg @ 2015-09-10 13:52 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On 9/10/2015 4:29 PM, Christoph Hellwig wrote:
> On Thu, Sep 10, 2015 at 12:50:33PM +0300, Sagi Grimberg wrote:
>>> What I'm more interested in is a way to tell the CM that I only
>>> want routes that are using this ib_device that I got from the first
>>> lookup as all others are useless for me.
>>>
>>
>> I'm not sure I understand what you are aiming for? if you connect to
>> a single address multiple times you will get the same device because
>> it is the same route right?
>
> In testing I do get the same all the time, but I don't see anything that
> guarantees that in code or documentation.
I think it depends on the routing table.
> Think about the case where the routing changes between the calls,
> or we're using multipath TCP for example.
That can indeed happen. In fact, if a bond changes its primary iface
you can see different devices. But I don't think you should support
that anyway. Just fail the session if you see different devices. I
don't think that forcing the CM to a single device would help you, as
the connections will probably fail anyway.
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: RDMA/CM and multiple QPs
[not found] ` <20150906064550.GA30683-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-09-06 7:42 ` Parav Pandit
2015-09-08 12:32 ` Sagi Grimberg
@ 2015-09-10 16:30 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A82373A903A082-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2 siblings, 1 reply; 15+ messages in thread
From: Hefty, Sean @ 2015-09-10 16:30 UTC (permalink / raw)
To: Christoph Hellwig, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> right now RDMA/CM works on a QP basis, but seems very awkward if you
> want multiple QPs as part of a single logical device, which will be
> useful for a lot of modern protocols. For example we will need to check
> in the CM handler that we're not getting a different ib_device if we
> want to apply the device limit in any sort of global scope, and it's
> generally very hard to get a struct ib_device that can be used as
> a driver model parent.
>
> Is there any interest in trying to add an API to the CM to do a single
> address resolution and allocate multiple QPs with these checks in
> place?
IMO, you want a completely different level of abstraction. One not
based on a specific hardware implementation.
^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <1828884A29C6694DAF28B7E6B8A82373A903A082-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* Re: RDMA/CM and multiple QPs
[not found] ` <1828884A29C6694DAF28B7E6B8A82373A903A082-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-09-10 17:55 ` Parav Pandit
0 siblings, 0 replies; 15+ messages in thread
From: Parav Pandit @ 2015-09-10 17:55 UTC (permalink / raw)
To: Hefty, Sean
Cc: Christoph Hellwig, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Sorry if this seems like I am imposing, but there was not much input in
this email chain on the thoughts below about abstraction, so I am
iterating again to see if there is a different view now.
I understand that Christoph's requirement is relatively lean, where
blk-mq's queues can be bound to a CPU and/or to an RDMA QP.
That session layer is probably the right place to attach the
connection(s) to a session.
Establishing multiple QPs is just one part of it.
The bigger challenge is how to distribute work requests among multiple
QPs, especially since the Verbs layer is agnostic to STAG advertisement
and invalidation (these are not part of the IB spec, and every ULP has
its own method, possibly for good reason).
A few months back, when I was working on this problem, the solution we
considered was similar to what the networking stack currently does:
1. Instead of only the raw ib_send, write, read and invalidate verbs, we
need higher-level verbs for data transport, such as send_data,
receive_data, advertise data_buffers etc., of course keeping zero-copy
semantics in mind.
2. Perform device aggregation similar to Ethernet netdev link
aggregation, so that two ib_devices form a pair on which one or more QPs
are created. This virtual device provides higher-level data transfer
APIs than raw IB semantics. This layer then decides how to advertise
memory, when to invalidate, and which QP to use for transport (load
balancing or failover).
3. I have not thought through how existing ULPs whose specifications are
IB driven could be ported to this newly defined interface.
4. Accelio is one framework that comes close to this design philosophy;
however, its current implementation brings resource overhead for MRs,
and as we go along there is scope to optimize it.
5. Since this layer sits above the raw IB verbs layer and above RDMA-CM,
the core is untouched by this functionality. Once we have it, many of
the migration-related issues can be solved, where a node can disconnect
and reconnect in a stateful way.
6. This way the pure hardware resource is detached from transport
acceleration, which gives flexibility to implement services that are
often difficult to do at the raw IB verbs level.
On Thu, Sep 10, 2015 at 10:00 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
>> right now RDMA/CM works on a QP basis, but seems very awkward if you
>> want multiple QPs as part of a single logical device, which will be
>> useful for a lot of modern protocols. For example we will need to check
>> in the CM handler that we're not getting a different ib_device if we
>> want to apply the device limit in any sort of global scope, and it's
>> generally very hard to get a struct ib_device that can be used as
>> a driver model parent.
>>
>> Is there any interest in trying to add an API to the CM to do a single
>> address resolution and allocate multiple QPs with these checks in
>> place?
>
> IMO, you want a completely different level of abstraction. One not based on a specific hardware implementation.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2015-09-10 17:55 UTC | newest]
Thread overview: 15+ messages
2015-09-06 6:45 RDMA/CM and multiple QPs Christoph Hellwig
[not found] ` <20150906064550.GA30683-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-09-06 7:42 ` Parav Pandit
[not found] ` <CAG53R5VZDZKiuR-jLybS1PhrT9K4GG6xTr8bOG-L0VaQgqEXSA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-06 7:50 ` Christoph Hellwig
[not found] ` <20150906075024.GA7845-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-09-06 7:54 ` Parav Pandit
[not found] ` <CAG53R5UsH3aEmRf2EgNYydJ=cMZCFG19ZQjHcLn=NjQxsnwf-g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-07 5:08 ` Christoph Hellwig
2015-09-06 15:15 ` Bart Van Assche
[not found] ` <55EC5879.202-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2015-09-08 13:57 ` Tom Talpey
[not found] ` <55EEE936.3060702-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
2015-09-08 15:07 ` Bart Van Assche
2015-09-08 12:32 ` Sagi Grimberg
[not found] ` <55EED54B.7090608-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-09-08 13:14 ` Christoph Hellwig
[not found] ` <20150908131407.GB5316-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-09-10 9:50 ` Sagi Grimberg
[not found] ` <55F15269.2060200-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-09-10 13:29 ` Christoph Hellwig
[not found] ` <20150910132927.GA6440-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-09-10 13:52 ` Sagi Grimberg
2015-09-10 16:30 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A82373A903A082-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-09-10 17:55 ` Parav Pandit