* issues with the rdma-cm server side mapping of IP to GID
@ 2014-02-25 8:18 Or Gerlitz
[not found] ` <530C51EF.2000509-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Or Gerlitz @ 2014-02-25 8:18 UTC (permalink / raw)
To: Hefty, Sean
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
Yan Burman
Hi Sean,
We came across a pretty deadly situation with an rdma-cm based
client/server application: the client set up its RC QP to send to
HCA X on the server node, but the server app opened its QP on HCA Y.
The result was un-acked RC packets and RC session failure.
This happened because the destination IP to destination GID mapping
as seen by the client differed from the one present in the server IP
stack at the time the connection request arrived. The server side
rdma-cm IP --> GID mapping is established by the cma_translate_addr()
call in cma_new_conn_id(), which operates on the destination IP taken
from the RDMA-CM header in the CM REQ.
Such a situation can happen in the following cases:
1. net.ipv4.conf.default.arp_ignore equals 0 (the default)
2. server side bonding/teaming fail-over when the Gratuitous ARP that
was sent got lost
3. re-ordering of the ibM net-devices mapping to HCA PCI devices after
server boot/crash
4. possibly more
Basically, when the rdma-cm observes a difference between the destination
GID present in the IB path within the CM REQ and the one resolved locally,
we should at least print a warning. Perhaps we should reject the
connection request? (In that case, I wasn't sure what the appropriate
reject reason would be.) Any more ideas?
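As a rough illustration of the check proposed above, here is a userspace sketch, not kernel code. The struct gid stand-in, the enum, and check_req_gid() are hypothetical names made up for this example; the real code would work on the GID carried in the CM REQ path and the one produced by local address translation.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a 128-bit GID; an assumption for illustration only. */
struct gid {
    uint8_t raw[16];
};

/* Possible outcomes of the consistency check (hypothetical names). */
enum mismatch_action {
    ACTION_ACCEPT,  /* GIDs agree, proceed as usual */
    ACTION_WARN,    /* GIDs differ, log it but keep going */
    ACTION_REJECT   /* GIDs differ, refuse the connection request */
};

/*
 * Compare the destination GID taken from the CM REQ path with the GID
 * that local IP --> GID resolution produced for the same destination IP.
 */
static enum mismatch_action check_req_gid(const struct gid *req_dgid,
                                          const struct gid *local_gid,
                                          int reject_on_mismatch)
{
    if (memcmp(req_dgid->raw, local_gid->raw, sizeof(req_dgid->raw)) == 0)
        return ACTION_ACCEPT;

    fprintf(stderr,
            "sketch: DGID in CM REQ differs from locally resolved GID\n");
    return reject_on_mismatch ? ACTION_REJECT : ACTION_WARN;
}
```

Whether a mismatch should only warn or also reject is exactly the open question here, so the sketch leaves it as a flag.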
Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* RE: issues with the rdma-cm server side mapping of IP to GID
[not found] ` <530C51EF.2000509-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2014-03-01 23:50 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237388D1F0BD-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Hefty, Sean @ 2014-03-01 23:50 UTC (permalink / raw)
To: Or Gerlitz
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
Yan Burman
> Such situation can happen in the following cases:
>
> 1. net.ipv4.conf.default.arp_ignore equals 0 (the default)
> 2. server side bonding/teaming fail-over when the Gratuitous ARP sent was
> lost
> 3. re-order of ibM net-devices mapping to HCA PCI devices after server
> boot/crash
> 4. etc more
>
> Basically, when the rdma-cm observes difference between the destination
> GID as present in the IB path within
> the CM REQ to the one resolved locally, we should at least print a
> warning. Perhaps, we should reject the connection request? (in that
> case, I wasn't sure what would be the appropriate reject reason), any
> more ideas?
I'm not sure that this results in a single error case.
Can the kernel rdma_cm check net.ipv4.conf.default.arp_ignore on startup and at least print a warning if it is set wrong?
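A minimal userspace sketch of such a startup check, assuming the value is read from the procfs file /proc/sys/net/ipv4/conf/default/arp_ignore. The helper names read_sysctl_int() and warn_if_arp_ignore_unset() are hypothetical, and a kernel-side check would look different; this only shows the shape of the test.

```c
#include <stdio.h>

/*
 * Read a single integer from a procfs-style sysctl file.
 * Returns 0 and stores the value in *out on success, -1 on failure.
 */
static int read_sysctl_int(const char *path, int *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%d", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Warn when arp_ignore is 0, the value named as problematic in this thread.
 * Returns 1 if a warning was printed, 0 if the value is non-zero,
 * and -1 if the file could not be read.
 */
static int warn_if_arp_ignore_unset(const char *path)
{
    int val;

    if (read_sysctl_int(path, &val) != 0)
        return -1;
    if (val == 0) {
        fprintf(stderr,
                "warning: %s is 0; ARP replies may come from another interface\n",
                path);
        return 1;
    }
    return 0;
}
```

A caller would pass "/proc/sys/net/ipv4/conf/default/arp_ignore"; per-interface settings live in sibling directories and would need the same treatment.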
* Re: issues with the rdma-cm server side mapping of IP to GID
[not found] ` <1828884A29C6694DAF28B7E6B8A8237388D1F0BD-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2014-03-02 8:51 ` Or Gerlitz
[not found] ` <5312F107.2000404-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Or Gerlitz @ 2014-03-02 8:51 UTC (permalink / raw)
To: Hefty, Sean
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
Yan Burman
On 02/03/2014 01:50, Hefty, Sean wrote:
>> Such situation can happen in the following cases:
>>
>> 1. net.ipv4.conf.default.arp_ignore equals 0 (the default)
>> 2. server side bonding/teaming fail-over when the Gratuitous ARP sent was
>> lost
>> 3. re-order of ibM net-devices mapping to HCA PCI devices after server
>> boot/crash
>> 4. etc more
>>
>> Basically, when the rdma-cm observes difference between the destination GID as present in the IB path within the CM REQ to the one resolved locally, we should at least print a warning. Perhaps, we should reject the connection request? (in that case, I wasn't sure what would be the appropriate reject reason), any more ideas?
> I'm not sure that this results in a single error case.
Sorry, I'm not sure I follow. Can you elaborate a bit more?
> Can the kernel rdma_cm check for net.ipv4.default.arp_ignore on startup and at least print a warning if that is wrong?
I am not sure. In any case, please note that I brought up at least two
more use cases where the problem happens:
- following a server side bonding fail-over
- following a server side reboot after which the PCI ordering changes
between two HCAs and hence the ibM devices change their PCI association
Or.
* RE: issues with the rdma-cm server side mapping of IP to GID
[not found] ` <5312F107.2000404-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2014-03-03 14:46 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237388D1F39F-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Hefty, Sean @ 2014-03-03 14:46 UTC (permalink / raw)
To: Or Gerlitz
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
Yan Burman
> >> 1. net.ipv4.conf.default.arp_ignore equals 0 (the default)
> >> 2. server side bonding/teaming fail-over when the Gratuitous ARP sent was
> >> lost
> >> 3. re-order of ibM net-devices mapping to HCA PCI devices after server
> >> boot/crash
> >> 4. etc more
> >>
> >> Basically, when the rdma-cm observes difference between the destination
> >> GID as present in the IB path within the CM REQ to the one resolved
> >> locally, we should at least print a warning. Perhaps, we should reject the
> >> connection request? (in that case, I wasn't sure what would be the
> >> appropriate reject reason), any more ideas?
> > I'm not sure that this results in a single error case.
>
> Sorry... I'm not sure to follow, can you elaborate a bit more?
We don't know what type of device responds to the ARP query. It could come from an ethernet device.
* Re: issues with the rdma-cm server side mapping of IP to GID
[not found] ` <1828884A29C6694DAF28B7E6B8A8237388D1F39F-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2014-03-03 16:47 ` Or Gerlitz
[not found] ` <CAJZOPZ+ZC62FeZCy17ZMkzkxqrTdTrNRDs+nWQQ4Xjb9Sx5T3A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Or Gerlitz @ 2014-03-03 16:47 UTC (permalink / raw)
To: Hefty, Sean
Cc: Or Gerlitz,
linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
Yan Burman
On Mon, Mar 3, 2014 at 4:46 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
>>> I'm not sure that this results in a single error case.
>> Sorry... I'm not sure to follow, can you elaborate a bit more?
> We don't know what type of device responds to the ARP query. It could come from an ethernet device.
Yep, this issue is really nasty... but wait, you mentioned Ethernet.
Well, if the fabric is IB, we do know that the GID in the REQ belongs
to an HCA of that server node, b/c the client did route (== path
query) resolution based on this DGID and their CM REQ landed in our
hands, right?
We could say that we adhere to the IP --> GID mapping as
present in the CM REQ (GID in the path, IP in the CMA header) and
associate the newly created CMA ID with the device/port where this GID
resides, no matter what the local IP stack has to say. This would
work, but for people who seek HA for their apps, multiple sessions
could end up on the same server HCA/port when they wanted them to
be spread out. What we can do when the rdma-cm observes such an
inconsistency is the following:
1. print a warning to the system log
2. reject the connection request
3. send a Gratuitous ARP to update the client nodes' IPoIB neighbour
IP --> GID mapping
I suggest that we first debate/agree on something that makes sense for
IB and later see how it would work for RoCE.
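To make the "associate the CMA ID with the device/port where this GID resides" idea concrete, here is a small userspace sketch. The port_entry table and find_port_by_gid() are hypothetical stand-ins made up for this example, not existing rdma-cm structures or functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a 128-bit GID; an assumption for illustration only. */
struct gid {
    uint8_t raw[16];
};

/* One local (device, port, GID) tuple; a hypothetical table entry. */
struct port_entry {
    const char *dev_name;
    int port_num;
    struct gid gid;
};

/*
 * Find the local device/port that owns the DGID carried in the CM REQ,
 * ignoring whatever the local IP stack resolved for the destination IP.
 * Returns NULL when no local port owns that GID.
 */
static const struct port_entry *find_port_by_gid(const struct port_entry *tbl,
                                                 size_t n,
                                                 const struct gid *dgid)
{
    for (size_t i = 0; i < n; i++) {
        if (memcmp(tbl[i].gid.raw, dgid->raw, sizeof(dgid->raw)) == 0)
            return &tbl[i];
    }
    return NULL;
}
```

A NULL result would be the point at which one of the three actions listed above (warn, reject, Gratuitous ARP) could be taken.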
Or.
* RE: issues with the rdma-cm server side mapping of IP to GID
[not found] ` <CAJZOPZ+ZC62FeZCy17ZMkzkxqrTdTrNRDs+nWQQ4Xjb9Sx5T3A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-03-03 17:15 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237388D1F440-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Hefty, Sean @ 2014-03-03 17:15 UTC (permalink / raw)
To: Or Gerlitz
Cc: Or Gerlitz,
linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
Yan Burman
> >>> I'm not sure that this results in a single error case.
>
> >> Sorry... I'm not sure to follow, can you elaborate a bit more?
>
> > We don't know what type of device responds to the ARP query. It could
> come from an ethernet device.
>
> Yep, this issue is really nasty..., but wait, you mentioned Ethernet,
> well, if the fabric is IB we do know that the GID in the REQ belongs
> to an HCA of that server node, b/c the client did route (== path
> query) resolution based on this DGID and their CM REQ landed in our
> hands, right?
>
> We could come and say that we adhere to the IP --> GID mapping as
> present in the CM REQ (GID in the path, IP in the CMA header) and
> associate the newly created CMA ID with the device/port where this GID
> resides, no matter what the local IP stack has to say. This would
> work, but for people that seek HA for their apps, multiple sessions
> can be created over the same server hca/port where they wanted them to
> be spreaded... what we can when such inconsistency is observed by the
> rdma-cm is the following
>
> 1. print warning to the system log
> 2. reject the connection request
> 3. send Gratuitous ARP to update client nodes IPoIB neighbour IP --> GID
> mapping
>
> I suggest that we 1st debate/agree on something that makes sense with
> IB and later see how it would work for RoCE
I wasn't referring to RoCE. I was simply saying that this problem may result in multiple errors. It's possible that we may never reach the point of sending a CM REQ. It seems best to try to detect this problem as early as possible.
If a CM REQ actually gets to the remote side, it could be rejected as an invalid GID, and the client could retry the request.
* Re: issues with the rdma-cm server side mapping of IP to GID
[not found] ` <1828884A29C6694DAF28B7E6B8A8237388D1F440-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2014-03-04 21:12 ` Or Gerlitz
[not found] ` <CAJZOPZKT0oGfx99PhX9OP5_qaa2QyRVEJqE+hSkM7tykN23GOw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Or Gerlitz @ 2014-03-04 21:12 UTC (permalink / raw)
To: Hefty, Sean
Cc: Or Gerlitz,
linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
Yan Burman
On Mon, Mar 3, 2014 at 7:15 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
[...]
> If a CM REQ actually gets to the remote side, it could be rejected as an invalid GID, and the client could retry the request.
Retrying the request in practice means calling
1. rdma_resolve_addr
2. rdma_resolve_route
3. rdma_connect
where step #1 would get the wrong GID from the client ARP cache, and hence
step #3 would result in a reject, and so on...
I am starting to think that rdma address resolution must not make
use of the ARP cache, or should at least provide applications with
an API to dictate that an ARP request must be sent. How does this sound?
* RE: issues with the rdma-cm server side mapping of IP to GID
[not found] ` <CAJZOPZKT0oGfx99PhX9OP5_qaa2QyRVEJqE+hSkM7tykN23GOw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-03-04 21:31 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237388D200B2-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Hefty, Sean @ 2014-03-04 21:31 UTC (permalink / raw)
To: Or Gerlitz
Cc: Or Gerlitz,
linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
Yan Burman
> > If a CM REQ actually gets to the remote side, it could be rejected as an
> invalid GID, and the client could retry the request.
>
> Retrying the request is practically calling
>
> 1. rdma_resolve_addr
> 2. rdma_resolve_route
> 3. rdma_connect
Yep - the entire setup is broken. If the wrong remote GID was resolved, then the wrong local GID _may_ have been selected. There's no easy guaranteed recovery here.
> where step #1 would get wrong GID from the client arp cache and hence
> step #3 would result in a reject and so on...
> I am starting to think that the rdma address resolution must not make
> use of the arp cache, or at least provide applications
> the API to dictate that ARP request must be sent, how this sounds?
Clients should not be made aware of how the resolution was done. The RDMA CM needs to abstract that problem. Alternate mechanisms may be usable, but there aren't exactly a whole lot of options available. The client could use native IB addressing at this point if it does not want to rely on address translation.
- Sean
* RE: issues with the rdma-cm server side mapping of IP to GID
[not found] ` <1828884A29C6694DAF28B7E6B8A8237388D200B2-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2014-03-04 21:39 ` Hefty, Sean
2014-03-05 20:04 ` Or Gerlitz
1 sibling, 0 replies; 10+ messages in thread
From: Hefty, Sean @ 2014-03-04 21:39 UTC (permalink / raw)
To: Hefty, Sean, Or Gerlitz
Cc: Or Gerlitz,
linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
Yan Burman
> > > If a CM REQ actually gets to the remote side, it could be rejected as
> an
> > invalid GID, and the client could retry the request.
> >
> > Retrying the request is practically calling
> >
> > 1. rdma_resolve_addr
> > 2. rdma_resolve_route
> > 3. rdma_connect
>
> Yep - the entire setup is broken. If the wrong remote GID was resolved,
> then the wrong local GID _may_ have been selected. There's no easy
> guaranteed recovery here.
On second thought, I don't think this is true. The SGID is selected based on the IP address, not the DGID. If the remote CM rejects the connection, the RDMA CM may be able to recover by restarting at step 2. Query for a new PR, then re-issue the connect request. It may be possible to do this without the application's involvement. I'm not sure how the librdmacm would handle this, since the initial resolve_route would be redone.
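The recovery flow described above, restarting at step 2 after a reject, could be sketched roughly as follows. This is a simulation with hypothetical callbacks (connect_ops, fake_peer and the fake_* functions are made up for this example) and does not call librdmacm; whether a repeated path query actually yields a usable answer is outside what the sketch shows.

```c
#include <stdio.h>

/* Hypothetical callbacks standing in for steps 2 and 3. */
struct connect_ops {
    int (*resolve_route)(void *ctx); /* 0 on success */
    int (*connect)(void *ctx);       /* 0 accepted, >0 rejected, <0 error */
};

/*
 * On a reject, redo route resolution and re-issue the connect, without
 * repeating address resolution. Returns the number of attempts used on
 * success, or -1 on error or when max_attempts is exhausted.
 */
static int connect_with_route_retry(const struct connect_ops *ops,
                                    void *ctx, int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (ops->resolve_route(ctx) != 0)
            return -1;

        int rc = ops->connect(ctx);
        if (rc == 0)
            return attempt;
        if (rc < 0)
            return -1;
        /* Rejected: loop back to step 2 and query for a new path. */
    }
    return -1;
}

/* Fake peer that rejects a fixed number of connects, for demonstration. */
struct fake_peer {
    int rejects_left;
    int route_queries;
};

static int fake_resolve_route(void *ctx)
{
    struct fake_peer *p = ctx;

    p->route_queries++;
    return 0;
}

static int fake_connect(void *ctx)
{
    struct fake_peer *p = ctx;

    if (p->rejects_left > 0) {
        p->rejects_left--;
        return 1; /* rejected */
    }
    return 0; /* accepted */
}
```

The bound on attempts is a design choice of the sketch: without it, a peer that keeps rejecting would keep the loop going indefinitely.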
* Re: issues with the rdma-cm server side mapping of IP to GID
[not found] ` <1828884A29C6694DAF28B7E6B8A8237388D200B2-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2014-03-04 21:39 ` Hefty, Sean
@ 2014-03-05 20:04 ` Or Gerlitz
1 sibling, 0 replies; 10+ messages in thread
From: Or Gerlitz @ 2014-03-05 20:04 UTC (permalink / raw)
To: Hefty, Sean
Cc: Or Gerlitz,
linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
Yan Burman
On Tue, Mar 4, 2014 at 11:31 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> Clients should not be made aware of how the resolution was done. The RDMA CM needs to abstract that problem. Alternate mechanisms may be usable, but there aren't exactly a whole lot of options available. The client could use native IB addressing at this point if it does not want to rely on address translation.
No, native IB addressing isn't an option for this and many other
applications; they want to use IP addressing. We have to make sure
that our IP --> RDMA address resolution, which uses neighbour lookup, is
robust under the real-life schemes I brought up, which leave the RC
connection / UD session established based on this resolution
totally broken.
So given that we have a solution for large scale through
native IB addressing that offloads the address and route resolution,
what's your thinking on my suggestion to make address resolution for
rdma-cm endpoints created with a non-native IB port space use
solicited neighbour lookup (i.e. avoid ARP cache usage)?
Or.