* issues with the rdma-cm server side mapping of IP to GID
@ 2014-02-25 8:18 Or Gerlitz
From: Or Gerlitz @ 2014-02-25 8:18 UTC
To: Hefty, Sean
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
Yan Burman
Hi Sean,

We came across a pretty deadly situation with an rdma-cm based
client/server application: the client set up its RC QP to send to
HCA X on the server node, but the server app opened its QP on HCA Y.
The result was un-acked RC packets and RC session failure.

This happened because the destination IP to destination GID mapping
as seen by the client differed from what was present in the server IP
stack at the time the connection request arrived. The server side
rdma-cm IP --> GID mapping is established by the cma_translate_addr()
call in cma_new_conn_id(), which is done on the destination IP taken
from the RDMA-CM header in the CM REQ.

Such a situation can happen in the following cases:

1. net.ipv4.conf.default.arp_ignore equals 0 (the default)
2. server side bonding/teaming fail-over when the Gratuitous ARP sent
   was lost
3. re-ordering of the ibM net-devices mapping to HCA PCI devices after
   server boot/crash
4. possibly more

Basically, when the rdma-cm observes a difference between the
destination GID present in the IB path within the CM REQ and the one
resolved locally, we should at least print a warning. Perhaps we
should reject the connection request? (In that case, I wasn't sure
what the appropriate reject reason would be.) Any more ideas?

Or.
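
The comparison proposed in this message can be sketched in user space as
follows. This is only an illustration in Python, not kernel code: the names
check_req_dgid, parse_gid and Verdict are hypothetical, and the sketch assumes
both GIDs are available in the usual colon-separated hex form. Whether a
mismatch should warn or reject is left as a switch, since the thread had not
settled that question.

```python
import ipaddress
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"
    WARN = "warn"
    REJECT = "reject"


def parse_gid(text: str) -> bytes:
    """Parse a GID written as colon-separated 16-bit hex groups."""
    # A GID is 128 bits wide and shares its text form with IPv6 addresses,
    # so the standard library parser can be reused here.
    return ipaddress.IPv6Address(text).packed


def check_req_dgid(req_dgid: str, local_gid: str, strict: bool = False) -> Verdict:
    """Compare the DGID carried in the CM REQ path with the locally resolved GID.

    strict=False only flags a mismatch; strict=True rejects the request.
    """
    if parse_gid(req_dgid) == parse_gid(local_gid):
        return Verdict.ACCEPT
    return Verdict.REJECT if strict else Verdict.WARN


if __name__ == "__main__":
    # Made-up GIDs standing in for ports on HCA X and HCA Y.
    hca_x = "fe80:0000:0000:0000:0002:c903:0000:0001"
    hca_y = "fe80::2:c903:0:2"
    print(check_req_dgid(hca_x, hca_x))
    print(check_req_dgid(hca_x, hca_y))
    print(check_req_dgid(hca_x, hca_y, strict=True))
```

Comparing the packed 16-byte values rather than the strings avoids false
mismatches between compressed and fully expanded spellings of the same GID.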
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* RE: issues with the rdma-cm server side mapping of IP to GID
From: Hefty, Sean @ 2014-03-01 23:50 UTC
To: Or Gerlitz
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org), Yan Burman

> Such situation can happen in the following cases:
>
> 1. net.ipv4.conf.default.arp_ignore equals 0 (the default)
> 2. server side bonding/teaming fail-over when the Gratuitous ARP sent was
> lost
> 3. re-order of ibM net-devices mapping to HCA PCI devices after server
> boot/crash
> 4. etc more
>
> Basically, when the rdma-cm observes difference between the destination
> GID as present in the IB path within
> the CM REQ to the one resolved locally, we should at least print a
> warning. Perhaps, we should reject the connection request? (in that
> case, I wasn't sure what would be the appropriate reject reason), any
> more ideas?

I'm not sure that this results in a single error case.

Can the kernel rdma_cm check for net.ipv4.conf.default.arp_ignore on
startup and at least print a warning if that is wrong?
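
A user-space approximation of such a startup check might look like the sketch
below. It reads the per-interface arp_ignore files under
/proc/sys/net/ipv4/conf and lists interfaces whose effective value is 0. The
function names are hypothetical, and treating 0 as the "wrong" value is an
assumption taken from case 1 in the list quoted above. The kernel's ip-sysctl
documentation describes the effective value as the maximum of conf/all and
the per-interface setting, and the sketch follows that rule.

```python
from pathlib import Path
from typing import List

# Standard procfs location of the per-interface IPv4 sysctls on Linux.
CONF_DIR = Path("/proc/sys/net/ipv4/conf")


def effective_arp_ignore(all_value: int, iface_value: int) -> int:
    """Effective setting for one interface: max of conf/all and conf/<iface>."""
    return max(all_value, iface_value)


def read_knob(path: Path) -> int:
    """Read one integer sysctl file."""
    return int(path.read_text().strip())


def risky_interfaces(conf_dir: Path = CONF_DIR) -> List[str]:
    """Return interface names whose effective arp_ignore is 0."""
    try:
        all_value = read_knob(conf_dir / "all" / "arp_ignore")
    except (OSError, ValueError):
        return []  # not Linux, or procfs unavailable: nothing to report
    risky = []
    for entry in sorted(conf_dir.iterdir()):
        if entry.name in ("all", "default"):
            continue  # pseudo-entries, not real interfaces
        try:
            iface_value = read_knob(entry / "arp_ignore")
        except (OSError, ValueError):
            continue
        if effective_arp_ignore(all_value, iface_value) == 0:
            risky.append(entry.name)
    return risky


if __name__ == "__main__":
    for name in risky_interfaces():
        print(f"warning: arp_ignore is 0 on {name}; "
              "ARP replies may advertise an address owned by another port")
```

As the rest of the thread points out, such a check would only cover the
arp_ignore case and not the bonding fail-over or PCI re-ordering cases.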

* Re: issues with the rdma-cm server side mapping of IP to GID
From: Or Gerlitz @ 2014-03-02 8:51 UTC
To: Hefty, Sean
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org), Yan Burman

On 02/03/2014 01:50, Hefty, Sean wrote:
>> Such situation can happen in the following cases:
>>
>> 1. net.ipv4.conf.default.arp_ignore equals 0 (the default)
>> 2. server side bonding/teaming fail-over when the Gratuitous ARP sent was
>> lost
>> 3. re-order of ibM net-devices mapping to HCA PCI devices after server
>> boot/crash
>> 4. etc more
>>
>> Basically, when the rdma-cm observes difference between the destination
>> GID as present in the IB path within the CM REQ to the one resolved
>> locally, we should at least print a warning. Perhaps, we should reject
>> the connection request? (in that case, I wasn't sure what would be the
>> appropriate reject reason), any more ideas?
> I'm not sure that this results in a single error case.

Sorry... I'm not sure I follow, can you elaborate a bit more?

> Can the kernel rdma_cm check for net.ipv4.conf.default.arp_ignore on
> startup and at least print a warning if that is wrong?

I am not sure, and anyway, please note that I brought up at least two more
use cases where the problem happens:

- following server side bonding fail-over
- following a server side reboot after which the PCI ordering changes
  between two HCAs and hence the ibM devices change their PCI association

Or.

* RE: issues with the rdma-cm server side mapping of IP to GID
From: Hefty, Sean @ 2014-03-03 14:46 UTC
To: Or Gerlitz
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org), Yan Burman

> >> 1. net.ipv4.conf.default.arp_ignore equals 0 (the default)
> >> 2. server side bonding/teaming fail-over when the Gratuitous ARP sent was
> >> lost
> >> 3. re-order of ibM net-devices mapping to HCA PCI devices after server
> >> boot/crash
> >> 4. etc more
> >>
> >> Basically, when the rdma-cm observes difference between the destination
> >> GID as present in the IB path within the CM REQ to the one resolved
> >> locally, we should at least print a warning. Perhaps, we should reject the
> >> connection request? (in that case, I wasn't sure what would be the
> >> appropriate reject reason), any more ideas?
> > I'm not sure that this results in a single error case.
>
> Sorry... I'm not sure to follow, can you elaborate a bit more?

We don't know what type of device responds to the ARP query. It could come
from an ethernet device.

* Re: issues with the rdma-cm server side mapping of IP to GID
From: Or Gerlitz @ 2014-03-03 16:47 UTC
To: Hefty, Sean
Cc: Or Gerlitz, linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org), Yan Burman

On Mon, Mar 3, 2014 at 4:46 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
>>> I'm not sure that this results in a single error case.
>> Sorry... I'm not sure to follow, can you elaborate a bit more?
> We don't know what type of device responds to the ARP query. It could
> come from an ethernet device.

Yep, this issue is really nasty... but wait, you mentioned Ethernet. Well,
if the fabric is IB we do know that the GID in the REQ belongs to an HCA of
that server node, b/c the client did route (== path query) resolution based
on this DGID and their CM REQ landed in our hands, right?

We could come and say that we adhere to the IP --> GID mapping as present
in the CM REQ (GID in the path, IP in the CMA header) and associate the
newly created CMA ID with the device/port where this GID resides, no matter
what the local IP stack has to say. This would work, but for people that
seek HA for their apps, multiple sessions could end up on the same server
HCA/port where they wanted them to be spread. What we can do when such an
inconsistency is observed by the rdma-cm is the following:

1. print a warning to the system log
2. reject the connection request
3. send a Gratuitous ARP to update the client nodes' IPoIB neighbour
   IP --> GID mapping

I suggest that we first debate/agree on something that makes sense with IB
and later see how it would work for RoCE.

Or.

* RE: issues with the rdma-cm server side mapping of IP to GID
From: Hefty, Sean @ 2014-03-03 17:15 UTC
To: Or Gerlitz
Cc: Or Gerlitz, linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org), Yan Burman

> >>> I'm not sure that this results in a single error case.
> >> Sorry... I'm not sure to follow, can you elaborate a bit more?
> > We don't know what type of device responds to the ARP query. It could
> > come from an ethernet device.
>
> Yep, this issue is really nasty..., but wait, you mentioned Ethernet,
> well, if the fabric is IB we do know that the GID in the REQ belongs
> to an HCA of that server node, b/c the client did route (== path
> query) resolution based on this DGID and their CM REQ landed in our
> hands, right?
>
> We could come and say that we adhere to the IP --> GID mapping as
> present in the CM REQ (GID in the path, IP in the CMA header) and
> associate the newly created CMA ID with the device/port where this GID
> resides, no matter what the local IP stack has to say. This would
> work, but for people that seek HA for their apps, multiple sessions
> can be created over the same server hca/port where they wanted them to
> be spreaded... what we can when such inconsistency is observed by the
> rdma-cm is the following
>
> 1. print warning to the system log
> 2. reject the connection request
> 3. send Gratuitous ARP to update client nodes IPoIB neighbour IP --> GID
> mapping
>
> I suggest that we 1st debate/agree on something that makes sense with
> IB and later see how it would work for RoCE

I wasn't referring to RoCE. I was simply saying that this problem may
result in multiple errors.

It's possible that we may never reach the point of sending a CM REQ. It
seems best to try to detect this problem as early as possible. If a CM REQ
actually gets to the remote side, it could be rejected as an invalid GID,
and the client could retry the request.

* Re: issues with the rdma-cm server side mapping of IP to GID
From: Or Gerlitz @ 2014-03-04 21:12 UTC
To: Hefty, Sean
Cc: Or Gerlitz, linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org), Yan Burman

On Mon, Mar 3, 2014 at 7:15 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
[...]
> If a CM REQ actually gets to the remote side, it could be rejected as an
> invalid GID, and the client could retry the request.

Retrying the request is practically calling

1. rdma_resolve_addr
2. rdma_resolve_route
3. rdma_connect

where step #1 would get the wrong GID from the client ARP cache and hence
step #3 would result in a reject, and so on...

I am starting to think that the rdma address resolution must not make use
of the ARP cache, or at least should provide applications an API to dictate
that an ARP request must be sent. How does this sound?
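
The retry loop described in this message can be illustrated with a toy model.
Nothing below calls librdmacm: NeighbourCache, connect_with_retries and the
solicit flag are hypothetical stand-ins for the client's neighbour table, the
resolve/connect sequence, and the proposed "force an ARP request" option. The
model simply assumes the server rejects any request whose DGID differs from
the GID it currently uses for that IP.

```python
class NeighbourCache:
    """Toy IP -> GID cache standing in for the client's ARP/neighbour table."""

    def __init__(self, entries):
        self.entries = dict(entries)

    def lookup(self, ip, fabric, solicit=False):
        # solicit=True models sending a fresh ARP request instead of
        # trusting the cached (possibly stale) entry.
        if solicit or ip not in self.entries:
            self.entries[ip] = fabric[ip]
        return self.entries[ip]


def connect_with_retries(ip, cache, fabric, retries=3, solicit=False):
    """Return the attempt number that succeeds, or None if all are rejected.

    fabric maps each IP to the GID the server currently uses for it.
    """
    for attempt in range(1, retries + 1):
        # Step 1: address resolution (cache hit unless solicit is set).
        dgid = cache.lookup(ip, fabric, solicit=solicit)
        # Step 2: route resolution is not modelled here.
        # Step 3: connect; the modelled server accepts only a matching DGID.
        if dgid == fabric[ip]:
            return attempt
        # Rejected: a plain retry consults the same stale entry again.
    return None


if __name__ == "__main__":
    fabric = {"10.0.0.2": "gid-of-hca-y"}
    stale = NeighbourCache({"10.0.0.2": "gid-of-hca-x"})
    print(connect_with_retries("10.0.0.2", stale, fabric))
    print(connect_with_retries("10.0.0.2", stale, fabric, solicit=True))
```

In this model every plain retry is rejected because the cached entry never
changes, while a solicited lookup succeeds on the first attempt.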

* RE: issues with the rdma-cm server side mapping of IP to GID
From: Hefty, Sean @ 2014-03-04 21:31 UTC
To: Or Gerlitz
Cc: Or Gerlitz, linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org), Yan Burman

> > If a CM REQ actually gets to the remote side, it could be rejected as an
> > invalid GID, and the client could retry the request.
>
> Retrying the request is practically calling
>
> 1. rdma_resolve_addr
> 2. rdma_resolve_route
> 3. rdma_connect

Yep - the entire setup is broken. If the wrong remote GID was resolved,
then the wrong local GID _may_ have been selected. There's no easy
guaranteed recovery here.

> where step #1 would get wrong GID from the client arp cache and hence
> step #3 would result in a reject and so on...
> I am starting to think that the rdma address resolution must not make
> use of the arp cache, or at least provide applications
> the API to dictate that ARP request must be sent, how this sounds?

Clients should not be made aware of how the resolution was done. The RDMA
CM needs to abstract that problem. Alternate mechanisms may be usable, but
there aren't exactly a whole lot of options available. The client could use
native IB addressing at this point if it does not want to rely on address
translation.

- Sean

* RE: issues with the rdma-cm server side mapping of IP to GID
From: Hefty, Sean @ 2014-03-04 21:39 UTC
To: Hefty, Sean, Or Gerlitz
Cc: Or Gerlitz, linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org), Yan Burman

> > > If a CM REQ actually gets to the remote side, it could be rejected as an
> > > invalid GID, and the client could retry the request.
> >
> > Retrying the request is practically calling
> >
> > 1. rdma_resolve_addr
> > 2. rdma_resolve_route
> > 3. rdma_connect
>
> Yep - the entire setup is broken. If the wrong remote GID was resolved,
> then the wrong local GID _may_ have been selected. There's no easy
> guaranteed recovery here.

On second thought, I don't think this is true. The SGID is selected based
on the IP address, not the DGID. If the remote CM rejects the connection,
the RDMA CM may be able to recover by restarting at step 2. Query for a new
PR, then re-issue the connect request. It may be possible to do this
without the application's involvement. I'm not sure how the librdmacm would
handle this, since the initial resolve_route would be redone.

* Re: issues with the rdma-cm server side mapping of IP to GID
From: Or Gerlitz @ 2014-03-05 20:04 UTC
To: Hefty, Sean
Cc: Or Gerlitz, linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org), Yan Burman

On Tue, Mar 4, 2014 at 11:31 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> Clients should not be made aware of how the resolution was done. The RDMA
> CM needs to abstract that problem. Alternate mechanisms may be usable, but
> there aren't exactly a whole lot of options available. The client could
> use native IB addressing at this point if it does not want to rely on
> address translation.

No, native IB addressing isn't an option for this and many other
applications; they want to use IP addressing. We have to make sure that our
IP --> RDMA address resolution, which uses neighbour lookup, is robust
under the real-life scenarios I brought up, which leave the RC connection /
UD session established based on this resolution totally broken.

So given the fact that we have a solution for large scale through native IB
addressing that offloads the address and route resolution, what's your
thinking on my suggestion to make address resolution for rdma-cm endpoints
created with the non-native IB port space use a solicited neighbour lookup
(i.e. avoid ARP cache usage)?

Or.