* [PATCH rdma-next v1 0/2] RDMA: fix cross-NIC same-host IPv6 RDMA-CM connect
@ 2026-06-15 17:46 Alex Timofeyev
  2026-06-15 17:46 ` [PATCH rdma-next v1 2/2] RDMA/cma: accept cross-NIC same-host local dst in validate_ipv6_net_dev Alex Timofeyev
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Alex Timofeyev @ 2026-06-15 17:46 UTC
  To: Jason Gunthorpe, Leon Romanovsky, linux-rdma
  Cc: Parav Pandit, Edward Srouji, Vlad Dumitrescu, stable,
	linux-kernel

RDMA-CM cannot establish an IPv6 RoCEv2 connection between two NICs that
live on the same host. This shows up on hosts that pin one process per
NUMA-local NIC and let those processes talk to each other over each NIC's
global IPv6 GID (e.g. a storage daemon with one engine per NUMA node on
dual ConnectX-7). rdma_resolve_addr() and ib_send_cm_req() both return
success, but the destination NIC silently drops the frame and the peer
never sees the REQ; the connection times out.

The bug has two halves, one on each side of the connection:

1) Send side (patch 1, drivers/infiniband/core/addr.c)

   When the destination address is local, addr_resolve_neigh() copies the
   *source* device's MAC into the path record's destination MAC. That is
   right for true loopback (same netdev), but for a destination that lives
   on a different netdev of the same host the destination NIC will not
   accept a frame addressed to the source NIC's MAC and drops it in HW.
   The fix resolves the netdev that owns the destination address and uses
   its MAC.

2) Receive side (patch 2, drivers/infiniband/core/cma.c)

   Once the REQ does reach the peer, validate_ipv6_net_dev() rejects it:
   rt6_lookup() of a same-host destination collapses onto the loopback
   netdev, so the strict rt6i_idev->dev == net_dev check fails with
   -EHOSTUNREACH even though the REQ arrived on the right net_dev. The fix
   accepts an RTF_LOCAL route when net_dev itself owns the listener
   address. This half is only observable once patch 1 lets the REQ arrive.

Both halves are needed for a working connection; patch 1 alone makes the
REQ reach the peer but it is then rejected by the unfixed receive side.

Verification
------------
Measured on two RoCEv2 ConnectX-7 ports on the same host, each with a
global IPv6 GID (port A "src", port B "dst"), driving a cross-NIC
RDMA-CM connect (rping, src GID on port A -> dst GID on port B) while
tracing the destination MAC resolved in addr_resolve():

  without the series:  resolved dst MAC = port A's MAC (the *source* NIC)
                        -> frame dropped, connect times out
  with the series:     resolved dst MAC = port B's MAC (the *dest* NIC)
                        -> connect completes

The kernel under test carried c31e4038c97f and its dst_rtable() prereq
(i.e. the same addr_resolve_neigh()/is_dst_local() shape as for-next);
the change applies unmodified to rdma.git for-next.

Note on stable: the Fixes: tags bound the backport to where each construct
exists in its current form. Trees predating c31e4038c97f have the
equivalent send-side gap in the older IFF_LOOPBACK form of
addr_resolve_neigh() and would need a separately shaped backport.

The patches touch independent files but should be applied as a pair so the
connection works end to end.

Alex Timofeyev (2):
  RDMA/core: use destination netdev MAC for cross-NIC same-host local
    dst
  RDMA/cma: accept cross-NIC same-host local dst in
    validate_ipv6_net_dev

 drivers/infiniband/core/addr.c | 22 +++++++++++++++++++---
 drivers/infiniband/core/cma.c  | 15 ++++++++++++++-
 2 files changed, 33 insertions(+), 4 deletions(-)


base-commit: 20ff9350862468af21b46cae2c22d17d6ec637f9
-- 
2.40.4



* [PATCH rdma-next v1 1/2] RDMA/core: use destination netdev MAC for cross-NIC same-host local dst
From: Alex Timofeyev @ 2026-06-15 17:46 UTC
  To: Jason Gunthorpe, Leon Romanovsky, linux-rdma
  Cc: Parav Pandit, Edward Srouji, Vlad Dumitrescu, stable,
	linux-kernel

addr_resolve_neigh() treats every is_dst_local() destination as loopback
and copies the source device's MAC into the path record's destination MAC
(dst_dev_addr <- src_dev_addr). That is correct for true loopback (source
and destination on the same netdev), but wrong when the local destination
address lives on a different netdev of the same host.

In that cross-NIC same-host case the destination NIC will not accept a
frame whose destination MAC is the source NIC's MAC, and drops it in
hardware before it reaches the peer. rdma_resolve_addr() and
ib_send_cm_req() both return success, but the CM REQ never arrives and the
connection times out.

Look up the netdev that owns the destination address and copy its MAC into
dst_dev_addr instead. Fall back to the source MAC when no netdev claims the
address (true loopback), preserving the existing behaviour.
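
As a rough userspace illustration of the selection rule described above
(not the kernel code itself), the sketch below models netdevs as a flat
table. The model_ndev struct and the model_* helpers are hypothetical names
invented for this example; the actual patch performs the lookup with
rdma_find_ndev_for_src_ip_rcu() under rcu_read_lock().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAC_LEN 6
#define MODEL_IP6_LEN 16

/* Hypothetical stand-in for a netdev: one IPv6 address and one MAC. */
struct model_ndev {
	unsigned char ip6[MODEL_IP6_LEN];
	unsigned char mac[MODEL_MAC_LEN];
};

/* Return the modeled device that owns ip6, or NULL if none does. */
const struct model_ndev *model_find_owner(const struct model_ndev *devs,
					  size_t n, const unsigned char *ip6)
{
	for (size_t i = 0; i < n; i++)
		if (memcmp(devs[i].ip6, ip6, MODEL_IP6_LEN) == 0)
			return &devs[i];
	return NULL;
}

/*
 * Choose the destination MAC for a destination already known to be local:
 * the owning device's MAC if one is found, otherwise the source MAC
 * (the pre-existing loopback behaviour).
 */
void model_local_dst_mac(const struct model_ndev *devs, size_t n,
			 const struct model_ndev *src,
			 const unsigned char *dst_ip6, unsigned char *out)
{
	const struct model_ndev *owner;

	assert(src != NULL && dst_ip6 != NULL && out != NULL);
	owner = model_find_owner(devs, n, dst_ip6);
	memcpy(out, owner ? owner->mac : src->mac, MODEL_MAC_LEN);
}
```

With two modeled ports, a destination address owned by port B yields port
B's MAC, while an address that no modeled device owns yields the source
port's MAC.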

This was observed with two RoCEv2 ConnectX-7 ports on the same host, each
holding a global IPv6 GID and each serving a process pinned to its
NUMA-local NIC. When one process connected to the other over RDMA-CM, the
resolved destination MAC was the source port's MAC instead of the
destination port's, and the REQ was silently dropped. With the fix the
resolved MAC is the destination port's and the connection completes.

Fixes: c31e4038c97f ("RDMA/core: Use route entry flag to decide on loopback traffic")
Cc: stable@vger.kernel.org
Cc: Parav Pandit <parav@nvidia.com>
Signed-off-by: Alex Timofeyev <sashka@ankey.net>
---
 drivers/infiniband/core/addr.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 7e62b5b1ffaa..84aa43436bfe 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -451,10 +451,26 @@ static int addr_resolve_neigh(const struct dst_entry *dst,
 			      u32 seq)
 {
 	if (is_dst_local(dst)) {
-		/* When the destination is local entry, source and destination
-		 * are same. Skip the neighbour lookup.
+		struct net_device *dst_ndev;
+
+		/* When the destination is local, source and destination are on
+		 * the same host. For true loopback (same netdev) the source and
+		 * destination MACs are equal, but when the destination address
+		 * lives on a different netdev of the same host the destination
+		 * MAC must be that netdev's MAC -- otherwise the destination NIC
+		 * silently drops the frame. Look up the netdev that owns the
+		 * destination address and copy its MAC; fall back to the source
+		 * MAC if no netdev claims the address.
 		 */
-		memcpy(addr->dst_dev_addr, addr->src_dev_addr, MAX_ADDR_LEN);
+		rcu_read_lock();
+		dst_ndev = rdma_find_ndev_for_src_ip_rcu(dev_net(dst->dev), dst_in);
+		if (!IS_ERR(dst_ndev))
+			memcpy(addr->dst_dev_addr, dst_ndev->dev_addr,
+			       MAX_ADDR_LEN);
+		else
+			memcpy(addr->dst_dev_addr, addr->src_dev_addr,
+			       MAX_ADDR_LEN);
+		rcu_read_unlock();
 		return 0;
 	}
 
-- 
2.40.4



* [PATCH rdma-next v1 2/2] RDMA/cma: accept cross-NIC same-host local dst in validate_ipv6_net_dev
From: Alex Timofeyev @ 2026-06-15 17:46 UTC
  To: Jason Gunthorpe, Leon Romanovsky, linux-rdma
  Cc: Parav Pandit, Edward Srouji, Vlad Dumitrescu, stable,
	linux-kernel

validate_ipv6_net_dev() confirms an incoming CM REQ was delivered on the
correct net_dev with an rt6_lookup() that requires
rt->rt6i_idev->dev == net_dev. For an IPv6 destination that is local to a
different netdev of the same host, the FIB resolves the lookup onto the
loopback netdev, so rt6i_idev->dev is lo regardless of which physical
netdev owns the listener address. The strict comparison then rejects the
REQ with -EHOSTUNREACH even though it was correctly delivered on net_dev.

Accept the request when the resolved route is RTF_LOCAL and net_dev itself
owns the address the listener was bound to (src_addr). This is the
receive-side counterpart to the cross-NIC same-host send-side fix in
addr_resolve_neigh().
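
The resulting acceptance rule can be pictured with a small userspace model.
This is only a sketch under stated assumptions: model_route,
MODEL_RTF_LOCAL (an arbitrary flag value, not the kernel's) and
model_validate_net_dev() are hypothetical names, and the boolean
net_dev_owns_listen_addr stands in for the ipv6_chk_addr() call in the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_RTF_LOCAL 0x1u	/* hypothetical value, for illustration only */

/* Hypothetical stand-in for the result of a route lookup. */
struct model_route {
	int dev_ifindex;	/* device the lookup resolved to */
	unsigned int flags;	/* modeled route flags */
};

/*
 * Decide whether a request that arrived on net_dev_ifindex passes the
 * routing check:
 *   - no route:                     reject
 *   - route device == net_dev:      accept (the original strict check)
 *   - local route, other device:    accept only if net_dev owns the
 *                                   listener address
 *   - anything else:                reject
 */
bool model_validate_net_dev(const struct model_route *rt,
			    int net_dev_ifindex,
			    bool net_dev_owns_listen_addr)
{
	assert(net_dev_ifindex > 0);

	if (rt == NULL)
		return false;
	if (rt->dev_ifindex == net_dev_ifindex)
		return true;
	if (rt->flags & MODEL_RTF_LOCAL)
		return net_dev_owns_listen_addr;
	return false;
}
```

In this model a local route that resolved to the loopback device is
accepted only when the receiving device owns the listener address; a
non-local route on a different device is still rejected, as before.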

Fixes: f887f2ac87c2 ("IB/cma: Validate routing of incoming requests")
Cc: stable@vger.kernel.org
Cc: Parav Pandit <parav@nvidia.com>
Signed-off-by: Alex Timofeyev <sashka@ankey.net>
---
 drivers/infiniband/core/cma.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 9480d1a51c11..872c57943362 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1635,7 +1635,20 @@ static bool validate_ipv6_net_dev(struct net_device *net_dev,
 	if (!rt)
 		return false;
 
-	ret = rt->rt6i_idev->dev == net_dev;
+	if (rt->rt6i_idev->dev == net_dev) {
+		ret = true;
+	} else if (rt->rt6i_flags & RTF_LOCAL) {
+		/* For a destination that is local to another netdev of the same
+		 * host, the FIB collapses the lookup onto the loopback netdev,
+		 * so rt6i_idev->dev is not net_dev even though the request was
+		 * correctly delivered on net_dev. Accept it when net_dev itself
+		 * owns the address we were listening on.
+		 */
+		ret = ipv6_chk_addr(dev_net(net_dev), &src_addr->sin6_addr,
+				    net_dev, 1);
+	} else {
+		ret = false;
+	}
 	ip6_rt_put(rt);
 
 	return ret;
-- 
2.40.4



* Re: [PATCH rdma-next v1 0/2] RDMA: fix cross-NIC same-host IPv6 RDMA-CM connect
From: Jason Gunthorpe @ 2026-06-15 23:59 UTC
  To: Alex Timofeyev
  Cc: Leon Romanovsky, linux-rdma, Parav Pandit, Edward Srouji,
	Vlad Dumitrescu, linux-kernel

On Mon, Jun 15, 2026 at 05:46:19PM +0000, Alex Timofeyev wrote:
> RDMA-CM cannot establish an IPv6 RoCEv2 connection between two NICs that
> live on the same host. This shows up on hosts that pin one process per
> NUMA-local NIC and let those processes talk to each other over each NIC's
> global IPv6 GID (e.g. a storage daemon with one engine per NUMA node on
> dual ConnectX-7). rdma_resolve_addr() and ib_send_cm_req() both return
> success, but the destination NIC silently drops the frame and the peer
> never sees the REQ; the connection times out.
> 
> The bug has two halves, one on each side of the connection:
> 
> 1) Send side (patch 1, drivers/infiniband/core/addr.c)
> 
>    When the destination address is local, addr_resolve_neigh() copies the
>    *source* device's MAC into the path record's destination MAC. That is
>    right for true loopback (same netdev), but for a destination that lives
>    on a different netdev of the same host the destination NIC will not
>    accept a frame addressed to the source NIC's MAC and drops it in HW.
>    The fix resolves the netdev that owns the destination address and uses
>    its MAC.

I'm not sure about this. You need to have policy routing or a VRF set up
so these local routes don't show up. Do you have that?

A local route result should lead only to a local loopback AH. It should
never result in a packet on the wire, and we shouldn't be trying to
mangle loopback routes at all.

> 2) Receive side (patch 2, drivers/infiniband/core/cma.c)
> 
>    Once the REQ does reach the peer, validate_ipv6_net_dev() rejects it:
>    rt6_lookup() of a same-host destination collapses onto the loopback
>    netdev, so the strict rt6i_idev->dev == net_dev check fails with
>    -EHOSTUNREACH even though the REQ arrived on the right net_dev. The fix
>    accepts an RTF_LOCAL route when net_dev itself owns the listener
>    address. This half is only observable once patch 1 lets the REQ
>    arrive.

Same answer here: if you have proper routing you won't get a loopback
route to match and you won't fail on this check. Removing the check
does not seem correct.

Jason

