netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Guangguan Wang <guangguan.wang@linux.alibaba.com>,
	Wenjia Zhang <wenjia@linux.ibm.com>,
	Halil Pasic <pasic@linux.ibm.com>,
	"David S . Miller" <davem@davemloft.net>,
	Sasha Levin <sashal@kernel.org>,
	jaka@linux.ibm.com, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, linux-rdma@vger.kernel.org,
	linux-s390@vger.kernel.org, netdev@vger.kernel.org
Subject: [PATCH AUTOSEL 6.12 116/486] net/smc: use the correct ndev to find pnetid by pnetid table
Date: Mon,  5 May 2025 18:33:12 -0400	[thread overview]
Message-ID: <20250505223922.2682012-116-sashal@kernel.org> (raw)
In-Reply-To: <20250505223922.2682012-1-sashal@kernel.org>

From: Guangguan Wang <guangguan.wang@linux.alibaba.com>

[ Upstream commit bfc6c67ec2d64d0ca4e5cc3e1ac84298a10b8d62 ]

When using smc_pnet in SMC, it will only search the pnetid in the
base_ndev of the netdev hierarchy(both HW PNETID and User-defined
sw pnetid). This may not work for some scenarios when using SMC in
container on cloud environment.
In container, there have choices of different container network,
such as directly using host network, virtual network IPVLAN, veth,
etc. Different choices of container network have different netdev
hierarchy. Examples of netdev hierarchy show below. (eth0 and eth1
in host below is the netdev directly related to the physical device).
            _______________________________
           |   _________________           |
           |  |POD              |          |
           |  |                 |          |
           |  | eth0_________   |          |
           |  |____|         |__|          |
           |       |         |             |
           |       |         |             |
           |   eth1|base_ndev| eth0_______ |
           |       |         |    | RDMA  ||
           | host  |_________|    |_______||
           ---------------------------------
     netdev hierarchy if directly using host network
           ________________________________
           |   _________________           |
           |  |POD  __________  |          |
           |  |    |upper_ndev| |          |
           |  |eth0|__________| |          |
           |  |_______|_________|          |
           |          |lower netdev        |
           |        __|______              |
           |   eth1|         | eth0_______ |
           |       |base_ndev|    | RDMA  ||
           | host  |_________|    |_______||
           ---------------------------------
            netdev hierarchy if using IPVLAN
            _______________________________
           |   _____________________       |
           |  |POD        _________ |      |
           |  |          |base_ndev||      |
           |  |eth0(veth)|_________||      |
           |  |____________|________|      |
           |               |pairs          |
           |        _______|_              |
           |       |         | eth0_______ |
           |   veth|base_ndev|    | RDMA  ||
           |       |_________|    |_______||
           |        _________              |
           |   eth1|base_ndev|             |
           | host  |_________|             |
           ---------------------------------
             netdev hierarchy if using veth
Due to some reasons, the eth1 in host is not RDMA attached netdevice,
pnetid is needed to map the eth1(in host) with RDMA device so that POD
can do SMC-R. Because the eth1(in host) is managed by CNI plugin(such
as Terway, network management plugin in container environment), and in
cloud environment the eth(in host) can dynamically be inserted by CNI
when POD create and dynamically be removed by CNI when POD destroy and
no POD related to the eth(in host) anymore. It is hard to config the
pnetid to the eth1(in host). But it is easy to config the pnetid to the
netdevice which can be seen in POD. When do SMC-R, both the container
directly using host network and the container using veth network can
successfully match the RDMA device, because the configured pnetid netdev
is a base_ndev. But the container using IPVLAN can not successfully
match the RDMA device and 0x03030000 fallback happens, because the
configured pnetid netdev is not a base_ndev. Additionally, if config
pnetid to the eth1(in host) also can not work for matching RDMA device
when using veth network and doing SMC-R in POD.

To resolve the problems list above, this patch extends to search user
-defined sw pnetid in the clc handshake ndev when no pnetid can be found
in the base_ndev, and the base_ndev take precedence over ndev for backward
compatibility. This patch also can unify the pnetid setup of different
network choices list above in container(Config user-defined sw pnetid in
the netdevice can be seen in POD).

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/smc/smc_pnet.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/smc/smc_pnet.c b/net/smc/smc_pnet.c
index 716808f374a8d..b391c2ef463f2 100644
--- a/net/smc/smc_pnet.c
+++ b/net/smc/smc_pnet.c
@@ -1079,14 +1079,16 @@ static void smc_pnet_find_roce_by_pnetid(struct net_device *ndev,
 					 struct smc_init_info *ini)
 {
 	u8 ndev_pnetid[SMC_MAX_PNETID_LEN];
+	struct net_device *base_ndev;
 	struct net *net;
 
-	ndev = pnet_find_base_ndev(ndev);
+	base_ndev = pnet_find_base_ndev(ndev);
 	net = dev_net(ndev);
-	if (smc_pnetid_by_dev_port(ndev->dev.parent, ndev->dev_port,
+	if (smc_pnetid_by_dev_port(base_ndev->dev.parent, base_ndev->dev_port,
 				   ndev_pnetid) &&
+	    smc_pnet_find_ndev_pnetid_by_table(base_ndev, ndev_pnetid) &&
 	    smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid)) {
-		smc_pnet_find_rdma_dev(ndev, ini);
+		smc_pnet_find_rdma_dev(base_ndev, ini);
 		return; /* pnetid could not be determined */
 	}
 	_smc_pnet_find_roce_by_pnetid(ndev_pnetid, ini, NULL, net);
-- 
2.39.5


  parent reply	other threads:[~2025-05-05 22:43 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20250505223922.2682012-1-sashal@kernel.org>
2025-05-05 22:31 ` [PATCH AUTOSEL 6.12 013/486] SUNRPC: Don't allow waiting for exiting tasks Sasha Levin
2025-05-05 22:31 ` [PATCH AUTOSEL 6.12 029/486] SUNRPC: rpc_clnt_set_transport() must not change the autobind setting Sasha Levin
2025-05-05 22:31 ` [PATCH AUTOSEL 6.12 030/486] SUNRPC: rpcbind should never reset the port to the value '0' Sasha Levin
2025-05-05 22:31 ` [PATCH AUTOSEL 6.12 034/486] mctp: Fix incorrect tx flow invalidation condition in mctp-i2c Sasha Levin
2025-05-05 22:31 ` [PATCH AUTOSEL 6.12 035/486] net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards Sasha Levin
2025-05-05 22:31 ` [PATCH AUTOSEL 6.12 036/486] net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus Sasha Levin
2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 046/486] r8169: disable RTL8126 ZRX-DC timeout Sasha Levin
2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 092/486] bnxt_en: Query FW parameters when the CAPS_CHANGE bit is set Sasha Levin
2025-05-05 22:32 ` [PATCH AUTOSEL 6.12 103/486] tcp: reorganize tcp_in_ack_event() and tcp_count_delivered() Sasha Levin
2025-05-05 22:33 ` Sasha Levin [this message]
2025-05-05 22:33 ` [PATCH AUTOSEL 6.12 131/486] net: stmmac: dwmac-rk: Validate GRF and peripheral GRF during probe Sasha Levin
2025-05-05 22:33 ` [PATCH AUTOSEL 6.12 132/486] net: hsr: Fix PRP duplicate detection Sasha Levin
2025-05-05 22:33 ` [PATCH AUTOSEL 6.12 135/486] netfilter: conntrack: Bound nf_conntrack sysctl writes Sasha Levin
2025-05-05 22:33 ` [PATCH AUTOSEL 6.12 155/486] ipv6: save dontfrag in cork Sasha Levin
2025-05-05 22:34 ` [PATCH AUTOSEL 6.12 180/486] tcp: bring back NUMA dispersion in inet_ehash_locks_alloc() Sasha Levin
2025-05-05 22:34 ` [PATCH AUTOSEL 6.12 182/486] ieee802154: ca8210: Use proper setters and getters for bitwise types Sasha Levin
2025-05-05 22:34 ` [PATCH AUTOSEL 6.12 193/486] net: phylink: use pl->link_interface in phylink_expects_phy() Sasha Levin
2025-05-05 22:34 ` [PATCH AUTOSEL 6.12 206/486] net: ethernet: ti: cpsw_new: populate netdev of_node Sasha Levin
2025-05-05 22:34 ` [PATCH AUTOSEL 6.12 207/486] net: phy: nxp-c45-tja11xx: add match_phy_device to TJA1103/TJA1104 Sasha Levin
2025-05-05 22:34 ` [PATCH AUTOSEL 6.12 208/486] dpll: Add an assertion to check freq_supported_num Sasha Levin
2025-05-05 22:34 ` [PATCH AUTOSEL 6.12 212/486] net: pktgen: fix mpls maximum labels list parsing Sasha Levin
2025-05-05 22:34 ` [PATCH AUTOSEL 6.12 216/486] ipv4: fib: Move fib_valid_key_len() to rtm_to_fib_config() Sasha Levin
2025-05-05 22:35 ` [PATCH AUTOSEL 6.12 238/486] net/mlx5: Avoid report two health errors on same syndrome Sasha Levin
2025-05-05 22:35 ` [PATCH AUTOSEL 6.12 239/486] selftests/net: have `gro.sh -t` return a correct exit code Sasha Levin
2025-05-05 22:35 ` [PATCH AUTOSEL 6.12 244/486] net: ethernet: mtk_ppe_offload: Allow QinQ, double ETH_P_8021Q only Sasha Levin
2025-05-05 22:35 ` [PATCH AUTOSEL 6.12 245/486] net: xgene-v2: remove incorrect ACPI_PTR annotation Sasha Levin
2025-05-05 22:35 ` [PATCH AUTOSEL 6.12 246/486] bonding: report duplicate MAC address in all situations Sasha Levin
2025-05-05 22:35 ` [PATCH AUTOSEL 6.12 250/486] Octeontx2-af: RPM: Register driver with PCI subsys IDs Sasha Levin
2025-05-05 22:35 ` [PATCH AUTOSEL 6.12 258/486] vhost-scsi: Return queue full for page alloc failures during copy Sasha Levin
2025-05-05 22:35 ` [PATCH AUTOSEL 6.12 263/486] net/mlx5e: Add correct match to check IPSec syndromes for switchdev mode Sasha Levin
2025-05-05 22:35 ` [PATCH AUTOSEL 6.12 270/486] net/mlx5: Change POOL_NEXT_SIZE define value and make it global Sasha Levin
2025-05-05 22:35 ` [PATCH AUTOSEL 6.12 274/486] net: ipv6: Init tunnel link-netns before registering dev Sasha Levin
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 291/486] net: pktgen: fix access outside of user given buffer in pktgen_thread_write() Sasha Levin
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 294/486] bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback Sasha Levin
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 312/486] eth: mlx4: don't try to complete XDP frames in netpoll Sasha Levin
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 315/486] vxlan: Join / leave MC group after remote changes Sasha Levin
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 321/486] net/mlx5: Modify LSB bitmask in temperature event to include only the first bit Sasha Levin
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 322/486] net/mlx5: Apply rate-limiting to high temperature warning Sasha Levin
2025-05-05 22:36 ` [PATCH AUTOSEL 6.12 342/486] net/mlx4_core: Avoid impossible mlx4_db_alloc() order value Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 359/486] net: stmmac: dwmac-loongson: Set correct {tx,rx}_fifo_size Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 379/486] net/mlx5: XDP, Enable TX side XDP multi-buffer support Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 380/486] net/mlx5: Extend Ethtool loopback selftest to support non-linear SKB Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 381/486] net/mlx5e: set the tx_queue_len for pfifo_fast Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 382/486] net/mlx5e: reduce rep rxq depth to 256 for ECPF Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 383/486] net/mlx5e: reduce the max log mpwrq sz for ECPF and reps Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 385/486] xfrm: prevent high SEQ input in non-ESN mode Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 387/486] mptcp: pm: userspace: flags: clearer msg if no remote addr Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 393/486] net: fec: Refactor MAC reset to function Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 397/486] ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure() Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 398/486] r8152: add vendor/device ID pair for Dell Alienware AW1022z Sasha Levin
2025-05-05 22:37 ` [PATCH AUTOSEL 6.12 402/486] net: ethtool: prevent flow steering to RSS contexts which don't exist Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 412/486] net: page_pool: avoid false positive warning if NAPI was never added Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 420/486] eth: fbnic: set IFF_UNICAST_FLT to avoid enabling promiscuous mode when adding unicast addrs Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 421/486] tools: ynl-gen: don't output external constants Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 422/486] net/mlx5e: Avoid WARN_ON when configuring MQPRIO with HTB offload enabled Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 424/486] vxlan: Annotate FDB data races Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 425/486] ipv4: ip_gre: Fix set but not used warning in ipgre_err() if IPv4-only Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 426/486] r8169: don't scan PHY addresses > 0 Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 427/486] net: flush_backlog() small changes Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 428/486] bridge: mdb: Allow replace of a host-joined group Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 429/486] net-sysfs: remove rtnl_trylock from queue attributes Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 430/486] net-sysfs: prevent uncleared queues from being re-added Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 431/486] net-sysfs: remove rtnl_trylock from device attributes Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 432/486] ice: init flow director before RDMA Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 433/486] ice: treat dyn_allowed only as suggestion Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 438/486] ice: count combined queues using Rx/Tx count Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 440/486] net/mana: fix warning in the writer of client oob Sasha Levin
2025-05-05 22:38 ` [PATCH AUTOSEL 6.12 456/486] bpf: Use kallsyms to find the function name of a struct_ops's stub function Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250505223922.2682012-116-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=guangguan.wang@linux.alibaba.com \
    --cc=jaka@linux.ibm.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pasic@linux.ibm.com \
    --cc=stable@vger.kernel.org \
    --cc=wenjia@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).