From: Nilay Shroff
To: linux-nvme@lists.infradead.org
Cc: hch@lst.de, kbusch@kernel.org, sagi@grimberg.me,
	gjoyce@linux.ibm.com, axboe@fb.com, Nilay Shroff
Subject: [PATCH] nvme-multipath: find NUMA path only for online numa-node
Date: Thu, 16 May 2024 17:43:51 +0530
Message-ID: <20240516121358.3145962-1-nilay@linux.ibm.com>

In the current native multipath design, when a shared namespace is
created, we loop through each possible numa-node, calculate the NUMA
distance of that node from each nvme controller and then cache the
optimal IO path for future reference while sending IO. The issue with
this design is that we may refer to the NUMA distance table for an
offline node, which may not be populated at the time, and so we may
inadvertently end up finding and caching a non-optimal path for IO.
Then later, when the corresponding numa-node becomes online and hence
the NUMA distance table entry for that node is created, ideally we
should re-calculate the multipath node distance for the newly added
node; however, that doesn't happen unless we rescan/reset the
controller. So essentially, we may keep using a non-optimal IO path for
a node which is made online after the namespace is created.

This patch helps fix this issue by ensuring that when a shared
namespace is created, we calculate the multipath node distance for each
online numa-node instead of each possible numa-node. Then later, when a
node becomes online and we receive any IO on that newly added node, we
would calculate the multipath node distance for the newly added node,
but this time the NUMA distance table would have already been populated
for that node. Hence we would be able to correctly calculate the
multipath node distance and choose the optimal path for the IO.

Signed-off-by: Nilay Shroff
---
 drivers/nvme/host/multipath.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index d16e976ae1a4..9c1e135b8df3 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -595,7 +595,7 @@ static void nvme_mpath_set_live(struct nvme_ns *ns)
 		int node, srcu_idx;
 
 		srcu_idx = srcu_read_lock(&head->srcu);
-		for_each_node(node)
+		for_each_online_node(node)
 			__nvme_find_path(head, node);
 		srcu_read_unlock(&head->srcu, srcu_idx);
 	}
-- 
2.44.0