From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DDEACC282C0 for ; Fri, 25 Jan 2019 04:42:42 +0000 (UTC) Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 31AC421872 for ; Fri, 25 Jan 2019 04:42:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 31AC421872 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.ibm.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 43m5wv4Zb2zDqMK for ; Fri, 25 Jan 2019 15:42:39 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=linux.ibm.com (client-ip=148.163.158.5; helo=mx0a-001b2d01.pphosted.com; envelope-from=vaibhav@linux.ibm.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 43m5v63XZPzDqLH for ; Fri, 25 Jan 2019 15:41:04 +1100 (AEDT) Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x0P4d7Gg066886 for ; Thu, 24 Jan 2019 23:41:02 -0500 Received: from e06smtp02.uk.ibm.com (e06smtp02.uk.ibm.com [195.75.94.98]) by mx0b-001b2d01.pphosted.com with ESMTP id 2q7tnht6xq-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 24 Jan 2019 23:41:02 -0500 Received: from localhost by e06smtp02.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Fri, 25 Jan 2019 04:41:00 -0000 Received: from b06cxnps3075.portsmouth.uk.ibm.com (9.149.109.195) by e06smtp02.uk.ibm.com (192.168.101.132) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Fri, 25 Jan 2019 04:40:59 -0000 Received: from d06av21.portsmouth.uk.ibm.com (d06av21.portsmouth.uk.ibm.com [9.149.105.232]) by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x0P4evRX37683214 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Fri, 25 Jan 2019 04:40:57 GMT Received: from d06av21.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id C2ACF52051; Fri, 25 Jan 2019 04:40:57 +0000 (GMT) Received: from vajain21.in.ibm.com (unknown [9.109.223.28]) by d06av21.portsmouth.uk.ibm.com (Postfix) with ESMTP id D46C052052; Fri, 25 Jan 2019 04:40:55 +0000 (GMT) From: Vaibhav Jain To: linuxppc-dev@lists.ozlabs.org, Frederic Barrat Subject: [PATCH] cxl: Wrap iterations over afu slices inside 'afu_list_lock' Date: Fri, 25 Jan 2019 10:10:52 +0530 X-Mailer: git-send-email 2.20.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 x-cbid: 19012504-0008-0000-0000-000002B5BD88 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 19012504-0009-0000-0000-00002221F40D Message-Id: <20190125044052.32059-1-vaibhav@linux.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2019-01-25_02:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=2 phishscore=0 bulkscore=0 spamscore=0 clxscore=1011 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1901250036 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Philippe Bergheaud , Vaibhav Jain , Alastair D'Silva , Christophe Lombard , Andrew Donnellan Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" Within cxl module, iteration over array 'adapter->slices' may be racy at few points as it might be simultaneously read during an EEH and its contents being set to NULL while driver is being unloaded or unbound from the adapter. This might result in a NULL pointer to 'struct afu' being de-referenced during an EEH thereby causing a kernel oops. This patch fixes this by making sure that all access to the array 'adapter->slices' is wrapped within the context of spin-lock 'adapter->afu_list_lock'. Signed-off-by: Vaibhav Jain --- drivers/misc/cxl/guest.c | 2 ++ drivers/misc/cxl/pci.c | 38 ++++++++++++++++++++++++++++---------- 2 files changed, 30 insertions(+), 10 deletions(-) diff --git a/drivers/misc/cxl/guest.c b/drivers/misc/cxl/guest.c index 5d28d9e454f5..08f4a512afad 100644 --- a/drivers/misc/cxl/guest.c +++ b/drivers/misc/cxl/guest.c @@ -267,6 +267,7 @@ static int guest_reset(struct cxl *adapter) int i, rc; pr_devel("Adapter reset request\n"); + spin_lock(&adapter->afu_list_lock); for (i = 0; i < adapter->slices; i++) { if ((afu = adapter->afu[i])) { pci_error_handlers(afu, CXL_ERROR_DETECTED_EVENT, @@ -283,6 +284,7 @@ static int guest_reset(struct cxl *adapter) pci_error_handlers(afu, CXL_RESUME_EVENT, 0); } } + spin_unlock(&adapter->afu_list_lock); return rc; } diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c index c79ba1c699ad..28c28bceb063 100644 --- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -1805,7 +1805,7 @@ static pci_ers_result_t cxl_vphb_error_detected(struct cxl_afu *afu, /* There should only be one entry, but go through the list * anyway */ - if (afu->phb == NULL) + if (afu == NULL || afu->phb == NULL) return result; list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) { @@ -1843,6 +1843,8 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev, /* If we're permanently dead, give up. */ if (state == pci_channel_io_perm_failure) { + /* Stop the slice traces */ + spin_lock(&adapter->afu_list_lock); for (i = 0; i < adapter->slices; i++) { afu = adapter->afu[i]; /* @@ -1851,6 +1853,7 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev, */ cxl_vphb_error_detected(afu, state); } + spin_unlock(&adapter->afu_list_lock); return PCI_ERS_RESULT_DISCONNECT; } @@ -1932,14 +1935,20 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev, * * In slot_reset, free the old resources and allocate new ones. * * In resume, clear the flag to allow things to start. */ + + /* Make sure no one else changes the afu list */ + spin_lock(&adapter->afu_list_lock); + for (i = 0; i < adapter->slices; i++) { afu = adapter->afu[i]; afu_result = cxl_vphb_error_detected(afu, state); - cxl_context_detach_all(afu); - cxl_ops->afu_deactivate_mode(afu, afu->current_mode); - pci_deconfigure_afu(afu); + if (afu != NULL) { + cxl_context_detach_all(afu); + cxl_ops->afu_deactivate_mode(afu, afu->current_mode); + pci_deconfigure_afu(afu); + } /* Disconnect trumps all, NONE trumps NEED_RESET */ if (afu_result == PCI_ERS_RESULT_DISCONNECT) @@ -1948,6 +1957,7 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev, (result == PCI_ERS_RESULT_NEED_RESET)) result = PCI_ERS_RESULT_NONE; } + spin_unlock(&adapter->afu_list_lock); /* should take the context lock here */ if (cxl_adapter_context_lock(adapter) != 0) @@ -1980,14 +1990,15 @@ static pci_ers_result_t cxl_pci_slot_reset(struct pci_dev *pdev) */ cxl_adapter_context_unlock(adapter); + spin_lock(&adapter->afu_list_lock); for (i = 0; i < adapter->slices; i++) { afu = adapter->afu[i]; if (pci_configure_afu(afu, adapter, pdev)) - goto err; + goto err_unlock; if (cxl_afu_select_best_mode(afu)) - goto err; + goto err_unlock; if (afu->phb == NULL) continue; @@ -1999,16 +2010,16 @@ static pci_ers_result_t cxl_pci_slot_reset(struct pci_dev *pdev) ctx = cxl_get_context(afu_dev); if (ctx && cxl_release_context(ctx)) - goto err; + goto err_unlock; ctx = cxl_dev_context_init(afu_dev); if (IS_ERR(ctx)) - goto err; + goto err_unlock; afu_dev->dev.archdata.cxl_ctx = ctx; if (cxl_ops->afu_check_and_enable(afu)) - goto err; + goto err_unlock; afu_dev->error_state = pci_channel_io_normal; @@ -2029,8 +2040,13 @@ static pci_ers_result_t cxl_pci_slot_reset(struct pci_dev *pdev) result = PCI_ERS_RESULT_DISCONNECT; } } + + spin_unlock(&adapter->afu_list_lock); return result; +err_unlock: + spin_unlock(&adapter->afu_list_lock); + err: /* All the bits that happen in both error_detected and cxl_remove * should be idempotent, so we don't need to worry about leaving a mix @@ -2051,10 +2067,11 @@ static void cxl_pci_resume(struct pci_dev *pdev) * This is not the place to be checking if everything came back up * properly, because there's no return value: do that in slot_reset. */ + spin_lock(&adapter->afu_list_lock); for (i = 0; i < adapter->slices; i++) { afu = adapter->afu[i]; - if (afu->phb == NULL) + if (afu || afu->phb == NULL) continue; list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) { @@ -2063,6 +2080,7 @@ static void cxl_pci_resume(struct pci_dev *pdev) afu_dev->driver->err_handler->resume(afu_dev); } } + spin_unlock(&adapter->afu_list_lock); } static const struct pci_error_handlers cxl_err_handler = { -- 2.20.1