From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from e36.co.us.ibm.com (e36.co.us.ibm.com [32.97.110.154]) (using TLSv1 with cipher CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 53B371A19FD for ; Tue, 15 Dec 2015 08:08:09 +1100 (AEDT) Received: from localhost by e36.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 14 Dec 2015 14:08:07 -0700 Received: from b03cxnp08025.gho.boulder.ibm.com (b03cxnp08025.gho.boulder.ibm.com [9.17.130.17]) by d03dlp01.boulder.ibm.com (Postfix) with ESMTP id 2E3F91FF0045 for ; Mon, 14 Dec 2015 13:55:57 -0700 (MST) Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168]) by b03cxnp08025.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id tBEL7k6W27197612 for ; Mon, 14 Dec 2015 14:07:46 -0700 Received: from d03av02.boulder.ibm.com (localhost [127.0.0.1]) by d03av02.boulder.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id tBEL7iwg030889 for ; Mon, 14 Dec 2015 14:07:45 -0700 From: Uma Krishnan To: linux-scsi@vger.kernel.org, James Bottomley , "Martin K. Petersen" , "Matthew R. Ochs" , "Manoj N. Kumar" , Brian King Cc: linuxppc-dev@lists.ozlabs.org, Ian Munsie , Andrew Donnellan , Daniel Axtens Subject: [PATCH v2 4/6] cxlflash: Fix to resolve cmd leak after host reset Date: Mon, 14 Dec 2015 15:07:02 -0600 Message-Id: <1450127222-48145-1-git-send-email-ukrishn@linux.vnet.ibm.com> In-Reply-To: <1450126293-47440-1-git-send-email-ukrishn@linux.vnet.ibm.com> References: <1450126293-47440-1-git-send-email-ukrishn@linux.vnet.ibm.com> List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Manoj Kumar After a few iterations of resetting the card, either during EEH recovery, or a host_reset the following is seen in the logs. cxlflash 0008:00: cxlflash_queuecommand: could not get a free command At every reset of the card, the commands that are outstanding are being leaked. No effort is being made to reap these commands. A few more resets later, the above error message floods the logs and the card is rendered totally unusable as no free commands are available. Iterated through the 'cmd' queue and printed out the 'free' counter and found that on each reset certain commands were in-use and stayed in-use through subsequent resets. To resolve this issue, when the card is reset, reap all the commands that are active/outstanding. Signed-off-by: Manoj N. Kumar Acked-by: Matthew R. Ochs --- drivers/scsi/cxlflash/main.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c index 35a3202..ac39856 100644 --- a/drivers/scsi/cxlflash/main.c +++ b/drivers/scsi/cxlflash/main.c @@ -632,15 +632,30 @@ static void free_mem(struct cxlflash_cfg *cfg) * @cfg: Internal structure associated with the host. * * Safe to call with AFU in a partially allocated/initialized state. + * + * Cleans up all state associated with the command queue, and unmaps + * the MMIO space. + * + * - complete() will take care of commands we initiated (they'll be checked + * in as part of the cleanup that occurs after the completion) + * + * - cmd_checkin() will take care of entries that we did not initiate and that + * have not (and will not) complete because they are sitting on a [now stale] + * hardware queue */ static void stop_afu(struct cxlflash_cfg *cfg) { int i; struct afu *afu = cfg->afu; + struct afu_cmd *cmd; if (likely(afu)) { - for (i = 0; i < CXLFLASH_NUM_CMDS; i++) - complete(&afu->cmd[i].cevent); + for (i = 0; i < CXLFLASH_NUM_CMDS; i++) { + cmd = &afu->cmd[i]; + complete(&cmd->cevent); + if (!atomic_read(&cmd->free)) + cmd_checkin(cmd); + } if (likely(afu->afu_map)) { cxl_psa_unmap((void __iomem *)afu->afu_map); -- 2.1.0