From: "hch@lst.de"
To: "Michael Kelley (LINUX)"
Cc: Keith Busch, "axboe@fb.com", "hch@lst.de", "sagi@grimberg.me", "linux-nvme@lists.infradead.org", "linux-kernel@vger.kernel.org", Caroline Subramoney, Richard Wurdack, Nathan Obr
Date: Mon, 6 Jun 2022 08:51:02 +0200
Subject: Re: [PATCH v2 2/2] nvme: handle persistent internal error AER from NVMe controller
Message-ID: <20220606065102.GA2551@lst.de>
References: <1654278961-81423-1-git-send-email-mikelley@microsoft.com> <1654278961-81423-2-git-send-email-mikelley@microsoft.com>

On Sat, Jun 04, 2022 at 02:28:11PM +0000, Michael Kelley (LINUX) wrote:
> > driver's irq handler. The other transports block on register reads, though, so
> > they can't call this from an atomic context. The TCP context looks safe, but
> > I'm not sure about RDMA or FC.
>
> Good point.
> But even if the RDMA and FC contexts are safe,

For RDMA this is typically called from softirq context, so it is indeed
not safe.

> if a
> persistent error is reported, the controller is already in trouble and
> may not respond to a request to retrieve the CSTS anyway. Perhaps
> we should just trust the AER error report and not bother checking
> CSTS to decide whether to do the reset. We can still check ctrl->state
> and skip the reset if there's already one in progress.

Yes, that might be a better option.
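
For illustration only, the approach agreed on above (trust the AER, skip
the CSTS read, and only look at the controller state) could be sketched
roughly as below. This is a userspace model, not the actual patch: the
mock_* names, the two-state enum, and the resets_queued counter are all
hypothetical stand-ins, and the real driver would schedule reset work
rather than bump a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's controller states. */
enum mock_ctrl_state {
	MOCK_CTRL_LIVE,
	MOCK_CTRL_RESETTING,
};

/* Hypothetical, minimal model of a controller. */
struct mock_ctrl {
	enum mock_ctrl_state state;
	int resets_queued;	/* counts how often a reset was scheduled */
};

/*
 * Model of handling a persistent internal error AER.  No register is
 * read here, so nothing blocks and the function could be called from
 * atomic context in this model.  Returns true if a reset was queued.
 */
bool mock_handle_persistent_error(struct mock_ctrl *ctrl)
{
	assert(ctrl != NULL);

	/* A reset is already in progress, so do not queue another one. */
	if (ctrl->state == MOCK_CTRL_RESETTING)
		return false;

	/* Trust the AER report: mark the controller and queue the reset. */
	ctrl->state = MOCK_CTRL_RESETTING;
	ctrl->resets_queued++;
	return true;
}
```

Calling mock_handle_persistent_error() twice on a controller that starts
in MOCK_CTRL_LIVE queues one reset; the second call sees
MOCK_CTRL_RESETTING and returns false. A real implementation would also
need the state check and transition to be atomic with respect to
concurrent callers, which this sketch does not model.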