From mboxrd@z Thu Jan 1 00:00:00 1970
From: kbusch@kernel.org (Keith Busch)
Date: Wed, 24 Apr 2019 14:44:17 -0600
Subject: [PATCH v2 0/2] Adding per-controller timeout support to nvme
In-Reply-To: <983e5d039dce9de1d32c71d28fd59bbc01c3fee5.camel@infradead.org>
References: <20190403123506.122904-1-mheyne@amazon.de> <20190424200706.GB15412@localhost.localdomain> <983e5d039dce9de1d32c71d28fd59bbc01c3fee5.camel@infradead.org>
Message-ID: <20190424204417.GC15412@localhost.localdomain>

On Wed, Apr 24, 2019 at 10:30:08PM +0200, David Woodhouse wrote:
> It isn't that the media is slow; the max timeout is based on the SLA
> for certain classes of "fabric" outages. Linux copes *really* badly
> with I/O errors, and if we can make the timeout last long enough to
> cover the switch restart worst case, then users are a lot happier.

Gotcha. So the default timeout is sufficient under normal operation,
but temporary intermittent outages may exceed it.

It'd be a real disappointment if the command timed out with an error
anyway after waiting for the extended time, but we ought to cover the
worst-case time for a successful completion.