From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1B8732F4333 for ; Fri, 13 Jun 2025 04:56:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749790619; cv=none; b=J13MBhM7lGwHBKZeTcSL+ZaNHNvdzmCz0tpdJ0qkeeAuiuTjh4XJXr6PwwGl9TH5m2EeLWjeKihuCuvM4dfaC6A0ujxLaO3TsZOddULc/LGDABoFa01UC1brZ8e3rfjm6WdFPGBkbt3Y7ZOzak9DzlIhjrwiz59ZQxmBv7/b1q4= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749790619; c=relaxed/simple; bh=gi7/BSDVNb7O2bJcvH7LRMv3hJdUoqoKrn7QiBdXi3M=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=UxSYwLmVksVg6TigTslLwysJCAZW2mfJtQxMlARMqw1VRKhIoYvg6Mduji5b06/pGKBa2Gwy1289VjUaGLhzg3/A43NZUak2sneFCmhnb8FsfD1579NGyrLk68cSINbP0ASHU4lHNNyVWL1ECWkqWQ8bq5+253xudBb4dKPNat8= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=ZdkL6JRk; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="ZdkL6JRk" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1749790615; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=07mu1H2Fz6+UQ8LWJqAXuVs0XQYIbPBJGblMDKiU/bE=; b=ZdkL6JRkC2VbfDm2HWA/NTpVTcJBk7e1hWcWh7s6mbzU6wctZk9p75OgmXevvckLenxIiE GC3BE4PYK3xdv+bLjQbhRCKHYmd+B+oN6N+aqDeuhGdv4BqN21e4M6D1BmAGejwX65v7kg sjgaE6tDm2n2Qu3x/9S9x5pBA9fonE0= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-193-QZcifxZZPGqh1SdGkHd1Rg-1; Fri, 13 Jun 2025 00:56:54 -0400 X-MC-Unique: QZcifxZZPGqh1SdGkHd1Rg-1 X-Mimecast-MFC-AGG-ID: QZcifxZZPGqh1SdGkHd1Rg_1749790612 Received: from mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.15]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 6A80519560AD; Fri, 13 Jun 2025 04:56:51 +0000 (UTC) Received: from fedora (unknown [10.72.116.73]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id C8B56195E340; Fri, 13 Jun 2025 04:56:45 +0000 (UTC) Date: Fri, 13 Jun 2025 12:56:40 +0800 From: Ming Lei To: Shinichiro Kawasaki Cc: Sagi Grimberg , "linux-nvme@lists.infradead.org" , linux-block , Hannes Reinecke , Keith Busch , Jens Axboe , "hch@infradead.org" Subject: Re: [PATCH 1/2] nvme-tcp: avoid race between nvme scan and reset Message-ID: References: <20250602043522.55787-1-shinichiro.kawasaki@wdc.com> <20250602043522.55787-2-shinichiro.kawasaki@wdc.com> <910b31ba-1982-4365-961e-435f5e7611b2@grimberg.me> <86e241dd-9065-4cf0-9c35-8b7502ab2d8a@grimberg.me> <6pt5u3fg3qts4jekun5ory5lr2jtfbibd76phqviheulpjqjtq@m3arkh44nrs2> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <6pt5u3fg3qts4jekun5ory5lr2jtfbibd76phqviheulpjqjtq@m3arkh44nrs2> X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 On Wed, Jun 04, 2025 at 11:17:05AM +0000, Shinichiro Kawasaki wrote: > Cc+: Ming, > > Hi Sagi, thanks for the background explanation and the suggestion. > > On Jun 04, 2025 / 10:10, Sagi Grimberg wrote: > ... > > > This is a problem. We are flushing a work that is IO bound, which can > > > take a long time to complete. > > > Up until now, we have deliberately avoided introducing dependency > > > between reset forward progress > > > and scan work IO to completely finish. > > > > > > I would like to keep it this way. > > > > > > BTW, this is not TCP specific. > > I see. The blktests test case nvme/063 is dedicated to tcp transport, so that's > why it was reported for the TCP transport. > > > > > blk_mq_unquiesce_queue is still very much safe to call as many times as we > > want. > > The only thing that comes in the way is this pesky WARN. How about we make > > it go away and have > > a debug print instead? > > > > My preference would be to allow nvme to unquiesce queues that were not > > previously quiesced (just > > like it historically was) instead of having to block a controller reset > > until the scan_work is completed (which > > is admin I/O dependent, and may get stuck until admin timeout, which can be > > changed by the user for 60 > > minutes or something arbitrarily long btw). > > > > How about something like this patch instead: > > -- > > diff --git a/block/blk-mq.c b/block/blk-mq.c > > index c2697db59109..74f3ad16e812 100644 > > --- a/block/blk-mq.c > > +++ b/block/blk-mq.c > > @@ -327,8 +327,10 @@ void blk_mq_unquiesce_queue(struct request_queue *q) > >         bool run_queue = false; > > > >         spin_lock_irqsave(&q->queue_lock, flags); > > -       if (WARN_ON_ONCE(q->quiesce_depth <= 0)) { > > -               ; > > +       if (q->quiesce_depth <= 0) { > > +               printk(KERN_DEBUG > > +                       "dev %s: unquiescing a non-quiesced queue, > > expected?\n", > > +                       q->disk ? q->disk->disk_name : "?", ); > >         } else if (!--q->quiesce_depth) { > >                 blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q); > >                 run_queue = true; > > -- > > The WARN was introduced with the commit e70feb8b3e68 ("blk-mq: support > concurrent queue quiesce/unquiesce") that Ming authored. Ming, may I > ask your comment on the suggestion by Sagi? I think it is bad to use one standard block layer API in this unbalanced way, that is why WARN_ON_ONCE() is added. We shouldn't encourage driver to use APIs in this way. Question is why nvme have to unquiesce one non-quiesced queue? Thanks, Ming