From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 2 Nov 2021 18:48:30 +0800
From: Ming Lei <ming.lei@redhat.com>
To: Shinichiro Kawasaki
Cc: Jens Axboe, linux-block@vger.kernel.org, Damien Le Moal,
	linux-nvme@lists.infradead.org, Keith Busch, Christoph Hellwig,
	ming.lei@redhat.com
Subject: Re: [bug report] block/005 hangs with NVMe device and
	linux-block/for-next
References: <20211101083417.fcttizyxpahrcgov@shindev>
	<30d7ccec-c798-3936-67bd-e66ae59c318b@kernel.dk>
	<20211102022214.7hetxsg4z2yqafyd@shindev>
	<20211102082846.m632phnsaqnwtaec@shindev>
	<20211102090246.5own2pqinv3lw6qg@shindev>
In-Reply-To: <20211102090246.5own2pqinv3lw6qg@shindev>

On Tue, Nov 02, 2021 at 09:02:47AM +0000, Shinichiro Kawasaki wrote:
> Let me add linux-nvme, Keith and Christoph to the CC list.
> 
> -- 
> Best Regards,
> Shin'ichiro Kawasaki
> 
> On Nov 02, 2021 / 17:28, Shin'ichiro Kawasaki wrote:
> > On Nov 02, 2021 / 11:44, Ming Lei wrote:
> > > On Tue, Nov 02, 2021 at 02:22:15AM +0000, Shinichiro Kawasaki wrote:
> > > > On Nov 01, 2021 / 17:01, Jens Axboe wrote:
> > > > > On 11/1/21 6:41 AM, Jens Axboe wrote:
> > > > > > On 11/1/21 2:34 AM, Shinichiro Kawasaki wrote:
> > > > > >> I tried the latest linux-block/for-next branch tip (git hash
> > > > > >> b43fadb6631f) and observed a process hang during a blktests
> > > > > >> block/005 run on an NVMe device. The kernel message reported
> > > > > >> "INFO: task check:1224 blocked for more than 122 seconds." with a
> > > > > >> call trace [1]. So far, the hang is 100% reproducible on my
> > > > > >> system. It is not observed with HDDs or null_blk devices.
> > > > > >>
> > > > > >> I bisected and found that commit 4f5022453acd ("nvme: wire up
> > > > > >> completion batching for the IRQ path") triggers the hang. When I
> > > > > >> revert this commit from the for-next branch tip, the hang
> > > > > >> disappears. The block/005 test case switches the IO scheduler
> > > > > >> during IO, and the completion-path change made by the commit
> > > > > >> appears to affect the scheduler switch. Comments toward a
> > > > > >> solution would be appreciated.
> > > > > >
> > > > > > I'll take a look at this.
> > > > >
> > > > > I've tried running various things most of the day, and I cannot
> > > > > reproduce this issue, nor do I see what it could be. Even if
> > > > > requests are split between batched completion and one-by-one
> > > > > completion, it works just fine for me. No special care needs to be
> > > > > taken for put_many() on the queue reference, as the wake_up()
> > > > > happens for the ref going to zero.
> > > > >
> > > > > Tell me more about your setup. What do the runtimes of the test
> > > > > look like? Do you have all schedulers enabled? What kind of NVMe
> > > > > device is this?
> > > >
> > > > Thank you for spending your precious time. With a kernel that does
> > > > not hang, the test case completes in around 20 seconds. When the hang
> > > > happens, the check script process stops at blk_mq_freeze_queue_wait()
> > > > during the scheduler change, and the fio workload processes stop at
> > > > __blkdev_direct_IO_simple(). The test case does not end, so I need to
> > > > reboot the system for the next trial. While waiting for the test case
> > > > to complete, the kernel repeats the same INFO message every 2 minutes.
> > > >
> > > > Regarding the scheduler, I compiled the kernel with mq-deadline and
> > > > kyber.
> > > >
> > > > The NVMe device I use is a U.2 NVMe ZNS SSD. It has a zoned namespace
> > > > and a regular namespace, and the hang is observed with both
> > > > namespaces. I have not yet tried other NVMe devices, so I will try
> > > > them.
> > > >
> > > > > FWIW, this is upstream now, so testing with Linus -git would be
> > > > > preferable.
> > > >
> > > > I see. I have switched from the linux-block for-next branch to Linus'
> > > > upstream branch. At git hash 879dbe9ffebc, the hang is still observed.
> > >
> > > Can you post the blk-mq debugfs log after the hang is triggered?
> > >
> > > (cd /sys/kernel/debug/block/nvme0n1 && find . -type f -exec grep -aH . {} \;)
> >
> > Thanks Ming. When I ran the command above, the grep command stopped when
> > it opened tag-related files in the debugfs tree, so that grep command
> > looked to be hanging as well. I used the find command below instead, to
> > exclude the tag-related files.
> >
> > # find . -type f -not -name *tag* -exec grep -aH . {} \;
> >
> > Here I share the captured log.

It is a bit odd, since batched completion shouldn't be triggered when an io
scheduler is in use, but blk_mq_end_request_batch() does not restart the
hctx, so maybe you can try the following patch:

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 07eb1412760b..4c0c9af9235e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -846,16 +846,20 @@ void blk_mq_end_request_batch(struct io_comp_batch *iob)
 		rq_qos_done(rq->q, rq);
 
 		if (nr_tags == TAG_COMP_BATCH || cur_hctx != rq->mq_hctx) {
-			if (cur_hctx)
+			if (cur_hctx) {
 				blk_mq_flush_tag_batch(cur_hctx, tags, nr_tags);
+				blk_mq_sched_restart(cur_hctx);
+			}
 			nr_tags = 0;
 			cur_hctx = rq->mq_hctx;
 		}
 		tags[nr_tags++] = rq->tag;
 	}
 
-	if (nr_tags)
+	if (nr_tags) {
 		blk_mq_flush_tag_batch(cur_hctx, tags, nr_tags);
+		blk_mq_sched_restart(cur_hctx);
+	}
 }
 EXPORT_SYMBOL_GPL(blk_mq_end_request_batch);


-- 
Ming
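
For anyone trying to reproduce this outside blktests: the essence of
block/005 is keeping direct IO in flight while repeatedly switching the IO
scheduler, since each switch freezes the queue and waits in
blk_mq_freeze_queue_wait() for in-flight requests to drain. A rough sketch of
that pattern follows; the device node, scheduler list, and fio parameters are
illustrative assumptions, not the exact blktests job:

#!/bin/sh
# Rough reproducer sketch: run direct-IO reads while cycling the IO
# scheduler. Each write to the scheduler attribute freezes the queue, which
# must wait for every outstanding request to complete. Assumes /dev/nvme0n1
# exists, root privileges, and mq-deadline/kyber built into the kernel.
dev=nvme0n1

fio --name=hangtest --filename=/dev/$dev --direct=1 --rw=randread \
    --bs=4k --iodepth=32 --numjobs=4 --time_based --runtime=30 &
fio_pid=$!

while kill -0 "$fio_pid" 2>/dev/null; do
	for sched in none mq-deadline kyber; do
		echo "$sched" > /sys/block/$dev/queue/scheduler
		sleep 1
	done
done
wait "$fio_pid"

On an affected kernel, the scheduler write is where the hang shows up: the
writing process sits in blk_mq_freeze_queue_wait() and the fio processes sit
in the direct-IO submission path, matching the report above.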
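
Since the full debugfs grep stalled on the tag bitmap files, a narrower probe
may be enough to sanity-check the theory: if a hctx keeps SCHED_RESTART set in
its state file while the queue stays frozen, that hctx is waiting for a
restart that the batched completion path never issued. The file names below
assume the standard blk-mq debugfs layout:

# Inspect queue and per-hctx state for the hung device without touching the
# tag bitmap files (which blocked the earlier full grep). A hctx stuck with
# SCHED_RESTART set points at a missing blk_mq_sched_restart() call.
cd /sys/kernel/debug/block/nvme0n1 || exit 1
grep -aH . state
for d in hctx*/; do
	grep -aH . "${d}state" "${d}dispatch_busy" 2>/dev/null
done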