Date: Thu, 18 Aug 2022 11:24:41 +0800
From: Ming Lei
To: Chris Murphy
Cc: Nikolay Borisov, Jens Axboe, Jan Kara, Paolo Valente, Btrfs BTRFS, Linux-RAID, linux-block, linux-kernel, Josef Bacik
Subject: Re: stalling IO regression since linux 5.12, through 5.18
References: <2b8a38fa-f15f-45e8-8caa-61c5f8cd52de@www.fastmail.com> <35f0d608-7448-4276-8922-19a23d8f9049@www.fastmail.com> <568465de-5c3b-4d94-a74b-5b83ce2f942f@www.fastmail.com>
In-Reply-To: <568465de-5c3b-4d94-a74b-5b83ce2f942f@www.fastmail.com>
X-Mailing-List: linux-block@vger.kernel.org

On Wed, Aug 17, 2022 at 10:30:39PM -0400, Chris Murphy wrote:
>
>
> On Wed, Aug 17, 2022, at 9:03 PM, Ming Lei wrote:
> > On Wed, Aug 17, 2022 at 12:34:42PM -0400, Chris Murphy wrote:
> >>
> >>
> >> On Wed, Aug 17, 2022, at 11:34 AM, Ming Lei wrote:
> >>
> >> > From the 2nd log of blockdebugfs-all.txt, still not see any in-flight IO on
> >> > request based block devices, but sda is _not_ included in this log, and
> >> > only sdi, sdg and sdf are collected, is that expected?
> >>
> >> While the problem was happening I did
> >>
> >> cd /sys/kernel/debug/block
> >> find . -type f -exec grep -aH . {} \;
> >>
> >> The file has the nodes out of order, but I don't know enough about the interface to see if there are things that are missing, or what it means.
> >>
> >>
> >> > BTW, all request based block devices should be observed in blk-mq debugfs.
> >>
> >> /sys/kernel/debug/block contains
> >>
> >> drwxr-xr-x. 2 root root 0 Aug 17 15:20 md0
> >> drwxr-xr-x. 51 root root 0 Aug 17 15:20 sda
> >> drwxr-xr-x. 51 root root 0 Aug 17 15:20 sdb
> >> drwxr-xr-x. 51 root root 0 Aug 17 15:20 sdc
> >> drwxr-xr-x. 51 root root 0 Aug 17 15:20 sdd
> >> drwxr-xr-x. 51 root root 0 Aug 17 15:20 sde
> >> drwxr-xr-x. 51 root root 0 Aug 17 15:20 sdf
> >> drwxr-xr-x. 51 root root 0 Aug 17 15:20 sdg
> >> drwxr-xr-x. 51 root root 0 Aug 17 15:20 sdh
> >> drwxr-xr-x. 4 root root 0 Aug 17 15:20 sdi
> >> drwxr-xr-x. 2 root root 0 Aug 17 15:20 zram0
> >
> > OK, so lots of devices are missed in your log, and the following command
> > is supposed to work for collecting log from all block device's debugfs:
> >
> > (cd /sys/kernel/debug/block/ && find . -type f -exec grep -aH . {} \;)
>
> OK here it is:
>
> https://drive.google.com/file/d/18nEOx2Ghsqx8uII6nzWpCFuYENHuQd-f/view?usp=sharing

The above log shows that the IO stall happens on sdd, where:

1) 616 requests are pending in the scheduler queue

	grep "busy=" blockdebugfs-all2.txt | grep sdd | grep sched | awk -F "=" '{s+=$2} END {print s}'
	616

2) 11 requests have been pending in ./sdd/hctx2/dispatch for more than 300 seconds

Recently we have seldom observed IO hangs from the dispatch list, except for the
following two:

https://lore.kernel.org/linux-block/20220803023355.3687360-1-yuyufen@huaweicloud.com/
https://lore.kernel.org/linux-block/20220726122224.1790882-1-yukuai1@huaweicloud.com/

BTW, what is the output of the following command?

	(cd /sys/block/sdd/device && find . -type f -exec grep -aH . {} \;)

Also, the above log shows that host_tagset_enable support is still crippled on
v5.12. I guess the issue may not be triggered (or will be pretty hard to
trigger) after you update to d97e594c5166 ("blk-mq: Use request queue-wide tags
for tagset-wide sbitmap"), or v5.14.

thanks,
Ming
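
P.S. The scheduler-queue count in 1) can also be taken in a single awk pass, which makes it easy to repeat for other disks. This is only a minimal sketch: the function name sum_sched_busy is made up for illustration, and it assumes the log lines have the './sdd/hctxN/sched_tags:busy=N' shape that 'grep -aH' produces when run from /sys/kernel/debug/block.

```shell
# Sum the "busy=" counters of the scheduler tag files for one disk.
# Usage: sum_sched_busy <disk> <logfile>
# sum_sched_busy is a hypothetical helper name, not an existing tool.
sum_sched_busy() {
    awk -F '=' -v dev="$1" '
        # Keep only lines that belong to this disk, come from a
        # scheduler file, and carry a busy= counter.
        index($0, "/" dev "/") && /sched/ && /busy=/ { s += $2 }
        # "s + 0" prints 0 instead of an empty line when nothing matched.
        END { print s + 0 }
    ' "$2"
}
```

For example, 'sum_sched_busy sdd blockdebugfs-all2.txt' should give the same number as the grep pipeline above. Matching on "/sdd/" rather than "sdd" avoids counting lines that merely contain the disk name somewhere else.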