From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 13 Dec 2021 15:38:07 +0800
From: Ming Lei
To: Dexuan Cui
Cc: Jens Axboe, 'Christoph Hellwig', "'linux-block@vger.kernel.org'",
    Long Li, "Michael Kelley (LINUX)", "'linux-kernel@vger.kernel.org'"
Subject: Re: Random high CPU utilization in blk-mq with the none scheduler
X-Mailing-List: linux-block@vger.kernel.org

On Mon, Dec 13, 2021 at 04:20:49AM +0000, Dexuan Cui wrote:
> > From: Ming Lei
> > ...
> > Can you provide the following blk-mq debugfs log?
> >
> > (cd /sys/kernel/debug/block/dm-N && find . -type f -exec grep -aH . {} \;)
> >
> > (cd /sys/kernel/debug/block/sdN && find . -type f -exec grep -aH . {} \;)
> >
> > It is enough to collect the log from just one dm-mpath device and one
> > underlying iscsi disk, so we can understand the basic blk-mq settings,
> > such as nr_hw_queues, queue depths, ...
> >
> > Thanks,
> > Ming
>
> Attached. I collected the logs for all the dm-* and sd* devices against
> v5.16-rc4 with these 3 commits reverted:
>   b22809092c70 ("block: replace always false argument with 'false'")
>   ff1552232b36 ("blk-mq: don't issue request directly in case that current is to be blocked")
>   dc5fc361d891 ("block: attempt direct issue of plug list")
>
> v5.16-rc4 does not reproduce the issue, so I'm pretty sure b22809092c70 is
> the patch that resolves the excessive CPU usage.
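For reference, the debugfs walk quoted above can be wrapped in a small helper
so the same collection can be repeated per device (just a sketch; the
function name is mine, and the device name is whatever appears under
/sys/kernel/debug/block on your machine):

```shell
#!/bin/sh
# Dump the whole blk-mq debugfs state for one block device, e.g. dm-2 or sdc.
# Assumes debugfs is mounted at /sys/kernel/debug (usually needs root).
collect_blkmq_debugfs() {
    dir="/sys/kernel/debug/block/$1"
    if [ ! -d "$dir" ]; then
        echo "collect_blkmq_debugfs: no debugfs directory for '$1'" >&2
        return 1
    fi
    # grep -aH prints "file:content" for every attribute (-a treats any
    # binary-looking attribute as text), which is easy to attach to a mail.
    (cd "$dir" && find . -type f -exec grep -aH . {} \;)
}

# Usage (as root):
#   collect_blkmq_debugfs dm-2 > dm-2.blkmq.log
```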
>From the log:

1) dm-mpath:
- queue depth: 2048
- busy: 848, and 62 of them are in the sw queue, so run queue is triggered
  often
- nr_hw_queues: 1
- dm-2 is in use, and dm-1/dm-3 are idle
- dm-2's dispatch busy is 8, which should be the reason why excessive CPU
  usage is observed when flushing the plug list without commit dc5fc361d891,
  in which hctx->dispatch_busy is simply bypassed

2) iscsi:
- dispatch_busy is 0
- nr_hw_queues: 1
- queue depth: 113
- busy: ~33, and active_queues is 3, so each LUN/iscsi host is saturated
- 23 active LUNs, 23 * 33 = 759 in-flight commands

The high CPU utilization may be caused by:

1) the big queue depth of dm-mpath; the situation may improve a lot if it is
reduced to 1024 or 800. The max number of in-flight commands the iscsi hosts
allow can be figured out; if dm's queue depth is much bigger than that
number, the extra commands have to wait for dispatch, and the run-queue work
gets scheduled immediately, so high CPU utilization is caused.

2) the single hw queue, so contention should be big, which should be avoided
on a big machine; nvme-tcp might be better than iscsi here

3) iscsi io latency being a bit high

Even though CPU utilization is reduced by commit dc5fc361d891, I guess io
performance can't be good either with v5.16-rc.

Thanks,
Ming
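P.S. For point 1), the dm queue depth can be lowered at runtime through the
queue's nr_requests sysfs attribute; below is only a sketch (the helper name
is mine, and dm-2 / 800 are taken from the numbers above, so adjust for your
setup):

```shell
#!/bin/sh
# Lower nr_requests (the blk-mq queue depth seen by submitters) for one
# block device. 800 is close to the ~759 in-flight commands the 23 LUNs at
# per-LUN depth 33 can actually take, so less time is burned re-running the
# queue for requests the iscsi hosts cannot accept anyway.
set_queue_depth() {
    dev="$1"
    depth="$2"
    attr="/sys/block/$dev/queue/nr_requests"
    if [ ! -w "$attr" ]; then
        echo "set_queue_depth: cannot write $attr (need root and an existing device)" >&2
        return 1
    fi
    echo "$depth" > "$attr"
}

# Usage (as root):
#   set_queue_depth dm-2 800
```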