Date: Wed, 30 Jun 2021 16:42:43 +0800
From: Ming Lei
To: Hannes Reinecke
Cc: Jens Axboe, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, Christoph Hellwig, Sagi Grimberg, Daniel Wagner, Wen Xiong, John Garry
Subject: Re: [PATCH 0/2] blk-mq: fix blk_mq_alloc_request_hctx
References: <20210629074951.1981284-1-ming.lei@redhat.com> <5f304121-38ce-034b-2d17-93d136c77fe6@suse.de>
In-Reply-To: <5f304121-38ce-034b-2d17-93d136c77fe6@suse.de>
X-Mailing-List: linux-block@vger.kernel.org

On Wed, Jun 30, 2021 at 10:18:37AM +0200, Hannes Reinecke wrote:
> On 6/29/21 9:49 AM, Ming Lei wrote:
> > Hi,
> >
> > blk_mq_alloc_request_hctx() is used by NVMe fc/rdma/tcp/loop to connect
> > io queue. Also the sw ctx is chosen as the 1st online cpu in hctx->cpumask.
> > However, all cpus in hctx->cpumask may be offline.
> >
> > This usage model isn't well supported by blk-mq which supposes allocator is
> > always done on one online CPU in hctx->cpumask. This assumption is
> > related with managed irq, which also requires blk-mq to drain inflight
> > request in this hctx when the last cpu in hctx->cpumask is going to
> > offline.
> >
> > However, NVMe fc/rdma/tcp/loop don't use managed irq, so we should allow
> > them to ask for request allocation when the specified hctx is inactive
> > (all cpus in hctx->cpumask are offline).
> >
> > Fix blk_mq_alloc_request_hctx() by adding/passing flag of
> > BLK_MQ_F_NOT_USE_MANAGED_IRQ.
> >
> >
> > Ming Lei (2):
> >   blk-mq: not deactivate hctx if the device doesn't use managed irq
> >   nvme: pass BLK_MQ_F_NOT_USE_MANAGED_IRQ for fc/rdma/tcp/loop
> >
> >  block/blk-mq.c             | 6 +++++-
> >  drivers/nvme/host/fc.c     | 3 ++-
> >  drivers/nvme/host/rdma.c   | 3 ++-
> >  drivers/nvme/host/tcp.c    | 3 ++-
> >  drivers/nvme/target/loop.c | 3 ++-
> >  include/linux/blk-mq.h     | 1 +
> >  6 files changed, 14 insertions(+), 5 deletions(-)
> >
> > Cc: Sagi Grimberg
> > Cc: Daniel Wagner
> > Cc: Wen Xiong
> > Cc: John Garry
> >
>
> I have my misgivings about this patchset.
> To my understanding, only CPUs present in the hctx cpumask are eligible to
> submit I/O to that hctx.

That is only true for managed irq, and even then the CPUs must also be
online. There is no such constraint for non-managed irq, since the irq
may migrate to other online CPUs if all CPUs in its current affinity
mask go offline.

> Consequently if all cpus in that mask are offline, where is the point of
> even transmitting a 'connect' request?

nvmef requires the connect request to be submitted via one specified
hctx whose index has to match the io queue's index. Almost all nvmef
drivers fail to set up the controller if connecting an io queue fails.

Also, CPUs can go offline and come back online, and this is done in
lots of sanity tests. So we should allow allocation of the connect
request to succeed and submit it to the driver, given that this is
allowed for non-managed irq.

> Shouldn't we rather modify the tagset to only refer to the current online
> CPUs _only_, thereby never submit a connect request for hctx with only
> offline CPUs?

Then you may set up very few io queues, and performance may suffer even
after lots of CPUs come online later.

Thanks,
Ming