From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:41963 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752672AbcHLR4H (ORCPT ); Fri, 12 Aug 2016 13:56:07 -0400 Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.11/8.16.0.11) with SMTP id u7CHnkiC061272 for ; Fri, 12 Aug 2016 13:56:06 -0400 Received: from e24smtp02.br.ibm.com (e24smtp02.br.ibm.com [32.104.18.86]) by mx0a-001b2d01.pphosted.com with ESMTP id 24ryvsqjfh-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Fri, 12 Aug 2016 13:56:06 -0400 Received: from localhost by e24smtp02.br.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Fri, 12 Aug 2016 14:56:03 -0300 From: Gabriel Krisman Bertazi To: Jens Axboe Cc: Brian King , linux-block@vger.kernel.org, linux-scsi@vger.kernel.org Subject: Re: [RFC PATCH] blk-mq: Prevent round-robin from scheduling dead cpus References: <1470154771-22774-1-git-send-email-krisman@linux.vnet.ibm.com> Date: Fri, 12 Aug 2016 14:56:01 -0300 In-Reply-To: <1470154771-22774-1-git-send-email-krisman@linux.vnet.ibm.com> (Gabriel Krisman Bertazi's message of "Tue, 2 Aug 2016 13:19:31 -0300") MIME-Version: 1.0 Content-Type: text/plain Message-Id: <874m6pswge.fsf@linux.vnet.ibm.com> Sender: linux-block-owner@vger.kernel.org List-Id: linux-block@vger.kernel.org Gabriel Krisman Bertazi writes: > Hi, > > I'm not completely sure I got the cause for this one completely right. > Still, it does looks like the correct fix and a good improvement in the > overall, so I'm making it an RFC for now to gather some feedback. > > Let me hear your thoughts. ping > > -- >8 -- > > When notifying blk-mq about CPU removals while running IO, we risk > racing the hctx->cpumask update with blk_mq_hctx_next_cpu, and end up > scheduling a dead cpu to execute hctx->run_{,delayed_}work. As a > result, kblockd_schedule_delayed_work_on() may schedule another cpu > outside of hctx->cpumask, which triggers the following warning at > __blk_mq_run_hw_queue: > > WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask)); > > This patch makes the issue much more unlikely to happen, as it makes > blk_mq_hctx_next_cpu aware of dead cpus, and triggers the round-robin > code, despite of remaining batch processing time. Thus, in case we > offline a cpu in the middle of its batch processing time, we no longer > waste time scheduling it here, and just move through to the next cpu in > the mask. > > The warning may still be triggered, though, since this is not the only > case that may cause the queue to schedule on a dead cpu. But this fixes > the common case, which is the remaining batch processing time of a > sudden dead cpu, which makes the issue much more unlikely to happen. > > Signed-off-by: Gabriel Krisman Bertazi > Cc: Brian King > Cc: linux-block@vger.kernel.org > Cc: linux-scsi@vger.kernel.org > --- > block/blk-mq.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index c27bb37..a2cb64c 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -858,7 +858,8 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx) > if (hctx->queue->nr_hw_queues == 1) > return WORK_CPU_UNBOUND; > > - if (--hctx->next_cpu_batch <= 0) { > + if (--hctx->next_cpu_batch <= 0 || > + !cpumask_test_cpu(hctx->next_cpu, cpu_online_mask)) { > int cpu = hctx->next_cpu, next_cpu; > > next_cpu = cpumask_next(hctx->next_cpu, hctx->cpumask); > @@ -868,7 +869,8 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx) > hctx->next_cpu = next_cpu; > hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH; > > - return cpu; > + return (cpumask_test_cpu(cpu, cpu_online_mask)) ? > + cpu : blk_mq_hctx_next_cpu(hctx); > } > > return hctx->next_cpu; -- Gabriel Krisman Bertazi