From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753111AbaIJSOi (ORCPT );
	Wed, 10 Sep 2014 14:14:38 -0400
Received: from mail-pa0-f47.google.com ([209.85.220.47]:52301 "EHLO
	mail-pa0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753054AbaIJSOg (ORCPT );
	Wed, 10 Sep 2014 14:14:36 -0400
Message-ID: <54109514.4090901@kernel.dk>
Date: Wed, 10 Sep 2014 12:14:44 -0600
From: Jens Axboe
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0
MIME-Version: 1.0
To: Robert Elliott , elliott@hp.com, hch@lst.de, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] block: default to rq_affinity=2 for blk-mq
References: <20140910001417.9294.40414.stgit@beardog.cce.hp.com> <20140910001801.9294.79720.stgit@beardog.cce.hp.com>
In-Reply-To: <20140910001801.9294.79720.stgit@beardog.cce.hp.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On 09/09/2014 06:18 PM, Robert Elliott wrote:
> From: Robert Elliott
>
> One change introduced by blk-mq is that it does all
> the completion work in hard irq context rather than
> soft irq context.
>
> On a 6 core system, if all interrupts are routed to
> one CPU, then you can easily run into this:
> * 5 CPUs submitting IOs
> * 1 CPU spending 100% of its time in hard irq context
>   processing IO completions, not able to submit anything
>   itself
>
> Example with CPU5 receiving all interrupts:
> CPU usage:     CPU0   CPU1   CPU2   CPU3   CPU4   CPU5
> %usr:          0.00   3.03   1.01   2.02   2.00   0.00
> %sys:         14.58  75.76  14.14   4.04  78.00   0.00
> %irq:          0.00   0.00   0.00   1.01   0.00 100.00
> %soft:         0.00   0.00   0.00   0.00   0.00   0.00
> %iowait idle: 85.42  21.21  84.85  92.93  20.00   0.00
> %idle:         0.00   0.00   0.00   0.00   0.00   0.00
>
> When the submitting CPUs are forced to process their own
> completion interrupts, this steals time from new
> submissions and self-throttles them.
>
> Without that, there is no direct feedback to the
> submitters to slow down. The only feedback is:
> * reaching max queue depth
> * lots of timeouts, resulting in aborts, resets, soft
>   lockups and self-detected stalls on CPU5, bogus
>   clocksource tsc unstable reports, network
>   drop-offs, etc.
>
> The SCSI LLD can set affinity_hint for each of its
> interrupts to request that a program like irqbalance
> route the interrupts back to the submitting CPU.
> The latest version of irqbalance ignores those hints,
> though, instead offering an option to run a policy
> script that could honor them. Otherwise, it balances
> them based on its own algorithms. So, we cannot rely
> on this.
>
> Hardware might perform interrupt coalescing to help,
> but it cannot help 1 CPU keep up with the work
> generated by many other CPUs.
>
> rq_affinity=2 helps by pushing most of the block layer
> and SCSI midlayer completion work back to the submitting
> CPU (via an IPI).
>
> Change the default rq_affinity=2 under blk-mq
> so there's at least some feedback to slow down the
> submitters.

I don't think we should do this generically. For "sane" devices with
multiple completion queues, and with proper affinity setting in the
driver, this is going to be a loss.
So let's not add it to QUEUE_FLAG_MQ_DEFAULT, but we can make it the
default for nr_hw_queues == 1. I think that would be way saner.

-- 
Jens Axboe
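[For readers following along: the rq_affinity setting under discussion is also tunable at runtime per block device via sysfs, independent of the compiled-in default. The sketch below is a small illustrative helper, not part of the patch; the `set_rq_affinity` function name and the commented device name are made up for the example, and it assumes the standard /sys/block/<dev>/queue/rq_affinity attribute with values 0 (no steering), 1 (complete on the submitting CPU's group), and 2 (force completion on the exact submitting CPU).]

```shell
# Illustrative helper (hypothetical, not from the patch): set rq_affinity
# for a block device, rejecting anything other than the documented
# values 0, 1, or 2 before touching sysfs.
set_rq_affinity() {
    dev="$1"
    val="$2"
    case "$val" in
        0|1|2) ;;
        *) echo "invalid rq_affinity value: $val" >&2; return 1 ;;
    esac
    # Requires root; path per the block queue sysfs interface.
    echo "$val" > "/sys/block/$dev/queue/rq_affinity"
}

# Example (commented out; needs root and a real device):
# set_rq_affinity sda 2   # force completions back onto the submitting CPU
```

Setting the value to 2 by hand is how one would get the submit-side throttling behavior the patch proposes as a default, without rebuilding the kernel.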