From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-qt0-f182.google.com ([209.85.216.182]:33776 "EHLO mail-qt0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752126AbdHHN6L (ORCPT); Tue, 8 Aug 2017 09:58:11 -0400
Received: by mail-qt0-f182.google.com with SMTP id a18so20174572qta.0 for; Tue, 08 Aug 2017 06:58:10 -0700 (PDT)
Subject: Re: [PATCH V2 00/20] blk-mq-sched: improve SCSI-MQ performance
To: Ming Lei
Cc: Jens Axboe, linux-block@vger.kernel.org, Christoph Hellwig, Bart Van Assche
References: <20170805065705.12989-1-ming.lei@redhat.com> <20170808134110.GA22763@ming.t460p>
From: Laurence Oberman
Message-ID:
Date: Tue, 8 Aug 2017 09:58:07 -0400
MIME-Version: 1.0
In-Reply-To: <20170808134110.GA22763@ming.t460p>
Content-Type: text/plain; charset=utf-8; format=flowed
Sender: linux-block-owner@vger.kernel.org
List-Id: linux-block@vger.kernel.org

On 08/08/2017 09:41 AM, Ming Lei wrote:
> Hi Laurence and Guys,
>
> On Mon, Aug 07, 2017 at 06:06:11PM -0400, Laurence Oberman wrote:
>> On Mon, Aug 7, 2017 at 8:48 AM, Laurence Oberman
>> wrote:
>> Hello
>>
>> I need to retract my Tested-by:
>>
>> While it's valid that the patches do not introduce performance
>> regressions, they seem to cause a hard lockup when the [mq-deadline]
>> scheduler is enabled, so I am not confident with a passing result here.
>>
>> This is specific to large buffered I/O writes (4MB); at least that is
>> my current test.
>>
>> I did not wait long enough for the issue to show when I first sent the
>> pass (Tested-by:) message. Because I know my test platform so well, I
>> thought I had given it enough time to validate the patches for
>> performance regressions.
>>
>> I don't know whether the failing clone in blk_get_request() is a direct
>> catalyst for the hard lockup, but what I do know is that with stock
>> upstream 4.13-rc3 I only see the clone failures when the scheduler is
>> set to [none], and stock upstream never seems to hit the hard lockup.
>>
>> With [mq-deadline] enabled on stock I don't see them at all, and it
>> behaves.
>>
>> Now with Ming's patches, if we enable [mq-deadline] we DO see the clone
>> failures and the hard lockup, so we have the opposite behaviour with
>> the scheduler choice, plus the hard lockup.
>>
>> On Ming's kernel with [none] we are well behaved, and that was my
>> original focus: testing with [none], hence my Tested-by: pass.
>>
>> So more investigation is needed here.
>
> Laurence, as we talked on IRC, the hard lockup issue you saw isn't
> related to this patchset, because the issue can be reproduced on
> both v4.13-rc3 and RHEL 7. The only trick is to run your hammer
> write script concurrently in 16 jobs; then it takes only several
> minutes to trigger, no matter whether the mq [none] or [mq-deadline]
> scheduler is used.
>
> Given that it is easy to reproduce, it shouldn't be very
> difficult to investigate and root-cause.
>
> I will report the issue in another thread and attach the
> script for reproduction.
>
> So let's focus on this patchset ([PATCH V2 00/20] blk-mq-sched: improve
> SCSI-MQ performance) in this thread.
>
> Thanks again for your test!
>
> Thanks,
> Ming
>

Hello Ming,

Yes, I agree. This means my original Tested-by: for your patch set is
still valid for the large-size I/O tests.

Thank you for all this hard work on improving blk-mq.

Regards,
Laurence
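
[Editorial note: the actual hammer write script was not attached to this
message. The shape of the reproduction Ming describes, 16 concurrent jobs
each doing large (4MB) buffered writes, might look roughly like the sketch
below. TARGET_DIR and the file names are hypothetical; the original test
targeted block devices behind SCSI-MQ, which writing to a filesystem path
does not faithfully reproduce.]

```shell
#!/bin/sh
# Hedged sketch of the 16-job concurrent buffered-write workload.
# Paths and sizes here are illustrative, not the original script.
TARGET_DIR=${TARGET_DIR:-/tmp/hammer-test}
mkdir -p "$TARGET_DIR"

for i in $(seq 1 16); do
    # Each background job issues buffered 4MB writes (no O_DIRECT),
    # mirroring the "large buffered I/O writes (4MB)" case above.
    dd if=/dev/zero of="$TARGET_DIR/job$i.dat" bs=4M count=4 2>/dev/null &
done

# Wait for all 16 jobs to finish before inspecting results.
wait
ls -l "$TARGET_DIR"
```

The original report triggered the lockup only after several minutes of
sustained load, so a single pass like this would normally be wrapped in an
outer loop and left running.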