From: Oleksandr Natalenko
To: Ming Lei
Cc: Jens Axboe, linux-block@vger.kernel.org, Christoph Hellwig, linux-scsi@vger.kernel.org, "Martin K . Petersen", "James E . J . Bottomley", Bart Van Assche, Johannes Thumshirn, Cathy Avery, Martin Steigerwald, linux-kernel@vger.kernel.org, Hannes Reinecke
Subject: Re: [PATCH V8 0/8] block/scsi: safe SCSI quiescing
Date: Tue, 03 Oct 2017 20:27:56 +0200
Message-ID: <2586333.Nyo4hBpWP3@natalenko.name>
In-Reply-To: <20171003140406.26060-1-ming.lei@redhat.com>
References: <20171003140406.26060-1-ming.lei@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Also Tested-by: Oleksandr Natalenko for the whole v8.

On Tuesday, 3 October 2017 16:03:58 CEST Ming Lei wrote:
> Hi Jens,
>
> Please consider this patchset for V4.15. It fixes a long-standing
> kind of I/O hang in both the legacy block path and blk-mq.
>
> The current SCSI quiesce isn't safe and can easily trigger an I/O
> deadlock.
>
> Once a SCSI device is put into QUIESCE, no new request except for
> RQF_PREEMPT can be dispatched to SCSI successfully, and
> scsi_device_quiesce() simply waits for completion of the I/Os
> already dispatched to the SCSI stack. That isn't enough.
>
> New requests can still keep coming, but none of the allocated
> requests can be dispatched, so the request pool can easily be
> used up.
>
> Then a request with RQF_PREEMPT can't be allocated and waits
> forever, and the system hangs forever, for example during system
> suspend or while sending SCSI domain validation in the case of
> transport_spi.
>
> Both the I/O hang inside system suspend[1] and the one during SCSI
> domain validation have been reported before.
>
> This patch introduces a preempt-only mode and solves the issue
> by allowing only RQF_PREEMPT requests during SCSI quiesce.
>
> Both SCSI and SCSI_MQ have this I/O deadlock issue; this patch
> fixes both.
>
> V8:
> - fix one race as pointed out by Bart
> - pass 'op' to blk_queue_enter() as suggested by Christoph
>
> V7:
> - add Reviewed-by & Tested-by
> - one-line change in patch 5 for checking preempt request
>
> V6:
> - borrow Bart's idea of preempt only, with a clean
>   implementation (patch 5/patch 6)
> - no longer depends on any external driver changes, such as MD's
>
> V5:
> - fix one tiny race by introducing blk_queue_enter_preempt_freeze();
>   given this change is small enough compared with V4, I added
>   Tested-by directly
>
> V4:
> - reorganize patch order to make it more reasonable
> - support nested preempt freeze, as required by SCSI transport spi
> - check preempt freezing in slow path of blk_queue_enter()
> - add "SCSI: transport_spi: resume a quiesced device"
> - wake up freeze queue in setting dying for both blk-mq and legacy
> - rename blk_mq_[freeze|unfreeze]_queue() in one patch
> - rename .mq_freeze_wq and .mq_freeze_depth
> - improve comment
>
> V3:
> - introduce q->preempt_unfreezing to fix one bug of preempt freeze
> - call blk_queue_enter_live() only when queue is preempt frozen
> - clean up the implementation of preempt freeze a bit
> - only patches 6 and 7 are changed
>
> V2:
> - drop the 1st patch in V1 because percpu_ref_is_dying() is
>   enough, as pointed out by Tejun
> - introduce preempt version of blk_[freeze|unfreeze]_queue
> - sync between preempt freeze and normal freeze
> - fix warning from percpu-refcount as reported by Oleksandr
>
>
> [1] https://marc.info/?t=150340250100013&r=3&w=2
>
>
> Thanks,
> Ming
>
> Bart Van Assche (1):
>       block: Convert RQF_PREEMPT into REQ_PREEMPT
>
> Ming Lei (7):
>       blk-mq: only run hw queues for blk-mq
>       block: tracking request allocation with q_usage_counter
>       block: pass 'op' to blk_queue_enter()
>       percpu-refcount: introduce __percpu_ref_tryget_live
>       blk-mq: return if queue is frozen via current blk_freeze_queue_start
>       block: support PREEMPT_ONLY
>       SCSI: set block queue at preempt only when SCSI device is
>         put into quiesce
>
>  block/blk-core.c                | 66 +++++++++++++++++++++++++++++++++++++----
>  block/blk-mq-debugfs.c          |  2 +-
>  block/blk-mq.c                  | 26 ++++++++--------
>  block/blk-mq.h                  |  1 -
>  block/blk-timeout.c             |  2 +-
>  block/blk.h                     |  2 +-
>  drivers/ide/ide-atapi.c         |  3 +-
>  drivers/ide/ide-io.c            |  2 +-
>  drivers/ide/ide-pm.c            |  4 +--
>  drivers/scsi/scsi_lib.c         | 31 +++++++++++++++----
>  fs/block_dev.c                  |  4 +--
>  include/linux/blk-mq.h          |  4 +--
>  include/linux/blk_types.h       |  6 ++++
>  include/linux/blkdev.h          | 10 ++++---
>  include/linux/percpu-refcount.h | 27 ++++++++++-------
>  15 files changed, 137 insertions(+), 53 deletions(-)