From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.0 required=3.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B81C2C04EB8 for ; Wed, 5 Dec 2018 02:30:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 70B5F20645 for ; Wed, 5 Dec 2018 02:30:29 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20150623.gappssmtp.com header.i=@kernel-dk.20150623.gappssmtp.com header.b="gHyeRDJz" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 70B5F20645 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-block-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725841AbeLECa3 (ORCPT ); Tue, 4 Dec 2018 21:30:29 -0500 Received: from mail-pl1-f196.google.com ([209.85.214.196]:39457 "EHLO mail-pl1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725834AbeLECa2 (ORCPT ); Tue, 4 Dec 2018 21:30:28 -0500 Received: by mail-pl1-f196.google.com with SMTP id 101so9277866pld.6 for ; Tue, 04 Dec 2018 18:30:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=ijMQ6PfdBdcJ5IRW3QIGcxEoQ3fzgfItltRULwFKlcI=; b=gHyeRDJzCiA1JZaoUMQpfPlUvNN5iwdMw3o4M1IicLj5ycqA6yQ434AywIgocagpqb YjvTiJ7+1NJA5XzpiZcLm33u9OlNV0Vh8fARMOZ/SSTZpb5xAGYhtJOsW/aVNhUvX/+y rMJcYC28cFZq1TREoMqpU/cCU0raWYx9kaqxh9aG1tPacU3bNDJu27VBlHnSVnKG22HQ mx959yJA4Ks4LDDknmtr87QIF6gsxx0ubsZw1hcjnUgNMdxPhTEYIJYEZ81Uf1BhOA+d PIHRwmHsAodA8D8ORPnprpNINaW8p76bHKAR0qgjDiWwm43vWYL1wrBkG1GCLRge8NwX kDsQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=ijMQ6PfdBdcJ5IRW3QIGcxEoQ3fzgfItltRULwFKlcI=; b=DRMzq5/N7DQHnA8FgWkV0+JgEHCn7J3kAovHfEontprCmttFlu4P4iu1uSeXLnNv9N 1Xd6y3eWV1AR7WbeSBuO2wwFrshc7Yf6oefjSOAZwrUvGzXv2lms3yunj4+NIO0QJ1HS OqwX6ByCFQ7bER9QxQzLRNsLp9ukUBAPD3QYeFyK+rU5o+uKUd9iRb/p4z8FdmZ4V/90 roaasFPh/SQ0II5i1hwv+yz5XJZffZ3rjHGs6nogiAdr9kxfV9qnOpTVAYRGYqX+ZIjg TYF5wkxyXdRhHKjC/MNl4PG+C58bz8ZOepa9KhUayeNtwAUqaDBalWs8dYFW5mN9GmA9 G9sQ== X-Gm-Message-State: AA+aEWZwjuPvUI+EPZg7/ttKGl/R9EnmeOm7DF7f2dqLxOQgn7f7pwYc wakmF02zFQW74+oTNEotV0EdlBjWQ7I= X-Google-Smtp-Source: AFSGD/Xi4dbRqAEsyBgEp9IEfuF3L/aq4L5tL+NgMshnJO566jY1o5wL+z0ncRLzIZNgKcj9mp0uxg== X-Received: by 2002:a17:902:112c:: with SMTP id d41mr877780pla.144.1543977027322; Tue, 04 Dec 2018 18:30:27 -0800 (PST) Received: from [192.168.1.121] (66.29.188.166.static.utbb.net. [66.29.188.166]) by smtp.gmail.com with ESMTPSA id p2sm20566336pgc.94.2018.12.04.18.30.25 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 04 Dec 2018 18:30:26 -0800 (PST) Subject: Re: [PATCH] blk-mq: fix corruption with direct issue To: Ming Lei Cc: "linux-block@vger.kernel.org" References: <1d359819-5410-7af2-d02b-f0ecca39d2c9@kernel.dk> <20181205013736.GD17845@ming.t460p> <37bf8821-c205-717a-df0d-96ecfb0f75aa@kernel.dk> <20181205022716.GE17845@ming.t460p> From: Jens Axboe Message-ID: <227a40a3-6599-9fc0-ab58-674f063e9c3a@kernel.dk> Date: Tue, 4 Dec 2018 19:30:24 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1 MIME-Version: 1.0 In-Reply-To: <20181205022716.GE17845@ming.t460p> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org On 12/4/18 7:27 PM, Ming Lei wrote: > On Tue, Dec 04, 2018 at 07:16:11PM -0700, Jens Axboe wrote: >> On 12/4/18 6:37 PM, Ming Lei wrote: >>> On Tue, Dec 04, 2018 at 03:47:46PM -0700, Jens Axboe wrote: >>>> If we attempt a direct issue to a SCSI device, and it returns BUSY, then >>>> we queue the request up normally. However, the SCSI layer may have >>>> already setup SG tables etc for this particular command. If we later >>>> merge with this request, then the old tables are no longer valid. Once >>>> we issue the IO, we only read/write the original part of the request, >>>> not the new state of it. >>>> >>>> This causes data corruption, and is most often noticed with the file >>>> system complaining about the just read data being invalid: >>>> >>>> [ 235.934465] EXT4-fs error (device sda1): ext4_iget:4831: inode #7142: comm dpkg-query: bad extra_isize 24937 (inode size 256) >>>> >>>> because most of it is garbage... >>>> >>>> This doesn't happen from the normal issue path, as we will simply defer >>>> the request to the hardware queue dispatch list if we fail. Once it's on >>>> the dispatch list, we never merge with it. >>>> >>>> Fix this from the direct issue path by flagging the request as >>>> REQ_NOMERGE so we don't change the size of it before issue. >>>> >>>> See also: >>>> https://bugzilla.kernel.org/show_bug.cgi?id=201685 >>>> >>>> Fixes: 6ce3dd6eec1 ("blk-mq: issue directly if hw queue isn't busy in case of 'none'") >>>> Signed-off-by: Jens Axboe >>>> >>>> --- >>>> >>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>> index 3f91c6e5b17a..d8f518c6ea38 100644 >>>> --- a/block/blk-mq.c >>>> +++ b/block/blk-mq.c >>>> @@ -1715,6 +1715,15 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx, >>>> break; >>>> case BLK_STS_RESOURCE: >>>> case BLK_STS_DEV_RESOURCE: >>>> + /* >>>> + * If direct dispatch fails, we cannot allow any merging on >>>> + * this IO. Drivers (like SCSI) may have set up permanent state >>>> + * for this request, like SG tables and mappings, and if we >>>> + * merge to it later on then we'll still only do IO to the >>>> + * original part. >>>> + */ >>>> + rq->cmd_flags |= REQ_NOMERGE; >>>> + >>>> blk_mq_update_dispatch_busy(hctx, true); >>>> __blk_mq_requeue_request(rq); >>>> break; >>>> >>> >>> Not sure it is enough to just mark it as NOMERGE, for example, driver >>> may have setup the .special_vec for discard, and NOMERGE may not prevent >>> request from entering elevator queue completely. Cause 'rq.rb_node' and >>> 'rq.special_vec' share same space. >> >> We should rather limit the scope of the direct dispatch instead. It >> doesn't make sense to do for anything but read/write anyway. > > discard is kind of write, and it isn't treated very specially in make > request path, except for multi-range discard. The point of direct dispatch is to reduce latencies for requests, discards are so damn slow on ALL devices anyway that it doesn't make any sense to try direct dispatch to begin with, regardless of whether it possible or not. >>> So how about inserting this request via blk_mq_request_bypass_insert() >>> in case that direct issue returns BUSY? Then it is invariant that >>> any request queued via .queue_rq() won't enter scheduler queue. >> >> I did consider this, but I didn't want to experiment with exercising >> a new path for an important bug fix. You do realize that your original >> patch has been corrupting data for months? I think a little caution >> is in order here. > > But marking NOMERGE still may have a hole on re-insert discard request as > mentioned above. What I said was further limit the scope of direct dispatch, which means not allowing anything that isn't a read/write. > Given we never allow to re-insert queued request to scheduler queue > except for 6ce3dd6eec1, I think it is the correct thing to do, and the > fix is simple too. As I said, it's not the time to experiment. This issue has been there since 4.19-rc1. The alternative is yanking both those patches, and then looking at it later when the direct issue path has been cleaned up first. -- Jens Axboe