Subject: Re: [PATCH 0/5] blk-mq: fix use-after-free on stale request
To: Ming Lei, Bart Van Assche
CC: Jens Axboe, Hannes Reinecke, Christoph Hellwig
From: John Garry
Date: Wed, 26 Aug 2020 13:03:37 +0100
References: <20200820180335.3109216-1-ming.lei@redhat.com> <20200821024949.GA3110165@T590>
In-Reply-To: <20200821024949.GA3110165@T590>
X-Mailing-List: linux-block@vger.kernel.org

On 21/08/2020 03:49, Ming Lei wrote:
> Hello Bart,
>
> On Thu, Aug 20, 2020 at 01:30:38PM -0700, Bart Van Assche wrote:
>> On 8/20/20 11:03 AM, Ming Lei wrote:
>>> We can't run allocating a driver tag and updating tags->rqs[tag] atomically,
>>> so a stale request may be retrieved from tags->rqs[tag]. More seriously, the
>>> stale request may have been freed via updating nr_requests or switching
>>> the elevator, or in other use cases.
>>>
>>> It is a long-term issue, and Jianchao previously worked towards using
>>> static_rqs[] for iterating requests; one problem is that it can be hard
>>> to use when iterating over the tagset.
>>>
>>> This patchset takes a different approach to fixing the issue: cache
>>> freed rq pages and only release them once all tags->rqs[] references to
>>> these pages are gone.
>>
>> Hi Ming,
>>
>> Is this the only possible solution? Would it e.g. be possible to protect the
>> code that iterates over all tags with rcu_read_lock() / rcu_read_unlock() and
>> to free pages that contain request pointers only after an RCU grace period has
>> expired?
>
> That can't work: tags->rqs[] is host-wide, while the request pool belongs to
> the scheduler tags and is actually owned by the request queue. When the
> elevator is switched on a request queue, or its nr_requests is updated, the
> old request pool of that queue is freed, but IOs are still being queued from
> other request queues in the same tagset. An elevator switch or nr_requests
> update on one request queue shouldn't, and can't, affect the other request
> queues in the same tagset.
>
> Meanwhile the references in tags->rqs[] may linger for a while, and RCU can't
> cover this case.
>
> Also we can't simply reset the related tags->rqs[tag] somewhere, since that
> may race with new driver tag allocation.

How about iterating over all tags->rqs[] for all scheduler tags when exiting
the scheduler, etc., and clearing any scheduler request references, like this:

cmpxchg(&hctx->tags->rqs[tag], scheduler_rq, 0);

So we atomically NULLify any tags->rqs[] entries which contain a scheduler
request of concern, cleaning up any stale references. I quickly tried it and
it seems to work, but maybe it's not so elegant.

> Or some atomic update is required,
> but obviously extra load is introduced in the fast path.

Yes, something similar was said about this patch:

https://lore.kernel.org/linux-block/cf524178-c497-373c-37f6-abee13eacf19@kernel.dk/

>
>> Would that perhaps result in a simpler solution?
>
> No, that doesn't work actually.
>
> This patchset looks complicated, but the idea is very simple. With this
> approach, we can extend it to support allocating the request pool attached
> to driver tags dynamically. So far, that pool is always pre-allocated, and
> it is never used for normal single-queue disks.
>

I'll continue to check this solution, but it seems to me that we should not
get as far as the rq->q == hctx->queue check in bt_iter().

Thanks,
John