From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ob0-f172.google.com ([209.85.214.172]:35692 "EHLO mail-ob0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751324AbbLDSCk (ORCPT ); Fri, 4 Dec 2015 13:02:40 -0500 Received: by obbnk6 with SMTP id nk6so78687027obb.2 for ; Fri, 04 Dec 2015 10:02:39 -0800 (PST) Subject: Re: [PATCH] block: detach bdev inode from its wb in __blkdev_put() To: Ilya Dryomov References: <1448054554-24138-1-git-send-email-idryomov@gmail.com> <20151130072029.GA10738@linux.vnet.ibm.com> <5661D0F3.3080603@kernel.dk> Cc: Alexander Viro , Tejun Heo , Christoph Hellwig , linux-fsdevel@vger.kernel.org, Raghavendra K T From: Jens Axboe Message-ID: <5661D53D.2020809@kernel.dk> Date: Fri, 4 Dec 2015 11:02:37 -0700 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On 12/04/2015 10:55 AM, Ilya Dryomov wrote: > On Fri, Dec 4, 2015 at 6:44 PM, Jens Axboe wrote: >> On 12/04/2015 07:07 AM, Ilya Dryomov wrote: >>> >>> On Mon, Nov 30, 2015 at 11:54 AM, Ilya Dryomov wrote: >>>> >>>> On Mon, Nov 30, 2015 at 8:20 AM, Raghavendra K T >>>> wrote: >>>>> >>>>> * Ilya Dryomov [2015-11-20 22:22:34]: >>>>> >>>>>> Since 52ebea749aae ("writeback: make backing_dev_info host >>>>>> cgroup-specific bdi_writebacks") inode, at some point in its lifetime, >>>>>> gets attached to a wb (struct bdi_writeback). Detaching happens on >>>>>> evict, in inode_detach_wb() called from __destroy_inode(), and involves >>>>>> updating wb. >>>>>> >>>>>> However, detaching an internal bdev inode from its wb in >>>>>> __destroy_inode() is too late. Its bdi and by extension root wb are >>>>>> embedded into struct request_queue, which has different lifetime rules >>>>>> and can be freed long before the final bdput() is called (can be from >>>>>> __fput() of a corresponding /dev inode, through dput() - evict() - >>>>>> bd_forget(). bdevs hold onto the underlying disk/queue pair only while >>>>>> opened; as soon as bdev is closed all bets are off. In fact, >>>>>> disk/queue can be gone before __blkdev_put() even returns: >>>>>> >>>>>> 1499 static void __blkdev_put(struct block_device *bdev, fmode_t mode, >>>>>> int for_part) >>>>>> 1500 { >>>>>> ... >>>>>> 1518 if (bdev->bd_contains == bdev) { >>>>>> 1519 if (disk->fops->release) >>>>>> 1520 disk->fops->release(disk, mode); >>>>>> >>>>>> [ Driver puts its references to disk/queue ] >>>>>> >>>>>> 1521 } >>>>>> 1522 if (!bdev->bd_openers) { >>>>>> 1523 struct module *owner = disk->fops->owner; >>>>>> 1524 >>>>>> 1525 disk_put_part(bdev->bd_part); >>>>>> 1526 bdev->bd_part = NULL; >>>>>> 1527 bdev->bd_disk = NULL; >>>>>> 1528 if (bdev != bdev->bd_contains) >>>>>> 1529 victim = bdev->bd_contains; >>>>>> 1530 bdev->bd_contains = NULL; >>>>>> 1531 >>>>>> 1532 put_disk(disk); >>>>>> >>>>>> [ We put ours, the queue is gone >>>>>> The last bdput() would result in a write to invalid memory ] >>>>>> >>>>>> 1533 module_put(owner); >>>>>> ... >>>>>> 1539 } >>>>>> >>>>>> Since bdev inodes are special anyway, detach them in __blkdev_put() >>>>>> after clearing inode's dirty bits, turning the problematic >>>>>> inode_detach_wb() in __destroy_inode() into a noop. >>>>>> >>>>>> add_disk() grabs its disk->queue since 523e1d399ce0 ("block: make >>>>>> gendisk hold a reference to its queue"), so the old ->release comment >>>>>> is removed in favor of the new inode_detach_wb() comment. >>>>>> >>>>>> Cc: stable@vger.kernel.org # 4.2+, needs backporting >>>>>> Signed-off-by: Ilya Dryomov >>>>>> --- >>>>> >>>>> Feel free to add >>>>> Tested-by: Raghavendra K T >>>>> >>>>> I was facing bad memory access problem while creating thousands of >>>>> containers. >>>>> With this patch I am able to create more than 10k containers without >>>>> hitting >>>>> the problem. >>>>> I had reported the problem here: https://lkml.org/lkml/2015/11/19/149 >>>> >>>> >>>> Great! Christoph's concern is with ->i_wb as a whole, not this >>>> particular patch - Al, this one is marked for stable, can we get it >>>> merged into -rc4? Or should it go through Jens' tree, as cgroup >>>> writeback patches did? >>> >>> >>> Ping? >> >> >> Was holding off, but I think we should get the simple fix in for 4.4. I've >> applied it for this series. > > I think you missed Raghavendra's Tested-by (I'm looking at > https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git/commit/?h=for-linus&id=f2148d49930497c66d38b13f137c30d78c60da3d). I did, fixed. -- Jens Axboe