From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from e38.co.us.ibm.com ([32.97.110.159]:41511 "EHLO e38.co.us.ibm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753093AbbK3HP2
	(ORCPT ); Mon, 30 Nov 2015 02:15:28 -0500
Received: from localhost
	by e38.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted
	for from ; Mon, 30 Nov 2015 00:15:28 -0700
Received: from b03cxnp08027.gho.boulder.ibm.com (b03cxnp08027.gho.boulder.ibm.com [9.17.130.19])
	by d03dlp02.boulder.ibm.com (Postfix) with ESMTP id 1212E3E40044
	for ; Mon, 30 Nov 2015 00:15:25 -0700 (MST)
Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169])
	by b03cxnp08027.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id tAU7FPks29753440
	for ; Mon, 30 Nov 2015 00:15:25 -0700
Received: from d03av03.boulder.ibm.com (localhost [127.0.0.1])
	by d03av03.boulder.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id tAU7FORY013876
	for ; Mon, 30 Nov 2015 00:15:24 -0700
Date: Mon, 30 Nov 2015 12:50:29 +0530
From: Raghavendra K T
To: Ilya Dryomov
Cc: Alexander Viro , Tejun Heo , Christoph Hellwig , linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] block: detach bdev inode from its wb in __blkdev_put()
Message-ID: <20151130072029.GA10738@linux.vnet.ibm.com>
Reply-To: Raghavendra K T
References: <1448054554-24138-1-git-send-email-idryomov@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
In-Reply-To: <1448054554-24138-1-git-send-email-idryomov@gmail.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

* Ilya Dryomov [2015-11-20 22:22:34]:

> Since 52ebea749aae ("writeback: make backing_dev_info host
> cgroup-specific bdi_writebacks") inode, at some point in its lifetime,
> gets attached to a wb (struct bdi_writeback). Detaching happens on
> evict, in inode_detach_wb() called from __destroy_inode(), and involves
> updating wb.
>
> However, detaching an internal bdev inode from its wb in
> __destroy_inode() is too late. Its bdi and by extension root wb are
> embedded into struct request_queue, which has different lifetime rules
> and can be freed long before the final bdput() is called (can be from
> __fput() of a corresponding /dev inode, through dput() - evict() -
> bd_forget()). bdevs hold onto the underlying disk/queue pair only while
> opened; as soon as bdev is closed all bets are off. In fact,
> disk/queue can be gone before __blkdev_put() even returns:
>
> 1499 static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
> 1500 {
> ...
> 1518         if (bdev->bd_contains == bdev) {
> 1519                 if (disk->fops->release)
> 1520                         disk->fops->release(disk, mode);
>
> [ Driver puts its references to disk/queue ]
>
> 1521         }
> 1522         if (!bdev->bd_openers) {
> 1523                 struct module *owner = disk->fops->owner;
> 1524
> 1525                 disk_put_part(bdev->bd_part);
> 1526                 bdev->bd_part = NULL;
> 1527                 bdev->bd_disk = NULL;
> 1528                 if (bdev != bdev->bd_contains)
> 1529                         victim = bdev->bd_contains;
> 1530                 bdev->bd_contains = NULL;
> 1531
> 1532                 put_disk(disk);
>
> [ We put ours, the queue is gone
>   The last bdput() would result in a write to invalid memory ]
>
> 1533                 module_put(owner);
> ...
> 1539         }
>
> Since bdev inodes are special anyway, detach them in __blkdev_put()
> after clearing inode's dirty bits, turning the problematic
> inode_detach_wb() in __destroy_inode() into a noop.
>
> add_disk() grabs its disk->queue since 523e1d399ce0 ("block: make
> gendisk hold a reference to its queue"), so the old ->release comment
> is removed in favor of the new inode_detach_wb() comment.
>
> Cc: stable@vger.kernel.org # 4.2+, needs backporting
> Signed-off-by: Ilya Dryomov
> ---

Feel free to add
Tested-by: Raghavendra K T

I was hitting a bad memory access problem while creating thousands of
containers. With this patch I am able to create more than 10k containers
without hitting the problem.
I had reported the problem here: https://lkml.org/lkml/2015/11/19/149
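
To make the ordering problem concrete, here is a minimal userspace model
in C. It is only a sketch under simplifying assumptions: struct wb_model,
struct queue_model, struct inode_model and the model_* helpers are
hypothetical stand-ins for struct bdi_writeback, struct request_queue,
the bdev inode, inode_detach_wb(), __blkdev_put() and __destroy_inode().
The real kernel code also clears the inode's dirty bits and does
reference counting on the disk/queue, all of which is left out here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bdi_writeback. */
struct wb_model {
	int nr_inodes;		/* written on every attach and detach */
};

/* Hypothetical stand-in for struct request_queue: the root wb is
 * embedded, so freeing the queue frees the wb along with it. */
struct queue_model {
	struct wb_model root_wb;
};

/* Hypothetical stand-in for the internal bdev inode. */
struct inode_model {
	struct wb_model *wb;	/* NULL once detached */
};

void model_attach_wb(struct inode_model *inode, struct wb_model *wb)
{
	assert(wb != NULL);
	inode->wb = wb;
	wb->nr_inodes++;
}

/* Analogue of inode_detach_wb(): it writes to the wb, so it is only safe
 * while the wb (and the queue embedding it) is still allocated. A second
 * call finds inode->wb == NULL and does nothing. */
void model_detach_wb(struct inode_model *inode)
{
	if (!inode->wb)
		return;
	inode->wb->nr_inodes--;
	inode->wb = NULL;
}

/* Analogue of the last __blkdev_put() with the patch applied: detach
 * first, then let the final reference to the queue go. Returns the
 * wb's inode count as seen just before the queue is freed. */
int model_blkdev_put(struct inode_model *inode, struct queue_model *q)
{
	model_detach_wb(inode);
	int remaining = q->root_wb.nr_inodes;
	free(q);		/* queue and embedded root wb are gone */
	return remaining;
}

/* Analogue of __destroy_inode() on the final bdput(): the detach here is
 * now a no-op, so nothing touches the freed queue. */
void model_destroy_inode(struct inode_model *inode)
{
	model_detach_wb(inode);
}
```

Without the early model_detach_wb() call in model_blkdev_put(), the
detach in model_destroy_inode() would decrement nr_inodes inside memory
that free() has already released, which is the kind of write to invalid
memory the patch describes.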