All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Ilya Dryomov <idryomov@gmail.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>, Christoph Hellwig <hch@lst.de>,
	linux-fsdevel@vger.kernel.org,
	Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Subject: Re: [PATCH] block: detach bdev inode from its wb in __blkdev_put()
Date: Fri, 4 Dec 2015 10:44:19 -0700	[thread overview]
Message-ID: <5661D0F3.3080603@kernel.dk> (raw)
In-Reply-To: <CAOi1vP8cj2qxjGf3a8qOmP=kYriU-5bx2YriKLbqSj_jm7uPeg@mail.gmail.com>

On 12/04/2015 07:07 AM, Ilya Dryomov wrote:
> On Mon, Nov 30, 2015 at 11:54 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
>> On Mon, Nov 30, 2015 at 8:20 AM, Raghavendra K T
>> <raghavendra.kt@linux.vnet.ibm.com> wrote:
>>> * Ilya Dryomov <idryomov@gmail.com> [2015-11-20 22:22:34]:
>>>
>>>> Since 52ebea749aae ("writeback: make backing_dev_info host
>>>> cgroup-specific bdi_writebacks") inode, at some point in its lifetime,
>>>> gets attached to a wb (struct bdi_writeback).  Detaching happens on
>>>> evict, in inode_detach_wb() called from __destroy_inode(), and involves
>>>> updating wb.
>>>>
>>>> However, detaching an internal bdev inode from its wb in
>>>> __destroy_inode() is too late.  Its bdi and by extension root wb are
>>>> embedded into struct request_queue, which has different lifetime rules
>>>> and can be freed long before the final bdput() is called (can be from
>>>> __fput() of a corresponding /dev inode, through dput() - evict() -
>>>> bd_forget().  bdevs hold onto the underlying disk/queue pair only while
>>>> opened; as soon as bdev is closed all bets are off.  In fact,
>>>> disk/queue can be gone before __blkdev_put() even returns:
>>>>
>>>> 1499 static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
>>>> 1500 {
>>>> ...
>>>> 1518         if (bdev->bd_contains == bdev) {
>>>> 1519                 if (disk->fops->release)
>>>> 1520                         disk->fops->release(disk, mode);
>>>>
>>>> [ Driver puts its references to disk/queue ]
>>>>
>>>> 1521         }
>>>> 1522         if (!bdev->bd_openers) {
>>>> 1523                 struct module *owner = disk->fops->owner;
>>>> 1524
>>>> 1525                 disk_put_part(bdev->bd_part);
>>>> 1526                 bdev->bd_part = NULL;
>>>> 1527                 bdev->bd_disk = NULL;
>>>> 1528                 if (bdev != bdev->bd_contains)
>>>> 1529                         victim = bdev->bd_contains;
>>>> 1530                 bdev->bd_contains = NULL;
>>>> 1531
>>>> 1532                 put_disk(disk);
>>>>
>>>> [ We put ours, the queue is gone
>>>>    The last bdput() would result in a write to invalid memory ]
>>>>
>>>> 1533                 module_put(owner);
>>>> ...
>>>> 1539 }
>>>>
>>>> Since bdev inodes are special anyway, detach them in __blkdev_put()
>>>> after clearing inode's dirty bits, turning the problematic
>>>> inode_detach_wb() in __destroy_inode() into a noop.
>>>>
>>>> add_disk() grabs its disk->queue since 523e1d399ce0 ("block: make
>>>> gendisk hold a reference to its queue"), so the old ->release comment
>>>> is removed in favor of the new inode_detach_wb() comment.
>>>>
>>>> Cc: stable@vger.kernel.org # 4.2+, needs backporting
>>>> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
>>>> ---
>>> Feel free to add
>>> Tested-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
>>>
>>> I was facing bad memory access problem while creating thousands of containers.
>>> With this patch I am able to create more than 10k containers without hitting
>>> the problem.
>>> I had reported the problem here: https://lkml.org/lkml/2015/11/19/149
>>
>> Great!  Christoph's concern is with ->i_wb as a whole, not this
>> particular patch - Al, this one is marked for stable, can we get it
>> merged into -rc4?  Or should it go through Jens' tree, as cgroup
>> writeback patches did?
>
> Ping?

Was holding off, but I think we should get the simple fix in for 4.4. 
I've applied it for this series.

-- 
Jens Axboe


  reply	other threads:[~2015-12-04 17:44 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-20 21:22 [PATCH] block: detach bdev inode from its wb in __blkdev_put() Ilya Dryomov
2015-11-20 21:23 ` Tejun Heo
2015-11-22 15:02 ` Christoph Hellwig
2015-11-23 15:35   ` Tejun Heo
2015-11-24  7:54     ` Christoph Hellwig
2015-11-24 14:00       ` Tejun Heo
2015-11-30  7:20 ` Raghavendra K T
2015-11-30 10:54   ` Ilya Dryomov
2015-12-04 14:07     ` Ilya Dryomov
2015-12-04 17:44       ` Jens Axboe [this message]
2015-12-04 17:55         ` Ilya Dryomov
2015-12-04 18:02           ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5661D0F3.3080603@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=idryomov@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=raghavendra.kt@linux.vnet.ibm.com \
    --cc=tj@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.