linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v2 0/2] block: fix backing_dev_info lifetime
@ 2017-01-07  1:02 Dan Williams
  2017-01-07  1:02 ` [RFC PATCH v2 1/2] block: fix lifetime of request_queue / backing_dev_info relative to bdev Dan Williams
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Dan Williams @ 2017-01-07  1:02 UTC (permalink / raw)
  To: linux-block
  Cc: Andi Kleen, Jan Kara, Rabin Vincent, linux-nvdimm, linux-kernel,
	Jens Axboe, Jeff Moyer, Wei Fang, linux-fsdevel,
	Christoph Hellwig

v1 of these changes [1] was a one line change to bdev_get_queue() to
prevent a shutdown crash when del_gendisk() races the final
__blkdev_put().

While it is known at del_gendisk() time that the queue is still alive,
Jan Kara points to other paths [2] that are racing __blkdev_put() where
the assumption that ->bd_queue, or inode->i_wb is valid does not hold.

Fix that broken assumption, make it the case that if you have a live
block_device, or block_device-inode that the corresponding queue and
inode-write-back data is still valid.

These changes survive a run of the libnvdimm unit test suite which puts
some stress on the block_device shutdown path.

---

Changes since v1 [1]:

* Introduce "block: fix lifetime of request_queue / backing_dev_info
  relative to bdev" to keep the queue allocated and the inode attached
  for writeback until ->destroy_inode() time.

* Rework the comments in "block: fix blk_get_backing_dev_info() crash,
  use bdev->bd_queue" to reflect the assumptions about the liveness of
  ->bd_queue.

[1]: http://marc.info/?l=linux-block&m=148366637105761&w=2
[2]: http://www.spinics.net/lists/linux-fsdevel/msg105153.html

---

Dan Williams (2):
      block: fix lifetime of request_queue / backing_dev_info relative to bdev
      block: fix blk_get_backing_dev_info() crash, use bdev->bd_queue


 block/blk-core.c       |    7 ++++---
 fs/block_dev.c         |   25 +++++++++++++++----------
 include/linux/blkdev.h |    6 +++++-
 include/linux/fs.h     |    1 +
 4 files changed, 25 insertions(+), 14 deletions(-)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [RFC PATCH v2 1/2] block: fix lifetime of request_queue / backing_dev_info relative to bdev
  2017-01-07  1:02 [RFC PATCH v2 0/2] block: fix backing_dev_info lifetime Dan Williams
@ 2017-01-07  1:02 ` Dan Williams
  2017-01-07  1:03 ` [RFC PATCH v2 2/2] block: fix blk_get_backing_dev_info() crash, use bdev->bd_queue Dan Williams
  2017-01-23 21:17 ` [RFC PATCH v2 0/2] block: fix backing_dev_info lifetime Thiago Jung Bauermann
  2 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2017-01-07  1:02 UTC (permalink / raw)
  To: linux-block
  Cc: Jan Kara, linux-nvdimm, linux-kernel, Jens Axboe, Jeff Moyer,
	linux-fsdevel, Christoph Hellwig

By definition the lifetime of a struct block_device is equal to the
lifetime of its corresponding inode since they both live in struct
bdev_inode. Up until the inode is destroyed it may be the target of
delayed write-back requests. Issuing write-back to a block_device
requires a lookup of the bdev backing_dev_info that is embedded in the
request_queue.

It follows that a struct request_queue must be alive as long as a
corresponding block device inode is the active target of requests.

Match the lifetimes of backing_dev_info and its usage in
inode-write-back by arranging for the bdev to take a reference against
its request_queue when the bdev is instantiated. Release the
request_queue reference when the inode is freed.

Cc: Jens Axboe <axboe@fb.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 block/blk-core.c   |    3 ++-
 fs/block_dev.c     |   25 +++++++++++++++----------
 include/linux/fs.h |    1 +
 3 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 14d7c0740dc0..e8713137b846 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -378,7 +378,8 @@ EXPORT_SYMBOL(blk_run_queue);
 
 void blk_put_queue(struct request_queue *q)
 {
-	kobject_put(&q->kobj);
+	if (q)
+		kobject_put(&q->kobj);
 }
 EXPORT_SYMBOL(blk_put_queue);
 
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 899fa8ccc347..8f94b74bfc7d 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -589,12 +589,22 @@ static struct inode *bdev_alloc_inode(struct super_block *sb)
 	return &ei->vfs_inode;
 }
 
+static void bdev_release(struct work_struct *w)
+{
+	struct block_device *bdev = container_of(w, typeof(*bdev), bd_release);
+	struct bdev_inode *bdi = container_of(bdev, typeof(*bdi), bdev);
+
+	blk_put_queue(bdev->bd_queue);
+	kmem_cache_free(bdev_cachep, bdi);
+}
+
 static void bdev_i_callback(struct rcu_head *head)
 {
 	struct inode *inode = container_of(head, struct inode, i_rcu);
-	struct bdev_inode *bdi = BDEV_I(inode);
+	struct block_device *bdev = &BDEV_I(inode)->bdev;
 
-	kmem_cache_free(bdev_cachep, bdi);
+	/* blk_put_queue needs process context */
+	schedule_work(&bdev->bd_release);
 }
 
 static void bdev_destroy_inode(struct inode *inode)
@@ -613,6 +623,7 @@ static void init_once(void *foo)
 #ifdef CONFIG_SYSFS
 	INIT_LIST_HEAD(&bdev->bd_holder_disks);
 #endif
+	INIT_WORK(&bdev->bd_release, bdev_release);
 	inode_init_once(&ei->vfs_inode);
 	/* Initialize mutex for freeze. */
 	mutex_init(&bdev->bd_fsfreeze_mutex);
@@ -1268,6 +1279,8 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 	mutex_lock_nested(&bdev->bd_mutex, for_part);
 	if (!bdev->bd_openers) {
 		bdev->bd_disk = disk;
+		if (!blk_get_queue(disk->queue))
+			goto out_clear;
 		bdev->bd_queue = disk->queue;
 		bdev->bd_contains = bdev;
 
@@ -1288,7 +1301,6 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 					disk_put_part(bdev->bd_part);
 					bdev->bd_part = NULL;
 					bdev->bd_disk = NULL;
-					bdev->bd_queue = NULL;
 					mutex_unlock(&bdev->bd_mutex);
 					disk_unblock_events(disk);
 					put_disk(disk);
@@ -1364,7 +1376,6 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 	disk_put_part(bdev->bd_part);
 	bdev->bd_disk = NULL;
 	bdev->bd_part = NULL;
-	bdev->bd_queue = NULL;
 	if (bdev != bdev->bd_contains)
 		__blkdev_put(bdev->bd_contains, mode, 1);
 	bdev->bd_contains = NULL;
@@ -1586,12 +1597,6 @@ static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
 		kill_bdev(bdev);
 
 		bdev_write_inode(bdev);
-		/*
-		 * Detaching bdev inode from its wb in __destroy_inode()
-		 * is too late: the queue which embeds its bdi (along with
-		 * root wb) can be gone as soon as we put_disk() below.
-		 */
-		inode_detach_wb(bdev->bd_inode);
 	}
 	if (bdev->bd_contains == bdev) {
 		if (disk->fops->release)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index dc0478c07b2a..f084e4a2198b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -488,6 +488,7 @@ struct block_device {
 	int			bd_fsfreeze_count;
 	/* Mutex for freeze */
 	struct mutex		bd_fsfreeze_mutex;
+	struct work_struct	bd_release;
 };
 
 /*


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC PATCH v2 2/2] block: fix blk_get_backing_dev_info() crash, use bdev->bd_queue
  2017-01-07  1:02 [RFC PATCH v2 0/2] block: fix backing_dev_info lifetime Dan Williams
  2017-01-07  1:02 ` [RFC PATCH v2 1/2] block: fix lifetime of request_queue / backing_dev_info relative to bdev Dan Williams
@ 2017-01-07  1:03 ` Dan Williams
  2017-01-23 21:17 ` [RFC PATCH v2 0/2] block: fix backing_dev_info lifetime Thiago Jung Bauermann
  2 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2017-01-07  1:03 UTC (permalink / raw)
  To: linux-block
  Cc: Andi Kleen, Jan Kara, Rabin Vincent, linux-nvdimm, linux-kernel,
	Jens Axboe, Jeff Moyer, Wei Fang, linux-fsdevel,
	Christoph Hellwig

The ->bd_queue member of struct block_device was added in commit
87192a2a49c4 ("vfs: cache request_queue in struct block_device") in
v3.3. However, blk_get_backing_dev_info() has been using
bdev_get_queue() and grabbing the request_queue through the gendisk
since before the git era.

At final __blkdev_put() time ->bd_disk is cleared while ->bd_queue is
not. The queue remains valid until the final put of the parent disk.

The following crash signature results from blk_get_backing_dev_info()
trying to lookup the queue through ->bd_disk after the final put of the
block device. Simply switch bdev_get_queue() to use ->bd_queue directly
which is guaranteed to still be valid since the request_queue is alive
as long as the inode corresponding to the bdev has not been destroyed.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000568
 IP: blk_get_backing_dev_info+0x10/0x20
 [..]
 Call Trace:
  __inode_attach_wb+0x3a7/0x5d0
  __filemap_fdatawrite_range+0xf8/0x100
  filemap_write_and_wait+0x40/0x90
  fsync_bdev+0x54/0x60
  ? bdget_disk+0x30/0x40
  invalidate_partition+0x24/0x50
  del_gendisk+0xfa/0x230

Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Wei Fang <fangwei1@huawei.com>
Cc: Rabin Vincent <rabinv@axis.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 block/blk-core.c       |    4 ++--
 include/linux/blkdev.h |    6 +++++-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index e8713137b846..cfd6731dfed7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -108,8 +108,8 @@ void blk_queue_congestion_threshold(struct request_queue *q)
  * @bdev:	device
  *
  * Locates the passed device's request queue and returns the address of its
- * backing_dev_info.  This function can only be called if @bdev is opened
- * and the return value is never NULL.
+ * backing_dev_info.  This function can be called until the final iput()
+ * of the bdev inode.
  */
 struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev)
 {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358ba052..fd332da0fc38 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -841,7 +841,11 @@ bool blk_poll(struct request_queue *q, blk_qc_t cookie);
 
 static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
 {
-	return bdev->bd_disk->queue;	/* this is never NULL */
+	/*
+	 * ->bd_queue is valid as long as there is a reference against
+	 *  the bdev inode.
+	 */
+	return bdev->bd_queue;
 }
 
 /*


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH v2 0/2] block: fix backing_dev_info lifetime
  2017-01-07  1:02 [RFC PATCH v2 0/2] block: fix backing_dev_info lifetime Dan Williams
  2017-01-07  1:02 ` [RFC PATCH v2 1/2] block: fix lifetime of request_queue / backing_dev_info relative to bdev Dan Williams
  2017-01-07  1:03 ` [RFC PATCH v2 2/2] block: fix blk_get_backing_dev_info() crash, use bdev->bd_queue Dan Williams
@ 2017-01-23 21:17 ` Thiago Jung Bauermann
  2017-01-25 21:43   ` Dan Williams
  2 siblings, 1 reply; 8+ messages in thread
From: Thiago Jung Bauermann @ 2017-01-23 21:17 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-block, Andi Kleen, Jan Kara, Rabin Vincent, linux-nvdimm,
	linux-kernel, Jens Axboe, Wei Fang, linux-fsdevel,
	Christoph Hellwig

Hello Dan,

Am Freitag, 6. Januar 2017, 17:02:51 BRST schrieb Dan Williams:
> v1 of these changes [1] was a one line change to bdev_get_queue() to
> prevent a shutdown crash when del_gendisk() races the final
> __blkdev_put().
> 
> While it is known at del_gendisk() time that the queue is still alive,
> Jan Kara points to other paths [2] that are racing __blkdev_put() where
> the assumption that ->bd_queue, or inode->i_wb is valid does not hold.
> 
> Fix that broken assumption, make it the case that if you have a live
> block_device, or block_device-inode that the corresponding queue and
> inode-write-back data is still valid.
> 
> These changes survive a run of the libnvdimm unit test suite which puts
> some stress on the block_device shutdown path.

I realize that the kernel test robot found problems with this series, but FWIW 
it fixes the bug mentioned in [2].

> [1]: http://marc.info/?l=linux-block&m=148366637105761&w=2
> [2]: http://www.spinics.net/lists/linux-fsdevel/msg105153.html

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH v2 0/2] block: fix backing_dev_info lifetime
  2017-01-23 21:17 ` [RFC PATCH v2 0/2] block: fix backing_dev_info lifetime Thiago Jung Bauermann
@ 2017-01-25 21:43   ` Dan Williams
  2017-01-26 10:06     ` Jan Kara
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2017-01-25 21:43 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: Jens Axboe, Jan Kara, Rabin Vincent, linux-nvdimm@lists.01.org,
	Linux Kernel Mailing List, linux-block, Andi Kleen, Wei Fang,
	linux-fsdevel, Christoph Hellwig

On Mon, Jan 23, 2017 at 1:17 PM, Thiago Jung Bauermann
<bauerman@linux.vnet.ibm.com> wrote:
> Hello Dan,
>
> Am Freitag, 6. Januar 2017, 17:02:51 BRST schrieb Dan Williams:
>> v1 of these changes [1] was a one line change to bdev_get_queue() to
>> prevent a shutdown crash when del_gendisk() races the final
>> __blkdev_put().
>>
>> While it is known at del_gendisk() time that the queue is still alive,
>> Jan Kara points to other paths [2] that are racing __blkdev_put() where
>> the assumption that ->bd_queue, or inode->i_wb is valid does not hold.
>>
>> Fix that broken assumption, make it the case that if you have a live
>> block_device, or block_device-inode that the corresponding queue and
>> inode-write-back data is still valid.
>>
>> These changes survive a run of the libnvdimm unit test suite which puts
>> some stress on the block_device shutdown path.
>
> I realize that the kernel test robot found problems with this series, but FWIW
> it fixes the bug mentioned in [2].
>

Thanks for the test result. I might take a look at cleaning up the
test robot reports and resubmitting this approach unless Jan beats me
to the punch with his backing_devi_info lifetime change patches.

>> [2]: http://www.spinics.net/lists/linux-fsdevel/msg105153.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH v2 0/2] block: fix backing_dev_info lifetime
  2017-01-25 21:43   ` Dan Williams
@ 2017-01-26 10:06     ` Jan Kara
  2017-01-26 13:17       ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kara @ 2017-01-26 10:06 UTC (permalink / raw)
  To: Dan Williams
  Cc: Thiago Jung Bauermann, Jens Axboe, Jan Kara, Rabin Vincent,
	linux-nvdimm@lists.01.org, Linux Kernel Mailing List, linux-block,
	Andi Kleen, Wei Fang, linux-fsdevel, Christoph Hellwig

On Wed 25-01-17 13:43:58, Dan Williams wrote:
> On Mon, Jan 23, 2017 at 1:17 PM, Thiago Jung Bauermann
> <bauerman@linux.vnet.ibm.com> wrote:
> > Hello Dan,
> >
> > Am Freitag, 6. Januar 2017, 17:02:51 BRST schrieb Dan Williams:
> >> v1 of these changes [1] was a one line change to bdev_get_queue() to
> >> prevent a shutdown crash when del_gendisk() races the final
> >> __blkdev_put().
> >>
> >> While it is known at del_gendisk() time that the queue is still alive,
> >> Jan Kara points to other paths [2] that are racing __blkdev_put() where
> >> the assumption that ->bd_queue, or inode->i_wb is valid does not hold.
> >>
> >> Fix that broken assumption, make it the case that if you have a live
> >> block_device, or block_device-inode that the corresponding queue and
> >> inode-write-back data is still valid.
> >>
> >> These changes survive a run of the libnvdimm unit test suite which puts
> >> some stress on the block_device shutdown path.
> >
> > I realize that the kernel test robot found problems with this series, but FWIW
> > it fixes the bug mentioned in [2].
> >
> 
> Thanks for the test result. I might take a look at cleaning up the
> test robot reports and resubmitting this approach unless Jan beats me
> to the punch with his backing_devi_info lifetime change patches.

Yeah, so my patches (and I suspect your as well), have a problem when the
backing_device_info stays around because blkdev inode still exists, device
gets removed (e.g. USB disk gets unplugged) but blkdev inode still stays
around (there doesn't appear to be anything that would be forcing blkdev
inode out of cache on device removal and there cannot be because different
processes may hold inode reference) and then some other device gets plugged
in and reuses the same MAJOR:MINOR combination. Things get awkward there, I
think we need to unhash blkdev inode on device removal but so far I didn't
make this work...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH v2 0/2] block: fix backing_dev_info lifetime
  2017-01-26 10:06     ` Jan Kara
@ 2017-01-26 13:17       ` Christoph Hellwig
  2017-01-26 16:39         ` Dan Williams
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2017-01-26 13:17 UTC (permalink / raw)
  To: Jan Kara
  Cc: Dan Williams, Thiago Jung Bauermann, Jens Axboe, Rabin Vincent,
	linux-nvdimm@lists.01.org, Linux Kernel Mailing List, linux-block,
	Andi Kleen, Wei Fang, linux-fsdevel, Christoph Hellwig

On Thu, Jan 26, 2017 at 11:06:53AM +0100, Jan Kara wrote:
> Yeah, so my patches (and I suspect your as well), have a problem when the
> backing_device_info stays around because blkdev inode still exists, device
> gets removed (e.g. USB disk gets unplugged) but blkdev inode still stays
> around (there doesn't appear to be anything that would be forcing blkdev
> inode out of cache on device removal and there cannot be because different
> processes may hold inode reference) and then some other device gets plugged
> in and reuses the same MAJOR:MINOR combination. Things get awkward there, I
> think we need to unhash blkdev inode on device removal but so far I didn't
> make this work...

The other option is to simply not release the dev_t until the backing_dev
is gone.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH v2 0/2] block: fix backing_dev_info lifetime
  2017-01-26 13:17       ` Christoph Hellwig
@ 2017-01-26 16:39         ` Dan Williams
  0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2017-01-26 16:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, Thiago Jung Bauermann, Jens Axboe, Rabin Vincent,
	linux-nvdimm@lists.01.org, Linux Kernel Mailing List, linux-block,
	Andi Kleen, Wei Fang, linux-fsdevel, Jej B

On Thu, Jan 26, 2017 at 5:17 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Thu, Jan 26, 2017 at 11:06:53AM +0100, Jan Kara wrote:
>> Yeah, so my patches (and I suspect your as well), have a problem when the
>> backing_device_info stays around because blkdev inode still exists, device
>> gets removed (e.g. USB disk gets unplugged) but blkdev inode still stays
>> around (there doesn't appear to be anything that would be forcing blkdev
>> inode out of cache on device removal and there cannot be because different
>> processes may hold inode reference) and then some other device gets plugged
>> in and reuses the same MAJOR:MINOR combination. Things get awkward there, I
>> think we need to unhash blkdev inode on device removal but so far I didn't
>> make this work...
>
> The other option is to simply not release the dev_t until the backing_dev
> is gone.

I came to a similar conclusion here:

   https://marc.info/?l=linux-scsi&m=147103737421897&w=4

James had some concerns, but I think its now clear this problem is
bigger than something we can fix locally in scsi.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-01-26 16:39 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-01-07  1:02 [RFC PATCH v2 0/2] block: fix backing_dev_info lifetime Dan Williams
2017-01-07  1:02 ` [RFC PATCH v2 1/2] block: fix lifetime of request_queue / backing_dev_info relative to bdev Dan Williams
2017-01-07  1:03 ` [RFC PATCH v2 2/2] block: fix blk_get_backing_dev_info() crash, use bdev->bd_queue Dan Williams
2017-01-23 21:17 ` [RFC PATCH v2 0/2] block: fix backing_dev_info lifetime Thiago Jung Bauermann
2017-01-25 21:43   ` Dan Williams
2017-01-26 10:06     ` Jan Kara
2017-01-26 13:17       ` Christoph Hellwig
2017-01-26 16:39         ` Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).