All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/10] block: Fix block device shutdown related races
@ 2017-02-09 12:44 Jan Kara
  2017-02-09 12:44 ` [PATCH 01/10] block: Move bdev_unhash_inode() after invalidate_partition() Jan Kara
                   ` (11 more replies)
  0 siblings, 12 replies; 29+ messages in thread
From: Jan Kara @ 2017-02-09 12:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Tejun Heo, Dan Williams,
	Thiago Jung Bauermann, NeilBrown, Jan Kara

Hello,

this patch set fixes several different races and issues I've found when testing
device shutdown and reuse. The first three patches and fixes to problems in my
previous series fixing BDI lifetime issues. Patch 4 fixes issues with reuse of
BDI name with scsi devices. With it I cannot reproduce the BDI name reuse
issues using Omar's stress test using scsi_debug so it can be used as a
replacement of Dan's patches. Patches 5-8 fix oops that is triggered by
__blkdev_put() calling inode_detach_wb() too early (the problem reported by
Thiago). Patches 9 and 10 fix oops due to a bug in gendisk code where
get_gendisk() can return already freed gendisk structure (again triggered by
Omar's stress test).

People, please have a look at patches. The are mostly simple however the
interactions are rather complex so I may have missed something. Also I'm
happy for any additional testing these patches can get - I've stressed them
with Omar's script, tested memcg writeback, tested static (not udev managed)
device inodes.

								Honza

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 01/10] block: Move bdev_unhash_inode() after invalidate_partition()
  2017-02-09 12:44 [PATCH 0/10] block: Fix block device shutdown related races Jan Kara
@ 2017-02-09 12:44 ` Jan Kara
  2017-02-12  3:58   ` Tejun Heo
  2017-02-09 12:44 ` [PATCH 02/10] block: Unhash also block device inode for the whole device Jan Kara
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2017-02-09 12:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Tejun Heo, Dan Williams,
	Thiago Jung Bauermann, NeilBrown, Jan Kara

Move bdev_unhash_inode() after invalidate_partition() as
invalidate_partition() looks up bdev and will unnecessarily recreate it
if bdev_unhash_inode() destroyed it. Also use part_devt() when calling
bdev_unhash_inode() instead of manually creating the device number.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/genhd.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index d9ccd42f3675..6cb9f3a34a92 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -648,9 +648,8 @@ void del_gendisk(struct gendisk *disk)
 	disk_part_iter_init(&piter, disk,
 			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
 	while ((part = disk_part_iter_next(&piter))) {
-		bdev_unhash_inode(MKDEV(disk->major,
-					disk->first_minor + part->partno));
 		invalidate_partition(disk, part->partno);
+		bdev_unhash_inode(part_devt(part));
 		delete_partition(disk, part->partno);
 	}
 	disk_part_iter_exit(&piter);
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 02/10] block: Unhash also block device inode for the whole device
  2017-02-09 12:44 [PATCH 0/10] block: Fix block device shutdown related races Jan Kara
  2017-02-09 12:44 ` [PATCH 01/10] block: Move bdev_unhash_inode() after invalidate_partition() Jan Kara
@ 2017-02-09 12:44 ` Jan Kara
  2017-02-12  4:16   ` Tejun Heo
  2017-02-09 12:44 ` [PATCH 03/10] block: Revalidate i_bdev reference in bd_aquire() Jan Kara
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2017-02-09 12:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Tejun Heo, Dan Williams,
	Thiago Jung Bauermann, NeilBrown, Jan Kara

Iteration over partitions in del_gendisk() omits part0. Add
bdev_unhash_inode() call for the whole device. Otherwise if the device
number gets reused, bdev inode will be still associated with the old
(stale) bdi.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/genhd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/genhd.c b/block/genhd.c
index 6cb9f3a34a92..f6c4d4400759 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -655,6 +655,7 @@ void del_gendisk(struct gendisk *disk)
 	disk_part_iter_exit(&piter);
 
 	invalidate_partition(disk, 0);
+	bdev_unhash_inode(disk_devt(disk));
 	set_capacity(disk, 0);
 	disk->flags &= ~GENHD_FL_UP;
 
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 03/10] block: Revalidate i_bdev reference in bd_aquire()
  2017-02-09 12:44 [PATCH 0/10] block: Fix block device shutdown related races Jan Kara
  2017-02-09 12:44 ` [PATCH 01/10] block: Move bdev_unhash_inode() after invalidate_partition() Jan Kara
  2017-02-09 12:44 ` [PATCH 02/10] block: Unhash also block device inode for the whole device Jan Kara
@ 2017-02-09 12:44 ` Jan Kara
  2017-02-09 15:54   ` Jan Kara
  2017-02-09 12:44 ` [PATCH 04/10] block: Move bdi_unregister() to del_gendisk() Jan Kara
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2017-02-09 12:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Tejun Heo, Dan Williams,
	Thiago Jung Bauermann, NeilBrown, Jan Kara

When a device gets removed, block device inode unhashed so that it is not
used anymore (bdget() will not find it anymore). Later when a new device
gets created with the same device number, we create new block device
inode. However there may be file system device inodes whose i_bdev still
points to the original block device inode and thus we get two active
block device inodes for the same device. They will share the same
gendisk so the only visible differences will be that page caches will
not be coherent and BDIs will be different (the old block device inode
still points to unregistered BDI).

Fix the problem by checking in bd_acquire() whether i_bdev still points
to active block device inode and re-lookup the block device if not. That
way any open of a block device happening after the old device has been
removed will get correct block device inode.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/block_dev.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 601b71b76d7f..360439373a66 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1043,13 +1043,22 @@ static struct block_device *bd_acquire(struct inode *inode)
 
 	spin_lock(&bdev_lock);
 	bdev = inode->i_bdev;
-	if (bdev) {
+	if (bdev && !inode_unhashed(bdev->bd_inode)) {
 		bdgrab(bdev);
 		spin_unlock(&bdev_lock);
 		return bdev;
 	}
 	spin_unlock(&bdev_lock);
 
+	/*
+	 * i_bdev references block device inode that was already shut down
+	 * (corresponding device got removed).  Remove the reference and look
+	 * up block device inode again just in case new device got
+	 * reestablished under the same device number.
+	 */
+	if (bdev)
+		bd_forget(bdev);
+
 	bdev = bdget(inode->i_rdev);
 	if (bdev) {
 		spin_lock(&bdev_lock);
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 04/10] block: Move bdi_unregister() to del_gendisk()
  2017-02-09 12:44 [PATCH 0/10] block: Fix block device shutdown related races Jan Kara
                   ` (2 preceding siblings ...)
  2017-02-09 12:44 ` [PATCH 03/10] block: Revalidate i_bdev reference in bd_aquire() Jan Kara
@ 2017-02-09 12:44 ` Jan Kara
  2017-02-10  2:21   ` NeilBrown
  2017-02-12  4:31   ` Tejun Heo
  2017-02-09 12:44 ` [PATCH 05/10] writeback: Generalize and standardize I_SYNC waiting function Jan Kara
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 29+ messages in thread
From: Jan Kara @ 2017-02-09 12:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Tejun Heo, Dan Williams,
	Thiago Jung Bauermann, NeilBrown, Jan Kara

Commit 6cd18e711dd8 "block: destroy bdi before blockdev is
unregistered." moved bdi unregistration (at that time through
bdi_destroy()) from blk_release_queue() to blk_cleanup_queue() because
it needs to happen before blk_unregister_region() call in del_gendisk()
for MD. As much as it is fine for device registration / unregistration
purposes, it does not fit our needs wrt writeback code. For those we
will need bdi_unregister() to happen after bdev_unhash_inode() so that
we are sure bdev inode is destroyed or soon to be destroyed (as soon as
last inode reference is dropped and nobody should be holding bdev inode
reference for long at this point) because bdi_unregister() may block
waiting for bdev's inode i_wb reference to be dropped and that happens
only once bdev inode gets destroyed.

Also SCSI will free up the device number from sd_remove() called through
a maze of callbacks from device_del() in __scsi_remove_device() before
blk_cleanup_queue() and thus similar races as described in 6cd18e711dd8
can happen for SCSI as well as reported by Omar [1]. Moving
bdi_unregister() to del_gendisk() fixes these problems as well since
del_gendisk() gets called from sd_remove() before freeing the device
number.

This also makes device_add_disk() (calling bdi_register_owner()) more
symmetric with del_gendisk().

[1] http://marc.info/?l=linux-block&m=148554717109098&w=2

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/blk-core.c | 2 --
 block/genhd.c    | 7 +++++++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 47104f6a398b..9a901dcfdd5c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -580,8 +580,6 @@ void blk_cleanup_queue(struct request_queue *q)
 		q->queue_lock = &q->__queue_lock;
 	spin_unlock_irq(lock);
 
-	bdi_unregister(q->backing_dev_info);
-
 	/* @q is and will stay empty, shutdown and put */
 	blk_put_queue(q);
 }
diff --git a/block/genhd.c b/block/genhd.c
index f6c4d4400759..68c613edb93a 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -660,6 +660,13 @@ void del_gendisk(struct gendisk *disk)
 	disk->flags &= ~GENHD_FL_UP;
 
 	sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
+	/*
+	 * Unregister bdi before releasing device numbers (as they can get
+	 * reused and we'd get clashes in sysfs) but after bdev inodes are
+	 * unhashed and thus will be soon destroyed as bdev inode's reference
+	 * to wb_writeback can block bdi_unregister().
+	 */
+	bdi_unregister(disk->queue->backing_dev_info);
 	blk_unregister_queue(disk);
 	blk_unregister_region(disk_devt(disk), disk->minors);
 
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 05/10] writeback: Generalize and standardize I_SYNC waiting function
  2017-02-09 12:44 [PATCH 0/10] block: Fix block device shutdown related races Jan Kara
                   ` (3 preceding siblings ...)
  2017-02-09 12:44 ` [PATCH 04/10] block: Move bdi_unregister() to del_gendisk() Jan Kara
@ 2017-02-09 12:44 ` Jan Kara
  2017-02-12  4:32   ` Tejun Heo
  2017-02-09 12:44 ` [PATCH 06/10] writeback: Move __inode_wait_for_state_bit Jan Kara
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2017-02-09 12:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Tejun Heo, Dan Williams,
	Thiago Jung Bauermann, NeilBrown, Jan Kara

__inode_wait_for_writeback() waits for I_SYNC on inode to get cleared.
There's nothing specific regarting I_SYNC for that function. Generalize
it so that we can use it also for I_WB_SWITCH bit. Also the function
uses __wait_on_bit() unnecessarily. Switch it to wait_on_bit() to remove
some code.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index ef600591d96f..c9770de11650 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1170,21 +1170,16 @@ static int write_inode(struct inode *inode, struct writeback_control *wbc)
 }
 
 /*
- * Wait for writeback on an inode to complete. Called with i_lock held.
+ * Wait for bit in inode->i_state to clear. Called with i_lock held.
  * Caller must make sure inode cannot go away when we drop i_lock.
  */
-static void __inode_wait_for_writeback(struct inode *inode)
+static void __inode_wait_for_state_bit(struct inode *inode, int bit_nr)
 	__releases(inode->i_lock)
 	__acquires(inode->i_lock)
 {
-	DEFINE_WAIT_BIT(wq, &inode->i_state, __I_SYNC);
-	wait_queue_head_t *wqh;
-
-	wqh = bit_waitqueue(&inode->i_state, __I_SYNC);
-	while (inode->i_state & I_SYNC) {
+	while (inode->i_state & (1 << bit_nr)) {
 		spin_unlock(&inode->i_lock);
-		__wait_on_bit(wqh, &wq, bit_wait,
-			      TASK_UNINTERRUPTIBLE);
+		wait_on_bit(&inode->i_state, bit_nr, TASK_UNINTERRUPTIBLE);
 		spin_lock(&inode->i_lock);
 	}
 }
@@ -1195,7 +1190,7 @@ static void __inode_wait_for_writeback(struct inode *inode)
 void inode_wait_for_writeback(struct inode *inode)
 {
 	spin_lock(&inode->i_lock);
-	__inode_wait_for_writeback(inode);
+	__inode_wait_for_state_bit(inode, __I_SYNC);
 	spin_unlock(&inode->i_lock);
 }
 
@@ -1397,7 +1392,7 @@ static int writeback_single_inode(struct inode *inode,
 		 * inode reference or inode has I_WILL_FREE set, it cannot go
 		 * away under us.
 		 */
-		__inode_wait_for_writeback(inode);
+		__inode_wait_for_state_bit(inode, __I_SYNC);
 	}
 	WARN_ON(inode->i_state & I_SYNC);
 	/*
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 06/10] writeback: Move __inode_wait_for_state_bit
  2017-02-09 12:44 [PATCH 0/10] block: Fix block device shutdown related races Jan Kara
                   ` (4 preceding siblings ...)
  2017-02-09 12:44 ` [PATCH 05/10] writeback: Generalize and standardize I_SYNC waiting function Jan Kara
@ 2017-02-09 12:44 ` Jan Kara
  2017-02-09 12:44 ` [PATCH 07/10] writeback: Implement reliable switching to default writeback structure Jan Kara
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2017-02-09 12:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Tejun Heo, Dan Williams,
	Thiago Jung Bauermann, NeilBrown, Jan Kara

Move it up in fs/fs-writeback.c so that we don't have to use forward
declarations. No code change.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index c9770de11650..23dc97cf2a50 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -99,6 +99,21 @@ static inline struct inode *wb_inode(struct list_head *head)
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(wbc_writepage);
 
+/*
+ * Wait for bit in inode->i_state to clear. Called with i_lock held.
+ * Caller must make sure inode cannot go away when we drop i_lock.
+ */
+static void __inode_wait_for_state_bit(struct inode *inode, int bit_nr)
+	__releases(inode->i_lock)
+	__acquires(inode->i_lock)
+{
+	while (inode->i_state & (1 << bit_nr)) {
+		spin_unlock(&inode->i_lock);
+		wait_on_bit(&inode->i_state, bit_nr, TASK_UNINTERRUPTIBLE);
+		spin_lock(&inode->i_lock);
+	}
+}
+
 static bool wb_io_lists_populated(struct bdi_writeback *wb)
 {
 	if (wb_has_dirty_io(wb)) {
@@ -1170,21 +1185,6 @@ static int write_inode(struct inode *inode, struct writeback_control *wbc)
 }
 
 /*
- * Wait for bit in inode->i_state to clear. Called with i_lock held.
- * Caller must make sure inode cannot go away when we drop i_lock.
- */
-static void __inode_wait_for_state_bit(struct inode *inode, int bit_nr)
-	__releases(inode->i_lock)
-	__acquires(inode->i_lock)
-{
-	while (inode->i_state & (1 << bit_nr)) {
-		spin_unlock(&inode->i_lock);
-		wait_on_bit(&inode->i_state, bit_nr, TASK_UNINTERRUPTIBLE);
-		spin_lock(&inode->i_lock);
-	}
-}
-
-/*
  * Wait for writeback on an inode to complete. Caller must have inode pinned.
  */
 void inode_wait_for_writeback(struct inode *inode)
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 07/10] writeback: Implement reliable switching to default writeback structure
  2017-02-09 12:44 [PATCH 0/10] block: Fix block device shutdown related races Jan Kara
                   ` (5 preceding siblings ...)
  2017-02-09 12:44 ` [PATCH 06/10] writeback: Move __inode_wait_for_state_bit Jan Kara
@ 2017-02-09 12:44 ` Jan Kara
  2017-02-10  2:19   ` NeilBrown
  2017-02-09 12:44 ` [PATCH 08/10] block: Fix oops in locked_inode_to_wb_and_lock_list() Jan Kara
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2017-02-09 12:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Tejun Heo, Dan Williams,
	Thiago Jung Bauermann, NeilBrown, Jan Kara

Currently switching of inode between different writeback structures is
asynchronous and not guaranteed to succeed. Add a variant of switching
that is synchronous and reliable so that it can reliably move inode to
the default writeback structure (bdi->wb) when writeback on bdi is going
to be shutdown.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c         | 60 ++++++++++++++++++++++++++++++++++++++++-------
 include/linux/fs.h        |  3 ++-
 include/linux/writeback.h |  6 +++++
 3 files changed, 60 insertions(+), 9 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 23dc97cf2a50..52992a1036b1 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -332,14 +332,11 @@ struct inode_switch_wbs_context {
 	struct work_struct	work;
 };
 
-static void inode_switch_wbs_work_fn(struct work_struct *work)
+static void do_inode_switch_wbs(struct inode *inode,
+				struct bdi_writeback *new_wb)
 {
-	struct inode_switch_wbs_context *isw =
-		container_of(work, struct inode_switch_wbs_context, work);
-	struct inode *inode = isw->inode;
 	struct address_space *mapping = inode->i_mapping;
 	struct bdi_writeback *old_wb = inode->i_wb;
-	struct bdi_writeback *new_wb = isw->new_wb;
 	struct radix_tree_iter iter;
 	bool switched = false;
 	void **slot;
@@ -436,15 +433,29 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
 	spin_unlock(&new_wb->list_lock);
 	spin_unlock(&old_wb->list_lock);
 
+	/*
+	 * Make sure waitqueue_active() check in wake_up_bit() cannot happen
+	 * before I_WB_SWITCH is cleared. Pairs with the barrier in
+	 * set_task_state() after wait_on_bit() added waiter to the wait queue.
+	 */
+	smp_mb();
+	wake_up_bit(&inode->i_state, __I_WB_SWITCH);
+
 	if (switched) {
 		wb_wakeup(new_wb);
 		wb_put(old_wb);
 	}
-	wb_put(new_wb);
+}
 
-	iput(inode);
-	kfree(isw);
+static void inode_switch_wbs_work_fn(struct work_struct *work)
+{
+	struct inode_switch_wbs_context *isw =
+		container_of(work, struct inode_switch_wbs_context, work);
 
+	do_inode_switch_wbs(isw->inode, isw->new_wb);
+	wb_put(isw->new_wb);
+	iput(isw->inode);
+	kfree(isw);
 	atomic_dec(&isw_nr_in_flight);
 }
 
@@ -521,6 +532,39 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
 }
 
 /**
+ * inode_switch_to_default_wb_sync - change the wb association of an inode to
+ *	the default writeback structure synchronously
+ * @inode: target inode
+ *
+ * Switch @inode's wb association to the default writeback structure (bdi->wb).
+ * Unlike inode_switch_wbs() the switching is performed synchronously and we
+ * guarantee the inode is switched to the default writeback structure when this
+ * function returns. Nothing prevents from someone else switching inode to
+ * another writeback structure just when we are done though. Preventing that is
+ * upto the caller if needed.
+ */
+void inode_switch_to_default_wb_sync(struct inode *inode)
+{
+	struct backing_dev_info *bdi = inode_to_bdi(inode);
+
+	/* while holding I_WB_SWITCH, no one else can update the association */
+	spin_lock(&inode->i_lock);
+	if (WARN_ON_ONCE(inode->i_state & I_FREEING) ||
+	    !inode_to_wb_is_valid(inode) || inode_to_wb(inode) == &bdi->wb) {
+		spin_unlock(&inode->i_lock);
+		return;
+	}
+	__inode_wait_for_state_bit(inode, __I_WB_SWITCH);
+	inode->i_state |= I_WB_SWITCH;
+	spin_unlock(&inode->i_lock);
+
+	/* Make I_WB_SWITCH setting visible to unlocked users of i_wb */
+	synchronize_rcu();
+
+	do_inode_switch_wbs(inode, &bdi->wb);
+}
+
+/**
  * wbc_attach_and_unlock_inode - associate wbc with target inode and unlock it
  * @wbc: writeback_control of interest
  * @inode: target inode
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c930cbc19342..319fb76f9081 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1929,7 +1929,8 @@ static inline bool HAS_UNMAPPED_ID(struct inode *inode)
 #define I_DIRTY_TIME		(1 << 11)
 #define __I_DIRTY_TIME_EXPIRED	12
 #define I_DIRTY_TIME_EXPIRED	(1 << __I_DIRTY_TIME_EXPIRED)
-#define I_WB_SWITCH		(1 << 13)
+#define __I_WB_SWITCH		13
+#define I_WB_SWITCH		(1 << __I_WB_SWITCH)
 
 #define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)
 #define I_DIRTY_ALL (I_DIRTY | I_DIRTY_TIME)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 5527d910ba3d..0d3ba83a0f7f 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -280,6 +280,8 @@ static inline void wbc_init_bio(struct writeback_control *wbc, struct bio *bio)
 		bio_associate_blkcg(bio, wbc->wb->blkcg_css);
 }
 
+void inode_switch_to_default_wb_sync(struct inode *inode);
+
 #else	/* CONFIG_CGROUP_WRITEBACK */
 
 static inline void inode_attach_wb(struct inode *inode, struct page *page)
@@ -319,6 +321,10 @@ static inline void cgroup_writeback_umount(void)
 {
 }
 
+static inline void inode_switch_to_default_wb_sync(struct inode *inode)
+{
+}
+
 #endif	/* CONFIG_CGROUP_WRITEBACK */
 
 /*
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 08/10] block: Fix oops in locked_inode_to_wb_and_lock_list()
  2017-02-09 12:44 [PATCH 0/10] block: Fix block device shutdown related races Jan Kara
                   ` (6 preceding siblings ...)
  2017-02-09 12:44 ` [PATCH 07/10] writeback: Implement reliable switching to default writeback structure Jan Kara
@ 2017-02-09 12:44 ` Jan Kara
  2017-02-12  4:40   ` Tejun Heo
  2017-02-09 12:44 ` [PATCH 09/10] kobject: Export kobject_get_unless_zero() Jan Kara
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2017-02-09 12:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Tejun Heo, Dan Williams,
	Thiago Jung Bauermann, NeilBrown, Jan Kara

When block device is closed, we call inode_detach_wb() in __blkdev_put()
which sets inode->i_wb to NULL. That is contrary to expectations that
inode->i_wb stays valid once set during the whole inode's lifetime and
leads to oops in wb_get() in locked_inode_to_wb_and_lock_list() because
inode_to_wb() returned NULL.

The reason why we called inode_detach_wb() is not valid anymore though.
BDI is guaranteed to stay along until we call bdi_put() from
bdev_evict_inode() so we can postpone calling inode_detach_wb() to that
moment. A complication is that i_wb can point to non-root wb_writeback
structure and in that case we do need to clean it up as bdi_unregister()
blocks waiting for all non-root wb_writeback references to get dropped.
Thus this i_wb reference could block device removal e.g. from
__scsi_remove_device() (which indirectly ends up calling
bdi_unregister()). We cannot rely on block device inode to go away soon
(and thus i_wb reference to get dropped) as the device may got
hot-removed e.g. under a mounted filesystem. We deal with these issues
by switching block device inode from non-root wb_writeback structure to
bdi->wb when needed.  Since this is rather expensive (requires
synchronize_rcu()) we do the switching only in del_gendisk() when we
know the device is going away.

Also add a warning to catch if someone uses inode_detach_wb() in a
dangerous way.

Reported-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/genhd.c             |  4 ++--
 fs/block_dev.c            | 11 ++++-------
 include/linux/fs.h        |  2 +-
 include/linux/writeback.h |  1 +
 4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 68c613edb93a..721921a140cc 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -649,13 +649,13 @@ void del_gendisk(struct gendisk *disk)
 			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
 	while ((part = disk_part_iter_next(&piter))) {
 		invalidate_partition(disk, part->partno);
-		bdev_unhash_inode(part_devt(part));
+		bdev_cleanup_inode(part_devt(part));
 		delete_partition(disk, part->partno);
 	}
 	disk_part_iter_exit(&piter);
 
 	invalidate_partition(disk, 0);
-	bdev_unhash_inode(disk_devt(disk));
+	bdev_cleanup_inode(disk_devt(disk));
 	set_capacity(disk, 0);
 	disk->flags &= ~GENHD_FL_UP;
 
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 360439373a66..65ac3a60ac8e 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -884,6 +884,8 @@ static void bdev_evict_inode(struct inode *inode)
 	spin_lock(&bdev_lock);
 	list_del_init(&bdev->bd_list);
 	spin_unlock(&bdev_lock);
+	/* Detach inode from wb early as bdi_put() may free bdi->wb */
+	inode_detach_wb(inode);
 	if (bdev->bd_bdi != &noop_backing_dev_info)
 		bdi_put(bdev->bd_bdi);
 }
@@ -960,13 +962,14 @@ static LIST_HEAD(all_bdevs);
  * If there is a bdev inode for this device, unhash it so that it gets evicted
  * as soon as last inode reference is dropped.
  */
-void bdev_unhash_inode(dev_t dev)
+void bdev_cleanup_inode(dev_t dev)
 {
 	struct inode *inode;
 
 	inode = ilookup5(blockdev_superblock, hash(dev), bdev_test, &dev);
 	if (inode) {
 		remove_inode_hash(inode);
+		inode_switch_to_default_wb_sync(inode);
 		iput(inode);
 	}
 }
@@ -1874,12 +1877,6 @@ static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
 		kill_bdev(bdev);
 
 		bdev_write_inode(bdev);
-		/*
-		 * Detaching bdev inode from its wb in __destroy_inode()
-		 * is too late: the queue which embeds its bdi (along with
-		 * root wb) can be gone as soon as we put_disk() below.
-		 */
-		inode_detach_wb(bdev->bd_inode);
 	}
 	if (bdev->bd_contains == bdev) {
 		if (disk->fops->release)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 319fb76f9081..f8c86b9c31d5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2344,7 +2344,7 @@ extern struct kmem_cache *names_cachep;
 #ifdef CONFIG_BLOCK
 extern int register_blkdev(unsigned int, const char *);
 extern void unregister_blkdev(unsigned int, const char *);
-extern void bdev_unhash_inode(dev_t dev);
+extern void bdev_cleanup_inode(dev_t dev);
 extern struct block_device *bdget(dev_t);
 extern struct block_device *bdgrab(struct block_device *bdev);
 extern void bd_set_size(struct block_device *, loff_t size);
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 0d3ba83a0f7f..6d27b78c9a79 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -237,6 +237,7 @@ static inline void inode_attach_wb(struct inode *inode, struct page *page)
 static inline void inode_detach_wb(struct inode *inode)
 {
 	if (inode->i_wb) {
+		WARN_ON_ONCE(!(inode->i_state & I_CLEAR));
 		wb_put(inode->i_wb);
 		inode->i_wb = NULL;
 	}
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 09/10] kobject: Export kobject_get_unless_zero()
  2017-02-09 12:44 [PATCH 0/10] block: Fix block device shutdown related races Jan Kara
                   ` (7 preceding siblings ...)
  2017-02-09 12:44 ` [PATCH 08/10] block: Fix oops in locked_inode_to_wb_and_lock_list() Jan Kara
@ 2017-02-09 12:44 ` Jan Kara
  2017-02-12  4:41   ` Tejun Heo
  2017-02-09 12:44 ` [PATCH 10/10] block: Fix oops scsi_disk_get() Jan Kara
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2017-02-09 12:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Tejun Heo, Dan Williams,
	Thiago Jung Bauermann, NeilBrown, Jan Kara

Make the function available for outside use and fortify it against NULL
kobject.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/kobject.h | 2 ++
 lib/kobject.c           | 5 ++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index e6284591599e..ca85cb80e99a 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -108,6 +108,8 @@ extern int __must_check kobject_rename(struct kobject *, const char *new_name);
 extern int __must_check kobject_move(struct kobject *, struct kobject *);
 
 extern struct kobject *kobject_get(struct kobject *kobj);
+extern struct kobject * __must_check kobject_get_unless_zero(
+						struct kobject *kobj);
 extern void kobject_put(struct kobject *kobj);
 
 extern const void *kobject_namespace(struct kobject *kobj);
diff --git a/lib/kobject.c b/lib/kobject.c
index 445dcaeb0f56..763d70a18941 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -601,12 +601,15 @@ struct kobject *kobject_get(struct kobject *kobj)
 }
 EXPORT_SYMBOL(kobject_get);
 
-static struct kobject * __must_check kobject_get_unless_zero(struct kobject *kobj)
+struct kobject * __must_check kobject_get_unless_zero(struct kobject *kobj)
 {
+	if (!kobj)
+		return NULL;
 	if (!kref_get_unless_zero(&kobj->kref))
 		kobj = NULL;
 	return kobj;
 }
+EXPORT_SYMBOL(kobject_get_unless_zero);
 
 /*
  * kobject_cleanup - free kobject resources.
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 10/10] block: Fix oops scsi_disk_get()
  2017-02-09 12:44 [PATCH 0/10] block: Fix block device shutdown related races Jan Kara
                   ` (8 preceding siblings ...)
  2017-02-09 12:44 ` [PATCH 09/10] kobject: Export kobject_get_unless_zero() Jan Kara
@ 2017-02-09 12:44 ` Jan Kara
  2017-02-12  4:43   ` Tejun Heo
  2017-02-09 14:52 ` [PATCH 0/10] block: Fix block device shutdown related races Thiago Jung Bauermann
  2017-02-13 14:27 ` Thiago Jung Bauermann
  11 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2017-02-09 12:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Tejun Heo, Dan Williams,
	Thiago Jung Bauermann, NeilBrown, Jan Kara

When device open races with device shutdown, we can get the following
oops in scsi_disk_get():

[11863.044351] general protection fault: 0000 [#1] SMP
[11863.045561] Modules linked in: scsi_debug xfs libcrc32c netconsole btrfs raid6_pq zlib_deflate lzo_compress xor [last unloaded: loop]
[11863.047853] CPU: 3 PID: 13042 Comm: hald-probe-stor Tainted: G W      4.10.0-rc2-xen+ #35
[11863.048030] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[11863.048030] task: ffff88007f438200 task.stack: ffffc90000fd0000
[11863.048030] RIP: 0010:scsi_disk_get+0x43/0x70
[11863.048030] RSP: 0018:ffffc90000fd3a08 EFLAGS: 00010202
[11863.048030] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007f56d000 RCX: 0000000000000000
[11863.048030] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffffff81a8d880
[11863.048030] RBP: ffffc90000fd3a18 R08: 0000000000000000 R09: 0000000000000001
[11863.059217] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffffa
[11863.059217] R13: ffff880078872800 R14: ffff880070915540 R15: 000000000000001d
[11863.059217] FS:  00007f2611f71800(0000) GS:ffff88007f0c0000(0000) knlGS:0000000000000000
[11863.059217] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11863.059217] CR2: 000000000060e048 CR3: 00000000778d4000 CR4: 00000000000006e0
[11863.059217] Call Trace:
[11863.059217]  ? disk_get_part+0x22/0x1f0
[11863.059217]  sd_open+0x39/0x130
[11863.059217]  __blkdev_get+0x69/0x430
[11863.059217]  ? bd_acquire+0x7f/0xc0
[11863.059217]  ? bd_acquire+0x96/0xc0
[11863.059217]  ? blkdev_get+0x350/0x350
[11863.059217]  blkdev_get+0x126/0x350
[11863.059217]  ? _raw_spin_unlock+0x2b/0x40
[11863.059217]  ? bd_acquire+0x7f/0xc0
[11863.059217]  ? blkdev_get+0x350/0x350
[11863.059217]  blkdev_open+0x65/0x80
...

As you can see RAX value is already poisoned showing that gendisk we got
is already freed. The problem is that get_gendisk() looks up device
number in ext_devt_idr and then does get_disk() which does kobject_get()
on the disks kobject. However the disk gets removed from ext_devt_idr
only in disk_release() (through blk_free_devt()) at which moment it has
already 0 refcount and is already on its way to be freed. Indeed we've
got a warning from kobject_get() about 0 refcount shortly before the
oops.

We fix the problem by using kobject_get_unless_zero() in get_disk() so
that get_disk() cannot get reference on a disk that is already being
freed.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/genhd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/genhd.c b/block/genhd.c
index 721921a140cc..e3dbc311c323 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1350,7 +1350,7 @@ struct kobject *get_disk(struct gendisk *disk)
 	owner = disk->fops->owner;
 	if (owner && !try_module_get(owner))
 		return NULL;
-	kobj = kobject_get(&disk_to_dev(disk)->kobj);
+	kobj = kobject_get_unless_zero(&disk_to_dev(disk)->kobj);
 	if (kobj == NULL) {
 		module_put(owner);
 		return NULL;
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/10] block: Fix block device shutdown related races
  2017-02-09 12:44 [PATCH 0/10] block: Fix block device shutdown related races Jan Kara
                   ` (9 preceding siblings ...)
  2017-02-09 12:44 ` [PATCH 10/10] block: Fix oops scsi_disk_get() Jan Kara
@ 2017-02-09 14:52 ` Thiago Jung Bauermann
  2017-02-09 15:48   ` Jan Kara
  2017-02-13 14:27 ` Thiago Jung Bauermann
  11 siblings, 1 reply; 29+ messages in thread
From: Thiago Jung Bauermann @ 2017-02-09 14:52 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Tejun Heo,
	Dan Williams, NeilBrown

Hello Jan,

Am Donnerstag, 9. Februar 2017, 13:44:23 BRST schrieb Jan Kara:
> People, please have a look at patches. The are mostly simple however the
> interactions are rather complex so I may have missed something. Also I'm
> happy for any additional testing these patches can get - I've stressed th=
em
> with Omar's script, tested memcg writeback, tested static (not udev manag=
ed)
> device inodes.

Thank you for these fixes. I will have them tested and report back how it=20
goes.

Can you tell which branch I should apply them on? I tried a number of branc=
hes=20
in linux-block (and applied the bdi lifetime v3 patches if the branch didn'=
t=20
already had them) but this series either didn't apply or the build failed=20
with:

/home/bauermann/trabalho/src/linux-2.6.git/fs/block_dev.c: In function=20
=E2=80=98bd_acquire=E2=80=99:
/home/bauermann/trabalho/src/linux-2.6.git/fs/block_dev.c:1063:13: error:=20
passing argument 1 of =E2=80=98bd_forget=E2=80=99 from incompatible pointer=
 type [-
Werror=3Dincompatible-pointer-types]
   bd_forget(bdev);
             ^
In file included from /home/bauermann/trabalho/src/linux-2.6.git/include/
linux/device_cgroup.h:1:0,
                 from /home/bauermann/trabalho/src/linux-2.6.git/fs/
block_dev.c:14:
/home/bauermann/trabalho/src/linux-2.6.git/include/linux/fs.h:2351:13: note=
:=20
expected =E2=80=98struct inode *=E2=80=99 but argument is of type =E2=80=98=
struct block_device *=E2=80=99
 extern void bd_forget(struct inode *inode);
             ^
cc1: some warnings being treated as errors


=2D-=20
Thiago Jung Bauermann
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/10] block: Fix block device shutdown related races
  2017-02-09 14:52 ` [PATCH 0/10] block: Fix block device shutdown related races Thiago Jung Bauermann
@ 2017-02-09 15:48   ` Jan Kara
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2017-02-09 15:48 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: Jan Kara, Jens Axboe, linux-block, Christoph Hellwig, Tejun Heo,
	Dan Williams, NeilBrown

[-- Attachment #1: Type: text/plain, Size: 2028 bytes --]

On Thu 09-02-17 12:52:47, Thiago Jung Bauermann wrote:
> Hello Jan,
> 
> Am Donnerstag, 9. Februar 2017, 13:44:23 BRST schrieb Jan Kara:
> > People, please have a look at patches. The are mostly simple however the
> > interactions are rather complex so I may have missed something. Also I'm
> > happy for any additional testing these patches can get - I've stressed them
> > with Omar's script, tested memcg writeback, tested static (not udev managed)
> > device inodes.
> 
> Thank you for these fixes. I will have them tested and report back how it 
> goes.
> 
> Can you tell which branch I should apply them on? I tried a number of branches 
> in linux-block (and applied the bdi lifetime v3 patches if the branch didn't 
> already had them) but this series either didn't apply or the build failed 
> with:
> 
> /home/bauermann/trabalho/src/linux-2.6.git/fs/block_dev.c: In function 
> ‘bd_acquire’:
> /home/bauermann/trabalho/src/linux-2.6.git/fs/block_dev.c:1063:13: error: 
> passing argument 1 of ‘bd_forget’ from incompatible pointer type [-
> Werror=incompatible-pointer-types]
>    bd_forget(bdev);
>              ^
> In file included from /home/bauermann/trabalho/src/linux-2.6.git/include/
> linux/device_cgroup.h:1:0,
>                  from /home/bauermann/trabalho/src/linux-2.6.git/fs/
> block_dev.c:14:
> /home/bauermann/trabalho/src/linux-2.6.git/include/linux/fs.h:2351:13: note: 
> expected ‘struct inode *’ but argument is of type ‘struct block_device *’
>  extern void bd_forget(struct inode *inode);
>              ^
> cc1: some warnings being treated as errors

Indeed, I'm wondering how this could pass one of the tests I did... Hum.
Anyway thanks for spotting this and attached is a fixed up version of the
patch 3.

I've pushed out a branch with all BDI patches I have accumulated to

git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git bdi

It includes filesystem-bdi cleanup patches as well on top of these fixes.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0003-block-Revalidate-i_bdev-reference-in-bd_aquire.patch --]
[-- Type: text/x-patch, Size: 2001 bytes --]

>From aaf612333753b948a96aebe4a2f8066ed45ef164 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 9 Feb 2017 12:16:30 +0100
Subject: [PATCH 03/10] block: Revalidate i_bdev reference in bd_aquire()

When a device gets removed, block device inode unhashed so that it is not
used anymore (bdget() will not find it anymore). Later when a new device
gets created with the same device number, we create new block device
inode. However there may be file system device inodes whose i_bdev still
points to the original block device inode and thus we get two active
block device inodes for the same device. They will share the same
gendisk so the only visible differences will be that page caches will
not be coherent and BDIs will be different (the old block device inode
still points to unregistered BDI).

Fix the problem by checking in bd_acquire() whether i_bdev still points
to active block device inode and re-lookup the block device if not. That
way any open of a block device happening after the old device has been
removed will get correct block device inode.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/block_dev.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 601b71b76d7f..68e855fdce58 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1043,13 +1043,22 @@ static struct block_device *bd_acquire(struct inode *inode)
 
 	spin_lock(&bdev_lock);
 	bdev = inode->i_bdev;
-	if (bdev) {
+	if (bdev && !inode_unhashed(bdev->bd_inode)) {
 		bdgrab(bdev);
 		spin_unlock(&bdev_lock);
 		return bdev;
 	}
 	spin_unlock(&bdev_lock);
 
+	/*
+	 * i_bdev references block device inode that was already shut down
+	 * (corresponding device got removed).  Remove the reference and look
+	 * up block device inode again just in case new device got
+	 * reestablished under the same device number.
+	 */
+	if (bdev)
+		bd_forget(inode);
+
 	bdev = bdget(inode->i_rdev);
 	if (bdev) {
 		spin_lock(&bdev_lock);
-- 
2.10.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 03/10] block: Revalidate i_bdev reference in bd_aquire()
  2017-02-09 12:44 ` [PATCH 03/10] block: Revalidate i_bdev reference in bd_aquire() Jan Kara
@ 2017-02-09 15:54   ` Jan Kara
  2017-02-12  4:22     ` Tejun Heo
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2017-02-09 15:54 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Tejun Heo, Dan Williams,
	Thiago Jung Bauermann, NeilBrown, Jan Kara

[-- Attachment #1: Type: text/plain, Size: 2107 bytes --]

On Thu 09-02-17 13:44:26, Jan Kara wrote:
> When a device gets removed, block device inode unhashed so that it is not
> used anymore (bdget() will not find it anymore). Later when a new device
> gets created with the same device number, we create new block device
> inode. However there may be file system device inodes whose i_bdev still
> points to the original block device inode and thus we get two active
> block device inodes for the same device. They will share the same
> gendisk so the only visible differences will be that page caches will
> not be coherent and BDIs will be different (the old block device inode
> still points to unregistered BDI).
> 
> Fix the problem by checking in bd_acquire() whether i_bdev still points
> to active block device inode and re-lookup the block device if not. That
> way any open of a block device happening after the old device has been
> removed will get correct block device inode.

Thiago spotted a stupid bug in this patch (calling bd_forget() on bdev
instead of inode). Fixed version is attached.

								Honza
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/block_dev.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 601b71b76d7f..360439373a66 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1043,13 +1043,22 @@ static struct block_device *bd_acquire(struct inode *inode)
>  
>  	spin_lock(&bdev_lock);
>  	bdev = inode->i_bdev;
> -	if (bdev) {
> +	if (bdev && !inode_unhashed(bdev->bd_inode)) {
>  		bdgrab(bdev);
>  		spin_unlock(&bdev_lock);
>  		return bdev;
>  	}
>  	spin_unlock(&bdev_lock);
>  
> +	/*
> +	 * i_bdev references block device inode that was already shut down
> +	 * (corresponding device got removed).  Remove the reference and look
> +	 * up block device inode again just in case new device got
> +	 * reestablished under the same device number.
> +	 */
> +	if (bdev)
> +		bd_forget(bdev);
> +
>  	bdev = bdget(inode->i_rdev);
>  	if (bdev) {
>  		spin_lock(&bdev_lock);
> -- 
> 2.10.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0003-block-Revalidate-i_bdev-reference-in-bd_aquire.patch --]
[-- Type: text/x-patch, Size: 2001 bytes --]

>From aaf612333753b948a96aebe4a2f8066ed45ef164 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 9 Feb 2017 12:16:30 +0100
Subject: [PATCH 03/10] block: Revalidate i_bdev reference in bd_aquire()

When a device gets removed, block device inode unhashed so that it is not
used anymore (bdget() will not find it anymore). Later when a new device
gets created with the same device number, we create new block device
inode. However there may be file system device inodes whose i_bdev still
points to the original block device inode and thus we get two active
block device inodes for the same device. They will share the same
gendisk so the only visible differences will be that page caches will
not be coherent and BDIs will be different (the old block device inode
still points to unregistered BDI).

Fix the problem by checking in bd_acquire() whether i_bdev still points
to active block device inode and re-lookup the block device if not. That
way any open of a block device happening after the old device has been
removed will get correct block device inode.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/block_dev.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 601b71b76d7f..68e855fdce58 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1043,13 +1043,22 @@ static struct block_device *bd_acquire(struct inode *inode)
 
 	spin_lock(&bdev_lock);
 	bdev = inode->i_bdev;
-	if (bdev) {
+	if (bdev && !inode_unhashed(bdev->bd_inode)) {
 		bdgrab(bdev);
 		spin_unlock(&bdev_lock);
 		return bdev;
 	}
 	spin_unlock(&bdev_lock);
 
+	/*
+	 * i_bdev references block device inode that was already shut down
+	 * (corresponding device got removed).  Remove the reference and look
+	 * up block device inode again just in case new device got
+	 * reestablished under the same device number.
+	 */
+	if (bdev)
+		bd_forget(inode);
+
 	bdev = bdget(inode->i_rdev);
 	if (bdev) {
 		spin_lock(&bdev_lock);
-- 
2.10.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 07/10] writeback: Implement reliable switching to default writeback structure
  2017-02-09 12:44 ` [PATCH 07/10] writeback: Implement reliable switching to default writeback structure Jan Kara
@ 2017-02-10  2:19   ` NeilBrown
  2017-02-10 13:20     ` Jan Kara
  0 siblings, 1 reply; 29+ messages in thread
From: NeilBrown @ 2017-02-10  2:19 UTC (permalink / raw)
  To: Jan Kara, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Tejun Heo, Dan Williams,
	Thiago Jung Bauermann, NeilBrown, Jan Kara

[-- Attachment #1: Type: text/plain, Size: 7516 bytes --]

On Thu, Feb 09 2017, Jan Kara wrote:

> Currently switching of inode between different writeback structures is
> asynchronous and not guaranteed to succeed. Add a variant of switching
> that is synchronous and reliable so that it can reliably move inode to
> the default writeback structure (bdi->wb) when writeback on bdi is going
> to be shutdown.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/fs-writeback.c         | 60 ++++++++++++++++++++++++++++++++++++++++-------
>  include/linux/fs.h        |  3 ++-
>  include/linux/writeback.h |  6 +++++
>  3 files changed, 60 insertions(+), 9 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 23dc97cf2a50..52992a1036b1 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -332,14 +332,11 @@ struct inode_switch_wbs_context {
>  	struct work_struct	work;
>  };
>  
> -static void inode_switch_wbs_work_fn(struct work_struct *work)
> +static void do_inode_switch_wbs(struct inode *inode,
> +				struct bdi_writeback *new_wb)
>  {
> -	struct inode_switch_wbs_context *isw =
> -		container_of(work, struct inode_switch_wbs_context, work);
> -	struct inode *inode = isw->inode;
>  	struct address_space *mapping = inode->i_mapping;
>  	struct bdi_writeback *old_wb = inode->i_wb;
> -	struct bdi_writeback *new_wb = isw->new_wb;
>  	struct radix_tree_iter iter;
>  	bool switched = false;
>  	void **slot;
> @@ -436,15 +433,29 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
>  	spin_unlock(&new_wb->list_lock);
>  	spin_unlock(&old_wb->list_lock);
>  
> +	/*
> +	 * Make sure waitqueue_active() check in wake_up_bit() cannot happen
> +	 * before I_WB_SWITCH is cleared. Pairs with the barrier in
> +	 * set_task_state() after wait_on_bit() added waiter to the wait queue.

I think you mean "set_current_state()" ??

It's rather a trap for the unwary, this need for a smp_mb().
Greping for wake_up_bit(), I find quite a few places with barriers -
sometimes clear_bit_unlock() or spin_unlock() - but

fs/block_dev.c-         whole->bd_claiming = NULL;
fs/block_dev.c:         wake_up_bit(&whole->bd_claiming, 0);

fs/cifs/connect.c-      clear_bit(TCON_LINK_PENDING, &tlink->tl_flags);
fs/cifs/connect.c:      wake_up_bit(&tlink->tl_flags, TCON_LINK_PENDING);

fs/cifs/misc.c-                 clear_bit(CIFS_INODE_PENDING_WRITERS, &cinode->flags);
fs/cifs/misc.c:                 wake_up_bit(&cinode->flags, CIFS_INODE_PENDING_WRITERS);

(several more in cifs)

net/sunrpc/xprt.c-      clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
net/sunrpc/xprt.c-      xprt->ops->close(xprt);
net/sunrpc/xprt.c-      xprt_release_write(xprt, NULL);
net/sunrpc/xprt.c:      wake_up_bit(&xprt->state, XPRT_LOCKED);
(there might be a barrier in ->close or xprt_release_write() I guess)

security/keys/gc.c-             clear_bit(KEY_GC_REAPING_KEYTYPE, &key_gc_flags);
security/keys/gc.c:             wake_up_bit(&key_gc_flags, KEY_GC_REAPING_KEYTYPE);

I wonder if there is a good way to make this less error-prone.
I would suggest that wake_up_bit() should always have a barrier, and
__wake_up_bit() is needed to avoid it, but there is already a
__wake_up_bit() with a slightly different interface.


In this case, you have a spin_unlock() just before the wake_up_bit().
It is my understand that it would provide enough of a barrier (all
writes before are globally visible after), so do you really need
the barrier here?

> +	 */
> +	smp_mb();
> +	wake_up_bit(&inode->i_state, __I_WB_SWITCH);
> +
>  	if (switched) {
>  		wb_wakeup(new_wb);
>  		wb_put(old_wb);
>  	}
> -	wb_put(new_wb);
> +}
>  
> -	iput(inode);
> -	kfree(isw);
> +static void inode_switch_wbs_work_fn(struct work_struct *work)
> +{
> +	struct inode_switch_wbs_context *isw =
> +		container_of(work, struct inode_switch_wbs_context, work);
>  
> +	do_inode_switch_wbs(isw->inode, isw->new_wb);
> +	wb_put(isw->new_wb);
> +	iput(isw->inode);
> +	kfree(isw);
>  	atomic_dec(&isw_nr_in_flight);
>  }
>  
> @@ -521,6 +532,39 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
>  }
>  
>  /**
> + * inode_switch_to_default_wb_sync - change the wb association of an inode to
> + *	the default writeback structure synchronously
> + * @inode: target inode
> + *
> + * Switch @inode's wb association to the default writeback structure (bdi->wb).
> + * Unlike inode_switch_wbs() the switching is performed synchronously and we
> + * guarantee the inode is switched to the default writeback structure when this
> + * function returns. Nothing prevents from someone else switching inode to
> + * another writeback structure just when we are done though. Preventing that is
> + * upto the caller if needed.
> + */
> +void inode_switch_to_default_wb_sync(struct inode *inode)
> +{
> +	struct backing_dev_info *bdi = inode_to_bdi(inode);
> +
> +	/* while holding I_WB_SWITCH, no one else can update the association */
> +	spin_lock(&inode->i_lock);
> +	if (WARN_ON_ONCE(inode->i_state & I_FREEING) ||
> +	    !inode_to_wb_is_valid(inode) || inode_to_wb(inode) == &bdi->wb) {
> +		spin_unlock(&inode->i_lock);
> +		return;
> +	}
> +	__inode_wait_for_state_bit(inode, __I_WB_SWITCH);

I note that __inode_wait_for_state_bit() can drop and reclaim ->i_lock.
is it possible that:
  !inode_to_wb_is_valid(inode) || inode_to_wb(inode) == &bdi->wb)

could change while ->i_lock is unlocked?
It would be particular unfortunate if inode_to_wb(inode) became &bdi->wb
due to some thing thread, as do_inode_switch_wbs() will deadlock if
  inode_to_wb(inode) == &bdi->wb

i.e. do you need to repeat the test?

Thanks,
NeilBrown


> +	inode->i_state |= I_WB_SWITCH;
> +	spin_unlock(&inode->i_lock);
> +
> +	/* Make I_WB_SWITCH setting visible to unlocked users of i_wb */
> +	synchronize_rcu();
> +
> +	do_inode_switch_wbs(inode, &bdi->wb);
> +}
> +
> +/**
>   * wbc_attach_and_unlock_inode - associate wbc with target inode and unlock it
>   * @wbc: writeback_control of interest
>   * @inode: target inode
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c930cbc19342..319fb76f9081 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1929,7 +1929,8 @@ static inline bool HAS_UNMAPPED_ID(struct inode *inode)
>  #define I_DIRTY_TIME		(1 << 11)
>  #define __I_DIRTY_TIME_EXPIRED	12
>  #define I_DIRTY_TIME_EXPIRED	(1 << __I_DIRTY_TIME_EXPIRED)
> -#define I_WB_SWITCH		(1 << 13)
> +#define __I_WB_SWITCH		13
> +#define I_WB_SWITCH		(1 << __I_WB_SWITCH)
>  
>  #define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)
>  #define I_DIRTY_ALL (I_DIRTY | I_DIRTY_TIME)
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 5527d910ba3d..0d3ba83a0f7f 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -280,6 +280,8 @@ static inline void wbc_init_bio(struct writeback_control *wbc, struct bio *bio)
>  		bio_associate_blkcg(bio, wbc->wb->blkcg_css);
>  }
>  
> +void inode_switch_to_default_wb_sync(struct inode *inode);
> +
>  #else	/* CONFIG_CGROUP_WRITEBACK */
>  
>  static inline void inode_attach_wb(struct inode *inode, struct page *page)
> @@ -319,6 +321,10 @@ static inline void cgroup_writeback_umount(void)
>  {
>  }
>  
> +static inline void inode_switch_to_default_wb_sync(struct inode *inode)
> +{
> +}
> +
>  #endif	/* CONFIG_CGROUP_WRITEBACK */
>  
>  /*
> -- 
> 2.10.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 04/10] block: Move bdi_unregister() to del_gendisk()
  2017-02-09 12:44 ` [PATCH 04/10] block: Move bdi_unregister() to del_gendisk() Jan Kara
@ 2017-02-10  2:21   ` NeilBrown
  2017-02-12  4:31   ` Tejun Heo
  1 sibling, 0 replies; 29+ messages in thread
From: NeilBrown @ 2017-02-10  2:21 UTC (permalink / raw)
  To: Jan Kara, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Tejun Heo, Dan Williams,
	Thiago Jung Bauermann, NeilBrown, Jan Kara

[-- Attachment #1: Type: text/plain, Size: 1475 bytes --]

On Thu, Feb 09 2017, Jan Kara wrote:

> Commit 6cd18e711dd8 "block: destroy bdi before blockdev is
> unregistered." moved bdi unregistration (at that time through
> bdi_destroy()) from blk_release_queue() to blk_cleanup_queue() because
> it needs to happen before blk_unregister_region() call in del_gendisk()
> for MD. As much as it is fine for device registration / unregistration
> purposes, it does not fit our needs wrt writeback code. For those we
> will need bdi_unregister() to happen after bdev_unhash_inode() so that
> we are sure bdev inode is destroyed or soon to be destroyed (as soon as
> last inode reference is dropped and nobody should be holding bdev inode
> reference for long at this point) because bdi_unregister() may block
> waiting for bdev's inode i_wb reference to be dropped and that happens
> only once bdev inode gets destroyed.
>
> Also SCSI will free up the device number from sd_remove() called through
> a maze of callbacks from device_del() in __scsi_remove_device() before
> blk_cleanup_queue() and thus similar races as described in 6cd18e711dd8
> can happen for SCSI as well as reported by Omar [1]. Moving
> bdi_unregister() to del_gendisk() fixes these problems as well since
> del_gendisk() gets called from sd_remove() before freeing the device
> number.
>
> This also makes device_add_disk() (calling bdi_register_owner()) more
> symmetric with del_gendisk().

What an amazingly sensible idea!! I like that a lot.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 07/10] writeback: Implement reliable switching to default writeback structure
  2017-02-10  2:19   ` NeilBrown
@ 2017-02-10 13:20     ` Jan Kara
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2017-02-10 13:20 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jan Kara, Jens Axboe, linux-block, Christoph Hellwig, Tejun Heo,
	Dan Williams, Thiago Jung Bauermann, NeilBrown

On Fri 10-02-17 13:19:44, NeilBrown wrote:
> On Thu, Feb 09 2017, Jan Kara wrote:
> 
> > Currently switching of inode between different writeback structures is
> > asynchronous and not guaranteed to succeed. Add a variant of switching
> > that is synchronous and reliable so that it can reliably move inode to
> > the default writeback structure (bdi->wb) when writeback on bdi is going
> > to be shutdown.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/fs-writeback.c         | 60 ++++++++++++++++++++++++++++++++++++++++-------
> >  include/linux/fs.h        |  3 ++-
> >  include/linux/writeback.h |  6 +++++
> >  3 files changed, 60 insertions(+), 9 deletions(-)
> >
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 23dc97cf2a50..52992a1036b1 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -332,14 +332,11 @@ struct inode_switch_wbs_context {
> >  	struct work_struct	work;
> >  };
> >  
> > -static void inode_switch_wbs_work_fn(struct work_struct *work)
> > +static void do_inode_switch_wbs(struct inode *inode,
> > +				struct bdi_writeback *new_wb)
> >  {
> > -	struct inode_switch_wbs_context *isw =
> > -		container_of(work, struct inode_switch_wbs_context, work);
> > -	struct inode *inode = isw->inode;
> >  	struct address_space *mapping = inode->i_mapping;
> >  	struct bdi_writeback *old_wb = inode->i_wb;
> > -	struct bdi_writeback *new_wb = isw->new_wb;
> >  	struct radix_tree_iter iter;
> >  	bool switched = false;
> >  	void **slot;
> > @@ -436,15 +433,29 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
> >  	spin_unlock(&new_wb->list_lock);
> >  	spin_unlock(&old_wb->list_lock);
> >  
> > +	/*
> > +	 * Make sure waitqueue_active() check in wake_up_bit() cannot happen
> > +	 * before I_WB_SWITCH is cleared. Pairs with the barrier in
> > +	 * set_task_state() after wait_on_bit() added waiter to the wait queue.
> 
> I think you mean "set_current_state()" ??

Yes, I'll fix that.

> It's rather a trap for the unwary, this need for a smp_mb().
> Greping for wake_up_bit(), I find quite a few places with barriers -
> sometimes clear_bit_unlock() or spin_unlock() - but
> 
> fs/block_dev.c-         whole->bd_claiming = NULL;
> fs/block_dev.c:         wake_up_bit(&whole->bd_claiming, 0);
> 
> fs/cifs/connect.c-      clear_bit(TCON_LINK_PENDING, &tlink->tl_flags);
> fs/cifs/connect.c:      wake_up_bit(&tlink->tl_flags, TCON_LINK_PENDING);
> 
> fs/cifs/misc.c-                 clear_bit(CIFS_INODE_PENDING_WRITERS, &cinode->flags);
> fs/cifs/misc.c:                 wake_up_bit(&cinode->flags, CIFS_INODE_PENDING_WRITERS);
> 
> (several more in cifs)
> 
> net/sunrpc/xprt.c-      clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
> net/sunrpc/xprt.c-      xprt->ops->close(xprt);
> net/sunrpc/xprt.c-      xprt_release_write(xprt, NULL);
> net/sunrpc/xprt.c:      wake_up_bit(&xprt->state, XPRT_LOCKED);
> (there might be a barrier in ->close or xprt_release_write() I guess)
> 
> security/keys/gc.c-             clear_bit(KEY_GC_REAPING_KEYTYPE, &key_gc_flags);
> security/keys/gc.c:             wake_up_bit(&key_gc_flags, KEY_GC_REAPING_KEYTYPE);

Yup, the above look like bugs.

> I wonder if there is a good way to make this less error-prone.
> I would suggest that wake_up_bit() should always have a barrier, and
> __wake_up_bit() is needed to avoid it, but there is already a
> __wake_up_bit() with a slightly different interface.

Yeah, it is error-prone as all waitqueue_active() optimizations...
 
> In this case, you have a spin_unlock() just before the wake_up_bit().
> It is my understand that it would provide enough of a barrier (all
> writes before are globally visible after), so do you really need
> the barrier here?

I believe I do. spin_unlock() is a semi-permeable barrier - i.e., any read
or write from "outside" can be moved inside. So CPU is free to prefetch
values for waitqueue active checks before the spinlock is unlocked or even
before clearing I_WB_SWITCH bit.

> > +	 */
> > +	smp_mb();
> > +	wake_up_bit(&inode->i_state, __I_WB_SWITCH);
> > +
> >  	if (switched) {
> >  		wb_wakeup(new_wb);
> >  		wb_put(old_wb);
> >  	}
> > -	wb_put(new_wb);
> > +}
> >  
> > -	iput(inode);
> > -	kfree(isw);
> > +static void inode_switch_wbs_work_fn(struct work_struct *work)
> > +{
> > +	struct inode_switch_wbs_context *isw =
> > +		container_of(work, struct inode_switch_wbs_context, work);
> >  
> > +	do_inode_switch_wbs(isw->inode, isw->new_wb);
> > +	wb_put(isw->new_wb);
> > +	iput(isw->inode);
> > +	kfree(isw);
> >  	atomic_dec(&isw_nr_in_flight);
> >  }
> >  
> > @@ -521,6 +532,39 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
> >  }
> >  
> >  /**
> > + * inode_switch_to_default_wb_sync - change the wb association of an inode to
> > + *	the default writeback structure synchronously
> > + * @inode: target inode
> > + *
> > + * Switch @inode's wb association to the default writeback structure (bdi->wb).
> > + * Unlike inode_switch_wbs() the switching is performed synchronously and we
> > + * guarantee the inode is switched to the default writeback structure when this
> > + * function returns. Nothing prevents from someone else switching inode to
> > + * another writeback structure just when we are done though. Preventing that is
> > + * upto the caller if needed.
> > + */
> > +void inode_switch_to_default_wb_sync(struct inode *inode)
> > +{
> > +	struct backing_dev_info *bdi = inode_to_bdi(inode);
> > +
> > +	/* while holding I_WB_SWITCH, no one else can update the association */
> > +	spin_lock(&inode->i_lock);
> > +	if (WARN_ON_ONCE(inode->i_state & I_FREEING) ||
> > +	    !inode_to_wb_is_valid(inode) || inode_to_wb(inode) == &bdi->wb) {
> > +		spin_unlock(&inode->i_lock);
> > +		return;
> > +	}
> > +	__inode_wait_for_state_bit(inode, __I_WB_SWITCH);
> 
> I note that __inode_wait_for_state_bit() can drop and reclaim ->i_lock.
> is it possible that:
>   !inode_to_wb_is_valid(inode) || inode_to_wb(inode) == &bdi->wb)
> 
> could change while ->i_lock is unlocked?
> It would be particular unfortunate if inode_to_wb(inode) became &bdi->wb
> due to some thing thread, as do_inode_switch_wbs() will deadlock if
>   inode_to_wb(inode) == &bdi->wb
> 
> i.e. do you need to repeat the test?

That's a very good question and I think you are right that I need to repeat
the checks for inode->i_wb. Will fix. Thanks for review!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 01/10] block: Move bdev_unhash_inode() after invalidate_partition()
  2017-02-09 12:44 ` [PATCH 01/10] block: Move bdev_unhash_inode() after invalidate_partition() Jan Kara
@ 2017-02-12  3:58   ` Tejun Heo
  2017-02-20 14:53     ` Jan Kara
  0 siblings, 1 reply; 29+ messages in thread
From: Tejun Heo @ 2017-02-12  3:58 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, NeilBrown

On Thu, Feb 09, 2017 at 01:44:24PM +0100, Jan Kara wrote:
> Move bdev_unhash_inode() after invalidate_partition() as
> invalidate_partition() looks up bdev and will unnecessarily recreate it
> if bdev_unhash_inode() destroyed it. Also use part_devt() when calling
> bdev_unhash_inode() instead of manually creating the device number.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Acked-by: Tejun Heo <tj@kernel.org>

> @@ -648,9 +648,8 @@ void del_gendisk(struct gendisk *disk)
>  	disk_part_iter_init(&piter, disk,
>  			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
>  	while ((part = disk_part_iter_next(&piter))) {
> -		bdev_unhash_inode(MKDEV(disk->major,
> -					disk->first_minor + part->partno));
>  		invalidate_partition(disk, part->partno);
> +		bdev_unhash_inode(part_devt(part));
>  		delete_partition(disk, part->partno);

So, before this patch, invalidate_partition() would have operated on a
newly created inode and thus wouldn't have actually invalidated the
existing bdev / mapping, right?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 02/10] block: Unhash also block device inode for the whole device
  2017-02-09 12:44 ` [PATCH 02/10] block: Unhash also block device inode for the whole device Jan Kara
@ 2017-02-12  4:16   ` Tejun Heo
  0 siblings, 0 replies; 29+ messages in thread
From: Tejun Heo @ 2017-02-12  4:16 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, NeilBrown

On Thu, Feb 09, 2017 at 01:44:25PM +0100, Jan Kara wrote:
> Iteration over partitions in del_gendisk() omits part0. Add
> bdev_unhash_inode() call for the whole device. Otherwise if the device
> number gets reused, bdev inode will be still associated with the old
> (stale) bdi.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 03/10] block: Revalidate i_bdev reference in bd_aquire()
  2017-02-09 15:54   ` Jan Kara
@ 2017-02-12  4:22     ` Tejun Heo
  0 siblings, 0 replies; 29+ messages in thread
From: Tejun Heo @ 2017-02-12  4:22 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, NeilBrown

On Thu, Feb 09, 2017 at 04:54:17PM +0100, Jan Kara wrote:
> From aaf612333753b948a96aebe4a2f8066ed45ef164 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Thu, 9 Feb 2017 12:16:30 +0100
> Subject: [PATCH 03/10] block: Revalidate i_bdev reference in bd_aquire()
> 
> When a device gets removed, block device inode unhashed so that it is not
> used anymore (bdget() will not find it anymore). Later when a new device
> gets created with the same device number, we create new block device
> inode. However there may be file system device inodes whose i_bdev still
> points to the original block device inode and thus we get two active
> block device inodes for the same device. They will share the same
> gendisk so the only visible differences will be that page caches will
> not be coherent and BDIs will be different (the old block device inode
> still points to unregistered BDI).
> 
> Fix the problem by checking in bd_acquire() whether i_bdev still points
> to active block device inode and re-lookup the block device if not. That
> way any open of a block device happening after the old device has been
> removed will get correct block device inode.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 04/10] block: Move bdi_unregister() to del_gendisk()
  2017-02-09 12:44 ` [PATCH 04/10] block: Move bdi_unregister() to del_gendisk() Jan Kara
  2017-02-10  2:21   ` NeilBrown
@ 2017-02-12  4:31   ` Tejun Heo
  1 sibling, 0 replies; 29+ messages in thread
From: Tejun Heo @ 2017-02-12  4:31 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, NeilBrown

On Thu, Feb 09, 2017 at 01:44:27PM +0100, Jan Kara wrote:
> Commit 6cd18e711dd8 "block: destroy bdi before blockdev is
> unregistered." moved bdi unregistration (at that time through
> bdi_destroy()) from blk_release_queue() to blk_cleanup_queue() because
> it needs to happen before blk_unregister_region() call in del_gendisk()
> for MD. As much as it is fine for device registration / unregistration
> purposes, it does not fit our needs wrt writeback code. For those we
> will need bdi_unregister() to happen after bdev_unhash_inode() so that
> we are sure bdev inode is destroyed or soon to be destroyed (as soon as
> last inode reference is dropped and nobody should be holding bdev inode
> reference for long at this point) because bdi_unregister() may block
> waiting for bdev's inode i_wb reference to be dropped and that happens
> only once bdev inode gets destroyed.
> 
> Also SCSI will free up the device number from sd_remove() called through
> a maze of callbacks from device_del() in __scsi_remove_device() before
> blk_cleanup_queue() and thus similar races as described in 6cd18e711dd8
> can happen for SCSI as well as reported by Omar [1]. Moving
> bdi_unregister() to del_gendisk() fixes these problems as well since
> del_gendisk() gets called from sd_remove() before freeing the device
> number.
> 
> This also makes device_add_disk() (calling bdi_register_owner()) more
> symmetric with del_gendisk().
> 
> [1] http://marc.info/?l=linux-block&m=148554717109098&w=2
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 05/10] writeback: Generalize and standardize I_SYNC waiting function
  2017-02-09 12:44 ` [PATCH 05/10] writeback: Generalize and standardize I_SYNC waiting function Jan Kara
@ 2017-02-12  4:32   ` Tejun Heo
  0 siblings, 0 replies; 29+ messages in thread
From: Tejun Heo @ 2017-02-12  4:32 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, NeilBrown

On Thu, Feb 09, 2017 at 01:44:28PM +0100, Jan Kara wrote:
> __inode_wait_for_writeback() waits for I_SYNC on inode to get cleared.
> There's nothing specific regarting I_SYNC for that function. Generalize
> it so that we can use it also for I_WB_SWITCH bit. Also the function
> uses __wait_on_bit() unnecessarily. Switch it to wait_on_bit() to remove
> some code.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 08/10] block: Fix oops in locked_inode_to_wb_and_lock_list()
  2017-02-09 12:44 ` [PATCH 08/10] block: Fix oops in locked_inode_to_wb_and_lock_list() Jan Kara
@ 2017-02-12  4:40   ` Tejun Heo
  2017-02-20 16:58     ` Jan Kara
  0 siblings, 1 reply; 29+ messages in thread
From: Tejun Heo @ 2017-02-12  4:40 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, NeilBrown

Hello, Jan.

On Thu, Feb 09, 2017 at 01:44:31PM +0100, Jan Kara wrote:
> When block device is closed, we call inode_detach_wb() in __blkdev_put()
> which sets inode->i_wb to NULL. That is contrary to expectations that
> inode->i_wb stays valid once set during the whole inode's lifetime and
> leads to oops in wb_get() in locked_inode_to_wb_and_lock_list() because
> inode_to_wb() returned NULL.
> 
> The reason why we called inode_detach_wb() is not valid anymore though.
> BDI is guaranteed to stay along until we call bdi_put() from
> bdev_evict_inode() so we can postpone calling inode_detach_wb() to that
> moment. A complication is that i_wb can point to non-root wb_writeback
> structure and in that case we do need to clean it up as bdi_unregister()
> blocks waiting for all non-root wb_writeback references to get dropped.
> Thus this i_wb reference could block device removal e.g. from
> __scsi_remove_device() (which indirectly ends up calling
> bdi_unregister()). We cannot rely on block device inode to go away soon
> (and thus i_wb reference to get dropped) as the device may got
> hot-removed e.g. under a mounted filesystem. We deal with these issues
> by switching block device inode from non-root wb_writeback structure to
> bdi->wb when needed.  Since this is rather expensive (requires
> synchronize_rcu()) we do the switching only in del_gendisk() when we
> know the device is going away.

So, the only reason cgwb_bdi_destroy() is synchronous is because bdi
destruction was synchronous.  Now that bdi is properly reference
counted and can be decoupled from gendisk / q destruction, I can't
think of a reason to keep cgwb destruction synchronous.  Switching
wb's on destruction is kinda clumsy and it almost always hurts to
expose synchronize_rcu() in userland visible paths.

Wouldn't something like the following work?

* Remove bdi->usage_cnt and the synchronous waiting in
  cgwb_bdi_destroy().

* Instead, make cgwb's hold bdi->refcnt and put it from
  cgwb_release_workfn().

Then, we don't have to switch during shutdown and can just let things
drain.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 09/10] kobject: Export kobject_get_unless_zero()
  2017-02-09 12:44 ` [PATCH 09/10] kobject: Export kobject_get_unless_zero() Jan Kara
@ 2017-02-12  4:41   ` Tejun Heo
  0 siblings, 0 replies; 29+ messages in thread
From: Tejun Heo @ 2017-02-12  4:41 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, NeilBrown

On Thu, Feb 09, 2017 at 01:44:32PM +0100, Jan Kara wrote:
> Make the function available for outside use and fortify it against NULL
> kobject.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Acked-by: Tejun Heo <tj@kernel.org>

Cc Greg?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 10/10] block: Fix oops scsi_disk_get()
  2017-02-09 12:44 ` [PATCH 10/10] block: Fix oops scsi_disk_get() Jan Kara
@ 2017-02-12  4:43   ` Tejun Heo
  0 siblings, 0 replies; 29+ messages in thread
From: Tejun Heo @ 2017-02-12  4:43 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, NeilBrown

On Thu, Feb 09, 2017 at 01:44:33PM +0100, Jan Kara wrote:
> When device open races with device shutdown, we can get the following
> oops in scsi_disk_get():
> 
> [11863.044351] general protection fault: 0000 [#1] SMP
> [11863.045561] Modules linked in: scsi_debug xfs libcrc32c netconsole btrfs raid6_pq zlib_deflate lzo_compress xor [last unloaded: loop]
> [11863.047853] CPU: 3 PID: 13042 Comm: hald-probe-stor Tainted: G W      4.10.0-rc2-xen+ #35
> [11863.048030] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [11863.048030] task: ffff88007f438200 task.stack: ffffc90000fd0000
> [11863.048030] RIP: 0010:scsi_disk_get+0x43/0x70
> [11863.048030] RSP: 0018:ffffc90000fd3a08 EFLAGS: 00010202
> [11863.048030] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007f56d000 RCX: 0000000000000000
> [11863.048030] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffffff81a8d880
> [11863.048030] RBP: ffffc90000fd3a18 R08: 0000000000000000 R09: 0000000000000001
> [11863.059217] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffffa
> [11863.059217] R13: ffff880078872800 R14: ffff880070915540 R15: 000000000000001d
> [11863.059217] FS:  00007f2611f71800(0000) GS:ffff88007f0c0000(0000) knlGS:0000000000000000
> [11863.059217] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [11863.059217] CR2: 000000000060e048 CR3: 00000000778d4000 CR4: 00000000000006e0
> [11863.059217] Call Trace:
> [11863.059217]  ? disk_get_part+0x22/0x1f0
> [11863.059217]  sd_open+0x39/0x130
> [11863.059217]  __blkdev_get+0x69/0x430
> [11863.059217]  ? bd_acquire+0x7f/0xc0
> [11863.059217]  ? bd_acquire+0x96/0xc0
> [11863.059217]  ? blkdev_get+0x350/0x350
> [11863.059217]  blkdev_get+0x126/0x350
> [11863.059217]  ? _raw_spin_unlock+0x2b/0x40
> [11863.059217]  ? bd_acquire+0x7f/0xc0
> [11863.059217]  ? blkdev_get+0x350/0x350
> [11863.059217]  blkdev_open+0x65/0x80
> ...
> 
> As you can see RAX value is already poisoned showing that gendisk we got
> is already freed. The problem is that get_gendisk() looks up device
> number in ext_devt_idr and then does get_disk() which does kobject_get()
> on the disks kobject. However the disk gets removed from ext_devt_idr
> only in disk_release() (through blk_free_devt()) at which moment it has
> already 0 refcount and is already on its way to be freed. Indeed we've
> got a warning from kobject_get() about 0 refcount shortly before the
> oops.
> 
> We fix the problem by using kobject_get_unless_zero() in get_disk() so
> that get_disk() cannot get reference on a disk that is already being
> freed.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/10] block: Fix block device shutdown related races
  2017-02-09 12:44 [PATCH 0/10] block: Fix block device shutdown related races Jan Kara
                   ` (10 preceding siblings ...)
  2017-02-09 14:52 ` [PATCH 0/10] block: Fix block device shutdown related races Thiago Jung Bauermann
@ 2017-02-13 14:27 ` Thiago Jung Bauermann
  11 siblings, 0 replies; 29+ messages in thread
From: Thiago Jung Bauermann @ 2017-02-13 14:27 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Tejun Heo,
	Dan Williams, NeilBrown, lekshmicpillai

Hi,

Am Donnerstag, 9. Februar 2017, 13:44:23 BRST schrieb Jan Kara:
> People, please have a look at patches. The are mostly simple however the
> interactions are rather complex so I may have missed something. Also I'm
> happy for any additional testing these patches can get - I've stressed them
> with Omar's script, tested memcg writeback, tested static (not udev managed)
> device inodes.

Lekshmi has been stress-testing these patches for 4 days now without problems.
If this is the version that goes in, you can add:

Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com>

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 01/10] block: Move bdev_unhash_inode() after invalidate_partition()
  2017-02-12  3:58   ` Tejun Heo
@ 2017-02-20 14:53     ` Jan Kara
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2017-02-20 14:53 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jan Kara, Jens Axboe, linux-block, Christoph Hellwig,
	Dan Williams, Thiago Jung Bauermann, NeilBrown

On Sun 12-02-17 12:58:33, Tejun Heo wrote:
> On Thu, Feb 09, 2017 at 01:44:24PM +0100, Jan Kara wrote:
> > Move bdev_unhash_inode() after invalidate_partition() as
> > invalidate_partition() looks up bdev and will unnecessarily recreate it
> > if bdev_unhash_inode() destroyed it. Also use part_devt() when calling
> > bdev_unhash_inode() instead of manually creating the device number.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> Acked-by: Tejun Heo <tj@kernel.org>
> 
> > @@ -648,9 +648,8 @@ void del_gendisk(struct gendisk *disk)
> >  	disk_part_iter_init(&piter, disk,
> >  			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
> >  	while ((part = disk_part_iter_next(&piter))) {
> > -		bdev_unhash_inode(MKDEV(disk->major,
> > -					disk->first_minor + part->partno));
> >  		invalidate_partition(disk, part->partno);
> > +		bdev_unhash_inode(part_devt(part));
> >  		delete_partition(disk, part->partno);
> 
> So, before this patch, invalidate_partition() would have operated on a
> newly created inode and thus wouldn't have actually invalidated the
> existing bdev / mapping, right?

Yes, that is another effect. I'll add a note about this to the changelog.
Thanks for noting this.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 08/10] block: Fix oops in locked_inode_to_wb_and_lock_list()
  2017-02-12  4:40   ` Tejun Heo
@ 2017-02-20 16:58     ` Jan Kara
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2017-02-20 16:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jan Kara, Jens Axboe, linux-block, Christoph Hellwig,
	Dan Williams, Thiago Jung Bauermann, NeilBrown

On Sun 12-02-17 13:40:27, Tejun Heo wrote:
> Hello, Jan.
> 
> On Thu, Feb 09, 2017 at 01:44:31PM +0100, Jan Kara wrote:
> > When block device is closed, we call inode_detach_wb() in __blkdev_put()
> > which sets inode->i_wb to NULL. That is contrary to expectations that
> > inode->i_wb stays valid once set during the whole inode's lifetime and
> > leads to oops in wb_get() in locked_inode_to_wb_and_lock_list() because
> > inode_to_wb() returned NULL.
> > 
> > The reason why we called inode_detach_wb() is not valid anymore though.
> > BDI is guaranteed to stay along until we call bdi_put() from
> > bdev_evict_inode() so we can postpone calling inode_detach_wb() to that
> > moment. A complication is that i_wb can point to non-root wb_writeback
> > structure and in that case we do need to clean it up as bdi_unregister()
> > blocks waiting for all non-root wb_writeback references to get dropped.
> > Thus this i_wb reference could block device removal e.g. from
> > __scsi_remove_device() (which indirectly ends up calling
> > bdi_unregister()). We cannot rely on block device inode to go away soon
> > (and thus i_wb reference to get dropped) as the device may got
> > hot-removed e.g. under a mounted filesystem. We deal with these issues
> > by switching block device inode from non-root wb_writeback structure to
> > bdi->wb when needed.  Since this is rather expensive (requires
> > synchronize_rcu()) we do the switching only in del_gendisk() when we
> > know the device is going away.
> 
> So, the only reason cgwb_bdi_destroy() is synchronous is because bdi
> destruction was synchronous.  Now that bdi is properly reference
> counted and can be decoupled from gendisk / q destruction, I can't
> think of a reason to keep cgwb destruction synchronous.  Switching
> wb's on destruction is kinda clumsy and it almost always hurts to
> expose synchronize_rcu() in userland visible paths.
> 
> Wouldn't something like the following work?
> 
> * Remove bdi->usage_cnt and the synchronous waiting in
>   cgwb_bdi_destroy().
> 
> * Instead, make cgwb's hold bdi->refcnt and put it from
>   cgwb_release_workfn().
> 
> Then, we don't have to switch during shutdown and can just let things
> drain.

At first sight this looks workable and would mean less special code so I
like it. I'll experiment with it and see how it works out.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 08/10] block: Fix oops in locked_inode_to_wb_and_lock_list()
  2017-03-23  0:36 [PATCH 0/10 v5] " Jan Kara
@ 2017-03-23  0:37 ` Jan Kara
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2017-03-23  0:37 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Tejun Heo, Tahsin Erdogan, Omar Sandoval,
	Jan Kara

When block device is closed, we call inode_detach_wb() in __blkdev_put()
which sets inode->i_wb to NULL. That is contrary to expectations that
inode->i_wb stays valid once set during the whole inode's lifetime and
leads to oops in wb_get() in locked_inode_to_wb_and_lock_list() because
inode_to_wb() returned NULL.

The reason why we called inode_detach_wb() is not valid anymore though.
BDI is guaranteed to stay along until we call bdi_put() from
bdev_evict_inode() so we can postpone calling inode_detach_wb() to that
moment.

Also add a warning to catch if someone uses inode_detach_wb() in a
dangerous way.

Reported-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/block_dev.c            | 8 ++------
 include/linux/writeback.h | 1 +
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 53e2389ae4d4..f2d59f143ef4 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -885,6 +885,8 @@ static void bdev_evict_inode(struct inode *inode)
 	spin_lock(&bdev_lock);
 	list_del_init(&bdev->bd_list);
 	spin_unlock(&bdev_lock);
+	/* Detach inode from wb early as bdi_put() may free bdi->wb */
+	inode_detach_wb(inode);
 	if (bdev->bd_bdi != &noop_backing_dev_info) {
 		bdi_put(bdev->bd_bdi);
 		bdev->bd_bdi = &noop_backing_dev_info;
@@ -1875,12 +1877,6 @@ static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
 		kill_bdev(bdev);
 
 		bdev_write_inode(bdev);
-		/*
-		 * Detaching bdev inode from its wb in __destroy_inode()
-		 * is too late: the queue which embeds its bdi (along with
-		 * root wb) can be gone as soon as we put_disk() below.
-		 */
-		inode_detach_wb(bdev->bd_inode);
 	}
 	if (bdev->bd_contains == bdev) {
 		if (disk->fops->release)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index a3c0cbd7c888..d5815794416c 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -237,6 +237,7 @@ static inline void inode_attach_wb(struct inode *inode, struct page *page)
 static inline void inode_detach_wb(struct inode *inode)
 {
 	if (inode->i_wb) {
+		WARN_ON_ONCE(!(inode->i_state & I_CLEAR));
 		wb_put(inode->i_wb);
 		inode->i_wb = NULL;
 	}
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-03-23  0:37 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-02-09 12:44 [PATCH 0/10] block: Fix block device shutdown related races Jan Kara
2017-02-09 12:44 ` [PATCH 01/10] block: Move bdev_unhash_inode() after invalidate_partition() Jan Kara
2017-02-12  3:58   ` Tejun Heo
2017-02-20 14:53     ` Jan Kara
2017-02-09 12:44 ` [PATCH 02/10] block: Unhash also block device inode for the whole device Jan Kara
2017-02-12  4:16   ` Tejun Heo
2017-02-09 12:44 ` [PATCH 03/10] block: Revalidate i_bdev reference in bd_aquire() Jan Kara
2017-02-09 15:54   ` Jan Kara
2017-02-12  4:22     ` Tejun Heo
2017-02-09 12:44 ` [PATCH 04/10] block: Move bdi_unregister() to del_gendisk() Jan Kara
2017-02-10  2:21   ` NeilBrown
2017-02-12  4:31   ` Tejun Heo
2017-02-09 12:44 ` [PATCH 05/10] writeback: Generalize and standardize I_SYNC waiting function Jan Kara
2017-02-12  4:32   ` Tejun Heo
2017-02-09 12:44 ` [PATCH 06/10] writeback: Move __inode_wait_for_state_bit Jan Kara
2017-02-09 12:44 ` [PATCH 07/10] writeback: Implement reliable switching to default writeback structure Jan Kara
2017-02-10  2:19   ` NeilBrown
2017-02-10 13:20     ` Jan Kara
2017-02-09 12:44 ` [PATCH 08/10] block: Fix oops in locked_inode_to_wb_and_lock_list() Jan Kara
2017-02-12  4:40   ` Tejun Heo
2017-02-20 16:58     ` Jan Kara
2017-02-09 12:44 ` [PATCH 09/10] kobject: Export kobject_get_unless_zero() Jan Kara
2017-02-12  4:41   ` Tejun Heo
2017-02-09 12:44 ` [PATCH 10/10] block: Fix oops scsi_disk_get() Jan Kara
2017-02-12  4:43   ` Tejun Heo
2017-02-09 14:52 ` [PATCH 0/10] block: Fix block device shutdown related races Thiago Jung Bauermann
2017-02-09 15:48   ` Jan Kara
2017-02-13 14:27 ` Thiago Jung Bauermann
  -- strict thread matches above, loose matches on Subject: below --
2017-03-23  0:36 [PATCH 0/10 v5] " Jan Kara
2017-03-23  0:37 ` [PATCH 08/10] block: Fix oops in locked_inode_to_wb_and_lock_list() Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.