linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] fs, bdev: handle end of life
@ 2015-12-01 23:58 Dan Williams
  2015-12-01 23:58 ` [PATCH 1/3] block, fs: reliably communicate bdev end-of-life Dan Williams
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Dan Williams @ 2015-12-01 23:58 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-block, linux-nvdimm, Dave Chinner, xfs, Jens Axboe,
	Jan Kara, Tejun Heo, Matthew Wilcox, Ross Zwisler

As mentioned in [PATCH 1/3] "block, fs: reliably communicate bdev
end-of-life", historically we have waited for filesystem specific
heuristics to attempt to guess when a block device is gone.  Sometimes
this works, but in other cases the system can hang waiting for the fs to
trigger its shutdown protocol.

Now with DAX we need new actions, like unmapping all inodes, to be taken
upon a shutdown event.  Those actions need to be taken whether the
shutdown event comes from the block device being torn down, or some
other file system specific event.

For now, the approach taken in the following patches only affects xfs
and block drivers that are converted to use del_gendisk_queue().  We can
add more filesystems and driver support over time.

Note that 'bdi_gone' was chosen over 'shutdown' so as not to be confused
with generic_shutdown_super()

---

Dan Williams (3):
      block, fs: reliably communicate bdev end-of-life
      xfs: handle shutdown notifications
      writeback: fix false positive WARN in __mark_inode_dirty


 block/genhd.c                |   87 +++++++++++++++++++++++++++++++++++-------
 drivers/block/brd.c          |    3 -
 drivers/nvdimm/pmem.c        |    3 -
 drivers/s390/block/dcssblk.c |    6 +--
 fs/block_dev.c               |   79 +++++++++++++++++++++++++++++++++-----
 fs/xfs/xfs_super.c           |    9 ++++
 include/linux/fs.h           |    4 ++
 include/linux/genhd.h        |    1 
 mm/backing-dev.c             |    7 +++
 9 files changed, 166 insertions(+), 33 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] block, fs: reliably communicate bdev end-of-life
  2015-12-01 23:58 [PATCH 0/3] fs, bdev: handle end of life Dan Williams
@ 2015-12-01 23:58 ` Dan Williams
  2015-12-02 11:16   ` Jan Kara
  2015-12-01 23:58 ` [PATCH 2/3] xfs: handle shutdown notifications Dan Williams
  2015-12-01 23:58 ` [PATCH 3/3] writeback: fix false positive WARN in __mark_inode_dirty Dan Williams
  2 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2015-12-01 23:58 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-block, linux-nvdimm, Dave Chinner, Jens Axboe, Jan Kara,
	Matthew Wilcox, Ross Zwisler

Historically we have waited for filesystem specific heuristics to
attempt to guess when a block device is gone.  Sometimes this works, but
in other cases the system can hang waiting for the fs to trigger its
shutdown protocol.

The initial motivation for this investigation was to prevent DAX
mappings (direct mmap access to persistent memory) from leaking past the
lifetime of the hosting block device.  However, Dave points out that
these shutdown operations are needed in other scenarios.  Quoting Dave:

    For example, if we detect a free space corruption during allocation,
    it is not safe to trust *any active mapping* because we can't trust
    that we having handed out the same block to multiple owners. Hence
    on such a filesystem shutdown, we have to prevent any new DAX
    mapping from occurring and invalidate all existing mappings as we
    cannot allow userspace to modify any data or metadata until we've
    resolved the corruption situation.

The current block device shutdown sequence of del_gendisk +
blk_cleanup_queue is problematic.  We want to tell the fs after
blk_cleanup_queue that there is no possibility of recovery, but by that
time we have deleted partitions and lost the ability to find all the
super-blocks on a block device.

Introduce del_gendisk_queue to trigger ->quiesce() and ->bdi_gone()
notifications to all the filesystems hosted on the disk.  Where
->quiesce() are 'shutdown' operations while the bdev may still be alive,
and ->bdi_gone() is a set of actions to take after the backing device
is known to be permanently dead.

generic_quiesce_super and generic_bdi_gone, are the default operations
when a filesystem does not implement ->quiesce(), ->bdi_gone().  They
invalidate inodes and unmap DAX-inodes respectively.  For now only
->bdi_gone() has an associated super operation as xfs will implement
->bdi_gone() in a later patch.

Cc: Jan Kara <jack@suse.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 block/genhd.c                |   87 +++++++++++++++++++++++++++++++++++-------
 drivers/block/brd.c          |    3 -
 drivers/nvdimm/pmem.c        |    3 -
 drivers/s390/block/dcssblk.c |    6 +--
 fs/block_dev.c               |   78 ++++++++++++++++++++++++++++++++------
 include/linux/fs.h           |    2 +
 include/linux/genhd.h        |    1 
 7 files changed, 147 insertions(+), 33 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index e5cafa51567c..967fcfd63c98 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -634,24 +634,14 @@ void add_disk(struct gendisk *disk)
 }
 EXPORT_SYMBOL(add_disk);
 
-void del_gendisk(struct gendisk *disk)
+static void del_gendisk_start(struct gendisk *disk)
 {
-	struct disk_part_iter piter;
-	struct hd_struct *part;
-
 	blk_integrity_del(disk);
 	disk_del_events(disk);
+}
 
-	/* invalidate stuff */
-	disk_part_iter_init(&piter, disk,
-			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
-	while ((part = disk_part_iter_next(&piter))) {
-		invalidate_partition(disk, part->partno);
-		delete_partition(disk, part->partno);
-	}
-	disk_part_iter_exit(&piter);
-
-	invalidate_partition(disk, 0);
+static void del_gendisk_end(struct gendisk *disk)
+{
 	set_capacity(disk, 0);
 	disk->flags &= ~GENHD_FL_UP;
 
@@ -670,9 +660,78 @@ void del_gendisk(struct gendisk *disk)
 	pm_runtime_set_memalloc_noio(disk_to_dev(disk), false);
 	device_del(disk_to_dev(disk));
 }
+
+#define for_each_part(part, piter) \
+	for (part = disk_part_iter_next(piter); part; \
+			part = disk_part_iter_next(piter))
+void del_gendisk(struct gendisk *disk)
+{
+	struct disk_part_iter piter;
+	struct hd_struct *part;
+
+	del_gendisk_start(disk);
+
+	/* invalidate stuff */
+	disk_part_iter_init(&piter, disk,
+			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
+	for_each_part(part, &piter) {
+		invalidate_partition(disk, part->partno);
+		delete_partition(disk, part->partno);
+	}
+	disk_part_iter_exit(&piter);
+	invalidate_partition(disk, 0);
+
+	del_gendisk_end(disk);
+}
 EXPORT_SYMBOL(del_gendisk);
 
 /**
+ * del_gendisk_queue - combined del_gendisk + blk_cleanup_queue
+ * @disk: disk to delete, invalidate, unmap, and end i/o
+ *
+ * This alternative for open coded calls to:
+ *     del_gendisk()
+ *     blk_cleanup_queue()
+ * ...is for notifying the filesystem that the block device has gone
+ * away.  This distinction is important for triggering a filesystem to
+ * abort its error recovery and for (DAX) capable devices.  DAX bypasses
+ * page cache and mappings go directly to storage media.  When such a
+ * disk is removed the pfn backing a mapping may be invalid or removed
+ * from the system.  Upon return accessing DAX mappings of this disk
+ * will trigger SIGBUS.
+ */
+void del_gendisk_queue(struct gendisk *disk)
+{
+	struct disk_part_iter piter;
+	struct hd_struct *part;
+
+	del_gendisk_start(disk);
+
+	/* pass1 sync fs + evict idle inodes */
+	disk_part_iter_init(&piter, disk,
+			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
+	for_each_part(part, &piter)
+		invalidate_partition(disk, part->partno);
+	disk_part_iter_exit(&piter);
+	invalidate_partition(disk, 0);
+
+	blk_cleanup_queue(disk->queue);
+
+	/* pass2 the queue is dead, trigger fs shutdown via ->bdi_gone() */
+	disk_part_iter_init(&piter, disk,
+			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
+	for_each_part(part, &piter) {
+		shutdown_partition(disk, part->partno);
+		delete_partition(disk, part->partno);
+	}
+	disk_part_iter_exit(&piter);
+	shutdown_partition(disk, 0);
+
+	del_gendisk_end(disk);
+}
+EXPORT_SYMBOL(del_gendisk_queue);
+
+/**
  * get_gendisk - get partitioning information for a given device
  * @devt: device to get partitioning information for
  * @partno: returned partition index
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index a5880f4ab40e..f149dd57bba3 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -532,7 +532,6 @@ out:
 static void brd_free(struct brd_device *brd)
 {
 	put_disk(brd->brd_disk);
-	blk_cleanup_queue(brd->brd_queue);
 	brd_free_pages(brd);
 	kfree(brd);
 }
@@ -560,7 +559,7 @@ out:
 static void brd_del_one(struct brd_device *brd)
 {
 	list_del(&brd->brd_list);
-	del_gendisk(brd->brd_disk);
+	del_gendisk_queue(brd->brd_disk);
 	brd_free(brd);
 }
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 8ee79893d2f5..6dd06e9d34b0 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -158,9 +158,8 @@ static void pmem_detach_disk(struct pmem_device *pmem)
 	if (!pmem->pmem_disk)
 		return;
 
-	del_gendisk(pmem->pmem_disk);
+	del_gendisk_queue(pmem->pmem_disk);
 	put_disk(pmem->pmem_disk);
-	blk_cleanup_queue(pmem->pmem_queue);
 }
 
 static int pmem_attach_disk(struct device *dev,
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 94a8f4ab57bc..0c3c968b57d9 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -388,8 +388,7 @@ removeseg:
 	}
 	list_del(&dev_info->lh);
 
-	del_gendisk(dev_info->gd);
-	blk_cleanup_queue(dev_info->dcssblk_queue);
+	del_gendisk_queue(dev_info->gd);
 	dev_info->gd->queue = NULL;
 	put_disk(dev_info->gd);
 	up_write(&dcssblk_devices_sem);
@@ -751,8 +750,7 @@ dcssblk_remove_store(struct device *dev, struct device_attribute *attr, const ch
 	}
 
 	list_del(&dev_info->lh);
-	del_gendisk(dev_info->gd);
-	blk_cleanup_queue(dev_info->dcssblk_queue);
+	del_gendisk_queue(dev_info->gd);
 	dev_info->gd->queue = NULL;
 	put_disk(dev_info->gd);
 	device_unregister(&dev_info->dev);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 1dd416bf72f7..39989e990df9 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1774,27 +1774,83 @@ fail:
 }
 EXPORT_SYMBOL(lookup_bdev);
 
+static int generic_quiesce_super(struct super_block *sb, bool kill_dirty)
+{
+	/*
+	 * no need to lock the super, get_super holds the read mutex so
+	 * the filesystem cannot go away under us (->put_super runs with
+	 * the write lock held).
+	 */
+	shrink_dcache_sb(sb);
+	return invalidate_inodes(sb, kill_dirty);
+}
+
 int __invalidate_device(struct block_device *bdev, bool kill_dirty)
 {
 	struct super_block *sb = get_super(bdev);
 	int res = 0;
 
-	if (sb) {
-		/*
-		 * no need to lock the super, get_super holds the
-		 * read mutex so the filesystem cannot go away
-		 * under us (->put_super runs with the write lock
-		 * hold).
-		 */
-		shrink_dcache_sb(sb);
-		res = invalidate_inodes(sb, kill_dirty);
-		drop_super(sb);
-	}
+	if (!sb)
+		goto out;
+
+	res = generic_quiesce_super(sb, kill_dirty);
+	drop_super(sb);
+ out:
 	invalidate_bdev(bdev);
+
 	return res;
 }
 EXPORT_SYMBOL(__invalidate_device);
 
+static void generic_bdi_gone(struct super_block *sb)
+{
+	struct inode *inode, *_inode = NULL;
+
+	spin_lock(&sb->s_inode_list_lock);
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		spin_lock(&inode->i_lock);
+		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
+				|| !IS_DAX(inode)) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+		__iget(inode);
+		spin_unlock(&inode->i_lock);
+		spin_unlock(&sb->s_inode_list_lock);
+
+		unmap_mapping_range(inode->i_mapping, 0, 0, 1);
+		iput(_inode);
+		_inode = inode;
+		cond_resched();
+
+		spin_lock(&sb->s_inode_list_lock);
+	}
+	spin_unlock(&sb->s_inode_list_lock);
+	iput(_inode);
+}
+
+void shutdown_partition(struct gendisk *disk, int partno)
+{
+	struct block_device *bdev = bdget_disk(disk, partno);
+	struct super_block *sb = get_super(bdev);
+
+	if (!bdev)
+		return;
+
+	if (!sb) {
+		bdput(bdev);
+		return;
+	}
+
+	if (sb->s_op->bdi_gone)
+		sb->s_op->bdi_gone(sb);
+	else
+		generic_bdi_gone(sb);
+
+	drop_super(sb);
+	bdput(bdev);
+}
+
 void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg)
 {
 	struct inode *inode, *old_inode = NULL;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3aa514254161..0e201ed38045 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1713,6 +1713,7 @@ struct super_operations {
 				  struct shrink_control *);
 	long (*free_cached_objects)(struct super_block *,
 				    struct shrink_control *);
+	void (*bdi_gone)(struct super_block *);
 };
 
 /*
@@ -2390,6 +2391,7 @@ extern int revalidate_disk(struct gendisk *);
 extern int check_disk_change(struct block_device *);
 extern int __invalidate_device(struct block_device *, bool);
 extern int invalidate_partition(struct gendisk *, int);
+extern void shutdown_partition(struct gendisk *, int);
 #endif
 unsigned long invalidate_mapping_pages(struct address_space *mapping,
 					pgoff_t start, pgoff_t end);
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 847cc1d91634..028cf15a8a57 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -431,6 +431,7 @@ extern void part_round_stats(int cpu, struct hd_struct *part);
 /* block/genhd.c */
 extern void add_disk(struct gendisk *disk);
 extern void del_gendisk(struct gendisk *gp);
+extern void del_gendisk_queue(struct gendisk *disk);
 extern struct gendisk *get_gendisk(dev_t dev, int *partno);
 extern struct block_device *bdget_disk(struct gendisk *disk, int partno);
 


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] xfs: handle shutdown notifications
  2015-12-01 23:58 [PATCH 0/3] fs, bdev: handle end of life Dan Williams
  2015-12-01 23:58 ` [PATCH 1/3] block, fs: reliably communicate bdev end-of-life Dan Williams
@ 2015-12-01 23:58 ` Dan Williams
  2015-12-23  2:39   ` Dan Williams
  2016-01-03 21:05   ` Dave Chinner
  2015-12-01 23:58 ` [PATCH 3/3] writeback: fix false positive WARN in __mark_inode_dirty Dan Williams
  2 siblings, 2 replies; 10+ messages in thread
From: Dan Williams @ 2015-12-01 23:58 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-block, linux-nvdimm, Dave Chinner, xfs

Force a filesystem shutdown when the backing device is known to be dead.
I.e. blk_queue_enter() permanently returns -ENODEV.

Cc: xfs@oss.sgi.com
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/block_dev.c     |    3 ++-
 fs/xfs/xfs_super.c |    9 +++++++++
 include/linux/fs.h |    2 ++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 39989e990df9..dfe9a53a7c53 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1802,7 +1802,7 @@ int __invalidate_device(struct block_device *bdev, bool kill_dirty)
 }
 EXPORT_SYMBOL(__invalidate_device);
 
-static void generic_bdi_gone(struct super_block *sb)
+void generic_bdi_gone(struct super_block *sb)
 {
 	struct inode *inode, *_inode = NULL;
 
@@ -1828,6 +1828,7 @@ static void generic_bdi_gone(struct super_block *sb)
 	spin_unlock(&sb->s_inode_list_lock);
 	iput(_inode);
 }
+EXPORT_SYMBOL(generic_bdi_gone);
 
 void shutdown_partition(struct gendisk *disk, int partno)
 {
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 36bd8825bfb0..63c36508e9db 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1618,6 +1618,14 @@ xfs_fs_free_cached_objects(
 	return xfs_reclaim_inodes_nr(XFS_M(sb), sc->nr_to_scan);
 }
 
+static void
+xfs_fs_bdi_gone(
+	struct super_block *sb)
+{
+	xfs_force_shutdown(XFS_M(sb), SHUTDOWN_DEVICE_REQ);
+	generic_bdi_gone(sb);
+}
+
 static const struct super_operations xfs_super_operations = {
 	.alloc_inode		= xfs_fs_alloc_inode,
 	.destroy_inode		= xfs_fs_destroy_inode,
@@ -1632,6 +1640,7 @@ static const struct super_operations xfs_super_operations = {
 	.show_options		= xfs_fs_show_options,
 	.nr_cached_objects	= xfs_fs_nr_cached_objects,
 	.free_cached_objects	= xfs_fs_free_cached_objects,
+	.bdi_gone		= xfs_fs_bdi_gone,
 };
 
 static struct file_system_type xfs_fs_type = {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0e201ed38045..b1e8e049e4b8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2265,6 +2265,7 @@ extern struct super_block *freeze_bdev(struct block_device *);
 extern void emergency_thaw_all(void);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
 extern int fsync_bdev(struct block_device *);
+extern void generic_bdi_gone(struct super_block *sb);
 
 extern struct super_block *blockdev_superblock;
 
@@ -2277,6 +2278,7 @@ static inline void bd_forget(struct inode *inode) {}
 static inline int sync_blockdev(struct block_device *bdev) { return 0; }
 static inline void kill_bdev(struct block_device *bdev) {}
 static inline void invalidate_bdev(struct block_device *bdev) {}
+static inline void generic_bdi_gone(struct super_block *sb) {}
 
 static inline struct super_block *freeze_bdev(struct block_device *sb)
 {


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] writeback: fix false positive WARN in __mark_inode_dirty
  2015-12-01 23:58 [PATCH 0/3] fs, bdev: handle end of life Dan Williams
  2015-12-01 23:58 ` [PATCH 1/3] block, fs: reliably communicate bdev end-of-life Dan Williams
  2015-12-01 23:58 ` [PATCH 2/3] xfs: handle shutdown notifications Dan Williams
@ 2015-12-01 23:58 ` Dan Williams
  2015-12-02 10:35   ` Jan Kara
  2 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2015-12-01 23:58 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-block, linux-nvdimm, Dave Chinner, Jens Axboe, Jan Kara,
	Tejun Heo

This warning was added as a debugging aid way back in commit
500b067c5e6c "writeback: check for registered bdi in flusher add and
inode dirty" when we were switching over to per-bdi writeback.

Once the block device has been torn down it's no longer useful to
complain about unregistered bdi's.  Clear the writeback capability under
the the wb->list_lock(), so that __mark_inode_dirty has no opportunity
to race bdi_unregister() to this WARN() condition.

Alternatively we could just delete the warning...

Found this while testing block device remove from underneath an active
fs triggering traces like:

 WARNING: CPU: 6 PID: 2129 at fs/fs-writeback.c:2065 __mark_inode_dirty+0x261/0x350()
 bdi-block not registered
 [..]
 Call Trace:
  [<ffffffff81459f02>] dump_stack+0x44/0x62
  [<ffffffff810a1f32>] warn_slowpath_common+0x82/0xc0
  [<ffffffff810a1fcc>] warn_slowpath_fmt+0x5c/0x80
  [<ffffffff812830b1>] __mark_inode_dirty+0x261/0x350
  [<ffffffff8126d019>] generic_update_time+0x79/0xd0
  [<ffffffff8126d19d>] file_update_time+0xbd/0x110
  [<ffffffff812e4ab8>] ext4_dax_fault+0x68/0x110
  [<ffffffff811f7f3e>] __do_fault+0x4e/0xf0
  [<ffffffff811fc077>] handle_mm_fault+0x5e7/0x1b50
  [<ffffffff811fbae1>] ? handle_mm_fault+0x51/0x1b50
  [<ffffffff81068981>] __do_page_fault+0x191/0x3f0
  [<ffffffff81068caf>] trace_do_page_fault+0x4f/0x120
  [<ffffffff810630fa>] do_async_page_fault+0x1a/0xa0
  [<ffffffff819023f8>] async_page_fault+0x28/0x30

Cc: Jan Kara <jack@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 mm/backing-dev.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 8ed2ffd963c5..24e61acf74a7 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -343,10 +343,17 @@ static void wb_shutdown(struct bdi_writeback *wb)
 {
 	/* Make sure nobody queues further work */
 	spin_lock_bh(&wb->work_lock);
+
 	if (!test_and_clear_bit(WB_registered, &wb->state)) {
 		spin_unlock_bh(&wb->work_lock);
 		return;
 	}
+
+	/* tell __mark_inode_dirty that writeback is no longer possible */
+	spin_lock(&wb->list_lock);
+	wb->bdi->capabilities |= BDI_CAP_NO_WRITEBACK;
+	spin_unlock(&wb->list_lock);
+
 	spin_unlock_bh(&wb->work_lock);
 
 	/*


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/3] writeback: fix false positive WARN in __mark_inode_dirty
  2015-12-01 23:58 ` [PATCH 3/3] writeback: fix false positive WARN in __mark_inode_dirty Dan Williams
@ 2015-12-02 10:35   ` Jan Kara
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2015-12-02 10:35 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-fsdevel, linux-block, linux-nvdimm, Dave Chinner,
	Jens Axboe, Jan Kara, Tejun Heo

On Tue 01-12-15 15:58:52, Dan Williams wrote:
> This warning was added as a debugging aid way back in commit
> 500b067c5e6c "writeback: check for registered bdi in flusher add and
> inode dirty" when we were switching over to per-bdi writeback.
> 
> Once the block device has been torn down it's no longer useful to
> complain about unregistered bdi's.  Clear the writeback capability under
> the the wb->list_lock(), so that __mark_inode_dirty has no opportunity
> to race bdi_unregister() to this WARN() condition.
> 
> Alternatively we could just delete the warning...
> 
> Found this while testing block device remove from underneath an active
> fs triggering traces like:
> 
>  WARNING: CPU: 6 PID: 2129 at fs/fs-writeback.c:2065 __mark_inode_dirty+0x261/0x350()
>  bdi-block not registered
>  [..]
>  Call Trace:
>   [<ffffffff81459f02>] dump_stack+0x44/0x62
>   [<ffffffff810a1f32>] warn_slowpath_common+0x82/0xc0
>   [<ffffffff810a1fcc>] warn_slowpath_fmt+0x5c/0x80
>   [<ffffffff812830b1>] __mark_inode_dirty+0x261/0x350
>   [<ffffffff8126d019>] generic_update_time+0x79/0xd0
>   [<ffffffff8126d19d>] file_update_time+0xbd/0x110
>   [<ffffffff812e4ab8>] ext4_dax_fault+0x68/0x110
>   [<ffffffff811f7f3e>] __do_fault+0x4e/0xf0
>   [<ffffffff811fc077>] handle_mm_fault+0x5e7/0x1b50
>   [<ffffffff811fbae1>] ? handle_mm_fault+0x51/0x1b50
>   [<ffffffff81068981>] __do_page_fault+0x191/0x3f0
>   [<ffffffff81068caf>] trace_do_page_fault+0x4f/0x120
>   [<ffffffff810630fa>] do_async_page_fault+0x1a/0xa0
>   [<ffffffff819023f8>] async_page_fault+0x28/0x30
> 
> Cc: Jan Kara <jack@suse.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

OK, looks good to me. There is still a tiny race window where the warning
could trigger but I don't think it's worth caring about. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  mm/backing-dev.c |    7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 8ed2ffd963c5..24e61acf74a7 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -343,10 +343,17 @@ static void wb_shutdown(struct bdi_writeback *wb)
>  {
>  	/* Make sure nobody queues further work */
>  	spin_lock_bh(&wb->work_lock);
> +
>  	if (!test_and_clear_bit(WB_registered, &wb->state)) {
>  		spin_unlock_bh(&wb->work_lock);
>  		return;
>  	}
> +
> +	/* tell __mark_inode_dirty that writeback is no longer possible */
> +	spin_lock(&wb->list_lock);
> +	wb->bdi->capabilities |= BDI_CAP_NO_WRITEBACK;
> +	spin_unlock(&wb->list_lock);
> +
>  	spin_unlock_bh(&wb->work_lock);
>  
>  	/*
> 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] block, fs: reliably communicate bdev end-of-life
  2015-12-01 23:58 ` [PATCH 1/3] block, fs: reliably communicate bdev end-of-life Dan Williams
@ 2015-12-02 11:16   ` Jan Kara
  2015-12-02 15:37     ` Dan Williams
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2015-12-02 11:16 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-fsdevel, linux-block, linux-nvdimm, Dave Chinner,
	Jens Axboe, Jan Kara, Matthew Wilcox, Ross Zwisler

On Tue 01-12-15 15:58:41, Dan Williams wrote:
> Historically we have waited for filesystem specific heuristics to
> attempt to guess when a block device is gone.  Sometimes this works, but
> in other cases the system can hang waiting for the fs to trigger its
> shutdown protocol.
> 
> The initial motivation for this investigation was to prevent DAX
> mappings (direct mmap access to persistent memory) from leaking past the
> lifetime of the hosting block device.  However, Dave points out that
> these shutdown operations are needed in other scenarios.  Quoting Dave:
> 
>     For example, if we detect a free space corruption during allocation,
>     it is not safe to trust *any active mapping* because we can't trust
>     that we having handed out the same block to multiple owners. Hence
>     on such a filesystem shutdown, we have to prevent any new DAX
>     mapping from occurring and invalidate all existing mappings as we
>     cannot allow userspace to modify any data or metadata until we've
>     resolved the corruption situation.
> 
> The current block device shutdown sequence of del_gendisk +
> blk_cleanup_queue is problematic.  We want to tell the fs after
> blk_cleanup_queue that there is no possibility of recovery, but by that
> time we have deleted partitions and lost the ability to find all the
> super-blocks on a block device.
> 
> Introduce del_gendisk_queue to trigger ->quiesce() and ->bdi_gone()
> notifications to all the filesystems hosted on the disk.  Where
> ->quiesce() are 'shutdown' operations while the bdev may still be alive,
> and ->bdi_gone() is a set of actions to take after the backing device
> is known to be permanently dead.
> 
> generic_quiesce_super and generic_bdi_gone, are the default operations
> when a filesystem does not implement ->quiesce(), ->bdi_gone().  They
> invalidate inodes and unmap DAX-inodes respectively.  For now only
> ->bdi_gone() has an associated super operation as xfs will implement
> ->bdi_gone() in a later patch.

So the patch looks good. Just for the callbacks to be generally useful we
would need del_gendisk_queue() to be used for all devices, not just for
pmem ones... But I guess this is better than nothing.

								Honza
 
> Cc: Jan Kara <jack@suse.com>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Matthew Wilcox <willy@linux.intel.com>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  block/genhd.c                |   87 +++++++++++++++++++++++++++++++++++-------
>  drivers/block/brd.c          |    3 -
>  drivers/nvdimm/pmem.c        |    3 -
>  drivers/s390/block/dcssblk.c |    6 +--
>  fs/block_dev.c               |   78 ++++++++++++++++++++++++++++++++------
>  include/linux/fs.h           |    2 +
>  include/linux/genhd.h        |    1 
>  7 files changed, 147 insertions(+), 33 deletions(-)
> 
> diff --git a/block/genhd.c b/block/genhd.c
> index e5cafa51567c..967fcfd63c98 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -634,24 +634,14 @@ void add_disk(struct gendisk *disk)
>  }
>  EXPORT_SYMBOL(add_disk);
>  
> -void del_gendisk(struct gendisk *disk)
> +static void del_gendisk_start(struct gendisk *disk)
>  {
> -	struct disk_part_iter piter;
> -	struct hd_struct *part;
> -
>  	blk_integrity_del(disk);
>  	disk_del_events(disk);
> +}
>  
> -	/* invalidate stuff */
> -	disk_part_iter_init(&piter, disk,
> -			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
> -	while ((part = disk_part_iter_next(&piter))) {
> -		invalidate_partition(disk, part->partno);
> -		delete_partition(disk, part->partno);
> -	}
> -	disk_part_iter_exit(&piter);
> -
> -	invalidate_partition(disk, 0);
> +static void del_gendisk_end(struct gendisk *disk)
> +{
>  	set_capacity(disk, 0);
>  	disk->flags &= ~GENHD_FL_UP;
>  
> @@ -670,9 +660,78 @@ void del_gendisk(struct gendisk *disk)
>  	pm_runtime_set_memalloc_noio(disk_to_dev(disk), false);
>  	device_del(disk_to_dev(disk));
>  }
> +
> +#define for_each_part(part, piter) \
> +	for (part = disk_part_iter_next(piter); part; \
> +			part = disk_part_iter_next(piter))
> +void del_gendisk(struct gendisk *disk)
> +{
> +	struct disk_part_iter piter;
> +	struct hd_struct *part;
> +
> +	del_gendisk_start(disk);
> +
> +	/* invalidate stuff */
> +	disk_part_iter_init(&piter, disk,
> +			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
> +	for_each_part(part, &piter) {
> +		invalidate_partition(disk, part->partno);
> +		delete_partition(disk, part->partno);
> +	}
> +	disk_part_iter_exit(&piter);
> +	invalidate_partition(disk, 0);
> +
> +	del_gendisk_end(disk);
> +}
>  EXPORT_SYMBOL(del_gendisk);
>  
>  /**
> + * del_gendisk_queue - combined del_gendisk + blk_cleanup_queue
> + * @disk: disk to delete, invalidate, unmap, and end i/o
> + *
> + * This alternative for open coded calls to:
> + *     del_gendisk()
> + *     blk_cleanup_queue()
> + * ...is for notifying the filesystem that the block device has gone
> + * away.  This distinction is important for triggering a filesystem to
> + * abort its error recovery and for (DAX) capable devices.  DAX bypasses
> + * page cache and mappings go directly to storage media.  When such a
> + * disk is removed the pfn backing a mapping may be invalid or removed
> + * from the system.  Upon return accessing DAX mappings of this disk
> + * will trigger SIGBUS.
> + */
> +void del_gendisk_queue(struct gendisk *disk)
> +{
> +	struct disk_part_iter piter;
> +	struct hd_struct *part;
> +
> +	del_gendisk_start(disk);
> +
> +	/* pass1 sync fs + evict idle inodes */
> +	disk_part_iter_init(&piter, disk,
> +			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
> +	for_each_part(part, &piter)
> +		invalidate_partition(disk, part->partno);
> +	disk_part_iter_exit(&piter);
> +	invalidate_partition(disk, 0);
> +
> +	blk_cleanup_queue(disk->queue);
> +
> +	/* pass2 the queue is dead, trigger fs shutdown via ->bdi_gone() */
> +	disk_part_iter_init(&piter, disk,
> +			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
> +	for_each_part(part, &piter) {
> +		shutdown_partition(disk, part->partno);
> +		delete_partition(disk, part->partno);
> +	}
> +	disk_part_iter_exit(&piter);
> +	shutdown_partition(disk, 0);
> +
> +	del_gendisk_end(disk);
> +}
> +EXPORT_SYMBOL(del_gendisk_queue);
> +
> +/**
>   * get_gendisk - get partitioning information for a given device
>   * @devt: device to get partitioning information for
>   * @partno: returned partition index
> diff --git a/drivers/block/brd.c b/drivers/block/brd.c
> index a5880f4ab40e..f149dd57bba3 100644
> --- a/drivers/block/brd.c
> +++ b/drivers/block/brd.c
> @@ -532,7 +532,6 @@ out:
>  static void brd_free(struct brd_device *brd)
>  {
>  	put_disk(brd->brd_disk);
> -	blk_cleanup_queue(brd->brd_queue);
>  	brd_free_pages(brd);
>  	kfree(brd);
>  }
> @@ -560,7 +559,7 @@ out:
>  static void brd_del_one(struct brd_device *brd)
>  {
>  	list_del(&brd->brd_list);
> -	del_gendisk(brd->brd_disk);
> +	del_gendisk_queue(brd->brd_disk);
>  	brd_free(brd);
>  }
>  
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 8ee79893d2f5..6dd06e9d34b0 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -158,9 +158,8 @@ static void pmem_detach_disk(struct pmem_device *pmem)
>  	if (!pmem->pmem_disk)
>  		return;
>  
> -	del_gendisk(pmem->pmem_disk);
> +	del_gendisk_queue(pmem->pmem_disk);
>  	put_disk(pmem->pmem_disk);
> -	blk_cleanup_queue(pmem->pmem_queue);
>  }
>  
>  static int pmem_attach_disk(struct device *dev,
> diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
> index 94a8f4ab57bc..0c3c968b57d9 100644
> --- a/drivers/s390/block/dcssblk.c
> +++ b/drivers/s390/block/dcssblk.c
> @@ -388,8 +388,7 @@ removeseg:
>  	}
>  	list_del(&dev_info->lh);
>  
> -	del_gendisk(dev_info->gd);
> -	blk_cleanup_queue(dev_info->dcssblk_queue);
> +	del_gendisk_queue(dev_info->gd);
>  	dev_info->gd->queue = NULL;
>  	put_disk(dev_info->gd);
>  	up_write(&dcssblk_devices_sem);
> @@ -751,8 +750,7 @@ dcssblk_remove_store(struct device *dev, struct device_attribute *attr, const ch
>  	}
>  
>  	list_del(&dev_info->lh);
> -	del_gendisk(dev_info->gd);
> -	blk_cleanup_queue(dev_info->dcssblk_queue);
> +	del_gendisk_queue(dev_info->gd);
>  	dev_info->gd->queue = NULL;
>  	put_disk(dev_info->gd);
>  	device_unregister(&dev_info->dev);
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 1dd416bf72f7..39989e990df9 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1774,27 +1774,83 @@ fail:
>  }
>  EXPORT_SYMBOL(lookup_bdev);
>  
> +static int generic_quiesce_super(struct super_block *sb, bool kill_dirty)
> +{
> +	/*
> +	 * no need to lock the super, get_super holds the read mutex so
> +	 * the filesystem cannot go away under us (->put_super runs with
> +	 * the write lock held).
> +	 */
> +	shrink_dcache_sb(sb);
> +	return invalidate_inodes(sb, kill_dirty);
> +}
> +
>  int __invalidate_device(struct block_device *bdev, bool kill_dirty)
>  {
>  	struct super_block *sb = get_super(bdev);
>  	int res = 0;
>  
> -	if (sb) {
> -		/*
> -		 * no need to lock the super, get_super holds the
> -		 * read mutex so the filesystem cannot go away
> -		 * under us (->put_super runs with the write lock
> -		 * hold).
> -		 */
> -		shrink_dcache_sb(sb);
> -		res = invalidate_inodes(sb, kill_dirty);
> -		drop_super(sb);
> -	}
> +	if (!sb)
> +		goto out;
> +
> +	res = generic_quiesce_super(sb, kill_dirty);
> +	drop_super(sb);
> + out:
>  	invalidate_bdev(bdev);
> +
>  	return res;
>  }
>  EXPORT_SYMBOL(__invalidate_device);
>  
> +static void generic_bdi_gone(struct super_block *sb)
> +{
> +	struct inode *inode, *_inode = NULL;
> +
> +	spin_lock(&sb->s_inode_list_lock);
> +	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> +		spin_lock(&inode->i_lock);
> +		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
> +				|| !IS_DAX(inode)) {
> +			spin_unlock(&inode->i_lock);
> +			continue;
> +		}
> +		__iget(inode);
> +		spin_unlock(&inode->i_lock);
> +		spin_unlock(&sb->s_inode_list_lock);
> +
> +		unmap_mapping_range(inode->i_mapping, 0, 0, 1);
> +		iput(_inode);
> +		_inode = inode;
> +		cond_resched();
> +
> +		spin_lock(&sb->s_inode_list_lock);
> +	}
> +	spin_unlock(&sb->s_inode_list_lock);
> +	iput(_inode);
> +}
> +
> +void shutdown_partition(struct gendisk *disk, int partno)
> +{
> +	struct block_device *bdev = bdget_disk(disk, partno);
> +	struct super_block *sb = get_super(bdev);
> +
> +	if (!bdev)
> +		return;
> +
> +	if (!sb) {
> +		bdput(bdev);
> +		return;
> +	}
> +
> +	if (sb->s_op->bdi_gone)
> +		sb->s_op->bdi_gone(sb);
> +	else
> +		generic_bdi_gone(sb);
> +
> +	drop_super(sb);
> +	bdput(bdev);
> +}
> +
>  void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg)
>  {
>  	struct inode *inode, *old_inode = NULL;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3aa514254161..0e201ed38045 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1713,6 +1713,7 @@ struct super_operations {
>  				  struct shrink_control *);
>  	long (*free_cached_objects)(struct super_block *,
>  				    struct shrink_control *);
> +	void (*bdi_gone)(struct super_block *);
>  };
>  
>  /*
> @@ -2390,6 +2391,7 @@ extern int revalidate_disk(struct gendisk *);
>  extern int check_disk_change(struct block_device *);
>  extern int __invalidate_device(struct block_device *, bool);
>  extern int invalidate_partition(struct gendisk *, int);
> +extern void shutdown_partition(struct gendisk *, int);
>  #endif
>  unsigned long invalidate_mapping_pages(struct address_space *mapping,
>  					pgoff_t start, pgoff_t end);
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index 847cc1d91634..028cf15a8a57 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -431,6 +431,7 @@ extern void part_round_stats(int cpu, struct hd_struct *part);
>  /* block/genhd.c */
>  extern void add_disk(struct gendisk *disk);
>  extern void del_gendisk(struct gendisk *gp);
> +extern void del_gendisk_queue(struct gendisk *disk);
>  extern struct gendisk *get_gendisk(dev_t dev, int *partno);
>  extern struct block_device *bdget_disk(struct gendisk *disk, int partno);
>  
> 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] block, fs: reliably communicate bdev end-of-life
  2015-12-02 11:16   ` Jan Kara
@ 2015-12-02 15:37     ` Dan Williams
  2015-12-03 12:38       ` Jan Kara
  0 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2015-12-02 15:37 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, linux-block, linux-nvdimm@lists.01.org,
	Dave Chinner, Jens Axboe, Jan Kara, Matthew Wilcox, Ross Zwisler

On Wed, Dec 2, 2015 at 3:16 AM, Jan Kara <jack@suse.cz> wrote:
>> generic_quiesce_super and generic_bdi_gone, are the default operations
>> when a filesystem does not implement ->quiesce(), ->bdi_gone().  They
>> invalidate inodes and unmap DAX-inodes respectively.  For now only
>> ->bdi_gone() has an associated super operation as xfs will implement
>> ->bdi_gone() in a later patch.
>
> So the patch looks good. Just for the callbacks to be generally useful we
> would need del_gendisk_queue() to be used for all devices, not just for
> pmem ones... But I guess this is better than nothing.

Once this goes in I'll go back and convert nvme and sd to
del_gendisk_queue.  That should give us coverage for the most
prominent block drivers.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] block, fs: reliably communicate bdev end-of-life
  2015-12-02 15:37     ` Dan Williams
@ 2015-12-03 12:38       ` Jan Kara
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2015-12-03 12:38 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, linux-fsdevel, linux-block, linux-nvdimm@lists.01.org,
	Dave Chinner, Jens Axboe, Jan Kara, Matthew Wilcox, Ross Zwisler

On Wed 02-12-15 07:37:58, Dan Williams wrote:
> On Wed, Dec 2, 2015 at 3:16 AM, Jan Kara <jack@suse.cz> wrote:
> >> generic_quiesce_super and generic_bdi_gone, are the default operations
> >> when a filesystem does not implement ->quiesce(), ->bdi_gone().  They
> >> invalidate inodes and unmap DAX-inodes respectively.  For now only
> >> ->bdi_gone() has an associated super operation as xfs will implement
> >> ->bdi_gone() in a later patch.
> >
> > So the patch looks good. Just for the callbacks to be generally useful we
> > would need del_gendisk_queue() to be used for all devices, not just for
> > pmem ones... But I guess this is better than nothing.
> 
> Once this goes in I'll go back and convert nvme and sd to
> del_gendisk_queue.  That should give us coverage for the most
> prominent block drivers.

OK, that would be good enough. Feel free to add:

Acked-by: Jan Kara <jack@suse.cz>

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] xfs: handle shutdown notifications
  2015-12-01 23:58 ` [PATCH 2/3] xfs: handle shutdown notifications Dan Williams
@ 2015-12-23  2:39   ` Dan Williams
  2016-01-03 21:05   ` Dave Chinner
  1 sibling, 0 replies; 10+ messages in thread
From: Dan Williams @ 2015-12-23  2:39 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-block, XFS Developers, Dave Chinner,
	linux-nvdimm@lists.01.org

Dave, I'm looking to push out a branch with this patch included.  Any concerns?

On Tue, Dec 1, 2015 at 3:58 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> Force a filesystem shutdown when the backing device is known to be dead.
> I.e. blk_queue_enter() permanently returns -ENODEV.
>
> Cc: xfs@oss.sgi.com
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  fs/block_dev.c     |    3 ++-
>  fs/xfs/xfs_super.c |    9 +++++++++
>  include/linux/fs.h |    2 ++
>  3 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 39989e990df9..dfe9a53a7c53 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1802,7 +1802,7 @@ int __invalidate_device(struct block_device *bdev, bool kill_dirty)
>  }
>  EXPORT_SYMBOL(__invalidate_device);
>
> -static void generic_bdi_gone(struct super_block *sb)
> +void generic_bdi_gone(struct super_block *sb)
>  {
>         struct inode *inode, *_inode = NULL;
>
> @@ -1828,6 +1828,7 @@ static void generic_bdi_gone(struct super_block *sb)
>         spin_unlock(&sb->s_inode_list_lock);
>         iput(_inode);
>  }
> +EXPORT_SYMBOL(generic_bdi_gone);
>
>  void shutdown_partition(struct gendisk *disk, int partno)
>  {
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 36bd8825bfb0..63c36508e9db 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1618,6 +1618,14 @@ xfs_fs_free_cached_objects(
>         return xfs_reclaim_inodes_nr(XFS_M(sb), sc->nr_to_scan);
>  }
>
> +static void
> +xfs_fs_bdi_gone(
> +       struct super_block *sb)
> +{
> +       xfs_force_shutdown(XFS_M(sb), SHUTDOWN_DEVICE_REQ);
> +       generic_bdi_gone(sb);
> +}
> +
>  static const struct super_operations xfs_super_operations = {
>         .alloc_inode            = xfs_fs_alloc_inode,
>         .destroy_inode          = xfs_fs_destroy_inode,
> @@ -1632,6 +1640,7 @@ static const struct super_operations xfs_super_operations = {
>         .show_options           = xfs_fs_show_options,
>         .nr_cached_objects      = xfs_fs_nr_cached_objects,
>         .free_cached_objects    = xfs_fs_free_cached_objects,
> +       .bdi_gone               = xfs_fs_bdi_gone,
>  };
>
>  static struct file_system_type xfs_fs_type = {
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 0e201ed38045..b1e8e049e4b8 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2265,6 +2265,7 @@ extern struct super_block *freeze_bdev(struct block_device *);
>  extern void emergency_thaw_all(void);
>  extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
>  extern int fsync_bdev(struct block_device *);
> +extern void generic_bdi_gone(struct super_block *sb);
>
>  extern struct super_block *blockdev_superblock;
>
> @@ -2277,6 +2278,7 @@ static inline void bd_forget(struct inode *inode) {}
>  static inline int sync_blockdev(struct block_device *bdev) { return 0; }
>  static inline void kill_bdev(struct block_device *bdev) {}
>  static inline void invalidate_bdev(struct block_device *bdev) {}
> +static inline void generic_bdi_gone(struct super_block *sb) {}
>
>  static inline struct super_block *freeze_bdev(struct block_device *sb)
>  {
>
> _______________________________________________
> Linux-nvdimm mailing list
> Linux-nvdimm@lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] xfs: handle shutdown notifications
  2015-12-01 23:58 ` [PATCH 2/3] xfs: handle shutdown notifications Dan Williams
  2015-12-23  2:39   ` Dan Williams
@ 2016-01-03 21:05   ` Dave Chinner
  1 sibling, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2016-01-03 21:05 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-fsdevel, linux-block, linux-nvdimm, xfs

On Tue, Dec 01, 2015 at 03:58:46PM -0800, Dan Williams wrote:
> Force a filesystem shutdown when the backing device is known to be dead.
> I.e. blk_queue_enter() permanently returns -ENODEV.
> 
> Cc: xfs@oss.sgi.com
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  fs/block_dev.c     |    3 ++-
>  fs/xfs/xfs_super.c |    9 +++++++++
>  include/linux/fs.h |    2 ++
>  3 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 39989e990df9..dfe9a53a7c53 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1802,7 +1802,7 @@ int __invalidate_device(struct block_device *bdev, bool kill_dirty)
>  }
>  EXPORT_SYMBOL(__invalidate_device);
>  
> -static void generic_bdi_gone(struct super_block *sb)
> +void generic_bdi_gone(struct super_block *sb)
>  {
>  	struct inode *inode, *_inode = NULL;
>  
> @@ -1828,6 +1828,7 @@ static void generic_bdi_gone(struct super_block *sb)
>  	spin_unlock(&sb->s_inode_list_lock);
>  	iput(_inode);
>  }
> +EXPORT_SYMBOL(generic_bdi_gone);
>  
>  void shutdown_partition(struct gendisk *disk, int partno)
>  {
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 36bd8825bfb0..63c36508e9db 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1618,6 +1618,14 @@ xfs_fs_free_cached_objects(
>  	return xfs_reclaim_inodes_nr(XFS_M(sb), sc->nr_to_scan);
>  }
>  
> +static void
> +xfs_fs_bdi_gone(
> +	struct super_block *sb)
> +{
> +	xfs_force_shutdown(XFS_M(sb), SHUTDOWN_DEVICE_REQ);
> +	generic_bdi_gone(sb);
> +}

You haven't cc'd me or the XFS list on the other patches in this
series, so I cannot comment as to whether this will work or not.
As I've asked before, please do not do single patch CC's like this
as it does not give the reviewer the context to be able to review
the change, nor is it possible to *apply and test* the patch they
are being asked to comment on.

Please resend the series with complete CC's to the relevant lists.
Sure, go ahead an CC individual people on certain patches, but you
need to make sure the whole patch series hit each mailing list....

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-01-03 21:06 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-12-01 23:58 [PATCH 0/3] fs, bdev: handle end of life Dan Williams
2015-12-01 23:58 ` [PATCH 1/3] block, fs: reliably communicate bdev end-of-life Dan Williams
2015-12-02 11:16   ` Jan Kara
2015-12-02 15:37     ` Dan Williams
2015-12-03 12:38       ` Jan Kara
2015-12-01 23:58 ` [PATCH 2/3] xfs: handle shutdown notifications Dan Williams
2015-12-23  2:39   ` Dan Williams
2016-01-03 21:05   ` Dave Chinner
2015-12-01 23:58 ` [PATCH 3/3] writeback: fix false positive WARN in __mark_inode_dirty Dan Williams
2015-12-02 10:35   ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).