linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v2 0/4] fs, block: handle end of life
@ 2016-01-06  4:56 Dan Williams
  2016-01-06  4:56 ` [PATCH v2 1/4] block: prepare for del_gendisk_queue() Dan Williams
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Dan Williams @ 2016-01-06  4:56 UTC
  To: xfs
  Cc: linux-block, linux-nvdimm, Dave Chinner, Jens Axboe,
	Alexander Viro, Jan Kara, linux-fsdevel, Matthew Wilcox,
	Ross Zwisler

Changes since v1 [1]:

1/ move the del_gendisk() refactoring to its own patch (Dave)

2/ add unmap_dax_inodes to the xfs shutdown path (Dave)

3/ kill the unnecessary ->quiesce super operation and rename ->bdi_gone
   to ->force_failure. (Dave)

4/ rework tricky call to get_super() with a NULL bdev parameter. (Dave)

[1]: https://lists.01.org/pipermail/linux-nvdimm/2016-January/003797.html

---

As mentioned in [PATCH v2 2/4] "block: introduce del_gendisk_queue()",
historically we have waited for filesystem specific heuristics to
attempt to guess when a block device is gone.  Sometimes this works, but
in other cases the system can hang waiting for the fs to trigger its
shutdown protocol.

Now with DAX we need new actions, like unmapping all inodes, to be taken
upon a device loss event or fs corruption event.

For now, the approach taken in the following patches only affects xfs
and block drivers that are converted to use del_gendisk_queue().  We can
add more filesystems and driver support over time.

---

Dan Williams (4):
      block: prepare for del_gendisk_queue()
      block: introduce del_gendisk_queue()
      xfs: unmap dax at shutdown (force_failure)
      block, xfs: implement 'force_failure' notifications


 block/genhd.c                |   87 +++++++++++++++++++++++++++++++++++-------
 drivers/block/brd.c          |    9 +---
 drivers/nvdimm/pmem.c        |    3 -
 drivers/s390/block/dcssblk.c |    6 +--
 fs/block_dev.c               |   22 +++++++++++
 fs/inode.c                   |   28 ++++++++++++++
 fs/xfs/xfs_fsops.c           |    9 ++++
 fs/xfs/xfs_super.c           |    8 ++++
 include/linux/fs.h           |    3 +
 include/linux/genhd.h        |    1 
 10 files changed, 150 insertions(+), 26 deletions(-)


* [PATCH v2 1/4] block: prepare for del_gendisk_queue()
  2016-01-06  4:56 [PATCH v2 0/4] fs, block: handle end of life Dan Williams
@ 2016-01-06  4:56 ` Dan Williams
  2016-01-06  4:56 ` [PATCH v2 2/4] block: introduce del_gendisk_queue() Dan Williams
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Dan Williams @ 2016-01-06  4:56 UTC
  To: xfs; +Cc: linux-fsdevel, linux-block, linux-nvdimm

Refactor del_gendisk() into del_gendisk_start() and del_gendisk_end().
These are common helpers that will be shared between del_gendisk() and
the to-be-introduced del_gendisk_queue().

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 block/genhd.c |   41 +++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 14 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index e5cafa51567c..b1d1df42ba13 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -634,24 +634,14 @@ void add_disk(struct gendisk *disk)
 }
 EXPORT_SYMBOL(add_disk);
 
-void del_gendisk(struct gendisk *disk)
+static void del_gendisk_start(struct gendisk *disk)
 {
-	struct disk_part_iter piter;
-	struct hd_struct *part;
-
 	blk_integrity_del(disk);
 	disk_del_events(disk);
+}
 
-	/* invalidate stuff */
-	disk_part_iter_init(&piter, disk,
-			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
-	while ((part = disk_part_iter_next(&piter))) {
-		invalidate_partition(disk, part->partno);
-		delete_partition(disk, part->partno);
-	}
-	disk_part_iter_exit(&piter);
-
-	invalidate_partition(disk, 0);
+static void del_gendisk_end(struct gendisk *disk)
+{
 	set_capacity(disk, 0);
 	disk->flags &= ~GENHD_FL_UP;
 
@@ -670,6 +660,29 @@ void del_gendisk(struct gendisk *disk)
 	pm_runtime_set_memalloc_noio(disk_to_dev(disk), false);
 	device_del(disk_to_dev(disk));
 }
+
+#define for_each_part(part, piter) \
+	for (part = disk_part_iter_next(piter); part; \
+			part = disk_part_iter_next(piter))
+void del_gendisk(struct gendisk *disk)
+{
+	struct disk_part_iter piter;
+	struct hd_struct *part;
+
+	del_gendisk_start(disk);
+
+	/* invalidate stuff */
+	disk_part_iter_init(&piter, disk,
+			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
+	for_each_part(part, &piter) {
+		invalidate_partition(disk, part->partno);
+		delete_partition(disk, part->partno);
+	}
+	disk_part_iter_exit(&piter);
+	invalidate_partition(disk, 0);
+
+	del_gendisk_end(disk);
+}
 EXPORT_SYMBOL(del_gendisk);
 
 /**



* [PATCH v2 2/4] block: introduce del_gendisk_queue()
  2016-01-06  4:56 [PATCH v2 0/4] fs, block: handle end of life Dan Williams
  2016-01-06  4:56 ` [PATCH v2 1/4] block: prepare for del_gendisk_queue() Dan Williams
@ 2016-01-06  4:56 ` Dan Williams
  2016-01-08  0:15   ` Dave Chinner
  2016-01-06  4:56 ` [PATCH v2 3/4] xfs: unmap dax at shutdown (force_failure) Dan Williams
  2016-01-06  4:56 ` [PATCH v2 4/4] block, xfs: implement 'force_failure' notifications Dan Williams
  3 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2016-01-06  4:56 UTC
  To: xfs
  Cc: linux-block, linux-nvdimm, Dave Chinner, Jens Axboe, Jan Kara,
	linux-fsdevel, Matthew Wilcox, Ross Zwisler

Historically we have waited for filesystem specific heuristics to
attempt to guess when a block device is gone.  Sometimes this works, but
in other cases the system can hang waiting for the fs to trigger its
shutdown protocol.

The initial motivation for this investigation was to prevent DAX
mappings (direct mmap access to persistent memory) from leaking past the
lifetime of the hosting block device.  However, Dave points out that
these shutdown operations are needed in other scenarios.  Quoting Dave:

    For example, if we detect a free space corruption during allocation,
    it is not safe to trust *any active mapping* because we can't trust
    that we haven't handed out the same block to multiple owners. Hence
    on such a filesystem shutdown, we have to prevent any new DAX
    mapping from occurring and invalidate all existing mappings as we
    cannot allow userspace to modify any data or metadata until we've
    resolved the corruption situation.

The current block device shutdown sequence of del_gendisk +
blk_cleanup_queue is problematic.  We want to tell the fs after
blk_cleanup_queue that there is no possibility of recovery, but by that
time we have deleted partitions and lost the ability to find all the
super-blocks on a block device.

del_gendisk_queue() combines block device shutdown, blk_cleanup_queue(),
with block device end of life notification, del_gendisk().  A later
patch builds on this sequence to additionally communicate to the fs that
it should force-fail all future i/o since the queue is permanently dead.
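
To make the ordering argument concrete, here is a rough user-space
model (not kernel code).  Every name in it (model_disk, model_sb,
model_notify, ...) is made up for illustration, and it simplifies by
treating every slot as a deletable partition with one mounted
filesystem.  It only shows why the notification has to run after the
queue is dead but before the partitions are deleted:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model: none of these are kernel types or functions. */
#define MODEL_NPARTS 3

struct model_sb {
	bool mounted;
	bool notified;             /* fs was told the device is gone */
	bool queue_dead_at_notify; /* was the queue already dead then? */
};

struct model_disk {
	bool queue_dead;
	bool part_present[MODEL_NPARTS];
	struct model_sb sb[MODEL_NPARTS];
};

static void model_disk_init(struct model_disk *d)
{
	memset(d, 0, sizeof(*d));
	for (int i = 0; i < MODEL_NPARTS; i++) {
		d->part_present[i] = true;
		d->sb[i].mounted = true;
	}
}

/* A superblock can only be found through a partition that still exists. */
static void model_notify(struct model_disk *d, int partno)
{
	assert(partno >= 0 && partno < MODEL_NPARTS);
	if (!d->part_present[partno] || !d->sb[partno].mounted)
		return;
	d->sb[partno].notified = true;
	d->sb[partno].queue_dead_at_notify = d->queue_dead;
}

/* Count filesystems that were told "no recovery is possible". */
static int model_count_final_notices(const struct model_disk *d)
{
	int n = 0;

	for (int i = 0; i < MODEL_NPARTS; i++)
		if (d->sb[i].notified && d->sb[i].queue_dead_at_notify)
			n++;
	return n;
}

/* Open-coded ordering: partitions go first, then the queue. */
static int model_old_sequence(struct model_disk *d)
{
	for (int i = MODEL_NPARTS - 1; i >= 0; i--)
		d->part_present[i] = false;
	d->queue_dead = true;
	for (int i = MODEL_NPARTS - 1; i >= 0; i--)
		model_notify(d, i); /* too late, nothing left to find */
	return model_count_final_notices(d);
}

/* Combined ordering: kill the queue, notify, then delete. */
static int model_new_sequence(struct model_disk *d)
{
	d->queue_dead = true;
	for (int i = MODEL_NPARTS - 1; i >= 0; i--) {
		model_notify(d, i);
		d->part_present[i] = false;
	}
	return model_count_final_notices(d);
}
```

In this model the old sequence reaches no filesystem at all, while the
combined sequence reaches every one of them with the queue already
dead.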

Cc: Jan Kara <jack@suse.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 block/genhd.c                |   46 ++++++++++++++++++++++++++++++++++++++++++
 drivers/block/brd.c          |    9 +++-----
 drivers/nvdimm/pmem.c        |    3 +--
 drivers/s390/block/dcssblk.c |    6 ++---
 fs/block_dev.c               |   19 +++++++++++++++++
 fs/inode.c                   |   28 ++++++++++++++++++++++++++
 include/linux/fs.h           |    2 ++
 include/linux/genhd.h        |    1 +
 8 files changed, 102 insertions(+), 12 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index b1d1df42ba13..ac0d12c4f895 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -686,6 +686,52 @@ void del_gendisk(struct gendisk *disk)
 EXPORT_SYMBOL(del_gendisk);
 
 /**
+ * del_gendisk_queue - combined del_gendisk + blk_cleanup_queue
+ * @disk: disk to delete, invalidate, unmap, and force-fail fs operations
+ *
+ * This is an alternative for open coded calls to:
+ *     del_gendisk()
+ *     blk_cleanup_queue()
+ * It notifies filesystems / vfs that a block device is permanently dead
+ * after the queue has been torn down.  This notification is needed for
+ * triggering a filesystem to abort its error recovery and for (DAX)
+ * capable devices.  DAX bypasses page cache and mappings go directly to
+ * storage media.  When such a disk is removed the pfn backing a mapping
+ * may be invalid or removed from the system.  Upon return accessing DAX
+ * mappings of this disk will trigger SIGBUS.
+ */
+void del_gendisk_queue(struct gendisk *disk)
+{
+	struct disk_part_iter piter;
+	struct hd_struct *part;
+
+	del_gendisk_start(disk);
+
+	/* pass1 sync fs + evict idle inodes */
+	disk_part_iter_init(&piter, disk,
+			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
+	for_each_part(part, &piter)
+		invalidate_partition(disk, part->partno);
+	disk_part_iter_exit(&piter);
+	invalidate_partition(disk, 0);
+
+	blk_cleanup_queue(disk->queue);
+
+	/* pass2 the queue is dead, halt dax */
+	disk_part_iter_init(&piter, disk,
+			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
+	for_each_part(part, &piter) {
+		force_failure_partition(disk, part->partno);
+		delete_partition(disk, part->partno);
+	}
+	disk_part_iter_exit(&piter);
+	force_failure_partition(disk, 0);
+
+	del_gendisk_end(disk);
+}
+EXPORT_SYMBOL(del_gendisk_queue);
+
+/**
  * get_gendisk - get partitioning information for a given device
  * @devt: device to get partitioning information for
  * @partno: returned partition index
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index a5880f4ab40e..013ff58f9af8 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -532,7 +532,6 @@ out:
 static void brd_free(struct brd_device *brd)
 {
 	put_disk(brd->brd_disk);
-	blk_cleanup_queue(brd->brd_queue);
 	brd_free_pages(brd);
 	kfree(brd);
 }
@@ -560,7 +559,7 @@ out:
 static void brd_del_one(struct brd_device *brd)
 {
 	list_del(&brd->brd_list);
-	del_gendisk(brd->brd_disk);
+	del_gendisk_queue(brd->brd_disk);
 	brd_free(brd);
 }
 
@@ -626,10 +625,8 @@ static int __init brd_init(void)
 	return 0;
 
 out_free:
-	list_for_each_entry_safe(brd, next, &brd_devices, brd_list) {
-		list_del(&brd->brd_list);
-		brd_free(brd);
-	}
+	list_for_each_entry_safe(brd, next, &brd_devices, brd_list)
+		brd_del_one(brd);
 	unregister_blkdev(RAMDISK_MAJOR, "ramdisk");
 
 	pr_info("brd: module NOT loaded !!!\n");
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 8ee79893d2f5..6dd06e9d34b0 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -158,9 +158,8 @@ static void pmem_detach_disk(struct pmem_device *pmem)
 	if (!pmem->pmem_disk)
 		return;
 
-	del_gendisk(pmem->pmem_disk);
+	del_gendisk_queue(pmem->pmem_disk);
 	put_disk(pmem->pmem_disk);
-	blk_cleanup_queue(pmem->pmem_queue);
 }
 
 static int pmem_attach_disk(struct device *dev,
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 94a8f4ab57bc..0c3c968b57d9 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -388,8 +388,7 @@ removeseg:
 	}
 	list_del(&dev_info->lh);
 
-	del_gendisk(dev_info->gd);
-	blk_cleanup_queue(dev_info->dcssblk_queue);
+	del_gendisk_queue(dev_info->gd);
 	dev_info->gd->queue = NULL;
 	put_disk(dev_info->gd);
 	up_write(&dcssblk_devices_sem);
@@ -751,8 +750,7 @@ dcssblk_remove_store(struct device *dev, struct device_attribute *attr, const ch
 	}
 
 	list_del(&dev_info->lh);
-	del_gendisk(dev_info->gd);
-	blk_cleanup_queue(dev_info->dcssblk_queue);
+	del_gendisk_queue(dev_info->gd);
 	dev_info->gd->queue = NULL;
 	put_disk(dev_info->gd);
 	device_unregister(&dev_info->dev);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 44d4a1e9244e..9cff33b6baab 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1799,6 +1799,25 @@ int __invalidate_device(struct block_device *bdev, bool kill_dirty)
 }
 EXPORT_SYMBOL(__invalidate_device);
 
+void force_failure_partition(struct gendisk *disk, int partno)
+{
+	struct block_device *bdev;
+	struct super_block *sb;
+
+	bdev = bdget_disk(disk, partno);
+	if (!bdev)
+		return;
+
+	sb = get_super(bdev);
+	if (!sb)
+		goto out;
+
+	unmap_dax_inodes(sb);
+	drop_super(sb);
+ out:
+	bdput(bdev);
+}
+
 void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg)
 {
 	struct inode *inode, *old_inode = NULL;
diff --git a/fs/inode.c b/fs/inode.c
index 1be5f9003eb3..ed62e5f78f35 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -673,6 +673,34 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
 	return busy;
 }
 
+void unmap_dax_inodes(struct super_block *sb)
+{
+	struct inode *inode, *_inode = NULL;
+
+	spin_lock(&sb->s_inode_list_lock);
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		spin_lock(&inode->i_lock);
+		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
+				|| !IS_DAX(inode)) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+		__iget(inode);
+		spin_unlock(&inode->i_lock);
+		spin_unlock(&sb->s_inode_list_lock);
+
+		unmap_mapping_range(inode->i_mapping, 0, 0, 1);
+		iput(_inode);
+		_inode = inode;
+		cond_resched();
+
+		spin_lock(&sb->s_inode_list_lock);
+	}
+	spin_unlock(&sb->s_inode_list_lock);
+	iput(_inode);
+}
+EXPORT_SYMBOL(unmap_dax_inodes);
+
 /*
  * Isolate the inode from the LRU in preparation for freeing it.
  *
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3aa514254161..a0d55199e628 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2390,6 +2390,7 @@ extern int revalidate_disk(struct gendisk *);
 extern int check_disk_change(struct block_device *);
 extern int __invalidate_device(struct block_device *, bool);
 extern int invalidate_partition(struct gendisk *, int);
+extern void force_failure_partition(struct gendisk *, int);
 #endif
 unsigned long invalidate_mapping_pages(struct address_space *mapping,
 					pgoff_t start, pgoff_t end);
@@ -2544,6 +2545,7 @@ extern loff_t default_llseek(struct file *file, loff_t offset, int whence);
 
 extern loff_t vfs_llseek(struct file *file, loff_t offset, int whence);
 
+extern void unmap_dax_inodes(struct super_block *sb);
 extern int inode_init_always(struct super_block *, struct inode *);
 extern void inode_init_once(struct inode *);
 extern void address_space_init_once(struct address_space *mapping);
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 847cc1d91634..028cf15a8a57 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -431,6 +431,7 @@ extern void part_round_stats(int cpu, struct hd_struct *part);
 /* block/genhd.c */
 extern void add_disk(struct gendisk *disk);
 extern void del_gendisk(struct gendisk *gp);
+extern void del_gendisk_queue(struct gendisk *disk);
 extern struct gendisk *get_gendisk(dev_t dev, int *partno);
 extern struct block_device *bdget_disk(struct gendisk *disk, int partno);
 



* [PATCH v2 3/4] xfs: unmap dax at shutdown (force_failure)
  2016-01-06  4:56 [PATCH v2 0/4] fs, block: handle end of life Dan Williams
  2016-01-06  4:56 ` [PATCH v2 1/4] block: prepare for del_gendisk_queue() Dan Williams
  2016-01-06  4:56 ` [PATCH v2 2/4] block: introduce del_gendisk_queue() Dan Williams
@ 2016-01-06  4:56 ` Dan Williams
  2016-01-08  0:16   ` Dave Chinner
  2016-01-06  4:56 ` [PATCH v2 4/4] block, xfs: implement 'force_failure' notifications Dan Williams
  3 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2016-01-06  4:56 UTC
  To: xfs; +Cc: linux-fsdevel, linux-block, linux-nvdimm

When an exceptional event triggers xfs_force_shutdown(), tear down dax
mappings.  Quoting Dave,

    "The simple fact is that a /filesystem/ shutdown needs to do DAX
    mapping invalidation regardless of whether the block device has
    been unplugged or not. This is not a case of "this only happens
    when we unplug the device", this is a user data protection
    mechanism that we use to prevent corruption propagation once it
    has been detected. A device unplug is just one type of
    "corruption" that can occur."

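As a rough illustration of why the unmap matters, here is a toy
user-space model (not kernel code; toy_fs, toy_mapping, toy_store and
friends are hypothetical names).  It assumes, as the code comment in
this patch describes, that the fault path notices the shutdown
condition and fails the access:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: none of these are kernel types or functions. */
enum toy_fault { TOY_FAULT_OK, TOY_FAULT_SIGBUS };

struct toy_fs {
	bool shutdown;
};

struct toy_mapping {
	struct toy_fs *fs;
	bool pte_present; /* an established mapping bypasses the fs */
};

/* Fault path: refuses to establish a mapping once the fs is shut down. */
static enum toy_fault toy_dax_fault(struct toy_mapping *m)
{
	assert(m && m->fs);
	if (m->fs->shutdown)
		return TOY_FAULT_SIGBUS;
	m->pte_present = true;
	return TOY_FAULT_OK;
}

/* A userspace store: only consults the fs when it has to fault. */
static enum toy_fault toy_store(struct toy_mapping *m)
{
	if (m->pte_present)
		return TOY_FAULT_OK;
	return toy_dax_fault(m);
}

/* Shutdown, with or without tearing down existing mappings. */
static void toy_shutdown(struct toy_fs *fs, struct toy_mapping *maps,
			 int nr, bool unmap)
{
	fs->shutdown = true;
	if (!unmap)
		return;
	for (int i = 0; i < nr; i++)
		maps[i].pte_present = false;
}
```

Without the unmap step an already established mapping keeps accepting
stores after shutdown; with it, the next store has to refault and is
refused.
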
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/xfs/xfs_fsops.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index ee3aaa0a5317..0c6a52809dcc 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -828,6 +828,15 @@ xfs_do_force_shutdown(
 	if (xfs_log_force_umount(mp, logerror))
 		return;
 
+	/*
+	 * If DAX is in use, we have to unmap all direct access virtual
+	 * mappings to ensure nothing more gets written directly from
+	 * userspace. This will force them to refault and that will
+	 * result in them detecting the shutdown condition and hence
+	 * will fail appropriately.
+	 */
+	unmap_dax_inodes(mp->m_super);
+
 	if (flags & SHUTDOWN_CORRUPT_INCORE) {
 		xfs_alert_tag(mp, XFS_PTAG_SHUTDOWN_CORRUPT,
     "Corruption of in-memory data detected.  Shutting down filesystem");



* [PATCH v2 4/4] block, xfs: implement 'force_failure' notifications
  2016-01-06  4:56 [PATCH v2 0/4] fs, block: handle end of life Dan Williams
                   ` (2 preceding siblings ...)
  2016-01-06  4:56 ` [PATCH v2 3/4] xfs: unmap dax at shutdown (force_failure) Dan Williams
@ 2016-01-06  4:56 ` Dan Williams
  3 siblings, 0 replies; 7+ messages in thread
From: Dan Williams @ 2016-01-06  4:56 UTC
  To: xfs
  Cc: linux-block, linux-nvdimm, Dave Chinner, Jens Axboe,
	Alexander Viro, Jan Kara, linux-fsdevel

Introduce a new super operation, 'force_failure', that is invoked by
force_failure_partition() when the block device is dead.  This
unambiguously communicates to a filesystem that i/o errors are permanent
and no recovery effort will succeed.

'force_failure' simply becomes another exceptional event that can
trigger xfs_force_shutdown().
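
The dispatch can be pictured with a small user-space sketch (not
kernel code).  The toy_* names are hypothetical, and the NULL check on
s_op is an extra guard in this sketch only.  A filesystem that
provides the hook gets to run its own shutdown, which per the previous
patch also unmaps DAX; one that does not falls back to a plain unmap:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: none of these are kernel types or functions. */
struct toy_sb;

struct toy_sops {
	void (*force_failure)(struct toy_sb *sb); /* optional hook */
};

struct toy_sb {
	const struct toy_sops *s_op;
	bool dax_unmapped;
	bool shut_down;
};

/* Generic fallback: only tears down DAX mappings. */
static void toy_unmap_dax(struct toy_sb *sb)
{
	sb->dax_unmapped = true;
}

/* Filesystem hook: full shutdown, which itself unmaps DAX. */
static void toy_fs_force_failure(struct toy_sb *sb)
{
	assert(sb != NULL);
	sb->shut_down = true;
	toy_unmap_dax(sb);
}

static const struct toy_sops toy_with_hook = {
	.force_failure = toy_fs_force_failure,
};

static const struct toy_sops toy_without_hook = {
	.force_failure = NULL,
};

/* Prefer the filesystem's hook, otherwise use the generic fallback. */
static void toy_force_failure_partition(struct toy_sb *sb)
{
	if (!sb)
		return;
	if (sb->s_op && sb->s_op->force_failure)
		sb->s_op->force_failure(sb);
	else
		toy_unmap_dax(sb);
}
```

Either way the DAX mappings end up torn down; only a filesystem with
the hook additionally learns that it should stop all further
operations.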

Cc: Jan Kara <jack@suse.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 block/genhd.c      |    2 +-
 fs/block_dev.c     |    5 ++++-
 fs/xfs/xfs_super.c |    8 ++++++++
 include/linux/fs.h |    1 +
 4 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index ac0d12c4f895..45f9f123013b 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -717,7 +717,7 @@ void del_gendisk_queue(struct gendisk *disk)
 
 	blk_cleanup_queue(disk->queue);
 
-	/* pass2 the queue is dead, halt dax */
+	/* pass2 the queue is dead, halt dax, and halt fs operations */
 	disk_part_iter_init(&piter, disk,
 			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
 	for_each_part(part, &piter) {
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9cff33b6baab..a9c07910481c 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1812,7 +1812,10 @@ void force_failure_partition(struct gendisk *disk, int partno)
 	if (!sb)
 		goto out;
 
-	unmap_dax_inodes(sb);
+	if (sb->s_op->force_failure)
+		sb->s_op->force_failure(sb);
+	else
+		unmap_dax_inodes(sb);
 	drop_super(sb);
  out:
 	bdput(bdev);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 36bd8825bfb0..e1113ac2e342 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1618,6 +1618,13 @@ xfs_fs_free_cached_objects(
 	return xfs_reclaim_inodes_nr(XFS_M(sb), sc->nr_to_scan);
 }
 
+static void
+xfs_fs_force_failure(
+	struct super_block *sb)
+{
+	xfs_force_shutdown(XFS_M(sb), SHUTDOWN_DEVICE_REQ);
+}
+
 static const struct super_operations xfs_super_operations = {
 	.alloc_inode		= xfs_fs_alloc_inode,
 	.destroy_inode		= xfs_fs_destroy_inode,
@@ -1632,6 +1639,7 @@ static const struct super_operations xfs_super_operations = {
 	.show_options		= xfs_fs_show_options,
 	.nr_cached_objects	= xfs_fs_nr_cached_objects,
 	.free_cached_objects	= xfs_fs_free_cached_objects,
+	.force_failure		= xfs_fs_force_failure,
 };
 
 static struct file_system_type xfs_fs_type = {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a0d55199e628..bfd9bb7b529d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1713,6 +1713,7 @@ struct super_operations {
 				  struct shrink_control *);
 	long (*free_cached_objects)(struct super_block *,
 				    struct shrink_control *);
+	void (*force_failure)(struct super_block *);
 };
 
 /*



* Re: [PATCH v2 2/4] block: introduce del_gendisk_queue()
  2016-01-06  4:56 ` [PATCH v2 2/4] block: introduce del_gendisk_queue() Dan Williams
@ 2016-01-08  0:15   ` Dave Chinner
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2016-01-08  0:15 UTC
  To: Dan Williams
  Cc: xfs, linux-block, linux-nvdimm, Jens Axboe, Jan Kara,
	linux-fsdevel, Matthew Wilcox, Ross Zwisler

On Tue, Jan 05, 2016 at 08:56:27PM -0800, Dan Williams wrote:
> Historically we have waited for filesystem specific heuristics to
> attempt to guess when a block device is gone.  Sometimes this works, but
> in other cases the system can hang waiting for the fs to trigger its
> shutdown protocol.
> 
> The initial motivation for this investigation was to prevent DAX
> mappings (direct mmap access to persistent memory) from leaking past the
> lifetime of the hosting block device.  However, Dave points out that
> these shutdown operations are needed in other scenarios.  Quoting Dave:
> 
>     For example, if we detect a free space corruption during allocation,
>     it is not safe to trust *any active mapping* because we can't trust
>     that we haven't handed out the same block to multiple owners. Hence
>     on such a filesystem shutdown, we have to prevent any new DAX
>     mapping from occurring and invalidate all existing mappings as we
>     cannot allow userspace to modify any data or metadata until we've
>     resolved the corruption situation.
> 
> The current block device shutdown sequence of del_gendisk +
> blk_cleanup_queue is problematic.  We want to tell the fs after
> blk_cleanup_queue that there is no possibility of recovery, but by that
> time we have deleted partitions and lost the ability to find all the
> super-blocks on a block device.
> 
> del_gendisk_queue() combines block device shutdown, blk_cleanup_queue(),
> with block device end of life notification, del_gendisk().  A later
> patch builds on this sequence to additionally communicate to the fs that
> it should force-fail all future i/o since the queue is permanently dead.

This still is two changes in one. Adding the force failure feature
is a separate change to creating del_gendisk_queue().

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH v2 3/4] xfs: unmap dax at shutdown (force_failure)
  2016-01-06  4:56 ` [PATCH v2 3/4] xfs: unmap dax at shutdown (force_failure) Dan Williams
@ 2016-01-08  0:16   ` Dave Chinner
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2016-01-08  0:16 UTC
  To: Dan Williams; +Cc: xfs, linux-fsdevel, linux-block, linux-nvdimm

On Tue, Jan 05, 2016 at 08:56:32PM -0800, Dan Williams wrote:
> When an exceptional event triggers xfs_force_shutdown(), tear down dax
> mappings.  Quoting Dave,
> 
>     "The simple fact is that a /filesystem/ shutdown needs to do DAX
>     mapping invalidation regardless of whether the block device has
>     been unplugged or not. This is not a case of "this only happens
>     when we unplug the device", this is a user data protection
>     mechanism that we use to prevent corruption propagation once it
>     has been detected. A device unplug is just one type of
>     "corruption" that can occur."
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  fs/xfs/xfs_fsops.c |    9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index ee3aaa0a5317..0c6a52809dcc 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -828,6 +828,15 @@ xfs_do_force_shutdown(
>  	if (xfs_log_force_umount(mp, logerror))
>  		return;
>  
> +	/*
> +	 * If DAX is in use, we have to unmap all direct access virtual
> +	 * mappings to ensure nothing more gets written directly from
> +	 * userspace. This will force them to refault and that will
> +	 * result in them detecting the shutdown condition and hence
> +	 * will fail appropriately.
> +	 */
> +	unmap_dax_inodes(mp->m_super);
> +
>  	if (flags & SHUTDOWN_CORRUPT_INCORE) {
>  		xfs_alert_tag(mp, XFS_PTAG_SHUTDOWN_CORRUPT,
>      "Corruption of in-memory data detected.  Shutting down filesystem");

Looks fine.

Acked-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com
