* [PATCH v7 1/6] dax: fallback from pmd to pte on error
2016-05-11 21:08 [PATCH v7 0/6] dax: handling media errors (clear-on-zero only) Vishal Verma
@ 2016-05-11 21:08 ` Vishal Verma
2016-05-11 21:08 ` [PATCH v7 2/6] dax: enable dax in the presence of known media errors (badblocks) Vishal Verma
` (4 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Vishal Verma @ 2016-05-11 21:08 UTC (permalink / raw)
To: linux-nvdimm
Cc: Dan Williams, linux-fsdevel, linux-block, xfs, linux-ext4,
linux-mm, Ross Zwisler, Dave Chinner, Jan Kara, Jens Axboe,
Andrew Morton, linux-kernel, Christoph Hellwig, Jeff Moyer,
Boaz Harrosh
From: Dan Williams <dan.j.williams@intel.com>
In preparation for consulting a badblocks list in pmem_direct_access(),
teach dax_pmd_fault() to fall back to PTE faults rather than fail
immediately upon encountering an error. The reasoning is that reducing
the span of the dax request may avoid the error region.
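The following is a minimal user-space sketch of the fallback idea. It is not kernel code. It assumes a hypothetical device with a single known-bad byte range, and `struct sim_dev`, `sim_map()` and `sim_fault()` are illustrative stand-ins for dax_map_atomic() and the PMD fault path. A 2 MiB request that overlaps the error fails, but the 4 KiB retry still succeeds when the faulting page itself is clean. Only a bad faulting page produces an error, which corresponds to SIGBUS in the kernel.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical sizes mirroring x86-64: 4 KiB PTE pages, 2 MiB PMD pages. */
#define SIM_PAGE_SIZE (4096UL)
#define SIM_PMD_SIZE  (2UL * 1024 * 1024)

/* Hypothetical device model: one known-bad byte range [bad_start, bad_end). */
struct sim_dev {
	unsigned long bad_start;
	unsigned long bad_end;
};

/* Stand-in for dax_map_atomic(): -EIO if the request overlaps the bad range. */
static long sim_map(const struct sim_dev *dev, unsigned long start,
		    unsigned long size)
{
	assert(size > 0);
	if (start < dev->bad_end && dev->bad_start < start + size)
		return -EIO;
	return (long)size;
}

/*
 * Stand-in for the fault path: try a PMD-sized mapping first and, on a
 * mapping error, fall back to a single PTE-sized mapping instead of failing.
 * Returns the mapped size, or a negative errno if even the PTE mapping fails.
 */
static long sim_fault(const struct sim_dev *dev, unsigned long addr)
{
	unsigned long pmd_start = addr & ~(SIM_PMD_SIZE - 1);
	unsigned long pte_start = addr & ~(SIM_PAGE_SIZE - 1);

	if (sim_map(dev, pmd_start, SIM_PMD_SIZE) > 0)
		return (long)SIM_PMD_SIZE;
	/* "dax-error fallback": retry with the smaller span */
	return sim_map(dev, pte_start, SIM_PAGE_SIZE);
}
```

With a bad range at 0x1000..0x11ff, a fault at 0x5000 still gets a 4 KiB mapping. A fault at 0x1100 fails, and a fault in the next 2 MiB region gets a full PMD mapping.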
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
fs/dax.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 9bc6624..d602410 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -855,8 +855,8 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
long length = dax_map_atomic(bdev, &dax);
if (length < 0) {
- result = VM_FAULT_SIGBUS;
- goto out;
+ dax_pmd_dbg(&bh, address, "dax-error fallback");
+ goto fallback;
}
if (length < PMD_SIZE) {
dax_pmd_dbg(&bh, address, "dax-length too small");
--
2.5.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v7 2/6] dax: enable dax in the presence of known media errors (badblocks)
2016-05-11 21:08 [PATCH v7 0/6] dax: handling media errors (clear-on-zero only) Vishal Verma
2016-05-11 21:08 ` [PATCH v7 1/6] dax: fallback from pmd to pte on error Vishal Verma
@ 2016-05-11 21:08 ` Vishal Verma
2016-05-11 21:08 ` [PATCH v7 3/6] dax: use sb_issue_zeroout instead of calling dax_clear_sectors Vishal Verma
` (3 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Vishal Verma @ 2016-05-11 21:08 UTC (permalink / raw)
To: linux-nvdimm
Cc: Dan Williams, linux-fsdevel, linux-block, xfs, linux-ext4,
linux-mm, Ross Zwisler, Dave Chinner, Jan Kara, Jens Axboe,
Andrew Morton, linux-kernel, Christoph Hellwig, Jeff Moyer,
Boaz Harrosh, Vishal Verma
From: Dan Williams <dan.j.williams@intel.com>
1/ If a mapping overlaps a bad sector, fail the request.
2/ Do not opportunistically report more dax-capable capacity than is
requested when errors are present.
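The following is a simplified user-space model of the new ->direct_access() contract. It assumes a device with at most one bad sector range. `struct sim_pmem`, `sim_is_bad()` and `sim_direct_access()` are hypothetical names. The real pmem_direct_access() also accounts for data_offset and pfn_pad and calls is_bad_pmem(), which this sketch only approximates.

```c
#include <assert.h>
#include <errno.h>

#define SIM_SECTOR_SIZE 512L

/* Hypothetical pmem device with at most one known-bad sector range. */
struct sim_pmem {
	long size;        /* capacity in bytes */
	long bb_count;    /* number of known bad ranges (0 or 1 here) */
	long bad_sector;  /* first bad sector, valid if bb_count != 0 */
	long bad_len;     /* number of bad sectors */
};

/* Approximates is_bad_pmem(): does [sector, sector + size) hit a bad sector? */
static int sim_is_bad(const struct sim_pmem *p, long sector, long size)
{
	long nr = (size + SIM_SECTOR_SIZE - 1) / SIM_SECTOR_SIZE;

	if (!p->bb_count)
		return 0;
	return sector < p->bad_sector + p->bad_len && p->bad_sector < sector + nr;
}

/*
 * 1/ fail with -EIO if the requested range overlaps a bad sector;
 * 2/ with bad blocks present, report only the requested size as usable,
 *    otherwise report all remaining capacity from this offset.
 */
static long sim_direct_access(const struct sim_pmem *p, long sector, long size)
{
	long offset = sector * SIM_SECTOR_SIZE;

	assert(offset < p->size);
	if (sim_is_bad(p, sector, size))
		return -EIO;
	if (p->bb_count)
		return size;
	return p->size - offset;
}
```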
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
[vishal: fix a conflict with system RAM collision patches]
[vishal: add a 'size' parameter to ->direct_access]
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
arch/powerpc/sysdev/axonram.c | 2 +-
block/ioctl.c | 9 ---------
drivers/block/brd.c | 2 +-
drivers/nvdimm/pmem.c | 10 +++++++++-
drivers/s390/block/dcssblk.c | 2 +-
fs/block_dev.c | 2 +-
include/linux/blkdev.h | 2 +-
7 files changed, 14 insertions(+), 15 deletions(-)
diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index 0d112b9..ff75d70 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -143,7 +143,7 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
*/
static long
axon_ram_direct_access(struct block_device *device, sector_t sector,
- void __pmem **kaddr, pfn_t *pfn)
+ void __pmem **kaddr, pfn_t *pfn, long size)
{
struct axon_ram_bank *bank = device->bd_disk->private_data;
loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
diff --git a/block/ioctl.c b/block/ioctl.c
index 4ff1f92..bf80bfd 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -423,15 +423,6 @@ bool blkdev_dax_capable(struct block_device *bdev)
|| (bdev->bd_part->nr_sects % (PAGE_SIZE / 512)))
return false;
- /*
- * If the device has known bad blocks, force all I/O through the
- * driver / page cache.
- *
- * TODO: support finer grained dax error handling
- */
- if (disk->bb && disk->bb->count)
- return false;
-
return true;
}
#endif
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 51a071e..c04bd9b 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -381,7 +381,7 @@ static int brd_rw_page(struct block_device *bdev, sector_t sector,
#ifdef CONFIG_BLK_DEV_RAM_DAX
static long brd_direct_access(struct block_device *bdev, sector_t sector,
- void __pmem **kaddr, pfn_t *pfn)
+ void __pmem **kaddr, pfn_t *pfn, long size)
{
struct brd_device *brd = bdev->bd_disk->private_data;
struct page *page;
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 5101f3a..b68f04d 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -182,14 +182,22 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
}
static long pmem_direct_access(struct block_device *bdev, sector_t sector,
- void __pmem **kaddr, pfn_t *pfn)
+ void __pmem **kaddr, pfn_t *pfn, long size)
{
struct pmem_device *pmem = bdev->bd_disk->private_data;
resource_size_t offset = sector * 512 + pmem->data_offset;
+ if (unlikely(is_bad_pmem(&pmem->bb, sector, size)))
+ return -EIO;
*kaddr = pmem->virt_addr + offset;
*pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
+ /*
+ * If badblocks are present, limit known good range to the
+ * requested range.
+ */
+ if (unlikely(pmem->bb.count))
+ return size;
return pmem->size - pmem->pfn_pad - offset;
}
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index b839086..c45d538 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -884,7 +884,7 @@ fail:
static long
dcssblk_direct_access (struct block_device *bdev, sector_t secnum,
- void __pmem **kaddr, pfn_t *pfn)
+ void __pmem **kaddr, pfn_t *pfn, long size)
{
struct dcssblk_dev_info *dev_info;
unsigned long offset, dev_sz;
diff --git a/fs/block_dev.c b/fs/block_dev.c
index b25bb23..02c68c4 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -488,7 +488,7 @@ long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
sector += get_start_sect(bdev);
if (sector % (PAGE_SIZE / 512))
return -EINVAL;
- avail = ops->direct_access(bdev, sector, &dax->addr, &dax->pfn);
+ avail = ops->direct_access(bdev, sector, &dax->addr, &dax->pfn, size);
if (!avail)
return -ERANGE;
if (avail > 0 && avail & ~PAGE_MASK)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 669e419..55ed530 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1657,7 +1657,7 @@ struct block_device_operations {
int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
long (*direct_access)(struct block_device *, sector_t, void __pmem **,
- pfn_t *);
+ pfn_t *, long);
unsigned int (*check_events) (struct gendisk *disk,
unsigned int clearing);
/* ->media_changed() is DEPRECATED, use ->check_events() instead */
--
2.5.5
* [PATCH v7 3/6] dax: use sb_issue_zeroout instead of calling dax_clear_sectors
2016-05-11 21:08 [PATCH v7 0/6] dax: handling media errors (clear-on-zero only) Vishal Verma
2016-05-11 21:08 ` [PATCH v7 1/6] dax: fallback from pmd to pte on error Vishal Verma
2016-05-11 21:08 ` [PATCH v7 2/6] dax: enable dax in the presence of known media errors (badblocks) Vishal Verma
@ 2016-05-11 21:08 ` Vishal Verma
2016-05-11 21:08 ` [PATCH v7 4/6] dax: export a low-level __dax_zero_page_range helper Vishal Verma
` (2 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Vishal Verma @ 2016-05-11 21:08 UTC (permalink / raw)
To: linux-nvdimm
Cc: Matthew Wilcox, linux-fsdevel, linux-block, xfs, linux-ext4,
linux-mm, Ross Zwisler, Dan Williams, Dave Chinner, Jan Kara,
Jens Axboe, Andrew Morton, linux-kernel, Christoph Hellwig,
Jeff Moyer, Boaz Harrosh, Vishal Verma
From: Matthew Wilcox <matthew.r.wilcox@intel.com>
dax_clear_sectors() cannot handle poisoned blocks. These must be
zeroed using the BIO interface instead. Convert ext2 to use
sb_issue_zeroout() and XFS to use blkdev_issue_zeroout().
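The XFS hunk converts filesystem blocks to 512-byte sectors by shifting by (s_blocksize_bits - 9). This is the same as multiplying by the block size divided by 512. The sketch below shows that arithmetic with a hypothetical helper, `fsb_to_sectors()`. ext2 can pass filesystem blocks straight to sb_issue_zeroout(), which applies the same conversion for the superblock's device.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a filesystem block number (or block count) to 512-byte sectors.
 * A shift of (blocksize_bits - 9) multiplies by (block size / 512).
 */
static uint64_t fsb_to_sectors(uint64_t fsb, unsigned int blocksize_bits)
{
	assert(blocksize_bits >= 9);   /* a block is at least one sector */
	return fsb << (blocksize_bits - 9);
}
```

For example, with 4 KiB blocks (blocksize_bits = 12), block 10 starts at sector 80.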
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
[vishal: Also remove the dax_clear_sectors function entirely]
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
fs/dax.c | 32 --------------------------------
fs/ext2/inode.c | 7 +++----
fs/xfs/xfs_bmap_util.c | 15 ++++-----------
include/linux/dax.h | 1 -
4 files changed, 7 insertions(+), 48 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index d602410..0abbbb6 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -87,38 +87,6 @@ struct page *read_dax_sector(struct block_device *bdev, sector_t n)
return page;
}
-/*
- * dax_clear_sectors() is called from within transaction context from XFS,
- * and hence this means the stack from this point must follow GFP_NOFS
- * semantics for all operations.
- */
-int dax_clear_sectors(struct block_device *bdev, sector_t _sector, long _size)
-{
- struct blk_dax_ctl dax = {
- .sector = _sector,
- .size = _size,
- };
-
- might_sleep();
- do {
- long count, sz;
-
- count = dax_map_atomic(bdev, &dax);
- if (count < 0)
- return count;
- sz = min_t(long, count, SZ_128K);
- clear_pmem(dax.addr, sz);
- dax.size -= sz;
- dax.sector += sz / 512;
- dax_unmap_atomic(bdev, &dax);
- cond_resched();
- } while (dax.size);
-
- wmb_pmem();
- return 0;
-}
-EXPORT_SYMBOL_GPL(dax_clear_sectors);
-
static bool buffer_written(struct buffer_head *bh)
{
return buffer_mapped(bh) && !buffer_unwritten(bh);
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 1f07b75..35f2b0bf 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -26,6 +26,7 @@
#include <linux/highuid.h>
#include <linux/pagemap.h>
#include <linux/dax.h>
+#include <linux/blkdev.h>
#include <linux/quotaops.h>
#include <linux/writeback.h>
#include <linux/buffer_head.h>
@@ -737,10 +738,8 @@ static int ext2_get_blocks(struct inode *inode,
* so that it's not found by another thread before it's
* initialised
*/
- err = dax_clear_sectors(inode->i_sb->s_bdev,
- le32_to_cpu(chain[depth-1].key) <<
- (inode->i_blkbits - 9),
- 1 << inode->i_blkbits);
+ err = sb_issue_zeroout(inode->i_sb,
+ le32_to_cpu(chain[depth-1].key), 1, GFP_NOFS);
if (err) {
mutex_unlock(&ei->truncate_mutex);
goto cleanup;
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 3b63098..930ac6a 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -72,18 +72,11 @@ xfs_zero_extent(
struct xfs_mount *mp = ip->i_mount;
xfs_daddr_t sector = xfs_fsb_to_db(ip, start_fsb);
sector_t block = XFS_BB_TO_FSBT(mp, sector);
- ssize_t size = XFS_FSB_TO_B(mp, count_fsb);
-
- if (IS_DAX(VFS_I(ip)))
- return dax_clear_sectors(xfs_find_bdev_for_inode(VFS_I(ip)),
- sector, size);
-
- /*
- * let the block layer decide on the fastest method of
- * implementing the zeroing.
- */
- return sb_issue_zeroout(mp->m_super, block, count_fsb, GFP_NOFS);
+ return blkdev_issue_zeroout(xfs_find_bdev_for_inode(VFS_I(ip)),
+ block << (mp->m_super->s_blocksize_bits - 9),
+ count_fsb << (mp->m_super->s_blocksize_bits - 9),
+ GFP_NOFS, true);
}
/*
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 7c45ac7..7f853ff 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -7,7 +7,6 @@
ssize_t dax_do_io(struct kiocb *, struct inode *, struct iov_iter *, loff_t,
get_block_t, dio_iodone_t, int flags);
-int dax_clear_sectors(struct block_device *bdev, sector_t _sector, long _size);
int dax_zero_page_range(struct inode *, loff_t from, unsigned len, get_block_t);
int dax_truncate_page(struct inode *, loff_t from, get_block_t);
int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t);
--
2.5.5
* [PATCH v7 4/6] dax: export a low-level __dax_zero_page_range helper
2016-05-11 21:08 [PATCH v7 0/6] dax: handling media errors (clear-on-zero only) Vishal Verma
` (2 preceding siblings ...)
2016-05-11 21:08 ` [PATCH v7 3/6] dax: use sb_issue_zeroout instead of calling dax_clear_sectors Vishal Verma
@ 2016-05-11 21:08 ` Vishal Verma
2016-05-12 8:41 ` Jan Kara
2016-05-11 21:08 ` [PATCH v7 5/6] dax: for truncate/hole-punch, do zeroing through the driver if possible Vishal Verma
2016-05-11 21:08 ` [PATCH v7 6/6] dax: fix a comment in dax_zero_page_range and dax_truncate_page Vishal Verma
5 siblings, 1 reply; 10+ messages in thread
From: Vishal Verma @ 2016-05-11 21:08 UTC (permalink / raw)
To: linux-nvdimm
Cc: Christoph Hellwig, linux-fsdevel, linux-block, xfs, linux-ext4,
linux-mm, Ross Zwisler, Dan Williams, Dave Chinner, Jan Kara,
Jens Axboe, Andrew Morton, linux-kernel, Christoph Hellwig,
Jeff Moyer, Boaz Harrosh
From: Christoph Hellwig <hch@lst.de>
This allows XFS to perform zeroing using the iomap infrastructure and
avoid buffer heads.
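The helper's contract is "zero [offset, offset + length) within the page that starts at sector". The sketch below models this in user space, using an ordinary buffer in place of the DAX mapping. `sim_zero_page_range()` is a hypothetical name. Its -EINVAL bounds check is an addition of the sketch and is not done by __dax_zero_page_range() itself.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096u

/*
 * User-space stand-in for __dax_zero_page_range(): zero the byte range
 * [offset, offset + length) inside one page. In the kernel, the page is
 * obtained with dax_map_atomic() and zeroed with clear_pmem() + wmb_pmem();
 * here it is an ordinary buffer and memset().
 */
static int sim_zero_page_range(unsigned char *page, unsigned int offset,
			       unsigned int length)
{
	assert(page != NULL);
	if (offset > SIM_PAGE_SIZE || length > SIM_PAGE_SIZE - offset)
		return -EINVAL;   /* sketch-only check: stay inside the page */
	memset(page + offset, 0, length);
	return 0;
}
```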
[vishal: fix conflicts with dax-error-handling]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/dax.c | 35 ++++++++++++++++++++---------------
include/linux/dax.h | 7 +++++++
2 files changed, 27 insertions(+), 15 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 0abbbb6..651d4b1 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -947,6 +947,23 @@ int dax_pfn_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
}
EXPORT_SYMBOL_GPL(dax_pfn_mkwrite);
+int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
+ unsigned int offset, unsigned int length)
+{
+ struct blk_dax_ctl dax = {
+ .sector = sector,
+ .size = PAGE_SIZE,
+ };
+
+ if (dax_map_atomic(bdev, &dax) < 0)
+ return PTR_ERR(dax.addr);
+ clear_pmem(dax.addr + offset, length);
+ wmb_pmem();
+ dax_unmap_atomic(bdev, &dax);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(__dax_zero_page_range);
+
/**
* dax_zero_page_range - zero a range within a page of a DAX file
* @inode: The file being truncated
@@ -982,23 +999,11 @@ int dax_zero_page_range(struct inode *inode, loff_t from, unsigned length,
bh.b_bdev = inode->i_sb->s_bdev;
bh.b_size = PAGE_SIZE;
err = get_block(inode, index, &bh, 0);
- if (err < 0)
+ if (err < 0 || !buffer_written(&bh))
return err;
- if (buffer_written(&bh)) {
- struct block_device *bdev = bh.b_bdev;
- struct blk_dax_ctl dax = {
- .sector = to_sector(&bh, inode),
- .size = PAGE_SIZE,
- };
- if (dax_map_atomic(bdev, &dax) < 0)
- return PTR_ERR(dax.addr);
- clear_pmem(dax.addr + offset, length);
- wmb_pmem();
- dax_unmap_atomic(bdev, &dax);
- }
-
- return 0;
+ return __dax_zero_page_range(bh.b_bdev, to_sector(&bh, inode),
+ offset, length);
}
EXPORT_SYMBOL_GPL(dax_zero_page_range);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 7f853ff..90fbc99 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -14,12 +14,19 @@ int __dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t);
#ifdef CONFIG_FS_DAX
struct page *read_dax_sector(struct block_device *bdev, sector_t n);
+int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
+ unsigned int offset, unsigned int length);
#else
static inline struct page *read_dax_sector(struct block_device *bdev,
sector_t n)
{
return ERR_PTR(-ENXIO);
}
+static inline int __dax_zero_page_range(struct block_device *bdev,
+ sector_t sector, unsigned int offset, unsigned int length)
+{
+ return -ENXIO;
+}
#endif
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
--
2.5.5
* Re: [PATCH v7 4/6] dax: export a low-level __dax_zero_page_range helper
2016-05-11 21:08 ` [PATCH v7 4/6] dax: export a low-level __dax_zero_page_range helper Vishal Verma
@ 2016-05-12 8:41 ` Jan Kara
2016-05-12 17:06 ` Verma, Vishal L
0 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2016-05-12 8:41 UTC (permalink / raw)
To: Vishal Verma
Cc: linux-nvdimm, Christoph Hellwig, linux-fsdevel, linux-block, xfs,
linux-ext4, linux-mm, Ross Zwisler, Dan Williams, Dave Chinner,
Jan Kara, Jens Axboe, Andrew Morton, linux-kernel,
Christoph Hellwig, Jeff Moyer, Boaz Harrosh
On Wed 11-05-16 15:08:50, Vishal Verma wrote:
> From: Christoph Hellwig <hch@lst.de>
>
> This allows XFS to perform zeroing using the iomap infrastructure and
> avoid buffer heads.
>
> [vishal: fix conflicts with dax-error-handling]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>
BTW: You are supposed to add your Signed-off-by when forwarding patches
like this...
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH v7 4/6] dax: export a low-level __dax_zero_page_range helper
2016-05-12 8:41 ` Jan Kara
@ 2016-05-12 17:06 ` Verma, Vishal L
0 siblings, 0 replies; 10+ messages in thread
From: Verma, Vishal L @ 2016-05-12 17:06 UTC (permalink / raw)
To: jack@suse.cz
Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
hch@infradead.org, xfs@oss.sgi.com, jmoyer@redhat.com,
linux-mm@kvack.org, hch@lst.de, Williams, Dan J, axboe@fb.com,
akpm@linux-foundation.org, linux-nvdimm@lists.01.org,
linux-fsdevel@vger.kernel.org, ross.zwisler@linux.intel.com,
linux-ext4@vger.kernel.org, boaz@plexistor.com,
david@fromorbit.com
On Thu, 2016-05-12 at 10:41 +0200, Jan Kara wrote:
> On Wed 11-05-16 15:08:50, Vishal Verma wrote:
> >
> > From: Christoph Hellwig <hch@lst.de>
> >
> > This allows XFS to perform zeroing using the iomap infrastructure and
> > avoid buffer heads.
> >
> > [vishal: fix conflicts with dax-error-handling]
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> Looks good. You can add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
>
> BTW: You are supposed to add your Signed-off-by when forwarding patches
> like this...
Ah, I didn't know. I'll add it when making the stable topic branch for
this. Thanks!
* [PATCH v7 5/6] dax: for truncate/hole-punch, do zeroing through the driver if possible
2016-05-11 21:08 [PATCH v7 0/6] dax: handling media errors (clear-on-zero only) Vishal Verma
` (3 preceding siblings ...)
2016-05-11 21:08 ` [PATCH v7 4/6] dax: export a low-level __dax_zero_page_range helper Vishal Verma
@ 2016-05-11 21:08 ` Vishal Verma
2016-05-12 8:38 ` Jan Kara
2016-05-11 21:08 ` [PATCH v7 6/6] dax: fix a comment in dax_zero_page_range and dax_truncate_page Vishal Verma
5 siblings, 1 reply; 10+ messages in thread
From: Vishal Verma @ 2016-05-11 21:08 UTC (permalink / raw)
To: linux-nvdimm
Cc: Vishal Verma, linux-fsdevel, linux-block, xfs, linux-ext4,
linux-mm, Ross Zwisler, Dan Williams, Dave Chinner, Jan Kara,
Jens Axboe, Andrew Morton, linux-kernel, Christoph Hellwig,
Jeff Moyer, Boaz Harrosh
In the truncate or hole-punch path in dax, we clear out sub-page ranges.
If these sub-page ranges are sector-aligned and a whole number of sectors
long, we can do the zeroing through the driver instead, so that error
clearing is handled automatically.
For sub-sector ranges, we still have to rely on clear_pmem() and may
trip over errors.
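The routing decision can be expressed as plain arithmetic. The sketch below uses a hypothetical `plan_zeroing()` helper. If the offset and length are both multiples of the logical block size, the range becomes a driver zeroout that starts at sector + (offset >> 9) and covers length >> 9 sectors. Any other range falls back to clearing through the mapping. The modulo test gives the same result as IS_ALIGNED() for the power-of-two block sizes that block devices use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum zero_method { ZERO_VIA_DRIVER, ZERO_VIA_CLEAR_PMEM };

struct zero_plan {
	enum zero_method method;
	uint64_t start_sector;   /* valid for ZERO_VIA_DRIVER */
	uint64_t nr_sectors;     /* valid for ZERO_VIA_DRIVER */
};

/* Same test as dax_range_is_aligned(): offset and length are block multiples. */
static bool range_is_aligned(unsigned int lbs, unsigned int offset,
			     unsigned int length)
{
	return offset % lbs == 0 && length % lbs == 0;
}

/*
 * Route a sub-page zeroing request as the patched __dax_zero_page_range()
 * does: aligned ranges become a sector-granular zeroout through the driver
 * (which can clear poison); anything else is cleared through the mapping.
 */
static struct zero_plan plan_zeroing(uint64_t page_sector, unsigned int lbs,
				     unsigned int offset, unsigned int length)
{
	struct zero_plan plan = { .method = ZERO_VIA_CLEAR_PMEM };

	assert(lbs >= 512 && lbs % 512 == 0);
	if (range_is_aligned(lbs, offset, length)) {
		plan.method = ZERO_VIA_DRIVER;
		plan.start_sector = page_sector + (offset >> 9);
		plan.nr_sectors = length >> 9;
	}
	return plan;
}
```

For example, take a page at sector 1000 with 512-byte logical blocks. Zeroing bytes 512..1535 of that page would be issued as a 2-sector zeroout starting at sector 1001.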
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
Documentation/filesystems/dax.txt | 32 ++++++++++++++++++++++++++++++++
fs/dax.c | 30 +++++++++++++++++++++++++-----
2 files changed, 57 insertions(+), 5 deletions(-)
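This patch adds a "Handling Media Errors" section to Documentation/filesystems/dax.txt. That section notes that an application touching a bad location can expect a SIGBUS. The sketch below catches that signal around a single load using sigsetjmp()/siglongjmp(). A real media error cannot be injected from an example, so the usage check instead reads past the end of a mapped file, which Linux also reports as SIGBUS. `read_byte_catching_sigbus()` is a hypothetical helper. Real poison may carry extra siginfo details that this sketch ignores.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>     /* mkstemp() in the usage check */
#include <string.h>
#include <sys/mman.h>   /* mmap() in the usage check */
#include <unistd.h>     /* ftruncate()/sysconf() in the usage check */

static sigjmp_buf sigbus_env;

/* On SIGBUS, jump back to the sigsetjmp() point instead of retrying the load. */
static void sigbus_handler(int sig)
{
	(void)sig;
	siglongjmp(sigbus_env, 1);
}

/*
 * Load one byte from a possibly-bad mapping. Returns 0 and stores the byte in
 * *out on success, or -1 if the load raised SIGBUS (or the handler could not
 * be installed). The previous SIGBUS disposition is restored before returning.
 */
static int read_byte_catching_sigbus(const volatile unsigned char *p,
				     unsigned char *out)
{
	struct sigaction sa, old;
	int ret = 0;

	assert(p != NULL && out != NULL);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sigbus_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &old) != 0)
		return -1;

	if (sigsetjmp(sigbus_env, 1) == 0)
		*out = *p;      /* may fault with SIGBUS */
	else
		ret = -1;       /* reached via siglongjmp() from the handler */

	sigaction(SIGBUS, &old, NULL);
	return ret;
}
```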
diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
index 7bde640..ce4587d 100644
--- a/Documentation/filesystems/dax.txt
+++ b/Documentation/filesystems/dax.txt
@@ -79,6 +79,38 @@ These filesystems may be used for inspiration:
- ext4: the fourth extended filesystem, see Documentation/filesystems/ext4.txt
+Handling Media Errors
+---------------------
+
+The libnvdimm subsystem stores a record of known media error locations for
+each pmem block device (in gendisk->badblocks). If we fault at such a location,
+or one with a latent error not yet discovered, the application can expect
+to receive a SIGBUS. Libnvdimm also allows clearing of these errors by simply
+writing the affected sectors (through the pmem driver, and if the underlying
+NVDIMM supports the clear_poison DSM defined by ACPI).
+
+Since DAX IO normally doesn't go through the driver/bio path, ordinary DAX
+writes do not clear errors. Applications or sysadmins can instead restore the
+lost data from a prior backup or built-in redundancy in the following ways:
+
+1. Delete the affected file, and restore from a backup (sysadmin route):
+ This will free the file system blocks that were being used by the file,
+ and the next time they're allocated, they will be zeroed first, which
+ happens through the driver, and will clear bad sectors.
+
+2. Truncate or hole-punch the part of the file that has a bad block (at least
+ an entire aligned sector has to be hole-punched, but not necessarily an
+ entire filesystem block).
+
+These are the two basic paths that allow DAX filesystems to continue operating
+in the presence of media errors. More robust error recovery mechanisms can be
+built on top of this in the future, for example, involving redundancy/mirroring
+provided at the block layer through DM, or additionally, at the filesystem
+level. These would have to rely on the above two tenets, that error clearing
+can happen either by sending an IO through the driver, or zeroing (also through
+the driver).
+
+
Shortcomings
------------
diff --git a/fs/dax.c b/fs/dax.c
index 651d4b1..0b9a169 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -947,6 +947,19 @@ int dax_pfn_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
}
EXPORT_SYMBOL_GPL(dax_pfn_mkwrite);
+static bool dax_range_is_aligned(struct block_device *bdev,
+ unsigned int offset, unsigned int length)
+{
+ unsigned short sector_size = bdev_logical_block_size(bdev);
+
+ if (!IS_ALIGNED(offset, sector_size))
+ return false;
+ if (!IS_ALIGNED(length, sector_size))
+ return false;
+
+ return true;
+}
+
int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
unsigned int offset, unsigned int length)
{
@@ -955,11 +968,18 @@ int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
.size = PAGE_SIZE,
};
- if (dax_map_atomic(bdev, &dax) < 0)
- return PTR_ERR(dax.addr);
- clear_pmem(dax.addr + offset, length);
- wmb_pmem();
- dax_unmap_atomic(bdev, &dax);
+ if (dax_range_is_aligned(bdev, offset, length)) {
+ sector_t start_sector = dax.sector + (offset >> 9);
+
+ return blkdev_issue_zeroout(bdev, start_sector,
+ length >> 9, GFP_NOFS, true);
+ } else {
+ if (dax_map_atomic(bdev, &dax) < 0)
+ return PTR_ERR(dax.addr);
+ clear_pmem(dax.addr + offset, length);
+ wmb_pmem();
+ dax_unmap_atomic(bdev, &dax);
+ }
return 0;
}
EXPORT_SYMBOL_GPL(__dax_zero_page_range);
--
2.5.5
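The second recovery route in the "Handling Media Errors" documentation can be driven from user space with fallocate(2). The sketch below assumes a Linux filesystem that supports FALLOC_FL_PUNCH_HOLE, which must be combined with FALLOC_FL_KEEP_SIZE. `punch_bad_range()` is a hypothetical helper. The caller is responsible for choosing offset and len so that the punch covers the whole bad sector.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* fallocate(), FALLOC_FL_* with _GNU_SOURCE */
#include <stdlib.h>     /* mkstemp() in the usage check */
#include <string.h>     /* memset() in the usage check */
#include <sys/types.h>
#include <unistd.h>     /* pread()/pwrite()/close() in the usage check */

/*
 * Deallocate [offset, offset + len) in a file without changing its size;
 * the punched range reads back as zeroes afterwards. For the media-error
 * case, the range should cover at least the whole aligned 512-byte sector.
 * Returns 0 on success or a negative errno (e.g. -EOPNOTSUPP).
 */
static int punch_bad_range(int fd, off_t offset, off_t len)
{
	assert(offset >= 0 && len > 0);
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      offset, len) != 0)
		return -errno;
	return 0;
}
```

For example, punching [512, 1024) in an 8 KiB file of 0xff bytes leaves the file 8 KiB long. Only those 512 bytes read back as zero.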
* Re: [PATCH v7 5/6] dax: for truncate/hole-punch, do zeroing through the driver if possible
2016-05-11 21:08 ` [PATCH v7 5/6] dax: for truncate/hole-punch, do zeroing through the driver if possible Vishal Verma
@ 2016-05-12 8:38 ` Jan Kara
0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2016-05-12 8:38 UTC (permalink / raw)
To: Vishal Verma
Cc: linux-nvdimm, linux-fsdevel, linux-block, xfs, linux-ext4,
linux-mm, Ross Zwisler, Dan Williams, Dave Chinner, Jan Kara,
Jens Axboe, Andrew Morton, linux-kernel, Christoph Hellwig,
Jeff Moyer, Boaz Harrosh
On Wed 11-05-16 15:08:51, Vishal Verma wrote:
> In the truncate or hole-punch path in dax, we clear out sub-page ranges.
> If these sub-page ranges are sector aligned and sized, we can do the
> zeroing through the driver instead so that error-clearing is handled
> automatically.
>
> For sub-sector ranges, we still have to rely on clear_pmem and have the
> possibility of tripping over errors.
>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> Cc: Jeff Moyer <jmoyer@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Jan Kara <jack@suse.cz>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
The patch looks good to me now. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* [PATCH v7 6/6] dax: fix a comment in dax_zero_page_range and dax_truncate_page
2016-05-11 21:08 [PATCH v7 0/6] dax: handling media errors (clear-on-zero only) Vishal Verma
` (4 preceding siblings ...)
2016-05-11 21:08 ` [PATCH v7 5/6] dax: for truncate/hole-punch, do zeroing through the driver if possible Vishal Verma
@ 2016-05-11 21:08 ` Vishal Verma
5 siblings, 0 replies; 10+ messages in thread
From: Vishal Verma @ 2016-05-11 21:08 UTC (permalink / raw)
To: linux-nvdimm
Cc: Vishal Verma, linux-fsdevel, linux-block, xfs, linux-ext4,
linux-mm, Ross Zwisler, Dan Williams, Dave Chinner, Jan Kara,
Jens Axboe, Andrew Morton, linux-kernel, Christoph Hellwig,
Jeff Moyer, Boaz Harrosh, Kirill A. Shutemov
The distinction between PAGE_SIZE and PAGE_CACHE_SIZE was removed in
commit 09cbfea ("mm, fs: get rid of PAGE_CACHE_* and
page_cache_{get,release} macros").
The comments for the above functions described a distinction between
the two that is now redundant, so remove those paragraphs.
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
fs/dax.c | 12 ------------
1 file changed, 12 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 0b9a169..ea936b3 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -995,12 +995,6 @@ EXPORT_SYMBOL_GPL(__dax_zero_page_range);
* page in a DAX file. This is intended for hole-punch operations. If
* you are truncating a file, the helper function dax_truncate_page() may be
* more convenient.
- *
- * We work in terms of PAGE_SIZE here for commonality with
- * block_truncate_page(), but we could go down to PAGE_SIZE if the filesystem
- * took care of disposing of the unnecessary blocks. Even if the filesystem
- * block size is smaller than PAGE_SIZE, we have to zero the rest of the page
- * since the file might be mmapped.
*/
int dax_zero_page_range(struct inode *inode, loff_t from, unsigned length,
get_block_t get_block)
@@ -1035,12 +1029,6 @@ EXPORT_SYMBOL_GPL(dax_zero_page_range);
*
* Similar to block_truncate_page(), this function can be called by a
* filesystem when it is truncating a DAX file to handle the partial page.
- *
- * We work in terms of PAGE_SIZE here for commonality with
- * block_truncate_page(), but we could go down to PAGE_SIZE if the filesystem
- * took care of disposing of the unnecessary blocks. Even if the filesystem
- * block size is smaller than PAGE_SIZE, we have to zero the rest of the page
- * since the file might be mmapped.
*/
int dax_truncate_page(struct inode *inode, loff_t from, get_block_t get_block)
{
--
2.5.5