* Re: [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function
From: Darrick J. Wong @ 2014-11-11 0:04 UTC
To: Martin K. Petersen; +Cc: linux-scsi, linux-ide, linux-fsdevel, neilb, linux-api
On Fri, Nov 07, 2014 at 12:08:14AM -0500, Martin K. Petersen wrote:
> blkdev_issue_zeroout() will zero a given block range on disk. This is
> done by way of either WRITE SAME or regular WRITE. I.e. the blocks on
> disk will be written and thus provisioned.
>
> There are use cases where the desired behavior is to zero the blocks but
> unprovision them if possible. The blocks must deterministically contain
> zeroes when they are subsequently read back.
>
> This patch introduces a blkdev_issue_zeroout_discard() call that
> provides this functionality. If a block device guarantees
> discard_zeroes_data, the new function will use discard to clear the block
> range. If the device does not support discard_zeroes_data or if the
> discard request fails, we will fall back to blkdev_issue_zeroout() to
> ensure predictable results.
Can this be plumbed into a BLK* ioctl too? I'll write a patch if this is OK
with everyone:
struct blkzeroout_t {
	__u64 start;
	__u64 end;
	__u32 flags;
};
#define BLKZEROOUT_DISCARD_OK 1
#define BLKZEROOUT_V2 _IOW(0x12, 127, struct blkzeroout_t)
...and make it zap the page cache per earlier discussion. This seems to be a
good fit with what we've been discussing for mke2fs.
--D
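To illustrate how a new command can reuse type 0x12 and number 127 without colliding with the existing BLKZEROOUT (which is `_IO(0x12,127)`), here is a small userspace sketch. It assumes the proposal's `struct blkzeroout_t` layout and an `_IOW` encoding (the kernel reads the struct with copy_from_user), with the struct type as the third macro argument; `EXAMPLE_BLKZEROOUT_V2` and `shares_nr_but_differs()` are hypothetical names used only here. The `_IOC_*` decoding macros come from `<linux/ioctl.h>`.

```c
#include <assert.h>
#include <linux/fs.h>    /* BLKZEROOUT is _IO(0x12,127) */
#include <linux/ioctl.h> /* _IOW() and the _IOC_*() decoding macros */
#include <linux/types.h> /* __u32, __u64 */

/* Layout from the proposal above. */
struct blkzeroout_t {
	__u64 start;
	__u64 end;
	__u32 flags;
};

/*
 * Hypothetical name, for illustration only. The third argument of _IOW()
 * is the argument *type*; the macro applies sizeof() to it itself.
 */
#define EXAMPLE_BLKZEROOUT_V2 _IOW(0x12, 127, struct blkzeroout_t)

/* The size field of the command number must match the struct size. */
_Static_assert(_IOC_SIZE(EXAMPLE_BLKZEROOUT_V2) == sizeof(struct blkzeroout_t),
	       "ioctl size field does not match struct blkzeroout_t");

/*
 * Same type (0x12) and number (127) as BLKZEROOUT, yet a different
 * command, because the direction and size fields differ.
 */
static int shares_nr_but_differs(void)
{
	return _IOC_TYPE(EXAMPLE_BLKZEROOUT_V2) == _IOC_TYPE(BLKZEROOUT) &&
	       _IOC_NR(EXAMPLE_BLKZEROOUT_V2) == _IOC_NR(BLKZEROOUT) &&
	       EXAMPLE_BLKZEROOUT_V2 != BLKZEROOUT;
}
```

Passing `sizeof(struct blkzeroout_t)` as the third argument would instead encode `sizeof(size_t)`, because `_IOW()` and `_IOR()` already apply `sizeof` to that argument.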
>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
> block/blk-lib.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
> include/linux/blkdev.h | 2 ++
> 2 files changed, 44 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 8411be3c19d3..2ffec6a01c71 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -278,14 +278,18 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> }
>
> /**
> - * blkdev_issue_zeroout - zero-fill a block range
> + * blkdev_issue_zeroout - zero-fill and provision a block range
> * @bdev: blockdev to write
> * @sector: start sector
> * @nr_sects: number of sectors to write
> * @gfp_mask: memory allocation flags (for bio_alloc)
> *
> * Description:
> - * Generate and issue number of bios with zerofiled pages.
> + * Zero-fill a block range. The blocks will be provisioned
> + * (allocated/anchored) and are guaranteed to return zeroes when read
> + * back. This function will attempt to use WRITE SAME to optimize the
> + * process if the block device supports it. Otherwise it will fall back
> + * to zeroing the blocks using regular WRITE calls.
> */
>
> int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> @@ -305,3 +309,39 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
> }
> EXPORT_SYMBOL(blkdev_issue_zeroout);
> +
> +/**
> + * blkdev_issue_zeroout_discard - zero-fill and attempt to discard block range
> + * @bdev: blockdev to write
> + * @sector: start sector
> + * @nr_sects: number of sectors to write
> + * @gfp_mask: memory allocation flags (for bio_alloc)
> + *
> + * Description:
> + * Zero-fill a block range. In contrast to blkdev_issue_zeroout() this
> + * function will attempt to deprovision (deallocate/discard) the blocks
> + * in question. It will only do so if the underlying device guarantees
> + * that subsequent READ operations to the block range in question will
> + * return zeroes. If the device does not provide hard guarantees or if
> + * the DISCARD attempt should fail, the block range will be explicitly
> + * zeroed using blkdev_issue_zeroout().
> + */
> +
> +int blkdev_issue_zeroout_discard(struct block_device *bdev, sector_t sector,
> +				 sector_t nr_sects, gfp_t gfp_mask)
> +{
> +	struct request_queue *q = bdev_get_queue(bdev);
> +
> +	if (blk_queue_discard(q) && q->limits.discard_zeroes_data) {
> +		unsigned char bdn[BDEVNAME_SIZE];
> +
> +		if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0))
> +			return 0;
> +
> +		bdevname(bdev, bdn);
> +		pr_err("%s: DISCARD failed. Manually zeroing.\n", bdn);
> +	}
> +
> +	return blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
> +}
> +EXPORT_SYMBOL(blkdev_issue_zeroout_discard);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index aac0f9ea952a..078b6e5f488a 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1164,6 +1164,8 @@ extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
> sector_t nr_sects, gfp_t gfp_mask, struct page *page);
> extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> sector_t nr_sects, gfp_t gfp_mask);
> +extern int blkdev_issue_zeroout_discard(struct block_device *bdev,
> + sector_t sector, sector_t nr_sects, gfp_t gfp_mask);
> static inline int sb_issue_discard(struct super_block *sb, sector_t block,
> sector_t nr_blocks, gfp_t gfp_mask, unsigned long flags)
> {
> --
> 1.9.3
>
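The fallback chain described in this patch and in the blkdev_issue_zeroout() kernel-doc (a zeroing DISCARD only when the queue advertises discard_zeroes_data, otherwise WRITE SAME when supported, otherwise plain WRITEs, with each failure dropping to the next step) can be modelled in plain userspace C. This is a simplified sketch, not kernel code: `struct mock_queue`, its `*_ok` fields and the two `*_path()` helpers are hypothetical stand-ins for the real queue limits and bio submission.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for what the kernel consults on the queue. */
struct mock_queue {
	bool discard;             /* like blk_queue_discard(q) */
	bool discard_zeroes_data; /* like q->limits.discard_zeroes_data */
	bool write_same;          /* like bdev_write_same(bdev) != 0 */
	bool discard_ok;          /* simulated outcome of the DISCARD */
	bool write_same_ok;       /* simulated outcome of the WRITE SAME */
};

enum zero_path { ZP_DISCARD, ZP_WRITE_SAME, ZP_WRITE };

/* Models blkdev_issue_zeroout(): WRITE SAME first, then plain WRITEs. */
static enum zero_path zeroout_path(const struct mock_queue *q)
{
	if (q->write_same && q->write_same_ok)
		return ZP_WRITE_SAME;
	return ZP_WRITE;
}

/* Models blkdev_issue_zeroout_discard(): discard only if reads return 0. */
static enum zero_path zeroout_discard_path(const struct mock_queue *q)
{
	assert(q);
	if (q->discard && q->discard_zeroes_data && q->discard_ok)
		return ZP_DISCARD;
	return zeroout_path(q);
}
```

The quoted patch also logs a failed DISCARD with pr_err() before falling back; the model leaves logging out.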
* Re: [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function
From: Martin K. Petersen @ 2014-11-11 2:33 UTC
To: Darrick J. Wong
Cc: Martin K. Petersen, linux-scsi, linux-ide, linux-fsdevel, neilb,
linux-api
>>>>> "Darrick" == Darrick J Wong <darrick.wong@oracle.com> writes:
Darrick> Can this be plumbed into a BLK* ioctl too? I'll write a patch,
Darrick> if this is ok with everyone:
Darrick> ...and make it zap the page cache per earlier discussion. This
Darrick> seems to be a good fit with what we've been discussing for
Darrick> mke2fs.
That sounds good to me. I'll get the updated patch out tomorrow.
--
Martin K. Petersen Oracle Linux Engineering
* [PATCH] block: create ioctl to discard-or-zeroout a range of blocks
From: Darrick J. Wong @ 2014-11-17 19:28 UTC
To: Martin K. Petersen; +Cc: linux-scsi, linux-ide, linux-fsdevel, neilb, linux-api
Create a new ioctl to expose the block layer's newfound ability to
issue either a zeroing discard, a WRITE SAME with a zero page, or a
regular write with the zero page. This BLKZEROOUT2 ioctl takes
{start, length, flags} as parameters. So far, the only flag is
BLKZEROOUT2_DISCARD_OK, which enables the zeroing discard; without it,
the call behaves like the old BLKZEROOUT. As in BLKZEROOUT, start and
length are byte values that must be multiples of 512.
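A userspace caller such as mke2fs might use the proposed interface roughly as follows: fill in {start, length, flags} and, if the running kernel does not recognize the new command, fall back to the existing BLKZEROOUT, which takes a two-element uint64_t array. This is a hedged sketch: the struct and command number mirror the patch below but are not in any released kernel header, so they are redefined locally under the hypothetical names `example_blkzeroout2` and `EXAMPLE_BLKZEROOUT2`; `zero_range()` is likewise hypothetical, and treating ENOTTY or EINVAL as "command unknown" is an assumption, since drivers differ in which error they return.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>    /* BLKZEROOUT */
#include <linux/types.h> /* __u32, __u64 */

/* Hypothetical local copies of the definitions proposed in the patch. */
struct example_blkzeroout2 {
	__u64 start;
	__u64 length;
	__u32 flags;
};
#define EXAMPLE_BLKZEROOUT2_DISCARD_OK 1
#define EXAMPLE_BLKZEROOUT2 _IOW(0x12, 127, struct example_blkzeroout2)

/*
 * Zero bytes [start, start + length) of the block device open on fd,
 * preferring the discard-capable ioctl. Returns 0, or -1 with errno set.
 */
static int zero_range(int fd, uint64_t start, uint64_t length,
		      int allow_discard)
{
	struct example_blkzeroout2 req = {
		.start = start,
		.length = length,
		.flags = allow_discard ? EXAMPLE_BLKZEROOUT2_DISCARD_OK : 0,
	};
	uint64_t range[2] = { start, length };

	/* Both ioctls require 512-byte aligned byte offsets and lengths. */
	if ((start | length) & 511) {
		errno = EINVAL;
		return -1;
	}
	if (ioctl(fd, EXAMPLE_BLKZEROOUT2, &req) == 0)
		return 0;
	if (errno != ENOTTY && errno != EINVAL)
		return -1;
	/* Assumed older kernel: plain zeroout, never a discard. */
	return ioctl(fd, BLKZEROOUT, range);
}
```

The descriptor must be opened for writing; both ioctls return EBADF otherwise.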
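Furthermore, because BLKZEROOUT2 issues commands directly to the
storage device, we must invalidate the page cache (as a regular
O_DIRECT write would do) to avoid returning stale cache contents at a
later time. The range is invalidated both before and after the zeroout;
if a page is dirtied in between, the ioctl returns -EBUSY.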
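The invalidation works in two units: truncate_inode_pages_range() takes inclusive byte offsets, while invalidate_inode_pages2_range() takes inclusive page indices obtained by shifting with PAGE_CACHE_SHIFT. The arithmetic can be checked in isolation; this sketch assumes 4 KiB pages (a shift of 12), and `struct page_span` and `bytes_to_pages()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

struct page_span {
	uint64_t first; /* index of the first page touched */
	uint64_t last;  /* index of the last page touched (inclusive) */
};

/* Pages overlapping the byte range [start, start + len - 1]. */
static struct page_span bytes_to_pages(uint64_t start, uint64_t len)
{
	assert(len > 0);
	uint64_t end = start + len - 1; /* inclusive, as in the patch */
	struct page_span s = {
		start >> EXAMPLE_PAGE_SHIFT,
		end >> EXAMPLE_PAGE_SHIFT,
	};
	return s;
}
```

Partially covered pages at either end are included in the index range, so a 4096-byte zeroout starting at byte 512 spans pages 0 and 1.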
This patch depends on mkp's earlier patch "block: Introduce
blkdev_issue_zeroout_discard() function".
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
block/ioctl.c | 45 ++++++++++++++++++++++++++++++++++++++-------
include/uapi/linux/fs.h | 7 +++++++
2 files changed, 45 insertions(+), 7 deletions(-)
diff --git a/block/ioctl.c b/block/ioctl.c
index 7d8befd..ff623d5 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -186,19 +186,39 @@ static int blk_ioctl_discard(struct block_device *bdev, uint64_t start,
 }
 
 static int blk_ioctl_zeroout(struct block_device *bdev, uint64_t start,
-			     uint64_t len)
+			     uint64_t len, uint32_t flags)
 {
+	int ret;
+	struct address_space *mapping;
+	uint64_t end = start + len - 1;
+
+	if (flags & ~BLKZEROOUT2_DISCARD_OK)
+		return -EINVAL;
 	if (start & 511)
 		return -EINVAL;
 	if (len & 511)
 		return -EINVAL;
-	start >>= 9;
-	len >>= 9;
-
-	if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+	if (end < start || end >= i_size_read(bdev->bd_inode))
 		return -EINVAL;
 
-	return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
+	/* Invalidate the page cache, including dirty pages */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	ret = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+				   flags & BLKZEROOUT2_DISCARD_OK);
+	if (ret)
+		goto out;
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY.
+	 */
+	ret = invalidate_inode_pages2_range(mapping,
+					    start >> PAGE_CACHE_SHIFT,
+					    end >> PAGE_CACHE_SHIFT);
+out:
+	return ret;
 }
 
 static int put_ushort(unsigned long arg, unsigned short val)
@@ -326,7 +346,18 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 		if (copy_from_user(range, (void __user *)arg, sizeof(range)))
 			return -EFAULT;
 
-		return blk_ioctl_zeroout(bdev, range[0], range[1]);
+		return blk_ioctl_zeroout(bdev, range[0], range[1], 0);
+	}
+	case BLKZEROOUT2: {
+		struct blkzeroout2 p;
+
+		if (!(mode & FMODE_WRITE))
+			return -EBADF;
+
+		if (copy_from_user(&p, (void __user *)arg, sizeof(p)))
+			return -EFAULT;
+
+		return blk_ioctl_zeroout(bdev, p.start, p.length, p.flags);
 	}
 
 	case HDIO_GETGEO: {
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 3735fa0..54d24ea 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -150,6 +150,13 @@ struct inodes_stat_t {
 #define BLKSECDISCARD _IO(0x12,125)
 #define BLKROTATIONAL _IO(0x12,126)
 #define BLKZEROOUT _IO(0x12,127)
+struct blkzeroout2 {
+	__u64 start;
+	__u64 length;
+	__u32 flags;
+};
+#define BLKZEROOUT2_DISCARD_OK 1
+#define BLKZEROOUT2 _IOW(0x12, 127, struct blkzeroout2)
 
 #define BMAP_IOCTL 1		/* obsolete - kept for compatibility */
 #define FIBMAP	   _IO(0x00,1)	/* bmap access */
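The argument checks performed by blk_ioctl_zeroout() can be exercised outside the kernel. The sketch below mirrors them in userspace; `check_zeroout_args()`, `EXAMPLE_DISCARD_OK` and the `dev_size` parameter (standing in for i_size_read(bdev->bd_inode)) are hypothetical. The `end < start` test rejects a zero length and a start + len that wraps past 2^64; without it, such a range would compute an `end` that lies inside the device.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define EXAMPLE_DISCARD_OK 1u /* mirrors BLKZEROOUT2_DISCARD_OK */

/*
 * Userspace mirror of blk_ioctl_zeroout()'s checks. On success, stores
 * the 512-byte sector range in *sector and *nr_sects and returns 0.
 */
static int check_zeroout_args(uint64_t start, uint64_t len, uint32_t flags,
			      uint64_t dev_size, uint64_t *sector,
			      uint64_t *nr_sects)
{
	uint64_t end = start + len - 1; /* last byte, inclusive */

	if (flags & ~EXAMPLE_DISCARD_OK)
		return -EINVAL; /* unknown flag bits */
	if ((start & 511) || (len & 511))
		return -EINVAL; /* not sector aligned */
	if (end < start || end >= dev_size)
		return -EINVAL; /* empty, wrapped, or past the end */
	*sector = start >> 9;
	*nr_sects = len >> 9;
	return 0;
}
```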