* [PATCH v10 0/3] fallocate for block devices @ 2016-09-29 0:39 Darrick J. Wong [not found] ` <147510957066.8940.13803086684642725401.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org> ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: Darrick J. Wong @ 2016-09-29 0:39 UTC (permalink / raw) To: axboe, akpm, darrick.wong Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs, hch, dm-devel, linux-fsdevel, bart.vanassche Hi Andrew & others, This is a resend of the patchset to fix page cache coherency with BLKZEROOUT and implement fallocate for block devices. This time I'm sending it direct to Andrew for inclusion because the block layer maintainer has not been responsive over the past year of submissions. Can this please go upstream for 4.9? The first patch is still a fix to the existing BLKZEROOUT ioctl to invalidate the page cache if the zeroing command to the underlying device succeeds. Without this patch we still have the pagecache coherence bug that's been in the kernel forever. The second patch changes the internal block device functions to reject attempts to discard or zeroout that are not aligned to the logical block size. Previously, we only checked that the start/len parameters were 512-byte aligned, which caused kernel BUG_ONs for unaligned IOs to 4k-LBA devices. The third patch creates an fallocate handler for block devices, wires up the FALLOC_FL_PUNCH_HOLE flag to zeroing-discard, and connects FALLOC_FL_ZERO_RANGE to write-same so that we can have a consistent fallocate interface between files and block devices. It also allows the combination of PUNCH_HOLE and NO_HIDE_STALE to invoke non-zeroing discard. Test cases for the new block device fallocate are now in xfstests as generic/349-351. Comments and questions are, as always, welcome. Patches are against 4.8-rc8. v7: Strengthen parameter checking and fix various code issues pointed out by Linus and Christoph. v8: More code rearranging, rebase to 4.6-rc3, and dig into alignment issues. v9: Forward port to 4.7. v10: Forward port to 4.8. Remove the extra call to invalidate_inode_pages2_range per Bart Van Assche's request. --D ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <147510957066.8940.13803086684642725401.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>]
* [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT. [not found] ` <147510957066.8940.13803086684642725401.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org> @ 2016-09-29 0:39 ` Darrick J. Wong 2016-09-29 1:16 ` Bart Van Assche 2016-09-29 5:56 ` Hannes Reinecke 0 siblings, 2 replies; 13+ messages in thread From: Darrick J. Wong @ 2016-09-29 0:39 UTC (permalink / raw) To: axboe-tSWWG44O7X1aa/9Udqfwiw, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, darrick.wong-QHcLZuEGTsvQT0dZR+AlfA Cc: linux-block-u79uwXL29TY76Z2rM5mHXA, tytso-3s7WtUTddSA, martin.petersen-QHcLZuEGTsvQT0dZR+AlfA, snitzer-H+wXaHxf7aLQT0dZR+AlfA, linux-api-u79uwXL29TY76Z2rM5mHXA, bfoster-H+wXaHxf7aLQT0dZR+AlfA, xfs-VZNHf3L845pBDgjK7y7TUQ, hch-wEGCiKHe2LqWVfeAwA7xHQ, dm-devel-H+wXaHxf7aLQT0dZR+AlfA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ, Christoph Hellwig Invalidate the page cache (as a regular O_DIRECT write would do) to avoid returning stale cache contents at a later time. Signed-off-by: Darrick J. Wong <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> Reviewed-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org> Reviewed-by: Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> --- block/ioctl.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/block/ioctl.c b/block/ioctl.c index ed2397f..755119c 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -225,7 +225,8 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode, unsigned long arg) { uint64_t range[2]; - uint64_t start, len; + struct address_space *mapping; + uint64_t start, end, len; if (!(mode & FMODE_WRITE)) return -EBADF; @@ -235,18 +236,23 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode, start = range[0]; len = range[1]; + end = start + len - 1; if (start & 511) return -EINVAL; if (len & 511) return -EINVAL; - start >>= 9; - len >>= 9; - - if (start + len > (i_size_read(bdev->bd_inode) >> 9)) + if (end >= (uint64_t)i_size_read(bdev->bd_inode)) + return -EINVAL; + if (end < start) return -EINVAL; - return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false); + /* Invalidate the page cache, including dirty pages */ + mapping = bdev->bd_inode->i_mapping; + truncate_inode_pages_range(mapping, start, end); + + return blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL, + false); } static int put_ushort(unsigned long arg, unsigned short val) ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT. 2016-09-29 0:39 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong @ 2016-09-29 1:16 ` Bart Van Assche 2016-09-29 5:56 ` Hannes Reinecke 1 sibling, 0 replies; 13+ messages in thread From: Bart Van Assche @ 2016-09-29 1:16 UTC (permalink / raw) To: Darrick J. Wong, axboe, akpm Cc: hch, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs, linux-block, dm-devel, linux-fsdevel, Christoph Hellwig On 09/28/16 17:39, Darrick J. Wong wrote: > Invalidate the page cache (as a regular O_DIRECT write would do) to avoid > returning stale cache contents at a later time. Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT. 2016-09-29 0:39 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong 2016-09-29 1:16 ` Bart Van Assche @ 2016-09-29 5:56 ` Hannes Reinecke 1 sibling, 0 replies; 13+ messages in thread From: Hannes Reinecke @ 2016-09-29 5:56 UTC (permalink / raw) To: Darrick J. Wong, axboe, akpm Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs, hch, dm-devel, linux-fsdevel, bart.vanassche, Christoph Hellwig On 09/29/2016 02:39 AM, Darrick J. Wong wrote: > Invalidate the page cache (as a regular O_DIRECT write would do) to avoid > returning stale cache contents at a later time. > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > Reviewed-by: Christoph Hellwig <hch@lst.de> > Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> > --- > block/ioctl.c | 18 ++++++++++++------ > 1 file changed, 12 insertions(+), 6 deletions(-) > Reviewed-by: Hannes Reinecke <hare@suse.com> Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/3] block: require write_same and discard requests align to logical block size 2016-09-29 0:39 [PATCH v10 0/3] fallocate for block devices Darrick J. Wong [not found] ` <147510957066.8940.13803086684642725401.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org> @ 2016-09-29 0:39 ` Darrick J. Wong [not found] ` <147510958448.8940.14280630935441533825.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org> 2016-09-29 0:39 ` [PATCH 3/3] block: implement (some of) fallocate for block devices Darrick J. Wong 2 siblings, 1 reply; 13+ messages in thread From: Darrick J. Wong @ 2016-09-29 0:39 UTC (permalink / raw) To: axboe, akpm, darrick.wong Cc: hch, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs, linux-block, dm-devel, linux-fsdevel, bart.vanassche, Christoph Hellwig Make sure that the offset and length arguments that we're using to construct WRITE SAME and DISCARD requests are actually aligned to the logical block size. Failure to do this causes other errors in other parts of the block layer or the SCSI layer because disks don't support partial logical block writes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> --- block/blk-lib.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/block/blk-lib.c b/block/blk-lib.c index 083e56f..46fe924 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -31,6 +31,7 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector, unsigned int granularity; enum req_op op; int alignment; + sector_t bs_mask; if (!q) return -ENXIO; @@ -50,6 +51,10 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector, op = REQ_OP_DISCARD; } + bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1; + if ((sector | nr_sects) & bs_mask) + return -EINVAL; + /* Zero-sector (unknown) and one-sector granularities are the same. */ granularity = max(q->limits.discard_granularity >> 9, 1U); alignment = (bdev_discard_alignment(bdev) >> 9) % granularity; @@ -150,10 +155,15 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, unsigned int max_write_same_sectors; struct bio *bio = NULL; int ret = 0; + sector_t bs_mask; if (!q) return -ENXIO; + bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1; + if ((sector | nr_sects) & bs_mask) + return -EINVAL; + /* Ensure that max_write_same_sectors doesn't overflow bi_size */ max_write_same_sectors = UINT_MAX >> 9; @@ -202,6 +212,11 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, int ret; struct bio *bio = NULL; unsigned int sz; + sector_t bs_mask; + + bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1; + if ((sector | nr_sects) & bs_mask) + return -EINVAL; while (nr_sects != 0) { bio = next_bio(bio, min(nr_sects, (sector_t)BIO_MAX_PAGES), _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 13+ messages in thread
[parent not found: <147510958448.8940.14280630935441533825.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>]
* Re: [PATCH 2/3] block: require write_same and discard requests align to logical block size [not found] ` <147510958448.8940.14280630935441533825.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org> @ 2016-09-29 5:56 ` Hannes Reinecke 0 siblings, 0 replies; 13+ messages in thread From: Hannes Reinecke @ 2016-09-29 5:56 UTC (permalink / raw) To: Darrick J. Wong, axboe-tSWWG44O7X1aa/9Udqfwiw, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b Cc: linux-block-u79uwXL29TY76Z2rM5mHXA, tytso-3s7WtUTddSA, martin.petersen-QHcLZuEGTsvQT0dZR+AlfA, snitzer-H+wXaHxf7aLQT0dZR+AlfA, linux-api-u79uwXL29TY76Z2rM5mHXA, bfoster-H+wXaHxf7aLQT0dZR+AlfA, xfs-VZNHf3L845pBDgjK7y7TUQ, hch-wEGCiKHe2LqWVfeAwA7xHQ, dm-devel-H+wXaHxf7aLQT0dZR+AlfA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ, Christoph Hellwig On 09/29/2016 02:39 AM, Darrick J. Wong wrote: > Make sure that the offset and length arguments that we're using to > construct WRITE SAME and DISCARD requests are actually aligned to the > logical block size. Failure to do this causes other errors in other > parts of the block layer or the SCSI layer because disks don't support > partial logical block writes. > > Signed-off-by: Darrick J. Wong <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> > Reviewed-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org> > Reviewed-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org> > Reviewed-by: Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> > --- > block/blk-lib.c | 15 +++++++++++++++ > 1 file changed, 15 insertions(+) > Reviewed-by: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org> Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare-l3A5Bk7waGM@public.gmane.org +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 3/3] block: implement (some of) fallocate for block devices 2016-09-29 0:39 [PATCH v10 0/3] fallocate for block devices Darrick J. Wong [not found] ` <147510957066.8940.13803086684642725401.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org> 2016-09-29 0:39 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong @ 2016-09-29 0:39 ` Darrick J. Wong 2016-09-29 1:42 ` Bart Van Assche ` (2 more replies) 2 siblings, 3 replies; 13+ messages in thread From: Darrick J. Wong @ 2016-09-29 0:39 UTC (permalink / raw) To: axboe, akpm, darrick.wong Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs, hch, dm-devel, linux-fsdevel, bart.vanassche After much discussion, it seems that the fallocate feature flag FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been whitelisted for zeroing SCSI UNMAP. Punch still requires that FALLOC_FL_KEEP_SIZE is set. A length that goes past the end of the device will be clamped to the device size if KEEP_SIZE is set; or will return -EINVAL if not. Both start and length must be aligned to the device's logical block size. Since the semantics of fallocate are fairly well established already, wire up the two pieces. The other fallocate variants (collapse range, insert range, and allocate blocks) are not supported. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> --- v2: Incorporate feedback from Christoph & Linus. Tentatively add a requirement that the fallocate arguments be aligned to logical block size, and put in a few XXX comments ahead of LSF discussion. v3: Forward port to 4.7. v4: Forward port to 4.8. --- fs/block_dev.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/open.c | 3 +- 2 files changed, 80 insertions(+), 1 deletion(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index 08ae993..0c808fc 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -30,6 +30,7 @@ #include <linux/cleancache.h> #include <linux/dax.h> #include <linux/badblocks.h> +#include <linux/falloc.h> #include <asm/uaccess.h> #include "internal.h" @@ -1787,6 +1788,82 @@ static const struct address_space_operations def_blk_aops = { .is_dirty_writeback = buffer_check_dirty_writeback, }; +#define BLKDEV_FALLOC_FL_SUPPORTED \ + (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | \ + FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE) + +static long blkdev_fallocate(struct file *file, int mode, loff_t start, + loff_t len) +{ + struct block_device *bdev = I_BDEV(bdev_file_inode(file)); + struct request_queue *q = bdev_get_queue(bdev); + struct address_space *mapping; + loff_t end = start + len - 1; + loff_t isize; + int error; + + /* Fail if we don't recognize the flags. */ + if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED) + return -EOPNOTSUPP; + + /* Don't go off the end of the device. */ + isize = i_size_read(bdev->bd_inode); + if (start >= isize) + return -EINVAL; + if (end > isize) { + if (mode & FALLOC_FL_KEEP_SIZE) { + len = isize - start; + end = start + len - 1; + } else + return -EINVAL; + } + + /* + * Don't allow IO that isn't aligned to logical block size. + */ + if ((start | len) & (bdev_logical_block_size(bdev) - 1)) + return -EINVAL; + + /* Invalidate the page cache, including dirty pages. */ + mapping = bdev->bd_inode->i_mapping; + truncate_inode_pages_range(mapping, start, end); + + switch (mode) { + case FALLOC_FL_ZERO_RANGE: + case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE: + error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, + GFP_KERNEL, false); + if (error) + return error; + break; + case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE: + /* Only punch if the device can do zeroing discard. */ + if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data) + return -EOPNOTSUPP; + error = blkdev_issue_discard(bdev, start >> 9, len >> 9, + GFP_KERNEL, 0); + if (error) + return error; + break; + case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE: + error = blkdev_issue_discard(bdev, start >> 9, len >> 9, + GFP_KERNEL, 0); + if (error) + return error; + break; + default: + return -EOPNOTSUPP; + } + + /* + * Invalidate again; if someone wandered in and dirtied a page, + * the caller will be given -EBUSY; + */ + return invalidate_inode_pages2_range(mapping, + start >> PAGE_SHIFT, + end >> PAGE_SHIFT); +} + const struct file_operations def_blk_fops = { .open = blkdev_open, .release = blkdev_close, @@ -1801,6 +1878,7 @@ const struct file_operations def_blk_fops = { #endif .splice_read = generic_file_splice_read, .splice_write = iter_file_splice_write, + .fallocate = blkdev_fallocate, }; int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg) diff --git a/fs/open.c b/fs/open.c index 4fd6e25..01b6092 100644 --- a/fs/open.c +++ b/fs/open.c @@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len) * Let individual file system decide if it supports preallocation * for directories or not. */ - if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode)) + if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) && + !S_ISBLK(inode->i_mode)) return -ENODEV; /* Check for wrap through zero too */ ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices 2016-09-29 0:39 ` [PATCH 3/3] block: implement (some of) fallocate for block devices Darrick J. Wong @ 2016-09-29 1:42 ` Bart Van Assche 2016-09-29 2:09 ` Darrick J. Wong 2016-09-29 2:19 ` [PATCH v2 " Darrick J. Wong 2016-09-29 5:57 ` [PATCH " Hannes Reinecke 2 siblings, 1 reply; 13+ messages in thread From: Bart Van Assche @ 2016-09-29 1:42 UTC (permalink / raw) To: Darrick J. Wong, axboe, akpm Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs, hch, dm-devel, linux-fsdevel On 09/28/16 17:39, Darrick J. Wong wrote: > + if (end > isize) { > + if (mode & FALLOC_FL_KEEP_SIZE) { > + len = isize - start; > + end = start + len - 1; > + } else > + return -EINVAL; > + } If FALLOC_FL_KEEP_SIZE has been set and end == isize the above code won't reduce end to isize - 1. Shouldn't "end > isize" be changed into "end >= isize" ? > + switch (mode) { > + case FALLOC_FL_ZERO_RANGE: > + case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE: > + error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, > + GFP_KERNEL, false); > + if (error) > + return error; > + break; > + case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE: > + /* Only punch if the device can do zeroing discard. */ > + if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data) > + return -EOPNOTSUPP; > + error = blkdev_issue_discard(bdev, start >> 9, len >> 9, > + GFP_KERNEL, 0); > + if (error) > + return error; > + break; > + case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE: > + error = blkdev_issue_discard(bdev, start >> 9, len >> 9, > + GFP_KERNEL, 0); > + if (error) > + return error; > + break; > + default: > + return -EOPNOTSUPP; > + } Have you considered to move "if (error) return error" out of the switch statement? > + /* > + * Invalidate again; if someone wandered in and dirtied a page, > + * the caller will be given -EBUSY; > + */ > + return invalidate_inode_pages2_range(mapping, > + start >> PAGE_SHIFT, > + end >> PAGE_SHIFT); A comment might be appropriate here that since end is inclusive and since the third argument of invalidate_inode_pages2_range() is inclusive that rounding down will yield the correct result. Bart. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices 2016-09-29 1:42 ` Bart Van Assche @ 2016-09-29 2:09 ` Darrick J. Wong 0 siblings, 0 replies; 13+ messages in thread From: Darrick J. Wong @ 2016-09-29 2:09 UTC (permalink / raw) To: Bart Van Assche Cc: axboe, hch, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs, linux-block, dm-devel, linux-fsdevel, akpm On Wed, Sep 28, 2016 at 06:42:14PM -0700, Bart Van Assche wrote: > On 09/28/16 17:39, Darrick J. Wong wrote: > >+ if (end > isize) { > >+ if (mode & FALLOC_FL_KEEP_SIZE) { > >+ len = isize - start; > >+ end = start + len - 1; > >+ } else > >+ return -EINVAL; > >+ } > > If FALLOC_FL_KEEP_SIZE has been set and end == isize the above code won't > reduce end to isize - 1. Shouldn't "end > isize" be changed into "end >= > isize" ? Oops. Will fix and send out a v2. > >+ switch (mode) { > >+ case FALLOC_FL_ZERO_RANGE: > >+ case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE: > >+ error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, > >+ GFP_KERNEL, false); > >+ if (error) > >+ return error; > >+ break; > >+ case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE: > >+ /* Only punch if the device can do zeroing discard. */ > >+ if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data) > >+ return -EOPNOTSUPP; > >+ error = blkdev_issue_discard(bdev, start >> 9, len >> 9, > >+ GFP_KERNEL, 0); > >+ if (error) > >+ return error; > >+ break; > >+ case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE: > >+ error = blkdev_issue_discard(bdev, start >> 9, len >> 9, > >+ GFP_KERNEL, 0); > >+ if (error) > >+ return error; > >+ break; > >+ default: > >+ return -EOPNOTSUPP; > >+ } > > Have you considered to move "if (error) return error" out of the switch > statement? Sure, I could do that. > >+ /* > >+ * Invalidate again; if someone wandered in and dirtied a page, > >+ * the caller will be given -EBUSY; > >+ */ > >+ return invalidate_inode_pages2_range(mapping, > >+ start >> PAGE_SHIFT, > >+ end >> PAGE_SHIFT); > > A comment might be appropriate here that since end is inclusive and since > the third argument of invalidate_inode_pages2_range() is inclusive that > rounding down will yield the correct result. /methot the documentation of invalidate_inode_pages2_range was clear enough on that point, but I could throw that into the comment too. --D > > Bart. > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 3/3] block: implement (some of) fallocate for block devices 2016-09-29 0:39 ` [PATCH 3/3] block: implement (some of) fallocate for block devices Darrick J. Wong 2016-09-29 1:42 ` Bart Van Assche @ 2016-09-29 2:19 ` Darrick J. Wong [not found] ` <20160929021900.GA4901-PTl6brltDGh4DFYR7WNSRA@public.gmane.org> 2016-09-29 5:57 ` [PATCH " Hannes Reinecke 2 siblings, 1 reply; 13+ messages in thread From: Darrick J. Wong @ 2016-09-29 2:19 UTC (permalink / raw) To: axboe, akpm Cc: hch, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs, linux-block, dm-devel, linux-fsdevel, bart.vanassche After much discussion, it seems that the fallocate feature flag FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been whitelisted for zeroing SCSI UNMAP. Punch still requires that FALLOC_FL_KEEP_SIZE is set. A length that goes past the end of the device will be clamped to the device size if KEEP_SIZE is set; or will return -EINVAL if not. Both start and length must be aligned to the device's logical block size. Since the semantics of fallocate are fairly well established already, wire up the two pieces. The other fallocate variants (collapse range, insert range, and allocate blocks) are not supported. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> --- v2: Incorporate feedback from Christoph & Linus. Tentatively add a requirement that the fallocate arguments be aligned to logical block size, and put in a few XXX comments ahead of LSF discussion. v3: Forward port to 4.7. v4: Forward port to 4.8. --- fs/block_dev.c | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/open.c | 3 +- 2 files changed, 77 insertions(+), 1 deletion(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index 08ae993..7b6d096 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -30,6 +30,7 @@ #include <linux/cleancache.h> #include <linux/dax.h> #include <linux/badblocks.h> +#include <linux/falloc.h> #include <asm/uaccess.h> #include "internal.h" @@ -1787,6 +1788,79 @@ static const struct address_space_operations def_blk_aops = { .is_dirty_writeback = buffer_check_dirty_writeback, }; +#define BLKDEV_FALLOC_FL_SUPPORTED \ + (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | \ + FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE) + +static long blkdev_fallocate(struct file *file, int mode, loff_t start, + loff_t len) +{ + struct block_device *bdev = I_BDEV(bdev_file_inode(file)); + struct request_queue *q = bdev_get_queue(bdev); + struct address_space *mapping; + loff_t end = start + len - 1; + loff_t isize; + int error; + + /* Fail if we don't recognize the flags. */ + if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED) + return -EOPNOTSUPP; + + /* Don't go off the end of the device. */ + isize = i_size_read(bdev->bd_inode); + if (start >= isize) + return -EINVAL; + if (end >= isize) { + if (mode & FALLOC_FL_KEEP_SIZE) { + len = isize - start; + end = start + len - 1; + } else + return -EINVAL; + } + + /* + * Don't allow IO that isn't aligned to logical block size. + */ + if ((start | len) & (bdev_logical_block_size(bdev) - 1)) + return -EINVAL; + + /* Invalidate the page cache, including dirty pages. */ + mapping = bdev->bd_inode->i_mapping; + truncate_inode_pages_range(mapping, start, end); + + switch (mode) { + case FALLOC_FL_ZERO_RANGE: + case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE: + error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, + GFP_KERNEL, false); + break; + case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE: + /* Only punch if the device can do zeroing discard. */ + if (!blk_queue_discard(q) || !q->limits.discard_zeroes_data) + return -EOPNOTSUPP; + error = blkdev_issue_discard(bdev, start >> 9, len >> 9, + GFP_KERNEL, 0); + break; + case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE: + error = blkdev_issue_discard(bdev, start >> 9, len >> 9, + GFP_KERNEL, 0); + break; + default: + return -EOPNOTSUPP; + } + if (error) + return error; + + /* + * Invalidate again; if someone wandered in and dirtied a page, + * the caller will be given -EBUSY. The third argument is + * inclusive, so the rounding here is safe. + */ + return invalidate_inode_pages2_range(mapping, + start >> PAGE_SHIFT, + end >> PAGE_SHIFT); +} + const struct file_operations def_blk_fops = { .open = blkdev_open, .release = blkdev_close, @@ -1801,6 +1875,7 @@ const struct file_operations def_blk_fops = { #endif .splice_read = generic_file_splice_read, .splice_write = iter_file_splice_write, + .fallocate = blkdev_fallocate, }; int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg) diff --git a/fs/open.c b/fs/open.c index 4fd6e25..01b6092 100644 --- a/fs/open.c +++ b/fs/open.c @@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len) * Let individual file system decide if it supports preallocation * for directories or not. */ - if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode)) + if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) && + !S_ISBLK(inode->i_mode)) return -ENODEV; /* Check for wrap through zero too */ ^ permalink raw reply related [flat|nested] 13+ messages in thread
[parent not found: <20160929021900.GA4901-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>]
* Re: [PATCH v2 3/3] block: implement (some of) fallocate for block devices [not found] ` <20160929021900.GA4901-PTl6brltDGh4DFYR7WNSRA@public.gmane.org> @ 2016-09-29 20:08 ` Bart Van Assche 2016-09-29 20:35 ` Darrick J. Wong 0 siblings, 1 reply; 13+ messages in thread From: Bart Van Assche @ 2016-09-29 20:08 UTC (permalink / raw) To: Darrick J. Wong, axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org Cc: hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org, tytso-3s7WtUTddSA@public.gmane.org, martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org, snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, bfoster-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, xfs-VZNHf3L845pBDgjK7y7TUQ@public.gmane.org, linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On 09/28/2016 07:19 PM, Darrick J. Wong wrote: > After much discussion, it seems that the fallocate feature flag > FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature > FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been > whitelisted for zeroing SCSI UNMAP. Punch still requires that > FALLOC_FL_KEEP_SIZE is set. A length that goes past the end of the > device will be clamped to the device size if KEEP_SIZE is set; or will > return -EINVAL if not. Both start and length must be aligned to the > device's logical block size. > > Since the semantics of fallocate are fairly well established already, > wire up the two pieces. The other fallocate variants (collapse range, > insert range, and allocate blocks) are not supported. For the FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE case, it's probably safer not to try to send a discard to block devices that do not support discard in order not to hit block driver bugs. But that's something we can still discuss later. Hence: Reviewed-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 3/3] block: implement (some of) fallocate for block devices 2016-09-29 20:08 ` Bart Van Assche @ 2016-09-29 20:35 ` Darrick J. Wong 0 siblings, 0 replies; 13+ messages in thread From: Darrick J. Wong @ 2016-09-29 20:35 UTC (permalink / raw) To: Bart Van Assche Cc: axboe@kernel.dk, linux-block@vger.kernel.org, tytso@mit.edu, martin.petersen@oracle.com, snitzer@redhat.com, linux-api@vger.kernel.org, bfoster@redhat.com, xfs@oss.sgi.com, hch@infradead.org, dm-devel@redhat.com, linux-fsdevel@vger.kernel.org, akpm@linux-foundation.org On Thu, Sep 29, 2016 at 01:08:57PM -0700, Bart Van Assche wrote: > On 09/28/2016 07:19 PM, Darrick J. Wong wrote: > >After much discussion, it seems that the fallocate feature flag > >FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature > >FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been > >whitelisted for zeroing SCSI UNMAP. Punch still requires that > >FALLOC_FL_KEEP_SIZE is set. A length that goes past the end of the > >device will be clamped to the device size if KEEP_SIZE is set; or will > >return -EINVAL if not. Both start and length must be aligned to the > >device's logical block size. > > > >Since the semantics of fallocate are fairly well established already, > >wire up the two pieces. The other fallocate variants (collapse range, > >insert range, and allocate blocks) are not supported. > > For the FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE > case, it's probably safer not to try to send a discard to block devices that > do not support discard in order not to hit block driver bugs. But that's > something we can still discuss later. Hence: I'll just change it to check the queue flags and post a new revision. At this point I might as well repost the whole thing to reflect the reviewed-bys. --D > Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices 2016-09-29 0:39 ` [PATCH 3/3] block: implement (some of) fallocate for block devices Darrick J. Wong 2016-09-29 1:42 ` Bart Van Assche 2016-09-29 2:19 ` [PATCH v2 " Darrick J. Wong @ 2016-09-29 5:57 ` Hannes Reinecke 2 siblings, 0 replies; 13+ messages in thread From: Hannes Reinecke @ 2016-09-29 5:57 UTC (permalink / raw) To: Darrick J. Wong, axboe, akpm Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs, hch, dm-devel, linux-fsdevel, bart.vanassche On 09/29/2016 02:39 AM, Darrick J. Wong wrote: > After much discussion, it seems that the fallocate feature flag > FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature > FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been > whitelisted for zeroing SCSI UNMAP. Punch still requires that > FALLOC_FL_KEEP_SIZE is set. A length that goes past the end of the > device will be clamped to the device size if KEEP_SIZE is set; or will > return -EINVAL if not. Both start and length must be aligned to the > device's logical block size. > > Since the semantics of fallocate are fairly well established already, > wire up the two pieces. The other fallocate variants (collapse range, > insert range, and allocate blocks) are not supported. > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > --- > v2: Incorporate feedback from Christoph & Linus. Tentatively add > a requirement that the fallocate arguments be aligned to logical block > size, and put in a few XXX comments ahead of LSF discussion. > v3: Forward port to 4.7. > v4: Forward port to 4.8. > --- > fs/block_dev.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > fs/open.c | 3 +- > 2 files changed, 80 insertions(+), 1 deletion(-) > Reviewed-by: Hannes Reinecke <hare@suse.com> Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2016-09-29 20:35 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-09-29 0:39 [PATCH v10 0/3] fallocate for block devices Darrick J. Wong [not found] ` <147510957066.8940.13803086684642725401.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org> 2016-09-29 0:39 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong 2016-09-29 1:16 ` Bart Van Assche 2016-09-29 5:56 ` Hannes Reinecke 2016-09-29 0:39 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong [not found] ` <147510958448.8940.14280630935441533825.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org> 2016-09-29 5:56 ` Hannes Reinecke 2016-09-29 0:39 ` [PATCH 3/3] block: implement (some of) fallocate for block devices Darrick J. Wong 2016-09-29 1:42 ` Bart Van Assche 2016-09-29 2:09 ` Darrick J. Wong 2016-09-29 2:19 ` [PATCH v2 " Darrick J. Wong [not found] ` <20160929021900.GA4901-PTl6brltDGh4DFYR7WNSRA@public.gmane.org> 2016-09-29 20:08 ` Bart Van Assche 2016-09-29 20:35 ` Darrick J. Wong 2016-09-29 5:57 ` [PATCH " Hannes Reinecke
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).