* [PATCH v7 0/3] fallocate for block devices to provide zero-out
@ 2016-03-15 19:42 Darrick J. Wong
2016-03-15 19:42 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
0 siblings, 2 replies; 24+ messages in thread
From: Darrick J. Wong @ 2016-03-15 19:42 UTC (permalink / raw)
To: axboe, torvalds, darrick.wong
Cc: bfields, tytso, martin.petersen, linux-api, david, linux-kernel,
shane.seymour, hch, linux-fsdevel, jlayton, akpm
Hi,
This is a redesign of the patch series that fixes various interface
problems with the existing "zero out this part of a block device"
code. BLKZEROOUT2 is gone.
The first patch is still a fix to the existing BLKZEROOUT ioctl to
invalidate the page cache if the zeroing command to the underlying
device succeeds.
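For reference, here is a minimal userspace sketch of how a caller reaches the BLKZEROOUT path that patch 1 touches. It assumes a Linux system whose <linux/fs.h> provides BLKZEROOUT; the fallback value below is copied from that header and is an assumption on other setups. `zero_out` is a hypothetical helper name. The ioctl takes a two-element array holding a byte offset and a byte length. The patch keeps the existing rules that both must be multiples of 512 and that the descriptor must be open for writing.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#ifdef __linux__
#include <linux/fs.h>   /* BLKZEROOUT */
#endif

#ifndef BLKZEROOUT
#define BLKZEROOUT _IO(0x12, 127)   /* assumption: value from <linux/fs.h> */
#endif

/*
 * Hypothetical helper: ask the kernel to zero [start, start + len) on the
 * block device open on fd. range[0] is the byte offset, range[1] the byte
 * length. Returns 0 on success, or -1 with errno set.
 */
static int zero_out(int fd, uint64_t start, uint64_t len)
{
	uint64_t range[2] = { start, len };

	return ioctl(fd, BLKZEROOUT, range);
}
```

With patch 1 applied, a successful call also drops the cached pages for the range, so a later buffered read should see zeroes rather than stale data.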
The second patch changes the internal block device functions to reject
attempts to discard or zeroout that are not aligned to the logical
block size. Previously, we only checked that the start/len parameters
were 512-byte aligned, which caused kernel BUG_ONs for unaligned IOs
to 4k-LBA devices.
The third patch creates an fallocate handler for block devices, wires
up the FALLOC_FL_PUNCH_HOLE flag to zeroing-discard, and connects
FALLOC_FL_ZERO_RANGE to write-same so that we can have a consistent
fallocate interface between files and block devices.
Test cases for the new block device fallocate have been submitted to
the xfstests list as generic/70[5-7], though the numbering will change
to a lower number when the API and the tests are accepted upstream.
Look for the v2 testcase patch, which reflects v7 of this patchset.
Comments and questions are, as always, welcome. Patches are against
4.5.
v7: Strengthen parameter checking and fix various code issues pointed
out by Linus and Christoph.
--D
* [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
@ 2016-03-15 19:42 ` Darrick J. Wong
2016-03-15 19:42 ` [PATCH 3/3] block: implement (some of) fallocate for block devices Darrick J. Wong
1 sibling, 0 replies; 24+ messages in thread
From: Darrick J. Wong @ 2016-03-15 19:42 UTC (permalink / raw)
To: axboe, torvalds, darrick.wong
Cc: bfields, tytso, akpm, martin.petersen, linux-api, david,
linux-kernel, shane.seymour, hch, linux-fsdevel, jlayton,
Christoph Hellwig
Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
returning stale cache contents at a later time.
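The range handling in the diff mixes three units: bytes for truncate_inode_pages_range(), 512-byte sectors for blkdev_issue_zeroout(), and page-cache indices for invalidate_inode_pages2_range(). Below is a small userspace sketch of that arithmetic. It assumes 4 KiB pages (a PAGE_CACHE_SHIFT of 12), and `plan_zeroout` and `struct zeroout_plan` are illustrative names, not kernel API. Because `end` is inclusive, the `end >= dev_size` and `end < start` tests reject both a zero length and a start + len that wraps past 2^64, with no separate check needed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. PAGE_CACHE_SHIFT == 12 as on x86. */
#define SKETCH_PAGE_SHIFT 12

struct zeroout_plan {
	uint64_t first_sector, nr_sectors; /* arguments for the zeroout call */
	uint64_t first_page, last_page;    /* page-cache indices to invalidate */
};

/* Mirrors the checks in blk_ioctl_zeroout(); false means -EINVAL. */
static bool plan_zeroout(uint64_t start, uint64_t len, uint64_t dev_size,
			 struct zeroout_plan *p)
{
	uint64_t end = start + len - 1; /* inclusive last byte */

	if ((start & 511) || (len & 511))
		return false;
	if (end >= dev_size)    /* past the device end, or len == 0 at start 0 */
		return false;
	if (end < start)        /* len == 0 elsewhere, or start + len wrapped */
		return false;

	p->first_sector = start >> 9;
	p->nr_sectors = len >> 9;
	p->first_page = start >> SKETCH_PAGE_SHIFT;
	p->last_page = end >> SKETCH_PAGE_SHIFT;
	return true;
}
```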
v5: Refactor the 4.4 refactoring of the ioctl code into separate functions.
Split the page invalidation and the new ioctl into separate patches.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
block/ioctl.c | 29 +++++++++++++++++++++++------
1 file changed, 23 insertions(+), 6 deletions(-)
diff --git a/block/ioctl.c b/block/ioctl.c
index d8996bb..c6eb462 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -226,7 +226,9 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
unsigned long arg)
{
uint64_t range[2];
- uint64_t start, len;
+ struct address_space *mapping;
+ uint64_t start, end, len;
+ int ret;
if (!(mode & FMODE_WRITE))
return -EBADF;
@@ -236,18 +238,33 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
start = range[0];
len = range[1];
+ end = start + len - 1;
if (start & 511)
return -EINVAL;
if (len & 511)
return -EINVAL;
- start >>= 9;
- len >>= 9;
-
- if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+ if (end >= (uint64_t)i_size_read(bdev->bd_inode))
+ return -EINVAL;
+ if (end < start)
return -EINVAL;
- return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
+ /* Invalidate the page cache, including dirty pages */
+ mapping = bdev->bd_inode->i_mapping;
+ truncate_inode_pages_range(mapping, start, end);
+
+ ret = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+ false);
+ if (ret)
+ return ret;
+
+ /*
+ * Invalidate again; if someone wandered in and dirtied a page,
+ * the caller will be given -EBUSY.
+ */
+ return invalidate_inode_pages2_range(mapping,
+ start >> PAGE_CACHE_SHIFT,
+ end >> PAGE_CACHE_SHIFT);
}
static int put_ushort(unsigned long arg, unsigned short val)
* [PATCH 3/3] block: implement (some of) fallocate for block devices
2016-03-15 19:42 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
@ 2016-03-15 19:42 ` Darrick J. Wong
2016-03-21 18:52 ` Mike Snitzer
1 sibling, 2 replies; 24+ messages in thread
From: Darrick J. Wong @ 2016-03-15 19:42 UTC (permalink / raw)
To: axboe, torvalds, darrick.wong
Cc: bfields, tytso, martin.petersen, linux-api, david, linux-kernel,
shane.seymour, hch, linux-fsdevel, jlayton, akpm
After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
whitelisted for zeroing SCSI UNMAP. Punch still requires that
FALLOC_FL_KEEP_SIZE is set. A length that goes past the end of the
device will be clamped to the device size if KEEP_SIZE is set; or will
return -EINVAL if not. Both start and length must be aligned to the
device's logical block size.
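The range rules in the paragraph above can be written as a small pure function. This is a sketch under stated assumptions, not the patch's code. `check_falloc_range` is a hypothetical name, `lbs` stands for the device's logical block size (a power of two), and the caller is assumed to have already rejected negative or overflowing ranges, as vfs_fallocate() does. A return of -1 marks the cases where the patch returns -EINVAL.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns 0 and possibly shortens *len if the range is acceptable,
 * or -1 where the described rules call for -EINVAL.
 * Assumes start >= 0, *len > 0 and no overflow in start + *len.
 */
static int check_falloc_range(int64_t start, int64_t *len, int64_t dev_size,
			      uint32_t lbs, bool keep_size)
{
	int64_t end = start + *len - 1;     /* inclusive last byte */
	int64_t bs_mask = (int64_t)lbs - 1;

	if (start >= dev_size)
		return -1;
	if (end >= dev_size) {
		if (!keep_size)
			return -1;
		*len = dev_size - start;    /* clamp to the end of the device */
	}
	/* Checked after clamping, so the clamped length must be aligned too. */
	if ((start | *len) & bs_mask)
		return -1;
	return 0;
}
```

Because alignment is checked after clamping, a KEEP_SIZE request that runs off the end still succeeds only if the clamped length is a whole number of logical blocks. That holds whenever the device size is itself a multiple of the logical block size.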
Since the semantics of fallocate are fairly well established already,
wire up the two pieces. The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.
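From userspace, the new handler is reached through the ordinary fallocate(2) call on an open block device. Here is a minimal sketch that assumes glibc, which declares fallocate() in <fcntl.h> under _GNU_SOURCE. The helper names are hypothetical, and the fallback flag values mirror <linux/falloc.h>. Because both helpers pass FALLOC_FL_KEEP_SIZE, a range that runs past the end of the device is clamped rather than rejected.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Fallback flag values as in <linux/falloc.h>. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE  0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE 0x02
#endif
#ifndef FALLOC_FL_ZERO_RANGE
#define FALLOC_FL_ZERO_RANGE 0x10
#endif

/* Hypothetical helper: zero a range (mapped to WRITE SAME / zeroout). */
static int bdev_zero_range(int fd, off_t off, off_t len)
{
	return fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
			 off, len);
}

/* Hypothetical helper: punch a hole (needs a zeroing-discard device). */
static int bdev_punch_hole(int fd, off_t off, off_t len)
{
	return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 off, len);
}
```

On a kernel without this series, the same calls on a block device fail with ENODEV, because vfs_fallocate() accepted only regular files and directories.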
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/block_dev.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/open.c | 3 ++
2 files changed, 71 insertions(+), 1 deletion(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 826b164..6137c6e 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -30,6 +30,7 @@
#include <linux/cleancache.h>
#include <linux/dax.h>
#include <asm/uaccess.h>
+#include <linux/falloc.h>
#include "internal.h"
struct bdev_inode {
@@ -1786,6 +1787,73 @@ static int blkdev_mmap(struct file *file, struct vm_area_struct *vma)
#define blkdev_mmap generic_file_mmap
#endif
+#define BLKDEV_FALLOC_FL_SUPPORTED \
+ (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | \
+ FALLOC_FL_ZERO_RANGE)
+
+long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
+{
+ struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+ struct request_queue *q = bdev_get_queue(bdev);
+ struct address_space *mapping;
+ loff_t end = start + len - 1;
+ loff_t bs_mask, isize;
+ int error;
+
+ /* We only support zero range and punch hole. */
+ if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+ return -EOPNOTSUPP;
+
+ /* We haven't a primitive for "ensure space exists" right now. */
+ if (!(mode & ~FALLOC_FL_KEEP_SIZE))
+ return -EOPNOTSUPP;
+
+ /* Only punch if the device can do zeroing discard. */
+ if ((mode & FALLOC_FL_PUNCH_HOLE) &&
+ (!blk_queue_discard(q) || !q->limits.discard_zeroes_data))
+ return -EOPNOTSUPP;
+
+ /* Don't go off the end of the device */
+ isize = i_size_read(bdev->bd_inode);
+ if (start >= isize)
+ return -EINVAL;
+ if (end > isize) {
+ if (mode & FALLOC_FL_KEEP_SIZE) {
+ len = isize - start;
+ end = start + len - 1;
+ } else
+ return -EINVAL;
+ }
+
+ /* Don't allow IO that isn't aligned to logical block size */
+ bs_mask = bdev_logical_block_size(bdev) - 1;
+ if ((start | len) & bs_mask)
+ return -EINVAL;
+
+ /* Invalidate the page cache, including dirty pages. */
+ mapping = bdev->bd_inode->i_mapping;
+ truncate_inode_pages_range(mapping, start, end);
+
+ error = -EINVAL;
+ if (mode & FALLOC_FL_ZERO_RANGE)
+ error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+ GFP_KERNEL, false);
+ else if (mode & FALLOC_FL_PUNCH_HOLE)
+ error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+ GFP_KERNEL, 0);
+ if (error)
+ return error;
+
+ /*
+ * Invalidate again; if someone wandered in and dirtied a page,
+ * the caller will be given -EBUSY;
+ */
+ return invalidate_inode_pages2_range(mapping,
+ start >> PAGE_CACHE_SHIFT,
+ end >> PAGE_CACHE_SHIFT);
+}
+EXPORT_SYMBOL_GPL(blkdev_fallocate);
+
const struct file_operations def_blk_fops = {
.open = blkdev_open,
.release = blkdev_close,
@@ -1800,6 +1868,7 @@ const struct file_operations def_blk_fops = {
#endif
.splice_read = generic_file_splice_read,
.splice_write = iter_file_splice_write,
+ .fallocate = blkdev_fallocate,
};
int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
diff --git a/fs/open.c b/fs/open.c
index 55bdc75..4f99adc 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
* Let individual file system decide if it supports preallocation
* for directories or not.
*/
- if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+ if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
+ !S_ISBLK(inode->i_mode))
return -ENODEV;
/* Check for wrap through zero too */
* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
@ 2016-03-21 15:38 ` Christoph Hellwig
0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2016-03-21 15:38 UTC (permalink / raw)
To: Darrick J. Wong
Cc: axboe, torvalds, bfields, tytso, martin.petersen, linux-api, david,
linux-kernel, shane.seymour, hch, linux-fsdevel, jlayton, akpm
On Tue, Mar 15, 2016 at 12:42:44PM -0700, Darrick J. Wong wrote:
> #include <linux/cleancache.h>
> #include <linux/dax.h>
> #include <asm/uaccess.h>
> +#include <linux/falloc.h>
Maybe keep this before asm/uaccess.h
> +long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
should be marked static.
> + /* We haven't a primitive for "ensure space exists" right now. */
> + if (!(mode & ~FALLOC_FL_KEEP_SIZE))
> + return -EOPNOTSUPP;
I don't really understand the comment. But I think you'd be much
better off with having blkdev_fallocate as just a tiny wrapper that has
a switch for the supported modes, e.g.
switch (mode) {
case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
return blkdev_punch_hole();
case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
return blkdev_zero_range();
default:
return -EOPNOTSUPP;
}
> +EXPORT_SYMBOL_GPL(blkdev_fallocate);
and no need to export it either.
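Christoph's suggestion amounts to whitelisting exact flag combinations. Below is a compilable sketch of that shape. Hypothetical stubs stand in for blkdev_zero_range() and blkdev_punch_hole(), which would do the real work in the kernel, and the flag values mirror <linux/falloc.h>. Each `case` must match the whole mode value, so any extra bit falls through to -EOPNOTSUPP. That single default replaces both the ~BLKDEV_FALLOC_FL_SUPPORTED test and the plain-allocation test in v7.

```c
#include <errno.h>

/* Flag values as in <linux/falloc.h>. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE  0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE 0x02
#endif
#ifndef FALLOC_FL_ZERO_RANGE
#define FALLOC_FL_ZERO_RANGE 0x10
#endif

/* Hypothetical stubs: return distinct markers so the dispatch is visible. */
static long stub_zero_range(void) { return 1; }
static long stub_punch_hole(void) { return 2; }

/* Accept only exact, whitelisted flag combinations. */
static long sketch_blkdev_fallocate(int mode)
{
	switch (mode) {
	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
		return stub_zero_range();
	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
		return stub_punch_hole();
	default:
		/* Plain allocation, collapse/insert range, stray bits. */
		return -EOPNOTSUPP;
	}
}
```

As written, the switch also requires KEEP_SIZE for zero range, whereas v7 accepted FALLOC_FL_ZERO_RANGE on its own.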
* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
2016-03-15 19:42 ` [PATCH 3/3] block: implement (some of) fallocate for block devices Darrick J. Wong
[not found] ` <20160315194244.30093.6483.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
@ 2016-03-21 18:52 ` Mike Snitzer
2016-03-21 19:11 ` Darrick J. Wong
1 sibling, 1 reply; 24+ messages in thread
From: Mike Snitzer @ 2016-03-21 18:52 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Jens Axboe, Christoph Hellwig, Theodore Ts'o,
Martin K. Petersen, linux-api, david,
linux-kernel@vger.kernel.org, shane.seymour, Bruce Fields,
device-mapper development, linux-fsdevel, Jeff Layton,
Linus Torvalds, Andrew Morton
On Tue, Mar 15, 2016 at 3:42 PM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> After much discussion, it seems that the fallocate feature flag
> FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
> FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
> whitelisted for zeroing SCSI UNMAP. Punch still requires that
> FALLOC_FL_KEEP_SIZE is set. A length that goes past the end of the
> device will be clamped to the device size if KEEP_SIZE is set; or will
> return -EINVAL if not. Both start and length must be aligned to the
> device's logical block size.
>
> Since the semantics of fallocate are fairly well established already,
> wire up the two pieces. The other fallocate variants (collapse range,
> insert range, and allocate blocks) are not supported.
I'd like to see fallocate (block allocation) extend down to DM thinp.
This more traditional use of fallocate would be useful for ensuring
ENOSPC won't occur -- especially important if the FS has committed
space in response to fallocate. As of now fallocate doesn't inform DM
thinp at all. Curious why you decided not to wire it up?
But I'm not sure what "it" (the "allocate blocks" variant) even is
given falloc.h doesn't show anything like "_ALLOCATE_BLOCKS"...
It would require a new block interface to pass the fallocate extent
down. But it seems bizarre to implement "some of" fallocate but not
the most widely used case for fallocate.
Mike
* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
2016-03-21 18:52 ` Mike Snitzer
@ 2016-03-21 19:11 ` Darrick J. Wong
2016-03-21 19:22 ` Mike Snitzer
0 siblings, 1 reply; 24+ messages in thread
From: Darrick J. Wong @ 2016-03-21 19:11 UTC (permalink / raw)
To: Mike Snitzer
Cc: Jens Axboe, Christoph Hellwig, Theodore Ts'o,
Martin K. Petersen, linux-api, david,
linux-kernel@vger.kernel.org, shane.seymour, Bruce Fields,
device-mapper development, linux-fsdevel, Jeff Layton,
Linus Torvalds, Andrew Morton
On Mon, Mar 21, 2016 at 02:52:00PM -0400, Mike Snitzer wrote:
> On Tue, Mar 15, 2016 at 3:42 PM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> > After much discussion, it seems that the fallocate feature flag
> > FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
> > FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
> > whitelisted for zeroing SCSI UNMAP. Punch still requires that
> > FALLOC_FL_KEEP_SIZE is set. A length that goes past the end of the
> > device will be clamped to the device size if KEEP_SIZE is set; or will
> > return -EINVAL if not. Both start and length must be aligned to the
> > device's logical block size.
> >
> > Since the semantics of fallocate are fairly well established already,
> > wire up the two pieces. The other fallocate variants (collapse range,
> > insert range, and allocate blocks) are not supported.
>
> I'd like to see fallocate (block allocation) extend down to DM thinp.
> This more traditional use of fallocate would be useful for ensuring
> ENOSPC won't occur -- especially important if the FS has committed
> space in response to fallocate. As of now fallocate doesn't inform DM
> thinp at all. Curious why you decided not to wire it up?
I don't know what to wire it up to. :)
I didn't find any blkdev_* function that looked encouraging, though I
haven't dug too deeply into bfoster's "prototype a block reservation
allocation model" patchset yet. At a high level I'd guess that would
be a reasonable piece to connect to? It looks like the piece I want
is blk_provision_space().
> But I'm not sure what "it" (the "allocate blocks" variant) even is
> given falloc.h doesn't show anything like "_ALLOCATE_BLOCKS"...
The default behavior of fallocate is to allocate blocks, which means
that one invokes it by not passing any mode flags (except possibly
KEEP_SIZE).
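The default, flag-free mode described here is exactly what the `!(mode & ~FALLOC_FL_KEEP_SIZE)` test in v7's blkdev_fallocate() detects. Masking off KEEP_SIZE leaves zero only for plain allocation requests. Here is a standalone sketch; `falloc_is_allocate` is a hypothetical name, and the flag values are as in <linux/falloc.h>.

```c
#include <stdbool.h>

/* Flag values as in <linux/falloc.h>. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE  0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE 0x02
#endif
#ifndef FALLOC_FL_ZERO_RANGE
#define FALLOC_FL_ZERO_RANGE 0x10
#endif

/* True when mode asks for plain block allocation: no flag except KEEP_SIZE. */
static bool falloc_is_allocate(int mode)
{
	return !(mode & ~FALLOC_FL_KEEP_SIZE);
}
```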
> It would require a new block interface to pass the fallocate extent
> down. But it seems bizarre to implement "some of" fallocate but not
> the most widely used case for fallocate.
Agreed. I'd like to get the existing functionality wired up sooner than
later, and plumbing "allocate blocks" down to thinp can be done as a
followup.
(Or stall long enough that it becomes one patchset.)
--D
>
> Mike
* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
2016-03-21 19:11 ` Darrick J. Wong
@ 2016-03-21 19:22 ` Mike Snitzer
0 siblings, 1 reply; 24+ messages in thread
From: Mike Snitzer @ 2016-03-21 19:22 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Jens Axboe, Linus Torvalds, Bruce Fields, Theodore Ts'o,
Martin K. Petersen, linux-api, david,
linux-kernel@vger.kernel.org, shane.seymour, Christoph Hellwig,
linux-fsdevel, Jeff Layton, Andrew Morton,
device-mapper development
On Mon, Mar 21 2016 at 3:11pm -0400,
Darrick J. Wong <darrick.wong@oracle.com> wrote:
> On Mon, Mar 21, 2016 at 02:52:00PM -0400, Mike Snitzer wrote:
> > On Tue, Mar 15, 2016 at 3:42 PM, Darrick J. Wong
> > <darrick.wong@oracle.com> wrote:
> > > After much discussion, it seems that the fallocate feature flag
> > > FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
> > > FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
> > > whitelisted for zeroing SCSI UNMAP. Punch still requires that
> > > FALLOC_FL_KEEP_SIZE is set. A length that goes past the end of the
> > > device will be clamped to the device size if KEEP_SIZE is set; or will
> > > return -EINVAL if not. Both start and length must be aligned to the
> > > device's logical block size.
> > >
> > > Since the semantics of fallocate are fairly well established already,
> > > wire up the two pieces. The other fallocate variants (collapse range,
> > > insert range, and allocate blocks) are not supported.
> >
> > I'd like to see fallocate (block allocation) extend down to DM thinp.
> > This more traditional use of fallocate would be useful for ensuring
> > ENOSPC won't occur -- especially important if the FS has committed
> > space in response to fallocate. As of now fallocate doesn't inform DM
> > thinp at all. Curious why you decided not to wire it up?
>
> I don't know what to wire it up to. :)
Fair enough. Yes something needs to be invented.
> I didn't find any blkdev_* function that looked encouraging, though I
> haven't dug too deeply into bfoster's "prototype a block reservation
> allocation model" patchset yet. At a high level I'd guess that would
> be a reasonable piece to connect to? It looks like the piece I want
> is blk_provision_space().
Yes, something like that.
> > But I'm not sure what "it" (the "allocate blocks" variant) even is
> > given falloc.h doesn't show anything like "_ALLOCATE_BLOCKS"...
>
> The default behavior of fallocate is to allocate blocks, which means
> that one invokes it by not passing any mode flags (except possibly
> KEEP_SIZE).
OK.
> > It would require a new block interface to pass the fallocate extent
> > down. But it seems bizarre to implement "some of" fallocate but not
> > the most widely used case for fallocate.
>
> Agreed. I'd like to get the existing functionality wired up sooner than
> later, and plumbing "allocate blocks" down to thinp can be done as a
> followup.
>
> (Or stall long enough that it becomes one patchset.)
Sure, sounds good. Glad we're in agreement.
* [PATCH 2/3] block: require write_same and discard requests align to logical block size
2016-03-15 19:42 [PATCH v7 0/3] fallocate for block devices to provide zero-out Darrick J. Wong
@ 2016-03-15 19:42 ` Darrick J. Wong
1 sibling, 0 replies; 24+ messages in thread
From: Darrick J. Wong @ 2016-03-15 19:42 UTC (permalink / raw)
To: axboe, torvalds, darrick.wong
Cc: bfields, tytso, akpm, martin.petersen, linux-api, david,
linux-kernel, shane.seymour, hch, linux-fsdevel, jlayton,
Christoph Hellwig
Make sure that the offset and length arguments that we're using to
construct WRITE SAME and DISCARD requests are actually aligned to the
logical block size. Failure to do this causes other errors in other
parts of the block layer or the SCSI layer because disks don't support
partial logical block writes.
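blk-lib works in 512-byte sectors, so the patch converts the logical block size to sectors before building the mask. Below is a sketch of the same test. It assumes a power-of-two logical block size of at least 512 bytes, and `lbs_aligned` is a hypothetical name. ORing the two values first lets a single mask test reject the call if either the start or the length has a low bit set. On a 4k-LBA device the mask is 7, so a request starting at sector 1 is rejected even though it is 512-byte aligned, which is exactly the case the old check let through.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * sector and nr_sects are in 512-byte units, as in the block layer.
 * lbs is the logical block size in bytes (power of two, >= 512).
 */
static bool lbs_aligned(uint64_t sector, uint64_t nr_sects, unsigned int lbs)
{
	uint64_t bs_mask = (lbs >> 9) - 1;  /* sectors per logical block - 1 */

	return ((sector | nr_sects) & bs_mask) == 0;
}
```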
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
block/blk-lib.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9ebf653..9dca6bb 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -49,6 +49,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
struct bio *bio;
int ret = 0;
struct blk_plug plug;
+ sector_t bs_mask;
if (!q)
return -ENXIO;
@@ -56,6 +57,10 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
if (!blk_queue_discard(q))
return -EOPNOTSUPP;
+ bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+ if ((sector | nr_sects) & bs_mask)
+ return -EINVAL;
+
/* Zero-sector (unknown) and one-sector granularities are the same. */
granularity = max(q->limits.discard_granularity >> 9, 1U);
alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
@@ -148,6 +153,7 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q = bdev_get_queue(bdev);
unsigned int max_write_same_sectors;
+ sector_t bs_mask;
struct bio_batch bb;
struct bio *bio;
int ret = 0;
@@ -155,6 +161,10 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
if (!q)
return -ENXIO;
+ bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+ if ((sector | nr_sects) & bs_mask)
+ return -EINVAL;
+
/* Ensure that max_write_same_sectors doesn't overflow bi_size */
max_write_same_sectors = UINT_MAX >> 9;
@@ -218,9 +228,14 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
int ret;
struct bio *bio;
struct bio_batch bb;
+ sector_t bs_mask;
unsigned int sz;
DECLARE_COMPLETION_ONSTACK(wait);
+ bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+ if ((sector | nr_sects) & bs_mask)
+ return -EINVAL;
+
atomic_set(&bb.done, 1);
bb.error = 0;
bb.wait = &wait;
* [PATCH v11 0/3] fallocate for block devices
@ 2016-09-29 21:16 Darrick J. Wong
2016-09-29 21:16 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
0 siblings, 1 reply; 24+ messages in thread
From: Darrick J. Wong @ 2016-09-29 21:16 UTC (permalink / raw)
To: axboe, akpm, darrick.wong
Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
xfs, hch, dm-devel, hare, linux-fsdevel, bart.vanassche
Hi Andrew & others,
This is a resend of the patchset to fix page cache coherency with
BLKZEROOUT and implement fallocate for block devices. This time I'm
sending it direct to Andrew for inclusion because the block layer
maintainer has not been responsive over the past year of submissions.
Can this please go upstream for 4.9?
The first patch is still a fix to the existing BLKZEROOUT ioctl to
invalidate the page cache if the zeroing command to the underlying
device succeeds. Without this patch we still have the pagecache
coherence bug that's been in the kernel forever.
The second patch changes the internal block device functions to reject
attempts to discard or zeroout that are not aligned to the logical
block size. Previously, we only checked that the start/len parameters
were 512-byte aligned, which caused kernel BUG_ONs for unaligned IOs
to 4k-LBA devices.
The third patch creates an fallocate handler for block devices, wires
up the FALLOC_FL_PUNCH_HOLE flag to zeroing-discard, and connects
FALLOC_FL_ZERO_RANGE to write-same so that we can have a consistent
fallocate interface between files and block devices. It also allows
the combination of PUNCH_HOLE and NO_HIDE_STALE to invoke non-zeroing
discard.
Test cases for the new block device fallocate are now in xfstests as
generic/349-351.
Comments and questions are, as always, welcome. Patches are against
4.8-rc8.
v7: Strengthen parameter checking and fix various code issues pointed
out by Linus and Christoph.
v8: More code rearranging, rebase to 4.6-rc3, and dig into alignment
issues.
v9: Forward port to 4.7.
v10: Forward port to 4.8. Remove the extra call to
invalidate_inode_pages2_range per Bart Van Assche's request.
v11: Minor fallocate code cleanups suggested by Bart Van Assche.
--D
* [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
2016-09-29 21:16 [PATCH v11 0/3] fallocate for block devices Darrick J. Wong
@ 2016-09-29 21:16 ` Darrick J. Wong
0 siblings, 0 replies; 24+ messages in thread
From: Darrick J. Wong @ 2016-09-29 21:16 UTC (permalink / raw)
To: axboe, akpm, darrick.wong
Cc: linux-block, Hannes Reinecke, tytso, snitzer, martin.petersen,
linux-api, bfoster, xfs, hch, dm-devel, hare, linux-fsdevel,
bart.vanassche, Christoph Hellwig
Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
returning stale cache contents at a later time.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
---
block/ioctl.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..755119c 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -225,7 +225,8 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
unsigned long arg)
{
uint64_t range[2];
- uint64_t start, len;
+ struct address_space *mapping;
+ uint64_t start, end, len;
if (!(mode & FMODE_WRITE))
return -EBADF;
@@ -235,18 +236,23 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
start = range[0];
len = range[1];
+ end = start + len - 1;
if (start & 511)
return -EINVAL;
if (len & 511)
return -EINVAL;
- start >>= 9;
- len >>= 9;
-
- if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+ if (end >= (uint64_t)i_size_read(bdev->bd_inode))
+ return -EINVAL;
+ if (end < start)
return -EINVAL;
- return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
+ /* Invalidate the page cache, including dirty pages */
+ mapping = bdev->bd_inode->i_mapping;
+ truncate_inode_pages_range(mapping, start, end);
+
+ return blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+ false);
}
static int put_ushort(unsigned long arg, unsigned short val)
* [PATCH v10 0/3] fallocate for block devices
@ 2016-09-29 0:39 Darrick J. Wong
0 siblings, 1 reply; 24+ messages in thread
From: Darrick J. Wong @ 2016-09-29 0:39 UTC (permalink / raw)
To: axboe, akpm, darrick.wong
Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
xfs, hch, dm-devel, linux-fsdevel, bart.vanassche
Hi Andrew & others,
This is a resend of the patchset to fix page cache coherency with
BLKZEROOUT and implement fallocate for block devices. This time I'm
sending it direct to Andrew for inclusion because the block layer
maintainer has not been responsive over the past year of submissions.
Can this please go upstream for 4.9?
The first patch is still a fix to the existing BLKZEROOUT ioctl to
invalidate the page cache if the zeroing command to the underlying
device succeeds. Without this patch we still have the pagecache
coherence bug that's been in the kernel forever.
The second patch changes the internal block device functions to reject
attempts to discard or zeroout that are not aligned to the logical
block size. Previously, we only checked that the start/len parameters
were 512-byte aligned, which caused kernel BUG_ONs for unaligned IOs
to 4k-LBA devices.
The third patch creates an fallocate handler for block devices, wires
up the FALLOC_FL_PUNCH_HOLE flag to zeroing-discard, and connects
FALLOC_FL_ZERO_RANGE to write-same so that we can have a consistent
fallocate interface between files and block devices. It also allows
the combination of PUNCH_HOLE and NO_HIDE_STALE to invoke non-zeroing
discard.
Test cases for the new block device fallocate are now in xfstests as
generic/349-351.
Comments and questions are, as always, welcome. Patches are against
4.8-rc8.
v7: Strengthen parameter checking and fix various code issues pointed
out by Linus and Christoph.
v8: More code rearranging, rebase to 4.6-rc3, and dig into alignment
issues.
v9: Forward port to 4.7.
v10: Forward port to 4.8. Remove the extra call to
invalidate_inode_pages2_range per Bart Van Assche's request.
--D
* [PATCH v10 0/3] fallocate for block devices
@ 2016-08-26 0:02 Darrick J. Wong
2016-08-26 0:02 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
0 siblings, 1 reply; 24+ messages in thread
From: Darrick J. Wong @ 2016-08-26 0:02 UTC (permalink / raw)
To: axboe, darrick.wong
Cc: hch, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs,
linux-block, dm-devel, linux-fsdevel, bart.vanassche
Hi,
This is a redesign of the patch series that fixes various interface
problems with the existing "zero out this part of a block device"
code. BLKZEROOUT2 is gone.
The first patch is still a fix to the existing BLKZEROOUT ioctl to
invalidate the page cache if the zeroing command to the underlying
device succeeds. Without this patch we still have the pagecache
coherence bug that's been in the kernel forever.
The second patch changes the internal block device functions to reject
attempts to discard or zeroout that are not aligned to the logical
block size. Previously, we only checked that the start/len parameters
were 512-byte aligned, which caused kernel BUG_ONs for unaligned IOs
to 4k-LBA devices.
The third patch creates an fallocate handler for block devices, wires
up the FALLOC_FL_PUNCH_HOLE flag to zeroing-discard, and connects
FALLOC_FL_ZERO_RANGE to write-same so that we can have a consistent
fallocate interface between files and block devices. It also allows
the combination of PUNCH_HOLE and NO_HIDE_STALE to invoke non-zeroing
discard.
Test cases for the new block device fallocate are now in xfstests as
generic/349-351.
Comments and questions are, as always, welcome. Patches are against
4.8-rc3.
v7: Strengthen parameter checking and fix various code issues pointed
out by Linus and Christoph.
v8: More code rearranging, rebase to 4.6-rc3, and dig into alignment
issues.
v9: Forward port to 4.7.
v10: Forward port to 4.8. Remove the extra call to
invalidate_inode_pages2_range per Bart Van Assche's request.
--D
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
2016-08-26 0:02 [PATCH v10 0/3] fallocate for block devices Darrick J. Wong
@ 2016-08-26 0:02 ` Darrick J. Wong
0 siblings, 0 replies; 24+ messages in thread
From: Darrick J. Wong @ 2016-08-26 0:02 UTC (permalink / raw)
To: axboe, darrick.wong
Cc: hch, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs,
linux-block, dm-devel, linux-fsdevel, bart.vanassche,
Christoph Hellwig
Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
returning stale cache contents at a later time.
v5: Refactor the 4.4 refactoring of the ioctl code into separate functions.
Split the page invalidation and the new ioctl into separate patches.
v6: Remove the call to invalidate_inode_pages2_range since we don't need it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
block/ioctl.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..755119c 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -225,7 +225,8 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 		unsigned long arg)
 {
 	uint64_t range[2];
-	uint64_t start, len;
+	struct address_space *mapping;
+	uint64_t start, end, len;
 
 	if (!(mode & FMODE_WRITE))
 		return -EBADF;
@@ -235,18 +236,23 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 
 	start = range[0];
 	len = range[1];
+	end = start + len - 1;
 
 	if (start & 511)
 		return -EINVAL;
 	if (len & 511)
 		return -EINVAL;
-	start >>= 9;
-	len >>= 9;
-
-	if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+	if (end >= (uint64_t)i_size_read(bdev->bd_inode))
+		return -EINVAL;
+	if (end < start)
 		return -EINVAL;
 
-	return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
+	/* Invalidate the page cache, including dirty pages */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	return blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+				    false);
 }
 
 static int put_ushort(unsigned long arg, unsigned short val)
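The new bounds checks are easier to follow in isolation. This sketch restates them as a standalone predicate. The name zeroout_range_ok is hypothetical, and dev_size stands in for i_size_read(bdev->bd_inode). It shows why the end < start test is needed once the checks operate on byte offsets rather than 512-byte sectors. The inclusive end also matches truncate_inode_pages_range(), whose last argument is the offset of the last byte to drop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same checks as the patched blk_ioctl_zeroout(), as a predicate. */
static bool zeroout_range_ok(uint64_t start, uint64_t len, uint64_t dev_size)
{
	/* Last byte to zero; wraps if len == 0 or start + len overflows. */
	uint64_t end = start + len - 1;

	if ((start & 511) || (len & 511))
		return false;
	if (end >= dev_size)	/* runs past the end of the device */
		return false;
	if (end < start)	/* len == 0, or the 64-bit sum wrapped */
		return false;
	return true;
}
```

For example, start = 2^64 - 512 with len = 1024 wraps to end = 511. That value passes the size check but fails end < start.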
* [PATCH v9 0/3] fallocate for block devices
@ 2016-06-17 1:17 Darrick J. Wong
2016-06-17 1:17 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
0 siblings, 1 reply; 24+ messages in thread
From: Darrick J. Wong @ 2016-06-17 1:17 UTC (permalink / raw)
To: axboe, darrick.wong
Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
xfs, hch, dm-devel, linux-fsdevel
Hi,
This is a redesign of the patch series that fixes various interface
problems with the existing "zero out this part of a block device"
code. BLKZEROOUT2 is gone.
The first patch is still a fix to the existing BLKZEROOUT ioctl to
invalidate the page cache if the zeroing command to the underlying
device succeeds. Without this patch we still have the pagecache
coherence bug that's been in the kernel forever.
The second patch changes the internal block device functions to reject
attempts to discard or zeroout that are not aligned to the logical
block size. Previously, we only checked that the start/len parameters
were 512-byte aligned, which caused kernel BUG_ONs for unaligned IOs
to 4k-LBA devices.
The third patch creates an fallocate handler for block devices, wires
up the FALLOC_FL_PUNCH_HOLE flag to zeroing-discard, and connects
FALLOC_FL_ZERO_RANGE to write-same so that we can have a consistent
fallocate interface between files and block devices. It also allows
the combination of PUNCH_HOLE and NO_HIDE_STALE to invoke non-zeroing
discard.
Test cases for the new block device fallocate are now in xfstests as
generic/349-351.
Comments and questions are, as always, welcome. Patches are against
4.7-rc3.
v7: Strengthen parameter checking and fix various code issues pointed
out by Linus and Christoph.
v8: More code rearranging, rebase to 4.6-rc3, and dig into alignment
issues.
v9: Forward port to 4.7.
--D
* [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
2016-06-17 1:17 [PATCH v9 0/3] fallocate for block devices Darrick J. Wong
@ 2016-06-17 1:17 ` Darrick J. Wong
2016-06-20 12:35 ` Bart Van Assche
2016-06-29 4:57 ` Martin K. Petersen
0 siblings, 2 replies; 24+ messages in thread
From: Darrick J. Wong @ 2016-06-17 1:17 UTC (permalink / raw)
To: axboe, darrick.wong
Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
xfs, hch, dm-devel, linux-fsdevel, Christoph Hellwig
Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
returning stale cache contents at a later time.
v5: Refactor the 4.4 refactoring of the ioctl code into separate functions.
Split the page invalidation and the new ioctl into separate patches.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
block/ioctl.c | 29 +++++++++++++++++++++++------
1 file changed, 23 insertions(+), 6 deletions(-)
diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..d001f52 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -225,7 +225,9 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 		unsigned long arg)
 {
 	uint64_t range[2];
-	uint64_t start, len;
+	struct address_space *mapping;
+	uint64_t start, end, len;
+	int ret;
 
 	if (!(mode & FMODE_WRITE))
 		return -EBADF;
@@ -235,18 +237,33 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
 
 	start = range[0];
 	len = range[1];
+	end = start + len - 1;
 
 	if (start & 511)
 		return -EINVAL;
 	if (len & 511)
 		return -EINVAL;
-	start >>= 9;
-	len >>= 9;
-
-	if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+	if (end >= (uint64_t)i_size_read(bdev->bd_inode))
+		return -EINVAL;
+	if (end < start)
 		return -EINVAL;
 
-	return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
+	/* Invalidate the page cache, including dirty pages */
+	mapping = bdev->bd_inode->i_mapping;
+	truncate_inode_pages_range(mapping, start, end);
+
+	ret = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+				   false);
+	if (ret)
+		return ret;
+
+	/*
+	 * Invalidate again; if someone wandered in and dirtied a page,
+	 * the caller will be given -EBUSY.
+	 */
+	return invalidate_inode_pages2_range(mapping,
+					     start >> PAGE_SHIFT,
+					     end >> PAGE_SHIFT);
 }
 
 static int put_ushort(unsigned long arg, unsigned short val)
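The second pass hands page indices, not byte offsets, to invalidate_inode_pages2_range(). This sketch shows that conversion and assumes 4 KiB pages. DEMO_PAGE_SHIFT, struct page_span and byte_range_to_pages are hypothetical names. It illustrates that a 512-byte-aligned range can still cover partial pages at either end. Note that v10 of the series dropped this second call after Bart Van Assche's review.

```c
#include <stdint.h>

/* Assumption for illustration only: 4 KiB pages (PAGE_SHIFT == 12). */
#define DEMO_PAGE_SHIFT 12

struct page_span {
	uint64_t first;	/* index of the first page touched */
	uint64_t last;	/* index of the last page touched (inclusive) */
};

/*
 * Convert the inclusive byte range [start, end] into the inclusive
 * page-index range, mirroring start >> PAGE_SHIFT and end >> PAGE_SHIFT.
 */
static struct page_span byte_range_to_pages(uint64_t start, uint64_t end)
{
	struct page_span s = {
		.first = start >> DEMO_PAGE_SHIFT,
		.last = end >> DEMO_PAGE_SHIFT,
	};

	return s;
}
```

A 1024-byte zeroout starting at byte 3584 ends at byte 4607. It therefore touches pages 0 and 1.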
* Re: [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
2016-06-17 1:17 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
@ 2016-06-20 12:35 ` Bart Van Assche
[not found] ` <7589fc01-a000-a912-f9b5-cf099cc2d27a-HInyCGIudOg@public.gmane.org>
2016-06-29 4:57 ` Martin K. Petersen
1 sibling, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2016-06-20 12:35 UTC (permalink / raw)
To: Darrick J. Wong, axboe@kernel.dk
Cc: hch@infradead.org, tytso@mit.edu, martin.petersen@oracle.com,
snitzer@redhat.com, linux-api@vger.kernel.org, bfoster@redhat.com,
xfs@oss.sgi.com, linux-block@vger.kernel.org, dm-devel@redhat.com,
linux-fsdevel@vger.kernel.org, Christoph Hellwig
On 06/17/2016 03:18 AM, Darrick J. Wong wrote:
> Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
> returning stale cache contents at a later time.
>
> v5: Refactor the 4.4 refactoring of the ioctl code into separate functions.
> Split the page invalidation and the new ioctl into separate patches.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
> block/ioctl.c | 29 +++++++++++++++++++++++------
> 1 file changed, 23 insertions(+), 6 deletions(-)
>
>
> diff --git a/block/ioctl.c b/block/ioctl.c
> index ed2397f..d001f52 100644
> --- a/block/ioctl.c
> +++ b/block/ioctl.c
> @@ -225,7 +225,9 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
>  		unsigned long arg)
>  {
>  	uint64_t range[2];
> -	uint64_t start, len;
> +	struct address_space *mapping;
> +	uint64_t start, end, len;
> +	int ret;
>
>  	if (!(mode & FMODE_WRITE))
>  		return -EBADF;
> @@ -235,18 +237,33 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
>
>  	start = range[0];
>  	len = range[1];
> +	end = start + len - 1;
>
>  	if (start & 511)
>  		return -EINVAL;
>  	if (len & 511)
>  		return -EINVAL;
> -	start >>= 9;
> -	len >>= 9;
> -
> -	if (start + len > (i_size_read(bdev->bd_inode) >> 9))
> +	if (end >= (uint64_t)i_size_read(bdev->bd_inode))
> +		return -EINVAL;
> +	if (end < start)
>  		return -EINVAL;
>
> -	return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
> +	/* Invalidate the page cache, including dirty pages */
> +	mapping = bdev->bd_inode->i_mapping;
> +	truncate_inode_pages_range(mapping, start, end);
> +
> +	ret = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
> +				   false);
> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * Invalidate again; if someone wandered in and dirtied a page,
> +	 * the caller will be given -EBUSY.
> +	 */
> +	return invalidate_inode_pages2_range(mapping,
> +					     start >> PAGE_SHIFT,
> +					     end >> PAGE_SHIFT);
>  }
Hello Darrick,
Maybe this has already been discussed, but anyway: in the POSIX spec
(http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html) I
found the following: "This volume of POSIX.1-2008 does not specify
behavior of concurrent writes to a file from multiple processes.
Applications should use some form of concurrency control."
Do we really need the invalidate_inode_pages2_range() call?
Thanks,
Bart.
* Re: [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
2016-06-17 1:17 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
2016-06-20 12:35 ` Bart Van Assche
@ 2016-06-29 4:57 ` Martin K. Petersen
1 sibling, 0 replies; 24+ messages in thread
From: Martin K. Petersen @ 2016-06-29 4:57 UTC (permalink / raw)
To: Darrick J. Wong
Cc: axboe, linux-block, tytso, martin.petersen, snitzer, linux-api,
bfoster, xfs, hch, dm-devel, linux-fsdevel, Christoph Hellwig
>>>>> "Darrick" == Darrick J Wong <darrick.wong@oracle.com> writes:
Darrick> Invalidate the page cache (as a regular O_DIRECT write would
Darrick> do) to avoid returning stale cache contents at a later time.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
--
Martin K. Petersen Oracle Linux Engineering
* [RFC DONOTMERGE v8 0/3] fallocate for block devices
@ 2016-04-13 4:01 Darrick J. Wong
[not found] ` <20160413040121.10562.98998.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
0 siblings, 1 reply; 24+ messages in thread
From: Darrick J. Wong @ 2016-04-13 4:01 UTC (permalink / raw)
To: darrick.wong
Cc: axboe, hch, tytso, martin.petersen, snitzer, linux-api, bfoster,
xfs, linux-block, dm-devel, linux-fsdevel
Hi,
This is a redesign of the patch series that fixes various interface
problems with the existing "zero out this part of a block device"
code. BLKZEROOUT2 is gone.
The first patch is still a fix to the existing BLKZEROOUT ioctl to
invalidate the page cache if the zeroing command to the underlying
device succeeds.
The second patch changes the internal block device functions to reject
attempts to discard or zeroout that are not aligned to the logical
block size. Previously, we only checked that the start/len parameters
were 512-byte aligned, which caused kernel BUG_ONs for unaligned IOs
to 4k-LBA devices.
The third patch creates an fallocate handler for block devices, wires
up the FALLOC_FL_PUNCH_HOLE flag to zeroing-discard, and connects
FALLOC_FL_ZERO_RANGE to write-same so that we can have a consistent
fallocate interface between files and block devices. It also allows
the combination of PUNCH_HOLE and NO_HIDE_STALE to invoke non-zeroing
discard.
The point of this patchset is not to go upstream but to serve as a
starting point for discussion at LSF. Don't merge this! Foremost
in my mind is whether or not we require the offset/len parameters to
be aligned to logical block size or minimum_io_size; what error code
to return for unaligned values; and whether or not we should allow
byte ranges and zero blocks with the page cache (like file fallocate
does now). It'll also be a jumping off point for Brian Foster and
Mike Snitzer's patches to allow bdev clients to ask that space be
allocated to a range, and to plumb that out to userspace.
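Both candidate granularities are already visible from userspace. This minimal sketch reads a device's logical block size and minimum_io_size. It assumes the BLKSSZGET and BLKIOMIN ioctls from <linux/fs.h>. The names query_granularity and struct bdev_granularity are hypothetical. Which of the two values the kernel should enforce is the open question above.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* BLKSSZGET, BLKIOMIN */

struct bdev_granularity {
	int lbs;		/* logical block size, from BLKSSZGET */
	unsigned int io_min;	/* minimum_io_size, from BLKIOMIN */
};

/* Returns 0 on success or a negative errno value (e.g. not a bdev). */
static int query_granularity(int fd, struct bdev_granularity *g)
{
	if (ioctl(fd, BLKSSZGET, &g->lbs) < 0)
		return -errno;
	if (ioctl(fd, BLKIOMIN, &g->io_min) < 0)
		return -errno;
	return 0;
}
```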
Test cases for the new block device fallocate have been submitted to
the xfstests list as generic/70[5-7], though the latest versions of
those test cases will be attached to this patchset for convenience.
Comments and questions are, as always, welcome. Patches are against
4.6-rc3.
v7: Strengthen parameter checking and fix various code issues pointed
out by Linus and Christoph.
v8: More code rearranging, rebase to 4.6-rc3, and dig into alignment
issues.
--D
* [PATCH v6 0/3] fallocate for block devices to provide zero-out
@ 2016-03-05 0:55 Darrick J. Wong
[not found] ` <20160305005556.29738.66782.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
0 siblings, 1 reply; 24+ messages in thread
From: Darrick J. Wong @ 2016-03-05 0:55 UTC (permalink / raw)
To: axboe, torvalds, darrick.wong
Cc: hch, tytso, martin.petersen, linux-api, david, linux-kernel,
shane.seymour, bfields, linux-fsdevel, jlayton, akpm
Hi,
This is a redesign of the patch series that fixes various interface
problems with the existing "zero out this part of a block device"
code. BLKZEROOUT2 is gone.
The first patch is still a fix to the existing BLKZEROOUT ioctl to
invalidate the page cache if the zeroing command to the underlying
device succeeds.
The second patch changes the internal block device functions to reject
attempts to discard or zeroout that are not aligned to the logical
block size. Previously, we only checked that the start/len parameters
were 512-byte aligned, which caused kernel BUG_ONs for unaligned IOs
to 4k-LBA devices.
The third patch creates an fallocate handler for block devices, wires
up the FALLOC_FL_PUNCH_HOLE flag to zeroing-discard, and connects
FALLOC_FL_ZERO_RANGE to write-same so that we can have a consistent
fallocate interface between files and block devices.
Test cases[1] for the new block device fallocate have been
submitted to the xfstests list as generic/70[5-7], though the
numbering will change to a lower number when the API and the tests are
accepted upstream.
Comments and questions are, as always, welcome. Patches are against
4.5-rc6.
--D
[1] https://github.com/djwong/xfstests/commit/fdc0980ef01076dfb246fd1db2511227e9f67a3f
Thread overview: 24+ messages
2016-03-15 19:42 [PATCH v7 0/3] fallocate for block devices to provide zero-out Darrick J. Wong
[not found] ` <20160315194221.30093.70506.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2016-03-15 19:42 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
2016-03-15 19:42 ` [PATCH 3/3] block: implement (some of) fallocate for block devices Darrick J. Wong
[not found] ` <20160315194244.30093.6483.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2016-03-21 15:38 ` Christoph Hellwig
[not found] ` <20160321153827.GA27230-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2016-03-21 17:52 ` Darrick J. Wong
[not found] ` <20160321175235.GA5812-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2016-03-21 18:17 ` Christoph Hellwig
[not found] ` <20160321181726.GA10892-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2016-03-21 18:21 ` Martin K. Petersen
2016-03-21 18:52 ` Mike Snitzer
2016-03-21 19:11 ` Darrick J. Wong
2016-03-21 19:22 ` Mike Snitzer
[not found] ` <20160321192229.GA17220-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-03-21 20:59 ` Brian Foster
2016-03-15 19:42 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
-- strict thread matches above, loose matches on Subject: below --
2016-09-29 21:16 [PATCH v11 0/3] fallocate for block devices Darrick J. Wong
2016-09-29 21:16 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
2016-09-29 0:39 [PATCH v10 0/3] fallocate for block devices Darrick J. Wong
[not found] ` <147510957066.8940.13803086684642725401.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2016-09-29 0:39 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
2016-09-29 1:16 ` Bart Van Assche
2016-09-29 5:56 ` Hannes Reinecke
2016-08-26 0:02 [PATCH v10 0/3] fallocate for block devices Darrick J. Wong
2016-08-26 0:02 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
2016-06-17 1:17 [PATCH v9 0/3] fallocate for block devices Darrick J. Wong
2016-06-17 1:17 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
2016-06-20 12:35 ` Bart Van Assche
[not found] ` <7589fc01-a000-a912-f9b5-cf099cc2d27a-HInyCGIudOg@public.gmane.org>
2016-06-28 19:13 ` Darrick J. Wong
2016-06-29 4:57 ` Martin K. Petersen
2016-04-13 4:01 [RFC DONOTMERGE v8 0/3] fallocate for block devices Darrick J. Wong
[not found] ` <20160413040121.10562.98998.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2016-04-13 4:01 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
2016-03-05 0:55 [PATCH v6 0/3] fallocate for block devices to provide zero-out Darrick J. Wong
[not found] ` <20160305005556.29738.66782.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2016-03-05 0:56 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
[not found] ` <20160305005603.29738.58460.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2016-03-15 7:35 ` Christoph Hellwig