* [PATCH v6 0/3] fallocate for block devices to provide zero-out
@ 2016-03-05 0:55 Darrick J. Wong
2016-03-05 0:56 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Darrick J. Wong @ 2016-03-05 0:55 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw,
torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
darrick.wong-QHcLZuEGTsvQT0dZR+AlfA
Cc: hch-wEGCiKHe2LqWVfeAwA7xHQ, tytso-3s7WtUTddSA,
martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
linux-api-u79uwXL29TY76Z2rM5mHXA, david-FqsqvQoI3Ljby3iVrkZq2A,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, shane.seymour-ZPxbGqLxI0U,
bfields-uC3wQj2KruNg9hUCZPvPmw,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
jlayton-vpEMnDpepFuMZCB2o+C8xQ,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
Hi,
This is a redesign of the patch series that fixes various interface
problems with the existing "zero out this part of a block device"
code. BLKZEROOUT2 is gone.
The first patch is still a fix to the existing BLKZEROOUT ioctl to
invalidate the page cache if the zeroing command to the underlying
device succeeds.
The second patch changes the internal block device functions to reject
attempts to discard or zeroout that are not aligned to the logical
block size. Previously, we only checked that the start/len parameters
were 512-byte aligned, which caused kernel BUG_ONs for unaligned IOs
to 4k-LBA devices.
The third patch creates an fallocate handler for block devices, wires
up the FALLOC_FL_PUNCH_HOLE flag to zeroing-discard, and connects
FALLOC_FL_ZERO_RANGE to write-same so that we can have a consistent
fallocate interface between files and block devices.
Test cases[1] for the new block device fallocate will have been
submitted to the xfstests list as generic/70[5-7], though the
numbering will change to a lower number when the API and the tests are
accepted upstream.
Comments and questions are, as always, welcome. Patches are against
4.5-rc6.
--D
[1] https://github.com/djwong/xfstests/commit/fdc0980ef01076dfb246fd1db2511227e9f67a3f
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
[not found] ` <20160305005556.29738.66782.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
@ 2016-03-05 0:56 ` Darrick J. Wong
[not found] ` <20160305005603.29738.58460.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2016-03-05 3:17 ` [PATCH v6 0/3] fallocate for block devices to provide zero-out Linus Torvalds
1 sibling, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2016-03-05 0:56 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw,
torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
darrick.wong-QHcLZuEGTsvQT0dZR+AlfA
Cc: hch-wEGCiKHe2LqWVfeAwA7xHQ, tytso-3s7WtUTddSA,
martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
linux-api-u79uwXL29TY76Z2rM5mHXA, david-FqsqvQoI3Ljby3iVrkZq2A,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, shane.seymour-ZPxbGqLxI0U,
bfields-uC3wQj2KruNg9hUCZPvPmw,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
jlayton-vpEMnDpepFuMZCB2o+C8xQ,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
returning stale cache contents at a later time.
v5: Refactor the 4.4 refactoring of the ioctl code into separate functions.
Split the page invalidation and the new ioctl into separate patches.
Signed-off-by: Darrick J. Wong <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
block/ioctl.c | 29 +++++++++++++++++++++++------
1 file changed, 23 insertions(+), 6 deletions(-)
diff --git a/block/ioctl.c b/block/ioctl.c
index d8996bb..c6eb462 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -226,7 +226,9 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
unsigned long arg)
{
uint64_t range[2];
- uint64_t start, len;
+ struct address_space *mapping;
+ uint64_t start, end, len;
+ int ret;
if (!(mode & FMODE_WRITE))
return -EBADF;
@@ -236,18 +238,33 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
start = range[0];
len = range[1];
+ end = start + len - 1;
if (start & 511)
return -EINVAL;
if (len & 511)
return -EINVAL;
- start >>= 9;
- len >>= 9;
-
- if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+ if (end >= (uint64_t)i_size_read(bdev->bd_inode))
+ return -EINVAL;
+ if (end < start)
return -EINVAL;
- return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
+ /* Invalidate the page cache, including dirty pages */
+ mapping = bdev->bd_inode->i_mapping;
+ truncate_inode_pages_range(mapping, start, end);
+
+ ret = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+ false);
+ if (ret)
+ return ret;
+
+ /*
+ * Invalidate again; if someone wandered in and dirtied a page,
+ * the caller will be given -EBUSY.
+ */
+ return invalidate_inode_pages2_range(mapping,
+ start >> PAGE_CACHE_SHIFT,
+ end >> PAGE_CACHE_SHIFT);
}
static int put_ushort(unsigned long arg, unsigned short val)
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 2/3] block: require write_same and discard requests align to logical block size
2016-03-05 0:55 [PATCH v6 0/3] fallocate for block devices to provide zero-out Darrick J. Wong
@ 2016-03-05 0:56 ` Darrick J. Wong
2016-03-05 3:02 ` Linus Torvalds
2016-03-15 7:34 ` Christoph Hellwig
2016-03-05 0:56 ` [PATCH 3/3] block: implement (some of) fallocate for block devices Darrick J. Wong
[not found] ` <20160305005556.29738.66782.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2 siblings, 2 replies; 23+ messages in thread
From: Darrick J. Wong @ 2016-03-05 0:56 UTC (permalink / raw)
To: axboe, torvalds, darrick.wong
Cc: hch, tytso, martin.petersen, linux-api, david, linux-kernel,
shane.seymour, bfields, linux-fsdevel, jlayton, akpm
Make sure that the offset and length arguments that we're using to
construct WRITE SAME and DISCARD requests are actually aligned to the
logical block size. Failure to do this causes other errors in other
parts of the block layer or the SCSI layer because disks don't support
partial logical block writes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
block/blk-lib.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9ebf653..3e5ca28 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -49,6 +49,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
struct bio *bio;
int ret = 0;
struct blk_plug plug;
+ sector_t bs_mask;
if (!q)
return -ENXIO;
@@ -56,6 +57,10 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
if (!blk_queue_discard(q))
return -EOPNOTSUPP;
+ bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+ if ((sector & bs_mask) || ((sector + nr_sects) & bs_mask))
+ return -EINVAL;
+
/* Zero-sector (unknown) and one-sector granularities are the same. */
granularity = max(q->limits.discard_granularity >> 9, 1U);
alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
@@ -148,6 +153,7 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q = bdev_get_queue(bdev);
unsigned int max_write_same_sectors;
+ sector_t bs_mask;
struct bio_batch bb;
struct bio *bio;
int ret = 0;
@@ -155,6 +161,10 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
if (!q)
return -ENXIO;
+ bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+ if ((sector & bs_mask) || ((sector + nr_sects) & bs_mask))
+ return -EINVAL;
+
/* Ensure that max_write_same_sectors doesn't overflow bi_size */
max_write_same_sectors = UINT_MAX >> 9;
@@ -218,9 +228,14 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
int ret;
struct bio *bio;
struct bio_batch bb;
+ sector_t bs_mask;
unsigned int sz;
DECLARE_COMPLETION_ONSTACK(wait);
+ bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+ if ((sector & bs_mask) || ((sector + nr_sects) & bs_mask))
+ return -EINVAL;
+
atomic_set(&bb.done, 1);
bb.error = 0;
bb.wait = &wait;
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 3/3] block: implement (some of) fallocate for block devices
2016-03-05 0:55 [PATCH v6 0/3] fallocate for block devices to provide zero-out Darrick J. Wong
2016-03-05 0:56 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
@ 2016-03-05 0:56 ` Darrick J. Wong
[not found] ` <20160305005617.29738.85316.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
[not found] ` <20160305005556.29738.66782.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2016-03-05 0:56 UTC (permalink / raw)
To: axboe, torvalds, darrick.wong
Cc: hch, tytso, martin.petersen, linux-api, david, linux-kernel,
shane.seymour, bfields, linux-fsdevel, jlayton, akpm
After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been
whitelisted for zeroing SCSI UNMAP. Both flags require that
FALLOC_FL_KEEP_SIZE are set, both return EINVAL if one tries
to write past the end of the device, and both require that the
offset and length be aligned at least to 512-byte offsets.q
Since the semantics of fallocate are fairly well established already,
wire up the two pieces. The other fallocate variants (collapse range,
insert range, and allocate blocks) are not supported.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/block_dev.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/open.c | 3 ++-
2 files changed, 69 insertions(+), 1 deletion(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 826b164..c9c9421 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -30,6 +30,7 @@
#include <linux/cleancache.h>
#include <linux/dax.h>
#include <asm/uaccess.h>
+#include <linux/falloc.h>
#include "internal.h"
struct bdev_inode {
@@ -1786,6 +1787,71 @@ static int blkdev_mmap(struct file *file, struct vm_area_struct *vma)
#define blkdev_mmap generic_file_mmap
#endif
+#define BLKDEV_FALLOC_FL_SUPPORTED \
+ (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | \
+ FALLOC_FL_ZERO_RANGE)
+
+long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
+{
+ struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+ struct request_queue *q = bdev_get_queue(bdev);
+ struct address_space *mapping;
+ loff_t end = start + len - 1;
+ loff_t bs_mask;
+ int error;
+
+ /* We only support zero range and punch hole. */
+ if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
+ return -EOPNOTSUPP;
+
+ /* We can't change the bdev size from here */
+ if (!(mode & FALLOC_FL_KEEP_SIZE))
+ return -EOPNOTSUPP;
+
+ /* We haven't a primitive for "ensure space exists" right now. */
+ if (mode == FALLOC_FL_KEEP_SIZE)
+ return -EOPNOTSUPP;
+
+ /* Only punch if the device can do zeroing discard. */
+ if ((mode & FALLOC_FL_PUNCH_HOLE) &&
+ (!blk_queue_discard(q) || !q->limits.discard_zeroes_data))
+ return -EOPNOTSUPP;
+
+ /* Don't allow IO that isn't aligned to logical block size */
+ bs_mask = bdev_logical_block_size(bdev) - 1;
+ if ((start & bs_mask) || ((start + len) & bs_mask))
+ return -EINVAL;
+
+ /* Don't go off the end of the device */
+ if (end > i_size_read(bdev->bd_inode))
+ return -EINVAL;
+ if (end < start)
+ return -EINVAL;
+
+ /* Invalidate the page cache, including dirty pages. */
+ mapping = bdev->bd_inode->i_mapping;
+ truncate_inode_pages_range(mapping, start, end);
+
+ error = -EINVAL;
+ if (mode & FALLOC_FL_ZERO_RANGE)
+ error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+ GFP_KERNEL, false);
+ else if (mode & FALLOC_FL_PUNCH_HOLE)
+ error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+ GFP_KERNEL, 0);
+ if (error)
+ return error;
+
+ /*
+ * Invalidate again; if someone wandered in and dirtied a page,
+ * the caller will be given -EBUSY;
+ */
+ return invalidate_inode_pages2_range(mapping,
+ start >> PAGE_CACHE_SHIFT,
+ end >> PAGE_CACHE_SHIFT);
+}
+EXPORT_SYMBOL_GPL(blkdev_fallocate);
+
const struct file_operations def_blk_fops = {
.open = blkdev_open,
.release = blkdev_close,
@@ -1800,6 +1866,7 @@ const struct file_operations def_blk_fops = {
#endif
.splice_read = generic_file_splice_read,
.splice_write = iter_file_splice_write,
+ .fallocate = blkdev_fallocate,
};
int ioctl_by_bdev(struct block_device *bdev, unsigned cmd, unsigned long arg)
diff --git a/fs/open.c b/fs/open.c
index 55bdc75..4f99adc 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -289,7 +289,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
* Let individual file system decide if it supports preallocation
* for directories or not.
*/
- if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
+ if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
+ !S_ISBLK(inode->i_mode))
return -ENODEV;
/* Check for wrap through zero too */
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 2/3] block: require write_same and discard requests align to logical block size
2016-03-05 0:56 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
@ 2016-03-05 3:02 ` Linus Torvalds
2016-03-15 7:34 ` Christoph Hellwig
1 sibling, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2016-03-05 3:02 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Jens Axboe, Christoph Hellwig, Theodore Ts'o,
Martin K. Petersen, Linux API, Dave Chinner,
Linux Kernel Mailing List, shane.seymour, Bruce Fields,
linux-fsdevel, Jeff Layton, Andrew Morton
On Fri, Mar 4, 2016 at 4:56 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> + bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
> + if ((sector & bs_mask) || ((sector + nr_sects) & bs_mask))
> + return -EINVAL;
This test may _work_, but it's kind of crazily overly complicated.
The sane test would be just "are the start and length aligned":
if ((sector & bs_mask) || (nr_sects & bs_mask))
return -EINVAL;
and the *smart* test is simpler still, and asks "are there invalid
bits in either the start or the length":
if ((sector | nr_sects) & bs_mask)
return -EINVAL:
I suspect either of these would be fine, and the compiler may even
notice that there's the smart way of doing it.
The compiler *might* even notice that the original version can be
simplified and generate sane code.
But I think that original version is not only overly complicated, it's
also actually less obvious than the simpler versions, if only because
the whole conditional is so big that you have to actively parse it.
That last shortest form is actually so simple that I think it's the
easiest to understand too - the conditional is simply so small that it
doesn't take a lot of effort to see what it does.
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
[not found] ` <20160305005617.29738.85316.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
@ 2016-03-05 3:06 ` Linus Torvalds
[not found] ` <CA+55aFyY+oDoK8C6C++PG=N+vtY-r1Y6fYO_3skDTzkP_SXC-A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-05 3:13 ` Linus Torvalds
1 sibling, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2016-03-05 3:06 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Jens Axboe, Christoph Hellwig, Theodore Ts'o,
Martin K. Petersen, Linux API, Dave Chinner,
Linux Kernel Mailing List, shane.seymour-ZPxbGqLxI0U,
Bruce Fields, linux-fsdevel, Jeff Layton, Andrew Morton
On Fri, Mar 4, 2016 at 4:56 PM, Darrick J. Wong <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> + /* Only punch if the device can do zeroing discard. */
> + if ((mode & FALLOC_FL_PUNCH_HOLE) &&
> + (!blk_queue_discard(q) || !q->limits.discard_zeroes_data))
> + return -EOPNOTSUPP;
I'm ok with this, but suspect that some users would prefer to just
turn this into ZERO_RANGE silently.
Comments from people who would be expected to use this?
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
[not found] ` <20160305005617.29738.85316.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2016-03-05 3:06 ` Linus Torvalds
@ 2016-03-05 3:13 ` Linus Torvalds
[not found] ` <CA+55aFzJW70YApg8ra-c79E-j1ujOXbh=26k2dvyYX+qZuu_8A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
1 sibling, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2016-03-05 3:13 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Jens Axboe, Christoph Hellwig, Theodore Ts'o,
Martin K. Petersen, Linux API, Dave Chinner,
Linux Kernel Mailing List, shane.seymour-ZPxbGqLxI0U,
Bruce Fields, linux-fsdevel, Jeff Layton, Andrew Morton
On Fri, Mar 4, 2016 at 4:56 PM, Darrick J. Wong <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> +
> + /* We can't change the bdev size from here */
> + if (!(mode & FALLOC_FL_KEEP_SIZE))
> + return -EOPNOTSUPP;
Oh, and this I think is wrong.
The thing is, FALLOC_FL_KEEP_SIZE is only supposed to matter if the
region is outside the existing length.
So if y ou punch a hole in the middle of a file, you don't need
FALLOC_FL_KEEP_SIZE.
I would suggest removing this check entirely, since you already check
that people don't try to punch holes past the end of the device. So
FALLOC_FL_KEEP_SIZE is simply a non-issue, and shouldn't even be
checked.
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v6 0/3] fallocate for block devices to provide zero-out
[not found] ` <20160305005556.29738.66782.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2016-03-05 0:56 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
@ 2016-03-05 3:17 ` Linus Torvalds
1 sibling, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2016-03-05 3:17 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Jens Axboe, Christoph Hellwig, Theodore Ts'o,
Martin K. Petersen, Linux API, Dave Chinner,
Linux Kernel Mailing List, shane.seymour-ZPxbGqLxI0U,
Bruce Fields, linux-fsdevel, Jeff Layton, Andrew Morton
On Fri, Mar 4, 2016 at 4:55 PM, Darrick J. Wong <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>
> This is a redesign of the patch series that fixes various interface
> problems with the existing "zero out this part of a block device"
> code. BLKZEROOUT2 is gone.
I replied to two of the patches with small nits, but on the whole I
like it. The only nit that I think is serious is that you shouldn't be
requiring people to have to use FALLOC_FL_KEEP_SIZE.
If anything, you might just change that logic to say: If
FALLOC_FL_KEEP_SIZE is set, then we truncate the hole punching/zeroing
to the size of the device. So people could do something like
fallocate(fd,
FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
0, ~0);
to trim the whole device (and if FALLOC_FL_KEEP_SIZE isn't set, then
the above would be an error simply because the requested hole was
bigger than the device)
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
[not found] ` <CA+55aFyY+oDoK8C6C++PG=N+vtY-r1Y6fYO_3skDTzkP_SXC-A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-03-05 20:57 ` Christoph Hellwig
0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2016-03-05 20:57 UTC (permalink / raw)
To: Linus Torvalds
Cc: Darrick J. Wong, Jens Axboe, Christoph Hellwig, Theodore Ts'o,
Martin K. Petersen, Linux API, Dave Chinner,
Linux Kernel Mailing List, shane.seymour-ZPxbGqLxI0U,
Bruce Fields, linux-fsdevel, Jeff Layton, Andrew Morton
On Fri, Mar 04, 2016 at 07:06:38PM -0800, Linus Torvalds wrote:
> > + if ((mode & FALLOC_FL_PUNCH_HOLE) &&
> > + (!blk_queue_discard(q) || !q->limits.discard_zeroes_data))
> > + return -EOPNOTSUPP;
>
> I'm ok with this, but suspect that some users would prefer to just
> turn this into ZERO_RANGE silently.
>
> Comments from people who would be expected to use this?
A hole punch should be a hole punch, and not silently allocate blocks
isntead of deallocating them. It's not even a fallback, it's pretty
much the opposite for some workloads.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] block: implement (some of) fallocate for block devices
[not found] ` <CA+55aFzJW70YApg8ra-c79E-j1ujOXbh=26k2dvyYX+qZuu_8A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-03-05 20:58 ` Christoph Hellwig
0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2016-03-05 20:58 UTC (permalink / raw)
To: Linus Torvalds
Cc: Darrick J. Wong, Jens Axboe, Christoph Hellwig, Theodore Ts'o,
Martin K. Petersen, Linux API, Dave Chinner,
Linux Kernel Mailing List, shane.seymour-ZPxbGqLxI0U,
Bruce Fields, linux-fsdevel, Jeff Layton, Andrew Morton
On Fri, Mar 04, 2016 at 07:13:25PM -0800, Linus Torvalds wrote:
> > + /* We can't change the bdev size from here */
> > + if (!(mode & FALLOC_FL_KEEP_SIZE))
> > + return -EOPNOTSUPP;
>
> Oh, and this I think is wrong.
>
> The thing is, FALLOC_FL_KEEP_SIZE is only supposed to matter if the
> region is outside the existing length.
For allocations...
> So if y ou punch a hole in the middle of a file, you don't need
> FALLOC_FL_KEEP_SIZE.
For FALLOC_FL_PUNCH_HOLE we always require FALLOC_FL_KEEP_SIZE so far,
and I'd rather not change things for block devices just because we can.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/3] block: require write_same and discard requests align to logical block size
2016-03-05 0:56 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
2016-03-05 3:02 ` Linus Torvalds
@ 2016-03-15 7:34 ` Christoph Hellwig
1 sibling, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2016-03-15 7:34 UTC (permalink / raw)
To: Darrick J. Wong
Cc: axboe, torvalds, hch, tytso, martin.petersen, linux-api, david,
linux-kernel, shane.seymour, bfields, linux-fsdevel, jlayton,
akpm
On Fri, Mar 04, 2016 at 04:56:10PM -0800, Darrick J. Wong wrote:
> Make sure that the offset and length arguments that we're using to
> construct WRITE SAME and DISCARD requests are actually aligned to the
> logical block size. Failure to do this causes other errors in other
> parts of the block layer or the SCSI layer because disks don't support
> partial logical block writes.
Looks fine,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
[not found] ` <20160305005603.29738.58460.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
@ 2016-03-15 7:35 ` Christoph Hellwig
0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2016-03-15 7:35 UTC (permalink / raw)
To: Darrick J. Wong
Cc: axboe-tSWWG44O7X1aa/9Udqfwiw,
torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
hch-wEGCiKHe2LqWVfeAwA7xHQ, tytso-3s7WtUTddSA,
martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
linux-api-u79uwXL29TY76Z2rM5mHXA, david-FqsqvQoI3Ljby3iVrkZq2A,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, shane.seymour-ZPxbGqLxI0U,
bfields-uC3wQj2KruNg9hUCZPvPmw,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
jlayton-vpEMnDpepFuMZCB2o+C8xQ,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
On Fri, Mar 04, 2016 at 04:56:03PM -0800, Darrick J. Wong wrote:
> Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
> returning stale cache contents at a later time.
>
> v5: Refactor the 4.4 refactoring of the ioctl code into separate functions.
> Split the page invalidation and the new ioctl into separate patches.
>
> Signed-off-by: Darrick J. Wong <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Didn't I review this already? If not:
Reviewed-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
[not found] ` <20160315194221.30093.70506.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
@ 2016-03-15 19:42 ` Darrick J. Wong
0 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2016-03-15 19:42 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw,
torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
darrick.wong-QHcLZuEGTsvQT0dZR+AlfA
Cc: bfields-uC3wQj2KruNg9hUCZPvPmw, tytso-3s7WtUTddSA,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
linux-api-u79uwXL29TY76Z2rM5mHXA, david-FqsqvQoI3Ljby3iVrkZq2A,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, shane.seymour-ZPxbGqLxI0U,
hch-wEGCiKHe2LqWVfeAwA7xHQ, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
jlayton-vpEMnDpepFuMZCB2o+C8xQ, Christoph Hellwig
Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
returning stale cache contents at a later time.
v5: Refactor the 4.4 refactoring of the ioctl code into separate functions.
Split the page invalidation and the new ioctl into separate patches.
Signed-off-by: Darrick J. Wong <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Reviewed-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
block/ioctl.c | 29 +++++++++++++++++++++++------
1 file changed, 23 insertions(+), 6 deletions(-)
diff --git a/block/ioctl.c b/block/ioctl.c
index d8996bb..c6eb462 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -226,7 +226,9 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
unsigned long arg)
{
uint64_t range[2];
- uint64_t start, len;
+ struct address_space *mapping;
+ uint64_t start, end, len;
+ int ret;
if (!(mode & FMODE_WRITE))
return -EBADF;
@@ -236,18 +238,33 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
start = range[0];
len = range[1];
+ end = start + len - 1;
if (start & 511)
return -EINVAL;
if (len & 511)
return -EINVAL;
- start >>= 9;
- len >>= 9;
-
- if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+ if (end >= (uint64_t)i_size_read(bdev->bd_inode))
+ return -EINVAL;
+ if (end < start)
return -EINVAL;
- return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
+ /* Invalidate the page cache, including dirty pages */
+ mapping = bdev->bd_inode->i_mapping;
+ truncate_inode_pages_range(mapping, start, end);
+
+ ret = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+ false);
+ if (ret)
+ return ret;
+
+ /*
+ * Invalidate again; if someone wandered in and dirtied a page,
+ * the caller will be given -EBUSY.
+ */
+ return invalidate_inode_pages2_range(mapping,
+ start >> PAGE_CACHE_SHIFT,
+ end >> PAGE_CACHE_SHIFT);
}
static int put_ushort(unsigned long arg, unsigned short val)
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
[not found] ` <20160413040121.10562.98998.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
@ 2016-04-13 4:01 ` Darrick J. Wong
0 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2016-04-13 4:01 UTC (permalink / raw)
To: darrick.wong-QHcLZuEGTsvQT0dZR+AlfA
Cc: axboe-tSWWG44O7X1aa/9Udqfwiw, hch-wEGCiKHe2LqWVfeAwA7xHQ,
tytso-3s7WtUTddSA, martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
snitzer-H+wXaHxf7aLQT0dZR+AlfA, linux-api-u79uwXL29TY76Z2rM5mHXA,
bfoster-H+wXaHxf7aLQT0dZR+AlfA, xfs-VZNHf3L845pBDgjK7y7TUQ,
linux-block-u79uwXL29TY76Z2rM5mHXA,
dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Christoph Hellwig
Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
returning stale cache contents at a later time.
v5: Refactor the 4.4 refactoring of the ioctl code into separate functions.
Split the page invalidation and the new ioctl into separate patches.
Signed-off-by: Darrick J. Wong <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Reviewed-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
block/ioctl.c | 29 +++++++++++++++++++++++------
1 file changed, 23 insertions(+), 6 deletions(-)
diff --git a/block/ioctl.c b/block/ioctl.c
index 4ff1f92..52b60b2 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -226,7 +226,9 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
unsigned long arg)
{
uint64_t range[2];
- uint64_t start, len;
+ struct address_space *mapping;
+ uint64_t start, end, len;
+ int ret;
if (!(mode & FMODE_WRITE))
return -EBADF;
@@ -236,18 +238,33 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
start = range[0];
len = range[1];
+ end = start + len - 1;
if (start & 511)
return -EINVAL;
if (len & 511)
return -EINVAL;
- start >>= 9;
- len >>= 9;
-
- if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+ if (end >= (uint64_t)i_size_read(bdev->bd_inode))
+ return -EINVAL;
+ if (end < start)
return -EINVAL;
- return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
+ /* Invalidate the page cache, including dirty pages */
+ mapping = bdev->bd_inode->i_mapping;
+ truncate_inode_pages_range(mapping, start, end);
+
+ ret = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+ false);
+ if (ret)
+ return ret;
+
+ /*
+ * Invalidate again; if someone wandered in and dirtied a page,
+ * the caller will be given -EBUSY.
+ */
+ return invalidate_inode_pages2_range(mapping,
+ start >> PAGE_SHIFT,
+ end >> PAGE_SHIFT);
}
static int put_ushort(unsigned long arg, unsigned short val)
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
2016-06-17 1:17 [PATCH v9 0/3] fallocate for block devices Darrick J. Wong
@ 2016-06-17 1:17 ` Darrick J. Wong
2016-06-20 12:35 ` Bart Van Assche
2016-06-29 4:57 ` Martin K. Petersen
0 siblings, 2 replies; 23+ messages in thread
From: Darrick J. Wong @ 2016-06-17 1:17 UTC (permalink / raw)
To: axboe, darrick.wong
Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
xfs, hch, dm-devel, linux-fsdevel, Christoph Hellwig
Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
returning stale cache contents at a later time.
v5: Refactor the 4.4 refactoring of the ioctl code into separate functions.
Split the page invalidation and the new ioctl into separate patches.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
block/ioctl.c | 29 +++++++++++++++++++++++------
1 file changed, 23 insertions(+), 6 deletions(-)
diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..d001f52 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -225,7 +225,9 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
unsigned long arg)
{
uint64_t range[2];
- uint64_t start, len;
+ struct address_space *mapping;
+ uint64_t start, end, len;
+ int ret;
if (!(mode & FMODE_WRITE))
return -EBADF;
@@ -235,18 +237,33 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
start = range[0];
len = range[1];
+ end = start + len - 1;
if (start & 511)
return -EINVAL;
if (len & 511)
return -EINVAL;
- start >>= 9;
- len >>= 9;
-
- if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+ if (end >= (uint64_t)i_size_read(bdev->bd_inode))
+ return -EINVAL;
+ if (end < start)
return -EINVAL;
- return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
+ /* Invalidate the page cache, including dirty pages */
+ mapping = bdev->bd_inode->i_mapping;
+ truncate_inode_pages_range(mapping, start, end);
+
+ ret = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+ false);
+ if (ret)
+ return ret;
+
+ /*
+ * Invalidate again; if someone wandered in and dirtied a page,
+ * the caller will be given -EBUSY.
+ */
+ return invalidate_inode_pages2_range(mapping,
+ start >> PAGE_SHIFT,
+ end >> PAGE_SHIFT);
}
static int put_ushort(unsigned long arg, unsigned short val)
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
2016-06-17 1:17 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
@ 2016-06-20 12:35 ` Bart Van Assche
[not found] ` <7589fc01-a000-a912-f9b5-cf099cc2d27a-HInyCGIudOg@public.gmane.org>
2016-06-29 4:57 ` Martin K. Petersen
1 sibling, 1 reply; 23+ messages in thread
From: Bart Van Assche @ 2016-06-20 12:35 UTC (permalink / raw)
To: Darrick J. Wong, axboe@kernel.dk
Cc: hch@infradead.org, tytso@mit.edu, martin.petersen@oracle.com,
snitzer@redhat.com, linux-api@vger.kernel.org, bfoster@redhat.com,
xfs@oss.sgi.com, linux-block@vger.kernel.org, dm-devel@redhat.com,
linux-fsdevel@vger.kernel.org, Christoph Hellwig
On 06/17/2016 03:18 AM, Darrick J. Wong wrote:
> Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
> returning stale cache contents at a later time.
>
> v5: Refactor the 4.4 refactoring of the ioctl code into separate functions.
> Split the page invalidation and the new ioctl into separate patches.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
> block/ioctl.c | 29 +++++++++++++++++++++++------
> 1 file changed, 23 insertions(+), 6 deletions(-)
>
>
> diff --git a/block/ioctl.c b/block/ioctl.c
> index ed2397f..d001f52 100644
> --- a/block/ioctl.c
> +++ b/block/ioctl.c
> @@ -225,7 +225,9 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
> unsigned long arg)
> {
> uint64_t range[2];
> - uint64_t start, len;
> + struct address_space *mapping;
> + uint64_t start, end, len;
> + int ret;
>
> if (!(mode & FMODE_WRITE))
> return -EBADF;
> @@ -235,18 +237,33 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
>
> start = range[0];
> len = range[1];
> + end = start + len - 1;
>
> if (start & 511)
> return -EINVAL;
> if (len & 511)
> return -EINVAL;
> - start >>= 9;
> - len >>= 9;
> -
> - if (start + len > (i_size_read(bdev->bd_inode) >> 9))
> + if (end >= (uint64_t)i_size_read(bdev->bd_inode))
> + return -EINVAL;
> + if (end < start)
> return -EINVAL;
>
> - return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
> + /* Invalidate the page cache, including dirty pages */
> + mapping = bdev->bd_inode->i_mapping;
> + truncate_inode_pages_range(mapping, start, end);
> +
> + ret = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
> + false);
> + if (ret)
> + return ret;
> +
> + /*
> + * Invalidate again; if someone wandered in and dirtied a page,
> + * the caller will be given -EBUSY.
> + */
> + return invalidate_inode_pages2_range(mapping,
> + start >> PAGE_SHIFT,
> + end >> PAGE_SHIFT);
> }
Hello Darrick,
Maybe this has already been discussed, but anyway: in the POSIX spec
(http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html) I
found the following: "This volume of POSIX.1-2008 does not specify
behavior of concurrent writes to a file from multiple processes.
Applications should use some form of concurrency control."
Do we really need the invalidate_inode_pages2_range() call?
Thanks,
Bart.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
[not found] ` <7589fc01-a000-a912-f9b5-cf099cc2d27a-HInyCGIudOg@public.gmane.org>
@ 2016-06-28 19:13 ` Darrick J. Wong
0 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2016-06-28 19:13 UTC (permalink / raw)
To: Bart Van Assche
Cc: axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org,
linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
tytso-3s7WtUTddSA@public.gmane.org,
martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
bfoster-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
xfs-VZNHf3L845pBDgjK7y7TUQ@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Christoph Hellwig
On Mon, Jun 20, 2016 at 02:35:29PM +0200, Bart Van Assche wrote:
> On 06/17/2016 03:18 AM, Darrick J. Wong wrote:
> >Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
> >returning stale cache contents at a later time.
> >
> >v5: Refactor the 4.4 refactoring of the ioctl code into separate functions.
> >Split the page invalidation and the new ioctl into separate patches.
> >
> >Signed-off-by: Darrick J. Wong <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> >Reviewed-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> >---
> > block/ioctl.c | 29 +++++++++++++++++++++++------
> > 1 file changed, 23 insertions(+), 6 deletions(-)
> >
> >
> >diff --git a/block/ioctl.c b/block/ioctl.c
> >index ed2397f..d001f52 100644
> >--- a/block/ioctl.c
> >+++ b/block/ioctl.c
> >@@ -225,7 +225,9 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
> > unsigned long arg)
> > {
> > uint64_t range[2];
> >- uint64_t start, len;
> >+ struct address_space *mapping;
> >+ uint64_t start, end, len;
> >+ int ret;
> >
> > if (!(mode & FMODE_WRITE))
> > return -EBADF;
> >@@ -235,18 +237,33 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
> >
> > start = range[0];
> > len = range[1];
> >+ end = start + len - 1;
> >
> > if (start & 511)
> > return -EINVAL;
> > if (len & 511)
> > return -EINVAL;
> >- start >>= 9;
> >- len >>= 9;
> >-
> >- if (start + len > (i_size_read(bdev->bd_inode) >> 9))
> >+ if (end >= (uint64_t)i_size_read(bdev->bd_inode))
> >+ return -EINVAL;
> >+ if (end < start)
> > return -EINVAL;
> >
> >- return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
> >+ /* Invalidate the page cache, including dirty pages */
> >+ mapping = bdev->bd_inode->i_mapping;
> >+ truncate_inode_pages_range(mapping, start, end);
> >+
> >+ ret = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
> >+ false);
> >+ if (ret)
> >+ return ret;
> >+
> >+ /*
> >+ * Invalidate again; if someone wandered in and dirtied a page,
> >+ * the caller will be given -EBUSY.
> >+ */
> >+ return invalidate_inode_pages2_range(mapping,
> >+ start >> PAGE_SHIFT,
> >+ end >> PAGE_SHIFT);
> > }
>
> Hello Darrick,
>
> Maybe this has already been discussed, but anyway: in the POSIX spec
> (http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html) I
> found the following: "This volume of POSIX.1-2008 does not specify behavior
> of concurrent writes to a file from multiple processes. Applications should
> use some form of concurrency control."
>
> Do we really need the invalidate_inode_pages2_range() call?
It's not strictly necessary. I like the idea of having the kernel bonking
userspace when they don't coordinate and collide, but we could just jump
out after the blkdev_*() calls and let userspace fend for themselves. :)
--D
>
> Thanks,
>
> Bart.
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
2016-06-17 1:17 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
2016-06-20 12:35 ` Bart Van Assche
@ 2016-06-29 4:57 ` Martin K. Petersen
1 sibling, 0 replies; 23+ messages in thread
From: Martin K. Petersen @ 2016-06-29 4:57 UTC (permalink / raw)
To: Darrick J. Wong
Cc: axboe, linux-block, tytso, martin.petersen, snitzer, linux-api,
bfoster, xfs, hch, dm-devel, linux-fsdevel, Christoph Hellwig
>>>>> "Darrick" == Darrick J Wong <darrick.wong@oracle.com> writes:
Darrick> Invalidate the page cache (as a regular O_DIRECT write would
Darrick> do) to avoid returning stale cache contents at a later time.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
--
Martin K. Petersen Oracle Linux Engineering
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
2016-08-26 0:02 [PATCH v10 0/3] fallocate for block devices Darrick J. Wong
@ 2016-08-26 0:02 ` Darrick J. Wong
0 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2016-08-26 0:02 UTC (permalink / raw)
To: axboe, darrick.wong
Cc: hch, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs,
linux-block, dm-devel, linux-fsdevel, bart.vanassche,
Christoph Hellwig
Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
returning stale cache contents at a later time.
v5: Refactor the 4.4 refactoring of the ioctl code into separate functions.
Split the page invalidation and the new ioctl into separate patches.
v6: Remove the call to invalidate_inode_pages2_range since we don't need it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
block/ioctl.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..755119c 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -225,7 +225,8 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
unsigned long arg)
{
uint64_t range[2];
- uint64_t start, len;
+ struct address_space *mapping;
+ uint64_t start, end, len;
if (!(mode & FMODE_WRITE))
return -EBADF;
@@ -235,18 +236,23 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
start = range[0];
len = range[1];
+ end = start + len - 1;
if (start & 511)
return -EINVAL;
if (len & 511)
return -EINVAL;
- start >>= 9;
- len >>= 9;
-
- if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+ if (end >= (uint64_t)i_size_read(bdev->bd_inode))
+ return -EINVAL;
+ if (end < start)
return -EINVAL;
- return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
+ /* Invalidate the page cache, including dirty pages */
+ mapping = bdev->bd_inode->i_mapping;
+ truncate_inode_pages_range(mapping, start, end);
+
+ return blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+ false);
}
static int put_ushort(unsigned long arg, unsigned short val)
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
[not found] ` <147510957066.8940.13803086684642725401.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
@ 2016-09-29 0:39 ` Darrick J. Wong
2016-09-29 1:16 ` Bart Van Assche
2016-09-29 5:56 ` Hannes Reinecke
0 siblings, 2 replies; 23+ messages in thread
From: Darrick J. Wong @ 2016-09-29 0:39 UTC (permalink / raw)
To: axboe-tSWWG44O7X1aa/9Udqfwiw,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
darrick.wong-QHcLZuEGTsvQT0dZR+AlfA
Cc: linux-block-u79uwXL29TY76Z2rM5mHXA, tytso-3s7WtUTddSA,
martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
snitzer-H+wXaHxf7aLQT0dZR+AlfA, linux-api-u79uwXL29TY76Z2rM5mHXA,
bfoster-H+wXaHxf7aLQT0dZR+AlfA, xfs-VZNHf3L845pBDgjK7y7TUQ,
hch-wEGCiKHe2LqWVfeAwA7xHQ, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ, Christoph Hellwig
Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
returning stale cache contents at a later time.
Signed-off-by: Darrick J. Wong <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Reviewed-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Reviewed-by: Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
block/ioctl.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..755119c 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -225,7 +225,8 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
unsigned long arg)
{
uint64_t range[2];
- uint64_t start, len;
+ struct address_space *mapping;
+ uint64_t start, end, len;
if (!(mode & FMODE_WRITE))
return -EBADF;
@@ -235,18 +236,23 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
start = range[0];
len = range[1];
+ end = start + len - 1;
if (start & 511)
return -EINVAL;
if (len & 511)
return -EINVAL;
- start >>= 9;
- len >>= 9;
-
- if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+ if (end >= (uint64_t)i_size_read(bdev->bd_inode))
+ return -EINVAL;
+ if (end < start)
return -EINVAL;
- return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
+ /* Invalidate the page cache, including dirty pages */
+ mapping = bdev->bd_inode->i_mapping;
+ truncate_inode_pages_range(mapping, start, end);
+
+ return blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+ false);
}
static int put_ushort(unsigned long arg, unsigned short val)
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
2016-09-29 0:39 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
@ 2016-09-29 1:16 ` Bart Van Assche
2016-09-29 5:56 ` Hannes Reinecke
1 sibling, 0 replies; 23+ messages in thread
From: Bart Van Assche @ 2016-09-29 1:16 UTC (permalink / raw)
To: Darrick J. Wong, axboe, akpm
Cc: hch, tytso, martin.petersen, snitzer, linux-api, bfoster, xfs,
linux-block, dm-devel, linux-fsdevel, Christoph Hellwig
On 09/28/16 17:39, Darrick J. Wong wrote:
> Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
> returning stale cache contents at a later time.
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
2016-09-29 0:39 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
2016-09-29 1:16 ` Bart Van Assche
@ 2016-09-29 5:56 ` Hannes Reinecke
1 sibling, 0 replies; 23+ messages in thread
From: Hannes Reinecke @ 2016-09-29 5:56 UTC (permalink / raw)
To: Darrick J. Wong, axboe, akpm
Cc: linux-block, tytso, martin.petersen, snitzer, linux-api, bfoster,
xfs, hch, dm-devel, linux-fsdevel, bart.vanassche,
Christoph Hellwig
On 09/29/2016 02:39 AM, Darrick J. Wong wrote:
> Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
> returning stale cache contents at a later time.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
> block/ioctl.c | 18 ++++++++++++------
> 1 file changed, 12 insertions(+), 6 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT.
2016-09-29 21:16 [PATCH v11 0/3] fallocate for block devices Darrick J. Wong
@ 2016-09-29 21:16 ` Darrick J. Wong
0 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2016-09-29 21:16 UTC (permalink / raw)
To: axboe, akpm, darrick.wong
Cc: linux-block, Hannes Reinecke, tytso, snitzer, martin.petersen,
linux-api, bfoster, xfs, hch, dm-devel, hare, linux-fsdevel,
bart.vanassche, Christoph Hellwig
Invalidate the page cache (as a regular O_DIRECT write would do) to avoid
returning stale cache contents at a later time.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
---
block/ioctl.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..755119c 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -225,7 +225,8 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
unsigned long arg)
{
uint64_t range[2];
- uint64_t start, len;
+ struct address_space *mapping;
+ uint64_t start, end, len;
if (!(mode & FMODE_WRITE))
return -EBADF;
@@ -235,18 +236,23 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
start = range[0];
len = range[1];
+ end = start + len - 1;
if (start & 511)
return -EINVAL;
if (len & 511)
return -EINVAL;
- start >>= 9;
- len >>= 9;
-
- if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+ if (end >= (uint64_t)i_size_read(bdev->bd_inode))
+ return -EINVAL;
+ if (end < start)
return -EINVAL;
- return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
+ /* Invalidate the page cache, including dirty pages */
+ mapping = bdev->bd_inode->i_mapping;
+ truncate_inode_pages_range(mapping, start, end);
+
+ return blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+ false);
}
static int put_ushort(unsigned long arg, unsigned short val)
^ permalink raw reply related [flat|nested] 23+ messages in thread
end of thread, other threads:[~2016-09-29 21:16 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-05 0:55 [PATCH v6 0/3] fallocate for block devices to provide zero-out Darrick J. Wong
2016-03-05 0:56 ` [PATCH 2/3] block: require write_same and discard requests align to logical block size Darrick J. Wong
2016-03-05 3:02 ` Linus Torvalds
2016-03-15 7:34 ` Christoph Hellwig
2016-03-05 0:56 ` [PATCH 3/3] block: implement (some of) fallocate for block devices Darrick J. Wong
[not found] ` <20160305005617.29738.85316.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2016-03-05 3:06 ` Linus Torvalds
[not found] ` <CA+55aFyY+oDoK8C6C++PG=N+vtY-r1Y6fYO_3skDTzkP_SXC-A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-05 20:57 ` Christoph Hellwig
2016-03-05 3:13 ` Linus Torvalds
[not found] ` <CA+55aFzJW70YApg8ra-c79E-j1ujOXbh=26k2dvyYX+qZuu_8A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-05 20:58 ` Christoph Hellwig
[not found] ` <20160305005556.29738.66782.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2016-03-05 0:56 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
[not found] ` <20160305005603.29738.58460.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2016-03-15 7:35 ` Christoph Hellwig
2016-03-05 3:17 ` [PATCH v6 0/3] fallocate for block devices to provide zero-out Linus Torvalds
-- strict thread matches above, loose matches on Subject: below --
2016-03-15 19:42 [PATCH v7 " Darrick J. Wong
[not found] ` <20160315194221.30093.70506.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2016-03-15 19:42 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
2016-04-13 4:01 [RFC DONOTMERGE v8 0/3] fallocate for block devices Darrick J. Wong
[not found] ` <20160413040121.10562.98998.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2016-04-13 4:01 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
2016-06-17 1:17 [PATCH v9 0/3] fallocate for block devices Darrick J. Wong
2016-06-17 1:17 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
2016-06-20 12:35 ` Bart Van Assche
[not found] ` <7589fc01-a000-a912-f9b5-cf099cc2d27a-HInyCGIudOg@public.gmane.org>
2016-06-28 19:13 ` Darrick J. Wong
2016-06-29 4:57 ` Martin K. Petersen
2016-08-26 0:02 [PATCH v10 0/3] fallocate for block devices Darrick J. Wong
2016-08-26 0:02 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
2016-09-29 0:39 [PATCH v10 0/3] fallocate for block devices Darrick J. Wong
[not found] ` <147510957066.8940.13803086684642725401.stgit-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2016-09-29 0:39 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
2016-09-29 1:16 ` Bart Van Assche
2016-09-29 5:56 ` Hannes Reinecke
2016-09-29 21:16 [PATCH v11 0/3] fallocate for block devices Darrick J. Wong
2016-09-29 21:16 ` [PATCH 1/3] block: invalidate the page cache when issuing BLKZEROOUT Darrick J. Wong
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).