From: Mike Snitzer <snitzer@redhat.com>
To: Brian Foster <bfoster@redhat.com>
Cc: xfs@oss.sgi.com, linux-block@vger.kernel.org,
linux-fsdevel@vger.kernel.org, dm-devel@redhat.com,
"Darrick J. Wong" <darrick.wong@oracle.com>
Subject: [RFC PATCH] block: wire blkdev_fallocate() to block_device_operations' reserve_space
Date: Tue, 12 Apr 2016 16:04:59 -0400
Message-ID: <20160412200459.GA10730@redhat.com>
In-Reply-To: <1460479373-63317-1-git-send-email-bfoster@redhat.com>
On Tue, Apr 12 2016 at 12:42P -0400,
Brian Foster <bfoster@redhat.com> wrote:
> Hi all,
>
> This is v2 of the XFS and block device reservation experiment. The
> significant changes in v2 are that the bdev interface has been condensed
> to a single callback function, the XFS transaction reservation
> management has been reworked to make transactions responsible for
> tracking and releasing excess reservation (for non-delalloc cases) and a
> workaround for the fallocate over-reservation issue is included. Beyond
> that, this version adds a bunch of miscellaneous cleanups and fixes some
> of the nastier locking/leak issues present in the first rfc.
>
> Patches 1-2 refactor some XFS reserve pool and block accounting code in
> preparation for subsequent patches. Patches 3-5 add block/device-mapper
> reservation support. Patches 6-10 add the core reservation
> infrastructure and management bits to XFS. See the link to the original
> rfc below for instructions and further details around the purpose of
> this series.
>
> Finally, note that this is still highly experimental/theoretical and
> should not be used on production systems. Thoughts, reviews, flames
> appreciated.

Thanks for carrying on with this work, Brian.

I've started to review your patchset and Darrick's fallocate patchset.
I've pushed a branch to linux-dm.git that combines the two, see:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-fallocate

I then added this RFC patch at the end; it relies on both of your
patchsets. You'll see blkdev_ensure_space_exists() has a FIXME, which
means it is little more than a stub at this point (completely
untested):

From: Mike Snitzer <snitzer@redhat.com>
Date: Tue, 12 Apr 2016 15:54:31 -0400
Subject: [RFC PATCH] block: wire blkdev_fallocate() to block_device_operations' reserve_space
This effectively exposes the primitive for "ensure space exists". It
relies on block_device_operations' reserve_space method.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
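Reviewer note: a minimal userspace sketch of how blkdev_fallocate()
would dispatch on mode once this patch is applied. It is a model, not
kernel code. enum falloc_op, classify_mode(), bytes_to_sectors() and
self_check() are hypothetical names, and SUPPORTED_MODES is an assumed
stand-in for BLKDEV_FALLOC_FL_SUPPORTED, whose definition is not shown
in this patch. The discard-capability check on the punch-hole path is
deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/falloc.h>	/* FALLOC_FL_* mode bits */

/* Assumption: stand-in for BLKDEV_FALLOC_FL_SUPPORTED (not shown in this patch). */
#define SUPPORTED_MODES \
	(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE)

/* Hypothetical labels for the paths blkdev_fallocate() can take. */
enum falloc_op {
	OP_UNSUPPORTED,		/* kernel would return -EOPNOTSUPP */
	OP_ENSURE_SPACE,	/* blkdev_ensure_space_exists() */
	OP_ZERO_RANGE,		/* blkdev_issue_zeroout() */
	OP_PUNCH_HOLE,		/* discard path (capability check omitted) */
};

static enum falloc_op classify_mode(int mode)
{
	if (mode & ~SUPPORTED_MODES)
		return OP_UNSUPPORTED;
	/* No bit other than KEEP_SIZE is set: plain preallocation. */
	if (!(mode & ~FALLOC_FL_KEEP_SIZE))
		return OP_ENSURE_SPACE;
	/* Same else-if order as the patch: zero range is tested first. */
	if (mode & FALLOC_FL_ZERO_RANGE)
		return OP_ZERO_RANGE;
	/* Given SUPPORTED_MODES, only PUNCH_HOLE can remain here. */
	return OP_PUNCH_HOLE;
}

/* Byte offset or length to 512-byte sectors, mirroring "start >> 9". */
static uint64_t bytes_to_sectors(uint64_t bytes)
{
	return bytes >> 9;
}

/* Small self-check exercising the model above. */
static void self_check(void)
{
	assert(classify_mode(0) == OP_ENSURE_SPACE);
	assert(classify_mode(FALLOC_FL_KEEP_SIZE) == OP_ENSURE_SPACE);
	assert(classify_mode(FALLOC_FL_ZERO_RANGE) == OP_ZERO_RANGE);
	assert(bytes_to_sectors(1 << 20) == 2048);
}
```

With this model, mode 0 and FALLOC_FL_KEEP_SIZE alone both land on the
new "ensure space exists" path, which is the case the removed
-EOPNOTSUPP check used to reject.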
block/blk-lib.c | 26 ++++++++++++++++++++++++++
fs/block_dev.c | 20 +++++++++++---------
include/linux/blkdev.h | 2 ++
3 files changed, 39 insertions(+), 9 deletions(-)
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9dca6bb..5042a84 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -314,3 +314,29 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
}
EXPORT_SYMBOL(blkdev_issue_zeroout);
+
+/**
+ * blkdev_ensure_space_exists - preallocate a block range
+ * @bdev: blockdev to preallocate space for
+ * @sector: start sector
+ * @nr_sects: number of sectors to preallocate
+ * @flags: FALLOC_FL_* mode bits passed down from blkdev_fallocate()
+ *         (not yet consulted by this stub)
+ *
+ * Description:
+ * Ensure space exists, or is preallocated, for the sectors in question.
+ */
+int blkdev_ensure_space_exists(struct block_device *bdev, sector_t sector,
+ sector_t nr_sects, unsigned long flags)
+{
+ sector_t res;
+ const struct block_device_operations *ops = bdev->bd_disk->fops;
+
+ if (!ops->reserve_space)
+ return -EOPNOTSUPP;
+
+ /* FIXME: check with Brian Foster on whether it makes sense to use
+  * BDEV_RES_GET/BDEV_RES_MOD instead of BDEV_RES_PROVISION here. */
+ return ops->reserve_space(bdev, BDEV_RES_PROVISION, sector, nr_sects, &res);
+}
+EXPORT_SYMBOL(blkdev_ensure_space_exists);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 5a2c3ab..b34c07b 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1801,17 +1801,13 @@ long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
struct request_queue *q = bdev_get_queue(bdev);
struct address_space *mapping;
loff_t end = start + len - 1;
- loff_t bs_mask, isize;
+ loff_t isize;
int error;
/* We only support zero range and punch hole. */
if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
return -EOPNOTSUPP;
- /* We haven't a primitive for "ensure space exists" right now. */
- if (!(mode & ~FALLOC_FL_KEEP_SIZE))
- return -EOPNOTSUPP;
-
/* Only punch if the device can do zeroing discard. */
if ((mode & FALLOC_FL_PUNCH_HOLE) &&
(!blk_queue_discard(q) || !q->limits.discard_zeroes_data))
@@ -1829,9 +1825,12 @@ long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
return -EINVAL;
}
- /* Don't allow IO that isn't aligned to logical block size */
- bs_mask = bdev_logical_block_size(bdev) - 1;
- if ((start | len) & bs_mask)
+ /*
+ * Don't allow IO that isn't aligned to minimum IO size (io_min)
+ * - for normal devices io_min is usually the logical block size
+ * - but for more exotic devices (e.g. DM thinp) it may be larger
+ */
+ if ((start | len) % bdev_io_min(bdev))
return -EINVAL;
/* Invalidate the page cache, including dirty pages. */
@@ -1839,7 +1838,10 @@ long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
truncate_inode_pages_range(mapping, start, end);
error = -EINVAL;
- if (mode & FALLOC_FL_ZERO_RANGE)
+ if (!(mode & ~FALLOC_FL_KEEP_SIZE))
+ error = blkdev_ensure_space_exists(bdev, start >> 9, len >> 9,
+ mode);
+ else if (mode & FALLOC_FL_ZERO_RANGE)
error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
GFP_KERNEL, false);
else if (mode & FALLOC_FL_PUNCH_HOLE)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6c6ea96..4147af2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1132,6 +1132,8 @@ extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, struct page *page);
extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, bool discard);
+extern int blkdev_ensure_space_exists(struct block_device *bdev, sector_t sector,
+ sector_t nr_sects, unsigned long flags);
static inline int sb_issue_discard(struct super_block *sb, sector_t block,
sector_t nr_blocks, gfp_t gfp_mask, unsigned long flags)
{
--
2.6.4 (Apple Git-63)
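
A note on the new alignment test in blkdev_fallocate(): OR-ing start
and len before taking the remainder matches a per-value check only when
the granularity is a power of two. The userspace sketch below shows the
difference. The helper names are hypothetical, uint64_t stands in for
loff_t, and the 1536-byte granularity is an assumption chosen purely
for illustration, not a value any particular device is known to report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask form (pre-patch style): valid only for power-of-two granularity. */
static bool aligned_mask(uint64_t start, uint64_t len, uint64_t gran)
{
	assert(gran != 0 && (gran & (gran - 1)) == 0);
	return ((start | len) & (gran - 1)) == 0;
}

/* OR-then-modulo form, shaped like the check in the patch. */
static bool aligned_or_mod(uint64_t start, uint64_t len, uint64_t gran)
{
	assert(gran != 0);
	return ((start | len) % gran) == 0;
}

/* Per-value form: correct for any nonzero granularity. */
static bool aligned_each(uint64_t start, uint64_t len, uint64_t gran)
{
	assert(gran != 0);
	return (start % gran) == 0 && (len % gran) == 0;
}

/* Demonstrates where the three forms agree and where they diverge. */
static void alignment_demo(void)
{
	/* Power-of-two granularity: all three forms agree. */
	assert(aligned_mask(8192, 4096, 4096));
	assert(aligned_or_mod(8192, 4096, 4096));
	assert(aligned_each(8192, 4096, 4096));
	assert(!aligned_or_mod(4096, 512, 4096));
	assert(!aligned_each(4096, 512, 4096));

	/* Granularity 1536: both values are multiples, yet OR-modulo rejects. */
	assert(aligned_each(1536, 3072, 1536));
	assert(!aligned_or_mod(1536, 3072, 1536));

	/* Granularity 1536: neither value is a multiple, yet OR-modulo accepts. */
	assert(!aligned_each(512, 1024, 1536));
	assert(aligned_or_mod(512, 1024, 1536));
}
```

If io_min can ever be something other than a power of two, checking
start and len separately avoids both the false rejection and the false
acceptance shown in alignment_demo().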