From: Mike Snitzer <snitzer@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Brian Foster <bfoster@redhat.com>,
	xfs@oss.sgi.com, linux-block@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, dm-devel@redhat.com
Subject: Re: [RFC PATCH] block: wire blkdev_fallocate() to block_device_operations' reserve_space
Date: Tue, 12 Apr 2016 16:46:58 -0400
Message-ID: <20160412204658.GA1759@redhat.com>
In-Reply-To: <20160412203904.GD5812@birch.djwong.org>

On Tue, Apr 12 2016 at  4:39pm -0400,
Darrick J. Wong <darrick.wong@oracle.com> wrote:

> On Tue, Apr 12, 2016 at 04:04:59PM -0400, Mike Snitzer wrote:
> > On Tue, Apr 12 2016 at 12:42pm -0400,
> > Brian Foster <bfoster@redhat.com> wrote:
> > 
> > > Hi all,
> > > 
> > > This is v2 of the XFS and block device reservation experiment. The
> > > significant changes in v2 are that the bdev interface has been condensed
> > > to a single callback function, the XFS transaction reservation
> > > management has been reworked to make transactions responsible for
> > > tracking and releasing excess reservation (for non-delalloc cases) and a
> > > workaround for the fallocate over-reservation issue is included. Beyond
> > > that, this version adds a bunch of miscellaneous cleanups and fixes some
> > > of the nastier locking/leak issues present in the first rfc.
> > > 
> > > Patches 1-2 refactor some XFS reserve pool and block accounting code in
> > > preparation for subsequent patches. Patches 3-5 add block/device-mapper
> > > reservation support. Patches 6-10 add the core reservation
> > > infrastructure and management bits to XFS. See the link to the original
> > > rfc below for instructions and further details around the purpose of
> > > this series.
> > > 
> > > Finally, note that this is still highly experimental/theoretical and
> > > should not be used on production systems. Thoughts, reviews, flames
> > > appreciated.
> > 
> > Thanks for carrying on with this work Brian.
> > 
> > I've started to review your patchset and Darrick's fallocate patchset.
> > I've pushed a branch to linux-dm.git that combines the 2, see:
> > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-fallocate
> > 
> > and then added this RFC patch, at the end, which relies on both of your
> > patchsets -- you'll see blkdev_ensure_space_exists() has a FIXME which
> > implies it isn't much more than simply stubbed out at this point
> > (completely untested):
> 
> Hmm, ok, but -rc3 broke a bunch of stuff.  Guess I should repost with all
> the PAGE_CACHE_ -> PAGE_ stuff fixed. :)

Yeah, the kernel.org kbuild robots just spammed us about that same exact
breakage.

> > From: Mike Snitzer <snitzer@redhat.com>
> > Date: Tue, 12 Apr 2016 15:54:31 -0400
> > Subject: [RFC PATCH] block: wire blkdev_fallocate() to block_device_operations' reserve_space
> > 
> > This effectively exposes the primitive for "ensure space exists".  It
> > relies on block_device_operations' reserve_space method.
> > 
> > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > ---
> >  block/blk-lib.c        | 26 ++++++++++++++++++++++++++
> >  fs/block_dev.c         | 20 +++++++++++---------
> >  include/linux/blkdev.h |  2 ++
> >  3 files changed, 39 insertions(+), 9 deletions(-)
> > 
> > diff --git a/block/blk-lib.c b/block/blk-lib.c
> > index 9dca6bb..5042a84 100644
> > --- a/block/blk-lib.c
> > +++ b/block/blk-lib.c
> > @@ -314,3 +314,29 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> >  	return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
> >  }
> >  EXPORT_SYMBOL(blkdev_issue_zeroout);
> > +
> > +/**
> > + * blkdev_ensure_space_exists - preallocate a block range
> > + * @bdev:	blockdev to preallocate space for
> > + * @sector:	start sector
> > + * @nr_sects:	number of sectors to preallocate
> > + * @gfp_mask:	memory allocation flags (for bio_alloc)
> > + * @flags:	FALLOC_FL_* to control behaviour
> > + *
> > + * Description:
> > + *    Ensure space exists, or is preallocated, for the sectors in question.
> > + */
> > +int blkdev_ensure_space_exists(struct block_device *bdev, sector_t sector,
> > +		sector_t nr_sects, unsigned long flags)
> > +{
> > +	sector_t res;
> > +	const struct block_device_operations *ops = bdev->bd_disk->fops;
> > +
> > +	if (!ops->reserve_space)
> > +		return -EOPNOTSUPP;
> > +
> > +	// FIXME: check with Brian Foster on whether it makes sense to
> > +	// use BDEV_RES_GET/BDEV_RES_MOD instead of BDEV_RES_PROVISION?
> > +	return ops->reserve_space(bdev, BDEV_RES_PROVISION, sector, nr_sects, &res);
> 
> /me thinks BDEV_RES_PROVISION is correct here, because regular-mode file
> fallocate (for ext4/xfs anyway) allocates blocks and maps them to specific file
> offsets as unwritten extents.  afaict RES_PROVISION -> thin_provision_space()
> and thin_provision_space() seems to allocate blocks and map them to the
> device's LBAs.
> 
> If I'm reading the patches correctly, RES_GET/RES_MOD seem to reserve N blocks
> but don't map them to any specific LBA.

Right, that is how I read it too.  I just put that FIXME in to cover my
ass in case I was being an idiot ;)
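
To make the distinction above concrete, here is a minimal userspace
sketch of the two behaviours as described in this thread: a reservation
only adjusts a block count, while provisioning ties blocks to specific
LBAs.  Everything in it (struct toy_pool, toy_reserve(),
toy_provision(), the pool and device sizes) is hypothetical and made up
for illustration; it is not how dm-thin or the reserve_space method is
implemented.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_POOL_BLOCKS 8	/* hypothetical number of backing blocks */
#define TOY_DEV_LBAS    16	/* hypothetical size of the thin device */

struct toy_pool {
	size_t free_blocks;		/* neither reserved nor mapped */
	size_t reserved_blocks;		/* promised, but tied to no LBA */
	bool   mapped[TOY_DEV_LBAS];	/* true once an LBA has backing */
};

static void toy_init(struct toy_pool *p)
{
	assert(p != NULL);
	/* Remaining members are zero-initialized by the compound literal. */
	*p = (struct toy_pool){ .free_blocks = TOY_POOL_BLOCKS };
}

/* Reservation (RES_GET/RES_MOD style): only the counters change. */
static int toy_reserve(struct toy_pool *p, size_t n)
{
	if (n > p->free_blocks)
		return -ENOSPC;
	p->free_blocks -= n;
	p->reserved_blocks += n;
	return 0;
}

/* Provisioning (RES_PROVISION style): back every LBA in [lba, lba + n). */
static int toy_provision(struct toy_pool *p, size_t lba, size_t n)
{
	size_t i, need = 0;

	if (lba > TOY_DEV_LBAS || n > TOY_DEV_LBAS - lba)
		return -EINVAL;
	for (i = lba; i < lba + n; i++)
		if (!p->mapped[i])
			need++;		/* already-mapped LBAs cost nothing */
	if (need > p->free_blocks)
		return -ENOSPC;
	for (i = lba; i < lba + n; i++)
		p->mapped[i] = true;
	p->free_blocks -= need;
	return 0;
}
```

In this model, reserving 3 blocks leaves every LBA unmapped, whereas
provisioning LBAs 4-5 marks exactly those two as backed, and
provisioning the same range again consumes nothing further.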

> > +}
> > +EXPORT_SYMBOL(blkdev_ensure_space_exists);
> > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > index 5a2c3ab..b34c07b 100644
> > --- a/fs/block_dev.c
> > +++ b/fs/block_dev.c
> > @@ -1801,17 +1801,13 @@ long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
> >  	struct request_queue *q = bdev_get_queue(bdev);
> >  	struct address_space *mapping;
> >  	loff_t end = start + len - 1;
> > -	loff_t bs_mask, isize;
> > +	loff_t isize;
> >  	int error;
> >  
> >  	/* We only support zero range and punch hole. */
> >  	if (mode & ~BLKDEV_FALLOC_FL_SUPPORTED)
> >  		return -EOPNOTSUPP;
> >  
> > -	/* We haven't a primitive for "ensure space exists" right now. */
> > -	if (!(mode & ~FALLOC_FL_KEEP_SIZE))
> > -		return -EOPNOTSUPP;
> > -
> >  	/* Only punch if the device can do zeroing discard. */
> >  	if ((mode & FALLOC_FL_PUNCH_HOLE) &&
> >  	    (!blk_queue_discard(q) || !q->limits.discard_zeroes_data))
> > @@ -1829,9 +1825,12 @@ long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
> >  			return -EINVAL;
> >  	}
> >  
> > -	/* Don't allow IO that isn't aligned to logical block size */
> > -	bs_mask = bdev_logical_block_size(bdev) - 1;
> > -	if ((start | len) & bs_mask)
> > +	/*
> > +	 * Don't allow IO that isn't aligned to minimum IO size (io_min)
> > +	 * - for normal device's io_min is usually logical block size
> > +	 * - but for more exotic devices (e.g. DM thinp) it may be larger
> > +	 */
> > +	if ((start | len) % bdev_io_min(bdev))
> >  		return -EINVAL;
> 
> Noted.  Will update the original patch.

OK, thanks.

Once your new patchset is available I'll rebase my 'dm-fallocate' test
branch accordingly.
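
One thing that may be worth double-checking in the repost: the
"(start | len)" trick is equivalent to testing start and len separately
only when the granularity is a power of two.  The userspace sketch below
compares the three forms.  The helper names are made up, and the 192 KiB
granularity used in the usage note is purely an assumption for
illustration (i.e. it assumes some device could report an io_min that is
not a power of two).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask form from the original code; only valid for a power-of-two bs. */
static bool aligned_pow2(uint64_t start, uint64_t len, uint64_t bs)
{
	assert(bs != 0 && (bs & (bs - 1)) == 0);
	return ((start | len) & (bs - 1)) == 0;
}

/* Combined OR-then-modulo form, as written in the RFC patch. */
static bool aligned_or_mod(uint64_t start, uint64_t len, uint64_t io_min)
{
	assert(io_min != 0);
	return ((start | len) % io_min) == 0;
}

/* Tests each value on its own; valid for any nonzero granularity. */
static bool aligned_each(uint64_t start, uint64_t len, uint64_t io_min)
{
	assert(io_min != 0);
	return start % io_min == 0 && len % io_min == 0;
}
```

With a granularity of 196608 bytes (192 KiB), start = 196608 and
len = 393216 are both aligned, yet aligned_or_mod() rejects them because
the OR of the two values is 458752.  Conversely, start = 65536 and
len = 131072 are both unaligned, yet their OR is exactly 196608 and
aligned_or_mod() accepts them.  For a power-of-two size such as 4096 all
three helpers agree.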
 
> >  	/* Invalidate the page cache, including dirty pages. */
> > @@ -1839,7 +1838,10 @@ long blkdev_fallocate(struct file *file, int mode, loff_t start, loff_t len)
> >  	truncate_inode_pages_range(mapping, start, end);
> >  
> >  	error = -EINVAL;
> > -	if (mode & FALLOC_FL_ZERO_RANGE)
> > +	if (!(mode & ~FALLOC_FL_KEEP_SIZE))
> > +		error = blkdev_ensure_space_exists(bdev, start >> 9, len >> 9,
> > +						   mode);
> > +	else if (mode & FALLOC_FL_ZERO_RANGE)
> 
> This whole thing got converted to a switch statement due to some feedback
> from hch.
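
For readers following along, a switch-based dispatch on the fallocate
mode could look roughly like the userspace sketch below.  This is not
the revised patch; the enum, toy_classify_mode() and
toy_bytes_to_sectors() are hypothetical names, and treating punch hole
as valid only together with FALLOC_FL_KEEP_SIZE is an assumption of the
sketch.  The sector helper mirrors the "start >> 9, len >> 9"
conversion from bytes to 512-byte sectors in the hunk above.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/falloc.h>

enum toy_op {
	TOY_OP_ALLOCATE,	/* "ensure space exists" */
	TOY_OP_ZERO_RANGE,
	TOY_OP_PUNCH_HOLE,
	TOY_OP_UNSUPPORTED,
};

/* Map an exact fallocate mode value to the operation it selects. */
static enum toy_op toy_classify_mode(int mode)
{
	switch (mode) {
	case 0:
	case FALLOC_FL_KEEP_SIZE:
		return TOY_OP_ALLOCATE;
	case FALLOC_FL_ZERO_RANGE:
	case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
		return TOY_OP_ZERO_RANGE;
	case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
		return TOY_OP_PUNCH_HOLE;
	default:
		return TOY_OP_UNSUPPORTED;
	}
}

/* Byte offset or length to 512-byte sectors. */
static uint64_t toy_bytes_to_sectors(uint64_t bytes)
{
	return bytes >> 9;
}
```

Matching on exact mode values, rather than testing individual bits,
means any unexpected flag combination falls through to the unsupported
case.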
> 
> Anyway, will try to have a new blockdev fallocate patchset done by the end
> of the day.
> 
> (Is there a test case for this?)

No, but once my patch is in place to join your patchset with Brian's
then any basic fallocate tests against a DM thinp volume _should_ work.

/me assumes xfstests has such tests?  The only missing bit would be to
layer the filesystem on top of DM thinp?  Or extend the tests you added
to test DM thinp devices directly.  I think Eric Sandeen (now cc'd) made
xfstests capable of creating DM thinp volumes for certain tests.
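
In the meantime, a quick manual check from userspace could be as small
as the sketch below.  It only wraps open(2) and fallocate(2);
try_fallocate() is a made-up helper name, and the device path in the
usage note is a placeholder, not a real volume.

```c
#define _GNU_SOURCE		/* for fallocate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open @path read-write and call fallocate() with the given mode,
 * offset and length.  Returns 0 on success or a negative errno value.
 */
static int try_fallocate(const char *path, int mode, off_t offset, off_t len)
{
	int fd, ret;

	assert(path != NULL);

	fd = open(path, O_RDWR);
	if (fd < 0)
		return -errno;

	/* Capture errno before close() has a chance to overwrite it. */
	ret = fallocate(fd, mode, offset, len) == 0 ? 0 : -errno;
	close(fd);
	return ret;
}
```

Calling try_fallocate("/dev/mapper/<thin-volume>", 0, 0, len) with a
suitably aligned len would exercise the plain "ensure space exists"
mode; going by the hunk above, a device whose block_device_operations
lack reserve_space should give -EOPNOTSUPP.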
