From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: xfs@oss.sgi.com, linux-block@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, dm-devel@redhat.com,
	Joe Thornber <ejt@redhat.com>,
	snitzer@redhat.com
Subject: Re: [RFC v2 PATCH 05/10] dm thin: add methods to set and get reserved space
Date: Wed, 13 Apr 2016 16:41:18 -0400
Message-ID: <20160413204117.GA6870@bfoster.bfoster>
In-Reply-To: <20160413183352.GB2775@bfoster.bfoster>

On Wed, Apr 13, 2016 at 02:33:52PM -0400, Brian Foster wrote:
> On Wed, Apr 13, 2016 at 10:44:42AM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 12, 2016 at 12:42:48PM -0400, Brian Foster wrote:
> > > From: Joe Thornber <ejt@redhat.com>
> > > 
> > > Experimental reserve interface for XFS guys to play with.
> > > 
> > > I have big reservations (no pun intended) about this patch.
> > > 
> > > [BF:
> > >  - Support for reservation reduction.
> > >  - Support for space provisioning.
> > >  - Condensed to a single function.]
> > > 
> > > Not-Signed-off-by: Joe Thornber <ejt@redhat.com>
> > > Not-Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > > ---
> > >  drivers/md/dm-thin.c | 181 ++++++++++++++++++++++++++++++++++++++++++++++++---
> > >  1 file changed, 171 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> > > index 92237b6..32bc5bd 100644
> > > --- a/drivers/md/dm-thin.c
> > > +++ b/drivers/md/dm-thin.c
> ...
> > > @@ -4271,6 +4343,94 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
> > >  	limits->max_discard_sectors = 2048 * 1024 * 16; /* 16G */
> > >  }
> > >  
> > > +static int thin_provision_space(struct dm_target *ti, sector_t offset,
> > > +				sector_t len, sector_t *res)
> > > +{
> > > +	struct thin_c *tc = ti->private;
> > > +	struct pool *pool = tc->pool;
> > > +	sector_t end;
> > > +	dm_block_t pblock;
> > > +	dm_block_t vblock;
> > > +	int error;
> > > +	struct dm_thin_lookup_result lookup;
> > > +
> > > +	if (!is_factor(offset, pool->sectors_per_block))
> > > +		return -EINVAL;
> > > +
> > > +	if (!len || !is_factor(len, pool->sectors_per_block))
> > > +		return -EINVAL;
> > > +
> > > +	if (res && !is_factor(*res, pool->sectors_per_block))
> > > +		return -EINVAL;
> > > +
> > > +	end = offset + len;
> > > +
> > > +	while (offset < end) {
> > > +		vblock = offset;
> > > +		do_div(vblock, pool->sectors_per_block);
> > > +
> > > +		error = dm_thin_find_block(tc->td, vblock, true, &lookup);
> > > +		if (error == 0)
> > > +			goto next;
> > > +		if (error != -ENODATA)
> > > +			return error;
> > > +
> > > +		error = alloc_data_block(tc, &pblock);
> > 
> > So this means that if fallocate wants to BDEV_RES_PROVISION N blocks, it must
> > first increase the reservation (BDEV_RES_MOD) by N blocks to avoid using up
> > space that was previously reserved by some other caller.  I think?
> > 
> 
> Yes, assuming this is being called from a filesystem using the
> reservation mechanism.
> 
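
To make the calling convention above concrete, here's roughly the
sequence I'd expect on the fallocate side. Illustrative only: I'm
calling the dm target method directly where a filesystem would really
go through the block_device_operations hook from patch 3,
fs_provision_range() is made up, and I'm assuming BDEV_RES_MOD sets an
absolute reservation, which is what set_reserve_count() further down
suggests:

static int fs_provision_range(struct dm_target *ti, sector_t offset,
			      sector_t len)
{
	sector_t res;
	int error;

	/* read the current reservation, in sectors */
	error = ti->type->reserve_space(ti, BDEV_RES_GET, 0, 0, &res);
	if (error)
		return error;

	/* grow it to cover the (block aligned) range we want to provision */
	res += len;
	error = ti->type->reserve_space(ti, BDEV_RES_MOD, 0, 0, &res);
	if (error)
		return error;

	/*
	 * Allocate and map the range. thin_provision_space() decrements
	 * *res per newly allocated block, so whatever is left on return
	 * can be handed back to the pool on partial failure.
	 */
	return ti->type->reserve_space(ti, BDEV_RES_PROVISION, offset,
				       len, &res);
}
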
> > > +		if (error)
> > > +			return error;
> > > +
> > > +		error = dm_thin_insert_block(tc->td, vblock, pblock);
> > 
> > Having reserved and mapped blocks, what happens when we try to read them?
> > Do we actually get zeroes, or does the read go straight through to whatever
> > happens to be in the disk blocks?  I don't think it's correct that we could
> > BDEV_RES_PROVISION and end up with stale credit card numbers from some other
> > thin device.
> > 
> 
> Agreed, but I'm not really sure how this works in thinp, to be honest.
> fallocate wasn't really on my mind when doing this. I was simply trying
> to cobble together what I could to facilitate progress on the fs parts
> (e.g., I just needed a call that allocated blocks and consumed
> reservation in the process).
> 
> Skimming through the dm-thin code, it looks like a (configurable) block
> zeroing mechanism can be triggered from somewhere around
> provision_block()->schedule_zero(), depending on whether the incoming
> write fully overwrites the newly allocated block. If zeroing is tied to
> the write path like that, then I suspect reads of blocks provisioned
> here would just fall through and return whatever was on disk. This code
> would probably need to tie into that zeroing mechanism one way or
> another to deal with that issue. (Though somebody who actually knows
> something about dm-thin should verify that. :)
> 
BTW, if that mechanism is in fact issuing zeroing I/O at provision time,
it might not be an appropriate solution for fallocate, which is expected
to be fast. Perhaps we'd have to consider an unwritten flag or some such
in dm-thin, if that's possible.
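
For concreteness, the sort of thing I'm handwaving about (pure strawman:
nothing below exists today, the 'unwritten' bit would need real support
in the pool metadata, and I've glossed over shared blocks entirely):

/* hypothetical extra state in the mapping lookup */
struct dm_thin_lookup_result {
	dm_block_t block;
	unsigned shared:1;
	unsigned unwritten:1;	/* provisioned, never written */
};

/* read path, somewhere in thin_bio_map()-ish context */
if (bio_data_dir(bio) == READ && lookup.unwritten) {
	zero_fill_bio(bio);
	bio_endio(bio);		/* never touch the data device */
	return DM_MAPIO_SUBMITTED;
}

Writes would clear the flag once the block is actually overwritten.
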
Brian
> Brian
> 
> > (PS: I don't know enough about thinp to know if this has already been taken
> > care of.  I didn't see anything, but who knows what I missed. :))
> > 
> > --D
> > 
> > > +		if (error)
> > > +			return error;
> > > +
> > > +		if (res && *res)
> > > +			*res -= pool->sectors_per_block;
> > > +next:
> > > +		offset += pool->sectors_per_block;
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
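
(Aside for anyone following along who's unfamiliar with the kernel
division helpers: do_div() divides its first argument in place and
returns the remainder, so the loop above is just turning a sector
offset into a virtual block number. For example:

	u64 vblock = 1024;		/* sector offset */
	u32 rem = do_div(vblock, 8);	/* 8 sectors per block */
	/* now vblock == 128 (virtual block number), rem == 0 */

sector_div() further down is the same idea for sector_t.)
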
> > > +static int thin_reserve_space(struct dm_target *ti, int mode, sector_t offset,
> > > +			      sector_t len, sector_t *res)
> > > +{
> > > +	struct thin_c *tc = ti->private;
> > > +	struct pool *pool = tc->pool;
> > > +	sector_t blocks;
> > > +	unsigned long flags;
> > > +	int error;
> > > +
> > > +	if (mode == BDEV_RES_PROVISION)
> > > +		return thin_provision_space(ti, offset, len, res);
> > > +
> > > +	/* res required for get/set */
> > > +	error = -EINVAL;
> > > +	if (!res)
> > > +		return error;
> > > +
> > > +	if (mode == BDEV_RES_GET) {
> > > +		spin_lock_irqsave(&tc->pool->lock, flags);
> > > +		*res = tc->reserve_count * pool->sectors_per_block;
> > > +		spin_unlock_irqrestore(&tc->pool->lock, flags);
> > > +		error = 0;
> > > +	} else if (mode == BDEV_RES_MOD) {
> > > +		/*
> > > +		* @res must always be a factor of the pool's blocksize; upper
> > > +		* layers can rely on the bdev's minimum_io_size for this.
> > > +		*/
> > > +		if (!is_factor(*res, pool->sectors_per_block))
> > > +			return error;
> > > +
> > > +		blocks = *res;
> > > +		(void) sector_div(blocks, pool->sectors_per_block);
> > > +
> > > +		error = set_reserve_count(tc, blocks);
> > > +	}
> > > +
> > > +	return error;
> > > +}
> > > +
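
One more note on the is_factor() checks: the expectation is that the
caller rounds its reservation to the advertised minimum_io_size before
calling in. Something like this on the filesystem side (hypothetical
helper; it assumes the thin device exports the pool block size as
io_min, which I believe pool_io_hints() arranges, and it ignores 64-bit
division caveats on 32-bit):

static sector_t fs_round_reservation(struct block_device *bdev,
				     sector_t sectors)
{
	/* bdev_io_min() returns bytes; convert to sectors */
	unsigned int min_io = bdev_io_min(bdev) >> SECTOR_SHIFT;

	/* roundup(), not round_up(): block size need not be a power of 2 */
	return roundup(sectors, min_io);
}
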
> > >  static struct target_type thin_target = {
> > >  	.name = "thin",
> > >  	.version = {1, 18, 0},
> > > @@ -4285,6 +4445,7 @@ static struct target_type thin_target = {
> > >  	.status = thin_status,
> > >  	.iterate_devices = thin_iterate_devices,
> > >  	.io_hints = thin_io_hints,
> > > +	.reserve_space = thin_reserve_space,
> > >  };
> > >  
> > >  /*----------------------------------------------------------------*/
> > > -- 
> > > 2.4.11
> > > 