linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>,
	Christoph Hellwig <hch@infradead.org>,
	linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, neilb@suse.de
Subject: Re: [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function
Date: Mon, 17 Nov 2014 10:53:27 -0800	[thread overview]
Message-ID: <20141117185327.GB10047@birch.djwong.org> (raw)
In-Reply-To: <yq1ioihv8gi.fsf@sermon.lab.mkp.net>

On Fri, Nov 14, 2014 at 03:22:05PM -0500, Martin K. Petersen wrote:
> >>>>> "Martin" == Martin K Petersen <martin.petersen@oracle.com> writes:
> 
> Martin> What would you prefer as the default for the ext4 use case? To
> Martin> allocate or to discard?
> 
> I didn't get a preference for whether sb_issue_zeroout() should discard
> or allocate.

In the discussions I've had on the ext4 list, we seem to be leaning towards
discard and falling back to allocate if necessary.

--D

> 
> But here's an updated patch 3...
> 
> commit eb23c9e71e08b7f467cbc36990a1a01a94a7b959
> Author: Martin K. Petersen <martin.petersen@oracle.com>
> Date:   Thu Nov 6 14:36:05 2014 -0500
> 
>     block: Add discard flag to blkdev_issue_zeroout() function
>     
>     blkdev_issue_discard() will zero a given block range. This is done by
>     way of explicit writing, thus provisioning or allocating the blocks on
>     disk.
>     
>     There are use cases where the desired behavior is to zero the blocks but
>     unprovision them if possible. The blocks must deterministically contain
>     zeroes when they are subsequently read back.
>     
>     This patch adds a flag to blkdev_issue_zeroout() that provides this
>     variant. If the discard flag is set and a block device guarantees
>     discard_zeroes_data we will use REQ_DISCARD to clear the block range. If
>     the device does not support discard_zeroes_data or if the discard
>     request fails we will fall back to first REQ_WRITE_SAME and then a
>     regular REQ_WRITE.
>     
>     Also update the callers of blkdev_issue_zero() to reflect the new flag
>     and make sb_issue_zeroout() prefer the discard approach.
>     
>     Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> 
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 8411be3c19d3..715e948f58a4 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -283,23 +283,45 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>   * @sector:	start sector
>   * @nr_sects:	number of sectors to write
>   * @gfp_mask:	memory allocation flags (for bio_alloc)
> + * @discard:	whether to discard the block range
>   *
>   * Description:
> - *  Generate and issue number of bios with zerofiled pages.
> +
> + *  Zero-fill a block range.  If the discard flag is set and the block
> + *  device guarantees that subsequent READ operations to the block range
> + *  in question will return zeroes, the blocks will be discarded. Should
> + *  the discard request fail, if the discard flag is not set, or if
> + *  discard_zeroes_data is not supported, this function will resort to
> + *  zeroing the blocks manually, thus provisioning (allocating,
> + *  anchoring) them. If the block device supports the WRITE SAME command
> + *  blkdev_issue_zeroout() will use it to optimize the process of
> + *  clearing the block range. Otherwise the zeroing will be performed
> + *  using regular WRITE calls.
>   */
>  
>  int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> -			 sector_t nr_sects, gfp_t gfp_mask)
> +			 sector_t nr_sects, gfp_t gfp_mask, bool discard)
>  {
> +	struct request_queue *q = bdev_get_queue(bdev);
> +	unsigned char bdn[BDEVNAME_SIZE];
> +
> +	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data) {
> +
> +		if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0))
> +			return 0;
> +
> +		bdevname(bdev, bdn);
> +		pr_warn("%s: DISCARD failed. Manually zeroing.\n", bdn);
> +	}
> +
>  	if (bdev_write_same(bdev)) {
> -		unsigned char bdn[BDEVNAME_SIZE];
>  
>  		if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
>  					     ZERO_PAGE(0)))
>  			return 0;
>  
>  		bdevname(bdev, bdn);
> -		pr_err("%s: WRITE SAME failed. Manually zeroing.\n", bdn);
> +		pr_warn("%s: WRITE SAME failed. Manually zeroing.\n", bdn);
>  	}
>  
>  	return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
> diff --git a/block/ioctl.c b/block/ioctl.c
> index 6c7bf903742f..7d8befde2aca 100644
> --- a/block/ioctl.c
> +++ b/block/ioctl.c
> @@ -198,7 +198,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, uint64_t start,
>  	if (start + len > (i_size_read(bdev->bd_inode) >> 9))
>  		return -EINVAL;
>  
> -	return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL);
> +	return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
>  }
>  
>  static int put_ushort(unsigned long arg, unsigned short val)
> diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
> index 6960fb064731..ee5b9611c51c 100644
> --- a/drivers/block/drbd/drbd_receiver.c
> +++ b/drivers/block/drbd/drbd_receiver.c
> @@ -1388,7 +1388,7 @@ int drbd_submit_peer_request(struct drbd_device *device,
>  		list_add_tail(&peer_req->w.list, &device->active_ee);
>  		spin_unlock_irq(&device->resource->req_lock);
>  		if (blkdev_issue_zeroout(device->ldev->backing_bdev,
> -			sector, data_size >> 9, GFP_NOIO))
> +			sector, data_size >> 9, GFP_NOIO, false))
>  			peer_req->flags |= EE_WAS_ERROR;
>  		drbd_endio_write_sec_final(peer_req);
>  		return 0;
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index aac0f9ea952a..f3ad69d8de45 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1163,7 +1163,7 @@ extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
>  		sector_t nr_sects, gfp_t gfp_mask, struct page *page);
>  extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> -			sector_t nr_sects, gfp_t gfp_mask);
> +		sector_t nr_sects, gfp_t gfp_mask, bool discard);
>  static inline int sb_issue_discard(struct super_block *sb, sector_t block,
>  		sector_t nr_blocks, gfp_t gfp_mask, unsigned long flags)
>  {
> @@ -1177,7 +1177,7 @@ static inline int sb_issue_zeroout(struct super_block *sb, sector_t block,
>  	return blkdev_issue_zeroout(sb->s_bdev,
>  				    block << (sb->s_blocksize_bits - 9),
>  				    nr_blocks << (sb->s_blocksize_bits - 9),
> -				    gfp_mask);
> +				    gfp_mask, true);
>  }
>  
>  extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2014-11-17 18:53 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-07  5:08 [RFC] Discard update for 3.19 Martin K. Petersen
2014-11-07  5:08 ` [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM Martin K. Petersen
2014-11-07  8:24   ` Christoph Hellwig
2014-11-07 15:37     ` Martin K. Petersen
2014-12-05 16:45   ` Paolo Bonzini
2014-12-05 22:58   ` Elliott, Robert (Server Storage)
2014-12-08 15:15     ` Theodore Ts'o
2014-12-08 15:28       ` James Bottomley
2014-12-08 22:59       ` One Thousand Gnomes
2014-11-07  5:08 ` [PATCH 2/3] sd: Disable discard_zeroes_data for UNMAP Martin K. Petersen
2014-11-07  8:25   ` Christoph Hellwig
2014-11-10 23:43   ` Paolo Bonzini
2014-11-07  5:08 ` [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function Martin K. Petersen
2014-11-07  8:26   ` Christoph Hellwig
2014-11-07 15:42     ` Martin K. Petersen
2014-11-07 16:20       ` Theodore Ts'o
2014-11-07 16:27         ` Martin K. Petersen
2014-11-14 20:22           ` Martin K. Petersen
2014-11-17 18:53             ` Darrick J. Wong [this message]
2014-11-11  0:04   ` Darrick J. Wong
2014-11-11  2:33     ` Martin K. Petersen
2014-11-17 19:28     ` [PATCH] block: create ioctl to discard-or-zeroout a range of blocks Darrick J. Wong
2014-11-07 12:09 ` [RFC] Discard update for 3.19 Bernd Schubert
2014-11-07 12:11 ` Bernd Schubert
2014-11-07 15:46   ` Martin K. Petersen
2014-11-10 14:19 ` Christoph Hellwig
2014-11-12 18:40   ` Ewan Milne
2014-11-12 19:41     ` Martin K. Petersen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141117185327.GB10047@birch.djwong.org \
    --to=darrick.wong@oracle.com \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=neilb@suse.de \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).