From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>,
Christoph Hellwig <hch@infradead.org>,
linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org,
linux-fsdevel@vger.kernel.org, neilb@suse.de
Subject: Re: [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function
Date: Mon, 17 Nov 2014 10:53:27 -0800 [thread overview]
Message-ID: <20141117185327.GB10047@birch.djwong.org> (raw)
In-Reply-To: <yq1ioihv8gi.fsf@sermon.lab.mkp.net>
On Fri, Nov 14, 2014 at 03:22:05PM -0500, Martin K. Petersen wrote:
> >>>>> "Martin" == Martin K Petersen <martin.petersen@oracle.com> writes:
>
> Martin> What would you prefer as the default for the ext4 use case? To
> Martin> allocate or to discard?
>
> I didn't get a preference for whether sb_issue_zeroout() should discard
> or allocate.
In the discussions I've had on the ext4 list, we seem to be leaning towards
discard and falling back to allocate if necessary.
--D
>
> But here's an updated patch 3...
>
> commit eb23c9e71e08b7f467cbc36990a1a01a94a7b959
> Author: Martin K. Petersen <martin.petersen@oracle.com>
> Date: Thu Nov 6 14:36:05 2014 -0500
>
> block: Add discard flag to blkdev_issue_zeroout() function
>
> blkdev_issue_discard() will zero a given block range. This is done by
> way of explicit writing, thus provisioning or allocating the blocks on
> disk.
>
> There are use cases where the desired behavior is to zero the blocks but
> unprovision them if possible. The blocks must deterministically contain
> zeroes when they are subsequently read back.
>
> This patch adds a flag to blkdev_issue_zeroout() that provides this
> variant. If the discard flag is set and a block device guarantees
> discard_zeroes_data we will use REQ_DISCARD to clear the block range. If
> the device does not support discard_zeroes_data or if the discard
> request fails we will fall back to first REQ_WRITE_SAME and then a
> regular REQ_WRITE.
>
> Also update the callers of blkdev_issue_zero() to reflect the new flag
> and make sb_issue_zeroout() prefer the discard approach.
>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
>
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 8411be3c19d3..715e948f58a4 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -283,23 +283,45 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> * @sector: start sector
> * @nr_sects: number of sectors to write
> * @gfp_mask: memory allocation flags (for bio_alloc)
> + * @discard: whether to discard the block range
> *
> * Description:
> - * Generate and issue number of bios with zerofiled pages.
> +
> + * Zero-fill a block range. If the discard flag is set and the block
> + * device guarantees that subsequent READ operations to the block range
> + * in question will return zeroes, the blocks will be discarded. Should
> + * the discard request fail, if the discard flag is not set, or if
> + * discard_zeroes_data is not supported, this function will resort to
> + * zeroing the blocks manually, thus provisioning (allocating,
> + * anchoring) them. If the block device supports the WRITE SAME command
> + * blkdev_issue_zeroout() will use it to optimize the process of
> + * clearing the block range. Otherwise the zeroing will be performed
> + * using regular WRITE calls.
> */
>
> int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> - sector_t nr_sects, gfp_t gfp_mask)
> + sector_t nr_sects, gfp_t gfp_mask, bool discard)
> {
> + struct request_queue *q = bdev_get_queue(bdev);
> + unsigned char bdn[BDEVNAME_SIZE];
> +
> + if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data) {
> +
> + if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0))
> + return 0;
> +
> + bdevname(bdev, bdn);
> + pr_warn("%s: DISCARD failed. Manually zeroing.\n", bdn);
> + }
> +
> if (bdev_write_same(bdev)) {
> - unsigned char bdn[BDEVNAME_SIZE];
>
> if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
> ZERO_PAGE(0)))
> return 0;
>
> bdevname(bdev, bdn);
> - pr_err("%s: WRITE SAME failed. Manually zeroing.\n", bdn);
> + pr_warn("%s: WRITE SAME failed. Manually zeroing.\n", bdn);
> }
>
> return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
> diff --git a/block/ioctl.c b/block/ioctl.c
> index 6c7bf903742f..7d8befde2aca 100644
> --- a/block/ioctl.c
> +++ b/block/ioctl.c
> @@ -198,7 +198,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, uint64_t start,
> if (start + len > (i_size_read(bdev->bd_inode) >> 9))
> return -EINVAL;
>
> - return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL);
> + return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false);
> }
>
> static int put_ushort(unsigned long arg, unsigned short val)
> diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
> index 6960fb064731..ee5b9611c51c 100644
> --- a/drivers/block/drbd/drbd_receiver.c
> +++ b/drivers/block/drbd/drbd_receiver.c
> @@ -1388,7 +1388,7 @@ int drbd_submit_peer_request(struct drbd_device *device,
> list_add_tail(&peer_req->w.list, &device->active_ee);
> spin_unlock_irq(&device->resource->req_lock);
> if (blkdev_issue_zeroout(device->ldev->backing_bdev,
> - sector, data_size >> 9, GFP_NOIO))
> + sector, data_size >> 9, GFP_NOIO, false))
> peer_req->flags |= EE_WAS_ERROR;
> drbd_endio_write_sec_final(peer_req);
> return 0;
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index aac0f9ea952a..f3ad69d8de45 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1163,7 +1163,7 @@ extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
> sector_t nr_sects, gfp_t gfp_mask, struct page *page);
> extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> - sector_t nr_sects, gfp_t gfp_mask);
> + sector_t nr_sects, gfp_t gfp_mask, bool discard);
> static inline int sb_issue_discard(struct super_block *sb, sector_t block,
> sector_t nr_blocks, gfp_t gfp_mask, unsigned long flags)
> {
> @@ -1177,7 +1177,7 @@ static inline int sb_issue_zeroout(struct super_block *sb, sector_t block,
> return blkdev_issue_zeroout(sb->s_bdev,
> block << (sb->s_blocksize_bits - 9),
> nr_blocks << (sb->s_blocksize_bits - 9),
> - gfp_mask);
> + gfp_mask, true);
> }
>
> extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2014-11-17 18:53 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-07 5:08 [RFC] Discard update for 3.19 Martin K. Petersen
2014-11-07 5:08 ` [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM Martin K. Petersen
2014-11-07 8:24 ` Christoph Hellwig
2014-11-07 15:37 ` Martin K. Petersen
2014-12-05 16:45 ` Paolo Bonzini
2014-12-05 22:58 ` Elliott, Robert (Server Storage)
2014-12-08 15:15 ` Theodore Ts'o
2014-12-08 15:28 ` James Bottomley
2014-12-08 22:59 ` One Thousand Gnomes
2014-11-07 5:08 ` [PATCH 2/3] sd: Disable discard_zeroes_data for UNMAP Martin K. Petersen
2014-11-07 8:25 ` Christoph Hellwig
2014-11-10 23:43 ` Paolo Bonzini
2014-11-07 5:08 ` [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function Martin K. Petersen
2014-11-07 8:26 ` Christoph Hellwig
2014-11-07 15:42 ` Martin K. Petersen
2014-11-07 16:20 ` Theodore Ts'o
2014-11-07 16:27 ` Martin K. Petersen
2014-11-14 20:22 ` Martin K. Petersen
2014-11-17 18:53 ` Darrick J. Wong [this message]
2014-11-11 0:04 ` Darrick J. Wong
2014-11-11 2:33 ` Martin K. Petersen
2014-11-17 19:28 ` [PATCH] block: create ioctl to discard-or-zeroout a range of blocks Darrick J. Wong
2014-11-07 12:09 ` [RFC] Discard update for 3.19 Bernd Schubert
2014-11-07 12:11 ` Bernd Schubert
2014-11-07 15:46 ` Martin K. Petersen
2014-11-10 14:19 ` Christoph Hellwig
2014-11-12 18:40 ` Ewan Milne
2014-11-12 19:41 ` Martin K. Petersen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141117185327.GB10047@birch.djwong.org \
--to=darrick.wong@oracle.com \
--cc=hch@infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-ide@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=neilb@suse.de \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).