linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@lst.de>
To: Damien Le Moal <damien.lemoal@wdc.com>
Cc: linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>,
	linux-scsi@vger.kernel.org,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	linux-nvme@lists.infradead.org, Christoph Hellwig <hch@lst.de>,
	Keith Busch <keith.busch@wdc.com>
Subject: Re: [PATCH v2 1/2] block: introduce zone_write_granularity limit
Date: Wed, 20 Jan 2021 11:10:18 +0100	[thread overview]
Message-ID: <20210120101018.GB25746@lst.de> (raw)
In-Reply-To: <20210119131723.1637853-2-damien.lemoal@wdc.com>

On Tue, Jan 19, 2021 at 10:17:22PM +0900, Damien Le Moal wrote:
> Per ZBC and ZAC specifications, host-managed SMR hard-disks mandate that
> all writes into sequential write required zones be aligned to the device
> physical block size. However, NVMe ZNS does not have this constraint and
> allows write operations into sequential zones to be logical block size
> aligned. This inconsistency does not help with portability of software
> across device types.
> To solve this, introduce the zone_write_granularity queue limit to
> indicate the alignment constraint, in bytes, of write operations into
> zones of a zoned block device. This new limit is exported as a
> read-only sysfs queue attribute and the helper
> blk_queue_zone_write_granularity() introduced for drivers to set this
> limit. The scsi disk driver is modified to use this helper to set
> host-managed SMR disk zone write granularity to the disk physical block
> size. The nvme driver zns support use this helper to set the new limit
> to the logical block size of the zoned namespace.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  Documentation/block/queue-sysfs.rst |  7 +++++++
>  block/blk-settings.c                | 28 ++++++++++++++++++++++++++++
>  block/blk-sysfs.c                   |  7 +++++++
>  drivers/nvme/host/zns.c             |  1 +
>  drivers/scsi/sd_zbc.c               | 10 ++++++++++
>  include/linux/blkdev.h              |  3 +++
>  6 files changed, 56 insertions(+)
> 
> diff --git a/Documentation/block/queue-sysfs.rst b/Documentation/block/queue-sysfs.rst
> index 2638d3446b79..c8bf8bc3c03a 100644
> --- a/Documentation/block/queue-sysfs.rst
> +++ b/Documentation/block/queue-sysfs.rst
> @@ -273,4 +273,11 @@ devices are described in the ZBC (Zoned Block Commands) and ZAC
>  do not support zone commands, they will be treated as regular block devices
>  and zoned will report "none".
>  
> +zone_write_granularity (RO)
> +---------------------------
> +This indicates the alignment constraint, in bytes, for write operations in
> +sequential zones of zoned block devices (devices with a zoned attributed
> +that reports "host-managed" or "host-aware"). This value is always 0 for
> +regular block devices.
> +
>  Jens Axboe <jens.axboe@oracle.com>, February 2009
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 43990b1d148b..6be6ed9485e3 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -60,6 +60,7 @@ void blk_set_default_limits(struct queue_limits *lim)
>  	lim->io_opt = 0;
>  	lim->misaligned = 0;
>  	lim->zoned = BLK_ZONED_NONE;
> +	lim->zone_write_granularity = 0;

I think this should default to 512 just like the logic and physical
block size.

>  }
>  EXPORT_SYMBOL(blk_set_default_limits);
>  
> @@ -366,6 +367,31 @@ void blk_queue_physical_block_size(struct request_queue *q, unsigned int size)
>  }
>  EXPORT_SYMBOL(blk_queue_physical_block_size);
>  
> +/**
> + * blk_queue_zone_write_granularity - set zone write granularity for the queue
> + * @q:  the request queue for the zoned device
> + * @size:  the zone write granularity size, in bytes
> + *
> + * Description:
> + *   This should be set to the lowest possible size allowing to write in
> + *   sequential zones of a zoned block device.
> + */
> +void blk_queue_zone_write_granularity(struct request_queue *q,
> +				      unsigned int size)
> +{
> +	if (WARN_ON(!blk_queue_is_zoned(q)))
> +		return;
> +
> +	q->limits.zone_write_granularity = size;
> +
> +	if (q->limits.zone_write_granularity < q->limits.logical_block_size)
> +		q->limits.zone_write_granularity = q->limits.logical_block_size;

I think this should be a WARN_ON_ONCE.

> +	if (q->limits.zone_write_granularity < q->limits.io_min)
> +		q->limits.zone_write_granularity = q->limits.io_min;

I don't think this makes sense at all.

> +static ssize_t queue_zone_write_granularity_show(struct request_queue *q, char *page)

Overly long line.

> +	/*
> +	 * Per ZBC and ZAC specifications, writes in sequential write required
> +	 * zones of host-managed devices must be aligned to the device physical
> +	 * block size.
> +	 */
> +	if (blk_queue_zoned_model(q) == BLK_ZONED_HM)
> +		blk_queue_zone_write_granularity(q, sdkp->physical_block_size);
> +	else
> +		blk_queue_zone_write_granularity(q, sdkp->device->sector_size);

Do we really want to special case HA drives here?  I though we generally
either treat them as drive managed (if they have partitions) or else like
host managed ones.

  reply	other threads:[~2021-01-20 11:24 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-19 13:17 [PATCH v2 0/2] block: add zone write granularity limit Damien Le Moal
2021-01-19 13:17 ` [PATCH v2 1/2] block: introduce zone_write_granularity limit Damien Le Moal
2021-01-20 10:10   ` Christoph Hellwig [this message]
2021-01-20 11:00     ` Damien Le Moal
2021-01-20 12:36       ` Christoph Hellwig
2021-01-19 13:17 ` [PATCH v2 2/2] block: document zone_append_max_bytes attribute Damien Le Moal
2021-01-20 10:10   ` Christoph Hellwig
2021-01-20 10:07 ` [PATCH v2 0/2] block: add zone write granularity limit Christoph Hellwig
2021-01-20 10:42   ` Damien Le Moal
2021-01-20 10:46   ` Damien Le Moal
  -- strict thread matches above, loose matches on Subject: below --
2021-01-19  9:38 Damien Le Moal
2021-01-19  9:38 ` [PATCH v2 1/2] block: introduce zone_write_granularity limit Damien Le Moal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210120101018.GB25746@lst.de \
    --to=hch@lst.de \
    --cc=axboe@kernel.dk \
    --cc=chaitanya.kulkarni@wdc.com \
    --cc=damien.lemoal@wdc.com \
    --cc=keith.busch@wdc.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).