public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@redhat.com>
To: Pankaj Raghav <p.raghav@samsung.com>
Cc: hch@lst.de, agk@redhat.com, damien.lemoal@opensource.wdc.com,
	axboe@kernel.dk, snitzer@kernel.org,
	linux-kernel@vger.kernel.org, Johannes.Thumshirn@wdc.com,
	linux-nvme@lists.infradead.org, pankydev8@gmail.com,
	matias.bjorling@wdc.com, linux-block@vger.kernel.org,
	bvanassche@acm.org, gost.dev@samsung.com, dm-devel@redhat.com,
	hare@suse.de, jaegeuk@kernel.org,
	Damien Le Moal <damien.lemoal@wdc.com>
Subject: Re: [PATCH v13 13/13] dm: add power-of-2 target for zoned devices with non power-of-2 zone sizes
Date: Wed, 14 Sep 2022 12:52:46 -0400	[thread overview]
Message-ID: <YyIG3i++QriS9Gyy@redhat.com> (raw)
In-Reply-To: <20220912082204.51189-14-p.raghav@samsung.com>

On Mon, Sep 12 2022 at  4:22P -0400,
Pankaj Raghav <p.raghav@samsung.com> wrote:

> Only zoned devices with power-of-2(po2) number of sectors per zone(zone
> size) were supported in linux but now non power-of-2(npo2) zone sizes
> support has been added to the block layer.
> 
> Filesystems such as F2FS and btrfs have support for zoned devices with
> po2 zone size assumption. Before adding native support for npo2 zone
> sizes, it was suggested to create a dm target for npo2 zone size device to
> appear as a po2 zone size target so that file systems can initially
> work without any explicit changes.
> 
> The design of this target is very simple: remap the device zone size to
> the zone capacity and change the zone size to be the nearest power of 2
> value.
> 
> For e.g., a device with a zone size/capacity of 3M will have an equivalent
> target layout as follows:
> 
> Device layout :-
> zone capacity = 3M
> zone size = 3M
> 
> |--------------|-------------|
> 0             3M            6M
> 
> Target layout :-
> zone capacity=3M
> zone size = 4M
> 
> |--------------|---|--------------|---|
> 0             3M  4M             7M  8M
> 
> The area between target's zone capacity and zone size will be emulated
> in the target.
> The read IOs that fall in the emulated gap area will return 0 filled
> bio and all the other IOs in that area will result in an error.
> If a read IO span across the emulated area boundary, then the IOs are
> split across them. All other IO operations that span across the emulated
> area boundary will result in an error.
> 
> The target can be easily created as follows:
> dmsetup create <label> --table '0 <size_sects> po2zoned /dev/nvme<id>'
> 
> The target does not support partial mapping of the underlying
> device as there is no use-case for it.
> 
> Note:
> This target is not related to dm-zoned target, which exposes a zoned block
> device as a regular block device without any write constraint.
> 
> This target only exposes a different zone size than the underlying device.
> The underlying device's other constraints will be directly exposed to the
> target.
> 
> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
> Suggested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Suggested-by: Damien Le Moal <damien.lemoal@wdc.com>
> Suggested-by: Hannes Reinecke <hare@suse.de>
> ---

<snip>

> diff --git a/drivers/md/dm-po2zoned-target.c b/drivers/md/dm-po2zoned-target.c
> new file mode 100644
> index 000000000000..a48955faa978
> --- /dev/null
> +++ b/drivers/md/dm-po2zoned-target.c

<snip>

> +/*
> + * This target works on the complete zoned device. Partial mapping is not
> + * supported.
> + * Construct a zoned po2 logical device: <dev-path>
> + */
> +static int dm_po2z_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> +{
> +	struct dm_po2z_target *dmh = NULL;
> +	int ret;
> +	sector_t zone_size;
> +	sector_t dev_capacity;
> +
> +	if (argc != 1)
> +		return -EINVAL;
> +
> +	dmh = kmalloc(sizeof(*dmh), GFP_KERNEL);
> +	if (!dmh)
> +		return -ENOMEM;
> +
> +	ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table),
> +			    &dmh->dev);
> +	if (ret) {
> +		ti->error = "Device lookup failed";
> +		kfree(dmh);
> +		return ret;
> +	}
> +
> +	if (!bdev_is_zoned(dmh->dev->bdev)) {
> +		DMERR("%pg is not a zoned device", dmh->dev->bdev);
> +		kfree(dmh);
> +		return -EINVAL;
> +	}
> +
> +	zone_size = bdev_zone_sectors(dmh->dev->bdev);
> +	dev_capacity = get_capacity(dmh->dev->bdev->bd_disk);
> +	if (ti->len != dev_capacity) {
> +		DMERR("%pg Partial mapping of the target is not supported",
> +		      dmh->dev->bdev);
> +		kfree(dmh);
> +		return -EINVAL;
> +	}
> +
> +	if (is_power_of_2(zone_size))
> +		DMWARN("%pg: underlying device has a power-of-2 number of sectors per zone",
> +		       dmh->dev->bdev);
> +
> +	dmh->zone_size = zone_size;
> +	dmh->zone_size_po2 = 1 << get_count_order_long(zone_size);
> +	dmh->zone_size_po2_shift = ilog2(dmh->zone_size_po2);
> +	dmh->zone_size_diff = dmh->zone_size_po2 - dmh->zone_size;
> +	ti->private = dmh;
> +	ti->max_io_len = dmh->zone_size_po2;
> +	dmh->nr_zones = npo2_zone_no(dmh, ti->len);
> +	ti->len = dmh->zone_size_po2 * dmh->nr_zones;
> +
> +	return 0;
> +}

The above error paths need to unwind the references or any other
resources acquired before failing.  Please see other targets for how
they handle sequencing of the needed operations (e.g. dm_put_device)
in the error path by using gotos, etc.

> +
> +static int dm_po2z_report_zones_cb(struct blk_zone *zone, unsigned int idx,
> +				   void *data)
> +{
> +	struct dm_report_zones_args *args = data;
> +	struct dm_target *ti = args->tgt;
> +	struct dm_po2z_target *dmh = ti->private;
> +
> +	zone->start = device_to_target_sect(ti, zone->start);
> +	zone->wp = device_to_target_sect(ti, zone->wp);
> +	zone->len = dmh->zone_size_po2;
> +	args->next_sector = zone->start + zone->len;
> +
> +	return args->orig_cb(zone, args->zone_idx++, args->orig_data);
> +}
> +
> +static int dm_po2z_report_zones(struct dm_target *ti,
> +				struct dm_report_zones_args *args,
> +				unsigned int nr_zones)
> +{
> +	struct dm_po2z_target *dmh = ti->private;
> +	sector_t sect =
> +		po2_zone_no(dmh, dm_target_offset(ti, args->next_sector)) *
> +		dmh->zone_size;
> +
> +	return blkdev_report_zones(dmh->dev->bdev, sect, nr_zones,
> +				   dm_po2z_report_zones_cb, args);
> +}
> +
> +static int dm_po2z_end_io(struct dm_target *ti, struct bio *bio,
> +			  blk_status_t *error)
> +{
> +	if (bio->bi_status == BLK_STS_OK && bio_op(bio) == REQ_OP_ZONE_APPEND)
> +		bio->bi_iter.bi_sector =
> +			device_to_target_sect(ti, bio->bi_iter.bi_sector);
> +
> +	return DM_ENDIO_DONE;
> +}
> +
> +static void dm_po2z_io_hints(struct dm_target *ti, struct queue_limits *limits)
> +{
> +	struct dm_po2z_target *dmh = ti->private;
> +
> +	limits->chunk_sectors = dmh->zone_size_po2;
> +}

Are you certain you shouldn't at least be exposing a different
logical_block_size to upper layers?

> +
> +static void dm_po2z_status(struct dm_target *ti, status_type_t type,
> +			   unsigned int status_flags, char *result,
> +			   unsigned int maxlen)
> +{
> +	struct dm_po2z_target *dmh = ti->private;
> +	size_t sz = 0;
> +
> +	switch (type) {
> +	case STATUSTYPE_INFO:
> +		DMEMIT("%s %lld", dmh->dev->name,
> +		       (unsigned long long)dmh->zone_size_po2);
> +		break;

Wouldn't it be worthwhile to expose the zone sectors (native npo2 vs
simulated po2?) You merely roundup but never expose what you're using
(unless I'm missing something about generic "zoned" device
capabilities).

Mike


  reply	other threads:[~2022-09-14 16:53 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20220912082205eucas1p1e5fb1acfb0b2a2fe6261520ace90c883@eucas1p1.samsung.com>
2022-09-12  8:21 ` [PATCH v13 00/13] support zoned block devices with non-power-of-2 zone sizes Pankaj Raghav
2022-09-12  8:21   ` [PATCH v13 01/13] block: make bdev_nr_zones and disk_zone_no generic for npo2 zone size Pankaj Raghav
2022-09-12 10:03     ` Johannes Thumshirn
2022-09-14 17:19     ` Chaitanya Kulkarni
2022-09-12  8:21   ` [PATCH v13 02/13] block: rearrange bdev_{is_zoned,zone_sectors,get_queue} helper in blkdev.h Pankaj Raghav
2022-09-12 10:03     ` Johannes Thumshirn
2022-09-14 17:19     ` Chaitanya Kulkarni
2022-09-12  8:21   ` [PATCH v13 03/13] block: allow blk-zoned devices to have non-power-of-2 zone size Pankaj Raghav
2022-09-12 10:04     ` Johannes Thumshirn
2022-09-12  8:21   ` [PATCH v13 04/13] nvmet: Allow ZNS target to support non-power_of_2 zone sizes Pankaj Raghav
2022-09-14 17:16     ` Chaitanya Kulkarni
2022-09-12  8:21   ` [PATCH v13 05/13] nvme: zns: Allow ZNS drives that have non-power_of_2 zone size Pankaj Raghav
2022-09-14 17:16     ` Chaitanya Kulkarni
2022-09-12  8:21   ` [PATCH v13 06/13] null_blk: allow zoned devices with non power-of-2 zone sizes Pankaj Raghav
2022-09-12 10:06     ` Johannes Thumshirn
2022-09-14 17:17     ` Chaitanya Kulkarni
2022-09-12  8:21   ` [PATCH v13 07/13] zonefs: allow non power of 2 zoned devices Pankaj Raghav
2022-09-12 10:07     ` Johannes Thumshirn
2022-09-14 17:17     ` Chaitanya Kulkarni
2022-09-12  8:21   ` [PATCH v13 08/13] dm-zoned: ensure only power of 2 zone sizes are allowed Pankaj Raghav
2022-09-12 10:07     ` Johannes Thumshirn
2022-09-12  8:22   ` [PATCH v13 09/13] dm-zone: use generic helpers to calculate offset from zone start Pankaj Raghav
2022-09-12 10:08     ` Johannes Thumshirn
2022-09-12  8:22   ` [PATCH v13 10/13] dm-table: allow zoned devices with non power-of-2 zone sizes Pankaj Raghav
2022-09-12 10:08     ` Johannes Thumshirn
2022-09-12  8:22   ` [PATCH v13 11/13] dm: call dm_zone_endio after the target endio callback for zoned devices Pankaj Raghav
2022-09-12  8:22   ` [PATCH v13 12/13] dm: introduce DM_EMULATED_ZONES target feature flag Pankaj Raghav
2022-09-12  8:22   ` [PATCH v13 13/13] dm: add power-of-2 target for zoned devices with non power-of-2 zone sizes Pankaj Raghav
2022-09-14 16:52     ` Mike Snitzer [this message]
2022-09-14 19:16       ` Pankaj Raghav
2022-09-16 17:57         ` Pankaj Raghav
2022-09-16 18:04           ` Mike Snitzer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YyIG3i++QriS9Gyy@redhat.com \
    --to=snitzer@redhat.com \
    --cc=Johannes.Thumshirn@wdc.com \
    --cc=agk@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=damien.lemoal@opensource.wdc.com \
    --cc=damien.lemoal@wdc.com \
    --cc=dm-devel@redhat.com \
    --cc=gost.dev@samsung.com \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=jaegeuk@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=matias.bjorling@wdc.com \
    --cc=p.raghav@samsung.com \
    --cc=pankydev8@gmail.com \
    --cc=snitzer@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox