From: David Sterba <dsterba@suse.cz>
To: Pankaj Raghav <p.raghav@samsung.com>
Cc: axboe@kernel.dk, damien.lemoal@opensource.wdc.com,
pankydev8@gmail.com, dsterba@suse.com, hch@lst.de,
linux-nvme@lists.infradead.org, linux-fsdevel@vger.kernel.org,
linux-btrfs@vger.kernel.org, jiangbo.365@bytedance.com,
linux-block@vger.kernel.org, gost.dev@samsung.com,
linux-kernel@vger.kernel.org, dm-devel@redhat.com
Subject: Re: [PATCH v4 08/13] btrfs:zoned: make sb for npo2 zone devices align with sb log offsets
Date: Tue, 17 May 2022 14:42:57 +0200
Message-ID: <20220517124257.GD18596@twin.jikos.cz>
In-Reply-To: <20220516165416.171196-9-p.raghav@samsung.com>
On Mon, May 16, 2022 at 06:54:11PM +0200, Pankaj Raghav wrote:
> Superblocks for zoned devices are fixed as 2 zones at 0, 512GB and 4TB.
> These are fixed at these locations so that recovery tools can reliably
> retrieve the superblocks even if one of the mirrors gets corrupted.
>
> Power of 2 zone sizes align at these offsets irrespective of their
> value, but non power of 2 zone sizes will not align.
>
> To make sure the first zone at mirror 1 and mirror 2 align, a write zeroes
> operation is performed to move the write pointer of the first zone to
> the expected offset. This operation is performed only after a zone reset
> of the first zone, i.e., when the second zone that contains the sb is FULL.
Is it a good idea to do the "write zeros" instead of a plain "set write
pointer"? I assume setting the write pointer is instant, while writing
potentially hundreds of megabytes may take significant time. As these
functions may be called from arbitrary contexts, the added latency may
become a problem.
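To put a rough number on "hundreds of megabytes", here is a minimal userspace sketch (not kernel code) of how much the patch's blkdev_issue_zeroout() call would have to write: the remainder of the superblock log offset divided by the zone size. SB_LOG_FIRST_OFFSET and SB_LOG_SECOND_OFFSET are assumed to equal the 512GiB and 4TiB log offsets described in the patch, and sb_zeroout_bytes() is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9
#define SZ_1M (1ULL << 20)
#define SZ_1G (1ULL << 30)

/* Assumed to match the btrfs zoned sb log offsets for mirrors 1 and 2. */
#define SB_LOG_FIRST_OFFSET  (512ULL * SZ_1G)
#define SB_LOG_SECOND_OFFSET (4096ULL * SZ_1G)

/*
 * Hypothetical helper: bytes the patch would zero in the first zone of a
 * superblock mirror, i.e. the distance from that zone's start to the
 * fixed log offset. zone_len_sectors is the zone size in 512-byte sectors.
 */
static uint64_t sb_zeroout_bytes(uint64_t sb_log_offset,
				 uint64_t zone_len_sectors)
{
	uint64_t rem_sectors;

	assert(zone_len_sectors != 0);
	rem_sectors = (sb_log_offset >> SECTOR_SHIFT) % zone_len_sectors;
	return rem_sectors << SECTOR_SHIFT;
}
```

For a hypothetical 1000 MiB (non power of 2) zone size, sb_zeroout_bytes() yields 288 MiB for mirror 1 and 304 MiB for mirror 2, all of which would be written as zeroes; for a 256 MiB power of 2 zone size it yields 0.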
> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
> ---
> fs/btrfs/zoned.c | 68 ++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 63 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index 3023c871e..805aeaa76 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -760,11 +760,44 @@ int btrfs_check_mountopts_zoned(struct btrfs_fs_info *info)
> return 0;
> }
>
> +static int fill_sb_wp_offset(struct block_device *bdev, struct blk_zone *zone,
> + int mirror, u64 *wp_ret)
> +{
> + u64 offset = 0;
> + int ret = 0;
> +
> + ASSERT(!is_power_of_two_u64(zone->len));
> + ASSERT(zone->wp == zone->start);
> + ASSERT(mirror != 0);
This could simply accept 0 as the mirror number too; the calculation is
trivial.
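One way to read this suggestion, sketched in userspace C under stated assumptions: index a table of log offsets by mirror number so that mirror 0 needs no special case, because 0 modulo any zone length is 0. The table values are assumed to mirror the btrfs zoned log offsets (0, 512GiB, 4TiB), and sketch_sb_wp_offset() is a hypothetical stand-in for the offset part of fill_sb_wp_offset(), using plain `%` where the kernel code uses div64_u64_rem().

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9
#define SZ_1G (1ULL << 30)

/* Assumed btrfs zoned sb log offsets in bytes, indexed by mirror number. */
static const uint64_t sketch_sb_log_offsets[] = {
	0,               /* primary (mirror 0) */
	512ULL * SZ_1G,  /* mirror 1 */
	4096ULL * SZ_1G, /* mirror 2 */
};

#define SKETCH_SB_MIRRORS \
	((int)(sizeof(sketch_sb_log_offsets) / sizeof(sketch_sb_log_offsets[0])))

/*
 * Hypothetical helper: sectors from the start of the zone holding the
 * mirror's log to the fixed log offset. Mirror 0 is not special-cased;
 * its remainder is simply 0. zone_len is in 512-byte sectors.
 */
static int sketch_sb_wp_offset(int mirror, uint64_t zone_len, uint64_t *offset)
{
	assert(zone_len != 0);
	if (mirror < 0 || mirror >= SKETCH_SB_MIRRORS)
		return -1;
	*offset = (sketch_sb_log_offsets[mirror] >> SECTOR_SHIFT) % zone_len;
	return 0;
}
```

With this shape, power of 2 zone sizes up to 512GiB also produce an offset of 0 for every mirror, so the same helper covers both cases without the caller checking the zone size.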
> +
> + switch (mirror) {
> + case 1:
> + div64_u64_rem(BTRFS_SB_LOG_FIRST_OFFSET >> SECTOR_SHIFT,
> + zone->len, &offset);
> + break;
> + case 2:
> + div64_u64_rem(BTRFS_SB_LOG_SECOND_OFFSET >> SECTOR_SHIFT,
> + zone->len, &offset);
> + break;
> + }
> +
> + ret = blkdev_issue_zeroout(bdev, zone->start, offset, GFP_NOFS, 0);
> + if (ret)
> + return ret;
> +
> + zone->wp += offset;
> + zone->cond = BLK_ZONE_COND_IMP_OPEN;
> + *wp_ret = zone->wp << SECTOR_SHIFT;
> +
> + return 0;
> +}
> +
> static int sb_log_location(struct block_device *bdev, struct blk_zone *zones,
> - int rw, u64 *bytenr_ret)
> + int rw, int mirror, u64 *bytenr_ret)
> {
> u64 wp;
> int ret;
> + bool zones_empty = false;
>
> if (zones[0].type == BLK_ZONE_TYPE_CONVENTIONAL) {
> *bytenr_ret = zones[0].start << SECTOR_SHIFT;
> @@ -775,13 +808,31 @@ static int sb_log_location(struct block_device *bdev, struct blk_zone *zones,
> if (ret != -ENOENT && ret < 0)
> return ret;
>
> + if (ret == -ENOENT)
> + zones_empty = true;
> +
> if (rw == WRITE) {
> struct blk_zone *reset = NULL;
> + bool is_sb_offset_write_req = false;
> + u32 reset_zone_nr = -1;
>
> - if (wp == zones[0].start << SECTOR_SHIFT)
> + if (wp == zones[0].start << SECTOR_SHIFT) {
> reset = &zones[0];
> - else if (wp == zones[1].start << SECTOR_SHIFT)
> + reset_zone_nr = 0;
> + } else if (wp == zones[1].start << SECTOR_SHIFT) {
> reset = &zones[1];
> + reset_zone_nr = 1;
> + }
> +
> + /*
> + * Non po2 zone sizes will not align naturally at
> + * mirror 1 (512GB) and mirror 2 (4TB). The wp of the
> + * 1st zone in those superblock mirrors need to be
> + * moved to align at those offsets.
> + */
Please move this comment to the helper fill_sb_wp_offset itself, there
it's more discoverable.
> + is_sb_offset_write_req =
> + (zones_empty || (reset_zone_nr == 0)) && mirror &&
> + !is_power_of_2(zones[0].len);
Accepting 0 as the mirror number would also get rid of this wild
expression substituting an 'if'.
>
> if (reset && reset->cond != BLK_ZONE_COND_EMPTY) {
> ASSERT(sb_zone_is_full(reset));
> @@ -795,6 +846,13 @@ static int sb_log_location(struct block_device *bdev, struct blk_zone *zones,
> reset->cond = BLK_ZONE_COND_EMPTY;
> reset->wp = reset->start;
> }
> +
> + if (is_sb_offset_write_req) {
And get rid of the conditional. The point of supporting both po2 and
non-po2 is to hide any implementation details in wrappers as much as
possible.
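A minimal userspace sketch of such a wrapper, called unconditionally for every mirror with no power of 2 check at the call site. struct sketch_zone, sketch_advance_wp() and sketch_align_sb_zone() are hypothetical names; sketch_advance_wp() only updates the in-memory write pointer and stands in for whatever actually moves it on the device (the patch uses blkdev_issue_zeroout()). The zone condition update done by the patch is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9

/* Hypothetical stand-in for the struct blk_zone fields used here (sectors). */
struct sketch_zone {
	uint64_t start;
	uint64_t len;
	uint64_t wp;
};

/*
 * Hypothetical: advance the write pointer of a freshly reset zone.
 * This sketch only updates the in-memory copy.
 */
static int sketch_advance_wp(struct sketch_zone *zone, uint64_t nr_sects)
{
	zone->wp += nr_sects;
	return 0;
}

/*
 * Called for every mirror. The remainder is 0 for mirror 0 (log offset 0)
 * and whenever the zone size divides the log offset, so those cases leave
 * the zone untouched and only report the current write pointer in bytes.
 */
static int sketch_align_sb_zone(struct sketch_zone *zone,
				uint64_t sb_log_offset, uint64_t *wp_ret)
{
	uint64_t offset;
	int ret;

	assert(zone->len != 0);
	if (zone->wp != zone->start)
		return -1; /* only meaningful right after a zone reset */

	offset = (sb_log_offset >> SECTOR_SHIFT) % zone->len;
	if (offset) {
		ret = sketch_advance_wp(zone, offset);
		if (ret)
			return ret;
	}

	*wp_ret = zone->wp << SECTOR_SHIFT;
	return 0;
}
```

Keeping the po2/non-po2 distinction inside the wrapper means sb_log_location() would not need is_sb_offset_write_req at all.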
> + ret = fill_sb_wp_offset(bdev, &zones[0], mirror, &wp);
> + if (ret)
> + return ret;
> + }
> +
> } else if (ret != -ENOENT) {
> /*
> * For READ, we want the previous one. Move write pointer to