From: Ming Lei <ming.lei@redhat.com>
To: Damien Le Moal <dlemoal@kernel.org>,
Bart Van Assche <bvanassche@acm.org>
Cc: linux-block@vger.kernel.org, Yi Zhang <yi.zhang@redhat.com>,
John Meneghini <jmeneghi@redhat.com>,
linux-nvme@lists.infradead.org, hch@lst.de,
Keith Busch <kbusch@kernel.org>
Subject: Re: [Report] blk-zoned/ZNS: non_power_of_2 of zone->len]
Date: Fri, 12 Jan 2024 11:29:15 +0800
Message-ID: <ZaCyC5RIAcbkBYeL@fedora>
In-Reply-To: <20503cd0-3a99-45bb-8374-40296a3cb92a@kernel.org>
On Fri, Jan 12, 2024 at 12:05:45PM +0900, Damien Le Moal wrote:
> On 1/12/24 10:13, Ming Lei wrote:
> > Hello Damien and Guys,
> >
> > Yi reported the following failure:
> >
> > Oct 18 15:24:15 localhost kernel: nvme nvme4: invalid zone size:196608 for namespace:1
> > Oct 18 15:24:33 localhost smartd[2303]: Device: /dev/nvme4, opened
> > Oct 18 15:24:33 localhost smartd[2303]: Device: /dev/nvme4, NETAPPX4022S173A4T0NTZ, S/N:S66NNE0T800169, FW:MVP40B7B, 4.09 TB
> >
> > It looks like blk-zoned currently requires zone->len to be power_of_2() since
> > commit:
> >
> > 6c6b35491422 ("block: set the zone size in blk_revalidate_disk_zones atomically")
> >
> > And the original power_of_2() requirement is from the following commit
> > for ZBC and ZAC.
> >
> > d9dd73087a8b ("block: Enhance blk_revalidate_disk_zones()")
> >
> > Meanwhile, the block layer does support a non-power_of_2 chunk sectors limit.
>
> That is not true. It does. See blk_stack_limits(), which has:
>
> 	/* Set non-power-of-2 compatible chunk_sectors boundary */
> 	if (b->chunk_sectors)
> 		t->chunk_sectors = gcd(t->chunk_sectors, b->chunk_sectors);
>
> and the absence of any check on the value of chunk_sectors in
> blk_queue_chunk_sectors().
I meant that a non-power_of_2 chunk sectors limit is supported, see
07d098e6bbad ("block: allow 'chunk_sectors' to be non-power-of-2").
Device mapper uses that.
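To illustrate the gcd() stacking you quoted, here is a minimal userspace sketch. It is not the kernel code: gcd_u64() and stack_chunk_sectors() are hypothetical names I made up for illustration, and the values in the usage note are only examples.

```c
#include <stdint.h>

/* Euclid's algorithm; gcd_u64(0, b) returns b. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
	while (b != 0) {
		uint64_t r = a % b;

		a = b;
		b = r;
	}
	return a;
}

/*
 * Hypothetical stand-in for the chunk_sectors step quoted above:
 * the stacked (top) limit becomes the gcd of the top and bottom
 * limits, so neither value has to be a power of 2.
 */
static uint64_t stack_chunk_sectors(uint64_t top, uint64_t bottom)
{
	if (bottom)
		top = gcd_u64(top, bottom);
	return top;
}
```

For example, stacking 196608 on top of 524288 gives 65536, and a top limit of 0 simply takes the bottom value.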
>
> > The question is whether there is such a hard requirement for ZNS, and I can't see
> > any such wording in the NVMe Zoned Namespace Command Set Specification.
>
> No, there are no requirements in ZNS for the zone size to be a power of 2 number
> of sectors/LBAs. The same is also true for ZBC and ZAC (SCSI and ATA) SMR HDDs.
> The requirement for the zone size to be a power of 2 number of sectors is
> entirely in the kernel. The reason being that zoned block device support started
> with SMR HDDs which all had a zone size of 256 MB (and still do) and no user
> ever wanted anything else than that. So everything was coded with this
> requirement, as that allowed many nice things like bit-shift/mask arithmetic for
> conversions between zone number and sectors etc (and that of course is very
> efficient).
Thanks for the clarification.
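For anyone following along, the shift/mask conversions versus the general case can be sketched roughly as below. This is a userspace illustration under my own assumptions, not the blk-zoned code: all function names are hypothetical, and 196608 from the log is treated simply as a zone length in sectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if zone_sectors is a non-zero power of 2. */
static bool zone_size_is_pow2(uint64_t zone_sectors)
{
	return zone_sectors != 0 && (zone_sectors & (zone_sectors - 1)) == 0;
}

/* log2 of a power-of-2 zone size, computed once per device. */
static unsigned int zone_shift(uint64_t zone_sectors)
{
	unsigned int shift = 0;

	while ((zone_sectors >> shift) > 1)
		shift++;
	return shift;
}

/* Power-of-2 zone size: zone number is a shift, offset is a mask. */
static uint64_t zone_no_pow2(uint64_t sector, uint64_t zone_sectors)
{
	assert(zone_size_is_pow2(zone_sectors));
	return sector >> zone_shift(zone_sectors);
}

static uint64_t zone_offset_pow2(uint64_t sector, uint64_t zone_sectors)
{
	assert(zone_size_is_pow2(zone_sectors));
	return sector & (zone_sectors - 1);
}

/* Any zone size: needs a 64-bit division and a modulo instead. */
static uint64_t zone_no_any(uint64_t sector, uint64_t zone_sectors)
{
	assert(zone_sectors != 0);
	return sector / zone_sectors;
}

static uint64_t zone_offset_any(uint64_t sector, uint64_t zone_sectors)
{
	assert(zone_sectors != 0);
	return sector % zone_sectors;
}
```

A 256 MB zone is 524288 sectors of 512 bytes, which passes zone_size_is_pow2(), while 196608 (3 * 65536) does not and can only go through the division-based helpers.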
>
> > So is it an NVMe firmware issue, or a blk-zoned problem with a too strict
> > (power_of_2) requirement on zone->len?
>
> It is the latter. There was a session at LSF/MM last year about this. I recall
> that the conclusion was that unless there is a strong user demand for non power
> of 2 zone size, we are not going to do anything about it. Because allowing
> non-power of 2 zone size has some serious consequences all over the place,
> including in FSes that natively support zoned devices. So relaxing that
> requirement is not trivial.
I just saw Bart's work on supporting non-power_of_2 zone lengths:
https://lore.kernel.org/linux-block/dc89c70e-4931-baaf-c450-6801c200c1d7@acm.org/
IMO FS support might be a separate topic, because FS isn't the only user.
Also, without block layer support the device isn't usable at all, not to mention by an FS.
Since non-power_of_2 zoned devices do exist, I'd suggest that Bart restart the
work and let Linux cover more zoned devices (including those with non-power_of_2 zones).
Thanks,
Ming