From: Klaus Jensen <its@irrelevant.dk>
To: Keith Busch <kbusch@kernel.org>
Cc: Kevin Wolf <kwolf@redhat.com>,
Klaus Jensen <k.jensen@samsung.com>,
qemu-devel@nongnu.org, qemu-block@nongnu.org,
Max Reitz <mreitz@redhat.com>
Subject: Re: [PATCH] hw/block/nvme: fix zone write finalize
Date: Fri, 15 Jan 2021 07:01:49 +0100 [thread overview]
Message-ID: <YAEvzfXMZJ3f8uiN@apples.localdomain> (raw)
In-Reply-To: <20210114230319.GC1511902@dhcp-10-100-145-180.wdc.com>
[-- Attachment #1: Type: text/plain, Size: 3430 bytes --]
On Jan 14 15:03, Keith Busch wrote:
> On Tue, Jan 12, 2021 at 10:42:35AM +0100, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> >
> > The zone write pointer is unconditionally advanced, even for write
> > faults. Make sure that the zone is always transitioned to Full if the
> > write pointer reaches zone capacity.
>
> Looks like some spec weirdness. It says we can transition to full:
>
> b) as a result of successful completion of a write operation that
> writes one or more logical blocks that causes the zone to reach its
> writeable zone capacity;
>
> But a failed write also advances the write pointer as you've pointed
> out, so they might want to strike "successful".
>
Yes, its slightly inconsistent.
Empty/Closed to Opened is also kinda fuzzy in the spec. The wording is
slightly different when transitioning from Empty and Closed to
Implicitly Opened. ZSE->ZSIO just says "and a write operation writes on
or more logical blocks of that zone". ZSC->ZSIO says "and a write
operation that writes one or more logical blocks to the zone completes
succesfully". This has given me some headaches on if the transition
should occur when "preparing/submitting" the write or when the write
completes. But I guess this is pretty much an implementation detail of
the device.
I should mention this to my representatives ;)
> Looks fine to me.
>
> Reviewed-by: Keith Busch <kbusch@kernel.org>
>
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
> > ---
> > hw/block/nvme.c | 10 +++++-----
> > 1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 0854ee307227..280b31b4459d 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -1268,10 +1268,13 @@ static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req,
> > nlb = le16_to_cpu(rw->nlb) + 1;
> > zone = nvme_get_zone_by_slba(ns, slba);
> >
> > + zone->d.wp += nlb;
> > +
> > if (failed) {
> > res->slba = 0;
> > - zone->d.wp += nlb;
> > - } else if (zone->w_ptr == nvme_zone_wr_boundary(zone)) {
> > + }
> > +
> > + if (zone->d.wp == nvme_zone_wr_boundary(zone)) {
>
> The previous check was using 'zone->w_ptr', but now it's 'zone->d.wp'.
> As far as I can tell, this difference will mean the zone won't finalize
> until the last write completes, where before it could finalize after the
> zone's last write is submitted. Either way looks okay, but I think these
> two values ought to always be in sync.
>
Right - the zone will transition to Full on the last completed write. I
really not sure if this models devices better or not. But I would argue
that a real device would not relinquish resources until the last write
completes.
An issue is that for QD>1 (zone append), we can easily end up with
appends A and B completing in reversed order such that d.wp is advanced
by B_nlb, but we left a "hole" of A_nlb in the zone because that write
has not completed yet.
But - I think *some* inconsistency on the write pointer value is to be
expected when using Zone Append. The host can only ever depend on a
consistent value of the write pointer by quiescing I/O to the zone and
then getting a zone report - and d.wp and d.w_ptr will always
converge in that case.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2021-01-15 6:06 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-12 9:42 [PATCH] hw/block/nvme: fix zone write finalize Klaus Jensen
2021-01-14 23:03 ` Keith Busch
2021-01-15 6:01 ` Klaus Jensen [this message]
2021-01-20 9:14 ` Klaus Jensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YAEvzfXMZJ3f8uiN@apples.localdomain \
--to=its@irrelevant.dk \
--cc=k.jensen@samsung.com \
--cc=kbusch@kernel.org \
--cc=kwolf@redhat.com \
--cc=mreitz@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).