qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Klaus Jensen <its@irrelevant.dk>
To: Dmitry Fomichev <Dmitry.Fomichev@wdc.com>
Cc: "Fam Zheng" <fam@euphon.net>, "Kevin Wolf" <kwolf@redhat.com>,
	"Damien Le Moal" <Damien.LeMoal@wdc.com>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>,
	"Niklas Cassel" <Niklas.Cassel@wdc.com>,
	"Klaus Jensen" <k.jensen@samsung.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Alistair Francis" <Alistair.Francis@wdc.com>,
	"Keith Busch" <kbusch@kernel.org>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>,
	"Matias Bjorling" <Matias.Bjorling@wdc.com>
Subject: Re: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set
Date: Tue, 29 Sep 2020 18:36:47 +0200	[thread overview]
Message-ID: <20200929163647.GA286786@apples.localdomain> (raw)
In-Reply-To: <MN2PR04MB59517FB6076902451F463C5FE1320@MN2PR04MB5951.namprd04.prod.outlook.com>

[-- Attachment #1: Type: text/plain, Size: 7947 bytes --]

On Sep 29 15:43, Dmitry Fomichev wrote:
> > -----Original Message-----
> > From: Qemu-block <qemu-block-
> > bounces+dmitry.fomichev=wdc.com@nongnu.org> On Behalf Of Klaus
> > Jensen
> > Sent: Tuesday, September 29, 2020 6:47 AM
> > To: Damien Le Moal <Damien.LeMoal@wdc.com>
> > Cc: Fam Zheng <fam@euphon.net>; Kevin Wolf <kwolf@redhat.com>; qemu-
> > block@nongnu.org; Niklas Cassel <Niklas.Cassel@wdc.com>; Klaus Jensen
> > <k.jensen@samsung.com>; qemu-devel@nongnu.org; Alistair Francis
> > <Alistair.Francis@wdc.com>; Keith Busch <kbusch@kernel.org>; Philippe
> > Mathieu-Daudé <philmd@redhat.com>; Matias Bjorling
> > <Matias.Bjorling@wdc.com>
> > Subject: Re: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types
> > and Zoned Namespace Command Set
> > 
> > On Sep 28 22:54, Damien Le Moal wrote:
> > > On 2020/09/29 6:25, Keith Busch wrote:
> > > > On Mon, Sep 28, 2020 at 08:36:48AM +0200, Klaus Jensen wrote:
> > > >> On Sep 28 02:33, Dmitry Fomichev wrote:
> > > >>> You are making it sound like the entire WDC series relies on this
> > approach.
> > > >>> Actually, the persistency is introduced in the second to last patch in the
> > > >>> series and it only adds a couple of lines of code in the i/o path to mark
> > > >>> zones dirty. This is possible because of using mmap() and I find the way
> > > >>> it is done to be quite elegant, not ugly :)
> > > >>>
> > > >>
> > > >> No, I understand that your implementation works fine without
> > > >> persistance, but persistance is key. That is why my series adds it in
> > > >> the first patch. Without persistence it is just a toy. And the QEMU
> > > >> device is not just an "NVMe-version" of null_blk.
> > > >
> > > > I really think we should be a bit more cautious of commiting to an
> > > > on-disk format for the persistent state. Both this and Klaus' persistent
> > > > state feels a bit ad-hoc, and with all the other knobs provided, it
> > > > looks too easy to have out-of-sync states, or just not being able to
> > > > boot at all if a qemu versions have different on-disk formats.
> > > >
> > > > Is anyone really considering zone emulation for production level stuff
> > > > anyway? I can't imagine a real scenario where you'd want put yourself
> > > > through that: you are just giving yourself all the downsides of a zoned
> > > > block device and none of the benefits. AFAIK, this is provided as a
> > > > development vehicle, closer to a "toy".
> > > >
> > > > I think we should consider trimming this down to a more minimal set that
> > > > we *do* agree on and commit for inclusion ASAP. We can iterate all the
> > > > bells & whistles and flush out the meta data's data marshalling scheme
> > > > for persistence later.
> > >
> > > +1 on this. Removing the persistence also removes the debate on
> > endianess. With
> > > that out of the way, it should be straightforward to get agreement on a
> > series
> > > that can be merged quickly to get developers started with testing ZNS
> > software
> > > with QEMU. That is the most important goal here. 5.9 is around the corner,
> > we
> > > need something for people to get started with ZNS quickly.
> > >
> > 
> > Wait. What. No. Stop!
> > 
> > It is unmistakably clear that you are invalidating my arguments about
> > portability and endianness issues by suggesting that we just remove
> > persistent state and deal with it later, but persistence is the killer
> > feature that sets the QEMU emulated device apart from other emulation
> > options. It is not about using emulation in production (because yeah,
> > why would you?), but persistence is what makes it possible to develop
> > and test "zoned FTLs" or something that requires recovery at power up.
> > This is what allows testing of how your host software deals with opened
> > zones being transitioned to FULL on power up and the persistent tracking
> > of LBA allocation (in my series) can be used to properly test error
> > recovery if you lost state in the app.
> > 
> > Please, work with me on this instead of just removing such an essential
> > feature. Since persistence seems to be the only thing we are really
> > discussing, we should have plenty of time until the soft-freeze to come
> > up with a proper solution on that.
> > 
> > I agree that my version had a format that was pretty ad-hoc and that
> > won't fly - it needs magic and version capabilities like in Dmitry's
> > series, which incidentially looks a lot like what we did in the
> > OpenChannel implementation, so I agree with the strategy.
> 
> Are you insinuating that I somehow took stuff from OCSSD code and try
> to claim priority this way? I am not at all that familiar with that code.
> And I've already sent you the link to tcmu-runner code that served me
> as an inspiration for implementing persistence in WDC patchset.
> That code has been around for years, uses mmap, works great and has
> nothing to do with you.
> 

No. I am not insinuating anything. The OpenChannel device also used a
blockdev, but, yes, incidentially (and sorry, I should not have used
that word), it looked like how we did it there and I noted that I agreed
with the strategy.

> > 
> > ZNS-wise, the only thing my implementation stores is the zone
> > descriptors (in spec-native little-endian format) and the zone
> > descriptor extensions. So there are no endian issues with those. The
> > allocation tracking bitmap is always stored in little endian, but
> > converted to big-endian if running on a big-endian host.
> > 
> > Let me just conjure something up.
> > 
> >     #define NVME_PSTATE_MAGIC ...
> >     #define NVME_PSTATE_V1    1
> > 
> >     typedef struct NvmePstateHeader {
> >         uint32_t magic;
> >         uint32_t version;
> > 
> >         uint64_t blk_len;
> > 
> >         uint8_t  lbads;
> >         uint8_t  iocs;
> > 
> >         uint8_t  rsvd18[3054];
> > 
> >         struct {
> >             uint64_t zsze;
> >             uint8_t  zdes;
> >         } QEMU_PACKED zns;
> > 
> >         uint8_t  rsvd3089[1007];
> >     } QEMU_PACKED NvmePstateHeader;
> > 
> 
> Why conjure something that already exists in WDC patchset? And that part
> has been published in the very first version of our patches, weeks before
> your entire ZNS series was posted. Add an rsvd[] here and there and that code
> can be as suitable to achieve the stated goals as what you have above.
> 

Yes, I read your code. I know you have a header and I also noted above
that "it needs magic and version capabilities like in Dmitry's series".

> > series,
> > With such a header we have all we need. We can bail out if any
> > parameters do not match and similar to nvme data structures it contains
> > reserved areas for future use. I'll be posting a v2 with this. If this
> > still feels too ad-hoc, we can be inspired by QCOW2 and the "extension"
> > feature.
> > 
> > I can agree that we drop other optional features like zone excursions
> > and the reset/finish recommended limit simulation, but PLEASE DO NOT
> > remove persistence and upstream a half-baked version when we are so
> > close and have time to get it right.
> 
> One important thing IMO is to reduce future need for metadata versioning.
> This requires a really good effort to design and review the proper metadata
> format that would become stable for some time. Think about portability.
> To get out something "conjured up" now and then have to move to V2
> metadata in the next release is even worse than no persistence at all.
> So maybe is makes sense to go with Keith's suggestion.

As I've said, we have time until the soft-freeze to get this right. I
"conjured" something up to show a point. The reason we review and
iterate is to NOT upstream something thats is conjured up.

But we gotta start somewhere, no? So what is so bad about me posting a
v2?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2020-09-29 16:40 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-23 18:20 [PATCH v4 00/14] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set Dmitry Fomichev
2020-09-23 18:20 ` [PATCH v4 01/14] hw/block/nvme: Report actual LBA data shift in LBAF Dmitry Fomichev
2020-09-24 12:12   ` Klaus Jensen
2020-09-23 18:20 ` [PATCH v4 02/14] hw/block/nvme: Add Commands Supported and Effects log Dmitry Fomichev
2020-09-23 18:20 ` [PATCH v4 03/14] hw/block/nvme: Introduce the Namespace Types definitions Dmitry Fomichev
2020-09-23 18:20 ` [PATCH v4 04/14] hw/block/nvme: Define trace events related to NS Types Dmitry Fomichev
2020-09-23 18:20 ` [PATCH v4 05/14] hw/block/nvme: Add support for Namespace Types Dmitry Fomichev
2020-09-23 18:20 ` [PATCH v4 06/14] hw/block/nvme: Add support for active/inactive namespaces Dmitry Fomichev
2020-09-24 12:12   ` Klaus Jensen
2020-09-24 18:17     ` Niklas Cassel
2020-09-24 18:55       ` Klaus Jensen
2020-09-24 19:40         ` Niklas Cassel
2020-09-23 18:20 ` [PATCH v4 07/14] hw/block/nvme: Make Zoned NS Command Set definitions Dmitry Fomichev
2020-09-23 18:20 ` [PATCH v4 08/14] hw/block/nvme: Define Zoned NS Command Set trace events Dmitry Fomichev
2020-09-23 18:20 ` [PATCH v4 09/14] hw/block/nvme: Support Zoned Namespace Command Set Dmitry Fomichev
2020-09-25 18:24   ` Klaus Jensen
2020-09-23 18:20 ` [PATCH v4 10/14] hw/block/nvme: Introduce max active and open zone limits Dmitry Fomichev
2020-09-23 18:20 ` [PATCH v4 11/14] hw/block/nvme: Support Zone Descriptor Extensions Dmitry Fomichev
2020-09-23 18:20 ` [PATCH v4 12/14] hw/block/nvme: Add injection of Offline/Read-Only zones Dmitry Fomichev
2020-09-23 18:20 ` [PATCH v4 13/14] hw/block/nvme: Use zone metadata file for persistence Dmitry Fomichev
2020-09-23 18:20 ` [PATCH v4 14/14] hw/block/nvme: Document zoned parameters in usage text Dmitry Fomichev
2020-09-24 21:07 ` [PATCH v4 00/14] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set Klaus Jensen
2020-09-28  2:33   ` Dmitry Fomichev
2020-09-28  6:36     ` Klaus Jensen
2020-09-28 21:25       ` Keith Busch
2020-09-28 22:54         ` Damien Le Moal
2020-09-29 10:46           ` Klaus Jensen
2020-09-29 11:13             ` Damien Le Moal
2020-09-29 17:44               ` Keith Busch
2020-09-29 15:43             ` Dmitry Fomichev
2020-09-29 16:36               ` Klaus Jensen [this message]
2020-09-29 17:29             ` Keith Busch
2020-09-29 18:00               ` Klaus Jensen
2020-09-29 18:15                 ` Keith Busch
2020-09-29 18:18                   ` Klaus Jensen
2020-09-29 18:17                 ` Matias Bjorling
2020-09-29 18:36                   ` Klaus Jensen
2020-09-29 19:42                     ` Matias Bjorling
2020-09-29 15:42       ` Dmitry Fomichev
2020-09-29 18:39         ` Klaus Jensen
2020-09-29 19:22           ` Keith Busch
2020-09-29 19:53             ` Dmitry Fomichev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200929163647.GA286786@apples.localdomain \
    --to=its@irrelevant.dk \
    --cc=Alistair.Francis@wdc.com \
    --cc=Damien.LeMoal@wdc.com \
    --cc=Dmitry.Fomichev@wdc.com \
    --cc=Matias.Bjorling@wdc.com \
    --cc=Niklas.Cassel@wdc.com \
    --cc=fam@euphon.net \
    --cc=k.jensen@samsung.com \
    --cc=kbusch@kernel.org \
    --cc=kwolf@redhat.com \
    --cc=philmd@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).