From: Greg KH <gregkh@linuxfoundation.org>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jing Zhang <jingzhangos@google.com>, KVM <kvm@vger.kernel.org>,
KVMARM <kvmarm@lists.cs.columbia.edu>,
LinuxMIPS <linux-mips@vger.kernel.org>,
KVMPPC <kvm-ppc@vger.kernel.org>,
LinuxS390 <linux-s390@vger.kernel.org>,
Linuxkselftest <linux-kselftest@vger.kernel.org>,
Marc Zyngier <maz@kernel.org>, James Morse <james.morse@arm.com>,
Julien Thierry <julien.thierry.kdev@gmail.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Will Deacon <will@kernel.org>,
Huacai Chen <chenhuacai@kernel.org>,
Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>,
Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
Paul Mackerras <paulus@ozlabs.org>,
Christian Borntraeger <borntraeger@de.ibm.com>,
Janosch Frank <frankja@linux.ibm.com>,
David Hildenbrand <david@redhat.com>,
Cornelia Huck <cohuck@redhat.com>,
Claudio Imbrenda <imbrenda@linux.ibm.com>,
Sean Christopherson <seanjc@google.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Jim Mattson <jmattson@google.com>,
Peter Shier <pshier@google.com>, Oliver Upton <oupton@google.com>,
David Rientjes <rientjes@google.com>,
Emanuele Giuseppe Esposito <eesposit@redhat.com>,
David Matlack <dmatlack@google.com>,
Ricardo Koller <ricarkol@google.com>,
Krish Sadhukhan <krish.sadhukhan@oracle.com>,
Fuad Tabba <tabba@google.com>
Subject: Re: [PATCH v9 3/5] KVM: stats: Add documentation for statistics data binary interface
Date: Wed, 16 Jun 2021 20:18:25 +0200
Message-ID: <YMpAcaqjk4ZN90gj@kroah.com>
In-Reply-To: <9b9a951d-d020-5599-5c4f-e154b40522b9@redhat.com>
On Wed, Jun 16, 2021 at 06:59:15PM +0200, Paolo Bonzini wrote:
> On 16/06/21 17:21, Greg KH wrote:
> > I forgot to comment on this one, sorry for the delay.
> >
> > Why are you "inventing" your own schema format here for this? Why not
> > use a well-known or at least well-designed/implemented one that we have
> > in userspace already?
> >
> > There are a few that I would love to see in the kernel, varlink being
> > the best example. We have kernel examples of this and I would consider
> > using that as a transport for sysfs-like data in the future, but never
> > got around to it.
>
> Thanks, that's a good observation, and it's a problem that the rationale and
> the design process didn't end up in either the documentation or the commit
> message (only the outcome did). In order to fix that, this is going to be
> quite a long message.
>
> Varlink comprises both a schema and an encoding, and each has its own
> problems. For the encoding of the data, varlink is really just using JSON
> and it is really just the wrong tool here. The first few problems that come
> to mind are:
>
> - varlink structs are encoded as JSON dictionaries. Therefore, every time
> userspace reads the fields, the kernel has to include the field names as
> JSON dictionary keys. This means that a lot of time is spent writing
> buffers, and on the receiving side parsing them.
Has this been measured? Years ago when I messed with this, it was in
the noise as JSON parsing is really fast these days.
> - because numeric data has to be converted to ASCII the output doesn't have
> fixed offsets, so it is not possible to make an efficient implementation of
> pread.
efficient where? In the kernel?
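To make the pread point concrete, here is a minimal userspace sketch. It assumes a hypothetical layout in which value number i sits at DATA_OFFSET + 8 * i as a little-endian u64; the offset, the sample values and the read_stat helper are made up for illustration and are not taken from the patches.

```python
import os
import struct
import tempfile

# Hypothetical layout: value number i lives at DATA_OFFSET + 8 * i.
DATA_OFFSET = 16
values = [12345, 678, 2**60]


def read_stat(fd, index):
    """Read one little-endian u64 with a single positioned read."""
    raw = os.pread(fd, 8, DATA_OFFSET + 8 * index)
    return struct.unpack("<Q", raw)[0]


with tempfile.TemporaryFile() as f:
    f.write(b"\0" * DATA_OFFSET)            # stand-in for a header
    f.write(struct.pack("<3Q", *values))    # fixed-width binary values
    f.flush()
    third = read_stat(f.fileno(), 2)

# With ASCII output the same lookup needs a scan, because the width of
# each number depends on its value.
text = " ".join(str(v) for v in values)
third_from_text = int(text.split()[2])
```

Both lookups return the same value, but only the binary one can jump straight to the field without reading what comes before it.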
> - even though Varlink specifies that int is "usually int64", a little-known
> gem is that JSON behavior for numbers not representable as a double (i.e.
> exceeding 2^53) is implementation-defined; implementations can and will mess
> up values outside that range. This problem is not specific to this KVM
> stats usecase; varlink's schema specification really is at odds with its
> encoding specification.
That's interesting, do the varlink developers know this? And we can say
"for the kernel int will be int64" and be done with it, so this
shouldn't be that big of an issue.
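The 2^53 limit is easy to reproduce. The sketch below uses Python's json module, which keeps integers exact by default; passing parse_int=float is only a way to emulate a parser that stores every number as a double, and is not a claim about any particular varlink implementation.

```python
import json

big = 2**53 + 1  # first integer a double cannot represent exactly

# Python's json module keeps integers exact by default.
exact = json.loads(json.dumps(big))

# parse_int=float emulates a parser that stores every number as a double.
lossy = json.loads(json.dumps(big), parse_int=float)

print(exact == big)       # value survives the round trip
print(int(lossy) == big)  # value is silently rounded
```

So a u64 counter can cross that threshold and come back changed, depending only on which JSON library the reader happens to use.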
> For the schema, there are some specific problems with varlink, but also a
> more generic issue. The specific problems are:
>
> - the schema doesn't include the length of arrays. This makes it hard to
> compute in advance lengths and offsets of fields (even ignoring the fact
> that data is not binary, which I'll get to later)
Do you care in advance?
> - the schema also is not extensible with user annotations. In our case that
> would be mostly the unit in which the value is expressed.
units would be nice, I agree.
> The main problem with the various available serialization formats is that
> their schema languages are designed to be compiled with a code generator.
> Here instead the schema is transmitted from the kernel to userspace.
> Userspace is not really supposed to know what a value means or even whether
> it exists. Userspace takes care of collecting the data from the kernel, but
> ultimately there will be a human that knows what e.g. "io_exits" or
> "req_event" means, and they will ask for the current value, or a plot over
> time, of a specific statistic.
>
> Now, unlike most other schema languages, varlink does not require
> precompiling the schema in its C bindings. However, this was mostly just
> the authors not bothering to write a C code generator---they did write one
> for Rust, for example. When using the "official" varlink C bindings, you
> hardly use the schema at all.
I don't think the kernel implementation needs this, you can just specify
the schema "on the fly" and it should work. Or maybe I'm wrong, it's
been a few years.
> Something similar to the above issues is quite common in other formats. For
> example, FlatBuffers[2]'s schema language[1] does have annotations (which it
> calls metadata), but then we would still have to invent some standard
> annotations and teach programs about them. However, the deal breaker is
> that again there is no way to transmit the schema from the server to the
> client, and that users are supposed to precompile the schema using a code
> generator.
Again, I didn't think this was an issue with the kernel implementation
in that the userspace side could determine the schema by the data coming
from the kernel, it wouldn't have to "know" about it ahead of time.
But I could be wrong.
> All that said, what we _could_ do is serialize the schema as JSON instead of
> using a binary format, like so:
>
> [{
> "name": "exits",
> "kind": "cumulative",
> "count": 1
> }, {
> "name": "halt_poll_fail_ns",
> "kind": "cumulative",
> "unit": "seconds",
> "scale": {"base": "pow10", "exponent": -9},
> "count": 1
> }, ...]
>
> while keeping the actual statistics as an array of u64 values just like in
> these patches. The JSON representation of the schema would be always the
> same, so it could be treated as fixed-size and an efficient implementation
> of pread would be possible. And once the schema is JSON, there could be a
> *meta*-schema expressed using the varlink language:
>
> # compare to struct kvm_stats_desc, included after my sig
> # for reference
> type StatsDescriptor {
> name: string,
> kind: (cumulative, instant, peak),
> unit: ?(bytes, seconds, cycles),
> scale: ?(base: (pow2, pow10), exponent: int),
> count: int
> }
>
> type StatsSchema {
> name: string,
> stats: []StatsDescriptor
> }
>
> Varlink would *not* be used in the kernel, because Varlink is really just
> JSON. The above ten lines (plus comments) would be dropped in
> Documentation/ as a .varlink file; they would be a convenient way to
> describe (to either a human or a program) how to parse the JSON schema.
>
> There would then be another problem, namely how to include the varlink
> schema in the generated documentation, since right now there are for example
> no sphinx bindings for varlink. Nevertheless, if you think it's better to
> have the schema as JSON instead of binary, according to the above varlink
> meta-schema, then we can look into it.
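As a rough illustration of the split proposed in the quoted text (a JSON schema plus a raw array of u64 values), a reader could pair the two as sketched below. The sample values and the decode helper are hypothetical, and the sketch assumes a power-of-ten scale for the nanosecond counter.

```python
import json
import struct

# Hypothetical schema in the JSON shape sketched in the quoted message.
schema_json = """
[{"name": "exits", "kind": "cumulative", "count": 1},
 {"name": "halt_poll_fail_ns", "kind": "cumulative", "unit": "seconds",
  "scale": {"base": "pow10", "exponent": -9}, "count": 1}]
"""

# Stand-in for the binary data block: one little-endian u64 per value.
data = struct.pack("<2Q", 1000, 2_500_000_000)


def decode(schema_text, blob):
    """Pair each descriptor with its values; no stat name is hard-coded."""
    stats, offset = {}, 0
    for desc in json.loads(schema_text):
        n = desc["count"]
        vals = struct.unpack_from(f"<{n}Q", blob, offset)
        offset += 8 * n
        scale = desc.get("scale")
        if scale:
            base = 2 if scale["base"] == "pow2" else 10
            vals = tuple(v * base ** scale["exponent"] for v in vals)
        stats[desc["name"]] = (vals, desc.get("unit"))
    return stats


stats = decode(schema_json, data)
```

The reader never needs to know a statistic's name in advance: names, units and offsets all come from the schema that was read at run time.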
It should be in some standard format. I'm not wed to varlink, but I
thought it solved some of your issues here with userspace not knowing
the schema ahead of time which is the key as it will change over time
and the kernel should be 'self-describing'.

If not, and it sounds like you have looked into it, or at least the
userspace side, then that's fine.

But you should write up a justification somewhere why you didn't use an
existing format (what about the netlink format?) in at least the
changelog, like you did here. As you will be stuck with this api for
the next 20+ years, it's good to be confident in it :)

maybe someday we can get varlink into the tree, as I still think it
would be better than the ad-hoc implementations we currently have.

thanks,

greg k-h