From: Markus Armbruster <armbru@redhat.com>
To: John Snow <jsnow@redhat.com>
Cc: Michael Roth <michael.roth@amd.com>,
	Cleber Rosa <crosa@redhat.com>,
	qemu-devel@nongnu.org, Eduardo Habkost <ehabkost@redhat.com>
Subject: Re: [PATCH v4 11/14] qapi/introspect.py: add type hint annotations
Date: Tue, 09 Feb 2021 10:06:23 +0100
Message-ID: <87lfbxvcds.fsf@dusky.pond.sub.org>
In-Reply-To: <a1d1c67e-8066-3154-1117-6c86c6f8d9b6@redhat.com> (John Snow's message of "Mon, 8 Feb 2021 16:39:12 -0500")

John Snow <jsnow@redhat.com> writes:

> On 2/5/21 8:42 AM, Markus Armbruster wrote:
>> John Snow <jsnow@redhat.com> writes:
>> 
>>> On 2/3/21 10:15 AM, Markus Armbruster wrote:
>>>> John Snow <jsnow@redhat.com> writes:
>>>>
>>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>>> ---
>>>>>    scripts/qapi/introspect.py | 115 ++++++++++++++++++++++++++-----------
>>>>>    scripts/qapi/mypy.ini      |   5 --
>>>>>    scripts/qapi/schema.py     |   2 +-
>>>>>    3 files changed, 82 insertions(+), 40 deletions(-)
>>>>>
>>>>> diff --git a/scripts/qapi/introspect.py b/scripts/qapi/introspect.py
>>>>> index 60ec326d2c7..b7f2a6cf260 100644
>>>>> --- a/scripts/qapi/introspect.py
>>>>> +++ b/scripts/qapi/introspect.py
>>>>> @@ -30,10 +30,19 @@
>>>>>    )
>>>>>    from .gen import QAPISchemaMonolithicCVisitor
>>>>>    from .schema import (
>>>>> +    QAPISchema,
>>>>>        QAPISchemaArrayType,
>>>>>        QAPISchemaBuiltinType,
>>>>> +    QAPISchemaEntity,
>>>>> +    QAPISchemaEnumMember,
>>>>> +    QAPISchemaFeature,
>>>>> +    QAPISchemaObjectType,
>>>>> +    QAPISchemaObjectTypeMember,
>>>>>        QAPISchemaType,
>>>>> +    QAPISchemaVariant,
>>>>> +    QAPISchemaVariants,
>>>>>    )
>>>>> +from .source import QAPISourceInfo
>>>>>    
>>>>>    
>>>>>    # This module constructs a tree data structure that is used to
>>>>> @@ -57,6 +66,8 @@
>>>>      # generate the introspection information for QEMU. It behaves similarly
>>>>      # to a JSON value.
>>>>      #
>>>>      # A complexity over JSON is that our values may or may not be annotated.
>>>>      #
>>>>      # Un-annotated values may be:
>>>>      #     Scalar: str, bool, None.
>>>>      #     Non-scalar: List, Dict
>>>>      # _value = Union[str, bool, None, Dict[str, TreeValue], List[TreeValue]]
>>>>      #
>>>>      # With optional annotations, the type of all values is:
>>>>      # TreeValue = Union[_value, Annotated[_value]]
>>>>      #
>>>>      # Sadly, mypy does not support recursive types, so we must approximate this.
>>>>      _stub = Any
>>>>      _scalar = Union[str, bool, None]
>>>>      _nonscalar = Union[Dict[str, _stub], List[_stub]]
>>>>>    _value = Union[_scalar, _nonscalar]
>>>>>    TreeValue = Union[_value, 'Annotated[_value]']
>> 
>> I'm once again terminally confused about when to use _lower_case and
>> when to use CamelCase for such variables.
>> 
>
> That's my fault for not using them consistently.
>
> Generally:
>
> TitleCase: Classes, Real Type Names :tm:
> lowercase: instance names (and certain built-in types like str/bool/int)
> UPPERCASE: "Constants". This is an extremely loose idea in Python.
>
> I use the "_" prefix for any of the above categories to indicate 
> something not intended to be used outside of the current scope. These 
> types won't be accessible outside the module by default.
>
> TypeVars I use "T", "U", "V", etc unless I bind them to another type; 
> then I use e.g. NodeT instead.
>
> When it comes to things like type aliases, I believe I instinctively 
> used lowercase because I am not creating a new Real Type and wanted some 
> visual distinction from a real class name. (aliases created in this way 
> cannot be used with isinstance and hold no significance to mypy.)
>
> That's why I used _stub, _scalar, _nonscalar, and _value for those types 
> there. Then I disregarded my own convention and used TreeValue; perhaps 
> that ought to be tree_value for consistency as it's not a Real Type :tm:
>
> ...but then we have the SchemaInfo type aliases, which I named using the 
> same type name as they use in QAPI to help paint the association (and 
> pick up 'git grep' searchers.)
>
> Not fantastically consistent, sorry. Feel free to express a preference, 
> I clearly don't have a universally applied one.
>
> (Current leaning: rename TreeValue to tree_value, but leave everything 
> else as it is.)

https://www.python.org/dev/peps/pep-0484/#type-aliases

    Note that we recommend capitalizing alias names, since they
    represent user-defined types, which (like user-defined classes) are
    typically spelled that way.

I think this wants names like _Scalar, _NonScalar, _Value, TreeValue.

>> The reader has to connect _stub = Any back to "we must approximate this".
>> Hmm... "we approximate with Any"?
>> 
>
> I can try to be more explicit about it.
>
>>>>>    
>>>>> +# This is a (strict) alias for an arbitrary object non-scalar, as above:
>>>>> +_DObject = Dict[str, object]
>>>>
>>>> Sounds greek :)
>>>>
>>>
>>> Admittedly it is still not explained well ... until the next patch. I'm
>>> going to leave it alone for now until you have a chance to respond to
>>> these walls of text.
>> 
>> You explain it some further down.
>> 
>>>> It's almost the Dict part of _nonscalar, but not quite: object vs. Any.
>>>>
>>>> I naively expect something closer to
>>>>
>>>>      _scalar = ...
>>>>      _object = Dict[str, _stub]
>>>>      _nonscalar = Union[_object, List[_stub]]
>>>>
>>>> and (still naively) expect _object to be good enough to serve as type
>>>> annotation for dicts representing JSON objects.
>>>
>>> "_object" would be good, except ... I am trying to avoid using that word
>>> because what does it mean? Python object? JSON object? Here at the
>>> boundary between two worlds, nothing makes sense.
>> 
>> Naming is hard.
>> 
>
> Yep. We can skip this debate by just naming the incoming types 
> SchemaInfo and similar... (cont'd below)
>
>> We talked about these names in review of v2.  Let me try again.
>> 
>> introspect.py needs to generate (a suitable C representation of) an
>> instance of QAPI type '[SchemaInfo]'.
>> 
>> Its current choice of "suitable C representation" is "a QLitQObject
>> initializer with #if and comments".  This is a "loose" representation:
>> QLitQObject can encode pretty much anything, not just instances of
>> '[SchemaInfo]'.
>> 
>> C code converts this QLitQObject to a SchemaInfoList object[*].
>> SchemaInfoList is the C type for QAPI type '[SchemaInfo]'.  Automated
>> tests ensure this conversion cannot fail, i.e. the "loose" QLitQObject
>> actually encodes a '[SchemaInfo]'.
>> 
>> introspect.py separates concerns: it first builds an abstract
>> representation of "set of QObject with #if and comments", then generates
>> C code from that.
>> 
>> Why "QObject with #if and comments", and not "QLitQObject with #if and
>> comments"?  Because QLitQObject is *one* way to represent QObject, and
>> we don't care which way outside C code generation.
>> 
>> A QObject represents a JSON value.  We could just as well say "JSON
>> value with #if and comments".
>> 
>> So, the abstract representation of "JSON value with #if and comments" is
>> what we're trying to type.  If you'd rather say "QObject with #if and
>> comments", that's fine.
>> 
>> Our abstract representation is a tree, where
>> 
>> * JSON null / QNull is represented as Python None
>> 
>> * JSON string / QString as str
>> 
>> * JSON true and false / QBool as bool
>> 
>> * JSON number / QNum is not implemented
>> 
>> * JSON object / QDict is dict mapping string keys to sub-trees
>> 
>> * JSON array / QList is list of sub-trees
>> 
>> * #if and comment tacked to a sub-tree is represented by wrapping the
>>    subtree in Annotated
>> 
>>    Wrapping a sub-tree that is already wrapped seems mostly useless, but
>>    the code doesn't care.
>> 
>>    Wrapping dictionary values makes no sense.  The code doesn't care, and
>>    gives you GIGO.
>> 
>>    Making the code reject these two feels out of scope.  If you want to
>>    anyway, I won't object unless it gets in the way of "in scope" stuff
>>    (right now it doesn't seem to).
>> 
>> Let me stress once again: this is *not* an abstract representation of a
>> 'SchemaInfo'.  Such a representation would also work, and you might like
>> it better, but it's simply not what we have.  Evidence: _tree_to_qlit()
>> works fine for *any* tree, not just for trees that encode instances of
>> 'SchemaInfo'.
>> 
>
> ... as long as you don't feel that's incorrect to do. We are free to 
> name those structures SchemaInfo but type _tree_to_qlit() in terms of 
> generic Dict objects, leaving us without a middle-abstract thing to name 
> at all.
>
> Based on your review of the "dummy types" patch, I'm going to assume 
> that's fine.

I guess it's okayish enough.  It still feels more complicated to me than
it needs to be.

QAPISchemaGenIntrospectVisitor builds an abstract representation of
"QObject with #if and comments" for each SchemaInfo.

This is not really a representation of SchemaInfo.  We can choose to
name it that way regardless, if it helps, and we explain it properly.

Once we hand off the data to _tree_to_qlit(), we can't name it that way
anymore, simply because _tree_to_qlit() treats it as the stupid
recursive data structure it is, and doesn't need or want to know about
SchemaInfo.
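
To illustrate the point, here is a rough sketch of a walker that handles
such a tree purely by Python type.  It is *not* the real _tree_to_qlit():
render() and the stand-in Annotated class are made up for this example, and
the output is a debugging dump, not a QLitQObject initializer.

```python
from typing import Any, List, Optional


class Annotated:
    """Stand-in wrapper (assumed fields): a value plus #if and comment."""
    def __init__(self, value: Any, ifcond: List[str],
                 comment: Optional[str] = None):
        self.value = value
        self.ifcond = ifcond
        self.comment = comment


def render(tree: Any, level: int = 0) -> List[str]:
    """Dump any tree as indented lines; knows nothing about SchemaInfo."""
    pad = '    ' * level
    if isinstance(tree, Annotated):
        lines = []
        if tree.comment:
            lines.append(f'{pad}/* {tree.comment} */')
        lines += [f'#if {cond}' for cond in tree.ifcond]
        lines += render(tree.value, level)
        lines += [f'#endif /* {cond} */' for cond in reversed(tree.ifcond)]
        return lines
    if tree is None:
        return [pad + 'null']
    if isinstance(tree, bool):
        return [pad + ('true' if tree else 'false')]
    if isinstance(tree, str):
        return [pad + '"' + tree + '"']
    if isinstance(tree, list):
        lines = [pad + '[']
        for elt in tree:
            lines += render(elt, level + 1)
        return lines + [pad + ']']
    if isinstance(tree, dict):
        lines = [pad + '{']
        for key, val in sorted(tree.items()):
            lines.append(f'{pad}    "{key}":')
            lines += render(val, level + 2)
        return lines + [pad + '}']
    raise TypeError(f'unsupported tree node: {type(tree).__name__}')
```

Nothing in there cares whether the dict keys happen to be SchemaInfo member
names, which is why a SchemaInfo-flavored type name stops fitting at that
boundary.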

I think I'd dispense with _DObject entirely, and use TreeValue
throughout.  Yes, we'd use Any a bit more.  I doubt the additional
complexity to *sometimes* use object instead is worthwhile.  This data
structure is used only within this file.  It pretty much never changes
(because JSON doesn't).  It's basically write-only in
QAPISchemaGenIntrospectVisitor.  This means all the extra typing work
buys us is use of object instead of Any where it doesn't actually
matter.

I would use a more telling name than TreeValue, though.  One that
actually hints at the kind of value "representation of QObject with #if
and comment".

>> Since each (sub-)tree represents a JSON value / QObject, possibly with
>> annotations, I'd like to propose a few "obvious" (hahaha) names:
>> 
>> * an unannotated QObject: QObject
>> 
>> * an annotated QObject: AnnotatedQObject
>> 
>> * a possibly annotated QObject: PossiblyAnnotatedQObject
>> 
>>    Too long.  Rename QObject to BareQObject, then call this one QObject.
>> 
>> This gives us:
>> 
>>      _BareQObject = Union[None, str, bool, Dict[str, Any], List[Any]]
>>      _AnnotatedQObject = Annotated[_QObject]
>>      _QObject = Union[_BareQObject, _AnnotatedQObject]
>> 
>> Feel free to replace QObject by JsonValue in these names if you like
>> that better.  I think I'd slightly prefer JsonValue right now.
>> 
>> Now back to _DObject:
>> 
>>> (See patch 12/14 for A More Betterer Understanding of what _DObject is
>>> used for. It will contribute to A Greater Understanding.)
>>>
>>> Anyway, to your questions;
>>>
>>> (1) _DObject was my shorthand garbage way of saying "This is a Python
>>> Dict that represents a JSON object". Hence Dict-Object, "DObject". I
>>> have also derisively called this a "dictly-typed" object at times.
>> 
>> In the naming system I proposed, this is BareQDict, with an additional
>> complication: we actually have two different types for the same thing,
>> an anonymous one within _BareQObject, and a named one.
>> 
>>> (2) Dict[str, Any] and Dict[str, object] are similar, but do have a
>> 
>> The former is the anonymous one, the latter the named one.
>> 
>
> Kinda-sorta. I am talking about pure mypy here, and the differences 
> between typing two things this way.
>
> Though I think you're right: I used the "Any" form for the anonymous 
> type (inherent to the structure of a JSON compound type) and the 
> "object" form for the named forms (The SchemaInfo objects we build in 
> the visitors to pass to the generator later).
>
>>> semantic difference. I alluded to it by calling this a "(strict) alias";
>>> which does not help you understand any of the following points:
>>>
>>> Whenever you use "Any", it basically turns off type-checking at that
>>> boundary; it is the gradually typed boundary type. Avoid it whenever
>>> reasonably possible.
>>>
>>> Example time:
>>>
>>>
>>> def foo(thing: Any) -> None:
>>>       print(thing.value)  # Sure, I guess! We'll believe you.
>>>
>>>
>>> def foo(thing: object) -> None:
>>>       print(thing.value)  # BZZT, Python object has no value prop.
>>>
>>>
>>> Use "Any" when you really just cannot constrain the type, because you're
>>> out of bourbon or you've decided to give in to the darkness inside your
>>> heart.
>>>
>>> Use "object" when the type of the value actually doesn't matter, because
>>> you are only passing it on to something else later that will do its own
>>> type analysis or introspection on the object.
>>>
>>> For introspect.py, 'object' is actually a really good type when we can
>>> use it, because we interrogate the type exhaustively upon receipt in
>>> _tree_to_qlit.
>>>
>>>
>>> That leaves one question you would almost assuredly ask as a followup:
>>>
>>> "Why didn't you use object for the stub type to begin with?"
>>>
>>> Let's say we define _stub as `object` instead, the Python object. When
>>> _tree_to_qlit recurses on non-scalar structures, the held value there is
>>> only known as "object" and not as str/bool/None, which causes a typing
>>> error at that point.
>>>
>>> Moving the stub to "Any" tells mypy to ... not worry about what type we
>>> actually passed here. I gave in to the darkness in my heart. It's just
>>> too annoying without real recursion.
>> 
>> May I have an abridged version of this in the comments?  It might look
>> quaint in ten years, when we're all fluent in Python type annotations.
>> But right now, at least some readers aren't, and they can use a bit of
>> help.
>> 
>
> Yeah, I'm sympathetic to that.... though I'm not sure what to write or 
> where. I can add some reference points in the commit message, like this one:
>
> https://mypy.readthedocs.io/en/stable/dynamic_typing.html#any-vs-object
>
> maybe in conjunction with the named type aliases patch this is actually 
> sufficient?

I can see two solutions right now:

1. Use Dict[str, Any] throughout

   All we need to explain is

   * What the data structure is about (JSON annotated with ifconds and
     comments; got that, could use improvement perhaps)

   * Your work-around for the lack of recursive types (got that
     already)

   * That the use of Any bypasses static type checking on use (shouldn't
     be hard)

   * Where such uses are (I believe only in _tree_to_qlit(), where Any
     can't be avoided anyway).

2. Use Dict[str, object] where we can

   Now we get to explain a few more things:

   * Why we bother (to get stricter static type checks on use)

   * Where such uses are (I can't see any offhand)

   * Maybe also where we go from one static type to the other.

In either case, we also need to pick names that need no explanation, or
explain them.
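
A small sketch of what the choice means for code that *uses* a value from
the dict.  The function names and the 'name' key are made up for
illustration; the comments describe what mypy does, while at run time both
functions behave like ordinary Python.

```python
from typing import Any, Dict


def name_len_any(info: Dict[str, Any]) -> int:
    # Solution 1: the value is Any, so mypy accepts len() unchecked.
    # A non-string value only fails at run time.
    return len(info['name'])


def name_len_object(info: Dict[str, object]) -> int:
    # Solution 2: the value is object, so mypy rejects len(name)
    # until isinstance() narrows the type.
    name = info['name']
    if not isinstance(name, str):
        raise TypeError('name must be a string')
    return len(name)
```

The stricter form only pays off where values are actually used like this,
which is the "where such uses are" question above.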

>> [*] Actually, we take a shortcut and convert straight to QObject, but
>> that's just laziness.  See qmp_query_qmp_schema()'s "Minor hack:"
>> comment.
>> 
>
> :)


