From: Markus Armbruster
To: Andrea Bolognani
Cc: Victor Toso, qemu-devel@nongnu.org, John Snow, Eric Blake, Kevin Wolf
Subject: Re: [RFC PATCH v1 0/8] qapi: add generator for Golang interface
References: <20220401224104.145961-1-victortoso@redhat.com> <87a6c52u68.fsf@pond.sub.org> <87v8uos8lb.fsf@pond.sub.org> <875ymop374.fsf@pond.sub.org> <87bkwff3q0.fsf@pond.sub.org>
Date: Wed, 11 May 2022 08:15:40 +0200
In-Reply-To: (Andrea Bolognani's message of "Tue, 3 May 2022 02:40:14 -0700")
Message-ID: <87czgkiohf.fsf@pond.sub.org>

Andrea Bolognani writes:

> On Tue, May 03, 2022 at 09:57:27AM +0200, Markus Armbruster wrote:
>> Andrea Bolognani writes:
>> > I still feel that 1) users of a language SDK will ideally not need to
>> > look at the QAPI schema or wire chatter too often
>>
>> I think the most likely point of contact is the QEMU QMP Reference
>> Manual.
>
> Note that there isn't anything preventing us from including the
> original QAPI name in the documentation for the corresponding Go
> symbol, or even a link to the reference manual.
>
> So we could make jumping from the Go API documentation, which is what
> a Go programmer will be looking at most of the time, to the QMP
> documentation pretty much effortless.
>
>> My point is: a name override feature like the one you propose needs to
>> be used with discipline and restraint. Adds to reviewers' mental load.
>> Needs to be worth it. I'm not saying it isn't, I'm just pointing out a
>> cost.
>
> Yeah, I get that.
>
> Note that I'm not suggesting it should be possible for a name to be
> completely overridden - I just want to make it possible for a human
> to provide the name parsing algorithm solutions to those problems it
> can't figure out on its own.
>
> We could prevent that feature from being misused by verifying that
> the symbol the annotation is attached to can be derived from the list
> of words provided. That way, something like
>
>   # SOMEName (completely-DIFFERENT-name)
>
> would be rejected and we would avoid misuse.

Possibly as simple as "down-case both names and drop the funny
characters, result must be the same".

>> Wild idea: assume all lower case, but keep a list of exceptions.
>
> That could actually work reasonably well for QEMU because we only
> need to handle correctly what's in the schema, not arbitrary input.
>
> There's always the risk of the list of exceptions getting out of sync
> with the needs of the schema, but there's similarly no guarantee that
> annotations are going to be introduced when they are necessary, so
> it's mostly a wash.
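
To make the "down-case both names and drop the funny characters" check
mentioned earlier concrete, here is a minimal Python sketch. The
function names normalize() and annotation_matches() are made up for
illustration and do not exist in the QAPI generator; "funny characters"
is assumed to mean anything that is not an ASCII letter or digit.

```python
import re


def normalize(name: str) -> str:
    """Down-case the name and drop everything but ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def annotation_matches(symbol: str, annotation: str) -> bool:
    """Accept an annotation only if it re-spells the symbol.

    Two names match when they differ in nothing but letter case and
    separator characters (assumption: that is what "derived from the
    list of words provided" should mean).
    """
    return normalize(symbol) == normalize(annotation)
```

With this rule an annotation like "log-append" would be accepted for
logappend, while the SOMEName example would be rejected.
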
>
> The only slight advantage of the annotation approach would be that it
> might be easier to notice it being missing because it's close to the
> name it refers to, while the list of exceptions is tucked away in a
> script far away from it.

We'd put it in qapi/pragma.json, I guess.

>> The QAPI schema language uses three naming styles:
>>
>> * lower-case-with-hyphens for command and member names
>>
>>   Many names use upper case and '_'. See pragma command-name-exceptions
>>   and member-name-exceptions.
>
> Looking at the output generated by Victor's WIP script, it looks like
> these are already handled as nicely as those that don't fall under
> any exception.
>
>>   Some (many?) names lack separators between words (example: logappend).

How many would be good to know.

Ad hoc hackery to find names, filter out camels (because word splitting
is too hard there), split into words, look up words in a word list:

    $ for i in `/usr/bin/python3 /work/armbru/qemu/scripts/qapi-gen.py -o qapi -b ../qapi/qapi-schema.json | sort -u | awk '/^### [a-z0-9-]+$/ { print "lc", $2; next } /^### [a-z0-9_-]+$/ { print lu; next } /^### [A-Z0-9_]+$/ { print "uc", $2; next } /^### ([A-Z][a-z]+)+/ { print "cc", $2; next } { print "mc", $2 }' | sed '/^mc\|^cc/d;s/^.. //;s/[^A-Za-z0-9]/\n/g' | tr A-Z a-z | sort -u`; do grep -q "^$i$" /usr/share/dict/words || echo $i; done

420 lines. How many arguably lack separators between words? Wild guess
based on glancing at the output sideways: some 50.

>> * UPPER_CASE_WITH_UNDERSCORE for event names
>>
>> * CamelCase for type names
>>
>>   Capitalization of words is inconsistent in places (example: VncInfo
>>   vs. DisplayReloadOptionsVNC).
>>
>> What style conversions will we need for Go? Any other conversions come
>> to mind?
>>
>> What problems do these conversions have?
>
> Go uses CamelCase for pretty much everything: types, methods,
> constants...
>
> There's one slight wrinkle, in that the case of the first letter
> decides whether it's going to be a PublicName or a privateName. We
> can't do anything about that, but it shouldn't really affect us
> that much because we'll want all QAPI names to be public.
>
> So the issues preventing us from producing a "perfect" Go API are
>
>   1. inconsistent capitalization in type names
>
>      -> could be addressed by simply changing the schema, as type
>         names do not travel on the wire

At the price of some churn in C code.

Perhaps more consistent capitalization could be regarded as a slight
improvement on its own. We need to see (a good sample of) the changes
to judge.

>   2. missing dashes in certain command/member names
>
>      -> leads to Incorrectcamelcase.

Names with words run together are arguably no uglier in CamelCase (Go)
than in lower_case_with_underscores (C).

>         Kevin's work is supposed to address this

Except it's stuck. Perhaps Kevin and I can get it moving again.

Perhaps we can try to extract a local alias feature that can be grown
into the more ambitious aliases Kevin wants (if we can solve the
issues).

>   3. inability to know which parts of a lower-case-name or
>      UPPER_CASE_NAME are acronyms or are otherwise supposed to be
>      capitalized in a specific way
>
>      -> leads to WeirdVncAndDbusCapitalization. There's currently no
>         way, either implemented or planned, to avoid this

A list of words with special capitalization needs[*]?

VNC is an acronym; some languages want VNC in camels, some Vnc.

DBus is an abbreviation; some languages want DBus in camels, some Dbus.

> In addition to these I'm also thinking that QKeyCode and all the
> QCrypto stuff should probably lose their prefixes.

As Daniel pointed out, schema names sometimes have prefixes because we
need the generated C identifiers to have prefixes.

If we hate these prefixes enough, we can try to limit them to C
identifiers.
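
A rough sketch of how a list of words with special capitalization could
drive the conversion of lower-case-with-hyphens and
UPPER_CASE_WITH_UNDERSCORE names to CamelCase. Everything here is
hypothetical: the SPECIAL_WORDS table, its per-language entries, and
to_camel() are illustrations only, not existing generator code, and the
real list would have to be derived from the schema.

```python
import re

# Hypothetical table: target language -> {lower-case word: preferred spelling}.
# The entries are examples only; which spelling a language wants is an
# assumption made for this sketch.
SPECIAL_WORDS = {
    "go": {"vnc": "VNC", "dbus": "DBus"},
    "rust": {"vnc": "Vnc", "dbus": "Dbus"},
}


def to_camel(name: str, target: str) -> str:
    """Convert a hyphen- or underscore-separated name to CamelCase.

    Words are assumed to be all lower case unless the target language's
    table lists a special spelling for them.
    """
    words = [w for w in re.split(r"[-_]", name.lower()) if w]
    special = SPECIAL_WORDS.get(target, {})
    return "".join(special.get(w, w.capitalize()) for w in words)
```

Note that this does nothing for names with words run together: a name
like logappend still comes out as a single capitalized word.
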
> Note that 3 shouldn't be an issue for Rust and addressing 1 would
> actually make things worse for that language, because at the moment
> at least *some* of the types follow its expected naming rules :)

Solving Go problems by creating Rust problems doesn't feel like a good
move to me.

>> > Revised proposal for the annotation:
>> >
>> >   ns:word-WORD-WoRD-123Word
>> >
>> > Words are always separated by dashes; "regular" words are entirely
>> > lowercase, while the presence of even a single uppercase letter in a
>> > word denotes the fact that its case should be preserved when the
>> > naming conventions of the target language allow that.
>>
>> Is a word always capitalized the same for a single target language? Or
>> could capitalization depend on context?
>
> I'm not aware of any language that would adopt more than a single
> style of capitalization, outside of course the obvious
> lower_case_name or UPPER_CASE_NAME scenarios where the original
> capitalization stops being relevant.

Makes sense.


[*] Sounds like crony capitalism, doesn't it :)
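
For reference, one way to read the quoted annotation rule
("ns:word-WORD-WoRD-123Word") as code, for a CamelCase target such as
Go. This is a sketch under assumptions: annotation_to_camel() is a
made-up name, and simply dropping the part before the colon is a guess
at what the "ns:" prefix is for.

```python
def annotation_to_camel(annotation: str) -> str:
    """Turn a dash-separated annotation into a CamelCase identifier.

    Entirely lower-case words get their first character upper-cased;
    words containing at least one upper-case letter are kept as written.
    """
    # Assumption: anything up to the last ':' is a prefix we can ignore.
    _, _, words = annotation.rpartition(":")
    parts = []
    for word in words.split("-"):
        if word != word.lower():
            parts.append(word)  # has an upper-case letter: preserve case
        else:
            parts.append(word[:1].upper() + word[1:])
    return "".join(parts)
```

A target that ignores case, such as lower_case_name or UPPER_CASE_NAME,
would down-case or up-case every word instead.
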