Message-ID: <5515BA61.8080608@redhat.com>
Date: Fri, 27 Mar 2015 14:15:29 -0600
From: Eric Blake
References: <1427227433-5030-1-git-send-email-eblake@redhat.com>
 <1427227433-5030-22-git-send-email-eblake@redhat.com>
 <87bnjeu8qi.fsf@blackfin.pond.sub.org>
In-Reply-To: <87bnjeu8qi.fsf@blackfin.pond.sub.org>
Subject: Re: [Qemu-devel] [PATCH v5 21/28] qapi: Require valid names
To: Markus Armbruster
Cc: kwolf@redhat.com, lcapitulino@redhat.com, famz@redhat.com,
 qemu-devel@nongnu.org, wenchaoqemu@gmail.com

On 03/27/2015 02:48 AM, Markus Armbruster wrote:
> Eric Blake writes:
>
>> Previous commits demonstrated that the generator overlooked various
>> bad naming situations:
>> - types, commands, and events need a valid name
>> - union and alternate branches cannot be marked optional
>>
>> The set of valid names includes [a-zA-Z0-9._-] (where '.' is
>> useful only in downstream extensions).
>>
>> +valid_characters = set(string.ascii_letters + string.digits + '.' + '-' + '_')
>
> strings.ascii_letters depends on the locale...

https://docs.python.org/2/library/string.html

string.ascii_letters
    The concatenation of the ascii_lowercase and ascii_uppercase
    constants described below. This value is not locale-dependent.

You are thinking of string.letters, which IS locale-dependent. I
intentionally used ascii_letters.

>
>> +def check_name(expr_info, source, name, allow_optional = False):
>> +    membername = name
>> +
>> +    if not isinstance(name, str):
>> +        raise QAPIExprError(expr_info,
>> +                            "%s requires a string name" % source)
>> +    if name == '**':
>> +        return
>
> Doesn't this permit '**' anywhere, not just as pseudo-type in command
> arguments and results?

Yes, on the grounds that check_type then filters it appropriately. But
worthy of a comment (probably both in the commit message AND in the code
base). Grounds for a v6 respin.

>
>> +    if name.startswith('*'):
>> +        membername = name[1:]
>> +        if not allow_optional:
>> +            raise QAPIExprError(expr_info,
>> +                                "%s does not allow optional name '%s'"
>> +                                % (source, name))
>> +    if not set(membername) <= valid_characters:
>
> ... so this check would break if we called locale.setlocale() in this
> program. While I don't think we need to worry about it, I think you
> could just as well use something like
>
>     valid_name = re.compile(r"^[A-Za-z0-9-._]+$")
>
>     if not valid_name.match(membername):

regex is slightly slower than string matching _if the regex is
precompiled_, and MUCH slower than string matching if the regex is
compiled every time. In turn, string matching is slower than
open-coding things, but has the benefit of being more compact and
maintainable (open-coded loops are the worst on that front).
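To make the trade-off above concrete, here is a minimal sketch of the two approaches side by side, written for Python 3. The helper names `is_valid_name_set` and `is_valid_name_regex` are hypothetical and are not part of the patch; the character set is assumed to be the [a-zA-Z0-9._-] set from the commit message. The regex variant uses `fullmatch()` rather than `^...$`, because `$` also matches just before a trailing newline.

```python
import re
import string

# Assumption: the allowed characters are [a-zA-Z0-9._-], as stated in the
# commit message. string.ascii_letters is not locale-dependent.
VALID_CHARACTERS = set(string.ascii_letters + string.digits + '.' + '-' + '_')

# Compiled once at import time, so every call reuses the same pattern object.
VALID_NAME = re.compile(r'[A-Za-z0-9._-]+')


def is_valid_name_set(name):
    """Set-based check (hypothetical helper): all characters must be allowed."""
    # set('') is a subset of every set, so an empty name needs its own test
    # if it is to be rejected the same way the regex '+' rejects it.
    return bool(name) and set(name) <= VALID_CHARACTERS


def is_valid_name_regex(name):
    """Regex-based check (hypothetical helper) using the precompiled pattern."""
    # fullmatch() requires the whole string to match, including any
    # trailing newline, which '$' would silently tolerate.
    return VALID_NAME.fullmatch(name) is not None


if __name__ == '__main__':
    for candidate in ['block-dirty', '__com.redhat_x', 'bad name', '', 'a\n']:
        print(repr(candidate),
              is_valid_name_set(candidate),
              is_valid_name_regex(candidate))
```

Both helpers keep the same contract, so either one could sit behind a single function such as check_name without touching its callers. Python's `re` module also keeps an internal cache of recently compiled patterns, so calling `re.match()` with a pattern string does not necessarily recompile it on every call, although the cache lookup itself is not free.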
Here's where I got my inspiration:
https://stackoverflow.com/questions/1323364/in-python-how-to-check-if-a-string-only-contains-certain-characters

But I may just go with the regex after all (I don't know how efficient
python is about reusing a regex when a function is called multiple
times, rather than recompiling the regex every time. Personal side
note: back in 2009 or so, I was able to make 'm4' significantly faster
in the context of 'autoconf' when I taught it to cache the compilation
of the 8 most-recently-encountered regex, rather than recompiling every
time; and then made 'autoconf' even faster when I taught it to do
actions that didn't require regex use from 'm4' in the first place.)

The nice thing, though, is that I factored things so that the
implementation of this one function can change without having to hunt
down all call-sites, if I keep the contract the same.

>>      discriminator_type = base_fields.get(discriminator)
>>      if not discriminator_type:
>>          raise QAPIExprError(expr_info,
>
> What happens when I try 'discriminator': '**'?

No clue. Good thing for me to add somewhere in my series.

However, I did manage to have this series at least think about a base
type with '*switch':'Enum', then use 'discriminator':'*switch', which
got past the generator (who knows what the C code would have done if
have_switch was false?), so I plugged that hole; but in the process of
testing it, I noticed that '*switch':'Enum' paired with
'discriminator':'switch' complained that 'switch' was not a member of
the base class (which is a lie; it is present in the base class, but as
an optional member).
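The misleading "not a member" complaint can be reproduced with a plain dictionary. This is only a sketch under an assumption: that base_fields is keyed by member names exactly as written in the schema, so an optional member is stored under '*switch' rather than 'switch'. The helper `lookup_member` is hypothetical and is not part of the generator.

```python
def lookup_member(fields, name):
    """Return (type, is_optional) for a member, or None if it is absent.

    Hypothetical helper, not part of the QAPI generator. It assumes that
    optional members are stored under their schema spelling, '*name'.
    """
    if name in fields:
        return fields[name], False
    if '*' + name in fields:
        return fields['*' + name], True
    return None


# Assumed shape of the base type's members for this illustration.
base_fields = {'*switch': 'Enum', 'id': 'str'}

# A plain lookup by the bare name misses the optional member entirely,
# which is what makes a "not a member" error message misleading.
plain = base_fields.get('switch')

# Checking both spellings distinguishes "absent" from "present but optional".
found = lookup_member(base_fields, 'switch')

if __name__ == '__main__':
    print(plain)
    print(found)
```

With a lookup along these lines, a caller could report "discriminator must not be optional" for the second case rather than claiming the member does not exist.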
Proof that the generator is a bunch of hacks strung together :)

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org