From mboxrd@z Thu Jan 1 00:00:00 1970 From: Luca Weiss Subject: Re: [PATCH 1/4] pylibfdt: add Property.as_stringlist() Date: Thu, 20 Jan 2022 20:24:31 +0100 Message-ID: <12903489.O9o76ZdvQC@g550jk> References: <20211225132558.167123-1-luca@z3ntu.xyz> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=z3ntu.xyz; s=z3ntu; t=1642706671; bh=vRkcR8FiSNSjNeU6mUQcD6y2MM6eHRfcI/W3vcYwDGs=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=EZZSWoLcYOmQUKqcCoxSR0QX3wduhKudqxmSL48ksfS0ZzGxWJ0nJ14vjEDTw2Bi6 CJBdt1/bRFDGgm4ItBRdHJkx6Go8dqbzWPtJtR1PUsV7BecoR/kRDE1x34nYklfl7z GYtqtyhE5X3TIuYo6MieGibQsMLW4G4SC1xFbwkc= In-Reply-To: List-ID: To: Rob Herring , David Gibson Cc: Devicetree Compiler Hi, On Dienstag, 18. J=E4nner 2022 11:08:22 CET David Gibson wrote: > On Wed, Jan 05, 2022 at 04:48:31PM -0600, Rob Herring wrote: > > On Sat, Dec 25, 2021 at 7:26 AM Luca Weiss wrote: > > > Add a new method for decoding a string list property, useful for e.g. > > > the "reg-names" property. > > >=20 > > > Also add a test for the new method. > > >=20 > > > Signed-off-by: Luca Weiss > > > --- > > >=20 > > > pylibfdt/libfdt.i | 7 +++++++ > > > tests/pylibfdt_tests.py | 8 ++++++++ > > > 2 files changed, 15 insertions(+) > > >=20 > > > diff --git a/pylibfdt/libfdt.i b/pylibfdt/libfdt.i > > > index 9ccc57b..c81b504 100644 > > > --- a/pylibfdt/libfdt.i > > > +++ b/pylibfdt/libfdt.i > > >=20 > > > @@ -724,6 +724,13 @@ class Property(bytearray): > > > raise ValueError('Property contains embedded nul > > > characters') > > > =20 > > > return self[:-1].decode('utf-8') > > >=20 > > > + def as_stringlist(self): > > > + """Unicode is supported by decoding from UTF-8""" > > > + if self[-1] !=3D 0: > > > + raise ValueError('Property lacks nul termination') > > > + parts =3D self[:-1].split(b'\x00') > > > + return list(map(lambda x: x.decode('utf-8'), parts)) > >=20 > > Doesn't this result in multiple decode() calls when a single one would > > work: > >=20 > > return data[:-1].decode(encoding=3D'ascii').split('\0') >=20 > Uh.. I guess? I feel like the split-then-decode makes more logical > sense, since it's splitting a bytestring, then decoding the pieces as > utf-8 strings. That makes sense to me given that raw properties are > bytestrings and can included multiple different datatypes and > encodings in general. >=20 > In this specific case, decode-then-split would be fine as well, since > \u00000 works as a separator unambiguously, but it still seems > conceptually muddier to me. The reason I made it this way was mostly because I didn't know you could ha= ve=20 null bytes present in str.decode, I just remember horrible UnicodeDecodeErr= ors=20 on invalid input from other projects. If wanted I can make a patch changing to just one str.decode call as yes, i= t's=20 surely more efficient than doing multiple. But it's also not like there wil= l be=20 100 parts of this property that's being decoded in a performance critical=20 application so I think it's also okay like that. Let me know what you think. Regards Luca