[XEN PATCH v3 0/1] xen: rework compat headers generation

All of lore.kernel.org
 help / color / mirror / Atom feed

* [XEN PATCH v3 0/1] xen: rework compat headers generation
@ 2023-01-16 18:10 Anthony PERARD
  2023-01-16 18:10 ` [XEN PATCH v3 1/1] build: replace get-fields.sh by a python script Anthony PERARD
  0 siblings, 1 reply; 12+ messages in thread
From: Anthony PERARD @ 2023-01-16 18:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Anthony PERARD, George Dunlap, Julien Grall, Andrew Cooper,
	Wei Liu, Stefano Stabellini, Jan Beulich

Patch series available in this git branch:
https://xenbits.xen.org/git-http/people/aperard/xen-unstable.git br.build-system-xen-include-rework-v3

v3:
- Rewrite script into python instead of perl.
  (last patch of the series)

v2:
- new patch [1/4] to fix issue with command line that can be way too long
- other small changes, and reorder patches

Hi,

This patch series is about 2 improvement. First one is to use $(if_changed, )
in "include/Makefile" to make the generation of the compat headers less verbose
and to have the command line part of the decision to rebuild the headers.
Second one is to replace one slow script by a much faster one, and save time
when generating the headers.

Thanks.

Anthony PERARD (1):
  build: replace get-fields.sh by a python script

 xen/include/Makefile            |   6 +-
 xen/tools/compat-xlat-header.py | 468 ++++++++++++++++++++++++++++
 xen/tools/get-fields.sh         | 528 --------------------------------
 3 files changed, 470 insertions(+), 532 deletions(-)
 create mode 100644 xen/tools/compat-xlat-header.py
 delete mode 100644 xen/tools/get-fields.sh

-- 
Anthony PERARD



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [XEN PATCH v3 1/1] build: replace get-fields.sh by a python script
  2023-01-16 18:10 [XEN PATCH v3 0/1] xen: rework compat headers generation Anthony PERARD
@ 2023-01-16 18:10 ` Anthony PERARD
  2023-01-17 16:07   ` Luca Fancellu
  2023-01-17 17:21   ` Andrew Cooper
  0 siblings, 2 replies; 12+ messages in thread
From: Anthony PERARD @ 2023-01-16 18:10 UTC (permalink / raw)
  To: xen-devel
  Cc: Anthony PERARD, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

The get-fields.sh which generate all the include/compat/.xlat/*.h
headers is quite slow. It takes for example nearly 3 seconds to
generate platform.h on a recent machine, or 2.3 seconds for memory.h.

Rewriting the mix of shell/sed/python into a single python script make
the generation of those file a lot faster.

No functional change, the headers generated are identical.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
---

Notes:
    To test the header generation, I've submit a branch to gitlab ci,
    where all the headers where generated, and for each one both the shell
    script and the python script where run and the result of both
    compared.
    
    v3:
        convert to python script instead of perl
        - this should allow more developper do be able to review/edit it.
        - it avoid adding a dependency on perl for the hypervisor build.
    
        It can be twice as slow than the perl version, but overall, when doing
        a build with make, there isn't much difference between the perl and
        python script.
        We might be able to speed the python script up by precompiling the
        many regex, but it's probably not worth it. (python already have a
        cache of compiled regex, but I think it's small, maybe 10 or so)
    
    v2:
    - Add .pl extension to the perl script
    - remove "-w" from the shebang as it is duplicate of "use warning;"
    - Add a note in the commit message that the "headers generated are identical".

 xen/include/Makefile            |   6 +-
 xen/tools/compat-xlat-header.py | 468 ++++++++++++++++++++++++++++
 xen/tools/get-fields.sh         | 528 --------------------------------
 3 files changed, 470 insertions(+), 532 deletions(-)
 create mode 100644 xen/tools/compat-xlat-header.py
 delete mode 100644 xen/tools/get-fields.sh

diff --git a/xen/include/Makefile b/xen/include/Makefile
index 65be310eca..b950423efe 100644
--- a/xen/include/Makefile
+++ b/xen/include/Makefile
@@ -60,9 +60,7 @@ cmd_compat_c = \
 
 quiet_cmd_xlat_headers = GEN     $@
 cmd_xlat_headers = \
-    while read what name; do \
-        $(SHELL) $(srctree)/tools/get-fields.sh "$$what" compat_$$name $< || exit $$?; \
-    done <$(patsubst $(obj)/compat/%,$(obj)/compat/.xlat/%,$(basename $<)).lst >$@.new; \
+    $(PYTHON) $(srctree)/tools/compat-xlat-header.py $< $(patsubst $(obj)/compat/%,$(obj)/compat/.xlat/%,$(basename $<)).lst > $@.new; \
     mv -f $@.new $@
 
 targets += $(headers-y)
@@ -80,7 +78,7 @@ $(obj)/compat/%.c: $(src)/public/%.h $(srcdir)/xlat.lst $(srctree)/tools/compat-
 	$(call if_changed,compat_c)
 
 targets += $(patsubst compat/%, compat/.xlat/%, $(headers-y))
-$(obj)/compat/.xlat/%.h: $(obj)/compat/%.h $(obj)/compat/.xlat/%.lst $(srctree)/tools/get-fields.sh FORCE
+$(obj)/compat/.xlat/%.h: $(obj)/compat/%.h $(obj)/compat/.xlat/%.lst $(srctree)/tools/compat-xlat-header.py FORCE
 	$(call if_changed,xlat_headers)
 
 quiet_cmd_xlat_lst = GEN     $@
diff --git a/xen/tools/compat-xlat-header.py b/xen/tools/compat-xlat-header.py
new file mode 100644
index 0000000000..c1b361ac56
--- /dev/null
+++ b/xen/tools/compat-xlat-header.py
@@ -0,0 +1,468 @@
+#!/usr/bin/env python
+
+from __future__ import print_function
+import re
+import sys
+
+typedefs = []
+
+def removeprefix(s, prefix):
+    if s.startswith(prefix):
+        return s[len(prefix):]
+    return s
+
+def removesuffix(s, suffix):
+    if s.endswith(suffix):
+        return s[:-len(suffix)]
+    return s
+
+def get_fields(looking_for, header_tokens):
+    level = 1
+    aggr = 0
+    fields = []
+    name = ''
+
+    for token in header_tokens:
+        if token in ['struct', 'union']:
+            if level == 1:
+                aggr = 1
+                fields = []
+                name = ''
+        elif token == '{':
+            level += 1
+        elif token == '}':
+            level -= 1
+            if level == 1 and name == looking_for:
+                fields.append(token)
+                return fields
+        elif re.match(r'^[a-zA-Z_]', token):
+            if not (aggr == 0 or name != ''):
+                name = token
+
+        if aggr != 0:
+            fields.append(token)
+
+    return []
+
+def get_typedefs(tokens):
+    level = 1
+    state = 0
+    typedefs = []
+    for token in tokens:
+        if token == 'typedef':
+            if level == 1:
+                state = 1
+        elif re.match(r'^COMPAT_HANDLE\((.*)\)$', token):
+            if not (level != 1 or state != 1):
+                state = 2
+        elif token in ['[', '{']:
+            level += 1
+        elif token in [']', '}']:
+            level -= 1
+        elif token == ';':
+            if level == 1:
+                state = 0
+        elif re.match(r'^[a-zA-Z_]', token):
+            if not (level != 1 or state != 2):
+                typedefs.append(token)
+    return typedefs
+
+def build_enums(name, tokens):
+    level = 1
+    kind = ''
+    named = ''
+    fields = []
+    members = []
+    id = ''
+
+    for token in tokens:
+        if token in ['struct', 'union']:
+            if not level != 2:
+                fields = ['']
+            kind = "%s;%s" % (token, kind)
+        elif token == '{':
+            level += 1
+        elif token == '}':
+            level -= 1
+            if level == 1:
+                subkind = kind
+                (subkind, _, _) = subkind.partition(';')
+                if subkind == 'union':
+                    print("\nenum XLAT_%s {" % (name,))
+                    for m in members:
+                        print("    XLAT_%s_%s," % (name, m))
+                    print("};")
+                return
+            elif level == 2:
+                named = '?'
+        elif re.match(r'^[a-zA-Z_]', token):
+            id = token
+            k = kind
+            (_, _, k) = k.partition(';')
+            if named != '' and k != '':
+                if len(fields) > 0 and fields[0] == '':
+                    fields.pop(0)
+                build_enums("%s_%s" % (name, token), fields)
+                named = '!'
+        elif token == ',':
+            if level == 2:
+                members.append(id)
+        elif token == ';':
+            if level == 2:
+                members.append(id)
+            if named != '':
+                (_, _, kind) = kind.partition(';')
+            named = ''
+        if len(fields) != 0:
+            fields.append(token)
+
+def handle_field(prefix, name, id, type, fields):
+    if len(fields) == 0:
+        print(" \\")
+        if type == '':
+            print("%s(_d_)->%s = (_s_)->%s;" % (prefix, id, id), end='')
+        else:
+            k = id.replace('.', '_')
+            print("%sXLAT_%s_HNDL_%s(_d_, _s_);" % (prefix, name, k), end='')
+    elif not re.search(r'[{}]', ' '.join(fields)):
+        tag = ' '.join(fields)
+        tag = re.sub(r'\s*(struct|union)\s+(compat_)?(\w+)\s.*', '\\3', tag)
+        print(" \\")
+        print("%sXLAT_%s(&(_d_)->%s, &(_s_)->%s);" % (prefix, tag, id, id), end='')
+    else:
+        func_id = id
+        func_tokens = fields
+        kind = ''
+        array = ""
+        level = 1
+        arrlvl = 1
+        array_type = ''
+        id = ''
+        type = ''
+        fields = []
+        for token in func_tokens:
+            if token in ['struct', 'union']:
+                if level == 2:
+                    fields = ['']
+                if level == 1:
+                    kind = token
+                    if kind == 'union':
+                        tmp = func_id.replace('.', '_')
+                        print(" \\")
+                        print("%sswitch (%s) {" % (prefix, tmp), end='')
+            elif token == '{':
+                level += 1
+                id = ''
+            elif token == '}':
+                level -= 1
+                id = ''
+                if level == 1 and kind == 'union':
+                    print(" \\")
+                    print("%s}" % (prefix,), end='')
+            elif token == '[':
+                if level != 2 or arrlvl != 1:
+                    pass
+                elif array == '':
+                    array = ' '
+                else:
+                    array = "%s;" % (array,)
+                arrlvl += 1
+            elif token == ']':
+                arrlvl -= 1
+            elif re.match(r'^COMPAT_HANDLE\((.*)\)$', token):
+                if level == 2 and id == '':
+                    m = re.match(r'^COMPAT_HANDLE\((.*)\)$', token)
+                    type = m.groups()[0]
+                    type = removeprefix(type, 'compat_')
+            elif token == "compat_domain_handle_t":
+                if level == 2 and id == '':
+                    array_type = token
+            elif re.match(r'^[a-zA-Z_]', token):
+                if id == '' and type == '' and array_type == '':
+                    for id in typedefs:
+                        if id == token:
+                            type = id
+                    if type == '':
+                        id = token
+                    else:
+                        id = ''
+                else:
+                    id = token
+            elif token in [',', ';']:
+                if level == 2 and not re.match(r'^_pad\d*$', id):
+                    if kind == 'union':
+                        tmp = "%s.%s" % (func_id, id)
+                        tmp = tmp.replace('.', '_')
+                        print(" \\")
+                        print("%scase XLAT_%s_%s:" % (prefix, name, tmp), end='')
+                        if len(fields) > 0 and fields[0] == '':
+                            fields.pop(0)
+                        handle_field("%s    " % (prefix,), name, "%s.%s" % (func_id, id), type, fields)
+                    elif array == '' and array_type == '':
+                        if len(fields) > 0 and fields[0] == '':
+                            fields.pop(0)
+                        handle_field(prefix, name, "%s.%s" % (func_id, id), type, fields)
+                    elif array == '':
+                        copy_array("    ", "%s.%s" % (func_id, id))
+                    else:
+                        (_, _, array) = array.partition(';')
+                        if len(fields) > 0 and fields[0] == '':
+                            fields.pop(0)
+                        handle_array(prefix, name, "{func_id}.{id}", array, type, fields)
+                    if token == ';':
+                        fields = []
+                        id = ''
+                        type = ''
+                    array = ''
+                    if kind == 'union':
+                        print(" \\")
+                        print("%s    break;" % (prefix,), end='')
+            else:
+                if array != '':
+                    array = "%s %s" % (array, token)
+            if len(fields) > 0:
+                fields.append(token)
+
+def copy_array(prefix, id):
+    print(" \\")
+    print("%sif ((_d_)->%s != (_s_)->%s) \\" % (prefix, id, id))
+    print("%s    memcpy((_d_)->%s, (_s_)->%s, sizeof((_d_)->%s));" % (prefix, id, id, id), end='')
+
+def handle_array(prefix, name, id, array, type, fields):
+    i = re.sub(r'[^;]', '', array)
+    i = "i%s" % (len(i),)
+
+    print(" \\")
+    print("%s{ \\" % (prefix,))
+    print("%s    unsigned int %s; \\" % (prefix, i))
+    (head, _, tail) = array.partition(';')
+    head = head.strip()
+    print("%s    for (%s = 0; %s < %s; ++%s) {" % (prefix, i, i, head, i), end='')
+    if not ';' in array:
+        handle_field("%s        " % (prefix,), name, "%s[%s]" % (id, i), type, fields)
+    else:
+        handle_array("%s        " % (prefix,) , name, "%s[%s]" % (id, i), tail, type, fields)
+    print(" \\")
+    print("%s    } \\" % (prefix,))
+    print("%s}" % (prefix,), end='')
+
+def build_body(name, tokens):
+    level = 1
+    id = ''
+    array = ''
+    arrlvl = 1
+    array_type = ''
+    type = ''
+    fields = []
+
+    print("\n#define XLAT_%s(_d_, _s_) do {" % (name,), end='')
+
+    for token in tokens:
+        if token in ['struct', 'union']:
+            if level == 2:
+                fields = ['']
+        elif token == '{':
+            level += 1
+            id = ''
+        elif token == '}':
+            level -= 1
+            id = ''
+        elif token == '[':
+            if level != 2 or arrlvl != 1:
+                pass
+            elif array == '':
+                array = ' '
+            else:
+                array = "%s;" % (array,)
+            arrlvl += 1
+        elif token == ']':
+            arrlvl -= 1
+        elif re.match(r'^COMPAT_HANDLE\((.*)\)$', token):
+            if level == 2 and id == '':
+                m = re.match(r'^COMPAT_HANDLE\((.*)\)$', token)
+                type = m.groups()[0]
+                type = removeprefix(type, 'compat_')
+        elif token == "compat_domain_handle_t":
+            if level == 2 and id == '':
+                array_type = token
+        elif re.match(r'^[a-zA-Z_]', token):
+            if array != '':
+                array = "%s %s" % (array, token)
+            elif id == '' and type == '' and array_type == '':
+                for id in typedefs:
+                    if id != token:
+                        type = id
+                if type == '':
+                    id = token
+                else:
+                    id = ''
+            else:
+                id = token
+        elif token in [',', ';']:
+            if level == 2 and not re.match(r'^_pad\d*$', id):
+                if array == '' and array_type == '':
+                    if len(fields) > 0 and fields[0] == '':
+                        fields.pop(0)
+                    handle_field("    ", name, id, type, fields)
+                elif array == '':
+                    copy_array("    ", id)
+                else:
+                    (head, sep, tmp) = array.partition(';')
+                    if sep == '':
+                        tmp = head
+                    if len(fields) > 0 and fields[0] == '':
+                        fields.pop(0)
+                    handle_array("    ", name, id, tmp, type, fields)
+                if token == ';':
+                    fields = []
+                    id = ''
+                    type = ''
+                array = ''
+        else:
+            if array != '':
+                array = "%s %s" % (array, token)
+        if len(fields) > 0:
+            fields.append(token)
+    print(" \\\n} while (0)")
+
+def check_field(kind, name, field, extrafields):
+    if not re.search(r'[{}]', ' '.join(extrafields)):
+        print("; \\")
+        if len(extrafields) != 0:
+            for token in extrafields:
+                if token in ['struct', 'union']:
+                    pass
+                elif re.match(r'^[a-zA-Z_]', token):
+                    print("    CHECK_%s" % (removeprefix(token, 'xen_'),), end='')
+                    break
+                else:
+                    raise Exception("Malformed compound declaration: '%s'" % (token,))
+        elif not '.' in field:
+            print("    CHECK_FIELD_(%s, %s, %s)" % (kind, name, field), end='')
+        else:
+            n = field.count('.')
+            field = field.replace('.', ', ')
+            print("    CHECK_SUBFIELD_%s_(%s, %s, %s)" % (n, kind, name, field), end='')
+    else:
+        level = 1
+        fields = []
+        id = ''
+
+        for token in extrafields:
+            if token in ['struct', 'union']:
+                if level == 2:
+                    fields = ['']
+            elif token == '{':
+                level += 1
+                id = ''
+            elif token == '}':
+                level -= 1
+                id = ''
+            elif re.match(r'^compat_.*_t$', token):
+                if level == 2:
+                    fields = ['']
+                    token = removesuffix(token, '_t')
+                    token = removeprefix(token, 'compat_')
+            elif re.match(r'^evtchn_.*_compat_t$', token):
+                if level == 2 and token != "evtchn_port_compat_t":
+                    fields = ['']
+                    token = removesuffix(token, '_compat_t')
+            elif re.match(r'^[a-zA-Z_]', token):
+                id = token
+            elif token in [',', ';']:
+                if level == 2 and not re.match(r'^_pad\d*$', id):
+                    if len(fields) > 0 and fields[0] == '':
+                        fields.pop(0)
+                    check_field(kind, name, "%s.%s" % (field, id), fields)
+                    if token == ";":
+                        fields = []
+                        id = ''
+            if len(fields) > 0:
+                fields.append(token)
+
+def build_check(name, tokens):
+    level = 1
+    fields = []
+    kind = ''
+    id = ''
+    arrlvl = 1
+
+    print("")
+    print("#define CHECK_%s \\" % (name,))
+
+    for token in tokens:
+        if token in ['struct', 'union']:
+            if level == 1:
+                kind = token
+                print("    CHECK_SIZE_(%s, %s)" % (kind, name), end='')
+            elif level == 2:
+                fields = ['']
+        elif token == '{':
+            level += 1
+            id = ''
+        elif token == '}':
+            level -= 1
+            id = ''
+        elif token == '[':
+            arrlvl += 1
+        elif token == ']':
+            arrlvl -= 1
+        elif re.match(r'^compat_.*_t$', token):
+            if level == 2 and token != "compat_argo_port_t":
+                fields = ['']
+                token = removesuffix(token, '_t')
+                token = removeprefix(token, 'compat_')
+        elif re.match(r'^[a-zA-Z_]', token):
+            if not (level != 2 or arrlvl != 1):
+                id = token
+        elif token in [',', ';']:
+            if level == 2 and not re.match(r'^_pad\d*$', id):
+                if len(fields) > 0 and fields[0] == '':
+                    fields.pop(0)
+                check_field(kind, name, id, fields)
+                if token == ";":
+                    fields = []
+                    id = ''
+
+        if len(fields) > 0:
+            fields.append(token)
+    print("")
+
+
+def main():
+    header_tokens = []
+
+    with open(sys.argv[1]) as header:
+        for line in header:
+            if re.match(r'^\s*(#|$)', line):
+                continue
+            line = re.sub(r'([\]\[,;:{}])', ' \\1 ', line)
+            line = line.strip()
+            header_tokens += re.split(r'\s+', line)
+
+    global typedefs
+    typedefs = get_typedefs(header_tokens)
+
+    with open(sys.argv[2]) as compat_list:
+        for line in compat_list:
+            words = re.split(r'\s+', line, maxsplit=1)
+            what = words[0]
+            name = words[1]
+
+            name = removeprefix(name, 'xen')
+            name = name.strip()
+
+            fields = get_fields("compat_%s" % (name,), header_tokens)
+            if len(fields) == 0:
+                raise Exception("Fields of 'compat_%s' not found in '%s'" % (name, sys.argv[1]))
+
+            if what == "!":
+                build_enums(name, fields)
+                build_body(name, fields)
+            elif what == "?":
+                build_check(name, fields)
+            else:
+                raise Exception("Invalid translation indicator: '%s'" % (what,))
+
+if __name__ == '__main__':
+    main()
diff --git a/xen/tools/get-fields.sh b/xen/tools/get-fields.sh
deleted file mode 100644
index 002db2093f..0000000000
--- a/xen/tools/get-fields.sh
+++ /dev/null
@@ -1,528 +0,0 @@
-test -n "$1" -a -n "$2" -a -n "$3"
-set -ef
-
-SED=sed
-if test -x /usr/xpg4/bin/sed; then
-	SED=/usr/xpg4/bin/sed
-fi
-if test -z ${PYTHON}; then
-	PYTHON=`/usr/bin/env python`
-fi
-if test -z ${PYTHON}; then
-	echo "Python not found"
-	exit 1
-fi
-
-get_fields ()
-{
-	local level=1 aggr=0 name= fields=
-	for token in $2
-	do
-		case "$token" in
-		struct|union)
-			test $level != 1 || aggr=1 fields= name=
-			;;
-		"{")
-			level=$(expr $level + 1)
-			;;
-		"}")
-			level=$(expr $level - 1)
-			if [ $level = 1 -a $name = $1 ]
-			then
-				echo "$fields }"
-				return 0
-			fi
-			;;
-		[a-zA-Z_]*)
-			test $aggr = 0 -o -n "$name" || name="$token"
-			;;
-		esac
-		test $aggr = 0 || fields="$fields $token"
-	done
-}
-
-get_typedefs ()
-{
-	local level=1 state=
-	for token in $1
-	do
-		case "$token" in
-		typedef)
-			test $level != 1 || state=1
-			;;
-		COMPAT_HANDLE\(*\))
-			test $level != 1 -o "$state" != 1 || state=2
-			;;
-		[\{\[])
-			level=$(expr $level + 1)
-			;;
-		[\}\]])
-			level=$(expr $level - 1)
-			;;
-		";")
-			test $level != 1 || state=
-			;;
-		[a-zA-Z_]*)
-			test $level != 1 -o "$state" != 2 || echo "$token"
-			;;
-		esac
-	done
-}
-
-build_enums ()
-{
-	local level=1 kind= fields= members= named= id= token
-	for token in $2
-	do
-		case "$token" in
-		struct|union)
-			test $level != 2 || fields=" "
-			kind="$token;$kind"
-			;;
-		"{")
-			level=$(expr $level + 1)
-			;;
-		"}")
-			level=$(expr $level - 1)
-			if [ $level = 1 ]
-			then
-				if [ "${kind%%;*}" = union ]
-				then
-					echo
-					echo "enum XLAT_$1 {"
-					for m in $members
-					do
-						echo "    XLAT_${1}_$m,"
-					done
-					echo "};"
-				fi
-				return 0
-			elif [ $level = 2 ]
-			then
-				named='?'
-			fi
-			;;
-		[a-zA-Z]*)
-			id=$token
-			if [ -n "$named" -a -n "${kind#*;}" ]
-			then
-				build_enums ${1}_$token "$fields"
-				named='!'
-			fi
-			;;
-		",")
-			test $level != 2 || members="$members $id"
-			;;
-		";")
-			test $level != 2 || members="$members $id"
-			test -z "$named" || kind=${kind#*;}
-			named=
-			;;
-		esac
-		test -z "$fields" || fields="$fields $token"
-	done
-}
-
-handle_field ()
-{
-	if [ -z "$5" ]
-	then
-		echo " \\"
-		if [ -z "$4" ]
-		then
-			printf %s "$1(_d_)->$3 = (_s_)->$3;"
-		else
-			printf %s "$1XLAT_${2}_HNDL_$(echo $3 | $SED 's,\.,_,g')(_d_, _s_);"
-		fi
-	elif [ -z "$(echo "$5" | $SED 's,[^{}],,g')" ]
-	then
-		local tag=$(echo "$5" | ${PYTHON} -c '
-import re,sys
-for line in sys.stdin.readlines():
-    sys.stdout.write(re.subn(r"\s*(struct|union)\s+(compat_)?(\w+)\s.*", r"\3", line)[0].rstrip() + "\n")
-')
-		echo " \\"
-		printf %s "${1}XLAT_$tag(&(_d_)->$3, &(_s_)->$3);"
-	else
-		local level=1 kind= fields= id= array= arrlvl=1 array_type= type= token
-		for token in $5
-		do
-			case "$token" in
-			struct|union)
-				test $level != 2 || fields=" "
-				if [ $level = 1 ]
-				then
-					kind=$token
-					if [ $kind = union ]
-					then
-						echo " \\"
-						printf %s "${1}switch ($(echo $3 | $SED 's,\.,_,g')) {"
-					fi
-				fi
-				;;
-			"{")
-				level=$(expr $level + 1) id=
-				;;
-			"}")
-				level=$(expr $level - 1) id=
-				if [ $level = 1 -a $kind = union ]
-				then
-					echo " \\"
-					printf %s "$1}"
-				fi
-				;;
-			"[")
-				if [ $level != 2 -o $arrlvl != 1 ]
-				then
-					:
-				elif [ -z "$array" ]
-				then
-					array=" "
-				else
-					array="$array;"
-				fi
-				arrlvl=$(expr $arrlvl + 1)
-				;;
-			"]")
-				arrlvl=$(expr $arrlvl - 1)
-				;;
-			COMPAT_HANDLE\(*\))
-				if [ $level = 2 -a -z "$id" ]
-				then
-					type=${token#COMPAT_HANDLE?}
-					type=${type%?}
-					type=${type#compat_}
-				fi
-				;;
-			compat_domain_handle_t)
-				if [ $level = 2 -a -z "$id" ]
-				then
-					array_type=$token
-				fi
-				;;
-			[a-zA-Z]*)
-				if [ -z "$id" -a -z "$type" -a -z "$array_type" ]
-				then
-					for id in $typedefs
-					do
-						test $id != "$token" || type=$id
-					done
-					if [ -z "$type" ]
-					then
-						id=$token
-					else
-						id=
-					fi
-				else
-					id=$token
-				fi
-				;;
-			[\,\;])
-				if [ $level = 2 -a -n "$(echo $id | $SED 's,^_pad[[:digit:]]*,,')" ]
-				then
-					if [ $kind = union ]
-					then
-						echo " \\"
-						printf %s "${1}case XLAT_${2}_$(echo $3.$id | $SED 's,\.,_,g'):"
-						handle_field "$1    " $2 $3.$id "$type" "$fields"
-					elif [ -z "$array" -a -z "$array_type" ]
-					then
-						handle_field "$1" $2 $3.$id "$type" "$fields"
-					elif [ -z "$array" ]
-					then
-						copy_array "    " $3.$id
-					else
-						handle_array "$1" $2 $3.$id "${array#*;}" "$type" "$fields"
-					fi
-					test "$token" != ";" || fields= id= type=
-					array=
-					if [ $kind = union ]
-					then
-						echo " \\"
-						printf %s "$1    break;"
-					fi
-				fi
-				;;
-			*)
-				if [ -n "$array" ]
-				then
-					array="$array $token"
-				fi
-				;;
-			esac
-			test -z "$fields" || fields="$fields $token"
-		done
-	fi
-}
-
-copy_array ()
-{
-	echo " \\"
-	echo "${1}if ((_d_)->$2 != (_s_)->$2) \\"
-	printf %s "$1    memcpy((_d_)->$2, (_s_)->$2, sizeof((_d_)->$2));"
-}
-
-handle_array ()
-{
-	local i="i$(echo $4 | $SED 's,[^;], ,g' | wc -w | $SED 's,[[:space:]]*,,g')"
-	echo " \\"
-	echo "$1{ \\"
-	echo "$1    unsigned int $i; \\"
-	printf %s "$1    for ($i = 0; $i < "${4%%;*}"; ++$i) {"
-	if [ "$4" = "${4#*;}" ]
-	then
-		handle_field "$1        " $2 $3[$i] "$5" "$6"
-	else
-		handle_array "$1        " $2 $3[$i] "${4#*;}" "$5" "$6"
-	fi
-	echo " \\"
-	echo "$1    } \\"
-	printf %s "$1}"
-}
-
-build_body ()
-{
-	echo
-	printf %s "#define XLAT_$1(_d_, _s_) do {"
-	local level=1 fields= id= array= arrlvl=1 array_type= type= token
-	for token in $2
-	do
-		case "$token" in
-		struct|union)
-			test $level != 2 || fields=" "
-			;;
-		"{")
-			level=$(expr $level + 1) id=
-			;;
-		"}")
-			level=$(expr $level - 1) id=
-			;;
-		"[")
-			if [ $level != 2 -o $arrlvl != 1 ]
-			then
-				:
-			elif [ -z "$array" ]
-			then
-				array=" "
-			else
-				array="$array;"
-			fi
-			arrlvl=$(expr $arrlvl + 1)
-			;;
-		"]")
-			arrlvl=$(expr $arrlvl - 1)
-			;;
-		COMPAT_HANDLE\(*\))
-			if [ $level = 2 -a -z "$id" ]
-			then
-				type=${token#COMPAT_HANDLE?}
-				type=${type%?}
-				type=${type#compat_}
-			fi
-			;;
-		compat_domain_handle_t)
-			if [ $level = 2 -a -z "$id" ]
-			then
-				array_type=$token
-			fi
-			;;
-		[a-zA-Z_]*)
-			if [ -n "$array" ]
-			then
-				array="$array $token"
-			elif [ -z "$id" -a -z "$type" -a -z "$array_type" ]
-			then
-				for id in $typedefs
-				do
-					test $id != "$token" || type=$id
-				done
-				if [ -z "$type" ]
-				then
-					id=$token
-				else
-					id=
-				fi
-			else
-				id=$token
-			fi
-			;;
-		[\,\;])
-			if [ $level = 2 -a -n "$(echo $id | $SED 's,^_pad[[:digit:]]*,,')" ]
-			then
-				if [ -z "$array" -a -z "$array_type" ]
-				then
-					handle_field "    " $1 $id "$type" "$fields"
-				elif [ -z "$array" ]
-				then
-					copy_array "    " $id
-				else
-					handle_array "    " $1 $id "${array#*;}" "$type" "$fields"
-				fi
-				test "$token" != ";" || fields= id= type=
-				array=
-			fi
-			;;
-		*)
-			if [ -n "$array" ]
-			then
-				array="$array $token"
-			fi
-			;;
-		esac
-		test -z "$fields" || fields="$fields $token"
-	done
-	echo " \\"
-	echo "} while (0)"
-}
-
-check_field ()
-{
-	if [ -z "$(echo "$4" | $SED 's,[^{}],,g')" ]
-	then
-		echo "; \\"
-		local n=$(echo $3 | $SED 's,[^.], ,g' | wc -w | $SED 's,[[:space:]]*,,g')
-		if [ -n "$4" ]
-		then
-			for n in $4
-			do
-				case $n in
-				struct|union)
-					;;
-				[a-zA-Z_]*)
-					printf %s "    CHECK_${n#xen_}"
-					break
-					;;
-				*)
-					echo "Malformed compound declaration: '$n'" >&2
-					exit 1
-					;;
-				esac
-			done
-		elif [ $n = 0 ]
-		then
-			printf %s "    CHECK_FIELD_($1, $2, $3)"
-		else
-			printf %s "    CHECK_SUBFIELD_${n}_($1, $2, $(echo $3 | $SED 's!\.!, !g'))"
-		fi
-	else
-		local level=1 fields= id= token
-		for token in $4
-		do
-			case "$token" in
-			struct|union)
-				test $level != 2 || fields=" "
-				;;
-			"{")
-				level=$(expr $level + 1) id=
-				;;
-			"}")
-				level=$(expr $level - 1) id=
-				;;
-			compat_*_t)
-				if [ $level = 2 ]
-				then
-					fields=" "
-					token="${token%_t}"
-					token="${token#compat_}"
-				fi
-				;;
-			evtchn_*_compat_t)
-				if [ $level = 2 -a $token != evtchn_port_compat_t ]
-				then
-					fields=" "
-					token="${token%_compat_t}"
-				fi
-				;;
-			[a-zA-Z]*)
-				id=$token
-				;;
-			[\,\;])
-				if [ $level = 2 -a -n "$(echo $id | $SED 's,^_pad[[:digit:]]*,,')" ]
-				then
-					check_field $1 $2 $3.$id "$fields"
-					test "$token" != ";" || fields= id=
-				fi
-				;;
-			esac
-			test -z "$fields" || fields="$fields $token"
-		done
-	fi
-}
-
-build_check ()
-{
-	echo
-	echo "#define CHECK_$1 \\"
-	local level=1 fields= kind= id= arrlvl=1 token
-	for token in $2
-	do
-		case "$token" in
-		struct|union)
-			if [ $level = 1 ]
-			then
-				kind=$token
-				printf %s "    CHECK_SIZE_($kind, $1)"
-			elif [ $level = 2 ]
-			then
-				fields=" "
-			fi
-			;;
-		"{")
-			level=$(expr $level + 1) id=
-			;;
-		"}")
-			level=$(expr $level - 1) id=
-			;;
-		"[")
-			arrlvl=$(expr $arrlvl + 1)
-			;;
-		"]")
-			arrlvl=$(expr $arrlvl - 1)
-			;;
-		compat_*_t)
-			if [ $level = 2 -a $token != compat_argo_port_t ]
-			then
-				fields=" "
-				token="${token%_t}"
-				token="${token#compat_}"
-			fi
-			;;
-		[a-zA-Z_]*)
-			test $level != 2 -o $arrlvl != 1 || id=$token
-			;;
-		[\,\;])
-			if [ $level = 2 -a -n "$(echo $id | $SED 's,^_pad[[:digit:]]*,,')" ]
-			then
-				check_field $kind $1 $id "$fields"
-				test "$token" != ";" || fields= id=
-			fi
-			;;
-		esac
-		test -z "$fields" || fields="$fields $token"
-	done
-	echo ""
-}
-
-list="$($SED -e 's,^[[:space:]]#.*,,' -e 's!\([]\[,;:{}]\)! \1 !g' $3)"
-fields="$(get_fields $(echo $2 | $SED 's,^compat_xen,compat_,') "$list")"
-if [ -z "$fields" ]
-then
-	echo "Fields of '$2' not found in '$3'" >&2
-	exit 1
-fi
-name=${2#compat_}
-name=${name#xen}
-case "$1" in
-"!")
-	typedefs="$(get_typedefs "$list")"
-	build_enums $name "$fields"
-	build_body $name "$fields"
-	;;
-"?")
-	build_check $name "$fields"
-	;;
-*)
-	echo "Invalid translation indicator: '$1'" >&2
-	exit 1
-	;;
-esac
-- 
Anthony PERARD



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [XEN PATCH v3 1/1] build: replace get-fields.sh by a python script
  2023-01-16 18:10 ` [XEN PATCH v3 1/1] build: replace get-fields.sh by a python script Anthony PERARD
@ 2023-01-17 16:07   ` Luca Fancellu
  2023-01-17 16:55     ` Anthony PERARD
  2023-01-17 17:21   ` Andrew Cooper
  1 sibling, 1 reply; 12+ messages in thread
From: Luca Fancellu @ 2023-01-17 16:07 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu



> On 16 Jan 2023, at 18:10, Anthony PERARD <anthony.perard@citrix.com> wrote:
> 
> The get-fields.sh which generate all the include/compat/.xlat/*.h
> headers is quite slow. It takes for example nearly 3 seconds to
> generate platform.h on a recent machine, or 2.3 seconds for memory.h.
> 
> Rewriting the mix of shell/sed/python into a single python script make
> the generation of those file a lot faster.
> 
> No functional change, the headers generated are identical.
> 
> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
> ---
> 
> Notes:
>    To test the header generation, I've submit a branch to gitlab ci,
>    where all the headers where generated, and for each one both the shell
>    script and the python script where run and the result of both
>    compared.
> 
>    v3:
>        convert to python script instead of perl
>        - this should allow more developper do be able to review/edit it.
>        - it avoid adding a dependency on perl for the hypervisor build.
> 
>        It can be twice as slow than the perl version, but overall, when doing
>        a build with make, there isn't much difference between the perl and
>        python script.
>        We might be able to speed the python script up by precompiling the
>        many regex, but it's probably not worth it. (python already have a
>        cache of compiled regex, but I think it's small, maybe 10 or so)
> 
>    v2:
>    - Add .pl extension to the perl script
>    - remove "-w" from the shebang as it is duplicate of "use warning;"
>    - Add a note in the commit message that the "headers generated are identical".
> 
> xen/include/Makefile            |   6 +-
> xen/tools/compat-xlat-header.py | 468 ++++++++++++++++++++++++++++
> xen/tools/get-fields.sh         | 528 --------------------------------
> 3 files changed, 470 insertions(+), 532 deletions(-)
> create mode 100644 xen/tools/compat-xlat-header.py
> delete mode 100644 xen/tools/get-fields.sh
> 
> diff --git a/xen/include/Makefile b/xen/include/Makefile
> index 65be310eca..b950423efe 100644
> --- a/xen/include/Makefile
> +++ b/xen/include/Makefile
> @@ -60,9 +60,7 @@ cmd_compat_c = \
> 
> quiet_cmd_xlat_headers = GEN     $@
> cmd_xlat_headers = \
> -    while read what name; do \
> -        $(SHELL) $(srctree)/tools/get-fields.sh "$$what" compat_$$name $< || exit $$?; \
> -    done <$(patsubst $(obj)/compat/%,$(obj)/compat/.xlat/%,$(basename $<)).lst >$@.new; \
> +    $(PYTHON) $(srctree)/tools/compat-xlat-header.py $< $(patsubst $(obj)/compat/%,$(obj)/compat/.xlat/%,$(basename $<)).lst > $@.new; \
>     mv -f $@.new $@
> 
> targets += $(headers-y)
> @@ -80,7 +78,7 @@ $(obj)/compat/%.c: $(src)/public/%.h $(srcdir)/xlat.lst $(srctree)/tools/compat-
> $(call if_changed,compat_c)
> 
> targets += $(patsubst compat/%, compat/.xlat/%, $(headers-y))
> -$(obj)/compat/.xlat/%.h: $(obj)/compat/%.h $(obj)/compat/.xlat/%.lst $(srctree)/tools/get-fields.sh FORCE
> +$(obj)/compat/.xlat/%.h: $(obj)/compat/%.h $(obj)/compat/.xlat/%.lst $(srctree)/tools/compat-xlat-header.py FORCE
> $(call if_changed,xlat_headers)
> 
> quiet_cmd_xlat_lst = GEN     $@
> diff --git a/xen/tools/compat-xlat-header.py b/xen/tools/compat-xlat-header.py
> new file mode 100644
> index 0000000000..c1b361ac56
> --- /dev/null
> +++ b/xen/tools/compat-xlat-header.py
> @@ -0,0 +1,468 @@
> +#!/usr/bin/env python

Would it make sense to start with python3 since it is a new script?



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [XEN PATCH v3 1/1] build: replace get-fields.sh by a python script
  2023-01-17 16:07   ` Luca Fancellu
@ 2023-01-17 16:55     ` Anthony PERARD
  2023-01-17 16:58       ` Luca Fancellu
  0 siblings, 1 reply; 12+ messages in thread
From: Anthony PERARD @ 2023-01-17 16:55 UTC (permalink / raw)
  To: Luca Fancellu
  Cc: Xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

On Tue, Jan 17, 2023 at 04:07:24PM +0000, Luca Fancellu wrote:
> > On 16 Jan 2023, at 18:10, Anthony PERARD <anthony.perard@citrix.com> wrote:
> > diff --git a/xen/tools/compat-xlat-header.py b/xen/tools/compat-xlat-header.py
> > new file mode 100644
> > index 0000000000..c1b361ac56
> > --- /dev/null
> > +++ b/xen/tools/compat-xlat-header.py
> > @@ -0,0 +1,468 @@
> > +#!/usr/bin/env python
> 
> Would it make sense to start with python3 since it is a new script?

That shebang isn't even used as the script doesn't even have the
execution bit set. So why do you say that the script isn't python3? Not
really asking, just been pedantic :-)

Even if it's a new script, it isn't a new project. We can't depend on
brand new functionality from our dependencies. We need to be able to
build the hypervisor with old build toolchain / distribution.

Anyway, I did start by writing a python3 script in all its glory (or at
least some of the new part of the language that I know about), but I had
to rework it to be able to use it on older distribution. Our centos7
container in our GitLab CI seems to use python2.7.

So I had to stop using str.removeprefix() and I introduce some function
doing the same thing instead (so that works with older than python 3.9).
Then I had to stop using f-strings and use %-formatting instead.
Then use "m.groups()[0]" instead of "m[1]" where "m" is a match result
from re.match() and other.
And use the classing "from __future__ ..." preamble.

Cheers,

-- 
Anthony PERARD

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [XEN PATCH v3 1/1] build: replace get-fields.sh by a python script
  2023-01-17 16:55     ` Anthony PERARD
@ 2023-01-17 16:58       ` Luca Fancellu
  0 siblings, 0 replies; 12+ messages in thread
From: Luca Fancellu @ 2023-01-17 16:58 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu



> On 17 Jan 2023, at 16:55, Anthony PERARD <anthony.perard@citrix.com> wrote:
> 
> On Tue, Jan 17, 2023 at 04:07:24PM +0000, Luca Fancellu wrote:
>>> On 16 Jan 2023, at 18:10, Anthony PERARD <anthony.perard@citrix.com> wrote:
>>> diff --git a/xen/tools/compat-xlat-header.py b/xen/tools/compat-xlat-header.py
>>> new file mode 100644
>>> index 0000000000..c1b361ac56
>>> --- /dev/null
>>> +++ b/xen/tools/compat-xlat-header.py
>>> @@ -0,0 +1,468 @@
>>> +#!/usr/bin/env python
>> 
>> Would it make sense to start with python3 since it is a new script?
> 
> That shebang isn't even used as the script doesn't even have the
> execution bit set. So why do you say that the script isn't python3? Not
> really asking, just been pedantic :-)

Yes I didn’t pay attention to that

> 
> Even if it's a new script, it isn't a new project. We can't depend on
> brand new functionality from our dependencies. We need to be able to
> build the hypervisor with old build toolchain / distribution.
> 
> Anyway, I did start by writing a python3 script in all its glory (or at
> least some of the new part of the language that I know about), but I had
> to rework it to be able to use it on older distribution. Our centos7
> container in our GitLab CI seems to use python2.7.

That makes sense, thanks for the explanation.

> 
> So I had to stop using str.removeprefix() and I introduce some function
> doing the same thing instead (so that works with older than python 3.9).
> Then I had to stop using f-strings and use %-formatting instead.
> Then use "m.groups()[0]" instead of "m[1]" where "m" is a match result
> from re.match() and other.
> And use the classing "from __future__ ..." preamble.
> 
> Cheers,
> 
> -- 
> Anthony PERARD


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [XEN PATCH v3 1/1] build: replace get-fields.sh by a python script
  2023-01-16 18:10 ` [XEN PATCH v3 1/1] build: replace get-fields.sh by a python script Anthony PERARD
  2023-01-17 16:07   ` Luca Fancellu
@ 2023-01-17 17:21   ` Andrew Cooper
  2023-01-18  8:06     ` Jan Beulich
  2023-01-18 18:21     ` Anthony PERARD
  1 sibling, 2 replies; 12+ messages in thread
From: Andrew Cooper @ 2023-01-17 17:21 UTC (permalink / raw)
  To: Anthony Perard, xen-devel@lists.xenproject.org
  Cc: George Dunlap, Jan Beulich, Julien Grall, Stefano Stabellini,
	Wei Liu

On 16/01/2023 6:10 pm, Anthony PERARD wrote:
> The get-fields.sh which generate all the include/compat/.xlat/*.h
> headers is quite slow. It takes for example nearly 3 seconds to
> generate platform.h on a recent machine, or 2.3 seconds for memory.h.
>
> Rewriting the mix of shell/sed/python into a single python script make
> the generation of those file a lot faster.
>
> No functional change, the headers generated are identical.
>
> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>

I'll happily trust the testing you've done, and that the outputs are
equivalent.  However, I have some general python suggestions.

> diff --git a/xen/tools/compat-xlat-header.py b/xen/tools/compat-xlat-header.py
> new file mode 100644
> index 0000000000..c1b361ac56
> --- /dev/null
> +++ b/xen/tools/compat-xlat-header.py
> @@ -0,0 +1,468 @@
> +#!/usr/bin/env python
> +
> +from __future__ import print_function
> +import re
> +import sys
> +
> +typedefs = []
> +
> +def removeprefix(s, prefix):
> +    if s.startswith(prefix):
> +        return s[len(prefix):]
> +    return s
> +
> +def removesuffix(s, suffix):
> +    if s.endswith(suffix):
> +        return s[:-len(suffix)]
> +    return s
> +
> +def get_fields(looking_for, header_tokens):
> +    level = 1
> +    aggr = 0
> +    fields = []
> +    name = ''
> +
> +    for token in header_tokens:
> +        if token in ['struct', 'union']:

('struct', 'union')

A tuple here rather than a list is marginally more efficient.

> +            if level == 1:
> +                aggr = 1
> +                fields = []
> +                name = ''
> +        elif token == '{':
> +            level += 1
> +        elif token == '}':
> +            level -= 1
> +            if level == 1 and name == looking_for:
> +                fields.append(token)
> +                return fields
> +        elif re.match(r'^[a-zA-Z_]', token):

Here and many other places, you've got re.{search,match} with repeated
regexes.   This can end up being quite expensive, and we already had one
build system slowdown caused by it.

Up near typedefs at the top, you want something like:

token_re = re.compile('[a-zA-Z_]')

to prepare the regex once at the start of day, and and use 'elif
token_re.match(token)' here.

Another thing to note, the difference between re.search and re.match is
that match has an implicit '^' anchor.  You need to be aware of this
when compiling one regex to be used with both.


All of this said, where is 0-9 in the token regex?  Have we just got
extremely lucky with having no embedded digits in identifiers thus far?

> +            if not (aggr == 0 or name != ''):
> +                name = token
> +
> +        if aggr != 0:
> +            fields.append(token)
> +
> +    return []
> +
> +def get_typedefs(tokens):
> +    level = 1
> +    state = 0
> +    typedefs = []

This shadows the global typedefs list, but the result of calling this
function is simply assigned to it.  Looking at the code, the global list
is used in several places

It would be better to "global typedefs" here, and make this function
void, unless you want to modify handle_field() and build_body() to take
it in as a regular parameter.

I'm pretty sure typedefs actually wants to be a dict rather than a list
(will have better "id in typedefs" performance lower down), but that
wants matching with code changes elsewhere, and probably wants doing
separately.

> +    for token in tokens:
> +        if token == 'typedef':
> +            if level == 1:
> +                state = 1
> +        elif re.match(r'^COMPAT_HANDLE\((.*)\)$', token):
> +            if not (level != 1 or state != 1):
> +                state = 2
> +        elif token in ['[', '{']:
> +            level += 1
> +        elif token in [']', '}']:
> +            level -= 1
> +        elif token == ';':
> +            if level == 1:
> +                state = 0
> +        elif re.match(r'^[a-zA-Z_]', token):
> +            if not (level != 1 or state != 2):
> +                typedefs.append(token)
> +    return typedefs
> +
> +def build_enums(name, tokens):
> +    level = 1
> +    kind = ''
> +    named = ''
> +    fields = []
> +    members = []
> +    id = ''
> +
> +    for token in tokens:
> +        if token in ['struct', 'union']:
> +            if not level != 2:
> +                fields = ['']
> +            kind = "%s;%s" % (token, kind)
> +        elif token == '{':
> +            level += 1
> +        elif token == '}':
> +            level -= 1
> +            if level == 1:
> +                subkind = kind
> +                (subkind, _, _) = subkind.partition(';')
> +                if subkind == 'union':
> +                    print("\nenum XLAT_%s {" % (name,))
> +                    for m in members:
> +                        print("    XLAT_%s_%s," % (name, m))
> +                    print("};")
> +                return
> +            elif level == 2:
> +                named = '?'
> +        elif re.match(r'^[a-zA-Z_]', token):
> +            id = token
> +            k = kind
> +            (_, _, k) = k.partition(';')
> +            if named != '' and k != '':
> +                if len(fields) > 0 and fields[0] == '':
> +                    fields.pop(0)
> +                build_enums("%s_%s" % (name, token), fields)
> +                named = '!'
> +        elif token == ',':
> +            if level == 2:
> +                members.append(id)
> +        elif token == ';':
> +            if level == 2:
> +                members.append(id)
> +            if named != '':
> +                (_, _, kind) = kind.partition(';')
> +            named = ''
> +        if len(fields) != 0:
> +            fields.append(token)
> +
> +def handle_field(prefix, name, id, type, fields):
> +    if len(fields) == 0:
> +        print(" \\")
> +        if type == '':
> +            print("%s(_d_)->%s = (_s_)->%s;" % (prefix, id, id), end='')
> +        else:
> +            k = id.replace('.', '_')
> +            print("%sXLAT_%s_HNDL_%s(_d_, _s_);" % (prefix, name, k), end='')
> +    elif not re.search(r'[{}]', ' '.join(fields)):
> +        tag = ' '.join(fields)
> +        tag = re.sub(r'\s*(struct|union)\s+(compat_)?(\w+)\s.*', '\\3', tag)
> +        print(" \\")
> +        print("%sXLAT_%s(&(_d_)->%s, &(_s_)->%s);" % (prefix, tag, id, id), end='')
> +    else:
> +        func_id = id
> +        func_tokens = fields
> +        kind = ''
> +        array = ""
> +        level = 1
> +        arrlvl = 1
> +        array_type = ''
> +        id = ''
> +        type = ''
> +        fields = []
> +        for token in func_tokens:
> +            if token in ['struct', 'union']:
> +                if level == 2:
> +                    fields = ['']
> +                if level == 1:
> +                    kind = token
> +                    if kind == 'union':
> +                        tmp = func_id.replace('.', '_')
> +                        print(" \\")
> +                        print("%sswitch (%s) {" % (prefix, tmp), end='')
> +            elif token == '{':
> +                level += 1
> +                id = ''
> +            elif token == '}':
> +                level -= 1
> +                id = ''
> +                if level == 1 and kind == 'union':
> +                    print(" \\")
> +                    print("%s}" % (prefix,), end='')
> +            elif token == '[':
> +                if level != 2 or arrlvl != 1:
> +                    pass
> +                elif array == '':
> +                    array = ' '
> +                else:
> +                    array = "%s;" % (array,)
> +                arrlvl += 1
> +            elif token == ']':
> +                arrlvl -= 1
> +            elif re.match(r'^COMPAT_HANDLE\((.*)\)$', token):
> +                if level == 2 and id == '':
> +                    m = re.match(r'^COMPAT_HANDLE\((.*)\)$', token)
> +                    type = m.groups()[0]
> +                    type = removeprefix(type, 'compat_')

So this pattern is what lead to the introduction of the := operator in
Python 3.8, but we obviously can't use it yet.

At least pre-compiling the regexes will reduce the hit from the double
evaluation here.

~Andrew

P.S. I probably don't want to know why we have to special case evtchn
port, argo port and domain handle.  I think it says more about the this
bodge of a parser than anything else...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [XEN PATCH v3 1/1] build: replace get-fields.sh by a python script
  2023-01-17 17:21   ` Andrew Cooper
@ 2023-01-18  8:06     ` Jan Beulich
  2023-01-18 19:38       ` Andrew Cooper
  2023-01-18 18:21     ` Anthony PERARD
  1 sibling, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2023-01-18  8:06 UTC (permalink / raw)
  To: Andrew Cooper, Anthony Perard, xen-devel@lists.xenproject.org
  Cc: George Dunlap, Julien Grall, Stefano Stabellini, Wei Liu

On 17.01.2023 18:21, Andrew Cooper wrote:
> On 16/01/2023 6:10 pm, Anthony PERARD wrote:
>> +        elif re.match(r'^[a-zA-Z_]', token):
> 
>[...]
> 
> All of this said, where is 0-9 in the token regex?  Have we just got
> extremely lucky with having no embedded digits in identifiers thus far?

That's checking for just the first character, which can't be a digit?

> P.S. I probably don't want to know why we have to special case evtchn
> port, argo port and domain handle.  I think it says more about the this
> bodge of a parser than anything else...

Iirc something broke without it, but it's been too long and spending a
reasonable amount of time trying to re-construct I couldn't come up
with anything. I didn't want to go as far as put time into actually
trying out what (if anything) breaks with those removed. What I'm
puzzled about is that argo and evtchn port types are handled in
different places.

For the domain handle iirc the exception was attributed to it being
the only typedef of an array which is embedded in other structures.

Jan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [XEN PATCH v3 1/1] build: replace get-fields.sh by a python script
  2023-01-18  8:06     ` Jan Beulich
@ 2023-01-18 19:38       ` Andrew Cooper
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Cooper @ 2023-01-18 19:38 UTC (permalink / raw)
  To: Jan Beulich, Anthony Perard, xen-devel@lists.xenproject.org
  Cc: George Dunlap, Julien Grall, Stefano Stabellini, Wei Liu

On 18/01/2023 8:06 am, Jan Beulich wrote:
> On 17.01.2023 18:21, Andrew Cooper wrote:
>> On 16/01/2023 6:10 pm, Anthony PERARD wrote:
>>> +        elif re.match(r'^[a-zA-Z_]', token):
>> [...]
>>
>> All of this said, where is 0-9 in the token regex?  Have we just got
>> extremely lucky with having no embedded digits in identifiers thus far?
> That's checking for just the first character, which can't be a digit?

So it is...

But nothing good can possibly come of having a token here which matches
on the first char but mismatches later.

>
>> P.S. I probably don't want to know why we have to special case evtchn
>> port, argo port and domain handle.  I think it says more about the this
>> bodge of a parser than anything else...
> Iirc something broke without it, but it's been too long and spending a
> reasonable amount of time trying to re-construct I couldn't come up
> with anything. I didn't want to go as far as put time into actually
> trying out what (if anything) breaks with those removed. What I'm
> puzzled about is that argo and evtchn port types are handled in
> different places.
>
> For the domain handle iirc the exception was attributed to it being
> the only typedef of an array which is embedded in other structures.

I refer back to "bodge of a parser".

~Andrew

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [XEN PATCH v3 1/1] build: replace get-fields.sh by a python script
  2023-01-17 17:21   ` Andrew Cooper
  2023-01-18  8:06     ` Jan Beulich
@ 2023-01-18 18:21     ` Anthony PERARD
  2023-01-18 19:49       ` Andrew Cooper
  2023-01-19 10:50       ` Jan Beulich
  1 sibling, 2 replies; 12+ messages in thread
From: Anthony PERARD @ 2023-01-18 18:21 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: xen-devel@lists.xenproject.org, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

On Tue, Jan 17, 2023 at 05:21:32PM +0000, Andrew Cooper wrote:
> On 16/01/2023 6:10 pm, Anthony PERARD wrote:
> > +def get_typedefs(tokens):
> > +    level = 1
> > +    state = 0
> > +    typedefs = []
> 
> I'm pretty sure typedefs actually wants to be a dict rather than a list
> (will have better "id in typedefs" performance lower down), but that
> wants matching with code changes elsewhere, and probably wants doing
> separately.

I'm not sure that going to make a difference to have "id in ()" instead
of "id in []". I just found out that `typedefs` is always empty...

I don't know what get_typedefs() is supposed to do, or at least if it
works as expected, because it always returns "" or an empty list. (even
the shell version)

So, it would actually be a bit faster to not call get_typedefs(), but I
don't know if that's safe.

> > +    for token in tokens:
> > +        if token == 'typedef':
> > +            if level == 1:
> > +                state = 1
> > +        elif re.match(r'^COMPAT_HANDLE\((.*)\)$', token):
> > +            if not (level != 1 or state != 1):
> > +                state = 2
> > +        elif token in ['[', '{']:
> > +            level += 1
> > +        elif token in [']', '}']:
> > +            level -= 1
> > +        elif token == ';':
> > +            if level == 1:
> > +                state = 0
> > +        elif re.match(r'^[a-zA-Z_]', token):
> > +            if not (level != 1 or state != 2):
> > +                typedefs.append(token)
> > +    return typedefs

-- 
Anthony PERARD


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [XEN PATCH v3 1/1] build: replace get-fields.sh by a python script
  2023-01-18 18:21     ` Anthony PERARD
@ 2023-01-18 19:49       ` Andrew Cooper
  2023-01-19 10:50       ` Jan Beulich
  1 sibling, 0 replies; 12+ messages in thread
From: Andrew Cooper @ 2023-01-18 19:49 UTC (permalink / raw)
  To: Anthony Perard
  Cc: xen-devel@lists.xenproject.org, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

On 18/01/2023 6:21 pm, Anthony PERARD wrote:
> On Tue, Jan 17, 2023 at 05:21:32PM +0000, Andrew Cooper wrote:
>> On 16/01/2023 6:10 pm, Anthony PERARD wrote:
>>> +def get_typedefs(tokens):
>>> +    level = 1
>>> +    state = 0
>>> +    typedefs = []
>> I'm pretty sure typedefs actually wants to be a dict rather than a list
>> (will have better "id in typedefs" performance lower down), but that
>> wants matching with code changes elsewhere, and probably wants doing
>> separately.
> I'm not sure that going to make a difference to have "id in ()" instead
> of "id in []".

It will in the middle of a tight loop.  Less pointer chasing in memory. 
But it is very marginal.

>  I just found out that `typedefs` is always empty...
>
> I don't know what get_typedefs() is supposed to do, or at least if it
> works as expected, because it always returns "" or an empty list. (even
> the shell version)
>
> So, it would actually be a bit faster to not call get_typedefs(), but I
> don't know if that's safe.

If it is dead logic even at the shell level, drop it.  Perhaps a prereq
patch, because removing the complexity first will make this patch
simpler to follow.

The conversion is atypical python, and perf will improve (which is the
point of this patch anyway).

~Andrew

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [XEN PATCH v3 1/1] build: replace get-fields.sh by a python script
  2023-01-18 18:21     ` Anthony PERARD
  2023-01-18 19:49       ` Andrew Cooper
@ 2023-01-19 10:50       ` Jan Beulich
  2023-01-19 11:27         ` Andrew Cooper
  1 sibling, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2023-01-19 10:50 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: xen-devel@lists.xenproject.org, George Dunlap, Julien Grall,
	Stefano Stabellini, Wei Liu, Andrew Cooper

On 18.01.2023 19:21, Anthony PERARD wrote:
> On Tue, Jan 17, 2023 at 05:21:32PM +0000, Andrew Cooper wrote:
>> On 16/01/2023 6:10 pm, Anthony PERARD wrote:
>>> +def get_typedefs(tokens):
>>> +    level = 1
>>> +    state = 0
>>> +    typedefs = []
>>
>> I'm pretty sure typedefs actually wants to be a dict rather than a list
>> (will have better "id in typedefs" performance lower down), but that
>> wants matching with code changes elsewhere, and probably wants doing
>> separately.
> 
> I'm not sure that going to make a difference to have "id in ()" instead
> of "id in []". I just found out that `typedefs` is always empty...
> 
> I don't know what get_typedefs() is supposed to do, or at least if it
> works as expected, because it always returns "" or an empty list. (even
> the shell version)
> 
> So, it would actually be a bit faster to not call get_typedefs(), but I
> don't know if that's safe.

There's exactly one instance that this would take care of:

typedef XEN_GUEST_HANDLE(char) tmem_cli_va_t;

But tmem.h isn't being processed anymore, and hence right now the list
would always be empty. Are we going to be able to guarantee that going
forward?

Jan


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [XEN PATCH v3 1/1] build: replace get-fields.sh by a python script
  2023-01-19 10:50       ` Jan Beulich
@ 2023-01-19 11:27         ` Andrew Cooper
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Cooper @ 2023-01-19 11:27 UTC (permalink / raw)
  To: Jan Beulich, Anthony Perard
  Cc: xen-devel@lists.xenproject.org, George Dunlap, Julien Grall,
	Stefano Stabellini, Wei Liu

On 19/01/2023 10:50 am, Jan Beulich wrote:
> On 18.01.2023 19:21, Anthony PERARD wrote:
>> On Tue, Jan 17, 2023 at 05:21:32PM +0000, Andrew Cooper wrote:
>>> On 16/01/2023 6:10 pm, Anthony PERARD wrote:
>>>> +def get_typedefs(tokens):
>>>> +    level = 1
>>>> +    state = 0
>>>> +    typedefs = []
>>> I'm pretty sure typedefs actually wants to be a dict rather than a list
>>> (will have better "id in typedefs" performance lower down), but that
>>> wants matching with code changes elsewhere, and probably wants doing
>>> separately.
>> I'm not sure that going to make a difference to have "id in ()" instead
>> of "id in []". I just found out that `typedefs` is always empty...
>>
>> I don't know what get_typedefs() is supposed to do, or at least if it
>> works as expected, because it always returns "" or an empty list. (even
>> the shell version)
>>
>> So, it would actually be a bit faster to not call get_typedefs(), but I
>> don't know if that's safe.
> There's exactly one instance that this would take care of:
>
> typedef XEN_GUEST_HANDLE(char) tmem_cli_va_t;
>
> But tmem.h isn't being processed anymore, and hence right now the list
> would always be empty. Are we going to be able to guarantee that going
> forward?

IMO that's a code pattern we wouldn't want to repeat moving forward.

There's already too much magic in a guest handle, without hiding it
behind a typedef too.

~Andrew

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-01-19 11:28 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-01-16 18:10 [XEN PATCH v3 0/1] xen: rework compat headers generation Anthony PERARD
2023-01-16 18:10 ` [XEN PATCH v3 1/1] build: replace get-fields.sh by a python script Anthony PERARD
2023-01-17 16:07   ` Luca Fancellu
2023-01-17 16:55     ` Anthony PERARD
2023-01-17 16:58       ` Luca Fancellu
2023-01-17 17:21   ` Andrew Cooper
2023-01-18  8:06     ` Jan Beulich
2023-01-18 19:38       ` Andrew Cooper
2023-01-18 18:21     ` Anthony PERARD
2023-01-18 19:49       ` Andrew Cooper
2023-01-19 10:50       ` Jan Beulich
2023-01-19 11:27         ` Andrew Cooper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.