From: Lucas Meneghel Rodrigues <lmr@redhat.com>
To: Michael Goldish <mgoldish@redhat.com>
Cc: autotest@test.kernel.org, kvm@vger.kernel.org
Subject: Re: [KVM-AUTOTEST PATCH v4] KVM test: A memory efficient kvm_config implementation
Date: Wed, 03 Mar 2010 11:44:36 -0300 [thread overview]
Message-ID: <1267627476.2565.2.camel@localhost.localdomain> (raw)
In-Reply-To: <1267551059-24189-1-git-send-email-mgoldish@redhat.com>
On Tue, 2010-03-02 at 19:30 +0200, Michael Goldish wrote:
> This patch:
>
> - Makes kvm_config use less memory during parsing, by storing config data
> compactly in arrays during parsing, and generating the final dicts only when
> requested.
> On my machine this results in 5-10 times less memory being used (depending on
> the size of the final generated list).
> This allows the test configuration to keep expanding without having the
> parser run out of memory.
>
> - Adds config.fork_and_parse(), a function that parses a config file/string in
> a forked process and then terminates the process. This works around Python's
> policy of keeping allocated memory to itself even after the objects occupying
> the memory have been destroyed. If the process that does the parsing is the
> same one that runs the tests, less memory will be available to the VMs during
> testing.
>
> - Makes parsing 4-5 times faster as a result of the new internal representation.
>
> Overall, kvm_config's memory usage should now be negligible in most cases.
>
> Changes from v3:
> - Use the homemade 'configreader' class instead of regular files in parse()
> and parse_variants() (readline() and/or seek() are very slow).
> - Use a regex cache dict (regex_cache).
> - Use a string cache dict in addition to the list (object_cache_indices).
> - Some changes to fork_and_parse() (disable buffering).
>
> Changes from v2:
> - Merged _get_next_line() and _get_next_line_indent().
> - Made _array_get_name() faster.
>
> Changes from v1:
> - Added config.get_generator() which is similar to get_list() but returns a
> dict generator instead of a list. This should save some more memory and will
> make tests start sooner.
> - Use get_generator() in control.
> - Call waitpid() at the end of fork_and_parse().
Since the generated patch is rather fragmented, making inline comments
awkward, I am going to post a single block of minor comments now that I
have reviewed the code:
Observations:
• When a file is missing, it's more appropriate to raise an IOError than
a bare Exception, so we must change that. It's also important to follow
the coding standards when raising exceptions.
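Just to illustrate what I mean, a rough sketch of how parse_file could
start (not necessarily the exact wording we'll end up with):

```python
import os

def parse_file(filename):
    """Parse file. If it doesn't exist, raise an IOError."""
    # An IOError is more descriptive than a bare Exception here, and the
    # standard raise syntax should be used instead of the old
    # 'raise Exception, msg' form.
    if not os.path.exists(filename):
        raise IOError("File %s not found" % filename)
    # ... actual parsing would follow here ...
```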
• I was wondering whether making fork_and_parse a public interface of
the config object was the right decision; maybe all calls to parse_file
should be done in a fork_and_parse fashion? I understand your point in
making it a public interface separate from parse_file, but isn't that
somewhat confusing for users (I mean, people writing control files for
kvm autotest)?
• About buffering in fork_and_parse: the performance penalty of
disabling buffering varies; with caches dropped it was something like
3-5%, and after 'warming up' it was something like 8-11%, so it's small
stuff. Still, we can favour speed in this case, so the final version
won't disable buffering.
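For the record, the only difference is the bufsize argument to
os.fdopen. A quick sketch of both setups (using binary mode here for
illustration; the patch itself uses text mode):

```python
import os

# Pipe setup as in fork_and_parse. The default (buffered) mode is what
# the final version keeps; passing bufsize=0 disables buffering and was
# measured as roughly 3-11% slower overall.
r_fd, w_fd = os.pipe()
r = os.fdopen(r_fd, "rb")   # buffered (default)
w = os.fdopen(w_fd, "wb")
# The unbuffered alternative would be:
#   r = os.fdopen(r_fd, "rb", 0)
#   w = os.fdopen(w_fd, "wb", 0)
w.write(b"ping")
w.close()
data = r.read()
r.close()
```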
Compliments:
• The configreader class was a very interesting move: simple, clean and
fast. Congrats!
• The output of the config system is good for debugging purposes, so
we'll stick with it.
• Thank you very much for your work. We now have faster parsing that
consumes a lot less memory, so smaller boxes will benefit a *lot* from
this.
What I am going to do:
• I will re-send the version with the tiny changes I made so it gets
recorded on patchwork, and soon afterwards I'll apply it upstream. I
think from this point on we'll only have minor tweaks to make.
>
> Signed-off-by: Michael Goldish <mgoldish@redhat.com>
> ---
> client/tests/kvm/control | 30 +-
> client/tests/kvm/control.parallel | 21 +-
> client/tests/kvm/kvm_config.py | 832 ++++++++++++++++++++++---------------
> 3 files changed, 535 insertions(+), 348 deletions(-)
>
> diff --git a/client/tests/kvm/control b/client/tests/kvm/control
> index 163286e..15c4539 100644
> --- a/client/tests/kvm/control
> +++ b/client/tests/kvm/control
> @@ -30,34 +30,38 @@ import kvm_utils, kvm_config
> # set English environment (command output might be localized, need to be safe)
> os.environ['LANG'] = 'en_US.UTF-8'
>
> -build_cfg_path = os.path.join(kvm_test_dir, "build.cfg")
> -build_cfg = kvm_config.config(build_cfg_path)
> -# Make any desired changes to the build configuration here. For example:
> -#build_cfg.parse_string("""
> +str = """
> +# This string will be parsed after build.cfg. Make any desired changes to the
> +# build configuration here. For example:
> #release_tag = 84
> -#""")
> -if not kvm_utils.run_tests(build_cfg.get_list(), job):
> +"""
> +build_cfg = kvm_config.config()
> +build_cfg_path = os.path.join(kvm_test_dir, "build.cfg")
> +build_cfg.fork_and_parse(build_cfg_path, str)
> +if not kvm_utils.run_tests(build_cfg.get_generator(), job):
> logging.error("KVM build step failed, exiting.")
> sys.exit(1)
>
> -tests_cfg_path = os.path.join(kvm_test_dir, "tests.cfg")
> -tests_cfg = kvm_config.config(tests_cfg_path)
> -# Make any desired changes to the test configuration here. For example:
> -#tests_cfg.parse_string("""
> +str = """
> +# This string will be parsed after tests.cfg. Make any desired changes to the
> +# test configuration here. For example:
> #display = sdl
> #install|setup: timeout_multiplier = 3
> -#""")
> +"""
> +tests_cfg = kvm_config.config()
> +tests_cfg_path = os.path.join(kvm_test_dir, "tests.cfg")
> +tests_cfg.fork_and_parse(tests_cfg_path, str)
>
> pools_cfg_path = os.path.join(kvm_test_dir, "address_pools.cfg")
> tests_cfg.parse_file(pools_cfg_path)
> hostname = os.uname()[1].split(".")[0]
> -if tests_cfg.filter("^" + hostname):
> +if tests_cfg.count("^" + hostname):
> tests_cfg.parse_string("only ^%s" % hostname)
> else:
> tests_cfg.parse_string("only ^default_host")
>
> # Run the tests
> -kvm_utils.run_tests(tests_cfg.get_list(), job)
> +kvm_utils.run_tests(tests_cfg.get_generator(), job)
>
> # Generate a nice HTML report inside the job's results dir
> kvm_utils.create_report(kvm_test_dir, job.resultdir)
> diff --git a/client/tests/kvm/control.parallel b/client/tests/kvm/control.parallel
> index 343f694..07bc6e5 100644
> --- a/client/tests/kvm/control.parallel
> +++ b/client/tests/kvm/control.parallel
> @@ -160,19 +160,22 @@ if not params.get("mode") == "noinstall":
> # ----------------------------------------------------------
> import kvm_config
>
> -filename = os.path.join(pwd, "kvm_tests.cfg")
> -cfg = kvm_config.config(filename)
> -
> -# If desirable, make changes to the test configuration here. For example:
> -# cfg.parse_string("install|setup: timeout_multiplier = 2")
> -# cfg.parse_string("only fc8_quick")
> -# cfg.parse_string("display = sdl")
> +str = """
> +# This string will be parsed after tests.cfg. Make any desired changes to the
> +# test configuration here. For example:
> +#install|setup: timeout_multiplier = 3
> +#only fc8_quick
> +#display = sdl
> +"""
> +cfg = kvm_config.config()
> +filename = os.path.join(pwd, "tests.cfg")
> +cfg.fork_and_parse(filename, str)
>
> -filename = os.path.join(pwd, "kvm_address_pools.cfg")
> +filename = os.path.join(pwd, "address_pools.cfg")
> if os.path.exists(filename):
> cfg.parse_file(filename)
> hostname = os.uname()[1].split(".")[0]
> - if cfg.filter("^" + hostname):
> + if cfg.count("^" + hostname):
> cfg.parse_string("only ^%s" % hostname)
> else:
> cfg.parse_string("only ^default_host")
> diff --git a/client/tests/kvm/kvm_config.py b/client/tests/kvm/kvm_config.py
> index 798ef56..7ff28e4 100755
> --- a/client/tests/kvm/kvm_config.py
> +++ b/client/tests/kvm/kvm_config.py
> @@ -2,10 +2,10 @@
> """
> KVM configuration file utility functions.
>
> -@copyright: Red Hat 2008-2009
> +@copyright: Red Hat 2008-2010
> """
>
> -import logging, re, os, sys, StringIO, optparse
> +import logging, re, os, sys, optparse, array, traceback, cPickle
> import common
> from autotest_lib.client.common_lib import error
> from autotest_lib.client.common_lib import logging_config, logging_manager
> @@ -21,490 +21,670 @@ class config:
> """
> Parse an input file or string that follows the KVM Test Config File format
> and generate a list of dicts that will be later used as configuration
> - parameters by the the KVM tests.
> + parameters by the KVM tests.
>
> @see: http://www.linux-kvm.org/page/KVM-Autotest/Test_Config_File
> """
>
> - def __init__(self, filename=None, debug=False):
> + def __init__(self, filename=None, debug=True):
> """
> - Initialize the list and optionally parse filename.
> + Initialize the list and optionally parse a file.
>
> @param filename: Path of the file that will be taken.
> - @param debug: Whether to turn debugging output.
> + @param debug: Whether to turn on debugging output.
> """
> - self.list = [{"name": "", "shortname": "", "depend": []}]
> - self.debug = debug
> + self.list = [array.array("H", [4, 4, 4, 4])]
> + self.object_cache = []
> + self.object_cache_indices = {}
> + self.regex_cache = {}
> self.filename = filename
> + self.debug = debug
> if filename:
> self.parse_file(filename)
>
>
> def parse_file(self, filename):
> """
> - Parse filename, return the resulting list and store it in .list. If
> - filename does not exist, raise an exception.
> + Parse file. If it doesn't exist, raise an exception.
>
> @param filename: Path of the configuration file.
> """
> if not os.path.exists(filename):
> raise Exception, "File %s not found" % filename
> self.filename = filename
> - file = open(filename, "r")
> - self.list = self.parse(file, self.list)
> - file.close()
> - return self.list
> + str = open(filename).read()
> + self.list = self.parse(configreader(str), self.list)
>
>
> def parse_string(self, str):
> """
> - Parse a string, return the resulting list and store it in .list.
> + Parse a string.
>
> - @param str: String that will be parsed.
> + @param str: String to parse.
> """
> - file = StringIO.StringIO(str)
> - self.list = self.parse(file, self.list)
> - file.close()
> - return self.list
> + self.list = self.parse(configreader(str), self.list)
>
>
> - def get_list(self):
> - """
> - Return the list of dictionaries. This should probably be called after
> - parsing something.
> + def fork_and_parse(self, filename=None, str=None):
> """
> - return self.list
> + Parse a file and/or a string in a separate process to save memory.
>
> + Python likes to keep memory to itself even after the objects occupying
> + it have been destroyed. If during a call to parse_file() or
> + parse_string() a lot of memory is used, it can only be freed by
> + terminating the process. This function works around the problem by
> + doing the parsing in a forked process and then terminating it, freeing
> + any unneeded memory.
>
> - def match(self, filter, dict):
> - """
> - Return True if dict matches filter.
> + Note: if an exception is raised during parsing, its information will be
> + printed, and the resulting list will be empty. The exception will not
> + be raised in the process calling this function.
>
> - @param filter: A regular expression that defines the filter.
> - @param dict: Dictionary that will be inspected.
> + @param filename: Path of file to parse (optional).
> + @param str: String to parse (optional).
> """
> - filter = re.compile(r"(\.|^)(%s)(\.|$)" % filter)
> - return bool(filter.search(dict["name"]))
> -
> -
> - def filter(self, filter, list=None):
> + r, w = os.pipe()
> + r, w = os.fdopen(r, "r", 0), os.fdopen(w, "w", 0)
> + pid = os.fork()
> + if not pid:
> + # Child process
> + r.close()
> + try:
> + if filename:
> + self.parse_file(filename)
> + if str:
> + self.parse_string(str)
> + except:
> + traceback.print_exc()
> + self.list = []
> + # Convert the arrays to strings before pickling because at least
> + # some Python versions can't pickle/unpickle arrays
> + l = [a.tostring() for a in self.list]
> + cPickle.dump((l, self.object_cache), w, -1)
> + w.close()
> + os._exit(0)
> + else:
> + # Parent process
> + w.close()
> + (l, self.object_cache) = cPickle.load(r)
> + r.close()
> + os.waitpid(pid, 0)
> + self.list = []
> + for s in l:
> + a = array.array("H")
> + a.fromstring(s)
> + self.list.append(a)
> +
> +
> + def get_generator(self):
> """
> - Filter a list of dicts.
> + Generate dictionaries from the code parsed so far. This should
> + probably be called after parsing something.
>
> - @param filter: A regular expression that will be used as a filter.
> - @param list: A list of dictionaries that will be filtered.
> + @return: A dict generator.
> """
> - if list is None:
> - list = self.list
> - return [dict for dict in list if self.match(filter, dict)]
> + for a in self.list:
> + name, shortname, depend, content = _array_get_all(a, self.object_cache)
> + dict = {"name": name, "shortname": shortname, "depend": depend}
> + self._apply_content_to_dict(dict, content)
> + yield dict
>
>
> - def split_and_strip(self, str, sep="="):
> + def get_list(self):
> """
> - Split str and strip quotes from the resulting parts.
> + Generate a list of dictionaries from the code parsed so far.
> + This should probably be called after parsing something.
>
> - @param str: String that will be processed
> - @param sep: Separator that will be used to split the string
> + @return: A list of dicts.
> """
> - temp = str.split(sep, 1)
> - for i in range(len(temp)):
> - temp[i] = temp[i].strip()
> - if re.findall("^\".*\"$", temp[i]):
> - temp[i] = temp[i].strip("\"")
> - elif re.findall("^\'.*\'$", temp[i]):
> - temp[i] = temp[i].strip("\'")
> - return temp
> -
> + return list(self.get_generator())
>
> - def get_next_line(self, file):
> - """
> - Get the next non-empty, non-comment line in a file like object.
>
> - @param file: File like object
> - @return: If no line is available, return None.
> + def count(self, filter=".*"):
> """
> - while True:
> - line = file.readline()
> - if line == "": return None
> - stripped_line = line.strip()
> - if len(stripped_line) > 0 \
> - and not stripped_line.startswith('#') \
> - and not stripped_line.startswith('//'):
> - return line
> -
> + Return the number of dictionaries whose names match filter.
>
> - def get_next_line_indent(self, file):
> + @param filter: A regular expression string.
> """
> - Return the indent level of the next non-empty, non-comment line in file.
> -
> - @param file: File like object.
> - @return: If no line is available, return -1.
> - """
> - pos = file.tell()
> - line = self.get_next_line(file)
> - if not line:
> - file.seek(pos)
> - return -1
> - line = line.expandtabs()
> - indent = 0
> - while line[indent] == ' ':
> - indent += 1
> - file.seek(pos)
> - return indent
> -
> -
> - def add_name(self, str, name, append=False):
> - """
> - Add name to str with a separator dot and return the result.
> -
> - @param str: String that will be processed
> - @param name: name that will be appended to the string.
> - @return: If append is True, append name to str.
> - Otherwise, pre-pend name to str.
> - """
> - if str == "":
> - return name
> - # Append?
> - elif append:
> - return str + "." + name
> - # Prepend?
> - else:
> - return name + "." + str
> + exp = self._get_filter_regex(filter)
> + count = 0
> + for a in self.list:
> + name = _array_get_name(a, self.object_cache)
> + if exp.search(name):
> + count += 1
> + return count
>
>
> - def parse_variants(self, file, list, subvariants=False, prev_indent=-1):
> + def parse_variants(self, cr, list, subvariants=False, prev_indent=-1):
> """
> - Read and parse lines from file like object until a line with an indent
> - level lower than or equal to prev_indent is encountered.
> + Read and parse lines from a configreader object until a line with an
> + indent level lower than or equal to prev_indent is encountered.
>
> - @brief: Parse a 'variants' or 'subvariants' block from a file-like
> - object.
> - @param file: File-like object that will be parsed
> - @param list: List of dicts to operate on
> + @brief: Parse a 'variants' or 'subvariants' block from a configreader
> + object.
> + @param cr: configreader object to be parsed.
> + @param list: List of arrays to operate on.
> @param subvariants: If True, parse in 'subvariants' mode;
> - otherwise parse in 'variants' mode
> - @param prev_indent: The indent level of the "parent" block
> - @return: The resulting list of dicts.
> + otherwise parse in 'variants' mode.
> + @param prev_indent: The indent level of the "parent" block.
> + @return: The resulting list of arrays.
> """
> new_list = []
>
> while True:
> - indent = self.get_next_line_indent(file)
> + pos = cr.tell()
> + (indented_line, line, indent) = cr.get_next_line()
> if indent <= prev_indent:
> + cr.seek(pos)
> break
> - indented_line = self.get_next_line(file).rstrip()
> - line = indented_line.strip()
>
> # Get name and dependencies
> - temp = line.strip("- ").split(":")
> - name = temp[0]
> - if len(temp) == 1:
> - dep_list = []
> - else:
> - dep_list = temp[1].split()
> + (name, depend) = map(str.strip, line.lstrip("- ").split(":"))
>
> # See if name should be added to the 'shortname' field
> - add_to_shortname = True
> - if name.startswith("@"):
> - name = name.strip("@")
> - add_to_shortname = False
> -
> - # Make a deep copy of list
> - temp_list = []
> - for dict in list:
> - new_dict = dict.copy()
> - new_dict["depend"] = dict["depend"][:]
> - temp_list.append(new_dict)
> + add_to_shortname = not name.startswith("@")
> + name = name.lstrip("@")
> +
> + # Store name and dependencies in cache and get their indices
> + n = self._store_str(name)
> + d = self._store_str(depend)
> +
> + # Make a copy of list
> + temp_list = [a[:] for a in list]
>
> if subvariants:
> # If we're parsing 'subvariants', first modify the list
> - self.__modify_list_subvariants(temp_list, name, dep_list,
> - add_to_shortname)
> - temp_list = self.parse(file, temp_list,
> - restricted=True, prev_indent=indent)
> + if add_to_shortname:
> + for a in temp_list:
> + _array_append_to_name_shortname_depend(a, n, d)
> + else:
> + for a in temp_list:
> + _array_append_to_name_depend(a, n, d)
> + temp_list = self.parse(cr, temp_list, restricted=True,
> + prev_indent=indent)
> else:
> # If we're parsing 'variants', parse before modifying the list
> if self.debug:
> - self.__debug_print(indented_line,
> - "Entering variant '%s' "
> - "(variant inherits %d dicts)" %
> - (name, len(list)))
> - temp_list = self.parse(file, temp_list,
> - restricted=False, prev_indent=indent)
> - self.__modify_list_variants(temp_list, name, dep_list,
> - add_to_shortname)
> + _debug_print(indented_line,
> + "Entering variant '%s' "
> + "(variant inherits %d dicts)" %
> + (name, len(list)))
> + temp_list = self.parse(cr, temp_list, restricted=False,
> + prev_indent=indent)
> + if add_to_shortname:
> + for a in temp_list:
> + _array_prepend_to_name_shortname_depend(a, n, d)
> + else:
> + for a in temp_list:
> + _array_prepend_to_name_depend(a, n, d)
>
> new_list += temp_list
>
> return new_list
>
>
> - def parse(self, file, list, restricted=False, prev_indent=-1):
> + def parse(self, cr, list, restricted=False, prev_indent=-1):
> """
> - Read and parse lines from file until a line with an indent level lower
> - than or equal to prev_indent is encountered.
> -
> - @brief: Parse a file-like object.
> - @param file: A file-like object
> - @param list: A list of dicts to operate on (list is modified in
> - place and should not be used after the call)
> - @param restricted: if True, operate in restricted mode
> - (prohibit 'variants')
> - @param prev_indent: the indent level of the "parent" block
> - @return: Return the resulting list of dicts.
> + Read and parse lines from a configreader object until a line with an
> + indent level lower than or equal to prev_indent is encountered.
> +
> + @brief: Parse a configreader object.
> + @param cr: A configreader object.
> + @param list: A list of arrays to operate on (list is modified in
> + place and should not be used after the call).
> + @param restricted: If True, operate in restricted mode
> + (prohibit 'variants').
> + @param prev_indent: The indent level of the "parent" block.
> + @return: The resulting list of arrays.
> @note: List is destroyed and should not be used after the call.
> - Only the returned list should be used.
> + Only the returned list should be used.
> """
> + current_block = ""
> +
> while True:
> - indent = self.get_next_line_indent(file)
> + pos = cr.tell()
> + (indented_line, line, indent) = cr.get_next_line()
> if indent <= prev_indent:
> + cr.seek(pos)
> + self._append_content_to_arrays(list, current_block)
> break
> - indented_line = self.get_next_line(file).rstrip()
> - line = indented_line.strip()
> - words = line.split()
>
> len_list = len(list)
>
> - # Look for a known operator in the line
> - operators = ["?+=", "?<=", "?=", "+=", "<=", "="]
> - op_found = None
> - op_pos = len(line)
> - for op in operators:
> - pos = line.find(op)
> - if pos >= 0 and pos < op_pos:
> - op_found = op
> - op_pos = pos
> -
> - # Found an operator?
> - if op_found:
> + # Parse assignment operators (keep lines in temporary buffer)
> + if "=" in line:
> if self.debug and not restricted:
> - self.__debug_print(indented_line,
> - "Parsing operator (%d dicts in current "
> - "context)" % len_list)
> - (left, value) = self.split_and_strip(line, op_found)
> - filters_and_key = self.split_and_strip(left, ":")
> - filters = filters_and_key[:-1]
> - key = filters_and_key[-1]
> - filtered_list = list
> - for filter in filters:
> - filtered_list = self.filter(filter, filtered_list)
> - # Apply the operation to the filtered list
> - if op_found == "=":
> - for dict in filtered_list:
> - dict[key] = value
> - elif op_found == "+=":
> - for dict in filtered_list:
> - dict[key] = dict.get(key, "") + value
> - elif op_found == "<=":
> - for dict in filtered_list:
> - dict[key] = value + dict.get(key, "")
> - elif op_found.startswith("?"):
> - exp = re.compile("^(%s)$" % key)
> - if op_found == "?=":
> - for dict in filtered_list:
> - for key in dict.keys():
> - if exp.match(key):
> - dict[key] = value
> - elif op_found == "?+=":
> - for dict in filtered_list:
> - for key in dict.keys():
> - if exp.match(key):
> - dict[key] = dict.get(key, "") + value
> - elif op_found == "?<=":
> - for dict in filtered_list:
> - for key in dict.keys():
> - if exp.match(key):
> - dict[key] = value + dict.get(key, "")
> + _debug_print(indented_line,
> + "Parsing operator (%d dicts in current "
> + "context)" % len_list)
> + current_block += line + "\n"
> + continue
> +
> + # Flush the temporary buffer
> + self._append_content_to_arrays(list, current_block)
> + current_block = ""
> +
> + words = line.split()
>
> # Parse 'no' and 'only' statements
> - elif words[0] == "no" or words[0] == "only":
> + if words[0] == "no" or words[0] == "only":
> if len(words) <= 1:
> continue
> - filters = words[1:]
> + filters = map(self._get_filter_regex, words[1:])
> filtered_list = []
> if words[0] == "no":
> - for dict in list:
> + for a in list:
> + name = _array_get_name(a, self.object_cache)
> for filter in filters:
> - if self.match(filter, dict):
> + if filter.search(name):
> break
> else:
> - filtered_list.append(dict)
> + filtered_list.append(a)
> if words[0] == "only":
> - for dict in list:
> + for a in list:
> + name = _array_get_name(a, self.object_cache)
> for filter in filters:
> - if self.match(filter, dict):
> - filtered_list.append(dict)
> + if filter.search(name):
> + filtered_list.append(a)
> break
> list = filtered_list
> if self.debug and not restricted:
> - self.__debug_print(indented_line,
> - "Parsing no/only (%d dicts in current "
> - "context, %d remain)" %
> - (len_list, len(list)))
> + _debug_print(indented_line,
> + "Parsing no/only (%d dicts in current "
> + "context, %d remain)" %
> + (len_list, len(list)))
> + continue
>
> # Parse 'variants'
> - elif line == "variants:":
> + if line == "variants:":
> # 'variants' not allowed in restricted mode
> # (inside an exception or inside subvariants)
> if restricted:
> e_msg = "Using variants in this context is not allowed"
> raise error.AutotestError(e_msg)
> if self.debug and not restricted:
> - self.__debug_print(indented_line,
> - "Entering variants block (%d dicts in "
> - "current context)" % len_list)
> - list = self.parse_variants(file, list, subvariants=False,
> + _debug_print(indented_line,
> + "Entering variants block (%d dicts in "
> + "current context)" % len_list)
> + list = self.parse_variants(cr, list, subvariants=False,
> prev_indent=indent)
> + continue
>
> # Parse 'subvariants' (the block is parsed for each dict
> # separately)
> - elif line == "subvariants:":
> + if line == "subvariants:":
> if self.debug and not restricted:
> - self.__debug_print(indented_line,
> - "Entering subvariants block (%d dicts in "
> - "current context)" % len_list)
> + _debug_print(indented_line,
> + "Entering subvariants block (%d dicts in "
> + "current context)" % len_list)
> new_list = []
> - # Remember current file position
> - pos = file.tell()
> + # Remember current position
> + pos = cr.tell()
> # Read the lines in any case
> - self.parse_variants(file, [], subvariants=True,
> + self.parse_variants(cr, [], subvariants=True,
> prev_indent=indent)
> # Iterate over the list...
> - for index in range(len(list)):
> - # Revert to initial file position in this 'subvariants'
> - # block
> - file.seek(pos)
> + for index in xrange(len(list)):
> + # Revert to initial position in this 'subvariants' block
> + cr.seek(pos)
> # Everything inside 'subvariants' should be parsed in
> # restricted mode
> - new_list += self.parse_variants(file, list[index:index+1],
> + new_list += self.parse_variants(cr, list[index:index+1],
> subvariants=True,
> prev_indent=indent)
> list = new_list
> + continue
>
> # Parse 'include' statements
> - elif words[0] == "include":
> + if words[0] == "include":
> if len(words) <= 1:
> continue
> if self.debug and not restricted:
> - self.__debug_print(indented_line,
> - "Entering file %s" % words[1])
> + _debug_print(indented_line, "Entering file %s" % words[1])
> if self.filename:
> filename = os.path.join(os.path.dirname(self.filename),
> words[1])
> if os.path.exists(filename):
> - new_file = open(filename, "r")
> - list = self.parse(new_file, list, restricted)
> - new_file.close()
> + str = open(filename).read()
> + list = self.parse(configreader(str), list, restricted)
> if self.debug and not restricted:
> - self.__debug_print("", "Leaving file %s" % words[1])
> + _debug_print("", "Leaving file %s" % words[1])
> else:
> logging.warning("Cannot include %s -- file not found",
> filename)
> else:
> logging.warning("Cannot include %s because no file is "
> "currently open", words[1])
> + continue
>
> # Parse multi-line exceptions
> # (the block is parsed for each dict separately)
> - elif line.endswith(":"):
> + if line.endswith(":"):
> if self.debug and not restricted:
> - self.__debug_print(indented_line,
> - "Entering multi-line exception block "
> - "(%d dicts in current context outside "
> - "exception)" % len_list)
> - line = line.strip(":")
> + _debug_print(indented_line,
> + "Entering multi-line exception block "
> + "(%d dicts in current context outside "
> + "exception)" % len_list)
> + line = line[:-1]
> new_list = []
> - # Remember current file position
> - pos = file.tell()
> + # Remember current position
> + pos = cr.tell()
> # Read the lines in any case
> - self.parse(file, [], restricted=True, prev_indent=indent)
> + self.parse(cr, [], restricted=True, prev_indent=indent)
> # Iterate over the list...
> - for index in range(len(list)):
> - if self.match(line, list[index]):
> - # Revert to initial file position in this
> - # exception block
> - file.seek(pos)
> + exp = self._get_filter_regex(line)
> + for index in xrange(len(list)):
> + name = _array_get_name(list[index], self.object_cache)
> + if exp.search(name):
> + # Revert to initial position in this exception block
> + cr.seek(pos)
> # Everything inside an exception should be parsed in
> # restricted mode
> - new_list += self.parse(file, list[index:index+1],
> + new_list += self.parse(cr, list[index:index+1],
> restricted=True,
> prev_indent=indent)
> else:
> - new_list += list[index:index+1]
> + new_list.append(list[index])
> list = new_list
> + continue
>
> return list
>
>
> - def __debug_print(self, str1, str2=""):
> + def _get_filter_regex(self, filter):
> """
> - Nicely print two strings and an arrow.
> + Return a regex object corresponding to a given filter string.
>
> - @param str1: First string
> - @param str2: Second string
> + All regular expressions given to the parser are passed through this
> + function first. Its purpose is to make them more specific and better
> + suited to match dictionary names: it forces simple expressions to match
> + only between dots or at the beginning or end of a string. For example,
> + the filter 'foo' will match 'foo.bar' but not 'foobar'.
> """
> - if str2:
> - str = "%-50s ---> %s" % (str1, str2)
> - else:
> - str = str1
> - logging.debug(str)
> -
> -
> - def __modify_list_variants(self, list, name, dep_list, add_to_shortname):
> - """
> - Make some modifications to list, as part of parsing a 'variants' block.
> -
> - @param list: List to be processed
> - @param name: Name to be prepended to the dictionary's 'name' key
> - @param dep_list: List of dependencies to be added to the dictionary's
> - 'depend' key
> - @param add_to_shortname: Boolean indicating whether name should be
> - prepended to the dictionary's 'shortname' key as well
> - """
> - for dict in list:
> - # Prepend name to the dict's 'name' field
> - dict["name"] = self.add_name(dict["name"], name)
> - # Prepend name to the dict's 'shortname' field
> - if add_to_shortname:
> - dict["shortname"] = self.add_name(dict["shortname"], name)
> - # Prepend name to each of the dict's dependencies
> - for i in range(len(dict["depend"])):
> - dict["depend"][i] = self.add_name(dict["depend"][i], name)
> - # Add new dependencies
> - dict["depend"] += dep_list
> -
> -
> - def __modify_list_subvariants(self, list, name, dep_list, add_to_shortname):
> - """
> - Make some modifications to list, as part of parsing a 'subvariants'
> - block.
> -
> - @param list: List to be processed
> - @param name: Name to be appended to the dictionary's 'name' key
> - @param dep_list: List of dependencies to be added to the dictionary's
> - 'depend' key
> - @param add_to_shortname: Boolean indicating whether name should be
> - appended to the dictionary's 'shortname' as well
> - """
> - for dict in list:
> - # Add new dependencies
> - for dep in dep_list:
> - dep_name = self.add_name(dict["name"], dep, append=True)
> - dict["depend"].append(dep_name)
> - # Append name to the dict's 'name' field
> - dict["name"] = self.add_name(dict["name"], name, append=True)
> - # Append name to the dict's 'shortname' field
> - if add_to_shortname:
> - dict["shortname"] = self.add_name(dict["shortname"], name,
> - append=True)
> + try:
> + return self.regex_cache[filter]
> + except KeyError:
> + exp = re.compile(r"(\.|^)(%s)(\.|$)" % filter)
> + self.regex_cache[filter] = exp
> + return exp
> +
> +
> + def _store_str(self, str):
> + """
> + Store str in the internal object cache, if it isn't already there, and
> + return its identifying index.
> +
> + @param str: String to store.
> + @return: The index of str in the object cache.
> + """
> + try:
> + return self.object_cache_indices[str]
> + except KeyError:
> + self.object_cache.append(str)
> + index = len(self.object_cache) - 1
> + self.object_cache_indices[str] = index
> + return index
> +
> +
> + def _append_content_to_arrays(self, list, content):
> + """
> + Append content (config code containing assignment operations) to a list
> + of arrays.
> +
> + @param list: List of arrays to operate on.
> + @param content: String containing assignment operations.
> + """
> + if content:
> + str_index = self._store_str(content)
> + for a in list:
> + _array_append_to_content(a, str_index)
> +
> +
> + def _apply_content_to_dict(self, dict, content):
> + """
> + Apply the operations in content (config code containing assignment
> + operations) to a dict.
> +
> + @param dict: Dictionary to operate on. Must have 'name' key.
> + @param content: String containing assignment operations.
> + """
> + for line in content.splitlines():
> + op_found = None
> + op_pos = len(line)
> + for op in ops:
> + pos = line.find(op)
> + if pos >= 0 and pos < op_pos:
> + op_found = op
> + op_pos = pos
> + if not op_found:
> + continue
> + (left, value) = map(str.strip, line.split(op_found, 1))
> + if value and ((value[0] == '"' and value[-1] == '"') or
> + (value[0] == "'" and value[-1] == "'")):
> + value = value[1:-1]
> + filters_and_key = map(str.strip, left.split(":"))
> + filters = filters_and_key[:-1]
> + key = filters_and_key[-1]
> + for filter in filters:
> + exp = self._get_filter_regex(filter)
> + if not exp.search(dict["name"]):
> + break
> + else:
> + ops[op_found](dict, key, value)
> +
> +
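
The for/else idiom in _apply_content_to_dict() is easy to misread: the
ops[op_found] call runs only when *no* filter failed, and the operator scan
picks the operator that appears earliest in the line (so "a += b" parses as
+=, not =). A minimal standalone sketch of that same scan-and-filter logic
(the dict names and config lines here are illustrative, not from the patch):

```python
import re

# Operators are scanned by position, not dict order: whichever operator
# appears earliest in the line wins, so "a += b" parses as +=, not =.
OPS = {"=": lambda d, k, v: d.__setitem__(k, v),
       "+=": lambda d, k, v: d.__setitem__(k, d.get(k, "") + v)}

def apply_line(d, line):
    # Find the operator with the smallest position in the line.
    op_found, op_pos = None, len(line)
    for op in OPS:
        pos = line.find(op)
        if 0 <= pos < op_pos:
            op_found, op_pos = op, pos
    if not op_found:
        return
    left, value = (s.strip() for s in line.split(op_found, 1))
    # Everything before the last ":" is a name filter; the rest is the key.
    *filters, key = (s.strip() for s in left.split(":"))
    for f in filters:
        if not re.search(r"(\.|^)(%s)(\.|$)" % f, d["name"]):
            break                      # a filter failed: skip this line
    else:
        OPS[op_found](d, key, value)   # runs only when no filter failed

d = {"name": "smp2.Fedora"}
apply_line(d, "Fedora: mem = 512")     # filter matches -> applied
apply_line(d, "RHEL: mem = 1024")      # filter fails -> ignored
apply_line(d, "mem += 0")              # += wins over the embedded =
```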
> +# Assignment operators
> +
> +def _op_set(dict, key, value):
> + dict[key] = value
> +
> +
> +def _op_append(dict, key, value):
> + dict[key] = dict.get(key, "") + value
> +
> +
> +def _op_prepend(dict, key, value):
> + dict[key] = value + dict.get(key, "")
> +
> +
> +def _op_regex_set(dict, exp, value):
> + exp = re.compile("^(%s)$" % exp)
> + for key in dict:
> + if exp.match(key):
> + dict[key] = value
> +
> +
> +def _op_regex_append(dict, exp, value):
> + exp = re.compile("^(%s)$" % exp)
> + for key in dict:
> + if exp.match(key):
> + dict[key] += value
> +
> +
> +def _op_regex_prepend(dict, exp, value):
> + exp = re.compile("^(%s)$" % exp)
> + for key in dict:
> + if exp.match(key):
> + dict[key] = value + dict[key]
> +
> +
> +ops = {
> + "=": _op_set,
> + "+=": _op_append,
> + "<=": _op_prepend,
> + "?=": _op_regex_set,
> + "?+=": _op_regex_append,
> + "?<=": _op_regex_prepend,
> +}
> +
> +
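
For readers less familiar with the config syntax, the operator table above
boils down to this (a self-contained sketch; the key names and values are
made up for illustration — the "?"-prefixed forms treat the key as a regex
matched against *existing* keys only):

```python
import re

def _op_set(d, key, value):
    d[key] = value                         # "="

def _op_append(d, key, value):
    d[key] = d.get(key, "") + value        # "+="

def _op_prepend(d, key, value):
    d[key] = value + d.get(key, "")        # "<="

def _op_regex_set(d, exp, value):
    # "?=" family: the key is a regex applied to keys already in the dict,
    # so a non-matching pattern is a silent no-op.
    exp = re.compile("^(%s)$" % exp)
    for key in d:
        if exp.match(key):
            d[key] = value

d = {}
_op_set(d, "extra_params", "-snapshot")        # =
_op_append(d, "extra_params", " -m 512")       # +=
_op_prepend(d, "extra_params", "qemu ")        # <=
_op_regex_set(d, "extra_.*", "overridden")     # ?=  (matches)
_op_regex_set(d, "missing_.*", "ignored")      # ?=  (no match: no-op)
```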
> +# Misc functions
> +
> +def _debug_print(str1, str2=""):
> + """
> + Log two strings joined by an arrow, or just the first string if the
> + second is empty.
> +
> + @param str1: First string.
> + @param str2: Second string.
> + """
> + if str2:
> + str = "%-50s ---> %s" % (str1, str2)
> + else:
> + str = str1
> + logging.debug(str)
> +
> +
> +# configreader
> +
> +class configreader:
> + """
> + Preprocess an input string and provide file-like services.
> + This is intended as a replacement for the file and StringIO classes,
> + whose readline() and/or seek() methods seem to be slow.
> + """
> +
> + def __init__(self, str):
> + """
> + Initialize the reader.
> +
> + @param str: The string to parse.
> + """
> + self.line_index = 0
> + self.lines = []
> + for line in str.splitlines():
> + line = line.rstrip().expandtabs()
> + stripped_line = line.strip()
> + indent = len(line) - len(stripped_line)
> + if (not stripped_line
> + or stripped_line.startswith("#")
> + or stripped_line.startswith("//")):
> + continue
> + self.lines.append((line, stripped_line, indent))
> +
> +
> + def get_next_line(self):
> + """
> + Get the next non-empty, non-comment line in the string.
> +
> + @return: (line, stripped_line, indent), where indent is the line's
> + indent level or -1 if no line is available.
> + """
> + try:
> + if self.line_index < len(self.lines):
> + return self.lines[self.line_index]
> + else:
> + return (None, None, -1)
> + finally:
> + self.line_index += 1
> +
> +
> + def tell(self):
> + """
> + Return the current line index.
> + """
> + return self.line_index
> +
> +
> + def seek(self, index):
> + """
> + Set the current line index.
> + """
> + self.line_index = index
> +
> +
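
The reader above trades file-like generality for a precomputed list of
(line, stripped_line, indent) tuples: comments and blank lines are dropped
once up front, and tell()/seek() become plain integer bookkeeping, which is
what makes backtracking over variants cheap. A condensed, self-contained
sketch of the same idea (class and sample input are mine, not the patch's):

```python
class ConfigReader(object):
    """Precompute (line, stripped_line, indent) tuples; comments and
    blank lines are dropped up front, so get_next_line() is O(1)."""
    def __init__(self, s):
        self.line_index = 0
        self.lines = []
        for line in s.splitlines():
            line = line.rstrip().expandtabs()
            stripped = line.strip()
            if not stripped or stripped.startswith(("#", "//")):
                continue
            self.lines.append((line, stripped, len(line) - len(stripped)))

    def get_next_line(self):
        try:
            if self.line_index < len(self.lines):
                return self.lines[self.line_index]
            return (None, None, -1)
        finally:
            self.line_index += 1   # advances even past the sentinel

    def tell(self):
        return self.line_index

    def seek(self, index):
        self.line_index = index

r = ConfigReader("variants:\n    # comment\n    - one:\n        key = 1\n")
pos = r.tell()
first = r.get_next_line()   # ("variants:", "variants:", 0)
r.seek(pos)                 # rewinding is just an integer assignment
```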
> +# Array structure:
> +# ----------------
> +# The first 4 elements contain the indices of the 4 segments.
> +# a[0] -- Index of beginning of 'name' segment (always 4).
> +# a[1] -- Index of beginning of 'shortname' segment.
> +# a[2] -- Index of beginning of 'depend' segment.
> +# a[3] -- Index of beginning of 'content' segment.
> +# The next elements in the array comprise the aforementioned segments:
> +# The 'name' segment begins with a[a[0]] and ends with a[a[1]-1].
> +# The 'shortname' segment begins with a[a[1]] and ends with a[a[2]-1].
> +# The 'depend' segment begins with a[a[2]] and ends with a[a[3]-1].
> +# The 'content' segment begins with a[a[3]] and ends at the end of the array.
> +
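
To make the layout concrete: starting from an empty array [4, 4, 4, 4]
(all four segments empty, each beginning at index 4), two appends produce a
dict named "one.two". In the real code the segment elements are indices
into object_cache; plain strings are used here for readability:

```python
def _array_append_to_name_shortname_depend(a, name, depend):
    # Insert at the end of the 'name', 'shortname' and 'depend' segments;
    # each insert shifts everything after it right by one, hence the
    # staggered +1/+2 offsets and the header adjustments afterwards.
    a.insert(a[1], name)
    a.insert(a[2] + 1, name)
    a.insert(a[3] + 2, depend)
    a[1] += 1
    a[2] += 2
    a[3] += 3

# An empty dict: four header slots, all segments start (empty) at index 4.
a = [4, 4, 4, 4]
_array_append_to_name_shortname_depend(a, "one", "dep1")
_array_append_to_name_shortname_depend(a, "two", "")
# a is now [4, 6, 8, 10, "one", "two", "one", "two", "dep1", ""]:
# name = a[4:6], shortname = a[6:8], depend = a[8:10], content = a[10:].
```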
> +# The following functions append/prepend to various segments of an array.
> +
> +def _array_append_to_name_shortname_depend(a, name, depend):
> + a.insert(a[1], name)
> + a.insert(a[2] + 1, name)
> + a.insert(a[3] + 2, depend)
> + a[1] += 1
> + a[2] += 2
> + a[3] += 3
> +
> +
> +def _array_prepend_to_name_shortname_depend(a, name, depend):
> + a[1] += 1
> + a[2] += 2
> + a[3] += 3
> + a.insert(a[0], name)
> + a.insert(a[1], name)
> + a.insert(a[2], depend)
> +
> +
> +def _array_append_to_name_depend(a, name, depend):
> + a.insert(a[1], name)
> + a.insert(a[3] + 1, depend)
> + a[1] += 1
> + a[2] += 1
> + a[3] += 2
> +
> +
> +def _array_prepend_to_name_depend(a, name, depend):
> + a[1] += 1
> + a[2] += 1
> + a[3] += 2
> + a.insert(a[0], name)
> + a.insert(a[2], depend)
> +
> +
> +def _array_append_to_content(a, content):
> + a.append(content)
> +
> +
> +def _array_get_name(a, object_cache):
> + """
> + Return the name of a dictionary represented by a given array.
> +
> + @param a: Array representing a dictionary.
> + @param object_cache: A list of strings referenced by elements in the array.
> + """
> + return ".".join([object_cache[i] for i in a[a[0]:a[1]]])
> +
> +
> +def _array_get_all(a, object_cache):
> + """
> + Return a 4-tuple containing all the data stored in a given array, in a
> + format that is easy to turn into an actual dictionary.
> +
> + @param a: Array representing a dictionary.
> + @param object_cache: A list of strings referenced by elements in the array.
> + @return: A 4-tuple: (name, shortname, depend, content), in which all
> + members are strings except depend which is a list of strings.
> + """
> + name = ".".join([object_cache[i] for i in a[a[0]:a[1]]])
> + shortname = ".".join([object_cache[i] for i in a[a[1]:a[2]]])
> + content = "".join([object_cache[i] for i in a[a[3]:]])
> + depend = []
> + prefix = ""
> + for n, d in zip(a[a[0]:a[1]], a[a[2]:a[3]]):
> + for dep in object_cache[d].split():
> + depend.append(prefix + dep)
> + prefix += object_cache[n] + "."
> + return name, shortname, depend, content
> +
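
The depend reconstruction is the subtle part of _array_get_all(): each
dependency string is qualified by the variant names accumulated so far, so
a dependency declared at some nesting level resolves relative to that point
in the name. A self-contained sketch of the same loop, again with strings
stored directly instead of object_cache indices (the sample data is mine):

```python
def array_get_all(a):
    # Same logic as the patch's _array_get_all(), minus the object cache.
    name = ".".join(a[a[0]:a[1]])
    shortname = ".".join(a[a[1]:a[2]])
    content = "".join(a[a[3]:])
    depend = []
    prefix = ""
    for n, d in zip(a[a[0]:a[1]], a[a[2]:a[3]]):
        for dep in d.split():
            depend.append(prefix + dep)   # qualify by the names seen so far
        prefix += n + "."
    return name, shortname, depend, content

# name segment ["smp2", "Fedora"], depend segment ["", "up"]: the "up"
# dependency belongs to the Fedora level, so it resolves as "smp2.up".
a = [4, 6, 8, 10, "smp2", "Fedora", "smp2", "Fedora", "", "up", "mem = 512\n"]
result = array_get_all(a)
```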
>
>
> if __name__ == "__main__":
> parser = optparse.OptionParser()
> parser.add_option('-f', '--file', dest="filename", action='store',
> help='path to a config file that will be parsed. '
> - 'If not specified, will parse kvm_tests.cfg '
> - 'located inside the kvm test dir.')
> + 'If not specified, will parse tests.cfg located '
> + 'inside the kvm test dir.')
> parser.add_option('--verbose', dest="debug", action='store_true',
> help='include debug messages in console output')
>
> @@ -518,9 +698,9 @@ if __name__ == "__main__":
> # Here we configure the stand alone program to use the autotest
> # logging system.
> logging_manager.configure_logging(KvmLoggingConfig(), verbose=debug)
> - list = config(filename, debug=debug).get_list()
> + dicts = config(filename, debug=debug).get_generator()
> i = 0
> - for dict in list:
> + for dict in dicts:
> logging.info("Dictionary #%d:", i)
> keys = dict.keys()
> keys.sort()
_______________________________________________
Autotest mailing list
Autotest@test.kernel.org
http://test.kernel.org/cgi-bin/mailman/listinfo/autotest
Thread overview: 3+ messages
2010-03-02 17:30 [KVM-AUTOTEST PATCH v4] KVM test: A memory efficient kvm_config implementation Michael Goldish
2010-03-03 14:44 ` Lucas Meneghel Rodrigues [this message]
[not found] <199667921.2707991267633218479.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
2010-03-03 16:20 ` Michael Goldish