All of lore.kernel.org
 help / color / mirror / Atom feed
From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Thomas Huth <thuth@redhat.com>
Subject: Re: [PATCH] migration: Add more error handling to analyze-migration.py
Date: Thu, 02 Jan 2025 19:48:27 -0300	[thread overview]
Message-ID: <87pll496is.fsf@suse.de> (raw)
In-Reply-To: <Z3cNJZ0inEMRSitV@x1n>

Peter Xu <peterx@redhat.com> writes:

> On Thu, Jan 02, 2025 at 03:58:31PM -0300, Fabiano Rosas wrote:
>> The analyze-migration script was seen failing in s390x in misterious
>> ways. It seems we're reaching the subsection constructor without any
>
> It might be a good idea to add some debug lines indeed. Though are you sure
> it's from parsing a subsection?

Ah, indeed it's not always a subsection. So I need to go back to the
code and do more auditing, we might have a wrong struct macro somewhere.

>
> https://lore.kernel.org/all/20241219123213.GA692742@fedora/
>
> ====8<====
> stderr:
> Traceback (most recent call last):
>   File "/home/gitlab-runner/builds/4S3awx_3/0/qemu-project/qemu/build/scripts/analyze-migration.py", line 688, in <module>
>     dump.read(dump_memory = args.memory)
>   File "/home/gitlab-runner/builds/4S3awx_3/0/qemu-project/qemu/build/scripts/analyze-migration.py", line 625, in read
>     section.read()
>   File "/home/gitlab-runner/builds/4S3awx_3/0/qemu-project/qemu/build/scripts/analyze-migration.py", line 461, in read
>     field['data'] = reader(field, self.file)
>   File "/home/gitlab-runner/builds/4S3awx_3/0/qemu-project/qemu/build/scripts/analyze-migration.py", line 434, in __init__
>     for field in self.desc['struct']['fields']:
> KeyError: 'fields'
> ====8<====
>
> It reads to me that it's not in "if 'subsections' in self.desc['struct']"
> loop yet, instead it looks like a real STRUCT field in one of the device
> sections?  If that's true, then...
>
>> fields, which would indicate an empty .subsection entry in the vmstate
>> definition. We don't have any of those, at least not without the
>> unmigratable flag set, so this should never happen.
>> 
>> Add some debug statements so that we can see what's going on the next
>> time the issue happens.
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  scripts/analyze-migration.py | 33 +++++++++++++++++++++++++++------
>>  1 file changed, 27 insertions(+), 6 deletions(-)
>> 
>> diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
>> index 8a254a5b6a..1dd98f1d5a 100755
>> --- a/scripts/analyze-migration.py
>> +++ b/scripts/analyze-migration.py
>> @@ -429,6 +429,10 @@ def __init__(self, desc, file):
>>          super(VMSDFieldStruct, self).__init__(desc, file)
>>          self.data = collections.OrderedDict()
>>  
>> +        if 'fields' not in self.desc['struct']:
>> +            raise Exception("No fields in subsection key=%s name=%s" %
>> +                            (self.section_key, self.vmsd_name))
>
> ... this self.section_key / self.vmsd_name may not exist..

Right, so there's no information we can get at this point I think. But
raising will still trigger the dump later on.

>
> In all cases, IMHO this would be better the debug lines work with both pure
> structs and sections.

Yeah, I need to change that to something generic.

>
>> +
>>          # When we see compressed array elements, unfold them here
>>          new_fields = []
>>          for field in self.desc['struct']['fields']:
>> @@ -477,6 +481,11 @@ def read(self):
>>                      raise Exception("Subsection %s not found at offset %x" % ( subsection['vmsd_name'], self.file.tell()))
>>                  name = self.file.readstr()
>>                  version_id = self.file.read32()
>> +
>> +                if not subsection:
>> +                    raise Exception("Empty description for subsection %s" %
>> +                                    name)
>
> This is checking subsection==None??  I doubt whether this will hit..

Well, there could be a None in the list of subsections somehow, no?

>
>> +
>>                  self.data[name] = VMSDSection(self.file, version_id, subsection, (name, 0))
>>                  self.data[name].read()
>>  
>> @@ -575,9 +584,8 @@ def __init__(self, filename):
>>          self.filename = filename
>>          self.vmsd_desc = None
>>  
>> -    def read(self, desc_only = False, dump_memory = False, write_memory = False):
>> -        # Read in the whole file
>> -        file = MigrationFile(self.filename)
>> +    def _read(self, file, vmsd_json, desc_only = False, dump_memory = False,
>> +              write_memory = False):
>>  
>>          # File magic
>>          data = file.read32()
>> @@ -589,7 +597,7 @@ def read(self, desc_only = False, dump_memory = False, write_memory = False):
>>          if data != self.QEMU_VM_FILE_VERSION:
>>              raise Exception("Invalid version number %d" % data)
>>  
>> -        self.load_vmsd_json(file)
>> +        self.load_vmsd_json(file, vmsd_json)
>>  
>>          # Read sections
>>          self.sections = collections.OrderedDict()
>> @@ -632,12 +640,25 @@ def read(self, desc_only = False, dump_memory = False, write_memory = False):
>>                      raise Exception("Mismatched section footer: %x vs %x" % (read_section_id, section_id))
>>              else:
>>                  raise Exception("Unknown section type: %d" % section_type)
>> -        file.close()
>>  
>> -    def load_vmsd_json(self, file):
>> +    def read(self, desc_only = False, dump_memory = False,
>> +             write_memory = False):
>> +        file = MigrationFile(self.filename)
>>          vmsd_json = file.read_migration_debug_json()
>> +
>> +        try:
>> +            self._read(file, vmsd_json, desc_only, dump_memory, write_memory)
>> +        except:
>> +            raise Exception("Full JSON dump:\n%s", vmsd_json)
>
> Better dump the Exception line itself too? IIUC that needs things like:
>
>   except Exception as e:
>       raise Exception(XXX, e, vmsd_json)

It already shows both exceptions in the form:

Traceback (most recent call last):
  <backtrace>
Exception: <msg>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  <backtrace>
Exception: ('Full JSON dump:\n%s', '{"configuration": {"vmsd_name" ...

>
> More below..
>
>> +        finally:
>> +            file.close()
>> +
>> +    def load_vmsd_json(self, file, vmsd_json):
>>          self.vmsd_desc = json.loads(vmsd_json, object_pairs_hook=collections.OrderedDict)
>>          for device in self.vmsd_desc['devices']:
>> +            if 'fields' not in device:
>> +                raise Exception("vmstate for device %s has no fields" %
>> +                                device['name'])
>
> This looks a valid check, though I still doubt whether this would hit at
> all for this specific error (which looks like a top level STRUCT of a
> section, that is missing "fields").

Ok, but it's a (VMSDSection, device) tuple below. That "device" might
end up in the constructor of the FieldStruct:

MigrationDump.load_vmsd_json:
  ...
  value = ( VMSDSection, device )
  self.section_classes[key] = value
  ...

MigrationDump.read:
  ...
  classdesc = self.section_classes[section_key]
  section = classdesc[0](file, version_id, classdesc[1], section_key)


class VMSDSection(VMSDFieldStruct):
    def __init__(self, file, version_id, device, section_key):
        ...
        # A section really is nothing but a FieldStruct :)
        super(VMSDSection, self).__init__({ 'struct' : desc }, file)

>>              key = (device['name'], device['instance_id'])
>>              value = ( VMSDSection, device )
>>              self.section_classes[key] = value
>> -- 
>> 2.35.3
>> 
>
> For the capture part, would it be easier if we trap at the very top level
> once and for all, then try to dump the vmsd_desc as long as existed?  Like
> this:
>
> ====8<====
> @@ -685,9 +686,15 @@ def default(self, o):
>      f.close()
>  elif args.dump == "state":
>      dump = MigrationDump(args.file)
> -    dump.read(dump_memory = args.memory)
> -    dict = dump.getDict()
> -    print(jsonenc.encode(dict))
> +    try:
> +        dump.read(dump_memory = args.memory)

This is not the only dump.read() call in the file. Ideally we'd wrap
them all. But then there's a slight clash with the parameter validation
part (args.dump == ...) because it raises if no valid option is used.

I'll try to improve this a bit. I was just trying to get this thing to
spit something useful so we can unblock the migration PR.

> +        dict = dump.getDict()
> +        print(jsonenc.encode(dict))
> +    except Exception as e:
> +        # If loading vmstate stream failed, try the best to dump vmsd desc
> +        raise Exception(
> +            f"Caught exception when reading dump: {e}\n"
> +            f"Trying to dump vmsd_desc: {jsonenc.encode(dump.vmsd_desc)}")

I wanted to avoid doing any processing after the error because there's
the chance there's Python issue during decoding. But I can maybe store
the string instead at load_vmsd_json.

>  elif args.dump == "desc":
>      dump = MigrationDump(args.file)
>      dump.read(desc_only = True)
> ====8<====
>
> So no matter what caused error, fallback to try dump vmsd_desc as long as
> available.
>
> Thanks,


      reply	other threads:[~2025-01-02 22:48 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-02 18:58 [PATCH] migration: Add more error handling to analyze-migration.py Fabiano Rosas
2025-01-02 22:03 ` Peter Xu
2025-01-02 22:48   ` Fabiano Rosas [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87pll496is.fsf@suse.de \
    --to=farosas@suse.de \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.