From: Mikko Rapeli <mikko.rapeli@linaro.org>
To: Richard Purdie <richard.purdie@linuxfoundation.org>
Cc: openembedded-core@lists.openembedded.org
Subject: Re: [OE-core] [PATCH 3/3] testimage.bbclass: capture RuntimeError too
Date: Mon, 18 Nov 2024 10:00:13 +0200
Message-ID: <Zzr0Df_92qav6fWq@nuoska>
In-Reply-To: <10810101792ffe49440847764ed2e4c620e67fe9.camel@linuxfoundation.org>

Hi,

On Tue, Nov 12, 2024 at 11:25:51AM +0000, Richard Purdie wrote:
> On Mon, 2024-11-11 at 13:16 +0000, Mikko Rapeli via lists.openembedded.org wrote:
> > runqemu can fail with a RuntimeError exception. An uncaught exception
> > causes cooker processes to leak; successive bitbake command line calls
> > then bind to them, which can cause really odd errors for users, e.g. when
> > build/tmp is wiped and the cooker processes expect files to be there.
> > 
> > Signed-off-by: Mikko Rapeli <mikko.rapeli@linaro.org>
> > ---
> >  meta/classes-recipe/testimage.bbclass | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/meta/classes-recipe/testimage.bbclass b/meta/classes-recipe/testimage.bbclass
> > index 19075ce1f3..a9b031093a 100644
> > --- a/meta/classes-recipe/testimage.bbclass
> > +++ b/meta/classes-recipe/testimage.bbclass
> > @@ -371,7 +371,7 @@ def testimage_main(d):
> >          complete = True
> >          if results.hasAnyFailingTest():
> >              run_failed_tests_post_actions(d, tc)
> > -    except (KeyboardInterrupt, BlockingIOError) as err:
> > +    except (KeyboardInterrupt, BlockingIOError, RuntimeError) as err:
> >          if isinstance(err, KeyboardInterrupt):
> >              bb.error('testimage interrupted, shutting down...')
> >          else:
> > 
> 
> During review it is hard to understand what the real issue is from this
> description. I don't like the sound of processes leaking and if that is
> happening, adding another exception to this list doesn't feel correct.
> I was going to ask for a better explanation but looking at the code,
> perhaps this error handling path just needs rewriting/improving with
> more of the code in the finally, conditionally?
> 
> I just want to make sure we fix the real bug here.

Sorry for being unclear. I thought the backtrace would be too verbose.

The bug happens when runqemu startup fails:

poky/meta/lib/oeqa/targetcontrol.py:            raise RuntimeError("%s - FAILED to start qemu - check the task log and the boot log" % self.pn)

Cooker processes do leak when the exception is not caught.
Maybe the two are not strictly related, but it happens for me. It
may be that cleanup does happen, just slowly, and when I run
other bitbake commands right after the failure they connect to these
leaked cooker processes, which then behave badly, for example when
build/tmp has already been wiped.
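
To illustrate the control flow in question, here is a minimal sketch with
hypothetical names (FakeTarget and run_tests are stand-ins, not the actual
testimage.bbclass or oeqa code). It assumes the only difference is whether
RuntimeError is in the except tuple: either way a finally block runs the
cleanup, but without RuntimeError in the tuple the exception still
propagates out of the function to the caller.

```python
class FakeTarget:
    """Hypothetical stand-in for the QEMU target; not the real oeqa class."""

    def __init__(self, fail_start):
        self.fail_start = fail_start
        self.stopped = False

    def start(self):
        if self.fail_start:
            raise RuntimeError("FAILED to start qemu")

    def stop(self):
        self.stopped = True


def run_tests(target, handled):
    """Run a fake test session, handling only the exception types in `handled`."""
    events = []
    try:
        target.start()
        events.append("tests ran")
    except handled as err:
        events.append("handled %s" % type(err).__name__)
    finally:
        # Runs whether or not the exception was handled above.
        target.stop()
        events.append("cleanup")
    return events


# With RuntimeError in the tuple, the failure is handled and cleanup runs.
target = FakeTarget(fail_start=True)
handled_events = run_tests(
    target, (KeyboardInterrupt, BlockingIOError, RuntimeError))

# Without it, cleanup still runs, but the exception propagates to the caller.
target2 = FakeTarget(fail_start=True)
propagated = False
try:
    run_tests(target2, (KeyboardInterrupt, BlockingIOError))
except RuntimeError:
    propagated = True
```

In the sketch the cleanup in finally runs in both cases; what changes is
whether the caller sees the exception. Whether that propagation is what
leaves cooker processes behind in the real code is an assumption on my side.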

Here is a failure log from one of my meta-arm builds, with other
changes on top which are the root cause of the missing rootfs file:

ERROR: core-image-base-1.0-r0 do_testimage: Invalid rootfs /home/builder/src/base/meta-arm/build/tmp/deploy/images/qemuarm64-secureboot/core-image-base-qemuarm64-secureboot.rootfs.wic.qcow2
ERROR: core-image-base-1.0-r0 do_testimage: Error executing a python function in exec_func_python() autogenerated:

The stack trace of python calls that resulted in this exception/failure was:
File: 'exec_func_python() autogenerated', lineno: 2, function: <module>
     0001:
 *** 0002:do_testimage(d)
     0003:
File: '/home/builder/src/base/meta-arm/build/../poky/meta/classes-recipe/testimage.bbclass', lineno: 122, function: do_testimage
     0118:    dump-guest-memory {"paging":false,"protocol":"file:%s.img"}
     0119:}
     0120:
     0121:python do_testimage() {
 *** 0122:    testimage_main(d)
     0123:}
     0124:
     0125:addtask testimage
     0126:do_testimage[nostamp] = "1"
File: '/home/builder/src/base/meta-arm/build/../poky/meta/classes-recipe/testimage.bbclass', lineno: 364, function: testimage_main
     0360:    orig_sigterm_handler = signal.signal(signal.SIGTERM, sigterm_exception)
     0361:    try:
     0362:        # We need to check if runqemu ends unexpectedly
     0363:        # or if the worker send us a SIGTERM
 *** 0364:        tc.target.start(params=d.getVar("TEST_QEMUPARAMS"), runqemuparams=d.getVar("TEST_RUNQEMUPARAMS"))
     0365:        import threading
     0366:        try:
     0367:            threading.Timer(int(d.getVar("TEST_OVERALL_TIMEOUT")), handle_test_timeout, (int(d.getVar("TEST_OVERALL_TIMEOUT")),)).start()
     0368:        except ValueError:
File: '/home/builder/src/base/meta-arm/build/../poky/meta/lib/oeqa/core/target/qemu.py', lineno: 91, function: start
     0087:            except (subprocess.CalledProcessError, subprocess.TimeoutExpired, FileNotFoundError) as err:
     0088:                msg += "Error running command: %s\n%s\n" % (blcmd, err)
     0089:            msg += "\n\n===== end: snippet =====\n"
     0090:
 *** 0091:            raise RuntimeError("FAILED to start qemu - check the task log and the boot log %s" % (msg))
     0092:    
     0093:    def stop(self):
     0094:        self.runner.stop()
Exception: RuntimeError: FAILED to start qemu - check the task log and the boot log 

===== start: snippet =====



===== end: snippet =====

===== start: snippet =====

Error running command: ['tail', '-20', '/home/builder/src/base/meta-arm/build/tmp/work/qemuarm64_secureboot-poky-linux/core-image-base/1.0/testimage/qemu_boot_log.20241031142321']
Command '['tail', '-20', '/home/builder/src/base/meta-arm/build/tmp/work/qemuarm64_secureboot-poky-linux/core-image-base/1.0/testimage/qemu_boot_log.20241031142321']' returned non-zero exit status 1.


===== end: snippet =====


ERROR: Logfile of failure stored in: /home/builder/src/base/meta-arm/build/tmp/work/qemuarm64_secureboot-poky-linux/core-image-base/1.0/temp/log.do_testimage.1779338
NOTE: recipe core-image-base-1.0-r0: task do_testimage: Failed
ERROR: Task (/home/builder/src/base/meta-arm/build/../poky/meta/recipes-core/images/core-image-base.bb:do_testimage) failed with exit code '1'
NOTE: Tasks Summary: Attempted 7084 tasks of which 7054 didn't need to be rerun and 1 failed.

Summary: 1 task failed:
  /home/builder/src/base/meta-arm/build/../poky/meta/recipes-core/images/core-image-base.bb:do_testimage

Cheers,

-Mikko


Thread overview: 10+ messages
2024-11-11 13:16 [PATCH 1/3] uki.bbclass: fix debug print logging level Mikko Rapeli
2024-11-11 13:16 ` [PATCH 2/3] oeqa runtime uki.py: add tests Mikko Rapeli
2024-11-11 13:16 ` [PATCH 3/3] testimage.bbclass: capture RuntimeError too Mikko Rapeli
2024-11-12 11:25   ` [OE-core] " Richard Purdie
2024-11-18  8:00     ` Mikko Rapeli [this message]
2025-01-28 13:04       ` Richard Purdie
     [not found]       ` <181EDCF7C1A1686B.17613@lists.openembedded.org>
2025-01-28 13:49         ` Richard Purdie
2025-01-28 14:37           ` Mikko Rapeli
2025-01-28 15:10             ` Richard Purdie
2025-01-28 15:23               ` Mikko Rapeli
