From: Richard Purdie <richard.purdie@linuxfoundation.org>
To: openembedded-core <openembedded-core@lists.openembedded.org>
Cc: Bruce Ashfield <bruce.ashfield@gmail.com>,
Joshua Watt <JPEWhacker@gmail.com>,
Alexandre Belloni <alexandre.belloni@bootlin.com>
Subject: Current QA failure challenges (perf, QA http-server, asyncio prserv, bitbake runCommand timeout, unfsd test issue)
Date: Sun, 29 Jan 2023 15:41:03 +0000
Message-ID: <aba59b60b8eb52c2963aaaafa0308fbbf80443db.camel@linuxfoundation.org>
Hi All,
I'm going to be distracted this next week with other project related
things and the autobuilder is showing a number of problems. I've spent
part of my weekend trying to stabilise things so I could avoid the
distraction and whilst I've had partial success, things aren't
resolved. I'm sending this out in the hope others may be able to
help. The issues:
Kernel 6.1 reproducibility issue in perf
========================================
We've had two fixes which do seem to have fixed the kernel-devsrc issue
but not the perf one. The diffoscope:
http://autobuilder.yocto.io/pub/repro-fail/oe-reproducible-20230129-8bp_1b_z/packages/diff-html/
The autobuilder failure:
https://autobuilder.yoctoproject.org/typhoon/#/builders/117/builds/2314/steps/13/logs/stdio
Part of the issue is this seems to be host contamination, so you do
need to build on a different host to see it. The first time the test
runs, it does so on the same host and the output is unchanged, which is
why we've had green builds followed by failures.
I did try removing the sstate for perf and the issue came back, so it
isn't a cache contamination issue afaik.
bitbake timeout on sending commands
===================================
Bitbake runCommand appears to be able to hang in the send since
there is no timeout set there. This can cause things to block. We need
to put a worst case timeout/exit in there.
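As a rough illustration of the kind of worst case timeout meant here,
the sketch below bounds a blocking send on a plain socket. It is not
bitbake's runCommand code: send_with_timeout, SendTimeout and the use of
a raw socket are all assumptions made for the example.

```python
import socket


class SendTimeout(Exception):
    """The peer did not accept the data before the deadline."""


def send_with_timeout(sock, data, timeout):
    """Send all of data, or raise SendTimeout after `timeout` seconds.

    Hypothetical helper for illustration only, not part of bitbake.
    """
    previous = sock.gettimeout()
    # For sendall() the socket timeout bounds the total time spent sending.
    sock.settimeout(timeout)
    try:
        sock.sendall(data)
    except socket.timeout:
        raise SendTimeout("send did not complete within %s seconds" % timeout)
    finally:
        # Leave the socket configured as the caller had it.
        sock.settimeout(previous)
```

A caller can then catch SendTimeout and exit or report an error rather
than blocking forever.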
unfs tests leave nfs server running
===================================
After the unfs NFS test runs, we seem to leave an unfs server hanging
around after the builds are long gone. These build up on the
autobuilder workers.
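One general pattern for not leaving a helper server behind is to tie its
lifetime to a context manager, so it is stopped even when the test body
raises. The sketch below is only an illustration: managed_server and
STAND_IN are hypothetical names, and a sleeping Python child stands in
for the real server, whose actual start/stop code is not shown here.

```python
import contextlib
import subprocess
import sys


@contextlib.contextmanager
def managed_server(cmd, grace=5):
    """Run cmd for the duration of the with-block, then make sure it is gone."""
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        # Runs on normal exit and when the test body raises.
        proc.terminate()
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            # The child ignored SIGTERM, so force it to exit.
            proc.kill()
            proc.wait()


# Stand-in for the real server: a child that would otherwise outlive us.
STAND_IN = [sys.executable, "-c", "import time; time.sleep(600)"]
```

This only covers the case where the test harness itself gets to run its
cleanup. If the harness is killed outright, something else would still
need to reap the server.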
http-server in dnf tests breaking
=================================
We've started seeing a lot of short packets with the http-server used
by the dnf package manager tests. dnf retries and succeeds, leaving a
failure in the logs which parse_logs picks up.
https://autobuilder.yoctoproject.org/typhoon/#/builders/53/builds/6592/steps/14/logs/stdio
Central error: 2023-01-29T11:43:32+0000 INFO Error during transfer: Curl error (18): Transferred a partial file for http://192.168.7.1:36183/cortexa15t2hf_neon/libc6-2.36-r0.cortexa15t2hf_neon.rpm [transfer closed with 15758 bytes remaining to read]
https://autobuilder.yoctoproject.org/typhoon/#/builders/59/builds/6592/steps/14/logs/stdio
Central error: 2023-01-29T11:07:31+0000 INFO Error during transfer: Curl error (18): Transferred a partial file for http://192.168.7.3:42417/core2_32/busybox-1.35.0-r0.core2_32.rpm [transfer closed with 14627 bytes remaining to read]
https://autobuilder.yoctoproject.org/typhoon/#/builders/45/builds/6598/steps/13/logs/stdio
https://autobuilder.yoctoproject.org/typhoon/#/builders/59/builds/6532
and a load more. Why the sudden increase in frequency, not sure. Could
be 6.1 kernel related.
asyncio issues with PR Serv
===========================
I've found at least one bug with what the code was doing and fixing
that should help some of the test failures we've seen. When looking
into that I found more problems though.
If you run "oe-selftest -r prservice -j 3 -K" and then look at the
bitbake-cookerdaemon.log, it shows a load of python asyncio issues. The
first one is:
/usr/lib/python3.10/asyncio/base_events.py:685: ResourceWarning: unclosed event loop <_UnixSelectorEventLoop running=False closed=False debug=False>
  _warn(f"unclosed event loop {self!r}", ResourceWarning, source=self)
showing connections to the PR service aren't being cleaned up.
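For reference, that warning is what Python emits when an event loop
object is garbage collected without close() having been called on it. A
minimal sketch of a synchronous wrapper that owns a private loop and
always closes it is below. SyncClient is a hypothetical name for
illustration, not the actual PR service client code.

```python
import asyncio


class SyncClient:
    """Hypothetical synchronous wrapper that owns a private event loop."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()

    def run(self, coro):
        """Run one coroutine to completion on the private loop."""
        return self.loop.run_until_complete(coro)

    def close(self):
        # Without this, dropping the object leaves an unclosed loop behind
        # for the garbage collector to warn about.
        self.loop.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
        return False
```

Used as "with SyncClient() as client: ...", the loop is closed whether
or not the body raises.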
Also:
/home/pokybuild/yocto-worker/oe-selftest-debian/build/bitbake/lib/bb/codeparser.py:472: ResourceWarning: unclosed <socket.socket fd=15, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 47910), raddr=('127.0.0.1', 40751)>
  self.process_tokens(more_tokens)
but the codeparser reference isn't accurate, it is just from the
garbage collection point.
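To find where such a socket was actually created, rather than where it
happened to be collected, the standard library's tracemalloc can be
enabled, as the warning text itself suggests. A minimal sketch follows.
enable_resource_tracking is a hypothetical helper and where it would be
called from within bitbake is left open.

```python
import tracemalloc
import warnings


def enable_resource_tracking(frames=25):
    """Report every ResourceWarning together with its allocation traceback."""
    # Keep up to `frames` stack frames for each allocation.
    tracemalloc.start(frames)
    # ResourceWarning is ignored by default, so show every occurrence.
    warnings.simplefilter("always", ResourceWarning)
```

The same effect is available without code changes by running python3
with "-X tracemalloc=25 -W always::ResourceWarning".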
The second issue is that in running the selftests, a PR Service process
often seems to be left behind without bitbake running. It suggests some
kind of lifecycle issue somewhere. It might only be happening in
failure cases, not sure.
The third issue seems to be around event loop shutdown. On debian11
with py 3.9, when bitbake is shutting down we're seeing:
412934 07:12:26.081334 Exiting (socket: True)
/usr/lib/python3.9/signal.py:30: ResourceWarning: unclosed <socket.socket fd=22, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 36287), raddr=('127.0.0.1', 36422)>
  return enum_klass(value)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/lib/python3.9/asyncio/selector_events.py:704: ResourceWarning: unclosed transport <_SelectorSocketTransport fd=22>
  _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Traceback (most recent call last):
  File "/home/pokybuild/yocto-worker/oe-selftest-debian/build/bitbake/lib/bb/asyncrpc/serv.py", line 71, in process_requests
    d = await self.read_message()
GeneratorExit
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/pokybuild/yocto-worker/oe-selftest-debian/build/bitbake/lib/bb/asyncrpc/serv.py", line 201, in handle_client
    await client.process_requests()
  File "/home/pokybuild/yocto-worker/oe-selftest-debian/build/bitbake/lib/bb/asyncrpc/serv.py", line 79, in process_requests
    self.writer.close()
  File "/usr/lib/python3.9/asyncio/streams.py", line 353, in close
    return self._transport.close()
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 700, in close
    self._loop.call_soon(self._call_connection_lost, None)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 746, in call_soon
    self._check_closed()
  File "/usr/lib/python3.9/asyncio/base_events.py", line 510, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
Task was destroyed but it is pending!
task: <Task pending name='Task-2540' coro=<AsyncServer.h
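The traceback shows a client handler still trying to close its writer
after the loop has already been closed. One common way to avoid that
ordering, sketched below, is to cancel the outstanding tasks and run
the loop until they have unwound before closing it. shutdown_loop is a
hypothetical helper, not the existing asyncrpc server code, and whether
this matches the real cause here is an assumption.

```python
import asyncio


def shutdown_loop(loop):
    """Cancel outstanding tasks and let them unwind before closing the loop."""
    pending = asyncio.all_tasks(loop)
    for task in pending:
        task.cancel()
    if pending:
        # Run the loop so each task receives CancelledError and can run its
        # cleanup (for example closing a stream writer) while the loop is
        # still open.
        loop.run_until_complete(
            asyncio.gather(*pending, return_exceptions=True))
    loop.run_until_complete(loop.shutdown_asyncgens())
    loop.close()
```

This mirrors the order asyncio.run() uses when it tears down its own
loop.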
If anyone can help with any of these please reply to the email. We
probably need to transfer some into bugzilla, I just wanted to at least
get something written down.
Cheers,
Richard