From: Patrick Ohly <patrick.ohly@intel.com>
To: Martin Jansa <martin.jansa@gmail.com>
Cc: Patches and discussions about the oe-core layer
<openembedded-core@lists.openembedded.org>
Subject: Re: Never ending stream of bitbake exceptions when the builder runs out of disk space
Date: Tue, 27 Jun 2017 10:08:52 +0200
Message-ID: <1498550932.7464.29.camel@intel.com>
In-Reply-To: <CA+chaQcmksz0eGEwb9RZ9B_7JFaB7e97iNCoWCp84zJEkqpHoA@mail.gmail.com>

On Thu, 2017-06-15 at 08:48 +0200, Martin Jansa wrote:
> This issue exists for very long time.
>
>
> I know that when the builder runs out of disk space there are multiple
> things which might go wrong (I've seen bad archives on premirrors, bad
> sstate archives caused by this), so this issue isn't the main problem,
> but still would be nice to fail faster.
>
>
> In last build which was running for some 9 hours, it was first
> building for maybe 2 hours before it run out of disk space and this
> morning there is 50MB log just from bitbake output stored on the
> jenkins master. Repeating following message very quickly
>
>
> # grep -c "Errno 28" consoleText.txt
> 42986
>
>
> ERROR: Running command [['world'], 'build']
> Traceback (most recent call last):
> File "/home/jenkins/oe/world/shr-core/bitbake/lib/bb/event.py", line
> 211, in fire(event=<bb.event.HeartbeatEvent object at 0x7fcfed3e96a0>,
> d=<bb.data_smart.DataSmart object at 0x7fd00330b198>):
>
> > fire_class_handlers(event, d)
> if worker_fire:
> File "/home/jenkins/oe/world/shr-core/bitbake/lib/bb/event.py", line
> 134, in fire_class_handlers(event=<bb.event.HeartbeatEvent object at
> 0x7fcfed3e96a0>, d=<bb.data_smart.DataSmart object at
> 0x7fd00330b198>):
> continue
> > execute_handler(name, handler, event, d)
>
> File "/home/jenkins/oe/world/shr-core/bitbake/lib/bb/event.py", line
> 106, in execute_handler(name='runqueue_stats', handler=<function
> runqueue_stats at 0x7fd0020c6158>, event=<bb.event.HeartbeatEvent
> object at 0x7fcfed3e96a0>, d=<bb.data_smart.DataSmart object at
> 0x7fd00330b198>):
> try:
> > ret = handler(event)
> except (bb.parse.SkipRecipe, bb.BBHandledException):
> File
> "/home/jenkins/oe/world/shr-core/openembedded-core/meta/classes/buildstats.bbclass", line 212, in runqueue_stats(e=<bb.event.HeartbeatEvent object at 0x7fcfed3e96a0>):
> done = isinstance(e, bb.event.BuildCompleted)
> > system_stats.sample(e, force=done)
> if done:
> File
> "/home/jenkins/oe/world/shr-core/openembedded-core/meta/lib/buildstats.py", line 148, in SystemStats.sample(event=<bb.event.HeartbeatEvent object at 0x7fcfed3e96a0>, force=False):
> data +
> > b'\n')
> self.last_proc = now
> OSError: [Errno 28] No space left on device
>
>
> It would be better to exit completely when something as bad as Errno
> 28 happens.

Do you have BB_DISKMON_DIRS active? Probably yes.

The reason why it did not trigger here might be that the build ran out
of disk space so quickly that the disk monitoring had no chance to
detect the problem before system stat sampling itself started failing
with the error above.

System stat sampling and disk monitoring hook into the same event, so
my theory is that once the system stat sampling fails, the disk
monitoring code no longer runs.
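
To illustrate the theory, here is a toy model in plain Python. It is not
bitbake's actual dispatch code: fire(), sample_system_stats() and
check_disk_space() are hypothetical stand-ins, and the only assumption
is that handlers for one event are called in sequence without catching
OSError.

```python
import errno

calls = []  # records which handlers actually ran


def sample_system_stats(event):
    # Hypothetical stand-in for the stat sampler: pretend the disk is full.
    raise OSError(errno.ENOSPC, "No space left on device")


def check_disk_space(event):
    # Hypothetical stand-in for the disk monitor.
    calls.append(event)


def fire(event, handlers):
    # Call each handler in order; an uncaught exception ends the loop early.
    for handler in handlers:
        handler(event)


def run_demo():
    calls.clear()
    try:
        fire("heartbeat", [sample_system_stats, check_disk_space])
    except OSError as exc:
        return exc.errno, list(calls)
    return None, list(calls)
```

In this model run_demo() reports ENOSPC and an empty call list, i.e. the
disk check never gets its turn once the sampler raises.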

I'm not sure what exactly the right fix is: detect an uncaught OSError
such as Errno 28 in the bitbake event loop and abort the build, and/or
catch the error in buildstats.py and ignore it so that the normal disk
monitoring can happen?

I know how to do the latter, but not the former.

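
A minimal sketch of the latter, as standalone Python rather than a patch
against buildstats.py. The wrapper name sample_safely() is hypothetical;
the sampler argument stands for something with the same call shape as
the sample(event, force=...) call seen in the traceback above. Only
ENOSPC is swallowed here, which is one possible choice; other OSErrors
are re-raised.

```python
import errno


def sample_safely(sampler, event, force=False):
    """Call sampler(event, force=force), ignoring only 'disk full' errors.

    Returns True if sampling succeeded and False if it was skipped
    because the device is out of space. Any other error propagates.
    """
    try:
        sampler(event, force=force)
        return True
    except OSError as exc:
        if exc.errno != errno.ENOSPC:
            raise
        return False
```

With something like this around the sampling call, a full disk would no
longer raise out of the handler, so whatever runs after it on the same
event would still get a chance to execute.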
--
Best Regards, Patrick Ohly
The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.