From: Richard Purdie <richard.purdie@linuxfoundation.org>
To: Patrick Ohly <patrick.ohly@intel.com>,
Martin Jansa <martin.jansa@gmail.com>
Cc: Patches and discussions about the oe-core layer
<openembedded-core@lists.openembedded.org>
Subject: Re: Never ending stream of bitbake exceptions when the builder runs out of disk space
Date: Tue, 27 Jun 2017 10:41:26 +0100
Message-ID: <1498556486.3124.8.camel@linuxfoundation.org>
In-Reply-To: <1498550932.7464.29.camel@intel.com>
On Tue, 2017-06-27 at 10:08 +0200, Patrick Ohly wrote:
> On Thu, 2017-06-15 at 08:48 +0200, Martin Jansa wrote:
> >
> > This issue has existed for a very long time.
> >
> > I know that when the builder runs out of disk space there are multiple
> > things which might go wrong (I've seen bad archives on premirrors and
> > bad sstate archives caused by this), so this issue isn't the main
> > problem, but it would still be nice to fail faster.
> >
> > The last build was running for some 9 hours. It was building for
> > maybe 2 hours before it ran out of disk space, and this morning there
> > is a 50MB log just from bitbake output stored on the jenkins master,
> > repeating the following message very quickly:
> >
> >
> > # grep -c "Errno 28" consoleText.txt
> > 42986
> >
> >
> > ERROR: Running command [['world'], 'build']
> > Traceback (most recent call last):
> >   File "/home/jenkins/oe/world/shr-core/bitbake/lib/bb/event.py", line 211, in fire(event=<bb.event.HeartbeatEvent object at 0x7fcfed3e96a0>, d=<bb.data_smart.DataSmart object at 0x7fd00330b198>):
> >
> >      > fire_class_handlers(event, d)
> >        if worker_fire:
> >   File "/home/jenkins/oe/world/shr-core/bitbake/lib/bb/event.py", line 134, in fire_class_handlers(event=<bb.event.HeartbeatEvent object at 0x7fcfed3e96a0>, d=<bb.data_smart.DataSmart object at 0x7fd00330b198>):
> >        continue
> >      > execute_handler(name, handler, event, d)
> >
> >   File "/home/jenkins/oe/world/shr-core/bitbake/lib/bb/event.py", line 106, in execute_handler(name='runqueue_stats', handler=<function runqueue_stats at 0x7fd0020c6158>, event=<bb.event.HeartbeatEvent object at 0x7fcfed3e96a0>, d=<bb.data_smart.DataSmart object at 0x7fd00330b198>):
> >        try:
> >      > ret = handler(event)
> >        except (bb.parse.SkipRecipe, bb.BBHandledException):
> >   File "/home/jenkins/oe/world/shr-core/openembedded-core/meta/classes/buildstats.bbclass", line 212, in runqueue_stats(e=<bb.event.HeartbeatEvent object at 0x7fcfed3e96a0>):
> >        done = isinstance(e, bb.event.BuildCompleted)
> >      > system_stats.sample(e, force=done)
> >        if done:
> >   File "/home/jenkins/oe/world/shr-core/openembedded-core/meta/lib/buildstats.py", line 148, in SystemStats.sample(event=<bb.event.HeartbeatEvent object at 0x7fcfed3e96a0>, force=False):
> >        data +
> >      > b'\n')
> >        self.last_proc = now
> > OSError: [Errno 28] No space left on device
> >
> >
> > It would be better to exit completely when something as bad as
> > Errno 28 happens.
> Do you have BB_DISKMON_DIRS active? Probably yes.
>
> The reason why it did not trigger here might be that the build ran out
> of disk space so quickly that the disk monitoring had no chance to
> detect the problem before system stat sampling itself started failing
> with the error above.
>
> System stat sampling and disk monitoring are hooking into the same
> event, so my theory is that once the system stat sampling fails, disk
> monitoring code no longer runs.
>
> I'm not sure what exactly the right fix is: detect uncaught OSError
> like 28 in the bitbake event loop and abort the build, and/or catch
> the error in buildstats.py and ignore it so that the normal disk
> monitoring can happen?
>
> I know how to do the latter, but not the former.
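
For reference, the second option (catch the error in the sampler and carry on) could look roughly like the sketch below. This is not the code in meta/lib/buildstats.py: the class name SystemStatsSketch, the write callable and the disabled flag are all made up for illustration. The only point is that an ENOSPC from writing a sample gets swallowed so that other handlers on the same heartbeat event still get to run.

```python
import errno


class SystemStatsSketch:
    """Hypothetical stand-in for a stats sampler; not the real SystemStats."""

    def __init__(self, write):
        # 'write' is an assumed callable that persists one sample to disk.
        self.write = write
        self.disabled = False

    def sample(self, data):
        """Return True if the sample was written, False if sampling is off."""
        if self.disabled:
            return False
        try:
            self.write(data)
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                # Anything other than "no space left" is still a real error.
                raise
            # Disk is full: stop sampling and leave the decision to abort
            # to whatever is monitoring disk space.
            self.disabled = True
            return False
        return True
```

Disabling the sampler after the first failure, rather than retrying on every heartbeat, is a design choice in this sketch: it keeps a full disk from producing one failure per event.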
Incidentally, looking at this trace, I think bitbake should drop an
event handler that triggers exceptions in a case like this, to try to
avoid looping quite so badly. We should probably file a bug for that.
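
As a rough sketch of the idea only (this is not bitbake's event code; HandlerRegistry, register and fire here are hypothetical names in a standalone example), a dispatcher can unregister a handler the first time it raises, so the same failure is logged once instead of on every heartbeat:

```python
import logging

logger = logging.getLogger("event-sketch")


class HandlerRegistry:
    """Hypothetical event dispatcher that drops handlers which raise."""

    def __init__(self):
        self.handlers = {}  # handler name -> callable taking one event

    def register(self, name, handler):
        self.handlers[name] = handler

    def fire(self, event):
        # Iterate over a copy so handlers can be removed while dispatching.
        for name, handler in list(self.handlers.items()):
            try:
                handler(event)
            except Exception:
                # Log the traceback once, then stop calling this handler.
                logger.exception("Handler %s failed, dropping it", name)
                del self.handlers[name]
```

With this shape, a handler that fails with Errno 28 is reported once and the remaining handlers for the same event keep running on later events.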
Cheers,
Richard