From: Rich Pixley <rich.pixley@palm.com>
To: openembedded-core@lists.openembedded.org
Subject: Re: SetScene tasks hang forever?
Date: Wed, 09 May 2012 10:51:50 -0700
Message-ID: <4FAAAEB6.2060703@palm.com>
In-Reply-To: <1336480442.25084.74.camel@ted>
[-- Attachment #1: Type: text/plain, Size: 4844 bytes --]
On 5/8/12 05:34 , Richard Purdie wrote:
> On Sun, 2012-05-06 at 10:36 -0700, Rich Pixley wrote:
>> On 5/2/12 16:06 , Richard Purdie wrote:
>>> On Wed, 2012-05-02 at 14:48 -0500, Mark Hatle wrote:
>>>> On 5/2/12 2:45 PM, Rich Pixley wrote:
>>> What would really help is a way to reproduce this...
>>>
>>> Does it reproduce with a certain set of metadata/sstate perhaps?
>>>
>>> What is odd about the above logs is that it appears bitbake never
>>> executes any task. It's possible something crashed somewhere, I guess,
>>> and we didn't realise part of the system had died. Or it could be some
>>> kind of circular dependency loop where X needs Y to build and Y needs X
>>> so nothing happens. We are supposed to spot that and raise an error if
>>> it happens.
>>>
>>> Does strace give an idea of which bits of bitbake are alive/looping? I'd
>>> probably resort to a few print()/bb.error() in the code at this point to
>>> find out what is alive, what is dead and where it's looping...
>> I have more info now.
>>
>> What I suspected was looping, (since it took longer than the ~1hr I was
>> willing to wait), isn't actual looping. Given enough time, the builds
>> do complete and I have comparable results on 5 different servers, (all
>> ubuntu-12.04 amd64 and all on btrfs).
>>
>> My initial, full builds of core-image-minimal do build, and they build
>> in ~60min, (~30min if I hand seed the downloads directory). I'm using
>> no mirrors other than the defaults. My second build in an already built
>> directory, (expected to do nothing), takes anywhere from 7 - 10.5hrs to
>> complete and successfully do nothing, depending on the server.
>>
>> During this time, top shows a single cpu pinned at 98 - 100%
>> utilization, and strace shows literally millions of access and stat
>> calls on stamp files, mkdir on the stamps directory, etc. Statistical
>> analysis of just the do_fetch access calls shows a distribution that
>> seems to mimic the topological tree. That is, the most called access is
>> for quilt-native and the components higher up the tree get fewer stats.
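For anyone wanting to repeat the per-recipe tally described above, here is a
rough sketch of the counting step. It assumes the strace output was saved as
text and that do_fetch stamp paths contain ".do_fetch". The regular
expression, the function name, and the path layout are illustrative
assumptions, not anything taken from bitbake.

```python
import re
from collections import Counter

# Matches strace lines such as
#   access("/build/stamps/quilt-native-0.48-r0.do_fetch", F_OK) = 0
# optionally prefixed by "[pid 1234] " (strace -f) or "1234 " (strace -f -o).
# The stamp path layout is an assumption made for illustration.
ACCESS_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?access\("([^"]+\.do_fetch)[^"]*"'
)


def count_fetch_accesses(lines):
    """Count access() calls per do_fetch stamp path in strace output lines."""
    counts = Counter()
    for line in lines:
        match = ACCESS_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it an open file object and printing counts.most_common(20) should
show whether the recipes lowest in the dependency tree dominate.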
>>
>> Oh, and the setscene stamps are all nonexistent. I presume that's expected.
>>
>> First, I can't imagine why there would need to be more than one mkdir on
>> the stamps directory within a single instantiation of bitbake. I can
>> imagine that it was easier to attempt to mkdir it than to check first,
>> but once it has been mkdir'd, (or checked), there's no need to do it
>> another million times, is there?
>>
>> Second, I can't imagine why there would need to be all the redundant
>> stamp checking. That info is cached internally, isn't it?
>>
>> And third, the fact that it seems to be checking the entire subtree,
>> apparently multiple times, at every node suggests to me that the
>> checking algorithm is broken. Back of the envelope... perhaps 300
>> components, maybe 10 tasks per component ~= 3e3 tasks. Figure a
>> geometric explosion of checks for an inefficient algorithm and we're up
>> to around 10e6 checks. I haven't counted an entire run, but based on
>> the time it takes to run, I'd say I'm seeing one, maybe two orders of
>> magnitude more checks than that. I've seen a few million node
>> traversals in about 15min and a node traversal appears to involve
>> several accesses and at least one stat.
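The back-of-the-envelope numbers above can be checked in a few lines. This is
only a sketch: it reads "geometric explosion" as roughly quadratic (which is
what the ~10e6 figure implies), and it takes "a few million" traversals per
15 minutes to mean 3 million. Both readings are assumptions.

```python
components = 300
tasks_per_component = 10
tasks = components * tasks_per_component      # 3000 tasks

# Assumption: every task rechecks every other task (quadratic behaviour).
pairwise_checks = tasks ** 2                  # 9,000,000, roughly 10e6

# Assumption: "a few million" traversals per 15 minutes is taken as 3e6.
TRAVERSALS_PER_QUARTER_HOUR = 3_000_000


def estimated_traversals(hours):
    """Extrapolate the observed traversal rate over a whole run."""
    return int(hours * 4 * TRAVERSALS_PER_QUARTER_HOUR)


low = estimated_traversals(7)                 # 84,000,000
high = estimated_traversals(10.5)             # 126,000,000
print(low / pairwise_checks, high / pairwise_checks)
```

Under those assumptions the 7 to 10.5 hour runs work out to roughly 9 to 14
times the quadratic estimate, i.e. about one order of magnitude more.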
>>
>> I'm not familiar with the current bitbake internals so my next thought
>> would be to replace the calls to access, stat, and mkdir on the stamp
>> files with caching, counting wrappers. Build a dictionary keyed by each
>> file checked: if it's new, do the kernel call and cache the result in the
>> dictionary. If it's already in the dictionary, then increment a counter
>> for it and return the cached value. This should a) improve the speed of the
>> current algorithm, b) improve the speed of the eventual replacement
>> algorithm, and c) give us some useful statistical data in the meantime.
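A minimal sketch of that caching, counting idea, assuming plain wrappers
around the Python os calls. CountingCache, cached_access, cached_stat, and
cached_mkdir are hypothetical names for illustration; none of this is
existing bitbake code, and how it would be wired into bitbake is left open.

```python
import os
from collections import Counter


class CountingCache:
    """Cache a function's results and count repeat calls per argument tuple."""

    def __init__(self, func):
        self._func = func
        self._results = {}
        self.repeat_calls = Counter()

    def __call__(self, *args):
        if args in self._results:
            # Seen before: bump the counter and skip the kernel call.
            self.repeat_calls[args] += 1
            return self._results[args]
        result = self._func(*args)
        self._results[args] = result
        return result


def _stat_or_none(path):
    """Return os.stat(path), or None if the path cannot be stat'ed."""
    try:
        return os.stat(path)
    except OSError:
        return None


def _ensure_dir(path):
    """Create the directory if needed; only the first call per path runs."""
    os.makedirs(path, exist_ok=True)
    return True


# Hypothetical drop-in helpers for the stamp-checking code paths.
cached_access = CountingCache(os.access)
cached_stat = CountingCache(_stat_or_none)
cached_mkdir = CountingCache(_ensure_dir)
```

One caveat: a cache like this goes stale as soon as a task writes or removes
a stamp, so a real version would need to invalidate entries at that point.
For a do-nothing rebuild, cached_access.repeat_calls.most_common(10) would
show which stamps are being rechecked the most.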
>>
>> I'm also going to try reformatting one of the systems and compare how
>> long a build on ext4 takes.
>>
>> Any other ideas?
> Well, this clearly doesn't happen with master or in any combination of
> the layers most users are using. The logical conclusion would be that
> there is something in your layer that is somehow triggering this.
No private layer involved.
I do have a makefile which encapsulates the environment stuff, but
that's it.
> Of course since that layer is secret and you can't show us it, we have a
> bit of a problem. Can you reproduce the bug against public code?
Done. (Our layer is becoming open, we're committed to it, but it's a
long process internally).
> Are you by any chance setting BB_STAMP_POLICY somewhere?
Yes. BB_STAMP_POLICY = "full".
I'll attach a copy of my local.conf and bblayers.conf.
--rich
[-- Attachment #2: bblayers.conf --]
[-- Type: text/plain, Size: 1010 bytes --]
# Time-stamp: <09-May-2012 10:50:03 PDT by rich.pixley@palm.com>
# Copyright (c) 2008 - 2012 Hewlett-Packard Development Company, L.P.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##
# LAYER_CONF_VERSION is increased each time build/conf/bblayers.conf
# changes incompatibly
LCONF_VERSION = "4"
PALMDIR ?= "/home/rich/projects/webos"
OECORE_LAYER ?= "${PALMDIR}/openembedded-core/meta"
WEBOS_LAYER ?= ""
BBFILES ?= ""
BBLAYERS ?= " \
${OECORE_LAYER} \
${WEBOS_LAYER} \
"
[-- Attachment #3: local.conf --]
[-- Type: text/plain, Size: 1678 bytes --]
# DO NOT MODIFY! This script is generated by configure. Changes made
# here will be lost. Source for this file is in local-conf.in.
# Time-stamp: <27-Apr-2012 15:23:26 PDT by rich.pixley@palm.com>
# Copyright (c) 2008 - 2012 Hewlett-Packard Development Company, L.P.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
MACHINE := "qemux86"
# Uncomment to have 'work' directories removed after a package builds
#INHERIT += "rm_work"
BB_STAMP_POLICY = "full"
COVERAGE_BUILD = "0"
TMPDIR := "/home/rich/projects/webos/BUILD-qemux86"
TCLIBCAPPEND := ""
PRODUCTION_BUILD := ""
# parallelization options
# there's an extra space in these CFLAGS such that defining
# 'TARGET_CFLAGS += ""' causes gdb to break. I'm tired of looking for
# it for now. Hence this strange construction of a naked trigger.
PARALLEL_MAKE := "-j 48"
BB_NUMBER_THREADS := "48"
BB_SRCREV_POLICY = "cache"
BB_FETCH_PREMIRRORONLY = "true"
# CONF_VERSION is increased each time build/conf/ changes incompatibly and is used to
# track the version of this file when it was generated. This can safely be ignored if
# this doesn't mean anything to you.
CONF_VERSION = "1"