Openembedded Core Discussions
* SetScene tasks hang forever?
@ 2012-05-02 18:21 Rich Pixley
  2012-05-02 18:40 ` Mark Hatle
  0 siblings, 1 reply; 19+ messages in thread
From: Rich Pixley @ 2012-05-02 18:21 UTC
  To: Openembedded-core@lists.openembedded.org

I'm seeing a lot of builds apparently hanging forever (the ones that
work seem to finish within seconds; the ones that hang seem to hang for
at least tens of minutes), with:

rich@dolphin> nice tail -f Log
MACHINE           = "qemux86"
DISTRO            = ""
DISTRO_VERSION    = "oe-core.0"
TUNE_FEATURES     = "m32 i586"
TARGET_FPU        = ""
meta              = "master:35b5fb2dd2131d4c7dc6635c14c6e08ea6926457"

NOTE: Resolving any missing task queue dependencies
NOTE: Preparing runqueue
NOTE: Executing SetScene Tasks

If I run top, I see one processor pinned at 98 - 99% utilization running 
python, but no other clues.

Can anyone point me to doc, explain what's going on here, or point me in 
the right direction to debug this?

--rich




* Re: SetScene tasks hang forever?
  2012-05-02 18:21 SetScene tasks hang forever? Rich Pixley
@ 2012-05-02 18:40 ` Mark Hatle
  2012-05-02 19:16   ` Rich Pixley
  0 siblings, 1 reply; 19+ messages in thread
From: Mark Hatle @ 2012-05-02 18:40 UTC
  To: openembedded-core

On 5/2/12 1:21 PM, Rich Pixley wrote:
> I'm seeing a lot of builds apparently hanging forever, (the ones that
> work seem to work within seconds - the ones that hang seem to hang for
> at least 10's of minutes), with:
>
> rich@dolphin>  nice tail -f Log
> MACHINE           = "qemux86"
> DISTRO            = ""
> DISTRO_VERSION    = "oe-core.0"
> TUNE_FEATURES     = "m32 i586"
> TARGET_FPU        = ""
> meta              = "master:35b5fb2dd2131d4c7dc6635c14c6e08ea6926457"
>
> NOTE: Resolving any missing task queue dependencies
> NOTE: Preparing runqueue
> NOTE: Executing SetScene Tasks
>
> If I run top, I see one processor pinned at 98 - 99% utilization running
> python, but no other clues.
>
> Can anyone point me to doc, explain what's going on here, or point me in
> the right direction to debug this?

The only time I've seen "hang-like" behavior was when the system had actually
opened a devshell and was awaiting input.  But based on your log, it doesn't
look like that is the case.

Run bitbake with the -DDD option; you will get considerably more debug
information, and it might help point out what it thinks it is doing.

--Mark

> --rich
>
> _______________________________________________
> Openembedded-core mailing list
> Openembedded-core@lists.openembedded.org
> http://lists.linuxtogo.org/cgi-bin/mailman/listinfo/openembedded-core





* Re: SetScene tasks hang forever?
  2012-05-02 18:40 ` Mark Hatle
@ 2012-05-02 19:16   ` Rich Pixley
  2012-05-02 19:40     ` Mark Hatle
  0 siblings, 1 reply; 19+ messages in thread
From: Rich Pixley @ 2012-05-02 19:16 UTC
  To: openembedded-core

On 5/2/12 11:40 , Mark Hatle wrote:
> On 5/2/12 1:21 PM, Rich Pixley wrote:
>> I'm seeing a lot of builds apparently hanging forever, (the ones that
>> work seem to work within seconds - the ones that hang seem to hang for
>> at least 10's of minutes), with:
>>
>> rich@dolphin>   nice tail -f Log
>> MACHINE           = "qemux86"
>> DISTRO            = ""
>> DISTRO_VERSION    = "oe-core.0"
>> TUNE_FEATURES     = "m32 i586"
>> TARGET_FPU        = ""
>> meta              = "master:35b5fb2dd2131d4c7dc6635c14c6e08ea6926457"
>>
>> NOTE: Resolving any missing task queue dependencies
>> NOTE: Preparing runqueue
>> NOTE: Executing SetScene Tasks
>>
>> If I run top, I see one processor pinned at 98 - 99% utilization running
>> python, but no other clues.
>>
>> Can anyone point me to doc, explain what's going on here, or point me in
>> the right direction to debug this?
> The only time I've seen "hang-like" behavior the system actually opened a
> devshell and was awaiting input.   But based on your log, it doesn't look like
> that is the case.
>
> Run bitbake with -DDD option, you will get considerably more debug information
> and it might help point out what it thinks it is doing.
NOTE: Executing SetScene Tasks
DEBUG: Stamp for underlying task 
12(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/opkg/opkg_svn.bb, 
do_populate_sysroot) is current, so skipping setscene variant
DEBUG: Stamp for underlying task 
16(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/opkg-utils/opkg-utils_git.bb, 
do_populate_sysroot) is current, so skipping setscene variant
DEBUG: Stamp for underlying task 
20(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/makedevs/makedevs_1.0.0.bb, 
do_populate_sysroot) is current, so skipping setscene variant
DEBUG: Stamp for underlying task 
24(/home/rich/projects/webos/openembedded-core/meta/recipes-core/eglibc/ldconfig-native_2.12.1.bb, 
do_populate_sysroot) is current, so skipping setscene variant
DEBUG: Stamp for underlying task 
32(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/genext2fs/genext2fs_1.4.1.bb, 
do_populate_sysroot) is current, so skipping setscene variant
DEBUG: Stamp for underlying task 
36(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/e2fsprogs/e2fsprogs_1.42.1.bb, 
do_populate_sysroot) is current, so skipping setscene variant
DEBUG: Stamp for underlying task 
40(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/qemu/qemu_0.15.1.bb, 
do_populate_sysroot) is current, so skipping setscene variant
DEBUG: Stamp for underlying task 
44(/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/qemu/qemu-helper-native_1.0.bb, 
do_populate_sysroot) is current, so skipping setscene variant

And then the spinning hang.

--rich




* Re: SetScene tasks hang forever?
  2012-05-02 19:16   ` Rich Pixley
@ 2012-05-02 19:40     ` Mark Hatle
  2012-05-02 19:45       ` Rich Pixley
  0 siblings, 1 reply; 19+ messages in thread
From: Mark Hatle @ 2012-05-02 19:40 UTC
  To: openembedded-core

On 5/2/12 2:16 PM, Rich Pixley wrote:
> On 5/2/12 11:40 , Mark Hatle wrote:
>> On 5/2/12 1:21 PM, Rich Pixley wrote:
>>> I'm seeing a lot of builds apparently hanging forever, (the ones that
>>> work seem to work within seconds - the ones that hang seem to hang for
>>> at least 10's of minutes), with:
>>>
>>> rich@dolphin>    nice tail -f Log
>>> MACHINE           = "qemux86"
>>> DISTRO            = ""
>>> DISTRO_VERSION    = "oe-core.0"
>>> TUNE_FEATURES     = "m32 i586"
>>> TARGET_FPU        = ""
>>> meta              = "master:35b5fb2dd2131d4c7dc6635c14c6e08ea6926457"
>>>
>>> NOTE: Resolving any missing task queue dependencies
>>> NOTE: Preparing runqueue
>>> NOTE: Executing SetScene Tasks
>>>
>>> If I run top, I see one processor pinned at 98 - 99% utilization running
>>> python, but no other clues.
>>>
>>> Can anyone point me to doc, explain what's going on here, or point me in
>>> the right direction to debug this?
>> The only time I've seen "hang-like" behavior the system actually opened a
>> devshell and was awaiting input.   But based on your log, it doesn't look like
>> that is the case.
>>
>> Run bitbake with -DDD option, you will get considerably more debug information
>> and it might help point out what it thinks it is doing.
> NOTE: Executing SetScene Tasks
> DEBUG: Stamp for underlying task
> 12(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/opkg/opkg_svn.bb,
> do_populate_sysroot) is current, so skipping setscene variant
> DEBUG: Stamp for underlying task
> 16(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/opkg-utils/opkg-utils_git.bb,
> do_populate_sysroot) is current, so skipping setscene variant
> DEBUG: Stamp for underlying task
> 20(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/makedevs/makedevs_1.0.0.bb,
> do_populate_sysroot) is current, so skipping setscene variant
> DEBUG: Stamp for underlying task
> 24(/home/rich/projects/webos/openembedded-core/meta/recipes-core/eglibc/ldconfig-native_2.12.1.bb,
> do_populate_sysroot) is current, so skipping setscene variant
> DEBUG: Stamp for underlying task
> 32(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/genext2fs/genext2fs_1.4.1.bb,
> do_populate_sysroot) is current, so skipping setscene variant
> DEBUG: Stamp for underlying task
> 36(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/e2fsprogs/e2fsprogs_1.42.1.bb,
> do_populate_sysroot) is current, so skipping setscene variant
> DEBUG: Stamp for underlying task
> 40(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/qemu/qemu_0.15.1.bb,
> do_populate_sysroot) is current, so skipping setscene variant
> DEBUG: Stamp for underlying task
> 44(/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/qemu/qemu-helper-native_1.0.bb,
> do_populate_sysroot) is current, so skipping setscene variant
>
> And then the spinning hang.

Sorry, I don't know how to continue debugging what might be wrong.  The only
other thing I can suggest is to check that your filesystem is "real", not a
NetApp/NFS/network-emulated filesystem.

And if you were continuing a previous build, start a new build directory and 
retry it.

--Mark

> --rich
>





* Re: SetScene tasks hang forever?
  2012-05-02 19:40     ` Mark Hatle
@ 2012-05-02 19:45       ` Rich Pixley
  2012-05-02 19:48         ` Mark Hatle
  0 siblings, 1 reply; 19+ messages in thread
From: Rich Pixley @ 2012-05-02 19:45 UTC
  To: openembedded-core

On 5/2/12 12:40 , Mark Hatle wrote:
> On 5/2/12 2:16 PM, Rich Pixley wrote:
>> On 5/2/12 11:40 , Mark Hatle wrote:
>>> On 5/2/12 1:21 PM, Rich Pixley wrote:
>>>> I'm seeing a lot of builds apparently hanging forever, (the ones that
>>>> work seem to work within seconds - the ones that hang seem to hang for
>>>> at least 10's of minutes), with:
>>>>
>>>> rich@dolphin>     nice tail -f Log
>>>> MACHINE           = "qemux86"
>>>> DISTRO            = ""
>>>> DISTRO_VERSION    = "oe-core.0"
>>>> TUNE_FEATURES     = "m32 i586"
>>>> TARGET_FPU        = ""
>>>> meta              = "master:35b5fb2dd2131d4c7dc6635c14c6e08ea6926457"
>>>>
>>>> NOTE: Resolving any missing task queue dependencies
>>>> NOTE: Preparing runqueue
>>>> NOTE: Executing SetScene Tasks
>>>>
>>>> If I run top, I see one processor pinned at 98 - 99% utilization running
>>>> python, but no other clues.
>>>>
>>>> Can anyone point me to doc, explain what's going on here, or point me in
>>>> the right direction to debug this?
>>> The only time I've seen "hang-like" behavior the system actually opened a
>>> devshell and was awaiting input.   But based on your log, it doesn't look like
>>> that is the case.
>>>
>>> Run bitbake with -DDD option, you will get considerably more debug information
>>> and it might help point out what it thinks it is doing.
>> NOTE: Executing SetScene Tasks
>> DEBUG: Stamp for underlying task
>> 12(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/opkg/opkg_svn.bb,
>> do_populate_sysroot) is current, so skipping setscene variant
>> DEBUG: Stamp for underlying task
>> 16(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/opkg-utils/opkg-utils_git.bb,
>> do_populate_sysroot) is current, so skipping setscene variant
>> DEBUG: Stamp for underlying task
>> 20(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/makedevs/makedevs_1.0.0.bb,
>> do_populate_sysroot) is current, so skipping setscene variant
>> DEBUG: Stamp for underlying task
>> 24(/home/rich/projects/webos/openembedded-core/meta/recipes-core/eglibc/ldconfig-native_2.12.1.bb,
>> do_populate_sysroot) is current, so skipping setscene variant
>> DEBUG: Stamp for underlying task
>> 32(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/genext2fs/genext2fs_1.4.1.bb,
>> do_populate_sysroot) is current, so skipping setscene variant
>> DEBUG: Stamp for underlying task
>> 36(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/e2fsprogs/e2fsprogs_1.42.1.bb,
>> do_populate_sysroot) is current, so skipping setscene variant
>> DEBUG: Stamp for underlying task
>> 40(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/qemu/qemu_0.15.1.bb,
>> do_populate_sysroot) is current, so skipping setscene variant
>> DEBUG: Stamp for underlying task
>> 44(/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/qemu/qemu-helper-native_1.0.bb,
>> do_populate_sysroot) is current, so skipping setscene variant
>>
>> And then the spinning hang.
> Sorry, I don't know how to continue debugging what might be wrong.  The only
> other thing I can suggest is check that your filesystem is "real", not a
> netapp/nfs/network emulated filesystem....
>
> And if you were continuing a previous build, start a new build directory and
> retry it.
Local file system.  I'm building a second time expecting a null build 
pass.  I was able to get a null build pass in the same directory yesterday.

Removing my build directory and starting over has been working, but 
costs me a few hours each time, and this happens frequently enough that 
I get no other work done.  :(.

Thanks for reading.

--rich




* Re: SetScene tasks hang forever?
  2012-05-02 19:45       ` Rich Pixley
@ 2012-05-02 19:48         ` Mark Hatle
  2012-05-02 23:06           ` Richard Purdie
  0 siblings, 1 reply; 19+ messages in thread
From: Mark Hatle @ 2012-05-02 19:48 UTC
  To: openembedded-core

On 5/2/12 2:45 PM, Rich Pixley wrote:
> On 5/2/12 12:40 , Mark Hatle wrote:
>> On 5/2/12 2:16 PM, Rich Pixley wrote:
>>> On 5/2/12 11:40 , Mark Hatle wrote:
>>>> On 5/2/12 1:21 PM, Rich Pixley wrote:
>>>>> I'm seeing a lot of builds apparently hanging forever, (the ones that
>>>>> work seem to work within seconds - the ones that hang seem to hang for
>>>>> at least 10's of minutes), with:
>>>>>
>>>>> rich@dolphin>      nice tail -f Log
>>>>> MACHINE           = "qemux86"
>>>>> DISTRO            = ""
>>>>> DISTRO_VERSION    = "oe-core.0"
>>>>> TUNE_FEATURES     = "m32 i586"
>>>>> TARGET_FPU        = ""
>>>>> meta              = "master:35b5fb2dd2131d4c7dc6635c14c6e08ea6926457"
>>>>>
>>>>> NOTE: Resolving any missing task queue dependencies
>>>>> NOTE: Preparing runqueue
>>>>> NOTE: Executing SetScene Tasks
>>>>>
>>>>> If I run top, I see one processor pinned at 98 - 99% utilization running
>>>>> python, but no other clues.
>>>>>
>>>>> Can anyone point me to doc, explain what's going on here, or point me in
>>>>> the right direction to debug this?
>>>> The only time I've seen "hang-like" behavior the system actually opened a
>>>> devshell and was awaiting input.   But based on your log, it doesn't look like
>>>> that is the case.
>>>>
>>>> Run bitbake with -DDD option, you will get considerably more debug information
>>>> and it might help point out what it thinks it is doing.
>>> NOTE: Executing SetScene Tasks
>>> DEBUG: Stamp for underlying task
>>> 12(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/opkg/opkg_svn.bb,
>>> do_populate_sysroot) is current, so skipping setscene variant
>>> DEBUG: Stamp for underlying task
>>> 16(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/opkg-utils/opkg-utils_git.bb,
>>> do_populate_sysroot) is current, so skipping setscene variant
>>> DEBUG: Stamp for underlying task
>>> 20(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/makedevs/makedevs_1.0.0.bb,
>>> do_populate_sysroot) is current, so skipping setscene variant
>>> DEBUG: Stamp for underlying task
>>> 24(/home/rich/projects/webos/openembedded-core/meta/recipes-core/eglibc/ldconfig-native_2.12.1.bb,
>>> do_populate_sysroot) is current, so skipping setscene variant
>>> DEBUG: Stamp for underlying task
>>> 32(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/genext2fs/genext2fs_1.4.1.bb,
>>> do_populate_sysroot) is current, so skipping setscene variant
>>> DEBUG: Stamp for underlying task
>>> 36(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/e2fsprogs/e2fsprogs_1.42.1.bb,
>>> do_populate_sysroot) is current, so skipping setscene variant
>>> DEBUG: Stamp for underlying task
>>> 40(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/qemu/qemu_0.15.1.bb,
>>> do_populate_sysroot) is current, so skipping setscene variant
>>> DEBUG: Stamp for underlying task
>>> 44(/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/qemu/qemu-helper-native_1.0.bb,
>>> do_populate_sysroot) is current, so skipping setscene variant
>>>
>>> And then the spinning hang.
>> Sorry, I don't know how to continue debugging what might be wrong.  The only
>> other thing I can suggest is check that your filesystem is "real", not a
>> netapp/nfs/network emulated filesystem....
>>
>> And if you were continuing a previous build, start a new build directory and
>> retry it.
> Local file system.  I'm building a second time expecting a null build
> pass.  I was able to get a null build pass in the same directory yesterday.
>
> Removing my build directory and starting over has been working, but
> costs me a few hours each time, and this happens frequently enough that
> I get no other work done.  :(.

Ya, that is certainly not acceptable.  If you could file a bug at
bugzilla.yoctoproject.org, someone might be able to help you diagnose this
further and hopefully figure out a fix.

--Mark

> Thanks for reading.
>
> --rich
>





* Re: SetScene tasks hang forever?
  2012-05-02 19:48         ` Mark Hatle
@ 2012-05-02 23:06           ` Richard Purdie
  2012-05-06 17:36             ` Rich Pixley
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Purdie @ 2012-05-02 23:06 UTC
  To: Patches and discussions about the oe-core layer

On Wed, 2012-05-02 at 14:48 -0500, Mark Hatle wrote:
> On 5/2/12 2:45 PM, Rich Pixley wrote:
> > On 5/2/12 12:40 , Mark Hatle wrote:
> >> On 5/2/12 2:16 PM, Rich Pixley wrote:
> >>> On 5/2/12 11:40 , Mark Hatle wrote:
> >>>> On 5/2/12 1:21 PM, Rich Pixley wrote:
> >>>>> I'm seeing a lot of builds apparently hanging forever, (the ones that
> >>>>> work seem to work within seconds - the ones that hang seem to hang for
> >>>>> at least 10's of minutes), with:
> >>>>>
> >>>>> rich@dolphin>      nice tail -f Log
> >>>>> MACHINE           = "qemux86"
> >>>>> DISTRO            = ""
> >>>>> DISTRO_VERSION    = "oe-core.0"
> >>>>> TUNE_FEATURES     = "m32 i586"
> >>>>> TARGET_FPU        = ""
> >>>>> meta              = "master:35b5fb2dd2131d4c7dc6635c14c6e08ea6926457"
> >>>>>
> >>>>> NOTE: Resolving any missing task queue dependencies
> >>>>> NOTE: Preparing runqueue
> >>>>> NOTE: Executing SetScene Tasks
> >>>>>
> >>>>> If I run top, I see one processor pinned at 98 - 99% utilization running
> >>>>> python, but no other clues.
> >>>>>
> >>>>> Can anyone point me to doc, explain what's going on here, or point me in
> >>>>> the right direction to debug this?
> >>>> The only time I've seen "hang-like" behavior the system actually opened a
> >>>> devshell and was awaiting input.   But based on your log, it doesn't look like
> >>>> that is the case.
> >>>>
> >>>> Run bitbake with -DDD option, you will get considerably more debug information
> >>>> and it might help point out what it thinks it is doing.
> >>> NOTE: Executing SetScene Tasks
> >>> DEBUG: Stamp for underlying task
> >>> 12(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/opkg/opkg_svn.bb,
> >>> do_populate_sysroot) is current, so skipping setscene variant
> >>> DEBUG: Stamp for underlying task
> >>> 16(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/opkg-utils/opkg-utils_git.bb,
> >>> do_populate_sysroot) is current, so skipping setscene variant
> >>> DEBUG: Stamp for underlying task
> >>> 20(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/makedevs/makedevs_1.0.0.bb,
> >>> do_populate_sysroot) is current, so skipping setscene variant
> >>> DEBUG: Stamp for underlying task
> >>> 24(/home/rich/projects/webos/openembedded-core/meta/recipes-core/eglibc/ldconfig-native_2.12.1.bb,
> >>> do_populate_sysroot) is current, so skipping setscene variant
> >>> DEBUG: Stamp for underlying task
> >>> 32(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/genext2fs/genext2fs_1.4.1.bb,
> >>> do_populate_sysroot) is current, so skipping setscene variant
> >>> DEBUG: Stamp for underlying task
> >>> 36(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/e2fsprogs/e2fsprogs_1.42.1.bb,
> >>> do_populate_sysroot) is current, so skipping setscene variant
> >>> DEBUG: Stamp for underlying task
> >>> 40(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/qemu/qemu_0.15.1.bb,
> >>> do_populate_sysroot) is current, so skipping setscene variant
> >>> DEBUG: Stamp for underlying task
> >>> 44(/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/qemu/qemu-helper-native_1.0.bb,
> >>> do_populate_sysroot) is current, so skipping setscene variant
> >>>
> >>> And then the spinning hang.
> >> Sorry, I don't know how to continue debugging what might be wrong.  The only
> >> other thing I can suggest is check that your filesystem is "real", not a
> >> netapp/nfs/network emulated filesystem....
> >>
> >> And if you were continuing a previous build, start a new build directory and
> >> retry it.
> > Local file system.  I'm building a second time expecting a null build
> > pass.  I was able to get a null build pass in the same directory yesterday.
> >
> > Removing my build directory and starting over has been working, but
> > costs me a few hours each time, and this happens frequently enough that
> > I get no other work done.  :(.
> 
> Ya, that is certainly not acceptable.  If you could file a bug on the 
> bugzilla.yoctoproject.org someone might be able to help you diagnose this 
> further and hopefully figure out a fix.

What would really help is a way to reproduce this...

Does it reproduce with a certain set of metadata/sstate perhaps?

What is odd about the above logs is that it appears bitbake never
executes any task. It's possible, I guess, that something crashed
somewhere and bitbake did not realise part of the system had died. Or it
could be some kind of circular dependency loop where X needs Y to build
and Y needs X, so nothing happens. We are supposed to spot that and
raise an error if it happens.
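
To make the "X needs Y and Y needs X" case concrete, here is a minimal
sketch of cycle detection over a task dependency graph. It is not
bitbake's actual check; the function name find_cycle and the dictionary
layout are assumptions made purely for illustration.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of task names, or None.

    deps is a hypothetical mapping: task name -> list of tasks it needs.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(task):
        colour[task] = GREY
        path.append(task)
        for dep in deps.get(task, ()):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so we closed a loop
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        colour[task] = BLACK
        return None

    for task in list(deps):
        if colour.get(task, WHITE) == WHITE:
            cycle = visit(task)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    print(find_cycle({"X": ["Y"], "Y": ["X"]}))
```

Each task is visited once, so the walk stays linear in the number of
tasks and dependencies. The recursion is fine for a sketch, but a very
deep graph would want an explicit stack instead.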

Does strace give an idea of which bits of bitbake are alive/looping? I'd
probably resort to a few print()/bb.error() calls in the code at this point to
find out what is alive, what is dead and where it's looping...

Cheers,

Richard





* Re: SetScene tasks hang forever?
  2012-05-02 23:06           ` Richard Purdie
@ 2012-05-06 17:36             ` Rich Pixley
  2012-05-07 16:38               ` Rich Pixley
  2012-05-08 12:34               ` Richard Purdie
  0 siblings, 2 replies; 19+ messages in thread
From: Rich Pixley @ 2012-05-06 17:36 UTC
  To: openembedded-core

On 5/2/12 16:06 , Richard Purdie wrote:
> On Wed, 2012-05-02 at 14:48 -0500, Mark Hatle wrote:
>> On 5/2/12 2:45 PM, Rich Pixley wrote:
>>> On 5/2/12 12:40 , Mark Hatle wrote:
>>>> On 5/2/12 2:16 PM, Rich Pixley wrote:
>>>>> On 5/2/12 11:40 , Mark Hatle wrote:
>>>>>> On 5/2/12 1:21 PM, Rich Pixley wrote:
>>>>>>> I'm seeing a lot of builds apparently hanging forever, (the ones that
>>>>>>> work seem to work within seconds - the ones that hang seem to hang for
>>>>>>> at least 10's of minutes), with:
>>>>>>>
>>>>>>> rich@dolphin>       nice tail -f Log
>>>>>>> MACHINE           = "qemux86"
>>>>>>> DISTRO            = ""
>>>>>>> DISTRO_VERSION    = "oe-core.0"
>>>>>>> TUNE_FEATURES     = "m32 i586"
>>>>>>> TARGET_FPU        = ""
>>>>>>> meta              = "master:35b5fb2dd2131d4c7dc6635c14c6e08ea6926457"
>>>>>>>
>>>>>>> NOTE: Resolving any missing task queue dependencies
>>>>>>> NOTE: Preparing runqueue
>>>>>>> NOTE: Executing SetScene Tasks
>>>>>>>
>>>>>>> If I run top, I see one processor pinned at 98 - 99% utilization running
>>>>>>> python, but no other clues.
>>>>>>>
>>>>>>> Can anyone point me to doc, explain what's going on here, or point me in
>>>>>>> the right direction to debug this?
>>>>>> The only time I've seen "hang-like" behavior the system actually opened a
>>>>>> devshell and was awaiting input.   But based on your log, it doesn't look like
>>>>>> that is the case.
>>>>>>
>>>>>> Run bitbake with -DDD option, you will get considerably more debug information
>>>>>> and it might help point out what it thinks it is doing.
>>>>> NOTE: Executing SetScene Tasks
>>>>> DEBUG: Stamp for underlying task
>>>>> 12(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/opkg/opkg_svn.bb,
>>>>> do_populate_sysroot) is current, so skipping setscene variant
>>>>> DEBUG: Stamp for underlying task
>>>>> 16(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/opkg-utils/opkg-utils_git.bb,
>>>>> do_populate_sysroot) is current, so skipping setscene variant
>>>>> DEBUG: Stamp for underlying task
>>>>> 20(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/makedevs/makedevs_1.0.0.bb,
>>>>> do_populate_sysroot) is current, so skipping setscene variant
>>>>> DEBUG: Stamp for underlying task
>>>>> 24(/home/rich/projects/webos/openembedded-core/meta/recipes-core/eglibc/ldconfig-native_2.12.1.bb,
>>>>> do_populate_sysroot) is current, so skipping setscene variant
>>>>> DEBUG: Stamp for underlying task
>>>>> 32(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/genext2fs/genext2fs_1.4.1.bb,
>>>>> do_populate_sysroot) is current, so skipping setscene variant
>>>>> DEBUG: Stamp for underlying task
>>>>> 36(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/e2fsprogs/e2fsprogs_1.42.1.bb,
>>>>> do_populate_sysroot) is current, so skipping setscene variant
>>>>> DEBUG: Stamp for underlying task
>>>>> 40(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/qemu/qemu_0.15.1.bb,
>>>>> do_populate_sysroot) is current, so skipping setscene variant
>>>>> DEBUG: Stamp for underlying task
>>>>> 44(/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/qemu/qemu-helper-native_1.0.bb,
>>>>> do_populate_sysroot) is current, so skipping setscene variant
>>>>>
>>>>> And then the spinning hang.
>>>> Sorry, I don't know how to continue debugging what might be wrong.  The only
>>>> other thing I can suggest is check that your filesystem is "real", not a
>>>> netapp/nfs/network emulated filesystem....
>>>>
>>>> And if you were continuing a previous build, start a new build directory and
>>>> retry it.
>>> Local file system.  I'm building a second time expecting a null build
>>> pass.  I was able to get a null build pass in the same directory yesterday.
>>>
>>> Removing my build directory and starting over has been working, but
>>> costs me a few hours each time, and this happens frequently enough that
>>> I get no other work done.  :(.
>> Ya, that is certainly not acceptable.  If you could file a bug on the
>> bugzilla.yoctoproject.org someone might be able to help you diagnose this
>> further and hopefully figure out a fix.
> What would really help is a way to reproduce this...
>
> Does it reproduce with a certain set of metadata/sstate perhaps?
>
> What is odd about the above logs is that it appears bitbake never
> executes any task. Its possible something might have crashed somewhere I
> guess and not realise part of the system had died. Or it could be some
> kind of circular dependency loop where X needs Y to build and Y needs X
> so nothing happens. We are supposed to spot and error if that would have
> happened.
>
> Does strace give an idea of which bits of bitbake are alive/looping? I'd
> probably resort to a few print()/bb.error() in the code at this point to
> find out what is alive, what is dead and where its looping...
I have more info now.

What I suspected was looping (since it took longer than the ~1hr I was
willing to wait) isn't actual looping.  Given enough time, the builds
do complete, and I have comparable results on 5 different servers (all
ubuntu-12.04 amd64 and all on btrfs).

My initial, full builds of core-image-minimal do build, and they build
in ~60min (~30min if I hand seed the downloads directory).  I'm using
no mirrors other than the defaults.  My second build in an already built
directory (expected to do nothing) takes anywhere from 7 to 10.5hrs to
complete and successfully do nothing, depending on the server.

During this time, top shows a single cpu pinned at 98 - 100% 
utilization, and strace shows literally millions of access and stat 
calls on stamp files, mkdir on the stamps directory, etc.  Statistical 
analysis of just the do_fetch access calls shows a distribution that 
seems to mimic the topological tree.  That is, the most called access is 
for quilt-native and the components higher up the tree get fewer stats.
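
For anyone wanting to repeat that count, a rough sketch of the tallying
follows. It assumes the trace was saved as text in strace's usual
one-call-per-line form, access("path", F_OK) = ..., and that the stamp
file names contain "do_fetch". The sample paths are made up, and the
function name count_access_calls is hypothetical.

```python
import re
from collections import Counter

# Matches the path argument of an access() call in strace text output.
ACCESS_RE = re.compile(r'access\("([^"]+)"')


def count_access_calls(lines, marker="do_fetch"):
    """Count access() calls per path, keeping only paths containing marker."""
    counts = Counter()
    for line in lines:
        match = ACCESS_RE.search(line)
        if match and marker in match.group(1):
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, shaped like strace output.
    sample = [
        'access("/build/tmp/stamps/quilt-native-0.48-r0.do_fetch", F_OK) = 0',
        'access("/build/tmp/stamps/quilt-native-0.48-r0.do_fetch", F_OK) = 0',
        'access("/build/tmp/stamps/opkg-native-r0.do_fetch", F_OK) = 0',
        'stat("/build/tmp/stamps", {st_mode=S_IFDIR|0755, ...}) = 0',
    ]
    for path, hits in count_access_calls(sample).most_common():
        print(hits, path)
```

Feeding it an open trace file instead of the sample list works the same
way, since the function only iterates over lines.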

Oh, and the setscene stamps are all nonexistent.  I presume that's expected.

First, I can't imagine why there would need to be more than one mkdir on 
the stamps directory within a single instantiation of bitbake.  I can 
imagine that it was easier to attempt to mkdir it than to check first, 
but once it has been mkdir'd, (or checked), there's no need to do it 
another million times, is there?

Second, I can't imagine why there would need to be all the redundant 
stamp checking.  That info is cached internally, isn't it?

And third, the fact that it seems to be checking the entire subtree,
apparently multiple times, at every node suggests to me that the
checking algorithm is broken.  Back of the envelope... perhaps 300
components, maybe 10 tasks per component ~= 3e3 tasks.  Figure a
geometric explosion of checks for an inefficient algorithm (roughly the
square of the task count) and we're up to around 10e6 checks.  I haven't
counted an entire run, but based on the time it takes to run, I'd say
I'm seeing one, maybe two orders of magnitude more checks than that.
I've seen a few million node traversals in about 15min and a node
traversal appears to involve several accesses and at least one stat.
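
The same back-of-the-envelope numbers, worked through as a sketch. The
figure of 3 million traversals per 15 minutes is an assumption standing
in for "a few million"; the rest comes from the estimates above and the
7 to 10.5 hour run times.

```python
components = 300
tasks_per_component = 10
tasks = components * tasks_per_component            # 3,000 tasks

# Every task checked against every other task.
pairwise_checks = tasks ** 2                        # 9,000,000, i.e. ~10e6

# Assumption: "a few million" traversals in 15 minutes taken as 3 million.
traversals_per_minute = 3_000_000 // 15             # 200,000 per minute

# Extrapolate over the observed null-build times of 7 and 10.5 hours.
low_estimate = traversals_per_minute * 7 * 60           # 84,000,000
high_estimate = traversals_per_minute * int(10.5 * 60)  # 126,000,000

print(tasks, pairwise_checks, low_estimate, high_estimate)
```

Under that assumption a whole run comes to roughly 1e8 traversals, about
one order of magnitude above the pairwise figure, before counting the
several access and stat calls made per traversal.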

I'm not familiar with the current bitbake internals, so my next thought
would be to replace the calls to access, stat, and mkdir on the stamp
files with caching, counting wrappers.  Build a dictionary of each file
requested; if it's new, do the kernel call and cache the result in the
dictionary.  If it's already in the dictionary, then increment a counter
for it and return the cached value.  This should a) improve the speed of
the current algorithm, b) improve the speed of the eventual replacement
algorithm, and c) give us some useful statistical data in the meantime.
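
A minimal sketch of that caching, counting wrapper, not tied to any real
bitbake internals. The class name CountingStampCache and its methods are
hypothetical, and the sketch assumes stamps do not change while it is in
use; a real build writes new stamps as tasks finish, so the cache would
need invalidating at those points.

```python
import os
from collections import Counter


class CountingStampCache:
    """Cache filesystem answers per path and count the repeated requests."""

    def __init__(self):
        self._exists = {}          # path -> bool, filled on the first request
        self._made = set()         # directories already created or checked
        self.repeats = Counter()   # path -> requests answered without a kernel call

    def exists(self, path):
        if path in self._exists:
            self.repeats[path] += 1
            return self._exists[path]
        result = os.access(path, os.F_OK)   # the only real check for this path
        self._exists[path] = result
        return result

    def makedirs(self, path):
        if path in self._made:
            self.repeats[path] += 1
            return
        os.makedirs(path, exist_ok=True)    # created, or confirmed, exactly once
        self._made.add(path)
```

After a run, cache.repeats.most_common(20) would list the paths asked
about most often, which is the statistical data mentioned in c).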

I'm also going to try reformatting one of the systems and compare how
long a build on ext4 takes.

Any other ideas?

--rich





^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: SetScene tasks hang forever?
  2012-05-06 17:36             ` Rich Pixley
@ 2012-05-07 16:38               ` Rich Pixley
  2012-05-08 12:34               ` Richard Purdie
  1 sibling, 0 replies; 19+ messages in thread
From: Rich Pixley @ 2012-05-07 16:38 UTC (permalink / raw)
  To: openembedded-core@lists.openembedded.org

On 5/6/12 10:36 , Rich Pixley wrote:
> On 5/2/12 16:06 , Richard Purdie wrote:
>> On Wed, 2012-05-02 at 14:48 -0500, Mark Hatle wrote:
>>> On 5/2/12 2:45 PM, Rich Pixley wrote:
>>>> On 5/2/12 12:40 , Mark Hatle wrote:
>>>>> On 5/2/12 2:16 PM, Rich Pixley wrote:
>>>>>> On 5/2/12 11:40 , Mark Hatle wrote:
>>>>>>> On 5/2/12 1:21 PM, Rich Pixley wrote:
>>>>>>>> I'm seeing a lot of builds apparently hanging forever, (the ones that
>>>>>>>> work seem to work within seconds - the ones that hang seem to hang for
>>>>>>>> at least 10's of minutes), with:
>>>>>>>>
>>>>>>>> rich@dolphin>        nice tail -f Log
>>>>>>>> MACHINE           = "qemux86"
>>>>>>>> DISTRO            = ""
>>>>>>>> DISTRO_VERSION    = "oe-core.0"
>>>>>>>> TUNE_FEATURES     = "m32 i586"
>>>>>>>> TARGET_FPU        = ""
>>>>>>>> meta              = "master:35b5fb2dd2131d4c7dc6635c14c6e08ea6926457"
>>>>>>>>
>>>>>>>> NOTE: Resolving any missing task queue dependencies
>>>>>>>> NOTE: Preparing runqueue
>>>>>>>> NOTE: Executing SetScene Tasks
>>>>>>>>
>>>>>>>> If I run top, I see one processor pinned at 98 - 99% utilization running
>>>>>>>> python, but no other clues.
>>>>>>>>
>>>>>>>> Can anyone point me to doc, explain what's going on here, or point me in
>>>>>>>> the right direction to debug this?
>>>>>>> The only time I've seen "hang-like" behavior the system actually opened a
>>>>>>> devshell and was awaiting input.   But based on your log, it doesn't look like
>>>>>>> that is the case.
>>>>>>>
>>>>>>> Run bitbake with -DDD option, you will get considerably more debug information
>>>>>>> and it might help point out what it thinks it is doing.
>>>>>> NOTE: Executing SetScene Tasks
>>>>>> DEBUG: Stamp for underlying task
>>>>>> 12(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/opkg/opkg_svn.bb,
>>>>>> do_populate_sysroot) is current, so skipping setscene variant
>>>>>> DEBUG: Stamp for underlying task
>>>>>> 16(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/opkg-utils/opkg-utils_git.bb,
>>>>>> do_populate_sysroot) is current, so skipping setscene variant
>>>>>> DEBUG: Stamp for underlying task
>>>>>> 20(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/makedevs/makedevs_1.0.0.bb,
>>>>>> do_populate_sysroot) is current, so skipping setscene variant
>>>>>> DEBUG: Stamp for underlying task
>>>>>> 24(/home/rich/projects/webos/openembedded-core/meta/recipes-core/eglibc/ldconfig-native_2.12.1.bb,
>>>>>> do_populate_sysroot) is current, so skipping setscene variant
>>>>>> DEBUG: Stamp for underlying task
>>>>>> 32(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/genext2fs/genext2fs_1.4.1.bb,
>>>>>> do_populate_sysroot) is current, so skipping setscene variant
>>>>>> DEBUG: Stamp for underlying task
>>>>>> 36(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/e2fsprogs/e2fsprogs_1.42.1.bb,
>>>>>> do_populate_sysroot) is current, so skipping setscene variant
>>>>>> DEBUG: Stamp for underlying task
>>>>>> 40(virtual:native:/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/qemu/qemu_0.15.1.bb,
>>>>>> do_populate_sysroot) is current, so skipping setscene variant
>>>>>> DEBUG: Stamp for underlying task
>>>>>> 44(/home/rich/projects/webos/openembedded-core/meta/recipes-devtools/qemu/qemu-helper-native_1.0.bb,
>>>>>> do_populate_sysroot) is current, so skipping setscene variant
>>>>>>
>>>>>> And then the spinning hang.
>>>>> Sorry, I don't know how to continue debugging what might be wrong.  The only
>>>>> other thing I can suggest is check that your filesystem is "real", not a
>>>>> netapp/nfs/network emulated filesystem....
>>>>>
>>>>> And if you were continuing a previous build, start a new build directory and
>>>>> retry it.
>>>> Local file system.  I'm building a second time expecting a null build
>>>> pass.  I was able to get a null build pass in the same directory yesterday.
>>>>
>>>> Removing my build directory and starting over has been working, but
>>>> costs me a few hours each time, and this happens frequently enough that
>>>> I get no other work done.  :(.
>>> Ya, that is certainly not acceptable.  If you could file a bug on the
>>> bugzilla.yoctoproject.org someone might be able to help you diagnose this
>>> further and hopefully figure out a fix.
>> What would really help is a way to reproduce this...
>>
>> Does it reproduce with a certain set of metadata/sstate perhaps?
>>
>> What is odd about the above logs is that it appears bitbake never
>> executes any task. It's possible something might have crashed somewhere I
>> guess and not realise part of the system had died. Or it could be some
>> kind of circular dependency loop where X needs Y to build and Y needs X
>> so nothing happens. We are supposed to spot and error if that would have
>> happened.
>>
>> Does strace give an idea of which bits of bitbake are alive/looping? I'd
>> probably resort to a few print()/bb.error() in the code at this point to
>> find out what is alive, what is dead and where it's looping...
> I have more info now.
>
> What I suspected was looping, (since it took longer than the ~1hr I was
> willing to wait), isn't actual looping.  Given enough time, the builds
> do complete and I have comparable results on 5 different servers, (all
> ubuntu-12.04 amd64 and all on btrfs).
>
> My initial, full builds of core-image-minimal do build, and they build
> in ~60min, (~30min if I hand seed the downloads directory).  I'm using
> no mirrors other than the defaults.  My second build in an already built
> directory, (expected to do nothing), takes anywhere from 7 - 10.5hrs to
> complete and successfully do nothing, depending on the server.
>
> During this time, top shows a single cpu pinned at 98 - 100%
> utilization, and strace shows literally millions of access and stat
> calls on stamp files, mkdir on the stamps directory, etc.  Statistical
> analysis of just the do_fetch access calls shows a distribution that
> seems to mimic the topological tree.  That is, the most called access is
> for quilt-native and the components higher up the tree get fewer stats.
>
> Oh, and the setscene stamps are all nonexistent.  I presume that's expected.
>
> First, I can't imagine why there would need to be more than one mkdir on
> the stamps directory within a single instantiation of bitbake.  I can
> imagine that it was easier to attempt to mkdir it than to check first,
> but once it has been mkdir'd, (or checked), there's no need to do it
> another million times, is there?
>
> Second, I can't imagine why there would need to be all the redundant
> stamp checking.  That info is cached internally, isn't it?
>
> And third, the fact that it seems to be checking the entire subtree,
> apparently multiple times at every node, suggests to me that the
> checking algorithm is broken.  Back of the envelope... perhaps 300
> components, maybe 10 tasks per component ~= 3e3 tasks.  Figure a
> geometric explosion of checks for an inefficient algorithm and we're up
> to around 10e6 checks.  I haven't counted an entire run, but based on
> the time it takes to run, I'd say I'm seeing one, maybe two orders of
> magnitude more checks than that.  I've seen a few million node
> traversals in about 15min and a node traversal appears to involve
> several accesses and at least one stat.
>
> I'm not familiar with the current bitbake internals so my next thought
> would be to replace the calls to access, stat, and mkdir on the stamp
> files with caching, counting calls.  Build a dictionary of each file
> called, if it's new, do the kernel call and cache the result in the
> dictionary.  If it's already in the dictionary, then inc a counter for
> it and return the cached value.  This should a) improve the speed of the
> current algorithm, b) improve the speed of the eventual replacement
> algorithm, and c) give us some useful statistical data in the meantime.
>
> I'm also going to try reformatting one of the systems and compare how
> long a build on ext4 takes.
A build on ext4 produces comparable results for me.  ~30min initial 
build, ~7hrs for a second (do-nothing) build.

--rich



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: SetScene tasks hang forever?
  2012-05-06 17:36             ` Rich Pixley
  2012-05-07 16:38               ` Rich Pixley
@ 2012-05-08 12:34               ` Richard Purdie
  2012-05-09 17:51                 ` Rich Pixley
  1 sibling, 1 reply; 19+ messages in thread
From: Richard Purdie @ 2012-05-08 12:34 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer

On Sun, 2012-05-06 at 10:36 -0700, Rich Pixley wrote:
> On 5/2/12 16:06 , Richard Purdie wrote:
> > On Wed, 2012-05-02 at 14:48 -0500, Mark Hatle wrote:
> >> On 5/2/12 2:45 PM, Rich Pixley wrote:
> > What would really help is a way to reproduce this...
> >
> > Does it reproduce with a certain set of metadata/sstate perhaps?
> >
> > What is odd about the above logs is that it appears bitbake never
> > executes any task. It's possible something might have crashed somewhere I
> > guess and not realise part of the system had died. Or it could be some
> > kind of circular dependency loop where X needs Y to build and Y needs X
> > so nothing happens. We are supposed to spot and error if that would have
> > happened.
> >
> > Does strace give an idea of which bits of bitbake are alive/looping? I'd
> > probably resort to a few print()/bb.error() in the code at this point to
> > find out what is alive, what is dead and where it's looping...
> I have more info now.
> 
> What I suspected was looping, (since it took longer than the ~1hr I was 
> willing to wait), isn't actual looping.  Given enough time, the builds 
> do complete and I have comparable results on 5 different servers, (all 
> ubuntu-12.04 amd64 and all on btrfs).
> 
> My initial, full builds of core-image-minimal do build, and they build 
> in ~60min, (~30min if I hand seed the downloads directory).  I'm using 
> no mirrors other than the defaults.  My second build in an already built 
> directory, (expected to do nothing), takes anywhere from 7 - 10.5hrs to 
> complete and successfully do nothing, depending on the server.
> 
> During this time, top shows a single cpu pinned at 98 - 100% 
> utilization, and strace shows literally millions of access and stat 
> calls on stamp files, mkdir on the stamps directory, etc.  Statistical 
> analysis of just the do_fetch access calls shows a distribution that 
> seems to mimic the topological tree.  That is, the most called access is 
> for quilt-native and the components higher up the tree get fewer stats.
> 
> Oh, and the setscene stamps are all nonexistent.  I presume that's expected.
> 
> First, I can't imagine why there would need to be more than one mkdir on 
> the stamps directory within a single instantiation of bitbake.  I can 
> imagine that it was easier to attempt to mkdir it than to check first, 
> but once it has been mkdir'd, (or checked), there's no need to do it 
> another million times, is there?
> 
> Second, I can't imagine why there would need to be all the redundant 
> stamp checking.  That info is cached internally, isn't it?
> 
> And third, the fact that it seems to be checking the entire subtree, 
> apparently multiple times at every node, suggests to me that the 
> checking algorithm is broken.  Back of the envelope... perhaps 300 
> components, maybe 10 tasks per component ~= 3e3 tasks.  Figure a 
> geometric explosion of checks for an inefficient algorithm and we're up 
> to around 10e6 checks.  I haven't counted an entire run, but based on 
> the time it takes to run, I'd say I'm seeing one, maybe two orders of 
> magnitude more checks than that.  I've seen a few million node 
> traversals in about 15min and a node traversal appears to involve 
> several accesses and at least one stat.
> 
> I'm not familiar with the current bitbake internals so my next thought 
> would be to replace the calls to access, stat, and mkdir on the stamp 
> files with caching, counting calls.  Build a dictionary of each file 
> called, if it's new, do the kernel call and cache the result in the 
> dictionary.  If it's already in the dictionary, then inc a counter for 
> it and return the cached value.  This should a) improve the speed of the 
> current algorithm, b) improve the speed of the eventual replacement 
> algorithm, and c) give us some useful statistical data in the meantime.
> 
> I'm also going to try reformatting one of the systems and compare how 
> long a build on ext4 takes.
> 
> Any other ideas?

Well, this clearly doesn't happen with master or in any combination of
the layers most users are using. The logical conclusion would be that
there is something in your layer that is somehow triggering this.

Of course since that layer is secret and you can't show us it, we have a
bit of a problem. Can you reproduce the bug against public code?

Are you by any chance setting BB_STAMP_POLICY somewhere?

Cheers,

Richard








^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: SetScene tasks hang forever?
  2012-05-08 12:34               ` Richard Purdie
@ 2012-05-09 17:51                 ` Rich Pixley
  2012-05-09 19:52                   ` Richard Purdie
  2012-05-09 20:32                   ` Richard Purdie
  0 siblings, 2 replies; 19+ messages in thread
From: Rich Pixley @ 2012-05-09 17:51 UTC (permalink / raw)
  To: openembedded-core

[-- Attachment #1: Type: text/plain, Size: 4844 bytes --]

On 5/8/12 05:34 , Richard Purdie wrote:
> On Sun, 2012-05-06 at 10:36 -0700, Rich Pixley wrote:
>> On 5/2/12 16:06 , Richard Purdie wrote:
>>> On Wed, 2012-05-02 at 14:48 -0500, Mark Hatle wrote:
>>>> On 5/2/12 2:45 PM, Rich Pixley wrote:
>>> What would really help is a way to reproduce this...
>>>
>>> Does it reproduce with a certain set of metadata/sstate perhaps?
>>>
>>> What is odd about the above logs is that it appears bitbake never
>>> executes any task. It's possible something might have crashed somewhere I
>>> guess and not realise part of the system had died. Or it could be some
>>> kind of circular dependency loop where X needs Y to build and Y needs X
>>> so nothing happens. We are supposed to spot and error if that would have
>>> happened.
>>>
>>> Does strace give an idea of which bits of bitbake are alive/looping? I'd
>>> probably resort to a few print()/bb.error() in the code at this point to
>>> find out what is alive, what is dead and where it's looping...
>> I have more info now.
>>
>> What I suspected was looping, (since it took longer than the ~1hr I was
>> willing to wait), isn't actual looping.  Given enough time, the builds
>> do complete and I have comparable results on 5 different servers, (all
>> ubuntu-12.04 amd64 and all on btrfs).
>>
>> My initial, full builds of core-image-minimal do build, and they build
>> in ~60min, (~30min if I hand seed the downloads directory).  I'm using
>> no mirrors other than the defaults.  My second build in an already built
>> directory, (expected to do nothing), takes anywhere from 7 - 10.5hrs to
>> complete and successfully do nothing, depending on the server.
>>
>> During this time, top shows a single cpu pinned at 98 - 100%
>> utilization, and strace shows literally millions of access and stat
>> calls on stamp files, mkdir on the stamps directory, etc.  Statistical
>> analysis of just the do_fetch access calls shows a distribution that
>> seems to mimic the topological tree.  That is, the most called access is
>> for quilt-native and the components higher up the tree get fewer stats.
>>
>> Oh, and the setscene stamps are all nonexistent.  I presume that's expected.
>>
>> First, I can't imagine why there would need to be more than one mkdir on
>> the stamps directory within a single instantiation of bitbake.  I can
>> imagine that it was easier to attempt to mkdir it than to check first,
>> but once it has been mkdir'd, (or checked), there's no need to do it
>> another million times, is there?
>>
>> Second, I can't imagine why there would need to be all the redundant
>> stamp checking.  That info is cached internally, isn't it?
>>
>> And third, the fact that it seems to be checking the entire subtree,
>> apparently multiple times at every node, suggests to me that the
>> checking algorithm is broken.  Back of the envelope... perhaps 300
>> components, maybe 10 tasks per component ~= 3e3 tasks.  Figure a
>> geometric explosion of checks for an inefficient algorithm and we're up
>> to around 10e6 checks.  I haven't counted an entire run, but based on
>> the time it takes to run, I'd say I'm seeing one, maybe two orders of
>> magnitude more checks than that.  I've seen a few million node
>> traversals in about 15min and a node traversal appears to involve
>> several accesses and at least one stat.
>>
>> I'm not familiar with the current bitbake internals so my next thought
>> would be to replace the calls to access, stat, and mkdir on the stamp
>> files with caching, counting calls.  Build a dictionary of each file
>> called, if it's new, do the kernel call and cache the result in the
>> dictionary.  If it's already in the dictionary, then inc a counter for
>> it and return the cached value.  This should a) improve the speed of the
>> current algorithm, b) improve the speed of the eventual replacement
>> algorithm, and c) give us some useful statistical data in the meantime.
>>
>> I'm also going to try reformatting one of the systems and compare how
>> long a build on ext4 takes.
>>
>> Any other ideas?
> Well, this clearly doesn't happen with master or in any combination of
> the layers most users are using. The logical conclusion would be that
> there is something in your layer that is somehow triggering this.
No private layer involved.

I do have a makefile which encapsulates the environment stuff, but 
that's it.
> Of course since that layer is secret and you can't show us it, we have a
> bit of a problem. Can you reproduce the bug against public code?
Done.  (Our layer is becoming open, we're committed to it, but it's a 
long process internally).
> Are you by any chance setting BB_STAMP_POLICY somewhere?
Yes.  BB_STAMP_POLICY = "full".

I'll attach a copy of my local.conf and bblayers.conf.

--rich

[-- Attachment #2: bblayers.conf --]
[-- Type: text/plain, Size: 1010 bytes --]

# Time-stamp: <09-May-2012 10:50:03 PDT by rich.pixley@palm.com>

# Copyright (c) 2008 - 2012 Hewlett-Packard Development Company, L.P.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##

# LAYER_CONF_VERSION is increased each time build/conf/bblayers.conf
# changes incompatibly
LCONF_VERSION = "4"

PALMDIR ?= "/home/rich/projects/webos"

OECORE_LAYER ?= "${PALMDIR}/openembedded-core/meta"
WEBOS_LAYER ?= ""

BBFILES ?= ""
BBLAYERS ?= " \
  ${OECORE_LAYER} \
  ${WEBOS_LAYER} \
  "

[-- Attachment #3: local.conf --]
[-- Type: text/plain, Size: 1678 bytes --]

# DO NOT MODIFY!  This script is generated by configure. Changes made
# here will be lost.  Source for this file is in local-conf.in.

# Time-stamp: <27-Apr-2012 15:23:26 PDT by rich.pixley@palm.com>

# Copyright (c) 2008 - 2012 Hewlett-Packard Development Company, L.P.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

MACHINE := "qemux86"

# Uncomment to have 'work' directories removed after a package builds
#INHERIT += "rm_work"

BB_STAMP_POLICY = "full"
COVERAGE_BUILD = "0"
TMPDIR := "/home/rich/projects/webos/BUILD-qemux86"
TCLIBCAPPEND := ""
PRODUCTION_BUILD := ""

# parallelization options
# there's an extra space in these CFLAGS such that defining
# 'TARGET_CFLAGS += ""' causes gdb to break.  I'm tired of looking for
# it for now.  Hence this strange construction of a naked trigger.
PARALLEL_MAKE := "-j 48"
BB_NUMBER_THREADS := "48"

BB_SRCREV_POLICY = "cache"
BB_FETCH_PREMIRRORONLY = "true"

# CONF_VERSION is increased each time build/conf/ changes incompatibly and is used to
# track the version of this file when it was generated. This can safely be ignored if
# this doesn't mean anything to you.
CONF_VERSION = "1"

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: SetScene tasks hang forever?
  2012-05-09 17:51                 ` Rich Pixley
@ 2012-05-09 19:52                   ` Richard Purdie
  2012-05-09 23:04                     ` Rich Pixley
  2012-05-09 20:32                   ` Richard Purdie
  1 sibling, 1 reply; 19+ messages in thread
From: Richard Purdie @ 2012-05-09 19:52 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer

On Wed, 2012-05-09 at 10:51 -0700, Rich Pixley wrote:
> On 5/8/12 05:34 , Richard Purdie wrote:
> > On Sun, 2012-05-06 at 10:36 -0700, Rich Pixley wrote:
> >> Any other ideas?
> > Well, this clearly doesn't happen with master or in any combination of
> > the layers most users are using. The logical conclusion would be that
> > there is something in your layer that is somehow triggering this.
> No private layer involved.
> 
> I do have a makefile which encapsulates the environment stuff, but 
> that's it.
> > Of course since that layer is secret and you can't show us it, we have a
> > bit of a problem. Can you reproduce the bug against public code?
> Done.  (Our layer is becoming open, we're committed to it, but it's a 
> long process internally).
> > Are you by any chance setting BB_STAMP_POLICY somewhere?
> Yes.  BB_STAMP_POLICY = "full".
> 
> I'll attach a copy of my local.conf and bblayers.conf.

I'm 95% sure it's BB_STAMP_POLICY = "full" causing the problems. The idea
is really that sstate and other recent developments obsolete the "full"
stamp code. I'm not sure it actually gets on with the setscene stamps
the sstate code generates, as I suspect you're discovering.

We could try and fix the "full" policy, or we could just remove it.
Looking at the code for the function that deals with this in
runqueue.py, I can see where problems could occur.

So I guess I'm asking if we should fix that or can we remove it?

Cheers,

Richard





^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: SetScene tasks hang forever?
  2012-05-09 17:51                 ` Rich Pixley
  2012-05-09 19:52                   ` Richard Purdie
@ 2012-05-09 20:32                   ` Richard Purdie
  2012-05-09 23:20                     ` Rich Pixley
  1 sibling, 1 reply; 19+ messages in thread
From: Richard Purdie @ 2012-05-09 20:32 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer

Hi Rich,

You might like to try the change below as I think it might address the problem.

Cheers,

Richard

bitbake/runqueue: Fix 'full' stamp checking to be more efficient and cache results

This should fix issues where bitbake would seemingly lock up when checking
certain configurations of stampfiles.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
---
diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index b870caf..48433be 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -875,7 +875,7 @@ class RunQueue:
             bb.msg.fatal("RunQueue", "check_stamps fatal internal error")
         return current
 
-    def check_stamp_task(self, task, taskname = None, recurse = False):
+    def check_stamp_task(self, task, taskname = None, recurse = False, cache = {}):
         def get_timestamp(f):
             try:
                 if not os.access(f, os.F_OK):
@@ -915,6 +915,9 @@ class RunQueue:
         t1 = get_timestamp(stampfile)
         for dep in self.rqdata.runq_depends[task]:
             if iscurrent:
+                if dep in cache:
+                    iscurrent = cache[dep]
+                    continue
                 fn2 = self.rqdata.taskData.fn_index[self.rqdata.runq_fnid[dep]]
                 taskname2 = self.rqdata.runq_task[dep]
                 stampfile2 = bb.build.stampfile(taskname2, self.rqdata.dataCache, fn2)
@@ -931,7 +934,9 @@ class RunQueue:
                         logger.debug(2, 'Stampfile %s < %s', stampfile, stampfile2)
                         iscurrent = False
                     if recurse and iscurrent:
-                        iscurrent = self.check_stamp_task(dep, recurse=True)
+                        iscurrent = self.check_stamp_task(dep, recurse=True, cache=cache)
+                        cache[dep] = iscurrent
+        cache[task] = iscurrent
         return iscurrent
 
     def execute_runqueue(self):
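
A note on the `cache = {}` default in the patch above: in Python a default argument value is created once, when the function is defined, so a dict default is shared by every call that does not pass its own. Whether that persistence is intended here is not stated in the thread. The sketch below only illustrates the language behaviour with two hypothetical helper functions (`remember_shared` and `remember_fresh` are not bitbake names).

```python
def remember_shared(key, value, cache={}):
    """The default dict is created once and reused by every call."""
    cache.setdefault(key, value)
    return cache[key]


def remember_fresh(key, value, cache=None):
    """A new dict per top-level call unless the caller passes one in."""
    if cache is None:
        cache = {}
    cache.setdefault(key, value)
    return cache[key]


# The shared default keeps the first answer for the life of the process.
print(remember_shared("task", True))   # True
print(remember_shared("task", False))  # True: first result still cached

# The None default starts from an empty cache on each call.
print(remember_fresh("task", True))    # True
print(remember_fresh("task", False))   # False
```

With the shared form, results survive between separate top-level calls, which saves work but can return stale answers if stamps change in between. The `None` form limits the cache to one recursive walk when the same dict is passed down explicitly, as the patch does with `cache=cache`.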





^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: SetScene tasks hang forever?
  2012-05-09 19:52                   ` Richard Purdie
@ 2012-05-09 23:04                     ` Rich Pixley
  2012-05-09 23:26                       ` Richard Purdie
  0 siblings, 1 reply; 19+ messages in thread
From: Rich Pixley @ 2012-05-09 23:04 UTC (permalink / raw)
  To: openembedded-core

On 5/9/12 12:52 , Richard Purdie wrote:
> On Wed, 2012-05-09 at 10:51 -0700, Rich Pixley wrote:
>> On 5/8/12 05:34 , Richard Purdie wrote:
>>> On Sun, 2012-05-06 at 10:36 -0700, Rich Pixley wrote:
>>>> Any other ideas?
>>> Well, this clearly doesn't happen with master or in any combination of
>>> the layers most users are using. The logical conclusion would be that
>>> there is something in your layer that is somehow triggering this.
>> No private layer involved.
>>
>> I do have a makefile which encapsulates the environment stuff, but
>> that's it.
>>> Of course since that layer is secret and you can't show us it, we have a
>>> bit of a problem. Can you reproduce the bug against public code?
>> Done.  (Our layer is becoming open, we're committed to it, but it's a
>> long process internally).
>>> Are you by any chance setting BB_STAMP_POLICY somewhere?
>> Yes.  BB_STAMP_POLICY = "full".
>>
>> I'll attach a copy of my local.conf and bblayers.conf.
> I'm 95% sure it's BB_STAMP_POLICY = "full" causing the problems. The idea
> is really that sstate and other recent developments obsolete the "full"
> stamp code. I'm not sure it actually gets on with the setscene stamps
> the sstate code generates, as I suspect you're discovering.
>
> We could try and fix the "full" policy, or we could just remove it.
> Looking at the code for the function that deals with this in
> runqueue.py, I can see where problems could occur.
>
> So I guess I'm asking if we should fix that or can we remove it?
Um... I'm not sure.

In the past, that was required to get everything built in an incremental 
fashion.  That is, if A depended on B, B depended on C, and C changed, 
BB_STAMP_POLICY = "full" was the only way to get A to be rebuilt 
automatically.
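
The A, B, C case can be modelled with a small sketch. This is a simplified model and not bitbake's implementation: `is_current`, the `deps` mapping, and the integer stamp times are all assumptions made for illustration. A task counts as current only if its stamp exists, is at least as new as each dependency's stamp, and every dependency is itself current.

```python
def is_current(task, deps, stamp_time, cache=None):
    """Recursive stamp check over the whole dependency chain."""
    if cache is None:
        cache = {}
    if task in cache:
        return cache[task]
    own = stamp_time.get(task)
    current = own is not None
    for dep in deps.get(task, []):
        if not current:
            break
        dep_time = stamp_time.get(dep)
        if dep_time is None or dep_time > own:
            current = False  # dependency missing or newer than this task
        else:
            current = is_current(dep, deps, stamp_time, cache)
    cache[task] = current
    return current


deps = {"A": ["B"], "B": ["C"], "C": []}

# Built in order C, B, A: everything is current.
stamps = {"C": 10, "B": 20, "A": 30}
print(is_current("A", deps, stamps))  # True

# C is rebuilt later: B is stale, and so A is stale too.
stamps["C"] = 40
print(is_current("A", deps, stamps))  # False
```

Comparing A only against its direct dependency B would still report A as current in the second case, since B's stamp is older than A's. The recursive walk is what propagates the change in C up to A.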

Are you saying that this happens automatically now even without the 
BB_STAMP_POLICY = "full" setting?  Or that some other setting is more 
appropriate and perhaps has semantics I don't know?  Or that the current 
default is no incremental rebuilds?  Or... ?

--rich



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: SetScene tasks hang forever?
  2012-05-09 20:32                   ` Richard Purdie
@ 2012-05-09 23:20                     ` Rich Pixley
  2012-05-09 23:32                       ` Richard Purdie
  0 siblings, 1 reply; 19+ messages in thread
From: Rich Pixley @ 2012-05-09 23:20 UTC (permalink / raw)
  To: openembedded-core

My rebuild completed in 5 seconds.

Thank you very much!

I'd rather not fork bitbake locally.  Can I expect this patch to show up 
in git://git.openembedded.org/bitbake sometime soon?

--rich

On 5/9/12 13:32 , Richard Purdie wrote:
> Hi Rich,
>
> You might like to try the change below as I think it might address the problem.
>
> Cheers,
>
> Richard
>
> bitbake/runqueue: Fix 'full' stamp checking to be more efficient and cache results
>
> This should fix issues where bitbake would seemingly lock up when checking
> certain configurations of stampfiles.
>
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> ---
> diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
> index b870caf..48433be 100644
> --- a/bitbake/lib/bb/runqueue.py
> +++ b/bitbake/lib/bb/runqueue.py
> @@ -875,7 +875,7 @@ class RunQueue:
>               bb.msg.fatal("RunQueue", "check_stamps fatal internal error")
>           return current
>
> -    def check_stamp_task(self, task, taskname = None, recurse = False):
> +    def check_stamp_task(self, task, taskname = None, recurse = False, cache = {}):
>           def get_timestamp(f):
>               try:
>                   if not os.access(f, os.F_OK):
> @@ -915,6 +915,9 @@ class RunQueue:
>           t1 = get_timestamp(stampfile)
>           for dep in self.rqdata.runq_depends[task]:
>               if iscurrent:
> +                if dep in cache:
> +                    iscurrent = cache[dep]
> +                    continue
>                   fn2 = self.rqdata.taskData.fn_index[self.rqdata.runq_fnid[dep]]
>                   taskname2 = self.rqdata.runq_task[dep]
>                   stampfile2 = bb.build.stampfile(taskname2, self.rqdata.dataCache, fn2)
> @@ -931,7 +934,9 @@ class RunQueue:
>                           logger.debug(2, 'Stampfile %s < %s', stampfile, stampfile2)
>                           iscurrent = False
>                       if recurse and iscurrent:
> -                        iscurrent = self.check_stamp_task(dep, recurse=True)
> +                        iscurrent = self.check_stamp_task(dep, recurse=True, cache=cache)
> +                        cache[dep] = iscurrent
> +        cache[task] = iscurrent
>           return iscurrent
>
>       def execute_runqueue(self):
>
>
>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: SetScene tasks hang forever?
  2012-05-09 23:04                     ` Rich Pixley
@ 2012-05-09 23:26                       ` Richard Purdie
  2012-05-10  0:03                         ` Rich Pixley
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Purdie @ 2012-05-09 23:26 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer

On Wed, 2012-05-09 at 16:04 -0700, Rich Pixley wrote:
> On 5/9/12 12:52 , Richard Purdie wrote:
> > On Wed, 2012-05-09 at 10:51 -0700, Rich Pixley wrote:
> >> On 5/8/12 05:34 , Richard Purdie wrote:
> >>> On Sun, 2012-05-06 at 10:36 -0700, Rich Pixley wrote:
> >>>> Any other ideas?
> >>> Well, this clearly doesn't happen with master or in any combination of
> >>> the layers most users are using. The logical conclusion would be that
> >>> there is something in your layer that is somehow triggering this.
> >> No private layer involved.
> >>
> >> I do have a makefile which encapsulates the environment stuff, but
> >> that's it.
> >>> Of course since that layer is secret and you can't show us it, we have a
> >>> bit of a problem. Can you reproduce the bug against public code?
> >> Done.  (Our layer is becoming open, we're committed to it, but it's a
> >> long process internally).
> >>> Are you by any chance setting BB_STAMP_POLICY somewhere?
> >> Yes.  BB_STAMP_POLICY = "full".
> >>
> >> I'll attach a copy of my local.conf and bblayers.conf.
> > I'm 95% sure it's BB_STAMP_POLICY = "full" causing the problems. The idea
> > is really that sstate and other recent developments obsolete the "full"
> > stamp code. I'm not sure it actually gets on with the setscene stamps
> > the sstate code generates, as I suspect you're discovering.
> >
> > We could try and fix the "full" policy, or we could just remove it.
> > Looking at the code for the function that deals with this in
> > runqueue.py, I can see where problems could occur.
> >
> > So I guess I'm asking if we should fix that or can we remove it?
> Um... I'm not sure.
> 
> In the past, that was required to get everything built in an incremental 
> fashion.  That is, if A depended on B depended on C and C changed, 
> BB_STAMP_POLICY = "full" was the only way to get A to be rebuilt 
> automatically.
> 
> Are you saying that this happens automatically now even without the 
> BB_STAMP_POLICY = "full" setting?  Or that some other setting is more 
> appropriate and perhaps has semantics I don't know?  Or that the current 
> default is no incremental rebuilds?  Or... ?

The settings that are now recommended are:

BB_SIGNATURE_HANDLER ?= 'OEBasicHash'
OELAYOUT_ABI = "8"

This requires a rebuild since the stamp file format changes, hence the
ABI number increase. Currently, poky and angstrom use these settings
amongst others but it's not the default in OE-Core. I'll likely propose a
change to make it the default soon though.

This would then make BB_STAMP_POLICY = "full" obsolete and yet
incremental builds will work correctly and likely rebuild fewer things
(only really what potentially changed).
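
To illustrate the idea behind hash-based signatures for the "A depends on B depends on C" case, here is a minimal sketch. It is not the OEBasicHash implementation: the `task_signature` function and the `recipe_text` and `depends` mappings are hypothetical names invented for this example. The assumption is only that a task's signature is a hash of its own inputs plus the signatures of its dependencies, so a change in C changes the signatures of B and A while unrelated tasks keep theirs.

```python
import hashlib


def task_signature(task, recipe_text, depends, memo=None):
    """Hash a task's own inputs together with its dependencies' signatures."""
    if memo is None:
        memo = {}
    if task in memo:
        return memo[task]
    h = hashlib.sha256()
    h.update(recipe_text[task].encode("utf-8"))
    # Sort dependencies so the signature does not depend on listing order.
    for dep in sorted(depends[task]):
        h.update(task_signature(dep, recipe_text, depends, memo).encode("ascii"))
    memo[task] = h.hexdigest()
    return memo[task]


def all_signatures(recipe_text, depends):
    """Compute signatures for every task, sharing one memo table."""
    memo = {}
    return {t: task_signature(t, recipe_text, depends, memo) for t in depends}


if __name__ == "__main__":
    depends = {"A": ["B"], "B": ["C"], "C": [], "D": []}
    before = all_signatures({"A": "a v1", "B": "b v1", "C": "c v1", "D": "d v1"}, depends)
    after = all_signatures({"A": "a v1", "B": "b v1", "C": "c v2", "D": "d v1"}, depends)
    changed = sorted(t for t in depends if before[t] != after[t])
    print("tasks to rebuild:", changed)
```

In this toy model only tasks whose signature changed need to rerun, which is the sense in which fewer things get rebuilt than under a policy that compares timestamps across the whole dependency chain.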

Cheers,

Richard







* Re: SetScene tasks hang forever?
  2012-05-09 23:20                     ` Rich Pixley
@ 2012-05-09 23:32                       ` Richard Purdie
  2012-05-10  0:00                         ` Rich Pixley
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Purdie @ 2012-05-09 23:32 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer

On Wed, 2012-05-09 at 16:20 -0700, Rich Pixley wrote:
> My rebuild completed in 5 seconds.
> 
> Thank you very much!
> 
> I'd rather not fork bitbake locally.  Can I expect this patch to show up 
> in git://git.openembedded.org/bitbake sometime soon?

Yes, I just wanted to confirm we were fixing the right problem...

I'll post it to the bitbake list and merge in a day or two assuming no
negative feedback.

I still wonder if that stamp mode is useful but that is a different
discussion.

Cheers,

Richard





* Re: SetScene tasks hang forever?
  2012-05-09 23:32                       ` Richard Purdie
@ 2012-05-10  0:00                         ` Rich Pixley
  0 siblings, 0 replies; 19+ messages in thread
From: Rich Pixley @ 2012-05-10  0:00 UTC (permalink / raw)
  To: openembedded-core

On 5/9/12 16:32 , Richard Purdie wrote:
> On Wed, 2012-05-09 at 16:20 -0700, Rich Pixley wrote:
>> My rebuild completed in 5 seconds.
>>
>> Thank you very much!
>>
>> I'd rather not fork bitbake locally.  Can I expect this patch to show up
>> in git://git.openembedded.org/bitbake sometime soon?
> Yes, I just wanted to confirm we were fixing the right problem...
>
> I'll post it to the bitbake list and merge in a day or two assuming no
> negative feedback.
I'm running more thorough tests as I type.
> I still wonder if that stamp mode is useful but that is a different
> discussion.
If I'm understanding your other message, then no, it's probably not.

My only concern is that in the A depends on B depends on C case, 
changing C forces a rebuild of A.  I don't really care which mechanism 
makes that happen.

--rich




* Re: SetScene tasks hang forever?
  2012-05-09 23:26                       ` Richard Purdie
@ 2012-05-10  0:03                         ` Rich Pixley
  0 siblings, 0 replies; 19+ messages in thread
From: Rich Pixley @ 2012-05-10  0:03 UTC (permalink / raw)
  To: openembedded-core

On 5/9/12 16:26 , Richard Purdie wrote:
> The settings that are now recommended are:
>
> BB_SIGNATURE_HANDLER ?= 'OEBasicHash'
> OELAYOUT_ABI = "8"
>
> This requires a rebuild since the stamp file format changes, hence the
> ABI number increase. Currently, poky and angstrom use these settings
> amongst others but it's not the default in OE-Core. I'll likely propose a
> change to make it the default soon though.
>
> This would then make BB_STAMP_POLICY = "full" obsolete and yet
> incremental builds will work correctly and likely rebuild fewer things
> (only really what potentially changed).
Thank you for the explanation.

I'll switch us immediately.

--rich





Thread overview: 19+ messages
2012-05-02 18:21 SetScene tasks hang forever? Rich Pixley
2012-05-02 18:40 ` Mark Hatle
2012-05-02 19:16   ` Rich Pixley
2012-05-02 19:40     ` Mark Hatle
2012-05-02 19:45       ` Rich Pixley
2012-05-02 19:48         ` Mark Hatle
2012-05-02 23:06           ` Richard Purdie
2012-05-06 17:36             ` Rich Pixley
2012-05-07 16:38               ` Rich Pixley
2012-05-08 12:34               ` Richard Purdie
2012-05-09 17:51                 ` Rich Pixley
2012-05-09 19:52                   ` Richard Purdie
2012-05-09 23:04                     ` Rich Pixley
2012-05-09 23:26                       ` Richard Purdie
2012-05-10  0:03                         ` Rich Pixley
2012-05-09 20:32                   ` Richard Purdie
2012-05-09 23:20                     ` Rich Pixley
2012-05-09 23:32                       ` Richard Purdie
2012-05-10  0:00                         ` Rich Pixley
