* [Buildroot] Issue with host-erlang-rebar causing timeouts
@ 2015-05-21 19:21 Thomas Petazzoni
2015-05-21 19:47 ` Thomas Petazzoni
0 siblings, 1 reply; 8+ messages in thread
From: Thomas Petazzoni @ 2015-05-21 19:21 UTC
To: buildroot
Hello Johan,
We have an issue with host-erlang-rebar: it causes some timeouts in our
builds. See
http://autobuild.buildroot.org/?reason=host-erlang-rebar-2.5.1.
The script that does the autobuilder builds kills the build if it lasts
for more than 8 hours. And in the last few days, all the timeouts we
have had were only caused by host-erlang-rebar.
If you look at the link above, things are fairly strange:
* We had three such timeouts back on March 16, all on the gcc10
machine.
* Since May 19th, we have the exact same timeouts, but this time only
on gcc75.
All the timeouts take place at exactly the same point, during the
"build" step of host-erlang-rebar:
./bootstrap
package/pkg-generic.mk:156: recipe for target '/ssd1/thomas/autobuild/instance-2/output/build/host-erlang-rebar-2.5.1/.stamp_built' failed
make: *** [/ssd1/thomas/autobuild/instance-2/output/build/host-erlang-rebar-2.5.1/.stamp_built] Terminated
Makefile:7: recipe for target 'all' failed
make[1]: *** [all] Terminated
So it's the ./bootstrap program that either hangs forever, or does an
infinite loop.
Let's take a closer look at
http://autobuild.buildroot.org/results/73d/73d491670cb29ab68cb8552b4b9bd82d31571e62/.
From the logs of the autobuilder instance (only visible on gcc75), I
see:
[Wed, 20 May 2015 14:11:36] INFO: generate the configuration
[Wed, 20 May 2015 14:11:44] INFO: build started
[Wed, 20 May 2015 22:11:44] INFO: build timed out
Importing 73d491670cb29ab68cb8552b4b9bd82d31571e62 from /tmp/phpS9zcmM
So the build started at 14h11, and timed out at 22h11, so exactly 8
hours after the start of the build, which is expected.
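As a cross-check of the arithmetic above, the two autobuilder log timestamps can be subtracted directly. This is only a sketch: the `parse_log_time` helper is hypothetical, and it assumes the log line format is exactly the one in the excerpt.

```python
from datetime import datetime, timedelta

# Hypothetical helper: extracts the "[Wed, 20 May 2015 14:11:44]" prefix of an
# autobuilder log line (format assumed from the excerpt) and parses it.
def parse_log_time(line):
    stamp = line[line.index("[") + 1:line.index("]")]
    return datetime.strptime(stamp, "%a, %d %b %Y %H:%M:%S")

started = parse_log_time("[Wed, 20 May 2015 14:11:44] INFO: build started")
timed_out = parse_log_time("[Wed, 20 May 2015 22:11:44] INFO: build timed out")

elapsed = timed_out - started
print(elapsed)                        # 8:00:00
print(elapsed == timedelta(hours=8))  # True
```

The 8 hours correspond to 28800 seconds.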
Now, we can correlate this with the build-time.log information
available at
http://autobuild.buildroot.org/results/73d/73d491670cb29ab68cb8552b4b9bd82d31571e62//build-time.log,
which gives us the starting and ending time of each step of the build
process.
The first line is:
1432123908:start:extract : toolchain-external
The Unix time stamp 1432123908 corresponds to:
thomas at skate:~$ LANG=C date -d @1432123908
Wed May 20 14:11:48 CEST 2015
So this is exactly matching the 14h11 start time for the build.
The last line of build-time.log is the starting time of
host-erlang-rebar build:
1432126776:start:build : host-erlang-rebar
And this time stamp corresponds to:
$ LANG=C date -d @1432126776
Wed May 20 14:59:36 CEST 2015
So basically about 48 minutes after the start of the build process, we
started building host-erlang-rebar.
And then nothing happened for the next 7+ hours, until the build got
killed at 22h11.
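The same correlation can be scripted rather than done by hand with `date`. The sketch below assumes each build-time.log line has the shape `timestamp:start|end:step : package`, as in the two lines quoted above; the function name `parse_build_time_line` is made up for illustration, and CEST is modelled as a fixed UTC+2 offset.

```python
from datetime import datetime, timedelta, timezone

# CEST is UTC+2; a fixed offset is enough for these May 2015 timestamps.
CEST = timezone(timedelta(hours=2), "CEST")

def parse_build_time_line(line):
    """Split 'timestamp:start|end:step : package' into its four fields."""
    stamp, event, step, package = (field.strip() for field in line.split(":", 3))
    return int(stamp), event, step, package

first = parse_build_time_line("1432123908:start:extract : toolchain-external")
last = parse_build_time_line("1432126776:start:build : host-erlang-rebar")

for stamp, event, step, package in (first, last):
    when = datetime.fromtimestamp(stamp, CEST)
    print(when.strftime("%a %b %d %H:%M:%S %Z %Y"), event, step, package)

# Time between the first step and the start of the host-erlang-rebar build.
print(timedelta(seconds=last[0] - first[0]))  # 0:47:48
```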
I've used the br-reproduce-build script on gcc75 to attempt to
reproduce exactly this 73d491670cb29ab68cb8552b4b9bd82d31571e62 build,
but it didn't occur: the build succeeded completely without an error,
and without any hang. The ./bootstrap part went just fine:
./bootstrap
Recompile: src/rebar
Recompile: src/rebar_abnfc_compiler
Recompile: src/rebar_app_utils
[...]
Do you have any idea of what could cause this problem? It seems to
happen only on certain build machines (so maybe the version of some
host tools is playing a role), but not always.
Thanks a lot,
Thomas
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
* [Buildroot] Issue with host-erlang-rebar causing timeouts
2015-05-21 19:21 [Buildroot] Issue with host-erlang-rebar causing timeouts Thomas Petazzoni
@ 2015-05-21 19:47 ` Thomas Petazzoni
2015-05-21 20:04 ` Thomas Petazzoni
0 siblings, 1 reply; 8+ messages in thread
From: Thomas Petazzoni @ 2015-05-21 19:47 UTC
To: buildroot
Hello Johan,
On Thu, 21 May 2015 21:21:50 +0200, Thomas Petazzoni wrote:
> Do you have any idea of what could cause this problem? Is this to only
> happen on certain build machines (so maybe the version of some host
> tools is playing a role), but also not always.
I've done a bit more "data mining" on this issue. I looked at *all* the
build results of gcc75 since I restarted it on May 4th, and looked at
the ones that built host-erlang-rebar.
Up to May 19th, every time host-erlang-rebar was built, there was no
problem. Not a single failure. The last successful build that built
host-erlang-rebar was:
http://autobuild.buildroot.org/results/083/08369dd2c7147d620b50ed799c74acea38bf1457/
This build was done on May 19th at 8h26.
Then later on that day, at 21h26, the first timeout occurred on
host-erlang-rebar, with this build result:
http://autobuild.buildroot.org/results/3f9/3f9381a410e8411c390ccf5632a06c66b1c93524/
After that one, *all* builds that needed to build host-erlang-rebar
timed out.
Looking at the Buildroot Git commit being tested by both of those
builds, they are exactly the same:
db989f89c9fb1ebc3997d8a3c517948392611d77
So it really smells like a change in the machine configuration. The
last working build of host-erlang-rebar took place at:
1432011188:start:build : host-erlang-rebar
that is Tue May 19 06:53:08 CEST 2015.
The first non-working build of host-erlang-rebar took place at:
$ LANG=C date -d @1432038506
Tue May 19 14:28:26 CEST 2015
So just a few hours later.
According to the apt and dpkg logs on this machine, no package updates
have been made since May 4th.
The weird thing is that the problem happens in *all* builds of
host-erlang-rebar, but I'm not able to reproduce on the same machine,
outside of the autobuild infrastructure.
I'm puzzled.
Maybe I should set up a cronjob that checks every minute if we are in
this situation, and send me an e-mail, so that maybe I can catch the
situation while it's happening, and see a bit more what's going on. I
don't really have better ideas :/
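A minimal sketch of what such a periodic check could look for, assuming the process list has already been collected into `(pid, ppid, stat, comm)` tuples (for instance from `ps -eo pid,ppid,stat,comm`). The function name `find_stuck_parents` and the sample PIDs are made up for illustration; none of this is part of the autobuilder scripts.

```python
from collections import defaultdict

def find_stuck_parents(processes, name="make"):
    """Return PIDs of stopped `name` processes that have a zombie child.

    `processes` is an iterable of (pid, ppid, stat, comm) tuples, where
    `stat` is the ps state string ("S", "T", "Z", "S+", ...).
    """
    children = defaultdict(list)
    by_pid = {}
    for pid, ppid, stat, comm in processes:
        by_pid[pid] = (stat, comm)
        children[ppid].append(pid)

    stuck = []
    for pid, (stat, comm) in by_pid.items():
        if comm != name or not stat.startswith("T"):
            continue
        if any(by_pid[child][0].startswith("Z") for child in children[pid]):
            stuck.append(pid)
    return sorted(stuck)

# Made-up sample: one stopped make with a defunct child, one healthy make.
sample = [
    (100, 1, "S+", "python"),
    (101, 100, "S", "timeout"),
    (102, 101, "T", "make"),
    (103, 102, "Z", "bash"),
    (200, 1, "S", "make"),
    (201, 200, "R", "gcc"),
]
print(find_stuck_parents(sample))  # [102]
```

A cron job could run such a check every minute and send a mail whenever the returned list is non-empty.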
Thomas
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
* [Buildroot] Issue with host-erlang-rebar causing timeouts
2015-05-21 19:47 ` Thomas Petazzoni
@ 2015-05-21 20:04 ` Thomas Petazzoni
2015-05-27 21:18 ` Arnout Vandecappelle
0 siblings, 1 reply; 8+ messages in thread
From: Thomas Petazzoni @ 2015-05-21 20:04 UTC
To: buildroot
Johan,
On Thu, 21 May 2015 21:47:26 +0200, Thomas Petazzoni wrote:
> Maybe I should set up a cronjob that checks every minute if we are in
> this situation, and send me an e-mail, so that maybe I can catch the
> situation while it's happening, and see a bit more what's going on. I
> don't really have better ideas :/
Turns out that right after sending this e-mail, I checked, and one of
the builds was stuck exactly in this situation. The process tree is like
this:
23332 pts/5 S+ 0:01 | \_ python ../buildroot-test/scripts/autobuild-run -c autobuild-run.conf
23969 pts/5 S 0:00 | | \_ timeout 28800 make O=/ssd1/thomas/autobuild/instance-2/output -C instance-2/buildroot BR2_DL_DIR=/ssd1/thomas/autobuild/instance-2/dl BR2_JLEVE
23970 pts/5 T 0:19 | | \_ make O=/ssd1/thomas/autobuild/instance-2/output -C instance-2/buildroot BR2_DL_DIR=/ssd1/thomas/autobuild/instance-2/dl BR2_JLEVEL=4
10928 pts/5 Z 0:00 | | \_ [bash] <defunct>
So basically, the Buildroot "make" invocation only has one child
process: a bash process that is defunct. This instance is really stuck
in the ./bootstrap call:
make[1]: Entering directory '/ssd1/thomas/autobuild/instance-2/output/build/host-erlang-rebar-2.5.1'
./bootstrap
[nothing else]
I tried running the exact make command that builds host-erlang-rebar in
a terminal, inside the build directory of this build, and it worked
perfectly fine. So even more puzzled.
I see however that 'bootstrap' tries to play with the hg and git VCS to
get a commit reference. Maybe this is failing for some reason?
Any other idea?
Thomas
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
* [Buildroot] Issue with host-erlang-rebar causing timeouts
2015-05-21 20:04 ` Thomas Petazzoni
@ 2015-05-27 21:18 ` Arnout Vandecappelle
2015-08-10 13:10 ` Johan Oudinet
2015-08-19 21:02 ` Thomas Petazzoni
0 siblings, 2 replies; 8+ messages in thread
From: Arnout Vandecappelle @ 2015-05-27 21:18 UTC
To: buildroot
On 05/21/15 22:04, Thomas Petazzoni wrote:
> Johan,
>
> On Thu, 21 May 2015 21:47:26 +0200, Thomas Petazzoni wrote:
>
>> Maybe I should set up a cronjob that checks every minute if we are in
>> this situation, and send me an e-mail, so that maybe I can catch the
>> situation while it's happening, and see a bit more what's going on. I
>> don't really have better ideas :/
>
> Turns out that right after sending this e-mail, I checked, and one of
> the build was stuck exactly in this situation. The process tree is like
> this:
>
> 23332 pts/5 S+ 0:01 | \_ python ../buildroot-test/scripts/autobuild-run -c autobuild-run.conf
> 23969 pts/5 S 0:00 | | \_ timeout 28800 make O=/ssd1/thomas/autobuild/instance-2/output -C instance-2/buildroot BR2_DL_DIR=/ssd1/thomas/autobuild/instance-2/dl BR2_JLEVE
> 23970 pts/5 T 0:19 | | \_ make O=/ssd1/thomas/autobuild/instance-2/output -C instance-2/buildroot BR2_DL_DIR=/ssd1/thomas/autobuild/instance-2/dl BR2_JLEVEL=4
^ (the "T" in the process state column of the line above)
The "T" state means that the 'make' instance either got a SIGSTOP or is being ptrace()d. Is there
some administrative process running on that machine that is doing something
funky with it? Or is it the autobuild-run script that is sending some signal?
> 10928 pts/5 Z 0:00 | | \_ [bash] <defunct>
>
> So basically, the Buildroot "make" invocation only has one child
> process: a bash process that is defunct. This instance is really stuck
> in the ./bootstrap call:
No it's not, bootstrap is finished already, but the stopped make hasn't reaped
it yet. Actually, even the make has finished already...
>
> make[1]: Entering directory '/ssd1/thomas/autobuild/instance-2/output/build/host-erlang-rebar-2.5.1'
> ./bootstrap
> [nothing else]
...but somehow the output of that finished make has not yet ended up in the log
file...
Mysterious...
>
> I tried running the exact make command that builds host-erlang-rebar in
> a terminal, inside the build directory of this build, and it worked
> perfectly fine. So even more puzzled.
Have you tried running it within the timeout? Maybe that one is doing SIGSTOP
or ptrace() for some reason...
Regards,
Arnout
>
> I see however that 'bootstrap' tries to play with the hg and git VCS to
> get a commit reference. Maybe this is failing for some reason?
>
> Any other idea?
>
> Thomas
>
--
Arnout Vandecappelle arnout at mind be
Senior Embedded Software Architect +32-16-286500
Essensium/Mind http://www.mind.be
G.Geenslaan 9, 3001 Leuven, Belgium BE 872 984 063 RPR Leuven
LinkedIn profile: http://www.linkedin.com/in/arnoutvandecappelle
GPG fingerprint: 7CB5 E4CC 6C2E EFD4 6E3D A754 F963 ECAB 2450 2F1F
* [Buildroot] Issue with host-erlang-rebar causing timeouts
2015-05-27 21:18 ` Arnout Vandecappelle
@ 2015-08-10 13:10 ` Johan Oudinet
2015-08-19 21:02 ` Thomas Petazzoni
1 sibling, 0 replies; 8+ messages in thread
From: Johan Oudinet @ 2015-08-10 13:10 UTC
To: buildroot
Thomas, Arnout, All,
I'm just back from vacation and I see there is still this timeout
issue with host-erlang-rebar in the latest autobuilds.
Looking at the thread on this topic, it looks like there are some
questions from Arnout that remain unanswered.
On Wed, May 27, 2015 at 11:18 PM, Arnout Vandecappelle <arnout@mind.be> wrote:
> On 05/21/15 22:04, Thomas Petazzoni wrote:
>> On Thu, 21 May 2015 21:47:26 +0200, Thomas Petazzoni wrote:
>>
>> Turns out that right after sending this e-mail, I checked, and one of
>> the build was stuck exactly in this situation. The process tree is like
>> this:
>>
>> 23332 pts/5 S+ 0:01 | \_ python ../buildroot-test/scripts/autobuild-run -c autobuild-run.conf
>> 23969 pts/5 S 0:00 | | \_ timeout 28800 make O=/ssd1/thomas/autobuild/instance-2/output -C instance-2/buildroot BR2_DL_DIR=/ssd1/thomas/autobuild/instance-2/dl BR2_JLEVE
>> 23970 pts/5 T 0:19 | | \_ make O=/ssd1/thomas/autobuild/instance-2/output -C instance-2/buildroot BR2_DL_DIR=/ssd1/thomas/autobuild/instance-2/dl BR2_JLEVEL=4
> ^
> This means that the 'make' instance either got a SIGSTOP or ptrace(). Is there
> some administrative process running on that machine that is doing something
> funky with it? Or is it the autobuild-run script that is sending some signal?
>
>
>> 10928 pts/5 Z 0:00 | | \_ [bash] <defunct>
>>
>> So basically, the Buildroot "make" invocation only has one child
>> process: a bash process that is defunct. This instance is really stuck
>> in the ./bootstrap call:
>
> No it's not, bootstrap is finished already, but the stopped make hasn't reaped
> it yet. Actually, even the make has finished already...
I agree with Arnout. From the ps output, the bash process is a zombie,
which means it has terminated and its parent hasn't reaped it yet.
Since its parent is in a stopped state (either because of a job control
signal or because it is being traced), we should look at the reason
why make is in that state.
Could it be a problem with the timeout command?
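To illustrate what "terminated but not yet reaped" looks like, here is a small Linux-only sketch. It has nothing to do with the autobuilder itself: it starts a child that exits immediately, observes state "Z" in `/proc/<pid>/stat` while the parent has not called `wait()`, and then reaps it. The helper names `proc_state` and `wait_for_state` are made up for this example.

```python
import subprocess
import sys
import time

def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as stat_file:
        # The state is the first field after the parenthesised command name.
        return stat_file.read().rsplit(")", 1)[1].split()[0]

def wait_for_state(pid, wanted, timeout=5.0):
    """Poll until the process reaches state `wanted`, or give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc_state(pid) == wanted:
            return True
        time.sleep(0.01)
    return False

# The child exits right away; Popen does not reap it until wait()/poll().
child = subprocess.Popen([sys.executable, "-c", "pass"])
became_zombie = wait_for_state(child.pid, "Z")
print("zombie before wait():", became_zombie)

exit_code = child.wait()  # the parent reaps the child here
print("exit code after wait():", exit_code)
```

A parent that is itself stopped never gets to the `wait()` step, which is why the defunct bash stays in the process table.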
>
>>
>> make[1]: Entering directory '/ssd1/thomas/autobuild/instance-2/output/build/host-erlang-rebar-2.5.1'
>> ./bootstrap
>> [nothing else]
>
> ...but somehow the output of that finished make has not yet ended up in the log
> file...
>
> Mysterious...
>
>
>>
>> I tried running the exact make command that builds host-erlang-rebar in
>> a terminal, inside the build directory of this build, and it worked
>> perfectly fine. So even more puzzled.
>
> Have you tried running it within the timeout? Maybe that one is doing SIGSTOP
> or ptrace() for some reason...
>
I think it's worth trying to manually run the command within the
timeout as suggested by Arnout.
If there is still no output, you could try to run ./bootstrap with the
debug flag. Finally, I would strace the entire build process to see
where it is stuck, but this might require a lot of disk space.
That's indeed a weird problem and I'm sorry I don't have a better
solution to offer.
Regards,
--
Johan
* [Buildroot] Issue with host-erlang-rebar causing timeouts
2015-05-27 21:18 ` Arnout Vandecappelle
2015-08-10 13:10 ` Johan Oudinet
@ 2015-08-19 21:02 ` Thomas Petazzoni
2015-08-20 21:31 ` Thomas Petazzoni
1 sibling, 1 reply; 8+ messages in thread
From: Thomas Petazzoni @ 2015-08-19 21:02 UTC
To: buildroot
Arnout,
On Wed, 27 May 2015 23:18:11 +0200, Arnout Vandecappelle wrote:
> > 23332 pts/5 S+ 0:01 | \_ python ../buildroot-test/scripts/autobuild-run -c autobuild-run.conf
> > 23969 pts/5 S 0:00 | | \_ timeout 28800 make O=/ssd1/thomas/autobuild/instance-2/output -C instance-2/buildroot BR2_DL_DIR=/ssd1/thomas/autobuild/instance-2/dl BR2_JLEVE
> > 23970 pts/5 T 0:19 | | \_ make O=/ssd1/thomas/autobuild/instance-2/output -C instance-2/buildroot BR2_DL_DIR=/ssd1/thomas/autobuild/instance-2/dl BR2_JLEVEL=4
> ^
> This means that the 'make' instance either got a SIGSTOP or ptrace(). Is there
> some administrative process running on that machine that is doing something
> funky with it? Or is it the autobuild-run script that is sending some signal?
This machine is gcc75, so I don't really control everything that is
happening on it. But not much seems to be going on on this machine.
The autobuild-run script does send a SIGTERM to subprocesses when the
main autobuild-run process is killed. But this is normally not the
case, unless you hit Ctrl+C.
> Have you tried running it within the timeout? Maybe that one is doing SIGSTOP
> or ptrace() for some reason...
I haven't tried running without the timeout, but I've tried to do a
manual build of host-erlang-rebar *under* timeout, and it worked just
fine.
Thomas
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
* [Buildroot] Issue with host-erlang-rebar causing timeouts
2015-08-19 21:02 ` Thomas Petazzoni
@ 2015-08-20 21:31 ` Thomas Petazzoni
2015-08-21 15:52 ` Arnout Vandecappelle
0 siblings, 1 reply; 8+ messages in thread
From: Thomas Petazzoni @ 2015-08-20 21:31 UTC
To: buildroot
Arnout, Johan,
On Wed, 19 Aug 2015 23:02:51 +0200, Thomas Petazzoni wrote:
> > Have you tried running it within the timeout? Maybe that one is doing SIGSTOP
> > or ptrace() for some reason...
>
> I haven't tried running without the timeout, but I've tried to do a
> manual build of host-erlang-rebar *under* timeout, and it worked just
> fine.
FWIW, the "timeout" seems to no longer be able to detect/interrupt the
host-erlang-rebar build. The four autobuild jobs on gcc75 have been
stuck for several weeks, blocking any other build. Here is the
current process tree (look at the dates of the processes):
thomas 1474 0.0 0.0 56780 4056 pts/7 S+ Jul09 0:00 | \_ python ../buildroot-test/scripts/autobuild-run -c autobuild-run.conf
thomas 1478 0.0 0.0 58680 7044 pts/7 S+ Jul09 0:51 | \_ python ../buildroot-test/scripts/autobuild-run -c autobuild-run.conf
thomas 1755 0.0 0.0 8496 804 pts/7 S Aug06 0:00 | | \_ timeout 28800 make O=/ssd1/thomas/autobuild/instance-0/output -C instance-0/buildroot BR2_DL_
thomas 1756 0.0 0.4 85192 78400 pts/7 T Aug06 0:14 | | \_ make O=/ssd1/thomas/autobuild/instance-0/output -C instance-0/buildroot BR2_DL_DIR=/ssd1/
thomas 11478 0.0 0.0 0 0 pts/7 Z Aug06 0:00 | | \_ [bash] <defunct>
thomas 1479 0.0 0.0 58680 7548 pts/7 S+ Jul09 0:46 | \_ python ../buildroot-test/scripts/autobuild-run -c autobuild-run.conf
thomas 3490 0.0 0.0 8496 816 pts/7 S Aug07 0:00 | | \_ timeout 28800 make O=/ssd1/thomas/autobuild/instance-1/output -C instance-1/buildroot BR2_DL_
thomas 3491 0.0 0.4 84964 80376 pts/7 T Aug07 0:14 | | \_ make O=/ssd1/thomas/autobuild/instance-1/output -C instance-1/buildroot BR2_DL_DIR=/ssd1/
thomas 3942 0.0 0.0 0 0 pts/7 Z Aug07 0:00 | | \_ [bash] <defunct>
thomas 1480 0.0 0.0 58792 3556 pts/7 S+ Jul09 0:22 | \_ python ../buildroot-test/scripts/autobuild-run -c autobuild-run.conf
thomas 30856 0.0 0.0 8496 696 pts/7 S Jul21 0:00 | | \_ timeout 28800 make O=/ssd1/thomas/autobuild/instance-2/output -C instance-2/buildroot BR2_DL_
thomas 30857 0.0 0.0 83800 1680 pts/7 T Jul21 0:19 | | \_ make O=/ssd1/thomas/autobuild/instance-2/output -C instance-2/buildroot BR2_DL_DIR=/ssd1/
thomas 29670 0.0 0.0 0 0 pts/7 Z Jul21 0:00 | | \_ [bash] <defunct>
thomas 1481 0.0 0.0 58680 2612 pts/7 S+ Jul09 0:13 | \_ python ../buildroot-test/scripts/autobuild-run -c autobuild-run.conf
thomas 30548 0.0 0.0 8496 668 pts/7 S Jul17 0:00 | \_ timeout 28800 make O=/ssd1/thomas/autobuild/instance-3/output -C instance-3/buildroot BR2_DL_
thomas 30552 0.0 0.0 82736 1656 pts/7 T Jul17 0:09 | \_ make O=/ssd1/thomas/autobuild/instance-3/output -C instance-3/buildroot BR2_DL_DIR=/ssd1/
thomas 23005 0.0 0.0 0 0 pts/7 Z Jul17 0:00 | \_ [bash] <defunct>
The log files of the four instances indicate they are blocked
running ./bootstrap as part of host-erlang-rebar build process.
Also, I have no idea if it's related, but in a different part of the
process tree, I have:
thomas 23006 0.0 0.0 6640 1508 pts/7 T Jul17 0:00 /usr/bin/make -j4
thomas 23007 0.0 0.0 0 0 pts/7 Z Jul17 0:00 \_ [beam.smp] <defunct>
thomas 29673 0.0 0.0 6640 1524 pts/7 T Jul21 0:00 /usr/bin/make -j4
thomas 29674 0.0 0.0 0 0 pts/7 Z Jul21 0:00 \_ [beam.smp] <defunct>
thomas 11479 0.0 0.0 6640 1836 pts/7 T Aug06 0:00 /usr/bin/make -j4
thomas 11480 0.0 0.0 0 0 pts/7 Z Aug06 0:00 \_ [beam.smp] <defunct>
thomas 3943 0.0 0.0 6640 1868 pts/7 T Aug07 0:00 /usr/bin/make -j4
thomas 3944 0.0 0.0 0 0 pts/7 Z Aug07 0:00 \_ [beam.smp] <defunct>
The dates of these processes are the same as those of the blocked
sub-processes of the autobuilder script.
https://www.reddit.com/r/Ubuntu/comments/caae8/what_the_hell_is_beamsmb/
says: "It's the Erlang runtime, which is running couchdb under the
guise of desktopcouch. CouchDB is a schemaless database system, which
typically runs globally and has no authentication. Ubuntu adds a layer
of authentication and runs it on an arbitrary port.".
Thomas
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
* [Buildroot] Issue with host-erlang-rebar causing timeouts
2015-08-20 21:31 ` Thomas Petazzoni
@ 2015-08-21 15:52 ` Arnout Vandecappelle
0 siblings, 0 replies; 8+ messages in thread
From: Arnout Vandecappelle @ 2015-08-21 15:52 UTC
To: buildroot
On 08/20/15 23:31, Thomas Petazzoni wrote:
> Arnout, Johan,
>
> On Wed, 19 Aug 2015 23:02:51 +0200, Thomas Petazzoni wrote:
>
>>> Have you tried running it within the timeout? Maybe that one is doing SIGSTOP
>>> or ptrace() for some reason...
>>
>> I haven't tried running without the timeout, but I've tried to do a
>> manual build of host-erlang-rebar *under* timeout, and it worked just
>> fine.
>
> FWIW, the "timeout" seems to no longer be able to detect/interrupt the
> host-erlang-rebar build. The four autobuild jobs on gcc75 are stuck
> since several weeks on gcc75, blocking any other build. Here is the
> current process tree (look at the date of the processes) :
>
> thomas 1474 0.0 0.0 56780 4056 pts/7 S+ Jul09 0:00 | \_ python ../buildroot-test/scripts/autobuild-run -c autobuild-run.conf
> thomas 1478 0.0 0.0 58680 7044 pts/7 S+ Jul09 0:51 | \_ python ../buildroot-test/scripts/autobuild-run -c autobuild-run.conf
> thomas 1755 0.0 0.0 8496 804 pts/7 S Aug06 0:00 | | \_ timeout 28800 make O=/ssd1/thomas/autobuild/instance-0/output -C instance-0/buildroot BR2_DL_
> thomas 1756 0.0 0.4 85192 78400 pts/7 T Aug06 0:14 | | \_ make O=/ssd1/thomas/autobuild/instance-0/output -C instance-0/buildroot BR2_DL_DIR=/ssd1/
timeout isn't called with the -k option, so it'll just send SIGTERM and then
try to reap its child. Since the child never exits because it is STOPped,
timeout itself waits indefinitely. Does the make process exit if you send it a
SIGKILL? (I think there's a difference between how STOPped and ptrace'd processes
behave in that respect.)
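For the plain SIGSTOP case (not the ptrace case), the behaviour can be tried on a throwaway process. The following Linux-only sketch stops a sleeping child, sends it SIGTERM, checks that it is still stopped, and then sends SIGKILL. It only exercises the kernel's signal handling for a stopped process and makes no claim about what `timeout` does internally; the helper names are made up for the example.

```python
import os
import signal
import subprocess
import sys
import time

def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as stat_file:
        return stat_file.read().rsplit(")", 1)[1].split()[0]

def wait_for_state(pid, wanted, timeout=5.0):
    """Poll until the process reaches state `wanted`, or give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc_state(pid) == wanted:
            return True
        time.sleep(0.01)
    return False

# Throwaway child standing in for the stopped make.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

os.kill(child.pid, signal.SIGSTOP)
stopped = wait_for_state(child.pid, "T")

# SIGTERM stays pending while the process is stopped.
os.kill(child.pid, signal.SIGTERM)
time.sleep(0.5)
state_after_term = proc_state(child.pid)

# SIGKILL is acted on even though the process is stopped.
os.kill(child.pid, signal.SIGKILL)
return_code = child.wait(timeout=5)

print(stopped, state_after_term, return_code)
```

On Linux this shows the child still in state "T" after SIGTERM, and a return code of `-signal.SIGKILL` once it has been killed.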
> thomas 11478 0.0 0.0 0 0 pts/7 Z Aug06 0:00 | | \_ [bash] <defunct>
[snip]
> The log files of the four instances indicate they are blocked
> running ./bootstrap as part of host-erlang-rebar build process.
Or whatever bootstrap is running without generating output...
>
> Also, I have no idea if it's related, but in a different part of the
> process tree, I have:
>
[snip]
> thomas 11479 0.0 0.0 6640 1836 pts/7 T Aug06 0:00 /usr/bin/make -j4
> thomas 11480 0.0 0.0 0 0 pts/7 Z Aug06 0:00 \_ [beam.smp] <defunct>
It can't be a coincidence that PID 11479 is exactly one higher than bash 11478,
so I guess the bash is the shell from HOST_ERLANG_REBAR_BUILD_CMDS and the make
is the corresponding $(HOST_MAKE_ENV) $(MAKE). But bash itself has already
exited. Probably it did get killed by timeout, but its child make didn't, so that
one got orphaned.
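The same PID comparison can be made for all four instances, using the PIDs and start dates from the process listings earlier in the thread. This is only a sketch of the bookkeeping: a small PID gap is suggestive of a parent/child relationship, not proof of one.

```python
# PIDs of the defunct bash processes and of the stopped "/usr/bin/make -j4"
# processes, keyed by the start date shown in the process listings.
zombie_bash = {"Jul17": 23005, "Jul21": 29670, "Aug06": 11478, "Aug07": 3942}
stopped_make = {"Jul17": 23006, "Jul21": 29673, "Aug06": 11479, "Aug07": 3943}

gaps = {day: stopped_make[day] - zombie_bash[day] for day in zombie_bash}
for day, gap in gaps.items():
    print(f"{day}: bash {zombie_bash[day]} -> make {stopped_make[day]} (gap {gap})")
```

Three of the four pairs are consecutive PIDs, and the fourth is three apart.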
And then the make does just one thing: it calls bootstrap, which also has
exited already. The beam.smp process is indeed the erlang runtime of the
bootstrap script (I checked with an actual host-erlang-rebar build). I have no
idea why bootstrap has exited before actually building. Its output should also
keep coming out even if the parent make gets STOPped.
The other weird thing is that _both_ make processes that were active at the
time get STOPped. It looks as if this bootstrap script is doing a STOP on all
its parent processes called 'make' before starting the actual compilation...
Which seems highly unlikely and an strace on the thing also doesn't show that...
I still suspect that there must be some weird interaction with timeout. Perhaps
you can just disable the timeout on that build server and see if it still happens?
Regards,
Arnout
> thomas 3943 0.0 0.0 6640 1868 pts/7 T Aug07 0:00 /usr/bin/make -j4
> thomas 3944 0.0 0.0 0 0 pts/7 Z Aug07 0:00 \_ [beam.smp] <defunct>
>
> The date of these processes are the same as the blocked sub-processes
> of the autobuilder script.
> https://www.reddit.com/r/Ubuntu/comments/caae8/what_the_hell_is_beamsmb/
> says: "It's the Erlang runtime, which is running couchdb under the
> guise of desktopcouch. CouchDB is a schemaless database system, which
> typically runs globally and has no authentication. Ubuntu adds a layer
> of authentication and runs it on an arbitrary port.".
>
> Thomas
>
--
Arnout Vandecappelle arnout dot vandecappelle at essensium dot com
Senior Embedded Software Architect . . . . . . +32-478-010353 (mobile)
Essensium, Mind division . . . . . . . . . . . . . . http://www.mind.be
G.Geenslaan 9, 3001 Leuven, Belgium . . . . . BE 872 984 063 RPR Leuven
LinkedIn profile: http://www.linkedin.com/in/arnoutvandecappelle
GPG fingerprint: 7493 020B C7E3 8618 8DEC 222C 82EB F404 F9AC 0DDF