qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] Remaining CI failures
@ 2019-01-11 19:10 Alex Bennée
  2019-01-14 10:20 ` Peter Maydell
  2019-01-14 15:02 ` Emilio G. Cota
  0 siblings, 2 replies; 5+ messages in thread
From: Alex Bennée @ 2019-01-11 19:10 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: Paolo Bonzini, Philippe Mathieu-Daudé, cota@braap.org,
	Peter Maydell, Daniel P. Berrangé


Hi,

So trying to narrow down the remaining failures in the CI system. There
is one with a patch in flight (use g_usleep instead of sleep) but there
remains two failure modes, both erratic.

tests/qht-par:

I can trigger this on my dev machine with a gprof enabled build:

  # QEMU configure log Fri Jan 11 14:10:45 GMT 2019
  # Configured with: './configure' '--disable-tools' '--disable-docs' '--enable-gprof' '--enable-gcov'

I only seem to be able to trigger it when running via the wrapper in the
make system:

  retry.py -n 30 --invert make check-tests/test-qht-par

Eventually this crashes with:

  ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0)

Leaving a core dump for the child:

  Core was generated by `tests/qht-bench -R -S0.1 -D10000 -N1 -n 2 -u 20 -d 1'
  (gdb) info thread
    Id   Target Id         Frame
  * 1    Thread 0x7ffff6a7e700 (LWP 15473) 0x000055555557c306 in call_rcu_thread (opaque=0x0) at util/rcu.c:255
    2    Thread 0x7ffff7fbe780 (LWP 15472) 0x00005555555b8d50 in gcov_read_words ()
  (gdb) bt
  #0  __mcount_internal (frompc=<optimised out>, selfpc=93824992383630) at mcount.c:98
  #1  0x00007ffff6e15e24 in mcount () at ../sysdeps/x86_64/_mcount.S:51
  #2  0x000055555557928e in qemu_event_reset (ev=0x3cc692b8d452f400) at util/qemu-thread-posix.c:408
  #3  0x000055555557c306 in call_rcu_thread (opaque=0x0) at util/rcu.c:255
  #4  0x0000555555579630 in qemu_thread_start (args=0x555555808080) at util/qemu-thread-posix.c:502
  #5  0x00007ffff70e96db in start_thread (arg=0x7ffff6a7e700) at pthread_create.c:463
  #6  0x00007ffff6e1288f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  (gdb) thread 2
  [Switching to thread 2 (Thread 0x7ffff7fbe780 (LWP 15472))]
  #0  0x00005555555b8d50 in gcov_read_words ()
  (gdb) bt
  #0  0x00005555555b8d50 in gcov_read_words ()
  #1  0x00005555555b9453 in __gcov_read_summary ()
  #2  0x00005555555ba461 in gcov_do_dump ()
  #3  0x00005555555bab62 in __gcov_exit ()
  #4  0x00005555555b8c22 in _GLOBAL__sub_D_00100_1_json_lexer_init () at qobject/json-lexer.c:365
  #5  0x00007ffff7de5b73 in _dl_fini () at dl-fini.c:138
  #6  0x00007ffff6d34041 in __run_exit_handlers (status=0, listp=0x7ffff70dc718 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at
  exit.c:108
  #7  0x00007ffff6d3413a in __GI_exit (status=<optimised out>) at exit.c:139
  #8  0x00007ffff6d12b9e in __libc_start_main (main=0x555555575d50 <main>, argc=11, argv=0x7fffffffdf78, init=<optimised out>, fini=<optimised out>, rtld_fini=<optimised out>, stack_end=0x7fffffffdf68) at ../csu/libc-start.c:344
  #9  0x0000555555573c1a in _start ()

To trigger the second failure I had to run on a limited Xenial machine
(16.04, 2 cores, 8Gb RAM) again with gprof build:

  # QEMU configure log Thu 10 Jan 22:22:52 GMT 2019
  # Configured with: './configure' '--enable-gprof' '--enable-gcov' '--disable-pie' '--target-list=aarch64-softmmu,arm-softmmu,i386-softmmu,mips-softmmu,mips64-softmmu,ppc64-softmmu,riscv64-softmmu,s390x-softmmu,x86_64-softmmu'

Running it like Travis does:

  retry.py -n 40 --invert -- make -j 3 check V=1

It eventually fails with:

  PASS: tests/test-hmp
  make: write error: stdout

It's hard to tell from the output what was running that failed. So far
I've managed to get the following information out of execsnoop:

  qemu-system-x86  1345   1332     0 x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1332.qmp,no
  wait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display none -S -M pc-i440fx-4.0
  sh               1350   1332     0 /bin/sh -c exec x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/q
  t
  qemu-system-x86  1350   1332     0 x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1332.qmp,no
  wait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display none -S -M pc-q35-3.1
  sh               1356   1332     0 /bin/sh -c exec x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/q
  t
  qemu-system-x86  1356   1332     0 x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1332.qmp,no
  wait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display none -S -M pc-i440fx-3.1
  sh               1361   1332     0 /bin/sh -c exec x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/q
  t
  qemu-system-x86  1361   1332     0 x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1332.qmp,no
  wait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display none -S -M pc-q35-4.0
  sh               1366   1332     0 /bin/sh -c exec x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/q
  t
  qemu-system-x86  1366   1332     0 x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1332.qmp,no
  wait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display none -S -M none -m 2
  tset             1370   2129     0 /usr/bin/tset -c
  git              1374   1373     0 /usr/bin/git rev-parse --git-dir
  git              1376   1375     0 /usr/bin/git symbolic-ref HEAD
  git              1378   1377     0 /usr/bin/git config branch.testing/next.remote

So even though tests/test-hmp has nominally passed I think it has to be
one of the x86_64 tests unless there is something that was triggered a
lot longer ago finally crashing out somehow.

I did run the make under strace to see who is using O_NONBLOCK but even
after filtering out a bunch of stuff it seems to be quite embedded:

 ag "O_NONBLOCK" check.strace  | ag -v "/sys" | ag -v "/dev" | grep -v ".git" | wc -l
 2449

Anyway I thought it would be worth dumping my current debug state to the
list in case anyone has any bright ideas about whats going on or fancies
some weekend debugging.

The retry.py is just a hacky script I use, I'm sure everyone else has
something similar. See https://github.com/stsquad/retry

--
Alex Bennée

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] Remaining CI failures
  2019-01-11 19:10 [Qemu-devel] Remaining CI failures Alex Bennée
@ 2019-01-14 10:20 ` Peter Maydell
  2019-01-14 15:02 ` Emilio G. Cota
  1 sibling, 0 replies; 5+ messages in thread
From: Peter Maydell @ 2019-01-14 10:20 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel@nongnu.org, Paolo Bonzini, Philippe Mathieu-Daudé,
	cota@braap.org, Daniel P. Berrangé

On Fri, 11 Jan 2019 at 19:10, Alex Bennée <alex.bennee@linaro.org> wrote:
> I did run the make under strace to see who is using O_NONBLOCK but even
> after filtering out a bunch of stuff it seems to be quite embedded:
>
>  ag "O_NONBLOCK" check.strace  | ag -v "/sys" | ag -v "/dev" | grep -v ".git" | wc -l
>  2449

The strace rune I used to track down the other O_NONBLOCK issue was
strace -o /tmp/strace.log -e fcntl,execve,clone -e signal=none -f make ...

and then looking through the strace.log for uses of O_NONBLOCK
on fds 0, 1 or 2, and backtracking those to which binary had been
exec'd to give that PID. Then I did a full strace of just
that test, and looked to check whether the 0/1/2 was really
the top-level make's stdin/stdout/stderr or whether something else
had dup'd or opened an fd there.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] Remaining CI failures
  2019-01-11 19:10 [Qemu-devel] Remaining CI failures Alex Bennée
  2019-01-14 10:20 ` Peter Maydell
@ 2019-01-14 15:02 ` Emilio G. Cota
  2019-01-14 15:12   ` Daniel P. Berrangé
  2019-01-14 15:29   ` Alex Bennée
  1 sibling, 2 replies; 5+ messages in thread
From: Emilio G. Cota @ 2019-01-14 15:02 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel@nongnu.org, Paolo Bonzini, Philippe Mathieu-Daudé,
	Peter Maydell, Daniel P. Berrangé

On Fri, Jan 11, 2019 at 19:10:07 +0000, Alex Bennée wrote:
> So trying to narrow down the remaining failures in the CI system. There
> is one with a patch in flight (use g_usleep instead of sleep) but there
> remains two failure modes, both erratic.
> 
> tests/qht-par:
> 
> I can trigger this on my dev machine with a gprof enabled build:
> 
>   # QEMU configure log Fri Jan 11 14:10:45 GMT 2019
>   # Configured with: './configure' '--disable-tools' '--disable-docs' '--enable-gprof' '--enable-gcov'
> 
> I only seem to be able to trigger it when running via the wrapper in the
> make system:
> 
>   retry.py -n 30 --invert make check-tests/test-qht-par
> 
> Eventually this crashes with:
> 
>   ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0)
> 
> Leaving a core dump for the child:

I can't replicate this on my machine :-(

I suspect this is probably related to the fact that gprof
is not even meant to support multi-threaded programs (like qht-bench).

Given that we've fixed the sleep issue (thanks!) and that there is no use
in running test-qht-par under gprof, I propose to permanently skip
it under gprof, e.g.:

--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -88,7 +88,8 @@ check-unit-y += tests/test-rcu-simpleq$(EXESUF)
 check-unit-y += tests/test-rcu-tailq$(EXESUF)
 check-unit-y += tests/test-qdist$(EXESUF)
 check-unit-y += tests/test-qht$(EXESUF)
-# FIXME: {test-qht-par + gprof} often break on Travis CI
+# test-qht-par invokes qht-bench, which is multi-threaded.
+# gprof doesn't support multi-threaded programs, so skip this test under gprof.
 check-unit-$(call lnot,$(CONFIG_GPROF)) += tests/test-qht-par$(EXESUF)
 check-unit-y += tests/test-bitops$(EXESUF)
 check-unit-y += tests/test-bitcnt$(EXESUF)

If you agree, I can submit a proper patch with the above.

Thanks,

		Emilio

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] Remaining CI failures
  2019-01-14 15:02 ` Emilio G. Cota
@ 2019-01-14 15:12   ` Daniel P. Berrangé
  2019-01-14 15:29   ` Alex Bennée
  1 sibling, 0 replies; 5+ messages in thread
From: Daniel P. Berrangé @ 2019-01-14 15:12 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: Alex Bennée, qemu-devel@nongnu.org, Paolo Bonzini,
	Philippe Mathieu-Daudé, Peter Maydell

On Mon, Jan 14, 2019 at 10:02:06AM -0500, Emilio G. Cota wrote:
> On Fri, Jan 11, 2019 at 19:10:07 +0000, Alex Bennée wrote:
> > So trying to narrow down the remaining failures in the CI system. There
> > is one with a patch in flight (use g_usleep instead of sleep) but there
> > remains two failure modes, both erratic.
> > 
> > tests/qht-par:
> > 
> > I can trigger this on my dev machine with a gprof enabled build:
> > 
> >   # QEMU configure log Fri Jan 11 14:10:45 GMT 2019
> >   # Configured with: './configure' '--disable-tools' '--disable-docs' '--enable-gprof' '--enable-gcov'
> > 
> > I only seem to be able to trigger it when running via the wrapper in the
> > make system:
> > 
> >   retry.py -n 30 --invert make check-tests/test-qht-par
> > 
> > Eventually this crashes with:
> > 
> >   ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0)
> > 
> > Leaving a core dump for the child:
> 
> I can't replicate this on my machine :-(
> 
> I suspect this is probably related to the fact that gprof
> is not even meant to support multi-threaded programs (like qht-bench).
> 
> Given that we've fixed the sleep issue (thanks!) and that there is no use
> in running test-qht-par under gprof, I propose to permanently skip
> it under gprof, e.g.:
> 
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -88,7 +88,8 @@ check-unit-y += tests/test-rcu-simpleq$(EXESUF)
>  check-unit-y += tests/test-rcu-tailq$(EXESUF)
>  check-unit-y += tests/test-qdist$(EXESUF)
>  check-unit-y += tests/test-qht$(EXESUF)
> -# FIXME: {test-qht-par + gprof} often break on Travis CI
> +# test-qht-par invokes qht-bench, which is multi-threaded.
> +# gprof doesn't support multi-threaded programs, so skip this test under gprof.

If gprof doesn't support threads, then we potentially need to skip more
of our tests. eg test-io-channel-socket uses APIs that spawn threads
behind the scenes, not to mention the test-thread-pool test. Some
chardev tests probably hit APIs that spawn background threads too.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] Remaining CI failures
  2019-01-14 15:02 ` Emilio G. Cota
  2019-01-14 15:12   ` Daniel P. Berrangé
@ 2019-01-14 15:29   ` Alex Bennée
  1 sibling, 0 replies; 5+ messages in thread
From: Alex Bennée @ 2019-01-14 15:29 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: qemu-devel@nongnu.org, Paolo Bonzini, Philippe Mathieu-Daudé,
	Peter Maydell, Daniel P. Berrangé


Emilio G. Cota <cota@braap.org> writes:

> On Fri, Jan 11, 2019 at 19:10:07 +0000, Alex Bennée wrote:
>> So trying to narrow down the remaining failures in the CI system. There
>> is one with a patch in flight (use g_usleep instead of sleep) but there
>> remains two failure modes, both erratic.
>>
>> tests/qht-par:
>>
>> I can trigger this on my dev machine with a gprof enabled build:
>>
>>   # QEMU configure log Fri Jan 11 14:10:45 GMT 2019
>>   # Configured with: './configure' '--disable-tools' '--disable-docs' '--enable-gprof' '--enable-gcov'
>>
>> I only seem to be able to trigger it when running via the wrapper in the
>> make system:
>>
>>   retry.py -n 30 --invert make check-tests/test-qht-par
>>
>> Eventually this crashes with:
>>
>>   ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0)
>>
>> Leaving a core dump for the child:
>
> I can't replicate this on my machine :-(
>
> I suspect this is probably related to the fact that gprof
> is not even meant to support multi-threaded programs (like qht-bench).
>
> Given that we've fixed the sleep issue (thanks!) and that there is no use
> in running test-qht-par under gprof, I propose to permanently skip
> it under gprof, e.g.:
>
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -88,7 +88,8 @@ check-unit-y += tests/test-rcu-simpleq$(EXESUF)
>  check-unit-y += tests/test-rcu-tailq$(EXESUF)
>  check-unit-y += tests/test-qdist$(EXESUF)
>  check-unit-y += tests/test-qht$(EXESUF)
> -# FIXME: {test-qht-par + gprof} often break on Travis CI
> +# test-qht-par invokes qht-bench, which is multi-threaded.
> +# gprof doesn't support multi-threaded programs, so skip this test under gprof.
>  check-unit-$(call lnot,$(CONFIG_GPROF)) += tests/test-qht-par$(EXESUF)
>  check-unit-y += tests/test-bitops$(EXESUF)
>  check-unit-y += tests/test-bitcnt$(EXESUF)
>
> If you agree, I can submit a proper patch with the above.

Heh, that fix is already in although I reverted the work-around in my PR
as it seemed a little too hacky given it seems to be fixed now.

--
Alex Bennée

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-01-14 15:29 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-01-11 19:10 [Qemu-devel] Remaining CI failures Alex Bennée
2019-01-14 10:20 ` Peter Maydell
2019-01-14 15:02 ` Emilio G. Cota
2019-01-14 15:12   ` Daniel P. Berrangé
2019-01-14 15:29   ` Alex Bennée

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).