* Re: [Qemu-devel] Qemu: Guest Linux hangs on Mac OS X 10.11
2015-10-18 20:37 ` Peter Maydell
@ 2015-10-19 5:09 ` Aaron Elkins
2015-10-19 5:50 ` Aaron Elkins
2015-11-08 22:55 ` Peter Maydell
2 siblings, 0 replies; 15+ messages in thread
From: Aaron Elkins @ 2015-10-19 5:09 UTC (permalink / raw)
To: Peter Maydell; +Cc: Paolo Bonzini, qemu-devel@nongnu.org Developers
> On Oct 19, 2015, at 4:37 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On 18 October 2015 at 20:46, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 16 October 2015 at 08:32, Aaron Elkins <threcius@yahoo.com> wrote:
>>> I built Qemu on Mac OS X 10.11 El Capitan with:
>>>
>>> ./configure
>>> make
>>> make install
>>>
>>> Successfully.
>>>
>>> And downloaded the official linux image, and ran it with the following:
>>>
>>> qemu-system-x86_64 -drive file=linux-0.2.img,index=0,media=disk,format=raw
>>>
>>> And then I got the following warning. The guest Linux cannot respond to the keyboard and freezes for a period of time, and this repeats for as long as the guest Linux is running.
>>>
>>> main-loop: WARNING: I/O thread spun for 1000 iterations
>>>
>>> Any idea?
>>
>> I've been able to reproduce this on my Yosemite system (only
>> with debugging disabled, annoyingly.) Not sure why it's doing
>> it yet, though...
>>
>> (Note to Paolo: if I disable the 'suppress second notifications'
>> code it prints the warning forever.)
>
> Sometimes it does manage to unwedge itself. Paolo, do you have
> any suggestions for how to debug this kind of issue? Backtraces
> and some debug printfs seem to indicate that the TCG CPU thread is sat in
> qemu_tcg_wait_io_event() until the main loop hits its "spun too
> much" check, and then the unlock-lock causes us to get out of the
> cond-wait, but then we get stuck again.
>
> thanks
> -- PMM
I tested this issue in older versions of QEMU; in version 1.5 the warning message disappeared,
but the lag remained.
I have been looking for a solution to this issue for a few days, but have made no progress so far.
-Aaron
* Re: [Qemu-devel] Qemu: Guest Linux hangs on Mac OS X 10.11
2015-10-18 20:37 ` Peter Maydell
2015-10-19 5:09 ` Aaron Elkins
@ 2015-10-19 5:50 ` Aaron Elkins
2015-11-08 22:55 ` Peter Maydell
2 siblings, 0 replies; 15+ messages in thread
From: Aaron Elkins @ 2015-10-19 5:50 UTC (permalink / raw)
To: Peter Maydell; +Cc: Paolo Bonzini, qemu-devel@nongnu.org Developers
> On Oct 19, 2015, at 4:37 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On 18 October 2015 at 20:46, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 16 October 2015 at 08:32, Aaron Elkins <threcius@yahoo.com> wrote:
>>> I built Qemu on Mac OS X 10.11 El Capitan with:
>>>
>>> ./configure
>>> make
>>> make install
>>>
>>> Successfully.
>>>
>>> And downloaded the official linux image, and ran it with the following:
>>>
>>> qemu-system-x86_64 -drive file=linux-0.2.img,index=0,media=disk,format=raw
>>>
>>> And then I got the following warning. The guest Linux cannot respond to the keyboard and freezes for a period of time, and this repeats for as long as the guest Linux is running.
>>>
>>> main-loop: WARNING: I/O thread spun for 1000 iterations
>>>
>>> Any idea?
>>
>> I've been able to reproduce this on my Yosemite system (only
>> with debugging disabled, annoyingly.) Not sure why it's doing
>> it yet, though...
>>
>> (Note to Paolo: if I disable the 'suppress second notifications'
>> code it prints the warning forever.)
>
> Sometimes it does manage to unwedge itself. Paolo, do you have
> any suggestions for how to debug this kind of issue? Backtraces
> and some debug printfs seem to indicate that the TCG CPU thread is sat in
> qemu_tcg_wait_io_event() until the main loop hits its "spun too
> much" check, and then the unlock-lock causes us to get out of the
> cond-wait, but then we get stuck again.
>
> thanks
> -- PMM
Yeah, I rebuilt from source with --enable-debug, and the bug disappears.
It only shows up with debugging disabled.
Weird.
-Aaron
* Re: [Qemu-devel] Qemu: Guest Linux hangs on Mac OS X 10.11
2015-10-18 20:37 ` Peter Maydell
2015-10-19 5:09 ` Aaron Elkins
2015-10-19 5:50 ` Aaron Elkins
@ 2015-11-08 22:55 ` Peter Maydell
2015-11-09 9:10 ` Paolo Bonzini
2 siblings, 1 reply; 15+ messages in thread
From: Peter Maydell @ 2015-11-08 22:55 UTC (permalink / raw)
To: Aaron Elkins; +Cc: Paolo Bonzini, qemu-devel@nongnu.org Developers
On 18 October 2015 at 21:37, Peter Maydell <peter.maydell@linaro.org> wrote:
> Sometimes it does manage to unwedge itself. Paolo, do you have
> any suggestions for how to debug this kind of issue?
So the good news is that on mainline this doesn't happen any more.
The bad news is that something weird is going on such that git
bisect doesn't give helpful answers. Specifically if I start by
compiling older versions and work forwards, then
0fd7e09 kvmclock: add a new function to update env->tsc.
shows the bug, and
6388acc Revert "Introduce cpu_clean_all_dirty"
does not. (And I've got to that commit both via a git-bisect
and by a second round of manually trying to identify the commit,
so it's consistent about where it changes behaviour.)
However that makes no sense because that revert commit
is just removing unused code. And then if I go backwards again
to 0fd7e09 the bug doesn't repro there.
thanks
-- PMM
* Re: [Qemu-devel] Qemu: Guest Linux hangs on Mac OS X 10.11
2015-11-08 22:55 ` Peter Maydell
@ 2015-11-09 9:10 ` Paolo Bonzini
2015-11-09 10:02 ` Peter Maydell
0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2015-11-09 9:10 UTC (permalink / raw)
To: Peter Maydell, Aaron Elkins; +Cc: qemu-devel@nongnu.org Developers
On 08/11/2015 23:55, Peter Maydell wrote:
> So the good news is that on mainline this doesn't happen any more.
> The bad news is that something weird is going on such that git
> bisect doesn't give helpful answers. Specifically if I start by
> compiling older versions and work forwards, then
> 0fd7e09 kvmclock: add a new function to update env->tsc.
> shows the bug, and
> 6388acc Revert "Introduce cpu_clean_all_dirty"
> does not. (And I've got to that commit both via a git-bisect
> and by a second round of manually trying to identify the commit,
> so it's consistent about where it changes behaviour.)
> However that makes no sense because that revert commit
> is just removing unused code. And then if I go backwards again
> to 0fd7e09 the bug doesn't repro there.
Even 0fd7e09 does not change behavior unless you use KVM (which you
obviously don't do under Mac OS X). So if you go backwards to 0fd7e09^
it shouldn't reproduce there either.
What is the known bad SHA1?
Paolo
* Re: [Qemu-devel] Qemu: Guest Linux hangs on Mac OS X 10.11
2015-11-09 9:10 ` Paolo Bonzini
@ 2015-11-09 10:02 ` Peter Maydell
2015-11-09 10:21 ` Paolo Bonzini
0 siblings, 1 reply; 15+ messages in thread
From: Peter Maydell @ 2015-11-09 10:02 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Aaron Elkins, qemu-devel@nongnu.org Developers
On 9 November 2015 at 09:10, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 08/11/2015 23:55, Peter Maydell wrote:
>> So the good news is that on mainline this doesn't happen any more.
>> The bad news is that something weird is going on such that git
>> bisect doesn't give helpful answers. Specifically if I start by
>> compiling older versions and work forwards, then
>> 0fd7e09 kvmclock: add a new function to update env->tsc.
>> shows the bug, and
>> 6388acc Revert "Introduce cpu_clean_all_dirty"
>> does not. (And I've got to that commit both via a git-bisect
>> and by a second round of manually trying to identify the commit,
>> so it's consistent about where it changes behaviour.)
>> However that makes no sense because that revert commit
>> is just removing unused code. And then if I go backwards again
>> to 0fd7e09 the bug doesn't repro there.
>
> Even 0fd7e09 does not change behavior unless you use KVM (which you
> obviously don't do under Mac OS X). So if you go backwards to 0fd7e09^
> it shouldn't reproduce there either.
>
> What is the known bad SHA1?
2b5a79f is definitely bad even rebuilt from clean. I'm going
to do a re-bisect building each step from clean this morning.
thanks
-- PMM
* Re: [Qemu-devel] Qemu: Guest Linux hangs on Mac OS X 10.11
2015-11-09 10:02 ` Peter Maydell
@ 2015-11-09 10:21 ` Paolo Bonzini
2015-11-09 11:06 ` Peter Maydell
0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2015-11-09 10:21 UTC (permalink / raw)
To: Peter Maydell; +Cc: Aaron Elkins, qemu-devel@nongnu.org Developers
On 09/11/2015 11:02, Peter Maydell wrote:
> On 9 November 2015 at 09:10, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>>
>> On 08/11/2015 23:55, Peter Maydell wrote:
>>> So the good news is that on mainline this doesn't happen any more.
>>> The bad news is that something weird is going on such that git
>>> bisect doesn't give helpful answers. Specifically if I start by
>>> compiling older versions and work forwards, then
>>> 0fd7e09 kvmclock: add a new function to update env->tsc.
>>> shows the bug, and
>>> 6388acc Revert "Introduce cpu_clean_all_dirty"
>>> does not. (And I've got to that commit both via a git-bisect
>>> and by a second round of manually trying to identify the commit,
>>> so it's consistent about where it changes behaviour.)
>>> However that makes no sense because that revert commit
>>> is just removing unused code. And then if I go backwards again
>>> to 0fd7e09 the bug doesn't repro there.
>>
>> Even 0fd7e09 does not change behavior unless you use KVM (which you
>> obviously don't do under Mac OS X). So if you go backwards to 0fd7e09^
>> it shouldn't reproduce there either.
>>
>> What is the known bad SHA1?
>
> 2b5a79f is definitely bad even rebuilt from clean. I'm going
Hmm, so the list is pretty short:
-------------
Eduardo Habkost (3):
      pc: Set hw_version on all machine classes
      osdep: Rename qemu_{get, set}_version() to qemu_{, set_}hw_version()
      megasas: Use qemu_hw_version() instead of QEMU_VERSION

Fam Zheng (1):
      scripts/text2pod.pl: Escape left brace

Igor Mammedov (1):
      file_ram_alloc: propagate error to caller instead of terminating QEMU

John Snow (2):
      configure: disallow ccache during compile tests
      configure: disable FORTIFY_SOURCE under clang

Paolo Bonzini (4):
      target-i386: fix pcmpxstrx equal-ordered (strstr) mode
      ioport: do not use CPU_LOG_IOPORT
      qemu-log: remove -d ioport
      memory: call begin, log_start and commit when registering a new listener

Pavel Fedin (1):
      backends/hostmem-file: Allow to specify full pathname for backing file

Stefan Weil (1):
      cpu-exec: Fix compiler warning (-Werror=clobbered)
-------------

The only patches that could possibly fix the bug are:

      target-i386: fix pcmpxstrx equal-ordered (strstr) mode
      memory: call begin, log_start and commit when registering a new listener
      cpu-exec: Fix compiler warning (-Werror=clobbered)

It's probably the second. The first has been there forever, while
the last doesn't have any effect under clang.
Paolo
* Re: [Qemu-devel] Qemu: Guest Linux hangs on Mac OS X 10.11
2015-11-09 10:21 ` Paolo Bonzini
@ 2015-11-09 11:06 ` Peter Maydell
2015-11-09 11:46 ` Peter Maydell
0 siblings, 1 reply; 15+ messages in thread
From: Peter Maydell @ 2015-11-09 11:06 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Aaron Elkins, qemu-devel@nongnu.org Developers
On 9 November 2015 at 10:21, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Hmm, so the list is pretty short:
The good news is that my second bisect conclusively fingered
the culprit for why the bug went away...
> configure: disable FORTIFY_SOURCE under clang
...the bad news is it's because this patch inadvertently makes
all non-fortify-source compiles be no-optimization (and we already
knew the bug didn't repro in an unoptimized build).
This also explains the weird behaviour in bisect: it was only
when some later commit touched enough of the header files to
force recompilation of whichever object file it is that's
being problematic that the bug disappeared.
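
When a bug depends on the optimization level like this, it can help to have each object report how it was actually built. A minimal sketch, assuming a GCC or clang toolchain (both predefine __OPTIMIZE__ at -O1 and above and leave it undefined at -O0); the function name is hypothetical and not part of QEMU:

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not a QEMU function.
 * Returns true if this translation unit was compiled with optimization.
 * Relies on __OPTIMIZE__, which GCC and clang predefine at -O1 and above.
 */
static bool built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}
```

Printing this value from a suspect file at startup would show directly whether a configure change had silently dropped the -O flag for that object.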
thanks
-- PMM
* Re: [Qemu-devel] Qemu: Guest Linux hangs on Mac OS X 10.11
2015-11-09 11:06 ` Peter Maydell
@ 2015-11-09 11:46 ` Peter Maydell
2015-11-09 13:30 ` Peter Maydell
0 siblings, 1 reply; 15+ messages in thread
From: Peter Maydell @ 2015-11-09 11:46 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Aaron Elkins, qemu-devel@nongnu.org Developers
On 9 November 2015 at 11:06, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 9 November 2015 at 10:21, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Hmm, so the list is pretty short:
>
> The good news is that my second bisect conclusively fingered
> the culprit for why the bug went away...
>
>> configure: disable FORTIFY_SOURCE under clang
>
> ...the bad news is it's because this patch inadvertently makes
> all non-fortify-source compiles be no-optimization (and we already
> knew the bug didn't repro in an unoptimized build).
Yep, just confirmed that current master with commit b553a0428
reverted displays the bug again.
thanks
-- PMM
* Re: [Qemu-devel] Qemu: Guest Linux hangs on Mac OS X 10.11
2015-11-09 11:46 ` Peter Maydell
@ 2015-11-09 13:30 ` Peter Maydell
2015-11-09 13:40 ` Paolo Bonzini
0 siblings, 1 reply; 15+ messages in thread
From: Peter Maydell @ 2015-11-09 13:30 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Aaron Elkins, qemu-devel@nongnu.org Developers,
Michael S. Tsirkin
On 9 November 2015 at 11:46, Peter Maydell <peter.maydell@linaro.org> wrote:
> Yep, just confirmed that current master with commit b553a0428
> reverted displays the bug again.
After a bunch of "try building specific object files with optimization
off to see where the problem goes away" tests, I've narrowed the
problem down further: if you tell clang to disable optimization by
adding __attribute__ ((optnone)) to the two functions
hpet_time_after() and hpet_time_after64() in hw/timer/hpet.c then
the problem goes away.
My current theory is that we're doing something here that's not
valid C and the compiler ends up optimizing it into something
that results in the timer setting the next-timeout to a very
short interval and that's what's hogging the main-loop time.
I'll look in more detail after lunch.
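
For anyone who wants to repeat the per-function experiment, here is a rough sketch of how the attribute can be applied. It assumes clang (optnone is a clang attribute; the guard makes the macro expand to nothing elsewhere). The macro name NO_OPTIMIZE and the function body are hypothetical placeholders, not the real hpet.c code:

```c
#include <stdint.h>

/* __has_attribute is a compiler extension; treat it as "unsupported"
 * on compilers that do not provide it. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Hypothetical macro: disable optimization for one function under clang,
 * and expand to nothing on compilers without optnone. */
#if __has_attribute(optnone)
#define NO_OPTIMIZE __attribute__((optnone))
#else
#define NO_OPTIMIZE
#endif

/* Placeholder for a function suspected of being miscompiled. */
static NO_OPTIMIZE uint32_t suspect_function(uint64_t a, uint64_t b)
{
    return a > b;
}
```

Moving the attribute from function to function narrows the search much faster than rebuilding whole object files at -O0.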
thanks
-- PMM
* Re: [Qemu-devel] Qemu: Guest Linux hangs on Mac OS X 10.11
2015-11-09 13:30 ` Peter Maydell
@ 2015-11-09 13:40 ` Paolo Bonzini
2015-11-09 14:26 ` Peter Maydell
0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2015-11-09 13:40 UTC (permalink / raw)
To: Peter Maydell
Cc: Aaron Elkins, qemu-devel@nongnu.org Developers,
Michael S. Tsirkin
On 09/11/2015 14:30, Peter Maydell wrote:
> After a bunch of "try building specific object files with optimization
> off to see where the problem goes away" tests, I've narrowed the
> problem down further: if you tell clang to disable optimization by
> adding __attribute__ ((optnone)) to the two functions
> hpet_time_after() and hpet_time_after64() in hw/timer/hpet.c then
> the problem goes away.
>
> My current theory is that we're doing something here that's not
> valid C and the compiler ends up optimizing it into something
> that results in the timer setting the next-timeout to a very
> short interval and that's what's hogging the main-loop time.
> I'll look in more detail after lunch.
The obvious way to write those functions would be:

    static uint32_t hpet_time_after(uint64_t a, uint64_t b)
    {
        return ((int32_t)(b - a)) < 0;
    }

    static uint32_t hpet_time_after64(uint64_t a, uint64_t b)
    {
        return ((int64_t)(b - a)) < 0;
    }

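
The point of this form is that the subtraction is done on unsigned values, where wraparound is well defined, and only the result is converted to a signed type. One plausible reading of "not valid C" (an assumption; the existing function bodies are not quoted in this thread) is a subtraction performed after casting each operand to a signed type, where overflow is undefined and an optimizer may exploit it. A small sketch of the wraparound-safe comparison, using hypothetical names so it is not mistaken for the hpet.c code:

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for the two functions above.
 * Return nonzero if counter value a is later than b, allowing for
 * wraparound. (b - a) is computed in uint64_t, which wraps modulo 2^64;
 * the conversion to a signed type then takes the low bits as a signed
 * distance on GCC and clang (implementation-defined, not undefined).
 */
static uint32_t ticks_after32(uint64_t a, uint64_t b)
{
    return ((int32_t)(b - a)) < 0;
}

static uint32_t ticks_after64(uint64_t a, uint64_t b)
{
    return ((int64_t)(b - a)) < 0;
}
```

For example, ticks_after32(0x10, 0xFFFFFFF0) is nonzero: a 32-bit counter that has wrapped round to 0x10 counts as later than 0xFFFFFFF0, which is the behaviour a timer comparator needs.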
Paolo
* Re: [Qemu-devel] Qemu: Guest Linux hangs on Mac OS X 10.11
2015-11-09 13:40 ` Paolo Bonzini
@ 2015-11-09 14:26 ` Peter Maydell
0 siblings, 0 replies; 15+ messages in thread
From: Peter Maydell @ 2015-11-09 14:26 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Aaron Elkins, qemu-devel@nongnu.org Developers,
Michael S. Tsirkin
On 9 November 2015 at 13:40, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 09/11/2015 14:30, Peter Maydell wrote:
>> After a bunch of "try building specific object files with optimization
>> off to see where the problem goes away" tests, I've narrowed the
>> problem down further: if you tell clang to disable optimization by
>> adding __attribute__ ((optnone)) to the two functions
>> hpet_time_after() and hpet_time_after64() in hw/timer/hpet.c then
>> the problem goes away.
>>
>> My current theory is that we're doing something here that's not
>> valid C and the compiler ends up optimizing it into something
>> that results in the timer setting the next-timeout to a very
>> short interval and that's what's hogging the main-loop time.
>> I'll look in more detail after lunch.
>
> The obvious way to write those functions would be
>
> static uint32_t hpet_time_after(uint64_t a, uint64_t b)
> {
>     return ((int32_t)(b - a)) < 0;
> }
>
> static uint32_t hpet_time_after64(uint64_t a, uint64_t b)
> {
>     return ((int64_t)(b - a)) < 0;
> }
Yep. I actually tried that just before lunch and it does indeed
cause the bug to go away.
-- PMM