* [Qemu-devel] qemu-system-ppc hangs
@ 2017-11-21 0:00 Richard Purdie
2017-11-21 7:50 ` Mark Cave-Ayland
2017-11-21 9:55 ` Alex Bennée
From: Richard Purdie @ 2017-11-21 0:00 UTC
To: qemu-ppc, david; +Cc: qemu-devel
Hi,
I work on the Yocto Project and we use qemu to test boot our Linux
images and run tests against them. We've been noticing some instability
on ppc where the images sometimes hang, usually around udevd bring-up
time, so just after booting into userspace.
To cut a long story short, I've tracked down what I think is the
problem. I believe the CPU stops receiving decrementer timer
interrupts, so tasks in our images hang indefinitely because the timer
has effectively stopped.
It can be summed up with this line of debug:
ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 00000100 req 00000004
It should normally read:
ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 00000100 req 00000002
The question is why CPU_INTERRUPT_EXITTB ends up being set when the
lines above this log message clearly set CPU_INTERRUPT_HARD (via
cpu_interrupt()).
I note in cpu.h:
    /* updates protected by BQL */
    uint32_t interrupt_request;
(for struct CPUState)
The ppc code does "cs->interrupt_request |= CPU_INTERRUPT_EXITTB" in 5
places, 3 in excp_helper.c and 2 in helper_regs.h. In all of these
places, an inserted g_assert(qemu_mutex_iothread_locked()); fails. If I
do something like:
    if (!qemu_mutex_iothread_locked()) {
        qemu_mutex_lock_iothread();
        cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
        qemu_mutex_unlock_iothread();
    } else {
        cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
    }
at these call sites, then I can no longer lock qemu up with my test
case.
I suspect the _HARD setting gets overwritten, which stops the
decrementer interrupts from being delivered.
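To illustrate the race I suspect (a simplified sketch, not QEMU's
actual code path):

    /* CPU_INTERRUPT_HARD is 0x0002 and CPU_INTERRUPT_EXITTB is 0x0004,
     * matching the req values in the log lines above. */

    /* timer/iothread context: */
    cs->interrupt_request |= CPU_INTERRUPT_HARD;

    /* vCPU helper context, racing with the above: */
    cs->interrupt_request |= CPU_INTERRUPT_EXITTB;

    /* Each |= is a non-atomic read-modify-write. If both contexts read
     * the old value before either writes back, the helper's write of
     * 0x0004 clobbers the timer's 0x0002 and the decrementer request is
     * lost - hence "req 00000004" instead of "req 00000002". */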
I don't know if taking this lock in these situations is going to be bad
for performance, or whether such a patch would be the right approach.
At this point I therefore wanted to seek advice on what the real issue
is here and how to fix it!
Cheers,
Richard
* Re: [Qemu-devel] qemu-system-ppc hangs
2017-11-21 0:00 [Qemu-devel] qemu-system-ppc hangs Richard Purdie
@ 2017-11-21 7:50 ` Mark Cave-Ayland
2017-11-21 9:02 ` Richard Purdie
2017-11-21 9:55 ` Alex Bennée
From: Mark Cave-Ayland @ 2017-11-21 7:50 UTC
To: Richard Purdie, qemu-ppc, david; +Cc: qemu-devel
On 21/11/17 00:00, Richard Purdie wrote:
> Hi,
>
> I work on the Yocto Project and we use qemu to test boot our Linux
> images and run tests against them. We've been noticing some instability
> on ppc where the images sometimes hang, usually around udevd bring-up
> time, so just after booting into userspace.
>
> To cut a long story short, I've tracked down what I think is the
> problem. I believe the CPU stops receiving decrementer timer
> interrupts, so tasks in our images hang indefinitely because the timer
> has effectively stopped.
>
> It can be summed up with this line of debug:
>
> ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 00000100 req 00000004
>
> It should normally read:
>
> ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 00000100 req 00000002
>
> The question is why CPU_INTERRUPT_EXITTB ends up being set when the
> lines above this log message clearly set CPU_INTERRUPT_HARD (via
> cpu_interrupt()).
>
> I note in cpu.h:
>
>     /* updates protected by BQL */
>     uint32_t interrupt_request;
>
> (for struct CPUState)
>
> The ppc code does "cs->interrupt_request |= CPU_INTERRUPT_EXITTB" in 5
> places, 3 in excp_helper.c and 2 in helper_regs.h. In all of these
> places, an inserted g_assert(qemu_mutex_iothread_locked()); fails. If I
> do something like:
>
>     if (!qemu_mutex_iothread_locked()) {
>         qemu_mutex_lock_iothread();
>         cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
>         qemu_mutex_unlock_iothread();
>     } else {
>         cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
>     }
>
> at these call sites, then I can no longer lock qemu up with my test
> case.
>
> I suspect the _HARD setting gets overwritten, which stops the
> decrementer interrupts from being delivered.
>
> I don't know if taking this lock in these situations is going to be bad
> for performance, or whether such a patch would be the right approach.
>
> At this point I therefore wanted to seek advice on what the real issue
> is here and how to fix it!
Thanks for the report - given that a lot of work has been done on MTTCG
and iothread over the past few releases, it wouldn't be a complete
surprise if something had crept in here.
First, let's start with some basics: what is your host
architecture, QEMU version and full command line being used to launch QEMU?
Would it also be possible for you to make your test image available for
other people to see if they can recreate the same issue?
ATB,
Mark.
* Re: [Qemu-devel] qemu-system-ppc hangs
2017-11-21 7:50 ` Mark Cave-Ayland
@ 2017-11-21 9:02 ` Richard Purdie
From: Richard Purdie @ 2017-11-21 9:02 UTC
To: Mark Cave-Ayland, qemu-ppc, david; +Cc: qemu-devel
On Tue, 2017-11-21 at 07:50 +0000, Mark Cave-Ayland wrote:
> On 21/11/17 00:00, Richard Purdie wrote:
> > I work on the Yocto Project and we use qemu to test boot our Linux
> > images and run tests against them. We've been noticing some
> > instability on ppc where the images sometimes hang, usually around
> > udevd bring-up time, so just after booting into userspace.
> >
> > To cut a long story short, I've tracked down what I think is the
> > problem. I believe the CPU stops receiving decrementer timer
> > interrupts, so tasks in our images hang indefinitely because the
> > timer has effectively stopped.
> >
> > It can be summed up with this line of debug:
> >
> > ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 00000100 req
> > 00000004
> >
> > It should normally read:
> >
> > ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 00000100 req
> > 00000002
> >
> > The question is why CPU_INTERRUPT_EXITTB ends up being set when the
> > lines above this log message clearly set CPU_INTERRUPT_HARD (via
> > cpu_interrupt()).
> >
> > I note in cpu.h:
> >
> >     /* updates protected by BQL */
> >     uint32_t interrupt_request;
> >
> > (for struct CPUState)
> >
> > The ppc code does "cs->interrupt_request |= CPU_INTERRUPT_EXITTB"
> > in 5 places, 3 in excp_helper.c and 2 in helper_regs.h. In all of
> > these places, an inserted g_assert(qemu_mutex_iothread_locked());
> > fails. If I do something like:
> >
> >     if (!qemu_mutex_iothread_locked()) {
> >         qemu_mutex_lock_iothread();
> >         cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
> >         qemu_mutex_unlock_iothread();
> >     } else {
> >         cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
> >     }
> >
> > at these call sites, then I can no longer lock qemu up with my test
> > case.
> >
> > I suspect the _HARD setting gets overwritten, which stops the
> > decrementer interrupts from being delivered.
> >
> > I don't know if taking this lock in these situations is going to be
> > bad for performance, or whether such a patch would be the right
> > approach.
> >
> > At this point I therefore wanted to seek advice on what the real
> > issue is here and how to fix it!
>
> Thanks for the report - given that a lot of work has been done on
> MTTCG and iothread over the past few releases, it wouldn't be a
> complete surprise if something had crept in here.
>
> First, let's start with some basics: what is your host
> architecture, QEMU version and full command line being used to launch
> QEMU?
I'm running this on x86_64, using qemu 2.10.1, and the command line
being used for qemu is:
qemu-system-ppc -device virtio-net-pci,netdev=net0,mac=52:54:00:12:34:02
-netdev tap,id=net0,ifname=tap0,script=no,downscript=no
-drive file=./core-image-sato-qemuppc.rootfs.ext4,if=virtio,format=raw
-show-cursor -device virtio-rng-pci -nographic -pidfile /tmp/zzqemu.1.pid
-d unimp,guest_errors,int -D /tmp/qemu.1 -monitor pty -machine mac99
-cpu G4 -m 256 -snapshot -serial mon:stdio -serial null
-kernel /tmp/repro/vmlinux-qemuppc.bin -append 'root=/dev/vda rw highres=off
console=ttyS0 mem=256M ip=192.168.7.2::192.168.7.1:255.255.255.0 console=tty
console=ttyS0 udev.log-priority=debug powersave=off'
> Would it also be possible for you to make your test image available
> for other people to see if they can recreate the same issue?
I've shared the image, kernel and my "reproducer" script:
http://www.rpsys.net/wp/rp/qemuppc-hang-reproducer.tgz
We needed to find a way to reproduce this at will, as it doesn't seem
to happen often. The scripts in there are partly extracted from our
test setup and partly ugly brute forcing. To run them you'd do
something like:
cc tunctl.c -o tunctl
sudo ./runqemu-gen-tapdevs 1000 1000 50
(This sets up tap0-tap49 accessible by user/group 1000/1000; you only
need to do this once - it's how we enable easy networking without
needing sudo on our test infrastructure.)
vi core-image-sato-qemuppc.qemuboot.conf
[set the last three lines to point at where qemu-system-ppc lives]
vi ./runqemu-parallel.py
[set mydir to wherever you extracted it to]
python3 ./runqemu-parallel.py
This will launch 50 copies of qemu, dumping logging and output into
/tmp/qemu.X and /tmp/*runqemu* files, and then monitor the logs to see
which, if any, "stall". It's normal for the image to stall for a few
seconds towards the end of boot, but if any are printing stalled
messages for a minute, they've properly hung. You'll see a:
ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 00000100 req
00000004
in the logs of a hung qemu. The image output would usually stop with
an ep_poll4. The kernel I've provided there is a very verbose debug
kernel, which is handy for telling when it's hung, if a bit slower to
boot.
I didn't promise it was neat, sorry :)
Cheers,
Richard
* Re: [Qemu-devel] qemu-system-ppc hangs
2017-11-21 0:00 [Qemu-devel] qemu-system-ppc hangs Richard Purdie
2017-11-21 7:50 ` Mark Cave-Ayland
@ 2017-11-21 9:55 ` Alex Bennée
2017-11-21 17:33 ` Richard Purdie
From: Alex Bennée @ 2017-11-21 9:55 UTC
To: Richard Purdie; +Cc: qemu-ppc, david, qemu-devel
Richard Purdie <richard.purdie@linuxfoundation.org> writes:
> Hi,
>
> I work on the Yocto Project and we use qemu to test boot our Linux
> images and run tests against them. We've been noticing some instability
> on ppc where the images sometimes hang, usually around udevd bring-up
> time, so just after booting into userspace.
>
> To cut a long story short, I've tracked down what I think is the
> problem. I believe the CPU stops receiving decrementer timer
> interrupts, so tasks in our images hang indefinitely because the timer
> has effectively stopped.
>
> It can be summed up with this line of debug:
>
> ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 00000100 req 00000004
>
> It should normally read:
>
> ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 00000100 req 00000002
>
> The question is why CPU_INTERRUPT_EXITTB ends up being set when the
> lines above this log message clearly set CPU_INTERRUPT_HARD (via
> cpu_interrupt()).
>
> I note in cpu.h:
>
>     /* updates protected by BQL */
>     uint32_t interrupt_request;
>
> (for struct CPUState)
>
> The ppc code does "cs->interrupt_request |= CPU_INTERRUPT_EXITTB" in 5
> places, 3 in excp_helper.c and 2 in helper_regs.h. In all of these
> places, an inserted g_assert(qemu_mutex_iothread_locked()); fails. If I
> do something like:
>
>     if (!qemu_mutex_iothread_locked()) {
>         qemu_mutex_lock_iothread();
>         cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
>         qemu_mutex_unlock_iothread();
>     } else {
>         cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
>     }
>
> at these call sites, then I can no longer lock qemu up with my test
> case.
>
> I suspect the _HARD setting gets overwritten, which stops the
> decrementer interrupts from being delivered.
>
> I don't know if taking this lock in these situations is going to be bad
> for performance, or whether such a patch would be the right approach.
It is certainly right to hold the BQL when messing with cpu_interrupt()
(and by extension cpu->interrupt_request). Although it is nominally a
per-vCPU variable, there are cross-vCPU accesses and therefore a chance
of races like the one you have triggered.
> At this point I therefore wanted to seek advice on what the real issue
> is here and how to fix it!
Code should be using cpu_interrupt() to change things, but we have
plenty of examples in the code of messing with cpu->interrupt_request
directly, which is why the assert() in cpu_interrupt() often doesn't
get a chance to fire.
The two most prolific direct users seem to be i386 and ppc.
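For example (an illustrative fragment, not any specific call site), the
difference is between:

    /* direct manipulation - never reaches the locking assertion */
    cs->interrupt_request |= CPU_INTERRUPT_EXITTB;

and:

    /* via the helper, whose TCG handler asserts the BQL is held */
    cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);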
The i386 cases are most likely all OK, as they tend to be in KVM
emulation code where by definition the BQL is already held by the time
you get there. For PPC it will depend on how you got there. The
multi-thread-tcg.txt document describes the approach:
Emulated hardware state
-----------------------
Currently thanks to KVM work any access to IO memory is automatically
protected by the global iothread mutex, also known as the BQL (Big
Qemu Lock). Any IO region that doesn't use global mutex is expected to
do its own locking.
However IO memory isn't the only way emulated hardware state can be
modified. Some architectures have model specific registers that
trigger hardware emulation features. Generally any translation helper
that needs to update more than a single vCPUs of state should take the
BQL.
As the BQL, or global iothread mutex is shared across the system we
push the use of the lock as far down into the TCG code as possible to
minimise contention.
(Current solution)
MMIO access automatically serialises hardware emulation by way of the
BQL. Currently ARM targets serialise all ARM_CP_IO register accesses
and also defer the reset/startup of vCPUs to the vCPU context by way
of async_run_on_cpu().
Updates to interrupt state are also protected by the BQL as they can
often be cross vCPU.
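(For reference, the async_run_on_cpu() deferral mentioned above looks
something like this - a sketch using the real API with a made-up
callback name:)

    static void do_deferred_reset(CPUState *cs, run_on_cpu_data data)
    {
        /* Runs later in cs's own vCPU thread, with the BQL held. */
    }

    /* from another thread or vCPU: */
    async_run_on_cpu(cs, do_deferred_reset, RUN_ON_CPU_NULL);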
So basically it comes down to the call-path into your final
cpu_interrupt() call, which is why I guess you're doing the:
    if (!qemu_mutex_iothread_locked()) {
        qemu_mutex_lock_iothread();
        cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
        qemu_mutex_unlock_iothread();
    }
dance. I suspect the helper functions are called both from device
emulation (where the BQL is held) and from vCPU helpers (where it
currently is not).
So I suggest the fix is:

 1. Convert sites doing their own manipulation of
    cs->interrupt_request to call the helper instead.
 2. If the helper is called directly from TCG code:
      - take the BQL before calling cpu_interrupt(), release it after.
    Else, if the helper is shared between MMIO and TCG paths:
      - take the BQL from the TCG path, call the helper, release it after.
It might be that there is some sensible refactoring that could be done
to make things clearer, but I'll leave that to the PPC experts to weigh
in on.
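As a rough sketch of step 2 (a hypothetical helper, not an existing
QEMU function):

    /* Raise the interrupt, taking the BQL only if the caller doesn't
     * already hold it, and restoring the previous lock state after. */
    static void cpu_interrupt_exittb(CPUState *cs)
    {
        bool need_lock = !qemu_mutex_iothread_locked();

        if (need_lock) {
            qemu_mutex_lock_iothread();
        }
        cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
        if (need_lock) {
            qemu_mutex_unlock_iothread();
        }
    }

Each of the five direct-manipulation sites would then call this instead
of OR-ing into cs->interrupt_request.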
Hope that helps!
--
Alex Bennée
* Re: [Qemu-devel] qemu-system-ppc hangs
2017-11-21 9:55 ` Alex Bennée
@ 2017-11-21 17:33 ` Richard Purdie
From: Richard Purdie @ 2017-11-21 17:33 UTC
To: Alex Bennée; +Cc: qemu-ppc, qemu-devel, david
On Tue, 2017-11-21 at 09:55 +0000, Alex Bennée wrote:
> Richard Purdie <richard.purdie@linuxfoundation.org> writes:
> > At this point I therefore wanted to seek advice on what the real
> > issue
> > is here and how to fix it!
> Code should be using cpu_interrupt() to change things, but we have
> plenty of examples in the code of messing with cpu->interrupt_request
> directly, which is why the assert() in cpu_interrupt() often doesn't
> get a chance to fire.
>
> The two most prolific direct users seem to be i386 and ppc.
>
> The i386 cases are most likely all OK, as they tend to be in KVM
> emulation code where by definition the BQL is already held by the
> time you get there. For PPC it will depend on how you got there. The
> multi-thread-tcg.txt document describes the approach:
>
> Emulated hardware state
> -----------------------
>
> Currently thanks to KVM work any access to IO memory is automatically
> protected by the global iothread mutex, also known as the BQL (Big
> Qemu Lock). Any IO region that doesn't use global mutex is expected
> to do its own locking.
>
> However IO memory isn't the only way emulated hardware state can be
> modified. Some architectures have model specific registers that
> trigger hardware emulation features. Generally any translation helper
> that needs to update more than a single vCPUs of state should take
> the BQL.
>
> As the BQL, or global iothread mutex is shared across the system we
> push the use of the lock as far down into the TCG code as possible
> to minimise contention.
>
> (Current solution)
>
> MMIO access automatically serialises hardware emulation by way of
> the BQL. Currently ARM targets serialise all ARM_CP_IO register
> accesses and also defer the reset/startup of vCPUs to the vCPU
> context by way of async_run_on_cpu().
>
> Updates to interrupt state are also protected by the BQL as they
> can often be cross vCPU.
>
> So basically it comes down to the call-path into your final
> cpu_interrupt() call, which is why I guess you're doing the:
>
>     if (!qemu_mutex_iothread_locked()) {
>         qemu_mutex_lock_iothread();
>         cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
>         qemu_mutex_unlock_iothread();
>     }
>
> dance. I suspect the helper functions are called both from device
> emulation (where the BQL is held) and from vCPU helpers (where it
> currently is not).
>
> So I suggest the fix is:
>
> 1. Convert sites doing their own manipulation of
>    cs->interrupt_request to call the helper instead.
> 2. If the helper is called directly from TCG code:
>      - take the BQL before calling cpu_interrupt(), release it after.
>    Else, if the helper is shared between MMIO and TCG paths:
>      - take the BQL from the TCG path, call the helper, release it after.
>
> It might be that there is some sensible refactoring that could be
> done to make things clearer, but I'll leave that to the PPC experts
> to weigh in on.
>
> Hope that helps!
It does indeed, thanks. I've sent out a proposed patch which does the
above so people can review it and see if it's the right thing to do.
Certainly it's working fine locally.
Cheers,
Richard