* [Qemu-devel] interrupt handling in qemu
@ 2011-12-27 23:12 Xin Tong
2011-12-27 23:36 ` Peter Maydell
2011-12-28 10:42 ` Avi Kivity
0 siblings, 2 replies; 13+ messages in thread
From: Xin Tong @ 2011-12-27 23:12 UTC (permalink / raw)
To: qemu-devel
QEMU does not exit to handle an interrupt in the middle of a
translation block; it only exits after the translation block has
finished. If a translation block is very long, is it possible for
QEMU to miss the interrupt's "timing window" and behave
unexpectedly?
The reason I ask is that I am looking for alternatives to QEMU's
current way of handling interrupts (unlinking translation blocks on
interrupt). However, the obvious approach, checking for an interrupt
in every basic block, seems too heavy (too many TB enters/exits).
Checking for an interrupt only every few basic blocks might be better,
but what is a good number of basic blocks to execute before checking
for an interrupt?
Thanks
Xin
* Re: [Qemu-devel] interrupt handling in qemu
From: Peter Maydell @ 2011-12-27 23:36 UTC (permalink / raw)
To: Xin Tong; +Cc: qemu-devel
On 27 December 2011 23:12, Xin Tong <xerox.time.tech@gmail.com> wrote:
> The reason I ask is that I am searching for alternatives to QEMU
> current way of handling interrupt (unlink translation blocks on
> interrupt). However, an obvious approach - checking for interrupt in
> every basic block, seems to be too heavy ( too many tb enters/exits
> ).
It's not awful -- an extra load-test-branch-not-taken per
TB, which IIRC from last time I tried to measure it was ~3%
speed penalty, obv. very variable with what the guest code is.
I have a half-finished patch for this but since I don't have a
decent benchmarking setup I've never got round to submitting it.
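The per-TB check described above can be pictured with a toy Python
model. This is only a sketch, not QEMU code: ToyCPU, exit_request,
tb_body and run_chain are hypothetical names chosen for illustration,
and a real implementation would emit the load-test-branch as host code
at the start of each TB.

```python
from dataclasses import dataclass


@dataclass
class ToyCPU:
    exit_request: bool = False  # set asynchronously when an interrupt is pending
    executed: int = 0           # number of TB bodies that actually ran


def tb_body(cpu):
    """Stand-in for the translated guest code of one TB."""
    cpu.executed += 1


def run_chain(cpu, chain, raise_irq_before=None):
    """Run chained TBs, testing the flag at the start of every TB.

    raise_irq_before simulates an interrupt arriving just before the TB
    at that index. Returns the index of the TB at which the chain was
    left, or None if the whole chain ran.
    """
    for index, body in enumerate(chain):
        if index == raise_irq_before:
            cpu.exit_request = True
        if cpu.exit_request:    # the per-TB check; normally not taken
            return index
        body(cpu)
    return None
```

In this model an interrupt raised before TB 3 of a five-TB chain lets
TBs 0 to 2 run and then leaves the chain at TB 3, so the delay is at
most one TB.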
> Maybe checking interrupt in a few basic blocks might be better, but
> what is a good measure for the number of basic blocks to execute
> before checking for interrupt ?
The trouble is that you can't tell when you're translating the
TB whether it's just one in a sequence A->B->C (where you could
perhaps skip the check at the start of B), or if you're
actually looking at a tight loop B->B (or B->C->B). So you don't
have the information conveniently to hand to tell you whether
you can skip compiling the interrupt check into this TB.
(One heuristic for how often we need to check would be "every
N instructions, or at every backwards branch or indirect-branch",
but this doesn't fit with the idea of putting the checks at the
start of the TB, they'd have to go in the middle of the TB which
is probably awkward.)
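The heuristic in the parenthesis above can be sketched as a small
Python function that decides, for a list of guest instructions, where
a check would be placed. Everything here is an assumption made for
illustration: the (pc, kind, target) tuple layout, the kind strings
and the name check_points are hypothetical and do not correspond to
anything in QEMU.

```python
def check_points(insns, n):
    """Return indices of instructions before which a check would go.

    insns is a list of (pc, kind, target) tuples, where kind is
    "plain", "branch" (direct branch to target) or "indirect".
    A check is placed at every backwards direct branch, at every
    indirect branch, and whenever n instructions have passed since
    the previous check.
    """
    points = []
    since_last = 0
    for index, (pc, kind, target) in enumerate(insns):
        backward = kind == "branch" and target is not None and target <= pc
        if backward or kind == "indirect" or since_last >= n:
            points.append(index)
            since_last = 0
        since_last += 1
    return points
```

Note that the returned indices can fall anywhere in the instruction
stream, which is exactly the awkwardness mentioned above: the checks
no longer sit only at the start of a TB.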
-- PMM
* Re: [Qemu-devel] interrupt handling in qemu
From: Xin Tong @ 2011-12-28 0:43 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel
On Tue, Dec 27, 2011 at 4:36 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 27 December 2011 23:12, Xin Tong <xerox.time.tech@gmail.com> wrote:
>> The reason I ask is that I am searching for alternatives to QEMU
>> current way of handling interrupt (unlink translation blocks on
>> interrupt). However, an obvious approach - checking for interrupt in
>> every basic block, seems to be too heavy ( too many tb enters/exits
>> ).
>
> It's not awful -- an extra load-test-branch-not-taken per
> TB, which IIRC from last time I tried to measure it was ~3%
> speed penalty, obv. very variable with what the guest code is.
> I have a half-finished patch for this but since I don't have a
> decent benchmarking setup I've never got round to submitting it.
>
>> Maybe checking interrupt in a few basic blocks might be better, but
>> what is a good measure for the number of basic blocks to execute
>> before checking for interrupt ?
>
Which version of QEMU did you run your test on, and what were the
tests? I modified QEMU 0.15.0 to check the interrupt status at the end
of every TB and ran the SPECINT2000 benchmarks on it. On an x86_64
host, performance is 70% of the unmodified QEMU for some benchmarks. I
agree that the extra load-test-branch-not-taken per TB is minimal, but
I found that the average number of TBs executed per TB enter is low
(~3.5 TBs), while the unmodified approach manages ~10 TBs per TB
enter. This makes me wonder why. Maybe the mechanism I used to gather
these statistics is flawed, but performance is indeed hurt.
> The trouble is that you can't tell when you're translating the
> TB whether it's just one in a sequence A->B->C (where you could
> perhaps skip the check at the start of B), or if you're
> actually looking at a tight loop B->B (or B->C->B). So you don't
> have the information conveniently to hand to tell you whether
> you can skip compiling the interrupt check into this TB.
>
> (One heuristic for how often we need to check would be "every
> N instructions, or at every backwards branch or indirect-branch",
> but this doesn't fit with the idea of putting the checks at the
> start of the TB, they'd have to go in the middle of the TB which
> is probably awkward.)
We could keep a counter that is decremented on every TB; when the
counter reaches 0, the currently executing TB checks the interrupt
status. This way we can control the dynamic number of TBs executed
per interrupt check. It would introduce some more overhead, though,
probably 3% - 5%.
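As a rough illustration of the counter idea, here is a toy Python
model that reports at which TB a pending interrupt would be noticed.
It is a sketch under stated assumptions, not QEMU code: the function
name tbs_until_exit and its parameters are hypothetical, and the
interrupt is modelled as becoming pending at a fixed TB index.

```python
def tbs_until_exit(irq_at, period, total):
    """Return the index of the TB at which the interrupt is noticed.

    irq_at: index of the first TB during which the interrupt is pending.
    period: the interrupt status is tested once every `period` TBs.
    total:  number of TBs in the run.
    Returns None if the run ends before the interrupt is noticed.
    """
    counter = period
    for index in range(total):
        pending = index >= irq_at
        counter -= 1              # paid on every TB
        if counter == 0:          # only now is the status tested
            counter = period
            if pending:
                return index
    return None
```

With period 4, an interrupt that becomes pending at TB 4 is only
noticed at TB 7, so the worst-case delay grows with the period while
the decrement is still paid on every TB.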
Xin
>
> -- PMM
* Re: [Qemu-devel] interrupt handling in qemu
From: Peter Maydell @ 2011-12-28 1:10 UTC (permalink / raw)
To: Xin Tong; +Cc: qemu-devel
On 28 December 2011 00:43, Xin Tong <xerox.time.tech@gmail.com> wrote:
> Which version of QEMU did you do your test on, and what are the tests.
It was whatever trunk qemu was a year or so ago, and the test was
just "time to login prompt for ARM guest in system mode".
> I modified QEMU to check for interrupt status at the end of every TB
> and ran it on SPECINT2000 benchmarks with QEMU 0.15.0. The performance
> is 70% of the unmodified one for some benchmarks on a x86_64 host.
I don't suppose you could provide a brief set of instructions for
setting up a benchmark setup like this? (Is this user-mode or
system-mode?)
> I
> agree that the extra load-test-branch-not-taken per TB is minimal, but
> what I found is that the average number of TB executed per TB enter is
> low (~3.5 TBs), while the unmodified approach has ~10 TBs per TB
> enter. this makes me wonder why. Maybe the mechanism i used to gather
> this statistics is flawed. but the performance is indeed hindered.
Odd.
> By keeping a counter that decrements on every TB, and when the counter
> reaches 0, the current executing TB checks for interrupt status.
Decrementing a counter on every TB is going to be slower than
just checking a flag, so you might as well just check the flag.
(If the flag is set you need to handle the interrupt anyway
so there's no point delaying it.)
-- PMM
* Re: [Qemu-devel] interrupt handling in qemu
From: Xin Tong @ 2011-12-28 1:23 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel
On Tue, Dec 27, 2011 at 6:10 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 28 December 2011 00:43, Xin Tong <xerox.time.tech@gmail.com> wrote:
>> Which version of QEMU did you do your test on, and what are the tests.
>
> It was whatever trunk qemu was a year or so ago, and the test was
> just "time to login prompt for ARM guest in system mode".
>
>> I modified QEMU to check for interrupt status at the end of every TB
>> and ran it on SPECINT2000 benchmarks with QEMU 0.15.0. The performance
>> is 70% of the unmodified one for some benchmarks on a x86_64 host.
>
> I don't suppose you could provide a brief set of instructions for
> setting up a benchmark setup like this? (Is this user-mode or
> system-mode?)
It is in system mode; the SPECINT2000 benchmarks run on top of
Ubuntu Linux. It is not trivial to set up.
>
>> I
>> agree that the extra load-test-branch-not-taken per TB is minimal, but
>> what I found is that the average number of TB executed per TB enter is
>> low (~3.5 TBs), while the unmodified approach has ~10 TBs per TB
>> enter. this makes me wonder why. Maybe the mechanism i used to gather
>> this statistics is flawed. but the performance is indeed hindered.
>
> Odd.
>
>> By keeping a counter that decrements on every TB, and when the counter
>> reaches 0, the current executing TB checks for interrupt status.
>
> Decrementing a counter on every TB is going to be slower than
> just checking a flag, so you might as well just check the flag.
> (If the flag is set you need to handle the interrupt anyway
> so there's no point delaying it.)
>
The point I am trying to make is that if QEMU exits the TB for every
interrupt, there will be too many exits. What if QEMU exited once and
handled a few interrupts in that one exit?
> -- PMM
* Re: [Qemu-devel] interrupt handling in qemu
From: Avi Kivity @ 2011-12-28 10:42 UTC (permalink / raw)
To: Xin Tong; +Cc: qemu-devel
On 12/28/2011 01:12 AM, Xin Tong wrote:
> QEMU does not exit and handle interrupt within translation blocks. it
> only exits after the translation block is finished. Assuming a
> translation block is very long, is it possible that QEMU could have
> exceeded the interrupt's "timing window" and yields unexpected
> behavior.
>
> The reason I ask is that I am searching for alternatives to QEMU
> current way of handling interrupt (unlink translation blocks on
> interrupt). However, an obvious approach - checking for interrupt in
> every basic block, seems to be too heavy ( too many tb enters/exits
> ). Maybe checking interrupt in a few basic blocks might be better, but
> what is a good measure for the number of basic blocks to execute
> before checking for interrupt ?
>
It's possible to check for an interrupt before every instruction,
without any overhead:
- when a signal arrives, check the instruction pointer. If it points
outside tcg code, set a flag and return.
- consult a table indexed by the instruction pointer, that gives the
number of bytes to the next guest instruction boundary
- if nonzero, set a breakpoint at that boundary, and resume
- remove the breakpoint (if set)
- adjust the TB to return on the current instruction pointer
- return
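The table lookup at the heart of the steps above can be sketched in
Python. This is a toy model under loudly stated assumptions: host code
offsets are plain integers, and build_boundary_table, on_signal and
the action strings are hypothetical names; nothing here reflects
actual QEMU data structures or real signal handling.

```python
import bisect


def build_boundary_table(boundaries, code_end):
    """Map each host offset to the distance to the next guest boundary.

    boundaries: sorted host offsets at which a guest instruction starts.
    code_end:   first offset past the generated code, treated as the
                final boundary.
    An entry of 0 means the offset is itself on a boundary.
    """
    table = {}
    for offset in range(boundaries[0], code_end):
        i = bisect.bisect_left(boundaries, offset)
        nxt = boundaries[i] if i < len(boundaries) else code_end
        table[offset] = nxt - offset
    return table


def on_signal(host_ip, table):
    """Decide what the (hypothetical) signal handler would do."""
    if host_ip not in table:
        return ("set_flag", None)             # outside generated code
    distance = table[host_ip]
    if distance:
        return ("breakpoint", host_ip + distance)  # resume until boundary
    return ("exit_here", host_ip)             # already on a boundary
```

For boundaries [0, 6, 10] and code_end 14, a signal at offset 3 asks
for a breakpoint at offset 6, while a signal at offset 6 can exit
immediately.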
--
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] interrupt handling in qemu
From: Peter Maydell @ 2011-12-28 11:40 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, Xin Tong
On 28 December 2011 10:42, Avi Kivity <avi@redhat.com> wrote:
> It's possible to check for an interrupt before every instruction,
> without any overhead:
>
> - when a signal arrives, check the instruction pointer. If it points
> outside tcg code, set a flag and return.
> - consult a table indexed by the instruction pointer, that gives the
> number of bytes to the next guest instruction boundary
> - if nonzero, set a breakpoint at that boundary, and resume
> - remove the breakpoint (if set)
> - adjust the TB to return on the current instruction pointer
> - return
This assumes you have hardware breakpoints on your host, so
it's not portable.
(You also need to add a check-and-handle-flag for every return
from a helper function to TCG code, and of course you need to
actually create the instruction-boundary table. These are both
overheads.)
-- PMM
* Re: [Qemu-devel] interrupt handling in qemu
From: Avi Kivity @ 2011-12-28 12:04 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel, Xin Tong
On 12/28/2011 01:40 PM, Peter Maydell wrote:
> On 28 December 2011 10:42, Avi Kivity <avi@redhat.com> wrote:
> > It's possible to check for an interrupt before every instruction,
> > without any overhead:
> >
> > - when a signal arrives, check the instruction pointer. If it points
> > outside tcg code, set a flag and return.
> > - consult a table indexed by the instruction pointer, that gives the
> > number of bytes to the next guest instruction boundary
> > - if nonzero, set a breakpoint at that boundary, and resume
> > - remove the breakpoint (if set)
> > - adjust the TB to return on the current instruction pointer
> > - return
>
> This assumes you have hardware breakpoints on your host, so
> it's not portable.
You could also use software breakpoints. Or just temporarily replace
the host instruction on the next guest instruction boundary with a return.
> (You also need to add a check-and-handle-flag for every return
> from a helper function to TCG code,
Ah yes - didn't consider that.
You could put all helpers in their own section, and do something around
that - but that assumes no callouts from helpers to the standard library.
> and of course you need to
> actually create the instruction-boundary table.
This should be well amortized.
> These are both
> overheads.)
--
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] interrupt handling in qemu
From: Xin Tong @ 2011-12-28 17:00 UTC (permalink / raw)
To: Avi Kivity, Peter Maydell; +Cc: qemu-devel
My main concern here is not how promptly interrupts are handled; I am
more interested in reducing the number of TB enters/exits caused by
interrupts. Returning to the QEMU main loop requires saving and
restoring register contexts, which is expensive. What I am thinking is:
can we check for and handle interrupts only every few TBs executed?
The drawback is that I do not know how many TBs would be a good number,
such that interrupts do not get delayed too much.
Thanks
On Wed, Dec 28, 2011 at 5:04 AM, Avi Kivity <avi@redhat.com> wrote:
> On 12/28/2011 01:40 PM, Peter Maydell wrote:
>> On 28 December 2011 10:42, Avi Kivity <avi@redhat.com> wrote:
>> > It's possible to check for an interrupt before every instruction,
>> > without any overhead:
>> >
>> > - when a signal arrives, check the instruction pointer. If it points
>> > outside tcg code, set a flag and return.
>> > - consult a table indexed by the instruction pointer, that gives the
>> > number of bytes to the next guest instruction boundary
>> > - if nonzero, set a breakpoint at that boundary, and resume
>> > - remove the breakpoint (if set)
>> > - adjust the TB to return on the current instruction pointer
>> > - return
>>
>> This assumes you have hardware breakpoints on your host, so
>> it's not portable.
>
> You could also use software breakpoints. Or just temporarily replace
> the host instruction on the next guest instruction boundary with a return.
>
>> (You also need to add a check-and-handle-flag for every return
>> from a helper function to TCG code,
>
> ah yes - didn't consider that.
>
> you could put all helper in their own section, an do something around
> that - but that assumes no callouts from helpers to the standard library.
>
>> and of course you need to
>> actually create the instruction-boundary table.
>
> This should be well amortized.
>
>> These are both
>> overheads.)
>
> --
> error compiling committee.c: too many arguments to function
>
* Re: [Qemu-devel] interrupt handling in qemu
From: Lluís Vilanova @ 2011-12-28 19:07 UTC (permalink / raw)
To: Xin Tong; +Cc: Peter Maydell, Avi Kivity, qemu-devel
Xin Tong writes:
> My main concern here is not how timely the interrupts can be handled,
> i am more interested in reducing the number of TB enters/exits due to
> interrupt. Returning to qemu mainloop requires saving and restoring
> register contexts which are expensive, what i am thinking is that can
> we check and handle interrupts every few TBs executed. But the
> drawback is that I do not know how many TBs would be a good number
> such that the interrupts do not get delayed too much.
I think a maximum amount of guest time for which interrupts can be delayed would
provide a better response (maybe together with a maximum number of delayed
interrupts).
For that you could program a "special" timer that forces a return-from-guest,
whatever the mechanism. But I'm sure that's going to make the system slower than
just checking every fixed number of TBs.
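The policy suggested above (bound the delay by guest time, optionally
together with a cap on the number of delayed interrupts) can be
written down as a small Python predicate. This is only a sketch: the
function name must_exit_now, its parameters and the time units are
hypothetical, and it says nothing about how the forced return from
the guest would actually be triggered.

```python
def must_exit_now(now, oldest_pending, pending_count, max_delay, max_pending):
    """Decide whether deferred interrupts have to be delivered now.

    now:            current guest time.
    oldest_pending: guest time at which the oldest deferred interrupt
                    was raised (ignored when nothing is pending).
    pending_count:  number of interrupts currently deferred.
    max_delay:      longest guest time an interrupt may be deferred.
    max_pending:    largest number of interrupts that may be deferred.
    """
    if pending_count == 0:
        return False
    waited_too_long = now - oldest_pending >= max_delay
    too_many = pending_count >= max_pending
    return waited_too_long or too_many
```

With max_delay 100 and max_pending 4, a single interrupt raised at
time 0 is held back at time 50 but forces an exit at time 100, and
four pending interrupts force an exit regardless of the time elapsed.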
Lluis
--
"And it's much the same thing with knowledge, for whenever you learn
something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom
Tollbooth
* Re: [Qemu-devel] interrupt handling in qemu
From: Peter Maydell @ 2011-12-28 21:10 UTC (permalink / raw)
To: Xin Tong; +Cc: qemu-devel
On 28 December 2011 00:43, Xin Tong <xerox.time.tech@gmail.com> wrote:
> I modified QEMU to check for interrupt status at the end of every TB
> and ran it on SPECINT2000 benchmarks with QEMU 0.15.0. The performance
> is 70% of the unmodified one for some benchmarks on a x86_64 host. I
> agree that the extra load-test-branch-not-taken per TB is minimal, but
> what I found is that the average number of TB executed per TB enter is
> low (~3.5 TBs), while the unmodified approach has ~10 TBs per TB
> enter. this makes me wonder why. Maybe the mechanism i used to gather
> this statistics is flawed. but the performance is indeed hindered.
Since you said you're using system mode, here's my guess. The
unlink-tbs method of interrupting the guest CPU thread runs
in a second thread (the io thread), and doesn't stop the guest
CPU thread. So while the io thread is trying to unlink TBs,
the CPU thread is still running on, and might well execute
a few more TBs before the io thread's traversal of the TB
graph catches up with it and manages to unlink the TB link
the CPU thread is about to traverse.
More generally: are we really taking an interrupt every 3 to
5 TBs? This seems very high -- surely we will be spending more
time in the OS servicing interrupts than running useful guest
userspace code...
-- PMM
* Re: [Qemu-devel] interrupt handling in qemu
From: Xin Tong @ 2011-12-29 0:48 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel
That was my first guess as well, but my QEMU is built with
CONFIG_IOTHREAD set to 0.
I am not 100% sure how interrupts are delivered in QEMU. My guess is
that some kind of timer device has to fire, and that QEMU has installed
a signal handler which takes the signal and invokes unlink_tb. I hope
you can enlighten me on that.
Thanks
Xin
On Wed, Dec 28, 2011 at 2:10 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 28 December 2011 00:43, Xin Tong <xerox.time.tech@gmail.com> wrote:
>> I modified QEMU to check for interrupt status at the end of every TB
>> and ran it on SPECINT2000 benchmarks with QEMU 0.15.0. The performance
>> is 70% of the unmodified one for some benchmarks on a x86_64 host. I
>> agree that the extra load-test-branch-not-taken per TB is minimal, but
>> what I found is that the average number of TB executed per TB enter is
>> low (~3.5 TBs), while the unmodified approach has ~10 TBs per TB
>> enter. this makes me wonder why. Maybe the mechanism i used to gather
>> this statistics is flawed. but the performance is indeed hindered.
>
> Since you said you're using system mode, here's my guess. The
> unlink-tbs method of interrupting the guest CPU thread runs
> in a second thread (the io thread), and doesn't stop the guest
> CPU thread. So while the io thread is trying to unlink TBs,
> the CPU thread is still running on, and might well execute
> a few more TBs before the io thread's traversal of the TB
> graph catches up with it and manages to unlink the TB link
> the CPU thread is about to traverse.
>
> More generally: are we really taking an interrupt every 3 to
> 5 TBs? This seems very high -- surely we will be spending more
> time in the OS servicing interrupts than running useful guest
> userspace code...
>
> -- PMM
* Re: [Qemu-devel] interrupt handling in qemu
From: Peter Maydell @ 2011-12-29 1:31 UTC (permalink / raw)
To: Xin Tong; +Cc: qemu-devel
On 29 December 2011 00:48, Xin Tong <xerox.time.tech@gmail.com> wrote:
> That is my guess as well in the first place, but my QEMU is built with
> CONFIG_IOTHREAD set to 0.
Your QEMU is old -- iothread is now the only option (the config
option to use not-iothread has gone away).
> I am not 100% sure about how interrupts are delivered in QEMU, my
> guess is that some kind of timer devices will have to fire and qemu
> might have installed a signal handler and the signal handler takes the
> signal and invokes unlink_tb. I hope you can enlighten me on that.
I think the non-iothread config used to use a signal handler, yes.
However I don't recall the details and it's all a bit irrelevant
now anyway. I recommend using an up to date source tree to do your
experiments with...
-- PMM