* [Qemu-devel] [PATCH v2 1/3] Do not drop global mutex for polled main loop runs
2011-04-11 20:27 [Qemu-devel] [PATCH v2 0/3] io-thread optimizations Jan Kiszka
@ 2011-04-11 20:27 ` Jan Kiszka
2011-04-11 20:27 ` [Qemu-devel] [PATCH v2 2/3] Poll main loop after I/O events were received Jan Kiszka
` (3 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: Jan Kiszka @ 2011-04-11 20:27 UTC (permalink / raw)
To: qemu-devel
Cc: Paolo Bonzini, Anthony Liguori, Marcelo Tosatti, Aurelien Jarno
From: Jan Kiszka <jan.kiszka@siemens.com>
If we call select without a timeout, it's more efficient to keep the
global mutex locked as we may otherwise just play ping pong with a
vcpu thread contending for it. This is particularly important for TCG
mode where we run in lock-step with the vcpu thread.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
vl.c | 10 ++++++++--
1 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/vl.c b/vl.c
index 68c3b53..b7bbed8 100644
--- a/vl.c
+++ b/vl.c
@@ -1314,9 +1314,15 @@ void main_loop_wait(int nonblocking)
     qemu_iohandler_fill(&nfds, &rfds, &wfds, &xfds);
     slirp_select_fill(&nfds, &rfds, &wfds, &xfds);
-    qemu_mutex_unlock_iothread();
+    if (timeout > 0) {
+        qemu_mutex_unlock_iothread();
+    }
+
     ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv);
-    qemu_mutex_lock_iothread();
+
+    if (timeout > 0) {
+        qemu_mutex_lock_iothread();
+    }
     qemu_iohandler_poll(&rfds, &wfds, &xfds, ret);
     slirp_select_poll(&rfds, &wfds, &xfds, (ret < 0));
--
1.7.1
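
For illustration only, and not part of the posted patch: a minimal standalone sketch of the locking rule described in the commit message above. It assumes a plain pthread mutex stands in for the QEMU global mutex; big_lock, unlock_count and poll_fds() are hypothetical names that do not exist in QEMU.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical stand-in for the QEMU global mutex. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Counts how often the lock was dropped; for illustration only. */
static int unlock_count;

/*
 * Wait on rfds for up to timeout_ms milliseconds. The caller must hold
 * big_lock. The lock is dropped only when select() may actually block,
 * so a pure poll (timeout_ms == 0) never hands it to another thread.
 */
static int poll_fds(int maxfd, fd_set *rfds, int timeout_ms)
{
    struct timeval tv;
    int ret;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    if (timeout_ms > 0) {
        unlock_count++;                 /* still protected by big_lock */
        pthread_mutex_unlock(&big_lock);
    }

    ret = select(maxfd + 1, rfds, NULL, NULL, &tv);

    if (timeout_ms > 0) {
        pthread_mutex_lock(&big_lock);
    }
    return ret;
}
```

With an empty fd set, a call with a zero timeout returns at once without touching the mutex, while a call with a positive timeout releases and re-acquires it around select().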
* [Qemu-devel] [PATCH v2 2/3] Poll main loop after I/O events were received
2011-04-11 20:27 [Qemu-devel] [PATCH v2 0/3] io-thread optimizations Jan Kiszka
2011-04-11 20:27 ` [Qemu-devel] [PATCH v2 1/3] Do not drop global mutex for polled main loop runs Jan Kiszka
@ 2011-04-11 20:27 ` Jan Kiszka
2011-04-11 20:27 ` [Qemu-devel] [PATCH v2 3/3] Do not kick vcpus in TCG mode Jan Kiszka
` (2 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: Jan Kiszka @ 2011-04-11 20:27 UTC (permalink / raw)
To: qemu-devel
Cc: Paolo Bonzini, Anthony Liguori, Marcelo Tosatti, Aurelien Jarno
From: Jan Kiszka <jan.kiszka@siemens.com>
Polling until select returns empty fdsets helps to reduce the switches
between iothread and vcpus. The benefit of this patch is best visible
when running an SMP guest on an SMP host in emulation mode.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
sysemu.h | 2 +-
vl.c | 12 ++++++++----
2 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/sysemu.h b/sysemu.h
index bbbd0fd..f75a03a 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -87,7 +87,7 @@ void cpu_synchronize_all_post_init(void);
void qemu_announce_self(void);
-void main_loop_wait(int nonblocking);
+int main_loop_wait(int nonblocking);
bool qemu_savevm_state_blocked(Monitor *mon);
int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
diff --git a/vl.c b/vl.c
index b7bbed8..2c46cd9 100644
--- a/vl.c
+++ b/vl.c
@@ -1286,7 +1286,7 @@ void qemu_system_vmstop_request(int reason)
     qemu_notify_event();
 }
-void main_loop_wait(int nonblocking)
+int main_loop_wait(int nonblocking)
 {
     fd_set rfds, wfds, xfds;
     int ret, nfds;
@@ -1333,6 +1333,7 @@ void main_loop_wait(int nonblocking)
them. */
     qemu_bh_poll();
+    return ret;
 }
 #ifndef CONFIG_IOTHREAD
@@ -1350,7 +1351,8 @@ qemu_irq qemu_system_powerdown;
 static void main_loop(void)
 {
-    bool nonblocking = false;
+    bool nonblocking;
+    int last_io = 0;
 #ifdef CONFIG_PROFILER
     int64_t ti;
 #endif
@@ -1359,7 +1361,9 @@ static void main_loop(void)
     qemu_main_loop_start();
     for (;;) {
-#ifndef CONFIG_IOTHREAD
+#ifdef CONFIG_IOTHREAD
+        nonblocking = !kvm_enabled() && last_io > 0;
+#else
         nonblocking = cpu_exec_all();
         if (vm_request_pending()) {
             nonblocking = true;
@@ -1368,7 +1372,7 @@ static void main_loop(void)
 #ifdef CONFIG_PROFILER
         ti = profile_getclock();
 #endif
-        main_loop_wait(nonblocking);
+        last_io = main_loop_wait(nonblocking);
 #ifdef CONFIG_PROFILER
         dev_time += profile_getclock() - ti;
 #endif
--
1.7.1
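
For illustration only, and not part of the posted patch: a small standalone sketch of the "keep polling while the last select() saw I/O" rule. It leaves out the kvm_enabled() check and everything else QEMU-specific; wait_once() and run_loop() are hypothetical helpers, and the 10 ms blocking timeout is an arbitrary assumption.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper: one select() pass over a single fd. Uses a zero
 * timeout when nonblocking is set and 10 ms otherwise, and consumes one
 * byte if the fd is readable. Returns select()'s result, or -1 if the
 * read fails.
 */
static int wait_once(int fd, int nonblocking)
{
    fd_set rfds;
    struct timeval tv;
    char c;
    int ret;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = 0;
    tv.tv_usec = nonblocking ? 0 : 10000;

    ret = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (ret > 0 && read(fd, &c, 1) != 1) {
        return -1;
    }
    return ret;
}

/* Run `rounds` iterations; return how many of them only polled. */
static int run_loop(int fd, int rounds)
{
    int last_io = 0;
    int polled = 0;
    int i;

    for (i = 0; i < rounds; i++) {
        /* Same rule as in the patch, minus the kvm_enabled() check. */
        int nonblocking = last_io > 0;

        polled += nonblocking;
        last_io = wait_once(fd, nonblocking);
    }
    return polled;
}
```

The loop only goes back to a blocking wait once a poll comes back empty, which is the behaviour the commit message describes as reducing switches between the iothread and the vcpus.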
* [Qemu-devel] [PATCH v2 3/3] Do not kick vcpus in TCG mode
2011-04-11 20:27 [Qemu-devel] [PATCH v2 0/3] io-thread optimizations Jan Kiszka
2011-04-11 20:27 ` [Qemu-devel] [PATCH v2 1/3] Do not drop global mutex for polled main loop runs Jan Kiszka
2011-04-11 20:27 ` [Qemu-devel] [PATCH v2 2/3] Poll main loop after I/O events were received Jan Kiszka
@ 2011-04-11 20:27 ` Jan Kiszka
2011-04-12 7:09 ` [Qemu-devel] [PATCH v2 0/3] io-thread optimizations Paolo Bonzini
2011-04-13 20:16 ` Aurelien Jarno
4 siblings, 0 replies; 15+ messages in thread
From: Jan Kiszka @ 2011-04-11 20:27 UTC (permalink / raw)
To: qemu-devel
Cc: Paolo Bonzini, Anthony Liguori, Marcelo Tosatti, Aurelien Jarno
From: Jan Kiszka <jan.kiszka@siemens.com>
In TCG mode, iothread and vcpus run in lock-step. So it's pointless to
send a signal from qemu_cpu_kick to the vcpu thread - if we got here,
the receiver already left the vcpu loop.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
cpus.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/cpus.c b/cpus.c
index 41bec7c..82a7806 100644
--- a/cpus.c
+++ b/cpus.c
@@ -860,7 +860,7 @@ void qemu_cpu_kick(void *_env)
     CPUState *env = _env;
     qemu_cond_broadcast(env->halt_cond);
-    if (!env->thread_kicked) {
+    if (kvm_enabled() && !env->thread_kicked) {
         qemu_cpu_kick_thread(env);
         env->thread_kicked = true;
     }
--
1.7.1
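
For illustration only, and not part of the posted patch: a tiny standalone model of the guard added above. struct vcpu, cpu_kick() and the signals_sent counter are hypothetical; the counter merely stands in for the signal that qemu_cpu_kick_thread() would send, and the condition-variable broadcast is left out.

```c
#include <stdbool.h>

/* Hypothetical, stripped-down vcpu state. */
struct vcpu {
    bool thread_kicked;
    int signals_sent;   /* stands in for qemu_cpu_kick_thread() */
};

/*
 * Kick a vcpu. A signal is only sent when KVM is in use and no kick is
 * already pending; in TCG mode (kvm == false) nothing is sent.
 */
static void cpu_kick(struct vcpu *cpu, bool kvm)
{
    if (kvm && !cpu->thread_kicked) {
        cpu->signals_sent++;
        cpu->thread_kicked = true;
    }
}
```

In this model, repeated kicks with kvm set to false leave signals_sent at zero, and repeated kicks with kvm set to true send exactly one signal until thread_kicked is cleared again.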
* Re: [Qemu-devel] [PATCH v2 0/3] io-thread optimizations
2011-04-11 20:27 [Qemu-devel] [PATCH v2 0/3] io-thread optimizations Jan Kiszka
` (2 preceding siblings ...)
2011-04-11 20:27 ` [Qemu-devel] [PATCH v2 3/3] Do not kick vcpus in TCG mode Jan Kiszka
@ 2011-04-12 7:09 ` Paolo Bonzini
2011-04-13 20:16 ` Aurelien Jarno
4 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2011-04-12 7:09 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Anthony Liguori, Marcelo Tosatti, qemu-devel, Aurelien Jarno
On 04/11/2011 10:27 PM, Jan Kiszka wrote:
> These patches were posted before. They bring down the overhead of the
> io-thread mode for TCG here, specifically when emulating SMP.
>
> The major change in this version, besides rebasing, is the exclusion of
> KVM from the main loop polling optimization.
>
>
>
> Jan Kiszka (3):
> Do not drop global mutex for polled main loop runs
> Poll main loop after I/O events were received
> Do not kick vcpus in TCG mode
>
> cpus.c | 2 +-
> sysemu.h | 2 +-
> vl.c | 22 ++++++++++++++++------
> 3 files changed, 18 insertions(+), 8 deletions(-)
>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
* Re: [Qemu-devel] [PATCH v2 0/3] io-thread optimizations
2011-04-11 20:27 [Qemu-devel] [PATCH v2 0/3] io-thread optimizations Jan Kiszka
` (3 preceding siblings ...)
2011-04-12 7:09 ` [Qemu-devel] [PATCH v2 0/3] io-thread optimizations Paolo Bonzini
@ 2011-04-13 20:16 ` Aurelien Jarno
2011-04-14 7:14 ` Jan Kiszka
2011-06-25 8:38 ` Jan Kiszka
4 siblings, 2 replies; 15+ messages in thread
From: Aurelien Jarno @ 2011-04-13 20:16 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Paolo Bonzini, Anthony Liguori, Marcelo Tosatti, qemu-devel
On Mon, Apr 11, 2011 at 10:27:41PM +0200, Jan Kiszka wrote:
> These patches were posted before. They bring down the overhead of the
> io-thread mode for TCG here, specifically when emulating SMP.
>
> The major change in this version, besides rebasing, is the exclusion of
> KVM from the main loop polling optimization.
>
>
>
> Jan Kiszka (3):
> Do not drop global mutex for polled main loop runs
> Poll main loop after I/O events were received
> Do not kick vcpus in TCG mode
>
> cpus.c | 2 +-
> sysemu.h | 2 +-
> vl.c | 22 ++++++++++++++++------
> 3 files changed, 18 insertions(+), 8 deletions(-)
>
Thanks for working on improving the io-thread with TCG. Your patches
make sense, but they don't seem to fix the slowdown observed when
enabling the io-thread. Well, maybe they were not supposed to. Here are,
for example, the results of netperf between guest and host using virtio:
no io-thread          122 MB/s
io-thread              97 MB/s
io-thread + patches    98 MB/s
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH v2 0/3] io-thread optimizations
2011-04-13 20:16 ` Aurelien Jarno
@ 2011-04-14 7:14 ` Jan Kiszka
2011-04-14 13:45 ` Avi Kivity
2011-04-25 18:35 ` Aurelien Jarno
2011-06-25 8:38 ` Jan Kiszka
1 sibling, 2 replies; 15+ messages in thread
From: Jan Kiszka @ 2011-04-14 7:14 UTC (permalink / raw)
To: Aurelien Jarno
Cc: Paolo Bonzini, Anthony Liguori, Marcelo Tosatti, qemu-devel
On 2011-04-13 22:16, Aurelien Jarno wrote:
> Thanks for working on improving the io-thread with TCG. Your patches
> make sense, but they don't seems to fix the slowdown observed when
> enabling the io-thread. Well maybe they were not supposed to. This is
> for example the results of netperf between guest and host using virtio:
>
> no io-thread 122 MB/s
> io-thread 97 MB/s
> io-thread + patches 98 MB/s
>
Can you capture ftraces of io-thread enabled & disabled runs? They just
need to cover a handful of frames.
Thanks,
Jan
* Re: [Qemu-devel] [PATCH v2 0/3] io-thread optimizations
2011-04-14 7:14 ` Jan Kiszka
@ 2011-04-14 13:45 ` Avi Kivity
2011-04-14 13:59 ` Anthony Liguori
2011-04-25 18:35 ` Aurelien Jarno
1 sibling, 1 reply; 15+ messages in thread
From: Avi Kivity @ 2011-04-14 13:45 UTC (permalink / raw)
To: Jan Kiszka
Cc: Paolo Bonzini, Anthony Liguori, Marcelo Tosatti, qemu-devel,
Aurelien Jarno
On 04/14/2011 10:14 AM, Jan Kiszka wrote:
> On 2011-04-13 22:16, Aurelien Jarno wrote:
> > Thanks for working on improving the io-thread with TCG. Your patches
> > make sense, but they don't seems to fix the slowdown observed when
> > enabling the io-thread. Well maybe they were not supposed to. This is
> > for example the results of netperf between guest and host using virtio:
> >
> > no io-thread 122 MB/s
> > io-thread 97 MB/s
> > io-thread + patches 98 MB/s
> >
>
> Can you capture ftraces of io-thread enabled& disabled runs? They just
> need to cover a hand full of frames.
>
Also interesting would be the context switch rates on the host.
If they're large, perhaps using user-space threading instead of native
threads would help.
--
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] [PATCH v2 0/3] io-thread optimizations
2011-04-14 13:45 ` Avi Kivity
@ 2011-04-14 13:59 ` Anthony Liguori
2011-04-14 14:05 ` Avi Kivity
0 siblings, 1 reply; 15+ messages in thread
From: Anthony Liguori @ 2011-04-14 13:59 UTC (permalink / raw)
To: Avi Kivity
Cc: Paolo Bonzini, Marcelo Tosatti, Jan Kiszka, qemu-devel,
Aurelien Jarno
On 04/14/2011 08:45 AM, Avi Kivity wrote:
> On 04/14/2011 10:14 AM, Jan Kiszka wrote:
>> On 2011-04-13 22:16, Aurelien Jarno wrote:
>> > Thanks for working on improving the io-thread with TCG. Your patches
>> > make sense, but they don't seems to fix the slowdown observed when
>> > enabling the io-thread. Well maybe they were not supposed to. This is
>> > for example the results of netperf between guest and host using
>> virtio:
>> >
>> > no io-thread 122 MB/s
>> > io-thread 97 MB/s
>> > io-thread + patches 98 MB/s
>> >
>>
>> Can you capture ftraces of io-thread enabled& disabled runs? They just
>> need to cover a hand full of frames.
>>
>
> Also interesting would be the context switch rates on the host.
>
> If they're large, perhaps using user-space threading instead of native
> threads would help.
I still suspect mitigation as the culprit here. Select is going to get
to run more often which means more interrupt generation.
I bet if you count the number of packets per interrupt/notify you'll
find that less batching is occurring.
Regards,
Anthony Liguori
* Re: [Qemu-devel] [PATCH v2 0/3] io-thread optimizations
2011-04-14 13:59 ` Anthony Liguori
@ 2011-04-14 14:05 ` Avi Kivity
0 siblings, 0 replies; 15+ messages in thread
From: Avi Kivity @ 2011-04-14 14:05 UTC (permalink / raw)
To: Anthony Liguori
Cc: Paolo Bonzini, Marcelo Tosatti, Jan Kiszka, qemu-devel,
Aurelien Jarno
On 04/14/2011 04:59 PM, Anthony Liguori wrote:
> On 04/14/2011 08:45 AM, Avi Kivity wrote:
>> On 04/14/2011 10:14 AM, Jan Kiszka wrote:
>>> On 2011-04-13 22:16, Aurelien Jarno wrote:
>>> > Thanks for working on improving the io-thread with TCG. Your patches
>>> > make sense, but they don't seems to fix the slowdown observed when
>>> > enabling the io-thread. Well maybe they were not supposed to.
>>> This is
>>> > for example the results of netperf between guest and host using
>>> virtio:
>>> >
>>> > no io-thread 122 MB/s
>>> > io-thread 97 MB/s
>>> > io-thread + patches 98 MB/s
>>> >
>>>
>>> Can you capture ftraces of io-thread enabled& disabled runs? They just
>>> need to cover a hand full of frames.
>>>
>>
>> Also interesting would be the context switch rates on the host.
>>
>> If they're large, perhaps using user-space threading instead of
>> native threads would help.
>
> I still suspect mitigation as the culprit here. Select is going to
> get to run more often which means more interrupt generation.
>
> I bet if you count the number of packets per interrupt/notify you'll
> find that less batching is occurring.
>
Can you clarify? Which mitigation? virtio-net interrupt mitigation?
virtio-net interrupt mitigation is time-based, no? So why should
threading affect it? And why would select() run more often? Since we
make all fds generate a signal, we ought to run a similar number of
select()s.
--
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] [PATCH v2 0/3] io-thread optimizations
2011-04-14 7:14 ` Jan Kiszka
2011-04-14 13:45 ` Avi Kivity
@ 2011-04-25 18:35 ` Aurelien Jarno
2011-04-26 7:36 ` Jan Kiszka
1 sibling, 1 reply; 15+ messages in thread
From: Aurelien Jarno @ 2011-04-25 18:35 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Paolo Bonzini, Anthony Liguori, Marcelo Tosatti, qemu-devel
On Thu, Apr 14, 2011 at 09:14:35AM +0200, Jan Kiszka wrote:
> On 2011-04-13 22:16, Aurelien Jarno wrote:
> > Thanks for working on improving the io-thread with TCG. Your patches
> > make sense, but they don't seems to fix the slowdown observed when
> > enabling the io-thread. Well maybe they were not supposed to. This is
> > for example the results of netperf between guest and host using virtio:
> >
> > no io-thread 122 MB/s
> > io-thread 97 MB/s
> > io-thread + patches 98 MB/s
> >
>
> Can you capture ftraces of io-thread enabled & disabled runs? They just
> need to cover a hand full of frames.
>
From what I have been able to gather from the ftrace documentation, it's
possible to use multiple tracers. Which tracers would you like to use there?
The best would be a set of command lines to run.
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH v2 0/3] io-thread optimizations
2011-04-25 18:35 ` Aurelien Jarno
@ 2011-04-26 7:36 ` Jan Kiszka
0 siblings, 0 replies; 15+ messages in thread
From: Jan Kiszka @ 2011-04-26 7:36 UTC (permalink / raw)
To: Aurelien Jarno
Cc: Paolo Bonzini, Anthony Liguori, Marcelo Tosatti, qemu-devel
On 2011-04-25 20:35, Aurelien Jarno wrote:
> On Thu, Apr 14, 2011 at 09:14:35AM +0200, Jan Kiszka wrote:
>> On 2011-04-13 22:16, Aurelien Jarno wrote:
>>> Thanks for working on improving the io-thread with TCG. Your patches
>>> make sense, but they don't seems to fix the slowdown observed when
>>> enabling the io-thread. Well maybe they were not supposed to. This is
>>> for example the results of netperf between guest and host using virtio:
>>>
>>> no io-thread 122 MB/s
>>> io-thread 97 MB/s
>>> io-thread + patches 98 MB/s
>>>
>>
>> Can you capture ftraces of io-thread enabled & disabled runs? They just
>> need to cover a hand full of frames.
>>
>
> From what I have been able to get from the ftraces documentation, it's
> possible multiple tracers. Which tracers would you like to use there?
> The best would be a set of command lines to run.
Sorry, of course: Just download, build & install trace-cmd [1], then
execute "trace-cmd record -b 16000 -e all" while qemu is running. The
result is written to trace.dat in the current directory. Visualize it
via "trace-cmd report" (or kernelshark if you built that as well).
Jan
[1] git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git
* Re: [Qemu-devel] [PATCH v2 0/3] io-thread optimizations
2011-04-13 20:16 ` Aurelien Jarno
2011-04-14 7:14 ` Jan Kiszka
@ 2011-06-25 8:38 ` Jan Kiszka
2011-06-25 22:44 ` Andreas Färber
1 sibling, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2011-06-25 8:38 UTC (permalink / raw)
To: Aurelien Jarno
Cc: Paolo Bonzini, Anthony Liguori, Marcelo Tosatti, qemu-devel
On 2011-04-13 22:16, Aurelien Jarno wrote:
> Thanks for working on improving the io-thread with TCG. Your patches
> make sense, but they don't seems to fix the slowdown observed when
> enabling the io-thread. Well maybe they were not supposed to. This is
> for example the results of netperf between guest and host using virtio:
>
> no io-thread 122 MB/s
> io-thread 97 MB/s
> io-thread + patches 98 MB/s
>
Given that everyone seems to agree that these patches are a step in
the right direction (for the current TCG locking architecture at least),
can we please finally apply them? They do have a positive impact in some
use cases. If rebasing is required (I don't think so), just let me know.
BTW, did you make any progress with tracing the remaining issues?
Thanks,
Jan
* Re: [Qemu-devel] [PATCH v2 0/3] io-thread optimizations
2011-06-25 8:38 ` Jan Kiszka
@ 2011-06-25 22:44 ` Andreas Färber
2011-06-26 9:11 ` Jan Kiszka
0 siblings, 1 reply; 15+ messages in thread
From: Andreas Färber @ 2011-06-25 22:44 UTC (permalink / raw)
To: Jan Kiszka
Cc: Anthony Liguori, Marcelo Tosatti, qemu-devel Developers,
Alexander Graf, Paolo Bonzini, Aurelien Jarno
Am 25.06.2011 um 10:38 schrieb Jan Kiszka:
> On 2011-04-13 22:16, Aurelien Jarno wrote:
>> Thanks for working on improving the io-thread with TCG. Your patches
>> make sense, but they don't seems to fix the slowdown observed when
>> enabling the io-thread. Well maybe they were not supposed to. This is
>> for example the results of netperf between guest and host using
>> virtio:
>>
>> no io-thread 122 MB/s
>> io-thread 97 MB/s
>> io-thread + patches 98 MB/s
>>
>
> Given that everyone seems to agree that these patches are a step into
> the right direction (for the current TCG locking architecture at
> least),
> can we please finally apply them? They do have positive impact in some
> use cases. If rebasing is required (I don't think so), just let me
> know.
>
> BTW, did you make any progress with tracing the remaining issues?
I've tested these together with Paolo's fixes, but it still hangs on
Darwin. Doesn't appear to make it worse though.
Andreas
* Re: [Qemu-devel] [PATCH v2 0/3] io-thread optimizations
2011-06-25 22:44 ` Andreas Färber
@ 2011-06-26 9:11 ` Jan Kiszka
0 siblings, 0 replies; 15+ messages in thread
From: Jan Kiszka @ 2011-06-26 9:11 UTC (permalink / raw)
To: Andreas Färber
Cc: Anthony Liguori, Marcelo Tosatti, qemu-devel Developers,
Alexander Graf, Paolo Bonzini, Aurelien Jarno
On 2011-06-26 00:44, Andreas Färber wrote:
> Am 25.06.2011 um 10:38 schrieb Jan Kiszka:
>
>> On 2011-04-13 22:16, Aurelien Jarno wrote:
>>> Thanks for working on improving the io-thread with TCG. Your patches
>>> make sense, but they don't seems to fix the slowdown observed when
>>> enabling the io-thread. Well maybe they were not supposed to. This is
>>> for example the results of netperf between guest and host using virtio:
>>>
>>> no io-thread 122 MB/s
>>> io-thread 97 MB/s
>>> io-thread + patches 98 MB/s
>>>
>>
>> Given that everyone seems to agree that these patches are a step into
>> the right direction (for the current TCG locking architecture at least),
>> can we please finally apply them? They do have positive impact in some
>> use cases. If rebasing is required (I don't think so), just let me know.
>>
>> BTW, did you make any progress with tracing the remaining issues?
>
> I've tested these together with Paolo's fixes, but it still hangs on
> Darwin. Doesn't appear to make it worse though.
http://thread.gmane.org/gmane.comp.emulators.qemu/106225 is still not
merged.
Jan