* [Qemu-devel] Re: request : qemu-smp as target
@ 2005-05-14 9:37 Blue Swirl
2005-05-14 11:31 ` Paul Brook
2005-05-14 12:16 ` Fabrice Bellard
0 siblings, 2 replies; 14+ messages in thread
From: Blue Swirl @ 2005-05-14 9:37 UTC (permalink / raw)
To: octane; +Cc: qemu-devel
Hi,
The architecture used in sparc target (sun4m) supports SMP up to a maximum
of 16 CPUs. At hardware emulation level (hw/*, target-sparc/*), it would be
easy to add the missing interprocessor interrupts, per-CPU counters and
atomic instructions. It would also be simple to add the prom functions for
starting/stopping CPUs to Proll. Maybe some days' work in total.
Higher level (vl.c, cpu-exec.c) could need more work. Maybe Fabrice can
enlighten us?
For some reason, Sparc performance is low (1/10 of native x86 nbench)
compared to x86 (2/3). Simulating SMP on a uniprocessor would only decrease
performance.
* Re: [Qemu-devel] Re: request : qemu-smp as target
2005-05-14 9:37 [Qemu-devel] Re: request : qemu-smp as target Blue Swirl
@ 2005-05-14 11:31 ` Paul Brook
2005-05-14 15:22 ` Blue Swirl
2005-05-14 12:16 ` Fabrice Bellard
1 sibling, 1 reply; 14+ messages in thread
From: Paul Brook @ 2005-05-14 11:31 UTC (permalink / raw)
To: qemu-devel; +Cc: Blue Swirl
On Saturday 14 May 2005 10:37, Blue Swirl wrote:
> Hi,
>
> The architecture used in sparc target (sun4m) supports SMP up to a maximum
> of 16 CPUs. At hardware emulation level (hw/*, target-sparc/*), it would be
> easy to add the missing interprocessor interrupts, per-CPU counters and
> atomic instructions. It would also be simple to add the prom functions for
> starting/stopping CPUs to Proll. Maybe some days' work in total.
>
> Higher level (vl.c, cpu-exec.c) could need more work. Maybe Fabrice can
> enlighten us?
I guess you'd really want to simulate multiple CPUs with multiple host
threads. One of the additional problems could then be memory/cache coherency.
I'm not sure how much of a problem this would be in practice. If both host
and guest require the same (or no) explicit SMP memory barrier it's not a
problem. If the guest has stronger coherency requirements than the host we
have a problem.
> For some reason, Sparc performance is low (1/10 of native x86 nbench)
> compared to x86 (2/3). Simulating SMP on a uniprocessor would only decrease
> performance.
I think x86-on-x86 user-mode uses code-copying by default, i.e. it runs a lot
of the code unmodified. In my experience i386-softmmu is generally 10-15x
slower than native, and arm-user is 5-10x slower.
Paul
* [Qemu-devel] Re: request : qemu-smp as target
2005-05-14 9:37 [Qemu-devel] Re: request : qemu-smp as target Blue Swirl
2005-05-14 11:31 ` Paul Brook
@ 2005-05-14 12:16 ` Fabrice Bellard
2005-05-14 13:11 ` Jonas Maebe
` (2 more replies)
1 sibling, 3 replies; 14+ messages in thread
From: Fabrice Bellard @ 2005-05-14 12:16 UTC (permalink / raw)
To: Blue Swirl; +Cc: qemu-devel
Blue Swirl wrote:
> Hi,
>
> The architecture used in sparc target (sun4m) supports SMP up to a
> maximum of 16 CPUs. At hardware emulation level (hw/*, target-sparc/*),
> it would be easy to add the missing interprocessor interrupts, per-CPU
> counters and atomic instructions. It would also be simple to add the
> prom functions for starting/stopping CPUs to Proll. Maybe some days'
> work in total.
>
> Higher level (vl.c, cpu-exec.c) could need more work. Maybe Fabrice can
> enlighten us?
SMP is definitely possible in QEMU - a few days of work are necessary
to add the missing generic support and an x86 implementation... but
currently I prefer to work on other topics.
Just for your information, some choices need to be made:
1) Do the CPUs share the same translation cache ?
2) The first implementation would use a cycle counter to schedule
between CPUs. Is it interesting to go further and to use a host thread
for each guest CPU at the expense of more locking overhead ?
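A minimal sketch of the cycle-counter option in point 2, assuming one host
thread that gives each guest CPU a fixed cycle budget in turn. This is
illustrative Python, not QEMU code: GuestCPU, run_slice, schedule and the
quantum value are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class GuestCPU:
    """Hypothetical stand-in for one emulated CPU."""
    index: int
    cycles: int = 0        # guest cycles executed so far
    halted: bool = False   # True while the guest CPU is idle

    def run_slice(self, budget: int) -> int:
        """Pretend to execute translated code for up to `budget` cycles."""
        if self.halted:
            return 0
        self.cycles += budget
        return budget


def schedule(cpus, quantum, total_cycles):
    """Round-robin over guest CPUs inside a single host thread.

    Each CPU runs for `quantum` guest cycles before the next one gets a
    turn, so the CPUs never run concurrently on the host.
    """
    executed = 0
    while executed < total_cycles:
        ran = 0
        for cpu in cpus:
            ran += cpu.run_slice(quantum)
        if ran == 0:       # every CPU is halted; nothing left to run
            break
        executed += ran
    return executed


cpus = [GuestCPU(0), GuestCPU(1, halted=True), GuestCPU(2)]
total = schedule(cpus, quantum=1000, total_cycles=10_000)
```

Because only one guest CPU runs at a time, shared emulator state needs no
host locks in this scheme; the price is that a multiprocessor host is not
exploited.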
Fabrice.
* Re: [Qemu-devel] Re: request : qemu-smp as target
2005-05-14 12:16 ` Fabrice Bellard
@ 2005-05-14 13:11 ` Jonas Maebe
2005-05-14 14:46 ` Blue Swirl
2005-05-14 16:55 ` Joe Batt
2 siblings, 0 replies; 14+ messages in thread
From: Jonas Maebe @ 2005-05-14 13:11 UTC (permalink / raw)
To: qemu-devel
On 14 May 2005, at 14:16, Fabrice Bellard wrote:
> 1) Do the CPUs share the same translation cache ?
>
> 2) The first implementation would use a cycle counter to schedule
> between CPUs. Is it interesting to go further and to use a host
> thread for each guest CPU at the expense of more locking overhead ?
These two choices are closely related: if you use a host thread for
each guest cpu, you almost need different translation caches, because
otherwise it becomes very difficult to flush the translation cache.
Jonas
* [Qemu-devel] Re: request : qemu-smp as target
2005-05-14 12:16 ` Fabrice Bellard
2005-05-14 13:11 ` Jonas Maebe
@ 2005-05-14 14:46 ` Blue Swirl
2005-05-14 16:55 ` Joe Batt
2 siblings, 0 replies; 14+ messages in thread
From: Blue Swirl @ 2005-05-14 14:46 UTC (permalink / raw)
To: fabrice; +Cc: qemu-devel
>SMP is definitely possible in QEMU - a few days of work are necessary to
>add the missing generic support and an x86 implementation... but currently
>I prefer to work on other topics.
>
>Just for your information, some choices need to be made:
>
>1) Do the CPUs share the same translation cache ?
This could be very useful, but wouldn't the cache need to be indexed by
physical addresses, not virtual?
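A rough sketch of why physical indexing matters for a shared cache, assuming
a toy one-level page table per CPU. All names here (SharedTranslationCache,
virt_to_phys, the page numbers) are made up for illustration and do not
correspond to QEMU's actual data structures.

```python
class SharedTranslationCache:
    """Hypothetical cache of translated blocks keyed by guest-physical PC."""

    def __init__(self):
        self.blocks = {}
        self.translations = 0   # how many times we had to translate

    def lookup(self, phys_pc):
        block = self.blocks.get(phys_pc)
        if block is None:
            self.translations += 1
            block = f"host code for guest code at {phys_pc:#x}"
            self.blocks[phys_pc] = block
        return block


def virt_to_phys(page_table, virt_pc, page_size=4096):
    """Translate a virtual PC through a toy {virtual page: physical page} map."""
    page, offset = divmod(virt_pc, page_size)
    return page_table[page] * page_size + offset


cache = SharedTranslationCache()
cpu0_mmu = {0x10: 0x80}   # CPU 0 maps virtual page 0x10 to physical page 0x80
cpu1_mmu = {0x20: 0x80}   # CPU 1 maps a different virtual page to the same one

block0 = cache.lookup(virt_to_phys(cpu0_mmu, 0x10 * 4096 + 0x40))
block1 = cache.lookup(virt_to_phys(cpu1_mmu, 0x20 * 4096 + 0x40))
```

With a virtually indexed cache the two lookups would use different keys, so
the same guest code would be translated twice, and an unrelated mapping on
another CPU could alias a stale block.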
>2) The first implementation would use a cycle counter to schedule between
>CPUs. Is it interesting to go further and to use a host thread for each
>guest CPU at the expense of more locking overhead ?
I'd skip the cycle counter at first iteration and schedule using a host
timer and CPU idling. What are the things that need locking, by the way?
* Re: [Qemu-devel] Re: request : qemu-smp as target
2005-05-14 11:31 ` Paul Brook
@ 2005-05-14 15:22 ` Blue Swirl
0 siblings, 0 replies; 14+ messages in thread
From: Blue Swirl @ 2005-05-14 15:22 UTC (permalink / raw)
To: paul, qemu-devel
>I guess you'd really want to simulate multiple CPUs with multiple host
>threads. One of the additional problems could then be memory/cache
>coherency.
>I'm not sure how much of a problem this would be in practice. If both host
>and guest require the same (or no) explicit SMP memory barrier it's not a
>problem. If the guest has stronger coherency requirements than the host we
>have a problem.
Sparc32 architecture requires flushes and atomic instructions in critical
code in Total Store Ordering mode. There is also a higher-performance mode
called Partial Store Ordering, which requires memory barriers, but I think
Linux doesn't enable it.
> > For some reason, Sparc performance is low (1/10 of native x86 nbench)
> > compared to x86 (2/3). Simulating SMP on a uniprocessor would only
>decrease
> > performance.
>
>I think x86-on-x86 user-mode uses code-copying by default, i.e. it runs a
>lot
>of the code unmodified. In my experience i386-softmmu is generally
>10-15x
>slower than native, and arm-user is 5-10x slower.
Good point. With -no-code-copy I get about 1/6 of native nbench in x86-x86,
comparable to Sparc figure.
* Re: [Qemu-devel] Re: request : qemu-smp as target
2005-05-14 12:16 ` Fabrice Bellard
2005-05-14 13:11 ` Jonas Maebe
2005-05-14 14:46 ` Blue Swirl
@ 2005-05-14 16:55 ` Joe Batt
2005-05-17 20:21 ` Paul Brook
2 siblings, 1 reply; 14+ messages in thread
From: Joe Batt @ 2005-05-14 16:55 UTC (permalink / raw)
To: qemu-devel
On Sat, 2005-05-14 at 14:16 +0200, Fabrice Bellard wrote:
...
> 2) The first implementation would use a cycle counter to schedule
> between CPUs. Is it interesting to go further and to use a host thread
> for each guest CPU at the expense of more locking overhead ?
What inter processor synchronization issues are there? Could you take
this a step further and use processes on different machines for each
processor? (There are many shared memory implementations to choose
from.) I have ignorantly implemented an SH2 emulator, but have zero
understanding of an SMP system. Are there so many resources shared
between the CPUs to make this a ridiculous proposition?
It could make for an interesting distributed single image system.
--
Joe Batt <Joe@soliddesign.net>
* [Qemu-devel] Re: request : qemu-smp as target
@ 2005-05-16 13:17 octane indice
0 siblings, 0 replies; 14+ messages in thread
From: octane indice @ 2005-05-16 13:17 UTC (permalink / raw)
To: fabrice, qemu-devel
Quoting Fabrice Bellard <fabrice@bellard.org> :
> SMP is definitely possible in QEMU - a few days of work are
> necessary to add the missing generic support and an x86
> implementation...
ok.
> but currently I prefer to work on other topics.
>
ok.
So, perhaps in the next releases?
* Re: [Qemu-devel] Re: request : qemu-smp as target
2005-05-14 16:55 ` Joe Batt
@ 2005-05-17 20:21 ` Paul Brook
2005-05-17 20:41 ` Joe Batt
2005-05-18 11:25 ` Mark Williamson
0 siblings, 2 replies; 14+ messages in thread
From: Paul Brook @ 2005-05-17 20:21 UTC (permalink / raw)
To: qemu-devel; +Cc: Joe Batt
> What inter processor synchronization issues are there? Could you take
> this a step further and use processes on different machines for each
> processor? (There are many shared memory implementations to choose
> from.) Are there so many resources shared
> between the CPUs to make this a ridiculous proposition?
Basically most SMP/shared memory systems assume very low latency communication
between CPUs and memory. For example on opteron systems remote memory latency
is of the order of 200 cpu cycles. Typical ethernet latency is several
million cycles.
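As a back-of-the-envelope check of that gap, here is a small calculation.
The 200-cycle figure is the one quoted above; the 2 GHz clock and the 1 ms
effective network round trip (including software overhead) are assumptions
picked only to make the arithmetic concrete.

```python
def latency_in_cycles(latency_seconds, clock_hz):
    """Convert a wall-clock latency into CPU cycles at a given clock rate."""
    return round(latency_seconds * clock_hz)


CLOCK_HZ = 2_000_000_000      # assumed 2 GHz CPU
REMOTE_MEMORY_CYCLES = 200    # remote memory latency quoted above
ETHERNET_LATENCY_S = 1e-3     # assumed 1 ms effective round trip

ethernet_cycles = latency_in_cycles(ETHERNET_LATENCY_S, CLOCK_HZ)
slowdown = ethernet_cycles / REMOTE_MEMORY_CYCLES

print(f"ethernet: {ethernet_cycles} cycles, {slowdown:.0f}x remote memory")
```

Under these assumptions one remote access over the network costs about as
much as ten thousand remote memory accesses on the SMP machine.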
To do a single-system image over a high-latency interconnect (e.g. ethernet) you
need OS and preferably also application support to avoid high-contention
memory areas. Simulating an SMP system over multiple separate nodes is
theoretically possible, but performance would probably be abysmal.
The only solution I can imagine being even vaguely worthwhile is running
user-mode qemu on top of a native openMosix system.
> I have ignorantly implemented an SH2 emulator,
Cool. Any chance you're going to make these changes publicly available?
Paul
* Re: [Qemu-devel] Re: request : qemu-smp as target
2005-05-17 20:21 ` Paul Brook
@ 2005-05-17 20:41 ` Joe Batt
2005-05-17 20:59 ` Paul Brook
2005-05-18 11:29 ` Mark Williamson
2005-05-18 11:25 ` Mark Williamson
1 sibling, 2 replies; 14+ messages in thread
From: Joe Batt @ 2005-05-17 20:41 UTC (permalink / raw)
To: Paul Brook; +Cc: qemu-devel
On Tue, 2005-05-17 at 21:21 +0100, Paul Brook wrote:
> > What inter processor synchronization issues are there? Could you take
> > this a step further and use processes on different machines for each
> > processor? (There are many shared memory implementations to choose
> > from.) Are there so many resources shared
> > between the CPUs to make this a ridiculous proposition?
>
> Basically most SMP/shared memory systems assume very low latency communication
> between CPUs and memory. For example on opteron systems remote memory latency
> is of the order of 200 cpu cycles. Typical ethernet latency is several
> million cycles.
But how often will the virtual CPUs need the same page and is there any
other shared resource other than memory? I don't know how independent
each CPU is. Though in side discussions, everyone agrees with you, I
haven't seen numbers to convince my gut. If a page only needs to be
faulted back and forth every couple million cycles, then it might work.
> The only solution I can imagine being even vaguely worthwhile is running
> user-mode qemu on top of a native openMosix system.
OpenMosix is very interesting, but is a pain to set up. How about this:
ssh -f host1 qemu -cpu-server $KEY
ssh -f host2 qemu -cpu-server $KEY
qemu -cpu-client host1:$KEY \
-cpu-client host2:$KEY \
-hda server.image
> > I have ignorantly implemented an SH2 emulator,
>
> Cool. Any chance you're going to make these changes publicly available?
It was a Java implementation for a customer. Not my property and not
integrated with any free software.
> Paul
--
Joe Batt <Joe@soliddesign.net>
* Re: [Qemu-devel] Re: request : qemu-smp as target
2005-05-17 20:41 ` Joe Batt
@ 2005-05-17 20:59 ` Paul Brook
2005-05-18 11:29 ` Mark Williamson
1 sibling, 0 replies; 14+ messages in thread
From: Paul Brook @ 2005-05-17 20:59 UTC (permalink / raw)
To: Joe Batt; +Cc: qemu-devel
On Tuesday 17 May 2005 21:41, Joe Batt wrote:
> On Tue, 2005-05-17 at 21:21 +0100, Paul Brook wrote:
> > > What inter processor synchronization issues are there? Could you take
> > > this a step further and use processes on different machines for each
> > > processor? (There are many shared memory implementations to choose
> > > from.) Are there so many resources shared
> > > between the CPUs to make this a ridiculous proposition?
> >
> > Basically most SMP/shared memory systems assume very low latency
> > communication between CPUs and memory. For example on opteron systems
> > remote memory latency is of the order of 200 cpu cycles. Typical ethernet
> > latency is several million cycles.
>
> But how often will the virtual CPUs need the same page and is there any
> other shared resource other than memory? I don't know how independent
> each CPU is. Though in side discussions, everyone agrees with you, I
> haven't seen numbers to convince my gut. If a page only needs to be
> faulted back and forth every couple million cycles, then it might work.
Everything but the cpu (and possibly the APIC) is shared. This is why big SMP
systems cost orders of magnitude more than a similar size cluster.
One of the biggest sources of contention is probably going to be kernel
spinlocks. Every time the kernel needs to acquire a spinlock that page will
need to be faulted across to that CPU.
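A toy model of that ping-pong effect, assuming a single-owner page protocol
and a made-up transfer cost of 2,000,000 cycles per page fault across the
network. DistributedPage and contended_lock_cost are hypothetical names, and
real distributed shared memory systems are considerably more elaborate.

```python
class DistributedPage:
    """Toy page that lives on exactly one node at a time."""

    def __init__(self, owner):
        self.owner = owner
        self.transfers = 0

    def access(self, node):
        """Fault the page over to `node` if another node currently owns it."""
        if node != self.owner:
            self.transfers += 1
            self.owner = node


def contended_lock_cost(acquirers, transfer_cycles):
    """Cycles spent moving the spinlock's page for a sequence of acquirers."""
    page = DistributedPage(owner=acquirers[0])
    for node in acquirers:
        page.access(node)
    return page.transfers * transfer_cycles


TRANSFER_CYCLES = 2_000_000          # assumed cost of one remote page fault

alternating = [0, 1] * 50            # two nodes take the lock in turn
local_only = [0] * 100               # one node takes the lock 100 times

cost_alternating = contended_lock_cost(alternating, TRANSFER_CYCLES)
cost_local = contended_lock_cost(local_only, TRANSFER_CYCLES)
```

In this model 100 alternating acquisitions trigger 99 page transfers, while
the uncontended case triggers none, which is the difference between
application-private data and a hot kernel lock.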
> > Cool. Any chance you're going to make these changes publicly available?
>
> It was a Java implementation for a customer. Not my property and not
> integrated with any free software.
Oh! I thought you'd implemented SH support for qemu :-)
Paul
* Re: [Qemu-devel] Re: request : qemu-smp as target
2005-05-17 20:21 ` Paul Brook
2005-05-17 20:41 ` Joe Batt
@ 2005-05-18 11:25 ` Mark Williamson
1 sibling, 0 replies; 14+ messages in thread
From: Mark Williamson @ 2005-05-18 11:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Paul Brook, Joe Batt
> The only solution I can imagine being even vaguely worthwhile is running
> user-mode qemu on top of a native openMosix system.
Probably if you want to run a distributed SMP-style system using QEmu, the most
effective approach is going to be running OpenMosix *in* QEmu, on multiple
hosts.
Sadly, this is much less transparent than QEmu / the host doing the
distribution for you :-( but I can probably think of situations where it
might be useful... (how about running a virtual OM cluster on a bunch of
mostly-idle Windows boxes?)
Cheers,
Mark
>
> > I have ignorantly implemented an SH2 emulator,
>
> Cool. Any chance you're going to make these changes publicly available?
>
> Paul
* Re: [Qemu-devel] Re: request : qemu-smp as target
2005-05-17 20:41 ` Joe Batt
2005-05-17 20:59 ` Paul Brook
@ 2005-05-18 11:29 ` Mark Williamson
2005-05-18 21:19 ` Re[2]: " Igor Shmukler
1 sibling, 1 reply; 14+ messages in thread
From: Mark Williamson @ 2005-05-18 11:29 UTC (permalink / raw)
To: qemu-devel; +Cc: Joe Batt, Paul Brook
> But how often will the virtual CPUs need the same page and is there any
> other shared resource other than memory? I don't know how independent
> each CPU is. Though in side discussions, everyone agrees with you, I
> haven't seen numbers to convince my gut. If a page only needs to be
> faulted back and forth every couple million cycles, then it might work.
In the applications, probably very independent. In the kernel, highly
dependent: different CPUs may access shared data structures *and* protect
them with spinlocks. As Paul said in a separate mail, spinlocks are going to
be way more expensive in this sort of distributed environment.
All that being said, a company called "Virtual Iron" has got a
fully-virtualising solution that presents an SMP to the guest OS but actually
distributes computation across a cluster. I have yet to see the product
itself - no idea when it'll be released. It also sounds *really* difficult
to make go fast but at least suggests this sort of thing can perform
reasonably for some workloads.
Cheers,
Mark
> > The only solution I can imagine being even vaguely worthwhile is
> > running user-mode qemu on top of a native openMosix system.
>
> OpenMosix is very interesting, but is a pain to set up. How about this:
>
> ssh -f host1 qemu -cpu-server $KEY
> ssh -f host2 qemu -cpu-server $KEY
> qemu -cpu-client host1:$KEY \
> -cpu-client host2:$KEY \
> -hda server.image
>
> > > I have ignorantly implemented an SH2 emulator,
> >
> > Cool. Any chance you're going to make these changes publicly available?
>
> It was a Java implementation for a customer. Not my property and not
> integrated with any free software.
>
> > Paul
* Re[2]: [Qemu-devel] Re: request : qemu-smp as target
2005-05-18 11:29 ` Mark Williamson
@ 2005-05-18 21:19 ` Igor Shmukler
0 siblings, 0 replies; 14+ messages in thread
From: Igor Shmukler @ 2005-05-18 21:19 UTC (permalink / raw)
To: qemu-devel; +Cc: Joe Batt, Paul Brook
Hi,
I mostly agree with everything said, but I'd like to add some thoughts to the pot.
I think it's important to understand that a product that converts clusters to
virtual MP could be designed with different requirements in mind.
We are working on a research project that represents a cluster as a NUMA machine. This
is enough for a NUMA-aware OS and performance is not bad.
If we were to make a regular SMP, it would probably not work as well.
I am not a Mosix guy, but I would think that installing QEMU on OM will solve
nothing. It will only shift problems to be addressed at a different layer. Maybe I
am wrong.
Sincerely,
Igor.
> In the applications, probably very independent. In the kernel, highly
> dependent: different CPUs may access shared data structures *and* protect
> them with spinlocks. As Paul said in a separate mail, spinlocks are going to
> be way more expensive in this sort of distributed environment.
>
> All that being said, a company called "Virtual Iron" has got a
> fully-virtualising solution that presents an SMP to the guest OS but actually
> distributes computation across a cluster. I have yet to see the product
> itself - no idea when it'll be released. It also sounds *really* difficult
> to make go fast but at least suggests this sort of thing can perform
> reasonably for some workloads.
>
> Cheers,
> Mark