* problems running many guests
@ 2008-05-01 23:00 Karl Rister
2008-05-02 0:16 ` Marcelo Tosatti
2008-05-04 8:41 ` Avi Kivity
0 siblings, 2 replies; 7+ messages in thread
From: Karl Rister @ 2008-05-01 23:00 UTC (permalink / raw)
To: kvm-devel
Hi
I have been trying to do some testing of a large number of guests (72) on a
big multi-node IBM box (8 sockets, 32 cores, 128GB) and I am having various
issues with the guests. I can get the guests to boot, but then I start to
have problems. Some guests appear to stall doing I/O and some become
unresponsive and spin their single vcpu at 100%.
Each guest is configured with 1 vcpu and 1000MB of memory. The single virtual
disk is backed by an LVM volume. Both the guest and host are running custom
kernels.
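As a quick sanity check on the configuration described above, the arithmetic below suggests that memory is not overcommitted (72 guests at 1000MB each against 128GB) while the CPUs are (72 vcpus on 32 cores). The function names are made up for illustration, and 128GB is assumed to mean 128 * 1024 MB.

```shell
#!/bin/sh
# Hypothetical helpers; the numbers come from the report above.

# Total guest memory in MB: guest count times per-guest size.
total_guest_mem_mb() {
    echo $(( $1 * $2 ))
}

# vcpu-to-core ratio as a percentage (integer arithmetic only).
vcpu_overcommit_pct() {
    echo $(( $1 * 100 / $2 ))
}

total_guest_mem_mb 72 1000     # guest memory in MB
echo $(( 128 * 1024 ))         # host memory in MB, assuming 1GB = 1024MB
vcpu_overcommit_pct 72 32      # vcpus as a percentage of host cores
```

Under these assumptions the guests need well under the host's memory, so CPU and timekeeping pressure look like more plausible suspects than swapping.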
I have tried kvm-67, kvm-64, and kvm-62 (not functional at all). I have
cloned both the kvm and kvm-userspace repositories and am building the tagged
changesets from each.
Here are a few of the various things I have tried: virtio and emulated devices
for the nic and disk; mixed virtio and emulated devices; kvm-clock and
clock=jiffies.
Any help in pinpointing the problem would be appreciated.
Thanks.
--
Karl Rister
IBM Linux Performance Team
kmr@us.ibm.com
(512) 838-1553 (t/l 678)
-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
Don't miss this year's exciting event. There's still time to save $100.
Use priority code J8TL2D2.
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: problems running many guests
2008-05-01 23:00 problems running many guests Karl Rister
@ 2008-05-02 0:16 ` Marcelo Tosatti
2008-05-02 19:19 ` Karl Rister
2008-05-06 1:40 ` Karl Rister
2008-05-04 8:41 ` Avi Kivity
1 sibling, 2 replies; 7+ messages in thread
From: Marcelo Tosatti @ 2008-05-02 0:16 UTC (permalink / raw)
To: Karl Rister; +Cc: kvm-devel
On Thu, May 01, 2008 at 06:00:44PM -0500, Karl Rister wrote:
> Hi
>
> I have been trying to do some testing of a large number of guests (72) on a
> big multi-node IBM box (8 sockets, 32 cores, 128GB) and I am having various
> issues with the guests. I can get the guests to boot, but then I start to
> have problems. Some guests appear to stall doing I/O and some become
> unresponsive and spin their single vcpu at 100%.
Does -no-kvm-irqchip or -no-kvm-pit make a difference? If not, please
grab kvm_stat --once output when that happens.
Also run "readprofile -r ; readprofile -m System-map-of-guest.map" with the
host booted with "profile=kvm". Make sure all guests are running the same kernel
image.
The profiling should be easier to understand if you have 1 guest spinning and
remaining ones idle.
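Since readprofile only yields guest samples when the host was booted with "profile=kvm", it may be worth confirming the boot parameter first. The helper below is a minimal sketch with a made-up name; it only matches the exact form given in this thread and takes the command line as an argument.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a kernel command line contains the
# exact parameter profile=kvm (other spellings are not handled).
has_kvm_profiling() {
    case " $1 " in
        *" profile=kvm "*) return 0 ;;
        *) return 1 ;;
    esac
}

# On a real host the command line could be read from /proc/cmdline:
#   has_kvm_profiling "$(cat /proc/cmdline)" && echo "profiling enabled"
```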
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: problems running many guests
2008-05-02 0:16 ` Marcelo Tosatti
@ 2008-05-02 19:19 ` Karl Rister
2008-05-06 1:40 ` Karl Rister
1 sibling, 0 replies; 7+ messages in thread
From: Karl Rister @ 2008-05-02 19:19 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm-devel
On Thursday 01 May 2008 7:16:53 pm Marcelo Tosatti wrote:
> On Thu, May 01, 2008 at 06:00:44PM -0500, Karl Rister wrote:
> > Hi
> >
> > I have been trying to do some testing of a large number of guests (72) on
> > a big multi-node IBM box (8 sockets, 32 cores, 128GB) and I am having
> > various issues with the guests. I can get the guests to boot, but then I
> > start to have problems. Some guests appear to stall doing I/O and some
> > become unresponsive and spin their single vcpu at 100%.
>
> Does -no-kvm-irqchip or -no-kvm-pit make a difference? If not, please
> grab kvm_stat --once output when that happens.
I have tried -no-kvm-irqchip and it didn't help any. I will try -no-kvm-pit
and get the kvm_stat info for both.
>
> Also run "readprofile -r ; readprofile -m System-map-of-guest.map" with the
> host booted with "profile=kvm". Make sure all guests are running the same
> kernel image.
Will do.
>
> The profiling should be easier to understand if you have 1 guest spinning
> and remaining ones idle.
--
Karl Rister
IBM Linux Performance Team
kmr@us.ibm.com
(512) 838-1553 (t/l 678)
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: problems running many guests
2008-05-01 23:00 problems running many guests Karl Rister
2008-05-02 0:16 ` Marcelo Tosatti
@ 2008-05-04 8:41 ` Avi Kivity
1 sibling, 0 replies; 7+ messages in thread
From: Avi Kivity @ 2008-05-04 8:41 UTC (permalink / raw)
To: Karl Rister; +Cc: kvm-devel
Karl Rister wrote:
> Hi
>
> I have been trying to do some testing of a large number of guests (72) on a
> big multi-node IBM box (8 sockets, 32 cores, 128GB) and I am having various
> issues with the guests. I can get the guests to boot, but then I start to
> have problems. Some guests appear to stall doing I/O and some become
> unresponsive and spin their single vcpu at 100%.
>
One of the problems with these large boxes is that their TSCs are not
synced across sockets; you may be hitting related issues. Can you try
configuring the guests not to use the tsc?
Also, if you are running on an old host kernel, you won't have
smp_call_function_single() and there will be many broadcast IPIs.
Please use a recent host kernel (kvm.git is best, though a bit bleeding
edge).
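One way to see whether a guest is currently using the TSC is to read its active clocksource from sysfs. The sketch below assumes the usual sysfs path on a Linux guest and uses a made-up function name. Moving a guest off the TSC is typically done with a boot parameter such as clocksource=, whose accepted values depend on the guest kernel.

```shell
#!/bin/sh
# Hypothetical helper: report whether the active clocksource is the TSC.
# The default path is the usual sysfs location inside a Linux guest; it is
# a parameter so the check can be tried against any file.
uses_tsc() {
    src_file=${1:-/sys/devices/system/clocksource/clocksource0/current_clocksource}
    [ "$(cat "$src_file" 2>/dev/null)" = "tsc" ]
}

# Usage inside a guest:
#   uses_tsc && echo "guest clocksource is tsc"
```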
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: problems running many guests
2008-05-02 0:16 ` Marcelo Tosatti
2008-05-02 19:19 ` Karl Rister
@ 2008-05-06 1:40 ` Karl Rister
2008-05-06 16:16 ` Marcelo Tosatti
2008-05-07 7:56 ` Avi Kivity
1 sibling, 2 replies; 7+ messages in thread
From: Karl Rister @ 2008-05-06 1:40 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm-devel
On Thursday 01 May 2008 7:16:53 pm Marcelo Tosatti wrote:
> Does -no-kvm-irqchip or -no-kvm-pit make a difference? If not, please
> grab kvm_stat --once output when that happens.
Per some suggestions I have moved up to kvm-68, which is better, but I am still
having problems. Replicating the problem with only one guest spinning has
proven quite difficult, but attempting to boot a large smp guest can reliably
recreate the problem. Using -no-kvm-pit did not help the large guest
and -no-kvm-irqchip made it seize up even earlier with only 1 cpu spinning
instead of all of them.
>
> Also run "readprofile -r ; readprofile -m System-map-of-guest.map" with the
> host booted with "profile=kvm". Make sure all guests are running the same
> kernel image.
I got this from a spinning 16-way guest with only 8 of the host CPUs online
and without either -no-kvm-irqchip or -no-kvm-pit:
[root@newcastle ~]# readprofile -r ; readprofile -m karl/System.map-2.6.25-03591-g873c05f
   101 native_read_tsc         3.4828
     1 read_persistent_clock   0.0192
    25 kvm_clock_read          0.2660
    95 getnstimeofday          0.7252
    13 update_wall_time        0.0138
     1 second_overflow         0.0020
readprofile: profile address out of range. Wrong map file?
The kvm_stat output during this is:
[root@newcastle ~]# kvm_stat --once
efer_reload              23354      0
exits                  3587109   2250
fpu_reload             1934298      0
halt_exits                4583      0
halt_wakeup                 42      0
host_state_reload      2165502    167
hypercalls                1482      0
insn_emulation          900199      0
insn_emulation_fail          0      0
invlpg                       0      0
io_exits               1983116      0
irq_exits               427728   2250
irq_window                   0      0
largepages                   0      0
mmio_exits              163522      0
mmu_cache_miss             176      0
mmu_flooded                 99      0
mmu_pde_zapped             191      0
mmu_pte_updated             10      0
mmu_pte_write            59030      0
mmu_recycled                 0      0
mmu_shadow_zapped           99      0
pf_fixed                 14890      0
pf_guest                     0      0
remote_tlb_flush            29      0
request_irq                  0      0
signal_exits                 1      0
tlb_flush               481952      0
The output with -no-kvm-pit looked almost identical and with -no-kvm-pit there
were no samples registered for either tool.
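To pick out which counters are still moving in the kvm_stat output above, a small filter can help. This sketch assumes each row has three whitespace-separated fields (name, running total, recent change); the function name is made up. On the sample rows it leaves only exits, host_state_reload, and irq_exits, with exits and irq_exits changing at the same rate.

```shell
#!/bin/sh
# Hypothetical filter: print the name and last field of every counter
# whose last field is nonzero.
# Assumes three whitespace-separated fields: name, total, recent change.
active_counters() {
    awk 'NF == 3 && $3 + 0 > 0 { print $1, $3 }'
}

# Sample rows copied from the output above.
active_counters <<'EOF'
exits 3587109 2250
halt_exits 4583 0
host_state_reload 2165502 167
io_exits 1983116 0
irq_exits 427728 2250
EOF
```

With live data, the same filter could be fed directly: kvm_stat --once | active_counters.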
--
Karl Rister
IBM Linux Performance Team
kmr@us.ibm.com
(512) 838-1553 (t/l 678)
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: problems running many guests
2008-05-06 1:40 ` Karl Rister
@ 2008-05-06 16:16 ` Marcelo Tosatti
2008-05-07 7:56 ` Avi Kivity
1 sibling, 0 replies; 7+ messages in thread
From: Marcelo Tosatti @ 2008-05-06 16:16 UTC (permalink / raw)
To: Karl Rister; +Cc: kvm-devel
Hi Karl,
On Mon, May 05, 2008 at 08:40:22PM -0500, Karl Rister wrote:
> On Thursday 01 May 2008 7:16:53 pm Marcelo Tosatti wrote:
> > Does -no-kvm-irqchip or -no-kvm-pit make a difference? If not, please
> > grab kvm_stat --once output when that happens.
>
> Per some suggestions I have moved up to kvm-68, which is better, but I am still
> having problems. Replicating the problem with only one guest spinning has
> proven quite difficult, but attempting to boot a large smp guest can reliably
> recreate the problem. Using -no-kvm-pit did not help the large guest
> and -no-kvm-irqchip made it seize up even earlier with only 1 cpu spinning
> instead of all of them.
>
> >
> > Also run "readprofile -r ; readprofile -m System-map-of-guest.map" with the
> > host booted with "profile=kvm". Make sure all guests are running the same
> > kernel image.
>
> I got this from a spinning 16-way guest with only 8 of the host CPUs online
> and without either -no-kvm-irqchip or -no-kvm-pit:
>
> [root@newcastle ~]# readprofile -r ; readprofile -m karl/System.map-2.6.25-03591-g873c05f
>    101 native_read_tsc         3.4828
>      1 read_persistent_clock   0.0192
>     25 kvm_clock_read          0.2660
>     95 getnstimeofday          0.7252
>     13 update_wall_time        0.0138
>      1 second_overflow         0.0020
> readprofile: profile address out of range. Wrong map file?
KVM clock has known problems with SMP guests, please disable it for now.
Also disable LOCKDEP on the guest if it has more VCPUs than CPUs
available in the host.
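To see whether a guest kernel was built with the features mentioned above, its config file can be checked. The sketch below uses a made-up function name and assumes the options appear as CONFIG_LOCKDEP and CONFIG_KVM_CLOCK in the guest's config; the exact names should be verified against the guest kernel tree.

```shell
#!/bin/sh
# Hypothetical helper: list which of the given options are set to y in a
# kernel config file. Option names are passed without the CONFIG_ prefix.
enabled_options() {
    cfg=$1
    shift
    for opt in "$@"; do
        if grep -q "^CONFIG_${opt}=y\$" "$cfg"; then
            echo "$opt"
        fi
    done
}

# Example against a guest's config file (the path is an assumption):
#   enabled_options /boot/config-"$(uname -r)" LOCKDEP KVM_CLOCK
```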
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: problems running many guests
2008-05-06 1:40 ` Karl Rister
2008-05-06 16:16 ` Marcelo Tosatti
@ 2008-05-07 7:56 ` Avi Kivity
1 sibling, 0 replies; 7+ messages in thread
From: Avi Kivity @ 2008-05-07 7:56 UTC (permalink / raw)
To: Karl Rister; +Cc: kvm-devel, Marcelo Tosatti
Karl Rister wrote:
> On Thursday 01 May 2008 7:16:53 pm Marcelo Tosatti wrote:
>
>> Does -no-kvm-irqchip or -no-kvm-pit makes a difference? If not, please
>> grab kvm_stat --once output when that happens.
>>
>
> Per some suggestions I have moved up to kvm-68 which is better, but still
> having problems. Replicating the problem with only one guest spinning has
> proven quite difficult, but attempting to boot a large smp guest can reliably
> recreate the problem. Using -no-kvm-pit did not help the large guest
> and -no-kvm-irqchip made it seize up even earlier with only 1 cpu spinning
> instead of all of them.
>
>
Can you try the many-uniprocessor-guests scenario, with each guest
pinned to a cpu?
taskset $(( 1 << (RANDOM % 32) )) qemu ...
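One caveat when trying this: taskset reads a bare mask argument as hexadecimal, so the decimal result of shell arithmetic may select different CPUs than intended (16, for example, would be read as 0x16). A minimal sketch of two ways around that follows; the helper name is made up, and the qemu arguments stay elided as in the line above.

```shell
#!/bin/sh
# Hypothetical helper: hexadecimal affinity mask for a single CPU index.
cpu_mask_hex() {
    printf '%x\n' $(( 1 << $1 ))
}

# Usage sketch:
#   taskset "$(cpu_mask_hex 5)" qemu ...    # mask form
#   taskset -c 5 qemu ...                   # CPU-list form, same CPU
```

The -c form avoids mask arithmetic altogether and is probably the easier one to read in a launch script.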
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2008-05-07 7:56 UTC | newest]
Thread overview: 7+ messages
2008-05-01 23:00 problems running many guests Karl Rister
2008-05-02 0:16 ` Marcelo Tosatti
2008-05-02 19:19 ` Karl Rister
2008-05-06 1:40 ` Karl Rister
2008-05-06 16:16 ` Marcelo Tosatti
2008-05-07 7:56 ` Avi Kivity
2008-05-04 8:41 ` Avi Kivity