From: "David S. Ahern" <daahern@cisco.com>
To: Marcelo Tosatti <mtosatti@redhat.com>, Avi Kivity <avi@qumranet.com>
Cc: kvm-devel <kvm-devel@lists.sourceforge.net>
Subject: Re: performance with guests running 2.4 kernels (specifically RHEL3)
Date: Tue, 29 Apr 2008 22:18:20 -0600
Message-ID: <4817F30C.6050308@cisco.com>
In-Reply-To: <4816617F.3080403@cisco.com>
Another tidbit for you guys as I make my way through various permutations:
I installed the RHEL3 hugemem kernel and the guest behavior is *much* better.
System time still shows some regular hiccups that are higher than under xen and
esx (e.g., one 1-minute sample out of every 5 shows system time between 10 and
15%), but overall guest behavior is good with the hugemem kernel.
One side effect I've noticed is that I cannot restart the RHEL3 guest with the
hugemem kernel in back-to-back attempts. The guest has 2 vcpus, and qemu shows
one thread at 100% cpu. If I recall correctly, kvm_stat shows a huge number of
tlb_flushes (millions in a 5-second sample). The scenario is:
1. start guest running hugemem kernel,
2. shutdown,
3. restart guest.
During step 3 the guest hangs, at random points. Removing the kvm/kvm-intel
modules has no effect - the guest still hangs on the restart. Rebooting the host
clears the problem.
Alternatively, during the hang on a restart I can kill the guest, and then on
restart choose the normal, 32-bit smp kernel and the guest boots just fine. At
this point I can shutdown the guest and restart with the hugemem kernel and it
boots just fine.
david
David S. Ahern wrote:
> Hi Marcelo:
>
> mmu_recycled is always 0 for this guest -- even after almost 4 hours of uptime.
>
> Here is a kvm_stat sample where guest system time was very high and qemu had 2
> threads at 100% cpu on the host. For brevity I removed counters where both
> columns are 0.
>
> exits 45937979 758051
> fpu_reload 1416831 87
> halt_exits 112911 0
> halt_wakeup 31771 0
> host_state_reload 2068602 263
> insn_emulation 21601480 365493
> io_exits 1827374 2705
> irq_exits 8934818 285196
> mmio_exits 421674 147
> mmu_cache_miss 4817689 93680
> mmu_flooded 4815273 93680
> mmu_pde_zapped 51344 0
> mmu_prefetch 4817625 93680
> mmu_pte_updated 14803298 270104
> mmu_pte_write 19859863 363785
> mmu_shadow_zapped 4832106 93679
> pf_fixed 32184355 468398
> pf_guest 264138 0
> remote_tlb_flush 10697762 280522
> tlb_flush 10301338 176424
>
> (NOTE: This is for a *5* second sample interval instead of 1 to allow me to
> capture the data).
>
> Here's a sample when the guest is "well-behaved" (system time <10%):
> exits 51502194 97453
> fpu_reload 1421736 227
> halt_exits 138361 1927
> halt_wakeup 33047 117
> host_state_reload 2110190 3740
> insn_emulation 24367441 47260
> io_exits 1874075 2576
> irq_exits 10224702 13333
> mmio_exits 435154 1726
> mmu_cache_miss 5414097 11258
> mmu_flooded 5411548 11243
> mmu_pde_zapped 52851 44
> mmu_prefetch 5414031 11258
> mmu_pte_updated 16854686 29901
> mmu_pte_write 22526765 42285
> mmu_shadow_zapped 5430025 11313
> pf_fixed 36144578 67666
> pf_guest 282794 430
> remote_tlb_flush 12126268 14619
> tlb_flush 11753162 21460
>
>
> There is definitely a strong correlation between the mmu counters and high
> system times in the guest. I am still trying to find out what in the guest is
> stimulating it when running on RHEL3; I do not see this same behavior for an
> equivalent setup running on RHEL4.
>
> By the way, I added an mmu_prefetch stat in prefetch_page() to count the number
> of times the for() loop is hit with PTTYPE == 64, i.e., the number of times
> paging64_prefetch_page() is invoked. (I wanted an explicit counter for this
> loop, though the info seems to duplicate other entries.) That counter is listed
> above. As I mentioned in a prior post when kscand kicks in the change in
> mmu_prefetch counter is at 20,000+/sec, with each trip through that function
> taking 45k+ cycles.
>
> kscand is an instigator shortly after boot; however, it is *not* the culprit
> once the system has been up for 30-45 minutes. I have started instrumenting the
> RHEL3U8 kernel and for the load I am running kscand does not walk the active
> lists very often once the system is up.
>
> So, to dig deeper on what in the guest is stimulating the mmu I collected
> kvmtrace data for about a 2 minute time interval which caught about a 30-second
> period where guest system time was steady in the 25-30% range. Summarizing the
> number of times a RIP appears in a VMEXIT shows the following high runners:
>
> count RIP RHEL3-symbol
> 82549 0xc0140e42 follow_page [kernel] c0140d90 offset b2
> 42532 0xc0144760 handle_mm_fault [kernel] c01446d0 offset 90
> 36826 0xc013da4a futex_wait [kernel] c013d870 offset 1da
> 29987 0xc0145cd0 zap_pte_range [kernel] c0145c10 offset c0
> 27451 0xc0144018 do_no_page [kernel] c0143e20 offset 1f8
>
> (the halt entry was removed from the list since that is the ideal scenario for an exit).
>
> So the RIP correlates to follow_page() for a large percentage of the VMEXITs.
>
> I wrote an awk script to summarize (histogram style) the TSC cycles between
> VMEXIT and VMENTRY for an address. For the first RIP, 0xc0140e42, 82,271 times
> (i.e., almost 100% of the time) the trace shows a delta between 50k and 100k
> cycles between the VMEXIT and the subsequent VMENTRY. Similarly, for the second
> one, 0xc0144760, 42,403 times (again almost 100% of the occurrences) the trace
> shows a delta between 50k and 100k cycles between VMEXIT and VMENTRY. These
> deltas seem to correlate with the prefetch_page function in kvm, though I am
> not 100% positive on that.
>
> I am now investigating the kernel paths leading to those functions. Any insights
> would definitely be appreciated.
>
> david
>
>
> Marcelo Tosatti wrote:
>> On Fri, Apr 25, 2008 at 11:33:18AM -0600, David S. Ahern wrote:
>>> Most of the cycles (~80% of that 54k+) are spent in paging64_prefetch_page():
>>>
>>>     for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
>>>             gpa_t pte_gpa = gfn_to_gpa(sp->gfn);
>>>             pte_gpa += (i + offset) * sizeof(pt_element_t);
>>>
>>>             r = kvm_read_guest_atomic(vcpu->kvm, pte_gpa, &pt,
>>>                                       sizeof(pt_element_t));
>>>             if (r || is_present_pte(pt))
>>>                     sp->spt[i] = shadow_trap_nonpresent_pte;
>>>             else
>>>                     sp->spt[i] = shadow_notrap_nonpresent_pte;
>>>     }
>>>
>>> This loop is run 512 times and takes a total of ~45k cycles, or ~88 cycles per
>>> loop.
>>>
>>> This function gets run >20,000/sec during some of the kscand loops.
>> Hi David,
>>
>> Do you see the mmu_recycled counter increase?
>>
>