From: Peter Maloney <peter.maloney@brockmann-consult.de>
To: Andres Lagar-Cavilla <andreslc@gridcentric.ca>
Cc: Tim Deegan <tim@xen.org>,
Andres Lagar-Cavilla <andres@lagarcavilla.org>,
xen-devel@lists.xen.org
Subject: Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
Date: Tue, 13 Nov 2012 14:17:01 +0100 [thread overview]
Message-ID: <50A2484D.309@brockmann-consult.de> (raw)
In-Reply-To: <E9577EEC-5F2E-4DFB-913F-C470D77CCF5C@gridcentric.ca>
On 2012-11-01 18:28, Andres Lagar-Cavilla wrote:
> On Nov 1, 2012, at 1:00 PM, Tim Deegan <tim@xen.org> wrote:
>
>> Hi,
>>
>> At 14:59 +0100 on 22 Oct (1350917960), Tim Deegan wrote:
>>> At 19:21 +0200 on 20 Oct (1350760876), Peter Maloney wrote:
>>>> The change was 8 months ago
>>>>
>>>> changeset: 24770:7f79475d3de7
>>>> user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>>> date: Fri Feb 10 16:07:07 2012 +0000
>>>> summary: x86/mm: Make p2m lookups fully synchronized wrt modifications
>>> This change was bad for performnace across the board and most of it has
>>> since been either reverted or amended, but clearly we missed something
>>> here.
>>>
>>> It's interesting that Win8 isn't slowed down. I wonder whether that's to
>>> do with the way it drives the VGA card -- IIRC it uses a generic VESA
>>> driver rather than a Cirrus one.
>> In fact this is to do with the APIC. On my test system, a busy 2-vcpu
>> VM is making about 300k/s accesses to the APIC TPR. These accesses are
>> all trapped and emulated by Xen, and that emulation has got more
>> expensive as part of this change.
>>
>> Later Windows OSes have a feature called 'lazy IRQL' which makes those
>> accesses go away, but sadly that's not been done for WinXP. On modern
>> Intel CPUs, the hardware acceleration for TPR accesses works for XP; on
>> AMD it requires the OS to use 'MOV reg32, CR8' to access the TPR instead
>> of MMIO, which XP is clearly not doing. :(
>>
>> Peter: if you have the option, you might find that installing the PV
>> drivers that ship with Citrix XenServer 6.0 makes things work better.
>>
>> Andres: even though this load of APIC emulations is pretty extreme, it's
>> surprising that the VM runs faster on shadow pagetables! Any ideas for
>> where this slowdown is coming from?
> Not any immediate ideas without profiling.
>
> However, most callers of hvmemul_do_io pass a stub zero ram_gpa address. We might be madly hitting the p2m locks for no reason there.
>
> How about the following patch, Peter, Tim?
Thanks,
I'll give it a try sometime this week I guess.
> diff -r 5171750d133e xen/arch/x86/hvm/emulate.c
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -60,24 +60,28 @@ static int hvmemul_do_io(
> ioreq_t *p = get_ioreq(curr);
> unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
> p2m_type_t p2mt;
> - struct page_info *ram_page;
> + struct page_info *ram_page = NULL;
> int rc;
>
> /* Check for paged out page */
> - ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
> - if ( p2m_is_paging(p2mt) )
> + if ( ram_gpa != INVALID_MFN )
> {
> - if ( ram_page )
> - put_page(ram_page);
> - p2m_mem_paging_populate(curr->domain, ram_gfn);
> - return X86EMUL_RETRY;
> - }
> - if ( p2m_is_shared(p2mt) )
> - {
> - if ( ram_page )
> - put_page(ram_page);
> - return X86EMUL_RETRY;
> - }
> + ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
> + if ( p2m_is_paging(p2mt) )
> + {
> + if ( ram_page )
> + put_page(ram_page);
> + p2m_mem_paging_populate(curr->domain, ram_gfn);
> + return X86EMUL_RETRY;
> + }
> + if ( p2m_is_shared(p2mt) )
> + {
> + if ( ram_page )
> + put_page(ram_page);
> + return X86EMUL_RETRY;
> + }
> + } else
> + value = 0; /* for pvalue */
>
> /*
> * Weird-sized accesses have undefined behaviour: we discard writes
> @@ -455,7 +459,7 @@ static int __hvmemul_read(
> return X86EMUL_UNHANDLEABLE;
> gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
> if ( (off + bytes) <= PAGE_SIZE )
> - return hvmemul_do_mmio(gpa, &reps, bytes, 0,
> + return hvmemul_do_mmio(gpa, &reps, bytes, INVALID_MFN,
> IOREQ_READ, 0, p_data);
> }
>
> @@ -480,7 +484,8 @@ static int __hvmemul_read(
> addr, &gpa, bytes, &reps, pfec, hvmemul_ctxt);
> if ( rc != X86EMUL_OKAY )
> return rc;
> - return hvmemul_do_mmio(gpa, &reps, bytes, 0, IOREQ_READ, 0, p_data);
> + return hvmemul_do_mmio(gpa, &reps, bytes, INVALID_MFN,
> + IOREQ_READ, 0, p_data);
> case HVMCOPY_gfn_paged_out:
> return X86EMUL_RETRY;
> case HVMCOPY_gfn_shared:
> @@ -552,7 +557,7 @@ static int hvmemul_write(
> unsigned int off = addr & (PAGE_SIZE - 1);
> gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
> if ( (off + bytes) <= PAGE_SIZE )
> - return hvmemul_do_mmio(gpa, &reps, bytes, 0,
> + return hvmemul_do_mmio(gpa, &reps, bytes, INVALID_MFN,
> IOREQ_WRITE, 0, p_data);
> }
>
> @@ -573,7 +578,7 @@ static int hvmemul_write(
> addr, &gpa, bytes, &reps, pfec, hvmemul_ctxt);
> if ( rc != X86EMUL_OKAY )
> return rc;
> - return hvmemul_do_mmio(gpa, &reps, bytes, 0,
> + return hvmemul_do_mmio(gpa, &reps, bytes, INVALID_MFN,
> IOREQ_WRITE, 0, p_data);
> case HVMCOPY_gfn_paged_out:
> return X86EMUL_RETRY;
> @@ -804,7 +809,7 @@ static int hvmemul_read_io(
> {
> unsigned long reps = 1;
> *val = 0;
> - return hvmemul_do_pio(port, &reps, bytes, 0, IOREQ_READ, 0, val);
> + return hvmemul_do_pio(port, &reps, bytes, INVALID_MFN, IOREQ_READ, 0, val);
> }
>
> static int hvmemul_write_io(
> @@ -814,7 +819,7 @@ static int hvmemul_write_io(
> struct x86_emulate_ctxt *ctxt)
> {
> unsigned long reps = 1;
> - return hvmemul_do_pio(port, &reps, bytes, 0, IOREQ_WRITE, 0, &val);
> + return hvmemul_do_pio(port, &reps, bytes, INVALID_MFN, IOREQ_WRITE, 0, &val);
> }
>
> static int hvmemul_read_cr(
> diff -r 5171750d133e xen/arch/x86/hvm/io.c
> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -231,7 +231,7 @@ int handle_pio(uint16_t port, int size,
> if ( dir == IOREQ_WRITE )
> data = guest_cpu_user_regs()->eax;
>
> - rc = hvmemul_do_pio(port, &reps, size, 0, dir, 0, &data);
> + rc = hvmemul_do_pio(port, &reps, size, INVALID_MFN, dir, 0, &data);
>
> switch ( rc )
> {
>
>
next prev parent reply other threads:[~2012-11-13 13:17 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-10-20 17:21 xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7 Peter Maloney
2012-10-20 18:40 ` Peter Maloney
2012-10-22 13:56 ` Andres Lagar-Cavilla
2012-10-22 13:59 ` Tim Deegan
2012-10-23 22:17 ` Peter Maloney
2012-11-01 17:00 ` Tim Deegan
2012-11-01 17:28 ` Andres Lagar-Cavilla
2012-11-13 13:17 ` Peter Maloney [this message]
2012-11-22 18:54 ` Peter Maloney
2013-01-12 15:25 ` Peter Maloney
2013-01-17 20:57 ` Pasi Kärkkäinen
2013-01-18 14:22 ` George Dunlap
2013-01-18 14:40 ` Andres Lagar-Cavilla
2013-01-21 12:07 ` George Dunlap
2013-01-18 14:30 ` George Dunlap
2013-01-26 12:30 ` Peter Maloney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=50A2484D.309@brockmann-consult.de \
--to=peter.maloney@brockmann-consult.de \
--cc=andres@lagarcavilla.org \
--cc=andreslc@gridcentric.ca \
--cc=tim@xen.org \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.