From: Dor Laor
Subject: Re: Performance overhead of get_cycles_sync
Date: Tue, 11 Dec 2007 16:11:30 +0200
Message-ID: <475E9A92.4030001@qumranet.com>
References: <475E8C8B.7070308@qumranet.com> <20071211133738.GA8150@elte.hu>
In-Reply-To: <20071211133738.GA8150-X9Un+BFzKDI@public.gmane.org>
To: Ingo Molnar
Cc: kvm-devel, Linux Kernel Mailing List

Ingo Molnar wrote:

> * Dor Laor <dor.laor-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> > Hi Ingo, Thomas,
> >
> > In the latest kernel (2.6.24-rc3) I noticed a drastic performance
> > decrease for KVM networking. The cause is a large number of vmexits
> > (exit reason: cpuid instruction) triggered by calls to gettimeofday,
> > which uses the tsc clocksource. read_tsc calls get_cycles_sync, which
> > may execute cpuid in order to serialize the cpu.
> >
> > Can you explain why the cpu needs to be serialized for every gettime
> > call? Do we need to be that accurate? (Relaxing this would also
> > slightly improve performance on physical hosts.) I believe you have a
> > reason and the answer is yes. In that case, can you replace the
> > serializing instruction with an instruction that does not trigger a
> > vmexit? Maybe use 'ltr', for example?
>
> hm, where exactly does it call CPUID?
>
>         Ingo

Here [include/asm-x86/tsc.h]:

/* Like get_cycles, but make sure the CPU is synchronized. */
static __always_inline cycles_t get_cycles_sync(void)
{
    unsigned long long ret;
    unsigned eax, edx;

    /*
     * Use RDTSCP if possible; it is guaranteed to be synchronous
     * and doesn't cause a VMEXIT on Hypervisors
     */
    alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
               ASM_OUTPUT2("=a" (eax), "=d" (edx)),
               "a" (0U), "d" (0U) : "ecx", "memory");
    ret = (((unsigned long long)edx) << 32) | ((unsigned long long)eax);
    if (ret)
        return ret;

    /*
     * Don't do an additional sync on CPUs where we know
     * RDTSC is already synchronous:
     */
//    alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
//              "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
    rdtscll(ret);

    return ret;
}
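
To make the control flow of the function above easier to follow, here is a minimal sketch of which path it ends up taking, assuming the commented-out cpuid alternative is enabled (as the original report implies). It is a plain C model, not kernel code: `struct cpu_features`, `enum tsc_path`, `select_tsc_path()` and `path_executes_cpuid()` are hypothetical names invented for illustration, and the two booleans merely stand in for X86_FEATURE_RDTSCP and X86_FEATURE_SYNC_RDTSC.

```c
#include <stdbool.h>

/* Hypothetical model of the paths through get_cycles_sync(). */
enum tsc_path {
    TSC_PATH_RDTSCP,       /* RDTSCP patched in over the NOPs; no CPUID */
    TSC_PATH_CPUID_RDTSC,  /* CPUID serializes, then RDTSC reads the TSC */
    TSC_PATH_RDTSC_ONLY    /* CPUID patched out to NOPs; RDTSC alone */
};

/* Illustrative stand-ins for the feature bits; not a kernel structure. */
struct cpu_features {
    bool has_rdtscp;      /* models X86_FEATURE_RDTSCP */
    bool has_sync_rdtsc;  /* models X86_FEATURE_SYNC_RDTSC */
};

/*
 * Mirrors the order of the two alternative_io() sites:
 *  1. With RDTSCP, the first site yields a timestamp and the function
 *     returns early. (The model ignores the corner case of a TSC value
 *     of exactly 0, which would fall through.)
 *  2. Without RDTSCP, eax and edx keep their 0 inputs, so ret == 0 and
 *     execution falls through to the second site.
 *  3. The second site executes CPUID unless SYNC_RDTSC is set.
 */
static enum tsc_path select_tsc_path(const struct cpu_features *f)
{
    if (f->has_rdtscp)
        return TSC_PATH_RDTSCP;
    if (f->has_sync_rdtsc)
        return TSC_PATH_RDTSC_ONLY;
    return TSC_PATH_CPUID_RDTSC;
}

/* True when the path executes CPUID, the exit reason seen in the guest. */
static bool path_executes_cpuid(enum tsc_path p)
{
    return p == TSC_PATH_CPUID_RDTSC;
}
```

In this model, only a CPU that has neither feature bit set executes CPUID on every call, which is the case a guest hits when the hypervisor exposes neither RDTSCP nor SYNC_RDTSC.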
