From mboxrd@z Thu Jan  1 00:00:00 1970
From: Don Slutz <dslutz@verizon.com>
Subject: Re: [PATCH for-4.5 v6 06/16] xen: Convert vmware_port
 to xentrace usage
Date: Mon, 29 Sep 2014 14:10:44 -0400
Message-ID: <5429A0A4.9070704@terremark.com>
References: <1411236447-7435-1-git-send-email-dslutz@verizon.com>	<1411236447-7435-7-git-send-email-dslutz@verizon.com>	<5422FF1D.8000308@eu.citrix.com>	<5423167A.1070901@terremark.com>
	<CAFLBxZaH-iRLyFTN+7na0a6NxFmQsi9hnTgwhdCSf201-D24Pg@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <CAFLBxZaH-iRLyFTN+7na0a6NxFmQsi9hnTgwhdCSf201-D24Pg@mail.gmail.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: George Dunlap <George.Dunlap@eu.citrix.com>, Don Slutz <dslutz@verizon.com>
Cc: Tim Deegan <tim@xen.org>, Kevin Tian <kevin.tian@intel.com>, Keir Fraser <keir@xen.org>, Ian Campbell <ian.campbell@citrix.com>, Stefano Stabellini <stefano.stabellini@eu.citrix.com>, Ian Jackson <ian.jackson@eu.citrix.com>, Eddie Dong <eddie.dong@intel.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Jan Beulich <jbeulich@suse.com>, Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>, Jun Nakajima <jun.nakajima@intel.com>, Andrew Cooper <andrew.cooper3@citrix.com>, Boris Ostrovsky <boris.ostrovsky@oracle.com>, Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
List-Id: xen-devel@lists.xenproject.org


On 09/25/14 11:14, George Dunlap wrote:
> On Wed, Sep 24, 2014 at 8:07 PM, Don Slutz <dslutz@verizon.com> wrote:
>> On 09/24/14 13:27, George Dunlap wrote:
>>> On 09/20/2014 07:07 PM, Don Slutz wrote:
>>>> Reduce the VMPORT_DBG_LOG calls.
>>>
>>> You should also have mentioned that you added hew HVMTRACE macros which
>>> will log the TSC value.
>>>
>>> The reason the HVMTRACE macros don't log the TSC values is that for the
>>> most part you can get all the timing information you need from the TSC on
>>> the vmexit and vmenter.  Looking at where you've added the TSC values, I
>>> don't really see how it adds anything except bloat to the log.  Is there a
>>> reason you need to know exactly when these different things happened,
>>> instead of just being able to bracket them between VMENTER and VMEXITs?
>>>
>> I did want a way to know how long the VMware code was taking.  I am
>> not sure this is required.
>>
>> For example:
>>
>> CPU1  2899550319282 (+    4170)  VMEXIT      [ exitcode = 0x00000000, rIP  =
>> 0x00007fad13ffec8c ]
>> CPU1  2899550320086 (+     804)  TRAP_GP     [ inst_len = 1 edx = 0x00005658
>> exitinfo1 = 0x0000000000000000 exitinfo2 = 0x0000000000000000 ]
>> CPU1  2899550325054 (+    4968)  VMPORT_READ_BEFORE  [ eax = 0x564d5868 ebx
>> = 0x000001b3 ecx = 0x0003001e edx = 0x00005658 esi = 0x00000000 edi =
>> 0x000001b3 ]
>> CPU1  2899550326050 (+     996)  VMPORT_READ_AFTER  [ eax = 0x564d5868 ebx =
>> 0x000001b3 ecx = 0x0001001e edx = 0x00005658 esi = 0x00000000 edi =
>> 0x000001b3 ]
>> CPU1  2899550326722 (+     672)  vlapic_accept_pic_intr [ i8259_target = 1,
>> accept_pic_int = 0 ]
>> CPU1  2899550327454 (+     732)  VMENTRY
>>
>>
>> But I am happy to drop the new log TSC macros.
> The tracing function itself is not free -- the trace_var() function
> probably executes 5x the amount of code that is actually executed
> between the READ_BEFORE and READ_AFTER (given it's a switch statement
> and each one is basically a handful of variable assignments).  It's
> not unlikely that most of the 996 us there is in the trace function
> itself.

Ok, but the 83700 us is not all this.


>>>> diff --git a/xen/arch/x86/hvm/vmware/vmport.c
>>>> b/xen/arch/x86/hvm/vmware/vmport.c
>>>> index 811c303..962ee32 100644
>>>> --- a/xen/arch/x86/hvm/vmware/vmport.c
>>>> +++ b/xen/arch/x86/hvm/vmware/vmport.c
>>> We normally don't log both BEFORE and AFTER states of things like
>>> hypercalls -- just logging the outcome of what the hypervisor did should be
>>> sufficient, shouldn't it?
>>
>> Not that clear with this poorly build hyper-call.
>>
>>> Do you really need to know the value of things that got clobbered?
>>
>> When checking on the complex state machine that is the "RPC" code
>> it was very helpful.  With that code moving to QEMU, the before and
>> after is not so important.
>>
>> Here is a real example:
>>
>> CPU2  865821836508576 (+    2562)  VMEXIT      [ exitcode = 0x00000000, rIP
>> = 0x00007f68a8b17c8c ]
>> CPU2  865821836509362 (+     786)  TRAP_GP     [ inst_len = 1 edx =
>> 0x00025658 exitinfo1 = 0x0000000000000000 exitinfo2 = 0x0000000000000000 ]
>> CPU2  865821836514132 (+    4770)  VMPORT_READ_BEFORE  [ eax = 0x564d5868
>> ebx = 0x00000034 ecx = 0x0002001e edx = 0x00025658 esi = 0x00000000 edi =
>> 0x000001be ]
>> CPU2  865821836597832 (+   83700)  VMPORT_READ_AFTER  [ eax = 0x564d5868 ebx
>> = 0x00000034 ecx = 0x0001001e edx = 0x00025658 esi = 0x00000000 edi =
>> 0x000001be ]
>> CPU2  865821836598756 (+     924)  vlapic_accept_pic_intr [ i8259_target =
>> 1, accept_pic_int = 0 ]
>> CPU2  865821836605602 (+    6846)  vlapic_accept_pic_intr [ i8259_target =
>> 1, accept_pic_int = 0 ]
>> CPU2  865821836606436 (+     834)  VMENTRY
>> CPU2  865821836609712 (+    3276)  VMEXIT      [ exitcode = 0x00000000, rIP
>> = 0x00007f68a8b17c8c ]
>> CPU2  865821836610654 (+     942)  TRAP_GP     [ inst_len = 1 edx =
>> 0x00025658 exitinfo1 = 0x0000000000000000 exitinfo2 = 0x0000000000000000 ]
>> CPU2  865821836616828 (+    6174)  VMPORT_READ_BEFORE  [ eax = 0x564d5868
>> ebx = 0x00000034 ecx = 0x0003001e edx = 0x00025658 esi = 0x00000000 edi =
>> 0x000001be ]
>> CPU2  865821836617800 (+     972)  VMPORT_READ_AFTER  [ eax = 0x564d5868 ebx
>> = 0x00000011 ecx = 0x0003001e edx = 0x00015658 esi = 0x00000000 edi =
>> 0x000001be ]
>> CPU2  865821836618664 (+     864)  vlapic_accept_pic_intr [ i8259_target =
>> 1, accept_pic_int = 0 ]
>> CPU2  865821836619444 (+     780)  VMENTRY
>>
>> Note that in the one "RPC" call,
>>
>>
>> grep VMPORT_READ_BEFORE ~/zz-xentrace-vmware3-0.out | wc
>>     1592   39800  256312
>>
>> It took 1592 #GP traps to handle it, and 9643628760 tsc cycles.
> Right, so what I started to say yesterday: It looks like most of the
> trace points you're adding here is to help you debug the functionality
> of the hypervisor.  That certainly makes sense for you to do during
> development.  But what we want in upstream is something that will help
> us in production.  For that to be useful, we need the logging to be as
> efficient as possible. Every additional HVM trace point potentially
> adds more data to someone else's HVM trace. So we don't want
> extraneous information, and we don't want to log something that we can
> infer from something else.
>
> In general, it's enough to give information about the decisions that
> Xen is making to infer what the previous state is; and then giving
> information about what Xen did in response (i.e., return values,
> injection of traps, &c) to help figure out how the guest responded.
> In this case, I'd probably trace:
> 1. vmport hypercall, handled command
>   - cmd
>   - return values of modified registers.  Ideally only the registers
> that are modified, but just taking a big batch would be OK for now.
>   Note: No need for rip here, as it will be collected at the VMEXIT
> 2. vmport hypercall, unhandled command
>   - Just the unimplemented fail
> 3. In the vmport_gp_check(), if it successfully decodes an IO instruction:
>   - direction (read/write)
>   - size of the access
>
> I might consider logging something the failure path of
> *_vmexit_gp_intercept(), with information that might help you figure
> out why it didn't make it to the vmcall; but on the whole I think I'd
> probably leave that off.
>
> Hopefully all that would give you enough information to figure out
> where the problem was and how to reproduce the behavior locally; and
> once you can reproduce it locally, you could add in debugging traces
> (which wouldn't be upstreamed) to help you figure out why it wasn't
> taking the path you expected.
>
> Does that make sense?

Yes.  Will attempt to have this subset.

    -Don Slutz

> If you really want more traces, I might consider allowing them in the
> code but off by default; but I think you probably won't need more
> information from running production systems.
>
>   -George