Re: [PATCH v3 2/2][XTF] xtf/vpmu: MSR read/write tests for VPMU

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

From: Mohit Gambhir <mohit.gambhir@oracle.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>, xen-devel@lists.xen.org
Cc: boris.ostrovsky@oracle.com, mgambhir@outlook.com
Subject: Re: [PATCH v3 2/2][XTF] xtf/vpmu: MSR read/write tests for VPMU
Date: Thu, 4 May 2017 17:36:11 -0400	[thread overview]
Message-ID: <5a11fdc2-4db9-bf6c-d845-8989810971ae@oracle.com> (raw)
In-Reply-To: <710894a2-c8c0-8337-ef98-51f238be362b@oracle.com>

Adding personal email id for future correspondence on this thread - 
mgambhir@outlook.com

On 05/04/2017 05:13 PM, Mohit Gambhir wrote:
>
>
> On 05/03/2017 04:41 PM, Andrew Cooper wrote:
>> On 25/04/17 22:45, Mohit Gambhir wrote:
>>>>> diff --git a/tests/vpmu/Makefile b/tests/vpmu/Makefile
>>>>> new file mode 100644
>>>>> index 0000000..1eaf436
>>>>> --- /dev/null
>>>>> +++ b/tests/vpmu/Makefile
>>>>> @@ -0,0 +1,9 @@
>>>>> +include $(ROOT)/build/common.mk
>>>>> +
>>>>> +NAME      := vpmu
>>>>> +CATEGORY  := utility
>>>> utilities don't get run automatically.  Is this intentional?  If it
>>>> isn't, what is the plan for making it automatically run? vpmu is still
>>>> disabled by default in all branches due to the security 
>>>> vulnerabilities,
>>>> so even if this vpmu test was automatically run, it would skip due to
>>>> vpmu not being visible.
>>> The reason I wanted it to not run automatically was because I thought
>>> it will fail when vpmu
>>> is not enabled - which is the default case. But if XTF will skip vpmu
>>> tests when it is disabled
>>> then we can run the tests automatically. Should the CATEGORY be
>>> functional in that case?
>>>> As a tangent, I wonder if it would be a useful to have a separate
>>>> category for incomplete tests, but which are still useful to have for
>>>> manual running.
>>> Possibly. Utility category seems to do just that but I can see that it
>>> is really not meant for
>>> functional tests. There are situations where writing tests before or
>>> while implementing a
>>> feature is useful  and in that case this new category can be
>>> beneficial. It would also benefit
>>> to have some kind of "expected failure" return type.
>> Hopefully I have covered all of these points sufficiently in the 0/2
>> thread.  In the meantime, I'll consider how best to introduce a "not
>> quite fully baked yet" category.
>>
>>>>> +TEST-ENVS := $(ALL_ENVIRONMENTS)
>>>> You build for all environments, but then have PV unilaterally skip.
>>>> Again, is this intentional?
>>> Yes, I thought I would go back and add PV/AMD tests in V2 but I can
>>> leave those TEST-ENVS
>>> out for now and "skip the skip" :) if that's preferable.
>>>> For HVM, why all environments?  does PMU have any interaction with the
>>>> operating or paging mode in use at the time of the samples?
>>> Not that I know of. So should it just be hvm64?
>> For now, I'd stick with just hvm64.  (A complication which I have yet to
>> sort out is the TEST-REVISION builds, so adding new logical content to
>> an existing test doesn't interfere with OSSTests bisection logic, but
>> that isn't an immediate problem.)
> Fixed in v5.
>>>>> +
>>>>> +    printk("-----------------\n");
>>>>> +    printk("Testing version 1\n");
>>>>> +    printk("-----------------\n");
>>>>> +
>>>>> +    /* for all general purpose counters */
>>>>> +    for ( i = 0; i < ng; i++ )
>>>>> +    {
>>>>> +        /* test writing to IA32_PMCx */
>>>>> +        idx = MSR_IA32_PMC(i);
>>>>> +
>>>>> +        /* test we can write to all valid bits in the counters */
>>>>> +        /* don't set bit 31 since that gets sign-extended */
>>>> What is wrong with bit 31 being sign extended?
>>> The problem there is that setting PMCx to 0xffffffff sign extends 31st
>>> bit to
>>> the remaining bits. So PMCx becomes 0xffffffffffff (when bit width is
>>> 48). rdmsr then
>>> reads all 48 bits and the value mismatches the one that was written.
>> That behaviour sounds mad, but I see it now described in the manuals.
>> Oh well...
>>
>>> A smarter test would provide both - write value to write and read
>>> value to check against.
>>> But test_valid_msr_write() only takes wval and checks to see if we
>>> read the same value
>>> that we wrote.
>> Well - you did introduce that function.  If altering it would be a
>> smarter test, then perhaps that is the better option to take.
> Fixed in v5
>>>>> +        wval = ((1ull << 31) - 1) ;
>>>>> +
>>>>> +        if ( !test_valid_msr_write(idx, wval) )
>>>>> +            return false;
>>>>> +
>>>>> +        /* set all valid bits in MSR_IA32_EVENTSELx */
>>>>> +        idx = MSR_IA32_PERFEVTSEL(i);
>>>>> +        wval = ((1ull << 32) - 1) ^
>>>>> (IA32_PERFEVENTSELx_ENABLE_ANYTHREAD |
>>>>> +                IA32_PERFEVENTSELx_ENABLE_PCB);
>>>> What is wrong with Pin Control in this register? Architecturally, 
>>>> it is
>>>> a software configurable bit.
>>> Look at this patch on xen-devel
>>>
>>> [PATCH] Fix hypervisor crash when writing to VPMU MSR
>>>
>>> I found that out while writing this test.
>> Welcome to the very grey areas of processors which XTF has a habit of
>> finding.  (I really need to correlate all the suspected errata in the
>> doxygen comments).
>>
>> Obviously, Xen shouldn't crash, but we are going to need feedback from
>> Intel before working out how to proceed.
>>
>>>>> +    /* test IA32_DEBUGCTL */
>>>>> +    idx = MSR_IA32_DEBUGCTL;
>>>>> +
>>>>> +    /* Test IA32_DEBUGCTL facilities enabled by v2 */
>>>>> +    wval = IA32_DEBUGCTL_Freeze_LBR_ON_PMI |
>>>>> IA32_DEBUGCTL_Freeze_PerfMon_On_PMI;
>>>>> +
>>>>> +    /* FIXME: This should really be a valid write but it isnt
>>>>> supported by the
>>>>> +     * VPMU yet */
>>>> In which case the test should be correct here, and highlight that 
>>>> there
>>>> is a bug in Xen.
>>> But then, should the main test return xtf_success, xtf_skip or
>>> xtf_failure?
>> Failure.  Xen is provably malfunctioning.
> Fixed in v5. The tests now fail.
>>
>>>>> +    /* Get perf capabilties */
>>>>> +    if ( rdmsr_safe(MSR_IA32_PERF_CAPABILITIES, &caps) )
>>>>> +    {
>>>>> +        printk("Fault while reading MSR_IA32_PERF_CAPABILITIES\n");
>>>> The PERF_CAPABILITIES MSR is only valid to read if PDCM is 
>>>> advertised in
>>>> CPUID.
>>>>
>>>> We currently never advertise PCDM, and is a bug that
>>>> MSR_PERF_CAPABILITIES is accessable at all in guest context.
>>>> (specifically, it is because Xen leaks almost all host MSR state into
>>>> guests).
>>> So I suppose I can't test full-bit width writes then?
>> Well, the code should be architecturally correct to start with. After
>> that, it is reasonable to explain that Xen used to leak
>> PERF_CAPABILITIES into guests view, at which point probing it is a
>> legitimate thing to do.
>>
>> However, you should take care that your logic doesn't begin to fail at
>> the point that Xen is fixed to behave architecturally (and actually
>> return #GP[0] if PDCM isn't advertised to the guest), because I will
>> definitely be fixing the MSR leakage problems (if noone beats me to it).
> Fixed in v5 ( I think!)
>>>>> +        return false;
>>>>> +    }
>>>>> +
>>>>> +    if ( !(caps >> 13) & 1 )
>>>> This lacks sufficient brackets for what you are intending to do.  
>>>> Also,
>>>> it looks like you probably want a #define at the top of the file for
>>>> this constant.
>>> ! and & have the same precedence and right to left associativity. So
>>> technically it should
>>> do what it is intended to do. But I will make it look better.
>> Logical NOT and Address-of have the same precedence.  Logical NOT has a
>> higher precedence than Bitwise AND.
>>
>> And to resolve any doubt, here is what Clang things:
>>
>> andrewcoop@andrewcoop:/local/xen-test-framework.git$ make CC=clang-4.0
>> -j4 -s
>> main.c:396:10: error: logical not is only applied to the left hand side
>> of this bitwise operator [-Werror,-Wlogical-not-parentheses]
>>      if ( !(caps >> 13) & 1 )
>>           ^             ~
> Hard to argue that. Fixed in v5.
>>>>> +
>>>>> +        case 3:
>>>>> +                /* test version 3 facilities */
>>>>> +                if ( !test_intel_pmu_ver3( ngregs, nfregs) )
>>>>> +                    return xtf_failure("Fail: Failed VPMU version
>>>>> 3\n");
>>>>> +                /* fall through */
>>>>> +
>>>>> +        case 2:
>>>>> +                /* test version 2 facilities */
>>>>> +                if ( !test_intel_pmu_ver2(ngregs, nfregs) )
>>>>> +                    return xtf_failure("Fail: Failed VPMU version
>>>>> 2\n");
>>>>> +
>>>>> +                /* test version 1 facilities */
>>>>> +                if ( !test_intel_pmu_ver1(ngregs) )
>>>>> +                    return xtf_failure("Fail: Failed VPMU version
>>>>> 1\n");
>>>>> +
>>>>> +                /* test full width counters */
>>>>> +                if ( !test_full_width_cnts(ngregs, ngbits) )
>>>>> +                    return xtf_failure("Fail: Failed full width
>>>>> test\n");
>>>>> +
>>>>> +                break;
>>>>> +
>>>>> +        case 1:
>>>>> +                /* version 1 unsupported */
>>>> You have a v1 test function, yet it is unsupported?
>>> Yes, from VPMUs perspective, v1 in itself is unsupported because that
>>> would
>>> have required disabling access to PERF_GLOBAL_STATUS and
>>> PERF_GLOBAL_CTRL MSRs.
>>> However, each version does support facilities introduced in the
>>> previous version.
>>> Thus, test_intel_pmu_ver1() is intended for v2 testing and helps
>>> breaking the
>>> test_intel_pmu_ver2()  function into facilities introduced by the
>>> architecture in
>>> two separate versions - v1 and v2
>> You are logically confusing Xen's reasoning for supporting different
>> versions of hardware PMU, with the version it decides to advertise to
>> guests.
>>
>> It is certainly reasonable for the vPMU driver in Xen to decide that,
>> without PMU v2, it is unacceptably complicated to deal with context
>> switching the counter state.  However, this has no bearing on which
>> version of PMU is emulated to the guest, and it would certainly should
>> be possible to configure Xen to only advertise v1 to guests, while the
>> real hardware actually supports v2.  (After all, we already downgrade v4
>> to v3 in a guests view).
> Yes, I think I understand what you are saying. I still do stand by the 
> decision of  having a
> separate function to test v1 facilities for -
> A) modularity
> B) Localizing the knowledge of xen internals to test_main() function. 
> The rest of the code
> is written to test the facilities  as described in the SDM.
>> On another note about the MSR leakage, the v$N test case should ideally
>> check that none of the v$N+1 capabilities/features are leaked into the
>> guest, but this smells like a similar quantity of "redoing it properly
>> the 2nd time around" as my attempts to fix CPUID handling in Xen.
> Maybe something that we can add in a future patch?
>> ~Andrew
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> https://lists.xen.org/xen-devel
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

next prev parent reply	other threads:[~2017-05-04 21:36 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-24 17:54 [PATCH v3 0/2][XTF] xtf/vpmu VPMU tests Mohit Gambhir
2017-04-24 17:54 ` [PATCH v3 1/2][XTF] xtf/vpmu: Add Intel PMU MSR addresses Mohit Gambhir
2017-04-25  8:36   ` Jan Beulich
2017-04-25 15:04     ` Mohit Gambhir
2017-04-25 15:24   ` Andrew Cooper
2017-04-25 17:46     ` Mohit Gambhir
2017-04-26  6:07     ` Jan Beulich
2017-04-26  8:02       ` Andrew Cooper
2017-04-24 17:54 ` [PATCH v3 2/2][XTF] xtf/vpmu: MSR read/write tests for VPMU Mohit Gambhir
2017-04-25 18:20   ` Andrew Cooper
2017-04-25 21:45     ` Mohit Gambhir
2017-05-03 20:41       ` Andrew Cooper
2017-05-04 21:13         ` Mohit Gambhir
2017-05-04 21:36           ` Mohit Gambhir [this message]
2017-04-25 18:50 ` [PATCH v3 0/2][XTF] xtf/vpmu VPMU tests Andrew Cooper
2017-04-28 19:35   ` Mohit Gambhir
2017-05-03 20:07     ` Andrew Cooper

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5a11fdc2-4db9-bf6c-d845-8989810971ae@oracle.com \
    --to=mohit.gambhir@oracle.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=mgambhir@outlook.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).