From: "Chen, Tiejun" <tiejun.chen@intel.com>
To: Wei Liu <wei.liu2@citrix.com>
Cc: kevin.tian@intel.com, ian.campbell@citrix.com,
andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org,
stefano.stabellini@citrix.com, JBeulich@suse.com,
yang.z.zhang@intel.com, Ian.Jackson@eu.citrix.com
Subject: Re: [RFC][PATCH 04/13] tools/libxl: detect and avoid conflicts with RDM
Date: Tue, 19 May 2015 14:47:05 +0800
Message-ID: <555ADC69.5060605@intel.com>
In-Reply-To: <20150518200017.GA9893@zion.uk.xensource.com>
On 2015/5/19 4:00, Wei Liu wrote:
> On Thu, May 14, 2015 at 04:27:45PM +0800, Chen, Tiejun wrote:
>> On 2015/5/11 19:32, Wei Liu wrote:
>>> On Mon, May 11, 2015 at 04:09:53PM +0800, Chen, Tiejun wrote:
>>>> On 2015/5/8 22:43, Wei Liu wrote:
>>>>> Sorry for the late review. This series fell through the cracks.
>>>>>
>>>>
>>>> Thanks for your review.
>>>>
>>>>> On Fri, Apr 10, 2015 at 05:21:55PM +0800, Tiejun Chen wrote:
>>>>>> While building a VM, the HVM domain builder provides struct hvm_info_table{}
>>>>>> to help hvmloader. Currently it includes two fields that hvmloader uses to
>>>>>> construct the guest e820 table, low_mem_pgend and high_mem_pgend. So we
>>>>>> should check them to fix any conflict with RAM.
>>>>>>
>>>>>> RMRRs can theoretically reside in the address space beyond 4G, but we have
>>>>>> never seen this in the real world. So in order to avoid breaking the highmem layout
>>>>>
>>>>> How does this break highmem layout?
>>>>
>>>> In most cases highmem is contiguous, like [4G, ...], while an RMRR beyond
>>>> 4G is theoretically possible but very rare. In particular, we have not
>>>> seen it happen in the real world, so we don't want such a rare case to
>>>> break the highmem layout.
>>>>
>>>
>>> The problem is that taking this approach just because it rarely happens
>>> *now* is not future proof. It needs to be clearly documented somewhere
>>> in the manual (or any other Intel docs) and be referenced in the code.
>>> Otherwise we might end up in a situation where no-one knows how it is
>>> supposed to work and no-one can fix it if it breaks in the future, say,
>>> if every single device on earth suddenly requires RMRR > 4G (I'm
>>> exaggerating).
>>>
>>> Or you can just make it work with highmem. How much more work do you
>>> envisage?
>>>
>>> (If my above comment makes no sense, do let me know. I only have a very
>>> shallow understanding of RMRR.)
>>
>> Maybe I'm misleading you :)
>>
>> I don't mean that RMRR > 4G is not allowed to work in our implementation.
>> What I'm saying is that our *policy* is deliberately simple for this kind
>> of rare highmem case...
>>
>
> Yes, but this policy is hardcoded (as in, you bail when detecting a
> conflict in the highmem region). I don't think we have the expertise to
> un-hardcode it in the future (it might require x86-specific insider
> information and specific hardware to test). So I would like to make it as
> future proof as possible.
>
> I know you're limited by hvm_info. I would accept this hardcoded policy as
> a short-term solution, but I would like a commitment from Intel to maintain
> this piece of code and to properly work out a flexible solution for the >4G
> case.
It looks like Kevin has already replied to this.
>
>>>
>>>>>
>>>>>> we don't solve highmem conflicts. Note this means a highmem RMRR could
>>>>>> still be supported if there is no conflict.
>>>>>>
>>
>> Like these two sentences above.
>>
>>>>>
>>>>> Aren't you actively trying to avoid conflict in libxl?
>>>>
>>>> RMRRs are fixed by the BIOS so we can't avoid conflicts. Here we want to
>>>> adopt some good policies to address RMRRs. In the case of highmem, that
>>>> simple policy should be enough?
>>>>
>>>
>>> Whatever policy you and HV maintainers agree on. Just clearly document
>>> it.
>>
>> Do you mean I should turn this patch description into a separate
>> document?
>>
>
> Either in code comment or in a separate document.
Okay.
>
>>>
>>>>>
>
> [...]
>
>>>> The caller to xc_device_get_rdm always frees this.
>>>>
>>>
>>> I think I misunderstood how this function works. I thought xrdm was
>>> passed in by caller, which is clearly not the case. Sorry!
>>>
>>>
>>> In that case, the `if ( rc < 0 )' is not needed because the call should
>>> always return rc < 0. An assert is good enough.
>>
>> assert(rc < 0)? But we can't presume the caller always passes '0' first,
>> and additionally, we may not have any RMRRs at all.
>>
>> So I guess what you want is,
>>
>> assert( rc <= 0 );
>> if ( !rc )
>>     goto xrdm;
>>
>
> Yes I was thinking about something like this.
>
> How in this case can rc equal to 0? Is it because you don't actually have any
> regions? If so, please write a comment.
>
Yes, rc is 0 when there are no regions at all. Sure, I will add a comment to
that effect.
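
Concretely, the probe-then-fill pattern would look roughly like this (a
sketch only, with placeholder names -- get_rdm_entries() here stands in for
the real hypercall wrapper and is not the actual patch code):

    unsigned int nr_entries = 0;
    int rc;

    /*
     * Probe with an empty buffer first. Three outcomes are possible:
     *   rc == 0                   -> no RDM entries at all, nothing to do;
     *   rc < 0, errno == ENOBUFS  -> entries exist and nr_entries now holds
     *                                the required count, so allocate and retry;
     *   rc < 0, errno != ENOBUFS  -> a genuine error.
     */
    rc = get_rdm_entries(&nr_entries, NULL);
    assert(rc <= 0);
    if (!rc)
        goto out;                             /* no RDM regions at all */
    if (errno != ENOBUFS)
        goto fail;

    xrdm = calloc(nr_entries, sizeof(*xrdm)); /* freed by the caller later */
    rc = get_rdm_entries(&nr_entries, xrdm);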
>> if ( errno == ENOBUFS )
>> ...
>>
>> Right?
>>
>>>
>
> [...]
>
>>>>>
>>>>>> +    memset(d_config->rdms, 0, sizeof(libxl_device_rdm));
>>>>>> +
>>>>>> +    /* Query all RDM entries in this platform */
>>>>>> +    if (type == LIBXL_RDM_RESERVE_TYPE_HOST) {
>>>>>> +        d_config->num_rdms = nr_all_rdms;
>>>>>> +        for (i = 0; i < d_config->num_rdms; i++) {
>>>>>> +            d_config->rdms[i].start =
>>>>>> +                (uint64_t)xrdm[i].start_pfn << XC_PAGE_SHIFT;
>>>>>> +            d_config->rdms[i].size =
>>>>>> +                (uint64_t)xrdm[i].nr_pages << XC_PAGE_SHIFT;
>>>>>> +            d_config->rdms[i].flag = d_config->b_info.rdm.reserve;
>>>>>> +        }
>>>>>> +    } else {
>>>>>> +        d_config->num_rdms = 0;
>>>>>> +    }
>>>>>
>>>>> And you should move d_config->rdms = libxl__calloc inside that `if'.
>>>>> That is, don't allocate memory if you don't need it.
>>>>
>>>> We can't, since in all cases we need to preallocate this, and then we
>>>> handle it according to our policy.
>>>>
>>>
>>> How would it ever be used again if you set d_config->num_rdms to 0? How
>>> do you know the exact size of your array again?
>>
>> If we don't consider multiple devices sharing one RDM entry, our workflow
>> can be described as follows:
>>
>> #1. We always preallocate all of rdms[], but cleared with memset().
>
> Because the number of RDM entries used in a domain never exceeds the
> number of global entries? I.e. we never risk overrunning the array?
Yes.

"global" means the count covers every RDM entry on the host. Without the
"global" flag we only count the RDM entries associated with the PCI devices
assigned to this domain. So we actually have this relation:

    The number without "global" <= The number with "global"
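
In other words, preallocating nr_all_rdms entries can never overrun. As a
purely illustrative check (made-up variable names, not real libxl code):

    /* The per-domain count can never exceed the host-wide count. */
    nr_domain_rdms = rdm_is_global ? nr_all_rdms        /* every RMRR on the host  */
                                   : nr_assigned_rdms;  /* assigned devices' RMRRs */
    assert(nr_assigned_rdms <= nr_all_rdms);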
>
>> #2. Then we have two cases for the global rule:
>>
>> #2.1 If we really need all entries according to the global rule, we fill
>> all of rdms[] with the real RDM info and set d_config->num_rdms.
>> #2.2 If there is no such global rule requiring all entries, we just clear
>> d_config->num_rdms.
>>
>
> No, don't do this. In any failure path, if the free / dispose function
> depends on num_rdms to iterate through the whole list to dispose memory
> (if your rdm structure later contains pointers to allocated memory),
> this method leaks memory.
>
> The num_ field should always reflect the actual size of the array.
>
>> #3. Then for the per-device rule:
>>
>> #3.1 From #2.1, we just need to check whether we should change a given RDM
>> entry's policy when that entry is associated with this device.
>> #3.2 From #2.2, we obviously just fill rdms[] one by one. Of course, it is
>> quite possible that we don't fill all of rdms[], since the passed-through
>> devices might have no RMRR at all, or only occupy a few entries. But in any
>> case, we finally sync d_config->num_rdms.
>>
>
> Sorry I don't follow. But if your problem is you don't know how many
> entries you actually need, just use libxl__realloc?
>
>>>
>>>>>
>>>>>> +    free(xrdm);
>>>>>> +
>>>>>> +    /* Query RDM entries per-device */
>>>>>> +    for (i = 0; i < d_config->num_pcidevs; i++) {
>>>>>> +        unsigned int nr_entries = 0;
>>>>
>>>> Maybe I should restate this,
>>>> unsigned int nr_entries;
>>>>
>
> [...]
>
>>>>
>>>> Like I replied above, we always preallocate this at the beginning.
>>>>
>>>
>>> Ah, OK.
>>>
>>> But please don't do this. It's hard to see that you won't overrun the
>>> buffer. Please allocate memory only when you need it.
>>
>> Sorry, I don't understand this. As I mentioned above, we don't know how
>> many RDM entries we really need to allocate. That is why we'd like to
>> preallocate all of rdms[] at the beginning. This then covers all cases:
>> global policy, (global policy & per-device), and per-device only. Even if
>> multiple devices share one RDM, we also need to avoid duplicating a new
>> entry...
>>
>
> Can you use libxl__realloc?
>
It looks like this can expand the size automatically each time, right? If
so, I think we can try it.
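
If we go that way, the growth pattern would be something like the sketch
below (the helper name is made up and the field types are guessed for
illustration; it assumes libxl__realloc(gc, ptr, new_size) keeps the existing
contents while growing the allocation):

    /* Hypothetical helper: append one entry, keeping num_rdms in sync with
     * the real size of the array at all times. */
    static void add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
                              uint64_t start, uint64_t size, uint32_t flag)
    {
        d_config->rdms = libxl__realloc(gc, d_config->rdms,
                                        (d_config->num_rdms + 1) *
                                        sizeof(libxl_device_rdm));
        d_config->rdms[d_config->num_rdms].start = start;
        d_config->rdms[d_config->num_rdms].size  = size;
        d_config->rdms[d_config->num_rdms].flag  = flag;
        d_config->num_rdms++;
    }

That way num_rdms always reflects the actual size of the array, which should
also keep the dispose path safe as you pointed out.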
Thanks
Tiejun