xen-devel.lists.xenproject.org archive mirror
From: "Chen, Tiejun" <tiejun.chen@intel.com>
To: Jan Beulich <JBeulich@suse.com>,
	ian.campbell@citrix.com, stefano.stabellini@citrix.com,
	wei.liu2@citrix.com, Ian.Jackson@eu.citrix.com
Cc: kevin.tian@intel.com, andrew.cooper3@citrix.com, tim@xen.org,
	xen-devel@lists.xen.org, yang.z.zhang@intel.com
Subject: Re: [RFC][PATCH 04/13] tools/libxl: detect and avoid conflicts with RDM
Date: Thu, 07 May 2015 10:22:20 +0800
Message-ID: <554ACC5C.2040300@intel.com>
In-Reply-To: <554A509402000078000773F6@mail.emea.novell.com>

On 2015/5/6 23:34, Jan Beulich wrote:
>>>> On 06.05.15 at 17:00, <tiejun.chen@intel.com> wrote:
>> On 2015/4/20 19:13, Jan Beulich wrote:
>>>>>> On 10.04.15 at 11:21, <tiejun.chen@intel.com> wrote:
>>>> --- a/tools/libxc/xc_domain.c
>>>> +++ b/tools/libxc/xc_domain.c
>>>> @@ -1665,6 +1665,46 @@ int xc_assign_device(
>>>>        return do_domctl(xch, &domctl);
>>>>    }
>>>>
>>>> +struct xen_reserved_device_memory
>>>> +*xc_device_get_rdm(xc_interface *xch,
>>>> +                   uint32_t flag,
>>>> +                   uint16_t seg,
>>>> +                   uint8_t bus,
>>>> +                   uint8_t devfn,
>>>> +                   unsigned int *nr_entries)
>>>> +{
>>>
>>> So what's the point of having both this new function and
>>> xc_reserved_device_memory_map()? Is the latter useful for
>>> anything besides the purpose here?
>>
>> I just hope xc_reserved_device_memory_map() is a standard interface to
>> call that XENMEM_reserved_device_memory_map, but xc_device_get_rdm() can
>> handle some errors in current case.
>>
>> I think you are hinting we just need one, right?
>
> Correct. But remember - I'm not a maintainer of this code, so

But handling both cases with a single function may be a little complex...

> maintainers may be of different opinion.

Anyway, let me ask our tools maintainers.

Campbell, Jackson, Wei and Stefano,

What is your opinion on this?

>
>>>> +    struct xen_reserved_device_memory *xrdm = NULL;
>>>> +    int rc = xc_reserved_device_memory_map(xch, flag, seg, bus, devfn, xrdm,
>>>> +                                           nr_entries);
>>>> +
>>>> +    if ( rc < 0 )
>>>> +    {
>>>> +        if ( errno == ENOBUFS )
>>>> +        {
>>>> +            if ( (xrdm = malloc(*nr_entries *
>>>> +                                sizeof(xen_reserved_device_memory_t))) == NULL )
>>>> +            {
>>>> +                PERROR("Could not allocate memory.");
>>>
>>> Now that's exactly the kind of error message that makes no sense:
>>> As errno will already cause PERROR() to print something along the
>>> lines of the message you provide here, you're just creating
>>> redundancy. Indicating the purpose of the allocation, otoh, would
>>> add helpful context for the one inspecting the resulting log.
>>
>> What about this?
>>
>> PERROR("Could not allocate memory buffers to store reserved device
>> memory entries.");
>
> You kind of go from one extreme to the other - the message
> doesn't need to be overly long, but it should be distinct from
> all other messages (so that when seen one can identify what
> went wrong).

I originally followed existing examples like this one:

int
xc_core_arch_memory_map_get(xc_interface *xch,
                            struct xc_core_arch_context *unused,
                            xc_dominfo_t *info,
                            shared_info_any_t *live_shinfo,
                            xc_core_memory_map_t **mapp,
                            unsigned int *nr_entries)
{
     ...
     map = malloc(sizeof(*map));
     if ( map == NULL )
     {
         PERROR("Could not allocate memory");
         return -1;
     }

Maybe that is the wrong model for my case. Could I change it to this?

PERROR("Could not allocate memory for XENMEM_reserved_device_memory_map hypercall");

Or just suggest the wording you prefer.
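
As a rough illustration of the query-then-allocate pattern under discussion, the sketch below uses a hypothetical stand-in (fake_rdm_map) instead of the real hypercall wrapper, and plain perror() instead of libxc's PERROR(). The entry type, the function names and the two example regions are assumptions made up for this sketch. The point is only that each failure path names its purpose and leaves the errno text to perror().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry type standing in for xen_reserved_device_memory. */
struct rdm_entry {
    unsigned long long start_pfn;
    unsigned long long nr_pages;
};

/*
 * Hypothetical stand-in for the hypercall wrapper: pretends the host has
 * two reserved regions. If the buffer is missing or too small, it stores
 * the required count in *nr_entries, sets errno to ENOBUFS and returns -1.
 */
static int fake_rdm_map(struct rdm_entry *buf, unsigned int *nr_entries)
{
    static const struct rdm_entry host[] = {
        { 0xab000, 0x10 },
        { 0xad000, 0x20 },
    };
    unsigned int need = sizeof(host) / sizeof(host[0]);

    if (buf == NULL || *nr_entries < need) {
        *nr_entries = need;
        errno = ENOBUFS;
        return -1;
    }
    memcpy(buf, host, sizeof(host));
    *nr_entries = need;
    return 0;
}

/* Ask for the count first, then allocate and fetch. Caller frees the result. */
struct rdm_entry *get_rdm(unsigned int *nr_entries)
{
    struct rdm_entry *buf;

    assert(nr_entries != NULL);
    *nr_entries = 0;

    /* First call only learns how many entries exist. */
    if (fake_rdm_map(NULL, nr_entries) == 0 || errno != ENOBUFS)
        return NULL;

    buf = malloc(*nr_entries * sizeof(*buf));
    if (buf == NULL) {
        /* Name the purpose; perror() appends the errno text itself. */
        perror("Could not allocate RDM entry buffer");
        return NULL;
    }

    if (fake_rdm_map(buf, nr_entries) < 0) {
        perror("Could not fetch RDM entries");
        free(buf);
        return NULL;
    }
    return buf;
}
```

With this shape, the log line alone tells the reader which allocation failed, without repeating what errno already says.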

>
>>>> @@ -302,8 +300,11 @@ static int setup_guest(xc_interface *xch,
>>>>
>>>>        for ( i = 0; i < nr_pages; i++ )
>>>>            page_array[i] = i;
>>>> -    for ( i = mmio_start >> PAGE_SHIFT; i < nr_pages; i++ )
>>>> -        page_array[i] += mmio_size >> PAGE_SHIFT;
>>>> +    /*
>>>> +     * This condition 'lowmem_end <= mmio_start' is always true.
>>>> +     */
>>>
>>> For one I think you mean "The", not "This", as there's no such
>>> condition around here. And then - why? DYM "is supposed to
>>> always be true"? In which case you may want to check...
>>
>> I always do this inside libxl__build_hvm() but before setup_guest(),
>>
>> +    if (args.lowmem_size > mmio_start)
>> +        args.lowmem_size = mmio_start;
>>
>> And plus, we also another policy to rdm,
>>
>>       #1. Above a predefined boundary (default 2G)
>>           - move lowmem_end below reserved region to solve conflict;
>>
>> This means there's such a likelihood of args.lowmem_size < mmio_start)
>> as well.
>>
>> So here I'm saying the condition is always true.
>
> Okay, but again - if this is relevant to the following code, an
> assertion or alike may still be warranted.

Yes, I should add an assert() here.
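
A minimal sketch of what that could look like, assuming a simplified, hypothetical build_args structure rather than the real xc_hvm_build_args: the caller clamps lowmem_size as libxl__build_hvm() does, and the consumer asserts the invariant instead of only stating it in a comment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the build arguments. */
struct build_args {
    uint64_t lowmem_size;   /* end of low RAM, in bytes */
    uint64_t mmio_start;    /* start of the MMIO hole, in bytes */
};

/* Clamp lowmem so it never reaches into the MMIO hole (the caller's job). */
void clamp_lowmem(struct build_args *args)
{
    if (args->lowmem_size > args->mmio_start)
        args->lowmem_size = args->mmio_start;
}

/* Number of pages to populate below the hole; checks the invariant first. */
uint64_t lowmem_pages(const struct build_args *args, unsigned int page_shift)
{
    /* The assumption is enforced here rather than only documented. */
    assert(args->lowmem_size <= args->mmio_start);
    return args->lowmem_size >> page_shift;
}
```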

>
>>> and hence don't have the final say on stylistic issues, I don't see
>>> why the above couldn't be expressed with a single return statement.
>>
>> Are you saying something like this? Note this was showed by yourself
>> long time ago.
>
> I know, and hence I was puzzled to still see you use the more
> convoluted form.
>
>> static bool check_mmio_hole_conflict(uint64_t start, uint64_t memsize,
>>                                         uint64_t mmio_start, uint64_t mmio_size)
>> {
>>        return start + memsize > mmio_start && start < mmio_start + mmio_size;
>> }
>>
>> But I don't think this really can't work out our case.
>
> It's equivalent to the original you had, so I don't see what you
> mean with "this really can't work out our case".
>

Let me make this point clear.

The original implementation,

+static int check_rdm_hole(uint64_t start, uint64_t memsize,
+                          uint64_t rdm_start, uint64_t rdm_size)
+{
+    if (start + memsize <= rdm_start || start >= rdm_start + rdm_size)
+        return 0;
+    else
+        return 1;
+}

means it returns 'false' in two cases:

#1. end = start + memsize; end <= rdm_start;

The region [start, end] lies below the rdm entry.

#2. rdm_end = rdm_start + rdm_size; start >= rdm_end;

The region [start, end] lies above the rdm entry.

All other conditions indicate that the rdm entry overlaps this region.
There are actually three such cases:

#1. The region conflicts only with the first half of the rdm entry;
#2. The region conflicts only with the second half of the rdm entry;
#3. The whole region falls inside the rdm entry;

In all of these it should return 'true', right?

But with this single line,

return start + memsize > rdm_start && start < rdm_start + rdm_size;

=>

return end > rdm_start && start < rdm_end;

This guarantees that it returns 'true' *only* if #3 above occurs.
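
One way to test that reading is to compare the two forms mechanically. The sketch below evaluates both over a small grid of inputs and counts the disagreements; the function names are made up for illustration. By De Morgan's law, the negation of (A || B) is (!A && !B), so the expectation is that the two forms agree on every input, including cases #1 and #2, which matches the statement in the review that they are equivalent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two-branch form, as in the original check_rdm_hole(). */
bool overlap_branches(uint64_t start, uint64_t memsize,
                      uint64_t rdm_start, uint64_t rdm_size)
{
    if (start + memsize <= rdm_start || start >= rdm_start + rdm_size)
        return false;
    else
        return true;
}

/* Single-return form suggested in review. */
bool overlap_single(uint64_t start, uint64_t memsize,
                    uint64_t rdm_start, uint64_t rdm_size)
{
    return start + memsize > rdm_start && start < rdm_start + rdm_size;
}

/* Count inputs in a small grid on which the two forms disagree. */
unsigned int count_mismatches(uint64_t limit)
{
    unsigned int bad = 0;

    for (uint64_t s = 0; s < limit; s++)
        for (uint64_t m = 1; m < limit; m++)
            for (uint64_t rs = 0; rs < limit; rs++)
                for (uint64_t rz = 1; rz < limit; rz++)
                    if (overlap_branches(s, m, rs, rz) !=
                        overlap_single(s, m, rs, rz))
                        bad++;
    return bad;
}
```

A region that only clips the start of the entry, such as [0, 5) against an entry at [3, 13), can be passed to overlap_single() directly to check case #1.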

>>>> +int libxl__domain_device_check_rdm(libxl__gc *gc,
>>>> +                                   libxl_domain_config *d_config,
>>>> +                                   uint64_t rdm_mem_guard,
>>>> +                                   struct xc_hvm_build_args *args)
>>>> +{
>>>> +    int i, j, conflict;
>>>> +    libxl_ctx *ctx = libxl__gc_owner(gc);
>>>> +    struct xen_reserved_device_memory *xrdm = NULL;
>>>> +    unsigned int nr_all_rdms = 0;
>>>> +    uint64_t rdm_start, rdm_size, highmem_end = (1ULL << 32);
>>>> +    uint32_t type = d_config->b_info.rdm.type;
>>>> +    uint16_t seg;
>>>> +    uint8_t bus, devfn;
>>>> +
>>>> +    /* Might not to expose rdm. */
>>>> +    if ((type == LIBXL_RDM_RESERVE_TYPE_NONE) && !d_config->num_pcidevs)
>>>> +        return 0;
>>>> +
>>>> +    /* Collect all rdm info if exist. */
>>>> +    xrdm = xc_device_get_rdm(ctx->xch, LIBXL_RDM_RESERVE_TYPE_HOST,
>>>> +                             0, 0, 0, &nr_all_rdms);
>>>
>>> What meaning has passing a libxl private value to a libxc function?
>>
>> We intend to collect all rdm entries info in advance and then we can
>> construct d_config->rdms based on our policies as follows. Because we
>> need to first allocate d_config->rdms properly to store rdms, but in
>> some cases we don't know how many buffers are enough. For example, we
>> don't have that global flag but with multiple pci devices. And even a
>> shared entry worsen this situation.
>>
>> So here, we set that flag as LIBXL_RDM_RESERVE_TYPE_HOST but without any
>> SBDF to grab all rdms.
>
> I'm afraid you didn't get my point: Values passed to libxc should be

Sorry for this misunderstanding.

> known to libxc. Values privately defined by libxl for its own purposes
> aren't known to libxc, and hence shouldn't be passed to libxc
> functions.

I think we should pass 'PCI_DEV_RDM_ALL' here instead, since the hypercall
interface defines:

struct xen_reserved_device_memory_map {
     /* IN */
     /* Currently just one bit to indicate checking all Reserved Device Memory. */
#define PCI_DEV_RDM_ALL   0x1

>
>>>> +     * 'try' policy is specified, and we also mark this as INVALID not to expose
>>>> +     * this entry to hvmloader.
>>>
>>> What is "this" in "... also mark this as ..."? Certainly neither the conflict
>>> nor the warning.
>>
>> Sorry, this is my fault.
>>
>>        * If a conflict is detected on a given RMRR entry, an error will be
>>        * returned if 'strict' policy is specified. Or conflict is treated as a
>>        * warning if 'relaxed' policy is specified, and we also mark this as
>>        * INVALID not to expose this entry to hvmloader.
>
> The same "this" still doesn't have anything reasonable it references. I
> think you mean "the entry" (in which case the subsequent "this entry"
> could become just "it" afaict). But (not being a native speaker) the
> grammar of the second half of the sentence looks odd (and hence
> potentially confusing) to me anyway (i.e. even with the previous

Sure, we need to make this clearer.

> issue fixed).

      * If a conflict is detected on a given RMRR entry, an error will be
      * returned if the 'strict' policy is specified. If the 'relaxed' policy
      * is specified instead, the conflict is treated just as a warning, but
      * we mark the RMRR entry as INVALID to indicate that it shouldn't be
      * exposed to hvmloader.

I hope this makes clearer what we do.
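
The behaviour that comment describes can be sketched roughly as follows. The enum, the structure and the function name are hypothetical and only mirror the wording of the comment; they are not the libxl definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical policy values mirroring 'strict' and 'relaxed'. */
enum rdm_policy { RDM_POLICY_STRICT, RDM_POLICY_RELAXED };

/* Hypothetical per-entry record. */
struct rdm_entry {
    uint64_t start;
    uint64_t size;
    enum rdm_policy policy;
    bool invalid;   /* set when the entry must not be exposed to hvmloader */
};

/* Returns -1 if the conflict is fatal, 0 if it was downgraded to a warning. */
int handle_rdm_conflict(struct rdm_entry *e)
{
    if (e->policy == RDM_POLICY_STRICT) {
        fprintf(stderr, "RDM conflict at 0x%llx: strict policy, failing\n",
                (unsigned long long)e->start);
        return -1;
    }

    /* Relaxed: warn, and hide the entry from later consumers. */
    fprintf(stderr, "RDM conflict at 0x%llx: relaxed policy, hiding entry\n",
            (unsigned long long)e->start);
    e->invalid = true;
    return 0;
}
```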

>
>>>> +     *
>>>> +     * Firstly we should check the case of rdm < 4G because we may need to
>>>> +     * expand highmem_end.
>>>> +     */
>>>> +    for (i = 0; i < d_config->num_rdms; i++) {
>>>> +        rdm_start = d_config->rdms[i].start;
>>>> +        rdm_size = d_config->rdms[i].size;
>>>> +        conflict = check_rdm_hole(0, args->lowmem_size, rdm_start, rdm_size);
>>>> +
>>>> +        if (!conflict)
>>>> +            continue;
>>>> +
>>>> +        /*
>>>> +         * Just check if RDM > our memory boundary
>>>> +         */
>>>> +        if (d_config->rdms[i].start > rdm_mem_guard) {
>>>> +            /*
>>>> +             * We will move downwards lowmem_end so we have to expand
>>>> +             * highmem_end.
>>>> +             */
>>>> +            highmem_end += (args->lowmem_size - rdm_start);
>>>> +            /* Now move downwards lowmem_end. */
>>>> +            args->lowmem_size = rdm_start;
>>>
>>> Considering that the action here doesn't depend on the specific
>>> ->rdms[] slot being looked at, I don't see why the loop needs to
>>
>> I'm not sure if I understand what you mean.
>>
>> All rdm entries are organized disorderly in d_config->rdms, so we should
>> traverse all entries to make sure args->lowmem_size is below all rdms'
>> start address.
>
> I think I see what confused me: in the if() condition you reference
> d_config->rdms[i].start, yet the body of the if() has no reference
> to d_config->rdms[i] at all. If the if() used rdm_start it would
> become obvious that this is being latched at the beginning of the

Indeed, I really should use rdm_start here.

> body (which is what I overlooked, assuming the variable's value
> to have got set prior to the loop), and hence the body is not loop
> invariant.
>

So just replace d_config->rdms[i].start with rdm_start like this,

         /*
          * Just check if RDM > our memory boundary
          */
         if (rdm_start > rdm_mem_guard) {
             /*
              * We will move downwards lowmem_end so we have to expand
              * highmem_end.
              */
             highmem_end += (args->lowmem_size - rdm_start);
             /* Now move downwards lowmem_end. */
             args->lowmem_size = rdm_start;
         }
     }
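
To make the effect of the loop concrete, here is a self-contained sketch with two unordered entries. The rdm and layout structures and the function names are assumptions for illustration only, and the overlap test uses the single-return form. Each conflicting entry above the guard pulls lowmem_end down to its start, and the displaced RAM is added to highmem_end.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical entry and layout types, simplified for this sketch. */
struct rdm { uint64_t start; uint64_t size; };
struct layout { uint64_t lowmem_end; uint64_t highmem_end; };

static int overlaps(uint64_t start, uint64_t memsize,
                    uint64_t rdm_start, uint64_t rdm_size)
{
    return start + memsize > rdm_start && start < rdm_start + rdm_size;
}

/*
 * For each entry that conflicts with [0, lowmem_end) and starts above the
 * guard, move lowmem_end down to the entry and grow highmem_end by the
 * amount of RAM displaced.
 */
void avoid_rdm_conflicts(struct layout *l, const struct rdm *rdms,
                         unsigned int nr, uint64_t guard)
{
    for (unsigned int i = 0; i < nr; i++) {
        uint64_t rdm_start = rdms[i].start;   /* latched per iteration */
        uint64_t rdm_size = rdms[i].size;

        if (!overlaps(0, l->lowmem_end, rdm_start, rdm_size))
            continue;

        if (rdm_start > guard) {
            l->highmem_end += l->lowmem_end - rdm_start;
            l->lowmem_end = rdm_start;
        }
    }
}
```

Starting from lowmem_end = 0xf0000000 and highmem_end = 4G with a 2G guard, entries at 0xe0000000 and 0xc0000000 (in that order) leave lowmem_end at 0xc0000000 and highmem_end at 0x130000000, so the 0x30000000 bytes removed from low memory reappear above 4G.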

Thanks
Tiejun
