From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Chen, Tiejun" Subject: Re: [RFC][PATCH 04/13] tools/libxl: detect and avoid conflicts with RDM Date: Thu, 07 May 2015 10:22:20 +0800 Message-ID: <554ACC5C.2040300@intel.com> References: <1428657724-3498-1-git-send-email-tiejun.chen@intel.com> <1428657724-3498-5-git-send-email-tiejun.chen@intel.com> <5534FB700200007800073B6C@mail.emea.novell.com> <554A2CA1.9030907@intel.com> <554A509402000078000773F6@mail.emea.novell.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <554A509402000078000773F6@mail.emea.novell.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Jan Beulich , ian.campbell@citrix.com, stefano.stabellini@citrix.com, wei.liu2@citrix.com, Ian.Jackson@eu.citrix.com Cc: kevin.tian@intel.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, yang.z.zhang@intel.com List-Id: xen-devel@lists.xenproject.org On 2015/5/6 23:34, Jan Beulich wrote: >>>> On 06.05.15 at 17:00, wrote: >> On 2015/4/20 19:13, Jan Beulich wrote: >>>>>> On 10.04.15 at 11:21, wrote: >>>> --- a/tools/libxc/xc_domain.c >>>> +++ b/tools/libxc/xc_domain.c >>>> @@ -1665,6 +1665,46 @@ int xc_assign_device( >>>> return do_domctl(xch, &domctl); >>>> } >>>> >>>> +struct xen_reserved_device_memory >>>> +*xc_device_get_rdm(xc_interface *xch, >>>> + uint32_t flag, >>>> + uint16_t seg, >>>> + uint8_t bus, >>>> + uint8_t devfn, >>>> + unsigned int *nr_entries) >>>> +{ >>> >>> So what's the point of having both this new function and >>> xc_reserved_device_memory_map()? Is the latter useful for >>> anything besides the purpose here? >> >> I just hope xc_reserved_device_memory_map() is a standard interface to >> call that XENMEM_reserved_device_memory_map, but xc_device_get_rdm() can >> handle some errors in current case. >> >> I think you are hinting we just need one, right? > > Correct. But remember - I'm not a maintainer of this code, so But this may be a little complex with one... > maintainers may be of different opinion. Anyway, let me ask our tools maintainers. Campbell, Jackson, Wei and Stefano, What about your concern to this? > >>>> + struct xen_reserved_device_memory *xrdm = NULL; >>>> + int rc = xc_reserved_device_memory_map(xch, flag, seg, bus, devfn, xrdm, >>>> + nr_entries); >>>> + >>>> + if ( rc < 0 ) >>>> + { >>>> + if ( errno == ENOBUFS ) >>>> + { >>>> + if ( (xrdm = malloc(*nr_entries * >>>> + sizeof(xen_reserved_device_memory_t))) == NULL ) >>>> + { >>>> + PERROR("Could not allocate memory."); >>> >>> Now that's exactly the kind of error message that makes no sense: >>> As errno will already cause PERROR() to print something along the >>> lines of the message you provide here, you're just creating >>> redundancy. Indicating the purpose of the allocation, otoh, would >>> add helpful context for the one inspecting the resulting log. >> >> What about this? >> >> PERROR("Could not allocate memory buffers to store reserved device >> memory entries."); > > You kind of go from one extreme to the other - the message > doesn't need to be overly long, but it should be distinct from > all other messages (so that when seen one can identify what > went wrong). I originally refer to some existing examples like this, int xc_core_arch_memory_map_get(xc_interface *xch, struct xc_core_arch_context *unused, xc_dominfo_t *info, shared_info_any_t *live_shinfo, xc_core_memory_map_t **mapp, unsigned int *nr_entries) { ... map = malloc(sizeof(*map)); if ( map == NULL ) { PERROR("Could not allocate memory"); return -1; } Maybe this is wrong to my case. Could I change this? PERROR("Could not allocate memory for XENMEM_reserved_device_memory_map hypercall"); Or just give me your line. > >>>> @@ -302,8 +300,11 @@ static int setup_guest(xc_interface *xch, >>>> >>>> for ( i = 0; i < nr_pages; i++ ) >>>> page_array[i] = i; >>>> - for ( i = mmio_start >> PAGE_SHIFT; i < nr_pages; i++ ) >>>> - page_array[i] += mmio_size >> PAGE_SHIFT; >>>> + /* >>>> + * This condition 'lowmem_end <= mmio_start' is always true. >>>> + */ >>> >>> For one I think you mean "The", not "This", as there's no such >>> condition around here. And then - why? DYM "is supposed to >>> always be true"? In which case you may want to check... >> >> I always do this inside libxl__build_hvm() but before setup_guest(), >> >> + if (args.lowmem_size > mmio_start) >> + args.lowmem_size = mmio_start; >> >> And plus, we also another policy to rdm, >> >> #1. Above a predefined boundary (default 2G) >> - move lowmem_end below reserved region to solve conflict; >> >> This means there's such a likelihood of args.lowmem_size < mmio_start) >> as well. >> >> So here I'm saying the condition is always true. > > Okay, but again - if this is relevant to the following code, an > assertion or alike may still be warranted. Yes I should add 'assert()' here. > >>> and hence don't have the final say on stylistic issues, I don't see >>> why the above couldn't be expressed with a single return statement. >> >> Are you saying something like this? Note this was showed by yourself >> long time ago. > > I know, and hence I was puzzled to still see you use the more > convoluted form. > >> static bool check_mmio_hole_conflict(uint64_t start, uint64_t memsize, >> uint64_t mmio_start, uint64_t mmio_size) >> { >> return start + memsize > mmio_start && start < mmio_start + mmio_size; >> } >> >> But I don't think this really can't work out our case. > > It's equivalent to the original you had, so I don't see what you > mean with "this really can't work out our case". > Let me make this point clear. The original implementation, +static int check_rdm_hole(uint64_t start, uint64_t memsize, + uint64_t rdm_start, uint64_t rdm_size) +{ + if (start + memsize <= rdm_start || start >= rdm_start + rdm_size) + return 0; + else + return 1; +} means it returns 'false' in two cases: #1. end = start + memsize; end <= rdm_start; This region [start, end] is below of rdm entry. #2. rdm_end = rdm_start + rdm_size; stat >= rdm_end; This region [start, end] is above of rdm entry. So others conditions should indicate that rdm entry is overlapping with this region. Actually this has three cases: #1. This region just conflicts with the first half of rdm entry; #2. This region just conflicts with the second half of rdm entry; #3. This whole region falls inside of rdm entry; Then it should return 'true', right? But with this single line, return start + memsize > rdm_start && start < rdm_start + rdm_size; => return end > rdm_start && start < rdm_end; This just guarantee it return 'true' *only* if #3 above occurs. >>>> +int libxl__domain_device_check_rdm(libxl__gc *gc, >>>> + libxl_domain_config *d_config, >>>> + uint64_t rdm_mem_guard, >>>> + struct xc_hvm_build_args *args) >>>> +{ >>>> + int i, j, conflict; >>>> + libxl_ctx *ctx = libxl__gc_owner(gc); >>>> + struct xen_reserved_device_memory *xrdm = NULL; >>>> + unsigned int nr_all_rdms = 0; >>>> + uint64_t rdm_start, rdm_size, highmem_end = (1ULL << 32); >>>> + uint32_t type = d_config->b_info.rdm.type; >>>> + uint16_t seg; >>>> + uint8_t bus, devfn; >>>> + >>>> + /* Might not to expose rdm. */ >>>> + if ((type == LIBXL_RDM_RESERVE_TYPE_NONE) && !d_config->num_pcidevs) >>>> + return 0; >>>> + >>>> + /* Collect all rdm info if exist. */ >>>> + xrdm = xc_device_get_rdm(ctx->xch, LIBXL_RDM_RESERVE_TYPE_HOST, >>>> + 0, 0, 0, &nr_all_rdms); >>> >>> What meaning has passing a libxl private value to a libxc function? >> >> We intend to collect all rdm entries info in advance and then we can >> construct d_config->rdms based on our policies as follows. Because we >> need to first allocate d_config->rdms properly to store rdms, but in >> some cases we don't know how many buffers are enough. For example, we >> don't have that global flag but with multiple pci devices. And even a >> shared entry worsen this situation. >> >> So here, we set that flag as LIBXL_RDM_RESERVE_TYPE_HOST but without any >> SBDF to grab all rdms. > > I'm afraid you didn't get my point: Values passed to libxc should be Sorry for this misunderstanding. > known to libxc. Values privately defined by libxl for its own purposes > aren't known to libxc, and hence shouldn't be passed to libxc > functions. I think we should set this with 'PCI_DEV_RDM_ALL' since, struct xen_reserved_device_memory_map { /* IN */ /* Currently just one bit to indicate checkng all Reserved Device Memory. */ #define PCI_DEV_RDM_ALL 0x1 > >>>> + * 'try' policy is specified, and we also mark this as INVALID not to expose >>>> + * this entry to hvmloader. >>> >>> What is "this" in "... also mark this as ..."? Certainly neither the conflict >>> nor the warning. >> >> Sorry, this is my fault. >> >> * If a conflict is detected on a given RMRR entry, an error will be >> * returned if 'strict' policy is specified. Or conflict is treated as a >> * warning if 'relaxed' policy is specified, and we also mark this as >> * INVALID not to expose this entry to hvmloader. > > The same "this" still doesn't have anything reasonable it references. I > think you mean "the entry" (in which case the subsequent "this entry" > could become just "it" afaict). But (not being a native speaker) the > grammar of the second half of the sentence looks odd (and hence > potentially confusing) to me anyway (i.e. even with the previous Sure, we need to make this better and clear. > issue fixed). * If a conflict is detected on a given RMRR entry, an error will be * returned if 'strict' policy is specified. Instead, if 'relaxed' policy * specified, this conflict is treated just as a warning, but we mark this * RMRR entry as INVALID to indicate that this entry shouldn't be exposed * to hvmloader. I hope this can help us understand what we do. > >>>> + * >>>> + * Firstly we should check the case of rdm < 4G because we may need to >>>> + * expand highmem_end. >>>> + */ >>>> + for (i = 0; i < d_config->num_rdms; i++) { >>>> + rdm_start = d_config->rdms[i].start; >>>> + rdm_size = d_config->rdms[i].size; >>>> + conflict = check_rdm_hole(0, args->lowmem_size, rdm_start, rdm_size); >>>> + >>>> + if (!conflict) >>>> + continue; >>>> + >>>> + /* >>>> + * Just check if RDM > our memory boundary >>>> + */ >>>> + if (d_config->rdms[i].start > rdm_mem_guard) { >>>> + /* >>>> + * We will move downwards lowmem_end so we have to expand >>>> + * highmem_end. >>>> + */ >>>> + highmem_end += (args->lowmem_size - rdm_start); >>>> + /* Now move downwards lowmem_end. */ >>>> + args->lowmem_size = rdm_start; >>> >>> Considering that the action here doesn't depend on the specific >>> ->rdms[] slot being looked at, I don't see why the loop needs to >> >> I'm not sure if I understand what you mean. >> >> All rdm entries are organized disorderly in d_config->rdms, so we should >> traverse all entries to make sure args->lowmem_size is below all rdms' >> start address. > > I think I see what confused me: in the if() condition you reference > d_config->rdms[i].start, yet the body of the if() has no reference > to d_config->rdms[i] at all. If the if() used rdm_start it would > become obvious that this is being latched at the beginning of the Indeed, I really should use rdm_start here. > body (which is what I overlooked, assuming the variable's value > to have got set prior to the loop), and hence the body is not loop > invariant. > So just replace d_config->rdms[i].start with rdm_start like this, /* * Just check if RDM > our memory boundary */ if (rdm_start > rdm_mem_guard) { /* * We will move downwards lowmem_end so we have to expand * highmem_end. */ highmem_end += (args->lowmem_size - rdm_start); /* Now move downwards lowmem_end. */ args->lowmem_size = rdm_start; } } Thanks Tiejun