From mboxrd@z Thu Jan 1 00:00:00 1970
From: Gordan Bobic
Subject: HVM support for e820_host (Was: Bug: Limitation of <=2GB RAM in domU persists with 4.3.0)
Date: Tue, 03 Sep 2013 20:47:09 +0100
Message-ID: <52263CBD.1090402@bobich.net>
References: <8426aecf79e7f55c21bbe259014591a2@mail.shatteredsilicon.net>
 <20130724163102.GA6308@phenom.dumpdata.com>
 <51F051F1.5050806@bobich.net>
 <51F19D11.1090200@bobich.net>
 <51F1A54D.6070906@bobich.net>
 <1374798084.10269.2.camel@hastur.hellion.org.uk>
 <20130729180431.GQ5848@phenom.dumpdata.com>
 <20130903145934.GC1487@konrad-lan.dumpdata.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------020200020505070907080401"
In-Reply-To: <20130903145934.GC1487@konrad-lan.dumpdata.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Konrad Rzeszutek Wilk
Cc: xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

This is a multi-part message in MIME format.
--------------020200020505070907080401
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

On 09/03/2013 03:59 PM, Konrad Rzeszutek Wilk wrote:
>>>> 2) Further, I'm finding myself motivated to write that
>>>> auto-set (as opposed to hard coded) vBAR=pBAR patch discussed
>>>> briefly a week or so ago (have an init script read the BAR
>>>> info from dom0 and put it in xenstore, plus a patch to
>>>> make pBAR=vBAR reservations built dynamically rather than
>>>> statically, based on this data). Now, I'm quite fluent in C,
>>>> but my familiarity with Xen source code is nearly non-existent
>>>> (limited to studying an old unsupported patch every now and then
>>>> in order to make it apply to a more recent code release).
>>>> Can anyone help me out with a high level view WRT where
>>>> this would be best plumbed in (which files and the flow of
>>>> control between the affected files)?
>>>
>>> hvmloader probably and the libxl e820 code. What from a
>>> high view needs to happen is that:
>>> 1). Need to relax the check in libxl for e820_hole
>>> to also do it for HVM guests. Said code just iterates over the
>>> host E820 and sanitizes it a bit and makes an E820 hypercall to
>>> set it for the guest.

[snip]

OK, I have attached a preliminary patch against 4.3.0 for the libxl
part. It compiles. I haven't tried running it to see whether it
actually works, but my packages build. Please let me know if I've
missed anything.

On its own, I don't think this patch will do much (apart from maybe
breaking HVM guests with e820_host=1 set).

>>> 2). Figure out whether the E820 hypercall (which sets the E820
>>> layout for a guest) can be run on HVM guests. I think it
>>> could not and Mukesh in his PVH patches posted a patch
>>> to enable that - "..Move e820 fields out of pv_domain struct"

Is this already in 4.3.0 or is this an out-of-tree patch? Do you
have a link to it handy?

>>> 3). Hvmloader should do an E820 get machine memory hypercall
>>> to see if there is anything there. If there is - that means
>>> the toolstack has requested a "new" type of E820. Iterate
>>> over the E820 and make it look like that.
>>> You can look in the Linux arch/x86/xen/setup.c to see how
>>> it does that.
>>>
>>> The complication there is that hvmloader needs to fit the
>>> ACPI code (the guest type one) and such.
>>> Presumably you can just re-use the existing spaces that
>>> the host has marked as E820_RESERVED or E820_ACPI..
>>
>> Yup, I get it. Not only that, but it should also ideally (not
>> strictly necessary, but it'd be handy) map the IOMEM for devices
>> it is passed so that pBAR=vBAR (as opposed to just leaving all
>> the host e820 reserved areas well alone - which would work for
>> most things).
>
> Yes. That is an extra complication that could be done in subsequent
> patches.
> But in theory if you have the E820 mirrored from the host the
> pBAR=vBAR should be easy enough as the values from the host BARs can
> easily fit in the E820 gaps.

Agreed. Let's leave the pBAR=vBAR part for a separate patch set.
I'll have to figure out a sensible way to query the IOMEM regions
for each of the devices passed to the VM and make sure they are
in the same hole.

>>> Then there is the SMBIOS, which would need to move, and the BIOS
>>> might need to be relocated - but I think those are relocatable
>>> in some form.

[bit above left for later reference]

>>> Well, I am more than happy to help you with this.
>>
>> Thanks, much appreciated. :)
>
> Yeeey! Vict^H^H^H^volunteer :-)!
>
> I am also reachable on IRC (FreeNode mostly) as either darnok or konrad
> if that would be more convenient to discuss this.

Thanks. I'll keep that in mind. :)

Gordan

--------------020200020505070907080401
Content-Type: text/plain; charset=UTF-8;
 name="xen-hvm-libxl-e820_host.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="xen-hvm-libxl-e820_host.patch"

--- xen-4.3.0/tools/libxl/libxl_create.c.orig	2013-09-03 14:26:47.478350269 +0100
+++ xen-4.3.0/tools/libxl/libxl_create.c	2013-09-03 14:45:26.710553063 +0100
@@ -208,6 +208,8 @@

     libxl_defbool_setdefault(&b_info->disable_migrate, false);

+    libxl_defbool_setdefault(&b_info->e820_host, false);
+
     switch (b_info->type) {
     case LIBXL_DOMAIN_TYPE_HVM:
         if (b_info->shadow_memkb == LIBXL_MEMKB_DEFAULT)
@@ -280,7 +282,6 @@

         break;
     case LIBXL_DOMAIN_TYPE_PV:
-        libxl_defbool_setdefault(&b_info->u.pv.e820_host, false);
         if (b_info->shadow_memkb == LIBXL_MEMKB_DEFAULT)
             b_info->shadow_memkb = 0;
         if (b_info->u.pv.slack_memkb == LIBXL_MEMKB_DEFAULT)
--- xen-4.3.0/tools/libxl/libxl_types.idl.orig	2013-09-03 14:16:48.462767589 +0100
+++ xen-4.3.0/tools/libxl/libxl_types.idl	2013-09-03 14:18:19.624028024 +0100
@@ -295,6 +295,8 @@
     ("irqs",             Array(uint32, "num_irqs")),
     ("iomem",            Array(libxl_iomem_range, "num_iomem")),
     ("claim_mode",       libxl_defbool),
+    # Use host's E820 for PCI passthrough.
+    ("e820_host",        libxl_defbool),
     ("u", KeyedUnion(None, libxl_domain_type, "type",
                 [("hvm", Struct(None, [("firmware",         string),
                                        ("bios",             libxl_bios_type),
@@ -340,8 +342,6 @@
                                       ("cmdline", string),
                                       ("ramdisk", string),
                                       ("features", string, {'const': True}),
-                                      # Use host's E820 for PCI passthrough.
-                                      ("e820_host", libxl_defbool),
                                       ])),
                  ("invalid", Struct(None, [])),
                  ], keyvar_init_val = "LIBXL_DOMAIN_TYPE_INVALID")),
--- xen-4.3.0/tools/libxl/libxl_x86.c.orig	2013-09-03 14:26:36.093566315 +0100
+++ xen-4.3.0/tools/libxl/libxl_x86.c	2013-09-03 16:52:24.648701260 +0100
@@ -216,11 +216,8 @@
     struct e820entry map[E820MAX];
     libxl_domain_build_info *b_info;

-    if (d_config == NULL || d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM)
-        return ERROR_INVAL;
-
     b_info = &d_config->b_info;
-    if (!libxl_defbool_val(b_info->u.pv.e820_host))
+    if (!libxl_defbool_val(b_info->e820_host))
         return ERROR_INVAL;

     rc = xc_get_machine_memory_map(ctx->xch, map, E820MAX);
@@ -229,9 +226,15 @@
         return ERROR_FAIL;
     }
     nr = rc;
-    rc = e820_sanitize(ctx, map, &nr, b_info->target_memkb,
-                       (b_info->max_memkb - b_info->target_memkb) +
-                       b_info->u.pv.slack_memkb);
+    if (d_config == NULL || d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM) {
+        rc = e820_sanitize(ctx, map, &nr, b_info->target_memkb,
+                           (b_info->max_memkb - b_info->target_memkb));
+    } else if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_PV) {
+        rc = e820_sanitize(ctx, map, &nr, b_info->target_memkb,
+                           (b_info->max_memkb - b_info->target_memkb) +
+                           b_info->u.pv.slack_memkb);
+    }
+
     if (rc)
         return ERROR_FAIL;

@@ -296,8 +299,7 @@
         xc_shadow_control(ctx->xch, domid, XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION, NULL, 0, &shadow, 0, NULL);
     }

-    if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_PV &&
-            libxl_defbool_val(d_config->b_info.u.pv.e820_host)) {
+    if (libxl_defbool_val(d_config->b_info.e820_host)) {
         ret = libxl__e820_alloc(gc, domid, d_config);
         if (ret) {
             LIBXL__LOG_ERRNO(gc->owner, LIBXL__LOG_ERROR,
--- xen-4.3.0/tools/libxl/xl_cmdimpl.c.orig	2013-09-03 14:26:54.524214804 +0100
+++ xen-4.3.0/tools/libxl/xl_cmdimpl.c	2013-09-03 14:47:11.811612562 +0100
@@ -1274,11 +1274,7 @@
     if (!xlu_cfg_get_long (config, "pci_permissive", &l, 0))
         pci_permissive = l;

-    /* To be reworked (automatically enabled) once the auto ballooning
-     * after guest starts is done (with PCI devices passed in). */
-    if (c_info->type == LIBXL_DOMAIN_TYPE_PV) {
-        xlu_cfg_get_defbool(config, "e820_host", &b_info->u.pv.e820_host, 0);
-    }
+    xlu_cfg_get_defbool(config, "e820_host", &b_info->e820_host, 0);

     if (!xlu_cfg_get_list (config, "pci", &pcis, 0, 0)) {
         d_config->num_pcidevs = 0;
@@ -1296,8 +1292,8 @@
             if (!xlu_pci_parse_bdf(config, pcidev, buf))
                 d_config->num_pcidevs++;
         }
-        if (d_config->num_pcidevs && c_info->type == LIBXL_DOMAIN_TYPE_PV)
-            libxl_defbool_set(&b_info->u.pv.e820_host, true);
+        if (d_config->num_pcidevs)
+            libxl_defbool_set(&b_info->e820_host, true);
     }

     switch (xlu_cfg_get_list(config, "cpuid", &cpuids, 0, 1)) {
--- xen-4.3.0/tools/libxl/xl_sxp.c.orig	2013-09-03 14:25:37.839675572 +0100
+++ xen-4.3.0/tools/libxl/xl_sxp.c	2013-09-03 14:22:13.953561029 +0100
@@ -87,6 +87,10 @@
         }
     }

+    printf("\t(e820_host %s)\n",
+           libxl_defbool_to_string(b_info->e820_host));
+
+
     printf("\t(image\n");
     switch (c_info->type) {
     case LIBXL_DOMAIN_TYPE_HVM:
@@ -150,8 +154,6 @@
         printf("\t\t\t(kernel %s)\n", b_info->u.pv.kernel);
         printf("\t\t\t(cmdline %s)\n", b_info->u.pv.cmdline);
         printf("\t\t\t(ramdisk %s)\n", b_info->u.pv.ramdisk);
-        printf("\t\t\t(e820_host %s)\n",
-               libxl_defbool_to_string(b_info->u.pv.e820_host));
         printf("\t\t)\n");
         break;
     default:

--------------020200020505070907080401
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

--------------020200020505070907080401--