xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Ian Campbell <Ian.Campbell@eu.citrix.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
Subject: Re: [PATCH 2 of 3] libxl: Add support for passing in the machine's E820 for PCI passthrough
Date: Tue, 10 May 2011 10:56:33 -0400	[thread overview]
Message-ID: <20110510145633.GB17128@dumpdata.com> (raw)
In-Reply-To: <1305016190.26692.254.camel@zakaz.uk.xensource.com>

On Tue, May 10, 2011 at 09:29:50AM +0100, Ian Campbell wrote:
> On Wed, 2011-05-04 at 15:17 +0100, Konrad Rzeszutek Wilk wrote:
> > # HG changeset patch
> > # User Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > # Date 1304518568 14400
> > # Node ID ba218fa1a48ed682651fd90dd8eeb68eeab68e7a
> > # Parent  b6af9b428bb16c4c5364ace0617923ffa44ad887
> > libxl: Add support for passing in the machine's E820 for PCI passthrough
> > 
> > The code that populates E820 is unconditionally triggered by the guest
> > configuration having "pci=['<BDF>,..']", being a PV guest, and if
> > b_info->u.pv.machine_e820 is set.
> > 
> > The code do_domain_create calls the libxl__e820_alloc when
> > it notices that the guest is PV, has at least one PCI devices, and has
> > the machine_e820 flag set.
> > 
> > libxl__e820_alloc calls the xc_get_machine_memory_map to retrieve the systems
> > E820. Then the E820 is sanitized to weed out E820 entries below 16MB, and as
> > well remove any E820_RAM or E820_UNUSED regions as the guest does not need to
> > know about them. The guest only needs the E820_ACPI, E820_NVS, E820_RESERVED to
> > get an idea of where the PCI I/O space is. Mostly.. The Linux kernel assumes that any
> > gap in the E820 is considered PCI I/O space which means that if we pass
> > in the guest 2GB, and the E820_ACPI, and its friend start at 3GB, the
> > gap between 2GB and 3GB will be considered as PCI I/O space. To guard against
> > that we also create an E820_UNUSABLE between the region of 'target_kb'
> > (called ram_end in the code) up to the first E820_[ACPI,NVS,RESERVED] region.
> > Lastly, the xc_domain_set_memory_map is called to install the new E820.
> 
> Do we need to document the requirements this places on a PV guest
> somewhere more prominent?
> 
> Phrases like "up to the first E820...." make me worry that this will
> only work on "sensible" machines, but I suppose we can cross those
> bridges when we come to them...

I did test it with non-sensible machines that have a E820 peppered with
E820_RAM amongs the E820_RESERVED. The last patches makes that work.


> 
> > When tested with another PV guest (NetBSD 5.1) the modified E820 gave
> > it no trouble. The code has also been tested with older "classic" Xen Linux
> > and with the newer "pvops" with success (SLES11, RHEL5, Ubuntu Lucid,
> > Debian Squeeze, 2.6.37, 2.6.38, 2.6.39).
> > 
> > Memory that is slack or for balloon (so 'maxmem' in guest configuration)
> > is put behind the machine E820. Which in most cases is after the 4GB.
> > 
> > The reason for doing the fetching of the E820 using the hypercall in
> > the toolstack (instead of the guest doing it) is that when a guest
> > would do a hypercall to 'XENMEM_machine_memory_map' it would
> > retrieve an E820 with I/O range caps added in. Meaning that the
> > region after 4GB up to end of possible memory would be marked as unusable
> > and the kernel would not have any space to allocate a balloon
> > region.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > 
> > diff -r b6af9b428bb1 -r ba218fa1a48e tools/libxl/libxl.idl
> > --- a/tools/libxl/libxl.idl	Wed May 04 10:16:08 2011 -0400
> > +++ b/tools/libxl/libxl.idl	Wed May 04 10:16:08 2011 -0400
> > @@ -180,6 +180,7 @@ libxl_domain_build_info = Struct("domain
> >                                          ("cmdline", string),
> >                                          ("ramdisk", libxl_file_reference),
> >                                          ("features", string, True),
> > +                                        ("machine_e820", bool, False, "Use machine's E820 for PCI passthrough."),
> >                                          ])),
> >                   ])),
> >      ],
> > diff -r b6af9b428bb1 -r ba218fa1a48e tools/libxl/libxl_create.c
> > --- a/tools/libxl/libxl_create.c	Wed May 04 10:16:08 2011 -0400
> > +++ b/tools/libxl/libxl_create.c	Wed May 04 10:16:08 2011 -0400
> > @@ -519,6 +519,14 @@ static int do_domain_create(libxl__gc *g
> >      for (i = 0; i < d_config->num_pcidevs; i++)
> >          libxl__device_pci_add(gc, domid, &d_config->pcidevs[i], 1);
> >  
> > +    if (!d_config->c_info.hvm && d_config->b_info.u.pv.machine_e820) {
> > +        int rc = 0;
> > +        rc = libxl__e820_alloc(ctx, domid, d_config);
> 
> rc is pretty comprehensively initialised ;-0

Duh!
> 
> > +        if (rc)
> > +            LIBXL__LOG(ctx, LIBXL__LOG_WARNING,
> > +                      "Failed while collecting E820 with: %d (errno:%d)\n",
> > +                      rc, errno);
> 
> LIBXL__LOG_ERRNO takes care of logging errno for you.

OK.
> 
> > +    }
> >      if ( cb && (d_config->c_info.hvm || d_config->b_info.u.pv.bootloader )) {
> >          if ( (*cb)(ctx, domid, priv) )
> >              goto error_out;
> 
> > diff -r b6af9b428bb1 -r ba218fa1a48e tools/libxl/libxl_pci.c
> > --- a/tools/libxl/libxl_pci.c	Wed May 04 10:16:08 2011 -0400
> > +++ b/tools/libxl/libxl_pci.c	Wed May 04 10:16:08 2011 -0400
> > @@ -1047,3 +1047,157 @@ int libxl_device_pci_shutdown(libxl_ctx 
> >      free(pcidevs);
> >      return 0;
> >  }
> > +
> > +static int e820_sanitize(libxl_ctx *ctx, struct e820entry src[],
> > +                         uint32_t *nr_entries,
> > +                         unsigned long map_limitkb,
> > +                         unsigned long balloon_kb)
> > +{
> > +    uint64_t delta_kb = 0, start = 0, start_kb = 0, last = 0, ram_end;
> > +    uint32_t i, idx = 0, nr;
> > +    struct e820entry e820[E820MAX];
> > +
> > +    if (!src || !map_limitkb || !balloon_kb || !nr_entries)
> > +        return ERROR_INVAL;
> > +
> > +    nr = *nr_entries;
> > +    if (!nr)
> > +        return ERROR_INVAL;
> > +
> > +    if (nr > E820MAX)
> > +        return ERROR_NOMEM;
> > +
> > +    /* Weed out anything under 16MB */
> > +    for (i = 0; i < nr; i++) {
> > +      if (src[i].addr > 0x100000)
> 
> 0x100000 is 1MB not 16MB. I'm not sure if the code or the comment is
> correct.

Comment should have said 1MB.
> 
> > +        continue;
> > +
> > +      src[i].type = 0;
> > +      src[i].size = 0;
> > +      src[i].addr = -1ULL;
> > +    }
> > +
> > +    /* Find the lowest and highest entry in E820, skipping over
> > +     * undersired entries. */
> 
>           undesired ?
> 
> > +    start = -1ULL;
> > +    last = 0;
> > +    for (i = 0; i < nr; i++) {
> > +        if ((src[i].type == E820_RAM) ||
> > +            (src[i].type == E820_UNUSABLE) ||
> > +            (src[i].type == 0)) 
> [...]
> >     nr = idx;
> > +
> > +    for (i = 0; i < nr; i++) {
> > +      LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, ":%s\t[%lx -> %lx]",
> > +	e820[i].type == E820_RAM ? "RAM " :
> > +	(e820[i].type == E820_RESERVED ? "RSV " :
> > +	 e820[i].type == E820_ACPI ? "ACPI" :
> > +         (e820[i].type == E820_NVS ? "NVS " :
> > +          (e820[i].type == E820_UNUSABLE ? "UNU " : "----"))),
> > +          e820[i].addr >> 12,
> > +         (e820[i].addr + e820[i].size) >> 12);
> 
> This screams out for a e820_type_as_string style function or a lookup
> table, or just about anything else really ;-).

<nods> I had it at some point in the patches, not sure why it got lost.
Let me add it back in.
> Also you can use "%-4s" instead of manually aligning all the strings.

/me nods.
> 
> > +    }
> > +
> > +    /* Done: copy the sanitized version. */
> > +    *nr_entries = nr;
> > +    memcpy(src, e820, nr * sizeof(struct e820entry));
> > +    return 0;
> > +}
> > +
> 
> Ian.

  reply	other threads:[~2011-05-10 14:56 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-04 14:17 [PATCH 0 of 3] Patches for PCI passthrough with modified E820 (v3) - resent Konrad Rzeszutek Wilk
2011-05-04 14:17 ` [PATCH 1 of 3] tools: Add xc_domain_set_memory_map and xc_get_machine_memory_map calls (x86, amd64 only) Konrad Rzeszutek Wilk
2011-05-04 14:17 ` [PATCH 2 of 3] libxl: Add support for passing in the machine's E820 for PCI passthrough Konrad Rzeszutek Wilk
2011-05-10  8:29   ` Ian Campbell
2011-05-10 14:56     ` Konrad Rzeszutek Wilk [this message]
2011-05-17 15:10   ` Ian Jackson
2011-05-17 16:00     ` Konrad Rzeszutek Wilk
2011-05-17 16:05       ` Ian Campbell
2011-05-04 14:17 ` [PATCH 3 of 3] libxl: Convert E820_UNUSABLE and E820_RAM to E820_UNUSABLE as appropriate Konrad Rzeszutek Wilk
2011-05-09  9:00 ` [PATCH 0 of 3] Patches for PCI passthrough with modified E820 (v3) - resent Ian Campbell
2011-05-09 21:01   ` Konrad Rzeszutek Wilk
2011-05-10  8:30     ` Ian Campbell
2011-05-10 14:53       ` Konrad Rzeszutek Wilk
2011-05-10 15:09         ` Ian Campbell
2011-05-10 15:27           ` Konrad Rzeszutek Wilk
2011-05-10 15:32             ` Ian Campbell
2011-05-10 15:51               ` Konrad Rzeszutek Wilk
2011-05-11  7:49                 ` Ian Campbell
2011-05-12 17:41                   ` Konrad Rzeszutek Wilk
2011-05-13  8:47                     ` Ian Campbell
2011-05-13 13:57                       ` Konrad Rzeszutek Wilk
2011-05-17 16:02                         ` Konrad Rzeszutek Wilk
2011-05-17 16:07                           ` Ian Campbell
2011-05-17 16:34                             ` Konrad Rzeszutek Wilk
  -- strict thread matches above, loose matches on Subject: below --
2011-05-10 16:32 [PATCH 0 of 3] Patches for PCI passthrough with modified E820 (v4) Konrad Rzeszutek Wilk
2011-05-10 16:32 ` [PATCH 2 of 3] libxl: Add support for passing in the machine's E820 for PCI passthrough Konrad Rzeszutek Wilk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110510145633.GB17128@dumpdata.com \
    --to=konrad.wilk@oracle.com \
    --cc=Ian.Campbell@eu.citrix.com \
    --cc=Ian.Jackson@eu.citrix.com \
    --cc=Stefano.Stabellini@eu.citrix.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).