All of lore.kernel.org
 help / color / mirror / Atom feed
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Ian Campbell <Ian.Campbell@eu.citrix.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
Subject: Re: [PATCH 2 of 3] libxl: Add support for passing in the machine's E820 for PCI passthrough
Date: Tue, 10 May 2011 10:56:33 -0400	[thread overview]
Message-ID: <20110510145633.GB17128@dumpdata.com> (raw)
In-Reply-To: <1305016190.26692.254.camel@zakaz.uk.xensource.com>

On Tue, May 10, 2011 at 09:29:50AM +0100, Ian Campbell wrote:
> On Wed, 2011-05-04 at 15:17 +0100, Konrad Rzeszutek Wilk wrote:
> > # HG changeset patch
> > # User Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > # Date 1304518568 14400
> > # Node ID ba218fa1a48ed682651fd90dd8eeb68eeab68e7a
> > # Parent  b6af9b428bb16c4c5364ace0617923ffa44ad887
> > libxl: Add support for passing in the machine's E820 for PCI passthrough
> > 
> > The code that populates E820 is unconditionally triggered by the guest
> > configuration having "pci=['<BDF>,..']", being a PV guest, and if
> > b_info->u.pv.machine_e820 is set.
> > 
> > The code do_domain_create calls the libxl__e820_alloc when
> > it notices that the guest is PV, has at least one PCI devices, and has
> > the machine_e820 flag set.
> > 
> > libxl__e820_alloc calls the xc_get_machine_memory_map to retrieve the systems
> > E820. Then the E820 is sanitized to weed out E820 entries below 16MB, and as
> > well remove any E820_RAM or E820_UNUSED regions as the guest does not need to
> > know about them. The guest only needs the E820_ACPI, E820_NVS, E820_RESERVED to
> > get an idea of where the PCI I/O space is. Mostly.. The Linux kernel assumes that any
> > gap in the E820 is considered PCI I/O space which means that if we pass
> > in the guest 2GB, and the E820_ACPI, and its friend start at 3GB, the
> > gap between 2GB and 3GB will be considered as PCI I/O space. To guard against
> > that we also create an E820_UNUSABLE between the region of 'target_kb'
> > (called ram_end in the code) up to the first E820_[ACPI,NVS,RESERVED] region.
> > Lastly, the xc_domain_set_memory_map is called to install the new E820.
> 
> Do we need to document the requirements this places on a PV guest
> somewhere more prominent?
> 
> Phrases like "up to the first E820...." make me worry that this will
> only work on "sensible" machines, but I suppose we can cross those
> bridges when we come to them...

I did test it with non-sensible machines that have a E820 peppered with
E820_RAM amongs the E820_RESERVED. The last patches makes that work.


> 
> > When tested with another PV guest (NetBSD 5.1) the modified E820 gave
> > it no trouble. The code has also been tested with older "classic" Xen Linux
> > and with the newer "pvops" with success (SLES11, RHEL5, Ubuntu Lucid,
> > Debian Squeeze, 2.6.37, 2.6.38, 2.6.39).
> > 
> > Memory that is slack or for balloon (so 'maxmem' in guest configuration)
> > is put behind the machine E820. Which in most cases is after the 4GB.
> > 
> > The reason for doing the fetching of the E820 using the hypercall in
> > the toolstack (instead of the guest doing it) is that when a guest
> > would do a hypercall to 'XENMEM_machine_memory_map' it would
> > retrieve an E820 with I/O range caps added in. Meaning that the
> > region after 4GB up to end of possible memory would be marked as unusable
> > and the kernel would not have any space to allocate a balloon
> > region.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > 
> > diff -r b6af9b428bb1 -r ba218fa1a48e tools/libxl/libxl.idl
> > --- a/tools/libxl/libxl.idl	Wed May 04 10:16:08 2011 -0400
> > +++ b/tools/libxl/libxl.idl	Wed May 04 10:16:08 2011 -0400
> > @@ -180,6 +180,7 @@ libxl_domain_build_info = Struct("domain
> >                                          ("cmdline", string),
> >                                          ("ramdisk", libxl_file_reference),
> >                                          ("features", string, True),
> > +                                        ("machine_e820", bool, False, "Use machine's E820 for PCI passthrough."),
> >                                          ])),
> >                   ])),
> >      ],
> > diff -r b6af9b428bb1 -r ba218fa1a48e tools/libxl/libxl_create.c
> > --- a/tools/libxl/libxl_create.c	Wed May 04 10:16:08 2011 -0400
> > +++ b/tools/libxl/libxl_create.c	Wed May 04 10:16:08 2011 -0400
> > @@ -519,6 +519,14 @@ static int do_domain_create(libxl__gc *g
> >      for (i = 0; i < d_config->num_pcidevs; i++)
> >          libxl__device_pci_add(gc, domid, &d_config->pcidevs[i], 1);
> >  
> > +    if (!d_config->c_info.hvm && d_config->b_info.u.pv.machine_e820) {
> > +        int rc = 0;
> > +        rc = libxl__e820_alloc(ctx, domid, d_config);
> 
> rc is pretty comprehensively initialised ;-0

Duh!
> 
> > +        if (rc)
> > +            LIBXL__LOG(ctx, LIBXL__LOG_WARNING,
> > +                      "Failed while collecting E820 with: %d (errno:%d)\n",
> > +                      rc, errno);
> 
> LIBXL__LOG_ERRNO takes care of logging errno for you.

OK.
> 
> > +    }
> >      if ( cb && (d_config->c_info.hvm || d_config->b_info.u.pv.bootloader )) {
> >          if ( (*cb)(ctx, domid, priv) )
> >              goto error_out;
> 
> > diff -r b6af9b428bb1 -r ba218fa1a48e tools/libxl/libxl_pci.c
> > --- a/tools/libxl/libxl_pci.c	Wed May 04 10:16:08 2011 -0400
> > +++ b/tools/libxl/libxl_pci.c	Wed May 04 10:16:08 2011 -0400
> > @@ -1047,3 +1047,157 @@ int libxl_device_pci_shutdown(libxl_ctx 
> >      free(pcidevs);
> >      return 0;
> >  }
> > +
> > +static int e820_sanitize(libxl_ctx *ctx, struct e820entry src[],
> > +                         uint32_t *nr_entries,
> > +                         unsigned long map_limitkb,
> > +                         unsigned long balloon_kb)
> > +{
> > +    uint64_t delta_kb = 0, start = 0, start_kb = 0, last = 0, ram_end;
> > +    uint32_t i, idx = 0, nr;
> > +    struct e820entry e820[E820MAX];
> > +
> > +    if (!src || !map_limitkb || !balloon_kb || !nr_entries)
> > +        return ERROR_INVAL;
> > +
> > +    nr = *nr_entries;
> > +    if (!nr)
> > +        return ERROR_INVAL;
> > +
> > +    if (nr > E820MAX)
> > +        return ERROR_NOMEM;
> > +
> > +    /* Weed out anything under 16MB */
> > +    for (i = 0; i < nr; i++) {
> > +      if (src[i].addr > 0x100000)
> 
> 0x100000 is 1MB not 16MB. I'm not sure if the code or the comment is
> correct.

Comment should have said 1MB.
> 
> > +        continue;
> > +
> > +      src[i].type = 0;
> > +      src[i].size = 0;
> > +      src[i].addr = -1ULL;
> > +    }
> > +
> > +    /* Find the lowest and highest entry in E820, skipping over
> > +     * undersired entries. */
> 
>           undesired ?
> 
> > +    start = -1ULL;
> > +    last = 0;
> > +    for (i = 0; i < nr; i++) {
> > +        if ((src[i].type == E820_RAM) ||
> > +            (src[i].type == E820_UNUSABLE) ||
> > +            (src[i].type == 0)) 
> [...]
> >     nr = idx;
> > +
> > +    for (i = 0; i < nr; i++) {
> > +      LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, ":%s\t[%lx -> %lx]",
> > +	e820[i].type == E820_RAM ? "RAM " :
> > +	(e820[i].type == E820_RESERVED ? "RSV " :
> > +	 e820[i].type == E820_ACPI ? "ACPI" :
> > +         (e820[i].type == E820_NVS ? "NVS " :
> > +          (e820[i].type == E820_UNUSABLE ? "UNU " : "----"))),
> > +          e820[i].addr >> 12,
> > +         (e820[i].addr + e820[i].size) >> 12);
> 
> This screams out for a e820_type_as_string style function or a lookup
> table, or just about anything else really ;-).

<nods> I had it at some point in the patches, not sure why it got lost.
Let me add it back in.
> Also you can use "%-4s" instead of manually aligning all the strings.

/me nods.
> 
> > +    }
> > +
> > +    /* Done: copy the sanitized version. */
> > +    *nr_entries = nr;
> > +    memcpy(src, e820, nr * sizeof(struct e820entry));
> > +    return 0;
> > +}
> > +
> 
> Ian.

  reply	other threads:[~2011-05-10 14:56 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-04 14:17 [PATCH 0 of 3] Patches for PCI passthrough with modified E820 (v3) - resent Konrad Rzeszutek Wilk
2011-05-04 14:17 ` [PATCH 1 of 3] tools: Add xc_domain_set_memory_map and xc_get_machine_memory_map calls (x86, amd64 only) Konrad Rzeszutek Wilk
2011-05-04 14:17 ` [PATCH 2 of 3] libxl: Add support for passing in the machine's E820 for PCI passthrough Konrad Rzeszutek Wilk
2011-05-10  8:29   ` Ian Campbell
2011-05-10 14:56     ` Konrad Rzeszutek Wilk [this message]
2011-05-17 15:10   ` Ian Jackson
2011-05-17 16:00     ` Konrad Rzeszutek Wilk
2011-05-17 16:05       ` Ian Campbell
2011-05-04 14:17 ` [PATCH 3 of 3] libxl: Convert E820_UNUSABLE and E820_RAM to E820_UNUSABLE as appropriate Konrad Rzeszutek Wilk
2011-05-09  9:00 ` [PATCH 0 of 3] Patches for PCI passthrough with modified E820 (v3) - resent Ian Campbell
2011-05-09 21:01   ` Konrad Rzeszutek Wilk
2011-05-10  8:30     ` Ian Campbell
2011-05-10 14:53       ` Konrad Rzeszutek Wilk
2011-05-10 15:09         ` Ian Campbell
2011-05-10 15:27           ` Konrad Rzeszutek Wilk
2011-05-10 15:32             ` Ian Campbell
2011-05-10 15:51               ` Konrad Rzeszutek Wilk
2011-05-11  7:49                 ` Ian Campbell
2011-05-12 17:41                   ` Konrad Rzeszutek Wilk
2011-05-13  8:47                     ` Ian Campbell
2011-05-13 13:57                       ` Konrad Rzeszutek Wilk
2011-05-17 16:02                         ` Konrad Rzeszutek Wilk
2011-05-17 16:07                           ` Ian Campbell
2011-05-17 16:34                             ` Konrad Rzeszutek Wilk
  -- strict thread matches above, loose matches on Subject: below --
2011-05-10 16:32 [PATCH 0 of 3] Patches for PCI passthrough with modified E820 (v4) Konrad Rzeszutek Wilk
2011-05-10 16:32 ` [PATCH 2 of 3] libxl: Add support for passing in the machine's E820 for PCI passthrough Konrad Rzeszutek Wilk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110510145633.GB17128@dumpdata.com \
    --to=konrad.wilk@oracle.com \
    --cc=Ian.Campbell@eu.citrix.com \
    --cc=Ian.Jackson@eu.citrix.com \
    --cc=Stefano.Stabellini@eu.citrix.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.