All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xensource.com, Keir Fraser <keir@xen.org>,
	bruce.edge@gmail.com, gianni.tedesco@citrix.com,
	stefano.stabellini@eu.citrix.com
Subject: Re: [RFC Patch] Support for making an E820 PCI hole in toolstack (xl + xm)
Date: Mon, 15 Nov 2010 17:06:41 -0800	[thread overview]
Message-ID: <4CE1D921.2010703@goop.org> (raw)
In-Reply-To: <20101115231133.GA12364@dumpdata.com>

On 11/15/2010 03:11 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Nov 15, 2010 at 07:57:47PM +0000, Keir Fraser wrote:
>> On 15/11/2010 19:32, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:
>>
>>>> Or, is there much disadvantage, to having a static really big PCI hole? Say
>>>> starting at 1GB? The advantage of this would be the ability to hotplug PCI
>>>> devices to a domU even across save/restore/migrate -- this may not work so
>>>> well if you commit yourself to the hole size of the original host, and the
>>>> restore/migrate target host has a bigger hole!
>>> Well, the other question is whether the devices have to have the same
>>> pfn as mfn within the hole.  We're emulating the PCI config space anyway
>>> - couldn't we stick the passthrough PCI space at 3G regardless of where
>>> it is on the real hardware?
> Your thinking is that on the Linux side, any of the pfns that are within
> those System RAM gaps (irregardless if they are above or below 4GB) would
> be consultated during PTE creation/lookup (xen_pte_val..). 
>
> And if those PFNs are within those System RAM gaps, we would store the 
> MFN in the P2M list and instead of doing:
>    val = ((pteval_t)pfn << PAGE_SHIFT) | flags
>
> we would actually do mfn = pfn_to_mfn(pfn) and stick on the _PAGE_IOMAP flag.
>
> And  example patch (compiled tested, not tested any other way) attached at the
> end of this email.

Right, it basically depends on dropping _PAGE_IOMAP and populating the
p2m with the correct mapping for both memory and hardware pages.

> How does that work on the Xen side? Does the hypervisor depend on the pages
> that belong to the DOM_IO domain to have a INVALID_MFN value in the mfn_list?

Xen wouldn't care.  I don't think its necessary to explicitly do a
cross-domain mapping with DOM_IO as we currently do; that's overkill
and/or a misunderstanding on my part.

> We do make the PTE that refer to physical devices to be the DOM_IO domain..

I think Xen will sort that out for itself when presented with a
hardware/device mfn.

>> Well, I don't know. It sounds pretty sensible to me. :-)
>>
>> Certain virtualisation feature sdisappearing after a save/restore/migrate --
>> or worsse, becoming unreliable -- would be a bit sad.
> So having the option of the PCI hole being passed through, and giving
> the tools the value (pci_hole) would mean we could migrate an SR-IOV type
> device from one machine to another. Constructing the PCI hole using the
> XENMEM_machine_memory_map could generate different E820 for the two guests, which
> would be indeed a bit sad.
>
>
> --- the patch ---
>
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 50dc626..96a08ef 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -699,7 +699,7 @@ static bool xen_page_pinned(void *ptr)
>  
>  static bool xen_iomap_pte(pte_t pte)
>  {
> -	return pte_flags(pte) & _PAGE_IOMAP;
> +	return xen_pfn_is_pci(pte_mfn(pte));
>  }

I think populating the p2m appropriately in advance is better than this
test; this is OK for prototyping I guess, but way to expensive for every
set_pte.

    J

>  
>  void xen_set_domain_pte(pte_t *ptep, pte_t pteval, unsigned domid)
> @@ -801,11 +801,6 @@ void set_pte_mfn(unsigned long vaddr, unsigned long mfn, pgprot_t flags)
>  void xen_set_pte_at(struct mm_struct *mm, unsigned long addr,
>  		    pte_t *ptep, pte_t pteval)
>  {
> -	if (xen_iomap_pte(pteval)) {
> -		xen_set_iomap_pte(ptep, pteval);
> -		goto out;
> -	}
> -
>  	ADD_STATS(set_pte_at, 1);
>  //	ADD_STATS(set_pte_at_pinned, xen_page_pinned(ptep));
>  	ADD_STATS(set_pte_at_current, mm == current->mm);
> @@ -889,19 +884,6 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
>  	return val;
>  }
>  
> -static pteval_t iomap_pte(pteval_t val)
> -{
> -	if (val & _PAGE_PRESENT) {
> -		unsigned long pfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
> -		pteval_t flags = val & PTE_FLAGS_MASK;
> -
> -		/* We assume the pte frame number is a MFN, so
> -		   just use it as-is. */
> -		val = ((pteval_t)pfn << PAGE_SHIFT) | flags;
> -	}
> -
> -	return val;
> -}
>  
>  pteval_t xen_pte_val(pte_t pte)
>  {
> @@ -913,8 +895,8 @@ pteval_t xen_pte_val(pte_t pte)
>  		pteval = (pteval & ~_PAGE_PAT) | _PAGE_PWT;
>  	}
>  
> -	if (xen_initial_domain() && (pteval & _PAGE_IOMAP))
> -		return pteval;
> +	if (xen_pfn_is_pci(pte_mfn(pte)))
> +		pteval |= _PAGE_IOMAP;
>  
>  	return pte_mfn_to_pfn(pteval);
>  }
> @@ -974,13 +956,14 @@ pte_t xen_make_pte(pteval_t pte)
>  	 * mappings are just dummy local mappings to keep other
>  	 * parts of the kernel happy.
>  	 */
> -	if (unlikely(pte & _PAGE_IOMAP) &&
> -	    (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
> -		pte = iomap_pte(pte);
> -	} else {
> +	if ((unlikely(pte & _PAGE_IOMAP) &&
> +	    (xen_initial_domain() || addr >= ISA_END_ADDRESS)) ||
> +	    (unlikely(xen_pfn_is_pci(PFN_UP(addr)))))
> +		pte |=  _PAGE_IOMAP;
> +	else
>  		pte &= ~_PAGE_IOMAP;
> -		pte = pte_pfn_to_mfn(pte);
> -	}
> +
> +	pte = pte_pfn_to_mfn(pte);
>  
>  	return native_make_pte(pte);
>  }
> @@ -1037,10 +1020,8 @@ void xen_set_pud(pud_t *ptr, pud_t val)
>  
>  void xen_set_pte(pte_t *ptep, pte_t pte)
>  {
> -	if (xen_iomap_pte(pte)) {
> +	if (xen_iomap_pte(pte))
>  		xen_set_iomap_pte(ptep, pte);
> -		return;
> -	}
>  
>  	ADD_STATS(pte_update, 1);
>  //	ADD_STATS(pte_update_pinned, xen_page_pinned(ptep));
> @@ -1058,10 +1039,8 @@ void xen_set_pte(pte_t *ptep, pte_t pte)
>  #ifdef CONFIG_X86_PAE
>  void xen_set_pte_atomic(pte_t *ptep, pte_t pte)
>  {
> -	if (xen_iomap_pte(pte)) {
> +	if (xen_iomap_pte(pte))
>  		xen_set_iomap_pte(ptep, pte);
> -		return;
> -	}
>  
>  	set_64bit((u64 *)ptep, native_pte_val(pte));
>  }
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 5a1f22d..bb424e3 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -196,6 +196,31 @@ unsigned long xen_find_max_pfn(void)
>  	xen_raw_printk("E820 max_pfn = %ld (nr_pages: %ld)\n", max_pfn, xen_start_info->nr_pages);
>  	return max_pfn;
>  }
> +
> +int xen_pfn_is_pci(unsigned long pfn)
> +{
> +	static struct e820entry map[E820MAX] __initdata;
> +	int rc, op, i;
> +	struct xen_memory_map memmap;
> +	unsigned long long addr = PFN_PHYS(pfn);
> +	memmap.nr_entries = E820MAX;
> +	set_xen_guest_handle(memmap.buffer, map);
> +
> +	op = xen_initial_domain() ?
> +		XENMEM_machine_memory_map :
> +		XENMEM_memory_map;
> +	rc = HYPERVISOR_memory_op(op, &memmap);
> +	BUG_ON(rc);
> +
> +	for (i = 0; i < memmap.nr_entries; i++) {
> +		unsigned long long end = map[i].addr + map[i].size;
> +		if (map[i].type != E820_RAM)
> +			continue;
> +		if (addr >= map[i].addr && addr <= end)
> +			return 0;
> +	}
> +	return 1;
> +}
>  /**
>   * machine_specific_memory_setup - Hook for machine specific memory setup.
>   **/
> diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
> index eee2045..f859b04 100644
> --- a/include/xen/xen-ops.h
> +++ b/include/xen/xen-ops.h
> @@ -31,4 +31,6 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
>  extern phys_addr_t xen_extra_mem_start;
>  unsigned long xen_find_max_pfn(void);
>  
> +int xen_pfn_is_pci(unsigned long pfn);
> +
>  #endif /* INCLUDE_XEN_OPS_H */
>

  reply	other threads:[~2010-11-16  1:06 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-12 23:08 [RFC Patch] Support for making an E820 PCI hole in toolstack (xl + xm) Konrad Rzeszutek Wilk
2010-11-12 23:16 ` Konrad Rzeszutek Wilk
2010-11-13  7:40 ` Keir Fraser
2010-11-15 17:03   ` Konrad Rzeszutek Wilk
2010-11-15 17:20     ` Ian Campbell
2010-11-15 17:28       ` Konrad Rzeszutek Wilk
2010-11-15 17:48     ` Keir Fraser
2010-11-15 18:15       ` Konrad Rzeszutek Wilk
2010-11-15 18:41         ` Keir Fraser
2010-11-15 19:32           ` Jeremy Fitzhardinge
2010-11-15 19:57             ` Keir Fraser
2010-11-15 23:11               ` Konrad Rzeszutek Wilk
2010-11-16  1:06                 ` Jeremy Fitzhardinge [this message]
2010-11-16  9:26                   ` Ian Campbell
2010-11-16  9:52                     ` Keir Fraser
2010-11-16 10:02                       ` Ian Campbell
2010-11-16 10:11                         ` Keir Fraser
2010-11-16 18:01                           ` Jeremy Fitzhardinge
2010-11-16 15:50                     ` Konrad Rzeszutek Wilk
2010-11-17 14:23                       ` Ian Campbell
2010-11-16  7:40                 ` Keir Fraser
2010-11-15 19:30         ` Jeremy Fitzhardinge
2010-11-17 11:14 ` Gianni Tedesco
2010-11-17 11:43   ` Ian Campbell
2010-11-17 13:37     ` Gianni Tedesco

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4CE1D921.2010703@goop.org \
    --to=jeremy@goop.org \
    --cc=bruce.edge@gmail.com \
    --cc=gianni.tedesco@citrix.com \
    --cc=keir@xen.org \
    --cc=konrad.wilk@oracle.com \
    --cc=stefano.stabellini@eu.citrix.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.