* [RFC PATCH 00/12] xen: fix many long-standing grant mapping bugs
@ 2015-01-06 18:57 David Vrabel
2015-01-06 18:57 ` [PATCH 01/12] mm: allow for an alternate set of pages for userspace mappings David Vrabel
` (11 more replies)
0 siblings, 12 replies; 34+ messages in thread
From: David Vrabel @ 2015-01-06 18:57 UTC (permalink / raw)
To: xen-devel
Cc: Boris Ostrovsky, Stefano Stabellini, Jenny Herbert, David Vrabel
This series fixes a number of long-standing bugs in the handling of
grant maps. Refer to the following for all the details.
http://xenbits.xen.org/people/dvrabel/grant-improvements-B.pdf
In summary, the important uses that this enables are:
1. Block backends can use networked storage safely.
2. Block backends can use network storage provided by other guests on
the same host.
3. User space block backends can use direct I/O or asynchronous I/O.
The first two patches are the core MM changes necessary. I shall be
sending these to the MM maintainers seperately.
Patches #3 and #4 remove existing (broken) mechanisms. This does
temporarily break some previously working use cases, but it does make
the subsequent additions much easier to review.
As a happy side effect, performance is also likely to be improved in
some areas (but I've not got any measurements yet). User space
backends using grant mapping should see some good improvements from
reduced overheads and better unmap batching. VIF to VIF network
traffic may also see a small improvement.
This is RFC because:
- It needs more testing.
- I've not checked what changes (if any) are needed for ARM.
Finally, thanks to Jenny who did much of the implementation.
David
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 01/12] mm: allow for an alternate set of pages for userspace mappings
2015-01-06 18:57 [RFC PATCH 00/12] xen: fix many long-standing grant mapping bugs David Vrabel
@ 2015-01-06 18:57 ` David Vrabel
2015-01-06 18:57 ` [PATCH 02/12] mm: add 'foreign' alias for the 'pinned' page flag David Vrabel
` (10 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: David Vrabel @ 2015-01-06 18:57 UTC (permalink / raw)
To: xen-devel
Cc: Boris Ostrovsky, Stefano Stabellini, Jenny Herbert, David Vrabel
Add an optional array of pages to struct vm_area_struct that can be
used find the page backing a VMA. This is useful in cases where the
normal mechanisms for finding the page don't work. This array is only
inspected if the PTE is special.
Splitting a VMA with such an array of pages is trivially done by
adjusting vma->pages. The original creator of the VMA must only free
the page array once all sub-VMAs are closed (e.g., by ref-counting in
vm_ops->open and vm_ops->close).
One use case is a Xen PV guest mapping foreign pages into userspace.
In a Xen PV guest, the PTEs contain MFNs so get_user_pages() (for
example) must do an MFN to PFN (M2P) lookup before it can get the
page. For foreign pages (those owned by another guest) the M2P lookup
returns the PFN as seen by the foreign guest (which would be
completely the wrong page for the local guest).
This cannot be fixed up improving the M2P lookup since one MFN may be
mapped onto two or more pages so getting the right page is impossible
given just the MFN.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
include/linux/mm_types.h | 8 ++++++++
mm/memory.c | 2 ++
mm/mmap.c | 12 +++++++++++-
3 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6d34aa2..4f34609 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -309,6 +309,14 @@ struct vm_area_struct {
#ifdef CONFIG_NUMA
struct mempolicy *vm_policy; /* NUMA policy for the VMA */
#endif
+ /*
+ * Array of pages to override the default vm_normal_page()
+ * result iff the PTE is special.
+ *
+ * The memory for this should be refcounted in vm_ops->open
+ * and vm_ops->close.
+ */
+ struct page **pages;
};
struct core_thread {
diff --git a/mm/memory.c b/mm/memory.c
index ca920d1..98520f6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -754,6 +754,8 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
if (HAVE_PTE_SPECIAL) {
if (likely(!pte_special(pte)))
goto check_pfn;
+ if (vma->pages)
+ return vma->pages[(addr - vma->vm_start) >> PAGE_SHIFT];
if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
return NULL;
if (!is_zero_pfn(pfn))
diff --git a/mm/mmap.c b/mm/mmap.c
index 7b36aa7..504dc5c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2448,6 +2448,7 @@ static int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, int new_below)
{
struct vm_area_struct *new;
+ unsigned long delta;
int err = -ENOMEM;
if (is_vm_hugetlb_page(vma) && (addr &
@@ -2463,11 +2464,20 @@ static int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
INIT_LIST_HEAD(&new->anon_vma_chain);
+ delta = (addr - vma->vm_start) >> PAGE_SHIFT;
+
if (new_below)
new->vm_end = addr;
else {
new->vm_start = addr;
- new->vm_pgoff += ((addr - vma->vm_start) >> PAGE_SHIFT);
+ new->vm_pgoff += delta;
+ }
+
+ if (vma->pages) {
+ if (new_below)
+ vma->pages += delta;
+ else
+ new->pages += delta;
}
err = vma_dup_policy(vma, new);
--
1.7.10.4
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 02/12] mm: add 'foreign' alias for the 'pinned' page flag
2015-01-06 18:57 [RFC PATCH 00/12] xen: fix many long-standing grant mapping bugs David Vrabel
2015-01-06 18:57 ` [PATCH 01/12] mm: allow for an alternate set of pages for userspace mappings David Vrabel
@ 2015-01-06 18:57 ` David Vrabel
2015-01-07 17:12 ` Konrad Rzeszutek Wilk
2015-01-06 18:57 ` [PATCH 03/12] xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs() David Vrabel
` (9 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: David Vrabel @ 2015-01-06 18:57 UTC (permalink / raw)
To: xen-devel
Cc: Boris Ostrovsky, Stefano Stabellini, Jenny Herbert, David Vrabel
From: Jenny Herbert <jennifer.herbert@citrix.com>
The foreign page flag will be used by Xen guests to mark pages that
have grant mappings of frames from other (foreign) guests.
The foreign flag is an aliases the existing (Xen-specific) pinned
flag. This is safe because pinned is only used on pages used for page
tables and these cannot also be foreign.
Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
include/linux/page-flags.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e1f5fcd..7734cc8 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -123,6 +123,7 @@ enum pageflags {
/* XEN */
PG_pinned = PG_owner_priv_1,
PG_savepinned = PG_dirty,
+ PG_foreign = PG_owner_priv_1,
/* SLOB */
PG_slob_free = PG_private,
@@ -215,6 +216,7 @@ __PAGEFLAG(Slab, slab)
PAGEFLAG(Checked, checked) /* Used by some filesystems */
PAGEFLAG(Pinned, pinned) TESTSCFLAG(Pinned, pinned) /* Xen */
PAGEFLAG(SavePinned, savepinned); /* Xen */
+PAGEFLAG(Foreign, foreign); /* Xen */
PAGEFLAG(Reserved, reserved) __CLEARPAGEFLAG(Reserved, reserved)
PAGEFLAG(SwapBacked, swapbacked) __CLEARPAGEFLAG(SwapBacked, swapbacked)
__SETPAGEFLAG(SwapBacked, swapbacked)
--
1.7.10.4
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 03/12] xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs()
2015-01-06 18:57 [RFC PATCH 00/12] xen: fix many long-standing grant mapping bugs David Vrabel
2015-01-06 18:57 ` [PATCH 01/12] mm: allow for an alternate set of pages for userspace mappings David Vrabel
2015-01-06 18:57 ` [PATCH 02/12] mm: add 'foreign' alias for the 'pinned' page flag David Vrabel
@ 2015-01-06 18:57 ` David Vrabel
2015-01-06 18:57 ` [PATCH 04/12] xen: remove scratch frames for ballooned pages and m2p override David Vrabel
` (8 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: David Vrabel @ 2015-01-06 18:57 UTC (permalink / raw)
To: xen-devel
Cc: Boris Ostrovsky, Stefano Stabellini, Jenny Herbert, David Vrabel
When unmapping grants, instead of converting the kernel map ops to
unmap ops on the fly, pre-populate the set of unmap ops.
This allows the grant unmap for the kernel mappings to be trivially
batched in the future.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
arch/arm/include/asm/xen/page.h | 2 +-
arch/arm/xen/p2m.c | 2 +-
arch/x86/include/asm/xen/page.h | 2 +-
arch/x86/xen/p2m.c | 21 ++++++++++-----------
drivers/xen/gntdev.c | 26 ++++++++++++++++++--------
drivers/xen/grant-table.c | 4 ++--
include/xen/grant_table.h | 2 +-
7 files changed, 34 insertions(+), 25 deletions(-)
diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
index 68c739b..2f7e6ff 100644
--- a/arch/arm/include/asm/xen/page.h
+++ b/arch/arm/include/asm/xen/page.h
@@ -92,7 +92,7 @@ extern int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
struct page **pages, unsigned int count);
extern int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
- struct gnttab_map_grant_ref *kmap_ops,
+ struct gnttab_unmap_grant_ref *kunmap_ops,
struct page **pages, unsigned int count);
bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn);
diff --git a/arch/arm/xen/p2m.c b/arch/arm/xen/p2m.c
index 0548577..cb7a14c 100644
--- a/arch/arm/xen/p2m.c
+++ b/arch/arm/xen/p2m.c
@@ -102,7 +102,7 @@ int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
EXPORT_SYMBOL_GPL(set_foreign_p2m_mapping);
int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
- struct gnttab_map_grant_ref *kmap_ops,
+ struct gnttab_unmap_grant_ref *kunmap_ops,
struct page **pages, unsigned int count)
{
int i;
diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 5eea099..e9f52fe 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -55,7 +55,7 @@ extern int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
struct gnttab_map_grant_ref *kmap_ops,
struct page **pages, unsigned int count);
extern int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
- struct gnttab_map_grant_ref *kmap_ops,
+ struct gnttab_unmap_grant_ref *kunmap_ops,
struct page **pages, unsigned int count);
extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn);
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index edbc7a6..9fd5f70 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -816,7 +816,7 @@ static struct page *m2p_find_override(unsigned long mfn)
}
static int m2p_remove_override(struct page *page,
- struct gnttab_map_grant_ref *kmap_op,
+ struct gnttab_unmap_grant_ref *kunmap_op,
unsigned long mfn)
{
unsigned long flags;
@@ -840,7 +840,7 @@ static int m2p_remove_override(struct page *page,
list_del(&page->lru);
spin_unlock_irqrestore(&m2p_override_lock, flags);
- if (kmap_op != NULL) {
+ if (kunmap_op != NULL) {
if (!PageHighMem(page)) {
struct multicall_space mcs;
struct gnttab_unmap_and_replace *unmap_op;
@@ -855,13 +855,13 @@ static int m2p_remove_override(struct page *page,
* issued. In this case handle is going to -1 because
* it hasn't been modified yet.
*/
- if (kmap_op->handle == -1)
+ if (kunmap_op->handle == -1)
xen_mc_flush();
/*
* Now if kmap_op->handle is negative it means that the
* hypercall actually returned an error.
*/
- if (kmap_op->handle == GNTST_general_error) {
+ if (kunmap_op->handle == GNTST_general_error) {
pr_warn("m2p_remove_override: pfn %lx mfn %lx, failed to modify kernel mappings",
pfn, mfn);
put_balloon_scratch_page();
@@ -873,9 +873,9 @@ static int m2p_remove_override(struct page *page,
mcs = __xen_mc_entry(
sizeof(struct gnttab_unmap_and_replace));
unmap_op = mcs.args;
- unmap_op->host_addr = kmap_op->host_addr;
+ unmap_op->host_addr = kunmap_op->host_addr;
unmap_op->new_addr = scratch_page_address;
- unmap_op->handle = kmap_op->handle;
+ unmap_op->handle = kunmap_op->handle;
MULTI_grant_table_op(mcs.mc,
GNTTABOP_unmap_and_replace, unmap_op, 1);
@@ -887,7 +887,6 @@ static int m2p_remove_override(struct page *page,
xen_mc_issue(PARAVIRT_LAZY_MMU);
- kmap_op->host_addr = 0;
put_balloon_scratch_page();
}
}
@@ -912,7 +911,7 @@ static int m2p_remove_override(struct page *page,
}
int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
- struct gnttab_map_grant_ref *kmap_ops,
+ struct gnttab_unmap_grant_ref *kunmap_ops,
struct page **pages, unsigned int count)
{
int i, ret = 0;
@@ -921,7 +920,7 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
if (xen_feature(XENFEAT_auto_translated_physmap))
return 0;
- if (kmap_ops &&
+ if (kunmap_ops &&
!in_interrupt() &&
paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
arch_enter_lazy_mmu_mode();
@@ -942,8 +941,8 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
ClearPagePrivate(pages[i]);
set_phys_to_machine(pfn, pages[i]->index);
- if (kmap_ops)
- ret = m2p_remove_override(pages[i], &kmap_ops[i], mfn);
+ if (kunmap_ops)
+ ret = m2p_remove_override(pages[i], &kunmap_ops[i], mfn);
if (ret)
goto out;
}
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 073b4a1..32f6bfe 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -91,6 +91,7 @@ struct grant_map {
struct gnttab_map_grant_ref *map_ops;
struct gnttab_unmap_grant_ref *unmap_ops;
struct gnttab_map_grant_ref *kmap_ops;
+ struct gnttab_unmap_grant_ref *kunmap_ops;
struct page **pages;
};
@@ -124,6 +125,7 @@ static void gntdev_free_map(struct grant_map *map)
kfree(map->map_ops);
kfree(map->unmap_ops);
kfree(map->kmap_ops);
+ kfree(map->kunmap_ops);
kfree(map);
}
@@ -140,11 +142,13 @@ static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
add->map_ops = kcalloc(count, sizeof(add->map_ops[0]), GFP_KERNEL);
add->unmap_ops = kcalloc(count, sizeof(add->unmap_ops[0]), GFP_KERNEL);
add->kmap_ops = kcalloc(count, sizeof(add->kmap_ops[0]), GFP_KERNEL);
+ add->kunmap_ops = kcalloc(count, sizeof(add->kunmap_ops[0]), GFP_KERNEL);
add->pages = kcalloc(count, sizeof(add->pages[0]), GFP_KERNEL);
if (NULL == add->grants ||
NULL == add->map_ops ||
NULL == add->unmap_ops ||
NULL == add->kmap_ops ||
+ NULL == add->kunmap_ops ||
NULL == add->pages)
goto err;
@@ -155,6 +159,7 @@ static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
add->map_ops[i].handle = -1;
add->unmap_ops[i].handle = -1;
add->kmap_ops[i].handle = -1;
+ add->kunmap_ops[i].handle = -1;
}
add->index = 0;
@@ -261,8 +266,6 @@ static int map_grant_pages(struct grant_map *map)
gnttab_set_map_op(&map->map_ops[i], addr, map->flags,
map->grants[i].ref,
map->grants[i].domid);
- gnttab_set_unmap_op(&map->unmap_ops[i], addr,
- map->flags, -1 /* handle */);
}
} else {
/*
@@ -290,13 +293,20 @@ static int map_grant_pages(struct grant_map *map)
return err;
for (i = 0; i < map->count; i++) {
- if (map->map_ops[i].status)
+ if (map->map_ops[i].status) {
err = -EINVAL;
- else {
- BUG_ON(map->map_ops[i].handle == -1);
- map->unmap_ops[i].handle = map->map_ops[i].handle;
- pr_debug("map handle=%d\n", map->map_ops[i].handle);
+ continue;
}
+
+ gnttab_set_unmap_op(&map->unmap_ops[i],
+ map->map_ops[i].host_addr,
+ map->flags,
+ map->map_ops[i].handle);
+ if (use_ptemod)
+ gnttab_set_unmap_op(&map->kunmap_ops[i],
+ map->kmap_ops[i].host_addr,
+ map->flags | GNTMAP_host_map,
+ map->kmap_ops[i].handle);
}
return err;
}
@@ -316,7 +326,7 @@ static int __unmap_grant_pages(struct grant_map *map, int offset, int pages)
}
err = gnttab_unmap_refs(map->unmap_ops + offset,
- use_ptemod ? map->kmap_ops + offset : NULL, map->pages + offset,
+ use_ptemod ? map->kunmap_ops + offset : NULL, map->pages + offset,
pages);
if (err)
return err;
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 7786291..999d7ab 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -738,7 +738,7 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
EXPORT_SYMBOL_GPL(gnttab_map_refs);
int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
- struct gnttab_map_grant_ref *kmap_ops,
+ struct gnttab_unmap_grant_ref *kunmap_ops,
struct page **pages, unsigned int count)
{
int ret;
@@ -747,7 +747,7 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
if (ret)
return ret;
- return clear_foreign_p2m_mapping(unmap_ops, kmap_ops, pages, count);
+ return clear_foreign_p2m_mapping(unmap_ops, kunmap_ops, pages, count);
}
EXPORT_SYMBOL_GPL(gnttab_unmap_refs);
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 3387465..7235d8f 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -167,7 +167,7 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
struct gnttab_map_grant_ref *kmap_ops,
struct page **pages, unsigned int count);
int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
- struct gnttab_map_grant_ref *kunmap_ops,
+ struct gnttab_unmap_grant_ref *kunmap_ops,
struct page **pages, unsigned int count);
/* Perform a batch of grant map/copy operations. Retry every batch slot
--
1.7.10.4
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 04/12] xen: remove scratch frames for ballooned pages and m2p override
2015-01-06 18:57 [RFC PATCH 00/12] xen: fix many long-standing grant mapping bugs David Vrabel
` (2 preceding siblings ...)
2015-01-06 18:57 ` [PATCH 03/12] xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs() David Vrabel
@ 2015-01-06 18:57 ` David Vrabel
2015-01-06 18:57 ` [PATCH 05/12] x86/xen: require ballooned pages for grant maps David Vrabel
` (7 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: David Vrabel @ 2015-01-06 18:57 UTC (permalink / raw)
To: xen-devel
Cc: Boris Ostrovsky, Stefano Stabellini, Jenny Herbert, David Vrabel
The scratch frame mappings for ballooned pages and the m2p override
are broken. Remove them in preparation for replacing them with
simpler mechanisms that works.
The scratch pages did not ensure that the page was not in use. In
particular, the foreign page could still be in use by hardware. If
the guest reused the frame the hardware could read or write that
frame.
The m2p override did not handle the same frame being granted by two
different grant references. Trying an M2P override lookup in this
case is impossible.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
arch/x86/include/asm/xen/page.h | 18 +--
arch/x86/xen/p2m.c | 254 ++-------------------------------------
drivers/xen/balloon.c | 86 +------------
3 files changed, 14 insertions(+), 344 deletions(-)
diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index e9f52fe..358dcd3 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -57,7 +57,6 @@ extern int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
extern int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
struct gnttab_unmap_grant_ref *kunmap_ops,
struct page **pages, unsigned int count);
-extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn);
/*
* Helper functions to write or read unsigned long values to/from
@@ -154,21 +153,12 @@ static inline unsigned long mfn_to_pfn(unsigned long mfn)
return mfn;
pfn = mfn_to_pfn_no_overrides(mfn);
- if (__pfn_to_mfn(pfn) != mfn) {
- /*
- * If this appears to be a foreign mfn (because the pfn
- * doesn't map back to the mfn), then check the local override
- * table to see if there's a better pfn to use.
- *
- * m2p_find_override_pfn returns ~0 if it doesn't find anything.
- */
- pfn = m2p_find_override_pfn(mfn, ~0);
- }
+ if (__pfn_to_mfn(pfn) != mfn)
+ pfn = ~0;
/*
- * pfn is ~0 if there are no entries in the m2p for mfn or if the
- * entry doesn't map back to the mfn and m2p_override doesn't have a
- * valid entry for it.
+ * pfn is ~0 if there are no entries in the m2p for mfn or the
+ * entry doesn't map back to the mfn.
*/
if (pfn == ~0 && __pfn_to_mfn(mfn) == IDENTITY_FRAME(mfn))
pfn = mfn;
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 9fd5f70..8f34ed4 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -84,8 +84,6 @@
#define PMDS_PER_MID_PAGE (P2M_MID_PER_PAGE / PTRS_PER_PTE)
-static void __init m2p_override_init(void);
-
unsigned long *xen_p2m_addr __read_mostly;
EXPORT_SYMBOL_GPL(xen_p2m_addr);
unsigned long xen_p2m_size __read_mostly;
@@ -399,8 +397,6 @@ void __init xen_vmalloc_p2m_tree(void)
xen_p2m_size = xen_max_p2m_pfn;
xen_inv_extra_mem();
-
- m2p_override_init();
}
unsigned long get_phys_to_machine(unsigned long pfn)
@@ -652,100 +648,21 @@ bool set_phys_to_machine(unsigned long pfn, unsigned long mfn)
return true;
}
-#define M2P_OVERRIDE_HASH_SHIFT 10
-#define M2P_OVERRIDE_HASH (1 << M2P_OVERRIDE_HASH_SHIFT)
-
-static struct list_head *m2p_overrides;
-static DEFINE_SPINLOCK(m2p_override_lock);
-
-static void __init m2p_override_init(void)
-{
- unsigned i;
-
- m2p_overrides = alloc_bootmem_align(
- sizeof(*m2p_overrides) * M2P_OVERRIDE_HASH,
- sizeof(unsigned long));
-
- for (i = 0; i < M2P_OVERRIDE_HASH; i++)
- INIT_LIST_HEAD(&m2p_overrides[i]);
-}
-
-static unsigned long mfn_hash(unsigned long mfn)
-{
- return hash_long(mfn, M2P_OVERRIDE_HASH_SHIFT);
-}
-
-/* Add an MFN override for a particular page */
-static int m2p_add_override(unsigned long mfn, struct page *page,
- struct gnttab_map_grant_ref *kmap_op)
-{
- unsigned long flags;
- unsigned long pfn;
- unsigned long uninitialized_var(address);
- unsigned level;
- pte_t *ptep = NULL;
-
- pfn = page_to_pfn(page);
- if (!PageHighMem(page)) {
- address = (unsigned long)__va(pfn << PAGE_SHIFT);
- ptep = lookup_address(address, &level);
- if (WARN(ptep == NULL || level != PG_LEVEL_4K,
- "m2p_add_override: pfn %lx not mapped", pfn))
- return -EINVAL;
- }
-
- if (kmap_op != NULL) {
- if (!PageHighMem(page)) {
- struct multicall_space mcs =
- xen_mc_entry(sizeof(*kmap_op));
-
- MULTI_grant_table_op(mcs.mc,
- GNTTABOP_map_grant_ref, kmap_op, 1);
-
- xen_mc_issue(PARAVIRT_LAZY_MMU);
- }
- }
- spin_lock_irqsave(&m2p_override_lock, flags);
- list_add(&page->lru, &m2p_overrides[mfn_hash(mfn)]);
- spin_unlock_irqrestore(&m2p_override_lock, flags);
-
- /* p2m(m2p(mfn)) == mfn: the mfn is already present somewhere in
- * this domain. Set the FOREIGN_FRAME_BIT in the p2m for the other
- * pfn so that the following mfn_to_pfn(mfn) calls will return the
- * pfn from the m2p_override (the backend pfn) instead.
- * We need to do this because the pages shared by the frontend
- * (xen-blkfront) can be already locked (lock_page, called by
- * do_read_cache_page); when the userspace backend tries to use them
- * with direct_IO, mfn_to_pfn returns the pfn of the frontend, so
- * do_blockdev_direct_IO is going to try to lock the same pages
- * again resulting in a deadlock.
- * As a side effect get_user_pages_fast might not be safe on the
- * frontend pages while they are being shared with the backend,
- * because mfn_to_pfn (that ends up being called by GUPF) will
- * return the backend pfn rather than the frontend pfn. */
- pfn = mfn_to_pfn_no_overrides(mfn);
- if (__pfn_to_mfn(pfn) == mfn)
- set_phys_to_machine(pfn, FOREIGN_FRAME(mfn));
-
- return 0;
-}
-
int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
struct gnttab_map_grant_ref *kmap_ops,
struct page **pages, unsigned int count)
{
int i, ret = 0;
- bool lazy = false;
pte_t *pte;
if (xen_feature(XENFEAT_auto_translated_physmap))
return 0;
- if (kmap_ops &&
- !in_interrupt() &&
- paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
- arch_enter_lazy_mmu_mode();
- lazy = true;
+ if (kmap_ops) {
+ ret = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,
+ kmap_ops, count);
+ if (ret)
+ goto out;
}
for (i = 0; i < count; i++) {
@@ -773,160 +690,22 @@ int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
ret = -ENOMEM;
goto out;
}
-
- if (kmap_ops) {
- ret = m2p_add_override(mfn, pages[i], &kmap_ops[i]);
- if (ret)
- goto out;
- }
}
out:
- if (lazy)
- arch_leave_lazy_mmu_mode();
-
return ret;
}
EXPORT_SYMBOL_GPL(set_foreign_p2m_mapping);
-static struct page *m2p_find_override(unsigned long mfn)
-{
- unsigned long flags;
- struct list_head *bucket;
- struct page *p, *ret;
-
- if (unlikely(!m2p_overrides))
- return NULL;
-
- ret = NULL;
- bucket = &m2p_overrides[mfn_hash(mfn)];
-
- spin_lock_irqsave(&m2p_override_lock, flags);
-
- list_for_each_entry(p, bucket, lru) {
- if (page_private(p) == mfn) {
- ret = p;
- break;
- }
- }
-
- spin_unlock_irqrestore(&m2p_override_lock, flags);
-
- return ret;
-}
-
-static int m2p_remove_override(struct page *page,
- struct gnttab_unmap_grant_ref *kunmap_op,
- unsigned long mfn)
-{
- unsigned long flags;
- unsigned long pfn;
- unsigned long uninitialized_var(address);
- unsigned level;
- pte_t *ptep = NULL;
-
- pfn = page_to_pfn(page);
-
- if (!PageHighMem(page)) {
- address = (unsigned long)__va(pfn << PAGE_SHIFT);
- ptep = lookup_address(address, &level);
-
- if (WARN(ptep == NULL || level != PG_LEVEL_4K,
- "m2p_remove_override: pfn %lx not mapped", pfn))
- return -EINVAL;
- }
-
- spin_lock_irqsave(&m2p_override_lock, flags);
- list_del(&page->lru);
- spin_unlock_irqrestore(&m2p_override_lock, flags);
-
- if (kunmap_op != NULL) {
- if (!PageHighMem(page)) {
- struct multicall_space mcs;
- struct gnttab_unmap_and_replace *unmap_op;
- struct page *scratch_page = get_balloon_scratch_page();
- unsigned long scratch_page_address = (unsigned long)
- __va(page_to_pfn(scratch_page) << PAGE_SHIFT);
-
- /*
- * It might be that we queued all the m2p grant table
- * hypercalls in a multicall, then m2p_remove_override
- * get called before the multicall has actually been
- * issued. In this case handle is going to -1 because
- * it hasn't been modified yet.
- */
- if (kunmap_op->handle == -1)
- xen_mc_flush();
- /*
- * Now if kmap_op->handle is negative it means that the
- * hypercall actually returned an error.
- */
- if (kunmap_op->handle == GNTST_general_error) {
- pr_warn("m2p_remove_override: pfn %lx mfn %lx, failed to modify kernel mappings",
- pfn, mfn);
- put_balloon_scratch_page();
- return -1;
- }
-
- xen_mc_batch();
-
- mcs = __xen_mc_entry(
- sizeof(struct gnttab_unmap_and_replace));
- unmap_op = mcs.args;
- unmap_op->host_addr = kunmap_op->host_addr;
- unmap_op->new_addr = scratch_page_address;
- unmap_op->handle = kunmap_op->handle;
-
- MULTI_grant_table_op(mcs.mc,
- GNTTABOP_unmap_and_replace, unmap_op, 1);
-
- mcs = __xen_mc_entry(0);
- MULTI_update_va_mapping(mcs.mc, scratch_page_address,
- pfn_pte(page_to_pfn(scratch_page),
- PAGE_KERNEL_RO), 0);
-
- xen_mc_issue(PARAVIRT_LAZY_MMU);
-
- put_balloon_scratch_page();
- }
- }
-
- /* p2m(m2p(mfn)) == FOREIGN_FRAME(mfn): the mfn is already present
- * somewhere in this domain, even before being added to the
- * m2p_override (see comment above in m2p_add_override).
- * If there are no other entries in the m2p_override corresponding
- * to this mfn, then remove the FOREIGN_FRAME_BIT from the p2m for
- * the original pfn (the one shared by the frontend): the backend
- * cannot do any IO on this page anymore because it has been
- * unshared. Removing the FOREIGN_FRAME_BIT from the p2m entry of
- * the original pfn causes mfn_to_pfn(mfn) to return the frontend
- * pfn again. */
- mfn &= ~FOREIGN_FRAME_BIT;
- pfn = mfn_to_pfn_no_overrides(mfn);
- if (__pfn_to_mfn(pfn) == FOREIGN_FRAME(mfn) &&
- m2p_find_override(mfn) == NULL)
- set_phys_to_machine(pfn, mfn);
-
- return 0;
-}
-
int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
struct gnttab_unmap_grant_ref *kunmap_ops,
struct page **pages, unsigned int count)
{
int i, ret = 0;
- bool lazy = false;
if (xen_feature(XENFEAT_auto_translated_physmap))
return 0;
- if (kunmap_ops &&
- !in_interrupt() &&
- paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
- arch_enter_lazy_mmu_mode();
- lazy = true;
- }
-
for (i = 0; i < count; i++) {
unsigned long mfn = __pfn_to_mfn(page_to_pfn(pages[i]));
unsigned long pfn = page_to_pfn(pages[i]);
@@ -940,32 +719,15 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
WARN_ON(!PagePrivate(pages[i]));
ClearPagePrivate(pages[i]);
set_phys_to_machine(pfn, pages[i]->index);
-
- if (kunmap_ops)
- ret = m2p_remove_override(pages[i], &kunmap_ops[i], mfn);
- if (ret)
- goto out;
}
-
+ if (kunmap_ops)
+ ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
+ kunmap_ops, count);
out:
- if (lazy)
- arch_leave_lazy_mmu_mode();
return ret;
}
EXPORT_SYMBOL_GPL(clear_foreign_p2m_mapping);
-unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn)
-{
- struct page *p = m2p_find_override(mfn);
- unsigned long ret = pfn;
-
- if (p)
- ret = page_to_pfn(p);
-
- return ret;
-}
-EXPORT_SYMBOL_GPL(m2p_find_override_pfn);
-
#ifdef CONFIG_XEN_DEBUG_FS
#include <linux/debugfs.h>
#include "debugfs.h"
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 3860d02..0b52d92 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -92,7 +92,6 @@ EXPORT_SYMBOL_GPL(balloon_stats);
/* We increase/decrease in batches which fit in a page */
static xen_pfn_t frame_list[PAGE_SIZE / sizeof(unsigned long)];
-static DEFINE_PER_CPU(struct page *, balloon_scratch_page);
/* List of ballooned pages, threaded through the mem_map array. */
@@ -423,22 +422,12 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
page = pfn_to_page(pfn);
#ifdef CONFIG_XEN_HAVE_PVMMU
- /*
- * Ballooned out frames are effectively replaced with
- * a scratch frame. Ensure direct mappings and the
- * p2m are consistent.
- */
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
if (!PageHighMem(page)) {
- struct page *scratch_page = get_balloon_scratch_page();
-
ret = HYPERVISOR_update_va_mapping(
(unsigned long)__va(pfn << PAGE_SHIFT),
- pfn_pte(page_to_pfn(scratch_page),
- PAGE_KERNEL_RO), 0);
+ __pte_ma(0), 0);
BUG_ON(ret);
-
- put_balloon_scratch_page();
}
__set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
}
@@ -500,18 +489,6 @@ static void balloon_process(struct work_struct *work)
mutex_unlock(&balloon_mutex);
}
-struct page *get_balloon_scratch_page(void)
-{
- struct page *ret = get_cpu_var(balloon_scratch_page);
- BUG_ON(ret == NULL);
- return ret;
-}
-
-void put_balloon_scratch_page(void)
-{
- put_cpu_var(balloon_scratch_page);
-}
-
/* Resets the Xen limit, sets new target, and kicks off processing. */
void balloon_set_new_target(unsigned long target)
{
@@ -605,61 +582,13 @@ static void __init balloon_add_region(unsigned long start_pfn,
}
}
-static int alloc_balloon_scratch_page(int cpu)
-{
- if (per_cpu(balloon_scratch_page, cpu) != NULL)
- return 0;
-
- per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
- if (per_cpu(balloon_scratch_page, cpu) == NULL) {
- pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
- return -ENOMEM;
- }
-
- return 0;
-}
-
-
-static int balloon_cpu_notify(struct notifier_block *self,
- unsigned long action, void *hcpu)
-{
- int cpu = (long)hcpu;
- switch (action) {
- case CPU_UP_PREPARE:
- if (alloc_balloon_scratch_page(cpu))
- return NOTIFY_BAD;
- break;
- default:
- break;
- }
- return NOTIFY_OK;
-}
-
-static struct notifier_block balloon_cpu_notifier = {
- .notifier_call = balloon_cpu_notify,
-};
-
static int __init balloon_init(void)
{
- int i, cpu;
+ int i;
if (!xen_domain())
return -ENODEV;
- if (!xen_feature(XENFEAT_auto_translated_physmap)) {
- register_cpu_notifier(&balloon_cpu_notifier);
-
- get_online_cpus();
- for_each_online_cpu(cpu) {
- if (alloc_balloon_scratch_page(cpu)) {
- put_online_cpus();
- unregister_cpu_notifier(&balloon_cpu_notifier);
- return -ENOMEM;
- }
- }
- put_online_cpus();
- }
-
pr_info("Initialising balloon driver\n");
balloon_stats.current_pages = xen_pv_domain()
@@ -696,15 +625,4 @@ static int __init balloon_init(void)
subsys_initcall(balloon_init);
-static int __init balloon_clear(void)
-{
- int cpu;
-
- for_each_possible_cpu(cpu)
- per_cpu(balloon_scratch_page, cpu) = NULL;
-
- return 0;
-}
-early_initcall(balloon_clear);
-
MODULE_LICENSE("GPL");
--
1.7.10.4
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 05/12] x86/xen: require ballooned pages for grant maps
2015-01-06 18:57 [RFC PATCH 00/12] xen: fix many long-standing grant mapping bugs David Vrabel
` (3 preceding siblings ...)
2015-01-06 18:57 ` [PATCH 04/12] xen: remove scratch frames for ballooned pages and m2p override David Vrabel
@ 2015-01-06 18:57 ` David Vrabel
2015-01-06 18:57 ` [PATCH 06/12] xen: mark grant mapped pages as foreign David Vrabel
` (6 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: David Vrabel @ 2015-01-06 18:57 UTC (permalink / raw)
To: xen-devel
Cc: Boris Ostrovsky, Stefano Stabellini, Jenny Herbert, David Vrabel
From: Jenny Herbert <jennifer.herbert@citrix.com>
Ballooned pages are always used for grant maps which means the
original frame does not need to be saved in page->index nor restored
after the grant unmap.
This allows the workaround in netback for the conflicting use of the
(unionized) page->index and page->pfmemalloc to be removed.
Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
arch/x86/xen/p2m.c | 5 +++--
drivers/net/xen-netback/netback.c | 6 ------
2 files changed, 3 insertions(+), 8 deletions(-)
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 8f34ed4..0d70814 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -682,9 +682,10 @@ int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
pfn = page_to_pfn(pages[i]);
WARN_ON(PagePrivate(pages[i]));
+ WARN(pfn_to_mfn(pfn) != INVALID_P2M_ENTRY, "page must be ballooned");
+
SetPagePrivate(pages[i]);
set_page_private(pages[i], mfn);
- pages[i]->index = pfn_to_mfn(pfn);
if (unlikely(!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn)))) {
ret = -ENOMEM;
@@ -718,7 +719,7 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
set_page_private(pages[i], INVALID_P2M_ENTRY);
WARN_ON(!PagePrivate(pages[i]));
ClearPagePrivate(pages[i]);
- set_phys_to_machine(pfn, pages[i]->index);
+ set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
}
if (kunmap_ops)
ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 908e65e..6441318 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1241,12 +1241,6 @@ static void xenvif_fill_frags(struct xenvif_queue *queue, struct sk_buff *skb)
/* Take an extra reference to offset network stack's put_page */
get_page(queue->mmap_pages[pending_idx]);
}
- /* FIXME: __skb_fill_page_desc set this to true because page->pfmemalloc
- * overlaps with "index", and "mapping" is not set. I think mapping
- * should be set. If delivered to local stack, it would drop this
- * skb in sk_filter unless the socket has the right to use it.
- */
- skb->pfmemalloc = false;
}
static int xenvif_get_extras(struct xenvif_queue *queue,
--
1.7.10.4
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 06/12] xen: mark grant mapped pages as foreign
2015-01-06 18:57 [RFC PATCH 00/12] xen: fix many long-standing grant mapping bugs David Vrabel
` (4 preceding siblings ...)
2015-01-06 18:57 ` [PATCH 05/12] x86/xen: require ballooned pages for grant maps David Vrabel
@ 2015-01-06 18:57 ` David Vrabel
2015-01-07 11:53 ` Ian Campbell
2015-01-09 16:03 ` Stefano Stabellini
2015-01-06 18:57 ` [PATCH 07/12] xen-netback: use foreign page information from the pages themselves David Vrabel
` (5 subsequent siblings)
11 siblings, 2 replies; 34+ messages in thread
From: David Vrabel @ 2015-01-06 18:57 UTC (permalink / raw)
To: xen-devel
Cc: Jenny Herbert, Stefano Stabellini, Jenny Herbert, David Vrabel,
Boris Ostrovsky
From: Jenny Herbert <jennifer.herbert@citrix.com>
Use the "foreign" page flag to mark pages that have a grant map. Use
page->private to store information of the grant (the granting domain
and the grant reference).
Signed-off-by: Jenny Herbert <jenny.herbert@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
arch/x86/xen/p2m.c | 50 ++++++++++++++++++++++++++++++++++++++-------
include/xen/grant_table.h | 13 ++++++++++++
2 files changed, 56 insertions(+), 7 deletions(-)
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 0d70814..22624a3 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -648,6 +648,43 @@ bool set_phys_to_machine(unsigned long pfn, unsigned long mfn)
return true;
}
+static int
+init_page_grant_ref(struct page *p, domid_t domid, grant_ref_t grantref)
+{
+#ifdef CONFIG_X86_64
+ uint64_t gref;
+ uint64_t* gref_p = &gref;
+#else
+ uint64_t* gref_p = kmalloc(sizeof(uint64_t), GFP_KERNEL);
+ if (!gref)
+ return -ENOMEM;
+ uint64_t* gref = gref_p;
+#endif
+
+ *gref_p = ((uint64_t) grantref << 32) | domid;
+ p->private = gref;
+
+ WARN_ON(PagePrivate(p));
+ WARN_ON(PageForeign(p));
+ SetPagePrivate(p);
+ SetPageForeign(p);
+ return 0;
+}
+
+static void
+clear_page_grant_ref(struct page *p)
+{
+ WARN_ON(!PagePrivate(p));
+ WARN_ON(!PageForeign(p));
+
+#ifndef CONFIG_X86_64
+ kfree(p->private);
+#endif
+ p->private = 0;
+ ClearPagePrivate(p);
+ ClearPageForeign(p);
+}
+
int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
struct gnttab_map_grant_ref *kmap_ops,
struct page **pages, unsigned int count)
@@ -681,11 +718,12 @@ int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
}
pfn = page_to_pfn(pages[i]);
- WARN_ON(PagePrivate(pages[i]));
- WARN(pfn_to_mfn(pfn) != INVALID_P2M_ENTRY, "page must be ballooned");
+ ret = init_page_grant_ref(pages[i],
+ map_ops[i].dom, map_ops[i].ref);
+ if (ret < 0)
+ goto out;
- SetPagePrivate(pages[i]);
- set_page_private(pages[i], mfn);
+ WARN(pfn_to_mfn(pfn) != INVALID_P2M_ENTRY, "page must be ballooned");
if (unlikely(!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn)))) {
ret = -ENOMEM;
@@ -716,9 +754,7 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
goto out;
}
- set_page_private(pages[i], INVALID_P2M_ENTRY);
- WARN_ON(!PagePrivate(pages[i]));
- ClearPagePrivate(pages[i]);
+ clear_page_grant_ref(pages[i]);
set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
}
if (kunmap_ops)
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 7235d8f..381f007 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -45,6 +45,7 @@
#include <asm/xen/hypervisor.h>
#include <xen/features.h>
+#include <linux/mm_types.h>
#define GNTTAB_RESERVED_XENSTORE 1
@@ -182,4 +183,16 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
void gnttab_batch_map(struct gnttab_map_grant_ref *batch, unsigned count);
void gnttab_batch_copy(struct gnttab_copy *batch, unsigned count);
+static inline void
+get_page_grant_ref(struct page *p, domid_t* domid, grant_ref_t* grantref)
+{
+#ifdef CONFIG_X86_64
+ uint64_t gref = p->private;
+#else
+ uint64_t gref = *p->private;
+#endif
+ *domid = gref & 0xffff;
+ *grantref = gref >> 32;
+}
+
#endif /* __ASM_GNTTAB_H__ */
--
1.7.10.4
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 07/12] xen-netback: use foreign page information from the pages themselves
2015-01-06 18:57 [RFC PATCH 00/12] xen: fix many long-standing grant mapping bugs David Vrabel
` (5 preceding siblings ...)
2015-01-06 18:57 ` [PATCH 06/12] xen: mark grant mapped pages as foreign David Vrabel
@ 2015-01-06 18:57 ` David Vrabel
2015-01-07 11:57 ` Ian Campbell
2015-01-06 18:57 ` [PATCH 08/12] xen/grant-table: add a mechanism to safely unmap pages that are in use David Vrabel
` (4 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: David Vrabel @ 2015-01-06 18:57 UTC (permalink / raw)
To: xen-devel
Cc: Jenny Herbert, Stefano Stabellini, Jenny Herbert, David Vrabel,
Boris Ostrovsky
From: Jenny Herbert <jenny.herbert@citrix.com>
Use the foreign page flag in netback to get the domid and grant ref
needed for the grant copy. This signficiantly simplifies the netback
code and makes netback work with foreign pages from other backends
(e.g., blkback).
This allows blkback to use iSCSI disks provided by domUs running on
the same host.
Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
drivers/net/xen-netback/netback.c | 97 +++----------------------------------
1 file changed, 6 insertions(+), 91 deletions(-)
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 6441318..997088b 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -314,9 +314,7 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif_queue *queue,
static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb,
struct netrx_pending_operations *npo,
struct page *page, unsigned long size,
- unsigned long offset, int *head,
- struct xenvif_queue *foreign_queue,
- grant_ref_t foreign_gref)
+ unsigned long offset, int *head)
{
struct gnttab_copy *copy_gop;
struct xenvif_rx_meta *meta;
@@ -361,9 +359,9 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
copy_gop->flags = GNTCOPY_dest_gref;
copy_gop->len = bytes;
- if (foreign_queue) {
- copy_gop->source.domid = foreign_queue->vif->domid;
- copy_gop->source.u.ref = foreign_gref;
+ if (PageForeign(page)) {
+ get_page_grant_ref(page, ©_gop->source.domid,
+ ©_gop->source.u.ref);
copy_gop->flags |= GNTCOPY_source_gref;
} else {
copy_gop->source.domid = DOMID_SELF;
@@ -406,35 +404,6 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
}
/*
- * Find the grant ref for a given frag in a chain of struct ubuf_info's
- * skb: the skb itself
- * i: the frag's number
- * ubuf: a pointer to an element in the chain. It should not be NULL
- *
- * Returns a pointer to the element in the chain where the page were found. If
- * not found, returns NULL.
- * See the definition of callback_struct in common.h for more details about
- * the chain.
- */
-static const struct ubuf_info *xenvif_find_gref(const struct sk_buff *const skb,
- const int i,
- const struct ubuf_info *ubuf)
-{
- struct xenvif_queue *foreign_queue = ubuf_to_queue(ubuf);
-
- do {
- u16 pending_idx = ubuf->desc;
-
- if (skb_shinfo(skb)->frags[i].page.p ==
- foreign_queue->mmap_pages[pending_idx])
- break;
- ubuf = (struct ubuf_info *) ubuf->ctx;
- } while (ubuf);
-
- return ubuf;
-}
-
-/*
* Prepare an SKB to be transmitted to the frontend.
*
* This function is responsible for allocating grant operations, meta
@@ -459,8 +428,6 @@ static int xenvif_gop_skb(struct sk_buff *skb,
int head = 1;
int old_meta_prod;
int gso_type;
- const struct ubuf_info *ubuf = skb_shinfo(skb)->destructor_arg;
- const struct ubuf_info *const head_ubuf = ubuf;
old_meta_prod = npo->meta_prod;
@@ -507,68 +474,16 @@ static int xenvif_gop_skb(struct sk_buff *skb,
len = skb_tail_pointer(skb) - data;
xenvif_gop_frag_copy(queue, skb, npo,
- virt_to_page(data), len, offset, &head,
- NULL,
- 0);
+ virt_to_page(data), len, offset, &head);
data += len;
}
for (i = 0; i < nr_frags; i++) {
- /* This variable also signals whether foreign_gref has a real
- * value or not.
- */
- struct xenvif_queue *foreign_queue = NULL;
- grant_ref_t foreign_gref;
-
- if ((skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) &&
- (ubuf->callback == &xenvif_zerocopy_callback)) {
- const struct ubuf_info *const startpoint = ubuf;
-
- /* Ideally ubuf points to the chain element which
- * belongs to this frag. Or if frags were removed from
- * the beginning, then shortly before it.
- */
- ubuf = xenvif_find_gref(skb, i, ubuf);
-
- /* Try again from the beginning of the list, if we
- * haven't tried from there. This only makes sense in
- * the unlikely event of reordering the original frags.
- * For injected local pages it's an unnecessary second
- * run.
- */
- if (unlikely(!ubuf) && startpoint != head_ubuf)
- ubuf = xenvif_find_gref(skb, i, head_ubuf);
-
- if (likely(ubuf)) {
- u16 pending_idx = ubuf->desc;
-
- foreign_queue = ubuf_to_queue(ubuf);
- foreign_gref =
- foreign_queue->pending_tx_info[pending_idx].req.gref;
- /* Just a safety measure. If this was the last
- * element on the list, the for loop will
- * iterate again if a local page were added to
- * the end. Using head_ubuf here prevents the
- * second search on the chain. Or the original
- * frags changed order, but that's less likely.
- * In any way, ubuf shouldn't be NULL.
- */
- ubuf = ubuf->ctx ?
- (struct ubuf_info *) ubuf->ctx :
- head_ubuf;
- } else
- /* This frag was a local page, added to the
- * array after the skb left netback.
- */
- ubuf = head_ubuf;
- }
xenvif_gop_frag_copy(queue, skb, npo,
skb_frag_page(&skb_shinfo(skb)->frags[i]),
skb_frag_size(&skb_shinfo(skb)->frags[i]),
skb_shinfo(skb)->frags[i].page_offset,
- &head,
- foreign_queue,
- foreign_queue ? foreign_gref : UINT_MAX);
+ &head);
}
return npo->meta_prod - old_meta_prod;
--
1.7.10.4
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 08/12] xen/grant-table: add a mechanism to safely unmap pages that are in use
2015-01-06 18:57 [RFC PATCH 00/12] xen: fix many long-standing grant mapping bugs David Vrabel
` (6 preceding siblings ...)
2015-01-06 18:57 ` [PATCH 07/12] xen-netback: use foreign page information from the pages themselves David Vrabel
@ 2015-01-06 18:57 ` David Vrabel
2015-01-07 12:00 ` Ian Campbell
2015-01-09 16:11 ` Stefano Stabellini
2015-01-06 18:57 ` [PATCH 09/12] xen/gntdev: safely unmap grants in case they are still " David Vrabel
` (3 subsequent siblings)
11 siblings, 2 replies; 34+ messages in thread
From: David Vrabel @ 2015-01-06 18:57 UTC (permalink / raw)
To: xen-devel
Cc: Boris Ostrovsky, Stefano Stabellini, Jenny Herbert, David Vrabel
From: Jenny Herbert <jennifer.herbert@citrix.com>
Introduce gnttab_unmap_refs_async() that can be used to safely unmap
pages that may be in use (ref count > 1). If the pages are in use the
unmap is deferred and retried later. This polling is not very clever
but it should be good enough if the cases where the delay is necessary
are rare.
This is needed to allow block backends using grant mapping to safely
use network storage (block or filesystem based such as iSCSI or NFS).
The network storage driver may complete a block request whilst there
is a queued network packet retry (because the ack from the remote end
races with deciding to queue the retry). The pages for the retried
packet would be grant unmapped and the network driver (or hardware)
would access the unmapped page.
Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
drivers/xen/grant-table.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
include/xen/grant_table.h | 18 ++++++++++++++++++
2 files changed, 62 insertions(+)
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 999d7ab..dc0edfa 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -42,6 +42,7 @@
#include <linux/io.h>
#include <linux/delay.h>
#include <linux/hardirq.h>
+#include <linux/workqueue.h>
#include <xen/xen.h>
#include <xen/interface/xen.h>
@@ -751,6 +752,49 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
}
EXPORT_SYMBOL_GPL(gnttab_unmap_refs);
+#define GNTTAB_UNMAP_REFS_DELAY 5
+
+static void __gnttab_unmap_refs_async(struct gntab_unmap_queue_data* item);
+
+static void gnttab_unmap_work(struct work_struct *work)
+{
+ struct gntab_unmap_queue_data
+ *unmap_data = container_of(work,
+ struct gntab_unmap_queue_data,
+ gnttab_work.work);
+ if (unmap_data->age!=UINT_MAX)
+ unmap_data->age++;
+ __gnttab_unmap_refs_async(unmap_data);
+}
+
+static void __gnttab_unmap_refs_async(struct gntab_unmap_queue_data* item)
+{
+ int ret;
+ int pc;
+
+ for (pc = 0; pc < item->count; pc++) {
+ if (page_count(item->pages[pc]) > 1) {
+ unsigned long delay = GNTTAB_UNMAP_REFS_DELAY * (item->age + 1);
+ schedule_delayed_work(&item->gnttab_work,
+ msecs_to_jiffies(delay));
+ return;
+ }
+ }
+
+ ret = gnttab_unmap_refs(item->unmap_ops, item->kunmap_ops,
+ item->pages, item->count);
+ item->done(ret, item);
+}
+
+void gnttab_unmap_refs_async(struct gntab_unmap_queue_data* item)
+{
+ INIT_DELAYED_WORK(&item->gnttab_work, gnttab_unmap_work);
+ item->age = 0;
+
+ __gnttab_unmap_refs_async(item);
+}
+EXPORT_SYMBOL_GPL(gnttab_unmap_refs_async);
+
static int gnttab_map_frames_v1(xen_pfn_t *frames, unsigned int nr_gframes)
{
int rc;
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 381f007..f7c4821 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -59,6 +59,22 @@ struct gnttab_free_callback {
u16 count;
};
+struct gntab_unmap_queue_data;
+
+typedef void (*gnttab_unmap_refs_done)(int result, struct gntab_unmap_queue_data *data);
+
+struct gntab_unmap_queue_data
+{
+ struct delayed_work gnttab_work;
+ void *data;
+ gnttab_unmap_refs_done done;
+ struct gnttab_unmap_grant_ref *unmap_ops;
+ struct gnttab_unmap_grant_ref *kunmap_ops;
+ struct page **pages;
+ unsigned int count;
+ unsigned int age;
+};
+
int gnttab_init(void);
int gnttab_suspend(void);
int gnttab_resume(void);
@@ -170,6 +186,8 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
struct gnttab_unmap_grant_ref *kunmap_ops,
struct page **pages, unsigned int count);
+void gnttab_unmap_refs_async(struct gntab_unmap_queue_data* item);
+
/* Perform a batch of grant map/copy operations. Retry every batch slot
* for which the hypervisor returns GNTST_eagain. This is typically due
--
1.7.10.4
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 09/12] xen/gntdev: safely unmap grants in case they are still in use
2015-01-06 18:57 [RFC PATCH 00/12] xen: fix many long-standing grant mapping bugs David Vrabel
` (7 preceding siblings ...)
2015-01-06 18:57 ` [PATCH 08/12] xen/grant-table: add a mechanism to safely unmap pages that are in use David Vrabel
@ 2015-01-06 18:57 ` David Vrabel
2015-01-06 18:57 ` [PATCH 10/12] xen-blkback: " David Vrabel
` (2 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: David Vrabel @ 2015-01-06 18:57 UTC (permalink / raw)
To: xen-devel
Cc: Boris Ostrovsky, Stefano Stabellini, Jenny Herbert, David Vrabel
From: Jenny Herbert <jennifer.herbert@citrix.com>
Use gnttab_unmap_refs_async() to wait until the mapped pages are no
longer in use before unmapping them.
This allows userspace programs to safely use Direct I/O and AIO to a
network filesystem which may retain refs to pages in queued skbs after
the filesystem I/O has completed.
Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
drivers/xen/gntdev.c | 36 +++++++++++++++++++++++++++++++-----
1 file changed, 31 insertions(+), 5 deletions(-)
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 32f6bfe..a740dce 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -311,9 +311,30 @@ static int map_grant_pages(struct grant_map *map)
return err;
}
+struct unmap_grant_pages_callback_data
+{
+ struct completion completion;
+ int result;
+};
+
+static void unmap_grant_callback(int result,
+ struct gntab_unmap_queue_data *data)
+{
+ struct unmap_grant_pages_callback_data* d = data->data;
+
+ d->result = result;
+ complete(&d->completion);
+}
+
static int __unmap_grant_pages(struct grant_map *map, int offset, int pages)
{
int i, err = 0;
+ struct gntab_unmap_queue_data unmap_data;
+ struct unmap_grant_pages_callback_data data;
+
+ init_completion(&data.completion);
+ unmap_data.data = &data;
+ unmap_data.done= &unmap_grant_callback;
if (map->notify.flags & UNMAP_NOTIFY_CLEAR_BYTE) {
int pgno = (map->notify.addr >> PAGE_SHIFT);
@@ -325,11 +346,16 @@ static int __unmap_grant_pages(struct grant_map *map, int offset, int pages)
}
}
- err = gnttab_unmap_refs(map->unmap_ops + offset,
- use_ptemod ? map->kunmap_ops + offset : NULL, map->pages + offset,
- pages);
- if (err)
- return err;
+ unmap_data.unmap_ops = map->unmap_ops + offset;
+ unmap_data.kunmap_ops = use_ptemod ? map->kunmap_ops + offset : NULL;
+ unmap_data.pages = map->pages + offset;
+ unmap_data.count = pages;
+
+ gnttab_unmap_refs_async(&unmap_data);
+
+ wait_for_completion(&data.completion);
+ if (data.result)
+ return data.result;
for (i = 0; i < pages; i++) {
if (map->unmap_ops[offset+i].status)
--
1.7.10.4
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 10/12] xen-blkback: safely unmap grants in case they are still in use
2015-01-06 18:57 [RFC PATCH 00/12] xen: fix many long-standing grant mapping bugs David Vrabel
` (8 preceding siblings ...)
2015-01-06 18:57 ` [PATCH 09/12] xen/gntdev: safely unmap grants in case they are still " David Vrabel
@ 2015-01-06 18:57 ` David Vrabel
2015-01-06 18:57 ` [PATCH 11/12] xen/gntdev: mark userspace PTEs as special on x86 PV guests David Vrabel
2015-01-06 18:57 ` [PATCH 12/12] xen/gntdev: provide a set of pages for each VMA David Vrabel
11 siblings, 0 replies; 34+ messages in thread
From: David Vrabel @ 2015-01-06 18:57 UTC (permalink / raw)
To: xen-devel
Cc: Boris Ostrovsky, Stefano Stabellini, Jenny Herbert, David Vrabel
From: Jenny Herbert <jennifer.herbert@citrix.com>
Use gnttab_unmap_refs_async() to wait until the mapped pages are no
longer in use before unmapping them.
This allows blkback to use network storage which may retain refs to
pages in queued skbs after the block I/O has completed.
Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
drivers/block/xen-blkback/blkback.c | 104 +++++++++++++++++++++++++++++++++--
drivers/block/xen-blkback/common.h | 3 +
2 files changed, 103 insertions(+), 4 deletions(-)
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 63fc7f0..167305c 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -47,6 +47,7 @@
#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>
#include <xen/balloon.h>
+#include <xen/grant_table.h>
#include "common.h"
/*
@@ -262,6 +263,17 @@ static void put_persistent_gnt(struct xen_blkif *blkif,
atomic_dec(&blkif->persistent_gnt_in_use);
}
+static void free_persistent_gnts_unmap_callback(int result,
+ struct gntab_unmap_queue_data *data)
+{
+ struct completion* c = data->data;
+
+ /* BUG_ON used to reproduce existing behaviour,
+ but is this the best way to deal with this? */
+ BUG_ON(result);
+ complete(c);
+}
+
static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
unsigned int num)
{
@@ -269,8 +281,17 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
struct page *pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
struct persistent_gnt *persistent_gnt;
struct rb_node *n;
- int ret = 0;
int segs_to_unmap = 0;
+ struct gntab_unmap_queue_data unmap_data;
+ struct completion unmap_completion;
+
+ init_completion(&unmap_completion);
+
+ unmap_data.data = &unmap_completion;
+ unmap_data.done = &free_persistent_gnts_unmap_callback;
+ unmap_data.pages = pages;
+ unmap_data.unmap_ops = unmap;
+ unmap_data.kunmap_ops = NULL;
foreach_grant_safe(persistent_gnt, n, root, node) {
BUG_ON(persistent_gnt->handle ==
@@ -285,9 +306,11 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST ||
!rb_next(&persistent_gnt->node)) {
- ret = gnttab_unmap_refs(unmap, NULL, pages,
- segs_to_unmap);
- BUG_ON(ret);
+
+ unmap_data.count = segs_to_unmap;
+ gnttab_unmap_refs_async(&unmap_data);
+ wait_for_completion(&unmap_completion);
+
put_free_pages(blkif, pages, segs_to_unmap);
segs_to_unmap = 0;
}
@@ -653,6 +676,76 @@ void xen_blkbk_free_caches(struct xen_blkif *blkif)
shrink_free_pagepool(blkif, 0 /* All */);
}
+static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_queue_data *data)
+{
+ struct pending_req* pending_req = (struct pending_req*) (data->data);
+ struct xen_blkif *blkif = pending_req->blkif;
+
+ /* BUG_ON used to reproduce existing behaviour,
+ but is this the best way to deal with this? */
+ BUG_ON(result);
+ /* free page after unmap */
+ put_free_pages(blkif, data->pages, data->count);
+
+ /* respond */
+ make_response(blkif, pending_req->id,
+ pending_req->operation, pending_req->status);
+ free_req(blkif, pending_req);
+ /*
+ * Make sure the request is freed before releasing blkif,
+ * or there could be a race between free_req and the
+ * cleanup done in xen_blkif_free during shutdown.
+ *
+ * NB: The fact that we might try to wake up pending_free_wq
+ * before drain_complete (in case there's a drain going on)
+ * it's not a problem with our current implementation
+ * because we can assure there's no thread waiting on
+ * pending_free_wq if there's a drain going on, but it has
+ * to be taken into account if the current model is changed.
+ */
+ if (atomic_dec_and_test(&blkif->inflight) && atomic_read(&blkif->drain)) {
+ complete(&blkif->drain_complete);
+ }
+ xen_blkif_put(blkif);
+}
+
+static void xen_blkbk_unmap_populate(struct pending_req *req)
+{
+
+ struct gntab_unmap_queue_data* work = &req->gnttab_unmap_data;
+ struct xen_blkif *blkif = req->blkif;
+ struct grant_page **pages = req->segments;
+ unsigned int i, invcount = 0;
+
+ for (i = 0; i < req->nr_pages; i++) {
+ if (pages[i]->persistent_gnt != NULL) {
+ put_persistent_gnt(blkif, pages[i]->persistent_gnt);
+ continue;
+ }
+ if (pages[i]->handle == BLKBACK_INVALID_HANDLE)
+ continue;
+
+ req->unmap_pages[invcount] = pages[i]->page;
+ gnttab_set_unmap_op(&req->unmap[invcount], vaddr(pages[invcount]->page),
+ GNTMAP_host_map, pages[i]->handle);
+ pages[i]->handle = BLKBACK_INVALID_HANDLE;
+ invcount++;
+ }
+
+ work->data = req;
+ work->done = &xen_blkbk_unmap_and_respond_callback;
+ work->unmap_ops = req->unmap;
+ work->kunmap_ops = NULL;
+ work->pages = req->unmap_pages;
+ work->count = invcount;
+
+}
+
+static void xen_blkbk_unmap_and_respond(struct pending_req *req)
+{
+ xen_blkbk_unmap_populate(req);
+ gnttab_unmap_refs_async(&req->gnttab_unmap_data);
+}
/*
* Unmap the grant references, and also remove the M2P over-rides
* used in the 'pending_req'.
@@ -982,6 +1075,9 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
* the grant references associated with 'request' and provide
* the proper response on the ring.
*/
+ if (atomic_dec_and_test(&pending_req->pendcnt))
+ xen_blkbk_unmap_and_respond(pending_req);
+
if (atomic_dec_and_test(&pending_req->pendcnt)) {
struct xen_blkif *blkif = pending_req->blkif;
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index f65b807..428a34d 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -350,6 +350,9 @@ struct pending_req {
struct grant_page *indirect_pages[MAX_INDIRECT_PAGES];
struct seg_buf seg[MAX_INDIRECT_SEGMENTS];
struct bio *biolist[MAX_INDIRECT_SEGMENTS];
+ struct gnttab_unmap_grant_ref unmap[MAX_INDIRECT_SEGMENTS];
+ struct page *unmap_pages[MAX_INDIRECT_SEGMENTS];
+ struct gntab_unmap_queue_data gnttab_unmap_data;
};
--
1.7.10.4
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 11/12] xen/gntdev: mark userspace PTEs as special on x86 PV guests
2015-01-06 18:57 [RFC PATCH 00/12] xen: fix many long-standing grant mapping bugs David Vrabel
` (9 preceding siblings ...)
2015-01-06 18:57 ` [PATCH 10/12] xen-blkback: " David Vrabel
@ 2015-01-06 18:57 ` David Vrabel
2015-01-07 12:11 ` Ian Campbell
2015-01-06 18:57 ` [PATCH 12/12] xen/gntdev: provide a set of pages for each VMA David Vrabel
11 siblings, 1 reply; 34+ messages in thread
From: David Vrabel @ 2015-01-06 18:57 UTC (permalink / raw)
To: xen-devel
Cc: Boris Ostrovsky, Stefano Stabellini, Jenny Herbert, David Vrabel
In an x86 PV guest, get_user_pages_fast() on a userspace address range
containing foreign mappings does not work correctly because the M2P
lookup of the MFN from a userspace PTE may return the wrong page.
Force get_user_pages_fast() to fail on such addresses by marking the PTEs
as special.
If Xen has XENFEAT_gnttab_map_avail_bits (available since at least
4.0), we can do so efficiently in the grant map hypercall. Otherwise,
it needs to be done afterwards. This is both inefficient and racy
(the mapping is visible to the task before we fixup the PTEs), but
will be fine for well-behaved applications that do not use the mapping
until after the mmap() system call returns.
Guests with XENFEAT_auto_translated_physmap (ARM and x86 HVM or PVH)
do not need this since get_user_pages() has always worked correctly
for them.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
drivers/xen/gntdev.c | 32 ++++++++++++++++++++++++++++++--
include/xen/interface/features.h | 6 ++++++
include/xen/interface/grant_table.h | 7 +++++++
3 files changed, 43 insertions(+), 2 deletions(-)
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index a740dce..5d73fe8 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -244,11 +244,24 @@ static int find_grant_ptes(pte_t *pte, pgtable_t token,
BUG_ON(pgnr >= map->count);
pte_maddr = arbitrary_virt_to_machine(pte).maddr;
+ /*
+ * Set the PTE as special to force get_user_pages_fast() fall
+ * back to the slow path. If this is not supported as part of
+ * the grant map, it will be done afterwards.
+ */
+ if (xen_feature(XENFEAT_gnttab_map_avail_bits))
+ flags |= (1 << _GNTMAP_guest_avail0);
+
gnttab_set_map_op(&map->map_ops[pgnr], pte_maddr, flags,
map->grants[pgnr].ref,
map->grants[pgnr].domid);
- gnttab_set_unmap_op(&map->unmap_ops[pgnr], pte_maddr, flags,
- -1 /* handle */);
+ return 0;
+}
+
+static int set_grant_ptes_as_special(pte_t *pte, pgtable_t token,
+ unsigned long addr, void *data)
+{
+ set_pte_at(current->mm, addr, pte, pte_mkspecial(*pte));
return 0;
}
@@ -842,6 +855,21 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
if (err)
goto out_put_map;
}
+ } else {
+ /*
+ * If the PTEs were not made special by the grant map
+ * hypercall, do so here.
+ *
+ * This is racy since the mapping is already visible
+ * to userspace but userspace should be well-behaved
+ * enough to not touch it until the mmap() call
+ * returns.
+ */
+ if (!xen_feature(XENFEAT_gnttab_map_avail_bits)) {
+ apply_to_page_range(vma->vm_mm, vma->vm_start,
+ vma->vm_end - vma->vm_start,
+ set_grant_ptes_as_special, NULL);
+ }
}
return 0;
diff --git a/include/xen/interface/features.h b/include/xen/interface/features.h
index 131a6cc..6ad3d11 100644
--- a/include/xen/interface/features.h
+++ b/include/xen/interface/features.h
@@ -41,6 +41,12 @@
/* x86: Does this Xen host support the MMU_PT_UPDATE_PRESERVE_AD hypercall? */
#define XENFEAT_mmu_pt_update_preserve_ad 5
+/*
+ * If set, GNTTABOP_map_grant_ref honors flags to be placed into guest kernel
+ * available pte bits.
+ */
+#define XENFEAT_gnttab_map_avail_bits 7
+
/* x86: Does this Xen host support the HVM callback vector type? */
#define XENFEAT_hvm_callback_vector 8
diff --git a/include/xen/interface/grant_table.h b/include/xen/interface/grant_table.h
index bcce564..56806bc 100644
--- a/include/xen/interface/grant_table.h
+++ b/include/xen/interface/grant_table.h
@@ -526,6 +526,13 @@ DEFINE_GUEST_HANDLE_STRUCT(gnttab_cache_flush);
#define GNTMAP_contains_pte (1<<_GNTMAP_contains_pte)
/*
+ * Bits to be placed in guest kernel available PTE bits (architecture
+ * dependent; only supported when XENFEAT_gnttab_map_avail_bits is set).
+ */
+#define _GNTMAP_guest_avail0 (16)
+#define GNTMAP_guest_avail_mask ((uint32_t)~0 << _GNTMAP_guest_avail0)
+
+/*
* Values for error status returns. All errors are -ve.
*/
#define GNTST_okay (0) /* Normal return. */
--
1.7.10.4
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 12/12] xen/gntdev: provide a set of pages for each VMA
2015-01-06 18:57 [RFC PATCH 00/12] xen: fix many long-standing grant mapping bugs David Vrabel
` (10 preceding siblings ...)
2015-01-06 18:57 ` [PATCH 11/12] xen/gntdev: mark userspace PTEs as special on x86 PV guests David Vrabel
@ 2015-01-06 18:57 ` David Vrabel
2015-01-09 15:55 ` Stefano Stabellini
11 siblings, 1 reply; 34+ messages in thread
From: David Vrabel @ 2015-01-06 18:57 UTC (permalink / raw)
To: xen-devel
Cc: Boris Ostrovsky, Stefano Stabellini, Jenny Herbert, David Vrabel
From: Jenny Herbert <jennifer.herbert@citrix.com>
For each VMA for grant maps, provide the correct array of pages so
get_user_pages() on foreign mappings works in PV guests.
Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
drivers/xen/gntdev.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 5d73fe8..09e7863 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -870,6 +870,8 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
vma->vm_end - vma->vm_start,
set_grant_ptes_as_special, NULL);
}
+
+ vma->pages = map->pages;
}
return 0;
--
1.7.10.4
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 06/12] xen: mark grant mapped pages as foreign
2015-01-06 18:57 ` [PATCH 06/12] xen: mark grant mapped pages as foreign David Vrabel
@ 2015-01-07 11:53 ` Ian Campbell
2015-01-09 16:03 ` Stefano Stabellini
1 sibling, 0 replies; 34+ messages in thread
From: Ian Campbell @ 2015-01-07 11:53 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Jenny Herbert, Jenny Herbert, Boris Ostrovsky,
Stefano Stabellini
On Tue, 2015-01-06 at 18:57 +0000, David Vrabel wrote:
> From: Jenny Herbert <jennifer.herbert@citrix.com>
>
> Use the "foreign" page flag to mark pages that have a grant map. Use
> page->private to store information of the grant (the granting domain
> and the grant reference).
>
> Signed-off-by: Jenny Herbert <jenny.herbert@citrix.com>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
> arch/x86/xen/p2m.c | 50 ++++++++++++++++++++++++++++++++++++++-------
> include/xen/grant_table.h | 13 ++++++++++++
> 2 files changed, 56 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> index 0d70814..22624a3 100644
> --- a/arch/x86/xen/p2m.c
> +++ b/arch/x86/xen/p2m.c
> @@ -648,6 +648,43 @@ bool set_phys_to_machine(unsigned long pfn, unsigned long mfn)
> return true;
> }
>
> +static int
> +init_page_grant_ref(struct page *p, domid_t domid, grant_ref_t grantref)
I'd be inclined to add "map" to the names somewhere, otherwise people
might thing they need to call this when allocating a grant in the f.e.
or other things.
> +{
> +#ifdef CONFIG_X86_64
Rather than suggesting to add CONFIG_ARM_64 here I'll suggest
BITS_PER_LONG >= 64.
> + uint64_t gref;
> + uint64_t* gref_p = &gref;
> +#else
> + uint64_t* gref_p = kmalloc(sizeof(uint64_t), GFP_KERNEL);
Might this allocation be happening during e.g. swapping? I suppose it is
backend only, and swapping to a loopback vbd would be pretty mad. If you
can figure a reasonable use case for that you might want some extra GFP
flags?
Might this be hot enough to warrant using a specific kmem_cache?
> + if (!gref)
> + return -ENOMEM;
> + uint64_t* gref = gref_p;
> +#endif
> +
> + *gref_p = ((uint64_t) grantref << 32) | domid;
> + p->private = gref;
There is a set_page_private() macro, which doesn't seem to do much but I
suppose you should use it (and page_private() for accessing, if you
don't already).
> @@ -182,4 +183,16 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
> void gnttab_batch_map(struct gnttab_map_grant_ref *batch, unsigned count);
> void gnttab_batch_copy(struct gnttab_copy *batch, unsigned count);
>
> +static inline void
> +get_page_grant_ref(struct page *p, domid_t* domid, grant_ref_t* grantref)
> +{
BUG_ON(!PageBlah(p))?
> +#ifdef CONFIG_X86_64
> + uint64_t gref = p->private;
> +#else
> + uint64_t gref = *p->private;
> +#endif
> + *domid = gref & 0xffff;
> + *grantref = gref >> 32;
> +}
> +
> #endif /* __ASM_GNTTAB_H__ */
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 07/12] xen-netback: use foreign page information from the pages themselves
2015-01-06 18:57 ` [PATCH 07/12] xen-netback: use foreign page information from the pages themselves David Vrabel
@ 2015-01-07 11:57 ` Ian Campbell
0 siblings, 0 replies; 34+ messages in thread
From: Ian Campbell @ 2015-01-07 11:57 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Jenny Herbert, Jenny Herbert, Boris Ostrovsky,
Stefano Stabellini
On Tue, 2015-01-06 at 18:57 +0000, David Vrabel wrote:
> From: Jenny Herbert <jenny.herbert@citrix.com>
>
> Use the foreign page flag in netback to get the domid and grant ref
> needed for the grant copy. This signficiantly simplifies the netback
> code and makes netback work with foreign pages from other backends
> (e.g., blkback).
>
> This allows blkback to use iSCSI disks provided by domUs running on
> the same host.
>
> Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
> drivers/net/xen-netback/netback.c | 97 +++----------------------------------
> 1 file changed, 6 insertions(+), 91 deletions(-)
Gotta love that diffstat...
Acked-by: Ian Campbell <ian.campbell@citrix.com>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 08/12] xen/grant-table: add a mechanism to safely unmap pages that are in use
2015-01-06 18:57 ` [PATCH 08/12] xen/grant-table: add a mechanism to safely unmap pages that are in use David Vrabel
@ 2015-01-07 12:00 ` Ian Campbell
2015-01-07 12:06 ` Ian Campbell
2015-01-07 13:07 ` David Vrabel
2015-01-09 16:11 ` Stefano Stabellini
1 sibling, 2 replies; 34+ messages in thread
From: Ian Campbell @ 2015-01-07 12:00 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Boris Ostrovsky, Jenny Herbert, Stefano Stabellini
On Tue, 2015-01-06 at 18:57 +0000, David Vrabel wrote:
> From: Jenny Herbert <jennifer.herbert@citrix.com>
>
> Introduce gnttab_unmap_refs_async() that can be used to safely unmap
> pages that may be in use (ref count > 1). If the pages are in use the
> unmap is deferred and retried later. This polling is not very clever
> but it should be good enough if the cases where the delay is necessary
> are rare.
>
> This is needed to allow block backends using grant mapping to safely
> use network storage (block or filesystem based such as iSCSI or NFS).
>
> The network storage driver may complete a block request whilst there
> is a queued network packet retry (because the ack from the remote end
> races with deciding to queue the retry). The pages for the retried
> packet would be grant unmapped and the network driver (or hardware)
> would access the unmapped page.
I thought this had been solved a little while ago by mapping a scratch
page on unmap even for kernel space grant mappings, but both the design
doc and here imply not (i.e. the scratch is for user grant mappings
only), so I must be misremembering.
Regardless, this approach seems likely to be far better...
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 08/12] xen/grant-table: add a mechanism to safely unmap pages that are in use
2015-01-07 12:00 ` Ian Campbell
@ 2015-01-07 12:06 ` Ian Campbell
2015-01-07 13:07 ` David Vrabel
1 sibling, 0 replies; 34+ messages in thread
From: Ian Campbell @ 2015-01-07 12:06 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Boris Ostrovsky, Jenny Herbert, Stefano Stabellini
On Wed, 2015-01-07 at 12:00 +0000, Ian Campbell wrote:
> On Tue, 2015-01-06 at 18:57 +0000, David Vrabel wrote:
> > From: Jenny Herbert <jennifer.herbert@citrix.com>
> >
> > Introduce gnttab_unmap_refs_async() that can be used to safely unmap
> > pages that may be in use (ref count > 1). If the pages are in use the
> > unmap is deferred and retried later. This polling is not very clever
> > but it should be good enough if the cases where the delay is necessary
> > are rare.
> >
> > This is needed to allow block backends using grant mapping to safely
> > use network storage (block or filesystem based such as iSCSI or NFS).
> >
> > The network storage driver may complete a block request whilst there
> > is a queued network packet retry (because the ack from the remote end
> > races with deciding to queue the retry). The pages for the retried
> > packet would be grant unmapped and the network driver (or hardware)
> > would access the unmapped page.
>
> I thought this had been solved a little while ago by mapping a scratch
> page on unmap even for kernel space grant mappings, but both the design
> doc and here imply not (i.e. the scratch is for user grant mappings
> only), so I must be misremembering.
>
> Regardless, this approach seems likely to be far better...
None of the following patches switch netback to use it though, I think
because the use of SKBTX_DEV_ZEROCOPY makes it unnecessary. But perhaps
this lazy unmap scheme would be a better fix than SKB zerocopy for that
case too. The current stuff has the downside of always copying skbs from
guests destined for dom0 itself (rather than just passing through on
there way to a pysical NIC or another VIF), IIRC.
Ian.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 11/12] xen/gntdev: mark userspace PTEs as special on x86 PV guests
2015-01-06 18:57 ` [PATCH 11/12] xen/gntdev: mark userspace PTEs as special on x86 PV guests David Vrabel
@ 2015-01-07 12:11 ` Ian Campbell
2015-01-07 13:23 ` David Vrabel
0 siblings, 1 reply; 34+ messages in thread
From: Ian Campbell @ 2015-01-07 12:11 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Boris Ostrovsky, Jenny Herbert, Stefano Stabellini
On Tue, 2015-01-06 at 18:57 +0000, David Vrabel wrote:
> In an x86 PV guest, get_user_pages_fast() on a userspace address range
> containing foreign mappings does not work correctly because the M2P
> lookup of the MFN from a userspace PTE may return the wrong page.
>
> Force get_user_pages_fast() to fail on such addresses by marking the PTEs
> as special.
>
> If Xen has XENFEAT_gnttab_map_avail_bits (available since at least
> 4.0),
http://wiki.xenproject.org/wiki/Xen_Kernel_Feature_Matrix says the dom0
pvpops already requires >= 4.0 too, which matches my recollection
(something to do with a new APIC interface which upstream insisted on
during upstreaming, IIRC), but both could be out of date...
Ian.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 08/12] xen/grant-table: add a mechanism to safely unmap pages that are in use
2015-01-07 12:00 ` Ian Campbell
2015-01-07 12:06 ` Ian Campbell
@ 2015-01-07 13:07 ` David Vrabel
2015-01-07 13:24 ` Ian Campbell
1 sibling, 1 reply; 34+ messages in thread
From: David Vrabel @ 2015-01-07 13:07 UTC (permalink / raw)
To: Ian Campbell, David Vrabel
Cc: xen-devel, Boris Ostrovsky, Jenny Herbert, Stefano Stabellini
On 07/01/15 12:00, Ian Campbell wrote:
> On Tue, 2015-01-06 at 18:57 +0000, David Vrabel wrote:
>> From: Jenny Herbert <jennifer.herbert@citrix.com>
>>
>> Introduce gnttab_unmap_refs_async() that can be used to safely unmap
>> pages that may be in use (ref count > 1). If the pages are in use the
>> unmap is deferred and retried later. This polling is not very clever
>> but it should be good enough if the cases where the delay is necessary
>> are rare.
>>
>> This is needed to allow block backends using grant mapping to safely
>> use network storage (block or filesystem based such as iSCSI or NFS).
>>
>> The network storage driver may complete a block request whilst there
>> is a queued network packet retry (because the ack from the remote end
>> races with deciding to queue the retry). The pages for the retried
>> packet would be grant unmapped and the network driver (or hardware)
>> would access the unmapped page.
>
> I thought this had been solved a little while ago by mapping a scratch
> page on unmap even for kernel space grant mappings, but both the design
> doc and here imply not (i.e. the scratch is for user grant mappings
> only), so I must be misremembering.
It was only for user grant mappings and it did not fix the case where
the page being unmapped was currently dma mapped. This could have
resulted in the NIC transmitting sensitive data.
e.g.,
1. iscsi queues a retransmit with page P (frame F).
2. NIC driver DMA maps and queues frame F on h/w.
3. iscsi completes the I/O.
4. page P is unmapped.
5. response is sent to guest
6. guest reuses frame F.
7. NIC transmits frame F.
We don't use this safe unmap mechanism for netback because the zero copy
stuff means we don't need it and the polling on the unmap is high
latency and only good enough if the polling is needed very rarely.
David
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 11/12] xen/gntdev: mark userspace PTEs as special on x86 PV guests
2015-01-07 12:11 ` Ian Campbell
@ 2015-01-07 13:23 ` David Vrabel
2015-01-07 13:32 ` Ian Campbell
0 siblings, 1 reply; 34+ messages in thread
From: David Vrabel @ 2015-01-07 13:23 UTC (permalink / raw)
To: Ian Campbell, David Vrabel
Cc: xen-devel, Boris Ostrovsky, Jenny Herbert, Stefano Stabellini
On 07/01/15 12:11, Ian Campbell wrote:
> On Tue, 2015-01-06 at 18:57 +0000, David Vrabel wrote:
>> In an x86 PV guest, get_user_pages_fast() on a userspace address range
>> containing foreign mappings does not work correctly because the M2P
>> lookup of the MFN from a userspace PTE may return the wrong page.
>>
>> Force get_user_pages_fast() to fail on such addresses by marking the PTEs
>> as special.
>>
>> If Xen has XENFEAT_gnttab_map_avail_bits (available since at least
>> 4.0),
>
> http://wiki.xenproject.org/wiki/Xen_Kernel_Feature_Matrix says the dom0
> pvpops already requires >= 4.0 too, which matches my recollection
> (something to do with a new APIC interface which upstream insisted on
> during upstreaming, IIRC), but both could be out of date...
gntdev is usable by driver domains and useful for inter-domain comms so
it isn't limited to dom0 use only and Linux still needs to run on Xen
3.2 (I think that's the oldest still available on AWS).
Because of the m2p override limitation, gntdev is currently unsafe[1] to
use by untrusted userspace apps so there's no (new) security issues here.
However, we could disable gntdev if this feature is messing unless
overridden by a module option. Opinions on this?
David
[1] mapping a ref twice or a two refs for the same frame can corrupt
kernel state is various exciting ways because of messed up page ref counts.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 08/12] xen/grant-table: add a mechanism to safely unmap pages that are in use
2015-01-07 13:07 ` David Vrabel
@ 2015-01-07 13:24 ` Ian Campbell
2015-01-07 13:30 ` David Vrabel
0 siblings, 1 reply; 34+ messages in thread
From: Ian Campbell @ 2015-01-07 13:24 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Boris Ostrovsky, Jenny Herbert, Stefano Stabellini
On Wed, 2015-01-07 at 13:07 +0000, David Vrabel wrote:
> On 07/01/15 12:00, Ian Campbell wrote:
> > On Tue, 2015-01-06 at 18:57 +0000, David Vrabel wrote:
> >> From: Jenny Herbert <jennifer.herbert@citrix.com>
> >>
> >> Introduce gnttab_unmap_refs_async() that can be used to safely unmap
> >> pages that may be in use (ref count > 1). If the pages are in use the
> >> unmap is deferred and retried later. This polling is not very clever
> >> but it should be good enough if the cases where the delay is necessary
> >> are rare.
> >>
> >> This is needed to allow block backends using grant mapping to safely
> >> use network storage (block or filesystem based such as iSCSI or NFS).
> >>
> >> The network storage driver may complete a block request whilst there
> >> is a queued network packet retry (because the ack from the remote end
> >> races with deciding to queue the retry). The pages for the retried
> >> packet would be grant unmapped and the network driver (or hardware)
> >> would access the unmapped page.
> >
> > I thought this had been solved a little while ago by mapping a scratch
> > page on unmap even for kernel space grant mappings, but both the design
> > doc and here imply not (i.e. the scratch is for user grant mappings
> > only), so I must be misremembering.
>
> It was only for user grant mappings and it did not fix the case where
> the page being unmapped was currently dma mapped. This could have
> resulted in the NIC transmitting sensitive data.
>
> e.g.,
>
> 1. iscsi queues a retransmit with page P (frame F).
> 2. NIC driver DMA maps and queues frame F on h/w.
> 3. iscsi completes the I/O.
> 4. page P is unmapped.
> 5. response is sent to guest
> 6. guest reuses frame F.
> 7. NIC transmits frame F.
You don't actually need Xen to expose this sort of thing, any userspace
which reuses a buffer after write() (to e.g. NFS has completed) might
expose the new data on the wire in a retransmit.
It's certainly much easier to expose with Xen, I'll grant (no pun
intended) you that.
> We don't use this safe unmap mechanism for netback because the zero copy
> stuff means we don't need it and the polling on the unmap is high
> latency and only good enough if the polling is needed very rarely.
I'd think it should be rare for netback too unless your network is made
of wet string.
Ian.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 08/12] xen/grant-table: add a mechanism to safely unmap pages that are in use
2015-01-07 13:24 ` Ian Campbell
@ 2015-01-07 13:30 ` David Vrabel
2015-01-07 13:33 ` Ian Campbell
0 siblings, 1 reply; 34+ messages in thread
From: David Vrabel @ 2015-01-07 13:30 UTC (permalink / raw)
To: Ian Campbell
Cc: xen-devel, Boris Ostrovsky, Jenny Herbert, Stefano Stabellini
On 07/01/15 13:24, Ian Campbell wrote:
> On Wed, 2015-01-07 at 13:07 +0000, David Vrabel wrote:
>
>> We don't use this safe unmap mechanism for netback because the zero copy
>> stuff means we don't need it and the polling on the unmap is high
>> latency and only good enough if the polling is needed very rarely.
>
> I'd think it should be rare for netback too unless your network is made
> of wet string.
Hmmm. Yes, this might be worth looking at but I would prefer to do it
to it as a follow up to this series.
David
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 11/12] xen/gntdev: mark userspace PTEs as special on x86 PV guests
2015-01-07 13:23 ` David Vrabel
@ 2015-01-07 13:32 ` Ian Campbell
0 siblings, 0 replies; 34+ messages in thread
From: Ian Campbell @ 2015-01-07 13:32 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Boris Ostrovsky, Jenny Herbert, Stefano Stabellini
On Wed, 2015-01-07 at 13:23 +0000, David Vrabel wrote:
> On 07/01/15 12:11, Ian Campbell wrote:
> > On Tue, 2015-01-06 at 18:57 +0000, David Vrabel wrote:
> >> In an x86 PV guest, get_user_pages_fast() on a userspace address range
> >> containing foreign mappings does not work correctly because the M2P
> >> lookup of the MFN from a userspace PTE may return the wrong page.
> >>
> >> Force get_user_pages_fast() to fail on such addresses by marking the PTEs
> >> as special.
> >>
> >> If Xen has XENFEAT_gnttab_map_avail_bits (available since at least
> >> 4.0),
> >
> > http://wiki.xenproject.org/wiki/Xen_Kernel_Feature_Matrix says the dom0
> > pvpops already requires >= 4.0 too, which matches my recollection
> > (something to do with a new APIC interface which upstream insisted on
> > during upstreaming, IIRC), but both could be out of date...
>
> gntdev is usable by driver domains and useful for inter-domain comms so
> it isn't limited to dom0 use only and Linux still needs to run on Xen
> 3.2 (I think that's the oldest still available on AWS).
Ah yes, driver domains...
> Because of the m2p override limitation, gntdev is currently unsafe[1] to
> use by untrusted userspace apps so there's no (new) security issues here.
>
> However, we could disable gntdev if this feature is messing unless
> overridden by a module option. Opinions on this?
If it is exploitable by untrusted apps in the new form (the race between
mmap and the pte update still is, right?), then that might be best, or
only allow root to open it?
>
> David
>
> [1] mapping a ref twice or a two refs for the same frame can corrupt
> kernel state is various exciting ways because of messed up page ref counts.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 08/12] xen/grant-table: add a mechanism to safely unmap pages that are in use
2015-01-07 13:30 ` David Vrabel
@ 2015-01-07 13:33 ` Ian Campbell
0 siblings, 0 replies; 34+ messages in thread
From: Ian Campbell @ 2015-01-07 13:33 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Boris Ostrovsky, Jenny Herbert, Stefano Stabellini
On Wed, 2015-01-07 at 13:30 +0000, David Vrabel wrote:
> On 07/01/15 13:24, Ian Campbell wrote:
> > On Wed, 2015-01-07 at 13:07 +0000, David Vrabel wrote:
> >
> >> We don't use this safe unmap mechanism for netback because the zero copy
> >> stuff means we don't need it and the polling on the unmap is high
> >> latency and only good enough if the polling is needed very rarely.
> >
> > I'd think it should be rare for netback too unless your network is made
> > of wet string.
>
> Hmmm. Yes, this might be worth looking at but I would prefer to do it
> to it as a follow up to this series.
Sure.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 02/12] mm: add 'foreign' alias for the 'pinned' page flag
2015-01-06 18:57 ` [PATCH 02/12] mm: add 'foreign' alias for the 'pinned' page flag David Vrabel
@ 2015-01-07 17:12 ` Konrad Rzeszutek Wilk
0 siblings, 0 replies; 34+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-01-07 17:12 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Boris Ostrovsky, Jenny Herbert, Stefano Stabellini
On Tue, Jan 06, 2015 at 06:57:27PM +0000, David Vrabel wrote:
> From: Jenny Herbert <jennifer.herbert@citrix.com>
>
> The foreign page flag will be used by Xen guests to mark pages that
> have grant mappings of frames from other (foreign) guests.
>
> The foreign flag is an aliases the existing (Xen-specific) pinned
is an alias for the ..
> flag. This is safe because pinned is only used on pages used for page
> tables and these cannot also be foreign.
>
> Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
> include/linux/page-flags.h | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index e1f5fcd..7734cc8 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -123,6 +123,7 @@ enum pageflags {
> /* XEN */
> PG_pinned = PG_owner_priv_1,
> PG_savepinned = PG_dirty,
> + PG_foreign = PG_owner_priv_1,
>
> /* SLOB */
> PG_slob_free = PG_private,
> @@ -215,6 +216,7 @@ __PAGEFLAG(Slab, slab)
> PAGEFLAG(Checked, checked) /* Used by some filesystems */
> PAGEFLAG(Pinned, pinned) TESTSCFLAG(Pinned, pinned) /* Xen */
> PAGEFLAG(SavePinned, savepinned); /* Xen */
> +PAGEFLAG(Foreign, foreign); /* Xen */
> PAGEFLAG(Reserved, reserved) __CLEARPAGEFLAG(Reserved, reserved)
> PAGEFLAG(SwapBacked, swapbacked) __CLEARPAGEFLAG(SwapBacked, swapbacked)
> __SETPAGEFLAG(SwapBacked, swapbacked)
> --
> 1.7.10.4
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 12/12] xen/gntdev: provide a set of pages for each VMA
2015-01-06 18:57 ` [PATCH 12/12] xen/gntdev: provide a set of pages for each VMA David Vrabel
@ 2015-01-09 15:55 ` Stefano Stabellini
2015-01-09 16:05 ` Stefano Stabellini
0 siblings, 1 reply; 34+ messages in thread
From: Stefano Stabellini @ 2015-01-09 15:55 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Boris Ostrovsky, Stefano Stabellini, Jenny Herbert
On Tue, 6 Jan 2015, David Vrabel wrote:
> From: Jenny Herbert <jennifer.herbert@citrix.com>
>
> For each VMA for grant maps, provide the correct array of pages so
> get_user_pages() on foreign mappings works in PV guests.
>
> Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
> drivers/xen/gntdev.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index 5d73fe8..09e7863 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -870,6 +870,8 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
> vma->vm_end - vma->vm_start,
> set_grant_ptes_as_special, NULL);
> }
> +
> + vma->pages = map->pages;
Shouldn't this be done just for x86 PV guests rather than all the
callers of gntdev_mmap?
> }
>
> return 0;
> --
> 1.7.10.4
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 06/12] xen: mark grant mapped pages as foreign
2015-01-06 18:57 ` [PATCH 06/12] xen: mark grant mapped pages as foreign David Vrabel
2015-01-07 11:53 ` Ian Campbell
@ 2015-01-09 16:03 ` Stefano Stabellini
2015-01-09 16:19 ` Ian Campbell
1 sibling, 1 reply; 34+ messages in thread
From: Stefano Stabellini @ 2015-01-09 16:03 UTC (permalink / raw)
To: David Vrabel
Cc: Jenny Herbert, Stefano Stabellini, Jenny Herbert, xen-devel,
Boris Ostrovsky
On Tue, 6 Jan 2015, David Vrabel wrote:
> From: Jenny Herbert <jennifer.herbert@citrix.com>
>
> Use the "foreign" page flag to mark pages that have a grant map. Use
> page->private to store information of the grant (the granting domain
> and the grant reference).
>
> Signed-off-by: Jenny Herbert <jenny.herbert@citrix.com>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
> arch/x86/xen/p2m.c | 50 ++++++++++++++++++++++++++++++++++++++-------
> include/xen/grant_table.h | 13 ++++++++++++
> 2 files changed, 56 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> index 0d70814..22624a3 100644
> --- a/arch/x86/xen/p2m.c
> +++ b/arch/x86/xen/p2m.c
> @@ -648,6 +648,43 @@ bool set_phys_to_machine(unsigned long pfn, unsigned long mfn)
> return true;
> }
>
> +static int
> +init_page_grant_ref(struct page *p, domid_t domid, grant_ref_t grantref)
> +{
> +#ifdef CONFIG_X86_64
> + uint64_t gref;
> + uint64_t* gref_p = &gref;
> +#else
> + uint64_t* gref_p = kmalloc(sizeof(uint64_t), GFP_KERNEL);
> + if (!gref)
> + return -ENOMEM;
> + uint64_t* gref = gref_p;
> +#endif
> +
> + *gref_p = ((uint64_t) grantref << 32) | domid;
> + p->private = gref;
> +
> + WARN_ON(PagePrivate(p));
> + WARN_ON(PageForeign(p));
> + SetPagePrivate(p);
> + SetPageForeign(p);
> + return 0;
> +}
> +
> +static void
> +clear_page_grant_ref(struct page *p)
> +{
> + WARN_ON(!PagePrivate(p));
> + WARN_ON(!PageForeign(p));
> +
> +#ifndef CONFIG_X86_64
> + kfree(p->private);
> +#endif
> + p->private = 0;
> + ClearPagePrivate(p);
> + ClearPageForeign(p);
> +}
Given that get_page_grant_ref is used by netback, these functions need
to be made arch-independent, moved to an arch-independent code location
and called appropriately.
> int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
> struct gnttab_map_grant_ref *kmap_ops,
> struct page **pages, unsigned int count)
> @@ -681,11 +718,12 @@ int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
> }
> pfn = page_to_pfn(pages[i]);
>
> - WARN_ON(PagePrivate(pages[i]));
> - WARN(pfn_to_mfn(pfn) != INVALID_P2M_ENTRY, "page must be ballooned");
> + ret = init_page_grant_ref(pages[i],
> + map_ops[i].dom, map_ops[i].ref);
> + if (ret < 0)
> + goto out;
>
> - SetPagePrivate(pages[i]);
> - set_page_private(pages[i], mfn);
> + WARN(pfn_to_mfn(pfn) != INVALID_P2M_ENTRY, "page must be ballooned");
>
> if (unlikely(!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn)))) {
> ret = -ENOMEM;
> @@ -716,9 +754,7 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
> goto out;
> }
>
> - set_page_private(pages[i], INVALID_P2M_ENTRY);
> - WARN_ON(!PagePrivate(pages[i]));
> - ClearPagePrivate(pages[i]);
> + clear_page_grant_ref(pages[i]);
> set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
> }
> if (kunmap_ops)
> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> index 7235d8f..381f007 100644
> --- a/include/xen/grant_table.h
> +++ b/include/xen/grant_table.h
> @@ -45,6 +45,7 @@
> #include <asm/xen/hypervisor.h>
>
> #include <xen/features.h>
> +#include <linux/mm_types.h>
>
> #define GNTTAB_RESERVED_XENSTORE 1
>
> @@ -182,4 +183,16 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
> void gnttab_batch_map(struct gnttab_map_grant_ref *batch, unsigned count);
> void gnttab_batch_copy(struct gnttab_copy *batch, unsigned count);
>
> +static inline void
> +get_page_grant_ref(struct page *p, domid_t* domid, grant_ref_t* grantref)
> +{
> +#ifdef CONFIG_X86_64
> + uint64_t gref = p->private;
> +#else
> + uint64_t gref = *p->private;
> +#endif
> + *domid = gref & 0xffff;
> + *grantref = gref >> 32;
> +}
> +
> #endif /* __ASM_GNTTAB_H__ */
> --
> 1.7.10.4
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 12/12] xen/gntdev: provide a set of pages for each VMA
2015-01-09 15:55 ` Stefano Stabellini
@ 2015-01-09 16:05 ` Stefano Stabellini
2015-01-09 16:41 ` David Vrabel
0 siblings, 1 reply; 34+ messages in thread
From: Stefano Stabellini @ 2015-01-09 16:05 UTC (permalink / raw)
To: Stefano Stabellini
Cc: xen-devel, Boris Ostrovsky, Jenny Herbert, David Vrabel
On Fri, 9 Jan 2015, Stefano Stabellini wrote:
> On Tue, 6 Jan 2015, David Vrabel wrote:
> > From: Jenny Herbert <jennifer.herbert@citrix.com>
> >
> > For each VMA for grant maps, provide the correct array of pages so
> > get_user_pages() on foreign mappings works in PV guests.
> >
> > Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
> > Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> > ---
> > drivers/xen/gntdev.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> > index 5d73fe8..09e7863 100644
> > --- a/drivers/xen/gntdev.c
> > +++ b/drivers/xen/gntdev.c
> > @@ -870,6 +870,8 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
> > vma->vm_end - vma->vm_start,
> > set_grant_ptes_as_special, NULL);
> > }
> > +
> > + vma->pages = map->pages;
>
> Shouldn't this be done just for x86 PV guests rather than all the
> callers of gntdev_mmap?
I guess that since on ARM and PVH the pte wouldn't be marked SPECIAL, it
wouldn't make a difference, but it would still be nice to be consistent.
> > }
> >
> > return 0;
> > --
> > 1.7.10.4
> >
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 08/12] xen/grant-table: add a mechanism to safely unmap pages that are in use
2015-01-06 18:57 ` [PATCH 08/12] xen/grant-table: add a mechanism to safely unmap pages that are in use David Vrabel
2015-01-07 12:00 ` Ian Campbell
@ 2015-01-09 16:11 ` Stefano Stabellini
1 sibling, 0 replies; 34+ messages in thread
From: Stefano Stabellini @ 2015-01-09 16:11 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Boris Ostrovsky, Stefano Stabellini, Jenny Herbert
On Tue, 6 Jan 2015, David Vrabel wrote:
> From: Jenny Herbert <jennifer.herbert@citrix.com>
>
> Introduce gnttab_unmap_refs_async() that can be used to safely unmap
> pages that may be in use (ref count > 1). If the pages are in use the
> unmap is deferred and retried later. This polling is not very clever
> but it should be good enough if the cases where the delay is necessary
> are rare.
>
> This is needed to allow block backends using grant mapping to safely
> use network storage (block or filesystem based such as iSCSI or NFS).
>
> The network storage driver may complete a block request whilst there
> is a queued network packet retry (because the ack from the remote end
> races with deciding to queue the retry). The pages for the retried
> packet would be grant unmapped and the network driver (or hardware)
> would access the unmapped page.
>
> Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
> drivers/xen/grant-table.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> include/xen/grant_table.h | 18 ++++++++++++++++++
> 2 files changed, 62 insertions(+)
>
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index 999d7ab..dc0edfa 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -42,6 +42,7 @@
> #include <linux/io.h>
> #include <linux/delay.h>
> #include <linux/hardirq.h>
> +#include <linux/workqueue.h>
>
> #include <xen/xen.h>
> #include <xen/interface/xen.h>
> @@ -751,6 +752,49 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
> }
> EXPORT_SYMBOL_GPL(gnttab_unmap_refs);
>
> +#define GNTTAB_UNMAP_REFS_DELAY 5
> +
> +static void __gnttab_unmap_refs_async(struct gntab_unmap_queue_data* item);
> +
> +static void gnttab_unmap_work(struct work_struct *work)
> +{
> + struct gntab_unmap_queue_data
> + *unmap_data = container_of(work,
> + struct gntab_unmap_queue_data,
> + gnttab_work.work);
> + if (unmap_data->age!=UINT_MAX)
code style
> + unmap_data->age++;
> + __gnttab_unmap_refs_async(unmap_data);
> +}
> +
> +static void __gnttab_unmap_refs_async(struct gntab_unmap_queue_data* item)
> +{
> + int ret;
> + int pc;
> +
> + for (pc = 0; pc < item->count; pc++) {
> + if (page_count(item->pages[pc]) > 1) {
> + unsigned long delay = GNTTAB_UNMAP_REFS_DELAY * (item->age + 1);
I am curious: why are you increasing the delay at each iteration
instead of keeping it constant?
> + schedule_delayed_work(&item->gnttab_work,
> + msecs_to_jiffies(delay));
> + return;
> + }
> + }
> +
> + ret = gnttab_unmap_refs(item->unmap_ops, item->kunmap_ops,
> + item->pages, item->count);
> + item->done(ret, item);
> +}
> +
> +void gnttab_unmap_refs_async(struct gntab_unmap_queue_data* item)
> +{
> + INIT_DELAYED_WORK(&item->gnttab_work, gnttab_unmap_work);
> + item->age = 0;
> +
> + __gnttab_unmap_refs_async(item);
> +}
> +EXPORT_SYMBOL_GPL(gnttab_unmap_refs_async);
> +
> static int gnttab_map_frames_v1(xen_pfn_t *frames, unsigned int nr_gframes)
> {
> int rc;
> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> index 381f007..f7c4821 100644
> --- a/include/xen/grant_table.h
> +++ b/include/xen/grant_table.h
> @@ -59,6 +59,22 @@ struct gnttab_free_callback {
> u16 count;
> };
>
> +struct gntab_unmap_queue_data;
> +
> +typedef void (*gnttab_unmap_refs_done)(int result, struct gntab_unmap_queue_data *data);
> +
> +struct gntab_unmap_queue_data
> +{
> + struct delayed_work gnttab_work;
> + void *data;
> + gnttab_unmap_refs_done done;
> + struct gnttab_unmap_grant_ref *unmap_ops;
> + struct gnttab_unmap_grant_ref *kunmap_ops;
> + struct page **pages;
> + unsigned int count;
> + unsigned int age;
> +};
> +
> int gnttab_init(void);
> int gnttab_suspend(void);
> int gnttab_resume(void);
> @@ -170,6 +186,8 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
> int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
> struct gnttab_unmap_grant_ref *kunmap_ops,
> struct page **pages, unsigned int count);
> +void gnttab_unmap_refs_async(struct gntab_unmap_queue_data* item);
> +
>
> /* Perform a batch of grant map/copy operations. Retry every batch slot
> * for which the hypervisor returns GNTST_eagain. This is typically due
> --
> 1.7.10.4
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 06/12] xen: mark grant mapped pages as foreign
2015-01-09 16:03 ` Stefano Stabellini
@ 2015-01-09 16:19 ` Ian Campbell
2015-01-09 16:39 ` David Vrabel
0 siblings, 1 reply; 34+ messages in thread
From: Ian Campbell @ 2015-01-09 16:19 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Boris Ostrovsky, xen-devel, Jenny Herbert, Jenny Herbert,
David Vrabel
On Fri, 2015-01-09 at 16:03 +0000, Stefano Stabellini wrote:
> On Tue, 6 Jan 2015, David Vrabel wrote:
> > From: Jenny Herbert <jennifer.herbert@citrix.com>
> >
> > Use the "foreign" page flag to mark pages that have a grant map. Use
> > page->private to store information of the grant (the granting domain
> > and the grant reference).
> >
> > Signed-off-by: Jenny Herbert <jenny.herbert@citrix.com>
> > Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> > ---
> > arch/x86/xen/p2m.c | 50 ++++++++++++++++++++++++++++++++++++++-------
> > include/xen/grant_table.h | 13 ++++++++++++
> > 2 files changed, 56 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> > index 0d70814..22624a3 100644
> > --- a/arch/x86/xen/p2m.c
> > +++ b/arch/x86/xen/p2m.c
> > @@ -648,6 +648,43 @@ bool set_phys_to_machine(unsigned long pfn, unsigned long mfn)
> > return true;
> > }
> >
> > +static int
> > +init_page_grant_ref(struct page *p, domid_t domid, grant_ref_t grantref)
> > +{
> > +#ifdef CONFIG_X86_64
> > + uint64_t gref;
> > + uint64_t* gref_p = &gref;
> > +#else
> > + uint64_t* gref_p = kmalloc(sizeof(uint64_t), GFP_KERNEL);
> > + if (!gref)
> > + return -ENOMEM;
> > + uint64_t* gref = gref_p;
> > +#endif
> > +
> > + *gref_p = ((uint64_t) grantref << 32) | domid;
> > + p->private = gref;
> > +
> > + WARN_ON(PagePrivate(p));
> > + WARN_ON(PageForeign(p));
> > + SetPagePrivate(p);
> > + SetPageForeign(p);
> > + return 0;
> > +}
> > +
> > +static void
> > +clear_page_grant_ref(struct page *p)
> > +{
> > + WARN_ON(!PagePrivate(p));
> > + WARN_ON(!PageForeign(p));
> > +
> > +#ifndef CONFIG_X86_64
> > + kfree(p->private);
> > +#endif
> > + p->private = 0;
> > + ClearPagePrivate(p);
> > + ClearPageForeign(p);
> > +}
>
> Given that get_page_grant_ref is used by netback, these functions need
> to be made arch-independent, moved to an arch-independent code location
> and called appropriately.
... or stubbed out for arches which don't need this (which might include
arm*?).
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 06/12] xen: mark grant mapped pages as foreign
2015-01-09 16:19 ` Ian Campbell
@ 2015-01-09 16:39 ` David Vrabel
2015-01-09 16:46 ` Stefano Stabellini
2015-01-09 16:47 ` Ian Campbell
0 siblings, 2 replies; 34+ messages in thread
From: David Vrabel @ 2015-01-09 16:39 UTC (permalink / raw)
To: Ian Campbell, Stefano Stabellini
Cc: xen-devel, Jenny Herbert, Jenny Herbert, Boris Ostrovsky
On 09/01/15 16:19, Ian Campbell wrote:
> On Fri, 2015-01-09 at 16:03 +0000, Stefano Stabellini wrote:
>> On Tue, 6 Jan 2015, David Vrabel wrote:
>>> From: Jenny Herbert <jennifer.herbert@citrix.com>
>>>
>>> Use the "foreign" page flag to mark pages that have a grant map. Use
>>> page->private to store information of the grant (the granting domain
>>> and the grant reference).
>>>
>>> Signed-off-by: Jenny Herbert <jenny.herbert@citrix.com>
>>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>>> ---
>>> arch/x86/xen/p2m.c | 50 ++++++++++++++++++++++++++++++++++++++-------
>>> include/xen/grant_table.h | 13 ++++++++++++
>>> 2 files changed, 56 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
>>> index 0d70814..22624a3 100644
>>> --- a/arch/x86/xen/p2m.c
>>> +++ b/arch/x86/xen/p2m.c
>>> @@ -648,6 +648,43 @@ bool set_phys_to_machine(unsigned long pfn, unsigned long mfn)
>>> return true;
>>> }
>>>
>>> +static int
>>> +init_page_grant_ref(struct page *p, domid_t domid, grant_ref_t grantref)
>>> +{
>>> +#ifdef CONFIG_X86_64
>>> + uint64_t gref;
>>> + uint64_t* gref_p = &gref;
>>> +#else
>>> + uint64_t* gref_p = kmalloc(sizeof(uint64_t), GFP_KERNEL);
>>> + if (!gref)
>>> + return -ENOMEM;
>>> + uint64_t* gref = gref_p;
>>> +#endif
>>> +
>>> + *gref_p = ((uint64_t) grantref << 32) | domid;
>>> + p->private = gref;
>>> +
>>> + WARN_ON(PagePrivate(p));
>>> + WARN_ON(PageForeign(p));
>>> + SetPagePrivate(p);
>>> + SetPageForeign(p);
>>> + return 0;
>>> +}
>>> +
>>> +static void
>>> +clear_page_grant_ref(struct page *p)
>>> +{
>>> + WARN_ON(!PagePrivate(p));
>>> + WARN_ON(!PageForeign(p));
>>> +
>>> +#ifndef CONFIG_X86_64
>>> + kfree(p->private);
>>> +#endif
>>> + p->private = 0;
>>> + ClearPagePrivate(p);
>>> + ClearPageForeign(p);
>>> +}
>>
>> Given that get_page_grant_ref is used by netback, these functions need
>> to be made arch-independent, moved to an arch-independent code location
>> and called appropriately.
I've fixed this already.
> ... or stubbed out for arches which don't need this (which might include
> arm*?).
I'm reasonably certain that this is required for HVM and ARM guests as
well. The grant copy will still fail to get the page by gfn since the
mfn is owned by a different domain.
David
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 12/12] xen/gntdev: provide a set of pages for each VMA
2015-01-09 16:05 ` Stefano Stabellini
@ 2015-01-09 16:41 ` David Vrabel
0 siblings, 0 replies; 34+ messages in thread
From: David Vrabel @ 2015-01-09 16:41 UTC (permalink / raw)
To: Stefano Stabellini; +Cc: xen-devel, Boris Ostrovsky, Jenny Herbert
On 09/01/15 16:05, Stefano Stabellini wrote:
> On Fri, 9 Jan 2015, Stefano Stabellini wrote:
>> On Tue, 6 Jan 2015, David Vrabel wrote:
>>> From: Jenny Herbert <jennifer.herbert@citrix.com>
>>>
>>> For each VMA for grant maps, provide the correct array of pages so
>>> get_user_pages() on foreign mappings works in PV guests.
>>>
>>> Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
>>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>>> ---
>>> drivers/xen/gntdev.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>>> index 5d73fe8..09e7863 100644
>>> --- a/drivers/xen/gntdev.c
>>> +++ b/drivers/xen/gntdev.c
>>> @@ -870,6 +870,8 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>>> vma->vm_end - vma->vm_start,
>>> set_grant_ptes_as_special, NULL);
>>> }
>>> +
>>> + vma->pages = map->pages;
>>
>> Shouldn't this be done just for x86 PV guests rather than all the
>> callers of gntdev_mmap?
>
> I guess that since on ARM and PVH the pte wouldn't be marked SPECIAL, it
> wouldn't make a difference, but it would still be nice to be consistent.
This is already PV only.
David
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 06/12] xen: mark grant mapped pages as foreign
2015-01-09 16:39 ` David Vrabel
@ 2015-01-09 16:46 ` Stefano Stabellini
2015-01-09 16:47 ` Ian Campbell
1 sibling, 0 replies; 34+ messages in thread
From: Stefano Stabellini @ 2015-01-09 16:46 UTC (permalink / raw)
To: David Vrabel
Cc: Jenny Herbert, Ian Campbell, Stefano Stabellini, Jenny Herbert,
xen-devel, Boris Ostrovsky
On Fri, 9 Jan 2015, David Vrabel wrote:
> On 09/01/15 16:19, Ian Campbell wrote:
> > On Fri, 2015-01-09 at 16:03 +0000, Stefano Stabellini wrote:
> >> On Tue, 6 Jan 2015, David Vrabel wrote:
> >>> From: Jenny Herbert <jennifer.herbert@citrix.com>
> >>>
> >>> Use the "foreign" page flag to mark pages that have a grant map. Use
> >>> page->private to store information of the grant (the granting domain
> >>> and the grant reference).
> >>>
> >>> Signed-off-by: Jenny Herbert <jenny.herbert@citrix.com>
> >>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> >>> ---
> >>> arch/x86/xen/p2m.c | 50 ++++++++++++++++++++++++++++++++++++++-------
> >>> include/xen/grant_table.h | 13 ++++++++++++
> >>> 2 files changed, 56 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> >>> index 0d70814..22624a3 100644
> >>> --- a/arch/x86/xen/p2m.c
> >>> +++ b/arch/x86/xen/p2m.c
> >>> @@ -648,6 +648,43 @@ bool set_phys_to_machine(unsigned long pfn, unsigned long mfn)
> >>> return true;
> >>> }
> >>>
> >>> +static int
> >>> +init_page_grant_ref(struct page *p, domid_t domid, grant_ref_t grantref)
> >>> +{
> >>> +#ifdef CONFIG_X86_64
> >>> + uint64_t gref;
> >>> + uint64_t* gref_p = &gref;
> >>> +#else
> >>> + uint64_t* gref_p = kmalloc(sizeof(uint64_t), GFP_KERNEL);
> >>> + if (!gref)
> >>> + return -ENOMEM;
> >>> + uint64_t* gref = gref_p;
> >>> +#endif
> >>> +
> >>> + *gref_p = ((uint64_t) grantref << 32) | domid;
> >>> + p->private = gref;
> >>> +
> >>> + WARN_ON(PagePrivate(p));
> >>> + WARN_ON(PageForeign(p));
> >>> + SetPagePrivate(p);
> >>> + SetPageForeign(p);
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static void
> >>> +clear_page_grant_ref(struct page *p)
> >>> +{
> >>> + WARN_ON(!PagePrivate(p));
> >>> + WARN_ON(!PageForeign(p));
> >>> +
> >>> +#ifndef CONFIG_X86_64
> >>> + kfree(p->private);
> >>> +#endif
> >>> + p->private = 0;
> >>> + ClearPagePrivate(p);
> >>> + ClearPageForeign(p);
> >>> +}
> >>
> >> Given that get_page_grant_ref is used by netback, these functions need
> >> to be made arch-independent, moved to an arch-independent code location
> >> and called appropriately.
>
> I've fixed this already.
>
> > ... or stubbed out for arches which don't need this (which might include
> > arm*?).
>
> I'm reasonably certain that this is required for HVM and ARM guests as
> well. The grant copy will still fail to get the page by gfn since the
> mfn is owned by a different domain.
I agree
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 06/12] xen: mark grant mapped pages as foreign
2015-01-09 16:39 ` David Vrabel
2015-01-09 16:46 ` Stefano Stabellini
@ 2015-01-09 16:47 ` Ian Campbell
1 sibling, 0 replies; 34+ messages in thread
From: Ian Campbell @ 2015-01-09 16:47 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Jenny Herbert, Jenny Herbert, Boris Ostrovsky,
Stefano Stabellini
On Fri, 2015-01-09 at 16:39 +0000, David Vrabel wrote:
> > ... or stubbed out for arches which don't need this (which might include
> > arm*?).
>
> I'm reasonably certain that this is required for HVM and ARM guests as
> well. The grant copy will still fail to get the page by gfn since the
> mfn is owned by a different domain.
Yes, I'd somehow convinced myself this snippet was to do with the
m2p_override aspect of the series, which it's not.
Ian.
^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2015-01-09 16:49 UTC | newest]
Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-01-06 18:57 [RFC PATCH 00/12] xen: fix many long-standing grant mapping bugs David Vrabel
2015-01-06 18:57 ` [PATCH 01/12] mm: allow for an alternate set of pages for userspace mappings David Vrabel
2015-01-06 18:57 ` [PATCH 02/12] mm: add 'foreign' alias for the 'pinned' page flag David Vrabel
2015-01-07 17:12 ` Konrad Rzeszutek Wilk
2015-01-06 18:57 ` [PATCH 03/12] xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs() David Vrabel
2015-01-06 18:57 ` [PATCH 04/12] xen: remove scratch frames for ballooned pages and m2p override David Vrabel
2015-01-06 18:57 ` [PATCH 05/12] x86/xen: require ballooned pages for grant maps David Vrabel
2015-01-06 18:57 ` [PATCH 06/12] xen: mark grant mapped pages as foreign David Vrabel
2015-01-07 11:53 ` Ian Campbell
2015-01-09 16:03 ` Stefano Stabellini
2015-01-09 16:19 ` Ian Campbell
2015-01-09 16:39 ` David Vrabel
2015-01-09 16:46 ` Stefano Stabellini
2015-01-09 16:47 ` Ian Campbell
2015-01-06 18:57 ` [PATCH 07/12] xen-netback: use foreign page information from the pages themselves David Vrabel
2015-01-07 11:57 ` Ian Campbell
2015-01-06 18:57 ` [PATCH 08/12] xen/grant-table: add a mechanism to safely unmap pages that are in use David Vrabel
2015-01-07 12:00 ` Ian Campbell
2015-01-07 12:06 ` Ian Campbell
2015-01-07 13:07 ` David Vrabel
2015-01-07 13:24 ` Ian Campbell
2015-01-07 13:30 ` David Vrabel
2015-01-07 13:33 ` Ian Campbell
2015-01-09 16:11 ` Stefano Stabellini
2015-01-06 18:57 ` [PATCH 09/12] xen/gntdev: safely unmap grants in case they are still " David Vrabel
2015-01-06 18:57 ` [PATCH 10/12] xen-blkback: " David Vrabel
2015-01-06 18:57 ` [PATCH 11/12] xen/gntdev: mark userspace PTEs as special on x86 PV guests David Vrabel
2015-01-07 12:11 ` Ian Campbell
2015-01-07 13:23 ` David Vrabel
2015-01-07 13:32 ` Ian Campbell
2015-01-06 18:57 ` [PATCH 12/12] xen/gntdev: provide a set of pages for each VMA David Vrabel
2015-01-09 15:55 ` Stefano Stabellini
2015-01-09 16:05 ` Stefano Stabellini
2015-01-09 16:41 ` David Vrabel
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.