* [PATCH 1/6] xen-gntdev: Fix circular locking dependency
2010-12-14 14:55 [PATCH v2] Userspace grant communication Daniel De Graaf
@ 2010-12-14 14:55 ` Daniel De Graaf
2010-12-14 21:11 ` Jeremy Fitzhardinge
2010-12-14 14:55 ` [PATCH 2/6] xen-gntdev: Change page limit to be global instead of per-open Daniel De Graaf
` (4 subsequent siblings)
5 siblings, 1 reply; 32+ messages in thread
From: Daniel De Graaf @ 2010-12-14 14:55 UTC (permalink / raw)
To: xen-devel; +Cc: Daniel De Graaf, Ian.Campbell
apply_to_page_range will acquire the PTE lock while priv->lock is held,
and mn_invl_range_start tries to acquire priv->lock with the PTE lock
already held. Fix this by not holding priv->lock during the entire map
operation.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
drivers/xen/gntdev.c | 19 +++++++++----------
1 files changed, 9 insertions(+), 10 deletions(-)
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index a33e443..c582804 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -581,23 +581,22 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
if (!(vma->vm_flags & VM_WRITE))
map->flags |= GNTMAP_readonly;
+ spin_unlock(&priv->lock);
+
err = apply_to_page_range(vma->vm_mm, vma->vm_start,
vma->vm_end - vma->vm_start,
find_grant_ptes, map);
- if (err) {
- goto unlock_out;
- if (debug)
- printk("%s: find_grant_ptes() failure.\n", __FUNCTION__);
- }
+ if (err)
+ return err;
err = map_grant_pages(map);
- if (err) {
- goto unlock_out;
- if (debug)
- printk("%s: map_grant_pages() failure.\n", __FUNCTION__);
- }
+ if (err)
+ return err;
+
map->is_mapped = 1;
+ return 0;
+
unlock_out:
spin_unlock(&priv->lock);
return err;
--
1.7.2.3
^ permalink raw reply related [flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] xen-gntdev: Fix circular locking dependency
2010-12-14 14:55 ` [PATCH 1/6] xen-gntdev: Fix circular locking dependency Daniel De Graaf
@ 2010-12-14 21:11 ` Jeremy Fitzhardinge
2010-12-14 21:40 ` Daniel De Graaf
0 siblings, 1 reply; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-14 21:11 UTC (permalink / raw)
To: Daniel De Graaf; +Cc: xen-devel, Ian.Campbell
On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
> apply_to_page_range will acquire PTE lock while priv->lock is held, and
> mn_invl_range_start tries to acquire priv->lock with PTE already held.
> Fix by not holding priv->lock during the entire map operation.
Is priv->lock needed to protect the contents of map?
J
> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> ---
> drivers/xen/gntdev.c | 19 +++++++++----------
> 1 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index a33e443..c582804 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -581,23 +581,22 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
> if (!(vma->vm_flags & VM_WRITE))
> map->flags |= GNTMAP_readonly;
>
> + spin_unlock(&priv->lock);
> +
> err = apply_to_page_range(vma->vm_mm, vma->vm_start,
> vma->vm_end - vma->vm_start,
> find_grant_ptes, map);
> - if (err) {
> - goto unlock_out;
> - if (debug)
> - printk("%s: find_grant_ptes() failure.\n", __FUNCTION__);
> - }
> + if (err)
> + return err;
>
> err = map_grant_pages(map);
> - if (err) {
> - goto unlock_out;
> - if (debug)
> - printk("%s: map_grant_pages() failure.\n", __FUNCTION__);
> - }
> + if (err)
> + return err;
> +
> map->is_mapped = 1;
>
> + return 0;
> +
> unlock_out:
> spin_unlock(&priv->lock);
> return err;
* Re: [PATCH 1/6] xen-gntdev: Fix circular locking dependency
2010-12-14 21:11 ` Jeremy Fitzhardinge
@ 2010-12-14 21:40 ` Daniel De Graaf
2010-12-15 9:47 ` Ian Campbell
0 siblings, 1 reply; 32+ messages in thread
From: Daniel De Graaf @ 2010-12-14 21:40 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: xen-devel, Ian Campbell
On 12/14/2010 04:11 PM, Jeremy Fitzhardinge wrote:
> On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
>> apply_to_page_range will acquire PTE lock while priv->lock is held, and
>> mn_invl_range_start tries to acquire priv->lock with PTE already held.
>> Fix by not holding priv->lock during the entire map operation.
>
> Is priv->lock needed to protect the contents of map?
>
> J
No, since the map can only be mapped once (checked by map->vma assignment
while the lock is held). The unmap ioctl will return -EBUSY until
an munmap() is called on the area, so it also can't race, and the other
users are only active once the mmap operation completes.
>
>> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
>> ---
>> drivers/xen/gntdev.c | 19 +++++++++----------
>> 1 files changed, 9 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>> index a33e443..c582804 100644
>> --- a/drivers/xen/gntdev.c
>> +++ b/drivers/xen/gntdev.c
>> @@ -581,23 +581,22 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>> if (!(vma->vm_flags & VM_WRITE))
>> map->flags |= GNTMAP_readonly;
>>
>> + spin_unlock(&priv->lock);
>> +
>> err = apply_to_page_range(vma->vm_mm, vma->vm_start,
>> vma->vm_end - vma->vm_start,
>> find_grant_ptes, map);
>> - if (err) {
>> - goto unlock_out;
>> - if (debug)
>> - printk("%s: find_grant_ptes() failure.\n", __FUNCTION__);
>> - }
>> + if (err)
>> + return err;
>>
>> err = map_grant_pages(map);
>> - if (err) {
>> - goto unlock_out;
>> - if (debug)
>> - printk("%s: map_grant_pages() failure.\n", __FUNCTION__);
>> - }
>> + if (err)
>> + return err;
>> +
>> map->is_mapped = 1;
>>
>> + return 0;
>> +
>> unlock_out:
>> spin_unlock(&priv->lock);
>> return err;
>
* Re: [PATCH 1/6] xen-gntdev: Fix circular locking dependency
2010-12-14 21:40 ` Daniel De Graaf
@ 2010-12-15 9:47 ` Ian Campbell
2010-12-16 0:28 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 32+ messages in thread
From: Ian Campbell @ 2010-12-15 9:47 UTC (permalink / raw)
To: Daniel De Graaf; +Cc: Jeremy Fitzhardinge, xen-devel
On Tue, 2010-12-14 at 21:40 +0000, Daniel De Graaf wrote:
> On 12/14/2010 04:11 PM, Jeremy Fitzhardinge wrote:
> > On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
> >> apply_to_page_range will acquire PTE lock while priv->lock is held, and
> >> mn_invl_range_start tries to acquire priv->lock with PTE already held.
> >> Fix by not holding priv->lock during the entire map operation.
> >
> > Is priv->lock needed to protect the contents of map?
> >
> > J
>
> No, since the map can only be mapped once (checked by map->vma assignment
> while the lock is held). The unmap ioctl will return -EBUSY until
> an munmap() is called on the area, so it also can't race, and the other
> users are only active once the mmap operation completes.
Sounds reasonable enough to me. There are a few unlocked accesses to
vma->map:
gntdev_del_map (when called from gntdev_ioctl_map_grant_ref)
gntdev_vma_close
are these safe? If so then it would be worth a comment about why.
Anyway this patch appears to resolve the lockdep warning I was seeing
with 2.6.37 with qemu-xen backed block devices.
Ian.
>
> >
> >> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> >> ---
> >> drivers/xen/gntdev.c | 19 +++++++++----------
> >> 1 files changed, 9 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> >> index a33e443..c582804 100644
> >> --- a/drivers/xen/gntdev.c
> >> +++ b/drivers/xen/gntdev.c
> >> @@ -581,23 +581,22 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
> >> if (!(vma->vm_flags & VM_WRITE))
> >> map->flags |= GNTMAP_readonly;
> >>
> >> + spin_unlock(&priv->lock);
> >> +
> >> err = apply_to_page_range(vma->vm_mm, vma->vm_start,
> >> vma->vm_end - vma->vm_start,
> >> find_grant_ptes, map);
> >> - if (err) {
> >> - goto unlock_out;
> >> - if (debug)
> >> - printk("%s: find_grant_ptes() failure.\n", __FUNCTION__);
> >> - }
> >> + if (err)
> >> + return err;
> >>
> >> err = map_grant_pages(map);
> >> - if (err) {
> >> - goto unlock_out;
> >> - if (debug)
> >> - printk("%s: map_grant_pages() failure.\n", __FUNCTION__);
> >> - }
> >> + if (err)
> >> + return err;
> >> +
> >> map->is_mapped = 1;
> >>
> >> + return 0;
> >> +
> >> unlock_out:
> >> spin_unlock(&priv->lock);
> >> return err;
> >
* Re: [PATCH 1/6] xen-gntdev: Fix circular locking dependency
2010-12-15 9:47 ` Ian Campbell
@ 2010-12-16 0:28 ` Jeremy Fitzhardinge
2010-12-16 15:09 ` Stefano Stabellini
0 siblings, 1 reply; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-16 0:28 UTC (permalink / raw)
To: Ian Campbell; +Cc: Daniel De Graaf, xen-devel, Stefano Stabellini
On 12/15/2010 01:47 AM, Ian Campbell wrote:
> On Tue, 2010-12-14 at 21:40 +0000, Daniel De Graaf wrote:
>> On 12/14/2010 04:11 PM, Jeremy Fitzhardinge wrote:
>>> On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
>>>> apply_to_page_range will acquire PTE lock while priv->lock is held, and
>>>> mn_invl_range_start tries to acquire priv->lock with PTE already held.
>>>> Fix by not holding priv->lock during the entire map operation.
>>> Is priv->lock needed to protect the contents of map?
>>>
>>> J
>> No, since the map can only be mapped once (checked by map->vma assignment
>> while the lock is held). The unmap ioctl will return -EBUSY until
>> an munmap() is called on the area, so it also can't race, and the other
>> users are only active once the mmap operation completes.
> Sounds reasonable enough to me. There are a few unlocked accesses to
> vma->map:
> gntdev_del_map (when called from gntdev_ioctl_map_grant_ref)
> gntdev_vma_close
> are these safe? If so then it would be worth a comment about why.
>
> Anyway this patch appears to resolve the lockdep warning I was seeing
> with 2.6.37 with qemu-xen backed block devices.
Good. Stefano should stick this on his patch queue.
J
* Re: [PATCH 1/6] xen-gntdev: Fix circular locking dependency
2010-12-16 0:28 ` Jeremy Fitzhardinge
@ 2010-12-16 15:09 ` Stefano Stabellini
0 siblings, 0 replies; 32+ messages in thread
From: Stefano Stabellini @ 2010-12-16 15:09 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: Ian Campbell, Daniel De Graaf, xen-devel, Stabellini
On Thu, 16 Dec 2010, Jeremy Fitzhardinge wrote:
> On 12/15/2010 01:47 AM, Ian Campbell wrote:
> > On Tue, 2010-12-14 at 21:40 +0000, Daniel De Graaf wrote:
> >> On 12/14/2010 04:11 PM, Jeremy Fitzhardinge wrote:
> >>> On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
> >>>> apply_to_page_range will acquire PTE lock while priv->lock is held, and
> >>>> mn_invl_range_start tries to acquire priv->lock with PTE already held.
> >>>> Fix by not holding priv->lock during the entire map operation.
> >>> Is priv->lock needed to protect the contents of map?
> >>>
> >>> J
> >> No, since the map can only be mapped once (checked by map->vma assignment
> >> while the lock is held). The unmap ioctl will return -EBUSY until
> >> an munmap() is called on the area, so it also can't race, and the other
> >> users are only active once the mmap operation completes.
> > Sounds reasonable enough to me. There are a few unlocked accesses to
> > vma->map:
> > gntdev_del_map (when called from gntdev_ioctl_map_grant_ref)
> > gntdev_vma_close
> > are these safe? If so then it would be worth a comment about why.
> >
> > Anyway this patch appears to resolve the lockdep warning I was seeing
> > with 2.6.37 with qemu-xen backed block devices.
>
> Good. Stefano should stick this on his patch queue.
Agreed. I'll add it in the next version.
* [PATCH 2/6] xen-gntdev: Change page limit to be global instead of per-open
2010-12-14 14:55 [PATCH v2] Userspace grant communication Daniel De Graaf
2010-12-14 14:55 ` [PATCH 1/6] xen-gntdev: Fix circular locking dependency Daniel De Graaf
@ 2010-12-14 14:55 ` Daniel De Graaf
2010-12-14 21:12 ` Jeremy Fitzhardinge
2010-12-14 14:55 ` [PATCH 3/6] xen-gntdev: Remove unneeded structures from grant_map tracking data Daniel De Graaf
` (3 subsequent siblings)
5 siblings, 1 reply; 32+ messages in thread
From: Daniel De Graaf @ 2010-12-14 14:55 UTC (permalink / raw)
To: xen-devel; +Cc: Daniel De Graaf, Ian.Campbell
Because there is no limitation on how many times a user can open a
given device file, a per-file-description limit on the number of
pages granted offers little to no benefit. Change to a global limit
and remove the ioctl() as the parameter can now be changed via sysfs.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
drivers/xen/gntdev.c | 48 ++++++++++++++----------------------------------
1 files changed, 14 insertions(+), 34 deletions(-)
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index c582804..62639af 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -44,13 +44,12 @@ MODULE_DESCRIPTION("User-space granted page access driver");
static int debug = 0;
module_param(debug, int, 0644);
-static int limit = 1024;
+static int limit = 1024*1024;
module_param(limit, int, 0644);
+static atomic_t pages_mapped = ATOMIC_INIT(0);
struct gntdev_priv {
struct list_head maps;
- uint32_t used;
- uint32_t limit;
spinlock_t lock;
struct mm_struct *mm;
struct mmu_notifier mn;
@@ -76,8 +75,8 @@ static void gntdev_print_maps(struct gntdev_priv *priv,
{
struct grant_map *map;
- printk("%s: maps list (priv %p, usage %d/%d)\n",
- __FUNCTION__, priv, priv->used, priv->limit);
+ printk("%s: maps list (priv %p)\n",
+ __FUNCTION__, priv);
list_for_each_entry(map, &priv->maps, next)
printk(" index %2d, count %2d %s\n",
map->index, map->count,
@@ -104,9 +103,6 @@ static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
add->count = count;
add->priv = priv;
- if (add->count + priv->used > priv->limit)
- goto err;
-
return add;
err:
@@ -131,7 +127,6 @@ static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add)
list_add_tail(&add->next, &priv->maps);
done:
- priv->used += add->count;
if (debug)
gntdev_print_maps(priv, "[new]", add->index);
}
@@ -178,7 +173,7 @@ static int gntdev_del_map(struct grant_map *map)
if (map->unmap_ops[i].handle)
return -EBUSY;
- map->priv->used -= map->count;
+ atomic_sub(map->count, &pages_mapped);
list_del(&map->next);
return 0;
}
@@ -360,7 +355,6 @@ static int gntdev_open(struct inode *inode, struct file *flip)
INIT_LIST_HEAD(&priv->maps);
spin_lock_init(&priv->lock);
- priv->limit = limit;
priv->mm = get_task_mm(current);
if (!priv->mm) {
@@ -416,19 +410,26 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
op.count);
if (unlikely(op.count <= 0))
return -EINVAL;
- if (unlikely(op.count > priv->limit))
- return -EINVAL;
err = -ENOMEM;
map = gntdev_alloc_map(priv, op.count);
if (!map)
return err;
+
if (copy_from_user(map->grants, &u->refs,
sizeof(map->grants[0]) * op.count) != 0) {
gntdev_free_map(map);
return err;
}
+ if (unlikely(atomic_add_return(op.count, &pages_mapped) > limit))
+ {
+ if (debug)
+ printk("%s: can't map: over limit\n", __FUNCTION__);
+ gntdev_free_map(map);
+ return err;
+ }
+
spin_lock(&priv->lock);
gntdev_add_map(priv, map);
op.index = map->index << PAGE_SHIFT;
@@ -495,24 +496,6 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
return 0;
}
-static long gntdev_ioctl_set_max_grants(struct gntdev_priv *priv,
- struct ioctl_gntdev_set_max_grants __user *u)
-{
- struct ioctl_gntdev_set_max_grants op;
-
- if (copy_from_user(&op, u, sizeof(op)) != 0)
- return -EFAULT;
- if (debug)
- printk("%s: priv %p, limit %d\n", __FUNCTION__, priv, op.count);
- if (op.count > limit)
- return -EINVAL;
-
- spin_lock(&priv->lock);
- priv->limit = op.count;
- spin_unlock(&priv->lock);
- return 0;
-}
-
static long gntdev_ioctl(struct file *flip,
unsigned int cmd, unsigned long arg)
{
@@ -529,9 +512,6 @@ static long gntdev_ioctl(struct file *flip,
case IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR:
return gntdev_ioctl_get_offset_for_vaddr(priv, ptr);
- case IOCTL_GNTDEV_SET_MAX_GRANTS:
- return gntdev_ioctl_set_max_grants(priv, ptr);
-
default:
if (debug)
printk("%s: priv %p, unknown cmd %x\n",
--
1.7.2.3
* Re: [PATCH 2/6] xen-gntdev: Change page limit to be global instead of per-open
2010-12-14 14:55 ` [PATCH 2/6] xen-gntdev: Change page limit to be global instead of per-open Daniel De Graaf
@ 2010-12-14 21:12 ` Jeremy Fitzhardinge
2010-12-14 21:42 ` Daniel De Graaf
0 siblings, 1 reply; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-14 21:12 UTC (permalink / raw)
To: Daniel De Graaf; +Cc: xen-devel, Ian.Campbell
On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
> Because there is no limitation on how many times a user can open a
> given device file, a per-file-description limit on the number of
> pages granted offers little to no benefit. Change to a global limit
> and remove the ioctl() as the parameter can now be changed via sysfs.
Does anyone use this ioctl? Wouldn't it be safer to replace it with a
no-op version?
J
> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> ---
> drivers/xen/gntdev.c | 48 ++++++++++++++----------------------------------
> 1 files changed, 14 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index c582804..62639af 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -44,13 +44,12 @@ MODULE_DESCRIPTION("User-space granted page access driver");
>
> static int debug = 0;
> module_param(debug, int, 0644);
> -static int limit = 1024;
> +static int limit = 1024*1024;
> module_param(limit, int, 0644);
> +static atomic_t pages_mapped = ATOMIC_INIT(0);
>
> struct gntdev_priv {
> struct list_head maps;
> - uint32_t used;
> - uint32_t limit;
> spinlock_t lock;
> struct mm_struct *mm;
> struct mmu_notifier mn;
> @@ -76,8 +75,8 @@ static void gntdev_print_maps(struct gntdev_priv *priv,
> {
> struct grant_map *map;
>
> - printk("%s: maps list (priv %p, usage %d/%d)\n",
> - __FUNCTION__, priv, priv->used, priv->limit);
> + printk("%s: maps list (priv %p)\n",
> + __FUNCTION__, priv);
> list_for_each_entry(map, &priv->maps, next)
> printk(" index %2d, count %2d %s\n",
> map->index, map->count,
> @@ -104,9 +103,6 @@ static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
> add->count = count;
> add->priv = priv;
>
> - if (add->count + priv->used > priv->limit)
> - goto err;
> -
> return add;
>
> err:
> @@ -131,7 +127,6 @@ static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add)
> list_add_tail(&add->next, &priv->maps);
>
> done:
> - priv->used += add->count;
> if (debug)
> gntdev_print_maps(priv, "[new]", add->index);
> }
> @@ -178,7 +173,7 @@ static int gntdev_del_map(struct grant_map *map)
> if (map->unmap_ops[i].handle)
> return -EBUSY;
>
> - map->priv->used -= map->count;
> + atomic_sub(map->count, &pages_mapped);
> list_del(&map->next);
> return 0;
> }
> @@ -360,7 +355,6 @@ static int gntdev_open(struct inode *inode, struct file *flip)
>
> INIT_LIST_HEAD(&priv->maps);
> spin_lock_init(&priv->lock);
> - priv->limit = limit;
>
> priv->mm = get_task_mm(current);
> if (!priv->mm) {
> @@ -416,19 +410,26 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
> op.count);
> if (unlikely(op.count <= 0))
> return -EINVAL;
> - if (unlikely(op.count > priv->limit))
> - return -EINVAL;
>
> err = -ENOMEM;
> map = gntdev_alloc_map(priv, op.count);
> if (!map)
> return err;
> +
> if (copy_from_user(map->grants, &u->refs,
> sizeof(map->grants[0]) * op.count) != 0) {
> gntdev_free_map(map);
> return err;
> }
>
> + if (unlikely(atomic_add_return(op.count, &pages_mapped) > limit))
> + {
> + if (debug)
> + printk("%s: can't map: over limit\n", __FUNCTION__);
> + gntdev_free_map(map);
> + return err;
> + }
> +
> spin_lock(&priv->lock);
> gntdev_add_map(priv, map);
> op.index = map->index << PAGE_SHIFT;
> @@ -495,24 +496,6 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
> return 0;
> }
>
> -static long gntdev_ioctl_set_max_grants(struct gntdev_priv *priv,
> - struct ioctl_gntdev_set_max_grants __user *u)
> -{
> - struct ioctl_gntdev_set_max_grants op;
> -
> - if (copy_from_user(&op, u, sizeof(op)) != 0)
> - return -EFAULT;
> - if (debug)
> - printk("%s: priv %p, limit %d\n", __FUNCTION__, priv, op.count);
> - if (op.count > limit)
> - return -EINVAL;
> -
> - spin_lock(&priv->lock);
> - priv->limit = op.count;
> - spin_unlock(&priv->lock);
> - return 0;
> -}
> -
> static long gntdev_ioctl(struct file *flip,
> unsigned int cmd, unsigned long arg)
> {
> @@ -529,9 +512,6 @@ static long gntdev_ioctl(struct file *flip,
> case IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR:
> return gntdev_ioctl_get_offset_for_vaddr(priv, ptr);
>
> - case IOCTL_GNTDEV_SET_MAX_GRANTS:
> - return gntdev_ioctl_set_max_grants(priv, ptr);
> -
> default:
> if (debug)
> printk("%s: priv %p, unknown cmd %x\n",
* Re: [PATCH 2/6] xen-gntdev: Change page limit to be global instead of per-open
2010-12-14 21:12 ` Jeremy Fitzhardinge
@ 2010-12-14 21:42 ` Daniel De Graaf
2010-12-15 9:50 ` Ian Campbell
0 siblings, 1 reply; 32+ messages in thread
From: Daniel De Graaf @ 2010-12-14 21:42 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: xen-devel, Ian.Campbell
On 12/14/2010 04:12 PM, Jeremy Fitzhardinge wrote:
> On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
>> Because there is no limitation on how many times a user can open a
>> given device file, a per-file-description limit on the number of
>> pages granted offers little to no benefit. Change to a global limit
>> and remove the ioctl() as the parameter can now be changed via sysfs.
>
> Does anyone use this ioctl? Wouldn't it be safer to replace it with a
> no-op version?
>
> J
>
I do not know of any users. If it's preferred to replace with a noop ioctl
instead of the current -ENOSYS return, that's easy to do.
>> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
>> ---
>> drivers/xen/gntdev.c | 48 ++++++++++++++----------------------------------
>> 1 files changed, 14 insertions(+), 34 deletions(-)
>>
>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>> index c582804..62639af 100644
>> --- a/drivers/xen/gntdev.c
>> +++ b/drivers/xen/gntdev.c
>> @@ -44,13 +44,12 @@ MODULE_DESCRIPTION("User-space granted page access driver");
>>
>> static int debug = 0;
>> module_param(debug, int, 0644);
>> -static int limit = 1024;
>> +static int limit = 1024*1024;
>> module_param(limit, int, 0644);
>> +static atomic_t pages_mapped = ATOMIC_INIT(0);
>>
>> struct gntdev_priv {
>> struct list_head maps;
>> - uint32_t used;
>> - uint32_t limit;
>> spinlock_t lock;
>> struct mm_struct *mm;
>> struct mmu_notifier mn;
>> @@ -76,8 +75,8 @@ static void gntdev_print_maps(struct gntdev_priv *priv,
>> {
>> struct grant_map *map;
>>
>> - printk("%s: maps list (priv %p, usage %d/%d)\n",
>> - __FUNCTION__, priv, priv->used, priv->limit);
>> + printk("%s: maps list (priv %p)\n",
>> + __FUNCTION__, priv);
>> list_for_each_entry(map, &priv->maps, next)
>> printk(" index %2d, count %2d %s\n",
>> map->index, map->count,
>> @@ -104,9 +103,6 @@ static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
>> add->count = count;
>> add->priv = priv;
>>
>> - if (add->count + priv->used > priv->limit)
>> - goto err;
>> -
>> return add;
>>
>> err:
>> @@ -131,7 +127,6 @@ static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add)
>> list_add_tail(&add->next, &priv->maps);
>>
>> done:
>> - priv->used += add->count;
>> if (debug)
>> gntdev_print_maps(priv, "[new]", add->index);
>> }
>> @@ -178,7 +173,7 @@ static int gntdev_del_map(struct grant_map *map)
>> if (map->unmap_ops[i].handle)
>> return -EBUSY;
>>
>> - map->priv->used -= map->count;
>> + atomic_sub(map->count, &pages_mapped);
>> list_del(&map->next);
>> return 0;
>> }
>> @@ -360,7 +355,6 @@ static int gntdev_open(struct inode *inode, struct file *flip)
>>
>> INIT_LIST_HEAD(&priv->maps);
>> spin_lock_init(&priv->lock);
>> - priv->limit = limit;
>>
>> priv->mm = get_task_mm(current);
>> if (!priv->mm) {
>> @@ -416,19 +410,26 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
>> op.count);
>> if (unlikely(op.count <= 0))
>> return -EINVAL;
>> - if (unlikely(op.count > priv->limit))
>> - return -EINVAL;
>>
>> err = -ENOMEM;
>> map = gntdev_alloc_map(priv, op.count);
>> if (!map)
>> return err;
>> +
>> if (copy_from_user(map->grants, &u->refs,
>> sizeof(map->grants[0]) * op.count) != 0) {
>> gntdev_free_map(map);
>> return err;
>> }
>>
>> + if (unlikely(atomic_add_return(op.count, &pages_mapped) > limit))
>> + {
>> + if (debug)
>> + printk("%s: can't map: over limit\n", __FUNCTION__);
>> + gntdev_free_map(map);
>> + return err;
>> + }
>> +
>> spin_lock(&priv->lock);
>> gntdev_add_map(priv, map);
>> op.index = map->index << PAGE_SHIFT;
>> @@ -495,24 +496,6 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
>> return 0;
>> }
>>
>> -static long gntdev_ioctl_set_max_grants(struct gntdev_priv *priv,
>> - struct ioctl_gntdev_set_max_grants __user *u)
>> -{
>> - struct ioctl_gntdev_set_max_grants op;
>> -
>> - if (copy_from_user(&op, u, sizeof(op)) != 0)
>> - return -EFAULT;
>> - if (debug)
>> - printk("%s: priv %p, limit %d\n", __FUNCTION__, priv, op.count);
>> - if (op.count > limit)
>> - return -EINVAL;
>> -
>> - spin_lock(&priv->lock);
>> - priv->limit = op.count;
>> - spin_unlock(&priv->lock);
>> - return 0;
>> -}
>> -
>> static long gntdev_ioctl(struct file *flip,
>> unsigned int cmd, unsigned long arg)
>> {
>> @@ -529,9 +512,6 @@ static long gntdev_ioctl(struct file *flip,
>> case IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR:
>> return gntdev_ioctl_get_offset_for_vaddr(priv, ptr);
>>
>> - case IOCTL_GNTDEV_SET_MAX_GRANTS:
>> - return gntdev_ioctl_set_max_grants(priv, ptr);
>> -
>> default:
>> if (debug)
>> printk("%s: priv %p, unknown cmd %x\n",
>
* Re: [PATCH 2/6] xen-gntdev: Change page limit to be global instead of per-open
2010-12-14 21:42 ` Daniel De Graaf
@ 2010-12-15 9:50 ` Ian Campbell
2010-12-16 0:27 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 32+ messages in thread
From: Ian Campbell @ 2010-12-15 9:50 UTC (permalink / raw)
To: Daniel De Graaf; +Cc: Jeremy Fitzhardinge, xen-devel@lists.xensource.com
On Tue, 2010-12-14 at 21:42 +0000, Daniel De Graaf wrote:
> On 12/14/2010 04:12 PM, Jeremy Fitzhardinge wrote:
> > On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
> >> Because there is no limitation on how many times a user can open a
> >> given device file, a per-file-description limit on the number of
> >> pages granted offers little to no benefit. Change to a global limit
> >> and remove the ioctl() as the parameter can now be changed via sysfs.
> >
> > Does anyone use this ioctl? Wouldn't it be safer to replace it with a
> > no-op version?
> >
> > J
> >
>
> I do not know of any users. If it's preferred to replace with a noop ioctl
> instead of the current -ENOSYS return, that's easy to do.
It's called by xc_gnttab_set_max_grants in libxc, although I don't see
any callers of that function.
We should probably also remove the library function.
Ian.
> >> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> >> ---
> >> drivers/xen/gntdev.c | 48 ++++++++++++++----------------------------------
> >> 1 files changed, 14 insertions(+), 34 deletions(-)
> >>
> >> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> >> index c582804..62639af 100644
> >> --- a/drivers/xen/gntdev.c
> >> +++ b/drivers/xen/gntdev.c
> >> @@ -44,13 +44,12 @@ MODULE_DESCRIPTION("User-space granted page access driver");
> >>
> >> static int debug = 0;
> >> module_param(debug, int, 0644);
> >> -static int limit = 1024;
> >> +static int limit = 1024*1024;
> >> module_param(limit, int, 0644);
> >> +static atomic_t pages_mapped = ATOMIC_INIT(0);
> >>
> >> struct gntdev_priv {
> >> struct list_head maps;
> >> - uint32_t used;
> >> - uint32_t limit;
> >> spinlock_t lock;
> >> struct mm_struct *mm;
> >> struct mmu_notifier mn;
> >> @@ -76,8 +75,8 @@ static void gntdev_print_maps(struct gntdev_priv *priv,
> >> {
> >> struct grant_map *map;
> >>
> >> - printk("%s: maps list (priv %p, usage %d/%d)\n",
> >> - __FUNCTION__, priv, priv->used, priv->limit);
> >> + printk("%s: maps list (priv %p)\n",
> >> + __FUNCTION__, priv);
> >> list_for_each_entry(map, &priv->maps, next)
> >> printk(" index %2d, count %2d %s\n",
> >> map->index, map->count,
> >> @@ -104,9 +103,6 @@ static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
> >> add->count = count;
> >> add->priv = priv;
> >>
> >> - if (add->count + priv->used > priv->limit)
> >> - goto err;
> >> -
> >> return add;
> >>
> >> err:
> >> @@ -131,7 +127,6 @@ static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add)
> >> list_add_tail(&add->next, &priv->maps);
> >>
> >> done:
> >> - priv->used += add->count;
> >> if (debug)
> >> gntdev_print_maps(priv, "[new]", add->index);
> >> }
> >> @@ -178,7 +173,7 @@ static int gntdev_del_map(struct grant_map *map)
> >> if (map->unmap_ops[i].handle)
> >> return -EBUSY;
> >>
> >> - map->priv->used -= map->count;
> >> + atomic_sub(map->count, &pages_mapped);
> >> list_del(&map->next);
> >> return 0;
> >> }
> >> @@ -360,7 +355,6 @@ static int gntdev_open(struct inode *inode, struct file *flip)
> >>
> >> INIT_LIST_HEAD(&priv->maps);
> >> spin_lock_init(&priv->lock);
> >> - priv->limit = limit;
> >>
> >> priv->mm = get_task_mm(current);
> >> if (!priv->mm) {
> >> @@ -416,19 +410,26 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
> >> op.count);
> >> if (unlikely(op.count <= 0))
> >> return -EINVAL;
> >> - if (unlikely(op.count > priv->limit))
> >> - return -EINVAL;
> >>
> >> err = -ENOMEM;
> >> map = gntdev_alloc_map(priv, op.count);
> >> if (!map)
> >> return err;
> >> +
> >> if (copy_from_user(map->grants, &u->refs,
> >> sizeof(map->grants[0]) * op.count) != 0) {
> >> gntdev_free_map(map);
> >> return err;
> >> }
> >>
> >> + if (unlikely(atomic_add_return(op.count, &pages_mapped) > limit))
> >> + {
> >> + if (debug)
> >> + printk("%s: can't map: over limit\n", __FUNCTION__);
> >> + gntdev_free_map(map);
> >> + return err;
> >> + }
> >> +
> >> spin_lock(&priv->lock);
> >> gntdev_add_map(priv, map);
> >> op.index = map->index << PAGE_SHIFT;
> >> @@ -495,24 +496,6 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
> >> return 0;
> >> }
> >>
> >> -static long gntdev_ioctl_set_max_grants(struct gntdev_priv *priv,
> >> - struct ioctl_gntdev_set_max_grants __user *u)
> >> -{
> >> - struct ioctl_gntdev_set_max_grants op;
> >> -
> >> - if (copy_from_user(&op, u, sizeof(op)) != 0)
> >> - return -EFAULT;
> >> - if (debug)
> >> - printk("%s: priv %p, limit %d\n", __FUNCTION__, priv, op.count);
> >> - if (op.count > limit)
> >> - return -EINVAL;
> >> -
> >> - spin_lock(&priv->lock);
> >> - priv->limit = op.count;
> >> - spin_unlock(&priv->lock);
> >> - return 0;
> >> -}
> >> -
> >> static long gntdev_ioctl(struct file *flip,
> >> unsigned int cmd, unsigned long arg)
> >> {
> >> @@ -529,9 +512,6 @@ static long gntdev_ioctl(struct file *flip,
> >> case IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR:
> >> return gntdev_ioctl_get_offset_for_vaddr(priv, ptr);
> >>
> >> - case IOCTL_GNTDEV_SET_MAX_GRANTS:
> >> - return gntdev_ioctl_set_max_grants(priv, ptr);
> >> -
> >> default:
> >> if (debug)
> >> printk("%s: priv %p, unknown cmd %x\n",
> >
* Re: [PATCH 2/6] xen-gntdev: Change page limit to be global instead of per-open
2010-12-15 9:50 ` Ian Campbell
@ 2010-12-16 0:27 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-16 0:27 UTC (permalink / raw)
To: Ian Campbell; +Cc: Daniel De Graaf, xen-devel@lists.xensource.com
On 12/15/2010 01:50 AM, Ian Campbell wrote:
> On Tue, 2010-12-14 at 21:42 +0000, Daniel De Graaf wrote:
>> On 12/14/2010 04:12 PM, Jeremy Fitzhardinge wrote:
>>> On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
>>>> Because there is no limitation on how many times a user can open a
>> given device file, a per-file-description limit on the number of
>>>> pages granted offers little to no benefit. Change to a global limit
>>>> and remove the ioctl() as the parameter can now be changed via sysfs.
>>> Does anyone use this ioctl? Wouldn't it be safer to replace it with a
>>> no-op version?
>>>
>>> J
>>>
>> I do not know of any users. If it's preferred to replace with a noop ioctl
>> instead of the current -ENOSYS return, that's easy to do.
> It's called by xc_gnttab_set_max_grants in libxc, although I don't see
> any callers of that function.
>
> We should probably also remove the library function.
Seems like the right thing to do. If its original semantics were iffy
and nobody is using it, then there's no point keeping vestigial code around.
J
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 3/6] xen-gntdev: Remove unneeded structures from grant_map tracking data
2010-12-14 14:55 [PATCH v2] Userspace grant communication Daniel De Graaf
2010-12-14 14:55 ` [PATCH 1/6] xen-gntdev: Fix circular locking dependency Daniel De Graaf
2010-12-14 14:55 ` [PATCH 2/6] xen-gntdev: Change page limit to be global instead of per-open Daniel De Graaf
@ 2010-12-14 14:55 ` Daniel De Graaf
2010-12-14 21:15 ` Jeremy Fitzhardinge
2010-12-14 21:54 ` Jeremy Fitzhardinge
2010-12-14 14:55 ` [PATCH 4/6] xen-gntdev: Use find_vma rather than iterating our vma list manually Daniel De Graaf
` (2 subsequent siblings)
5 siblings, 2 replies; 32+ messages in thread
From: Daniel De Graaf @ 2010-12-14 14:55 UTC (permalink / raw)
To: xen-devel; +Cc: Daniel De Graaf, Ian.Campbell
The entire hypercall argument list isn't required; only selected
fields from the hypercall need to be tracked between the ioctl, map,
and unmap operations.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
drivers/xen/gntdev.c | 195 +++++++++++++++++++++++++++++--------------------
1 files changed, 115 insertions(+), 80 deletions(-)
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 62639af..773b76c 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -55,17 +55,23 @@ struct gntdev_priv {
struct mmu_notifier mn;
};
+struct granted_page {
+ u64 pte_maddr;
+ union {
+ struct ioctl_gntdev_grant_ref target;
+ grant_handle_t handle;
+ };
+};
+
struct grant_map {
struct list_head next;
struct gntdev_priv *priv;
struct vm_area_struct *vma;
int index;
int count;
- int flags;
- int is_mapped;
- struct ioctl_gntdev_grant_ref *grants;
- struct gnttab_map_grant_ref *map_ops;
- struct gnttab_unmap_grant_ref *unmap_ops;
+ int is_mapped:1;
+ int is_ro:1;
+ struct granted_page pages[0];
};
/* ------------------------------------------------------------------ */
@@ -83,40 +89,28 @@ static void gntdev_print_maps(struct gntdev_priv *priv,
map->index == text_index && text ? text : "");
}
-static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
+static struct grant_map *gntdev_alloc_map(int count,
+ struct ioctl_gntdev_grant_ref* grants)
{
struct grant_map *add;
+ int i;
- add = kzalloc(sizeof(struct grant_map), GFP_KERNEL);
- if (NULL == add)
+ add = kzalloc(sizeof(*add) + sizeof(add->pages[0])*count, GFP_KERNEL);
+ if (!add)
return NULL;
- add->grants = kzalloc(sizeof(add->grants[0]) * count, GFP_KERNEL);
- add->map_ops = kzalloc(sizeof(add->map_ops[0]) * count, GFP_KERNEL);
- add->unmap_ops = kzalloc(sizeof(add->unmap_ops[0]) * count, GFP_KERNEL);
- if (NULL == add->grants ||
- NULL == add->map_ops ||
- NULL == add->unmap_ops)
- goto err;
-
- add->index = 0;
add->count = count;
- add->priv = priv;
+ for(i = 0; i < count; i++)
+ add->pages[i].target = grants[i];
return add;
-
-err:
- kfree(add->grants);
- kfree(add->map_ops);
- kfree(add->unmap_ops);
- kfree(add);
- return NULL;
}
static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add)
{
struct grant_map *map;
+ spin_lock(&priv->lock);
list_for_each_entry(map, &priv->maps, next) {
if (add->index + add->count < map->index) {
list_add_tail(&add->next, &map->next);
@@ -127,8 +121,10 @@ static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add)
list_add_tail(&add->next, &priv->maps);
done:
+ add->priv = priv;
if (debug)
gntdev_print_maps(priv, "[new]", add->index);
+ spin_unlock(&priv->lock);
}
static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int index,
@@ -169,9 +165,10 @@ static int gntdev_del_map(struct grant_map *map)
if (map->vma)
return -EBUSY;
- for (i = 0; i < map->count; i++)
- if (map->unmap_ops[i].handle)
- return -EBUSY;
+ if (map->is_mapped)
+ for (i = 0; i < map->count; i++)
+ if (map->pages[i].handle)
+ return -EBUSY;
atomic_sub(map->count, &pages_mapped);
list_del(&map->next);
@@ -182,9 +179,6 @@ static void gntdev_free_map(struct grant_map *map)
{
if (!map)
return;
- kfree(map->grants);
- kfree(map->map_ops);
- kfree(map->unmap_ops);
kfree(map);
}
@@ -199,51 +193,91 @@ static int find_grant_ptes(pte_t *pte, pgtable_t token, unsigned long addr, void
BUG_ON(pgnr >= map->count);
pte_maddr = (u64)pfn_to_mfn(page_to_pfn(token)) << PAGE_SHIFT;
pte_maddr += (unsigned long)pte & ~PAGE_MASK;
- gnttab_set_map_op(&map->map_ops[pgnr], pte_maddr, map->flags,
- map->grants[pgnr].ref,
- map->grants[pgnr].domid);
- gnttab_set_unmap_op(&map->unmap_ops[pgnr], pte_maddr, map->flags,
- 0 /* handle */);
+ map->pages[pgnr].pte_maddr = pte_maddr;
+
return 0;
}
static int map_grant_pages(struct grant_map *map)
{
- int i, err = 0;
+ int i, flags, err = 0;
+ struct gnttab_map_grant_ref* map_ops;
+ flags = GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte;
+ if (map->is_ro)
+ flags |= GNTMAP_readonly;
+
+ map_ops = kzalloc(sizeof(map_ops[0]) * map->count, GFP_TEMPORARY);
+ if (!map_ops)
+ return -ENOMEM;
+
+ for(i=0; i < map->count; i++)
+ gnttab_set_map_op(&map_ops[i], map->pages[i].pte_maddr, flags,
+ map->pages[i].target.ref,
+ map->pages[i].target.domid);
if (debug)
printk("%s: map %d+%d\n", __FUNCTION__, map->index, map->count);
err = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,
- map->map_ops, map->count);
+ map_ops, map->count);
if (WARN_ON(err))
- return err;
+ goto out;
for (i = 0; i < map->count; i++) {
- if (map->map_ops[i].status)
+ if (map_ops[i].status) {
+ map->pages[i].pte_maddr = 0;
err = -EINVAL;
- map->unmap_ops[i].handle = map->map_ops[i].handle;
+ } else {
+ map->pages[i].handle = map_ops[i].handle;
+ }
}
+
+out:
+ kfree(map_ops);
return err;
}
-static int unmap_grant_pages(struct grant_map *map, int offset, int pages)
+static void unmap_grant_pages(struct grant_map *map, int offset, int pages)
{
- int i, err = 0;
+ int i, flags, err = 0;
+ struct gnttab_unmap_grant_ref *unmap_ops;
+ struct gnttab_unmap_grant_ref unmap_single;
+
+ if (pages > 1) {
+ unmap_ops = kzalloc(sizeof(unmap_ops[0]) * pages,
+ GFP_TEMPORARY);
+ if (unlikely(!unmap_ops)) {
+ for(i=0; i < pages; i++)
+ unmap_grant_pages(map, offset + i, 1);
+ return;
+ }
+ } else {
+ unmap_ops = &unmap_single;
+ }
+ flags = GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte;
+ if (map->is_ro)
+ flags |= GNTMAP_readonly;
+
+ for(i=0; i < pages; i++)
+ gnttab_set_unmap_op(&unmap_ops[i],
+ map->pages[offset+i].pte_maddr, flags,
+ map->pages[offset+i].handle);
if (debug)
printk("%s: map %d+%d [%d+%d]\n", __FUNCTION__,
map->index, map->count, offset, pages);
err = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
- map->unmap_ops + offset, pages);
+ unmap_ops, pages);
if (WARN_ON(err))
- return err;
+ goto out;
for (i = 0; i < pages; i++) {
- if (map->unmap_ops[offset+i].status)
- err = -EINVAL;
- map->unmap_ops[offset+i].handle = 0;
+ WARN_ON(unmap_ops[i].status);
+ map->pages[offset+i].pte_maddr = 0;
+ map->pages[offset+i].handle = 0;
}
- return err;
+out:
+ if (unmap_ops != &unmap_single)
+ kfree(unmap_ops);
}
/* ------------------------------------------------------------------ */
@@ -282,7 +316,6 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
struct grant_map *map;
unsigned long mstart, mend;
- int err;
spin_lock(&priv->lock);
list_for_each_entry(map, &priv->maps, next) {
@@ -301,10 +334,9 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
__FUNCTION__, map->index, map->count,
map->vma->vm_start, map->vma->vm_end,
start, end, mstart, mend);
- err = unmap_grant_pages(map,
- (mstart - map->vma->vm_start) >> PAGE_SHIFT,
- (mend - mstart) >> PAGE_SHIFT);
- WARN_ON(err);
+ unmap_grant_pages(map,
+ (mstart - map->vma->vm_start) >> PAGE_SHIFT,
+ (mend - mstart) >> PAGE_SHIFT);
}
spin_unlock(&priv->lock);
}
@@ -321,7 +353,6 @@ static void mn_release(struct mmu_notifier *mn,
{
struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
struct grant_map *map;
- int err;
spin_lock(&priv->lock);
list_for_each_entry(map, &priv->maps, next) {
@@ -331,8 +362,7 @@ static void mn_release(struct mmu_notifier *mn,
printk("%s: map %d+%d (%lx %lx)\n",
__FUNCTION__, map->index, map->count,
map->vma->vm_start, map->vma->vm_end);
- err = unmap_grant_pages(map, 0, map->count);
- WARN_ON(err);
+ unmap_grant_pages(map, 0, map->count);
}
spin_unlock(&priv->lock);
}
@@ -401,6 +431,7 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
{
struct ioctl_gntdev_map_grant_ref op;
struct grant_map *map;
+ struct ioctl_gntdev_grant_ref* grants;
int err;
if (copy_from_user(&op, u, sizeof(op)) != 0)
@@ -411,38 +442,45 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
if (unlikely(op.count <= 0))
return -EINVAL;
+ grants = kmalloc(sizeof(grants[0]) * op.count, GFP_TEMPORARY);
+ if (!grants)
+ return -ENOMEM;
+
+ if (copy_from_user(grants, u->refs, sizeof(grants[0]) * op.count)) {
+ err = -EFAULT;
+ goto out_free;
+ }
+
err = -ENOMEM;
- map = gntdev_alloc_map(priv, op.count);
+ map = gntdev_alloc_map(op.count, grants);
if (!map)
- return err;
-
- if (copy_from_user(map->grants, &u->refs,
- sizeof(map->grants[0]) * op.count) != 0) {
- gntdev_free_map(map);
- return err;
- }
+ goto out_free;
if (unlikely(atomic_add_return(op.count, &pages_mapped) > limit))
{
if (debug)
printk("%s: can't map: over limit\n", __FUNCTION__);
- gntdev_free_map(map);
- return err;
+ goto out_free_map;
}
- spin_lock(&priv->lock);
gntdev_add_map(priv, map);
op.index = map->index << PAGE_SHIFT;
- spin_unlock(&priv->lock);
- if (copy_to_user(u, &op, sizeof(op)) != 0) {
- spin_lock(&priv->lock);
- gntdev_del_map(map);
- spin_unlock(&priv->lock);
- gntdev_free_map(map);
- return err;
+ if (copy_to_user(u, &op, sizeof(op))) {
+ err = -EFAULT;
+ goto out_remove;
}
- return 0;
+ err = 0;
+out_free:
+ kfree(grants);
+ return err;
+out_remove:
+ spin_lock(&priv->lock);
+ gntdev_del_map(map);
+ spin_unlock(&priv->lock);
+out_free_map:
+ gntdev_free_map(map);
+ goto out_free;
}
static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
@@ -556,10 +594,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
vma->vm_private_data = map;
map->vma = vma;
-
- map->flags = GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte;
- if (!(vma->vm_flags & VM_WRITE))
- map->flags |= GNTMAP_readonly;
+ map->is_ro = !(vma->vm_flags & VM_WRITE);
spin_unlock(&priv->lock);
--
1.7.2.3
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH 3/6] xen-gntdev: Remove unneeded structures from grant_map tracking data
2010-12-14 14:55 ` [PATCH 3/6] xen-gntdev: Remove unneeded structures from grant_map tracking data Daniel De Graaf
@ 2010-12-14 21:15 ` Jeremy Fitzhardinge
2010-12-14 21:52 ` Daniel De Graaf
2010-12-14 21:54 ` Jeremy Fitzhardinge
1 sibling, 1 reply; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-14 21:15 UTC (permalink / raw)
To: Daniel De Graaf; +Cc: xen-devel, Ian.Campbell
On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
> The entire hypercall argument list isn't required; only selected
> fields from the hypercall need to be tracked between the ioctl, map,
> and unmap operations.
Is the rationale of this patch to save memory? If so, how much does it
save?
(This patch seems sensible in principle, but it doesn't seem to save
much complexity.)
J
> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> ---
> drivers/xen/gntdev.c | 195 +++++++++++++++++++++++++++++--------------------
> 1 files changed, 115 insertions(+), 80 deletions(-)
>
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index 62639af..773b76c 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -55,17 +55,23 @@ struct gntdev_priv {
> struct mmu_notifier mn;
> };
>
> +struct granted_page {
> + u64 pte_maddr;
> + union {
> + struct ioctl_gntdev_grant_ref target;
> + grant_handle_t handle;
> + };
> +};
> +
> struct grant_map {
> struct list_head next;
> struct gntdev_priv *priv;
> struct vm_area_struct *vma;
> int index;
> int count;
> - int flags;
> - int is_mapped;
> - struct ioctl_gntdev_grant_ref *grants;
> - struct gnttab_map_grant_ref *map_ops;
> - struct gnttab_unmap_grant_ref *unmap_ops;
> + int is_mapped:1;
> + int is_ro:1;
> + struct granted_page pages[0];
> };
>
> /* ------------------------------------------------------------------ */
> @@ -83,40 +89,28 @@ static void gntdev_print_maps(struct gntdev_priv *priv,
> map->index == text_index && text ? text : "");
> }
>
> -static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
> +static struct grant_map *gntdev_alloc_map(int count,
> + struct ioctl_gntdev_grant_ref* grants)
> {
> struct grant_map *add;
> + int i;
>
> - add = kzalloc(sizeof(struct grant_map), GFP_KERNEL);
> - if (NULL == add)
> + add = kzalloc(sizeof(*add) + sizeof(add->pages[0])*count, GFP_KERNEL);
> + if (!add)
> return NULL;
>
> - add->grants = kzalloc(sizeof(add->grants[0]) * count, GFP_KERNEL);
> - add->map_ops = kzalloc(sizeof(add->map_ops[0]) * count, GFP_KERNEL);
> - add->unmap_ops = kzalloc(sizeof(add->unmap_ops[0]) * count, GFP_KERNEL);
> - if (NULL == add->grants ||
> - NULL == add->map_ops ||
> - NULL == add->unmap_ops)
> - goto err;
> -
> - add->index = 0;
> add->count = count;
> - add->priv = priv;
> + for(i = 0; i < count; i++)
> + add->pages[i].target = grants[i];
>
> return add;
> -
> -err:
> - kfree(add->grants);
> - kfree(add->map_ops);
> - kfree(add->unmap_ops);
> - kfree(add);
> - return NULL;
> }
>
> static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add)
> {
> struct grant_map *map;
>
> + spin_lock(&priv->lock);
> list_for_each_entry(map, &priv->maps, next) {
> if (add->index + add->count < map->index) {
> list_add_tail(&add->next, &map->next);
> @@ -127,8 +121,10 @@ static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add)
> list_add_tail(&add->next, &priv->maps);
>
> done:
> + add->priv = priv;
> if (debug)
> gntdev_print_maps(priv, "[new]", add->index);
> + spin_unlock(&priv->lock);
> }
>
> static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int index,
> @@ -169,9 +165,10 @@ static int gntdev_del_map(struct grant_map *map)
>
> if (map->vma)
> return -EBUSY;
> - for (i = 0; i < map->count; i++)
> - if (map->unmap_ops[i].handle)
> - return -EBUSY;
> + if (map->is_mapped)
> + for (i = 0; i < map->count; i++)
> + if (map->pages[i].handle)
> + return -EBUSY;
>
> atomic_sub(map->count, &pages_mapped);
> list_del(&map->next);
> @@ -182,9 +179,6 @@ static void gntdev_free_map(struct grant_map *map)
> {
> if (!map)
> return;
> - kfree(map->grants);
> - kfree(map->map_ops);
> - kfree(map->unmap_ops);
> kfree(map);
> }
>
> @@ -199,51 +193,91 @@ static int find_grant_ptes(pte_t *pte, pgtable_t token, unsigned long addr, void
> BUG_ON(pgnr >= map->count);
> pte_maddr = (u64)pfn_to_mfn(page_to_pfn(token)) << PAGE_SHIFT;
> pte_maddr += (unsigned long)pte & ~PAGE_MASK;
> - gnttab_set_map_op(&map->map_ops[pgnr], pte_maddr, map->flags,
> - map->grants[pgnr].ref,
> - map->grants[pgnr].domid);
> - gnttab_set_unmap_op(&map->unmap_ops[pgnr], pte_maddr, map->flags,
> - 0 /* handle */);
> + map->pages[pgnr].pte_maddr = pte_maddr;
> +
> return 0;
> }
>
> static int map_grant_pages(struct grant_map *map)
> {
> - int i, err = 0;
> + int i, flags, err = 0;
> + struct gnttab_map_grant_ref* map_ops;
>
> + flags = GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte;
> + if (map->is_ro)
> + flags |= GNTMAP_readonly;
> +
> + map_ops = kzalloc(sizeof(map_ops[0]) * map->count, GFP_TEMPORARY);
> + if (!map_ops)
> + return -ENOMEM;
> +
> + for(i=0; i < map->count; i++)
> + gnttab_set_map_op(&map_ops[i], map->pages[i].pte_maddr, flags,
> + map->pages[i].target.ref,
> + map->pages[i].target.domid);
> if (debug)
> printk("%s: map %d+%d\n", __FUNCTION__, map->index, map->count);
> err = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,
> - map->map_ops, map->count);
> + map_ops, map->count);
> if (WARN_ON(err))
> - return err;
> + goto out;
>
> for (i = 0; i < map->count; i++) {
> - if (map->map_ops[i].status)
> + if (map_ops[i].status) {
> + map->pages[i].pte_maddr = 0;
> err = -EINVAL;
> - map->unmap_ops[i].handle = map->map_ops[i].handle;
> + } else {
> + map->pages[i].handle = map_ops[i].handle;
> + }
> }
> +
> +out:
> + kfree(map_ops);
> return err;
> }
>
> -static int unmap_grant_pages(struct grant_map *map, int offset, int pages)
> +static void unmap_grant_pages(struct grant_map *map, int offset, int pages)
> {
> - int i, err = 0;
> + int i, flags, err = 0;
> + struct gnttab_unmap_grant_ref *unmap_ops;
> + struct gnttab_unmap_grant_ref unmap_single;
> +
> + if (pages > 1) {
> + unmap_ops = kzalloc(sizeof(unmap_ops[0]) * pages,
> + GFP_TEMPORARY);
> + if (unlikely(!unmap_ops)) {
> + for(i=0; i < pages; i++)
> + unmap_grant_pages(map, offset + i, 1);
> + return;
> + }
> + } else {
> + unmap_ops = &unmap_single;
> + }
>
> + flags = GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte;
> + if (map->is_ro)
> + flags |= GNTMAP_readonly;
> +
> + for(i=0; i < pages; i++)
> + gnttab_set_unmap_op(&unmap_ops[i],
> + map->pages[offset+i].pte_maddr, flags,
> + map->pages[offset+i].handle);
> if (debug)
> printk("%s: map %d+%d [%d+%d]\n", __FUNCTION__,
> map->index, map->count, offset, pages);
> err = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
> - map->unmap_ops + offset, pages);
> + unmap_ops, pages);
> if (WARN_ON(err))
> - return err;
> + goto out;
>
> for (i = 0; i < pages; i++) {
> - if (map->unmap_ops[offset+i].status)
> - err = -EINVAL;
> - map->unmap_ops[offset+i].handle = 0;
> + WARN_ON(unmap_ops[i].status);
> + map->pages[offset+i].pte_maddr = 0;
> + map->pages[offset+i].handle = 0;
> }
> - return err;
> +out:
> + if (unmap_ops != &unmap_single)
> + kfree(unmap_ops);
> }
>
> /* ------------------------------------------------------------------ */
> @@ -282,7 +316,6 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
> struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
> struct grant_map *map;
> unsigned long mstart, mend;
> - int err;
>
> spin_lock(&priv->lock);
> list_for_each_entry(map, &priv->maps, next) {
> @@ -301,10 +334,9 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
> __FUNCTION__, map->index, map->count,
> map->vma->vm_start, map->vma->vm_end,
> start, end, mstart, mend);
> - err = unmap_grant_pages(map,
> - (mstart - map->vma->vm_start) >> PAGE_SHIFT,
> - (mend - mstart) >> PAGE_SHIFT);
> - WARN_ON(err);
> + unmap_grant_pages(map,
> + (mstart - map->vma->vm_start) >> PAGE_SHIFT,
> + (mend - mstart) >> PAGE_SHIFT);
> }
> spin_unlock(&priv->lock);
> }
> @@ -321,7 +353,6 @@ static void mn_release(struct mmu_notifier *mn,
> {
> struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
> struct grant_map *map;
> - int err;
>
> spin_lock(&priv->lock);
> list_for_each_entry(map, &priv->maps, next) {
> @@ -331,8 +362,7 @@ static void mn_release(struct mmu_notifier *mn,
> printk("%s: map %d+%d (%lx %lx)\n",
> __FUNCTION__, map->index, map->count,
> map->vma->vm_start, map->vma->vm_end);
> - err = unmap_grant_pages(map, 0, map->count);
> - WARN_ON(err);
> + unmap_grant_pages(map, 0, map->count);
> }
> spin_unlock(&priv->lock);
> }
> @@ -401,6 +431,7 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
> {
> struct ioctl_gntdev_map_grant_ref op;
> struct grant_map *map;
> + struct ioctl_gntdev_grant_ref* grants;
> int err;
>
> if (copy_from_user(&op, u, sizeof(op)) != 0)
> @@ -411,38 +442,45 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
> if (unlikely(op.count <= 0))
> return -EINVAL;
>
> + grants = kmalloc(sizeof(grants[0]) * op.count, GFP_TEMPORARY);
> + if (!grants)
> + return -ENOMEM;
> +
> + if (copy_from_user(grants, u->refs, sizeof(grants[0]) * op.count)) {
> + err = -EFAULT;
> + goto out_free;
> + }
> +
> err = -ENOMEM;
> - map = gntdev_alloc_map(priv, op.count);
> + map = gntdev_alloc_map(op.count, grants);
> if (!map)
> - return err;
> -
> - if (copy_from_user(map->grants, &u->refs,
> - sizeof(map->grants[0]) * op.count) != 0) {
> - gntdev_free_map(map);
> - return err;
> - }
> + goto out_free;
>
> if (unlikely(atomic_add_return(op.count, &pages_mapped) > limit))
> {
> if (debug)
> printk("%s: can't map: over limit\n", __FUNCTION__);
> - gntdev_free_map(map);
> - return err;
> + goto out_free_map;
> }
>
> - spin_lock(&priv->lock);
> gntdev_add_map(priv, map);
> op.index = map->index << PAGE_SHIFT;
> - spin_unlock(&priv->lock);
>
> - if (copy_to_user(u, &op, sizeof(op)) != 0) {
> - spin_lock(&priv->lock);
> - gntdev_del_map(map);
> - spin_unlock(&priv->lock);
> - gntdev_free_map(map);
> - return err;
> + if (copy_to_user(u, &op, sizeof(op))) {
> + err = -EFAULT;
> + goto out_remove;
> }
> - return 0;
> + err = 0;
> +out_free:
> + kfree(grants);
> + return err;
> +out_remove:
> + spin_lock(&priv->lock);
> + gntdev_del_map(map);
> + spin_unlock(&priv->lock);
> +out_free_map:
> + gntdev_free_map(map);
> + goto out_free;
> }
>
> static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
> @@ -556,10 +594,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>
> vma->vm_private_data = map;
> map->vma = vma;
> -
> - map->flags = GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte;
> - if (!(vma->vm_flags & VM_WRITE))
> - map->flags |= GNTMAP_readonly;
> + map->is_ro = !(vma->vm_flags & VM_WRITE);
>
> spin_unlock(&priv->lock);
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/6] xen-gntdev: Remove unneeded structures from grant_map tracking data
2010-12-14 21:15 ` Jeremy Fitzhardinge
@ 2010-12-14 21:52 ` Daniel De Graaf
2010-12-14 21:56 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 32+ messages in thread
From: Daniel De Graaf @ 2010-12-14 21:52 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: xen-devel, Ian.Campbell
On 12/14/2010 04:15 PM, Jeremy Fitzhardinge wrote:
> On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
>> The entire hypercall argument list isn't required; only selected
>> fields from the hypercall need to be tracked between the ioctl, map,
>> and unmap operations.
>
> Is the rationale of this patch to save memory? If so, how much does it
> save?
>
> (This patch seems sensible in principle, but it doesn't seem to save
> much complexity.)
>
> J
This will also allow easier testing of what pages need to be unmapped
(more obvious in the HVM version). I also find it less confusing to
populate the hypercall arguments immediately before the hypercall, but
that's likely a matter of opinion. It only saves 46 bytes per page, so
if it seems more complex it could be dropped.
>> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
>> ---
>> drivers/xen/gntdev.c | 195 +++++++++++++++++++++++++++++--------------------
>> 1 files changed, 115 insertions(+), 80 deletions(-)
>>
>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>> index 62639af..773b76c 100644
>> --- a/drivers/xen/gntdev.c
>> +++ b/drivers/xen/gntdev.c
>> @@ -55,17 +55,23 @@ struct gntdev_priv {
>> struct mmu_notifier mn;
>> };
>>
>> +struct granted_page {
>> + u64 pte_maddr;
>> + union {
>> + struct ioctl_gntdev_grant_ref target;
>> + grant_handle_t handle;
>> + };
>> +};
>> +
>> struct grant_map {
>> struct list_head next;
>> struct gntdev_priv *priv;
>> struct vm_area_struct *vma;
>> int index;
>> int count;
>> - int flags;
>> - int is_mapped;
>> - struct ioctl_gntdev_grant_ref *grants;
>> - struct gnttab_map_grant_ref *map_ops;
>> - struct gnttab_unmap_grant_ref *unmap_ops;
>> + int is_mapped:1;
>> + int is_ro:1;
>> + struct granted_page pages[0];
>> };
>>
>> /* ------------------------------------------------------------------ */
>> @@ -83,40 +89,28 @@ static void gntdev_print_maps(struct gntdev_priv *priv,
>> map->index == text_index && text ? text : "");
>> }
>>
>> -static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
>> +static struct grant_map *gntdev_alloc_map(int count,
>> + struct ioctl_gntdev_grant_ref* grants)
>> {
>> struct grant_map *add;
>> + int i;
>>
>> - add = kzalloc(sizeof(struct grant_map), GFP_KERNEL);
>> - if (NULL == add)
>> + add = kzalloc(sizeof(*add) + sizeof(add->pages[0])*count, GFP_KERNEL);
>> + if (!add)
>> return NULL;
>>
>> - add->grants = kzalloc(sizeof(add->grants[0]) * count, GFP_KERNEL);
>> - add->map_ops = kzalloc(sizeof(add->map_ops[0]) * count, GFP_KERNEL);
>> - add->unmap_ops = kzalloc(sizeof(add->unmap_ops[0]) * count, GFP_KERNEL);
>> - if (NULL == add->grants ||
>> - NULL == add->map_ops ||
>> - NULL == add->unmap_ops)
>> - goto err;
>> -
>> - add->index = 0;
>> add->count = count;
>> - add->priv = priv;
>> + for(i = 0; i < count; i++)
>> + add->pages[i].target = grants[i];
>>
>> return add;
>> -
>> -err:
>> - kfree(add->grants);
>> - kfree(add->map_ops);
>> - kfree(add->unmap_ops);
>> - kfree(add);
>> - return NULL;
>> }
>>
>> static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add)
>> {
>> struct grant_map *map;
>>
>> + spin_lock(&priv->lock);
>> list_for_each_entry(map, &priv->maps, next) {
>> if (add->index + add->count < map->index) {
>> list_add_tail(&add->next, &map->next);
>> @@ -127,8 +121,10 @@ static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add)
>> list_add_tail(&add->next, &priv->maps);
>>
>> done:
>> + add->priv = priv;
>> if (debug)
>> gntdev_print_maps(priv, "[new]", add->index);
>> + spin_unlock(&priv->lock);
>> }
>>
>> static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int index,
>> @@ -169,9 +165,10 @@ static int gntdev_del_map(struct grant_map *map)
>>
>> if (map->vma)
>> return -EBUSY;
>> - for (i = 0; i < map->count; i++)
>> - if (map->unmap_ops[i].handle)
>> - return -EBUSY;
>> + if (map->is_mapped)
>> + for (i = 0; i < map->count; i++)
>> + if (map->pages[i].handle)
>> + return -EBUSY;
>>
>> atomic_sub(map->count, &pages_mapped);
>> list_del(&map->next);
>> @@ -182,9 +179,6 @@ static void gntdev_free_map(struct grant_map *map)
>> {
>> if (!map)
>> return;
>> - kfree(map->grants);
>> - kfree(map->map_ops);
>> - kfree(map->unmap_ops);
>> kfree(map);
>> }
>>
>> @@ -199,51 +193,91 @@ static int find_grant_ptes(pte_t *pte, pgtable_t token, unsigned long addr, void
>> BUG_ON(pgnr >= map->count);
>> pte_maddr = (u64)pfn_to_mfn(page_to_pfn(token)) << PAGE_SHIFT;
>> pte_maddr += (unsigned long)pte & ~PAGE_MASK;
>> - gnttab_set_map_op(&map->map_ops[pgnr], pte_maddr, map->flags,
>> - map->grants[pgnr].ref,
>> - map->grants[pgnr].domid);
>> - gnttab_set_unmap_op(&map->unmap_ops[pgnr], pte_maddr, map->flags,
>> - 0 /* handle */);
>> + map->pages[pgnr].pte_maddr = pte_maddr;
>> +
>> return 0;
>> }
>>
>> static int map_grant_pages(struct grant_map *map)
>> {
>> - int i, err = 0;
>> + int i, flags, err = 0;
>> + struct gnttab_map_grant_ref* map_ops;
>>
>> + flags = GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte;
>> + if (map->is_ro)
>> + flags |= GNTMAP_readonly;
>> +
>> + map_ops = kzalloc(sizeof(map_ops[0]) * map->count, GFP_TEMPORARY);
>> + if (!map_ops)
>> + return -ENOMEM;
>> +
>> + for(i=0; i < map->count; i++)
>> + gnttab_set_map_op(&map_ops[i], map->pages[i].pte_maddr, flags,
>> + map->pages[i].target.ref,
>> + map->pages[i].target.domid);
>> if (debug)
>> printk("%s: map %d+%d\n", __FUNCTION__, map->index, map->count);
>> err = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,
>> - map->map_ops, map->count);
>> + map_ops, map->count);
>> if (WARN_ON(err))
>> - return err;
>> + goto out;
>>
>> for (i = 0; i < map->count; i++) {
>> - if (map->map_ops[i].status)
>> + if (map_ops[i].status) {
>> + map->pages[i].pte_maddr = 0;
>> err = -EINVAL;
>> - map->unmap_ops[i].handle = map->map_ops[i].handle;
>> + } else {
>> + map->pages[i].handle = map_ops[i].handle;
>> + }
>> }
>> +
>> +out:
>> + kfree(map_ops);
>> return err;
>> }
>>
>> -static int unmap_grant_pages(struct grant_map *map, int offset, int pages)
>> +static void unmap_grant_pages(struct grant_map *map, int offset, int pages)
>> {
>> - int i, err = 0;
>> + int i, flags, err = 0;
>> + struct gnttab_unmap_grant_ref *unmap_ops;
>> + struct gnttab_unmap_grant_ref unmap_single;
>> +
>> + if (pages > 1) {
>> + unmap_ops = kzalloc(sizeof(unmap_ops[0]) * pages,
>> + GFP_TEMPORARY);
>> + if (unlikely(!unmap_ops)) {
>> + for(i=0; i < pages; i++)
>> + unmap_grant_pages(map, offset + i, 1);
>> + return;
>> + }
>> + } else {
>> + unmap_ops = &unmap_single;
>> + }
>>
>> + flags = GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte;
>> + if (map->is_ro)
>> + flags |= GNTMAP_readonly;
>> +
>> + for(i=0; i < pages; i++)
>> + gnttab_set_unmap_op(&unmap_ops[i],
>> + map->pages[offset+i].pte_maddr, flags,
>> + map->pages[offset+i].handle);
>> if (debug)
>> printk("%s: map %d+%d [%d+%d]\n", __FUNCTION__,
>> map->index, map->count, offset, pages);
>> err = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
>> - map->unmap_ops + offset, pages);
>> + unmap_ops, pages);
>> if (WARN_ON(err))
>> - return err;
>> + goto out;
>>
>> for (i = 0; i < pages; i++) {
>> - if (map->unmap_ops[offset+i].status)
>> - err = -EINVAL;
>> - map->unmap_ops[offset+i].handle = 0;
>> + WARN_ON(unmap_ops[i].status);
>> + map->pages[offset+i].pte_maddr = 0;
>> + map->pages[offset+i].handle = 0;
>> }
>> - return err;
>> +out:
>> + if (unmap_ops != &unmap_single)
>> + kfree(unmap_ops);
>> }
>>
>> /* ------------------------------------------------------------------ */
>> @@ -282,7 +316,6 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
>> struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
>> struct grant_map *map;
>> unsigned long mstart, mend;
>> - int err;
>>
>> spin_lock(&priv->lock);
>> list_for_each_entry(map, &priv->maps, next) {
>> @@ -301,10 +334,9 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
>> __FUNCTION__, map->index, map->count,
>> map->vma->vm_start, map->vma->vm_end,
>> start, end, mstart, mend);
>> - err = unmap_grant_pages(map,
>> - (mstart - map->vma->vm_start) >> PAGE_SHIFT,
>> - (mend - mstart) >> PAGE_SHIFT);
>> - WARN_ON(err);
>> + unmap_grant_pages(map,
>> + (mstart - map->vma->vm_start) >> PAGE_SHIFT,
>> + (mend - mstart) >> PAGE_SHIFT);
>> }
>> spin_unlock(&priv->lock);
>> }
>> @@ -321,7 +353,6 @@ static void mn_release(struct mmu_notifier *mn,
>> {
>> struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
>> struct grant_map *map;
>> - int err;
>>
>> spin_lock(&priv->lock);
>> list_for_each_entry(map, &priv->maps, next) {
>> @@ -331,8 +362,7 @@ static void mn_release(struct mmu_notifier *mn,
>> printk("%s: map %d+%d (%lx %lx)\n",
>> __FUNCTION__, map->index, map->count,
>> map->vma->vm_start, map->vma->vm_end);
>> - err = unmap_grant_pages(map, 0, map->count);
>> - WARN_ON(err);
>> + unmap_grant_pages(map, 0, map->count);
>> }
>> spin_unlock(&priv->lock);
>> }
>> @@ -401,6 +431,7 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
>> {
>> struct ioctl_gntdev_map_grant_ref op;
>> struct grant_map *map;
>> + struct ioctl_gntdev_grant_ref* grants;
>> int err;
>>
>> if (copy_from_user(&op, u, sizeof(op)) != 0)
>> @@ -411,38 +442,45 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
>> if (unlikely(op.count <= 0))
>> return -EINVAL;
>>
>> + grants = kmalloc(sizeof(grants[0]) * op.count, GFP_TEMPORARY);
>> + if (!grants)
>> + return -ENOMEM;
>> +
>> + if (copy_from_user(grants, u->refs, sizeof(grants[0]) * op.count)) {
>> + err = -EFAULT;
>> + goto out_free;
>> + }
>> +
>> err = -ENOMEM;
>> - map = gntdev_alloc_map(priv, op.count);
>> + map = gntdev_alloc_map(op.count, grants);
>> if (!map)
>> - return err;
>> -
>> - if (copy_from_user(map->grants, &u->refs,
>> - sizeof(map->grants[0]) * op.count) != 0) {
>> - gntdev_free_map(map);
>> - return err;
>> - }
>> + goto out_free;
>>
>> if (unlikely(atomic_add_return(op.count, &pages_mapped) > limit))
>> {
>> if (debug)
>> printk("%s: can't map: over limit\n", __FUNCTION__);
>> - gntdev_free_map(map);
>> - return err;
>> + goto out_free_map;
>> }
>>
>> - spin_lock(&priv->lock);
>> gntdev_add_map(priv, map);
>> op.index = map->index << PAGE_SHIFT;
>> - spin_unlock(&priv->lock);
>>
>> - if (copy_to_user(u, &op, sizeof(op)) != 0) {
>> - spin_lock(&priv->lock);
>> - gntdev_del_map(map);
>> - spin_unlock(&priv->lock);
>> - gntdev_free_map(map);
>> - return err;
>> + if (copy_to_user(u, &op, sizeof(op))) {
>> + err = -EFAULT;
>> + goto out_remove;
>> }
>> - return 0;
>> + err = 0;
>> +out_free:
>> + kfree(grants);
>> + return err;
>> +out_remove:
>> + spin_lock(&priv->lock);
>> + gntdev_del_map(map);
>> + spin_unlock(&priv->lock);
>> +out_free_map:
>> + gntdev_free_map(map);
>> + goto out_free;
>> }
>>
>> static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
>> @@ -556,10 +594,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>>
>> vma->vm_private_data = map;
>> map->vma = vma;
>> -
>> - map->flags = GNTMAP_host_map | GNTMAP_application_map | GNTMAP_contains_pte;
>> - if (!(vma->vm_flags & VM_WRITE))
>> - map->flags |= GNTMAP_readonly;
>> + map->is_ro = !(vma->vm_flags & VM_WRITE);
>>
>> spin_unlock(&priv->lock);
>>
>
--
Daniel De Graaf
National Security Agency
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/6] xen-gntdev: Remove unneeded structures from grant_map tracking data
2010-12-14 21:52 ` Daniel De Graaf
@ 2010-12-14 21:56 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-14 21:56 UTC (permalink / raw)
To: Daniel De Graaf; +Cc: xen-devel, Ian.Campbell
On 12/14/2010 01:52 PM, Daniel De Graaf wrote:
> On 12/14/2010 04:15 PM, Jeremy Fitzhardinge wrote:
>> On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
>>> The entire hypercall argument list isn't required; only selected
>>> fields from the hypercall need to be tracked between the ioctl, map,
>>> and unmap operations.
>> Is the rationale of this patch to save memory? If so, how much does it
>> save?
>>
>> (This patch seems sensible in principle, but it doesn't seem to save
>> much complexity.)
>>
>> J
> This will also allow easier testing of what pages need to be unmapped
> (more obvious in the HVM version). I also find it less confusing to
> populate the hypercall arguments immediately before the hypercall, but
> that's likely a matter of opinion. It only saves 46 bytes per page, so
> if it seems more complex it could be dropped.
I like it in general. See the other mail I just sent - you can use the
multicall API to remove all the allocations for the arguments, and that
should help a lot.
J
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/6] xen-gntdev: Remove unneeded structures from grant_map tracking data
2010-12-14 14:55 ` [PATCH 3/6] xen-gntdev: Remove unneeded structures from grant_map tracking data Daniel De Graaf
2010-12-14 21:15 ` Jeremy Fitzhardinge
@ 2010-12-14 21:54 ` Jeremy Fitzhardinge
1 sibling, 0 replies; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-14 21:54 UTC (permalink / raw)
To: Daniel De Graaf; +Cc: Isaku Yamahata, xen-devel, Ian.Campbell
On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
> -static int unmap_grant_pages(struct grant_map *map, int offset, int pages)
> +static void unmap_grant_pages(struct grant_map *map, int offset, int pages)
> {
> - int i, err = 0;
> + int i, flags, err = 0;
> + struct gnttab_unmap_grant_ref *unmap_ops;
> + struct gnttab_unmap_grant_ref unmap_single;
> +
> + if (pages > 1) {
> + unmap_ops = kzalloc(sizeof(unmap_ops[0]) * pages,
> + GFP_TEMPORARY);
> + if (unlikely(!unmap_ops)) {
> + for(i=0; i < pages; i++)
> + unmap_grant_pages(map, offset + i, 1);
> + return;
> + }
> + } else {
> + unmap_ops = &unmap_single;
> + }
Rather than doing this, it would be better to use the multicall batching
API, in particular xen_mc_extend_args() - see xen_extend_mmu_update()
for an example.
(This would mean promoting arch/x86/xen/multicall.h to
include/xen/multicall.h and breaking ia64 builds until there's an ia64
implementation of that API, but that seems like a fair tradeoff at this
point.)
J
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 4/6] xen-gntdev: Use find_vma rather than iterating our vma list manually
2010-12-14 14:55 [PATCH v2] Userspace grant communication Daniel De Graaf
` (2 preceding siblings ...)
2010-12-14 14:55 ` [PATCH 3/6] xen-gntdev: Remove unneeded structures from grant_map tracking data Daniel De Graaf
@ 2010-12-14 14:55 ` Daniel De Graaf
2010-12-14 21:20 ` Jeremy Fitzhardinge
2010-12-14 14:55 ` [PATCH 5/6] xen-gntalloc: Userspace grant allocation driver Daniel De Graaf
2010-12-14 14:55 ` [PATCH 6/6] xen-gntdev: Introduce HVM version of gntdev Daniel De Graaf
5 siblings, 1 reply; 32+ messages in thread
From: Daniel De Graaf @ 2010-12-14 14:55 UTC (permalink / raw)
To: xen-devel; +Cc: Daniel De Graaf, Ian.Campbell
This should be faster if many mappings exist, and also removes
the only user of vma that is not related to PTE modification.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
drivers/xen/gntdev.c | 34 ++++++++++------------------------
1 files changed, 10 insertions(+), 24 deletions(-)
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 773b76c..a73f07c 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -74,6 +74,8 @@ struct grant_map {
struct granted_page pages[0];
};
+static struct vm_operations_struct gntdev_vmops;
+
/* ------------------------------------------------------------------ */
static void gntdev_print_maps(struct gntdev_priv *priv,
@@ -142,23 +144,6 @@ static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int ind
return NULL;
}
-static struct grant_map *gntdev_find_map_vaddr(struct gntdev_priv *priv,
- unsigned long vaddr)
-{
- struct grant_map *map;
-
- list_for_each_entry(map, &priv->maps, next) {
- if (!map->vma)
- continue;
- if (vaddr < map->vma->vm_start)
- continue;
- if (vaddr >= map->vma->vm_end)
- continue;
- return map;
- }
- return NULL;
-}
-
static int gntdev_del_map(struct grant_map *map)
{
int i;
@@ -510,6 +495,7 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
struct ioctl_gntdev_get_offset_for_vaddr __user *u)
{
struct ioctl_gntdev_get_offset_for_vaddr op;
+ struct vm_area_struct *vma;
struct grant_map *map;
if (copy_from_user(&op, u, sizeof(op)) != 0)
@@ -518,16 +504,16 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
printk("%s: priv %p, offset for vaddr %lx\n", __FUNCTION__, priv,
(unsigned long)op.vaddr);
- spin_lock(&priv->lock);
- map = gntdev_find_map_vaddr(priv, op.vaddr);
- if (map == NULL ||
- map->vma->vm_start != op.vaddr) {
- spin_unlock(&priv->lock);
+ vma = find_vma(current->mm, op.vaddr);
+ if (!vma)
return -EINVAL;
- }
+
+ map = vma->vm_private_data;
+ if (vma->vm_ops != &gntdev_vmops || !map)
+ return -EINVAL;
+
op.offset = map->index << PAGE_SHIFT;
op.count = map->count;
- spin_unlock(&priv->lock);
if (copy_to_user(u, &op, sizeof(op)) != 0)
return -EFAULT;
--
1.7.2.3
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH 4/6] xen-gntdev: Use find_vma rather than iterating our vma list manually
2010-12-14 14:55 ` [PATCH 4/6] xen-gntdev: Use find_vma rather than iterating our vma list manually Daniel De Graaf
@ 2010-12-14 21:20 ` Jeremy Fitzhardinge
2010-12-15 9:58 ` Ian Campbell
0 siblings, 1 reply; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-14 21:20 UTC (permalink / raw)
To: Daniel De Graaf; +Cc: xen-devel, Ian.Campbell
On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
> This should be faster if many mappings exist,
Yes, seems reasonable.
> and also removes
> the only user of vma that is not related to PTE modification.
What do you mean? Oh, you mean map->vma?
> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> ---
> drivers/xen/gntdev.c | 34 ++++++++++------------------------
> 1 files changed, 10 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index 773b76c..a73f07c 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -74,6 +74,8 @@ struct grant_map {
> struct granted_page pages[0];
> };
>
> +static struct vm_operations_struct gntdev_vmops;
Is it OK for this to be all NULL?
> +
> /* ------------------------------------------------------------------ */
>
> static void gntdev_print_maps(struct gntdev_priv *priv,
> @@ -142,23 +144,6 @@ static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int ind
> return NULL;
> }
>
> -static struct grant_map *gntdev_find_map_vaddr(struct gntdev_priv *priv,
> - unsigned long vaddr)
> -{
> - struct grant_map *map;
> -
> - list_for_each_entry(map, &priv->maps, next) {
> - if (!map->vma)
> - continue;
> - if (vaddr < map->vma->vm_start)
> - continue;
> - if (vaddr >= map->vma->vm_end)
> - continue;
> - return map;
> - }
> - return NULL;
> -}
> -
> static int gntdev_del_map(struct grant_map *map)
> {
> int i;
> @@ -510,6 +495,7 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
> struct ioctl_gntdev_get_offset_for_vaddr __user *u)
> {
> struct ioctl_gntdev_get_offset_for_vaddr op;
> + struct vm_area_struct *vma;
> struct grant_map *map;
>
> if (copy_from_user(&op, u, sizeof(op)) != 0)
> @@ -518,16 +504,16 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
> printk("%s: priv %p, offset for vaddr %lx\n", __FUNCTION__, priv,
> (unsigned long)op.vaddr);
>
> - spin_lock(&priv->lock);
> - map = gntdev_find_map_vaddr(priv, op.vaddr);
> - if (map == NULL ||
> - map->vma->vm_start != op.vaddr) {
> - spin_unlock(&priv->lock);
> + vma = find_vma(current->mm, op.vaddr);
> + if (!vma)
> return -EINVAL;
> - }
> +
> + map = vma->vm_private_data;
> + if (vma->vm_ops != &gntdev_vmops || !map)
> + return -EINVAL;
I know this is perfectly OK, but I'd be happier if you don't refer to
vm_private_data as "map" until the vma has been confirmed to be a gntdev
one.
> +
> op.offset = map->index << PAGE_SHIFT;
> op.count = map->count;
> - spin_unlock(&priv->lock);
>
> if (copy_to_user(u, &op, sizeof(op)) != 0)
> return -EFAULT;
J
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 4/6] xen-gntdev: Use find_vma rather than iterating our vma list manually
2010-12-14 21:20 ` Jeremy Fitzhardinge
@ 2010-12-15 9:58 ` Ian Campbell
2010-12-16 0:29 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 32+ messages in thread
From: Ian Campbell @ 2010-12-15 9:58 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: Daniel De Graaf, xen-devel@lists.xensource.com
On Tue, 2010-12-14 at 21:20 +0000, Jeremy Fitzhardinge wrote:
> On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
> > @@ -74,6 +74,8 @@ struct grant_map {
> > struct granted_page pages[0];
> > };
> >
> > +static struct vm_operations_struct gntdev_vmops;
>
> Is it OK for this to be all NULL?
It's a forward declaration; the actual definition is later on.
I think it would be clearer to move the existing declaration (plus
gntdev_vma_{close,fault}) before gntdev_ioctl_get_offset_for_vaddr or
vice versa.
Oh wait, gntdev_ioctl_get_offset_for_vaddr is already after
gntdev_vmops, isn't it? Unless it changed in patches 2 or 3 which I
don't think it did?
Ian.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 4/6] xen-gntdev: Use find_vma rather than iterating our vma list manually
2010-12-15 9:58 ` Ian Campbell
@ 2010-12-16 0:29 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-16 0:29 UTC (permalink / raw)
To: Ian Campbell; +Cc: Daniel De Graaf, xen-devel@lists.xensource.com
On 12/15/2010 01:58 AM, Ian Campbell wrote:
> On Tue, 2010-12-14 at 21:20 +0000, Jeremy Fitzhardinge wrote:
>> On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
>>> @@ -74,6 +74,8 @@ struct grant_map {
>>> struct granted_page pages[0];
>>> };
>>>
>>> +static struct vm_operations_struct gntdev_vmops;
>> Is it OK for this to be all NULL?
> It's a forward declaration; the actual definition is later on.
(Ugh, always hated that syntax.)
J
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 5/6] xen-gntalloc: Userspace grant allocation driver
2010-12-14 14:55 [PATCH v2] Userspace grant communication Daniel De Graaf
` (3 preceding siblings ...)
2010-12-14 14:55 ` [PATCH 4/6] xen-gntdev: Use find_vma rather than iterating our vma list manually Daniel De Graaf
@ 2010-12-14 14:55 ` Daniel De Graaf
2010-12-14 21:42 ` Jeremy Fitzhardinge
2010-12-14 14:55 ` [PATCH 6/6] xen-gntdev: Introduce HVM version of gntdev Daniel De Graaf
5 siblings, 1 reply; 32+ messages in thread
From: Daniel De Graaf @ 2010-12-14 14:55 UTC (permalink / raw)
To: xen-devel; +Cc: Daniel De Graaf, Ian.Campbell
This allows a userspace application to allocate a shared page for
implementing inter-domain communication or device drivers. These
shared pages can be mapped using the gntdev device or by the kernel
in another domain.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
drivers/xen/Kconfig | 7 +
drivers/xen/Makefile | 2 +
drivers/xen/gntalloc.c | 456 ++++++++++++++++++++++++++++++++++++++++++++++++
include/xen/gntalloc.h | 68 +++++++
4 files changed, 533 insertions(+), 0 deletions(-)
create mode 100644 drivers/xen/gntalloc.c
create mode 100644 include/xen/gntalloc.h
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index fa9982e..8398cb0 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -180,6 +180,13 @@ config XEN_GNTDEV
help
Allows userspace processes use grants.
+config XEN_GRANT_DEV_ALLOC
+ tristate "User-space grant reference allocator driver"
+ depends on XEN
+ help
+ Allows userspace processes to create pages with access granted
+ to other domains.
+
config XEN_S3
def_bool y
depends on XEN_DOM0 && ACPI
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index ef1ea63..9814c1d 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_XEN_XENCOMM) += xencomm.o
obj-$(CONFIG_XEN_BALLOON) += balloon.o
obj-$(CONFIG_XEN_DEV_EVTCHN) += xen-evtchn.o
obj-$(CONFIG_XEN_GNTDEV) += xen-gntdev.o
+obj-$(CONFIG_XEN_GRANT_DEV_ALLOC) += xen-gntalloc.o
obj-$(CONFIG_XEN_PCIDEV_BACKEND) += pciback/
obj-$(CONFIG_XEN_BLKDEV_BACKEND) += blkback/
obj-$(CONFIG_XEN_BLKDEV_TAP) += blktap/
@@ -25,3 +26,4 @@ obj-$(CONFIG_XEN_PLATFORM_PCI) += platform-pci.o
xen-evtchn-y := evtchn.o
xen-gntdev-y := gntdev.o
+xen-gntalloc-y := gntalloc.o
diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c
new file mode 100644
index 0000000..f26adfd
--- /dev/null
+++ b/drivers/xen/gntalloc.c
@@ -0,0 +1,456 @@
+/******************************************************************************
+ * gntalloc.c
+ *
+ * Device for creating grant references (in user-space) that may be shared
+ * with other domains.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+/*
+ * This driver exists to allow userspace programs in Linux to allocate kernel
+ * memory that will later be shared with another domain. Without this device,
+ * Linux userspace programs cannot create grant references.
+ *
+ * How this stuff works:
+ * X -> granting a page to Y
+ * Y -> mapping the grant from X
+ *
+ * 1. X uses the gntalloc device to allocate a page of kernel memory, P.
+ * 2. X creates an entry in the grant table that says domid(Y) can
+ * access P.
+ * 3. X gives the grant reference identifier, GREF, to Y.
+ * 4. A program in Y uses the gntdev device to map the page (owned by X
+ * and identified by GREF) into domain(Y) and then into the address
+ * space of the program. Behind the scenes, this requires a
+ * hypercall in which Xen modifies the host CPU page tables to
+ * perform the sharing -- that's where the actual cross-domain mapping
+ * occurs.
+ * 5. A program in X mmap()s a segment of the gntalloc device that
+ * corresponds to the shared page.
+ * 6. The two userspace programs can now communicate over the shared page.
+ *
+ *
+ * NOTE TO USERSPACE LIBRARIES:
+ * The grant allocation and mmap()ing are, naturally, two separate
+ * operations. You set up the sharing by calling the create ioctl() and
+ * then the mmap(). You must tear down the sharing in the reverse order
+ * (munmap() and then the destroy ioctl()).
+ *
+ * WARNING: Since Xen does not allow a guest to forcibly end the use of a grant
+ * reference, this device can be used to consume kernel memory by leaving grant
+ * references mapped by another domain when an application exits. Therefore,
+ * there is a global limit on the number of pages that can be allocated. When
+ * all references to the page are unmapped, it will be freed during the next
+ * grant operation.
+ */
+
+#include <asm/atomic.h>
+#include <linux/module.h>
+#include <linux/miscdevice.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/device.h>
+#include <linux/mm.h>
+#include <asm/uaccess.h>
+#include <linux/types.h>
+#include <linux/list.h>
+
+#include <xen/xen.h>
+#include <xen/page.h>
+#include <xen/grant_table.h>
+#include <xen/gntalloc.h>
+
+static int debug = 0;
+module_param(debug, int, 0644);
+
+static int limit = 1024;
+module_param(limit, int, 0644);
+
+static LIST_HEAD(gref_list);
+static DEFINE_SPINLOCK(gref_lock);
+static int gref_size = 0;
+
+/* Metadata on a grant reference. */
+struct gntalloc_gref {
+ struct list_head next_all; /* list entry gref_list */
+ struct list_head next_file; /* list entry file->list, if open */
+ domid_t foreign_domid; /* The ID of the domain to share with. */
+ grant_ref_t gref_id; /* The grant reference number. */
+ unsigned int users; /* Use count - when zero, waiting on Xen */
+ struct page* page; /* The shared page. */
+};
+
+struct gntalloc_file_private_data {
+ struct list_head list;
+};
+
+static void __del_gref(struct gntalloc_gref *gref);
+
+static void do_cleanup(void)
+{
+ struct gntalloc_gref *gref, *n;
+ list_for_each_entry_safe(gref, n, &gref_list, next_all) {
+ if (!gref->users)
+ __del_gref(gref);
+ }
+}
+
+
+static int add_gref(domid_t foreign_domid, uint32_t readonly,
+ struct gntalloc_file_private_data *priv)
+{
+ int rc;
+ struct gntalloc_gref *gref;
+
+ rc = -ENOMEM;
+ spin_lock(&gref_lock);
+ do_cleanup();
+ if (gref_size >= limit) {
+ spin_unlock(&gref_lock);
+ rc = -ENOSPC;
+ goto out;
+ }
+ gref_size++;
+ spin_unlock(&gref_lock);
+
+ gref = kzalloc(sizeof(*gref), GFP_KERNEL);
+ if (!gref)
+ goto out;
+
+ gref->foreign_domid = foreign_domid;
+ gref->users = 1;
+
+ /* Allocate the page to share. */
+ gref->page = alloc_page(GFP_KERNEL|__GFP_ZERO);
+ if (!gref->page)
+ goto out_nopage;
+
+ /* Grant foreign access to the page. */
+ gref->gref_id = gnttab_grant_foreign_access(foreign_domid,
+ pfn_to_mfn(page_to_pfn(gref->page)), readonly);
+ if (gref->gref_id < 0) {
+ printk(KERN_ERR "%s: failed to grant foreign access for mfn "
+ "%lu to domain %u\n", __func__,
+ pfn_to_mfn(page_to_pfn(gref->page)), foreign_domid);
+ rc = -EFAULT;
+ goto out_no_foreign_gref;
+ }
+
+ /* Add to gref lists. */
+ spin_lock(&gref_lock);
+ list_add_tail(&gref->next_all, &gref_list);
+ list_add_tail(&gref->next_file, &priv->list);
+ spin_unlock(&gref_lock);
+
+ return gref->gref_id;
+
+out_no_foreign_gref:
+ __free_page(gref->page);
+out_nopage:
+ kfree(gref);
+out:
+ return rc;
+}
+
+static void __del_gref(struct gntalloc_gref *gref)
+{
+ if (gnttab_query_foreign_access(gref->gref_id))
+ return;
+
+ if (!gnttab_end_foreign_access_ref(gref->gref_id, 0))
+ return;
+
+ gref_size--;
+ list_del(&gref->next_all);
+
+ __free_page(gref->page);
+ kfree(gref);
+}
+
+static struct gntalloc_gref* find_gref(struct gntalloc_file_private_data *priv,
+ grant_ref_t gref_id)
+{
+ struct gntalloc_gref *gref;
+ list_for_each_entry(gref, &priv->list, next_file) {
+ if (gref->gref_id == gref_id)
+ return gref;
+ }
+ return NULL;
+}
+
+/*
+ * -------------------------------------
+ * File operations.
+ * -------------------------------------
+ */
+static int gntalloc_open(struct inode *inode, struct file *filp)
+{
+ struct gntalloc_file_private_data *priv;
+
+ try_module_get(THIS_MODULE);
+
+ priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ goto out_nomem;
+ INIT_LIST_HEAD(&priv->list);
+
+ filp->private_data = priv;
+
+ if (debug)
+ printk("%s: priv %p\n", __FUNCTION__, priv);
+
+ return 0;
+
+out_nomem:
+ return -ENOMEM;
+}
+
+static int gntalloc_release(struct inode *inode, struct file *filp)
+{
+ struct gntalloc_file_private_data *priv = filp->private_data;
+ struct gntalloc_gref *gref;
+
+ if (debug)
+ printk("%s: priv %p\n", __FUNCTION__, priv);
+
+ spin_lock(&gref_lock);
+ while (!list_empty(&priv->list)) {
+ gref = list_entry(priv->list.next,
+ struct gntalloc_gref, next_file);
+ list_del(&gref->next_file);
+ gref->users--;
+ if (gref->users == 0)
+ __del_gref(gref);
+ }
+ kfree(priv);
+ spin_unlock(&gref_lock);
+
+ module_put(THIS_MODULE);
+
+ return 0;
+}
+
+static long gntalloc_ioctl_alloc(struct gntalloc_file_private_data *priv,
+ void __user *arg)
+{
+ int rc = 0;
+ struct ioctl_gntalloc_alloc_gref op;
+
+ if (debug)
+ printk("%s: priv %p\n", __FUNCTION__, priv);
+
+ if (copy_from_user(&op, arg, sizeof(op))) {
+ rc = -EFAULT;
+ goto alloc_grant_out;
+ }
+ rc = add_gref(op.foreign_domid, op.readonly, priv);
+ if (rc < 0)
+ goto alloc_grant_out;
+
+ op.gref_id = rc;
+ op.page_idx = rc;
+
+ rc = 0;
+
+ if (copy_to_user((void __user *)arg, &op, sizeof(op))) {
+ rc = -EFAULT;
+ goto alloc_grant_out;
+ }
+
+alloc_grant_out:
+ return rc;
+}
+
+static long gntalloc_ioctl_dealloc(struct gntalloc_file_private_data *priv,
+ void __user *arg)
+{
+ int rc = 0;
+ struct ioctl_gntalloc_dealloc_gref op;
+ struct gntalloc_gref *gref;
+
+ if (debug)
+ printk("%s: priv %p\n", __FUNCTION__, priv);
+
+ if (copy_from_user(&op, arg, sizeof(op))) {
+ rc = -EFAULT;
+ goto dealloc_grant_out;
+ }
+
+ spin_lock(&gref_lock);
+ gref = find_gref(priv, op.gref_id);
+ if (gref) {
+ list_del(&gref->next_file);
+ gref->users--;
+ rc = 0;
+ } else {
+ rc = -EINVAL;
+ }
+
+ do_cleanup();
+ spin_unlock(&gref_lock);
+dealloc_grant_out:
+ return rc;
+}
+
+static long gntalloc_ioctl(struct file *filp, unsigned int cmd,
+ unsigned long arg)
+{
+ struct gntalloc_file_private_data *priv = filp->private_data;
+
+ switch (cmd) {
+ case IOCTL_GNTALLOC_ALLOC_GREF:
+ return gntalloc_ioctl_alloc(priv, (void __user*)arg);
+
+ case IOCTL_GNTALLOC_DEALLOC_GREF:
+ return gntalloc_ioctl_dealloc(priv, (void __user*)arg);
+
+ default:
+ return -ENOIOCTLCMD;
+ }
+
+ return 0;
+}
+
+static int gntalloc_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+ struct gntalloc_gref *gref = vma->vm_private_data;
+ if (!gref)
+ return VM_FAULT_SIGBUS;
+
+ vmf->page = gref->page;
+ get_page(vmf->page);
+
+ return 0;
+};
+
+static void gntalloc_vma_close(struct vm_area_struct *vma)
+{
+ struct gntalloc_gref *gref = vma->vm_private_data;
+ if (!gref)
+ return;
+
+ spin_lock(&gref_lock);
+ gref->users--;
+ if (gref->users == 0)
+ __del_gref(gref);
+ spin_unlock(&gref_lock);
+}
+
+static struct vm_operations_struct gntalloc_vmops = {
+ .fault = gntalloc_vma_fault,
+ .close = gntalloc_vma_close,
+};
+
+static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+ struct gntalloc_file_private_data *priv = filp->private_data;
+ struct gntalloc_gref *gref;
+
+ if (debug)
+ printk("%s: priv %p, page %lu\n", __func__,
+ priv, vma->vm_pgoff);
+
+ /*
+ * There is a 1-to-1 correspondence of grant references to shared
+ * pages, so it only makes sense to map exactly one page per
+ * call to mmap().
+ */
+ if (((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) != 1) {
+ printk(KERN_ERR "%s: Only one page can be memory-mapped "
+ "per grant reference.\n", __func__);
+ return -EINVAL;
+ }
+
+ if (!(vma->vm_flags & VM_SHARED)) {
+ printk(KERN_ERR "%s: Mapping must be shared.\n",
+ __func__);
+ return -EINVAL;
+ }
+
+ spin_lock(&gref_lock);
+ gref = find_gref(priv, vma->vm_pgoff);
+ if (gref == NULL) {
+ spin_unlock(&gref_lock);
+ printk(KERN_ERR "%s: Could not find a grant reference with "
+ "page index %lu.\n", __func__, vma->vm_pgoff);
+ return -ENOENT;
+ }
+ gref->users++;
+ spin_unlock(&gref_lock);
+
+ vma->vm_private_data = gref;
+
+ /* This flag prevents Bad PTE errors when the memory is unmapped. */
+ vma->vm_flags |= VM_RESERVED;
+ vma->vm_flags |= VM_DONTCOPY;
+ vma->vm_flags |= VM_IO;
+
+ vma->vm_ops = &gntalloc_vmops;
+
+ return 0;
+}
+
+static const struct file_operations gntalloc_fops = {
+ .owner = THIS_MODULE,
+ .open = gntalloc_open,
+ .release = gntalloc_release,
+ .unlocked_ioctl = gntalloc_ioctl,
+ .mmap = gntalloc_mmap
+};
+
+/*
+ * -------------------------------------
+ * Module creation/destruction.
+ * -------------------------------------
+ */
+static struct miscdevice gntalloc_miscdev = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "xen/gntalloc",
+ .fops = &gntalloc_fops,
+};
+
+static int __init gntalloc_init(void)
+{
+ int err;
+
+ if (!xen_domain()) {
+ if (debug)
+ printk(KERN_ERR "gntalloc: You must be running Xen\n");
+ return -ENODEV;
+ }
+
+ err = misc_register(&gntalloc_miscdev);
+ if (err != 0) {
+ printk(KERN_ERR "Could not register misc gntalloc device\n");
+ return err;
+ }
+
+ if (debug)
+ printk(KERN_INFO "Created grant allocation device at %d,%d\n",
+ MISC_MAJOR, gntalloc_miscdev.minor);
+
+ return 0;
+}
+
+static void __exit gntalloc_exit(void)
+{
+ misc_deregister(&gntalloc_miscdev);
+}
+
+module_init(gntalloc_init);
+module_exit(gntalloc_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Carter Weatherly <carter.weatherly@jhuapl.edu>, "
+ "Daniel De Graaf <dgdegra@tycho.nsa.gov>");
+MODULE_DESCRIPTION("User-space grant reference allocator driver");
diff --git a/include/xen/gntalloc.h b/include/xen/gntalloc.h
new file mode 100644
index 0000000..76b70d7
--- /dev/null
+++ b/include/xen/gntalloc.h
@@ -0,0 +1,68 @@
+/******************************************************************************
+ * gntalloc.h
+ *
+ * Interface to /dev/xen/gntalloc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef __LINUX_PUBLIC_GNTALLOC_H__
+#define __LINUX_PUBLIC_GNTALLOC_H__
+
+/*
+ * Allocates a new page and creates a new grant reference.
+ *
+ * N.B. The page_idx is really the address >> PAGE_SHIFT, meaning it's the
+ * page number and not an actual address. It must be shifted again prior
+ * to feeding it to mmap() (i.e. page_idx << PAGE_SHIFT).
+ */
+#define IOCTL_GNTALLOC_ALLOC_GREF \
+_IOC(_IOC_NONE, 'G', 1, sizeof(struct ioctl_gntalloc_alloc_gref))
+struct ioctl_gntalloc_alloc_gref {
+ /* IN parameters */
+ /* The ID of the domain creating the grant reference. */
+ domid_t owner_domid;
+ /* The ID of the domain to be given access to the grant. */
+ domid_t foreign_domid;
+ /* The type of access given to domid. */
+ uint32_t readonly;
+ /* OUT parameters */
+ /* The grant reference of the newly created grant. */
+ grant_ref_t gref_id;
+ /* The page index (page number, NOT address) for grant mmap(). */
+ uint32_t page_idx;
+};
+
+/*
+ * Deallocates the grant reference, freeing the associated page.
+ */
+#define IOCTL_GNTALLOC_DEALLOC_GREF \
+_IOC(_IOC_NONE, 'G', 2, sizeof(struct ioctl_gntalloc_dealloc_gref))
+struct ioctl_gntalloc_dealloc_gref {
+ /* IN parameter */
+ /* The grant reference to deallocate. */
+ grant_ref_t gref_id;
+};
+#endif /* __LINUX_PUBLIC_GNTALLOC_H__ */
--
1.7.2.3
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH 5/6] xen-gntalloc: Userspace grant allocation driver
2010-12-14 14:55 ` [PATCH 5/6] xen-gntalloc: Userspace grant allocation driver Daniel De Graaf
@ 2010-12-14 21:42 ` Jeremy Fitzhardinge
2010-12-14 22:06 ` Daniel De Graaf
0 siblings, 1 reply; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-14 21:42 UTC (permalink / raw)
To: Daniel De Graaf; +Cc: xen-devel, Ian.Campbell
On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
> This allows a userspace application to allocate a shared page for
> implementing inter-domain communication or device drivers. These
> shared pages can be mapped using the gntdev device or by the kernel
> in another domain.
>
> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> ---
> drivers/xen/Kconfig | 7 +
> drivers/xen/Makefile | 2 +
> drivers/xen/gntalloc.c | 456 ++++++++++++++++++++++++++++++++++++++++++++++++
> include/xen/gntalloc.h | 68 +++++++
> 4 files changed, 533 insertions(+), 0 deletions(-)
> create mode 100644 drivers/xen/gntalloc.c
> create mode 100644 include/xen/gntalloc.h
>
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index fa9982e..8398cb0 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -180,6 +180,13 @@ config XEN_GNTDEV
> help
> Allows userspace processes use grants.
>
> +config XEN_GRANT_DEV_ALLOC
> + tristate "User-space grant reference allocator driver"
> + depends on XEN
> + help
> + Allows userspace processes to create pages with access granted
> + to other domains.
> +
> config XEN_S3
> def_bool y
> depends on XEN_DOM0 && ACPI
> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
> index ef1ea63..9814c1d 100644
> --- a/drivers/xen/Makefile
> +++ b/drivers/xen/Makefile
> @@ -10,6 +10,7 @@ obj-$(CONFIG_XEN_XENCOMM) += xencomm.o
> obj-$(CONFIG_XEN_BALLOON) += balloon.o
> obj-$(CONFIG_XEN_DEV_EVTCHN) += xen-evtchn.o
> obj-$(CONFIG_XEN_GNTDEV) += xen-gntdev.o
> +obj-$(CONFIG_XEN_GRANT_DEV_ALLOC) += xen-gntalloc.o
> obj-$(CONFIG_XEN_PCIDEV_BACKEND) += pciback/
> obj-$(CONFIG_XEN_BLKDEV_BACKEND) += blkback/
> obj-$(CONFIG_XEN_BLKDEV_TAP) += blktap/
> @@ -25,3 +26,4 @@ obj-$(CONFIG_XEN_PLATFORM_PCI) += platform-pci.o
>
> xen-evtchn-y := evtchn.o
> xen-gntdev-y := gntdev.o
> +xen-gntalloc-y := gntalloc.o
> diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c
> new file mode 100644
> index 0000000..f26adfd
> --- /dev/null
> +++ b/drivers/xen/gntalloc.c
> @@ -0,0 +1,456 @@
> +/******************************************************************************
> + * gntalloc.c
> + *
> + * Device for creating grant references (in user-space) that may be shared
> + * with other domains.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> + */
> +
> +/*
> + * This driver exists to allow userspace programs in Linux to allocate kernel
> + * memory that will later be shared with another domain. Without this device,
> + * Linux userspace programs cannot create grant references.
> + *
> + * How this stuff works:
> + * X -> granting a page to Y
> + * Y -> mapping the grant from X
> + *
> + * 1. X uses the gntalloc device to allocate a page of kernel memory, P.
> + * 2. X creates an entry in the grant table that says domid(Y) can
> + * access P.
> + * 3. X gives the grant reference identifier, GREF, to Y.
> + * 4. A program in Y uses the gntdev device to map the page (owned by X
> + * and identified by GREF) into domain(Y) and then into the address
> + * space of the program. Behind the scenes, this requires a
> + * hypercall in which Xen modifies the host CPU page tables to
> + * perform the sharing -- that's where the actual cross-domain mapping
> + * occurs.
Presumably Y could be any grant-page user, not specifically gntdev? So
you could use this to implement a frontend in userspace?
> + * 5. A program in X mmap()s a segment of the gntalloc device that
> + * corresponds to the shared page.
> + * 6. The two userspace programs can now communicate over the shared page.
> + *
> + *
> + * NOTE TO USERSPACE LIBRARIES:
> + * The grant allocation and mmap()ing are, naturally, two separate
> + * operations. You set up the sharing by calling the create ioctl() and
> + * then the mmap(). You must tear down the sharing in the reverse order
> + * (munmap() and then the destroy ioctl()).
> + *
> + * WARNING: Since Xen does not allow a guest to forcibly end the use of a grant
> + * reference, this device can be used to consume kernel memory by leaving grant
> + * references mapped by another domain when an application exits. Therefore,
> + * there is a global limit on the number of pages that can be allocated. When
> + * all references to the page are unmapped, it will be freed during the next
> + * grant operation.
> + */
> +
> +#include <asm/atomic.h>
> +#include <linux/module.h>
> +#include <linux/miscdevice.h>
> +#include <linux/kernel.h>
> +#include <linux/init.h>
> +#include <linux/slab.h>
> +#include <linux/fs.h>
> +#include <linux/device.h>
> +#include <linux/mm.h>
> +#include <asm/uaccess.h>
> +#include <linux/types.h>
> +#include <linux/list.h>
> +
> +#include <xen/xen.h>
> +#include <xen/page.h>
> +#include <xen/grant_table.h>
> +#include <xen/gntalloc.h>
> +
> +static int debug = 0;
> +module_param(debug, int, 0644);
> +
> +static int limit = 1024;
> +module_param(limit, int, 0644);
> +
> +static LIST_HEAD(gref_list);
> +static DEFINE_SPINLOCK(gref_lock);
> +static int gref_size = 0;
> +
> +/* Metadata on a grant reference. */
> +struct gntalloc_gref {
> + struct list_head next_all; /* list entry gref_list */
> + struct list_head next_file; /* list entry file->list, if open */
> + domid_t foreign_domid; /* The ID of the domain to share with. */
> + grant_ref_t gref_id; /* The grant reference number. */
> + unsigned int users; /* Use count - when zero, waiting on Xen */
> + struct page* page; /* The shared page. */
> +};
> +
> +struct gntalloc_file_private_data {
> + struct list_head list;
> +};
> +
> +static void __del_gref(struct gntalloc_gref *gref);
> +
> +static void do_cleanup(void)
> +{
> + struct gntalloc_gref *gref, *n;
> + list_for_each_entry_safe(gref, n, &gref_list, next_all) {
> + if (!gref->users)
> + __del_gref(gref);
> + }
> +}
> +
> +
> +static int add_gref(domid_t foreign_domid, uint32_t readonly,
> + struct gntalloc_file_private_data *priv)
> +{
> + int rc;
> + struct gntalloc_gref *gref;
> +
> + rc = -ENOMEM;
> + spin_lock(&gref_lock);
> + do_cleanup();
> + if (gref_size >= limit) {
> + spin_unlock(&gref_lock);
> + rc = -ENOSPC;
> + goto out;
> + }
> + gref_size++;
> + spin_unlock(&gref_lock);
> +
> + gref = kzalloc(sizeof(*gref), GFP_KERNEL);
> + if (!gref)
> + goto out;
> +
> + gref->foreign_domid = foreign_domid;
> + gref->users = 1;
> +
> + /* Allocate the page to share. */
> + gref->page = alloc_page(GFP_KERNEL|__GFP_ZERO);
Could this be GFP_HIGHUSER?
> + if (!gref->page)
> + goto out_nopage;
> +
> + /* Grant foreign access to the page. */
> + gref->gref_id = gnttab_grant_foreign_access(foreign_domid,
> + pfn_to_mfn(page_to_pfn(gref->page)), readonly);
> + if (gref->gref_id < 0) {
> + printk(KERN_ERR "%s: failed to grant foreign access for mfn "
> + "%lu to domain %u\n", __func__,
> + pfn_to_mfn(page_to_pfn(gref->page)), foreign_domid);
> + rc = -EFAULT;
> + goto out_no_foreign_gref;
> + }
> +
> + /* Add to gref lists. */
> + spin_lock(&gref_lock);
> + list_add_tail(&gref->next_all, &gref_list);
> + list_add_tail(&gref->next_file, &priv->list);
> + spin_unlock(&gref_lock);
> +
> + return gref->gref_id;
> +
> +out_no_foreign_gref:
> + __free_page(gref->page);
> +out_nopage:
> + kfree(gref);
> +out:
> + return rc;
> +}
> +
> +static void __del_gref(struct gntalloc_gref *gref)
> +{
> + if (gnttab_query_foreign_access(gref->gref_id))
> + return;
> +
> + if (!gnttab_end_foreign_access_ref(gref->gref_id, 0))
> + return;
> +
> + gref_size--;
> + list_del(&gref->next_all);
> +
> + __free_page(gref->page);
> + kfree(gref);
> +}
> +
> +static struct gntalloc_gref* find_gref(struct gntalloc_file_private_data *priv,
> + grant_ref_t gref_id)
> +{
> + struct gntalloc_gref *gref;
> + list_for_each_entry(gref, &priv->list, next_file) {
> + if (gref->gref_id == gref_id)
> + return gref;
> + }
> + return NULL;
> +}
> +
> +/*
> + * -------------------------------------
> + * File operations.
> + * -------------------------------------
> + */
> +static int gntalloc_open(struct inode *inode, struct file *filp)
> +{
> + struct gntalloc_file_private_data *priv;
> +
> + try_module_get(THIS_MODULE);
> +
> + priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> + if (!priv)
> + goto out_nomem;
> + INIT_LIST_HEAD(&priv->list);
> +
> + filp->private_data = priv;
> +
> + if (debug)
> + printk("%s: priv %p\n", __FUNCTION__, priv);
> +
> + return 0;
> +
> +out_nomem:
> + return -ENOMEM;
> +}
> +
> +static int gntalloc_release(struct inode *inode, struct file *filp)
> +{
> + struct gntalloc_file_private_data *priv = filp->private_data;
> + struct gntalloc_gref *gref;
> +
> + if (debug)
> + printk("%s: priv %p\n", __FUNCTION__, priv);
> +
> + spin_lock(&gref_lock);
Presumably this is unnecessary because there can be no other users if
you're tearing down the list and destroying priv.
> + while (!list_empty(&priv->list)) {
> + gref = list_entry(priv->list.next,
> + struct gntalloc_gref, next_file);
> + list_del(&gref->next_file);
> + gref->users--;
> + if (gref->users == 0)
> + __del_gref(gref);
> + }
> + kfree(priv);
> + spin_unlock(&gref_lock);
> +
> + module_put(THIS_MODULE);
> +
> + return 0;
> +}
> +
> +static long gntalloc_ioctl_alloc(struct gntalloc_file_private_data *priv,
> + void __user *arg)
> +{
> + int rc = 0;
> + struct ioctl_gntalloc_alloc_gref op;
> +
> + if (debug)
> + printk("%s: priv %p\n", __FUNCTION__, priv);
> +
> + if (copy_from_user(&op, arg, sizeof(op))) {
> + rc = -EFAULT;
> + goto alloc_grant_out;
> + }
> + rc = add_gref(op.foreign_domid, op.readonly, priv);
> + if (rc < 0)
> + goto alloc_grant_out;
> +
> + op.gref_id = rc;
> + op.page_idx = rc;
Hm, see below.
> +
> + rc = 0;
> +
> + if (copy_to_user((void __user *)arg, &op, sizeof(op))) {
> + rc = -EFAULT;
> + goto alloc_grant_out;
> + }
> +
> +alloc_grant_out:
> + return rc;
> +}
> +
> +static long gntalloc_ioctl_dealloc(struct gntalloc_file_private_data *priv,
> + void __user *arg)
> +{
> + int rc = 0;
> + struct ioctl_gntalloc_dealloc_gref op;
> + struct gntalloc_gref *gref;
> +
> + if (debug)
> + printk("%s: priv %p\n", __FUNCTION__, priv);
> +
> + if (copy_from_user(&op, arg, sizeof(op))) {
> + rc = -EFAULT;
> + goto dealloc_grant_out;
> + }
> +
> + spin_lock(&gref_lock);
> + gref = find_gref(priv, op.gref_id);
> + if (gref) {
> + list_del(&gref->next_file);
> + gref->users--;
> + rc = 0;
> + } else {
> + rc = -EINVAL;
> + }
> +
> + do_cleanup();
> + spin_unlock(&gref_lock);
> +dealloc_grant_out:
> + return rc;
> +}
> +
> +static long gntalloc_ioctl(struct file *filp, unsigned int cmd,
> + unsigned long arg)
> +{
> + struct gntalloc_file_private_data *priv = filp->private_data;
> +
> + switch (cmd) {
> + case IOCTL_GNTALLOC_ALLOC_GREF:
> + return gntalloc_ioctl_alloc(priv, (void __user*)arg);
> +
> + case IOCTL_GNTALLOC_DEALLOC_GREF:
> + return gntalloc_ioctl_dealloc(priv, (void __user*)arg);
> +
> + default:
> + return -ENOIOCTLCMD;
> + }
> +
> + return 0;
> +}
> +
> +static int gntalloc_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> +{
> + struct gntalloc_gref *gref = vma->vm_private_data;
> + if (!gref)
> + return VM_FAULT_SIGBUS;
> +
> + vmf->page = gref->page;
> + get_page(vmf->page);
> +
> + return 0;
> +};
> +
> +static void gntalloc_vma_close(struct vm_area_struct *vma)
> +{
> + struct gntalloc_gref *gref = vma->vm_private_data;
> + if (!gref)
> + return;
> +
> + spin_lock(&gref_lock);
> + gref->users--;
> + if (gref->users == 0)
> + __del_gref(gref);
> + spin_unlock(&gref_lock);
> +}
> +
> +static struct vm_operations_struct gntalloc_vmops = {
> + .fault = gntalloc_vma_fault,
> + .close = gntalloc_vma_close,
> +};
> +
> +static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma)
> +{
> + struct gntalloc_file_private_data *priv = filp->private_data;
> + struct gntalloc_gref *gref;
> +
> + if (debug)
> + printk("%s: priv %p, page %lu\n", __func__,
> + priv, vma->vm_pgoff);
> +
> + /*
> + * There is a 1-to-1 correspondence of grant references to shared
> + * pages, so it only makes sense to map exactly one page per
> + * call to mmap().
> + */
Single-page mmap makes sense if the only possible use-cases are for
single-page mappings, but if you're talking about framebuffers and the
like it seems like a very awkward way to use mmap. It would be cleaner
from an API perspective to have a user-mode defined flat address space
indexed by pgoff which maps to an array of grefs, so you can sensibly do
a multi-page mapping.
It would also allow you to hide the grefs from usermode entirely. Then
its just up to usermode to choose suitable file offsets for itself.
> + if (((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) != 1) {
> + printk(KERN_ERR "%s: Only one page can be memory-mapped "
> + "per grant reference.\n", __func__);
> + return -EINVAL;
> + }
> +
> + if (!(vma->vm_flags & VM_SHARED)) {
> + printk(KERN_ERR "%s: Mapping must be shared.\n",
> + __func__);
> + return -EINVAL;
> + }
> +
> + spin_lock(&gref_lock);
> + gref = find_gref(priv, vma->vm_pgoff);
> + if (gref == NULL) {
> + spin_unlock(&gref_lock);
> + printk(KERN_ERR "%s: Could not find a grant reference with "
> + "page index %lu.\n", __func__, vma->vm_pgoff);
> + return -ENOENT;
> + }
> + gref->users++;
> + spin_unlock(&gref_lock);
> +
> + vma->vm_private_data = gref;
> +
> + /* This flag prevents Bad PTE errors when the memory is unmapped. */
> + vma->vm_flags |= VM_RESERVED;
> + vma->vm_flags |= VM_DONTCOPY;
> + vma->vm_flags |= VM_IO;
If you set VM_PFNMAP then you don't need to deal with faults.
> +
> + vma->vm_ops = &gntalloc_vmops;
> +
> + return 0;
> +}
> +
> +static const struct file_operations gntalloc_fops = {
> + .owner = THIS_MODULE,
> + .open = gntalloc_open,
> + .release = gntalloc_release,
> + .unlocked_ioctl = gntalloc_ioctl,
> + .mmap = gntalloc_mmap
> +};
> +
> +/*
> + * -------------------------------------
> + * Module creation/destruction.
> + * -------------------------------------
> + */
> +static struct miscdevice gntalloc_miscdev = {
> + .minor = MISC_DYNAMIC_MINOR,
> + .name = "xen/gntalloc",
> + .fops = &gntalloc_fops,
> +};
> +
> +static int __init gntalloc_init(void)
> +{
> + int err;
> +
> + if (!xen_domain()) {
> + if (debug)
> + printk(KERN_ERR "gntalloc: You must be running Xen\n");
> + return -ENODEV;
> + }
> +
> + err = misc_register(&gntalloc_miscdev);
> + if (err != 0) {
> + printk(KERN_ERR "Could not register misc gntalloc device\n");
> + return err;
> + }
> +
> + if (debug)
> + printk(KERN_INFO "Created grant allocation device at %d,%d\n",
> + MISC_MAJOR, gntalloc_miscdev.minor);
> +
> + return 0;
> +}
> +
> +static void __exit gntalloc_exit(void)
> +{
> + misc_deregister(&gntalloc_miscdev);
> +}
> +
> +module_init(gntalloc_init);
> +module_exit(gntalloc_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Carter Weatherly <carter.weatherly@jhuapl.edu>, "
> + "Daniel De Graaf <dgdegra@tycho.nsa.gov>");
> +MODULE_DESCRIPTION("User-space grant reference allocator driver");
> diff --git a/include/xen/gntalloc.h b/include/xen/gntalloc.h
> new file mode 100644
> index 0000000..76b70d7
> --- /dev/null
> +++ b/include/xen/gntalloc.h
> @@ -0,0 +1,68 @@
> +/******************************************************************************
> + * gntalloc.h
> + *
> + * Interface to /dev/xen/gntalloc.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version 2
> + * as published by the Free Software Foundation; or, when distributed
> + * separately from the Linux kernel or incorporated into other
> + * software packages, subject to the following license:
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this source file (the "Software"), to deal in the Software without
> + * restriction, including without limitation the rights to use, copy, modify,
> + * merge, publish, distribute, sublicense, and/or sell copies of the Software,
> + * and to permit persons to whom the Software is furnished to do so, subject to
> + * the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#ifndef __LINUX_PUBLIC_GNTALLOC_H__
> +#define __LINUX_PUBLIC_GNTALLOC_H__
> +
> +/*
> + * Allocates a new page and creates a new grant reference.
> + *
> + * N.B. The page_idx is really the address >> PAGE_SHIFT, meaning it's the
> + * page number and not an actual address. It must be shifted again prior
> + * to feeding it to mmap() (i.e. page_idx << PAGE_SHIFT).
> + */
> +#define IOCTL_GNTALLOC_ALLOC_GREF \
> +_IOC(_IOC_NONE, 'G', 1, sizeof(struct ioctl_gntalloc_alloc_gref))
> +struct ioctl_gntalloc_alloc_gref {
> + /* IN parameters */
> + /* The ID of the domain creating the grant reference. */
> + domid_t owner_domid;
> + /* The ID of the domain to be given access to the grant. */
> + domid_t foreign_domid;
> + /* The type of access given to domid. */
> + uint32_t readonly;
> + /* OUT parameters */
> + /* The grant reference of the newly created grant. */
> + grant_ref_t gref_id;
> + /* The page index (page number, NOT address) for grant mmap(). */
> + uint32_t page_idx;
> +};
> +
> +/*
> + * Deallocates the grant reference, freeing the associated page.
> + */
> +#define IOCTL_GNTALLOC_DEALLOC_GREF \
> +_IOC(_IOC_NONE, 'G', 2, sizeof(struct ioctl_gntalloc_dealloc_gref))
> +struct ioctl_gntalloc_dealloc_gref {
> + /* IN parameter */
> + /* The grant reference to deallocate. */
> + grant_ref_t gref_id;
> +};
> +#endif /* __LINUX_PUBLIC_GNTALLOC_H__ */
J
^ permalink raw reply [flat|nested] 32+ messages in thread

* Re: [PATCH 5/6] xen-gntalloc: Userspace grant allocation driver
2010-12-14 21:42 ` Jeremy Fitzhardinge
@ 2010-12-14 22:06 ` Daniel De Graaf
2010-12-14 22:40 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 32+ messages in thread
From: Daniel De Graaf @ 2010-12-14 22:06 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: xen-devel, Ian.Campbell
On 12/14/2010 04:42 PM, Jeremy Fitzhardinge wrote:
> On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
>> This allows a userspace application to allocate a shared page for
>> implementing inter-domain communication or device drivers. These
>> shared pages can be mapped using the gntdev device or by the kernel
>> in another domain.
>>
>> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
>> ---
>> drivers/xen/Kconfig | 7 +
>> drivers/xen/Makefile | 2 +
>> drivers/xen/gntalloc.c | 456 ++++++++++++++++++++++++++++++++++++++++++++++++
>> include/xen/gntalloc.h | 68 +++++++
>> 4 files changed, 533 insertions(+), 0 deletions(-)
>> create mode 100644 drivers/xen/gntalloc.c
>> create mode 100644 include/xen/gntalloc.h
>>
>> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
>> index fa9982e..8398cb0 100644
>> --- a/drivers/xen/Kconfig
>> +++ b/drivers/xen/Kconfig
>> @@ -180,6 +180,13 @@ config XEN_GNTDEV
>> help
>> Allows userspace processes use grants.
>>
>> +config XEN_GRANT_DEV_ALLOC
>> + tristate "User-space grant reference allocator driver"
>> + depends on XEN
>> + help
>> + Allows userspace processes to create pages with access granted
>> + to other domains.
>> +
>> config XEN_S3
>> def_bool y
>> depends on XEN_DOM0 && ACPI
>> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
>> index ef1ea63..9814c1d 100644
>> --- a/drivers/xen/Makefile
>> +++ b/drivers/xen/Makefile
>> @@ -10,6 +10,7 @@ obj-$(CONFIG_XEN_XENCOMM) += xencomm.o
>> obj-$(CONFIG_XEN_BALLOON) += balloon.o
>> obj-$(CONFIG_XEN_DEV_EVTCHN) += xen-evtchn.o
>> obj-$(CONFIG_XEN_GNTDEV) += xen-gntdev.o
>> +obj-$(CONFIG_XEN_GRANT_DEV_ALLOC) += xen-gntalloc.o
>> obj-$(CONFIG_XEN_PCIDEV_BACKEND) += pciback/
>> obj-$(CONFIG_XEN_BLKDEV_BACKEND) += blkback/
>> obj-$(CONFIG_XEN_BLKDEV_TAP) += blktap/
>> @@ -25,3 +26,4 @@ obj-$(CONFIG_XEN_PLATFORM_PCI) += platform-pci.o
>>
>> xen-evtchn-y := evtchn.o
>> xen-gntdev-y := gntdev.o
>> +xen-gntalloc-y := gntalloc.o
>> diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c
>> new file mode 100644
>> index 0000000..f26adfd
>> --- /dev/null
>> +++ b/drivers/xen/gntalloc.c
>> @@ -0,0 +1,456 @@
>> +/******************************************************************************
>> + * gntalloc.c
>> + *
>> + * Device for creating grant references (in user-space) that may be shared
>> + * with other domains.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
>> + */
>> +
>> +/*
>> + * This driver exists to allow userspace programs in Linux to allocate kernel
>> + * memory that will later be shared with another domain. Without this device,
>> + * Linux userspace programs cannot create grant references.
>> + *
>> + * How this stuff works:
>> + * X -> granting a page to Y
>> + * Y -> mapping the grant from X
>> + *
>> + * 1. X uses the gntalloc device to allocate a page of kernel memory, P.
>> + * 2. X creates an entry in the grant table that says domid(Y) can
>> + * access P.
>> + * 3. X gives the grant reference identifier, GREF, to Y.
>> + * 4. A program in Y uses the gntdev device to map the page (owned by X
>> + * and identified by GREF) into domain(Y) and then into the address
>> + * space of the program. Behind the scenes, this requires a
>> + * hypercall in which Xen modifies the host CPU page tables to
>> + * perform the sharing -- that's where the actual cross-domain mapping
>> + * occurs.
>
> Presumably Y could be any grant-page user, not specifically gntdev? So
> you could use this to implement a frontend in userspace?
Yes, this is correct; the use of gntdev here was intended as an example,
not as the only user.
>> + * 5. A program in X mmap()s a segment of the gntalloc device that
>> + * corresponds to the shared page.
>> + * 6. The two userspace programs can now communicate over the shared page.
>> + *
>> + *
>> + * NOTE TO USERSPACE LIBRARIES:
>> + * The grant allocation and mmap()ing are, naturally, two separate
>> + * operations. You set up the sharing by calling the create ioctl() and
>> + * then the mmap(). You must tear down the sharing in the reverse order
>> + * (munmap() and then the destroy ioctl()).
>> + *
>> + * WARNING: Since Xen does not allow a guest to forcibly end the use of a grant
>> + * reference, this device can be used to consume kernel memory by leaving grant
>> + * references mapped by another domain when an application exits. Therefore,
>> + * there is a global limit on the number of pages that can be allocated. When
>> + * all references to the page are unmapped, it will be freed during the next
>> + * grant operation.
>> + */
>> +
>> +#include <asm/atomic.h>
>> +#include <linux/module.h>
>> +#include <linux/miscdevice.h>
>> +#include <linux/kernel.h>
>> +#include <linux/init.h>
>> +#include <linux/slab.h>
>> +#include <linux/fs.h>
>> +#include <linux/device.h>
>> +#include <linux/mm.h>
>> +#include <asm/uaccess.h>
>> +#include <linux/types.h>
>> +#include <linux/list.h>
>> +
>> +#include <xen/xen.h>
>> +#include <xen/page.h>
>> +#include <xen/grant_table.h>
>> +#include <xen/gntalloc.h>
>> +
>> +static int debug = 0;
>> +module_param(debug, int, 0644);
>> +
>> +static int limit = 1024;
>> +module_param(limit, int, 0644);
>> +
>> +static LIST_HEAD(gref_list);
>> +static DEFINE_SPINLOCK(gref_lock);
>> +static int gref_size = 0;
>> +
>> +/* Metadata on a grant reference. */
>> +struct gntalloc_gref {
>> + struct list_head next_all; /* list entry gref_list */
>> + struct list_head next_file; /* list entry file->list, if open */
>> + domid_t foreign_domid; /* The ID of the domain to share with. */
>> + grant_ref_t gref_id; /* The grant reference number. */
>> + unsigned int users; /* Use count - when zero, waiting on Xen */
>> + struct page* page; /* The shared page. */
>> +};
>> +
>> +struct gntalloc_file_private_data {
>> + struct list_head list;
>> +};
>> +
>> +static void __del_gref(struct gntalloc_gref *gref);
>> +
>> +static void do_cleanup(void)
>> +{
>> + struct gntalloc_gref *gref, *n;
>> + list_for_each_entry_safe(gref, n, &gref_list, next_all) {
>> + if (!gref->users)
>> + __del_gref(gref);
>> + }
>> +}
>> +
>> +
>> +static int add_gref(domid_t foreign_domid, uint32_t readonly,
>> + struct gntalloc_file_private_data *priv)
>> +{
>> + int rc;
>> + struct gntalloc_gref *gref;
>> +
>> + rc = -ENOMEM;
>> + spin_lock(&gref_lock);
>> + do_cleanup();
>> + if (gref_size >= limit) {
>> + spin_unlock(&gref_lock);
>> + rc = -ENOSPC;
>> + goto out;
>> + }
>> + gref_size++;
>> + spin_unlock(&gref_lock);
>> +
>> + gref = kzalloc(sizeof(*gref), GFP_KERNEL);
>> + if (!gref)
>> + goto out;
>> +
>> + gref->foreign_domid = foreign_domid;
>> + gref->users = 1;
>> +
>> + /* Allocate the page to share. */
>> + gref->page = alloc_page(GFP_KERNEL|__GFP_ZERO);
>
> Could this be GFP_HIGHUSER?
Yes, that's probably a better flag to use here.
>> + if (!gref->page)
>> + goto out_nopage;
>> +
>> + /* Grant foreign access to the page. */
>> + gref->gref_id = gnttab_grant_foreign_access(foreign_domid,
>> + pfn_to_mfn(page_to_pfn(gref->page)), readonly);
>> + if (gref->gref_id < 0) {
>> + printk(KERN_ERR "%s: failed to grant foreign access for mfn "
>> + "%lu to domain %u\n", __func__,
>> + pfn_to_mfn(page_to_pfn(gref->page)), foreign_domid);
>> + rc = -EFAULT;
>> + goto out_no_foreign_gref;
>> + }
>> +
>> + /* Add to gref lists. */
>> + spin_lock(&gref_lock);
>> + list_add_tail(&gref->next_all, &gref_list);
>> + list_add_tail(&gref->next_file, &priv->list);
>> + spin_unlock(&gref_lock);
>> +
>> + return gref->gref_id;
>> +
>> +out_no_foreign_gref:
>> + __free_page(gref->page);
>> +out_nopage:
>> + kfree(gref);
>> +out:
>> + return rc;
>> +}
>> +
>> +static void __del_gref(struct gntalloc_gref *gref)
>> +{
>> + if (gnttab_query_foreign_access(gref->gref_id))
>> + return;
>> +
>> + if (!gnttab_end_foreign_access_ref(gref->gref_id, 0))
>> + return;
>> +
>> + gref_size--;
>> + list_del(&gref->next_all);
>> +
>> + __free_page(gref->page);
>> + kfree(gref);
>> +}
>> +
>> +static struct gntalloc_gref* find_gref(struct gntalloc_file_private_data *priv,
>> + grant_ref_t gref_id)
>> +{
>> + struct gntalloc_gref *gref;
>> + list_for_each_entry(gref, &priv->list, next_file) {
>> + if (gref->gref_id == gref_id)
>> + return gref;
>> + }
>> + return NULL;
>> +}
>> +
>> +/*
>> + * -------------------------------------
>> + * File operations.
>> + * -------------------------------------
>> + */
>> +static int gntalloc_open(struct inode *inode, struct file *filp)
>> +{
>> + struct gntalloc_file_private_data *priv;
>> +
>> + try_module_get(THIS_MODULE);
>> +
>> + priv = kzalloc(sizeof(*priv), GFP_KERNEL);
>> + if (!priv)
>> + goto out_nomem;
>> + INIT_LIST_HEAD(&priv->list);
>> +
>> + filp->private_data = priv;
>> +
>> + if (debug)
>> + printk("%s: priv %p\n", __FUNCTION__, priv);
>> +
>> + return 0;
>> +
>> +out_nomem:
>> + return -ENOMEM;
>> +}
>> +
>> +static int gntalloc_release(struct inode *inode, struct file *filp)
>> +{
>> + struct gntalloc_file_private_data *priv = filp->private_data;
>> + struct gntalloc_gref *gref;
>> +
>> + if (debug)
>> + printk("%s: priv %p\n", __FUNCTION__, priv);
>> +
>> + spin_lock(&gref_lock);
>
> Presumably this is unnecessary because there can be no other users if
> you're tearing down the list and destroying priv.
>
>> + while (!list_empty(&priv->list)) {
>> + gref = list_entry(priv->list.next,
>> + struct gntalloc_gref, next_file);
>> + list_del(&gref->next_file);
>> + gref->users--;
>> + if (gref->users == 0)
>> + __del_gref(gref);
>> + }
>> + kfree(priv);
>> + spin_unlock(&gref_lock);
>> +
>> + module_put(THIS_MODULE);
>> +
>> + return 0;
>> +}
>> +
>> +static long gntalloc_ioctl_alloc(struct gntalloc_file_private_data *priv,
>> + void __user *arg)
>> +{
>> + int rc = 0;
>> + struct ioctl_gntalloc_alloc_gref op;
>> +
>> + if (debug)
>> + printk("%s: priv %p\n", __FUNCTION__, priv);
>> +
>> + if (copy_from_user(&op, arg, sizeof(op))) {
>> + rc = -EFAULT;
>> + goto alloc_grant_out;
>> + }
>> + rc = add_gref(op.foreign_domid, op.readonly, priv);
>> + if (rc < 0)
>> + goto alloc_grant_out;
>> +
>> + op.gref_id = rc;
>> + op.page_idx = rc;
>
> Hm, see below.
>
>> +
>> + rc = 0;
>> +
>> + if (copy_to_user((void __user *)arg, &op, sizeof(op))) {
>> + rc = -EFAULT;
>> + goto alloc_grant_out;
>> + }
>> +
>> +alloc_grant_out:
>> + return rc;
>> +}
>> +
>> +static long gntalloc_ioctl_dealloc(struct gntalloc_file_private_data *priv,
>> + void __user *arg)
>> +{
>> + int rc = 0;
>> + struct ioctl_gntalloc_dealloc_gref op;
>> + struct gntalloc_gref *gref;
>> +
>> + if (debug)
>> + printk("%s: priv %p\n", __FUNCTION__, priv);
>> +
>> + if (copy_from_user(&op, arg, sizeof(op))) {
>> + rc = -EFAULT;
>> + goto dealloc_grant_out;
>> + }
>> +
>> + spin_lock(&gref_lock);
>> + gref = find_gref(priv, op.gref_id);
>> + if (gref) {
>> + list_del(&gref->next_file);
>> + gref->users--;
>> + rc = 0;
>> + } else {
>> + rc = -EINVAL;
>> + }
>> +
>> + do_cleanup();
>> + spin_unlock(&gref_lock);
>> +dealloc_grant_out:
>> + return rc;
>> +}
>> +
>> +static long gntalloc_ioctl(struct file *filp, unsigned int cmd,
>> + unsigned long arg)
>> +{
>> + struct gntalloc_file_private_data *priv = filp->private_data;
>> +
>> + switch (cmd) {
>> + case IOCTL_GNTALLOC_ALLOC_GREF:
>> + return gntalloc_ioctl_alloc(priv, (void __user*)arg);
>> +
>> + case IOCTL_GNTALLOC_DEALLOC_GREF:
>> + return gntalloc_ioctl_dealloc(priv, (void __user*)arg);
>> +
>> + default:
>> + return -ENOIOCTLCMD;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int gntalloc_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>> +{
>> + struct gntalloc_gref *gref = vma->vm_private_data;
>> + if (!gref)
>> + return VM_FAULT_SIGBUS;
>> +
>> + vmf->page = gref->page;
>> + get_page(vmf->page);
>> +
>> + return 0;
>> +};
>> +
>> +static void gntalloc_vma_close(struct vm_area_struct *vma)
>> +{
>> + struct gntalloc_gref *gref = vma->vm_private_data;
>> + if (!gref)
>> + return;
>> +
>> + spin_lock(&gref_lock);
>> + gref->users--;
>> + if (gref->users == 0)
>> + __del_gref(gref);
>> + spin_unlock(&gref_lock);
>> +}
>> +
>> +static struct vm_operations_struct gntalloc_vmops = {
>> + .fault = gntalloc_vma_fault,
>> + .close = gntalloc_vma_close,
>> +};
>> +
>> +static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma)
>> +{
>> + struct gntalloc_file_private_data *priv = filp->private_data;
>> + struct gntalloc_gref *gref;
>> +
>> + if (debug)
>> + printk("%s: priv %p, page %lu\n", __func__,
>> + priv, vma->vm_pgoff);
>> +
>> + /*
>> + * There is a 1-to-1 correspondence of grant references to shared
>> + * pages, so it only makes sense to map exactly one page per
>> + * call to mmap().
>> + */
>
> Single-page mmap makes sense if the only possible use-cases are for
> single-page mappings, but if you're talking about framebuffers and the
> like it seems like a very awkward way to use mmap. It would be cleaner
> from an API perspective to have a user-mode defined flat address space
> indexed by pgoff which maps to an array of grefs, so you can sensibly do
> a multi-page mapping.
>
> It would also allow you to hide the grefs from usermode entirely. Then
> its just up to usermode to choose suitable file offsets for itself.
I considered this, but wanted to keep userspace compatibility with the
previously created interface. If there's no reason to avoid doing so, I'll
change the ioctl interface to allocate an array of grants and calculate the
offset similar to how gntdev does currently (picks a suitable open slot).
Userspace does still have to know about grefs, of course, but only to pass
to the domain doing the mapping, not for its own mmap().
>> + if (((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) != 1) {
>> + printk(KERN_ERR "%s: Only one page can be memory-mapped "
>> + "per grant reference.\n", __func__);
>> + return -EINVAL;
>> + }
>> +
>> + if (!(vma->vm_flags & VM_SHARED)) {
>> + printk(KERN_ERR "%s: Mapping must be shared.\n",
>> + __func__);
>> + return -EINVAL;
>> + }
>> +
>> + spin_lock(&gref_lock);
>> + gref = find_gref(priv, vma->vm_pgoff);
>> + if (gref == NULL) {
>> + spin_unlock(&gref_lock);
>> + printk(KERN_ERR "%s: Could not find a grant reference with "
>> + "page index %lu.\n", __func__, vma->vm_pgoff);
>> + return -ENOENT;
>> + }
>> + gref->users++;
>> + spin_unlock(&gref_lock);
>> +
>> + vma->vm_private_data = gref;
>> +
>> + /* This flag prevents Bad PTE errors when the memory is unmapped. */
>> + vma->vm_flags |= VM_RESERVED;
>> + vma->vm_flags |= VM_DONTCOPY;
>> + vma->vm_flags |= VM_IO;
>
> If you set VM_PFNMAP then you don't need to deal with faults.
Useful to know. Is that more efficient/preferred to defining a
.fault handler? I used this method because that's what is used in
kernel/relay.c.
>> +
>> + vma->vm_ops = &gntalloc_vmops;
>> +
>> + return 0;
>> +}
>> +
>> +static const struct file_operations gntalloc_fops = {
>> + .owner = THIS_MODULE,
>> + .open = gntalloc_open,
>> + .release = gntalloc_release,
>> + .unlocked_ioctl = gntalloc_ioctl,
>> + .mmap = gntalloc_mmap
>> +};
>> +
>> +/*
>> + * -------------------------------------
>> + * Module creation/destruction.
>> + * -------------------------------------
>> + */
>> +static struct miscdevice gntalloc_miscdev = {
>> + .minor = MISC_DYNAMIC_MINOR,
>> + .name = "xen/gntalloc",
>> + .fops = &gntalloc_fops,
>> +};
>> +
>> +static int __init gntalloc_init(void)
>> +{
>> + int err;
>> +
>> + if (!xen_domain()) {
>> + if (debug)
>> + printk(KERN_ERR "gntalloc: You must be running Xen\n");
>> + return -ENODEV;
>> + }
>> +
>> + err = misc_register(&gntalloc_miscdev);
>> + if (err != 0) {
>> + printk(KERN_ERR "Could not register misc gntalloc device\n");
>> + return err;
>> + }
>> +
>> + if (debug)
>> + printk(KERN_INFO "Created grant allocation device at %d,%d\n",
>> + MISC_MAJOR, gntalloc_miscdev.minor);
>> +
>> + return 0;
>> +}
>> +
>> +static void __exit gntalloc_exit(void)
>> +{
>> + misc_deregister(&gntalloc_miscdev);
>> +}
>> +
>> +module_init(gntalloc_init);
>> +module_exit(gntalloc_exit);
>> +
>> +MODULE_LICENSE("GPL");
>> +MODULE_AUTHOR("Carter Weatherly <carter.weatherly@jhuapl.edu>, "
>> + "Daniel De Graaf <dgdegra@tycho.nsa.gov>");
>> +MODULE_DESCRIPTION("User-space grant reference allocator driver");
>> diff --git a/include/xen/gntalloc.h b/include/xen/gntalloc.h
>> new file mode 100644
>> index 0000000..76b70d7
>> --- /dev/null
>> +++ b/include/xen/gntalloc.h
>> @@ -0,0 +1,68 @@
>> +/******************************************************************************
>> + * gntalloc.h
>> + *
>> + * Interface to /dev/xen/gntalloc.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License version 2
>> + * as published by the Free Software Foundation; or, when distributed
>> + * separately from the Linux kernel or incorporated into other
>> + * software packages, subject to the following license:
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>> + * of this source file (the "Software"), to deal in the Software without
>> + * restriction, including without limitation the rights to use, copy, modify,
>> + * merge, publish, distribute, sublicense, and/or sell copies of the Software,
>> + * and to permit persons to whom the Software is furnished to do so, subject to
>> + * the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
>> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
>> + * IN THE SOFTWARE.
>> + */
>> +
>> +#ifndef __LINUX_PUBLIC_GNTALLOC_H__
>> +#define __LINUX_PUBLIC_GNTALLOC_H__
>> +
>> +/*
>> + * Allocates a new page and creates a new grant reference.
>> + *
>> + * N.B. The page_idx is really the address >> PAGE_SHIFT, meaning it's the
>> + * page number and not an actual address. It must be shifted again prior
>> + * to feeding it to mmap() (i.e. page_idx << PAGE_SHIFT).
>> + */
>> +#define IOCTL_GNTALLOC_ALLOC_GREF \
>> +_IOC(_IOC_NONE, 'G', 1, sizeof(struct ioctl_gntalloc_alloc_gref))
>> +struct ioctl_gntalloc_alloc_gref {
>> + /* IN parameters */
>> + /* The ID of the domain creating the grant reference. */
>> + domid_t owner_domid;
>> + /* The ID of the domain to be given access to the grant. */
>> + domid_t foreign_domid;
>> + /* The type of access given to domid. */
>> + uint32_t readonly;
>> + /* OUT parameters */
>> + /* The grant reference of the newly created grant. */
>> + grant_ref_t gref_id;
>> + /* The page index (page number, NOT address) for grant mmap(). */
>> + uint32_t page_idx;
>> +};
>> +
>> +/*
>> + * Deallocates the grant reference, freeing the associated page.
>> + */
>> +#define IOCTL_GNTALLOC_DEALLOC_GREF \
>> +_IOC(_IOC_NONE, 'G', 2, sizeof(struct ioctl_gntalloc_dealloc_gref))
>> +struct ioctl_gntalloc_dealloc_gref {
>> + /* IN parameter */
>> + /* The grant reference to deallocate. */
>> + grant_ref_t gref_id;
>> +};
>> +#endif /* __LINUX_PUBLIC_GNTALLOC_H__ */
>
> J
>
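A note for userspace consumers of the quoted header: IOCTL_GNTALLOC_ALLOC_GREF returns a page *index*, so it must be shifted back into a byte offset before being handed to mmap(), exactly as the header comment warns. A minimal sketch (assuming 4 KiB pages; the helper name is invented for illustration and nothing here is part of the patch):

```c
#include <assert.h>
#include <sys/types.h>

/* PAGE_SHIFT is assumed to be 12 (4 KiB pages) for this sketch. */
#define EXAMPLE_PAGE_SHIFT 12

/* Convert the page_idx from the ALLOC_GREF ioctl into an mmap() offset. */
static off_t gref_mmap_offset(unsigned long page_idx)
{
	return (off_t)page_idx << EXAMPLE_PAGE_SHIFT;
}

/* Intended call sequence (illustrative only, not compiled here):
 *
 *   struct ioctl_gntalloc_alloc_gref op = { ... };
 *   ioctl(fd, IOCTL_GNTALLOC_ALLOC_GREF, &op);
 *   void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
 *                  fd, gref_mmap_offset(op.page_idx));
 */
```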
* Re: [PATCH 5/6] xen-gntalloc: Userspace grant allocation driver
2010-12-14 22:06 ` Daniel De Graaf
@ 2010-12-14 22:40 ` Jeremy Fitzhardinge
2010-12-15 14:18 ` Daniel De Graaf
0 siblings, 1 reply; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-14 22:40 UTC (permalink / raw)
To: Daniel De Graaf; +Cc: xen-devel, Ian.Campbell
On 12/14/2010 02:06 PM, Daniel De Graaf wrote:
>>> +static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma)
>>> +{
>>> + struct gntalloc_file_private_data *priv = filp->private_data;
>>> + struct gntalloc_gref *gref;
>>> +
>>> + if (debug)
>>> + printk("%s: priv %p, page %lu\n", __func__,
>>> + priv, vma->vm_pgoff);
>>> +
>>> + /*
>>> + * There is a 1-to-1 correspondence of grant references to shared
>>> + * pages, so it only makes sense to map exactly one page per
>>> + * call to mmap().
>>> + */
>> Single-page mmap makes sense if the only possible use-cases are for
>> single-page mappings, but if you're talking about framebuffers and the
>> like it seems like a very awkward way to use mmap. It would be cleaner
>> from an API perspective to have a user-mode defined flat address space
>> indexed by pgoff which maps to an array of grefs, so you can sensibly do
>> a multi-page mapping.
>>
>> It would also allow you to hide the grefs from usermode entirely. Then
>> it's just up to usermode to choose suitable file offsets for itself.
> I considered this, but wanted to keep userspace compatibility with the
> previously created interface.
Is that private to you, or something in broader use?
> If there's no reason to avoid doing so, I'll
> change the ioctl interface to allocate an array of grants and calculate the
> offset similar to how gntdev does currently (picks a suitable open slot).
I guess there's three options: you could get the kernel to allocate
extents, make usermode do it, or have one fd per extent and always start
from offset 0. I guess the last could get very messy if you want to
have lots of mappings... Making usermode define the offsets seems
simplest and most flexible, because then they can stitch together the
file-offset space in any way that's convenient to them (you just need to
deal with overlaps in that space).
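The "deal with overlaps" bookkeeping is the main cost of letting usermode pick offsets. A toy userspace model of the check (names invented for illustration; the real kernel code would hang these extents off the per-file private data and take a lock around the search):

```c
#include <assert.h>
#include <stddef.h>

/* Each granted extent claims [pgoff, pgoff + count) in the
 * file-offset space of the device fd. */
struct extent {
	unsigned long pgoff;
	unsigned long count;
};

static int ranges_overlap(unsigned long a_off, unsigned long a_cnt,
			  unsigned long b_off, unsigned long b_cnt)
{
	return a_off < b_off + b_cnt && b_off < a_off + a_cnt;
}

/* Before honoring a user-chosen offset, reject any collision with an
 * existing claim.  Returns 0 if the requested range is free, -1 if not. */
static int claim_extent(const struct extent *used, size_t n,
			unsigned long pgoff, unsigned long count)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (ranges_overlap(used[i].pgoff, used[i].count,
				   pgoff, count))
			return -1;
	return 0;
}
```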
> Userspace does still have to know about grefs, of course, but only to pass
> to the domain doing the mapping, not for its own mmap().
Ah, yes, of course.
>>> + if (((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) != 1) {
>>> + printk(KERN_ERR "%s: Only one page can be memory-mapped "
>>> + "per grant reference.\n", __func__);
>>> + return -EINVAL;
>>> + }
>>> +
>>> + if (!(vma->vm_flags & VM_SHARED)) {
>>> + printk(KERN_ERR "%s: Mapping must be shared.\n",
>>> + __func__);
>>> + return -EINVAL;
>>> + }
>>> +
>>> + spin_lock(&gref_lock);
>>> + gref = find_gref(priv, vma->vm_pgoff);
>>> + if (gref == NULL) {
>>> + spin_unlock(&gref_lock);
>>> + printk(KERN_ERR "%s: Could not find a grant reference with "
>>> + "page index %lu.\n", __func__, vma->vm_pgoff);
>>> + return -ENOENT;
>>> + }
>>> + gref->users++;
>>> + spin_unlock(&gref_lock);
>>> +
>>> + vma->vm_private_data = gref;
>>> +
>>> + /* This flag prevents Bad PTE errors when the memory is unmapped. */
>>> + vma->vm_flags |= VM_RESERVED;
>>> + vma->vm_flags |= VM_DONTCOPY;
>>> + vma->vm_flags |= VM_IO;
>> If you set VM_PFNMAP then you don't need to deal with faults.
> Useful to know. Is that more efficient/preferred to defining a
> .fault handler? I used this method because that's what is used in
> kernel/relay.c.
Well, as you currently have it, your mmap() function doesn't map
anything, so you're relying on demand faulting to populate the ptes.
Since you've already allocated the pages that's just a soft fault, but
it means you end up with a lot of per-page entries into the hypervisor.
If you make mmap pre-populate all the ptes (a nice big fat batch
operation), then you should never get faults on the vma. You can set
PFNMAP to make sure of that (since you're already setting all the
"woowoo vma" flags, that makes sense).
Its actual meaning is "this vma contains pages which are not really
kernel memory, so paging them doesn't make sense" - ie, device or
foreign memory (we use it in gntdev).
In this case, the pages are normal kernel pages but they're being given
over to a "device" and so are no longer subject to normal kernel
lifetime rules. So I think PFNMAP makes sense in this case too.
J
* Re: [PATCH 5/6] xen-gntalloc: Userspace grant allocation driver
2010-12-14 22:40 ` Jeremy Fitzhardinge
@ 2010-12-15 14:18 ` Daniel De Graaf
2010-12-16 1:05 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 32+ messages in thread
From: Daniel De Graaf @ 2010-12-15 14:18 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: xen-devel, Ian.Campbell
On 12/14/2010 05:40 PM, Jeremy Fitzhardinge wrote:
> On 12/14/2010 02:06 PM, Daniel De Graaf wrote:
>>>> +static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma)
>>>> +{
>>>> + struct gntalloc_file_private_data *priv = filp->private_data;
>>>> + struct gntalloc_gref *gref;
>>>> +
>>>> + if (debug)
>>>> + printk("%s: priv %p, page %lu\n", __func__,
>>>> + priv, vma->vm_pgoff);
>>>> +
>>>> + /*
>>>> + * There is a 1-to-1 correspondence of grant references to shared
>>>> + * pages, so it only makes sense to map exactly one page per
>>>> + * call to mmap().
>>>> + */
>>> Single-page mmap makes sense if the only possible use-cases are for
>>> single-page mappings, but if you're talking about framebuffers and the
>>> like it seems like a very awkward way to use mmap. It would be cleaner
>>> from an API perspective to have a user-mode defined flat address space
>>> indexed by pgoff which maps to an array of grefs, so you can sensibly do
>>> a multi-page mapping.
>>>
>>> It would also allow you to hide the grefs from usermode entirely. Then
>>> it's just up to usermode to choose suitable file offsets for itself.
>> I considered this, but wanted to keep userspace compatibility with the
>> previously created interface.
>
> Is that private to you, or something in broader use?
This module was used as part of Qubes (http://www.qubes-os.org). The device
path has changed (/dev/gntalloc to /dev/xen/gntalloc), and the API change
adds useful functionality, so I don't think we must keep compatibility. This
will also allow cleaning up the interface to remove parameters that make no
sense (owner_domid, for example).
>> If there's no reason to avoid doing so, I'll
>> change the ioctl interface to allocate an array of grants and calculate the
>> offset similar to how gntdev does currently (picks a suitable open slot).
>
> I guess there's three options: you could get the kernel to allocate
> extents, make usermode do it, or have one fd per extent and always start
> from offset 0. I guess the last could get very messy if you want to
> have lots of mappings... Making usermode define the offsets seems
> simplest and most flexible, because then they can stitch together the
> file-offset space in any way that's convenient to them (you just need to
> deal with overlaps in that space).
Would it be useful to also give userspace control over the offsets in gntdev?
One argument for doing it in the kernel is to avoid needing to track what
offsets are already being used (and then having the kernel re-check that).
While this isn't hard, IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR only exists in
order to relieve userspace of the need to track its mappings, so this
seems to have been a concern before.
Another use case of gntalloc that may prove useful is to have more than
one application able to map the same grant within the kernel. For gntdev,
this can be done by mapping the pages multiple times (although that may
not be the most efficient). Moving the gntalloc lists out of the file
private data would allow any user of gntalloc to map any shared page. The
primary reason not to do this is that it prevents automatic cleanup of
allocated pages on close (which is important when the userspace app doing
the mapping crashes).
>> Userspace does still have to know about grefs, of course, but only to pass
>> to the domain doing the mapping, not for its own mmap().
>
> Ah, yes, of course.
>
>>>> +
>>>> + /* This flag prevents Bad PTE errors when the memory is unmapped. */
>>>> + vma->vm_flags |= VM_RESERVED;
>>>> + vma->vm_flags |= VM_DONTCOPY;
>>>> + vma->vm_flags |= VM_IO;
>>> If you set VM_PFNMAP then you don't need to deal with faults.
>> Useful to know. Is that more efficient/preferred to defining a
>> .fault handler? I used this method because that's what is used in
>> kernel/relay.c.
>
> Well, as you currently have it, your mmap() function doesn't map
> anything, so you're relying on demand faulting to populate the ptes.
> Since you've already allocated the pages that's just a soft fault, but
> it means you end up with a lot of per-page entries into the hypervisor.
>
> If you make mmap pre-populate all the ptes (a nice big fat batch
> operation), then you should never get faults on the vma. You can set
> PFNMAP to make sure of that (since you're already setting all the
> "woowoo vma" flags, that makes sense).
>
> Its actual meaning is "this vma contains pages which are not really
> kernel memory, so paging them doesn't make sense" - ie, device or
> foreign memory (we use it in gntdev).
>
> In this case, the pages are normal kernel pages but they're being given
> over to a "device" and so are no longer subject to normal kernel
> lifetime rules. So I think PFNMAP makes sense in this case too.
>
>
> J
>
Agreed; once mapped, the frame numbers (GFN & MFN) won't change until
they are unmapped, so pre-populating them will be better.
--
Daniel De Graaf
National Security Agency
* Re: [PATCH 5/6] xen-gntalloc: Userspace grant allocation driver
2010-12-15 14:18 ` Daniel De Graaf
@ 2010-12-16 1:05 ` Jeremy Fitzhardinge
2010-12-16 15:22 ` Daniel De Graaf
0 siblings, 1 reply; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-16 1:05 UTC (permalink / raw)
To: Daniel De Graaf; +Cc: xen-devel, Ian.Campbell
On 12/15/2010 06:18 AM, Daniel De Graaf wrote:
> On 12/14/2010 05:40 PM, Jeremy Fitzhardinge wrote:
>> On 12/14/2010 02:06 PM, Daniel De Graaf wrote:
>>>>> +static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma)
>>>>> +{
>>>>> + struct gntalloc_file_private_data *priv = filp->private_data;
>>>>> + struct gntalloc_gref *gref;
>>>>> +
>>>>> + if (debug)
>>>>> + printk("%s: priv %p, page %lu\n", __func__,
>>>>> + priv, vma->vm_pgoff);
>>>>> +
>>>>> + /*
>>>>> + * There is a 1-to-1 correspondence of grant references to shared
>>>>> + * pages, so it only makes sense to map exactly one page per
>>>>> + * call to mmap().
>>>>> + */
>>>> Single-page mmap makes sense if the only possible use-cases are for
>>>> single-page mappings, but if you're talking about framebuffers and the
>>>> like it seems like a very awkward way to use mmap. It would be cleaner
>>>> from an API perspective to have a user-mode defined flat address space
>>>> indexed by pgoff which maps to an array of grefs, so you can sensibly do
>>>> a multi-page mapping.
>>>>
>>>> It would also allow you to hide the grefs from usermode entirely. Then
>>>> it's just up to usermode to choose suitable file offsets for itself.
>>> I considered this, but wanted to keep userspace compatibility with the
>>> previously created interface.
>> Is that private to you, or something in broader use?
> This module was used as part of Qubes (http://www.qubes-os.org). The device
> path has changed (/dev/gntalloc to /dev/xen/gntalloc), and the API change
> adds useful functionality, so I don't think we must keep compatibility. This
> will also allow cleaning up the interface to remove parameters that make no
> sense (owner_domid, for example).
Ah, right. Well that means it has at least been prototyped, but I don't
think we should be constrained by the original ABI if we can make clear
improvements.
>>> If there's no reason to avoid doing so, I'll
>>> change the ioctl interface to allocate an array of grants and calculate the
>>> offset similar to how gntdev does currently (picks a suitable open slot).
>> I guess there's three options: you could get the kernel to allocate
>> extents, make usermode do it, or have one fd per extent and always start
>> from offset 0. I guess the last could get very messy if you want to
>> have lots of mappings... Making usermode define the offsets seems
>> simplest and most flexible, because then they can stitch together the
>> file-offset space in any way that's convenient to them (you just need to
>> deal with overlaps in that space).
> Would it be useful to also give userspace control over the offsets in gntdev?
>
> One argument for doing it in the kernel is to avoid needing to track what
> offsets are already being used (and then having the kernel re-check that).
Hm, yeah, that could be a bit fiddly. I guess you'd need to stick them
into an rbtree or something.
> While this isn't hard, IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR only exists in
> order to relieve userspace of the need to track its mappings, so this
> seems to have been a concern before.
> It would be nice to have them symmetric. However, it's easy to implement
GET_OFFSET_FOR_VADDR either way - given a vaddr, you can look up the vma
and return its pgoff.
It looks like GET_OFFSET_FOR_VADDR is just used in xc_gnttab_munmap() so
that libxc can recover the offset and the page count from the vaddr, so
that it can pass them to IOCTL_GNTDEV_UNMAP_GRANT_REF.
Also, it seems to fail unmaps which don't exactly correspond to a
MAP_GRANT_REF. I guess that's OK, but it looks a bit strange.
> Another use case of gntalloc that may prove useful is to have more than
> one application able to map the same grant within the kernel.
> So you mean have gntalloc allocate one page and then allow multiple
processes to map and use it? In that case it would probably be best
implemented as a filesystem, so you can give proper globally visible
names to the granted regions, and mmap them as normal files, like shm.
> Agreed; once mapped, the frame numbers (GFN & MFN) won't change until
> they are unmapped, so pre-populating them will be better.
Unless of course you don't want to map the pages in dom0 at all; if you
just want dom0 to be a facilitator for shared pages between two other
domains. Does Xen allow a page to be granted to more than one domain at
once?
J
* Re: [PATCH 5/6] xen-gntalloc: Userspace grant allocation driver
2010-12-16 1:05 ` Jeremy Fitzhardinge
@ 2010-12-16 15:22 ` Daniel De Graaf
2010-12-16 19:14 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 32+ messages in thread
From: Daniel De Graaf @ 2010-12-16 15:22 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: xen-devel, Ian.Campbell
On 12/15/2010 08:05 PM, Jeremy Fitzhardinge wrote:
> On 12/15/2010 06:18 AM, Daniel De Graaf wrote:
>> On 12/14/2010 05:40 PM, Jeremy Fitzhardinge wrote:
>>> On 12/14/2010 02:06 PM, Daniel De Graaf wrote:
>>>>>> +static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma)
>>>>>> +{
>>>>>> + struct gntalloc_file_private_data *priv = filp->private_data;
>>>>>> + struct gntalloc_gref *gref;
>>>>>> +
>>>>>> + if (debug)
>>>>>> + printk("%s: priv %p, page %lu\n", __func__,
>>>>>> + priv, vma->vm_pgoff);
>>>>>> +
>>>>>> + /*
>>>>>> + * There is a 1-to-1 correspondence of grant references to shared
>>>>>> + * pages, so it only makes sense to map exactly one page per
>>>>>> + * call to mmap().
>>>>>> + */
>>>>> Single-page mmap makes sense if the only possible use-cases are for
>>>>> single-page mappings, but if you're talking about framebuffers and the
>>>>> like it seems like a very awkward way to use mmap. It would be cleaner
>>>>> from an API perspective to have a user-mode defined flat address space
>>>>> indexed by pgoff which maps to an array of grefs, so you can sensibly do
>>>>> a multi-page mapping.
>>>>>
>>>>> It would also allow you to hide the grefs from usermode entirely. Then
>>>>> it's just up to usermode to choose suitable file offsets for itself.
>>>> I considered this, but wanted to keep userspace compatibility with the
>>>> previously created interface.
>>> Is that private to you, or something in broader use?
>> This module was used as part of Qubes (http://www.qubes-os.org). The device
>> path has changed (/dev/gntalloc to /dev/xen/gntalloc), and the API change
>> adds useful functionality, so I don't think we must keep compatibility. This
>> will also allow cleaning up the interface to remove parameters that make no
>> sense (owner_domid, for example).
>
> Ah, right. Well that means it has at least been prototyped, but I don't
> think we should be constrained by the original ABI if we can make clear
> improvements.
>
>>>> If there's no reason to avoid doing so, I'll
>>>> change the ioctl interface to allocate an array of grants and calculate the
>>>> offset similar to how gntdev does currently (picks a suitable open slot).
>>> I guess there's three options: you could get the kernel to allocate
>>> extents, make usermode do it, or have one fd per extent and always start
>>> from offset 0. I guess the last could get very messy if you want to
>>> have lots of mappings... Making usermode define the offsets seems
>>> simplest and most flexible, because then they can stitch together the
>>> file-offset space in any way that's convenient to them (you just need to
>>> deal with overlaps in that space).
>> Would it be useful to also give userspace control over the offsets in gntdev?
>>
>> One argument for doing it in the kernel is to avoid needing to track what
>> offsets are already being used (and then having the kernel re-check that).
>
> Hm, yeah, that could be a bit fiddly. I guess you'd need to stick them
> into an rbtree or something.
Another option that provides more flexibility - have a flag in the create
operation, similar to MAP_FIXED in mmap, that allows userspace to mandate
the offset if it wants control, but default to letting the kernel handle it.
We already have a flags field for making the grant writable, this is just
another bit.
>> While this isn't hard, IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR only exists in
>> order to relieve userspace of the need to track its mappings, so this
>> seems to have been a concern before.
>
>> It would be nice to have them symmetric. However, it's easy to implement
> GET_OFFSET_FOR_VADDR either way - given a vaddr, you can look up the vma
> and return its pgoff.
>
> It looks like GET_OFFSET_FOR_VADDR is just used in xc_gnttab_munmap() so
> that libxc can recover the offset and the page count from the vaddr, so
> that it can pass them to IOCTL_GNTDEV_UNMAP_GRANT_REF.
>
> Also, it seems to fail unmaps which don't exactly correspond to a
> MAP_GRANT_REF. I guess that's OK, but it looks a bit strange.
So, implementing an IOCTL_GNTALLOC_GET_OFFSET_FOR_VADDR would be useful in
order to allow gntalloc munmap() to be similar to gnttab's. If we want to
allow a given offset to be mapped to multiple domains, we couldn't just
return the offset; it would have to be a list of grant references, and
the destroy ioctl would need to take the grant reference.
>> Another use case of gntalloc that may prove useful is to have more than
>> one application able to map the same grant within the kernel.
>
>> So you mean have gntalloc allocate one page and then allow multiple
> processes to map and use it? In that case it would probably be best
> implemented as a filesystem, so you can give proper globally visible
> names to the granted regions, and mmap them as normal files, like shm.
That seems like a better way to expose this functionality. I didn't have
a use case for multiple processes mapping a grant, just didn't want to
prevent doing it in the future if it was a trivial change. Since it's
more complex to implement a filesystem, I think someone needs to find a
use for it before it's written. I believe the current code lets you map
the areas in multiple processes if you pass the file descriptor around
with fork() or using unix sockets; that seems sufficient to me.
>> Agreed; once mapped, the frame numbers (GFN & MFN) won't change until
>> they are unmapped, so pre-populating them will be better.
>
> Unless of course you don't want to map the pages in dom0 at all; if you
> just want dom0 to be a facilitator for shared pages between two other
> domains. Does Xen allow a page to be granted to more than one domain at
> once?
I think so; you'd have to use multiple grant table entries in order to do
that, and it might trigger some hypervisor warnings when the shared page has
an unexpectedly high refcount when being unmapped (HVM mappings already
trigger such warnings, so perhaps they need to be removed). I can't think
of an immediate use for a 3-domain shared page, but that doesn't mean that
such a use doesn't exist. Perhaps some kind of shared page cache, exported
using read-only grants by dom0?
Having dom0 create a pair of grant table entries to let two domUs
communicate seems like a hack to get around the lack of gntalloc in either
domU. This module works perfectly fine in a domU, and the use of a shared
page must be asymmetric no matter how you set it up, so you don't save all
that much code by making the setup identical.
Anyway, you don't have to call mmap() to let another domain access the shared
pages; they are mappable as soon as the ioctl() returns, and remain so
until you call the removal ioctl(). So if you do call mmap(), you probably
expect to use the mapping.
--
Daniel De Graaf
National Security Agency
* Re: [PATCH 5/6] xen-gntalloc: Userspace grant allocation driver
2010-12-16 15:22 ` Daniel De Graaf
@ 2010-12-16 19:14 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-16 19:14 UTC (permalink / raw)
To: Daniel De Graaf; +Cc: xen-devel, Ian.Campbell
On 12/16/2010 07:22 AM, Daniel De Graaf wrote:
>> Hm, yeah, that could be a bit fiddly. I guess you'd need to stick them
>> into an rbtree or something.
> Another option that provides more flexibility - have a flag in the create
> operation, similar to MAP_FIXED in mmap, that allows userspace to mandate
> the offset if it wants control, but default to letting the kernel handle it.
> We already have a flags field for making the grant writable, this is just
> another bit.
I'd go for just implementing one way of doing it unless there's a clear
need for each of them. The choose-your-own-offset route is looking
pretty complex. If you have the kernel allocate the offsets, but
perhaps with the guarantee that subsequent allocations will get
consecutive offsets, then that lets usermode set up a group of pages
which can be mapped with a single mmap call, which is all I was really
aiming for.
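The kernel-allocates-but-consecutive scheme Jeremy describes amounts to a first-fit search for a free run in the already-claimed offset space. A toy model (invented names; gntdev's actual slot search differs in detail):

```c
#include <assert.h>
#include <stddef.h>

/* A granted extent occupies [pgoff, pgoff + count) in the file-offset
 * space.  `used` is assumed sorted by pgoff and non-overlapping. */
struct extent {
	unsigned long pgoff;
	unsigned long count;
};

/* Return the first pgoff with `count` consecutive free slots, so one
 * multi-page extent can later be covered by a single mmap() call. */
static unsigned long first_fit(const struct extent *used, size_t n,
			       unsigned long count)
{
	unsigned long next = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (used[i].pgoff >= next + count)
			return next;	/* gap before used[i] is big enough */
		next = used[i].pgoff + used[i].count;
	}
	return next;			/* place it past the last extent */
}
```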
>>> While this isn't hard, IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR only exists in
>>> order to relieve userspace of the need to track its mappings, so this
>>> seems to have been a concern before.
>> It would be nice to have them symmetric. However, it's easy to implement
>> GET_OFFSET_FOR_VADDR either way - given a vaddr, you can look up the vma
>> and return its pgoff.
>>
>> It looks like GET_OFFSET_FOR_VADDR is just used in xc_gnttab_munmap() so
>> that libxc can recover the offset and the page count from the vaddr, so
>> that it can pass them to IOCTL_GNTDEV_UNMAP_GRANT_REF.
>>
>> Also, it seems to fail unmaps which don't exactly correspond to a
>> MAP_GRANT_REF. I guess that's OK, but it looks a bit strange.
> So, implementing an IOCTL_GNTALLOC_GET_OFFSET_FOR_VADDR would be useful in
> order to allow gntalloc munmap() to be similar to gnttab's. If we want to
> allow a given offset to be mapped to multiple domains, we couldn't just
> return the offset; it would have to be a list of grant references, and
> the destroy ioctl would need to take the grant reference.
See below.
>>> Another use case of gntalloc that may prove useful is to have more than
>>> one application able to map the same grant within the kernel.
>> So you mean have gntalloc allocate one page and then allow multiple
>> processes to map and use it? In that case it would probably be best
>> implemented as a filesystem, so you can give proper globally visible
>> names to the granted regions, and mmap them as normal files, like shm.
> That seems like a better way to expose this functionality. I didn't have
> a use case for multiple processes mapping a grant, just didn't want to
> prevent doing it in the future if it was a trivial change. Since it's
> more complex to implement a filesystem, I think someone needs to find a
> use for it before it's written. I believe the current code lets you map
> the areas in multiple processes if you pass the file descriptor around
> with fork() or using unix sockets; that seems sufficient to me.
That raises another quirk in the gntdev (which I think also applies to
gntalloc) API - the relationship between munmap and
IOCTL_GNTDEV_UNMAP_GRANT_REF. The current behaviour is that the ioctl
will fail with EBUSY if there are still mappings of the granted pages.
It would probably be better to make the map refcounted by the number of
vmas+1 pointing to it so that the UNMAP_GRANT_REF would drop a
reference, as would each vma as its unmapped, with the actual ungranting
happening at ref==0. That would allow multiple uncoordinated processes
to use the same mappings without having to work out who's doing the cleanup.
This would also allow auto-ungranting maps (xc_gnttab_map_grant_ref()
could do the UNMAP_GRANT_REF immediately after the mmap(), so that
xc_gnttab_munmap() can simply munmap() without the need for
GET_OFFSET_FOR_VADDR).
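The vmas+1 refcounting proposed above can be sketched as a tiny state model (a userspace toy, not driver code; the real version would take the private-data lock around the count and do the actual hypercall-based ungrant):

```c
#include <assert.h>

/* A granted map starts with one reference held by the grant itself
 * (dropped by the UNMAP_GRANT_REF ioctl) plus one per vma mapping it.
 * The ungrant happens only when the count hits zero, so munmap() and
 * the ioctl can arrive in either order. */
struct grant_map {
	int refs;
	int ungranted;
};

static void grant_get(struct grant_map *m)
{
	m->refs++;
}

static void grant_put(struct grant_map *m)
{
	if (--m->refs == 0)
		m->ungranted = 1;	/* stand-in for the real ungrant */
}
```

With this scheme, xc_gnttab_map_grant_ref() could issue UNMAP_GRANT_REF right after mmap(), leaving the vma holding the last reference so a plain munmap() finishes the cleanup.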
> Anyway, you don't have to call mmap() to let another domain access the shared
> pages; they are mappable as soon as the ioctl() returns, and remain so
> until you call the removal ioctl(). So if you do call mmap(), you probably
> expect to use the mapping.
Yep.
J
* [PATCH 6/6] xen-gntdev: Introduce HVM version of gntdev
2010-12-14 14:55 [PATCH v2] Userspace grant communication Daniel De Graaf
` (4 preceding siblings ...)
2010-12-14 14:55 ` [PATCH 5/6] xen-gntalloc: Userspace grant allocation driver Daniel De Graaf
@ 2010-12-14 14:55 ` Daniel De Graaf
2010-12-14 21:45 ` Jeremy Fitzhardinge
5 siblings, 1 reply; 32+ messages in thread
From: Daniel De Graaf @ 2010-12-14 14:55 UTC (permalink / raw)
To: xen-devel; +Cc: Daniel De Graaf, Ian.Campbell
HVM does not allow direct PTE modification, so guest pages must be
allocated and used for grant mappings. If Xen does not provide an
auto-translated physmap, the existing direct PTE modification is more
efficient.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
drivers/xen/Makefile | 2 +
drivers/xen/gntdev-hvm.c | 601 ++++++++++++++++++++++++++++++++++++++++++++++
drivers/xen/gntdev.c | 3 +
3 files changed, 606 insertions(+), 0 deletions(-)
create mode 100644 drivers/xen/gntdev-hvm.c
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 9814c1d..ab0e6eb 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_XEN_XENCOMM) += xencomm.o
obj-$(CONFIG_XEN_BALLOON) += balloon.o
obj-$(CONFIG_XEN_DEV_EVTCHN) += xen-evtchn.o
obj-$(CONFIG_XEN_GNTDEV) += xen-gntdev.o
+obj-$(CONFIG_XEN_GNTDEV) += xen-gntdev-hvm.o
obj-$(CONFIG_XEN_GRANT_DEV_ALLOC) += xen-gntalloc.o
obj-$(CONFIG_XEN_PCIDEV_BACKEND) += pciback/
obj-$(CONFIG_XEN_BLKDEV_BACKEND) += blkback/
@@ -26,4 +27,5 @@ obj-$(CONFIG_XEN_PLATFORM_PCI) += platform-pci.o
xen-evtchn-y := evtchn.o
xen-gntdev-y := gntdev.o
+xen-gntdev-hvm-y := gntdev-hvm.o
xen-gntalloc-y := gntalloc.o
diff --git a/drivers/xen/gntdev-hvm.c b/drivers/xen/gntdev-hvm.c
new file mode 100644
index 0000000..331d5af
--- /dev/null
+++ b/drivers/xen/gntdev-hvm.c
@@ -0,0 +1,601 @@
+/******************************************************************************
+ * gntdev.c
+ *
+ * Device for accessing (in user-space) pages that have been granted by other
+ * domains.
+ *
+ * Copyright (c) 2006-2007, D G Murray.
+ * (c) 2009 Gerd Hoffmann <kraxel@redhat.com>
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/miscdevice.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+#include <linux/vmalloc.h>
+
+#include <xen/xen.h>
+#include <xen/grant_table.h>
+#include <xen/gntdev.h>
+#include <xen/interface/memory.h>
+#include <asm/xen/hypervisor.h>
+#include <asm/xen/hypercall.h>
+#include <asm/xen/page.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Derek G. Murray <Derek.Murray@cl.cam.ac.uk>, "
+ "Gerd Hoffmann <kraxel@redhat.com>");
+MODULE_DESCRIPTION("User-space granted page access driver");
+
+static int debug = 0;
+module_param(debug, int, 0644);
+static int limit = 1024*1024;
+module_param(limit, int, 0644);
+static atomic_t pages_mapped = ATOMIC_INIT(0);
+
+struct gntdev_priv {
+ struct list_head maps;
+ spinlock_t lock;
+};
+
+struct granted_page {
+ struct page* page;
+ union {
+ struct ioctl_gntdev_grant_ref target;
+ grant_handle_t handle;
+ };
+};
+
+struct grant_map {
+ struct list_head next;
+ int index;
+ int count;
+ atomic_t users;
+ int is_mapped:1;
+ int is_ro:1;
+ struct granted_page pages[0];
+};
+
+static struct vm_operations_struct gntdev_vmops;
+
+/* ------------------------------------------------------------------ */
+
+static void gntdev_print_maps(struct gntdev_priv *priv,
+ char *text, int text_index)
+{
+ struct grant_map *map;
+
+ printk("%s: maps list (priv %p)\n", __FUNCTION__, priv);
+ list_for_each_entry(map, &priv->maps, next) {
+ printk(" %p: %2d+%2d, r%c, %s %d,%d %s\n", map,
+ map->index, map->count, map->is_ro ? 'o' : 'w',
+ map->is_mapped ? "use,hnd" : "dom,ref",
+ map->is_mapped ? atomic_read(&map->users)
+ : map->pages[0].target.domid,
+ map->is_mapped ? map->pages[0].handle
+ : map->pages[0].target.ref,
+ map->index == text_index && text ? text : "");
+ }
+}
+
+static struct grant_map *gntdev_alloc_map(int count,
+ struct ioctl_gntdev_grant_ref* grants)
+{
+ struct grant_map *add;
+ int i;
+
+ add = kzalloc(sizeof(*add) + sizeof(add->pages[0])*count, GFP_KERNEL);
+ if (!add)
+ return NULL;
+
+ atomic_set(&add->users, 1);
+ add->count = count;
+
+ for(i = 0; i < count; i++)
+ add->pages[i].target = grants[i];
+
+ return add;
+}
+
+static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add)
+{
+ struct grant_map *map;
+
+ spin_lock(&priv->lock);
+
+ list_for_each_entry(map, &priv->maps, next) {
+ if (add->index + add->count < map->index) {
+ list_add_tail(&add->next, &map->next);
+ goto done;
+ }
+ add->index = map->index + map->count;
+ }
+ list_add_tail(&add->next, &priv->maps);
+
+done:
+ if (debug)
+ gntdev_print_maps(priv, "[new]", add->index);
+
+ spin_unlock(&priv->lock);
+}
+
+static void __gntdev_del_map(struct gntdev_priv *priv, struct grant_map *map)
+{
+ list_del(&map->next);
+}
+
+static void gntdev_del_map(struct gntdev_priv *priv, struct grant_map *map)
+{
+ spin_lock(&priv->lock);
+ __gntdev_del_map(priv, map);
+ spin_unlock(&priv->lock);
+}
+
+static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int index,
+ int count)
+{
+ struct grant_map *map;
+
+ list_for_each_entry(map, &priv->maps, next) {
+ if (map->index != index)
+ continue;
+ if (map->count != count)
+ continue;
+ return map;
+ }
+ return NULL;
+}
+
+static void gntdev_unmap_fast(struct grant_map *map,
+ struct gnttab_unmap_grant_ref *unmap_ops)
+{
+ int err, flags, i, unmap_size = 0;
+ unsigned long pfn;
+ phys_addr_t mfn;
+
+ flags = GNTMAP_host_map;
+ if (map->is_ro)
+ flags |= GNTMAP_readonly;
+
+ for (i=0; i < map->count; i++) {
+ if (!map->pages[i].page)
+ continue;
+ pfn = page_to_pfn(map->pages[i].page);
+ mfn = (phys_addr_t)pfn_to_kaddr(pfn);
+ gnttab_set_unmap_op(&unmap_ops[unmap_size], mfn, flags,
+ map->pages[i].handle);
+ unmap_size++;
+ }
+
+ err = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
+ unmap_ops, unmap_size);
+ WARN_ON(err);
+
+ for (i = 0; i < unmap_size; i++)
+ WARN_ON(unmap_ops[i].status);
+}
+
+/* for the out-of-memory case */
+static void gntdev_unmap_slow(struct grant_map *map)
+{
+ int err, flags, i;
+ unsigned long pfn;
+ phys_addr_t mfn;
+ struct gnttab_unmap_grant_ref unmap_op;
+
+ flags = GNTMAP_host_map;
+ if (map->is_ro)
+ flags |= GNTMAP_readonly;
+
+ for (i=0; i < map->count; i++) {
+ if (!map->pages[i].page)
+ continue;
+
+ pfn = page_to_pfn(map->pages[i].page);
+ mfn = (phys_addr_t)pfn_to_kaddr(pfn);
+ gnttab_set_unmap_op(&unmap_op, mfn, flags, map->pages[i].handle);
+ err = HYPERVISOR_grant_table_op(
+ GNTTABOP_unmap_grant_ref, &unmap_op, 1);
+ WARN_ON(err);
+ WARN_ON(unmap_op.status);
+ }
+}
+
+static void gntdev_put_map(struct grant_map *map)
+{
+ struct gnttab_unmap_grant_ref *unmap_ops;
+ int i;
+ if (!map)
+ return;
+ if (!atomic_dec_and_test(&map->users))
+ return;
+ if (debug)
+ printk("%s: unmap %p (%d pages)\n", __FUNCTION__, map, map->count);
+ if (map->is_mapped) {
+ unmap_ops = kzalloc(sizeof(unmap_ops[0]) * map->count,
+ GFP_TEMPORARY);
+ if (likely(unmap_ops)) {
+ gntdev_unmap_fast(map, unmap_ops);
+ kfree(unmap_ops);
+ } else {
+ gntdev_unmap_slow(map);
+ }
+
+ atomic_sub(map->count, &pages_mapped);
+ }
+ for (i=0; i < map->count; i++)
+ if (map->pages[i].page)
+ __free_page(map->pages[i].page);
+ kfree(map);
+}
+
+static int gntdev_open(struct inode *inode, struct file *flip)
+{
+ struct gntdev_priv *priv;
+
+ priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ INIT_LIST_HEAD(&priv->maps);
+ spin_lock_init(&priv->lock);
+
+ flip->private_data = priv;
+ if (debug)
+ printk("%s: priv %p\n", __FUNCTION__, priv);
+
+ return 0;
+}
+
+static int gntdev_release(struct inode *inode, struct file *flip)
+{
+ struct gntdev_priv *priv = flip->private_data;
+ struct grant_map *map;
+
+ if (debug) {
+ printk("%s: priv %p\n", __FUNCTION__, priv);
+ gntdev_print_maps(priv, NULL, 0);
+ }
+
+ spin_lock(&priv->lock);
+ while (!list_empty(&priv->maps)) {
+ map = list_entry(priv->maps.next, struct grant_map, next);
+ list_del(&map->next);
+ gntdev_put_map(map);
+ }
+ spin_unlock(&priv->lock);
+
+ kfree(priv);
+ return 0;
+}
+
+static int gntdev_do_map(struct grant_map *map)
+{
+ int err, flags, i;
+ struct page* page;
+ phys_addr_t mfn;
+ struct gnttab_map_grant_ref* map_ops;
+
+ flags = GNTMAP_host_map;
+ if (map->is_ro)
+ flags |= GNTMAP_readonly;
+
+ err = -ENOMEM;
+
+ if (unlikely(atomic_add_return(map->count, &pages_mapped) > limit)) {
+ if (debug)
+ printk("%s: maps full\n", __FUNCTION__);
+ goto out;
+ }
+
+ map_ops = kzalloc(sizeof(map_ops[0]) * map->count, GFP_TEMPORARY);
+ if (!map_ops)
+ goto out;
+
+ for (i = 0; i < map->count; i++) {
+ page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO);
+ if (unlikely(!page))
+ goto out_free;
+ map->pages[i].page = page;
+ mfn = (phys_addr_t)pfn_to_kaddr(page_to_pfn(page));
+ gnttab_set_map_op(&map_ops[i], mfn, flags,
+ map->pages[i].target.ref,
+ map->pages[i].target.domid);
+ }
+ err = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,
+ map_ops, map->count);
+ if (WARN_ON(err))
+ goto out_free;
+
+ if (debug && map->count)
+ printk("%s: mapped first at gfn=%lx mfn=%lx\n", __func__,
+ page_to_pfn(map->pages[0].page), pfn_to_mfn(page_to_pfn(map->pages[0].page)));
+
+ map->is_mapped = 1;
+
+ for (i = 0; i < map->count; i++) {
+ if (map_ops[i].status) {
+ if (debug)
+ printk("%s: failed map at page %d: stat=%d\n",
+ __FUNCTION__, i, map_ops[i].status);
+ __free_page(map->pages[i].page);
+ map->pages[i].page = NULL;
+ err = -EINVAL;
+ } else {
+ map->pages[i].handle = map_ops[i].handle;
+ }
+ }
+
+out_free:
+ kfree(map_ops);
+out:
+ if (!map->is_mapped)
+ atomic_sub(map->count, &pages_mapped);
+ return err;
+}
+
+static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
+ struct ioctl_gntdev_map_grant_ref __user *u)
+{
+ struct ioctl_gntdev_map_grant_ref op;
+ struct grant_map *map;
+ struct ioctl_gntdev_grant_ref* grants;
+ int err;
+
+ if (copy_from_user(&op, u, sizeof(op)) != 0)
+ return -EFAULT;
+ if (debug)
+ printk("%s: priv %p, add %d\n", __FUNCTION__, priv,
+ op.count);
+ if (unlikely(op.count <= 0))
+ return -EINVAL;
+
+ err = -ENOMEM;
+ grants = kmalloc(sizeof(grants[0]) * op.count, GFP_TEMPORARY);
+ if (!grants)
+ goto out_fail;
+
+ err = -EFAULT;
+ if (copy_from_user(grants, u->refs, sizeof(grants[0]) * op.count))
+ goto out_free;
+
+ map = gntdev_alloc_map(op.count, grants);
+ if (!map)
+ goto out_free;
+
+ gntdev_add_map(priv, map);
+
+ op.index = map->index << PAGE_SHIFT;
+
+ err = -EFAULT;
+ if (copy_to_user(u, &op, sizeof(op)) != 0)
+ goto out_remove;
+
+ err = 0;
+
+out_free:
+ kfree(grants);
+out_fail:
+ return err;
+
+out_remove:
+ gntdev_del_map(priv, map);
+ gntdev_put_map(map);
+ goto out_free;
+}
+
+static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
+ struct ioctl_gntdev_unmap_grant_ref __user *u)
+{
+ struct ioctl_gntdev_unmap_grant_ref op;
+ struct grant_map *map;
+ int err = 0;
+
+ if (copy_from_user(&op, u, sizeof(op)) != 0)
+ return -EFAULT;
+
+ spin_lock(&priv->lock);
+ map = gntdev_find_map_index(priv, op.index >> PAGE_SHIFT, op.count);
+ if (map) {
+ __gntdev_del_map(priv, map);
+ } else
+ err = -EINVAL;
+ spin_unlock(&priv->lock);
+
+ if (debug)
+ printk("%s: priv %p, del %d+%d = %p\n", __FUNCTION__, priv,
+ (int)op.index, (int)op.count, map);
+
+ gntdev_put_map(map);
+ return err;
+}
+
+static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
+ struct ioctl_gntdev_get_offset_for_vaddr __user *u)
+{
+ struct ioctl_gntdev_get_offset_for_vaddr op;
+ struct vm_area_struct *vma;
+ struct grant_map *map;
+
+ if (copy_from_user(&op, u, sizeof(op)) != 0)
+ return -EFAULT;
+ if (debug)
+ printk("%s: priv %p, offset for vaddr %lx\n", __FUNCTION__, priv,
+ (unsigned long)op.vaddr);
+
+ vma = find_vma(current->mm, op.vaddr);
+ if (!vma)
+ return -EINVAL;
+
+ map = vma->vm_private_data;
+ if (vma->vm_ops != &gntdev_vmops || !map)
+ return -EINVAL;
+
+ op.offset = map->index << PAGE_SHIFT;
+ op.count = map->count;
+
+ if (copy_to_user(u, &op, sizeof(op)) != 0)
+ return -EFAULT;
+ return 0;
+}
+
+static long gntdev_ioctl(struct file *flip,
+ unsigned int cmd, unsigned long arg)
+{
+ struct gntdev_priv *priv = flip->private_data;
+ void __user *ptr = (void __user *)arg;
+
+ switch (cmd) {
+ case IOCTL_GNTDEV_MAP_GRANT_REF:
+ return gntdev_ioctl_map_grant_ref(priv, ptr);
+
+ case IOCTL_GNTDEV_UNMAP_GRANT_REF:
+ return gntdev_ioctl_unmap_grant_ref(priv, ptr);
+
+ case IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR:
+ return gntdev_ioctl_get_offset_for_vaddr(priv, ptr);
+
+ default:
+ if (debug)
+ printk("%s: priv %p, unknown cmd %x\n",
+ __FUNCTION__, priv, cmd);
+ return -ENOIOCTLCMD;
+ }
+
+ return 0;
+}
+
+static int gntdev_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+ struct grant_map *map = vma->vm_private_data;
+ pgoff_t pgoff = vmf->pgoff - vma->vm_pgoff;
+
+ if (!map || !map->is_mapped || pgoff >= map->count) {
+ if (debug)
+ printk("%s: vaddr %p, pgoff %ld (shouldn't happen)\n",
+ __FUNCTION__, vmf->virtual_address, pgoff);
+ return VM_FAULT_SIGBUS;
+ }
+
+ vmf->page = map->pages[pgoff].page;
+ get_page(vmf->page);
+ return 0;
+}
+
+static void gntdev_vma_close(struct vm_area_struct *vma)
+{
+ struct grant_map *map = vma->vm_private_data;
+ gntdev_put_map(map);
+}
+
+static struct vm_operations_struct gntdev_vmops = {
+ .fault = gntdev_vma_fault,
+ .close = gntdev_vma_close,
+};
+
+static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
+{
+ struct gntdev_priv *priv = flip->private_data;
+ int index = vma->vm_pgoff;
+ int count = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+ struct grant_map *map;
+ int err = -EINVAL;
+
+ if (!(vma->vm_flags & VM_SHARED))
+ return -EINVAL;
+
+ spin_lock(&priv->lock);
+ map = gntdev_find_map_index(priv, index, count);
+
+ if (debug)
+ printk("%s: map %d+%d at %lx (priv %p, map %p)\n", __func__,
+ index, count, vma->vm_start, priv, map);
+
+ if (!map)
+ goto unlock_out;
+
+ if (!map->is_mapped) {
+ map->is_ro = !(vma->vm_flags & VM_WRITE);
+ err = gntdev_do_map(map);
+ if (err)
+ goto unlock_out;
+ }
+
+ if ((vma->vm_flags & VM_WRITE) && map->is_ro)
+ goto unlock_out;
+
+ err = 0;
+ vma->vm_ops = &gntdev_vmops;
+
+ vma->vm_flags |= VM_RESERVED;
+ vma->vm_flags |= VM_DONTEXPAND;
+ vma->vm_flags |= VM_IO;
+
+ vma->vm_private_data = map;
+
+ atomic_inc(&map->users);
+
+unlock_out:
+ spin_unlock(&priv->lock);
+ return err;
+}
+
+static const struct file_operations gntdev_fops = {
+ .owner = THIS_MODULE,
+ .open = gntdev_open,
+ .release = gntdev_release,
+ .mmap = gntdev_mmap,
+ .unlocked_ioctl = gntdev_ioctl
+};
+
+static struct miscdevice gntdev_miscdev = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "xen/gntdev",
+ .fops = &gntdev_fops,
+};
+
+/* ------------------------------------------------------------------ */
+
+static int __init gntdev_init(void)
+{
+ int err;
+
+ if (!xen_domain())
+ return -ENODEV;
+
+ if (!xen_feature(XENFEAT_auto_translated_physmap))
+ return -ENODEV;
+
+ err = misc_register(&gntdev_miscdev);
+ if (err != 0) {
+ printk(KERN_ERR "Could not register gntdev device\n");
+ return err;
+ }
+ return 0;
+}
+
+static void __exit gntdev_exit(void)
+{
+ misc_deregister(&gntdev_miscdev);
+}
+
+module_init(gntdev_init);
+module_exit(gntdev_exit);
+
+/* ------------------------------------------------------------------ */
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index a73f07c..f6b98c0 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -626,6 +626,9 @@ static int __init gntdev_init(void)
if (!xen_domain())
return -ENODEV;
+ if (xen_feature(XENFEAT_auto_translated_physmap))
+ return -ENODEV;
+
err = misc_register(&gntdev_miscdev);
if (err != 0) {
printk(KERN_ERR "Could not register gntdev device\n");
--
1.7.2.3
* Re: [PATCH 6/6] xen-gntdev: Introduce HVM version of gntdev
2010-12-14 14:55 ` [PATCH 6/6] xen-gntdev: Introduce HVM version of gntdev Daniel De Graaf
@ 2010-12-14 21:45 ` Jeremy Fitzhardinge
2010-12-14 22:27 ` Daniel De Graaf
0 siblings, 1 reply; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2010-12-14 21:45 UTC (permalink / raw)
To: Daniel De Graaf; +Cc: xen-devel, Ian.Campbell
On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
> HVM does not allow direct PTE modification, so guest pages must be
> allocated and used for grant mappings. If Xen does not provide an
> auto-translated physmap, the existing direct PTE modification is more
> efficient.
I haven't looked at this in detail, but doing a complete duplication of
the driver doesn't seem like a great idea.
Also, Stefano and I have been working on a modification to gntdev to
make sure that each granted page is backed with a proper struct page (so
that direct aio works on them). Would this help the hvm case?
Thanks,
J
> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> ---
> drivers/xen/Makefile | 2 +
> drivers/xen/gntdev-hvm.c | 601 ++++++++++++++++++++++++++++++++++++++++++++++
> drivers/xen/gntdev.c | 3 +
> 3 files changed, 606 insertions(+), 0 deletions(-)
> create mode 100644 drivers/xen/gntdev-hvm.c
>
> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
> index 9814c1d..ab0e6eb 100644
> --- a/drivers/xen/Makefile
> +++ b/drivers/xen/Makefile
> @@ -10,6 +10,7 @@ obj-$(CONFIG_XEN_XENCOMM) += xencomm.o
> obj-$(CONFIG_XEN_BALLOON) += balloon.o
> obj-$(CONFIG_XEN_DEV_EVTCHN) += xen-evtchn.o
> obj-$(CONFIG_XEN_GNTDEV) += xen-gntdev.o
> +obj-$(CONFIG_XEN_GNTDEV) += xen-gntdev-hvm.o
> obj-$(CONFIG_XEN_GRANT_DEV_ALLOC) += xen-gntalloc.o
> obj-$(CONFIG_XEN_PCIDEV_BACKEND) += pciback/
> obj-$(CONFIG_XEN_BLKDEV_BACKEND) += blkback/
> @@ -26,4 +27,5 @@ obj-$(CONFIG_XEN_PLATFORM_PCI) += platform-pci.o
>
> xen-evtchn-y := evtchn.o
> xen-gntdev-y := gntdev.o
> +xen-gntdev-hvm-y := gntdev-hvm.o
> xen-gntalloc-y := gntalloc.o
> diff --git a/drivers/xen/gntdev-hvm.c b/drivers/xen/gntdev-hvm.c
> new file mode 100644
> index 0000000..331d5af
> --- /dev/null
> +++ b/drivers/xen/gntdev-hvm.c
> @@ -0,0 +1,601 @@
> +/******************************************************************************
> + * gntdev.c
> + *
> + * Device for accessing (in user-space) pages that have been granted by other
> + * domains.
> + *
> + * Copyright (c) 2006-2007, D G Murray.
> + * (c) 2009 Gerd Hoffmann <kraxel@redhat.com>
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> + */
> +
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/init.h>
> +#include <linux/miscdevice.h>
> +#include <linux/fs.h>
> +#include <linux/mm.h>
> +#include <linux/mman.h>
> +#include <linux/types.h>
> +#include <linux/uaccess.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/vmalloc.h>
> +
> +#include <xen/xen.h>
> +#include <xen/grant_table.h>
> +#include <xen/gntdev.h>
> +#include <xen/interface/memory.h>
> +#include <asm/xen/hypervisor.h>
> +#include <asm/xen/hypercall.h>
> +#include <asm/xen/page.h>
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Derek G. Murray <Derek.Murray@cl.cam.ac.uk>, "
> + "Gerd Hoffmann <kraxel@redhat.com>");
> +MODULE_DESCRIPTION("User-space granted page access driver");
> +
> +static int debug = 0;
> +module_param(debug, int, 0644);
> +static int limit = 1024*1024;
> +module_param(limit, int, 0644);
> +static atomic_t pages_mapped = ATOMIC_INIT(0);
> +
> +struct gntdev_priv {
> + struct list_head maps;
> + spinlock_t lock;
> +};
> +
> +struct granted_page {
> + struct page* page;
> + union {
> + struct ioctl_gntdev_grant_ref target;
> + grant_handle_t handle;
> + };
> +};
> +
> +struct grant_map {
> + struct list_head next;
> + int index;
> + int count;
> + atomic_t users;
> + int is_mapped:1;
> + int is_ro:1;
> + struct granted_page pages[0];
> +};
> +
> +static struct vm_operations_struct gntdev_vmops;
> +
> +/* ------------------------------------------------------------------ */
> +
> +static void gntdev_print_maps(struct gntdev_priv *priv,
> + char *text, int text_index)
> +{
> + struct grant_map *map;
> +
> + printk("%s: maps list (priv %p)\n", __FUNCTION__, priv);
> + list_for_each_entry(map, &priv->maps, next) {
> + printk(" %p: %2d+%2d, r%c, %s %d,%d %s\n", map,
> + map->index, map->count, map->is_ro ? 'o' : 'w',
> + map->is_mapped ? "use,hnd" : "dom,ref",
> + map->is_mapped ? atomic_read(&map->users)
> + : map->pages[0].target.domid,
> + map->is_mapped ? map->pages[0].handle
> + : map->pages[0].target.ref,
> + map->index == text_index && text ? text : "");
> + }
> +}
> +
> +static struct grant_map *gntdev_alloc_map(int count,
> + struct ioctl_gntdev_grant_ref* grants)
> +{
> + struct grant_map *add;
> + int i;
> +
> + add = kzalloc(sizeof(*add) + sizeof(add->pages[0])*count, GFP_KERNEL);
> + if (!add)
> + return NULL;
> +
> + atomic_set(&add->users, 1);
> + add->count = count;
> +
> + for(i = 0; i < count; i++)
> + add->pages[i].target = grants[i];
> +
> + return add;
> +}
> +
> +static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add)
> +{
> + struct grant_map *map;
> +
> + spin_lock(&priv->lock);
> +
> + list_for_each_entry(map, &priv->maps, next) {
> + if (add->index + add->count < map->index) {
> + list_add_tail(&add->next, &map->next);
> + goto done;
> + }
> + add->index = map->index + map->count;
> + }
> + list_add_tail(&add->next, &priv->maps);
> +
> +done:
> + if (debug)
> + gntdev_print_maps(priv, "[new]", add->index);
> +
> + spin_unlock(&priv->lock);
> +}
> +
> +static void __gntdev_del_map(struct gntdev_priv *priv, struct grant_map *map)
> +{
> + list_del(&map->next);
> +}
> +
> +static void gntdev_del_map(struct gntdev_priv *priv, struct grant_map *map)
> +{
> + spin_lock(&priv->lock);
> + __gntdev_del_map(priv, map);
> + spin_unlock(&priv->lock);
> +}
> +
> +static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int index,
> + int count)
> +{
> + struct grant_map *map;
> +
> + list_for_each_entry(map, &priv->maps, next) {
> + if (map->index != index)
> + continue;
> + if (map->count != count)
> + continue;
> + return map;
> + }
> + return NULL;
> +}
> +
> +static void gntdev_unmap_fast(struct grant_map *map,
> + struct gnttab_unmap_grant_ref *unmap_ops)
> +{
> + int err, flags, i, unmap_size = 0;
> + unsigned long pfn;
> + phys_addr_t mfn;
> +
> + flags = GNTMAP_host_map;
> + if (map->is_ro)
> + flags |= GNTMAP_readonly;
> +
> + for (i=0; i < map->count; i++) {
> + if (!map->pages[i].page)
> + continue;
> + pfn = page_to_pfn(map->pages[i].page);
> + mfn = (phys_addr_t)pfn_to_kaddr(pfn);
> + gnttab_set_unmap_op(&unmap_ops[unmap_size], mfn, flags,
> + map->pages[i].handle);
> + unmap_size++;
> + }
> +
> + err = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
> + unmap_ops, unmap_size);
> + WARN_ON(err);
> +
> + for (i = 0; i < unmap_size; i++)
> + WARN_ON(unmap_ops[i].status);
> +}
> +
> +// for the out-of-memory case
> +static void gntdev_unmap_slow(struct grant_map *map)
> +{
> + int err, flags, i;
> + unsigned long pfn;
> + phys_addr_t mfn;
> + struct gnttab_unmap_grant_ref unmap_op;
> +
> + flags = GNTMAP_host_map;
> + if (map->is_ro)
> + flags |= GNTMAP_readonly;
> +
> + for (i=0; i < map->count; i++) {
> + if (!map->pages[i].page)
> + continue;
> +
> + pfn = page_to_pfn(map->pages[i].page);
> + mfn = (phys_addr_t)pfn_to_kaddr(pfn);
> + gnttab_set_unmap_op(&unmap_op, mfn, flags, map->pages[i].handle);
> + err = HYPERVISOR_grant_table_op(
> + GNTTABOP_unmap_grant_ref, &unmap_op, 1);
> + WARN_ON(err);
> + WARN_ON(unmap_op.status);
> + }
> +}
> +
> +static void gntdev_put_map(struct grant_map *map)
> +{
> + struct gnttab_unmap_grant_ref *unmap_ops;
> + int i;
> + if (!map)
> + return;
> + if (!atomic_dec_and_test(&map->users))
> + return;
> + if (debug)
> + printk("%s: unmap %p (%d pages)\n", __FUNCTION__, map, map->count);
> + if (map->is_mapped) {
> + unmap_ops = kzalloc(sizeof(unmap_ops[0]) * map->count,
> + GFP_TEMPORARY);
> + if (likely(unmap_ops)) {
> + gntdev_unmap_fast(map, unmap_ops);
> + kfree(unmap_ops);
> + } else {
> + gntdev_unmap_slow(map);
> + }
> +
> + atomic_sub(map->count, &pages_mapped);
> + }
> + for (i=0; i < map->count; i++)
> + if (map->pages[i].page)
> + __free_page(map->pages[i].page);
> + kfree(map);
> +}
> +
> +static int gntdev_open(struct inode *inode, struct file *flip)
> +{
> + struct gntdev_priv *priv;
> +
> + priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> + if (!priv)
> + return -ENOMEM;
> +
> + INIT_LIST_HEAD(&priv->maps);
> + spin_lock_init(&priv->lock);
> +
> + flip->private_data = priv;
> + if (debug)
> + printk("%s: priv %p\n", __FUNCTION__, priv);
> +
> + return 0;
> +}
> +
> +static int gntdev_release(struct inode *inode, struct file *flip)
> +{
> + struct gntdev_priv *priv = flip->private_data;
> + struct grant_map *map;
> +
> + if (debug) {
> + printk("%s: priv %p\n", __FUNCTION__, priv);
> + gntdev_print_maps(priv, NULL, 0);
> + }
> +
> + spin_lock(&priv->lock);
> + while (!list_empty(&priv->maps)) {
> + map = list_entry(priv->maps.next, struct grant_map, next);
> + list_del(&map->next);
> + gntdev_put_map(map);
> + }
> + spin_unlock(&priv->lock);
> +
> + kfree(priv);
> + return 0;
> +}
> +
> +static int gntdev_do_map(struct grant_map *map)
> +{
> + int err, flags, i;
> + struct page* page;
> + phys_addr_t mfn;
> + struct gnttab_map_grant_ref* map_ops;
> +
> + flags = GNTMAP_host_map;
> + if (map->is_ro)
> + flags |= GNTMAP_readonly;
> +
> + err = -ENOMEM;
> +
> + if (unlikely(atomic_add_return(map->count, &pages_mapped) > limit)) {
> + if (debug)
> + printk("%s: maps full\n", __FUNCTION__);
> + goto out;
> + }
> +
> + map_ops = kzalloc(sizeof(map_ops[0]) * map->count, GFP_TEMPORARY);
> + if (!map_ops)
> + goto out;
> +
> + for (i = 0; i < map->count; i++) {
> + page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO);
> + if (unlikely(!page))
> + goto out_free;
> + map->pages[i].page = page;
> + mfn = (phys_addr_t)pfn_to_kaddr(page_to_pfn(page));
> + gnttab_set_map_op(&map_ops[i], mfn, flags,
> + map->pages[i].target.ref,
> + map->pages[i].target.domid);
> + }
> + err = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,
> + map_ops, map->count);
> + if (WARN_ON(err))
> + goto out_free;
> +
> + if (debug && map->count)
> + printk("%s: mapped first at gfn=%lx mfn=%lx\n", __func__,
> + page_to_pfn(map->pages[0].page), pfn_to_mfn(page_to_pfn(map->pages[0].page)));
> +
> + map->is_mapped = 1;
> +
> + for (i = 0; i < map->count; i++) {
> + if (map_ops[i].status) {
> + if (debug)
> + printk("%s: failed map at page %d: stat=%d\n",
> + __FUNCTION__, i, map_ops[i].status);
> + __free_page(map->pages[i].page);
> + map->pages[i].page = NULL;
> + err = -EINVAL;
> + } else {
> + map->pages[i].handle = map_ops[i].handle;
> + }
> + }
> +
> +out_free:
> + kfree(map_ops);
> +out:
> + if (!map->is_mapped)
> + atomic_sub(map->count, &pages_mapped);
> + return err;
> +}
> +
> +static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
> + struct ioctl_gntdev_map_grant_ref __user *u)
> +{
> + struct ioctl_gntdev_map_grant_ref op;
> + struct grant_map *map;
> + struct ioctl_gntdev_grant_ref* grants;
> + int err;
> +
> + if (copy_from_user(&op, u, sizeof(op)) != 0)
> + return -EFAULT;
> + if (debug)
> + printk("%s: priv %p, add %d\n", __FUNCTION__, priv,
> + op.count);
> + if (unlikely(op.count <= 0))
> + return -EINVAL;
> +
> + err = -ENOMEM;
> + grants = kmalloc(sizeof(grants[0]) * op.count, GFP_TEMPORARY);
> + if (!grants)
> + goto out_fail;
> +
> + err = -EFAULT;
> + if (copy_from_user(grants, u->refs, sizeof(grants[0]) * op.count))
> + goto out_free;
> +
> + map = gntdev_alloc_map(op.count, grants);
> + if (!map)
> + goto out_free;
> +
> + gntdev_add_map(priv, map);
> +
> + op.index = map->index << PAGE_SHIFT;
> +
> + err = -EFAULT;
> + if (copy_to_user(u, &op, sizeof(op)) != 0)
> + goto out_remove;
> +
> + err = 0;
> +
> +out_free:
> + kfree(grants);
> +out_fail:
> + return err;
> +
> +out_remove:
> + gntdev_del_map(priv, map);
> + gntdev_put_map(map);
> + goto out_free;
> +}
> +
> +static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
> + struct ioctl_gntdev_unmap_grant_ref __user *u)
> +{
> + struct ioctl_gntdev_unmap_grant_ref op;
> + struct grant_map *map;
> + int err = 0;
> +
> + if (copy_from_user(&op, u, sizeof(op)) != 0)
> + return -EFAULT;
> +
> + spin_lock(&priv->lock);
> + map = gntdev_find_map_index(priv, op.index >> PAGE_SHIFT, op.count);
> + if (map) {
> + __gntdev_del_map(priv, map);
> + } else
> + err = -EINVAL;
> + spin_unlock(&priv->lock);
> +
> + if (debug)
> + printk("%s: priv %p, del %d+%d = %p\n", __FUNCTION__, priv,
> + (int)op.index, (int)op.count, map);
> +
> + gntdev_put_map(map);
> + return err;
> +}
> +
> +static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
> + struct ioctl_gntdev_get_offset_for_vaddr __user *u)
> +{
> + struct ioctl_gntdev_get_offset_for_vaddr op;
> + struct vm_area_struct *vma;
> + struct grant_map *map;
> +
> + if (copy_from_user(&op, u, sizeof(op)) != 0)
> + return -EFAULT;
> + if (debug)
> + printk("%s: priv %p, offset for vaddr %lx\n", __FUNCTION__, priv,
> + (unsigned long)op.vaddr);
> +
> + vma = find_vma(current->mm, op.vaddr);
> + if (!vma)
> + return -EINVAL;
> +
> + map = vma->vm_private_data;
> + if (vma->vm_ops != &gntdev_vmops || !map)
> + return -EINVAL;
> +
> + op.offset = map->index << PAGE_SHIFT;
> + op.count = map->count;
> +
> + if (copy_to_user(u, &op, sizeof(op)) != 0)
> + return -EFAULT;
> + return 0;
> +}
> +
> +static long gntdev_ioctl(struct file *flip,
> + unsigned int cmd, unsigned long arg)
> +{
> + struct gntdev_priv *priv = flip->private_data;
> + void __user *ptr = (void __user *)arg;
> +
> + switch (cmd) {
> + case IOCTL_GNTDEV_MAP_GRANT_REF:
> + return gntdev_ioctl_map_grant_ref(priv, ptr);
> +
> + case IOCTL_GNTDEV_UNMAP_GRANT_REF:
> + return gntdev_ioctl_unmap_grant_ref(priv, ptr);
> +
> + case IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR:
> + return gntdev_ioctl_get_offset_for_vaddr(priv, ptr);
> +
> + default:
> + if (debug)
> + printk("%s: priv %p, unknown cmd %x\n",
> + __FUNCTION__, priv, cmd);
> + return -ENOIOCTLCMD;
> + }
> +
> + return 0;
> +}
> +
> +static int gntdev_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> +{
> + struct grant_map *map = vma->vm_private_data;
> + pgoff_t pgoff = vmf->pgoff - vma->vm_pgoff;
> +
> + if (!map || !map->is_mapped || pgoff < 0 || pgoff > map->count) {
> + if (debug)
> + printk("%s: vaddr %p, pgoff %ld (shouldn't happen)\n",
> + __FUNCTION__, vmf->virtual_address, pgoff);
> + return VM_FAULT_SIGBUS;
> + }
> +
> + vmf->page = map->pages[pgoff].page;
> + get_page(vmf->page);
> + return 0;
> +}
> +
> +static void gntdev_vma_close(struct vm_area_struct *vma)
> +{
> + struct grant_map *map = vma->vm_private_data;
> + gntdev_put_map(map);
> +}
> +
> +static struct vm_operations_struct gntdev_vmops = {
> + .fault = gntdev_vma_fault,
> + .close = gntdev_vma_close,
> +};
> +
> +static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
> +{
> + struct gntdev_priv *priv = flip->private_data;
> + int index = vma->vm_pgoff;
> + int count = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
> + struct grant_map *map;
> + int err = -EINVAL;
> +
> + if (!(vma->vm_flags & VM_SHARED))
> + return -EINVAL;
> +
> + spin_lock(&priv->lock);
> + map = gntdev_find_map_index(priv, index, count);
> +
> + if (debug)
> + printk("%s: map %d+%d at %lx (priv %p, map %p)\n", __func__,
> + index, count, vma->vm_start, priv, map);
> +
> + if (!map)
> + goto unlock_out;
> +
> + if (!map->is_mapped) {
> + map->is_ro = !(vma->vm_flags & VM_WRITE);
> + err = gntdev_do_map(map);
> + if (err)
> + goto unlock_out;
> + }
> +
> + if ((vma->vm_flags & VM_WRITE) && map->is_ro)
> + goto unlock_out;
> +
> + err = 0;
> + vma->vm_ops = &gntdev_vmops;
> +
> + vma->vm_flags |= VM_RESERVED;
> + vma->vm_flags |= VM_DONTEXPAND;
> + vma->vm_flags |= VM_IO;
> +
> + vma->vm_private_data = map;
> +
> + atomic_inc(&map->users);
> +
> +unlock_out:
> + spin_unlock(&priv->lock);
> + return err;
> +}
> +
> +static const struct file_operations gntdev_fops = {
> + .owner = THIS_MODULE,
> + .open = gntdev_open,
> + .release = gntdev_release,
> + .mmap = gntdev_mmap,
> + .unlocked_ioctl = gntdev_ioctl,
> +};
> +
> +static struct miscdevice gntdev_miscdev = {
> + .minor = MISC_DYNAMIC_MINOR,
> + .name = "xen/gntdev",
> + .fops = &gntdev_fops,
> +};
> +
> +/* ------------------------------------------------------------------ */
> +
> +static int __init gntdev_init(void)
> +{
> + int err;
> +
> + if (!xen_domain())
> + return -ENODEV;
> +
> + if (!xen_feature(XENFEAT_auto_translated_physmap))
> + return -ENODEV;
> +
> + err = misc_register(&gntdev_miscdev);
> + if (err != 0) {
> + printk(KERN_ERR "Could not register gntdev device\n");
> + return err;
> + }
> + return 0;
> +}
> +
> +static void __exit gntdev_exit(void)
> +{
> + misc_deregister(&gntdev_miscdev);
> +}
> +
> +module_init(gntdev_init);
> +module_exit(gntdev_exit);
> +
> +/* ------------------------------------------------------------------ */
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index a73f07c..f6b98c0 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -626,6 +626,9 @@ static int __init gntdev_init(void)
> if (!xen_domain())
> return -ENODEV;
>
> + if (xen_feature(XENFEAT_auto_translated_physmap))
> + return -ENODEV;
> +
> err = misc_register(&gntdev_miscdev);
> if (err != 0) {
> printk(KERN_ERR "Could not register gntdev device\n");
* Re: [PATCH 6/6] xen-gntdev: Introduce HVM version of gntdev
2010-12-14 21:45 ` Jeremy Fitzhardinge
@ 2010-12-14 22:27 ` Daniel De Graaf
0 siblings, 0 replies; 32+ messages in thread
From: Daniel De Graaf @ 2010-12-14 22:27 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: xen-devel, Ian.Campbell
On 12/14/2010 04:45 PM, Jeremy Fitzhardinge wrote:
> On 12/14/2010 06:55 AM, Daniel De Graaf wrote:
>> HVM does not allow direct PTE modification, so guest pages must be
>> allocated and used for grant mappings. If Xen does not provide an
>> auto-translated physmap, the existing direct PTE modification is more
>> efficient.
>
> I haven't looked at this in detail, but doing a complete duplication of
> the driver doesn't seem like a great idea.
>
> Also, Stefano and I have been working on a modification to gntdev to
> make sure that each granted page is backed with a proper struct page (so
> that direct aio works on them). Would this help the hvm case?
>
> Thanks,
> J
Only if this also allocates a guest frame number that can be passed to Xen
in order to eliminate GNTMAP_contains_pte (which cannot work in HVM). I do
have a first attempt to get this version working in the PV case (I can
share it if you're interested); mapping works, but unmap currently leaves
the system in a bad state, triggering a number of BUG_ONs.
One problem that I noticed soon after posting is that Xen does not restore
the original mapping when the unmap hypercall is run; instead, it replaces
the page with one that discards writes and returns all one bits on read.
In the PV case, it seems like integrating with the balloon device would be
the proper way to request that Xen restore a real MFN under the GFNs that
used to map granted pages. I have tried to use XENMEM_populate_physmap to
request that Xen restore page mappings, but this seems to be a no-op on HVM.
As an alternative to completely duplicating the module, the module could be
expanded to check xen_feature(XENFEAT_auto_translated_physmap) and perform
the correct type of mapping based on that. Eliminating PTE modification
completely will make the module more maintainable, however.
--
Daniel De Graaf
National Security Agency