All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] [PVOPS] fix gntdev on PAE
@ 2010-02-01 15:11 Stefano Stabellini
  2010-02-01 15:46 ` Stefano Stabellini
  0 siblings, 1 reply; 15+ messages in thread
From: Stefano Stabellini @ 2010-02-01 15:11 UTC (permalink / raw)
  To: xen-devel; +Cc: Jeremy Fitzhardinge

Hi all,
this small patch fixes gntdev on Linux pvops kernels:
gnttab_set_map_op and gnttab_set_unmap_op shouldn't take unsigned long
as parameters for machine addresses because they are not big enough on
PAE systems.
This patch fixes the issue using phys_addr_t instead and enables
XEN_GNTDEV compilation again.


Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

---

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index f6828c5..a2833bb 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -170,7 +170,7 @@ config XEN_S3
        
 config XEN_GNTDEV
 	tristate "userspace grant access device driver"
-	depends on XEN && BROKEN
+	depends on XEN
 	select MMU_NOTIFIER
 	help
 	  Allows userspace processes use grants.

diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index ef07e91..8552d7e 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -146,7 +146,7 @@ void gnttab_grant_foreign_transfer_ref(grant_ref_t, domid_t domid,
 				       unsigned long pfn);
 
 static inline void
-gnttab_set_map_op(struct gnttab_map_grant_ref *map, unsigned long addr,
+gnttab_set_map_op(struct gnttab_map_grant_ref *map, phys_addr_t addr,
 		  uint32_t flags, grant_ref_t ref, domid_t domid)
 {
 	if (flags & GNTMAP_contains_pte)
@@ -162,7 +162,7 @@ gnttab_set_map_op(struct gnttab_map_grant_ref *map, unsigned long addr,
 }
 
 static inline void
-gnttab_set_unmap_op(struct gnttab_unmap_grant_ref *unmap, unsigned long addr,
+gnttab_set_unmap_op(struct gnttab_unmap_grant_ref *unmap, phys_addr_t addr,
 		    uint32_t flags, grant_handle_t handle)
 {
 	if (flags & GNTMAP_contains_pte)

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PVOPS] fix gntdev on PAE
  2010-02-01 15:11 [PATCH] [PVOPS] fix gntdev on PAE Stefano Stabellini
@ 2010-02-01 15:46 ` Stefano Stabellini
  2010-02-09 21:57   ` Jeremy Fitzhardinge
  2010-02-09 22:24   ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 15+ messages in thread
From: Stefano Stabellini @ 2010-02-01 15:46 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Fitzhardinge, xen-devel@lists.xensource.com, Jeremy

On Mon, 1 Feb 2010, Stefano Stabellini wrote:
> Hi all,
> this small patch fixes gntdev on Linux pvops kernels:
> gnttab_set_map_op and gnttab_set_unmap_op shouldn't take unsigned long
> as parameters for machine addresses because they are not big enough on
> PAE systems.
> This patch fixes the issue using phys_addr_t instead and enables
> XEN_GNTDEV compilation again.
> 
> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> 

BTW gntdev is used by qemu to provide the console backend to pv guests.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PVOPS] fix gntdev on PAE
  2010-02-01 15:46 ` Stefano Stabellini
@ 2010-02-09 21:57   ` Jeremy Fitzhardinge
  2010-02-10 12:19     ` Stefano Stabellini
  2010-02-09 22:24   ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 15+ messages in thread
From: Jeremy Fitzhardinge @ 2010-02-09 21:57 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel@lists.xensource.com

On 02/01/2010 07:46 AM, Stefano Stabellini wrote:
> On Mon, 1 Feb 2010, Stefano Stabellini wrote:
>    
>> Hi all,
>> this small patch fixes gntdev on Linux pvops kernels:
>> gnttab_set_map_op and gnttab_set_unmap_op shouldn't take unsigned long
>> as parameters for machine addresses because they are not big enough on
>> PAE systems.
>> This patch fixes the issue using phys_addr_t instead and enables
>> XEN_GNTDEV compilation again.
>>
>>
>> Signed-off-by: Stefano Stabellini<stefano.stabellini@eu.citrix.com>
>>
>>      
> BTW gntdev is used by qemu to provide the console backend to pv guests.
>    

Is that recent?  Console had been working before hadn't it?

The gntdev problems I saw were more locking related than anything to do 
with PAE.  Did you try testing with lock debugging enabled?

Thanks,
     J

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PVOPS] fix gntdev on PAE
  2010-02-01 15:46 ` Stefano Stabellini
  2010-02-09 21:57   ` Jeremy Fitzhardinge
@ 2010-02-09 22:24   ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 15+ messages in thread
From: Jeremy Fitzhardinge @ 2010-02-09 22:24 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Ian Campbell, xen-devel@lists.xensource.com

On 02/01/2010 07:46 AM, Stefano Stabellini wrote:
> On Mon, 1 Feb 2010, Stefano Stabellini wrote:
>    
>> Hi all,
>> this small patch fixes gntdev on Linux pvops kernels:
>> gnttab_set_map_op and gnttab_set_unmap_op shouldn't take unsigned long
>> as parameters for machine addresses because they are not big enough on
>> PAE systems.
>> This patch fixes the issue using phys_addr_t instead and enables
>> XEN_GNTDEV compilation again.
>>
>>
>> Signed-off-by: Stefano Stabellini<stefano.stabellini@eu.citrix.com>
>>
>>      
> BTW gntdev is used by qemu to provide the console backend to pv guests.
>    

I applied the grant_table.h parts of the patch since its clearly a fix, 
but I think we need confirm that gnt_dev is really OK.

     J

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PVOPS] fix gntdev on PAE
  2010-02-09 21:57   ` Jeremy Fitzhardinge
@ 2010-02-10 12:19     ` Stefano Stabellini
  2010-02-10 23:12       ` Jeremy Fitzhardinge
  2010-05-28 17:29       ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 15+ messages in thread
From: Stefano Stabellini @ 2010-02-10 12:19 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel@lists.xensource.com, Stefano Stabellini

On Tue, 9 Feb 2010, Jeremy Fitzhardinge wrote:
> On 02/01/2010 07:46 AM, Stefano Stabellini wrote:
> > On Mon, 1 Feb 2010, Stefano Stabellini wrote:
> >    
> >> Hi all,
> >> this small patch fixes gntdev on Linux pvops kernels:
> >> gnttab_set_map_op and gnttab_set_unmap_op shouldn't take unsigned long
> >> as parameters for machine addresses because they are not big enough on
> >> PAE systems.
> >> This patch fixes the issue using phys_addr_t instead and enables
> >> XEN_GNTDEV compilation again.
> >>
> >>
> >> Signed-off-by: Stefano Stabellini<stefano.stabellini@eu.citrix.com>
> >>
> >>      
> > BTW gntdev is used by qemu to provide the console backend to pv guests.
> >    
> 
> Is that recent?  Console had been working before hadn't it?
> 
> The gntdev problems I saw were more locking related than anything to do 
> with PAE.  Did you try testing with lock debugging enabled?
> 

Yes, I don't have any problem with locking in gntdev on my testbox.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PVOPS] fix gntdev on PAE
  2010-02-10 12:19     ` Stefano Stabellini
@ 2010-02-10 23:12       ` Jeremy Fitzhardinge
  2010-05-28 17:29       ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 15+ messages in thread
From: Jeremy Fitzhardinge @ 2010-02-10 23:12 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel@lists.xensource.com

On 02/10/2010 04:19 AM, Stefano Stabellini wrote:
> On Tue, 9 Feb 2010, Jeremy Fitzhardinge wrote:
>    
>> On 02/01/2010 07:46 AM, Stefano Stabellini wrote:
>>      
>>> On Mon, 1 Feb 2010, Stefano Stabellini wrote:
>>>
>>>        
>>>> Hi all,
>>>> this small patch fixes gntdev on Linux pvops kernels:
>>>> gnttab_set_map_op and gnttab_set_unmap_op shouldn't take unsigned long
>>>> as parameters for machine addresses because they are not big enough on
>>>> PAE systems.
>>>> This patch fixes the issue using phys_addr_t instead and enables
>>>> XEN_GNTDEV compilation again.
>>>>
>>>>
>>>> Signed-off-by: Stefano Stabellini<stefano.stabellini@eu.citrix.com>
>>>>
>>>>
>>>>          
>>> BTW gntdev is used by qemu to provide the console backend to pv guests.
>>>
>>>        
>> Is that recent?  Console had been working before hadn't it?
>>
>> The gntdev problems I saw were more locking related than anything to do
>> with PAE.  Did you try testing with lock debugging enabled?
>>
>>      
> Yes, I don't have any problem with locking in gntdev on my testbox.
>    

OK, I'll give it another go.

     J

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PVOPS] fix gntdev on PAE
  2010-02-10 12:19     ` Stefano Stabellini
  2010-02-10 23:12       ` Jeremy Fitzhardinge
@ 2010-05-28 17:29       ` Jeremy Fitzhardinge
  2010-06-01  9:38         ` Stefano Stabellini
  1 sibling, 1 reply; 15+ messages in thread
From: Jeremy Fitzhardinge @ 2010-05-28 17:29 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel@lists.xensource.com, Gerd Hoffmann

On 02/10/2010 04:19 AM, Stefano Stabellini wrote:
> On Tue, 9 Feb 2010, Jeremy Fitzhardinge wrote:
>   
>> On 02/01/2010 07:46 AM, Stefano Stabellini wrote:
>>     
>>> On Mon, 1 Feb 2010, Stefano Stabellini wrote:
>>>    
>>>       
>>>> Hi all,
>>>> this small patch fixes gntdev on Linux pvops kernels:
>>>> gnttab_set_map_op and gnttab_set_unmap_op shouldn't take unsigned long
>>>> as parameters for machine addresses because they are not big enough on
>>>> PAE systems.
>>>> This patch fixes the issue using phys_addr_t instead and enables
>>>> XEN_GNTDEV compilation again.
>>>>
>>>>
>>>> Signed-off-by: Stefano Stabellini<stefano.stabellini@eu.citrix.com>
>>>>
>>>>      
>>>>         
>>> BTW gntdev is used by qemu to provide the console backend to pv guests.
>>>    
>>>       
>> Is that recent?  Console had been working before hadn't it?
>>
>> The gntdev problems I saw were more locking related than anything to do 
>> with PAE.  Did you try testing with lock debugging enabled?
>>
>>     
> Yes, I don't have any problem with locking in gntdev on my testbox.
>   

I managed to catch a lockdep problem in gntdev, which may be the same as
before:

BUG: sleeping function called from invalid context at kernel/rwsem.c:21
in_atomic(): 1, irqs_disabled(): 0, pid: 4091, name: qemu-dm
2 locks held by qemu-dm/4091:
 #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff810bb50f>] sys_munmap+0x33/0x58
 #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff810cd63a>] __mmu_notifier_invalidate_range_start+0x0/0xc7
Pid: 4091, comm: qemu-dm Not tainted 2.6.32.13 #23
Call Trace:
 [<ffffffff8106705b>] ? __debug_show_held_locks+0x22/0x24
 [<ffffffff81039522>] __might_sleep+0x123/0x127
 [<ffffffff810a8536>] ? release_pages+0xd2/0x1e7
 [<ffffffff81498849>] down_read+0x1f/0x57
 [<ffffffff81010142>] ? check_events+0x12/0x20
 [<ffffffff810a8536>] ? release_pages+0xd2/0x1e7
 [<ffffffff810cd63a>] ? __mmu_notifier_invalidate_range_start+0x0/0xc7
 [<ffffffff8123e069>] mn_invl_range_start+0x32/0x118
 [<ffffffff810cd69c>] __mmu_notifier_invalidate_range_start+0x62/0xc7
 [<ffffffff810cd63a>] ? __mmu_notifier_invalidate_range_start+0x0/0xc7
 [<ffffffff810b54bc>] unmap_vmas+0x8c/0x91a
 [<ffffffff810ba363>] unmap_region+0xda/0x178
 [<ffffffff810bb472>] do_munmap+0x2ae/0x318
 [<ffffffff810bb51d>] sys_munmap+0x41/0x58
 [<ffffffff81013b82>] system_call_fastpath+0x16/0x1b


The problem is that mn_invl_range_start does a down_read(), but it is
called from __mmu_notifier_invalidate_range_start(), which does an
rcu_read_lock, which has the side-effect of disabling preemption.

The mmu notifier code seems to have always used rcu_read_lock this way,
so I guess this bug has always been there.  It's not immediately obvious
how to fix it.

Thoughts?

    J

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PVOPS] fix gntdev on PAE
  2010-05-28 17:29       ` Jeremy Fitzhardinge
@ 2010-06-01  9:38         ` Stefano Stabellini
  2010-06-01 16:31           ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 15+ messages in thread
From: Stefano Stabellini @ 2010-06-01  9:38 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: xen-devel@lists.xensource.com, Hoffmann, Gerd, Stefano Stabellini

On Fri, 28 May 2010, Jeremy Fitzhardinge wrote:
> 
> I managed to catch a lockdep problem in gntdev, which may be the same as
> before:
> 
> BUG: sleeping function called from invalid context at kernel/rwsem.c:21
> in_atomic(): 1, irqs_disabled(): 0, pid: 4091, name: qemu-dm
> 2 locks held by qemu-dm/4091:
>  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff810bb50f>] sys_munmap+0x33/0x58
>  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff810cd63a>] __mmu_notifier_invalidate_range_start+0x0/0xc7
> Pid: 4091, comm: qemu-dm Not tainted 2.6.32.13 #23
> Call Trace:
>  [<ffffffff8106705b>] ? __debug_show_held_locks+0x22/0x24
>  [<ffffffff81039522>] __might_sleep+0x123/0x127
>  [<ffffffff810a8536>] ? release_pages+0xd2/0x1e7
>  [<ffffffff81498849>] down_read+0x1f/0x57
>  [<ffffffff81010142>] ? check_events+0x12/0x20
>  [<ffffffff810a8536>] ? release_pages+0xd2/0x1e7
>  [<ffffffff810cd63a>] ? __mmu_notifier_invalidate_range_start+0x0/0xc7
>  [<ffffffff8123e069>] mn_invl_range_start+0x32/0x118
>  [<ffffffff810cd69c>] __mmu_notifier_invalidate_range_start+0x62/0xc7
>  [<ffffffff810cd63a>] ? __mmu_notifier_invalidate_range_start+0x0/0xc7
>  [<ffffffff810b54bc>] unmap_vmas+0x8c/0x91a
>  [<ffffffff810ba363>] unmap_region+0xda/0x178
>  [<ffffffff810bb472>] do_munmap+0x2ae/0x318
>  [<ffffffff810bb51d>] sys_munmap+0x41/0x58
>  [<ffffffff81013b82>] system_call_fastpath+0x16/0x1b
> 
> 
> The problem is that mn_invl_range_start does a down_read(), but it is
> called from __mmu_notifier_invalidate_range_start(), which does an
> rcu_read_lock, which has the side-effect of disabling preemption.
> 
> The mmu notifier code seems to have always used rcu_read_lock this way,
> so I guess this bug has always been there.  It's not immediately obvious
> how to fix it.
> 
> Thoughts?

What about turning the semaphore into a rwlock?
Performances shouldn't matter in this case.
Something like this:

---


diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index ddc59cc..91846b9 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -28,7 +28,7 @@
 #include <linux/types.h>
 #include <linux/uaccess.h>
 #include <linux/sched.h>
-#include <linux/rwsem.h>
+#include <linux/spinlock_types.h>
 
 #include <xen/xen.h>
 #include <xen/grant_table.h>
@@ -51,7 +51,7 @@ struct gntdev_priv {
 	struct list_head maps;
 	uint32_t used;
 	uint32_t limit;
-	struct rw_semaphore sem;
+	rwlock_t rwlock;
 	struct mm_struct *mm;
 	struct mmu_notifier mn;
 };
@@ -277,7 +277,7 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
 	unsigned long mstart, mend;
 	int err;
 
-	down_read(&priv->sem);
+	read_lock(&priv->rwlock);
 	list_for_each_entry(map, &priv->maps, next) {
 		if (!map->vma)
 			continue;
@@ -299,7 +299,7 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
 					(mend - mstart) >> PAGE_SHIFT);
 		WARN_ON(err);
 	}
-	up_read(&priv->sem);
+	read_unlock(&priv->rwlock);
 }
 
 static void mn_invl_page(struct mmu_notifier *mn,
@@ -316,7 +316,7 @@ static void mn_release(struct mmu_notifier *mn,
 	struct grant_map *map;
 	int err;
 
-	down_read(&priv->sem);
+	read_lock(&priv->rwlock);
 	list_for_each_entry(map, &priv->maps, next) {
 		if (!map->vma)
 			continue;
@@ -327,7 +327,7 @@ static void mn_release(struct mmu_notifier *mn,
 		err = unmap_grant_pages(map, 0, map->count);
 		WARN_ON(err);
 	}
-	up_read(&priv->sem);
+	read_unlock(&priv->rwlock);
 }
 
 struct mmu_notifier_ops gntdev_mmu_ops = {
@@ -347,7 +347,7 @@ static int gntdev_open(struct inode *inode, struct file *flip)
 		return -ENOMEM;
 
 	INIT_LIST_HEAD(&priv->maps);
-	init_rwsem(&priv->sem);
+	priv->rwlock = RW_LOCK_UNLOCKED;
 	priv->limit = limit;
 
 	priv->mm = get_task_mm(current);
@@ -375,13 +375,13 @@ static int gntdev_release(struct inode *inode, struct file *flip)
 	if (debug)
 		printk("%s: priv %p\n", __FUNCTION__, priv);
 
-	down_write(&priv->sem);
+	write_lock(&priv->rwlock);
 	while (!list_empty(&priv->maps)) {
 		map = list_entry(priv->maps.next, struct grant_map, next);
 		err = gntdev_del_map(map);
 		WARN_ON(err);
 	}
-	up_write(&priv->sem);
+	write_unlock(&priv->rwlock);
 	mmu_notifier_unregister(&priv->mn, priv->mm);
 	kfree(priv);
 	return 0;
@@ -404,7 +404,7 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
 	if (unlikely(op.count > priv->limit))
 		return -EINVAL;
 
-	down_write(&priv->sem);
+	write_lock(&priv->rwlock);
 	err = -ENOMEM;
 	map = gntdev_add_map(priv, op.count);
 	if (!map)
@@ -417,13 +417,13 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
 	op.index = map->index << PAGE_SHIFT;
 	if (copy_to_user(u, &op, sizeof(op)) != 0)
 		goto err_free;
-	up_write(&priv->sem);
+	write_unlock(&priv->rwlock);
 	return 0;
 
 err_free:
 	gntdev_del_map(map);
 err_unlock:
-	up_write(&priv->sem);
+	write_unlock(&priv->rwlock);
 	return err;
 }
 
@@ -440,11 +440,11 @@ static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
 		printk("%s: priv %p, del %d+%d\n", __FUNCTION__, priv,
 		       (int)op.index, (int)op.count);
 
-	down_write(&priv->sem);
+	write_lock(&priv->rwlock);
 	map = gntdev_find_map_index(priv, op.index >> PAGE_SHIFT, op.count);
 	if (map)
 		err = gntdev_del_map(map);
-	up_write(&priv->sem);
+	write_unlock(&priv->rwlock);
 	return err;
 }
 
@@ -460,16 +460,16 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
 		printk("%s: priv %p, offset for vaddr %lx\n", __FUNCTION__, priv,
 		       (unsigned long)op.vaddr);
 
-	down_read(&priv->sem);
+	read_lock(&priv->rwlock);
 	map = gntdev_find_map_vaddr(priv, op.vaddr);
 	if (map == NULL ||
 	    map->vma->vm_start != op.vaddr) {
-		up_read(&priv->sem);
+		read_unlock(&priv->rwlock);
 		return -EINVAL;
 	}
 	op.offset = map->index << PAGE_SHIFT;
 	op.count = map->count;
-	up_read(&priv->sem);
+	read_unlock(&priv->rwlock);
 
 	if (copy_to_user(u, &op, sizeof(op)) != 0)
 		return -EFAULT;
@@ -488,9 +488,9 @@ static long gntdev_ioctl_set_max_grants(struct gntdev_priv *priv,
 	if (op.count > limit)
 		return -EINVAL;
 
-	down_write(&priv->sem);
+	write_lock(&priv->rwlock);
 	priv->limit = op.count;
-	up_write(&priv->sem);
+	write_unlock(&priv->rwlock);
 	return 0;
 }
 
@@ -538,7 +538,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 		printk("%s: map %d+%d at %lx (pgoff %lx)\n", __FUNCTION__,
 		       index, count, vma->vm_start, vma->vm_pgoff);
 
-	down_read(&priv->sem);
+	read_lock(&priv->rwlock);
 	map = gntdev_find_map_index(priv, index, count);
 	if (!map)
 		goto unlock_out;
@@ -580,7 +580,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 	map->is_mapped = 1;
 
 unlock_out:
-	up_read(&priv->sem);
+	read_unlock(&priv->rwlock);
 	return err;
 }

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PVOPS] fix gntdev on PAE
  2010-06-01  9:38         ` Stefano Stabellini
@ 2010-06-01 16:31           ` Jeremy Fitzhardinge
  2010-06-01 16:36             ` Stefano Stabellini
  0 siblings, 1 reply; 15+ messages in thread
From: Jeremy Fitzhardinge @ 2010-06-01 16:31 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel@lists.xensource.com, Gerd Hoffmann

On 06/01/2010 02:38 AM, Stefano Stabellini wrote:
> On Fri, 28 May 2010, Jeremy Fitzhardinge wrote:
>   
>> I managed to catch a lockdep problem in gntdev, which may be the same as
>> before:
>>
>> BUG: sleeping function called from invalid context at kernel/rwsem.c:21
>> in_atomic(): 1, irqs_disabled(): 0, pid: 4091, name: qemu-dm
>> 2 locks held by qemu-dm/4091:
>>  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff810bb50f>] sys_munmap+0x33/0x58
>>  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff810cd63a>] __mmu_notifier_invalidate_range_start+0x0/0xc7
>> Pid: 4091, comm: qemu-dm Not tainted 2.6.32.13 #23
>> Call Trace:
>>  [<ffffffff8106705b>] ? __debug_show_held_locks+0x22/0x24
>>  [<ffffffff81039522>] __might_sleep+0x123/0x127
>>  [<ffffffff810a8536>] ? release_pages+0xd2/0x1e7
>>  [<ffffffff81498849>] down_read+0x1f/0x57
>>  [<ffffffff81010142>] ? check_events+0x12/0x20
>>  [<ffffffff810a8536>] ? release_pages+0xd2/0x1e7
>>  [<ffffffff810cd63a>] ? __mmu_notifier_invalidate_range_start+0x0/0xc7
>>  [<ffffffff8123e069>] mn_invl_range_start+0x32/0x118
>>  [<ffffffff810cd69c>] __mmu_notifier_invalidate_range_start+0x62/0xc7
>>  [<ffffffff810cd63a>] ? __mmu_notifier_invalidate_range_start+0x0/0xc7
>>  [<ffffffff810b54bc>] unmap_vmas+0x8c/0x91a
>>  [<ffffffff810ba363>] unmap_region+0xda/0x178
>>  [<ffffffff810bb472>] do_munmap+0x2ae/0x318
>>  [<ffffffff810bb51d>] sys_munmap+0x41/0x58
>>  [<ffffffff81013b82>] system_call_fastpath+0x16/0x1b
>>
>>
>> The problem is that mn_invl_range_start does a down_read(), but it is
>> called from __mmu_notifier_invalidate_range_start(), which does an
>> rcu_read_lock, which has the side-effect of disabling preemption.
>>
>> The mmu notifier code seems to have always used rcu_read_lock this way,
>> so I guess this bug has always been there.  It's not immediately obvious
>> how to fix it.
>>
>> Thoughts?
>>     
> What about turning the semaphore into a rwlock?
> Performances shouldn't matter in this case.
> Something like this:
>   

The problem is that the rcu lock disables preemption, so anything inside
it must be non-scheduling.  So it would need to be a spinlock type
thing, I think.

    J

> ---
>
>
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index ddc59cc..91846b9 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -28,7 +28,7 @@
>  #include <linux/types.h>
>  #include <linux/uaccess.h>
>  #include <linux/sched.h>
> -#include <linux/rwsem.h>
> +#include <linux/spinlock_types.h>
>  
>  #include <xen/xen.h>
>  #include <xen/grant_table.h>
> @@ -51,7 +51,7 @@ struct gntdev_priv {
>  	struct list_head maps;
>  	uint32_t used;
>  	uint32_t limit;
> -	struct rw_semaphore sem;
> +	rwlock_t rwlock;
>  	struct mm_struct *mm;
>  	struct mmu_notifier mn;
>  };
> @@ -277,7 +277,7 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
>  	unsigned long mstart, mend;
>  	int err;
>  
> -	down_read(&priv->sem);
> +	read_lock(&priv->rwlock);
>  	list_for_each_entry(map, &priv->maps, next) {
>  		if (!map->vma)
>  			continue;
> @@ -299,7 +299,7 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
>  					(mend - mstart) >> PAGE_SHIFT);
>  		WARN_ON(err);
>  	}
> -	up_read(&priv->sem);
> +	read_unlock(&priv->rwlock);
>  }
>  
>  static void mn_invl_page(struct mmu_notifier *mn,
> @@ -316,7 +316,7 @@ static void mn_release(struct mmu_notifier *mn,
>  	struct grant_map *map;
>  	int err;
>  
> -	down_read(&priv->sem);
> +	read_lock(&priv->rwlock);
>  	list_for_each_entry(map, &priv->maps, next) {
>  		if (!map->vma)
>  			continue;
> @@ -327,7 +327,7 @@ static void mn_release(struct mmu_notifier *mn,
>  		err = unmap_grant_pages(map, 0, map->count);
>  		WARN_ON(err);
>  	}
> -	up_read(&priv->sem);
> +	read_unlock(&priv->rwlock);
>  }
>  
>  struct mmu_notifier_ops gntdev_mmu_ops = {
> @@ -347,7 +347,7 @@ static int gntdev_open(struct inode *inode, struct file *flip)
>  		return -ENOMEM;
>  
>  	INIT_LIST_HEAD(&priv->maps);
> -	init_rwsem(&priv->sem);
> +	priv->rwlock = RW_LOCK_UNLOCKED;
>  	priv->limit = limit;
>  
>  	priv->mm = get_task_mm(current);
> @@ -375,13 +375,13 @@ static int gntdev_release(struct inode *inode, struct file *flip)
>  	if (debug)
>  		printk("%s: priv %p\n", __FUNCTION__, priv);
>  
> -	down_write(&priv->sem);
> +	write_lock(&priv->rwlock);
>  	while (!list_empty(&priv->maps)) {
>  		map = list_entry(priv->maps.next, struct grant_map, next);
>  		err = gntdev_del_map(map);
>  		WARN_ON(err);
>  	}
> -	up_write(&priv->sem);
> +	write_unlock(&priv->rwlock);
>  	mmu_notifier_unregister(&priv->mn, priv->mm);
>  	kfree(priv);
>  	return 0;
> @@ -404,7 +404,7 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
>  	if (unlikely(op.count > priv->limit))
>  		return -EINVAL;
>  
> -	down_write(&priv->sem);
> +	write_lock(&priv->rwlock);
>  	err = -ENOMEM;
>  	map = gntdev_add_map(priv, op.count);
>  	if (!map)
> @@ -417,13 +417,13 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
>  	op.index = map->index << PAGE_SHIFT;
>  	if (copy_to_user(u, &op, sizeof(op)) != 0)
>  		goto err_free;
> -	up_write(&priv->sem);
> +	write_unlock(&priv->rwlock);
>  	return 0;
>  
>  err_free:
>  	gntdev_del_map(map);
>  err_unlock:
> -	up_write(&priv->sem);
> +	write_unlock(&priv->rwlock);
>  	return err;
>  }
>  
> @@ -440,11 +440,11 @@ static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
>  		printk("%s: priv %p, del %d+%d\n", __FUNCTION__, priv,
>  		       (int)op.index, (int)op.count);
>  
> -	down_write(&priv->sem);
> +	write_lock(&priv->rwlock);
>  	map = gntdev_find_map_index(priv, op.index >> PAGE_SHIFT, op.count);
>  	if (map)
>  		err = gntdev_del_map(map);
> -	up_write(&priv->sem);
> +	write_unlock(&priv->rwlock);
>  	return err;
>  }
>  
> @@ -460,16 +460,16 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
>  		printk("%s: priv %p, offset for vaddr %lx\n", __FUNCTION__, priv,
>  		       (unsigned long)op.vaddr);
>  
> -	down_read(&priv->sem);
> +	read_lock(&priv->rwlock);
>  	map = gntdev_find_map_vaddr(priv, op.vaddr);
>  	if (map == NULL ||
>  	    map->vma->vm_start != op.vaddr) {
> -		up_read(&priv->sem);
> +		read_unlock(&priv->rwlock);
>  		return -EINVAL;
>  	}
>  	op.offset = map->index << PAGE_SHIFT;
>  	op.count = map->count;
> -	up_read(&priv->sem);
> +	read_unlock(&priv->rwlock);
>  
>  	if (copy_to_user(u, &op, sizeof(op)) != 0)
>  		return -EFAULT;
> @@ -488,9 +488,9 @@ static long gntdev_ioctl_set_max_grants(struct gntdev_priv *priv,
>  	if (op.count > limit)
>  		return -EINVAL;
>  
> -	down_write(&priv->sem);
> +	write_lock(&priv->rwlock);
>  	priv->limit = op.count;
> -	up_write(&priv->sem);
> +	write_unlock(&priv->rwlock);
>  	return 0;
>  }
>  
> @@ -538,7 +538,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>  		printk("%s: map %d+%d at %lx (pgoff %lx)\n", __FUNCTION__,
>  		       index, count, vma->vm_start, vma->vm_pgoff);
>  
> -	down_read(&priv->sem);
> +	read_lock(&priv->rwlock);
>  	map = gntdev_find_map_index(priv, index, count);
>  	if (!map)
>  		goto unlock_out;
> @@ -580,7 +580,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>  	map->is_mapped = 1;
>  
>  unlock_out:
> -	up_read(&priv->sem);
> +	read_unlock(&priv->rwlock);
>  	return err;
>  }
>  
>
>   

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PVOPS] fix gntdev on PAE
  2010-06-01 16:31           ` Jeremy Fitzhardinge
@ 2010-06-01 16:36             ` Stefano Stabellini
  2010-06-01 16:46               ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 15+ messages in thread
From: Stefano Stabellini @ 2010-06-01 16:36 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: xen-devel@lists.xensource.com, Hoffmann, Gerd, Stefano Stabellini

On Tue, 1 Jun 2010, Jeremy Fitzhardinge wrote:
> On 06/01/2010 02:38 AM, Stefano Stabellini wrote:
> > On Fri, 28 May 2010, Jeremy Fitzhardinge wrote:
> >   
> >> I managed to catch a lockdep problem in gntdev, which may be the same as
> >> before:
> >>
> >> BUG: sleeping function called from invalid context at kernel/rwsem.c:21
> >> in_atomic(): 1, irqs_disabled(): 0, pid: 4091, name: qemu-dm
> >> 2 locks held by qemu-dm/4091:
> >>  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff810bb50f>] sys_munmap+0x33/0x58
> >>  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff810cd63a>] __mmu_notifier_invalidate_range_start+0x0/0xc7
> >> Pid: 4091, comm: qemu-dm Not tainted 2.6.32.13 #23
> >> Call Trace:
> >>  [<ffffffff8106705b>] ? __debug_show_held_locks+0x22/0x24
> >>  [<ffffffff81039522>] __might_sleep+0x123/0x127
> >>  [<ffffffff810a8536>] ? release_pages+0xd2/0x1e7
> >>  [<ffffffff81498849>] down_read+0x1f/0x57
> >>  [<ffffffff81010142>] ? check_events+0x12/0x20
> >>  [<ffffffff810a8536>] ? release_pages+0xd2/0x1e7
> >>  [<ffffffff810cd63a>] ? __mmu_notifier_invalidate_range_start+0x0/0xc7
> >>  [<ffffffff8123e069>] mn_invl_range_start+0x32/0x118
> >>  [<ffffffff810cd69c>] __mmu_notifier_invalidate_range_start+0x62/0xc7
> >>  [<ffffffff810cd63a>] ? __mmu_notifier_invalidate_range_start+0x0/0xc7
> >>  [<ffffffff810b54bc>] unmap_vmas+0x8c/0x91a
> >>  [<ffffffff810ba363>] unmap_region+0xda/0x178
> >>  [<ffffffff810bb472>] do_munmap+0x2ae/0x318
> >>  [<ffffffff810bb51d>] sys_munmap+0x41/0x58
> >>  [<ffffffff81013b82>] system_call_fastpath+0x16/0x1b
> >>
> >>
> >> The problem is that mn_invl_range_start does a down_read(), but it is
> >> called from __mmu_notifier_invalidate_range_start(), which does an
> >> rcu_read_lock, which has the side-effect of disabling preemption.
> >>
> >> The mmu notifier code seems to have always used rcu_read_lock this way,
> >> so I guess this bug has always been there.  It's not immediately obvious
> >> how to fix it.
> >>
> >> Thoughts?
> >>     
> > What about turning the semaphore into a rwlock?
> > Performances shouldn't matter in this case.
> > Something like this:
> >   
> 
> The problem is that the rcu lock disables preemption, so anything inside
> it must be non-scheduling.  So it would need to be a spinlock type
> thing, I think.

right, in fact rwlock is a rw spinlock if I am not mistaken

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PVOPS] fix gntdev on PAE
  2010-06-01 16:36             ` Stefano Stabellini
@ 2010-06-01 16:46               ` Jeremy Fitzhardinge
  2010-06-02 14:11                 ` Stefano Stabellini
  0 siblings, 1 reply; 15+ messages in thread
From: Jeremy Fitzhardinge @ 2010-06-01 16:46 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel@lists.xensource.com, Gerd Hoffmann

On 06/01/2010 09:36 AM, Stefano Stabellini wrote:
>>> Performances shouldn't matter in this case.
>>> Something like this:
>>>   
>>>       
>> The problem is that the rcu lock disables preemption, so anything inside
>> it must be non-scheduling.  So it would need to be a spinlock type
>> thing, I think.
>>     
> right, in fact rwlock is a rw spinlock if I am not mistaken
>   

Ah, yes.  The problem with just making it a spinlock is that other parts
of the code do copy_from_user while holding it, so they would need to be
restructured to avoid that.

Also, rwlock is (almost?) never useful compared to a plain spinlock.

    J

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PVOPS] fix gntdev on PAE
  2010-06-01 16:46               ` Jeremy Fitzhardinge
@ 2010-06-02 14:11                 ` Stefano Stabellini
  2010-06-02 17:11                   ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 15+ messages in thread
From: Stefano Stabellini @ 2010-06-02 14:11 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: xen-devel@lists.xensource.com, Hoffmann, Gerd, Stefano Stabellini

On Tue, 1 Jun 2010, Jeremy Fitzhardinge wrote:
> On 06/01/2010 09:36 AM, Stefano Stabellini wrote:
> >>> Performances shouldn't matter in this case.
> >>> Something like this:
> >>>   
> >>>       
> >> The problem is that the rcu lock disables preemption, so anything inside
> >> it must be non-scheduling.  So it would need to be a spinlock type
> >> thing, I think.
> >>     
> > right, in fact rwlock is a rw spinlock if I am not mistaken
> >   
> 
> Ah, yes.  The problem with just making it a spinlock is that other parts
> of the code do copy_from_user while holding it, so they would need to be
> restructured to avoid that.
> 

Yes, you are right. I fixed the patch to do that, however I couldn't
actually reproduce the bug you are seeing so I couldn't test the error
path at all...

---

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index ddc59cc..0e571bd 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -28,7 +28,7 @@
 #include <linux/types.h>
 #include <linux/uaccess.h>
 #include <linux/sched.h>
-#include <linux/rwsem.h>
+#include <linux/spinlock_types.h>
 
 #include <xen/xen.h>
 #include <xen/grant_table.h>
@@ -51,7 +51,7 @@ struct gntdev_priv {
 	struct list_head maps;
 	uint32_t used;
 	uint32_t limit;
-	struct rw_semaphore sem;
+	rwlock_t rwlock;
 	struct mm_struct *mm;
 	struct mmu_notifier mn;
 };
@@ -84,9 +84,9 @@ static void gntdev_print_maps(struct gntdev_priv *priv,
 		       map->index == text_index && text ? text : "");
 }
 
-static struct grant_map *gntdev_add_map(struct gntdev_priv *priv, int count)
+static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
 {
-	struct grant_map *map, *add;
+	struct grant_map *add;
 
 	add = kzalloc(sizeof(struct grant_map), GFP_KERNEL);
 	if (NULL == add)
@@ -104,8 +104,20 @@ static struct grant_map *gntdev_add_map(struct gntdev_priv *priv, int count)
 	add->count = count;
 	add->priv  = priv;
 
-	if (add->count + priv->used > priv->limit)
-		goto err;
+	if (add->count + priv->used <= priv->limit)
+		return add;
+
+err:
+	kfree(add->grants);
+	kfree(add->map_ops);
+	kfree(add->unmap_ops);
+	kfree(add);
+	return NULL;
+}
+
+static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add)
+{
+	struct grant_map *map;
 
 	list_for_each_entry(map, &priv->maps, next) {
 		if (add->index + add->count < map->index) {
@@ -120,14 +132,6 @@ done:
 	priv->used += add->count;
 	if (debug)
 		gntdev_print_maps(priv, "[new]", add->index);
-	return add;
-
-err:
-	kfree(add->grants);
-	kfree(add->map_ops);
-	kfree(add->unmap_ops);
-	kfree(add);
-	return NULL;
 }
 
 static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int index,
@@ -174,11 +178,17 @@ static int gntdev_del_map(struct grant_map *map)
 
 	map->priv->used -= map->count;
 	list_del(&map->next);
+	return 0;
+}
+
+static void gntdev_free_map(struct grant_map *map)
+{
+	if (!map)
+		return;
 	kfree(map->grants);
 	kfree(map->map_ops);
 	kfree(map->unmap_ops);
 	kfree(map);
-	return 0;
 }
 
 /* ------------------------------------------------------------------ */
@@ -277,7 +287,7 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
 	unsigned long mstart, mend;
 	int err;
 
-	down_read(&priv->sem);
+	read_lock(&priv->rwlock);
 	list_for_each_entry(map, &priv->maps, next) {
 		if (!map->vma)
 			continue;
@@ -299,7 +309,7 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
 					(mend - mstart) >> PAGE_SHIFT);
 		WARN_ON(err);
 	}
-	up_read(&priv->sem);
+	read_unlock(&priv->rwlock);
 }
 
 static void mn_invl_page(struct mmu_notifier *mn,
@@ -316,7 +326,7 @@ static void mn_release(struct mmu_notifier *mn,
 	struct grant_map *map;
 	int err;
 
-	down_read(&priv->sem);
+	read_lock(&priv->rwlock);
 	list_for_each_entry(map, &priv->maps, next) {
 		if (!map->vma)
 			continue;
@@ -327,7 +337,7 @@ static void mn_release(struct mmu_notifier *mn,
 		err = unmap_grant_pages(map, 0, map->count);
 		WARN_ON(err);
 	}
-	up_read(&priv->sem);
+	read_unlock(&priv->rwlock);
 }
 
 struct mmu_notifier_ops gntdev_mmu_ops = {
@@ -347,7 +357,7 @@ static int gntdev_open(struct inode *inode, struct file *flip)
 		return -ENOMEM;
 
 	INIT_LIST_HEAD(&priv->maps);
-	init_rwsem(&priv->sem);
+	priv->rwlock = RW_LOCK_UNLOCKED;
 	priv->limit = limit;
 
 	priv->mm = get_task_mm(current);
@@ -375,13 +385,21 @@ static int gntdev_release(struct inode *inode, struct file *flip)
 	if (debug)
 		printk("%s: priv %p\n", __FUNCTION__, priv);
 
-	down_write(&priv->sem);
-	while (!list_empty(&priv->maps)) {
-		map = list_entry(priv->maps.next, struct grant_map, next);
-		err = gntdev_del_map(map);
-		WARN_ON(err);
+	while (1) {
+		write_lock(&priv->rwlock);
+		if (!list_empty(&priv->maps)) {
+			map = list_entry(priv->maps.next, struct grant_map, next);
+			err = gntdev_del_map(map);
+			write_unlock(&priv->rwlock);
+			if (err)
+				WARN_ON(err);
+			else
+				gntdev_free_map(map);
+		} else {
+			write_unlock(&priv->rwlock);
+			break;
+		}
 	}
-	up_write(&priv->sem);
 	mmu_notifier_unregister(&priv->mn, priv->mm);
 	kfree(priv);
 	return 0;
@@ -404,27 +422,29 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
 	if (unlikely(op.count > priv->limit))
 		return -EINVAL;
 
-	down_write(&priv->sem);
 	err = -ENOMEM;
-	map = gntdev_add_map(priv, op.count);
+	map = gntdev_alloc_map(priv, op.count);
 	if (!map)
-		goto err_unlock;
-
-	err = -ENOMEM;
+		return err;
 	if (copy_from_user(map->grants, &u->refs,
-			   sizeof(map->grants[0]) * op.count) != 0)
-		goto err_free;
+			   sizeof(map->grants[0]) * op.count) != 0) {
+		gntdev_free_map(map);
+		return err;
+	}
+
+	write_lock(&priv->rwlock);
+	gntdev_add_map(priv, map);
+
 	op.index = map->index << PAGE_SHIFT;
-	if (copy_to_user(u, &op, sizeof(op)) != 0)
-		goto err_free;
-	up_write(&priv->sem);
+	write_unlock(&priv->rwlock);
+	if (copy_to_user(u, &op, sizeof(op)) != 0) {
+		write_lock(&priv->rwlock);
+		gntdev_del_map(map);
+		write_unlock(&priv->rwlock);
+		gntdev_free_map(map);
+		return err;
+	}
 	return 0;
-
-err_free:
-	gntdev_del_map(map);
-err_unlock:
-	up_write(&priv->sem);
-	return err;
 }
 
 static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
@@ -440,11 +460,13 @@ static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
 		printk("%s: priv %p, del %d+%d\n", __FUNCTION__, priv,
 		       (int)op.index, (int)op.count);
 
-	down_write(&priv->sem);
+	write_lock(&priv->rwlock);
 	map = gntdev_find_map_index(priv, op.index >> PAGE_SHIFT, op.count);
 	if (map)
 		err = gntdev_del_map(map);
-	up_write(&priv->sem);
+	write_unlock(&priv->rwlock);
+	if (!err)
+		gntdev_free_map(map);
 	return err;
 }
 
@@ -460,16 +482,16 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
 		printk("%s: priv %p, offset for vaddr %lx\n", __FUNCTION__, priv,
 		       (unsigned long)op.vaddr);
 
-	down_read(&priv->sem);
+	read_lock(&priv->rwlock);
 	map = gntdev_find_map_vaddr(priv, op.vaddr);
 	if (map == NULL ||
 	    map->vma->vm_start != op.vaddr) {
-		up_read(&priv->sem);
+		read_unlock(&priv->rwlock);
 		return -EINVAL;
 	}
 	op.offset = map->index << PAGE_SHIFT;
 	op.count = map->count;
-	up_read(&priv->sem);
+	read_unlock(&priv->rwlock);
 
 	if (copy_to_user(u, &op, sizeof(op)) != 0)
 		return -EFAULT;
@@ -488,9 +510,9 @@ static long gntdev_ioctl_set_max_grants(struct gntdev_priv *priv,
 	if (op.count > limit)
 		return -EINVAL;
 
-	down_write(&priv->sem);
+	write_lock(&priv->rwlock);
 	priv->limit = op.count;
-	up_write(&priv->sem);
+	write_unlock(&priv->rwlock);
 	return 0;
 }
 
@@ -538,7 +560,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 		printk("%s: map %d+%d at %lx (pgoff %lx)\n", __FUNCTION__,
 		       index, count, vma->vm_start, vma->vm_pgoff);
 
-	down_read(&priv->sem);
+	read_lock(&priv->rwlock);
 	map = gntdev_find_map_index(priv, index, count);
 	if (!map)
 		goto unlock_out;
@@ -580,7 +602,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 	map->is_mapped = 1;
 
 unlock_out:
-	up_read(&priv->sem);
+	read_unlock(&priv->rwlock);
 	return err;
 }

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PVOPS] fix gntdev on PAE
  2010-06-02 14:11                 ` Stefano Stabellini
@ 2010-06-02 17:11                   ` Jeremy Fitzhardinge
  2010-06-03  9:32                     ` Stefano Stabellini
  0 siblings, 1 reply; 15+ messages in thread
From: Jeremy Fitzhardinge @ 2010-06-02 17:11 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel@lists.xensource.com, Gerd Hoffmann

On 06/02/2010 07:11 AM, Stefano Stabellini wrote:
> On Tue, 1 Jun 2010, Jeremy Fitzhardinge wrote:
>   
>> On 06/01/2010 09:36 AM, Stefano Stabellini wrote:
>>     
>>>>> Performances shouldn't matter in this case.
>>>>> Something like this:
>>>>>   
>>>>>       
>>>>>           
>>>> The problem is that the rcu lock disables preemption, so anything inside
>>>> it must be non-scheduling.  So it would need to be a spinlock type
>>>> thing, I think.
>>>>     
>>>>         
>>> right, in fact rwlock is a rw spinlock if I am not mistaken
>>>   
>>>       
>> Ah, yes.  The problem with just making it a spinlock is that other parts
>> of the code do copy_from_user while holding it, so they would need to be
>> restructured to avoid that.
>>
>>     
> Yes, you are right. I fixed the patch to do that, however I couldn't
> actually reproduce the bug you are seeing so I couldn't test the error
> path at all...
>   

I hit this often, mostly when mucking around with xenstored (maybe when
it exits?).  Other people have reported it too.

Does it really need to be a rwlock?

    J

> ---
>
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index ddc59cc..0e571bd 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -28,7 +28,7 @@
>  #include <linux/types.h>
>  #include <linux/uaccess.h>
>  #include <linux/sched.h>
> -#include <linux/rwsem.h>
> +#include <linux/spinlock_types.h>
>  
>  #include <xen/xen.h>
>  #include <xen/grant_table.h>
> @@ -51,7 +51,7 @@ struct gntdev_priv {
>  	struct list_head maps;
>  	uint32_t used;
>  	uint32_t limit;
> -	struct rw_semaphore sem;
> +	rwlock_t rwlock;
>  	struct mm_struct *mm;
>  	struct mmu_notifier mn;
>  };
> @@ -84,9 +84,9 @@ static void gntdev_print_maps(struct gntdev_priv *priv,
>  		       map->index == text_index && text ? text : "");
>  }
>  
> -static struct grant_map *gntdev_add_map(struct gntdev_priv *priv, int count)
> +static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
>  {
> -	struct grant_map *map, *add;
> +	struct grant_map *add;
>  
>  	add = kzalloc(sizeof(struct grant_map), GFP_KERNEL);
>  	if (NULL == add)
> @@ -104,8 +104,20 @@ static struct grant_map *gntdev_add_map(struct gntdev_priv *priv, int count)
>  	add->count = count;
>  	add->priv  = priv;
>  
> -	if (add->count + priv->used > priv->limit)
> -		goto err;
> +	if (add->count + priv->used <= priv->limit)
> +		return add;
> +
> +err:
> +	kfree(add->grants);
> +	kfree(add->map_ops);
> +	kfree(add->unmap_ops);
> +	kfree(add);
> +	return NULL;
> +}
> +
> +static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add)
> +{
> +	struct grant_map *map;
>  
>  	list_for_each_entry(map, &priv->maps, next) {
>  		if (add->index + add->count < map->index) {
> @@ -120,14 +132,6 @@ done:
>  	priv->used += add->count;
>  	if (debug)
>  		gntdev_print_maps(priv, "[new]", add->index);
> -	return add;
> -
> -err:
> -	kfree(add->grants);
> -	kfree(add->map_ops);
> -	kfree(add->unmap_ops);
> -	kfree(add);
> -	return NULL;
>  }
>  
>  static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int index,
> @@ -174,11 +178,17 @@ static int gntdev_del_map(struct grant_map *map)
>  
>  	map->priv->used -= map->count;
>  	list_del(&map->next);
> +	return 0;
> +}
> +
> +static void gntdev_free_map(struct grant_map *map)
> +{
> +	if (!map)
> +		return;
>  	kfree(map->grants);
>  	kfree(map->map_ops);
>  	kfree(map->unmap_ops);
>  	kfree(map);
> -	return 0;
>  }
>  
>  /* ------------------------------------------------------------------ */
> @@ -277,7 +287,7 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
>  	unsigned long mstart, mend;
>  	int err;
>  
> -	down_read(&priv->sem);
> +	read_lock(&priv->rwlock);
>  	list_for_each_entry(map, &priv->maps, next) {
>  		if (!map->vma)
>  			continue;
> @@ -299,7 +309,7 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
>  					(mend - mstart) >> PAGE_SHIFT);
>  		WARN_ON(err);
>  	}
> -	up_read(&priv->sem);
> +	read_unlock(&priv->rwlock);
>  }
>  
>  static void mn_invl_page(struct mmu_notifier *mn,
> @@ -316,7 +326,7 @@ static void mn_release(struct mmu_notifier *mn,
>  	struct grant_map *map;
>  	int err;
>  
> -	down_read(&priv->sem);
> +	read_lock(&priv->rwlock);
>  	list_for_each_entry(map, &priv->maps, next) {
>  		if (!map->vma)
>  			continue;
> @@ -327,7 +337,7 @@ static void mn_release(struct mmu_notifier *mn,
>  		err = unmap_grant_pages(map, 0, map->count);
>  		WARN_ON(err);
>  	}
> -	up_read(&priv->sem);
> +	read_unlock(&priv->rwlock);
>  }
>  
>  struct mmu_notifier_ops gntdev_mmu_ops = {
> @@ -347,7 +357,7 @@ static int gntdev_open(struct inode *inode, struct file *flip)
>  		return -ENOMEM;
>  
>  	INIT_LIST_HEAD(&priv->maps);
> -	init_rwsem(&priv->sem);
> +	priv->rwlock = RW_LOCK_UNLOCKED;
>  	priv->limit = limit;
>  
>  	priv->mm = get_task_mm(current);
> @@ -375,13 +385,21 @@ static int gntdev_release(struct inode *inode, struct file *flip)
>  	if (debug)
>  		printk("%s: priv %p\n", __FUNCTION__, priv);
>  
> -	down_write(&priv->sem);
> -	while (!list_empty(&priv->maps)) {
> -		map = list_entry(priv->maps.next, struct grant_map, next);
> -		err = gntdev_del_map(map);
> -		WARN_ON(err);
> +	while (1) {
> +		write_lock(&priv->rwlock);
> +		if (!list_empty(&priv->maps)) {
> +			map = list_entry(priv->maps.next, struct grant_map, next);
> +			err = gntdev_del_map(map);
> +			write_unlock(&priv->rwlock);
> +			if (err)
> +				WARN_ON(err);
> +			else
> +				gntdev_free_map(map);
> +		} else {
> +			write_unlock(&priv->rwlock);
> +			break;
> +		}
>  	}
> -	up_write(&priv->sem);
>  	mmu_notifier_unregister(&priv->mn, priv->mm);
>  	kfree(priv);
>  	return 0;
> @@ -404,27 +422,29 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
>  	if (unlikely(op.count > priv->limit))
>  		return -EINVAL;
>  
> -	down_write(&priv->sem);
>  	err = -ENOMEM;
> -	map = gntdev_add_map(priv, op.count);
> +	map = gntdev_alloc_map(priv, op.count);
>  	if (!map)
> -		goto err_unlock;
> -
> -	err = -ENOMEM;
> +		return err;
>  	if (copy_from_user(map->grants, &u->refs,
> -			   sizeof(map->grants[0]) * op.count) != 0)
> -		goto err_free;
> +			   sizeof(map->grants[0]) * op.count) != 0) {
> +		gntdev_free_map(map);
> +		return err;
> +	}
> +
> +	write_lock(&priv->rwlock);
> +	gntdev_add_map(priv, map);
> +
>  	op.index = map->index << PAGE_SHIFT;
> -	if (copy_to_user(u, &op, sizeof(op)) != 0)
> -		goto err_free;
> -	up_write(&priv->sem);
> +	write_unlock(&priv->rwlock);
> +	if (copy_to_user(u, &op, sizeof(op)) != 0) {
> +		write_lock(&priv->rwlock);
> +		gntdev_del_map(map);
> +		write_unlock(&priv->rwlock);
> +		gntdev_free_map(map);
> +		return err;
> +	}
>  	return 0;
> -
> -err_free:
> -	gntdev_del_map(map);
> -err_unlock:
> -	up_write(&priv->sem);
> -	return err;
>  }
>  
>  static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
> @@ -440,11 +460,13 @@ static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
>  		printk("%s: priv %p, del %d+%d\n", __FUNCTION__, priv,
>  		       (int)op.index, (int)op.count);
>  
> -	down_write(&priv->sem);
> +	write_lock(&priv->rwlock);
>  	map = gntdev_find_map_index(priv, op.index >> PAGE_SHIFT, op.count);
>  	if (map)
>  		err = gntdev_del_map(map);
> -	up_write(&priv->sem);
> +	write_unlock(&priv->rwlock);
> +	if (!err)
> +		gntdev_free_map(map);
>  	return err;
>  }
>  
> @@ -460,16 +482,16 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
>  		printk("%s: priv %p, offset for vaddr %lx\n", __FUNCTION__, priv,
>  		       (unsigned long)op.vaddr);
>  
> -	down_read(&priv->sem);
> +	read_lock(&priv->rwlock);
>  	map = gntdev_find_map_vaddr(priv, op.vaddr);
>  	if (map == NULL ||
>  	    map->vma->vm_start != op.vaddr) {
> -		up_read(&priv->sem);
> +		read_unlock(&priv->rwlock);
>  		return -EINVAL;
>  	}
>  	op.offset = map->index << PAGE_SHIFT;
>  	op.count = map->count;
> -	up_read(&priv->sem);
> +	read_unlock(&priv->rwlock);
>  
>  	if (copy_to_user(u, &op, sizeof(op)) != 0)
>  		return -EFAULT;
> @@ -488,9 +510,9 @@ static long gntdev_ioctl_set_max_grants(struct gntdev_priv *priv,
>  	if (op.count > limit)
>  		return -EINVAL;
>  
> -	down_write(&priv->sem);
> +	write_lock(&priv->rwlock);
>  	priv->limit = op.count;
> -	up_write(&priv->sem);
> +	write_unlock(&priv->rwlock);
>  	return 0;
>  }
>  
> @@ -538,7 +560,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>  		printk("%s: map %d+%d at %lx (pgoff %lx)\n", __FUNCTION__,
>  		       index, count, vma->vm_start, vma->vm_pgoff);
>  
> -	down_read(&priv->sem);
> +	read_lock(&priv->rwlock);
>  	map = gntdev_find_map_index(priv, index, count);
>  	if (!map)
>  		goto unlock_out;
> @@ -580,7 +602,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>  	map->is_mapped = 1;
>  
>  unlock_out:
> -	up_read(&priv->sem);
> +	read_unlock(&priv->rwlock);
>  	return err;
>  }
>  
>
>   

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PVOPS] fix gntdev on PAE
  2010-06-02 17:11                   ` Jeremy Fitzhardinge
@ 2010-06-03  9:32                     ` Stefano Stabellini
  2010-06-09 19:38                       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 15+ messages in thread
From: Stefano Stabellini @ 2010-06-03  9:32 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: xen-devel@lists.xensource.com, Hoffmann, Gerd, Stefano Stabellini

On Wed, 2 Jun 2010, Jeremy Fitzhardinge wrote:
> I hit this often, mostly when mucking around with xenstored (maybe when
> it exits?).  Other people have reported it too.
> 
> Does it really need to be a rwlock?
> 

I see you really don't like rwlocks :)

All right, this version uses plain spinlocks instead.

---


diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index ddc59cc..6a3e207 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -28,7 +28,7 @@
 #include <linux/types.h>
 #include <linux/uaccess.h>
 #include <linux/sched.h>
-#include <linux/rwsem.h>
+#include <linux/spinlock.h>
 
 #include <xen/xen.h>
 #include <xen/grant_table.h>
@@ -51,7 +51,7 @@ struct gntdev_priv {
 	struct list_head maps;
 	uint32_t used;
 	uint32_t limit;
-	struct rw_semaphore sem;
+	spinlock_t lock;
 	struct mm_struct *mm;
 	struct mmu_notifier mn;
 };
@@ -84,9 +84,9 @@ static void gntdev_print_maps(struct gntdev_priv *priv,
 		       map->index == text_index && text ? text : "");
 }
 
-static struct grant_map *gntdev_add_map(struct gntdev_priv *priv, int count)
+static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
 {
-	struct grant_map *map, *add;
+	struct grant_map *add;
 
 	add = kzalloc(sizeof(struct grant_map), GFP_KERNEL);
 	if (NULL == add)
@@ -104,8 +104,20 @@ static struct grant_map *gntdev_add_map(struct gntdev_priv *priv, int count)
 	add->count = count;
 	add->priv  = priv;
 
-	if (add->count + priv->used > priv->limit)
-		goto err;
+	if (add->count + priv->used <= priv->limit)
+		return add;
+
+err:
+	kfree(add->grants);
+	kfree(add->map_ops);
+	kfree(add->unmap_ops);
+	kfree(add);
+	return NULL;
+}
+
+static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add)
+{
+	struct grant_map *map;
 
 	list_for_each_entry(map, &priv->maps, next) {
 		if (add->index + add->count < map->index) {
@@ -120,14 +132,6 @@ done:
 	priv->used += add->count;
 	if (debug)
 		gntdev_print_maps(priv, "[new]", add->index);
-	return add;
-
-err:
-	kfree(add->grants);
-	kfree(add->map_ops);
-	kfree(add->unmap_ops);
-	kfree(add);
-	return NULL;
 }
 
 static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int index,
@@ -174,11 +178,17 @@ static int gntdev_del_map(struct grant_map *map)
 
 	map->priv->used -= map->count;
 	list_del(&map->next);
+	return 0;
+}
+
+static void gntdev_free_map(struct grant_map *map)
+{
+	if (!map)
+		return;
 	kfree(map->grants);
 	kfree(map->map_ops);
 	kfree(map->unmap_ops);
 	kfree(map);
-	return 0;
 }
 
 /* ------------------------------------------------------------------ */
@@ -277,7 +287,7 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
 	unsigned long mstart, mend;
 	int err;
 
-	down_read(&priv->sem);
+	spin_lock(&priv->lock);
 	list_for_each_entry(map, &priv->maps, next) {
 		if (!map->vma)
 			continue;
@@ -299,7 +309,7 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
 					(mend - mstart) >> PAGE_SHIFT);
 		WARN_ON(err);
 	}
-	up_read(&priv->sem);
+	spin_unlock(&priv->lock);
 }
 
 static void mn_invl_page(struct mmu_notifier *mn,
@@ -316,7 +326,7 @@ static void mn_release(struct mmu_notifier *mn,
 	struct grant_map *map;
 	int err;
 
-	down_read(&priv->sem);
+	spin_lock(&priv->lock);
 	list_for_each_entry(map, &priv->maps, next) {
 		if (!map->vma)
 			continue;
@@ -327,7 +337,7 @@ static void mn_release(struct mmu_notifier *mn,
 		err = unmap_grant_pages(map, 0, map->count);
 		WARN_ON(err);
 	}
-	up_read(&priv->sem);
+	spin_unlock(&priv->lock);
 }
 
 struct mmu_notifier_ops gntdev_mmu_ops = {
@@ -347,7 +357,7 @@ static int gntdev_open(struct inode *inode, struct file *flip)
 		return -ENOMEM;
 
 	INIT_LIST_HEAD(&priv->maps);
-	init_rwsem(&priv->sem);
+	spin_lock_init(&priv->lock);
 	priv->limit = limit;
 
 	priv->mm = get_task_mm(current);
@@ -375,13 +385,21 @@ static int gntdev_release(struct inode *inode, struct file *flip)
 	if (debug)
 		printk("%s: priv %p\n", __FUNCTION__, priv);
 
-	down_write(&priv->sem);
-	while (!list_empty(&priv->maps)) {
-		map = list_entry(priv->maps.next, struct grant_map, next);
-		err = gntdev_del_map(map);
-		WARN_ON(err);
+	while (1) {
+		spin_lock(&priv->lock);
+		if (!list_empty(&priv->maps)) {
+			map = list_entry(priv->maps.next, struct grant_map, next);
+			err = gntdev_del_map(map);
+			spin_unlock(&priv->lock);
+			if (err)
+				WARN_ON(err);
+			else
+				gntdev_free_map(map);
+		} else {
+			spin_unlock(&priv->lock);
+			break;
+		}
 	}
-	up_write(&priv->sem);
 	mmu_notifier_unregister(&priv->mn, priv->mm);
 	kfree(priv);
 	return 0;
@@ -404,27 +422,29 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
 	if (unlikely(op.count > priv->limit))
 		return -EINVAL;
 
-	down_write(&priv->sem);
 	err = -ENOMEM;
-	map = gntdev_add_map(priv, op.count);
+	map = gntdev_alloc_map(priv, op.count);
 	if (!map)
-		goto err_unlock;
-
-	err = -ENOMEM;
+		return err;
 	if (copy_from_user(map->grants, &u->refs,
-			   sizeof(map->grants[0]) * op.count) != 0)
-		goto err_free;
+			   sizeof(map->grants[0]) * op.count) != 0) {
+		gntdev_free_map(map);
+		return err;
+	}
+
+	spin_lock(&priv->lock);
+	gntdev_add_map(priv, map);
+
 	op.index = map->index << PAGE_SHIFT;
-	if (copy_to_user(u, &op, sizeof(op)) != 0)
-		goto err_free;
-	up_write(&priv->sem);
+	spin_unlock(&priv->lock);
+	if (copy_to_user(u, &op, sizeof(op)) != 0) {
+		spin_lock(&priv->lock);
+		gntdev_del_map(map);
+		spin_unlock(&priv->lock);
+		gntdev_free_map(map);
+		return err;
+	}
 	return 0;
-
-err_free:
-	gntdev_del_map(map);
-err_unlock:
-	up_write(&priv->sem);
-	return err;
 }
 
 static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
@@ -440,11 +460,13 @@ static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
 		printk("%s: priv %p, del %d+%d\n", __FUNCTION__, priv,
 		       (int)op.index, (int)op.count);
 
-	down_write(&priv->sem);
+	spin_lock(&priv->lock);
 	map = gntdev_find_map_index(priv, op.index >> PAGE_SHIFT, op.count);
 	if (map)
 		err = gntdev_del_map(map);
-	up_write(&priv->sem);
+	spin_unlock(&priv->lock);
+	if (!err)
+		gntdev_free_map(map);
 	return err;
 }
 
@@ -460,16 +482,16 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
 		printk("%s: priv %p, offset for vaddr %lx\n", __FUNCTION__, priv,
 		       (unsigned long)op.vaddr);
 
-	down_read(&priv->sem);
+	spin_lock(&priv->lock);
 	map = gntdev_find_map_vaddr(priv, op.vaddr);
 	if (map == NULL ||
 	    map->vma->vm_start != op.vaddr) {
-		up_read(&priv->sem);
+		spin_unlock(&priv->lock);
 		return -EINVAL;
 	}
 	op.offset = map->index << PAGE_SHIFT;
 	op.count = map->count;
-	up_read(&priv->sem);
+	spin_unlock(&priv->lock);
 
 	if (copy_to_user(u, &op, sizeof(op)) != 0)
 		return -EFAULT;
@@ -488,9 +510,9 @@ static long gntdev_ioctl_set_max_grants(struct gntdev_priv *priv,
 	if (op.count > limit)
 		return -EINVAL;
 
-	down_write(&priv->sem);
+	spin_lock(&priv->lock);
 	priv->limit = op.count;
-	up_write(&priv->sem);
+	spin_unlock(&priv->lock);
 	return 0;
 }
 
@@ -538,7 +560,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 		printk("%s: map %d+%d at %lx (pgoff %lx)\n", __FUNCTION__,
 		       index, count, vma->vm_start, vma->vm_pgoff);
 
-	down_read(&priv->sem);
+	spin_lock(&priv->lock);
 	map = gntdev_find_map_index(priv, index, count);
 	if (!map)
 		goto unlock_out;
@@ -580,7 +602,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 	map->is_mapped = 1;
 
 unlock_out:
-	up_read(&priv->sem);
+	spin_unlock(&priv->lock);
 	return err;
 }

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PVOPS] fix gntdev on PAE
  2010-06-03  9:32                     ` Stefano Stabellini
@ 2010-06-09 19:38                       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 15+ messages in thread
From: Jeremy Fitzhardinge @ 2010-06-09 19:38 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel@lists.xensource.com, Gerd Hoffmann

On 06/03/2010 02:32 AM, Stefano Stabellini wrote:
> On Wed, 2 Jun 2010, Jeremy Fitzhardinge wrote:
>   
>> I hit this often, mostly when mucking around with xenstored (maybe when
>> it exits?).  Other people have reported it too.
>>
>> Does it really need to be a rwlock?
>>
>>     
> I see you really don't like rwlocks :)
>
> All right, this version uses plain spinlocks instead.
>   

Thanks, that looks good.  I tweaked it a little bit (kfree doesn't
block, so it can be done under spinlock which simplifies the release
loop), and I'm about to give it a test spin.

    J

> ---
>
>
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index ddc59cc..6a3e207 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -28,7 +28,7 @@
>  #include <linux/types.h>
>  #include <linux/uaccess.h>
>  #include <linux/sched.h>
> -#include <linux/rwsem.h>
> +#include <linux/spinlock.h>
>  
>  #include <xen/xen.h>
>  #include <xen/grant_table.h>
> @@ -51,7 +51,7 @@ struct gntdev_priv {
>  	struct list_head maps;
>  	uint32_t used;
>  	uint32_t limit;
> -	struct rw_semaphore sem;
> +	spinlock_t lock;
>  	struct mm_struct *mm;
>  	struct mmu_notifier mn;
>  };
> @@ -84,9 +84,9 @@ static void gntdev_print_maps(struct gntdev_priv *priv,
>  		       map->index == text_index && text ? text : "");
>  }
>  
> -static struct grant_map *gntdev_add_map(struct gntdev_priv *priv, int count)
> +static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
>  {
> -	struct grant_map *map, *add;
> +	struct grant_map *add;
>  
>  	add = kzalloc(sizeof(struct grant_map), GFP_KERNEL);
>  	if (NULL == add)
> @@ -104,8 +104,20 @@ static struct grant_map *gntdev_add_map(struct gntdev_priv *priv, int count)
>  	add->count = count;
>  	add->priv  = priv;
>  
> -	if (add->count + priv->used > priv->limit)
> -		goto err;
> +	if (add->count + priv->used <= priv->limit)
> +		return add;
> +
> +err:
> +	kfree(add->grants);
> +	kfree(add->map_ops);
> +	kfree(add->unmap_ops);
> +	kfree(add);
> +	return NULL;
> +}
> +
> +static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add)
> +{
> +	struct grant_map *map;
>  
>  	list_for_each_entry(map, &priv->maps, next) {
>  		if (add->index + add->count < map->index) {
> @@ -120,14 +132,6 @@ done:
>  	priv->used += add->count;
>  	if (debug)
>  		gntdev_print_maps(priv, "[new]", add->index);
> -	return add;
> -
> -err:
> -	kfree(add->grants);
> -	kfree(add->map_ops);
> -	kfree(add->unmap_ops);
> -	kfree(add);
> -	return NULL;
>  }
>  
>  static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv, int index,
> @@ -174,11 +178,17 @@ static int gntdev_del_map(struct grant_map *map)
>  
>  	map->priv->used -= map->count;
>  	list_del(&map->next);
> +	return 0;
> +}
> +
> +static void gntdev_free_map(struct grant_map *map)
> +{
> +	if (!map)
> +		return;
>  	kfree(map->grants);
>  	kfree(map->map_ops);
>  	kfree(map->unmap_ops);
>  	kfree(map);
> -	return 0;
>  }
>  
>  /* ------------------------------------------------------------------ */
> @@ -277,7 +287,7 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
>  	unsigned long mstart, mend;
>  	int err;
>  
> -	down_read(&priv->sem);
> +	spin_lock(&priv->lock);
>  	list_for_each_entry(map, &priv->maps, next) {
>  		if (!map->vma)
>  			continue;
> @@ -299,7 +309,7 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
>  					(mend - mstart) >> PAGE_SHIFT);
>  		WARN_ON(err);
>  	}
> -	up_read(&priv->sem);
> +	spin_unlock(&priv->lock);
>  }
>  
>  static void mn_invl_page(struct mmu_notifier *mn,
> @@ -316,7 +326,7 @@ static void mn_release(struct mmu_notifier *mn,
>  	struct grant_map *map;
>  	int err;
>  
> -	down_read(&priv->sem);
> +	spin_lock(&priv->lock);
>  	list_for_each_entry(map, &priv->maps, next) {
>  		if (!map->vma)
>  			continue;
> @@ -327,7 +337,7 @@ static void mn_release(struct mmu_notifier *mn,
>  		err = unmap_grant_pages(map, 0, map->count);
>  		WARN_ON(err);
>  	}
> -	up_read(&priv->sem);
> +	spin_unlock(&priv->lock);
>  }
>  
>  struct mmu_notifier_ops gntdev_mmu_ops = {
> @@ -347,7 +357,7 @@ static int gntdev_open(struct inode *inode, struct file *flip)
>  		return -ENOMEM;
>  
>  	INIT_LIST_HEAD(&priv->maps);
> -	init_rwsem(&priv->sem);
> +	spin_lock_init(&priv->lock);
>  	priv->limit = limit;
>  
>  	priv->mm = get_task_mm(current);
> @@ -375,13 +385,21 @@ static int gntdev_release(struct inode *inode, struct file *flip)
>  	if (debug)
>  		printk("%s: priv %p\n", __FUNCTION__, priv);
>  
> -	down_write(&priv->sem);
> -	while (!list_empty(&priv->maps)) {
> -		map = list_entry(priv->maps.next, struct grant_map, next);
> -		err = gntdev_del_map(map);
> -		WARN_ON(err);
> +	while (1) {
> +		spin_lock(&priv->lock);
> +		if (!list_empty(&priv->maps)) {
> +			map = list_entry(priv->maps.next, struct grant_map, next);
> +			err = gntdev_del_map(map);
> +			spin_unlock(&priv->lock);
> +			if (err)
> +				WARN_ON(err);
> +			else
> +				gntdev_free_map(map);
> +		} else {
> +			spin_unlock(&priv->lock);
> +			break;
> +		}
>  	}
> -	up_write(&priv->sem);
>  	mmu_notifier_unregister(&priv->mn, priv->mm);
>  	kfree(priv);
>  	return 0;
> @@ -404,27 +422,29 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
>  	if (unlikely(op.count > priv->limit))
>  		return -EINVAL;
>  
> -	down_write(&priv->sem);
>  	err = -ENOMEM;
> -	map = gntdev_add_map(priv, op.count);
> +	map = gntdev_alloc_map(priv, op.count);
>  	if (!map)
> -		goto err_unlock;
> -
> -	err = -ENOMEM;
> +		return err;
>  	if (copy_from_user(map->grants, &u->refs,
> -			   sizeof(map->grants[0]) * op.count) != 0)
> -		goto err_free;
> +			   sizeof(map->grants[0]) * op.count) != 0) {
> +		gntdev_free_map(map);
> +		return err;
> +	}
> +
> +	spin_lock(&priv->lock);
> +	gntdev_add_map(priv, map);
> +
>  	op.index = map->index << PAGE_SHIFT;
> -	if (copy_to_user(u, &op, sizeof(op)) != 0)
> -		goto err_free;
> -	up_write(&priv->sem);
> +	spin_unlock(&priv->lock);
> +	if (copy_to_user(u, &op, sizeof(op)) != 0) {
> +		spin_lock(&priv->lock);
> +		gntdev_del_map(map);
> +		spin_unlock(&priv->lock);
> +		gntdev_free_map(map);
> +		return err;
> +	}
>  	return 0;
> -
> -err_free:
> -	gntdev_del_map(map);
> -err_unlock:
> -	up_write(&priv->sem);
> -	return err;
>  }
>  
>  static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
> @@ -440,11 +460,13 @@ static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
>  		printk("%s: priv %p, del %d+%d\n", __FUNCTION__, priv,
>  		       (int)op.index, (int)op.count);
>  
> -	down_write(&priv->sem);
> +	spin_lock(&priv->lock);
>  	map = gntdev_find_map_index(priv, op.index >> PAGE_SHIFT, op.count);
>  	if (map)
>  		err = gntdev_del_map(map);
> -	up_write(&priv->sem);
> +	spin_unlock(&priv->lock);
> +	if (!err)
> +		gntdev_free_map(map);
>  	return err;
>  }
>  
> @@ -460,16 +482,16 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
>  		printk("%s: priv %p, offset for vaddr %lx\n", __FUNCTION__, priv,
>  		       (unsigned long)op.vaddr);
>  
> -	down_read(&priv->sem);
> +	spin_lock(&priv->lock);
>  	map = gntdev_find_map_vaddr(priv, op.vaddr);
>  	if (map == NULL ||
>  	    map->vma->vm_start != op.vaddr) {
> -		up_read(&priv->sem);
> +		spin_unlock(&priv->lock);
>  		return -EINVAL;
>  	}
>  	op.offset = map->index << PAGE_SHIFT;
>  	op.count = map->count;
> -	up_read(&priv->sem);
> +	spin_unlock(&priv->lock);
>  
>  	if (copy_to_user(u, &op, sizeof(op)) != 0)
>  		return -EFAULT;
> @@ -488,9 +510,9 @@ static long gntdev_ioctl_set_max_grants(struct gntdev_priv *priv,
>  	if (op.count > limit)
>  		return -EINVAL;
>  
> -	down_write(&priv->sem);
> +	spin_lock(&priv->lock);
>  	priv->limit = op.count;
> -	up_write(&priv->sem);
> +	spin_unlock(&priv->lock);
>  	return 0;
>  }
>  
> @@ -538,7 +560,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>  		printk("%s: map %d+%d at %lx (pgoff %lx)\n", __FUNCTION__,
>  		       index, count, vma->vm_start, vma->vm_pgoff);
>  
> -	down_read(&priv->sem);
> +	spin_lock(&priv->lock);
>  	map = gntdev_find_map_index(priv, index, count);
>  	if (!map)
>  		goto unlock_out;
> @@ -580,7 +602,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>  	map->is_mapped = 1;
>  
>  unlock_out:
> -	up_read(&priv->sem);
> +	spin_unlock(&priv->lock);
>  	return err;
>  }
>  
>
>   

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2010-06-09 19:38 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-02-01 15:11 [PATCH] [PVOPS] fix gntdev on PAE Stefano Stabellini
2010-02-01 15:46 ` Stefano Stabellini
2010-02-09 21:57   ` Jeremy Fitzhardinge
2010-02-10 12:19     ` Stefano Stabellini
2010-02-10 23:12       ` Jeremy Fitzhardinge
2010-05-28 17:29       ` Jeremy Fitzhardinge
2010-06-01  9:38         ` Stefano Stabellini
2010-06-01 16:31           ` Jeremy Fitzhardinge
2010-06-01 16:36             ` Stefano Stabellini
2010-06-01 16:46               ` Jeremy Fitzhardinge
2010-06-02 14:11                 ` Stefano Stabellini
2010-06-02 17:11                   ` Jeremy Fitzhardinge
2010-06-03  9:32                     ` Stefano Stabellini
2010-06-09 19:38                       ` Jeremy Fitzhardinge
2010-02-09 22:24   ` Jeremy Fitzhardinge

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.