From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeremy Fitzhardinge
Subject: Re: [PATCH] [PVOPS] fix gntdev on PAE
Date: Fri, 28 May 2010 10:29:32 -0700
Message-ID: <4BFFFD7C.5080708@goop.org>
References: <4B71DA53.1080404@goop.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Stefano Stabellini
Cc: "xen-devel@lists.xensource.com" , Gerd Hoffmann
List-Id: xen-devel@lists.xenproject.org

On 02/10/2010 04:19 AM, Stefano Stabellini wrote:
> On Tue, 9 Feb 2010, Jeremy Fitzhardinge wrote:
>
>> On 02/01/2010 07:46 AM, Stefano Stabellini wrote:
>>
>>> On Mon, 1 Feb 2010, Stefano Stabellini wrote:
>>>
>>>
>>>> Hi all,
>>>> this small patch fixes gntdev on Linux pvops kernels:
>>>> gnttab_set_map_op and gnttab_set_unmap_op shouldn't take unsigned long
>>>> as parameters for machine addresses because they are not big enough on
>>>> PAE systems.
>>>> This patch fixes the issue using phys_addr_t instead and enables
>>>> XEN_GNTDEV compilation again.
>>>>
>>>>
>>>> Signed-off-by: Stefano Stabellini
>>>>
>>>>
>>>>
>>> BTW gntdev is used by qemu to provide the console backend to pv guests.
>>>
>>>
>> Is that recent? Console had been working before hadn't it?
>>
>> The gntdev problems I saw were more locking related than anything to do
>> with PAE. Did you try testing with lock debugging enabled?
>>
>>
> Yes, I don't have any problem with locking in gntdev on my testbox.
>

I managed to catch a lockdep problem in gntdev, which may be the same as
before:

BUG: sleeping function called from invalid context at kernel/rwsem.c:21
in_atomic(): 1, irqs_disabled(): 0, pid: 4091, name: qemu-dm
2 locks held by qemu-dm/4091:
 #0:  (&mm->mmap_sem){++++++}, at: [] sys_munmap+0x33/0x58
 #1:  (rcu_read_lock){.+.+..}, at: [] __mmu_notifier_invalidate_range_start+0x0/0xc7
Pid: 4091, comm: qemu-dm Not tainted 2.6.32.13 #23
Call Trace:
 [] ? __debug_show_held_locks+0x22/0x24
 [] __might_sleep+0x123/0x127
 [] ? release_pages+0xd2/0x1e7
 [] down_read+0x1f/0x57
 [] ? check_events+0x12/0x20
 [] ? release_pages+0xd2/0x1e7
 [] ? __mmu_notifier_invalidate_range_start+0x0/0xc7
 [] mn_invl_range_start+0x32/0x118
 [] __mmu_notifier_invalidate_range_start+0x62/0xc7
 [] ? __mmu_notifier_invalidate_range_start+0x0/0xc7
 [] unmap_vmas+0x8c/0x91a
 [] unmap_region+0xda/0x178
 [] do_munmap+0x2ae/0x318
 [] sys_munmap+0x41/0x58
 [] system_call_fastpath+0x16/0x1b

The problem is that mn_invl_range_start does a down_read(), but it is
called from __mmu_notifier_invalidate_range_start(), which does an
rcu_read_lock, which has the side-effect of disabling preemption.

The mmu notifier code seems to have always used rcu_read_lock this way,
so I guess this bug has always been there.

It's not immediately obvious how to fix it.

Thoughts?

    J
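
For illustration only, the call chain described above can be modelled in a small user-space C sketch. This is not kernel code: every sim_* function and both counters are hypothetical stand-ins that only mimic the behaviour the trace reports (rcu_read_lock raising the preempt count, down_read checking whether it may sleep). It assumes the non-preemptible RCU behaviour described in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model: all names below are hypothetical stand-ins, not kernel APIs. */
static int preempt_count;     /* > 0 models "atomic context" (in_atomic() == 1) */
static int sleep_violations;  /* times a sleeping function ran while atomic */

/* Models rcu_read_lock() disabling preemption as a side effect. */
static void sim_rcu_read_lock(void)
{
    preempt_count++;
}

static void sim_rcu_read_unlock(void)
{
    assert(preempt_count > 0);
    preempt_count--;
}

static bool sim_in_atomic(void)
{
    return preempt_count > 0;
}

/* Models __might_sleep(): complain if called in atomic context. */
static void sim_might_sleep(const char *who)
{
    if (sim_in_atomic()) {
        sleep_violations++;
        printf("BUG: sleeping function called from invalid context (%s)\n", who);
    }
}

/* Models down_read(): it may block, so it checks first. */
static void sim_down_read(void)
{
    sim_might_sleep("down_read");
}

/* Models gntdev's notifier callback, which takes a rwsem for reading. */
static void sim_mn_invl_range_start(void)
{
    sim_down_read();
}

/* Models the notifier dispatch: callbacks run inside the RCU read section. */
static void sim_mmu_notifier_invalidate_range_start(void)
{
    sim_rcu_read_lock();
    sim_mn_invl_range_start();
    sim_rcu_read_unlock();
}
```

Calling sim_mmu_notifier_invalidate_range_start() in this model records one violation, while calling sim_down_read() on its own records none, which matches the shape of the report: the callback is only a problem because of the context it is invoked from.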