* possible recursive locking issue
@ 2017-07-06 9:39 Dong, Chuanxiao
2017-07-06 21:10 ` Alex Williamson
0 siblings, 1 reply; 5+ messages in thread
From: Dong, Chuanxiao @ 2017-07-06 9:39 UTC (permalink / raw)
To: kwankhede@nvidia.com, Alex Williamson, kvm@vger.kernel.org
Cc: 'Zhenyu Wang'
Hello,
We met a possible recursive locking issue and are seeking a solution. The log looks like below:
[ 5102.127454] ============================================
[ 5102.133379] WARNING: possible recursive locking detected
[ 5102.139304] 4.12.0-rc4+ #3 Not tainted
[ 5102.143483] --------------------------------------------
[ 5102.149407] qemu-system-x86/1620 is trying to acquire lock:
[ 5102.155624] (&container->group_lock){++++++}, at: [<ffffffff817768c6>] vfio_unpin_pages+0x96/0xf0
[ 5102.165626]
but task is already holding lock:
[ 5102.172134] (&container->group_lock){++++++}, at: [<ffffffff8177728f>] vfio_fops_unl_ioctl+0x5f/0x280
[ 5102.182522]
other info that might help us debug this:
[ 5102.189806] Possible unsafe locking scenario:
[ 5102.196411] CPU0
[ 5102.199136] ----
[ 5102.201861] lock(&container->group_lock);
[ 5102.206527] lock(&container->group_lock);
[ 5102.211191]
*** DEADLOCK ***
[ 5102.217796] May be due to missing lock nesting notation
[ 5102.225370] 3 locks held by qemu-system-x86/1620:
[ 5102.230618] #0: (&container->group_lock){++++++}, at: [<ffffffff8177728f>] vfio_fops_unl_ioctl+0x5f/0x280
[ 5102.241482] #1: (&(&iommu->notifier)->rwsem){++++..}, at: [<ffffffff810de775>] __blocking_notifier_call_chain+0x35/0x70
[ 5102.253713] #2: (&vgpu->vdev.cache_lock){+.+...}, at: [<ffffffff8157b007>] intel_vgpu_iommu_notifier+0x77/0x120
[ 5102.265163]
stack backtrace:
[ 5102.270022] CPU: 5 PID: 1620 Comm: qemu-system-x86 Not tainted 4.12.0-rc4+ #3
[ 5102.277991] Hardware name: Intel Corporation S1200RP/S1200RP, BIOS S1200RP.86B.03.01.APER.061220151418 06/12/2015
[ 5102.289445] Call Trace:
[ 5102.292175] dump_stack+0x85/0xc7
[ 5102.295871] validate_chain.isra.21+0x9da/0xaf0
[ 5102.300925] __lock_acquire+0x405/0x820
[ 5102.305202] lock_acquire+0xc7/0x220
[ 5102.309191] ? vfio_unpin_pages+0x96/0xf0
[ 5102.313666] down_read+0x2b/0x50
[ 5102.317259] ? vfio_unpin_pages+0x96/0xf0
[ 5102.321732] vfio_unpin_pages+0x96/0xf0
[ 5102.326024] intel_vgpu_iommu_notifier+0xe5/0x120
[ 5102.331283] notifier_call_chain+0x4a/0x70
[ 5102.335851] __blocking_notifier_call_chain+0x4d/0x70
[ 5102.341490] blocking_notifier_call_chain+0x16/0x20
[ 5102.346935] vfio_iommu_type1_ioctl+0x87b/0x920
[ 5102.351994] vfio_fops_unl_ioctl+0x81/0x280
[ 5102.356660] ? __fget+0xf0/0x210
[ 5102.360261] do_vfs_ioctl+0x93/0x6a0
[ 5102.364247] ? __fget+0x111/0x210
[ 5102.367942] SyS_ioctl+0x41/0x70
[ 5102.371542] entry_SYSCALL_64_fastpath+0x1f/0xbe
The call stack is:
vfio_fops_unl_ioctl -> vfio_iommu_type1_ioctl -> vfio_dma_do_unmap -> blocking_notifier_call_chain -> intel_vgpu_iommu_notifier -> vfio_unpin_pages.
The container->group_lock is held in vfio_fops_unl_ioctl first, but then it is taken again in vfio_unpin_pages. Regarding this, putting vfio_unpin_pages in another thread can resolve the recursive locking. In this way, vfio_unpin_pages will be asynchronous with vfio_dma_do_unmap. Then it is possible to trigger the kernel panic below due to this asynchrony:
[ 4468.975091] ------------[ cut here ]------------
[ 4468.976145] kernel BUG at drivers/vfio/vfio_iommu_type1.c:833!
[ 4468.977193] invalid opcode: 0000 [#1] SMP
[ 4468.978232] Modules linked in: bridge stp llc nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel joydev input_leds crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd hci_uart glue_helper shpchp btbcm cryptd btqca winbond_cir rc_core btintel ipmi_ssif mei_me mei intel_pch_thermal bluetooth acpi_als kfifo_buf industrialio soc_button_array intel_vbtn ipmi_devintf ipmi_msghandler intel_hid ecdh_generic intel_lpss_acpi intel_lpss spidev sparse_keymap acpi_power_meter mac_hid sunrpc parport_pc ppdev lp parport autofs4 kvmgt vfio_mdev mdev vfio_iommu_type1 vfio kvm irqbypass hid_generic usbhid i915 drm_kms_helper igb e1000e syscopyarea sysfillrect sysimgblt dca fb_sys_fops ptp pps_core drm i2c_algo_bit ahci libahci wmi video pinctrl_sunrisepoint
[ 4468.982783] pinctrl_intel i2c_hid hid
[ 4468.983995] CPU: 3 PID: 1549 Comm: qemu-system-x86 Not tainted 4.12.0-rc7+ #1
[ 4468.985132] Hardware name: Intel Corporation Kabylake Greenlow Refresh UP Server Platform/Zumba Beach Server EV, BIOS KBLSE2R1.R00.0006.B08.1702011304 02/
[ 4468.986350] task: ffff8cd9afa28000 task.stack: ffffb136c2f68000
[ 4468.987545] RIP: 0010:vfio_iommu_type1_ioctl+0x894/0x910 [vfio_iommu_type1]
[ 4468.988864] RSP: 0018:ffffb136c2f6bd58 EFLAGS: 00010202
[ 4468.990106] RAX: 0000000000100000 RBX: 00007f80f55b1410 RCX: 000000007ff00000
[ 4468.991290] RDX: ffff8cd9af105d00 RSI: ffff8cd9b4835e40 RDI: 000000000000000b
[ 4468.992536] RBP: ffffb136c2f6be30 R08: 0000000000100000 R09: 0000000080000000
[ 4468.993749] R10: ffffb136c2f6bd30 R11: 000000000000013b R12: 0000000000000000
[ 4468.994991] R13: ffff8cd9ad813b80 R14: ffff8cd9afa28000 R15: ffffb136c2f6bdc8
[ 4468.996131] FS: 00007f80f55b2700(0000) GS:ffff8cd9c7d80000(0000) knlGS:0000000000000000
[ 4468.997305] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4468.998525] CR2: 000001f016dd2218 CR3: 0000000472bfc000 CR4: 00000000003426e0
[ 4468.999668] Call Trace:
[ 4469.000916] ? kvm_set_memory_region+0x38/0x60 [kvm]
[ 4469.002072] vfio_fops_unl_ioctl+0x7b/0x260 [vfio]
[ 4469.003220] do_vfs_ioctl+0xa1/0x5d0
[ 4469.004443] ? SyS_futex+0x7f/0x180
[ 4469.005567] SyS_ioctl+0x79/0x90
[ 4469.006661] entry_SYSCALL_64_fastpath+0x1e/0xa9
Can you help check this recursive locking issue?
Thanks
Chuanxiao
* Re: possible recursive locking issue
  2017-07-06 21:10 ` Alex Williamson

From: Alex Williamson @ 2017-07-06 21:10 UTC (permalink / raw)
To: Dong, Chuanxiao
Cc: kwankhede@nvidia.com, kvm@vger.kernel.org, 'Zhenyu Wang'

On Thu, 6 Jul 2017 09:39:41 +0000
"Dong, Chuanxiao" <chuanxiao.dong@intel.com> wrote:

> Hello,
>
> We met a possible recursive locking issue and are seeking a solution.
> The log looks like below:
>
> [lockdep trace snipped; see the report above]
>
> The call stack is:
> vfio_fops_unl_ioctl -> vfio_iommu_type1_ioctl -> vfio_dma_do_unmap ->
> blocking_notifier_call_chain -> intel_vgpu_iommu_notifier -> vfio_unpin_pages
>
> The container->group_lock is held in vfio_fops_unl_ioctl first, but
> then it is taken again in vfio_unpin_pages.

This doesn't make sense to me, but then lockdep splats usually don't
to me at first.  If we're passing through vfio_fops_unl_ioctl() for a
VFIO_IOMMU_UNMAP_DMA, then we'll be holding a read-lock on
container->group_lock.  vfio_unpin_pages() also takes a read-lock on
the same.  Why is this a problem?  We should be able to nest
read-locks.

> Regarding this, putting vfio_unpin_pages in another thread can resolve
> this recursive locking. In this way, vfio_unpin_pages will be
> asynchronous with vfio_dma_do_unmap. Then it is possible to trigger
> the kernel panic below due to this asynchrony:

This is an invalid solution and the code is punishing you for it ;)
The user is requesting to unmap pages and we must release those pages
before the kernel ioctl returns.  As you can see near the BUG_ON hit
below, we'll retrigger the blocking notifier call chain 10 times to
try to get the page we need released.  If each one of those starts a
thread, there's no guarantee that any of them will run before we hit
our retry limit.  The below is completely expected in that case.
Thanks,

Alex

> [ 4468.976145] kernel BUG at drivers/vfio/vfio_iommu_type1.c:833!
> [panic trace snipped; see the report above]
* RE: possible recursive locking issue
  2017-07-07  2:42 ` Dong, Chuanxiao

From: Dong, Chuanxiao @ 2017-07-07 2:42 UTC (permalink / raw)
To: Alex Williamson
Cc: kwankhede@nvidia.com, kvm@vger.kernel.org, 'Zhenyu Wang'

> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Friday, July 7, 2017 5:10 AM
> Subject: Re: possible recursive locking issue
>
> [quoted report and lockdep trace snipped]
>
> This doesn't make sense to me, but then lockdep splats usually don't
> to me at first.  If we're passing through vfio_fops_unl_ioctl() for a
> VFIO_IOMMU_UNMAP_DMA, then we'll be holding a read-lock on
> container->group_lock.  vfio_unpin_pages() also takes a read-lock on
> the same.  Why is this a problem?  We should be able to nest
> read-locks.

Hi Alex,

This trace is captured with CONFIG_DEBUG_LOCK_ALLOC. From the comment
in include/linux/rwsem.h:

#ifdef CONFIG_DEBUG_LOCK_ALLOC
/*
 * nested locking. NOTE: rwsems are not allowed to recurse
 * (which occurs if the same task tries to acquire the same
 * lock instance multiple times), but multiple locks of the
 * same lock class might be taken, if the order of the locks
 * is always the same. This ordering rule can be expressed
 * to lockdep via the _nested() APIs, but enumerating the
 * subclasses that are used. (If the nesting relationship is
 * static then another method for expressing nested locking is
 * the explicit definition of lock class keys and the use of
 * lockdep_set_class() at lock initialization time.
 * See Documentation/locking/lockdep-design.txt for more details.)
 */
extern void down_read_nested(struct rw_semaphore *sem, int subclass);

So it looks like using down_read_nested in vfio_unpin_pages is a better
choice than down_read if we know the rwsems may be nested. When
CONFIG_DEBUG_LOCK_ALLOC is not set, down_read_nested is the same as
down_read. What do you think?

Thanks
Chuanxiao

> > Regarding this, putting vfio_unpin_pages in another thread can
> > resolve this recursive locking. [...]
>
> This is an invalid solution and the code is punishing you for it ;)
> [rest of quoted reply and panic trace snipped]
* Re: possible recursive locking issue
  2017-07-07  7:54 ` Paolo Bonzini

From: Paolo Bonzini @ 2017-07-07 7:54 UTC (permalink / raw)
To: Alex Williamson, Dong, Chuanxiao
Cc: kwankhede@nvidia.com, kvm@vger.kernel.org, 'Zhenyu Wang'

On 06/07/2017 23:10, Alex Williamson wrote:
> vfio_unpin_pages() also takes a read-lock on
> the same.  Why is this a problem?  We should be able to nest
> read-locks.

rwsem is fair in that it blocks out new readers if a writer is waiting.
In this case nesting causes a deadlock, because the outer read-lock will
never be released.

Paolo
* Re: possible recursive locking issue
  2017-07-07 15:50 ` Alex Williamson

From: Alex Williamson @ 2017-07-07 15:50 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Dong, Chuanxiao, kwankhede@nvidia.com, kvm@vger.kernel.org, 'Zhenyu Wang'

On Fri, 7 Jul 2017 09:54:13 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> rwsem is fair in that it blocks out new readers if a writer is waiting.
> In this case nesting causes a deadlock, because the outer read-lock
> will never be released.

Ok, that certainly explains the potential deadlock.  The _nested
variants Chuanxiao suggests seem only to silence lockdep; they don't
change the non-debug versions afaict.  An ideal solution would be to
determine that we don't really need that lock in the unpin path;
otherwise an ugly (but straightforward) solution might be to make
unlocked variants of those calls.  More investigation of exactly what
we're locking and why is required.  Thanks,

Alex