public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: fengnan chang <fengnanchang@gmail.com>
To: linux-pci@vger.kernel.org, lukas@wunner.de, kvm@vger.kernel.org,
	alex.williamson@redhat.com, bhelgaas@google.com
Subject: Re: Deadlock during PCIe hot remove and SPDK exit
Date: Thu, 28 Nov 2024 11:50:09 +0800	[thread overview]
Message-ID: <A8CD6F73-CDBC-45D1-A8DF-CB583962DB8C@gmail.com> (raw)
In-Reply-To: <D0B37524-9444-423B-9E48-406CF9A29A6A@gmail.com>



> 2024年11月27日 14:56,fengnan chang <fengnanchang@gmail.com> 写道:
> 
> Dear PCI maintainers:
>   I'm having a deadlock issue, somewhat similar to a previous one https://lore.kernel.org/linux-pci/CS1PR8401MB0728FC6FDAB8A35C22BD90EC95F10@CS1PR8401MB0728.NAMPRD84.PROD.OUTLOOK.COM/#t, but my kernel (6.6.40) already included the fix f5eff55. 
>   Here is my test process, I’m running kernel with 6.6.40 and SPDK v22.05:
>   1. SPDK use vfio driver to takeover two nvme disks, running some io in nvme.
>   2. pull out two nvme disks
>   3. Try to kill -9 SPDK process.
>   Then deadlock issue happened. For now I can 100% reproduce this problem. I’m not an export in PCI, but I did a brief analysis:
>   irq 149 thread take pci_rescan_remove_lock mutex lock, and wait for SPDK to release vfio.
>   irq 148 thread take reset_lock of ctrl A, and wait for psi_rescan_remove_lock
>   SPDK process try to release vfio driver, but wait for reset_lock of ctrl A.
> 
> 
> irq/149-pciehp stack, cat /proc/514/stack, 
> [<0>] pciehp_unconfigure_device+0x48/0x160 // wait for pci_rescan_remove_lock
> [<0>] pciehp_disable_slot+0x6b/0x130       // hold reset_lock of ctrl A
> [<0>] pciehp_handle_presence_or_link_change+0x7d/0x4d0
> [<0>] pciehp_ist+0x236/0x260
> [<0>] irq_thread_fn+0x1b/0x60
> [<0>] irq_thread+0xed/0x190
> [<0>] kthread+0xe4/0x110
> [<0>] ret_from_fork+0x2d/0x50
> [<0>] ret_from_fork_asm+0x11/0x20
> 
> 
> irq/148-pciehp stack, cat /proc/513/stack
> [<0>] vfio_unregister_group_dev+0x97/0xe0 [vfio]     //wait for

My mistake, this is wait for SPDK to release vfio device. This problem can reproduce in 6.12.
Besides, My college give me an idea, we can make vfio_device_fops_release be async, so when we close fd, it
won’t block, and when we close another fd, it will release  vfio device, this stack will not block too, then the deadlock disappears.

Here is my test patch, cc some vfio guys:
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 50128da18bca..4ebe154a4ae5 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -19,6 +19,7 @@ struct vfio_container;
 struct vfio_device_file {
        struct vfio_device *device;
        struct vfio_group *group;
+       struct work_struct      release_work;

        u8 access_granted;
        u32 devid; /* only valid when iommufd is valid */
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index a5a62d9d963f..47e3e3f73d70 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -487,6 +487,22 @@ static bool vfio_assert_device_open(struct vfio_device *device)
        return !WARN_ON_ONCE(!READ_ONCE(device->open_count));
 }

+static void vfio_fops_release_work(struct work_struct *work)
+{
+       struct vfio_device_file *df =
+               container_of(work, struct vfio_device_file, release_work);
+       struct vfio_device *device = df->device;
+
+       if (df->group)
+               vfio_df_group_close(df);
+       else
+               vfio_df_unbind_iommufd(df);
+
+       vfio_device_put_registration(device);
+
+       kfree(df);
+}
+
 struct vfio_device_file *
 vfio_allocate_device_file(struct vfio_device *device)
 {
@@ -497,6 +513,7 @@ vfio_allocate_device_file(struct vfio_device *device)
                return ERR_PTR(-ENOMEM);

        df->device = device;
+       INIT_WORK(&df->release_work, vfio_fops_release_work);
        spin_lock_init(&df->kvm_ref_lock);

        return df;
@@ -628,16 +645,8 @@ static inline void vfio_device_pm_runtime_put(struct vfio_device *device)
 static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 {
        struct vfio_device_file *df = filep->private_data;
-       struct vfio_device *device = df->device;

-       if (df->group)
-               vfio_df_group_close(df);
-       else
-               vfio_df_unbind_iommufd(df);
-
-       vfio_device_put_registration(device);
-
-       kfree(df);
+       schedule_work(&df->release_work);

        return 0;
 }


> [<0>] vfio_pci_core_unregister_device+0x19/0x80 [vfio_pci_core]
> [<0>] vfio_pci_remove+0x15/0x20 [vfio_pci]
> [<0>] pci_device_remove+0x39/0xb0
> [<0>] device_release_driver_internal+0xad/0x120
> [<0>] pci_stop_bus_device+0x5d/0x80
> [<0>] pci_stop_and_remove_bus_device+0xe/0x20
> [<0>] pciehp_unconfigure_device+0x91/0x160   //hold pci_rescan_remove_lock, release reset_lock of ctrl B 
> [<0>] pciehp_disable_slot+0x6b/0x130
> [<0>] pciehp_handle_presence_or_link_change+0x7d/0x4d0
> [<0>] pciehp_ist+0x236/0x260             //hold reset_lock of ctrl B 
> [<0>] irq_thread_fn+0x1b/0x60
> [<0>] irq_thread+0xed/0x190
> [<0>] kthread+0xe4/0x110
> [<0>] ret_from_fork+0x2d/0x50
> [<0>] ret_from_fork_asm+0x11/0x20
> 
> 
> SPDK stack, cat /proc/166634/task/167181/stack
> [<0>] down_write_nested+0x1b7/0x1c0            //wait for reset_lock of ctrl A.
> [<0>] pciehp_reset_slot+0x58/0x160
> [<0>] pci_reset_hotplug_slot+0x3b/0x60
> [<0>] pci_reset_bus_function+0x3b/0xb0
> [<0>] __pci_reset_function_locked+0x3e/0x60
> [<0>] vfio_pci_core_disable+0x3ce/0x400 [vfio_pci_core]
> [<0>] vfio_pci_core_close_device+0x67/0xc0 [vfio_pci_core]
> [<0>] vfio_df_close+0x79/0xd0 [vfio]
> [<0>] vfio_df_group_close+0x36/0x70 [vfio]
> [<0>] vfio_device_fops_release+0x20/0x40 [vfio]
> [<0>] __fput+0xec/0x290
> [<0>] task_work_run+0x61/0x90
> [<0>] do_exit+0x39c/0xc40
> [<0>] do_group_exit+0x33/0xa0
> [<0>] get_signal+0xd84/0xd90
> [<0>] arch_do_signal_or_restart+0x2a/0x260
> [<0>] exit_to_user_mode_prepare+0x1c7/0x240
> [<0>] syscall_exit_to_user_mode+0x2a/0x60
> [<0>] do_syscall_64+0x3e/0x90
> 


       reply	other threads:[~2024-11-28  3:50 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <D0B37524-9444-423B-9E48-406CF9A29A6A@gmail.com>
2024-11-28  3:50 ` fengnan chang [this message]
2024-11-29  8:05   ` Deadlock during PCIe hot remove and SPDK exit Yi Liu
2024-12-02  3:25     ` fengnan chang
2024-12-03  5:36       ` Yi Liu
2024-12-03  9:31         ` fengnan chang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=A8CD6F73-CDBC-45D1-A8DF-CB583962DB8C@gmail.com \
    --to=fengnanchang@gmail.com \
    --cc=alex.williamson@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox