All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lukas Wunner <lukas@wunner.de>
To: moubingquan <moubingquan@h-partners.com>
Cc: bhelgaas@google.com, ilpo.jarvinen@linux.intel.com,
	linux-arm <linux-arm-kernel@lists.infradead.org>,
	linux-kernl <linux-kernel@vger.kernel.org>,
	linux-pci <linux-pci@vger.kernel.org>,
	fanghao <fanghao11@huawei.com>,
	gaozhihao <gaozhihao6@h-partners.com>,
	lujunhua <lujunhua7@h-partners.com>,
	shenyang <shenyang39@huawei.com>,
	wushanping <wushanping@huawei.com>,
	zengtao <prime.zeng@hisilicon.com>,
	jonathan.cameron@huawei.com,
	Niklas Schnelle <schnelle@linux.ibm.com>
Subject: Re: [BUG] sysfs: duplicate resource file creation during PCIe rescan
Date: Sat, 30 Aug 2025 10:28:02 +0200	[thread overview]
Message-ID: <aLK2EmT5kL_AfWim@wunner.de> (raw)
In-Reply-To: <b51519d6-ce45-4b6d-8135-c70169bd110e@h-partners.com>

On Tue, Jul 29, 2025 at 09:13:19AM +0800, moubingquan wrote:
> When uninstalling the driver on the ARM64 platform and invoking `sriov_disable()`,
> another thread simultaneously performed `echo 1 > /sys/bus/pci/rescan`,
> which resulted in a call trace error (`sysfs: cannot create duplicate filename`) when subsequently loading the driver.
> 
> Under certain multi-threaded scenarios (e.g. VF teardown + PCI rescan triggered in parallel),
> The following sequence may result in files appearing in sysfs that should not exist:
> 
> 1. sriov_disable() uninstalls VFs and removes the corresponding VF files in sysfs.
> 2.At the same time, when rescan_store() rescan the entire PCI device tree,
> there is a possibility that VF files that should have been deleted are added back,
> resulting in VF files in sysfs that should have been removed but were not.

Could you test if this pending series from Niklas (+cc) fixes the issue for you:

https://lore.kernel.org/r/20250826-pci_fix_sriov_disable-v1-0-2d0bc938f2a3@linux.ibm.com

If it does, could you reply with a Tested-by to his cover letter?

Thanks,

Lukas

> Tested on:
> - Kernel version: Linux version 6.15.0-rc4+ (phisik3@10-50-163-153-ARM-openEuler22-3)
> - Platform: ARM64 (Huawei Kunpeng920)
> - Repro steps:
> 1.Thread A unloads the driver and VF (requires calling sriov_disable()).
> 2.Thread B calls `echo 1 > /sys/bus/pci/rescan `
> 
> The system will report a call trace as follows:
> 
> sysfs: cannot create duplicate filename '/devices/pci0000:3c/0000:3c:01.0/0000:3e:00.2/resource2'
> CPU: 138 UID: 0 PID: 11067 Comm: sh Kdump: loaded Tainted: G      D W  O        6.15.0-rc4+ #1 PREEMPT
> Tainted: [D]=DIE, [W]=WARN, [O]=OOT_MODULE
> Call trace:
>  show_stack+0x20/0x38 (C)
>  dump_stack_lvl+0x80/0xf8
>  dump_stack+0x18/0x28
>  sysfs_warn_dup+0x6c/0x90
>  sysfs_add_bin_file_mode_ns+0x12c/0x178
>  sysfs_create_bin_file+0x7c/0xb8
>  pci_create_attr+0x104/0x1b0
>  pci_create_resource_files.part.0+0x50/0xd0
>  pci_create_sysfs_dev_files+0x30/0x50
>  pci_bus_add_device+0x40/0x120
>  pci_bus_add_devices+0x40/0x98
>  pci_bus_add_devices+0x6c/0x98
>  pci_rescan_bus+0x38/0x58
>  rescan_store+0x80/0xb0
>  bus_attr_store+0x2c/0x48
>  sysfs_kf_write+0x84/0xa8
>  kernfs_fop_write_iter+0x120/0x1b8
>  vfs_write+0x338/0x3f8
>  ksys_write+0x70/0x110
>  __arm64_sys_write+0x24/0x38
>  invoke_syscall+0x50/0x120
>  el0_svc_common.constprop.0+0xc8/0xf0
>  do_el0_svc+0x24/0x38
>  el0_svc+0x34/0xf0
>  el0t_64_sync_handler+0xc8/0xd0
>  el0t_64_sync+0x1ac/0x1b0
> 
> The general analysis and corresponding code are as follows:
> 
> drivers/pci/iov.c
> 
> 736 static void sriov_disable(struct pci_dev *dev)
> 737 {
> 738         struct pci_sriov *iov = dev->sriov;
> 739
> 740         if (!iov->num_VFs)
> 741                 return;
> 742
> 743         sriov_del_vfs(dev);
> 744         iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
> 745         pci_cfg_access_lock(dev);
> 746         pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
> 747         ssleep(1);
> 748         pci_cfg_access_unlock(dev);
> 749
> 750         pcibios_sriov_disable(dev);
> 751
> 752         if (iov->link != dev->devfn)
> 753                 sysfs_remove_link(&dev->dev.kobj, "dep_link");
> 754
> 755         iov->num_VFs = 0;
> 756         pci_iov_set_numvfs(dev, 0);
> 757 }
> 
> sriov_disable() will unload the VF and remove its files from sysfs.
> 
> drivers/pci/pci-sysfs.c
> 
>  435 static ssize_t rescan_store(const struct bus_type *bus, const char *buf, size_t count)
>  436 {
>  437         unsigned long val;
>  438         struct pci_bus *b = NULL;
>  439
>  440         if (kstrtoul(buf, 0, &val) < 0)
>  441                 return -EINVAL;
>  442
>  443         if (val) {
>  444                 pci_lock_rescan_remove();
>  445                 while ((b = pci_find_next_bus(b)) != NULL)
>  446                         pci_rescan_bus(b);
>  447                 pci_unlock_rescan_remove();
>  448         }
>  449         return count;
>  450 }
>  451 static BUS_ATTR_WO(rescan);
> 
> The `rescan_store()` function will scan the entire PCI bus, including the relevant sysfs files for the aforementioned VFs.
> 
> Initially, it seemed that directly adding the pci_lock_rescan_remove() lock to sriov_disable() could solve the problem.
> However, it was later discovered that this might lead to a deadlock issue (because the remove_store() function would call sriov_disable(), causing a deadlock).
> 
> drivers/pci/pci-sysfs.c
> 
>  487 static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
>  488                             const char *buf, size_t count)
>  489 {
>  490         unsigned long val;
>  491
>  492         if (kstrtoul(buf, 0, &val) < 0)
>  493                 return -EINVAL;
>  494
>  495         if (val && device_remove_file_self(dev, attr))
>  496
>         //Subsequently, sriov_disable() will be invoked.
>                 pci_stop_and_remove_bus_device_locked(to_pci_dev(dev));
>  497         return count;
>  498 }
>  499 static DEVICE_ATTR_IGNORE_LOCKDEP(remove, 0220, NULL,
>  500                                   remove_store);
> 
> The function `pci_stop_and_remove_bus_device_locked()` acquires the `pci_lock_rescan_remove()` lock and subsequently calls `sriov_disable()`.
> If the lock is added within `sriov_disable()`, it could lead to a deadlock.
> 
> Should we add a new lock to address this issue, or are there any other ideas? I would like to consult and discuss this with everyone.
> 
> Thanks,
> Bingquan Mou <moubingquan@h-partners.com>

      parent reply	other threads:[~2025-08-30  8:28 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-29  1:13 [BUG] sysfs: duplicate resource file creation during PCIe rescan moubingquan
2025-08-11  2:12 ` moubingquan
2025-08-30  3:10   ` moubingquan
2025-08-30  8:28 ` Lukas Wunner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aLK2EmT5kL_AfWim@wunner.de \
    --to=lukas@wunner.de \
    --cc=bhelgaas@google.com \
    --cc=fanghao11@huawei.com \
    --cc=gaozhihao6@h-partners.com \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lujunhua7@h-partners.com \
    --cc=moubingquan@h-partners.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=schnelle@linux.ibm.com \
    --cc=shenyang39@huawei.com \
    --cc=wushanping@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.