From: "Cédric Le Goater" <clg@redhat.com>
To: Farhan Ali <alifm@linux.ibm.com>,
linux-s390@vger.kernel.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
Cc: alex.williamson@redhat.com, helgaas@kernel.org,
schnelle@linux.ibm.com, mjrosato@linux.ibm.com
Subject: Re: [PATCH v3 03/10] PCI: Allow per function PCI slots
Date: Tue, 16 Sep 2025 08:52:33 +0200
Message-ID: <07205677-09f0-464b-b31c-0fb5493a1d81@redhat.com>
In-Reply-To: <20250911183307.1910-4-alifm@linux.ibm.com>
Hello Ali,
On 9/11/25 20:33, Farhan Ali wrote:
> On s390 systems, which use a machine level hypervisor, PCI devices are
> always accessed through a form of PCI pass-through which fundamentally
> operates on a per PCI function granularity. This is also reflected in the
> s390 PCI hotplug driver which creates hotplug slots for individual PCI
> functions. Its reset_slot() function, which is a wrapper for
> zpci_hot_reset_device(), thus also resets individual functions.
>
> Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object
> to multifunction devices. This approach worked fine on s390 systems that
> only exposed virtual functions as individual PCI domains to the operating
> system. Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
> s390 supports exposing the topology of multifunction PCI devices by
> grouping them in a shared PCI domain. When attempting to reset a function
> through the hotplug driver, the shared slot assignment causes the wrong
> function to be reset instead of the intended one. It also leaks memory as
> we do create a pci_slot object for the function, but don't correctly free
> it in pci_slot_release().
>
> Add a flag for struct pci_slot to allow per function PCI slots for
> functions managed through a hypervisor, which exposes individual PCI
> functions while retaining the topology.
>
> Fixes: 44510d6fa0c0 ("s390/pci: Handling multifunctions")
> Suggested-by: Niklas Schnelle <schnelle@linux.ibm.com>
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
> drivers/pci/hotplug/s390_pci_hpc.c | 10 ++++++++--
> drivers/pci/pci.c | 4 +++-
> drivers/pci/slot.c | 14 +++++++++++---
> include/linux/pci.h | 1 +
> 4 files changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/pci/hotplug/s390_pci_hpc.c b/drivers/pci/hotplug/s390_pci_hpc.c
> index d9996516f49e..8b547de464bf 100644
> --- a/drivers/pci/hotplug/s390_pci_hpc.c
> +++ b/drivers/pci/hotplug/s390_pci_hpc.c
> @@ -126,14 +126,20 @@ static const struct hotplug_slot_ops s390_hotplug_slot_ops = {
>
> int zpci_init_slot(struct zpci_dev *zdev)
> {
> + int ret;
> char name[SLOT_NAME_SIZE];
> struct zpci_bus *zbus = zdev->zbus;
>
> zdev->hotplug_slot.ops = &s390_hotplug_slot_ops;
>
> snprintf(name, SLOT_NAME_SIZE, "%08x", zdev->fid);
> - return pci_hp_register(&zdev->hotplug_slot, zbus->bus,
> - zdev->devfn, name);
> + ret = pci_hp_register(&zdev->hotplug_slot, zbus->bus,
> + zdev->devfn, name);
> + if (ret)
> + return ret;
> +
> + zdev->hotplug_slot.pci_slot->per_func_slot = 1;
> + return 0;
> }
>
> void zpci_exit_slot(struct zpci_dev *zdev)
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 3994fa82df68..70296d3b1cfc 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5061,7 +5061,9 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, bool probe)
>
> static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
> {
> - if (dev->multifunction || dev->subordinate || !dev->slot ||
> + if (dev->multifunction && !dev->slot->per_func_slot)
> + return -ENOTTY;
> + if (dev->subordinate || !dev->slot ||
> dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
> return -ENOTTY;
>
> diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
> index 50fb3eb595fe..51ee59e14393 100644
> --- a/drivers/pci/slot.c
> +++ b/drivers/pci/slot.c
> @@ -63,6 +63,14 @@ static ssize_t cur_speed_read_file(struct pci_slot *slot, char *buf)
> return bus_speed_read(slot->bus->cur_bus_speed, buf);
> }
>
> +static bool pci_dev_matches_slot(struct pci_dev *dev, struct pci_slot *slot)
> +{
> + if (slot->per_func_slot)
> + return dev->devfn == slot->number;
> +
> + return PCI_SLOT(dev->devfn) == slot->number;
> +}
> +
> static void pci_slot_release(struct kobject *kobj)
> {
> struct pci_dev *dev;
> @@ -73,7 +81,7 @@ static void pci_slot_release(struct kobject *kobj)
>
> down_read(&pci_bus_sem);
> list_for_each_entry(dev, &slot->bus->devices, bus_list)
> - if (PCI_SLOT(dev->devfn) == slot->number)
> + if (pci_dev_matches_slot(dev, slot))
> dev->slot = NULL;
> up_read(&pci_bus_sem);
>
> @@ -166,7 +174,7 @@ void pci_dev_assign_slot(struct pci_dev *dev)
>
> mutex_lock(&pci_slot_mutex);
> list_for_each_entry(slot, &dev->bus->slots, list)
> - if (PCI_SLOT(dev->devfn) == slot->number)
> + if (pci_dev_matches_slot(dev, slot))
> dev->slot = slot;
> mutex_unlock(&pci_slot_mutex);
> }
> @@ -285,7 +293,7 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
>
> down_read(&pci_bus_sem);
> list_for_each_entry(dev, &parent->devices, bus_list)
> - if (PCI_SLOT(dev->devfn) == slot_nr)
> + if (pci_dev_matches_slot(dev, slot))
> dev->slot = slot;
> up_read(&pci_bus_sem);
>
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 59876de13860..9265f32d9786 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -78,6 +78,7 @@ struct pci_slot {
> struct list_head list; /* Node in list of slots */
> struct hotplug_slot *hotplug; /* Hotplug info (move here) */
> unsigned char number; /* PCI_SLOT(pci_dev->devfn) */
> + unsigned int per_func_slot:1; /* Allow per function slot */
> struct kobject kobj;
> };
>
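To make the matching rule in pci_dev_matches_slot() concrete, here is a small userspace sketch. The model_* names are hypothetical stand-ins and not kernel types; only the PCI_DEVFN()/PCI_SLOT() encodings are copied from the kernel's uapi header.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit layout as the macros in include/uapi/linux/pci.h. */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)

/* Hypothetical userspace stand-ins for struct pci_slot / struct pci_dev. */
struct model_slot {
	unsigned char number;		/* device number, or full devfn */
	unsigned int per_func_slot:1;	/* 1: number holds a full devfn */
};

struct model_dev {
	unsigned int devfn;
};

/* Mirrors the logic of the helper added in the quoted patch. */
static bool model_dev_matches_slot(const struct model_dev *dev,
				   const struct model_slot *slot)
{
	if (slot->per_func_slot)
		return dev->devfn == slot->number;

	return PCI_SLOT(dev->devfn) == slot->number;
}

/* Two functions of device 1, checked against both kinds of slot. */
static void model_selftest(void)
{
	struct model_dev fn0 = { .devfn = PCI_DEVFN(1, 0) };
	struct model_dev fn1 = { .devfn = PCI_DEVFN(1, 1) };
	struct model_slot shared = { .number = 1, .per_func_slot = 0 };
	struct model_slot only_fn1 = { .number = PCI_DEVFN(1, 1),
				       .per_func_slot = 1 };

	/* A classic slot covers every function of the device. */
	assert(model_dev_matches_slot(&fn0, &shared));
	assert(model_dev_matches_slot(&fn1, &shared));

	/* A per-function slot covers exactly one function. */
	assert(!model_dev_matches_slot(&fn0, &only_fn1));
	assert(model_dev_matches_slot(&fn1, &only_fn1));
}
```

With a shared slot both functions map to the same object, which is the behaviour the commit message describes as resetting the wrong function on s390. With per_func_slot set, only the function whose devfn equals slot->number matches.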
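This change triggers a kernel oops during boot on x86_64. It can be reproduced in a VM.
The faulting address (CR2 = 0x21, with RAX = 0) suggests that the new check in
pci_dev_reset_slot_function() reads dev->slot->per_func_slot before the !dev->slot
test, so a multifunction device without a slot dereferences a NULL pointer.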
C.
[ 3.073990] BUG: kernel NULL pointer dereference, address: 0000000000000021
[ 3.074976] #PF: supervisor read access in kernel mode
[ 3.074976] #PF: error_code(0x0000) - not-present page
[ 3.074976] PGD 0 P4D 0
[ 3.074976] Oops: Oops: 0000 [#1] SMP NOPTI
[ 3.074976] CPU: 18 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.17.0-rc6-clg-dirty #8 PREEMPT(voluntary)
[ 3.074976] Hardware name: Supermicro Super Server/X13SAE-F, BIOS 4.2 12/17/2024
[ 3.074976] RIP: 0010:pci_reset_bus_function+0xdf/0x160
[ 3.074976] Code: 4e 08 00 00 40 0f 85 83 00 00 00 48 8b 78 18 e8 27 9d ff ff 83 f8 e7 74 17 48 83 c4 08 5b 5d 41 5c c3 cc cc cc cc 48 8b 43 30 <f6> 40 21 01 75 b6 48 8b 53 10 48 83 7a 10 00 74 5e 48 83 7b 18 00
[ 3.074976] RSP: 0000:ffffcd808007b9a8 EFLAGS: 00010202
[ 3.074976] RAX: 0000000000000000 RBX: ffff88c4019b8000 RCX: 0000000000000000
[ 3.074976] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88c4019b8000
[ 3.074976] RBP: 0000000000000001 R08: 0000000000000002 R09: ffffcd808007b99c
[ 3.074976] R10: ffffcd808007b950 R11: 0000000000000000 R12: 0000000000000001
[ 3.074976] R13: ffff88c4019b80c8 R14: ffff88c401a7e028 R15: ffff88c401a73400
[ 3.074976] FS: 0000000000000000(0000) GS:ffff88d38aad5000(0000) knlGS:0000000000000000
[ 3.074976] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.074976] CR2: 0000000000000021 CR3: 0000000f66222001 CR4: 0000000000770ef0
[ 3.074976] PKRU: 55555554
[ 3.074976] Call Trace:
[ 3.074976] <TASK>
[ 3.074976] ? pci_pm_reset+0x39/0x180
[ 3.074976] pci_init_reset_methods+0x52/0x80
[ 3.074976] pci_device_add+0x215/0x5d0
[ 3.074976] pci_scan_single_device+0xa2/0xe0
[ 3.074976] pci_scan_slot+0x66/0x1c0
[ 3.074976] ? klist_next+0x145/0x150
[ 3.074976] pci_scan_child_bus_extend+0x3a/0x290
[ 3.074976] acpi_pci_root_create+0x236/0x2a0
[ 3.074976] pci_acpi_scan_root+0x19b/0x1f0
[ 3.074976] acpi_pci_root_add+0x1a5/0x370
[ 3.074976] acpi_bus_attach+0x1a8/0x290
[ 3.074976] ? __pfx_acpi_dev_for_one_check+0x10/0x10
[ 3.074976] device_for_each_child+0x4b/0x80
[ 3.074976] acpi_dev_for_each_child+0x28/0x40
[ 3.074976] ? __pfx_acpi_bus_attach+0x10/0x10
[ 3.074976] acpi_bus_attach+0x7a/0x290
[ 3.074976] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 3.074976] ? __pfx_acpi_dev_for_one_check+0x10/0x10
[ 3.074976] device_for_each_child+0x4b/0x80
[ 3.074976] acpi_dev_for_each_child+0x28/0x40
[ 3.074976] ? __pfx_acpi_bus_attach+0x10/0x10
[ 3.074976] acpi_bus_attach+0x7a/0x290
[ 3.074976] acpi_bus_scan+0x6a/0x1c0
[ 3.074976] ? __pfx_acpi_init+0x10/0x10
[ 3.074976] acpi_scan_init+0xdc/0x280
[ 3.074976] ? __pfx_acpi_init+0x10/0x10
[ 3.074976] acpi_init+0x218/0x530
[ 3.074976] do_one_initcall+0x40/0x310
[ 3.074976] kernel_init_freeable+0x2fe/0x450
[ 3.074976] ? __pfx_kernel_init+0x10/0x10
[ 3.074976] kernel_init+0x16/0x1d0
[ 3.074976] ret_from_fork+0x1ab/0x1e0
[ 3.074976] ? __pfx_kernel_init+0x10/0x10
[ 3.074976] ret_from_fork_asm+0x1a/0x30
[ 3.074976] </TASK>
[ 3.074976] Modules linked in:
[ 3.074976] CR2: 0000000000000021
[ 3.074976] ---[ end trace 0000000000000000 ]---
[ 3.074976] RIP: 0010:pci_reset_bus_function+0xdf/0x160
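The trace is consistent with dev->slot being NULL when per_func_slot is read. A minimal userspace sketch of one possible ordering that avoids this is shown below. It is an illustration only, not the actual fix: the model_* names are hypothetical, and the PCI_DEV_FLAGS_NO_BUS_RESET test from the real function is left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for struct pci_slot / struct pci_dev. */
struct model_slot {
	unsigned char number;
	unsigned int per_func_slot:1;
};

struct model_dev {
	unsigned int multifunction:1;
	void *subordinate;		/* non-NULL if the device is a bridge */
	struct model_slot *slot;	/* NULL for devices without a slot */
};

/*
 * Returns 0 if a slot reset may be attempted, -ENOTTY otherwise.
 * The pointer is tested before any field of the slot is read.
 */
static int model_slot_reset_allowed(const struct model_dev *dev)
{
	if (dev->subordinate || !dev->slot)
		return -ENOTTY;

	/* dev->slot is known to be non-NULL from here on. */
	if (dev->multifunction && !dev->slot->per_func_slot)
		return -ENOTTY;

	return 0;
}

static void model_reset_selftest(void)
{
	struct model_slot shared = { .number = 1, .per_func_slot = 0 };
	struct model_slot per_func = { .number = 9, .per_func_slot = 1 };
	struct model_dev dev = { .multifunction = 1, .slot = NULL };

	/* Multifunction device without a slot: rejected, no dereference. */
	assert(model_slot_reset_allowed(&dev) == -ENOTTY);

	/* Multifunction device in a shared slot: still rejected. */
	dev.slot = &shared;
	assert(model_slot_reset_allowed(&dev) == -ENOTTY);

	/* Multifunction device with its own per-function slot: allowed. */
	dev.slot = &per_func;
	assert(model_slot_reset_allowed(&dev) == 0);
}
```

The first case in model_reset_selftest() corresponds to the situation in the oops: a multifunction device enumerated on a platform where no slot has been assigned.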