From: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
To: Tudor Ambarus <tudor.ambarus@linaro.org>
Cc: "Bjorn Helgaas" <bhelgaas@google.com>,
linux-pci@vger.kernel.org,
"Michał Winiarski" <michal.winiarski@intel.com>,
"Igor Mammedov" <imammedo@redhat.com>,
LKML <linux-kernel@vger.kernel.org>,
"Mika Westerberg" <mika.westerberg@linux.intel.com>,
"William McVicker" <willmcvicker@google.com>
Subject: Re: [PATCH 24/25] PCI: Perform reset_resource() and build fail list in sync
Date: Tue, 6 May 2025 18:53:56 +0300 (EEST)
Message-ID: <8f281667-b4ef-9385-868f-93893b9d6611@linux.intel.com>
In-Reply-To: <5f103643-5e1c-43c6-b8fe-9617d3b5447c@linaro.org>
On Tue, 6 May 2025, Tudor Ambarus wrote:
> Hi!
>
> On 12/16/24 5:56 PM, Ilpo Järvinen wrote:
> > Resetting a resource is problematic as it prevents attempting to allocate
> > the resource later, unless something in between restores the resource.
> > Similarly, if fail_head does not contain all resources that were reset,
> > those resources cannot be restored later.
> >
> > The entire reset/restore cycle adds complexity, and leaving resources
> > in the reset state causes issues for other code, such as the checks done
> > in pci_enable_resources(). Take a small step towards not resetting
> > resources by delaying the reset until the end of resource assignment and
> > building the failure list (fail_head) in sync with the reset, to avoid
> > leaving behind resources that cannot be restored (for the case where the
> > caller provides fail_head in the first place to allow restore somewhere
> > in the callchain, as not all callers pass a non-NULL fail_head).
> >
> > The Expansion ROM check is temporarily left in place while building the
> > failure list until the upcoming change which reworks optional resource
> > handling.
> >
> > Ideally, resource reset could be removed entirely, but doing that in one
> > big step would make the impact intractable due to the complexity of all
> > the related code.
> >
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>
> I'm hitting the BUG_ON(!list_empty(&add_list)); in
> pci_assign_unassigned_bus_resources() [1] with 6.15-rc5 and the
> pixel6 downstream pcie driver.
>
> I saw the thread where "a34d74877c66 PCI: Restore assigned resources
> fully after release" fixes things for some other cases, but it's not the
> case here.
>
> Reverting the following patches fixes the problem:
> a34d74877c66 PCI: Restore assigned resources fully after release
> 2499f5348431 PCI: Rework optional resource handling
> 96336ec70264 PCI: Perform reset_resource() and build fail list in sync
So it's confirmed that you needed to revert also this last commit
96336ec70264, not just the rework change?
> In the working case the add_list list is empty throughout the entire
> body of pci_assign_unassigned_bus_resources().
>
> In the failing case __pci_bus_size_bridges() leaves the add_list not
> empty and __pci_bus_assign_resources() does not consume the list, thus
> the BUG_ON. The failing case contains an extra print that's not shown
> when reverting the blamed commits:
> [ 13.951185][ T1101] pcieport 0000:00:00.0: bridge window [mem
> 0x00100000-0x001fffff] to [bus 01-ff] add_size 100000 add_align 100000
>
> I've added some prints trying to describe the code path, see
> https://paste.ofcode.org/Aeu2YBpLztc49ZDw3uUJmd#
>
> Failing case:
> [ 13.944231][ T1101] pci 0000:01:00.0: [144d:a5a5] type 00 class
> 0x000000 PCIe Endpoint
> [ 13.944412][ T1101] pci 0000:01:00.0: BAR 0 [mem
> 0x00000000-0x000fffff 64bit]
> [ 13.944532][ T1101] pci 0000:01:00.0: ROM [mem 0x00000000-0x0000ffff
> pref]
> [ 13.944649][ T1101] pci 0000:01:00.0: enabling Extended Tags
> [ 13.944844][ T1101] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
> [ 13.945015][ T1101] pci 0000:01:00.0: 15.752 Gb/s available PCIe
> bandwidth, limited by 8.0 GT/s PCIe x2 link at 0000:00:00.0 (capable of
> 31.506 Gb/s with 16.0 GT/s PCIe x2 link)
> [ 13.950616][ T1101] __pci_bus_size_bridges: before pbus_size_mem.
> list empty? 1
> [ 13.950784][ T1101] pbus_size_mem: 2. list empty? 1
> [ 13.950886][ T1101] pbus_size_mem: 1 list empty? 0
> [ 13.950982][ T1101] pbus_size_mem: 3. list empty? 0
> [ 13.951082][ T1101] pbus_size_mem: 4. list empty? 0
> [ 13.951185][ T1101] pcieport 0000:00:00.0: bridge window [mem
> 0x00100000-0x001fffff] to [bus 01-ff] add_size 100000 add_align 100000
> [ 13.951448][ T1101] __pci_bus_size_bridges: after pbus_size_mem. list
> empty? 0
> [ 13.951643][ T1101] pci_assign_unassigned_bus_resources: before
> __pci_bus_assign_resources -> list empty? 0
> [ 13.951924][ T1101] pcieport 0000:00:00.0: bridge window [mem
> 0x40000000-0x401fffff]: assigned
> [ 13.952248][ T1101] pci_assign_unassigned_bus_resources: after
> __pci_bus_assign_resources -> list empty? 0
> [ 13.952634][ T1101] ------------[ cut here ]------------
> [ 13.952818][ T1101] kernel BUG at drivers/pci/setup-bus.c:2514!
> [ 13.953045][ T1101] Internal error: Oops - BUG: 00000000f2000800 [#1]
> SMP
> ...
> [ 13.976086][ T1101] Call trace:
> [ 13.976206][ T1101] pci_assign_unassigned_bus_resources+0x110/0x114 (P)
> [ 13.976462][ T1101] pci_rescan_bus+0x28/0x48
> [ 13.976628][ T1101] exynos_pcie_rc_poweron
>
> Working case:
> [ 13.786961][ T1120] pci 0000:01:00.0: [144d:a5a5] type 00 class
> 0x000000 PCIe Endpoint
> [ 13.787136][ T1120] pci 0000:01:00.0: BAR 0 [mem
> 0x00000000-0x000fffff 64bit]
> [ 13.787280][ T1120] pci 0000:01:00.0: ROM [mem 0x00000000-0x0000ffff
> pref]
> [ 13.787541][ T1120] pci 0000:01:00.0: enabling Extended Tags
> [ 13.787808][ T1120] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
> [ 13.787988][ T1120] pci 0000:01:00.0: 15.752 Gb/s available PCIe
> bandwidth, limited by 8.0 GT/s PCIe x2 link at 0000:00:00.0 (capable of
> 31.506 Gb/s with 16.0 GT/s PCIe x2 link)
> [ 13.795279][ T1120] __pci_bus_size_bridges: before pbus_size_mem.
> list empty? 1
> [ 13.795408][ T1120] pbus_size_mem: 2. list empty? 1
> [ 13.795495][ T1120] pbus_size_mem: 2. list empty? 1
> [ 13.795577][ T1120] __pci_bus_size_bridges: after pbus_size_mem. list
> empty? 1
> [ 13.795692][ T1120] pci_assign_unassigned_bus_resources: before
> __pci_bus_assign_resources -> list empty? 1
> [ 13.795849][ T1120] pcieport 0000:00:00.0: bridge window [mem
> 0x40000000-0x401fffff]: assigned
> [ 13.796072][ T1120] pci_assign_unassigned_bus_resources: after
> __pci_bus_assign_resources -> list empty? 1
> [ 13.796662][ T1120] cpif: s5100_poweron_pcie: DBG: MSI sfr not set
> up, yet(s5100_pdev is NULL)
> [ 13.796666][ T1120] cpif: register_pcie: s51xx_pcie_init start
>
>
> Any hints are welcome. Thanks,
> ta

Hi, and thanks for the report.

The interesting part occurs inside reassign_resources_sorted() where most
items are eliminated from realloc_head by the list_del().

My guess is that the change in 96336ec70264 from !res->flags to the more
complicated check somehow causes this. If the new check doesn't match and,
subsequently, no match is found in the head list, the loop takes the
continue path and does not remove the entry from realloc_head.

But it's hard to confirm without knowing which resources realloc_head
contains. Perhaps print the resources that are processed around that part
of the code in reassign_resources_sorted(); comparing the log from the
reverted code with the non-working case might help to understand what is
different there and why. To better understand what is in the head list, it
would also be useful to know from which device the resources were added to
the head list in pdev_sort_resources().
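
To illustrate the suspected mechanism, here is a minimal user-space C sketch under simplifying assumptions: struct realloc_entry, in_head_list(), process_realloc_head() and push_entry() are hypothetical stand-ins, and the match test is reduced to an ID lookup rather than the real flag checks in reassign_resources_sorted(). It only shows how an entry that fails the check is skipped via continue and survives on the list, which is what later trips a !list_empty() check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an entry queued on realloc_head. */
struct realloc_entry {
	int res_id;                    /* which resource the entry refers to */
	struct realloc_entry *next;
};

/* Hypothetical stand-in for the per-entry check: is the resource on
 * the head list? */
static bool in_head_list(int res_id, const int *head, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (head[i] == res_id)
			return true;
	return false;
}

/*
 * Walk realloc_head. Matched entries are unlinked and freed (the
 * list_del() path); unmatched ones are skipped with "continue" and
 * stay on the list. Returns how many entries were left behind.
 */
size_t process_realloc_head(struct realloc_entry **realloc_head,
			    const int *head, size_t n)
{
	struct realloc_entry **pp;
	size_t left = 0;

	assert(realloc_head != NULL);
	pp = realloc_head;
	while (*pp) {
		struct realloc_entry *e = *pp;

		if (!in_head_list(e->res_id, head, n)) {
			pp = &e->next;  /* no match: entry is NOT removed */
			left++;
			continue;
		}
		*pp = e->next;          /* match: unlink, like list_del() */
		free(e);
	}
	return left;
}

/* Prepend a new entry; returns false if allocation fails. */
bool push_entry(struct realloc_entry **list, int res_id)
{
	struct realloc_entry *e = malloc(sizeof(*e));

	if (!e)
		return false;
	e->res_id = res_id;
	e->next = *list;
	*list = e;
	return true;
}
```

For example, with entries 1, 2 and 3 queued and only 1 and 2 present in the head list, entry 3 is left behind and the list is not empty afterwards.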

In any case, that BUG_ON() seems a bit drastic for what might be just a
single resource allocation failure, so it should be downgraded to:

        if (WARN_ON(!list_empty(&add_list)))
                free_list(&add_list);

... or WARN_ON_ONCE().
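
A user-space sketch of that downgrade, assuming hypothetical stand-ins: WARN_ON() below is a local macro that only mimics the kernel macro's log-and-return-the-condition behaviour, and struct node, list_empty(), free_list() and finish_assignment() are simplified singly linked list versions, not the kernel helpers in drivers/pci/setup-bus.c. A leftover entry is reported and discarded instead of halting the machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical simplified list node standing in for a queued resource. */
struct node {
	int res_id;
	struct node *next;
};

/* Local imitation of the kernel's WARN_ON(): log if true, return cond. */
static bool warn_on(bool cond, const char *expr, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	return cond;
}
#define WARN_ON(cond) warn_on((cond), #cond, __FILE__, __LINE__)

static bool list_empty(struct node **head)
{
	return *head == NULL;
}

/* Free every node and leave the list empty; returns how many were freed. */
static size_t free_list(struct node **head)
{
	size_t n = 0;

	assert(head != NULL);
	while (*head) {
		struct node *next = (*head)->next;

		free(*head);
		*head = next;
		n++;
	}
	return n;
}

/* Instead of BUG_ON(!list_empty(add_list)): warn, drop leftovers, go on.
 * Returns the number of leftover entries that were discarded. */
size_t finish_assignment(struct node **add_list)
{
	if (WARN_ON(!list_empty(add_list)))
		return free_list(add_list);
	return 0;
}
```

WARN_ON_ONCE() would follow the same pattern but report only the first occurrence.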
--
i.