public inbox for linux-pci@vger.kernel.org
From: Bjorn Helgaas <helgaas@kernel.org>
To: Shawn Jin <shawn.jin@asteralabs.com>
Cc: "Bjorn Helgaas" <bhelgaas@google.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
Subject: Re: [BUG] PCIe bridge resource allocation creates invalid limit addresses after Secondary Bus Reset recovery
Date: Wed, 11 Mar 2026 18:19:43 -0500
Message-ID: <20260311231943.GA1086074@bhelgaas>
In-Reply-To: <LV8P221MB1472A24B9975F7C8E8D6BF929947A@LV8P221MB1472.NAMP221.PROD.OUTLOOK.COM>

[+cc Ilpo]

On Wed, Mar 11, 2026 at 10:00:39PM +0000, Shawn Jin wrote:
> Hello,
> 
> I'm reporting a potentially critical bug in the Linux kernel's PCIe
> resource allocation code that creates invalid bridge window limit
> addresses during hotplug re-enumeration after Secondary Bus Reset
> (SBR) recovery.

Thanks for the report and the repro and debugging information!

> ## AFFECTED KERNEL VERSIONS
> - Confirmed: 5.15.0, 6.8.0 (Ubuntu 6.8.0-88-generic, 6.8.0-90-generic)
> - Likely affected: All recent kernels including 6.19

Do you know of any kernels that are *not* affected?  If you do, we
could bisect.

> ## HARDWARE CONFIGURATION
> Intel Ice Lake server with PCIe Gen5 switches and endpoints:
> 
> Topology 1:
>   Root Port 96:01.0 → 98:00.0 → 99:01.0 → 9b:00.0 (NVIDIA L20 GPU)
> 
> Kernel parameter: pci=realloc=on
> 
> ## PROBLEM DESCRIPTION
> 
> After performing a Secondary Bus Reset on a PCIe switch port and
> clearing the reset bit, the kernel re-enumerates devices and assigns
> bridge window resources. However, the assigned memory window limit
> addresses are INVALID according to the PCIe specification.
> 
> ### Evidence from dmesg (Topology 1):
> 
> **Before SBR (correct allocation):**
> ```
> [    6.636493] pci 0000:98:00.0: PCI bridge to [bus 99-9c]
> [    6.636539] pci 0000:98:00.0:   bridge window [mem 0xe9600000-0xe96fffff]
> [    6.636645] pci 0000:98:00.0:   bridge window [mem 0x13b000000000-0x13b7ffffffff 64bit pref]
> 
> [    6.644429] pci 0000:99:01.0: PCI bridge to [bus 9b]
> [    6.644476] pci 0000:99:01.0:   bridge window [mem 0xe9600000-0xe96fffff]
> [    6.644656] pci 0000:99:01.0:   bridge window [mem 0x13b000000000-0x13b7ffffffff 64bit pref]
> 
> [    6.654203] pci 0000:9b:00.0: [1e3e:0002] type 00 class 0x120000 PCIe Endpoint
> [    6.654652] pci 0000:9b:00.0: BAR 0 [mem 0x13b000000000-0x13b7ffffffff 64bit pref]
> [    6.654666] pci 0000:9b:00.0: BAR 2 [mem 0xe9600000-0xe963ffff]
> ```
> 
> **After SBR clear (INVALID allocation):**
> ```
> [  656.644184] pci 0000:98:00.0: bridge window [mem 0x13b000000000-0x13b7ffffffff 64bit pref]: assigned
> [  656.644186] pci 0000:98:00.0: bridge window [mem 0xe9600000-0xe96fffff]: assigned
> [  656.644188] pci 0000:99:01.0: bridge window [mem 0x13b000000000-0x13b7fffffffe 64bit pref]: assigned
> [  656.644189] pci 0000:99:01.0: bridge window [mem 0xe9600000-0xe96ffffe]: assigned
> 
> [  656.644830] pci 0000:9b:00.0: BAR 0 [mem size 0x800000000 64bit pref]: can't assign; no space
> [  656.644831] pci 0000:9b:00.0: BAR 0 [mem size 0x800000000 64bit pref]: failed to assign
> // BAR 2 can still be assigned because it is only 256KB, well under the bridge's minimum 1MB window
> [  656.644832] pci 0000:9b:00.0: BAR 2 [mem 0xe9600000-0xe963ffff]: assigned
> 
> ```
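Working through the numbers in that log: the kernel's resource_size() is end - start + 1, so a window ending in ...FFFE is one byte smaller than the power-of-two size the BAR needs. Here is a minimal bash sketch. The helper names are hypothetical. The fit check only considers placing the BAR at the start of the window, which is enough here because both windows start at addresses aligned to the BAR sizes.

```bash
#!/bin/bash
# Hypothetical helpers for checking the windows logged above.
# Bash arithmetic is signed 64-bit, wide enough for these addresses.

# Size of an inclusive [start, end] range (same relation as resource_size()).
window_size() {
    local start=$1 end=$2
    printf '0x%x\n' $(( end - start + 1 ))
}

# Succeeds if a BAR of size $3 can be placed at the start of [$1, $2]:
# the start must be aligned to the BAR size and the BAR must end by $2.
bar_fits_at_start() {
    local start=$1 end=$2 size=$3
    (( start % size == 0 )) || return 1
    (( start + size - 1 <= end ))
}

# 99:01.0 prefetchable window after SBR vs. the 32GB BAR 0 of 9b:00.0
window_size 0x13b000000000 0x13b7fffffffe
if bar_fits_at_start 0x13b000000000 0x13b7fffffffe 0x800000000; then echo fits; else echo "no space"; fi

# 99:01.0 non-prefetchable window after SBR vs. the 256KB BAR 2
window_size 0xe9600000 0xe96ffffe
if bar_fits_at_start 0xe9600000 0xe96ffffe 0x40000; then echo fits; else echo "no space"; fi
```

With the windows as logged, BAR 0 needs 0x800000000 bytes but the prefetchable window holds only 0x7ffffffff. That is consistent with the "can't assign; no space" message. The 256KB BAR 2 still fits in the 0xfffff-byte window.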
> 
> ### Invalid Addresses Created by Kernel:
> - `0x13b7fffffffe` (ends in 0xFFFE - **1 byte short** of the expected `0x13b7ffffffff`)
> - `0xe96ffffe` (ends in 0xFFFE - **1 byte short** of the expected `0xe96fffff`)
> 
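Some background on why the ...FFFE limits look wrong. A PCI-to-PCI bridge's Memory Base/Limit and Prefetchable Memory Base/Limit registers carry only address bits [31:20]; the prefetchable window adds upper 32-bit registers. The low 20 bits of the limit are implied to be all ones. A window the bridge can actually decode therefore starts on a 1MB boundary and ends at an address whose low 20 bits are 0xFFFFF. Here is a small bash sketch of that check (the function name is just for illustration):

```bash
#!/bin/bash
# Hypothetical check: can [start, end] be expressed in a bridge's memory
# base/limit registers? Those registers hold address bits [31:20] (plus the
# upper 32 bits for a 64-bit prefetchable window); the low 20 bits of the
# base are implied 0x00000 and those of the limit implied 0xFFFFF.
BRIDGE_MEM_GRAN=$(( 1 << 20 ))   # 1MB bridge memory window granularity

bridge_window_encodable() {
    local start=$1 end=$2
    (( (start & (BRIDGE_MEM_GRAN - 1)) == 0 )) &&
        (( (end & (BRIDGE_MEM_GRAN - 1)) == BRIDGE_MEM_GRAN - 1 ))
}

# Windows from the logs above: before SBR, then after SBR
for win in "0x13b000000000 0x13b7ffffffff" "0xe9600000 0xe96fffff" \
           "0x13b000000000 0x13b7fffffffe" "0xe9600000 0xe96ffffe"; do
    read -r start end <<< "$win"
    if bridge_window_encodable "$start" "$end"; then
        echo "$start-$end: encodable"
    else
        echo "$start-$end: not encodable"
    fi
done
```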
> ## IMPACT
> 
> 1. **Device initialization failure**: Endpoints cannot allocate required BARs:
>    ```
>    [  656.644830] pci 0000:9b:00.0: BAR 0 [mem size 0x800000000 64bit pref]: can't assign; no space
>    [  656.644831] pci 0000:9b:00.0: BAR 0 [mem size 0x800000000 64bit pref]: failed to assign
>    ```
> 
> 2. **Consistent across multiple hierarchies**: Affects different PCIe topologies independently
> 
> ## REPRODUCTION
> 
> The attached script test_rc_sbr.sh.txt issues an SBR from the root port by
> setting the Secondary Bus Reset bit in its Bridge Control register.
> 
> ## SUSPECTED ROOT CAUSE
> 
> The bug appears to be in `drivers/pci/setup-bus.c`, likely in:
> - `pci_bus_distribute_available_resources()`
> - `adjust_bridge_window()`
> - `pci_assign_unassigned_bridge_resources()`
> 
> The resource end address calculation appears to perform multiple subtractions:
> 1. Initial calculation: `res->end = res->start + size - 1` (correct)
> 2. During redistribution: Another subtraction occurs, creating `res->end = ... - 2`
> 
> ## WORKAROUND ATTEMPTS
> 
> - `pci=realloc=on`: Does NOT fix the issue
> - Manual remove/rescan from root: Does NOT fix the issue
> - Initial boot allocation: Works correctly (bug only occurs during hotplug re-enumeration)
> 
> ## REQUEST
> 
> I want to track how the bridge windows are allocated. Is there a way
> to enable additional kernel messages to show the allocation path? Please
> investigate whether this is a real kernel bug.
> 
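On the question of more messages: pci_dbg()/dev_dbg() messages in the PCI core are compiled in but stay silent unless they are enabled through dynamic debug. Here is a minimal sketch. It assumes CONFIG_DYNAMIC_DEBUG=y, root privileges, and either /proc/dynamic_debug/control or debugfs mounted at /sys/kernel/debug. The helper names are hypothetical, and which messages actually exist along the bridge-window path depends on the kernel version.

```bash
#!/bin/bash
# Minimal sketch: enable dynamic debug output for the PCI resource
# assignment code. Assumes CONFIG_DYNAMIC_DEBUG=y and root privileges.

# One query per source file; "file" matches a path relative to the source root.
pci_dyndbg_queries() {
    printf 'file %s +p\n' drivers/pci/setup-bus.c drivers/pci/setup-res.c
}

# Print the first writable dynamic debug control file, if any.
find_dyndbg_control() {
    local f
    for f in /proc/dynamic_debug/control /sys/kernel/debug/dynamic_debug/control; do
        if [ -w "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}

enable_pci_setup_debug() {
    local ctl q
    if ! ctl=$(find_dyndbg_control); then
        echo "dynamic debug control file not available" >&2
        return 1
    fi
    # Write each query separately so a rejected query is reported on its own.
    while IFS= read -r q; do
        echo "$q" > "$ctl" || echo "query failed: $q" >&2
    done < <(pci_dyndbg_queries)
}
```

Run enable_pci_setup_debug before the SBR and rescan, then check dmesg. The same query can also be passed at boot as dyndbg="file drivers/pci/setup-bus.c +p" on the kernel command line. That would capture the initial enumeration for comparison with the post-SBR path.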
> Thank you,
> Shawn

> #!/bin/bash
> 
> # Function to display usage
> usage() {
>     echo "Usage: $0 -rp <ROOT_PORT_BDF> -usp <USP_BDF>"
>     echo "Example: $0 -rp c6:01.0 -usp c7:00.0"
>     exit 1
> }
> 
> # Initialize variables
> ROOT_PORT_BDF=""
> USP_BDF=""
> 
> # Parse command-line arguments
> while [[ $# -gt 0 ]]; do
>     case $1 in
>         -rp)
>             ROOT_PORT_BDF="$2"
>             shift 2
>             ;;
>         -usp)
>             USP_BDF="$2"
>             shift 2
>             ;;
>         -h|--help)
>             usage
>             ;;
>         *)
>             echo "Unknown option: $1"
>             usage
>             ;;
>     esac
> done
> 
> # Validate that both arguments are provided
> if [ -z "$ROOT_PORT_BDF" ] || [ -z "$USP_BDF" ]; then
>     echo "Error: Both -rp and -usp arguments are required"
>     usage
> fi
> 
> echo "Root Port BDF: $ROOT_PORT_BDF"
> echo "USP BDF: $USP_BDF"
> echo ""
> 
> # Remove the USP device
> echo 1 | sudo tee "/sys/bus/pci/devices/0000:${USP_BDF}/remove"
> 
> # Trigger SBR via Bridge Control register (offset 0x3E, bit 6 = Secondary Bus Reset)
> BRIDGE_CTL=$(sudo setpci -s "${ROOT_PORT_BDF}" 0x3E.w)
> BRIDGE_CTL_RESET=$(printf "0x%04x" $((0x$BRIDGE_CTL | 0x0040)))
> 
> echo "Asserting Secondary Bus Reset..."
> sudo setpci -s "${ROOT_PORT_BDF}" 0x3E.w="$BRIDGE_CTL_RESET"
> sleep 1
> 
> echo "De-asserting Secondary Bus Reset..."
> sudo setpci -s "${ROOT_PORT_BDF}" 0x3E.w="$BRIDGE_CTL"
> sleep 2
> 
> # Rescan PCI bus
> echo 1 | sudo tee /sys/bus/pci/rescan
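After the rescan, it may also help to read back what was actually programmed into each bridge, because the kernel's struct resource and the hardware registers are separate things. Here is a minimal bash sketch that decodes the Type 1 header window registers:

- Memory Base/Limit at 0x20/0x22.
- Prefetchable Memory Base/Limit at 0x24/0x26.
- Prefetchable upper 32 bits at 0x28/0x2C.

The function names are hypothetical. The sketch assumes a 64-bit prefetchable window and does not detect a disabled window (base above limit).

```bash
#!/bin/bash
# Hypothetical helpers to decode a PCI-to-PCI bridge's memory windows.
# setpci prints register values as hex digits without a 0x prefix.

# Args: base16 limit16 [base_upper32 limit_upper32] (hex, no 0x prefix).
# Bits [15:4] of base/limit hold address bits [31:20]; the limit's low
# 20 bits are implied to be 0xFFFFF.
decode_mem_window() {
    local base=$(( 0x$1 )) limit=$(( 0x$2 ))
    local base_hi=$(( 0x${3:-0} )) limit_hi=$(( 0x${4:-0} ))
    local start=$(( (base_hi << 32) | ((base & 0xfff0) << 16) ))
    local end=$(( (limit_hi << 32) | ((limit & 0xfff0) << 16) | 0xfffff ))
    printf '0x%x-0x%x\n' "$start" "$end"
}

# Arg: bridge BDF, e.g. 99:01.0 (run as root).
show_bridge_windows() {
    local bdf=$1 mb ml pb pl pbu plu
    mb=$(setpci -s "$bdf" 0x20.w); ml=$(setpci -s "$bdf" 0x22.w)
    pb=$(setpci -s "$bdf" 0x24.w); pl=$(setpci -s "$bdf" 0x26.w)
    pbu=$(setpci -s "$bdf" 0x28.l); plu=$(setpci -s "$bdf" 0x2c.l)
    echo "mem:  $(decode_mem_window "$mb" "$ml")"
    echo "pref: $(decode_mem_window "$pb" "$pl" "$pbu" "$plu")"
}
```

The registers cannot express a limit ending in ...FFFE. Comparing the output of show_bridge_windows 99:01.0 with the kernel's log (e.g. 0x13b7ffffffff vs 0x13b7fffffffe) would therefore show whether the hardware window and the kernel's resource have diverged.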


Thread overview: 12+ messages
2026-03-11 22:00 [BUG] PCIe bridge resource allocation creates invalid limit addresses after Secondary Bus Reset recovery Shawn Jin
2026-03-11 23:19 ` Bjorn Helgaas [this message]
2026-03-12  0:02   ` Shawn Jin
2026-03-12  1:03     ` Shawn Jin
2026-03-12 13:24       ` Ilpo Järvinen
2026-03-12 17:14         ` Shawn Jin
2026-03-12 17:48           ` Ilpo Järvinen
2026-03-13 16:48             ` Shawn Jin
2026-03-16 10:28               ` Ilpo Järvinen
2026-03-16 17:26                 ` Shawn Jin
2026-03-12 17:34     ` Bjorn Helgaas
2026-03-12 17:40       ` Shawn Jin
