All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Julien Grall <julien@xen.org>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Anthony PERARD <anthony.perard@vates.tech>,
	Michal Orzel <michal.orzel@amd.com>,
	Stewart Hildebrand <stewart.hildebrand@amd.com>
Subject: Re: [PATCH v2 2/4] PCI: determine whether a device has extended config space
Date: Wed, 28 Jan 2026 18:49:20 +0100	[thread overview]
Message-ID: <aXpMIOuRZvX8IyFK@Mac.lan> (raw)
In-Reply-To: <edb5eeb2-2cb2-4614-a042-7788fbb345c7@suse.com>

On Mon, Jan 19, 2026 at 03:46:55PM +0100, Jan Beulich wrote:
> Legacy PCI devices don't have any extended config space. Reading any part
> thereof may return all ones or other arbitrary data, e.g. in some cases
> base config space contents repeatedly.
> 
> Logic follows Linux 6.19-rc's pci_cfg_space_size(), albeit leveraging our
> determination of device type; in particular some comments are taken
> verbatim from there.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Should we skip re-evaluation when pci_mmcfg_arch_enable() takes its early
> exit path?

Possibly - we expect no change in that case.  However it would need
to propagate some extra information into the callers.  I could see
that as a followup optimization.

> The warning near the bottom of pci_check_extcfg() may be issued multiple
> times for a single device now. Should we try to avoid that?

Yeah, I've made some comments about that below.  Not sure how common
those broken bridges are, the comment just mentions two specific
models.  Adding yet another boolean to track that is cumbersome, and
what we would like to do is mark the bridge as broken, instead of
every device behind it.

> Note that no vPCI adjustments are done here, but they're going to be
> needed: Whatever requires extended capabilities will need re-
> evaluating / newly establishing / tearing down in case an invocation of
> PHYSDEVOP_pci_mmcfg_reserved alters global state.

Hm, you probably want to do something similar to re-scanning the
capability list, but avoid tearing down and re-setting the vPCI header
logic to prevent unneeded p2m manipulations.  We have no easy way to
preempt this rescanning from the context of a
PHYSDEVOP_pci_mmcfg_reserved call.

> Linux also has CONFIG_PCI_QUIRKS, allowing to compile out the slightly
> risky code (as reads may in principle have side effects). Should we gain
> such, too?

I would be fine with just a command line to disable the newly added
behavior in case it causes issues.

> ---
> v2: Major re-work to also check upon PHYSDEVOP_pci_mmcfg_reserved
>     invocation.
> 
> --- a/xen/arch/x86/physdev.c
> +++ b/xen/arch/x86/physdev.c
> @@ -22,6 +22,8 @@ int physdev_map_pirq(struct domain *d, i
>                       struct msi_info *msi);
>  int physdev_unmap_pirq(struct domain *d, int pirq);
>  
> +int cf_check physdev_check_pci_extcfg(struct pci_dev *pdev, void *arg);

I'm not sure why you need the forward declaration here, the function
(in this patch) is just used after it's already defined.

> +
>  #include "x86_64/mmconfig.h"
>  
>  #ifndef COMPAT
> @@ -160,6 +162,17 @@ int physdev_unmap_pirq(struct domain *d,
>  
>      return ret;
>  }
> +
> +int cf_check physdev_check_pci_extcfg(struct pci_dev *pdev, void *arg)

You can make this static I think?

> +{
> +    const struct physdev_pci_mmcfg_reserved *info = arg;
> +
> +    ASSERT(pdev->seg == info->segment);
> +    if ( pdev->bus >= info->start_bus && pdev->bus <= info->end_bus )
> +        pci_check_extcfg(pdev);
> +
> +    return 0;
> +}
>  #endif /* COMPAT */
>  
>  ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> @@ -511,6 +524,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>  
>          ret = pci_mmcfg_reserved(info.address, info.segment,
>                                   info.start_bus, info.end_bus, info.flags);
> +
> +        if ( !ret )
> +            ret = pci_segment_iterate(info.segment, physdev_check_pci_extcfg,
> +                                      &info);
> +
>          if ( !ret && has_vpci(currd) && (info.flags & XEN_PCI_MMCFG_RESERVED) )
>          {
>              /*
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -422,6 +422,9 @@ static struct pci_dev *alloc_pdev(struct
>      }
>  
>      apply_quirks(pdev);
> +
> +    pci_check_extcfg(pdev);
> +
>      check_pdev(pdev);
>  
>      return pdev;
> @@ -718,6 +721,11 @@ int pci_add_device(u16 seg, u8 bus, u8 d
>  
>                  list_add(&pdev->vf_list, &pf_pdev->vf_list);
>              }
> +
> +            if ( !pdev->ext_cfg )
> +                printk(XENLOG_WARNING
> +                       "%pp: VF without extended config space?\n",
> +                       &pdev->sbdf);

You possibly also want to check that the PF (pf_pdev in this context I
think) also has ext_cfg == true.

>          }
>      }
>  
> @@ -1041,6 +1049,75 @@ enum pdev_type pdev_type(u16 seg, u8 bus
>      return pos ? DEV_TYPE_PCIe_ENDPOINT : DEV_TYPE_PCI;
>  }
>  
> +void pci_check_extcfg(struct pci_dev *pdev)
> +{
> +    unsigned int pos, sig;
> +
> +    pdev->ext_cfg = false;

I think I would prefer if the ext_cfg field is only modified once Xen
know the correct value to put there.  It would also be nice to detect
cases where the device has pdev->ext_cfg == true but a new scan makes
it switch to false.  Which would signal something has likely gone very
wrong, and we should print a warning.

> +
> +    switch ( pdev->type )
> +    {
> +    case DEV_TYPE_PCIe_ENDPOINT:
> +    case DEV_TYPE_PCIe_BRIDGE:
> +    case DEV_TYPE_PCI_HOST_BRIDGE:
> +    case DEV_TYPE_PCIe2PCI_BRIDGE:
> +    case DEV_TYPE_PCI2PCIe_BRIDGE:
> +        break;
> +
> +    case DEV_TYPE_LEGACY_PCI_BRIDGE:
> +    case DEV_TYPE_PCI:
> +        pos = pci_find_cap_offset(pdev->sbdf, PCI_CAP_ID_PCIX);
> +        if ( !pos ||
> +             !(pci_conf_read32(pdev->sbdf, pos + PCI_X_STATUS) &
> +               (PCI_X_STATUS_266MHZ | PCI_X_STATUS_533MHZ)) )
> +            return;
> +        break;
> +
> +    default:
> +        return;
> +    }
> +
> +    /*
> +     * Regular PCI devices have 256 bytes, but PCI-X 2 and PCI Express devices
> +     * have 4096 bytes.  Even if the device is capable, that doesn't mean we
> +     * can access it.  Maybe we don't have a way to generate extended config
> +     * space accesses, or the device is behind a reverse Express bridge.  So
> +     * we try reading the dword at PCI_CFG_SPACE_SIZE which must either be 0
> +     * or a valid extended capability header.
> +     */
> +    if ( pci_conf_read32(pdev->sbdf, PCI_CFG_SPACE_SIZE) == 0xffffffffU )
> +        return;
> +
> +    /*
> +     * PCI Express to PCI/PCI-X Bridge Specification, rev 1.0, 4.1.4 says that
> +     * when forwarding a type1 configuration request the bridge must check
> +     * that the extended register address field is zero.  The bridge is not
> +     * permitted to forward the transactions and must handle it as an
> +     * Unsupported Request.  Some bridges do not follow this rule and simply
> +     * drop the extended register bits, resulting in the standard config space
> +     * being aliased, every 256 bytes across the entire configuration space.
> +     * Test for this condition by comparing the first dword of each potential
> +     * alias to the vendor/device ID.
> +     * Known offenders:
> +     *   ASM1083/1085 PCIe-to-PCI Reversible Bridge (1b21:1080, rev 01 & 03)
> +     *   AMD/ATI SBx00 PCI to PCI Bridge (1002:4384, rev 40)
> +     */
> +    sig = pci_conf_read32(pdev->sbdf, PCI_VENDOR_ID);
> +    for ( pos = PCI_CFG_SPACE_SIZE;
> +          pos < PCI_CFG_SPACE_EXP_SIZE; pos += PCI_CFG_SPACE_SIZE )
> +        if ( pci_conf_read32(pdev->sbdf, pos) != sig )
> +            break;
> +
> +    if ( pos >= PCI_CFG_SPACE_EXP_SIZE )
> +    {
> +        printk(XENLOG_WARNING "%pp: extended config space aliases base one\n",
> +               &pdev->sbdf);

Hm, I think this shouldn't be very common as it seems limited to a
short list of bridges.  However every device under such bridge would
be affected and repeatedly print the message.  I wonder whether we
should make this XENLOG_DEBUG instead, there isn't much the user can
do to fix it.  More a rant than a request though.

Thanks, Roger.


  parent reply	other threads:[~2026-01-28 17:50 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-19 14:45 [PATCH v2 0/4] (v)PCI: extended capability handling Jan Beulich
2026-01-19 14:46 ` [PATCH v2 1/4] PCI: handle PCI->PCIe bridges as well in alloc_pdev() Jan Beulich
2026-01-19 22:02   ` Stewart Hildebrand
2026-01-28 14:27   ` Roger Pau Monné
2026-01-19 14:46 ` [PATCH v2 2/4] PCI: determine whether a device has extended config space Jan Beulich
2026-01-23 22:24   ` Stewart Hildebrand
2026-01-26  8:58     ` Jan Beulich
2026-01-27  4:13       ` Stewart Hildebrand
2026-01-27  9:01         ` Jan Beulich
2026-01-28 17:49   ` Roger Pau Monné [this message]
2026-01-29  8:15     ` Jan Beulich
2026-01-29 10:33       ` Roger Pau Monné
2026-01-29 11:43         ` Jan Beulich
2026-01-19 14:47 ` [PATCH v2 3/4] PCI: don't look for ext-caps when there's no extended cfg space Jan Beulich
2026-01-26 15:51   ` Stewart Hildebrand
2026-01-29  8:53   ` Roger Pau Monné
2026-01-19 14:48 ` [PATCH v2 4/4] vPCI/DomU: really no ext-caps without extended config space Jan Beulich
2026-01-26 15:52   ` Stewart Hildebrand
2026-01-29  9:19   ` Roger Pau Monné
2026-01-29 10:08     ` Jan Beulich
2026-01-29 10:51       ` Roger Pau Monné
2026-01-29 11:36         ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aXpMIOuRZvX8IyFK@Mac.lan \
    --to=roger.pau@citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=anthony.perard@vates.tech \
    --cc=jbeulich@suse.com \
    --cc=julien@xen.org \
    --cc=michal.orzel@amd.com \
    --cc=sstabellini@kernel.org \
    --cc=stewart.hildebrand@amd.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.