linux-pci.vger.kernel.org archive mirror
* Global lock for PCI configuration accesses
@ 2015-09-09 15:11 Thierry Reding
  2015-09-09 17:27 ` Bjorn Helgaas
  0 siblings, 1 reply; 3+ messages in thread
From: Thierry Reding @ 2015-09-09 15:11 UTC (permalink / raw)
  To: linux-pci; +Cc: linux-kernel


Hi,

There's currently an issue with PCI configuration space accesses on
Tegra. The PCI host controller driver's ->map_bus() implementation
remaps I/O memory on-demand to avoid potentially wasting 256 MiB of
virtual address space. This is done because the mapping isn't
ECAM-compatible: the extended register number is encoded in the
uppermost 4 bits. This means that if we want to
address the configuration space for a single bus we already need to
map 256 MiB of memory, even if only 1 MiB is really used.

tegra_pcie_bus_alloc() is therefore used to stitch together a 1 MiB
block of virtual addresses per bus, made up of 16 chunks of 64 KiB
each, so that only what's really needed is mapped.

That function gets called the first time a PCI configuration access
is performed on a bus. The code calls functions that may sleep, and
that causes problems because the PCI configuration space accessors
are called with the global pci_lock held. This works in practice
but it blows up when lockdep is enabled.

I remember coding up a fix for this using the ARM/PCI ->add_bus()
callbacks at one point and then forgetting about it. When I wanted
to revive that patch a little while ago I noticed that ->add_bus()
is now gone.

What I'm asking myself now is how to fix this. I suppose it'd be
possible to bring back ->add_bus(), though I suspect there were good
reasons to remove it (portability?). Another possible fix would be
to get rid of the spinlock protecting these accesses. It seems to me
like it's not really necessary in the majority of cases. For drivers
that do a simple readl() or writel() on some memory-mapped I/O the
lock doesn't protect anything.

Then again, there are a lot of pci_ops implementations in the tree,
and simply removing the global lock seems like it'd have a good chance
of breaking things for somebody.

So short of auditing all pci_ops implementations and pushing the lock
down into drivers, does anyone have any good ideas on how to fix this?

Thanks,
Thierry



* Re: Global lock for PCI configuration accesses
  2015-09-09 15:11 Global lock for PCI configuration accesses Thierry Reding
@ 2015-09-09 17:27 ` Bjorn Helgaas
  2015-09-10 13:00   ` Thierry Reding
  0 siblings, 1 reply; 3+ messages in thread
From: Bjorn Helgaas @ 2015-09-09 17:27 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org

Hi Thierry,

On Wed, Sep 9, 2015 at 10:11 AM, Thierry Reding
<thierry.reding@gmail.com> wrote:
> Hi,
>
> There's currently an issue with PCI configuration space accesses on
> Tegra. The PCI host controller driver's ->map_bus() implementation
> remaps I/O memory on-demand to avoid potentially wasting 256 MiB of
> virtual address space. This is done because the mapping isn't
> ECAM-compatible: the extended register number is encoded in the
> uppermost 4 bits. This means that if we want to
> address the configuration space for a single bus we already need to
> map 256 MiB of memory, even if only 1 MiB is really used.
>
> tegra_pcie_bus_alloc() is therefore used to stitch together a 1 MiB
> block of virtual addresses per bus, made up of 16 chunks of 64 KiB
> each, so that only what's really needed is mapped.
>
> That function gets called the first time a PCI configuration access
> is performed on a bus. The code calls functions that may sleep, and
> that causes problems because the PCI configuration space accessors
> are called with the global pci_lock held. This works in practice
> but it blows up when lockdep is enabled.
>
> I remember coding up a fix for this using the ARM/PCI ->add_bus()
> callbacks at one point and then forgetting about it. When I wanted
> to revive that patch a little while ago I noticed that ->add_bus()
> is now gone.

Removed by 6cf00af0ae15 ("ARM/PCI: Remove unused pcibios_add_bus() and
pcibios_remove_bus()"), I think.  That only removed the ARM
implementation; the hook itself is still called, but on every arch
except x86 and ia64, we use the default no-op implementation.  You
could add it back, I guess.  It was removed because the MSI-related
stuff that used to be in the ARM version is now done in a more generic
way (see 49dcc01a9ff2 ("ARM/PCI: Save MSI controller in
pci_sys_data")).

> What I'm asking myself now is how to fix this. I suppose it'd be
> possible to bring back ->add_bus(), though I suspect there were good
> reasons to remove it (portability?).

> Another possible fix would be
> to get rid of the spinlock protecting these accesses. It seems to me
> like it's not really necessary in the majority of cases. For drivers
> that do a simple readl() or writel() on some memory-mapped I/O the
> lock doesn't protect anything.

I've wondered about removing pci_lock, too.  It seems like it could be
removed in principle, but it would be a lot of work to audit
everything.  Probably more work than you want to do just to fix Tegra
:)

> Then again, there are a lot of pci_ops implementations in the tree,
> and simply removing the global lock seems like it'd have a good chance
> of breaking things for somebody.
>
> So short of auditing all pci_ops implementations and pushing the lock
> down into drivers, does anyone have any good ideas on how to fix this?

The 32-bit version of pci_mmcfg_read() uses fixmap to map the page it
needs on-demand.  Could you do something similar, i.e., allocate the
virtual space (which I assume is the part that might sleep), then
redirect the virt-to-phys mapping while holding the lock?

Bjorn


* Re: Global lock for PCI configuration accesses
  2015-09-09 17:27 ` Bjorn Helgaas
@ 2015-09-10 13:00   ` Thierry Reding
  0 siblings, 0 replies; 3+ messages in thread
From: Thierry Reding @ 2015-09-10 13:00 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org


On Wed, Sep 09, 2015 at 12:27:38PM -0500, Bjorn Helgaas wrote:
> Hi Thierry,
> 
> On Wed, Sep 9, 2015 at 10:11 AM, Thierry Reding
> <thierry.reding@gmail.com> wrote:
> > Hi,
> >
> > There's currently an issue with PCI configuration space accesses on
> > Tegra. The PCI host controller driver's ->map_bus() implementation
> > remaps I/O memory on-demand to avoid potentially wasting 256 MiB of
> > virtual address space. This is done because the mapping isn't
> > ECAM-compatible: the extended register number is encoded in the
> > uppermost 4 bits. This means that if we want to
> > address the configuration space for a single bus we already need to
> > map 256 MiB of memory, even if only 1 MiB is really used.
> >
> > tegra_pcie_bus_alloc() is therefore used to stitch together a 1 MiB
> > block of virtual addresses per bus, made up of 16 chunks of 64 KiB
> > each, so that only what's really needed is mapped.
> >
> > That function gets called the first time a PCI configuration access
> > is performed on a bus. The code calls functions that may sleep, and
> > that causes problems because the PCI configuration space accessors
> > are called with the global pci_lock held. This works in practice
> > but it blows up when lockdep is enabled.
> >
> > I remember coding up a fix for this using the ARM/PCI ->add_bus()
> > callbacks at one point and then forgetting about it. When I wanted
> > to revive that patch a little while ago I noticed that ->add_bus()
> > is now gone.
> 
> Removed by 6cf00af0ae15 ("ARM/PCI: Remove unused pcibios_add_bus() and
> pcibios_remove_bus()"), I think.  That only removed the ARM
> implementation; the hook itself is still called, but on every arch
> except x86 and ia64, we use the default no-op implementation.  You
> could add it back, I guess.  It was removed because the MSI-related
> stuff that used to be in the ARM version is now done in a more generic
> way (see 49dcc01a9ff2 ("ARM/PCI: Save MSI controller in
> pci_sys_data")).
> 
> > What I'm asking myself now is how to fix this. I suppose it'd be
> > possible to bring back ->add_bus(), though I suspect there were good
> > reasons to remove it (portability?).
> 
> > Another possible fix would be
> > to get rid of the spinlock protecting these accesses. It seems to me
> > like it's not really necessary in the majority of cases. For drivers
> > that do a simple readl() or writel() on some memory-mapped I/O the
> > lock doesn't protect anything.
> 
> I've wondered about removing pci_lock, too.  It seems like it could be
> removed in principle, but it would be a lot of work to audit
> everything.  Probably more work than you want to do just to fix Tegra
> :)

Thinking more about this, I'm not sure removing the lock would improve
the situation much. It seems like everything assumes that accesses to
PCI configuration space can be done in interrupt context, so what we do
on Tegra wouldn't be valid anyway.

I'm not sure if that's a reasonable assumption though. If it isn't then
removing the lock (or pushing it down into drivers as necessary) might
be the right thing to do.

> > Then again, there are a lot of pci_ops implementations in the tree,
> > and simply removing the global lock seems like it'd have a good chance
> > of breaking things for somebody.
> >
> > So short of auditing all pci_ops implementations and pushing the lock
> > down into drivers, does anyone have any good ideas on how to fix this?
> 
> The 32-bit version of pci_mmcfg_read() uses fixmap to map the page it
> needs on-demand.  Could you do something similar, i.e., allocate the
> virtual space (which I assume is the part that might sleep), then
> redirect the virt-to-phys mapping while holding the lock?

I hadn't come across fixmap before, but unfortunately I don't think it
would work in this case. In particular we'd need to map 256 MiB in order
to address the configuration space of a single device. Well, if we do
the mapping directly in the configuration space accessor we could map
a single 64 KiB chunk at a time, depending upon which register is being
accesses. That's still quite a lot of code that would be running for
every single configuration space access.

Moreover, this is really only a problem that happens during enumeration.
After enumeration completes the configuration space address ranges for
each bus will have been allocated and subsequent accesses would no
longer execute the slow paths.

Given that and the assumption that PCI configuration space might be
accessed from interrupt context it seems like a more suitable option to
add back in some way of calling into the driver every time a new bus gets
created. I guess we could even reuse ->map_bus(), but it might be better
to add a ->add_bus() to struct pci_ops to more clearly separate the slow
from the fast path.

Any objections to adding ->add_bus() to struct pci_ops? That does have
the benefit of being more portable than adding back the pcibios hooks
that were previously removed.

Thierry


