From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: Guo Chao <yan@linux.vnet.ibm.com>,
Yinghai Lu <yinghai@kernel.org>,
Wei Yang <weiyang@linux.vnet.ibm.com>,
Gavin Shan <shangw@linux.vnet.ibm.com>,
Jack Morgenstein <jackm@dev.mellanox.co.il>,
Amir Vadai <amirv@mellanox.com>,
Or Gerlitz <ogerlitz@mellanox.com>,
Eugenia Emantayev <eugenia@mellanox.com>,
talal@mellanox.com,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>
Subject: Re: [PATCH v7] PCI: Try best to allocate pref mmio 64bit above 4g
Date: Wed, 16 Apr 2014 16:30:16 +1000
Message-ID: <1397629816.32730.8.camel@pasglop>
In-Reply-To: <CAErSpo63LKffb4DH+cSfBnvtAoPy06u22e7CUhRuKr-PHNWZMw@mail.gmail.com>
On Tue, 2014-04-15 at 18:09 -0600, Bjorn Helgaas wrote:
>
> Thanks for the example. Please open a bug report at
> http://bugzilla.kernel.org and attach the complete dmesg logs before
> and after Yinghai's patch.
>
> Having the complete logs helps me answer questions myself without
> having to bother you, and it also helps me figure out whether we can
> improve our logging to make it easier to diagnose problems like this.
Unfortunately, for a *little while* longer (hint !) we can't publish
a complete log from a Power8 machine, but we should be able to include
everything remotely related to PCI.
> > | pci 0003:05:00.0: reg 0x10: [mem 0x3d05801000000-0x3d058010fffff 64bit]
> > | pci 0003:05:00.0: reg 0x18: [mem 0x3d05010000000-0x3d05017ffffff 64bit pref]
> > | pci 0003:05:00.0: reg 0x30: [mem 0x00000000-0x000fffff pref]
> > | pci 0003:05:00.0: reg 0x134: [mem 0x3d05018000000-0x3d0501fffffff 64bit pref]
> >
> > This is printed at enumeration phase. This device has an SRIOV BAR with
> > a size of 0x8000000 (128M). That's the size of a single VF BAR. The device
> > supports 63 VFs, so we need nearly 8G of space in total. Apparently we need
> > to exploit the 64-bit space.
>
> Yes. Do we print a hint anywhere about how many VFs there are? In
> other words, can you deduce the number "63" from the dmesg, or do you
> have to figure that out some other way? It'd be nice if that
> information were somewhere in dmesg.
>
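The arithmetic in the quoted paragraph can be checked with a few lines of Python. This is only a sketch: the addresses are copied from the reg 0x134 line above, the count of 63 VFs comes from Guo Chao's text rather than from the dmesg, and the helper name resource_size is made up for the illustration.

```python
# Sketch: size arithmetic for the SR-IOV BAR (reg 0x134) quoted above.
# resource_size() is a hypothetical helper, not a kernel function.

def resource_size(start: int, end: int) -> int:
    """Size of an inclusive [start, end] range, the way dmesg prints ranges."""
    return end - start + 1

VF_BAR_START = 0x3d05018000000
VF_BAR_END = 0x3d0501fffffff
NUM_VFS = 63  # taken from the quoted text, not derivable from the log excerpt

FOUR_GIB = 1 << 32

vf_bar_size = resource_size(VF_BAR_START, VF_BAR_END)
total_vf_space = vf_bar_size * NUM_VFS

if __name__ == "__main__":
    print(f"single VF BAR: {vf_bar_size:#x} ({vf_bar_size >> 20} MiB)")
    print(f"all VFs:       {total_vf_space:#x}")
    print(f"exceeds 4G:    {total_vf_space > FOUR_GIB}")
```

The total is well above 4G, so it cannot fit in any 32-bit window regardless of alignment.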
> > | PCI host bridge to bus 0003:00
> > | pci_bus 0003:00: root bus resource [mem 0x3d05800000000-0x3d0587ffeffff] (bus address [0x80000000-0xfffeffff])
> > | pci_bus 0003:00: root bus resource [mem 0x3d05008000000-0x3d057ffffffff 64bit pref]
> >
> > And we do have a huge (32G) 64-bit prefetchable window available. We expect
> > everything to work fine, but:
> >
> > | pci 0003:00:00.0: BAR 15: can't assign mem pref (size 0x206000000)
> > | pci 0003:00:00.0: BAR 14: assigned [mem 0x3d05800000000-0x3d05802ffffff]
> > | pci 0003:00:00.0: BAR 13: can't assign io (size 0x4000)
> >
> > It went wrong from the beginning. Note that the error message never says
> > whether the resource is 64-bit or not, but BAR 15 here has its MEM_64 flag cleared.
>
> BAR 15 is a bridge window. I think its resource flags should reflect
> the capability of the *window*, even if we disable the window or we
> happen to assign addresses that are under 4GB. So I think it's wrong
> that we clear the MEM_64 flag in pbus_size_mem() and the IO flag in
> pbus_size_io().
>
> > It first
> > tried to find a 32-bit prefetchable window, but we only supply a 64-bit one.
> > So it fell back to the (32-bit) non-prefetchable window, but there is not enough
> > room there. At last it went into the complicated steps (not shown here) of
> > allocating the requested resources first, then trying its best for the optional ones, etc.
> >
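A rough check of why the fallback fails, using only the numbers from the log lines quoted above. This is a sketch under loud assumptions: it compares raw sizes only and ignores alignment and anything else already placed in the windows, and the names size and fits are invented for the example.

```python
# Sketch: does the BAR 15 request fit in either root bus window?
# Only raw sizes are compared; alignment and other occupants are ignored.

def size(start: int, end: int) -> int:
    """Size of an inclusive [start, end] range."""
    return end - start + 1

# Root bus resources of 0003:00, copied from the quoted dmesg.
NONPREF_WINDOW = (0x3d05800000000, 0x3d0587ffeffff)   # 32-bit bus addresses
PREF64_WINDOW = (0x3d05008000000, 0x3d057ffffffff)    # 64bit pref

# "BAR 15: can't assign mem pref (size 0x206000000)"
BAR15_REQUEST = 0x206000000

def fits(request: int, window: tuple) -> bool:
    """True if the request is no larger than the whole window."""
    return request <= size(*window)

if __name__ == "__main__":
    print(f"non-pref window: {size(*NONPREF_WINDOW):#x}")
    print(f"64-bit pref window: {size(*PREF64_WINDOW):#x}")
    print("fits non-pref:", fits(BAR15_REQUEST, NONPREF_WINDOW))
    print("fits 64-bit pref:", fits(BAR15_REQUEST, PREF64_WINDOW))
```

So the request is larger than the entire non-prefetchable window, while the 64-bit prefetchable window has room to spare.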
> > Why is BAR 15 (prefetchable) 32 bit instead of 64? Because PCI core favours
> > 32-bit prefetchable BARs and we have some. This is one of them:
> >
> > | pci 0003:05:00.0: reg 0x30: [mem 0x00000000-0x000fffff pref]
> >
> > PCI core decides to let them enjoy the benefit of prefetching. They can't
> > bear the risk of getting an address above 4G, so its parent, its parent's parent,
> > its parent's parent's parent, and finally the root bridge (00:00.0) must have the
> > MEM_64 flag of their prefetchable resource (BAR 15) cleared.
>
> It sounds like we're tracking the resource requirements
> (prefetchability and BAR width) by using the flags on bridge windows.
> If that's the case, I think it's wrong. We should preserve the bridge
> window flags, because those express the bridge hardware capabilities,
> and we should explicitly keep track of what's required by devices
> below the bridge in some other way.
>
> > In the end nobody
> > is eligible to use the 64-bit (prefetchable) space even though we have a huge
> > supply!
> >
> > Note that even if the resource is small and successfully falls back into the
> > 32-bit non-prefetchable window, that's still not OK for us because we need
> > the SRIOV resources to be in 64-bit prefetchable space to do platform
> > configuration.
> >
> > With Yinghai's patch, when 64-bit prefetchable BARs are found, they're
> > favoured over the 32-bit prefetchable ones (if any), so all upstream bridges'
> > prefetchable windows have their MEM_64 flag preserved and the huge 64-bit
> > prefetchable space will be exploited:
> >
> > | pci 0003:00:00.0: BAR 15: assigned [mem 0x3d05008000000-0x3d0521fffffff 64bit pref]
> > | pci 0003:00:00.0: BAR 14: assigned [mem 0x3d05800000000-0x3d05802ffffff]
> > | pci 0003:00:00.0: BAR 13: can't assign io (size 0x4000)
> >
> > (The IO resource error here is because we do not provide an IO window.)
>
> Yes. The lack of I/O space is just a constraint of the platform.
> It'd be nice if we printed a more meaningful error message in this
> case. One really has to be a PCI expert to distinguish this from a
> real problem that we need to fix.
>
> Bjorn
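The difference between the two behaviours described in the quoted text can be modelled roughly as follows. This is a toy model of the description in this thread, not the kernel's sizing code: the Bar class and all three function names are invented for the illustration, and the real logic handles sizes, alignment and optional resources that are left out here.

```python
# Toy model: does a bridge's prefetchable window keep MEM_64?
# Everything here is hypothetical and only mirrors the quoted description.
from dataclasses import dataclass

@dataclass(frozen=True)
class Bar:
    name: str
    pref: bool    # prefetchable
    mem64: bool   # 64-bit capable

def pref_window_is_64bit_old(bars):
    """Before the patch: one 32-bit prefetchable BAR clears MEM_64."""
    pref = [b for b in bars if b.pref]
    return bool(pref) and all(b.mem64 for b in pref)

def pref_window_is_64bit_patched(bars):
    """With the patch: any 64-bit prefetchable BAR keeps MEM_64."""
    return any(b.pref and b.mem64 for b in bars)

def placement_patched(bars):
    """Window each BAR ends up in under the patched model."""
    win64 = pref_window_is_64bit_patched(bars)
    result = {}
    for b in bars:
        # A 32-bit pref BAR may not sit in a 64-bit pref window.
        in_pref = b.pref and (b.mem64 or not win64)
        result[b.name] = "pref" if in_pref else "non-pref"
    return result

# BARs of 0003:05:00.0 as listed in the quoted dmesg.
DEVICE_BARS = [
    Bar("reg 0x10", pref=False, mem64=True),
    Bar("reg 0x18", pref=True, mem64=True),
    Bar("reg 0x30", pref=True, mem64=False),   # the 32-bit pref BAR
    Bar("reg 0x134", pref=True, mem64=True),   # SR-IOV BAR
]

if __name__ == "__main__":
    print("old keeps MEM_64:", pref_window_is_64bit_old(DEVICE_BARS))
    print("patched keeps MEM_64:", pref_window_is_64bit_patched(DEVICE_BARS))
    print(placement_patched(DEVICE_BARS))
```

In this model the single 32-bit prefetchable BAR at reg 0x30 is what costs the whole hierarchy its 64-bit window before the patch, and it is the only BAR that moves to the non-prefetchable window afterwards.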