From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: Guo Chao <yan@linux.vnet.ibm.com>,
	Yinghai Lu <yinghai@kernel.org>,
	Wei Yang <weiyang@linux.vnet.ibm.com>,
	Gavin Shan <shangw@linux.vnet.ibm.com>,
	Jack Morgenstein <jackm@dev.mellanox.co.il>,
	Amir Vadai <amirv@mellanox.com>,
	Or Gerlitz <ogerlitz@mellanox.com>,
	Eugenia Emantayev <eugenia@mellanox.com>,
	talal@mellanox.com,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>
Subject: Re: [PATCH v7] PCI: Try best to allocate pref mmio 64bit above 4g
Date: Wed, 16 Apr 2014 16:30:16 +1000
Message-ID: <1397629816.32730.8.camel@pasglop>
In-Reply-To: <CAErSpo63LKffb4DH+cSfBnvtAoPy06u22e7CUhRuKr-PHNWZMw@mail.gmail.com>

On Tue, 2014-04-15 at 18:09 -0600, Bjorn Helgaas wrote:
> 
> Thanks for the example.  Please open a bug report at
> http://bugzilla.kernel.org and attach the complete dmesg logs before
> and after Yinghai's patch.
> 
> Having the complete logs helps me answer questions myself without
> having to bother you, and it also helps me figure out whether we can
> improve our logging to make it easier to diagnose problems like this.

Unfortunately, for a *little while* longer (hint !) we can't publish
a complete log from a Power8 machine, but we should be able to include
everything remotely related to PCI.

> > | pci 0003:05:00.0: reg 0x10: [mem 0x3d05801000000-0x3d058010fffff 64bit]
> > | pci 0003:05:00.0: reg 0x18: [mem 0x3d05010000000-0x3d05017ffffff 64bit pref]
> > | pci 0003:05:00.0: reg 0x30: [mem 0x00000000-0x000fffff pref]
> > | pci 0003:05:00.0: reg 0x134: [mem 0x3d05018000000-0x3d0501fffffff 64bit pref]
> >
> > This is printed at enumeration phase. This device has a SRIOV BAR with
> > size of 0x8000000 (128M). That's the size of a single VF BAR. The device
> > supports 63 VFs so we need nearly 8G of space in total. Apparently we need
> > to exploit 64-bit space.
> 
> Yes.  Do we print a hint anywhere about how many VFs there are?  In
> other words, can you deduce the number "63" from the dmesg, or do you
> have to figure that out some other way?  It'd be nice if that
> information were somewhere in dmesg.
>
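
The sizes quoted above can be checked with a few lines of arithmetic. This
is only a sketch: the addresses are copied from the dmesg excerpts in this
thread, the VF count of 63 is taken from the description (it is not in the
log), and alignment and other devices on the bus are ignored.

```python
# Figures copied from the dmesg excerpts quoted in this thread.
VF_BAR = (0x3d05018000000, 0x3d0501fffffff)         # reg 0x134, one VF BAR
PREF64_WINDOW = (0x3d05008000000, 0x3d057ffffffff)  # root bus, 64bit pref
MEM32_WINDOW = (0x3d05800000000, 0x3d0587ffeffff)   # root bus, non-pref

NUM_VFS = 63  # assumption: stated in the thread, not visible in the log
MIB = 1 << 20


def span_size(span):
    """Size in bytes of an inclusive [start, end] address range."""
    start, end = span
    return end - start + 1


vf_size = span_size(VF_BAR)
total = NUM_VFS * vf_size

# Alignment and other consumers of the windows are deliberately ignored.
fits_pref64 = total <= span_size(PREF64_WINDOW)
fits_mem32 = total <= span_size(MEM32_WINDOW)

print(f"one VF BAR:        {vf_size // MIB} MiB")
print(f"all {NUM_VFS} VF BARs:    {total // MIB} MiB")
print(f"fits 64-bit pref window:  {fits_pref64}")
print(f"fits 32-bit mem window:   {fits_mem32}")
```

Under these assumptions the VF BARs alone need a little under 8G, which
cannot fit in the roughly 2G non-prefetchable window.
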
> > | PCI host bridge to bus 0003:00
> > | pci_bus 0003:00: root bus resource [mem 0x3d05800000000-0x3d0587ffeffff] (bus address [0x80000000-0xfffeffff])
> > | pci_bus 0003:00: root bus resource [mem 0x3d05008000000-0x3d057ffffffff 64bit pref]
> >
> > And we do have a huge (32G) 64-bit prefetchable window available. We expect
> > everything to work fine, but:
> >
> > | pci 0003:00:00.0: BAR 15: can't assign mem pref (size 0x206000000)
> > | pci 0003:00:00.0: BAR 14: assigned [mem 0x3d05800000000-0x3d05802ffffff]
> > | pci 0003:00:00.0: BAR 13: can't assign io (size 0x4000)
> >
> > It went wrong at the beginning. Note the error message never says whether
> > the resource is 64-bit or not, but BAR 15 here has its MEM_64 flag cleared.
> 
> BAR 15 is a bridge window.  I think its resource flags should reflect
> the capability of the *window*, even if we disable the window or we
> happen to assign addresses that are under 4GB.  So I think it's wrong
> that we clear the MEM_64 flag in pbus_size_mem() and the IO flag in
> pbus_size_io().
> 
> > It first
> > tried to find a 32-bit prefetchable window, but we only supply a 64-bit one.
> > So it fell back to the (32-bit) non-prefetchable window, but there is not enough
> > room there. Finally it went through the complicated steps (not shown here) of
> > allocating the requested resources first, then trying its best for the optional ones, etc.
> >
> > Why is BAR 15 (prefetchable) 32 bit instead of 64? Because PCI core favours
> > 32-bit prefetchable BARs and we have some. This is one of them:
> >
> > | pci 0003:05:00.0: reg 0x30: [mem 0x00000000-0x000fffff pref]
> >
> > PCI core decides to let them enjoy the benefit of prefetch. They can't
> > bear the risk of getting an address above 4G, so its parent, its parent's parent,
> > its parent's parent's parent, and finally the root bridge (00:00.0) must have the
> > MEM_64 flag of their prefetchable resource (BAR 15) cleared.
> 
> It sounds like we're tracking the resource requirements
> (prefetchability and BAR width) by using the flags on bridge windows.
> If that's the case, I think it's wrong.  We should preserve the bridge
> window flags, because those express the bridge hardware capabilities,
> and we should explicitly keep track of what's required by devices
> below the bridge in some other way.
> 
> > In the end nobody
> > is eligible to use the 64-bit (prefetchable) space even though we have a huge
> > supply!
> >
> > Note that even if the resource is small and successfully falls back into the 32-bit
> > non-prefetchable window, that's still not OK for us because we need
> > the SRIOV resource to be in 64-bit prefetchable space to do platform
> > configuration.
> >
> > With Yinghai's patch, when 64-bit prefetchable BARs are found, they're
> > favoured over the 32-bit prefetchable ones (if any), so all upstream bridges'
> > prefetchable windows have their MEM_64 flag preserved and the huge 64-bit
> > prefetchable space will be exploited:
> >
> > | pci 0003:00:00.0: BAR 15: assigned [mem 0x3d05008000000-0x3d0521fffffff 64bit pref]
> > | pci 0003:00:00.0: BAR 14: assigned [mem 0x3d05800000000-0x3d05802ffffff]
> > | pci 0003:00:00.0: BAR 13: can't assign io (size 0x4000)
> >
> > (The IO resource error here is because we do not provide an IO window)
> 
> Yes.  The lack of I/O space is just a constraint of the platform.
> It'd be nice if we printed a more meaningful error message in this
> case.  One really has to be a PCI expert to distinguish this from a
> real problem that we need to fix.
> 
> Bjorn
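
To make the before/after behaviour described above concrete, here is a toy
model of how a bridge's prefetchable window ends up with or without MEM_64.
It is not the kernel's code: the Bar class and both functions are
hypothetical names, and the two policies are only my reading of the
description in this thread (before: any 32-bit prefetchable BAR below the
bridge clears MEM_64; after: any 64-bit prefetchable BAR keeps it).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bar:
    """Hypothetical, simplified view of one device BAR."""
    reg: int
    pref: bool
    is_64bit: bool


# The four BARs of 0003:05:00.0 as printed in the quoted dmesg.
BARS = [
    Bar(reg=0x10, pref=False, is_64bit=True),
    Bar(reg=0x18, pref=True, is_64bit=True),
    Bar(reg=0x30, pref=True, is_64bit=False),   # the 32-bit pref BAR
    Bar(reg=0x134, pref=True, is_64bit=True),   # SR-IOV VF BAR
]


def pref_window_64bit_before(bars):
    """Assumed old policy: one 32-bit pref BAR clears MEM_64 on the window."""
    pref = [b for b in bars if b.pref]
    return bool(pref) and all(b.is_64bit for b in pref)


def pref_window_64bit_after(bars):
    """Assumed patched policy: any 64-bit pref BAR keeps MEM_64 set."""
    return any(b.pref and b.is_64bit for b in bars)


before = pref_window_64bit_before(BARS)
after = pref_window_64bit_after(BARS)
print(f"pref window keeps MEM_64 before patch: {before}")
print(f"pref window keeps MEM_64 after patch:  {after}")
```

Since every upstream bridge sees the same set of BARs below it, the result
for this device is the result for the whole chain up to 00:00.0.
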


