* [PATCH v6 0/2] PCI: allocate 64bit mmio pref
@ 2013-12-19 20:44 Yinghai Lu
2013-12-19 20:44 ` [PATCH v6 1/2] PCI: Try to allocate mem64 above 4G at first Yinghai Lu
2013-12-19 20:44 ` [PATCH v6 2/2] PCI: Try best to allocate pref mmio 64bit above 4g Yinghai Lu
0 siblings, 2 replies; 10+ messages in thread
From: Yinghai Lu @ 2013-12-19 20:44 UTC (permalink / raw)
To: Bjorn Helgaas; +Cc: Guo Chao, linux-pci, linux-kernel, Yinghai Lu
This series improves 64-bit MMIO allocation, which could help Guo Chao <yan@linux.vnet.ibm.com> with MMIO allocation on powerpc.
It tries to assign 64-bit resources above 4G first.
It is based on the current pci/next and pci/resource.
-v2: update after the patch that moves device_del down to pci_destroy_dev.
add "Try best to allocate pref mmio 64bit above 4G"
-v3: refresh and send out after the pci_clip_resource() changes,
as Bjorn is not happy with attachments.
-v4: make pcibios_resource_to_bus take the bus directly.
-v5: fix the non-pref mmio64 allocation problem found by Guo Chao.
refresh the last three, as Bjorn updated the first two and put them in pci/resource
-v6: try above 4G first, then restart for under 4G,
so we can drop the patch that sorts the pci root bus resource list.
Yinghai Lu (2):
PCI: Try to allocate mem64 above 4G at first
PCI: Try best to allocate pref mmio 64bit above 4g
drivers/pci/bus.c | 34 ++++++++----
drivers/pci/setup-bus.c | 138 ++++++++++++++++++++++++++++++++----------------
drivers/pci/setup-res.c | 20 ++++++-
3 files changed, 135 insertions(+), 57 deletions(-)
--
1.8.4
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v6 1/2] PCI: Try to allocate mem64 above 4G at first
2013-12-19 20:44 [PATCH v6 0/2] PCI: allocate 64bit mmio pref Yinghai Lu
@ 2013-12-19 20:44 ` Yinghai Lu
2013-12-19 20:44 ` [PATCH v6 2/2] PCI: Try best to allocate pref mmio 64bit above 4g Yinghai Lu
1 sibling, 0 replies; 10+ messages in thread
From: Yinghai Lu @ 2013-12-19 20:44 UTC (permalink / raw)
To: Bjorn Helgaas; +Cc: Guo Chao, linux-pci, linux-kernel, Yinghai Lu
On systems with many PCIe cards, there is not enough address space below 4G
to allocate resources for all of the PCI devices.
On 64-bit systems, we can try to allocate mem64 above 4G first,
and fall back to below 4G if no space can be found above 4G.
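
The "above 4G first, then fall back" behaviour described above can be sketched in user space roughly as follows. This is only an illustration under simplifying assumptions: struct window, clip() and alloc_two_pass() are hypothetical names that do not exist in the kernel, alignment and already-allocated ranges are ignored, and the first window that is large enough wins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bus window; not a kernel structure. */
struct window {
	uint64_t start;	/* first address, inclusive */
	uint64_t end;	/* last address, inclusive */
};

#define FOUR_GIB (1ULL << 32)

/* Clip *w to [lo, hi]; return false if nothing is left. */
static bool clip(struct window *w, uint64_t lo, uint64_t hi)
{
	if (w->start < lo)
		w->start = lo;
	if (w->end > hi)
		w->end = hi;
	return w->start <= w->end;
}

/*
 * Find room for `size` bytes.  A 64-bit request first looks only at the
 * part of each window above 4G, then restarts from the first window
 * without that restriction.  A 32-bit request only ever looks below 4G.
 */
static bool alloc_two_pass(const struct window *wins, size_t n,
			   uint64_t size, bool is_64bit, uint64_t *out)
{
	bool high_pass = is_64bit;	/* plays the role of try_again */

	assert(size > 0);
	for (;;) {
		for (size_t i = 0; i < n; i++) {
			struct window avail = wins[i];

			if (high_pass) {
				if (!clip(&avail, FOUR_GIB, UINT64_MAX))
					continue;
			} else if (!is_64bit) {
				if (!clip(&avail, 0, FOUR_GIB - 1))
					continue;
			}
			/* window length is end - start + 1 */
			if (avail.end - avail.start >= size - 1) {
				*out = avail.start;
				return true;
			}
		}
		if (!high_pass)
			return false;
		high_pass = false;	/* second try, from the first window */
	}
}
```

Because the second pass restarts from the first window, the order of the windows does not decide whether a 64-bit request ends up above 4G, which is why sorting the root bus resource list is not needed in this model.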
-v2: update bottom assigning to make it clear for non-pae support machine.
-v3: Bjorn's change:
use MAX_RESOURCE instead of -1
use start/end instead of bottom/max
for all arch instead of just x86_64
-v4: updated after PCI_MAX_RESOURCE_32 change.
-v5: restore io handling to use PCI_MAX_RESOURCE_32 as limit.
-v6: check the pcibios_resource_to_bus return for every bus res to decide
whether we need to try high first.
It supports all arches instead of just x86_64.
-v7: split 4G limit change out to another patch according to Bjorn.
also use pci_clip_resource instead.
-v8: refresh after changes in pci/resource.
-v9: make the second try restart from the first res of the bus,
so we can omit the patch that sorts the resource list of the pci root bus.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
drivers/pci/bus.c | 34 ++++++++++++++++++++++++----------
1 file changed, 24 insertions(+), 10 deletions(-)
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 263b90c..d49e6cb 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -100,6 +100,9 @@ void pci_bus_remove_resources(struct pci_bus *bus)
/* The region that can be mapped by a 32-bit BAR. */
static struct pci_bus_region pci_32_bit = {0, 0xffffffff};
+/* The region that can be mapped by a 64-bit BAR above 4G */
+static struct pci_bus_region pci_64_bit = {(resource_size_t)(1ULL<<32),
+ (resource_size_t)(-1ULL)};
/*
* @res contains CPU addresses. Clip it so the corresponding bus addresses
@@ -150,10 +153,11 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
{
int i, ret = -ENOMEM;
struct resource *r;
- resource_size_t max;
+ bool try_again = !!(res->flags & IORESOURCE_MEM_64);
type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
+again:
pci_bus_for_each_resource(bus, r, i) {
struct resource avail;
@@ -170,13 +174,21 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
!(res->flags & IORESOURCE_PREFETCH))
continue;
+ /* If this is a 64-bit BAR, try above 4G first. */
+ avail = *r;
+ if (try_again) {
+ /* res->flags has IORESOURCE_MEM_64 set */
+ pci_clip_resource_to_bus(bus, &avail, &pci_64_bit);
+ if (!resource_size(&avail))
+ continue;
+ }
+
/*
* Unless this is a 64-bit BAR, we have to clip the
* available space to the part that maps to the region of
* 32-bit bus addresses.
*/
- avail = *r;
- if (!(res->flags & IORESOURCE_MEM_64)) {
+ if (!try_again && !(res->flags & IORESOURCE_MEM_64)) {
pci_clip_resource_to_bus(bus, &avail, &pci_32_bit);
if (!resource_size(&avail))
continue;
@@ -188,17 +200,19 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
* this is an already-configured bridge window, its start
* overrides "min".
*/
- if (avail.start)
- min = avail.start;
-
- max = avail.end;
/* Ok, try it out.. */
- ret = allocate_resource(r, res, size, min, max,
- align, alignf, alignf_data);
+ ret = allocate_resource(r, res, size, avail.start ? : min,
+ avail.end, align, alignf, alignf_data);
if (ret == 0)
- break;
+ return 0;
}
+
+ if (try_again) {
+ try_again = false;
+ goto again;
+ }
+
return ret;
}
--
1.8.4
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v6 2/2] PCI: Try best to allocate pref mmio 64bit above 4g
2013-12-19 20:44 [PATCH v6 0/2] PCI: allocate 64bit mmio pref Yinghai Lu
2013-12-19 20:44 ` [PATCH v6 1/2] PCI: Try to allocate mem64 above 4G at first Yinghai Lu
@ 2013-12-19 20:44 ` Yinghai Lu
2013-12-23 0:00 ` Bjorn Helgaas
2014-02-17 3:22 ` Guo Chao
1 sibling, 2 replies; 10+ messages in thread
From: Yinghai Lu @ 2013-12-19 20:44 UTC (permalink / raw)
To: Bjorn Helgaas; +Cc: Guo Chao, linux-pci, linux-kernel, Yinghai Lu
Currently, when one of the children resources does not support MEM_64, MEM_64
for the bridge gets reset, which pulls the whole pref resource on the bridge
down under 4G.
If the bridge supports pref mem 64, only allocate that pref mem64 to
children that support it.
Children resources that only support pref mem 32 are allocated
from non-pref mem instead.
If the bridge only supports 32-bit pref mmio, all children pref
mmio still goes under that.
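
As a rough illustration of the rule above for a bridge whose pref window is 64-bit, the sketch below reproduces the flag test that pbus_size_mem() applies with the new type2/type3 arguments. The F_* bit values are local to the sketch (they only echo the IORESOURCE_* names), counted(), in_pref64_pass() and in_nonpref_pass() are hypothetical helpers, and the r->parent check is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits; values are local to this sketch. */
#define F_MEM		0x1UL
#define F_PREF		0x2UL
#define F_MEM_64	0x4UL

/*
 * A child resource is counted toward a bridge window when its masked
 * flags equal any one of the three accepted types.
 */
static bool counted(unsigned long flags, unsigned long mask,
		    unsigned long type, unsigned long type2,
		    unsigned long type3)
{
	unsigned long f = flags & mask;

	/* every accepted type must fit inside the mask */
	assert(!(type & ~mask) && !(type2 & ~mask) && !(type3 & ~mask));
	return f == type || f == type2 || f == type3;
}

/* First pass: size the 64-bit pref window, exact pref64 matches only. */
static bool in_pref64_pass(unsigned long flags)
{
	unsigned long prefmask = F_MEM | F_PREF | F_MEM_64;

	return counted(flags, prefmask, prefmask, prefmask, prefmask);
}

/*
 * Second pass, after the first one succeeded: the non-pref window takes
 * plain mem, 32-bit pref (type2) and non-pref 64-bit (type3).
 */
static bool in_nonpref_pass(unsigned long flags)
{
	unsigned long prefmask = F_MEM | F_PREF | F_MEM_64;

	return counted(flags, prefmask, F_MEM,
		       prefmask & ~F_MEM_64, prefmask & ~F_PREF);
}
```

With these helpers, only a resource that is both prefetchable and 64-bit is sized into the pref window; every other memory resource is sized into the non-pref window.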
-v2: Add release bridge res support with bridge mem res for pref_mem children res.
-v3: refresh so that it can be applied before the for_each_dev_res patchset.
-v4: fix non-pref mmio 64bit support found by Guo Chao.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Guo Chao <yan@linux.vnet.ibm.com>
---
drivers/pci/setup-bus.c | 138 ++++++++++++++++++++++++++++++++----------------
drivers/pci/setup-res.c | 20 ++++++-
2 files changed, 111 insertions(+), 47 deletions(-)
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 138bdd6..b29504f 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -713,12 +713,11 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
bus resource of a given type. Note: we intentionally skip
the bus resources which have already been assigned (that is,
have non-NULL parent resource). */
-static struct resource *find_free_bus_resource(struct pci_bus *bus, unsigned long type)
+static struct resource *find_free_bus_resource(struct pci_bus *bus,
+ unsigned long type_mask, unsigned long type)
{
int i;
struct resource *r;
- unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM |
- IORESOURCE_PREFETCH;
pci_bus_for_each_resource(bus, r, i) {
if (r == &ioport_resource || r == &iomem_resource)
@@ -815,7 +814,8 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
resource_size_t add_size, struct list_head *realloc_head)
{
struct pci_dev *dev;
- struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO);
+ struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO,
+ IORESOURCE_IO);
resource_size_t size = 0, size0 = 0, size1 = 0;
resource_size_t children_add_size = 0;
resource_size_t min_align, align;
@@ -915,15 +915,17 @@ static inline resource_size_t calculate_mem_align(resource_size_t *aligns,
* guarantees that all child resources fit in this size.
*/
static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
- unsigned long type, resource_size_t min_size,
- resource_size_t add_size,
- struct list_head *realloc_head)
+ unsigned long type, unsigned long type2,
+ unsigned long type3,
+ resource_size_t min_size, resource_size_t add_size,
+ struct list_head *realloc_head)
{
struct pci_dev *dev;
resource_size_t min_align, align, size, size0, size1;
resource_size_t aligns[12]; /* Alignments from 1Mb to 2Gb */
int order, max_order;
- struct resource *b_res = find_free_bus_resource(bus, type);
+ struct resource *b_res = find_free_bus_resource(bus,
+ mask | IORESOURCE_PREFETCH, type);
unsigned int mem64_mask = 0;
resource_size_t children_add_size = 0;
@@ -944,7 +946,9 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
struct resource *r = &dev->resource[i];
resource_size_t r_size;
- if (r->parent || (r->flags & mask) != type)
+ if (r->parent || ((r->flags & mask) != type &&
+ (r->flags & mask) != type2 &&
+ (r->flags & mask) != type3))
continue;
r_size = resource_size(r);
#ifdef CONFIG_PCI_IOV
@@ -1117,8 +1121,9 @@ void __ref __pci_bus_size_bridges(struct pci_bus *bus,
struct list_head *realloc_head)
{
struct pci_dev *dev;
- unsigned long mask, prefmask;
+ unsigned long mask, prefmask, type2 = 0, type3 = 0;
resource_size_t additional_mem_size = 0, additional_io_size = 0;
+ struct resource *b_res;
list_for_each_entry(dev, &bus->devices, bus_list) {
struct pci_bus *b = dev->subordinate;
@@ -1163,15 +1168,34 @@ void __ref __pci_bus_size_bridges(struct pci_bus *bus,
has already been allocated by arch code, try
non-prefetchable range for both types of PCI memory
resources. */
+ b_res = &bus->self->resource[PCI_BRIDGE_RESOURCES];
mask = IORESOURCE_MEM;
prefmask = IORESOURCE_MEM | IORESOURCE_PREFETCH;
- if (pbus_size_mem(bus, prefmask, prefmask,
+ if (b_res[2].flags & IORESOURCE_MEM_64) {
+ prefmask |= IORESOURCE_MEM_64;
+ if (pbus_size_mem(bus, prefmask, prefmask,
+ prefmask, prefmask,
realloc_head ? 0 : additional_mem_size,
- additional_mem_size, realloc_head))
- mask = prefmask; /* Success, size non-prefetch only. */
- else
- additional_mem_size += additional_mem_size;
- pbus_size_mem(bus, mask, IORESOURCE_MEM,
+ additional_mem_size, realloc_head)) {
+ /* Success, size non-pref64 only. */
+ mask = prefmask;
+ type2 = prefmask & ~IORESOURCE_MEM_64;
+ type3 = prefmask & ~IORESOURCE_PREFETCH;
+ }
+ }
+ if (!type2) {
+ prefmask &= ~IORESOURCE_MEM_64;
+ if (pbus_size_mem(bus, prefmask, prefmask,
+ prefmask, prefmask,
+ realloc_head ? 0 : additional_mem_size,
+ additional_mem_size, realloc_head)) {
+ /* Success, size non-prefetch only. */
+ mask = prefmask;
+ } else
+ additional_mem_size += additional_mem_size;
+ type2 = type3 = IORESOURCE_MEM;
+ }
+ pbus_size_mem(bus, mask, IORESOURCE_MEM, type2, type3,
realloc_head ? 0 : additional_mem_size,
additional_mem_size, realloc_head);
break;
@@ -1257,42 +1281,66 @@ static void __ref __pci_bridge_assign_resources(const struct pci_dev *bridge,
static void pci_bridge_release_resources(struct pci_bus *bus,
unsigned long type)
{
- int idx;
- bool changed = false;
- struct pci_dev *dev;
+ struct pci_dev *dev = bus->self;
struct resource *r;
unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM |
- IORESOURCE_PREFETCH;
+ IORESOURCE_PREFETCH | IORESOURCE_MEM_64;
+ unsigned old_flags = 0;
+ struct resource *b_res;
+ int idx = 1;
- dev = bus->self;
- for (idx = PCI_BRIDGE_RESOURCES; idx <= PCI_BRIDGE_RESOURCE_END;
- idx++) {
- r = &dev->resource[idx];
- if ((r->flags & type_mask) != type)
- continue;
- if (!r->parent)
- continue;
- /*
- * if there are children under that, we should release them
- * all
- */
- release_child_resources(r);
- if (!release_resource(r)) {
- dev_printk(KERN_DEBUG, &dev->dev,
- "resource %d %pR released\n", idx, r);
- /* keep the old size */
- r->end = resource_size(r) - 1;
- r->start = 0;
- r->flags = 0;
- changed = true;
- }
- }
+ b_res = &dev->resource[PCI_BRIDGE_RESOURCES];
+
+ /*
+ * 1. if there is io port assign fail, will release bridge
+ * io port.
+ * 2. if there is non pref mmio assign fail, release bridge
+ * nonpref mmio.
+ * 3. if there is 64bit pref mmio assign fail, and bridge pref
+ * is 64bit, release bridge pref mmio.
+ * 4. if there is pref mmio assign fail, and bridge pref is
+ * 32bit mmio, release bridge pref mmio
+ * 5. if there is pref mmio assign fail, and bridge pref is not
+ * assigned, release bridge nonpref mmio.
+ */
+ if (type & IORESOURCE_IO)
+ idx = 0;
+ else if (!(type & IORESOURCE_PREFETCH))
+ idx = 1;
+ else if ((type & IORESOURCE_MEM_64) &&
+ (b_res[2].flags & IORESOURCE_MEM_64))
+ idx = 2;
+ else if (!(b_res[2].flags & IORESOURCE_MEM_64) &&
+ (b_res[2].flags & IORESOURCE_PREFETCH))
+ idx = 2;
+ else
+ idx = 1;
+
+ r = &b_res[idx];
+
+ if (!r->parent)
+ return;
+
+ /*
+ * if there are children under that, we should release them
+ * all
+ */
+ release_child_resources(r);
+ if (!release_resource(r)) {
+ type = old_flags = r->flags & type_mask;
+ dev_printk(KERN_DEBUG, &dev->dev, "resource %d %pR released\n",
+ PCI_BRIDGE_RESOURCES + idx, r);
+ /* keep the old size */
+ r->end = resource_size(r) - 1;
+ r->start = 0;
+ r->flags = 0;
- if (changed) {
/* avoiding touch the one without PREF */
if (type & IORESOURCE_PREFETCH)
type = IORESOURCE_PREFETCH;
__pci_setup_bridge(bus, type);
+ /* for next child res under same bridge */
+ r->flags = old_flags;
}
}
@@ -1471,7 +1519,7 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
LIST_HEAD(fail_head);
struct pci_dev_resource *fail_res;
unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM |
- IORESOURCE_PREFETCH;
+ IORESOURCE_PREFETCH | IORESOURCE_MEM_64;
int pci_try_num = 1;
enum enable_type enable_local;
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 5c060b1..2c659e4 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -208,15 +208,31 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
/* First, try exact prefetching match.. */
ret = pci_bus_alloc_resource(bus, res, size, align, min,
- IORESOURCE_PREFETCH,
+ IORESOURCE_PREFETCH | IORESOURCE_MEM_64,
pcibios_align_resource, dev);
- if (ret < 0 && (res->flags & IORESOURCE_PREFETCH)) {
+ if (ret < 0 &&
+ (res->flags & (IORESOURCE_PREFETCH | IORESOURCE_MEM_64)) ==
+ (IORESOURCE_PREFETCH | IORESOURCE_MEM_64)) {
+ /*
+ * That failed.
+ *
+ * Try below 4g pref
+ */
+ ret = pci_bus_alloc_resource(bus, res, size, align, min,
+ IORESOURCE_PREFETCH,
+ pcibios_align_resource, dev);
+ }
+
+ if (ret < 0 &&
+ (res->flags & (IORESOURCE_PREFETCH | IORESOURCE_MEM_64))) {
/*
* That failed.
*
* But a prefetching area can handle a non-prefetching
* window (it will just not perform as well).
+ *
+ * Also can put 64bit under 32bit range. (below 4g).
*/
ret = pci_bus_alloc_resource(bus, res, size, align, min, 0,
pcibios_align_resource, dev);
--
1.8.4
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v6 2/2] PCI: Try best to allocate pref mmio 64bit above 4g
2013-12-19 20:44 ` [PATCH v6 2/2] PCI: Try best to allocate pref mmio 64bit above 4g Yinghai Lu
@ 2013-12-23 0:00 ` Bjorn Helgaas
2013-12-23 1:14 ` Yinghai Lu
2014-02-17 3:22 ` Guo Chao
1 sibling, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2013-12-23 0:00 UTC (permalink / raw)
To: Yinghai Lu
Cc: Guo Chao, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
On Thu, Dec 19, 2013 at 1:44 PM, Yinghai Lu <yinghai@kernel.org> wrote:
Let me see if I can figure out what you're trying to do here. Please
correct me if I'm wrong:
> When one of children resources does not support MEM_64, MEM_64 for
> bridge get reset, so pull down whole pref resource on the bridge under 4G.
When we allocate space for a bridge's prefetchable window, we
currently look at the devices behind the bridge and put the window
below 4GB if any of those children has a 32-bit prefetchable BAR.
This maximizes the use of prefetch, at the cost of using more 32-bit
address space.
> If the bridge support pref mem 64, will only allocate that with pref mem64 to
> children that support it.
> For children resources if they only support pref mem 32, will allocate them
> from non pref mem instead.
You are changing this so that we will always try to put a bridge's
64-bit prefetchable window above 4GB, regardless of what devices are
behind the bridge. If a device behind the bridge has a 32-bit
prefetchable BAR, we will place that BAR in the bridge's 32-bit
non-prefetchable window.
This minimizes the use of the 32-bit address space, at the cost of not
being able to use prefetch as much.
> If the bridge only support 32bit pref mmio, will still have all children pref
> mmio under that.
Obviously, if a bridge has a prefetchable window that's only 32 bits,
64-bit prefetchable BARs behind the bridge will have to be in that
32-bit prefetchable window or the 32-bit non-prefetchable window. And
if the bridge has no prefetchable window at all, every memory BAR
behind the bridge will have to be in the 32-bit non-prefetchable
window.
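
The placement rules summarized so far can be written down as a small decision function. This is a sketch of the intended policy as described in this mail, not code from the patch: enum pref_window, enum placement, struct bar and first_choice() are hypothetical names, and the later fallback from the pref window to the non-pref window when space runs out is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* What kind of prefetchable window the bridge has (hypothetical enum). */
enum pref_window { PREF_NONE, PREF_32BIT, PREF_64BIT };

/* Which bridge window a BAR is tried in first (hypothetical enum). */
enum placement { PLACE_PREF_WINDOW, PLACE_NONPREF_WINDOW };

struct bar {
	bool prefetchable;
	bool is_64bit;
};

static enum placement first_choice(struct bar b, enum pref_window w)
{
	assert(w == PREF_NONE || w == PREF_32BIT || w == PREF_64BIT);

	/* non-pref BARs, or no pref window at all: non-pref window */
	if (!b.prefetchable || w == PREF_NONE)
		return PLACE_NONPREF_WINDOW;

	/* 64-bit pref window is kept for 64-bit pref BARs only */
	if (w == PREF_64BIT)
		return b.is_64bit ? PLACE_PREF_WINDOW : PLACE_NONPREF_WINDOW;

	/* 32-bit pref window takes every prefetchable BAR */
	return PLACE_PREF_WINDOW;
}
```

The one case that changes relative to the current behaviour is a 32-bit prefetchable BAR behind a bridge with a 64-bit pref window: it goes to the non-pref window so that the pref window can stay above 4GB.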
I'll look at the actual patch later; I just want to make sure I
understand your intent first.
Bjorn
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v6 2/2] PCI: Try best to allocate pref mmio 64bit above 4g
2013-12-23 0:00 ` Bjorn Helgaas
@ 2013-12-23 1:14 ` Yinghai Lu
2014-01-08 23:34 ` Yinghai Lu
0 siblings, 1 reply; 10+ messages in thread
From: Yinghai Lu @ 2013-12-23 1:14 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Guo Chao, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
On Sun, Dec 22, 2013 at 4:00 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Thu, Dec 19, 2013 at 1:44 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>
> Let me see if I can figure out what you're trying to do here. Please
> correct me if I'm wrong:
>
>> When one of children resources does not support MEM_64, MEM_64 for
>> bridge get reset, so pull down whole pref resource on the bridge under 4G.
>
> When we allocate space for a bridge's prefetchable window, we
> currently look at the devices behind the bridge and put the window
> below 4GB if any of those children has a 32-bit prefetchable BAR.
>
> This maximizes the use of prefetch, at the cost of using more 32-bit
> address space.
Yes, and that is a problem on 8-socket or 32-socket systems,
where 32-bit space is limited.
But we have enough 64-bit MMIO space above 4G for prefetchable resources.
>
>> If the bridge support pref mem 64, will only allocate that with pref mem64 to
>> children that support it.
>> For children resources if they only support pref mem 32, will allocate them
>> from non pref mem instead.
>
> You are changing this so that we will always try to put a bridge's
> 64-bit prefetchable window above 4GB, regardless of what devices are
> behind the bridge. If a device behind the bridge has a 32-bit
> prefetchable BAR, we will place that BAR in the bridge's 32-bit
> non-prefetchable window.
Yes, so we can keep IORESOURCE_MEM_64 in the flags for PREF.
>
> This minimizes the use of the 32-bit address space, at the cost of not
> being able to use prefetch as much.
>
>> If the bridge only support 32bit pref mmio, will still have all children pref
>> mmio under that.
>
> Obviously, if a bridge has a prefetchable window that's only 32 bits,
> 64-bit prefetchable BARs behind the bridge will have to be in that
> 32-bit prefetchable window or the 32-bit non-prefetchable window. And
> if the bridge has no prefetchable window at all, every memory BAR
> behind the bridge will have to be in the 32-bit non-prefetchable
> window.
Yes.
>
> I'll look at the actual patch later; I just want to make sure I
> understand your intent first.
Thanks
Yinghai
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v6 2/2] PCI: Try best to allocate pref mmio 64bit above 4g
2013-12-23 1:14 ` Yinghai Lu
@ 2014-01-08 23:34 ` Yinghai Lu
2014-01-10 9:41 ` Guo Chao
0 siblings, 1 reply; 10+ messages in thread
From: Yinghai Lu @ 2014-01-08 23:34 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Guo Chao, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
On Sun, Dec 22, 2013 at 5:14 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Sun, Dec 22, 2013 at 4:00 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Thu, Dec 19, 2013 at 1:44 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>
>> Let me see if I can figure out what you're trying to do here. Please
>> correct me if I'm wrong:
>>
>>> When one of children resources does not support MEM_64, MEM_64 for
>>> bridge get reset, so pull down whole pref resource on the bridge under 4G.
>>
>> When we allocate space for a bridge's prefetchable window, we
>> currently look at the devices behind the bridge and put the window
>> below 4GB if any of those children has a 32-bit prefetchable BAR.
>>
>> This maximizes the use of prefetch, at the cost of using more 32-bit
>> address space.
>
> yes. and we have problem when we have 8 sockets or 32 sockets system,
> will have limit 32bit space.
> but we have enough above 4G 64bit mmio for prefetchable.
>
>>
>>> If the bridge support pref mem 64, will only allocate that with pref mem64 to
>>> children that support it.
>>> For children resources if they only support pref mem 32, will allocate them
>>> from non pref mem instead.
>>
>> You are changing this so that we will always try to put a bridge's
>> 64-bit prefetchable window above 4GB, regardless of what devices are
>> behind the bridge. If a device behind the bridge has a 32-bit
>> prefetchable BAR, we will place that BAR in the bridge's 32-bit
>> non-prefetchable window.
>
> Yes. so we can keep IORESOURCE_MEM64 in the flags for PREF.
>
>>
>> This minimizes the use of the 32-bit address space, at the cost of not
>> being able to use prefetch as much.
>>
>>> If the bridge only support 32bit pref mmio, will still have all children pref
>>> mmio under that.
>>
>> Obviously, if a bridge has a prefetchable window that's only 32 bits,
>> 64-bit prefetchable BARs behind the bridge will have to be in that
>> 32-bit prefetchable window or the 32-bit non-prefetchable window. And
>> if the bridge has no prefetchable window at all, every memory BAR
>> behind the bridge will have to be in the 32-bit non-prefetchable
>> window.
>
> Yes.
>
>>
>> I'll look at the actual patch later; I just want to make sure I
>> understand your intent first.
Hi, Bjorn,
Can you check and add this one to your pci/resource branch?
With that we can close the loop for 64bit mmio resource allocation.
Thanks
Yinghai
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v6 2/2] PCI: Try best to allocate pref mmio 64bit above 4g
2014-01-08 23:34 ` Yinghai Lu
@ 2014-01-10 9:41 ` Guo Chao
2014-01-10 17:06 ` Yinghai Lu
0 siblings, 1 reply; 10+ messages in thread
From: Guo Chao @ 2014-01-10 9:41 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org
On Wed, Jan 08, 2014 at 03:34:54PM -0800, Yinghai Lu wrote:
> On Sun, Dec 22, 2013 at 5:14 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Sun, Dec 22, 2013 at 4:00 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> >> On Thu, Dec 19, 2013 at 1:44 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> >>
> >> Let me see if I can figure out what you're trying to do here. Please
> >> correct me if I'm wrong:
> >>
> > >>> When one of the child resources does not support MEM_64, MEM_64 for the
> > >>> bridge gets reset, which pulls the whole pref resource on the bridge under 4G.
> >>
> >> When we allocate space for a bridge's prefetchable window, we
> >> currently look at the devices behind the bridge and put the window
> >> below 4GB if any of those children has a 32-bit prefetchable BAR.
> >>
> >> This maximizes the use of prefetch, at the cost of using more 32-bit
> >> address space.
> >
> > > Yes, and that is a problem on 8-socket or 32-socket systems, where
> > > 32-bit space is limited, while there is enough 64-bit mmio above 4G
> > > for prefetchable resources.
> >
> >>
> > >>> If the bridge supports pref mem 64, only allocate from it to children
> > >>> that support pref mem64.
> > >>> Child resources that only support pref mem 32 are allocated from
> > >>> non-pref mem instead.
> >>
> >> You are changing this so that we will always try to put a bridge's
> >> 64-bit prefetchable window above 4GB, regardless of what devices are
> >> behind the bridge. If a device behind the bridge has a 32-bit
> >> prefetchable BAR, we will place that BAR in the bridge's 32-bit
> >> non-prefetchable window.
> >
> > > Yes, so we can keep IORESOURCE_MEM_64 in the flags for PREF.
> >
> >>
> >> This minimizes the use of the 32-bit address space, at the cost of not
> >> being able to use prefetch as much.
> >>
> > >>> If the bridge only supports 32-bit pref mmio, all children's pref mmio
> > >>> still goes under that.
> >>
> >> Obviously, if a bridge has a prefetchable window that's only 32 bits,
> >> 64-bit prefetchable BARs behind the bridge will have to be in that
> >> 32-bit prefetchable window or the 32-bit non-prefetchable window. And
> >> if the bridge has no prefetchable window at all, every memory BAR
> >> behind the bridge will have to be in the 32-bit non-prefetchable
> >> window.
> >
> > Yes.
> >
> >>
> >> I'll look at the actual patch later; I just want to make sure I
> >> understand your intent first.
>
> Hi, Bjorn,
>
> Can you check and add this one to your pci/resource branch?
> With that we can close the loop for 64bit mmio resource allocation.
>
Just FYI, a Mellanox net card failed after exactly this patch.
3.13-rc7 + Bjorn's series is OK. After this patch is applied, the Mellanox
driver complains:
|mlx4_core 0003:05:00.0: Multiple PFs not yet supported. Skipping PF.
|mlx4_core: probe of 0003:05:00.0 failed with error -22
This is caused by an MMIO read from BAR 0 (64-bit non-prefetchable)
returning a non-zero value.
Resource assignment, as far as we can see, works fine. The noticeable
effect of this patch is putting the ROM BAR under the non-prefetchable
window. I tried to revert this effect by adding MEM_64 to its ROM resource,
and it works again (the system does not expose an aperture above 4G yet).
Not sure what the root cause is; it looks like a driver/firmware/hardware defect.
Thanks
Guo Chao
> Thanks
>
> Yinghai
>
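
One way to read the ROM BAR observation above is through the type matching that the patch adds to pbus_size_mem(). The sketch below replays that comparison with the type, type2 and type3 values the patch passes when the bridge's prefetchable window is 64-bit. The SK_* bit values and function names are hypothetical, and it is an assumption of this sketch (not stated in the thread) that the ROM resource carries the prefetchable flag but not MEM_64.

```c
#include <stdbool.h>

/* Illustrative stand-ins for IORESOURCE_MEM, IORESOURCE_PREFETCH and
 * IORESOURCE_MEM_64; the numeric values are chosen for this sketch. */
#define SK_MEM      0x1ul
#define SK_PREFETCH 0x2ul
#define SK_MEM_64   0x4ul

/* Same shape as the patched test in pbus_size_mem(): a resource is sized
 * in this pass if its masked flags equal any of the three types. */
static bool sk_type_match(unsigned long flags, unsigned long mask,
			  unsigned long type, unsigned long type2,
			  unsigned long type3)
{
	unsigned long f = flags & mask;

	return f == type || f == type2 || f == type3;
}

/* Pass 1: sizing of the 64-bit prefetchable window. */
static bool sk_sized_in_pref64(unsigned long flags)
{
	unsigned long prefmask = SK_MEM | SK_PREFETCH | SK_MEM_64;

	return sk_type_match(flags, prefmask, prefmask, prefmask, prefmask);
}

/* Pass 2: sizing of the non-prefetchable window after pass 1 succeeded. */
static bool sk_sized_in_nonpref(unsigned long flags)
{
	unsigned long mask = SK_MEM | SK_PREFETCH | SK_MEM_64;
	unsigned long type2 = mask & ~SK_MEM_64;	/* 32-bit prefetchable */
	unsigned long type3 = mask & ~SK_PREFETCH;	/* 64-bit non-prefetchable */

	return sk_type_match(flags, mask, SK_MEM, type2, type3);
}
```

Under that assumption, a resource flagged SK_MEM | SK_PREFETCH is sized into the non-prefetchable window, and adding SK_MEM_64 moves it into the prefetchable pass, which is consistent with the workaround described in the report.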
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v6 2/2] PCI: Try best to allocate pref mmio 64bit above 4g
2014-01-10 9:41 ` Guo Chao
@ 2014-01-10 17:06 ` Yinghai Lu
0 siblings, 0 replies; 10+ messages in thread
From: Yinghai Lu @ 2014-01-10 17:06 UTC (permalink / raw)
To: Guo Chao
Cc: Bjorn Helgaas, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org
On Fri, Jan 10, 2014 at 1:41 AM, Guo Chao <yan@linux.vnet.ibm.com> wrote:
> On Wed, Jan 08, 2014 at 03:34:54PM -0800, Yinghai Lu wrote:
> Just FYI, a Mellanox net card failed after exactly this patch.
>
> 3.13-rc7 + Bjorn's series is OK. After this patch is applied, the Mellanox
> driver complains:
>
> |mlx4_core 0003:05:00.0: Multiple PFs not yet supported. Skipping PF.
> |mlx4_core: probe of 0003:05:00.0 failed with error -22
>
> This is caused by an MMIO read from BAR 0 (64-bit non-prefetchable)
> returning a non-zero value.
>
> Resource assignment, as far as we can see, works fine. The noticeable
> effect of this patch is putting the ROM BAR under the non-prefetchable
> window. I tried to revert this effect by adding MEM_64 to its ROM resource,
> and it works again (the system does not expose an aperture above 4G yet).
> Not sure what the root cause is; it looks like a driver/firmware/hardware defect.
Interesting. Can you post boot logs with "debug ignore_loglevel initcall_debug",
both with and without this patch?
Thanks
Yinghai
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v6 2/2] PCI: Try best to allocate pref mmio 64bit above 4g
2013-12-19 20:44 ` [PATCH v6 2/2] PCI: Try best to allocate pref mmio 64bit above 4g Yinghai Lu
2013-12-23 0:00 ` Bjorn Helgaas
@ 2014-02-17 3:22 ` Guo Chao
2014-02-18 21:09 ` Bjorn Helgaas
1 sibling, 1 reply; 10+ messages in thread
From: Guo Chao @ 2014-02-17 3:22 UTC (permalink / raw)
To: Bjorn Helgaas; +Cc: linux-pci, Yinghai Lu
On Thu, Dec 19, 2013 at 12:44:03PM -0800, Yinghai Lu wrote:
> When one of the child resources does not support MEM_64, MEM_64 for the
> bridge gets reset, which pulls the whole pref resource on the bridge under 4G.
>
> If the bridge supports pref mem 64, only allocate from it to children
> that support pref mem64.
> Child resources that only support pref mem 32 are allocated from
> non-pref mem instead.
>
> If the bridge only supports 32-bit pref mmio, all children's pref mmio
> still goes under that.
>
> -v2: Add release bridge res support with bridge mem res for pref_mem children res.
> -v3: refresh so that it can be applied early, before the for_each_dev_res patchset.
> -v4: fix non-pref mmio 64bit support found by Guo Chao.
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Tested-by: Guo Chao <yan@linux.vnet.ibm.com>
> ---
> drivers/pci/setup-bus.c | 138 ++++++++++++++++++++++++++++++++----------------
> drivers/pci/setup-res.c | 20 ++++++-
> 2 files changed, 111 insertions(+), 47 deletions(-)
Hi, Bjorn
What's the status of this patch?
Regards,
Guo Chao
>
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 138bdd6..b29504f 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -713,12 +713,11 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
> bus resource of a given type. Note: we intentionally skip
> the bus resources which have already been assigned (that is,
> have non-NULL parent resource). */
> -static struct resource *find_free_bus_resource(struct pci_bus *bus, unsigned long type)
> +static struct resource *find_free_bus_resource(struct pci_bus *bus,
> + unsigned long type_mask, unsigned long type)
> {
> int i;
> struct resource *r;
> - unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM |
> - IORESOURCE_PREFETCH;
>
> pci_bus_for_each_resource(bus, r, i) {
> if (r == &ioport_resource || r == &iomem_resource)
> @@ -815,7 +814,8 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
> resource_size_t add_size, struct list_head *realloc_head)
> {
> struct pci_dev *dev;
> - struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO);
> + struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO,
> + IORESOURCE_IO);
> resource_size_t size = 0, size0 = 0, size1 = 0;
> resource_size_t children_add_size = 0;
> resource_size_t min_align, align;
> @@ -915,15 +915,17 @@ static inline resource_size_t calculate_mem_align(resource_size_t *aligns,
> * guarantees that all child resources fit in this size.
> */
> static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
> - unsigned long type, resource_size_t min_size,
> - resource_size_t add_size,
> - struct list_head *realloc_head)
> + unsigned long type, unsigned long type2,
> + unsigned long type3,
> + resource_size_t min_size, resource_size_t add_size,
> + struct list_head *realloc_head)
> {
> struct pci_dev *dev;
> resource_size_t min_align, align, size, size0, size1;
> resource_size_t aligns[12]; /* Alignments from 1Mb to 2Gb */
> int order, max_order;
> - struct resource *b_res = find_free_bus_resource(bus, type);
> + struct resource *b_res = find_free_bus_resource(bus,
> + mask | IORESOURCE_PREFETCH, type);
> unsigned int mem64_mask = 0;
> resource_size_t children_add_size = 0;
>
> @@ -944,7 +946,9 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
> struct resource *r = &dev->resource[i];
> resource_size_t r_size;
>
> - if (r->parent || (r->flags & mask) != type)
> + if (r->parent || ((r->flags & mask) != type &&
> + (r->flags & mask) != type2 &&
> + (r->flags & mask) != type3))
> continue;
> r_size = resource_size(r);
> #ifdef CONFIG_PCI_IOV
> @@ -1117,8 +1121,9 @@ void __ref __pci_bus_size_bridges(struct pci_bus *bus,
> struct list_head *realloc_head)
> {
> struct pci_dev *dev;
> - unsigned long mask, prefmask;
> + unsigned long mask, prefmask, type2 = 0, type3 = 0;
> resource_size_t additional_mem_size = 0, additional_io_size = 0;
> + struct resource *b_res;
>
> list_for_each_entry(dev, &bus->devices, bus_list) {
> struct pci_bus *b = dev->subordinate;
> @@ -1163,15 +1168,34 @@ void __ref __pci_bus_size_bridges(struct pci_bus *bus,
> has already been allocated by arch code, try
> non-prefetchable range for both types of PCI memory
> resources. */
> + b_res = &bus->self->resource[PCI_BRIDGE_RESOURCES];
> mask = IORESOURCE_MEM;
> prefmask = IORESOURCE_MEM | IORESOURCE_PREFETCH;
> - if (pbus_size_mem(bus, prefmask, prefmask,
> + if (b_res[2].flags & IORESOURCE_MEM_64) {
> + prefmask |= IORESOURCE_MEM_64;
> + if (pbus_size_mem(bus, prefmask, prefmask,
> + prefmask, prefmask,
> realloc_head ? 0 : additional_mem_size,
> - additional_mem_size, realloc_head))
> - mask = prefmask; /* Success, size non-prefetch only. */
> - else
> - additional_mem_size += additional_mem_size;
> - pbus_size_mem(bus, mask, IORESOURCE_MEM,
> + additional_mem_size, realloc_head)) {
> + /* Success, size non-pref64 only. */
> + mask = prefmask;
> + type2 = prefmask & ~IORESOURCE_MEM_64;
> + type3 = prefmask & ~IORESOURCE_PREFETCH;
> + }
> + }
> + if (!type2) {
> + prefmask &= ~IORESOURCE_MEM_64;
> + if (pbus_size_mem(bus, prefmask, prefmask,
> + prefmask, prefmask,
> + realloc_head ? 0 : additional_mem_size,
> + additional_mem_size, realloc_head)) {
> + /* Success, size non-prefetch only. */
> + mask = prefmask;
> + } else
> + additional_mem_size += additional_mem_size;
> + type2 = type3 = IORESOURCE_MEM;
> + }
> + pbus_size_mem(bus, mask, IORESOURCE_MEM, type2, type3,
> realloc_head ? 0 : additional_mem_size,
> additional_mem_size, realloc_head);
> break;
> @@ -1257,42 +1281,66 @@ static void __ref __pci_bridge_assign_resources(const struct pci_dev *bridge,
> static void pci_bridge_release_resources(struct pci_bus *bus,
> unsigned long type)
> {
> - int idx;
> - bool changed = false;
> - struct pci_dev *dev;
> + struct pci_dev *dev = bus->self;
> struct resource *r;
> unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM |
> - IORESOURCE_PREFETCH;
> + IORESOURCE_PREFETCH | IORESOURCE_MEM_64;
> + unsigned old_flags = 0;
> + struct resource *b_res;
> + int idx = 1;
>
> - dev = bus->self;
> - for (idx = PCI_BRIDGE_RESOURCES; idx <= PCI_BRIDGE_RESOURCE_END;
> - idx++) {
> - r = &dev->resource[idx];
> - if ((r->flags & type_mask) != type)
> - continue;
> - if (!r->parent)
> - continue;
> - /*
> - * if there are children under that, we should release them
> - * all
> - */
> - release_child_resources(r);
> - if (!release_resource(r)) {
> - dev_printk(KERN_DEBUG, &dev->dev,
> - "resource %d %pR released\n", idx, r);
> - /* keep the old size */
> - r->end = resource_size(r) - 1;
> - r->start = 0;
> - r->flags = 0;
> - changed = true;
> - }
> - }
> + b_res = &dev->resource[PCI_BRIDGE_RESOURCES];
> +
> + /*
> + * 1. if there is io port assign fail, will release bridge
> + * io port.
> + * 2. if there is non pref mmio assign fail, release bridge
> + * nonpref mmio.
> + * 3. if there is 64bit pref mmio assign fail, and bridge pref
> + * is 64bit, release bridge pref mmio.
> + * 4. if there is pref mmio assign fail, and bridge pref is
> + * 32bit mmio, release bridge pref mmio
> + * 5. if there is pref mmio assign fail, and bridge pref is not
> + * assigned, release bridge nonpref mmio.
> + */
> + if (type & IORESOURCE_IO)
> + idx = 0;
> + else if (!(type & IORESOURCE_PREFETCH))
> + idx = 1;
> + else if ((type & IORESOURCE_MEM_64) &&
> + (b_res[2].flags & IORESOURCE_MEM_64))
> + idx = 2;
> + else if (!(b_res[2].flags & IORESOURCE_MEM_64) &&
> + (b_res[2].flags & IORESOURCE_PREFETCH))
> + idx = 2;
> + else
> + idx = 1;
> +
> + r = &b_res[idx];
> +
> + if (!r->parent)
> + return;
> +
> + /*
> + * if there are children under that, we should release them
> + * all
> + */
> + release_child_resources(r);
> + if (!release_resource(r)) {
> + type = old_flags = r->flags & type_mask;
> + dev_printk(KERN_DEBUG, &dev->dev, "resource %d %pR released\n",
> + PCI_BRIDGE_RESOURCES + idx, r);
> + /* keep the old size */
> + r->end = resource_size(r) - 1;
> + r->start = 0;
> + r->flags = 0;
>
> - if (changed) {
> /* avoiding touch the one without PREF */
> if (type & IORESOURCE_PREFETCH)
> type = IORESOURCE_PREFETCH;
> __pci_setup_bridge(bus, type);
> + /* for next child res under same bridge */
> + r->flags = old_flags;
> }
> }
>
> @@ -1471,7 +1519,7 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
> LIST_HEAD(fail_head);
> struct pci_dev_resource *fail_res;
> unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM |
> - IORESOURCE_PREFETCH;
> + IORESOURCE_PREFETCH | IORESOURCE_MEM_64;
> int pci_try_num = 1;
> enum enable_type enable_local;
>
> diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
> index 5c060b1..2c659e4 100644
> --- a/drivers/pci/setup-res.c
> +++ b/drivers/pci/setup-res.c
> @@ -208,15 +208,31 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
>
> /* First, try exact prefetching match.. */
> ret = pci_bus_alloc_resource(bus, res, size, align, min,
> - IORESOURCE_PREFETCH,
> + IORESOURCE_PREFETCH | IORESOURCE_MEM_64,
> pcibios_align_resource, dev);
>
> - if (ret < 0 && (res->flags & IORESOURCE_PREFETCH)) {
> + if (ret < 0 &&
> + (res->flags & (IORESOURCE_PREFETCH | IORESOURCE_MEM_64)) ==
> + (IORESOURCE_PREFETCH | IORESOURCE_MEM_64)) {
> + /*
> + * That failed.
> + *
> + * Try below 4g pref
> + */
> + ret = pci_bus_alloc_resource(bus, res, size, align, min,
> + IORESOURCE_PREFETCH,
> + pcibios_align_resource, dev);
> + }
> +
> + if (ret < 0 &&
> + (res->flags & (IORESOURCE_PREFETCH | IORESOURCE_MEM_64))) {
> /*
> * That failed.
> *
> * But a prefetching area can handle a non-prefetching
> * window (it will just not perform as well).
> + *
> + * Also can put 64bit under 32bit range. (below 4g).
> */
> ret = pci_bus_alloc_resource(bus, res, size, align, min, 0,
> pcibios_align_resource, dev);
> --
> 1.8.4
>
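
The three-step fallback that the setup-res.c hunk above adds to __pci_assign_resource() can be modelled in isolation. This is a toy sketch under stated assumptions: struct sk_window, sk_bus_alloc() and sk_assign() are hypothetical, a window either has room or does not, and the allocator only mimics the rule that the bits in type_mask must be equal between the resource and the window.

```c
#include <stdbool.h>

/* Illustrative stand-ins for IORESOURCE_PREFETCH and IORESOURCE_MEM_64. */
#define SK_PREFETCH 0x1ul
#define SK_MEM_64   0x2ul

/* Hypothetical bus window: its flags and whether it still has space. */
struct sk_window {
	unsigned long flags;
	bool has_room;
};

/* Toy allocator: returns the index of the first window whose flags agree
 * with the resource on every bit in type_mask and that has room, or -1. */
static int sk_bus_alloc(const struct sk_window *wins, int n,
			unsigned long res_flags, unsigned long type_mask)
{
	for (int i = 0; i < n; i++) {
		if ((res_flags ^ wins[i].flags) & type_mask)
			continue;
		if (wins[i].has_room)
			return i;
	}
	return -1;
}

/* Fallback order taken from the patch. */
static int sk_assign(const struct sk_window *wins, int n,
		     unsigned long res_flags)
{
	const unsigned long both = SK_PREFETCH | SK_MEM_64;
	int ret;

	/* 1. Exact match on both the prefetch and 64-bit bits. */
	ret = sk_bus_alloc(wins, n, res_flags, both);

	/* 2. A 64-bit prefetchable resource may use a prefetchable
	 *    window below 4G. */
	if (ret < 0 && (res_flags & both) == both)
		ret = sk_bus_alloc(wins, n, res_flags, SK_PREFETCH);

	/* 3. Anything prefetchable or 64-bit may use any window. */
	if (ret < 0 && (res_flags & both))
		ret = sk_bus_alloc(wins, n, res_flags, 0);

	return ret;
}
```

With a non-prefetchable window, a 32-bit prefetchable window and a full 64-bit prefetchable window, a 64-bit prefetchable resource lands in the 32-bit prefetchable window at step 2, while a 64-bit non-prefetchable resource falls through to step 3.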
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v6 2/2] PCI: Try best to allocate pref mmio 64bit above 4g
2014-02-17 3:22 ` Guo Chao
@ 2014-02-18 21:09 ` Bjorn Helgaas
0 siblings, 0 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2014-02-18 21:09 UTC (permalink / raw)
To: Guo Chao; +Cc: linux-pci, Yinghai Lu
On Mon, Feb 17, 2014 at 11:22:21AM +0800, Guo Chao wrote:
> On Thu, Dec 19, 2013 at 12:44:03PM -0800, Yinghai Lu wrote:
> > When one of the child resources does not support MEM_64, MEM_64 for the
> > bridge gets reset, which pulls the whole pref resource on the bridge under 4G.
> >
> > If the bridge supports pref mem 64, only allocate from it to children
> > that support pref mem64.
> > Child resources that only support pref mem 32 are allocated from
> > non-pref mem instead.
> >
> > If the bridge only supports 32-bit pref mmio, all children's pref mmio
> > still goes under that.
> >
> > -v2: Add release bridge res support with bridge mem res for pref_mem children res.
> > -v3: refresh so that it can be applied early, before the for_each_dev_res patchset.
> > -v4: fix non-pref mmio 64bit support found by Guo Chao.
> >
> > Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> > Tested-by: Guo Chao <yan@linux.vnet.ibm.com>
> > ---
> > drivers/pci/setup-bus.c | 138 ++++++++++++++++++++++++++++++++----------------
> > drivers/pci/setup-res.c | 20 ++++++-
> > 2 files changed, 111 insertions(+), 47 deletions(-)
>
> Hi, Bjorn
>
> What's the status of this patch?
I dropped it because you said it didn't work. At least, that's what I
*thought* you meant. Here is what you wrote:
> Just FYI, a Mellanox net card failed after exactly this patch.
> 3.13-rc7 + Bjorn's series is OK. After this patch is applied, the Mellanox
> driver complains:
> |mlx4_core 0003:05:00.0: Multiple PFs not yet supported. Skipping PF.
> |mlx4_core: probe of 0003:05:00.0 failed with error -22
> This is caused by an MMIO read from BAR 0 (64-bit non-prefetchable)
> returning a non-zero value.
> Resource assignment, as far as we can see, works fine. The noticeable
> effect of this patch is putting the ROM BAR under the non-prefetchable
> window. I tried to revert this effect by adding MEM_64 to its ROM resource,
> and it works again (the system does not expose an aperture above 4G yet).
> Not sure what the root cause is; it looks like a driver/firmware/hardware defect.
I assumed that you and Yinghai would hash this out and post an updated
series with your Tested-by.
Let me know if I didn't understand you correctly.
Bjorn
> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > index 138bdd6..b29504f 100644
> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -713,12 +713,11 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
> > bus resource of a given type. Note: we intentionally skip
> > the bus resources which have already been assigned (that is,
> > have non-NULL parent resource). */
> > -static struct resource *find_free_bus_resource(struct pci_bus *bus, unsigned long type)
> > +static struct resource *find_free_bus_resource(struct pci_bus *bus,
> > + unsigned long type_mask, unsigned long type)
> > {
> > int i;
> > struct resource *r;
> > - unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM |
> > - IORESOURCE_PREFETCH;
> >
> > pci_bus_for_each_resource(bus, r, i) {
> > if (r == &ioport_resource || r == &iomem_resource)
> > @@ -815,7 +814,8 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
> > resource_size_t add_size, struct list_head *realloc_head)
> > {
> > struct pci_dev *dev;
> > - struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO);
> > + struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO,
> > + IORESOURCE_IO);
> > resource_size_t size = 0, size0 = 0, size1 = 0;
> > resource_size_t children_add_size = 0;
> > resource_size_t min_align, align;
> > @@ -915,15 +915,17 @@ static inline resource_size_t calculate_mem_align(resource_size_t *aligns,
> > * guarantees that all child resources fit in this size.
> > */
> > static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
> > - unsigned long type, resource_size_t min_size,
> > - resource_size_t add_size,
> > - struct list_head *realloc_head)
> > + unsigned long type, unsigned long type2,
> > + unsigned long type3,
> > + resource_size_t min_size, resource_size_t add_size,
> > + struct list_head *realloc_head)
> > {
> > struct pci_dev *dev;
> > resource_size_t min_align, align, size, size0, size1;
> > resource_size_t aligns[12]; /* Alignments from 1Mb to 2Gb */
> > int order, max_order;
> > - struct resource *b_res = find_free_bus_resource(bus, type);
> > + struct resource *b_res = find_free_bus_resource(bus,
> > + mask | IORESOURCE_PREFETCH, type);
> > unsigned int mem64_mask = 0;
> > resource_size_t children_add_size = 0;
> >
> > @@ -944,7 +946,9 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
> > struct resource *r = &dev->resource[i];
> > resource_size_t r_size;
> >
> > - if (r->parent || (r->flags & mask) != type)
> > + if (r->parent || ((r->flags & mask) != type &&
> > + (r->flags & mask) != type2 &&
> > + (r->flags & mask) != type3))
> > continue;
> > r_size = resource_size(r);
> > #ifdef CONFIG_PCI_IOV
> > @@ -1117,8 +1121,9 @@ void __ref __pci_bus_size_bridges(struct pci_bus *bus,
> > struct list_head *realloc_head)
> > {
> > struct pci_dev *dev;
> > - unsigned long mask, prefmask;
> > + unsigned long mask, prefmask, type2 = 0, type3 = 0;
> > resource_size_t additional_mem_size = 0, additional_io_size = 0;
> > + struct resource *b_res;
> >
> > list_for_each_entry(dev, &bus->devices, bus_list) {
> > struct pci_bus *b = dev->subordinate;
> > @@ -1163,15 +1168,34 @@ void __ref __pci_bus_size_bridges(struct pci_bus *bus,
> > has already been allocated by arch code, try
> > non-prefetchable range for both types of PCI memory
> > resources. */
> > + b_res = &bus->self->resource[PCI_BRIDGE_RESOURCES];
> > mask = IORESOURCE_MEM;
> > prefmask = IORESOURCE_MEM | IORESOURCE_PREFETCH;
> > - if (pbus_size_mem(bus, prefmask, prefmask,
> > + if (b_res[2].flags & IORESOURCE_MEM_64) {
> > + prefmask |= IORESOURCE_MEM_64;
> > + if (pbus_size_mem(bus, prefmask, prefmask,
> > + prefmask, prefmask,
> > realloc_head ? 0 : additional_mem_size,
> > - additional_mem_size, realloc_head))
> > - mask = prefmask; /* Success, size non-prefetch only. */
> > - else
> > - additional_mem_size += additional_mem_size;
> > - pbus_size_mem(bus, mask, IORESOURCE_MEM,
> > + additional_mem_size, realloc_head)) {
> > + /* Success, size non-pref64 only. */
> > + mask = prefmask;
> > + type2 = prefmask & ~IORESOURCE_MEM_64;
> > + type3 = prefmask & ~IORESOURCE_PREFETCH;
> > + }
> > + }
> > + if (!type2) {
> > + prefmask &= ~IORESOURCE_MEM_64;
> > + if (pbus_size_mem(bus, prefmask, prefmask,
> > + prefmask, prefmask,
> > + realloc_head ? 0 : additional_mem_size,
> > + additional_mem_size, realloc_head)) {
> > + /* Success, size non-prefetch only. */
> > + mask = prefmask;
> > + } else
> > + additional_mem_size += additional_mem_size;
> > + type2 = type3 = IORESOURCE_MEM;
> > + }
> > + pbus_size_mem(bus, mask, IORESOURCE_MEM, type2, type3,
> > realloc_head ? 0 : additional_mem_size,
> > additional_mem_size, realloc_head);
> > break;
> > @@ -1257,42 +1281,66 @@ static void __ref __pci_bridge_assign_resources(const struct pci_dev *bridge,
> > static void pci_bridge_release_resources(struct pci_bus *bus,
> > unsigned long type)
> > {
> > - int idx;
> > - bool changed = false;
> > - struct pci_dev *dev;
> > + struct pci_dev *dev = bus->self;
> > struct resource *r;
> > unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM |
> > - IORESOURCE_PREFETCH;
> > + IORESOURCE_PREFETCH | IORESOURCE_MEM_64;
> > + unsigned old_flags = 0;
> > + struct resource *b_res;
> > + int idx = 1;
> >
> > - dev = bus->self;
> > - for (idx = PCI_BRIDGE_RESOURCES; idx <= PCI_BRIDGE_RESOURCE_END;
> > - idx++) {
> > - r = &dev->resource[idx];
> > - if ((r->flags & type_mask) != type)
> > - continue;
> > - if (!r->parent)
> > - continue;
> > - /*
> > - * if there are children under that, we should release them
> > - * all
> > - */
> > - release_child_resources(r);
> > - if (!release_resource(r)) {
> > - dev_printk(KERN_DEBUG, &dev->dev,
> > - "resource %d %pR released\n", idx, r);
> > - /* keep the old size */
> > - r->end = resource_size(r) - 1;
> > - r->start = 0;
> > - r->flags = 0;
> > - changed = true;
> > - }
> > - }
> > + b_res = &dev->resource[PCI_BRIDGE_RESOURCES];
> > +
> > + /*
> > + * 1. if there is io port assign fail, will release bridge
> > + * io port.
> > + * 2. if there is non pref mmio assign fail, release bridge
> > + * nonpref mmio.
> > + * 3. if there is 64bit pref mmio assign fail, and bridge pref
> > + * is 64bit, release bridge pref mmio.
> > + * 4. if there is pref mmio assign fail, and bridge pref is
> > + * 32bit mmio, release bridge pref mmio
> > + * 5. if there is pref mmio assign fail, and bridge pref is not
> > + * assigned, release bridge nonpref mmio.
> > + */
> > + if (type & IORESOURCE_IO)
> > + idx = 0;
> > + else if (!(type & IORESOURCE_PREFETCH))
> > + idx = 1;
> > + else if ((type & IORESOURCE_MEM_64) &&
> > + (b_res[2].flags & IORESOURCE_MEM_64))
> > + idx = 2;
> > + else if (!(b_res[2].flags & IORESOURCE_MEM_64) &&
> > + (b_res[2].flags & IORESOURCE_PREFETCH))
> > + idx = 2;
> > + else
> > + idx = 1;
> > +
> > + r = &b_res[idx];
> > +
> > + if (!r->parent)
> > + return;
> > +
> > + /*
> > + * if there are children under that, we should release them
> > + * all
> > + */
> > + release_child_resources(r);
> > + if (!release_resource(r)) {
> > + type = old_flags = r->flags & type_mask;
> > + dev_printk(KERN_DEBUG, &dev->dev, "resource %d %pR released\n",
> > + PCI_BRIDGE_RESOURCES + idx, r);
> > + /* keep the old size */
> > + r->end = resource_size(r) - 1;
> > + r->start = 0;
> > + r->flags = 0;
> >
> > - if (changed) {
> > /* avoiding touch the one without PREF */
> > if (type & IORESOURCE_PREFETCH)
> > type = IORESOURCE_PREFETCH;
> > __pci_setup_bridge(bus, type);
> > + /* for next child res under same bridge */
> > + r->flags = old_flags;
> > }
> > }
> >
> > @@ -1471,7 +1519,7 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
> > LIST_HEAD(fail_head);
> > struct pci_dev_resource *fail_res;
> > unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM |
> > - IORESOURCE_PREFETCH;
> > + IORESOURCE_PREFETCH | IORESOURCE_MEM_64;
> > int pci_try_num = 1;
> > enum enable_type enable_local;
> >
> > diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
> > index 5c060b1..2c659e4 100644
> > --- a/drivers/pci/setup-res.c
> > +++ b/drivers/pci/setup-res.c
> > @@ -208,15 +208,31 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,
> >
> > /* First, try exact prefetching match.. */
> > ret = pci_bus_alloc_resource(bus, res, size, align, min,
> > - IORESOURCE_PREFETCH,
> > + IORESOURCE_PREFETCH | IORESOURCE_MEM_64,
> > pcibios_align_resource, dev);
> >
> > - if (ret < 0 && (res->flags & IORESOURCE_PREFETCH)) {
> > + if (ret < 0 &&
> > + (res->flags & (IORESOURCE_PREFETCH | IORESOURCE_MEM_64)) ==
> > + (IORESOURCE_PREFETCH | IORESOURCE_MEM_64)) {
> > + /*
> > + * That failed.
> > + *
> > + * Try below 4g pref
> > + */
> > + ret = pci_bus_alloc_resource(bus, res, size, align, min,
> > + IORESOURCE_PREFETCH,
> > + pcibios_align_resource, dev);
> > + }
> > +
> > + if (ret < 0 &&
> > + (res->flags & (IORESOURCE_PREFETCH | IORESOURCE_MEM_64))) {
> > /*
> > * That failed.
> > *
> > * But a prefetching area can handle a non-prefetching
> > * window (it will just not perform as well).
> > + *
> > + * Also can put 64bit under 32bit range. (below 4g).
> > */
> > ret = pci_bus_alloc_resource(bus, res, size, align, min, 0,
> > pcibios_align_resource, dev);
> > --
> > 1.8.4
> >
>
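
The five release cases listed in the pci_bridge_release_resources() comment of the quoted patch reduce to a choice among three bridge window indices. The sketch below copies that if/else chain into a standalone function so the cases can be checked one by one. The SK_* values and sk_release_idx() are hypothetical names; the indices follow the patch's use of b_res[0], b_res[1] and b_res[2] for the io, non-prefetchable and prefetchable windows.

```c
/* Illustrative stand-ins for the IORESOURCE_* type bits. */
#define SK_IO       0x1ul
#define SK_MEM      0x2ul
#define SK_PREFETCH 0x4ul
#define SK_MEM_64   0x8ul

/* Return which bridge window (0 = io, 1 = non-pref mmio, 2 = pref mmio)
 * would be released for a failed child resource of the given type, given
 * the flags of the bridge's prefetchable window. */
static int sk_release_idx(unsigned long type, unsigned long pref_win_flags)
{
	if (type & SK_IO)
		return 0;		/* case 1 */
	if (!(type & SK_PREFETCH))
		return 1;		/* case 2 */
	if ((type & SK_MEM_64) && (pref_win_flags & SK_MEM_64))
		return 2;		/* case 3 */
	if (!(pref_win_flags & SK_MEM_64) && (pref_win_flags & SK_PREFETCH))
		return 2;		/* case 4 */
	return 1;			/* case 5 and remaining combinations */
}
```

Note that the final branch also covers a 32-bit prefetchable child behind a 64-bit prefetchable window: its space came from the non-prefetchable window, so that is the window released.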
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-02-18 21:09 UTC | newest]
Thread overview: 10+ messages
2013-12-19 20:44 [PATCH v6 0/2] PCI: allocate 64bit mmio pref Yinghai Lu
2013-12-19 20:44 ` [PATCH v6 1/2] PCI: Try to allocate mem64 above 4G at first Yinghai Lu
2013-12-19 20:44 ` [PATCH v6 2/2] PCI: Try best to allocate pref mmio 64bit above 4g Yinghai Lu
2013-12-23 0:00 ` Bjorn Helgaas
2013-12-23 1:14 ` Yinghai Lu
2014-01-08 23:34 ` Yinghai Lu
2014-01-10 9:41 ` Guo Chao
2014-01-10 17:06 ` Yinghai Lu
2014-02-17 3:22 ` Guo Chao
2014-02-18 21:09 ` Bjorn Helgaas