* [PATCH v3 1/6] resources: ensure alignment callback doesn't allocate below available start
2010-10-13 16:15 [PATCH v3 0/6] PCI: allocate space top-down, not bottom-up Bjorn Helgaas
@ 2010-10-13 16:15 ` Bjorn Helgaas
2010-10-13 16:15 ` [PATCH v3 2/6] resources: allocate space within a region from the top down Bjorn Helgaas
` (4 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2010-10-13 16:15 UTC (permalink / raw)
To: Jesse Barnes
Cc: Bob Picco, Brian Bloniarz, Charles Butterfield, Denys Vlasenko,
linux-pci, Horst H. von Brand, linux-kernel, Stefan Becker,
H. Peter Anvin, Yinghai Lu, Thomas Gleixner, Linus Torvalds,
Ingo Molnar
The alignment callback returns a proposed location, which may have been
adjusted to avoid ISA aliases or for other architecture-specific reasons.
We already had a check ("tmp.start < tmp.end") to make sure the callback
doesn't return a location above the available area.
This patch adds a check to make sure the callback doesn't return something
*below* the available area, as may happen if the callback tries to allocate
top-down.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
---
kernel/resource.c | 10 ++++++++--
1 files changed, 8 insertions(+), 2 deletions(-)
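
For illustration, the new check can be modeled outside the kernel roughly
as follows. This is a minimal user-space sketch, not the kernel code:
res_size_t and checked_start() are hypothetical names standing in for
resource_size_t and the logic added inline to find_resource().

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's resource_size_t. */
typedef uint64_t res_size_t;

/*
 * Accept the callback's proposed start only if it lies inside the
 * available window [avail_start, avail_end].  Otherwise return
 * avail_end, so that a later "start < end" test rejects this window.
 */
res_size_t checked_start(res_size_t avail_start, res_size_t avail_end,
			 res_size_t proposed)
{
	if (avail_start <= proposed && proposed <= avail_end)
		return proposed;
	return avail_end;
}
```

A proposal inside the window is kept as-is; one below or above it
collapses to the window end, which makes the existing size check fail.
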
diff --git a/kernel/resource.c b/kernel/resource.c
index 7b36976..ace2269 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -371,6 +371,7 @@ static int find_resource(struct resource *root, struct resource *new,
 {
 	struct resource *this = root->child;
 	struct resource tmp = *new;
+	resource_size_t start;
 
 	tmp.start = root->start;
 	/*
@@ -391,8 +392,13 @@ static int find_resource(struct resource *root, struct resource *new,
 		if (tmp.end > max)
 			tmp.end = max;
 		tmp.start = ALIGN(tmp.start, align);
-		if (alignf)
-			tmp.start = alignf(alignf_data, &tmp, size, align);
+		if (alignf) {
+			start = alignf(alignf_data, &tmp, size, align);
+			if (tmp.start <= start && start <= tmp.end)
+				tmp.start = start;
+			else
+				tmp.start = tmp.end;
+		}
 		if (tmp.start < tmp.end && tmp.end - tmp.start >= size - 1) {
 			new->start = tmp.start;
 			new->end = tmp.start + size - 1;
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v3 2/6] resources: allocate space within a region from the top down
2010-10-13 16:15 [PATCH v3 0/6] PCI: allocate space top-down, not bottom-up Bjorn Helgaas
2010-10-13 16:15 ` [PATCH v3 1/6] resources: ensure alignment callback doesn't allocate below available start Bjorn Helgaas
@ 2010-10-13 16:15 ` Bjorn Helgaas
[not found] ` <AANLkTinJqohDTjDLkBn84e9zn80opZss7kX_MPnoX4vd@mail.gmail.com>
2010-10-13 16:15 ` [PATCH v3 3/6] PCI: allocate bus resources " Bjorn Helgaas
` (3 subsequent siblings)
5 siblings, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2010-10-13 16:15 UTC (permalink / raw)
To: Jesse Barnes
Cc: Bob Picco, Brian Bloniarz, Charles Butterfield, Denys Vlasenko,
linux-pci, Horst H. von Brand, linux-kernel, Stefan Becker,
H. Peter Anvin, Yinghai Lu, Thomas Gleixner, Linus Torvalds,
Ingo Molnar
Allocate space from the top of a region first, then work downward,
if an architecture desires this.
I think it's too dangerous to do this across the board, because
iomem_resource.end is initialized to -1, which is 0xffffffff_ffffffff
on 64-bit architectures, and most machines can't address the entire
64-bit physical address space. So this patch is only effective if the
architecture defines ARCH_HAS_TOP_DOWN_ALLOC.
When we allocate space from a resource, we look for gaps between children
of the resource. Previously, we always looked at gaps from the bottom up.
For example, given this:
[mem 0xbff00000-0xf7ffffff] PCI Bus 0000:00
[mem 0xc0000000-0xdfffffff] PCI Bus 0000:02
we attempted to allocate from the [mem 0xbff00000-0xbfffffff] gap first,
then the [mem 0xe0000000-0xf7ffffff] gap.
With this patch, if the architecture defines ARCH_HAS_TOP_DOWN_ALLOC,
we allocate from [mem 0xe0000000-0xf7ffffff] first.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
---
kernel/resource.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 70 insertions(+), 0 deletions(-)
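
The gap walk described above can be sketched in user space as follows.
This is only a model under stated assumptions: struct range and
find_gap_top_down() are hypothetical, the children are given as a sorted
array instead of a sibling list, and the sketch returns the whole gap
rather than applying min/max, alignment, or the alignf callback.

```c
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };	/* inclusive bounds */

/*
 * Walk the gaps between the children of "root" from the highest address
 * down and report the first gap that can hold "size" bytes.  "kids" must
 * be sorted by ascending address, non-overlapping, and inside "root".
 * Returns 0 and fills *out on success, -1 if no gap is large enough.
 */
int find_gap_top_down(struct range root, const struct range *kids,
		      size_t nkids, uint64_t size, struct range *out)
{
	uint64_t gap_end = root.end;
	size_t i = nkids;	/* kids[i - 1] is the child just below the gap */

	if (size == 0)
		return -1;

	for (;;) {
		/* No room if the child below reaches the top of the gap */
		int empty = i > 0 && kids[i - 1].end >= gap_end;
		uint64_t gap_start = i > 0 ? kids[i - 1].end + 1 : root.start;

		if (!empty && gap_end - gap_start >= size - 1) {
			out->start = gap_start;
			out->end = gap_end;
			return 0;
		}
		if (i == 0 || kids[i - 1].start == root.start)
			return -1;
		gap_end = kids[i - 1].start - 1;	/* next gap down */
		i--;
	}
}
```

With the example above (root [mem 0xbff00000-0xf7ffffff], one child
[mem 0xc0000000-0xdfffffff]), a 0x1000-byte request finds the
[mem 0xe0000000-0xf7ffffff] gap first. The [mem 0xbff00000-0xbfffffff]
gap is only considered when the upper gap is too small or empty.
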
diff --git a/kernel/resource.c b/kernel/resource.c
index ace2269..9218e8e 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -357,8 +357,77 @@ int __weak page_is_ram(unsigned long pfn)
 	return walk_system_ram_range(pfn, 1, NULL, __is_ram) == 1;
 }
 
+#ifdef ARCH_HAS_TOP_DOWN_ALLOC
+/*
+ * Find the resource before "child" in the sibling list of "root" children.
+ */
+static struct resource *find_sibling_prev(struct resource *root, struct resource *child)
+{
+	struct resource *this;
+
+	for (this = root->child; this; this = this->sibling)
+		if (this->sibling == child)
+			return this;
+
+	return NULL;
+}
+
 /*
  * Find empty slot in the resource tree given range and alignment.
+ * This version allocates from the end of the root resource first.
+ */
+static int find_resource(struct resource *root, struct resource *new,
+			 resource_size_t size, resource_size_t min,
+			 resource_size_t max, resource_size_t align,
+			 resource_size_t (*alignf)(void *,
+						   const struct resource *,
+						   resource_size_t,
+						   resource_size_t),
+			 void *alignf_data)
+{
+	struct resource *this;
+	struct resource tmp = *new;
+	resource_size_t start;
+
+	tmp.start = root->end;
+	tmp.end = root->end;
+
+	this = find_sibling_prev(root, NULL);
+	for (;;) {
+		if (this && this->end < root->end)
+			tmp.start = this->end + 1;
+		else
+			tmp.start = root->start;
+		if (tmp.start < min)
+			tmp.start = min;
+		if (tmp.end > max)
+			tmp.end = max;
+		tmp.start = ALIGN(tmp.start, align);
+		if (alignf) {
+			start = alignf(alignf_data, &tmp, size, align);
+			if (tmp.start <= start && start <= tmp.end)
+				tmp.start = start;
+			else
+				tmp.start = tmp.end;
+		}
+		if (tmp.start < tmp.end && tmp.end - tmp.start >= size - 1) {
+			new->start = tmp.start;
+			new->end = tmp.start + size - 1;
+			return 0;
+		}
+		if (!this || this->start == root->start)
+			break;
+		tmp.end = this->start - 1;
+		this = find_sibling_prev(root, this);
+	}
+	return -EBUSY;
+}
+
+#else
+
+/*
+ * Find empty slot in the resource tree given range and alignment.
+ * This version allocates from the beginning of the root resource first.
  */
 static int find_resource(struct resource *root, struct resource *new,
 			 resource_size_t size, resource_size_t min,
@@ -411,6 +480,7 @@ static int find_resource(struct resource *root, struct resource *new,
 	}
 	return -EBUSY;
 }
+#endif
 
 /**
  * allocate_resource - allocate empty slot in the resource tree given range & alignment
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v3 3/6] PCI: allocate bus resources from the top down
2010-10-13 16:15 [PATCH v3 0/6] PCI: allocate space top-down, not bottom-up Bjorn Helgaas
2010-10-13 16:15 ` [PATCH v3 1/6] resources: ensure alignment callback doesn't allocate below available start Bjorn Helgaas
2010-10-13 16:15 ` [PATCH v3 2/6] resources: allocate space within a region from the top down Bjorn Helgaas
@ 2010-10-13 16:15 ` Bjorn Helgaas
2010-10-13 16:15 ` [PATCH v3 4/6] x86/PCI: allocate space from the end of a region, not the beginning Bjorn Helgaas
` (2 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2010-10-13 16:15 UTC (permalink / raw)
To: Jesse Barnes
Cc: Bob Picco, Brian Bloniarz, Charles Butterfield, Denys Vlasenko,
linux-pci, Horst H. von Brand, linux-kernel, Stefan Becker,
H. Peter Anvin, Yinghai Lu, Thomas Gleixner, Linus Torvalds,
Ingo Molnar
Allocate space from the highest-address PCI bus resource first, then work
downward.
Previously, we looked for space in PCI host bridge windows in the order
we discovered the windows. For example, given the following windows
(discovered via an ACPI _CRS method):
pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
pci_root PNP0A03:00: host bridge window [mem 0x000c0000-0x000effff]
pci_root PNP0A03:00: host bridge window [mem 0x000f0000-0x000fffff]
pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xf7ffffff]
pci_root PNP0A03:00: host bridge window [mem 0xff980000-0xff980fff]
pci_root PNP0A03:00: host bridge window [mem 0xff97c000-0xff97ffff]
pci_root PNP0A03:00: host bridge window [mem 0xfed20000-0xfed9ffff]
we attempted to allocate from [mem 0x000a0000-0x000bffff] first, then
[mem 0x000c0000-0x000effff], and so on.
With this patch, we allocate from [mem 0xff980000-0xff980fff] first, then
[mem 0xff97c000-0xff97ffff], [mem 0xfed20000-0xfed9ffff], etc.
Allocating top-down follows Windows practice, so we're less likely to
trip over BIOS defects in the _CRS description.
On the machine above (a Dell T3500), the [mem 0xbff00000-0xbfffffff] region
doesn't actually work and is likely a BIOS defect. The symptom is that we
move the AHCI controller to 0xbff00000, which leads to "Boot has failed,
sleeping forever," a BUG in ahci_stop_engine(), or some other boot failure.
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c43
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=620313
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=629933
Reported-by: Brian Bloniarz <phunge0@hotmail.com>
Reported-and-tested-by: Stefan Becker <chemobejk@gmail.com>
Reported-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
---
drivers/pci/bus.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++-----
1 files changed, 48 insertions(+), 5 deletions(-)
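
The visiting order this produces can be illustrated with a small
user-space sketch. Note the swap in technique: the patch rescans the bus
resources for the next-highest window on every step, while this sketch
simply sorts a copy of the windows with qsort(). struct window and both
function names are hypothetical, and resource type matching is left out.

```c
#include <stdint.h>
#include <stdlib.h>

struct window { uint64_t start, end; };	/* inclusive bounds */

/*
 * Order windows highest first: the larger end address wins; if two
 * windows end at the same address, the smaller one (larger start) is
 * treated as higher.
 */
static int cmp_highest_first(const void *a, const void *b)
{
	const struct window *x = a, *y = b;

	if (x->end != y->end)
		return x->end > y->end ? -1 : 1;
	if (x->start != y->start)
		return x->start > y->start ? -1 : 1;
	return 0;
}

/* Sort the windows in place into top-down allocation order. */
void sort_windows_top_down(struct window *w, size_t n)
{
	qsort(w, n, sizeof(*w), cmp_highest_first);
}
```

Applied to the seven windows listed above, the order becomes
0xff980000, 0xff97c000, 0xfed20000, 0xbff00000, 0x000f0000, 0x000c0000,
0x000a0000 (by window start), which matches the order described in the
changelog.
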
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 7f0af0e..172bf26 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -64,6 +64,49 @@ void pci_bus_remove_resources(struct pci_bus *bus)
 	}
 }
 
+/*
+ * Find the highest-address bus resource below the cursor "res". If the
+ * cursor is NULL, return the highest resource.
+ */
+static struct resource *pci_bus_find_resource_prev(struct pci_bus *bus,
+						   unsigned int type,
+						   struct resource *res)
+{
+	struct resource *r, *prev = NULL;
+	int i;
+
+	pci_bus_for_each_resource(bus, r, i) {
+		if (!r)
+			continue;
+
+		if ((r->flags & IORESOURCE_TYPE_BITS) != type)
+			continue;
+
+		/* If this resource is at or past the cursor, skip it */
+		if (res) {
+			if (r == res)
+				continue;
+			if (r->end > res->end)
+				continue;
+			if (r->end == res->end && r->start > res->start)
+				continue;
+		}
+
+		if (!prev)
+			prev = r;
+
+		/*
+		 * A small resource is higher than a large one that ends at
+		 * the same address.
+		 */
+		if (r->end > prev->end ||
+		    (r->end == prev->end && r->start > prev->start))
+			prev = r;
+	}
+
+	return prev;
+}
+
 /**
  * pci_bus_alloc_resource - allocate a resource from a parent bus
  * @bus: PCI bus
@@ -89,9 +132,10 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
 					  resource_size_t),
 		void *alignf_data)
 {
-	int i, ret = -ENOMEM;
+	int ret = -ENOMEM;
 	struct resource *r;
 	resource_size_t max = -1;
+	unsigned int type = res->flags & IORESOURCE_TYPE_BITS;
 
 	type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
 
@@ -99,10 +143,9 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
 	if (!(res->flags & IORESOURCE_MEM_64))
 		max = PCIBIOS_MAX_MEM_32;
 
-	pci_bus_for_each_resource(bus, r, i) {
-		if (!r)
-			continue;
-
+	/* Look for space at highest addresses first */
+	r = pci_bus_find_resource_prev(bus, type, NULL);
+	for ( ; r; r = pci_bus_find_resource_prev(bus, type, r)) {
 		/* type_mask must match */
 		if ((res->flags ^ r->flags) & type_mask)
 			continue;
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v3 4/6] x86/PCI: allocate space from the end of a region, not the beginning
2010-10-13 16:15 [PATCH v3 0/6] PCI: allocate space top-down, not bottom-up Bjorn Helgaas
` (2 preceding siblings ...)
2010-10-13 16:15 ` [PATCH v3 3/6] PCI: allocate bus resources " Bjorn Helgaas
@ 2010-10-13 16:15 ` Bjorn Helgaas
2010-10-13 21:27 ` H. Peter Anvin
2010-10-13 16:15 ` [PATCH v3 5/6] x86: update iomem_resource end based on CPU physical address capabilities Bjorn Helgaas
2010-10-13 16:15 ` [PATCH v3 6/6] x86: allocate space within a region top-down Bjorn Helgaas
5 siblings, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2010-10-13 16:15 UTC (permalink / raw)
To: Jesse Barnes
Cc: Bob Picco, Brian Bloniarz, Charles Butterfield, Denys Vlasenko,
linux-pci, Horst H. von Brand, linux-kernel, Stefan Becker,
H. Peter Anvin, Yinghai Lu, Thomas Gleixner, Linus Torvalds,
Ingo Molnar
Allocate from the end of a region, not the beginning.
For example, if we need to allocate 0x800 bytes for a device on bus
0000:00 given these resources:
[mem 0xbff00000-0xdfffffff] PCI Bus 0000:00
[mem 0xc0000000-0xdfffffff] PCI Bus 0000:02
the available space at [mem 0xbff00000-0xbfffffff] is passed to the
alignment callback (pcibios_align_resource()). Prior to this patch, we
would put the new 0x800 byte resource at the beginning of that available
space, i.e., at [mem 0xbff00000-0xbff007ff].
With this patch, we put it at the end, at [mem 0xbffff800-0xbfffffff].
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c41
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
---
arch/x86/pci/i386.c | 18 ++++++++++++------
1 files changed, 12 insertions(+), 6 deletions(-)
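
The arithmetic in the example above can be checked with a small
user-space sketch. The helper names align_down(), top_down_start() and
avoid_isa_alias() are hypothetical; they only mirror the expressions used
in pcibios_align_resource() and assume that align is a power of two and
that size is at least 1 and no larger than avail_end + 1.

```c
#include <stdint.h>

/* Round x down to a multiple of align (align must be a power of two). */
uint64_t align_down(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/*
 * Highest aligned start such that [start, start + size - 1] still ends
 * at or below avail_end.  The result may fall below the start of the
 * available space; the caller has to check for that.
 */
uint64_t top_down_start(uint64_t avail_end, uint64_t size, uint64_t align)
{
	return align_down(avail_end - size + 1, align);
}

/*
 * Clear mask 0x300 (bits 8 and 9 when counting from zero) so an I/O
 * port start lands in the first 256 bytes of its 1K block.
 */
uint64_t avoid_isa_alias(uint64_t start)
{
	return start & ~(uint64_t)0x300;
}
```

For the available space [mem 0xbff00000-0xbfffffff], a size of 0x800 and
an alignment of 0x800, top_down_start(0xbfffffff, 0x800, 0x800) yields
0xbffff800, i.e., [mem 0xbffff800-0xbfffffff]. If the computed start ends
up below the available start, the check added to find_resource() in
patch 1/6 rejects it.
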
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 5525309..fe866c8 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -37,6 +37,7 @@
 #include <asm/pci_x86.h>
 #include <asm/io_apic.h>
 
+#define ALIGN_DOWN(x, a) ((x) & ~(a - 1))
 
 static int
 skip_isa_ioresource_align(struct pci_dev *dev) {
@@ -65,16 +66,21 @@ pcibios_align_resource(void *data, const struct resource *res,
 			resource_size_t size, resource_size_t align)
 {
 	struct pci_dev *dev = data;
-	resource_size_t start = res->start;
+	resource_size_t start = ALIGN_DOWN(res->end - size + 1, align);
 
 	if (res->flags & IORESOURCE_IO) {
-		if (skip_isa_ioresource_align(dev))
-			return start;
-		if (start & 0x300)
-			start = (start + 0x3ff) & ~0x3ff;
+
+		/*
+		 * If we're avoiding ISA aliases, the largest contiguous I/O
+		 * port space is 256 bytes. Clearing bits 9 and 10 preserves
+		 * all 256-byte and smaller alignments, so the result will
+		 * still be correctly aligned.
+		 */
+		if (!skip_isa_ioresource_align(dev))
+			start &= ~0x300;
 	} else if (res->flags & IORESOURCE_MEM) {
 		if (start < BIOS_END)
-			start = BIOS_END;
+			start = res->end; /* fail; no space */
 	}
 	return start;
 }
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 4/6] x86/PCI: allocate space from the end of a region, not the beginning
2010-10-13 16:15 ` [PATCH v3 4/6] x86/PCI: allocate space from the end of a region, not the beginning Bjorn Helgaas
@ 2010-10-13 21:27 ` H. Peter Anvin
2010-10-14 14:39 ` Bjorn Helgaas
0 siblings, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2010-10-13 21:27 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Jesse Barnes, Bob Picco, Brian Bloniarz, Charles Butterfield,
Denys Vlasenko, linux-pci, Horst H. von Brand, linux-kernel,
Stefan Becker, Yinghai Lu, Thomas Gleixner, Linus Torvalds,
Ingo Molnar
On 10/13/2010 09:15 AM, Bjorn Helgaas wrote:
>
> +#define ALIGN_DOWN(x, a) ((x) & ~(a - 1))
> - resource_size_t start = res->start;
> + resource_size_t start = ALIGN_DOWN(res->end - size + 1, align);
>
Use round_down() here instead of inventing yet another macro?
In all other respects:
Reviewed-by: H. Peter Anvin <hpa@linux.intel.com>
-hpa
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 4/6] x86/PCI: allocate space from the end of a region, not the beginning
2010-10-13 21:27 ` H. Peter Anvin
@ 2010-10-14 14:39 ` Bjorn Helgaas
0 siblings, 0 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2010-10-14 14:39 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Jesse Barnes, Bob Picco, Brian Bloniarz, Charles Butterfield,
Denys Vlasenko, linux-pci, Horst H. von Brand, linux-kernel,
Stefan Becker, Yinghai Lu, Thomas Gleixner, Linus Torvalds,
Ingo Molnar
On Wednesday, October 13, 2010 03:27:58 pm H. Peter Anvin wrote:
> On 10/13/2010 09:15 AM, Bjorn Helgaas wrote:
> >
> > +#define ALIGN_DOWN(x, a) ((x) & ~(a - 1))
> > - resource_size_t start = res->start;
> > + resource_size_t start = ALIGN_DOWN(res->end - size + 1, align);
> >
>
> Use round_down() here instead of inventing yet another macro?
Thanks! I searched for something, but I missed round_down().
Bjorn
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 5/6] x86: update iomem_resource end based on CPU physical address capabilities
2010-10-13 16:15 [PATCH v3 0/6] PCI: allocate space top-down, not bottom-up Bjorn Helgaas
` (3 preceding siblings ...)
2010-10-13 16:15 ` [PATCH v3 4/6] x86/PCI: allocate space from the end of a region, not the beginning Bjorn Helgaas
@ 2010-10-13 16:15 ` Bjorn Helgaas
2010-10-13 16:15 ` [PATCH v3 6/6] x86: allocate space within a region top-down Bjorn Helgaas
5 siblings, 0 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2010-10-13 16:15 UTC (permalink / raw)
To: Jesse Barnes
Cc: Bob Picco, Brian Bloniarz, Charles Butterfield, Denys Vlasenko,
linux-pci, Horst H. von Brand, linux-kernel, Stefan Becker,
H. Peter Anvin, Yinghai Lu, Thomas Gleixner, Linus Torvalds,
Ingo Molnar
The iomem_resource map reflects the available physical address space.
We statically initialize the end to -1, i.e., 0xffffffff_ffffffff, but
of course we can only use as much as the CPU can address.
This patch updates the end based on the CPU capabilities, so we don't
mistakenly allocate space that isn't usable, as we're likely to do when
allocating from the top-down.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
---
arch/x86/kernel/setup.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
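
As a quick illustration of the limit computed here, the following
user-space sketch evaluates the same expression for a given number of
physical address bits. max_phys_addr() is a hypothetical helper, and the
bit counts used below are example values, not taken from any particular
CPU.

```c
#include <stdint.h>

/*
 * Highest physical address for a CPU with "phys_bits" address bits.
 * Only valid for 1 <= phys_bits <= 63; shifting a 64-bit value by 64
 * is undefined in C.
 */
uint64_t max_phys_addr(unsigned int phys_bits)
{
	return (1ULL << phys_bits) - 1;
}
```

For example, 36 address bits give an end of 0xfffffffff (64 GB - 1),
compared with the static default of 0xffffffff_ffffffff.
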
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c3a4fbb..922b5a1 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -788,6 +788,7 @@ void __init setup_arch(char **cmdline_p)
 
 	x86_init.oem.arch_setup();
 
+	iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
 	setup_memory_map();
 	parse_setup_data();
 	/* update the e820_saved too */
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v3 6/6] x86: allocate space within a region top-down
2010-10-13 16:15 [PATCH v3 0/6] PCI: allocate space top-down, not bottom-up Bjorn Helgaas
` (4 preceding siblings ...)
2010-10-13 16:15 ` [PATCH v3 5/6] x86: update iomem_resource end based on CPU physical address capabilities Bjorn Helgaas
@ 2010-10-13 16:15 ` Bjorn Helgaas
5 siblings, 0 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2010-10-13 16:15 UTC (permalink / raw)
To: Jesse Barnes
Cc: Bob Picco, Brian Bloniarz, Charles Butterfield, Denys Vlasenko,
linux-pci, Horst H. von Brand, linux-kernel, Stefan Becker,
H. Peter Anvin, Yinghai Lu, Thomas Gleixner, Linus Torvalds,
Ingo Molnar
Request that allocate_resource() use available space from high addresses
first, rather than the default of using low addresses first.
The most common place this makes a difference is when we move or assign
new PCI device resources. Low addresses are generally scarce, so it's
better to use high addresses when possible. This follows Windows practice
for PCI allocation.
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c42
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
---
arch/x86/include/asm/io.h | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 30a3e97..6cc02c4 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -35,6 +35,7 @@
  */
 
 #define ARCH_HAS_IOREMAP_WC
+#define ARCH_HAS_TOP_DOWN_ALLOC
 
 #include <linux/string.h>
 #include <linux/compiler.h>
^ permalink raw reply related [flat|nested] 10+ messages in thread