From: Bjorn Helgaas <bjorn.helgaas@hp.com>
To: Yinghai Lu <yinghai@kernel.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>,
Len Brown <lenb@kernel.org>,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-acpi@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Thomas Gleixner <tglx@linutronix.de>,
Linus Torvalds <torvalds@linux-foundation.org>,
Ingo Molnar <mingo@elte.hu>, Adam Belay <abelay@mit.edu>
Subject: Re: [PATCH 8/9] x86: avoid E820 regions when allocating address space
Date: Sun, 19 Dec 2010 16:33:25 -0700 [thread overview]
Message-ID: <20101219233325.GA2013@helgaas.com> (raw)
In-Reply-To: <AANLkTim68DbGeWbYKYx3_WOGFkA8KZFVHFyEFGtXt-cr@mail.gmail.com>
On Sun, Dec 19, 2010 at 01:50:50AM -0800, Yinghai Lu wrote:
> On Thu, Dec 16, 2010 at 9:38 AM, Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
> >
> > When we allocate address space, e.g., to assign it to a PCI device, don't
> > allocate anything mentioned in the BIOS E820 memory map.
> >
> > On recent machines (2008 and newer), we assign PCI resources from the
> > windows described by the ACPI PCI host bridge _CRS. On many Dell
> > machines, these windows overlap some E820 reserved areas, e.g.,
> >
> > BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved)
> > pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff]
> >
> > If we put devices at 0xbff00000, they don't work, probably because
> > that's really RAM, not I/O memory. This patch prevents that by removing
> > the 0xbfe4dc00-0xbfffffff area from the "available" resource.
> >
> > I'm not very happy with this solution because Windows solves the problem
> > differently (it seems to ignore E820 reserved areas and it allocates
> > top-down instead of bottom-up; details at comment 45 of the bugzilla
> > below). That means we're vulnerable to BIOS defects that Windows would not
> > trip over. For example, if BIOS described a device in ACPI but didn't
> > mention it in E820, Windows would work fine but Linux would fail.
> >
> > Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228
> > Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
> > ---
> >
> > arch/x86/kernel/resource.c | 38 +++++++++++++++++++++++++++++++++++++-
> > 1 files changed, 37 insertions(+), 1 deletions(-)
> >
> >
> > diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
> > index 407a900..89638af 100644
> > --- a/arch/x86/kernel/resource.c
> > +++ b/arch/x86/kernel/resource.c
> > @@ -1,11 +1,47 @@
> > #include <linux/ioport.h>
> > #include <asm/e820.h>
> >
> > +static void resource_clip(struct resource *res, resource_size_t start,
> > + resource_size_t end)
> > +{
> > + resource_size_t low = 0, high = 0;
> > +
> > + if (res->end < start || res->start > end)
> > + return; /* no conflict */
> > +
> > + if (res->start < start)
> > + low = start - res->start;
> > +
> > + if (res->end > end)
> > + high = res->end - end;
> > +
> > + /* Keep the area above or below the conflict, whichever is larger */
> > + if (low > high)
> > + res->end = start - 1;
> > + else
> > + res->start = end + 1;
> > +}
> > +
> > +static void remove_e820_regions(struct resource *avail)
> > +{
> > + int i;
> > + struct e820entry *entry;
> > +
> > + for (i = 0; i < e820.nr_map; i++) {
> > + entry = &e820.map[i];
> > +
> > + resource_clip(avail, entry->addr,
> > + entry->addr + entry->size - 1);
> > + }
> > +}
> > +
> > void arch_remove_reservations(struct resource *avail)
> > {
> > - /* Trim out BIOS area (low 1MB) */
> > + /* Trim out BIOS area (low 1MB) and E820 regions */
> > if (avail->flags & IORESOURCE_MEM) {
> > if (avail->start < BIOS_END)
> > avail->start = BIOS_END;
> > +
> > + remove_e820_regions(avail);
> > }
> > }
>
> that looks expensive. it will keep going through e820 tables...
It's expensive when we do it, but we do it very rarely, i.e., only
when we call allocate_resource().
> but e820 should have been reserved in resource tree...
E820 regions don't fit very well in the resource tree. The
tree normally contains devices, which are mutually exclusive
and fit nicely in a hierarchy.
The E820 RAM regions fit that description (they can't conflict
with anything else). But generic "reserved" regions do not.
A reserved region might cover several devices, it might cover
part of a device, it might cover a piece of RAM, or it might
not be related to a device at all. We do try to wedge those
reservations into the resource tree, but I think that's a
mistake because it doesn't work very well.
In this case, the E820 reservation *is* in the resource tree,
but we took the reasonable 0xbfe4dc00 - 0xc0000000 E820 entry
and turned it into the bogus 0xbfe4dc00-0xf7ffffff range:
bfe4dc00-f7ffffff reserved (expanded)
bff00000-f7ffffff PCI Bus 0000:00
More details here:
https://bugzilla.kernel.org/show_bug.cgi?id=16228#c45
Bjorn
next prev parent reply other threads:[~2010-12-19 23:33 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-12-16 17:38 [PATCH 0/9] PCI: revert to allocating bottom-up, avoid E820 areas Bjorn Helgaas
2010-12-16 17:38 ` [PATCH 1/9] Revert "PCI: fix pci_bus_alloc_resource() hang, prefer positive decode" Bjorn Helgaas
2010-12-16 17:38 ` [PATCH 2/9] Revert "x86: allocate space within a region top-down" Bjorn Helgaas
2010-12-16 17:38 ` [PATCH 3/9] Revert "x86/PCI: allocate space from the end of a region, not the beginning" Bjorn Helgaas
2010-12-16 17:38 ` [PATCH 4/9] Revert "PCI: allocate bus resources from the top down" Bjorn Helgaas
2010-12-16 17:38 ` [PATCH 5/9] Revert "resources: support allocating space within a region " Bjorn Helgaas
2010-12-16 17:38 ` [PATCH 6/9] resources: add arch hook for preventing allocation in reserved areas Bjorn Helgaas
2010-12-16 22:03 ` Linus Torvalds
2010-12-16 22:56 ` Bjorn Helgaas
2010-12-16 23:17 ` Linus Torvalds
2010-12-16 17:38 ` [PATCH 7/9] x86: avoid low BIOS area when allocating address space Bjorn Helgaas
2010-12-16 17:38 ` [PATCH 8/9] x86: avoid E820 regions " Bjorn Helgaas
2010-12-19 9:50 ` Yinghai Lu
2010-12-19 23:33 ` Bjorn Helgaas [this message]
2010-12-16 17:39 ` [PATCH 9/9] x86: avoid high BIOS area " Bjorn Helgaas
2010-12-16 21:59 ` [PATCH 0/9] PCI: revert to allocating bottom-up, avoid E820 areas Linus Torvalds
2010-12-16 22:03 ` Jesse Barnes
2010-12-17 3:00 ` H. Peter Anvin
2010-12-17 16:00 ` Bjorn Helgaas
2010-12-19 14:31 ` David John
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101219233325.GA2013@helgaas.com \
--to=bjorn.helgaas@hp.com \
--cc=abelay@mit.edu \
--cc=hpa@zytor.com \
--cc=jbarnes@virtuousgeek.org \
--cc=lenb@kernel.org \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=yinghai@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox