public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86: split e820 reserved entries record to late
@ 2008-08-28 19:39 Yinghai Lu
  2008-08-28 20:05 ` Linus Torvalds
  0 siblings, 1 reply; 6+ messages in thread
From: Yinghai Lu @ 2008-08-28 19:39 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	Jesse Barnes, Linus Torvalds
  Cc: linux-kernel, Yinghai Lu

so could let BAR res register at first, or even pnp?

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

---
 arch/x86/kernel/e820.c |   20 ++++++++++++++++++--
 arch/x86/pci/i386.c    |    3 +++
 include/asm-x86/e820.h |    1 +
 3 files changed, 22 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1271,13 +1271,15 @@ static inline const char *e820_type_to_s
 /*
  * Mark e820 reserved areas as busy for the resource manager.
  */
+struct resource __initdata *e820_res;
 void __init e820_reserve_resources(void)
 {
 	int i;
-	struct resource *res;
 	u64 end;
+	struct resource *res;
 
 	res = alloc_bootmem_low(sizeof(struct resource) * e820.nr_map);
+	e820_res = res;
 	for (i = 0; i < e820.nr_map; i++) {
 		end = e820.map[i].addr + e820.map[i].size - 1;
 #ifndef CONFIG_RESOURCES_64BIT
@@ -1291,7 +1293,8 @@ void __init e820_reserve_resources(void)
 		res->end = end;
 
 		res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
-		insert_resource(&iomem_resource, res);
+		if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20))
+			insert_resource(&iomem_resource, res);
 		res++;
 	}
 
@@ -1303,6 +1306,19 @@ void __init e820_reserve_resources(void)
 	}
 }
 
+void __init e820_reserve_resources_late(void)
+{
+	int i;
+	struct resource *res;
+
+	res = e820_res;
+	for (i = 0; i < e820.nr_map; i++) {
+		if (e820.map[i].type == E820_RESERVED && res->start >= (1ULL<<20))
+			insert_resource(&iomem_resource, res);
+		res++;
+	}
+}
+
 char *__init default_machine_specific_memory_setup(void)
 {
 	char *who = "BIOS-e820";
Index: linux-2.6/arch/x86/pci/i386.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/i386.c
+++ linux-2.6/arch/x86/pci/i386.c
@@ -36,6 +36,7 @@
 #include <asm/pat.h>
 #include <asm/hpet.h>
 #include <asm/io_apic.h>
+#include <asm/e820.h>
 
 #include "pci.h"
 
@@ -297,6 +298,8 @@ static int __init pcibios_assign_resourc
 		}
 	}
 
+	e820_reserve_resources_late();
+
 	pci_assign_unassigned_resources();
 
 	return 0;
Index: linux-2.6/include/asm-x86/e820.h
===================================================================
--- linux-2.6.orig/include/asm-x86/e820.h
+++ linux-2.6/include/asm-x86/e820.h
@@ -122,6 +122,7 @@ extern void e820_register_active_regions
 extern u64 e820_hole_size(u64 start, u64 end);
 extern void finish_e820_parsing(void);
 extern void e820_reserve_resources(void);
+extern void e820_reserve_resources_late(void);
 extern void setup_memory_map(void);
 extern char *default_machine_specific_memory_setup(void);
 extern char *machine_specific_memory_setup(void);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86: split e820 reserved entries record to late
  2008-08-28 19:39 [PATCH] x86: split e820 reserved entries record to late Yinghai Lu
@ 2008-08-28 20:05 ` Linus Torvalds
  2008-08-28 20:19   ` Yinghai Lu
  2008-08-28 20:29   ` H. Peter Anvin
  0 siblings, 2 replies; 6+ messages in thread
From: Linus Torvalds @ 2008-08-28 20:05 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	Jesse Barnes, linux-kernel



On Thu, 28 Aug 2008, Yinghai Lu wrote:
>
> so could let BAR res register at first, or even pnp?

Well, I'm not sure whether PnP or e820 should be first, as long as any 
"real hardware" probing takes precedence over either. I _suspect_ that 
e820 is more trustworthy, which implies that PnP should probably be added 
last. It would be good to have some idea what Windows does, since usually 
all the firmware bugs are essentially hidden by whatever that other OS 
happens to do.

The basic rule really should be: "What do we trust most?" and probe things 
in that order. 

So e820 is fairly trustworthy, but we know that it will have various 
random things marked as reserved because they are special in some way (but 
we don't know _how_ they are special - they may well be real BAR's that 
just have a fixed meaning to ACPI or whatever).

But we obviously trust _part_ of it (the RAM stuff) more than we trust 
other parts. So it does make sense to consider that separately.

PnP I personally wouldn't trust at all, except as a way to keep dynamic 
resources away from those things, which is why I'd put it last. But that's 
just my personal gut feeling.

Hardware we generally trust more than any firmware, but even hardware can 
have bugs. And some classes of hardware tends to be less buggy than others 
(ie I'd trust some on-die APIC base pointer before I would trust a Cardbus 
controller BAR, for example).

But yes, I think your patch looks like it is definitely moving in the 
right direction. If this means that we can now do PCI probing without 
having the BAR's move around because they also happened to be covered by 
an e820 map, then that sounds like a good thing.

Of course, I bet there will be cases where this causes problems. It feels 
like we have _never_ worked around some PCI BAR allocation problem without 
hitting another unexpected one..

		Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86: split e820 reserved entries record to late
  2008-08-28 20:05 ` Linus Torvalds
@ 2008-08-28 20:19   ` Yinghai Lu
  2008-08-28 20:29   ` H. Peter Anvin
  1 sibling, 0 replies; 6+ messages in thread
From: Yinghai Lu @ 2008-08-28 20:19 UTC (permalink / raw)
  To: Linus Torvalds, Bjorn Helgaas
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
	Jesse Barnes, linux-kernel

On Thu, Aug 28, 2008 at 1:05 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Thu, 28 Aug 2008, Yinghai Lu wrote:
>>
>> so could let BAR res register at first, or even pnp?
>
> Well, I'm not sure whether PnP or e820 should be first, as long as any
> "real hardware" probing takes precedence over either. I _suspect_ that
> e820 is more trustworthy, which implies that PnP should probably be added
> last. It would be good to have some idea what Windows does, since usually
> all the firmware bugs are essentially hidden by whatever that other OS
> happens to do.
>
> The basic rule really should be: "What do we trust most?" and probe things
> in that order.
>
> So e820 is fairly trustworthy, but we know that it will have various
> random things marked as reserved because they are special in some way (but
> we don't know _how_ they are special - they may well be real BAR's that
> just have a fixed meaning to ACPI or whatever).
>
> But we obviously trust _part_ of it (the RAM stuff) more than we trust
> other parts. So it does make sense to consider that separately.
>
> PnP I personally wouldn't trust at all, except as a way to keep dynamic
> resources away from those things, which is why I'd put it last. But that's
> just my personal gut feeling.
>
> Hardware we generally trust more than any firmware, but even hardware can
> have bugs. And some classes of hardware tends to be less buggy than others
> (ie I'd trust some on-die APIC base pointer before I would trust a Cardbus
> controller BAR, for example).

ok, will move e820_reserve_resource_late to pcibios_resource_survey(),
so it is called vi pci_subsys_init before pnp_system_init

YH

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86: split e820 reserved entries record to late
  2008-08-28 20:05 ` Linus Torvalds
  2008-08-28 20:19   ` Yinghai Lu
@ 2008-08-28 20:29   ` H. Peter Anvin
  2008-08-28 20:38     ` Linus Torvalds
  1 sibling, 1 reply; 6+ messages in thread
From: H. Peter Anvin @ 2008-08-28 20:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Yinghai Lu, Ingo Molnar, Thomas Gleixner, Andrew Morton,
	Jesse Barnes, linux-kernel

Linus Torvalds wrote:
> 
> So e820 is fairly trustworthy, but we know that it will have various 
> random things marked as reserved because they are special in some way (but 
> we don't know _how_ they are special - they may well be real BAR's that 
> just have a fixed meaning to ACPI or whatever).
> 

My thinking is that if we run into a region which is reserved in e820 
but points to a real BAR, we would want to keep that BAR pinned, since a 
legitimate BIOS might use this mechanism to indicate that the device 
implemented by that BAR is used by SMM or ACPI.  If not, in most cases 
we will only have wasted some address space.  The sucky case, of course, 
would be an uninitialized BAR pointing into unusable address space which 
happens to be reserved in e820.  This seems very difficult to 
disambiguate from the above case through any algorithm that I can think of.

> Of course, I bet there will be cases where this causes problems. It feels 
> like we have _never_ worked around some PCI BAR allocation problem without 
> hitting another unexpected one..

I suspect that for any possible behaviour, there will be at least one 
system out there doing something broken for it :(

	-hpa

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86: split e820 reserved entries record to late
  2008-08-28 20:29   ` H. Peter Anvin
@ 2008-08-28 20:38     ` Linus Torvalds
  2008-08-28 20:45       ` H. Peter Anvin
  0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2008-08-28 20:38 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Ingo Molnar, Thomas Gleixner, Andrew Morton,
	Jesse Barnes, linux-kernel



On Thu, 28 Aug 2008, H. Peter Anvin wrote:
>
>     The sucky case, of course, would be an uninitialized BAR pointing 
> into unusable address space which happens to be reserved in e820.  This 
> seems very difficult to disambiguate from the above case through any 
> algorithm that I can think of.

Yeah, well, the good news is that it should be fairly rare. Any sane PCI 
device will come out of reset with IO and MEM disabled, and even if some 
crazy BIOS enables IO/MEM on it and activates the BAR's with some random 
content, I'm not seeing how that would work well with Windows either if it 
really was overlapping with some critical real other piece of hardware.

So I'd _assume_ that something like that would break Windows too, and thus 
not actually make it into a real product.

Maybe.

			Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86: split e820 reserved entries record to late
  2008-08-28 20:38     ` Linus Torvalds
@ 2008-08-28 20:45       ` H. Peter Anvin
  0 siblings, 0 replies; 6+ messages in thread
From: H. Peter Anvin @ 2008-08-28 20:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Yinghai Lu, Ingo Molnar, Thomas Gleixner, Andrew Morton,
	Jesse Barnes, linux-kernel

Linus Torvalds wrote:
> 
> On Thu, 28 Aug 2008, H. Peter Anvin wrote:
>>     The sucky case, of course, would be an uninitialized BAR pointing 
>> into unusable address space which happens to be reserved in e820.  This 
>> seems very difficult to disambiguate from the above case through any 
>> algorithm that I can think of.
> 
> Yeah, well, the good news is that it should be fairly rare. Any sane PCI 
> device will come out of reset with IO and MEM disabled, and even if some 
> crazy BIOS enables IO/MEM on it and activates the BAR's with some random 
> content, I'm not seeing how that would work well with Windows either if it 
> really was overlapping with some critical real other piece of hardware.
> 
> So I'd _assume_ that something like that would break Windows too, and thus 
> not actually make it into a real product.
> 

I believe you are right.

I think the key bit here is that if the IO or MEM bit is enabled, it's 
likely to be initialized.  The PCI specs say that BARs should come out 
of reset initialized to zero, but I know for a fact that at least 
several Broadcom chips don't -- however, the combination of BAR and 
enable bits should be enough of a sentry.

	-hpa

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2008-08-28 20:45 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-28 19:39 [PATCH] x86: split e820 reserved entries record to late Yinghai Lu
2008-08-28 20:05 ` Linus Torvalds
2008-08-28 20:19   ` Yinghai Lu
2008-08-28 20:29   ` H. Peter Anvin
2008-08-28 20:38     ` Linus Torvalds
2008-08-28 20:45       ` H. Peter Anvin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox