* [PATCH] x86: split e820 reserved entries record to late v4
@ 2008-08-29 1:29 Yinghai Lu
2008-08-29 6:30 ` Ingo Molnar
0 siblings, 1 reply; 13+ messages in thread
From: Yinghai Lu @ 2008-08-29 1:29 UTC (permalink / raw)
To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
Jesse Barnes, Linus Torvalds
Cc: linux-kernel, Yinghai Lu
Linus said we should register some entries in e820 later,
so could let BAR res register at first, or even pnp?
this one replace
| commit a2bd7274b47124d2fc4dfdb8c0591f545ba749dd
| Author: Yinghai Lu <yhlu.kernel@gmail.com>
| Date: Mon Aug 25 00:56:08 2008 -0700
|
| x86: fix HPET regression in 2.6.26 versus 2.6.25, check hpet against BAR, v3
v2: insert e820 reserve resources before pnp_system_init
v3: fix merging problem in tip/x86/core
please drop the one in tip/x86/core use this one instead
v4: address Linus's review about comments and condition in _late()
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
---
arch/x86/kernel/e820.c | 24 +++++++++++++++++++++++-
arch/x86/pci/i386.c | 3 +++
include/asm-x86/e820.h | 1 +
3 files changed, 27 insertions(+), 1 deletion(-)
Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1271,6 +1271,7 @@ static inline const char *e820_type_to_s
/*
* Mark e820 reserved areas as busy for the resource manager.
*/
+static struct resource __initdata *e820_res;
void __init e820_reserve_resources(void)
{
int i;
@@ -1278,6 +1279,7 @@ void __init e820_reserve_resources(void)
u64 end;
res = alloc_bootmem_low(sizeof(struct resource) * e820.nr_map);
+ e820_res = res;
for (i = 0; i < e820.nr_map; i++) {
end = e820.map[i].addr + e820.map[i].size - 1;
#ifndef CONFIG_RESOURCES_64BIT
@@ -1291,7 +1293,14 @@ void __init e820_reserve_resources(void)
res->end = end;
res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
- insert_resource(&iomem_resource, res);
+
+ /*
+ * don't register the region that could be conflicted with
+ * pci device BAR resource and insert them later in
+ * pcibios_resource_survey()
+ */
+ if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20))
+ insert_resource(&iomem_resource, res);
res++;
}
@@ -1303,6 +1312,19 @@ void __init e820_reserve_resources(void)
}
}
+void __init e820_reserve_resources_late(void)
+{
+ int i;
+ struct resource *res;
+
+ res = e820_res;
+ for (i = 0; i < e820.nr_map; i++) {
+ if (!res->parent && res->end)
+ insert_resource(&iomem_resource, res);
+ res++;
+ }
+}
+
char *__init default_machine_specific_memory_setup(void)
{
char *who = "BIOS-e820";
Index: linux-2.6/arch/x86/pci/i386.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/i386.c
+++ linux-2.6/arch/x86/pci/i386.c
@@ -33,6 +33,7 @@
#include <linux/bootmem.h>
#include <asm/pat.h>
+#include <asm/e820.h>
#include "pci.h"
@@ -230,6 +231,8 @@ void __init pcibios_resource_survey(void
pcibios_allocate_bus_resources(&pci_root_buses);
pcibios_allocate_resources(0);
pcibios_allocate_resources(1);
+
+ e820_reserve_resources_late();
}
/**
Index: linux-2.6/include/asm-x86/e820.h
===================================================================
--- linux-2.6.orig/include/asm-x86/e820.h
+++ linux-2.6/include/asm-x86/e820.h
@@ -122,6 +122,7 @@ extern void e820_register_active_regions
extern u64 e820_hole_size(u64 start, u64 end);
extern void finish_e820_parsing(void);
extern void e820_reserve_resources(void);
+extern void e820_reserve_resources_late(void);
extern void setup_memory_map(void);
extern char *default_machine_specific_memory_setup(void);
extern char *machine_specific_memory_setup(void);
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] x86: split e820 reserved entries record to late v4
2008-08-29 1:29 Yinghai Lu
@ 2008-08-29 6:30 ` Ingo Molnar
2008-08-29 6:47 ` Yinghai Lu
2008-08-29 15:26 ` Linus Torvalds
0 siblings, 2 replies; 13+ messages in thread
From: Ingo Molnar @ 2008-08-29 6:30 UTC (permalink / raw)
To: Yinghai Lu
Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, Jesse Barnes,
Linus Torvalds, linux-kernel
* Yinghai Lu <yhlu.kernel@gmail.com> wrote:
> Linus said we should register some entries in e820 later,
> so could let BAR res register at first, or even pnp?
>
> this one replace
> | commit a2bd7274b47124d2fc4dfdb8c0591f545ba749dd
> | Author: Yinghai Lu <yhlu.kernel@gmail.com>
> | Date: Mon Aug 25 00:56:08 2008 -0700
> |
> | x86: fix HPET regression in 2.6.26 versus 2.6.25, check hpet against BAR, v3
>
> v2: insert e820 reserve resources before pnp_system_init
> v3: fix merging problem in tip/x86/core
> please drop the one in tip/x86/core use this one instead
> v4: address Linus's review about comments and condition in _late()
>
> Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
applied to tip/x86/core, thanks.
Let me outline the issue that i raised before:
> + if (!res->parent && res->end)
> + insert_resource(&iomem_resource, res);
what if this insertion fails due to partial overlap? Right now we drop
it silently - which might be fine for most systems, but have a look on
the specific system that had the hpet regression, there we have these
reserved e820 entries:
BIOS-e820: 0000000077ff0000 - 0000000078000000 (reserved)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
which overlaps with the chipset PCI BAR (hpet) resource:
pci 0000:00:14.0: BAR has HPET at fed00000-fed003ff
so due to this 1K conflict we take the full e820-reserved entry out and
give the range 0xfec00000-0x100000000 as 'free'.
And that failure to register can cause problems. In this case that
'reserved' e820 entry definitely has real meaning, both the local APIC
and the IO-APIC is in that range:
ACPI: Local APIC address 0xfee00000
IOAPIC[0]: apic_id 2, version 0, address 0xfec00000, GSI 0-23
Which might still be OK for all memory resources we happen to enumerate
- but we dont necessarily enumerate all of them when we have e.g. an UP
kernel, and we will definitely not enumerate any 'hidden' state a system
might have there. (SMM, etc.) If we then allocate a dynamic PCI resource
into that range later on (thinking it's "free" but in reality it's
claimed) we get a crash or worse.
So my worry, which i outlined before and which Peter agreed with, was
that we should not mark areas 'free' that the BIOS thinks are
'reserved'. According to the map above, the BIOS declared non-RAM 'free'
range in the first 4GB is 0x78000000..0xe0000000 - 1664 MB, plenty of
space.
The solution would be to insert such conflicting (even if partially
overlapping)
Also, a small code structure comment:
> + if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20))
> + insert_resource(&iomem_resource, res);
this still needs a comment that we deal with resources that start below
1MB in a special way and insert them early.
Perhaps split it out into a e820_entry_trusted() function and use that
as a condition in both the early and the late logic. [plus the check for
->end in the late logic - that should be outside of the 'trust'
definition]
So whenever we tweak the definition of 'trust', we only have to do it in
a single place. Agreed?
Ingo
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] x86: split e820 reserved entries record to late v4
2008-08-29 6:30 ` Ingo Molnar
@ 2008-08-29 6:47 ` Yinghai Lu
2008-08-29 6:58 ` Ingo Molnar
2008-08-29 15:26 ` Linus Torvalds
1 sibling, 1 reply; 13+ messages in thread
From: Yinghai Lu @ 2008-08-29 6:47 UTC (permalink / raw)
To: Ingo Molnar
Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, Jesse Barnes,
Linus Torvalds, linux-kernel
On Thu, Aug 28, 2008 at 11:30 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Yinghai Lu <yhlu.kernel@gmail.com> wrote:
>
>> Linus said we should register some entries in e820 later,
>> so could let BAR res register at first, or even pnp?
>>
>> this one replace
>> | commit a2bd7274b47124d2fc4dfdb8c0591f545ba749dd
>> | Author: Yinghai Lu <yhlu.kernel@gmail.com>
>> | Date: Mon Aug 25 00:56:08 2008 -0700
>> |
>> | x86: fix HPET regression in 2.6.26 versus 2.6.25, check hpet against BAR, v3
>>
>> v2: insert e820 reserve resources before pnp_system_init
>> v3: fix merging problem in tip/x86/core
>> please drop the one in tip/x86/core use this one instead
>> v4: address Linus's review about comments and condition in _late()
>>
>> Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
>
> applied to tip/x86/core, thanks.
>
> Let me outline the issue that i raised before:
>
>> + if (!res->parent && res->end)
>> + insert_resource(&iomem_resource, res);
>
> what if this insertion fails due to partial overlap? Right now we drop
> it silently - which might be fine for most systems, but have a look on
> the specific system that had the hpet regression, there we have these
> reserved e820 entries:
>
> BIOS-e820: 0000000077ff0000 - 0000000078000000 (reserved)
> BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
> BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
>
> which overlaps with the chipset PCI BAR (hpet) resource:
>
> pci 0000:00:14.0: BAR has HPET at fed00000-fed003ff
>
> so due to this 1K conflict we take the full e820-reserved entry out and
> give the range 0xfec00000-0x100000000 as 'free'.
you will get
fec00000 - ffffffff reserved
fed0000 - fed003ff hpet
fed0000 - fed003ff 0000:00:14.0
>
> And that failure to register can cause problems. In this case that
> 'reserved' e820 entry definitely has real meaning, both the local APIC
> and the IO-APIC is in that range:
>
> ACPI: Local APIC address 0xfee00000
> IOAPIC[0]: apic_id 2, version 0, address 0xfec00000, GSI 0-23
>
> Which might still be OK for all memory resources we happen to enumerate
> - but we dont necessarily enumerate all of them when we have e.g. an UP
> kernel, and we will definitely not enumerate any 'hidden' state a system
> might have there. (SMM, etc.) If we then allocate a dynamic PCI resource
> into that range later on (thinking it's "free" but in reality it's
> claimed) we get a crash or worse.
>
> So my worry, which i outlined before and which Peter agreed with, was
> that we should not mark areas 'free' that the BIOS thinks are
> 'reserved'. According to the map above, the BIOS declared non-RAM 'free'
> range in the first 4GB is 0x78000000..0xe0000000 - 1664 MB, plenty of
> space.
for the pci gap use?
e820_setup_gap will check e820_map directly to use that range as pci
gap for unassigned resources.
>
> The solution would be to insert such conflicting (even if partially
> overlapping)
>
> Also, a small code structure comment:
>
>> + if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20))
>> + insert_resource(&iomem_resource, res);
>
> this still needs a comment that we deal with resources that start below
> 1MB in a special way and insert them early.
>
> Perhaps split it out into a e820_entry_trusted() function and use that
> as a condition in both the early and the late logic. [plus the check for
> ->end in the late logic - that should be outside of the 'trust'
> definition]
>
> So whenever we tweak the definition of 'trust', we only have to do it in
> a single place. Agreed?
yes
YH
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] x86: split e820 reserved entries record to late v4
2008-08-29 6:47 ` Yinghai Lu
@ 2008-08-29 6:58 ` Ingo Molnar
2008-08-29 7:16 ` Yinghai Lu
2008-08-29 15:31 ` Linus Torvalds
0 siblings, 2 replies; 13+ messages in thread
From: Ingo Molnar @ 2008-08-29 6:58 UTC (permalink / raw)
To: Yinghai Lu
Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, Jesse Barnes,
Linus Torvalds, linux-kernel
* Yinghai Lu <yhlu.kernel@gmail.com> wrote:
> > BIOS-e820: 0000000077ff0000 - 0000000078000000 (reserved)
> > BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
> > BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
> >
> > which overlaps with the chipset PCI BAR (hpet) resource:
> >
> > pci 0000:00:14.0: BAR has HPET at fed00000-fed003ff
> >
> > so due to this 1K conflict we take the full e820-reserved entry out and
> > give the range 0xfec00000-0x100000000 as 'free'.
>
> you will get
> fec00000 - ffffffff reserved
> fed0000 - fed003ff hpet
> fed0000 - fed003ff 0000:00:14.0
ok - because it's fully contained insert_resource() will succeed? I
thought it would only succeed if the new resource was smaller than (a
subset of) the existing resource. In the other direction, when a newly
inserted resource is a superset of the existing resource, i thought we'd
fail.
hypothetical scenario, what if we had neither a superset nor a subset
scenario, but a partial overlap, between:
> > BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
and:
> > pci 0000:00:14.0: BAR has HPET at feb0f000-fec01000
i.e. we have:
[... PCI BAR ...]
[... e820 reservation ...]
in that case the insert_resource() will fail due to the conflict. Can we
declare it in that case that the e820 reserved entry is mortally broken
and we just ignore it?
At least we should emit a prominent warning if insert_resource() fails,
and add in an mdelay(2000) so that the user sees it.
Ingo
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] x86: split e820 reserved entries record to late v4
2008-08-29 6:58 ` Ingo Molnar
@ 2008-08-29 7:16 ` Yinghai Lu
2008-08-29 7:28 ` Ingo Molnar
2008-08-29 15:31 ` Linus Torvalds
1 sibling, 1 reply; 13+ messages in thread
From: Yinghai Lu @ 2008-08-29 7:16 UTC (permalink / raw)
To: Ingo Molnar
Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, Jesse Barnes,
Linus Torvalds, linux-kernel
On Thu, Aug 28, 2008 at 11:58 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Yinghai Lu <yhlu.kernel@gmail.com> wrote:
>
>> > BIOS-e820: 0000000077ff0000 - 0000000078000000 (reserved)
>> > BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
>> > BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
>> >
>> > which overlaps with the chipset PCI BAR (hpet) resource:
>> >
>> > pci 0000:00:14.0: BAR has HPET at fed00000-fed003ff
>> >
>> > so due to this 1K conflict we take the full e820-reserved entry out and
>> > give the range 0xfec00000-0x100000000 as 'free'.
>>
>> you will get
>> fec00000 - ffffffff reserved
>> fed0000 - fed003ff hpet
>> fed0000 - fed003ff 0000:00:14.0
>
> ok - because it's fully contained insert_resource() will succeed? I
> thought it would only succeed if the new resource was smaller than (a
> subset of) the existing resource. In the other direction, when a newly
> inserted resource is a superset of the existing resource, i thought we'd
> fail.
>
> hypothetical scenario, what if we had neither a superset nor a subset
> scenario, but a partial overlap, between:
>
>> > BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
>
> and:
>
>> > pci 0000:00:14.0: BAR has HPET at feb0f000-fec01000
>
> i.e. we have:
>
> [... PCI BAR ...]
> [... e820 reservation ...]
>
> in that case the insert_resource() will fail due to the conflict. Can we
> declare it in that case that the e820 reserved entry is mortally broken
> and we just ignore it?
yes, that will fail to insert ...
expand to 0xfeb0f000 - 0xfffffff and try again.?
may need to update insert_resource to return conflict resource ...
>
> At least we should emit a prominent warning if insert_resource() fails,
> and add in an mdelay(2000) so that the user sees it.
right
YH
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] x86: split e820 reserved entries record to late v4
2008-08-29 7:16 ` Yinghai Lu
@ 2008-08-29 7:28 ` Ingo Molnar
2008-08-29 7:52 ` Yinghai Lu
0 siblings, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2008-08-29 7:28 UTC (permalink / raw)
To: Yinghai Lu
Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, Jesse Barnes,
Linus Torvalds, linux-kernel
* Yinghai Lu <yhlu.kernel@gmail.com> wrote:
> On Thu, Aug 28, 2008 at 11:58 PM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * Yinghai Lu <yhlu.kernel@gmail.com> wrote:
> >
> >> > BIOS-e820: 0000000077ff0000 - 0000000078000000 (reserved)
> >> > BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
> >> > BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
> >> >
> >> > which overlaps with the chipset PCI BAR (hpet) resource:
> >> >
> >> > pci 0000:00:14.0: BAR has HPET at fed00000-fed003ff
> >> >
> >> > so due to this 1K conflict we take the full e820-reserved entry out and
> >> > give the range 0xfec00000-0x100000000 as 'free'.
> >>
> >> you will get
> >> fec00000 - ffffffff reserved
> >> fed0000 - fed003ff hpet
> >> fed0000 - fed003ff 0000:00:14.0
> >
> > ok - because it's fully contained insert_resource() will succeed? I
> > thought it would only succeed if the new resource was smaller than (a
> > subset of) the existing resource. In the other direction, when a newly
> > inserted resource is a superset of the existing resource, i thought we'd
> > fail.
> >
> > hypothetical scenario, what if we had neither a superset nor a subset
> > scenario, but a partial overlap, between:
> >
> >> > BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
> >
> > and:
> >
> >> > pci 0000:00:14.0: BAR has HPET at feb0f000-fec01000
> >
> > i.e. we have:
> >
> > [... PCI BAR ...]
> > [... e820 reservation ...]
> >
> > in that case the insert_resource() will fail due to the conflict. Can we
> > declare it in that case that the e820 reserved entry is mortally broken
> > and we just ignore it?
>
> yes, that will fail to insert ...
>
> expand to 0xfeb0f000 - 0xfffffff and try again.?
>
> may need to update insert_resource to return conflict resource ...
yes, that sounds an excellent idea - i was thinking of something
muchmore complex like breaking up the reserved entry - but indeed just
creating a large enough superset should be perfect. I.e. extend both
start and end until we fit fully. [or reach some natural boundary such
as 0 or 4GB]
> > At least we should emit a prominent warning if insert_resource() fails,
> > and add in an mdelay(2000) so that the user sees it.
>
> right
btw., perhaps we should try this: first try a request_resource(). If
that fails it means we overlap with something - then we should already
printk a warning. (e820 reserved entries should never conflict with PCI
resources, should they?)
then try an insert_resource(). If that too fails it means a partial
overlap - printk another warning. Try the extension (within reasonable
limits) and retry.
Does that sound worthwile?
Ingo
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] x86: split e820 reserved entries record to late v4
2008-08-29 7:28 ` Ingo Molnar
@ 2008-08-29 7:52 ` Yinghai Lu
0 siblings, 0 replies; 13+ messages in thread
From: Yinghai Lu @ 2008-08-29 7:52 UTC (permalink / raw)
To: Ingo Molnar
Cc: Thomas Gleixner, H. Peter Anvin, Andrew Morton, Jesse Barnes,
Linus Torvalds, linux-kernel
On Fri, Aug 29, 2008 at 12:28 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Yinghai Lu <yhlu.kernel@gmail.com> wrote:
>
>> On Thu, Aug 28, 2008 at 11:58 PM, Ingo Molnar <mingo@elte.hu> wrote:
>> >
>> > * Yinghai Lu <yhlu.kernel@gmail.com> wrote:
>> >
>> >> > BIOS-e820: 0000000077ff0000 - 0000000078000000 (reserved)
>> >> > BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
>> >> > BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
>> >> >
>> >> > which overlaps with the chipset PCI BAR (hpet) resource:
>> >> >
>> >> > pci 0000:00:14.0: BAR has HPET at fed00000-fed003ff
>> >> >
>> >> > so due to this 1K conflict we take the full e820-reserved entry out and
>> >> > give the range 0xfec00000-0x100000000 as 'free'.
>> >>
>> >> you will get
>> >> fec00000 - ffffffff reserved
>> >> fed0000 - fed003ff hpet
>> >> fed0000 - fed003ff 0000:00:14.0
>> >
>> > ok - because it's fully contained insert_resource() will succeed? I
>> > thought it would only succeed if the new resource was smaller than (a
>> > subset of) the existing resource. In the other direction, when a newly
>> > inserted resource is a superset of the existing resource, i thought we'd
>> > fail.
>> >
>> > hypothetical scenario, what if we had neither a superset nor a subset
>> > scenario, but a partial overlap, between:
>> >
>> >> > BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
>> >
>> > and:
>> >
>> >> > pci 0000:00:14.0: BAR has HPET at feb0f000-fec01000
>> >
>> > i.e. we have:
>> >
>> > [... PCI BAR ...]
>> > [... e820 reservation ...]
>> >
>> > in that case the insert_resource() will fail due to the conflict. Can we
>> > declare it in that case that the e820 reserved entry is mortally broken
>> > and we just ignore it?
>>
>> yes, that will fail to insert ...
>>
>> expand to 0xfeb0f000 - 0xfffffff and try again.?
>>
>> may need to update insert_resource to return conflict resource ...
>
> yes, that sounds an excellent idea - i was thinking of something
> muchmore complex like breaking up the reserved entry - but indeed just
> creating a large enough superset should be perfect. I.e. extend both
> start and end until we fit fully. [or reach some natural boundary such
> as 0 or 4GB]
>
>> > At least we should emit a prominent warning if insert_resource() fails,
>> > and add in an mdelay(2000) so that the user sees it.
>>
>> right
>
> btw., perhaps we should try this: first try a request_resource(). If
> that fails it means we overlap with something - then we should already
> printk a warning. (e820 reserved entries should never conflict with PCI
> resources, should they?)
>
> then try an insert_resource(). If that too fails it means a partial
> overlap - printk another warning. Try the extension (within reasonable
> limits) and retry.
>
> Does that sound worthwile?
request_resource should always fail at that case, because we are using
iomem_resource at first parent...
YH
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] x86: split e820 reserved entries record to late v4
2008-08-29 6:30 ` Ingo Molnar
2008-08-29 6:47 ` Yinghai Lu
@ 2008-08-29 15:26 ` Linus Torvalds
2008-09-06 17:55 ` Ingo Molnar
1 sibling, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2008-08-29 15:26 UTC (permalink / raw)
To: Ingo Molnar
Cc: Yinghai Lu, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
Jesse Barnes, linux-kernel
On Fri, 29 Aug 2008, Ingo Molnar wrote:
>
> BIOS-e820: 0000000077ff0000 - 0000000078000000 (reserved)
> BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
> BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
>
> which overlaps with the chipset PCI BAR (hpet) resource:
>
> pci 0000:00:14.0: BAR has HPET at fed00000-fed003ff
>
> so due to this 1K conflict we take the full e820-reserved entry out and
> give the range 0xfec00000-0x100000000 as 'free'.
That's wrong. It's a full overlap, so "insert_resource()" should happily
insert the 00000000fec00000 - 0000000100000000 _around_ the HPET bar.
Do you actually see this behavior, or do you just misunderstand how
"insert_resource()" works?
Linus
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] x86: split e820 reserved entries record to late v4
2008-08-29 6:58 ` Ingo Molnar
2008-08-29 7:16 ` Yinghai Lu
@ 2008-08-29 15:31 ` Linus Torvalds
1 sibling, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2008-08-29 15:31 UTC (permalink / raw)
To: Ingo Molnar
Cc: Yinghai Lu, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
Jesse Barnes, linux-kernel
On Fri, 29 Aug 2008, Ingo Molnar wrote:
> >
> > you will get
> > fec00000 - ffffffff reserved
> > fed0000 - fed003ff hpet
> > fed0000 - fed003ff 0000:00:14.0
>
> ok - because it's fully contained insert_resource() will succeed?
Correct.
> I thought it would only succeed if the new resource was smaller than (a
> subset of) the existing resource.
No, that's "request_resource()".
Yeah, I know, the resource code is complicated, and I wish it wasn't, but
the whole issue with nesting resources correctly simply _is_ fairly
complex.
So "insert_resource()" literally tries to fit a resource into an existing
tree, at the right level, whatever level that is.
In contrast, "request_resource()" is about exclusivity, and tries to
request a "leaf" resource - and it refuses to work if there are resources
it clashes with that cover the same range.
> In the other direction, when a newly inserted resource is a superset of
> the existing resource, i thought we'd fail.
For request_resource, yes.
> hypothetical scenario, what if we had neither a superset nor a subset
> scenario, but a partial overlap, between:
>
> > > BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
>
> and:
>
> > > pci 0000:00:14.0: BAR has HPET at feb0f000-fec01000
Now THAT is somethign that the resource structs simply cannot handle. At
that point you'd get a failure even from insert_resource(), because it no
longer nests in the tree. You would have to split the range in order for
it to nest properly.
Linus
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] x86: split e820 reserved entries record to late v4
2008-08-30 7:58 [PATCH] x86: unify using pci_mmcfg_insert_resource Yinghai Lu
@ 2008-08-30 7:58 ` Yinghai Lu
0 siblings, 0 replies; 13+ messages in thread
From: Yinghai Lu @ 2008-08-30 7:58 UTC (permalink / raw)
To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
Jesse Barnes, Linus Torvalds
Cc: linux-kernel, Yinghai Lu
Linus said we should register some entries in e820 later,
so could let BAR res register at first, or even pnp?
this one replace
| commit a2bd7274b47124d2fc4dfdb8c0591f545ba749dd
| Author: Yinghai Lu <yhlu.kernel@gmail.com>
| Date: Mon Aug 25 00:56:08 2008 -0700
|
| x86: fix HPET regression in 2.6.26 versus 2.6.25, check hpet against BAR, v3
v2: insert e820 reserve resources before pnp_system_init
v3: fix merging problem in tip/x86/core
please drop the one in tip/x86/core use this one instead
v4: address Linus's review about comments and condition in _late()
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
---
arch/x86/kernel/e820.c | 24 +++++++++++++++++++++++-
arch/x86/pci/i386.c | 3 +++
include/asm-x86/e820.h | 1 +
3 files changed, 27 insertions(+), 1 deletion(-)
Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1271,6 +1271,7 @@ static inline const char *e820_type_to_s
/*
* Mark e820 reserved areas as busy for the resource manager.
*/
+static struct resource __initdata *e820_res;
void __init e820_reserve_resources(void)
{
int i;
@@ -1278,6 +1279,7 @@ void __init e820_reserve_resources(void)
u64 end;
res = alloc_bootmem_low(sizeof(struct resource) * e820.nr_map);
+ e820_res = res;
for (i = 0; i < e820.nr_map; i++) {
end = e820.map[i].addr + e820.map[i].size - 1;
#ifndef CONFIG_RESOURCES_64BIT
@@ -1291,7 +1293,14 @@ void __init e820_reserve_resources(void)
res->end = end;
res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
- insert_resource(&iomem_resource, res);
+
+ /*
+ * don't register the region that could be conflicted with
+ * pci device BAR resource and insert them later in
+ * pcibios_resource_survey()
+ */
+ if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20))
+ insert_resource(&iomem_resource, res);
res++;
}
@@ -1303,6 +1312,19 @@ void __init e820_reserve_resources(void)
}
}
+void __init e820_reserve_resources_late(void)
+{
+ int i;
+ struct resource *res;
+
+ res = e820_res;
+ for (i = 0; i < e820.nr_map; i++) {
+ if (!res->parent && res->end)
+ insert_resource(&iomem_resource, res);
+ res++;
+ }
+}
+
char *__init default_machine_specific_memory_setup(void)
{
char *who = "BIOS-e820";
Index: linux-2.6/arch/x86/pci/i386.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/i386.c
+++ linux-2.6/arch/x86/pci/i386.c
@@ -33,6 +33,7 @@
#include <linux/bootmem.h>
#include <asm/pat.h>
+#include <asm/e820.h>
#include "pci.h"
@@ -230,6 +231,8 @@ void __init pcibios_resource_survey(void
pcibios_allocate_bus_resources(&pci_root_buses);
pcibios_allocate_resources(0);
pcibios_allocate_resources(1);
+
+ e820_reserve_resources_late();
}
/**
Index: linux-2.6/include/asm-x86/e820.h
===================================================================
--- linux-2.6.orig/include/asm-x86/e820.h
+++ linux-2.6/include/asm-x86/e820.h
@@ -122,6 +122,7 @@ extern void e820_register_active_regions
extern u64 e820_hole_size(u64 start, u64 end);
extern void finish_e820_parsing(void);
extern void e820_reserve_resources(void);
+extern void e820_reserve_resources_late(void);
extern void setup_memory_map(void);
extern char *default_machine_specific_memory_setup(void);
extern char *machine_specific_memory_setup(void);
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] x86: split e820 reserved entries record to late v4
@ 2008-08-30 20:36 Yinghai Lu
2008-08-31 2:18 ` Yinghai Lu
0 siblings, 1 reply; 13+ messages in thread
From: Yinghai Lu @ 2008-08-30 20:36 UTC (permalink / raw)
To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
Jesse Barnes, Linus Torvalds
Cc: linux-kernel, Yinghai Lu
Linus said we should register some entries in e820 later,
so could let BAR res register at first, or even pnp?
this one replace
| commit a2bd7274b47124d2fc4dfdb8c0591f545ba749dd
| Author: Yinghai Lu <yhlu.kernel@gmail.com>
| Date: Mon Aug 25 00:56:08 2008 -0700
|
| x86: fix HPET regression in 2.6.26 versus 2.6.25, check hpet against BAR, v3
v2: insert e820 reserve resources before pnp_system_init
v3: fix merging problem in tip/x86/core
please drop the one in tip/x86/core use this one instead
v4: address Linus's review about comments and condition in _late()
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
---
arch/x86/kernel/e820.c | 24 +++++++++++++++++++++++-
arch/x86/pci/i386.c | 3 +++
include/asm-x86/e820.h | 1 +
3 files changed, 27 insertions(+), 1 deletion(-)
Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -1267,6 +1267,7 @@ static inline const char *e820_type_to_s
/*
* Mark e820 reserved areas as busy for the resource manager.
*/
+static struct resource __initdata *e820_res;
void __init e820_reserve_resources(void)
{
int i;
@@ -1274,6 +1275,7 @@ void __init e820_reserve_resources(void)
u64 end;
res = alloc_bootmem_low(sizeof(struct resource) * e820.nr_map);
+ e820_res = res;
for (i = 0; i < e820.nr_map; i++) {
end = e820.map[i].addr + e820.map[i].size - 1;
#ifndef CONFIG_RESOURCES_64BIT
@@ -1287,7 +1289,14 @@ void __init e820_reserve_resources(void)
res->end = end;
res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
- insert_resource(&iomem_resource, res);
+
+ /*
+ * don't register the region that could be conflicted with
+ * pci device BAR resource and insert them later in
+ * pcibios_resource_survey()
+ */
+ if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20))
+ insert_resource(&iomem_resource, res);
res++;
}
@@ -1299,6 +1308,19 @@ void __init e820_reserve_resources(void)
}
}
+void __init e820_reserve_resources_late(void)
+{
+ int i;
+ struct resource *res;
+
+ res = e820_res;
+ for (i = 0; i < e820.nr_map; i++) {
+ if (!res->parent && res->end)
+ insert_resource(&iomem_resource, res);
+ res++;
+ }
+}
+
char *__init default_machine_specific_memory_setup(void)
{
char *who = "BIOS-e820";
Index: linux-2.6/arch/x86/pci/i386.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/i386.c
+++ linux-2.6/arch/x86/pci/i386.c
@@ -33,6 +33,7 @@
#include <linux/bootmem.h>
#include <asm/pat.h>
+#include <asm/e820.h>
#include "pci.h"
@@ -230,6 +231,8 @@ void __init pcibios_resource_survey(void
pcibios_allocate_bus_resources(&pci_root_buses);
pcibios_allocate_resources(0);
pcibios_allocate_resources(1);
+
+ e820_reserve_resources_late();
}
/**
Index: linux-2.6/include/asm-x86/e820.h
===================================================================
--- linux-2.6.orig/include/asm-x86/e820.h
+++ linux-2.6/include/asm-x86/e820.h
@@ -120,6 +120,7 @@ extern void e820_register_active_regions
extern u64 e820_hole_size(u64 start, u64 end);
extern void finish_e820_parsing(void);
extern void e820_reserve_resources(void);
+extern void e820_reserve_resources_late(void);
extern void setup_memory_map(void);
extern char *default_machine_specific_memory_setup(void);
extern char *machine_specific_memory_setup(void);
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] x86: split e820 reserved entries record to late v4
2008-08-30 20:36 [PATCH] x86: split e820 reserved entries record to late v4 Yinghai Lu
@ 2008-08-31 2:18 ` Yinghai Lu
0 siblings, 0 replies; 13+ messages in thread
From: Yinghai Lu @ 2008-08-31 2:18 UTC (permalink / raw)
To: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
Jesse Barnes, Linus Torvalds, Ivan Kokshaysky
Cc: linux-kernel, Yinghai Lu
add to: Ivan.
On Sat, Aug 30, 2008 at 1:36 PM, Yinghai Lu <yhlu.kernel@gmail.com> wrote:
> Linus said we should register some entries in e820 later,
> so could let BAR res register at first, or even pnp?
>
> this one replace
> | commit a2bd7274b47124d2fc4dfdb8c0591f545ba749dd
> | Author: Yinghai Lu <yhlu.kernel@gmail.com>
> | Date: Mon Aug 25 00:56:08 2008 -0700
> |
> | x86: fix HPET regression in 2.6.26 versus 2.6.25, check hpet against BAR, v3
>
> v2: insert e820 reserve resources before pnp_system_init
> v3: fix merging problem in tip/x86/core
> please drop the one in tip/x86/core use this one instead
> v4: address Linus's review about comments and condition in _late()
>
> Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
>
> ---
> arch/x86/kernel/e820.c | 24 +++++++++++++++++++++++-
> arch/x86/pci/i386.c | 3 +++
> include/asm-x86/e820.h | 1 +
> 3 files changed, 27 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/arch/x86/kernel/e820.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/e820.c
> +++ linux-2.6/arch/x86/kernel/e820.c
> @@ -1267,6 +1267,7 @@ static inline const char *e820_type_to_s
> /*
> * Mark e820 reserved areas as busy for the resource manager.
> */
> +static struct resource __initdata *e820_res;
> void __init e820_reserve_resources(void)
> {
> int i;
> @@ -1274,6 +1275,7 @@ void __init e820_reserve_resources(void)
> u64 end;
>
> res = alloc_bootmem_low(sizeof(struct resource) * e820.nr_map);
> + e820_res = res;
> for (i = 0; i < e820.nr_map; i++) {
> end = e820.map[i].addr + e820.map[i].size - 1;
> #ifndef CONFIG_RESOURCES_64BIT
> @@ -1287,7 +1289,14 @@ void __init e820_reserve_resources(void)
> res->end = end;
>
> res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> - insert_resource(&iomem_resource, res);
> +
> + /*
> + * don't register the region that could be conflicted with
> + * pci device BAR resource and insert them later in
> + * pcibios_resource_survey()
> + */
> + if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20))
> + insert_resource(&iomem_resource, res);
> res++;
> }
>
> @@ -1299,6 +1308,19 @@ void __init e820_reserve_resources(void)
> }
> }
>
> +void __init e820_reserve_resources_late(void)
> +{
> + int i;
> + struct resource *res;
> +
> + res = e820_res;
> + for (i = 0; i < e820.nr_map; i++) {
> + if (!res->parent && res->end)
> + insert_resource(&iomem_resource, res);
> + res++;
> + }
> +}
> +
> char *__init default_machine_specific_memory_setup(void)
> {
> char *who = "BIOS-e820";
> Index: linux-2.6/arch/x86/pci/i386.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/pci/i386.c
> +++ linux-2.6/arch/x86/pci/i386.c
> @@ -33,6 +33,7 @@
> #include <linux/bootmem.h>
>
> #include <asm/pat.h>
> +#include <asm/e820.h>
>
> #include "pci.h"
>
> @@ -230,6 +231,8 @@ void __init pcibios_resource_survey(void
> pcibios_allocate_bus_resources(&pci_root_buses);
> pcibios_allocate_resources(0);
> pcibios_allocate_resources(1);
> +
> + e820_reserve_resources_late();
> }
>
> /**
> Index: linux-2.6/include/asm-x86/e820.h
> ===================================================================
> --- linux-2.6.orig/include/asm-x86/e820.h
> +++ linux-2.6/include/asm-x86/e820.h
> @@ -120,6 +120,7 @@ extern void e820_register_active_regions
> extern u64 e820_hole_size(u64 start, u64 end);
> extern void finish_e820_parsing(void);
> extern void e820_reserve_resources(void);
> +extern void e820_reserve_resources_late(void);
> extern void setup_memory_map(void);
> extern char *default_machine_specific_memory_setup(void);
> extern char *machine_specific_memory_setup(void);
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] x86: split e820 reserved entries record to late v4
2008-08-29 15:26 ` Linus Torvalds
@ 2008-09-06 17:55 ` Ingo Molnar
0 siblings, 0 replies; 13+ messages in thread
From: Ingo Molnar @ 2008-09-06 17:55 UTC (permalink / raw)
To: Linus Torvalds
Cc: Yinghai Lu, Thomas Gleixner, H. Peter Anvin, Andrew Morton,
Jesse Barnes, linux-kernel
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Fri, 29 Aug 2008, Ingo Molnar wrote:
> >
> > BIOS-e820: 0000000077ff0000 - 0000000078000000 (reserved)
> > BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
> > BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
> >
> > which overlaps with the chipset PCI BAR (hpet) resource:
> >
> > pci 0000:00:14.0: BAR has HPET at fed00000-fed003ff
> >
> > so due to this 1K conflict we take the full e820-reserved entry out and
> > give the range 0xfec00000-0x100000000 as 'free'.
>
> That's wrong. It's a full overlap, so "insert_resource()" should
> happily insert the 00000000fec00000 - 0000000100000000 _around_ the
> HPET bar.
>
> Do you actually see this behavior, or do you just misunderstand how
> "insert_resource()" works?
the latter, i misunderstood how it works.
Ingo
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2008-09-06 17:55 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-30 20:36 [PATCH] x86: split e820 reserved entries record to late v4 Yinghai Lu
2008-08-31 2:18 ` Yinghai Lu
-- strict thread matches above, loose matches on Subject: below --
2008-08-30 7:58 [PATCH] x86: unify using pci_mmcfg_insert_resource Yinghai Lu
2008-08-30 7:58 ` [PATCH] x86: split e820 reserved entries record to late v4 Yinghai Lu
2008-08-29 1:29 Yinghai Lu
2008-08-29 6:30 ` Ingo Molnar
2008-08-29 6:47 ` Yinghai Lu
2008-08-29 6:58 ` Ingo Molnar
2008-08-29 7:16 ` Yinghai Lu
2008-08-29 7:28 ` Ingo Molnar
2008-08-29 7:52 ` Yinghai Lu
2008-08-29 15:31 ` Linus Torvalds
2008-08-29 15:26 ` Linus Torvalds
2008-09-06 17:55 ` Ingo Molnar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox