* [BUG] 3.2.2 crash in isolate_migratepages
@ 2012-01-27 21:43 Herbert van den Bergh
2012-01-30 9:09 ` Mel Gorman
0 siblings, 1 reply; 4+ messages in thread
From: Herbert van den Bergh @ 2012-01-27 21:43 UTC
To: linux-mm; +Cc: Mel Gorman

3.2.2 panics on a 16GB i686 blade:

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

The crash happens on this line in mm/compaction.c::isolate_migratepages:

328 page = pfn_to_page(low_pfn);

This macro finds the struct page pointer for a given pfn. These struct
page pointers are stored in sections of 131072 pages if
CONFIG_SPARSEMEM=y. If an entire section has no memory pages, the page
structs are not allocated for this section. On this particular machine,
there is no RAM mapped from 2GB - 4GB:

# dmesg|grep usable
BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
BIOS-e820: 0000000000100000 - 000000007fe4e000 (usable)
BIOS-e820: 000000007fe56000 - 000000007fe57000 (usable)
BIOS-e820: 0000000100000000 - 000000047ffff000 (usable)

So there are no page structs for the sections between 2GB and 4GB.
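
The section arithmetic behind that conclusion can be checked with a small userspace sketch that maps the usable e820 ranges above onto SPARSEMEM sections. It assumes the x86_32 PAE values PAGE_SHIFT = 12 and SECTION_SIZE_BITS = 29 (512MB per section, i.e. the 131072 pages mentioned above); struct e820_range and mark_present_sections() are hypothetical names for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed x86_32 PAE values: 4KB pages, 512MB sections. */
#define PAGE_SHIFT        12
#define SECTION_SIZE_BITS 29
#define PAGES_PER_SECTION (1UL << (SECTION_SIZE_BITS - PAGE_SHIFT))
#define NR_SECTIONS       64  /* covers 32GB of physical address space */

/* Hypothetical representation of one usable e820 range [start, end). */
struct e820_range {
	uint64_t start;
	uint64_t end;
};

/* The usable ranges from the dmesg output above. */
static const struct e820_range usable[] = {
	{ 0x0000000000000000ULL, 0x000000000009f400ULL },
	{ 0x0000000000100000ULL, 0x000000007fe4e000ULL },
	{ 0x000000007fe56000ULL, 0x000000007fe57000ULL },
	{ 0x0000000100000000ULL, 0x000000047ffff000ULL },
};

/*
 * Mark every section containing at least one usable byte. Sections
 * left false are the ones that would get no struct page array.
 */
static void mark_present_sections(bool present[NR_SECTIONS])
{
	for (size_t s = 0; s < NR_SECTIONS; s++)
		present[s] = false;

	for (size_t i = 0; i < sizeof(usable) / sizeof(usable[0]); i++) {
		uint64_t first, last;

		assert(usable[i].start < usable[i].end);
		first = usable[i].start >> SECTION_SIZE_BITS;
		last = (usable[i].end - 1) >> SECTION_SIZE_BITS;
		assert(last < NR_SECTIONS);

		for (uint64_t s = first; s <= last; s++)
			present[s] = true;
	}
}
```

Under these assumptions, mark_present_sections() marks sections 0 to 3 and 8 to 35 and leaves sections 4 to 7 (physical 2GB to 4GB) unmarked, which is the hole described above.
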

I believe this check was intended to catch page numbers that point to holes:

323 if (!pfn_valid_within(low_pfn))
324 continue;

But pfn_valid_within() is defined as (1) unless CONFIG_HOLES_IN_ZONE is
set, which as far as I can tell is only the case on ARM and ia64. So
this check always passes (in fact it is optimized out), and
pfn_to_page() ends up dereferencing an invalid address because the
section's mem_map pointer in the mem_section structure is NULL.

Other compaction code uses pfn_valid(pfn), which does check for the
NULL pointer in the mem_section structure. It is not clear to me why
isolate_migratepages uses pfn_valid_within(). Changing it to
pfn_valid() prevents the crash. It looks like the correct solution to
me, but I'm not familiar with this code.

I also tried this on a 64-bit machine with a 1GB gap at 3GB, but the
address calculated from (struct page *)0 + pfn is a valid readable
memory location, so it doesn't panic. Not sure what other bad things
happen later, though.

Any comments, questions, other data you'd like to see?

Thanks,
Herbert.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [BUG] 3.2.2 crash in isolate_migratepages
2012-01-27 21:43 [BUG] 3.2.2 crash in isolate_migratepages Herbert van den Bergh
@ 2012-01-30 9:09 ` Mel Gorman
2012-01-30 18:16 ` Herbert van den Bergh
0 siblings, 1 reply; 4+ messages in thread
From: Mel Gorman @ 2012-01-30 9:09 UTC
To: Herbert van den Bergh; +Cc: linux-mm
On Fri, Jan 27, 2012 at 01:43:07PM -0800, Herbert van den Bergh wrote:
>
> 3.2.2 panics on a 16GB i686 blade:
>
> BUG: unable to handle kernel paging request at 01c00008
> IP: [<c0522399>] isolate_migratepages+0x119/0x390
> *pdpt = 000000002f7ce001 *pde = 0000000000000000
>
> The crash happens on this line in mm/compaction.c::isolate_migratepages:
>
> 328 page = pfn_to_page(low_pfn);
>
This is not line 328 on kernel 3.2.2. Can you double check what version
you are using?
> This macro finds the struct page pointer for a given pfn. These struct
> page pointers are stored in sections of 131072 pages if
> CONFIG_SPARSEMEM=y. If an entire section has no memory pages, the page
> structs are not allocated for this section. On this particular machine,
> there is no RAM mapped from 2GB - 4GB:
>
> # dmesg|grep usable
> BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
> BIOS-e820: 0000000000100000 - 000000007fe4e000 (usable)
> BIOS-e820: 000000007fe56000 - 000000007fe57000 (usable)
> BIOS-e820: 0000000100000000 - 000000047ffff000 (usable)
>
> So there are no page structs for the sections between 2GB and 4GB.
>
> I believe this check was intended to catch page numbers that point to holes:
>
> 323 if (!pfn_valid_within(low_pfn))
> 324 continue;
Can you try the following patch please?
---8<---
mm: compaction: Check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for migration

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned. Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated, so migrate_pfn is generally
not aligned.

The problem is that pfn_valid is only called on the first PFN being
checked. Let's say we have a case like this:

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is
beyond the hole. When isolate_migratepages started, it scans from
migrate_pfn to migrate_pfn+pageblock_nr_pages, which is now in a memory
hole. It checks pfn_valid() on the first PFN but then scans into the
hole where there are not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
mm/compaction.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 899d956..edc1e26 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -313,6 +313,19 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 		} else if (!locked)
 			spin_lock_irq(&zone->lru_lock);
 
+		/*
+		 * migrate_pfn does not necessarily start aligned to a
+		 * pageblock. Ensure that pfn_valid is called when moving
+		 * into a new MAX_ORDER_NR_PAGES range in case of large
+		 * memory holes within the zone
+		 */
+		if ((low_pfn & (MAX_ORDER_NR_PAGES - 1)) == 0) {
+			if (!pfn_valid(low_pfn)) {
+				low_pfn += MAX_ORDER_NR_PAGES - 1;
+				continue;
+			}
+		}
+
 		if (!pfn_valid_within(low_pfn))
 			continue;
 		nr_scanned++;
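
A rough userspace model of the patched loop shows why a single pfn_valid() test per MAX_ORDER_NR_PAGES block is enough when holes span whole blocks (the situation pfn_valid_within() assumes when CONFIG_HOLES_IN_ZONE is unset). This is a sketch under stated assumptions, not the kernel code: MAX_ORDER_NR_PAGES is taken as 1024 (MAX_ORDER = 11), locking and isolation are omitted, and struct pfn_hole, model_valid() and count_bad_lookups() are hypothetical names. It counts how many PFNs inside the hole would reach pfn_to_page().

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed: MAX_ORDER = 11, so MAX_ORDER_NR_PAGES = 1 << (MAX_ORDER - 1). */
#define MAX_ORDER_NR_PAGES 1024UL

/* Hypothetical model: PFNs in [start, end) have no struct pages. */
struct pfn_hole {
	unsigned long start;
	unsigned long end;
};

/* Stand-in for pfn_valid(): false only inside the hole. */
static bool model_valid(const struct pfn_hole *hole, unsigned long pfn)
{
	return pfn < hole->start || pfn >= hole->end;
}

/*
 * Walk [low_pfn, end_pfn) like the isolate_migratepages() loop and count
 * the invalid PFNs that would be passed to pfn_to_page(). When
 * check_boundaries is true, apply the patch's MAX_ORDER boundary check.
 */
static unsigned long count_bad_lookups(unsigned long low_pfn,
				       unsigned long end_pfn,
				       const struct pfn_hole *hole,
				       bool check_boundaries)
{
	unsigned long bad = 0;

	assert(hole->start <= hole->end);

	for (; low_pfn < end_pfn; low_pfn++) {
		if (check_boundaries &&
		    (low_pfn & (MAX_ORDER_NR_PAGES - 1)) == 0 &&
		    !model_valid(hole, low_pfn)) {
			/* The for loop's low_pfn++ lands on the next block. */
			low_pfn += MAX_ORDER_NR_PAGES - 1;
			continue;
		}

		/* pfn_valid_within() is (1) here, so nothing else filters. */
		if (!model_valid(hole, low_pfn))
			bad++;	/* kernel would call pfn_to_page() on this PFN */
	}
	return bad;
}
```

For a hole covering PFNs 2048 to 4095 and a scan from PFN 2040 to 4200, the unpatched walk (check_boundaries = false) counts 2048 bad lookups, while the patched walk skips both hole blocks and counts none. The skip adds MAX_ORDER_NR_PAGES - 1 rather than MAX_ORDER_NR_PAGES because the loop's own low_pfn++ supplies the final step.
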
* Re: [BUG] 3.2.2 crash in isolate_migratepages
2012-01-30 9:09 ` Mel Gorman
@ 2012-01-30 18:16 ` Herbert van den Bergh
2012-01-30 18:28 ` Michal Nazarewicz
0 siblings, 1 reply; 4+ messages in thread
From: Herbert van den Bergh @ 2012-01-30 18:16 UTC
To: Mel Gorman; +Cc: linux-mm
On 1/30/12 1:09 AM, Mel Gorman wrote:
> On Fri, Jan 27, 2012 at 01:43:07PM -0800, Herbert van den Bergh wrote:
>> 3.2.2 panics on a 16GB i686 blade:
>>
>> BUG: unable to handle kernel paging request at 01c00008
>> IP: [<c0522399>] isolate_migratepages+0x119/0x390
>> *pdpt = 000000002f7ce001 *pde = 0000000000000000
>>
>> The crash happens on this line in mm/compaction.c::isolate_migratepages:
>>
>> 328 page = pfn_to_page(low_pfn);
>>
> This is not line 328 on kernel 3.2.2. Can you double check what version
> you are using?
That's right, I was using 3.1, but reproduced the problem on 3.2.2. The
source code line numbers are from 3.1. Sorry for the confusion.
>> This macro finds the struct page pointer for a given pfn. These struct
>> page pointers are stored in sections of 131072 pages if
>> CONFIG_SPARSEMEM=y. If an entire section has no memory pages, the page
>> structs are not allocated for this section. On this particular machine,
>> there is no RAM mapped from 2GB - 4GB:
>>
>> # dmesg|grep usable
>> BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
>> BIOS-e820: 0000000000100000 - 000000007fe4e000 (usable)
>> BIOS-e820: 000000007fe56000 - 000000007fe57000 (usable)
>> BIOS-e820: 0000000100000000 - 000000047ffff000 (usable)
>>
>> So there are no page structs for the sections between 2GB and 4GB.
>>
>> I believe this check was intended to catch page numbers that point to holes:
>>
>> 323 if (!pfn_valid_within(low_pfn))
>> 324 continue;
> Can you try the following patch please?
The following patch fixes the crash on this system.
Thanks,
Herbert.
>
> ---8<---
> mm: compaction: Check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for migration
>
> When isolating for migration, migration starts at the start of a zone
> which is not necessarily pageblock aligned. Further, it stops isolating
> when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
> not aligned.
>
> The problem is that pfn_valid is only called on the first PFN being
> checked. Let's say we have a case like this:
>
> H = MAX_ORDER_NR_PAGES boundary
> | = pageblock boundary
> m = cc->migrate_pfn
> f = cc->free_pfn
> o = memory hole
>
> H------|------H------|----m-Hoooooo|ooooooH-f----|------H
>
> The migrate_pfn is just below a memory hole and the free scanner is
> beyond the hole. When isolate_migratepages started, it scans from
> migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory
> hole. It checks pfn_valid() on the first PFN but then scans into the
> hole where there are not necessarily valid struct pages.
>
> This patch ensures that isolate_migratepages calls pfn_valid when
> necessary.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
> mm/compaction.c | 13 +++++++++++++
> 1 files changed, 13 insertions(+), 0 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 899d956..edc1e26 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -313,6 +313,19 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  		} else if (!locked)
>  			spin_lock_irq(&zone->lru_lock);
>
> +		/*
> +		 * migrate_pfn does not necessarily start aligned to a
> +		 * pageblock. Ensure that pfn_valid is called when moving
> +		 * into a new MAX_ORDER_NR_PAGES range in case of large
> +		 * memory holes within the zone
> +		 */
> +		if ((low_pfn & (MAX_ORDER_NR_PAGES - 1)) == 0) {
> +			if (!pfn_valid(low_pfn)) {
> +				low_pfn += MAX_ORDER_NR_PAGES - 1;
> +				continue;
> +			}
> +		}
> +
>  		if (!pfn_valid_within(low_pfn))
>  			continue;
>  		nr_scanned++;
* Re: [BUG] 3.2.2 crash in isolate_migratepages
2012-01-30 18:16 ` Herbert van den Bergh
@ 2012-01-30 18:28 ` Michal Nazarewicz
0 siblings, 0 replies; 4+ messages in thread
From: Michal Nazarewicz @ 2012-01-30 18:28 UTC
To: Mel Gorman, Herbert van den Bergh; +Cc: linux-mm
> On 1/30/12 1:09 AM, Mel Gorman wrote:
>> The migrate_pfn is just below a memory hole and the free scanner is
>> beyond the hole. When isolate_migratepages started, it scans from
>> migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory
>> hole. It checks pfn_valid() on the first PFN but then scans into the
>> hole where there are not necessarily valid struct pages.
>>
>> This patch ensures that isolate_migratepages calls pfn_valid when
>> necessary.
>>
>> Signed-off-by: Mel Gorman <mgorman@suse.de>
If anyone cares, this looks good to me, so:
Acked-by: Michal Nazarewicz <mina86@mina86.com>
>> ---
>> mm/compaction.c | 13 +++++++++++++
>> 1 files changed, 13 insertions(+), 0 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 899d956..edc1e26 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -313,6 +313,19 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>>  		} else if (!locked)
>>  			spin_lock_irq(&zone->lru_lock);
>>
>> +		/*
>> +		 * migrate_pfn does not necessarily start aligned to a
>> +		 * pageblock. Ensure that pfn_valid is called when moving
>> +		 * into a new MAX_ORDER_NR_PAGES range in case of large
>> +		 * memory holes within the zone
>> +		 */
>> +		if ((low_pfn & (MAX_ORDER_NR_PAGES - 1)) == 0) {
>> +			if (!pfn_valid(low_pfn)) {
>> +				low_pfn += MAX_ORDER_NR_PAGES - 1;
>> +				continue;
>> +			}
>> +		}
>> +
>>  		if (!pfn_valid_within(low_pfn))
>>  			continue;
>>  		nr_scanned++;
--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michał “mina86” Nazarewicz (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--