From: "Henrik Rydberg" <rydberg@euromail.se>
To: Mel Gorman <mgorman@suse.de>
Cc: Jan Kara <jack@suse.cz>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-mm@kvack.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Oops in 3.7-rc8 isolate_free_pages_block()
Date: Thu, 6 Dec 2012 17:58:29 +0100	[thread overview]
Message-ID: <20121206165829.GA392@polaris.bitmath.org> (raw)
In-Reply-To: <20121206161934.GA17258@suse.de>

Hi Mel,

> Still travelling and am not in a position to test this properly :(.
> However, this bug feels very similar to a bug in the migration scanner where
> a pfn_valid check is missed because the start is not aligned.  Henrik, when
> did this start happening? I would be a little surprised if it started between
> 3.6 and 3.7-rcX but maybe it's just easier to hit now for some reason.

I started using transparent hugepages when moving to 3.7-rc1, so it is
quite possible that the problem was there already in 3.6.

> How reproducible is this? Is there anything in particular you do to
> trigger the oops?

Unfortunately nothing special, and it is rare. IIRC, it has happened
after a long uptime, but I guess that only means the probability of
the oops is higher then.

> Does the following patch help any? It's only compile tested I'm afraid.
> 
> ---8<---
> mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for free
> 
> Commit 0bf380bc (mm: compaction: check pfn_valid when entering a new
> MAX_ORDER_NR_PAGES block during isolation for migration) added a check
> for pfn_valid() when isolating pages for migration as the scanner does
> not necessarily start pageblock-aligned. However, the free scanner has
> the same problem. If it encounters a hole, it can also trigger an oops
> when it calls PageBuddy(page) on a page that is within a hole.
> 
> Reported-by: Henrik Rydberg <rydberg@euromail.se>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Cc: stable@vger.kernel.org
> ---
>  mm/compaction.c |   10 ++++++++++
>  1 files changed, 10 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 9eef558..7d85ad485 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -298,6 +298,16 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>  			continue;
>  		if (!valid_page)
>  			valid_page = page;
> +
> +		/*
> +		 * As blockpfn may not start aligned, blockpfn->end_pfn
> +		 * may cross a MAX_ORDER_NR_PAGES boundary and a pfn_valid
> +		 * check is necessary. If the pfn is not valid, stop
> +		 * isolation.
> +		 */
> +		if ((blockpfn & (MAX_ORDER_NR_PAGES - 1)) == 0 &&
> +		    !pfn_valid(blockpfn))
> +			break;
>  		if (!PageBuddy(page))
>  			continue;
>  

I am running with it now, adding a printout to see if the case happens
at all. Might take a while, will try to stress the machine a bit.

Thanks,
Henrik
