linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: TAO HU <tghk48@motorola.com>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org,
	"Ye Yuan.Bo-A22116" <yuan-bo.ye@motorola.com>,
	Chang Qing-A21550 <Qing.Chang@motorola.com>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [Question] race condition in mm/page_alloc.c regarding page->lru?
Date: Fri, 2 Apr 2010 11:51:33 +0800	[thread overview]
Message-ID: <g2g5f4a33681004012051wedea9538w9da89e210b731422@mail.gmail.com> (raw)
In-Reply-To: <i2i5f4a33681003312105m4cd42e9ayfe35cc0988c401b6@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5470 bytes --]

2 patches related to page_alloc.c were applied.
Does anyone see a connection between the 2 patches and the panic?
NOTE: the full patches are attached.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a596bfd..34a29e2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2551,6 +2551,20 @@ static inline unsigned long
wait_table_bits(unsigned long size)
 #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))

 /*
+ * Check if a pageblock contains reserved pages
+ */
+static int pageblock_is_reserved(unsigned long start_pfn)
+{
+	unsigned long end_pfn = start_pfn + pageblock_nr_pages;
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn++)
+		if (PageReserved(pfn_to_page(pfn)))
+			return 1;
+	return 0;
+}
+
+/*
  * Mark a number of pageblocks as MIGRATE_RESERVE. The number
  * of blocks reserved is based on zone->pages_min. The memory within the
  * reserve will tend to store contiguous free pages. Setting min_free_kbytes
@@ -2579,7 +2593,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 			continue;

 		/* Blocks with reserved pages will never free, skip them. */
-		if (PageReserved(page))
+		if (pageblock_is_reserved(pfn))
 			continue;

 		block_migratetype = get_pageblock_migratetype(page);
-- 
1.5.4.3

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5c44ed4..a596bfd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -119,6 +119,7 @@ static char * const zone_names[MAX_NR_ZONES] = {
 };

 int min_free_kbytes = 1024;
+int min_free_order_shift = 1;

 unsigned long __meminitdata nr_kernel_pages;
 unsigned long __meminitdata nr_all_pages;
@@ -1256,7 +1257,7 @@ int zone_watermark_ok(struct zone *z, int order,
unsigned long mark,
 		free_pages -= z->free_area[o].nr_free << o;

 		/* Require fewer higher order pages to be free */
-		min >>= 1;
+		min >>= min_free_order_shift;

 		if (free_pages <= min)
 			return 0;
-- 


On Thu, Apr 1, 2010 at 12:05 PM, TAO HU <tghk48@motorola.com> wrote:
> Hi, all
>
> We got a panic on our ARM (OMAP) based HW.
> Our code is based on 2.6.29 kernel (last commit for mm/page_alloc.c is
> cc2559bccc72767cb446f79b071d96c30c26439b)
>
> It appears to crash while going through pcp->list in
> buffered_rmqueue() of mm/page_alloc.c after checking vmlinux.
> "00100100" implies LIST_POISON1 that suggests a race condition between
> list_add() and list_del() in my personal view.
> However we not yet figure out locking problem regarding page.lru.
>
> Any known issues about race condition in mm/page_alloc.c?
> And other hints are highly appreciated.
>
>  /* Find a page of the appropriate migrate type */
>                if (cold) {
>                   ... ...
>                } else {
>                        list_for_each_entry(page, &pcp->list, lru)
>                                if (page_private(page) == migratetype)
>                                        break;
>                }
>
> <1>[120898.805267] Unable to handle kernel paging request at virtual
> address 00100100
> <1>[120898.805633] pgd = c1560000
> <1>[120898.805786] [00100100] *pgd=897b3031, *pte=00000000, *ppte=00000000
> <4>[120898.806457] Internal error: Oops: 17 [#1] PREEMPT
> ... ...
> <4>[120898.807861] CPU: 0    Not tainted  (2.6.29-omap1 #1)
> <4>[120898.808044] PC is at get_page_from_freelist+0x1d0/0x4b0
> <4>[120898.808227] LR is at get_page_from_freelist+0xc8/0x4b0
> <4>[120898.808563] pc : [<c00a600c>]    lr : [<c00a5f04>]    psr: 800000d3
> <4>[120898.808563] sp : c49fbd18  ip : 00000000  fp : c49fbd74
> <4>[120898.809020] r10: 00000000  r9 : 001000e8  r8 : 00000002
> <4>[120898.809204] r7 : 001200d2  r6 : 60000053  r5 : c0507c4c  r4 : c49fa000
> <4>[120898.809509] r3 : 001000e8  r2 : 00100100  r1 : c0507c6c  r0 : 00000001
> <4>[120898.809844] Flags: Nzcv  IRQs off  FIQs off  Mode SVC_32  ISA
> ARM  Segment kernel
> <4>[120898.810028] Control: 10c5387d  Table: 82160019  DAC: 00000017
> <4>[120898.948425] Backtrace:
> <4>[120898.948760] [<c00a5e3c>] (get_page_from_freelist+0x0/0x4b0)
> from [<c00a6398>] (__alloc_pages_internal+0xac/0x3e8)
> <4>[120898.949554] [<c00a62ec>] (__alloc_pages_internal+0x0/0x3e8)
> from [<c00b461c>] (handle_mm_fault+0x16c/0xbac)
> <4>[120898.950347] [<c00b44b0>] (handle_mm_fault+0x0/0xbac) from
> [<c00b51d0>] (__get_user_pages+0x174/0x2b4)
> <4>[120898.951019] [<c00b505c>] (__get_user_pages+0x0/0x2b4) from
> [<c00b534c>] (get_user_pages+0x3c/0x44)
> <4>[120898.951812] [<c00b5310>] (get_user_pages+0x0/0x44) from
> [<c00caf9c>] (get_arg_page+0x50/0xa4)
> <4>[120898.952636] [<c00caf4c>] (get_arg_page+0x0/0xa4) from
> [<c00cb1ec>] (copy_strings+0x108/0x210)
> <4>[120898.953430]  r7:beffffe4 r6:00000ffc r5:00000000 r4:00000018
> <4>[120898.954223] [<c00cb0e4>] (copy_strings+0x0/0x210) from
> [<c00cb330>] (copy_strings_kernel+0x3c/0x74)
> <4>[120898.955047] [<c00cb2f4>] (copy_strings_kernel+0x0/0x74) from
> [<c00cc778>] (do_execve+0x18c/0x2b0)
> <4>[120898.955841]  r5:0001e240 r4:0001e224
> <4>[120898.956329] [<c00cc5ec>] (do_execve+0x0/0x2b0) from
> [<c00400e4>] (sys_execve+0x3c/0x5c)
> <4>[120898.957153] [<c00400a8>] (sys_execve+0x0/0x5c) from
> [<c003ce80>] (ret_fast_syscall+0x0/0x2c)
> <4>[120898.957946]  r7:0000000b r6:0001e270 r5:00000000 r4:0001d580
> <4>[120898.958740] Code: e1530008 0a000006 e2429018 e1a03009 (e5b32018)
>
>
>
> --
> Best Regards
> Hu Tao
>

[-- Attachment #2: 0001-mm-Add-min_free_order_shift-tunable.patch --]
[-- Type: application/octet-stream, Size: 2105 bytes --]

From d620f695290e4ffb1586420ba1dbbb5b2c8c075d Mon Sep 17 00:00:00 2001
From: =?utf-8?q?Arve=20Hj=C3=B8nnev=C3=A5g?= <arve@android.com>
Date: Tue, 17 Feb 2009 14:51:02 -0800
Subject: [PATCH] mm: Add min_free_order_shift tunable.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

By default the kernel tries to keep half as much memory free at each
order as it does for one order below. This can be too agressive when
running without swap.

Signed-off-by: Arve Hjønnevåg <arve@android.com>
---
 kernel/sysctl.c |    9 +++++++++
 mm/page_alloc.c |    3 ++-
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c5ef44f..0e3d9aa 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -76,6 +76,7 @@ extern int suid_dumpable;
 extern char core_pattern[];
 extern int pid_max;
 extern int min_free_kbytes;
+extern int min_free_order_shift;
 extern int pid_max_min, pid_max_max;
 extern int sysctl_drop_caches;
 extern int percpu_pagelist_fraction;
@@ -1097,6 +1098,14 @@ static struct ctl_table vm_table[] = {
 		.extra1		= &zero,
 	},
 	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "min_free_order_shift",
+		.data		= &min_free_order_shift,
+		.maxlen		= sizeof(min_free_order_shift),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec
+	},
+	{
 		.ctl_name	= VM_PERCPU_PAGELIST_FRACTION,
 		.procname	= "percpu_pagelist_fraction",
 		.data		= &percpu_pagelist_fraction,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5c44ed4..a596bfd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -119,6 +119,7 @@ static char * const zone_names[MAX_NR_ZONES] = {
 };
 
 int min_free_kbytes = 1024;
+int min_free_order_shift = 1;
 
 unsigned long __meminitdata nr_kernel_pages;
 unsigned long __meminitdata nr_all_pages;
@@ -1256,7 +1257,7 @@ int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
 		free_pages -= z->free_area[o].nr_free << o;
 
 		/* Require fewer higher order pages to be free */
-		min >>= 1;
+		min >>= min_free_order_shift;
 
 		if (free_pages <= min)
 			return 0;
-- 
1.5.4.3


[-- Attachment #3: 0002-mm-Check-if-any-page-in-a-pageblock-is-reserved-bef.patch --]
[-- Type: application/octet-stream, Size: 1747 bytes --]

From a4eb204a8029320c2dd748daf4f51fd48d337c3d Mon Sep 17 00:00:00 2001
From: =?utf-8?q?Arve=20Hj=C3=B8nnev=C3=A5g?= <arve@android.com>
Date: Wed, 18 Mar 2009 17:27:31 -0700
Subject: [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE

This fixes a problem where the first pageblock got marked MIGRATE_RESERVE even
though it only had a few free pages. This in turn caused no contiguous memory
to be reserved and frequent kswapd wakeups that emptied the caches to get more
contiguous memory.
---
 mm/page_alloc.c |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a596bfd..34a29e2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2551,6 +2551,20 @@ static inline unsigned long wait_table_bits(unsigned long size)
 #define LONG_ALIGN(x) (((x)+(sizeof(long))-1)&~((sizeof(long))-1))
 
 /*
+ * Check if a pageblock contains reserved pages
+ */
+static int pageblock_is_reserved(unsigned long start_pfn)
+{
+	unsigned long end_pfn = start_pfn + pageblock_nr_pages;
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn++)
+		if (PageReserved(pfn_to_page(pfn)))
+			return 1;
+	return 0;
+}
+
+/*
  * Mark a number of pageblocks as MIGRATE_RESERVE. The number
  * of blocks reserved is based on zone->pages_min. The memory within the
  * reserve will tend to store contiguous free pages. Setting min_free_kbytes
@@ -2579,7 +2593,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 			continue;
 
 		/* Blocks with reserved pages will never free, skip them. */
-		if (PageReserved(page))
+		if (pageblock_is_reserved(pfn))
 			continue;
 
 		block_migratetype = get_pageblock_migratetype(page);
-- 
1.5.4.3


  reply	other threads:[~2010-04-02  3:51 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-04-01  4:05 [Question] race condition in mm/page_alloc.c regarding page->lru? TAO HU
2010-04-02  3:51 ` TAO HU [this message]
2010-04-02  5:03   ` KOSAKI Motohiro
2010-04-02  5:19     ` TAO HU
2010-04-02  9:48     ` Mel Gorman
2010-04-03  0:59       ` Arve Hjønnevåg
2010-04-04 22:45         ` KOSAKI Motohiro
2010-04-05 10:14         ` Mel Gorman
2010-04-05 10:49           ` Minchan Kim
2010-04-06  3:09           ` [PATCH] mm: Check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE Arve Hjønnevåg
2010-04-06  4:15             ` Minchan Kim
2010-04-06 15:11             ` Mel Gorman
2010-04-02  5:04   ` [Question] race condition in mm/page_alloc.c regarding page->lru? KAMEZAWA Hiroyuki
2010-04-02  5:15     ` Minchan Kim
2010-04-02  7:00       ` TAO HU
2010-04-02  7:22         ` Minchan Kim
2010-04-02  5:13   ` Minchan Kim
2010-04-02  6:48     ` TAO HU
2010-04-02  7:06   ` Daniel Mack

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=g2g5f4a33681004012051wedea9538w9da89e210b731422@mail.gmail.com \
    --to=tghk48@motorola.com \
    --cc=Qing.Chang@motorola.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=yuan-bo.ye@motorola.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).