From: Mel Gorman <mgorman@suse.de>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	akpm@linux-foundation.org, Ury Stankevich <urykhy@gmail.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	stable@kernel.org
Subject: Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous
Date: Mon, 6 Jun 2011 11:43:45 +0100
Message-ID: <20110606104345.GE5247@suse.de>
In-Reply-To: <20110604065853.GA4114@barrios-laptop>

On Sat, Jun 04, 2011 at 03:58:53PM +0900, Minchan Kim wrote:
> On Fri, Jun 03, 2011 at 03:49:41PM +0100, Mel Gorman wrote:
> > On Fri, Jun 03, 2011 at 03:09:20AM +0100, Mel Gorman wrote:
> > > On Thu, Jun 02, 2011 at 05:37:54PM +0200, Andrea Arcangeli wrote:
> > > > > There is an explanation in here somewhere because as I write this,
> > > > > the test machine has survived 14 hours under continual stress without
> > > > > the isolated counters going negative with over 128 million pages
> > > > > successfully migrated and a million pages failed to migrate due to
> > > > > direct compaction being called 80,000 times. It's possible it's a
> > > > > co-incidence but it's some co-incidence!
> > > > 
> > > > No idea...
> > > 
> > > I wasn't able to work on this most of the day but was looking at this
> > > closer this evening again and I think I might have thought of another
> > > theory that could cause this problem.
> > > 
> > > When THP is isolating pages, it accounts for the pages isolated against
> > > the zone of course. If it backs out, it finds the pages from the PTEs.
> > > On !SMP but PREEMPT, we may not have adequate protection against a new
> > > page from a different zone being inserted into the PTE causing us to
> > > decrement against the wrong zone. While the global counter is fine,
> > > the per-zone counters look corrupted. You'd still think it was the
> > > > anon counter that got screwed rather than the file one if it really was
> > > THP unfortunately so it's not the full picture. I'm going to start
> > > a test monitoring both zoneinfo and vmstat to see if vmstat looks
> > > fine while the per-zone counters that are negative are offset by a
> > > positive count on the other zones that when added together become 0.
> > > Hopefully it'll actually trigger overnight :/
> > > 
> > 
> > Right idea of the wrong zone being accounted for but wrong place. I
> > think the following patch should fix the problem;
> > 
> > ==== CUT HERE ===
> > mm: compaction: Ensure that the compaction free scanner does not move to the next zone
> > 
> > Compaction works with two scanners, a migration and a free
> > scanner. When the scanners cross over, migration within the zone is
> > complete. The location of the scanner is recorded on each cycle to
> > avoid excessive scanning.
> > 
> > When a zone is small and mostly reserved, it's very easy for the
> > migration scanner to be close to the end of the zone. Then the following
> > situation can occur:
> > 
> >   o migration scanner isolates some pages near the end of the zone
> >   o free scanner starts at the end of the zone but finds that the
> >     migration scanner is already there
> >   o free scanner gets reinitialised for the next cycle as
> >     cc->migrate_pfn + pageblock_nr_pages
> >     moving the free scanner into the next zone
> >   o migration scanner moves into the next zone but continues accounting
> >     against the old zone
> > 
> > When this happens, NR_ISOLATED accounting goes haywire because some
> > of the accounting happens against the wrong zone. One zone's counter
> > remains positive while the other goes negative even though the overall
> > global count is accurate. This was reported on X86-32 with !SMP because
> > !SMP allows the negative counters to be visible. The fact that it is
> > difficult to reproduce on X86-64 is probably just a co-incidence as
> 
> I guess it's related to zone sizes.
> x86-64 has a small DMA zone and a large DMA32 zone as fallbacks for NORMAL,
> while 32-bit x86 has just a small DMA (16M) zone.
> 

Yep, this is a possibility as well as the use of lowmem reserves.
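
As a rough illustration of the symptom described in the quoted changelog,
here is a minimal userspace model, not kernel code. The struct, the helper
names and the zone boundaries are all made up for the example. It assumes
the increment is charged to the zone compaction believes it is working on,
while the decrement is charged to the zone that really owns the page.

```c
#include <assert.h>

/* Hypothetical model: two adjacent zones, each with an isolated counter. */
enum { NR_ZONES = 2 };

struct zone_model {
	unsigned long start_pfn;	/* first PFN in the zone */
	unsigned long end_pfn;		/* one past the last PFN */
	long nr_isolated;		/* stand-in for a per-zone NR_ISOLATED counter */
};

/* Return the index of the zone that really contains pfn, or -1 if none. */
static int pfn_to_zone(const struct zone_model *zones, unsigned long pfn)
{
	for (int i = 0; i < NR_ZONES; i++)
		if (pfn >= zones[i].start_pfn && pfn < zones[i].end_pfn)
			return i;
	return -1;
}

/*
 * Isolate the page at pfn while compaction believes it is working on
 * compact_zone. The increment is charged to compact_zone even if pfn
 * has already crossed into the next zone (the modelled mistake).
 */
static void isolate_page(struct zone_model *zones, int compact_zone,
			 unsigned long pfn)
{
	(void)pfn;	/* deliberately ignored: that is the bug being modelled */
	zones[compact_zone].nr_isolated++;
}

/* Release the page: the decrement is charged to the zone owning the page. */
static void putback_page(struct zone_model *zones, unsigned long pfn)
{
	int z = pfn_to_zone(zones, pfn);

	if (z >= 0)
		zones[z].nr_isolated--;
}

/* Sum of all per-zone counters, i.e. the global view. */
static long total_isolated(const struct zone_model *zones)
{
	long sum = 0;

	for (int i = 0; i < NR_ZONES; i++)
		sum += zones[i].nr_isolated;
	return sum;
}
```

With zone 0 covering PFNs [0, 4096) and zone 1 starting at 4096, calling
isolate_page(zones, 0, 4100) followed by putback_page(zones, 4100) leaves
zone 0 at +1 and zone 1 at -1 while total_isolated() stays at 0, which is
the same shape as the reported counters.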

> I think the DMA zone on x86 easily fills up with non-LRU or non-movable pages.

Maybe not full, but it has more PageReserved pages than anywhere else
and few MIGRATE_MOVABLE blocks. Blocks that are not MIGRATE_MOVABLE get
skipped during async compaction, so we could easily reach the end of the
DMA zone quickly.
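
The pfn arithmetic behind "moving the free scanner into the next zone" can
be sketched in userspace as well. The pageblock size (1024 pages), the
function names and the clamp are assumptions for illustration only. In
particular, the clamp is just one possible guard and is not the patch
quoted above.

```c
#include <assert.h>

/* Assumed for illustration: pageblocks of 1024 pages. */
#define PAGEBLOCK_NR_PAGES 1024UL

/*
 * Restart position as described in the changelog: the migration scanner
 * position plus one pageblock, with no check against the zone end.
 */
static unsigned long next_free_pfn_unclamped(unsigned long migrate_pfn)
{
	return migrate_pfn + PAGEBLOCK_NR_PAGES;
}

/*
 * One possible guard (hypothetical): never restart beyond the start of
 * the last pageblock in the zone. zone_end_pfn is one past the last PFN
 * of the zone and must be non-zero.
 */
static unsigned long next_free_pfn_clamped(unsigned long migrate_pfn,
					   unsigned long zone_end_pfn)
{
	unsigned long pfn = migrate_pfn + PAGEBLOCK_NR_PAGES;
	unsigned long last_block = (zone_end_pfn - 1) & ~(PAGEBLOCK_NR_PAGES - 1);

	return pfn > last_block ? last_block : pfn;
}
```

For a 16M zone of 4K pages (zone_end_pfn of 4096) with the migration
scanner in the last pageblock (migrate_pfn of 3072), the unclamped value is
4096, which is the first PFN of the next zone. The clamped variant stays at
3072.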

> So isolate_migratepages keeps scanning for migratable pages
> and then gets near the end of the zone.
> 
> > the bug should theoretically be possible there.
>
> Finally, you found it. Congratulations!
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> 
> While we were debugging this problem, we found a few bugs and possible
> enhancements and submitted patches. It was a very good chance to fix the Linux VM.
> 

Thanks.

-- 
Mel Gorman
SUSE Labs


