From: Mel Gorman <mel@csn.ul.ie>
To: Dave Chinner <david@fromorbit.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Linux Kernel List <linux-kernel@vger.kernel.org>,
linux-mm@kvack.org, Rik van Riel <riel@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Minchan Kim <minchan.kim@gmail.com>,
Christoph Lameter <cl@linux-foundation.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Wu Fengguang <fengguang.wu@intel.com>,
David Rientjes <rientjes@google.com>
Subject: Re: [PATCH 3/3] mm: page allocator: Drain per-cpu lists after direct reclaim allocation fails
Date: Sun, 5 Sep 2010 19:22:56 +0100
Message-ID: <20100905182256.GH8384@csn.ul.ie>
In-Reply-To: <20100904022545.GD705@dastard>
On Sat, Sep 04, 2010 at 12:25:45PM +1000, Dave Chinner wrote:
> On Fri, Sep 03, 2010 at 04:00:26PM -0700, Andrew Morton wrote:
> > On Fri, 3 Sep 2010 10:08:46 +0100
> > Mel Gorman <mel@csn.ul.ie> wrote:
> >
> > > When under significant memory pressure, a process enters direct reclaim
> > > and immediately afterwards tries to allocate a page. If it fails and no
> > > further progress is made, it's possible the system will go OOM. However,
> > > on systems with large amounts of memory, it's possible that a significant
> > > number of pages are on per-cpu lists and inaccessible to the calling
> > > process. This leads to a process entering direct reclaim more often than
> > > it should increasing the pressure on the system and compounding the problem.
> > >
> > > This patch notes that if direct reclaim is making progress but
> > > allocations are still failing that the system is already under heavy
> > > pressure. In this case, it drains the per-cpu lists and tries the
> > > allocation a second time before continuing.
> ....
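The drain-and-retry step described in the patch summary above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel code: struct zone_model, take_from_freelist(), drain_all(), direct_reclaim() and alloc_after_reclaim() are hypothetical names, drain_all() merely stands in for drain_all_pages(), and the model assumes reclaimed pages land on a remote per-cpu list where the caller cannot reach them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical userspace model of a zone; not a kernel structure. */
struct zone_model {
	int free_pages;          /* pages on the shared free list */
	int pcp_pages[NR_CPUS];  /* pages parked on each per-cpu list */
};

/* Take one page from the shared free list, if one is available. */
static bool take_from_freelist(struct zone_model *z)
{
	if (z->free_pages == 0)
		return false;
	z->free_pages--;
	return true;
}

/* Stand-in for drain_all_pages(): return every per-cpu page to the free list. */
static void drain_all(struct zone_model *z)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		z->free_pages += z->pcp_pages[cpu];
		z->pcp_pages[cpu] = 0;
	}
}

/*
 * Stand-in for direct reclaim. Assumption: the nr reclaimed pages end up
 * on CPU 1's per-cpu list, out of reach of the caller. Returns progress.
 */
static int direct_reclaim(struct zone_model *z, int nr)
{
	z->pcp_pages[1] += nr;
	return nr;
}

/*
 * Reclaim, then try to allocate. If reclaim made progress but the
 * allocation still fails, drain the per-cpu lists once and retry.
 * *drained reports whether the drain was needed.
 */
static bool alloc_after_reclaim(struct zone_model *z, bool *drained)
{
	assert(z != NULL && drained != NULL);
	*drained = false;

	if (direct_reclaim(z, 2) == 0)
		return false;    /* no progress: caller may be heading for OOM */
retry:
	if (take_from_freelist(z))
		return true;
	if (!*drained) {
		drain_all(z);
		*drained = true;
		goto retry;      /* second and final attempt */
	}
	return false;
}
```

In this model, a zone with an empty shared free list and pages sitting only on per-cpu lists fails the first attempt, drains once, and succeeds on the retry. A zone that still has a page on the shared list never drains. The single-drain limit keeps the retry bounded, which matters given the concern raised below about drain_all_pages() causing IPI storms.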
> > The patch looks reasonable.
> >
> > But please take a look at the recent thread "mm: minute-long livelocks
> > in memory reclaim". There, people are pointing fingers at that
> > drain_all_pages() call, suspecting that it's causing huge IPI storms.
> >
> > Dave was going to test this theory but afaik hasn't yet done so. It
> > would be nice to tie these threads together if poss?
>
> It's been my "next-thing-to-do" since David suggested I try it -
> tracking down other problems has got in the way, though. I
> just ran my test a couple of times through:
>
> $ ./fs_mark -D 10000 -L 63 -S0 -n 100000 -s 0 \
> -d /mnt/scratch/0 -d /mnt/scratch/1 \
> -d /mnt/scratch/3 -d /mnt/scratch/2 \
> -d /mnt/scratch/4 -d /mnt/scratch/5 \
> -d /mnt/scratch/6 -d /mnt/scratch/7
>
> To create millions of inodes in parallel on an 8p/4G RAM VM.
> The filesystem is ~1.1TB XFS:
>
> # mkfs.xfs -f -d agcount=16 /dev/vdb
> meta-data=/dev/vdb               isize=256    agcount=16, agsize=16777216 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=268435456, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal log           bsize=4096   blocks=131072, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> # mount -o inode64,delaylog,logbsize=262144,nobarrier /dev/vdb /mnt/scratch
>
Unfortunately, I doubt I'll be able to reproduce this test. I don't have
access to a machine with enough processors or disk. I will try on 4p/4G
and 500M and see how that pans out.
> Performance prior to this patch was that each iteration resulted in
> ~65k files/s, with occasional peaks to 90k files/s, but frequent drops
> to 45k files/s when reclaim ran to reclaim the inode
> caches. This load ran permanently at 800% CPU usage.
>
> Every so often (maybe once or twice per 50M inode create run) all 8 CPUs
> would remain pegged but the create rate would drop to zero for a few
> seconds to a couple of minutes. That was the livelock issue I
> reported.
>
Should be easy to spot at least.
> With this patchset, I'm seeing a per-iteration average of ~77k
> files/s, with only a couple of iterations dropping down to ~55k
> files/s and a significant number above 90k files/s. The runtime to 50M
> inodes is down by ~30% and the average CPU usage across the run is
> around 700%. IOWs, alongside a significant gain in performance there is
> a significant drop in CPU usage. I've done two runs to 50M inodes,
> and not seen any sign of a livelock, even for short periods of time.
>
Very cool.
> Ah, spoke too soon - I let the second run keep going, and at ~68M
> inodes it's just pegged all the CPUs and is pretty much completely
> wedged. Serial console is not responding, I can't get a new login,
> and the only thing responding that tells me the machine is alive is
> the remote PCP monitoring. It's been stuck for 5 minutes .... and
> now it is back. Here's what I saw:
>
> http://userweb.kernel.org/~dgc/shrinker-2.6.36/fs_mark-wedge-1.png
>
> The livelock is at the right of the charts, where the top chart is
> all red (system CPU time), and the other charts flat line to zero.
>
> And according to fsmark:
>
> 1 66400000 0 64554.2 7705926
> 1 67200000 0 64836.1 7573013
> <hang happened here>
> 2 68000000 0 69472.8 7941399
> 2 68800000 0 85017.5 7585203
>
> it didn't record any change in performance, which means the livelock
> probably occurred between iterations. I couldn't get any info on
> what caused the livelock this time so I can only assume it has the
> same cause....
>
Not sure where you could have gotten stuck. I thought it might have
locked up in congestion_wait() but it wouldn't have locked up this badly
if that was the case. Sluggish, sure, but not that dead.
I'll see about reproducing with your test tomorrow and see what I find.
Thanks.
> Still, given the improvements in performance from this patchset,
> I'd say inclusion is a no-brainer....
>
--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab