linux-mm.kvack.org archive mirror
From: Mel Gorman <mel@csn.ul.ie>
To: Christoph Lameter <cl@linux.com>
Cc: Shaohua Li <shaohua.li@intel.com>, linux-mm@kvack.org
Subject: Re: zone state overhead
Date: Tue, 28 Sep 2010 14:30:59 +0100
Message-ID: <20100928133059.GL8187@csn.ul.ie>
In-Reply-To: <alpine.DEB.2.00.1009280736020.4144@router.home>

On Tue, Sep 28, 2010 at 07:39:24AM -0500, Christoph Lameter wrote:
> On Tue, 28 Sep 2010, Shaohua Li wrote:
> 
> > In a 4 socket 64 CPU system, zone_nr_free_pages() takes about 5% ~ 10% cpu time
> > according to perf when memory pressure is high. The workload does something
> > like:
> > for i in `seq 1 $nr_cpu`
> > do
> >         create_sparse_file $SPARSE_FILE-$i $((10 * mem / nr_cpu))
> >         $USEMEM -f $SPARSE_FILE-$i -j 4096 --readonly $((10 * mem / nr_cpu)) &
> > done
> > this simply reads a sparse file for each CPU. Apparently the
> > zone->percpu_drift_mark is too big, and guess zone_page_state_snapshot() makes
> > a lot of cache bounce for ->vm_stat_diff[]. below is the zoneinfo for reference.
> > Is there any way to reduce the overhead?
> 

The overhead is higher than I would have expected. I would guess the
cache bounces are a real problem.

> I guess Mel could reduce the percpu_drift_mark? Or tune that with a
> reduction in the stat_threshold? The less the count can deviate the less
> the percpu_drift_mark has to be and the less we need calls to
> zone_page_state_snapshot.
> 

This is true. It's helpful to remember why this patch exists. Under heavy
memory pressure, large machines run the risk of livelocking because the
NR_FREE_PAGES counter drifts out of sync with the real number of free pages.
The test case mentioned above is under memory pressure so it is potentially
at risk. Ordinarily, we would be less concerned with performance under heavy
memory pressure and more concerned with correctness of behaviour. The
percpu_drift_mark is set at a point where the risk is "real". Lowering it
will help performance but increase risk. Reducing stat_threshold shifts the
cost elsewhere by increasing the frequency with which the vmstat counters
are updated, which I considered to be worse overall.
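
To make the trade-off concrete, here is a rough userspace model of the
counter scheme. It is a sketch only, not the kernel code: struct zone_sim,
the function names, NR_CPUS = 64 and STAT_THRESHOLD = 125 are all assumptions
chosen for illustration, and the real drift mark and threshold are computed
per zone by the kernel. It shows why the cheap read can be wrong by up to
NR_CPUS * STAT_THRESHOLD pages, and why the accurate read has to touch every
CPU's delta.

```c
#define NR_CPUS        64   /* assumed: the 64-CPU machine from the report */
#define STAT_THRESHOLD 125  /* assumed: illustrative per-CPU threshold */

/* Hypothetical stand-in for the zone counters discussed in this thread. */
struct zone_sim {
	long nr_free_pages;          /* global counter, cheap to read */
	int  vm_stat_diff[NR_CPUS];  /* per-CPU deltas not yet folded in */
};

/*
 * Cheap update: only this CPU's delta is touched until it exceeds the
 * threshold, at which point the delta is folded into the global counter.
 */
static void mod_state(struct zone_sim *z, int cpu, int delta)
{
	int x = z->vm_stat_diff[cpu] + delta;

	if (x > STAT_THRESHOLD || x < -STAT_THRESHOLD) {
		z->nr_free_pages += x;
		x = 0;
	}
	z->vm_stat_diff[cpu] = x;
}

/* Cheap read: global counter only, may be stale by up to max_drift(). */
static long page_state(const struct zone_sim *z)
{
	return z->nr_free_pages;
}

/* Accurate read: sums every CPU's delta, so it walks NR_CPUS entries. */
static long page_state_snapshot(const struct zone_sim *z)
{
	long x = z->nr_free_pages;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		x += z->vm_stat_diff[cpu];
	return x < 0 ? 0 : x;
}

/* Worst-case error of the cheap read. */
static long max_drift(void)
{
	return (long)NR_CPUS * STAT_THRESHOLD;
}

/*
 * Simplified policy: pay for the accurate read only when the cheap value
 * is below the drift mark, i.e. close enough to the watermark to matter.
 */
static long nr_free_pages_checked(const struct zone_sim *z, long drift_mark)
{
	long nr = page_state(z);

	if (nr < drift_mark)
		nr = page_state_snapshot(z);
	return nr;
}
```

In this model, lowering the drift mark means nr_free_pages_checked() takes
the slow path less often, but a stale value is trusted closer to the
watermark. Lowering STAT_THRESHOLD shrinks max_drift(), but mod_state()
writes the shared global counter more often.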

Which of these is better or is there an alternative suggestion on how
this livelock can be avoided?

As a heads up, I'm preparing for exams at the moment. While I'm online, I'm
not in a position to prototype and test patches, but I can review alternative
proposals if people have them. I'm also out early next week.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

