Re: [PATCH 2/2 v4]mm: batch activate_page() to reduce lock contention

From: Shaohua Li <shaohua.li@intel.com>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm <linux-mm@kvack.org>, Andi Kleen <andi@firstfloor.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Rik van Riel <riel@redhat.com>, mel <mel@csn.ul.ie>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH 2/2 v4]mm: batch activate_page() to reduce lock contention
Date: Tue, 15 Mar 2011 09:53:34 +0800
Message-ID: <1300154014.2337.74.camel@sli10-conroe>
In-Reply-To: <20110314144540.GC11699@barrios-desktop>

On Mon, 2011-03-14 at 22:45 +0800, Minchan Kim wrote:
> On Thu, Mar 10, 2011 at 01:30:19PM +0800, Shaohua Li wrote:
> > The zone->lru_lock is heavily contended in workloads where activate_page()
> > is frequently used. We could batch activate_page() calls to reduce the lock
> > contention. The batched pages will be added to the zone list when the pool
> > is full or page reclaim is trying to drain them.
> > 
> > For example, on a 4-socket, 64-CPU system, create a sparse file and 64 processes
> > that each map the file shared. Each process reads the whole file and
> > then exits. Process exit calls unmap_vmas(), which causes a lot of
> > activate_page() calls. In this workload, we saw about a 58% reduction in total
> > time with the patch below. Other workloads with a lot of activate_page() calls
> > also benefit a lot.
> > 
> > Andrew Morton suggested activate_page() and putback_lru_pages() should
> > follow the same path to activate pages, but this is hard to implement (see commit
> > 7a608572a282a). On the other hand, do we really need putback_lru_pages() to
> > follow the same path? I tested several FIO/FFSB benchmarks (about 20 scripts for
> > each benchmark) on 3 machines here, from 2 sockets to 4 sockets. My tests don't
> > show anything significant with/without the patch below (there are slight differences,
> > but mostly noise which we saw even without the patch below). The patch below
> > basically returns to the same approach as my first post.
> > 
> > I tested some microbenchmarks:
> > case-anon-cow-rand-mt           0.58%
> > case-anon-cow-rand             -3.30%
> > case-anon-cow-seq-mt           -0.51%
> > case-anon-cow-seq              -5.68%
> > case-anon-r-rand-mt             0.23%
> > case-anon-r-rand                0.81%
> > case-anon-r-seq-mt             -0.71%
> > case-anon-r-seq                -1.99%
> > case-anon-rx-rand-mt            2.11%
> > case-anon-rx-seq-mt             3.46%
> > case-anon-w-rand-mt            -0.03%
> > case-anon-w-rand               -0.50%
> > case-anon-w-seq-mt             -1.08%
> > case-anon-w-seq                -0.12%
> > case-anon-wx-rand-mt           -5.02%
> > case-anon-wx-seq-mt            -1.43%
> > case-fork                       1.65%
> > case-fork-sleep                -0.07%
> > case-fork-withmem               1.39%
> > case-hugetlb                   -0.59%
> > case-lru-file-mmap-read-mt     -0.54%
> > case-lru-file-mmap-read         0.61%
> > case-lru-file-mmap-read-rand   -2.24%
> > case-lru-file-readonce         -0.64%
> > case-lru-file-readtwice       -11.69%
> > case-lru-memcg                 -1.35%
> > case-mmap-pread-rand-mt         1.88%
> > case-mmap-pread-rand          -15.26%
> > case-mmap-pread-seq-mt          0.89%
> > case-mmap-pread-seq           -69.72%
> > case-mmap-xread-rand-mt         0.71%
> > case-mmap-xread-seq-mt          0.38%
> > 
> > The most significant are:
> > case-lru-file-readtwice       -11.69%
> > case-mmap-pread-rand          -15.26%
> > case-mmap-pread-seq           -69.72%
> > 
> > which use activate_page() a lot. The others are basically run-to-run variation,
> > since each run differs slightly.
> > 
> > In the UP case, 'size mm/swap.o' gives,
> > before the two patches:
> >    text    data     bss     dec     hex filename
> >    6466     896       4    7366    1cc6 mm/swap.o
> > after the two patches:
> >    text    data     bss     dec     hex filename
> >    6343     896       4    7243    1c4b mm/swap.o
> > 
> > Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> > 
> > ---
> >  mm/swap.c |   45 ++++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 40 insertions(+), 5 deletions(-)
> > 
> > Index: linux/mm/swap.c
> > ===================================================================
> > --- linux.orig/mm/swap.c	2011-03-09 12:56:09.000000000 +0800
> > +++ linux/mm/swap.c	2011-03-09 12:56:46.000000000 +0800
> > @@ -272,14 +272,10 @@ static void update_page_reclaim_stat(str
> >  		memcg_reclaim_stat->recent_rotated[file]++;
> >  }
> >  
> > -/*
> > - * FIXME: speed this up?
> > - */
> > -void activate_page(struct page *page)
> > +static void __activate_page(struct page *page, void *arg)
> >  {
> >  	struct zone *zone = page_zone(page);
> >  
> > -	spin_lock_irq(&zone->lru_lock);
> >  	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
> >  		int file = page_is_file_cache(page);
> >  		int lru = page_lru_base_type(page);
> > @@ -292,8 +288,45 @@ void activate_page(struct page *page)
> >  
> >  		update_page_reclaim_stat(zone, page, file, 1);
> >  	}
> > +}
> > +
> > +#ifdef CONFIG_SMP
> > +static DEFINE_PER_CPU(struct pagevec, activate_page_pvecs);
> > +
> > +static void activate_page_drain(int cpu)
> > +{
> > +	struct pagevec *pvec = &per_cpu(activate_page_pvecs, cpu);
> > +
> > +	if (pagevec_count(pvec))
> > +		pagevec_lru_move_fn(pvec, __activate_page, NULL);
> > +}
> > +
> > +void activate_page(struct page *page)
> > +{
> > +	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
> > +		struct pagevec *pvec = &get_cpu_var(activate_page_pvecs);
> > +
> > +		page_cache_get(page);
> > +		if (!pagevec_add(pvec, page))
> > +			pagevec_lru_move_fn(pvec, __activate_page, NULL);
> > +		put_cpu_var(activate_page_pvecs);
> > +	}
> > +}
> > +
> > +#else
> > +static inline void activate_page_drain(int cpu)
> > +{
> > +}
> > +
> > +void activate_page(struct page *page)
> > +{
> > +	struct zone *zone = page_zone(page);
> > +
> > +	spin_lock_irq(&zone->lru_lock);
> > +	__activate_page(page, NULL);
> >  	spin_unlock_irq(&zone->lru_lock);
> >  }
> > +#endif
>  
> Why do we need CONFIG_SMP only for activate_page_pvecs?
> Does the per-cpu activate_page_pvecs consume lots of memory on UP?
> I don't think so. But if it does consume lots of memory, that's a problem
> with per-cpu itself.
No, not too much memory.

> I can't understand why we should handle activate_page_pvecs specially.
> Please enlighten me.
It's not special. akpm asked me to do it this time. Saving a little
memory is still worthwhile anyway, so that's it. We can do it for the other
pvecs too, in a separate patch.
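
For readers following the thread, here is a minimal user-space sketch of the
batching idea in the quoted patch. It is not the kernel code: struct item,
struct batch, BATCH_SIZE, batch_activate(), batch_drain() and the
lock_acquisitions counter are all hypothetical names, a plain counter stands in
for taking zone->lru_lock, and a single batch stands in for the per-CPU pagevec.
It only illustrates why queuing pages and activating them together takes the
lock once per batch instead of once per page.

```c
#include <assert.h>
#include <stddef.h>

/* Arbitrary batch size chosen for illustration only. */
#define BATCH_SIZE 4

/* Hypothetical stand-in for a page; only tracks an "active" flag. */
struct item {
	int active;
};

/* Hypothetical stand-in for a per-CPU pagevec. */
struct batch {
	size_t count;
	struct item *items[BATCH_SIZE];
};

/* Counts how often the (simulated) shared lock would be taken. */
static unsigned long lock_acquisitions;

/* Activate everything queued so far under one simulated lock hold. */
static void batch_drain(struct batch *b)
{
	size_t i;

	if (b->count == 0)
		return;

	lock_acquisitions++;	/* stands in for spin_lock_irq(&zone->lru_lock) */
	for (i = 0; i < b->count; i++)
		b->items[i]->active = 1;
	b->count = 0;		/* the simulated unlock would follow here */
}

/* Queue an item for activation; drain only when the batch fills up. */
static void batch_activate(struct batch *b, struct item *it)
{
	if (it->active)
		return;		/* already active, nothing to queue */

	assert(b->count < BATCH_SIZE);
	b->items[b->count++] = it;
	if (b->count == BATCH_SIZE)
		batch_drain(b);
}
```

With ten items and a batch size of 4, the simulated lock is taken twice while
queuing and once more when the remainder is drained explicitly, instead of ten
times. Items still sitting in the batch are not yet active, which is why the
patch drains the pagevec when page reclaim needs them.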

Thanks,
Shaohua