From: Andrew Morton <akpm@linux-foundation.org>
To: kosaki.motohiro@jp.fujitsu.com, nickpiggin@yahoo.com.au,
	linux-mm@kvack.org, riel@redhat.com, lee.schermerhorn@hp.com
Subject: Re: vmscan-give-referenced-active-and-unmapped-pages-a-second-trip-around-the-lru.patch
Date: Fri, 10 Oct 2008 15:33:46 -0700
Message-ID: <20081010153346.e25b90f7.akpm@linux-foundation.org>
In-Reply-To: <20081010152540.79ed64cb.akpm@linux-foundation.org>

On Fri, 10 Oct 2008 15:25:40 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Fri, 10 Oct 2008 15:17:01 -0700
> Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > On Wed,  8 Oct 2008 19:03:07 +0900 (JST)
> > KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> > 
> > > Hi
> > > 
> > > Nick, Andrew, thank you very much for the good advice.
> > > Your help sped up my investigation.
> > > 
> > > 
> > > > This patch, like I said when it was first merged, has the problem that
> > > > it can cause large stalls when reclaiming pages.
> > > > 
> > > > I actually myself tried a similar thing a long time ago. The problem is
> > > > that after a long period of no reclaiming, your file pages can all end
> > > > up being active and referenced. When the first guy wants to reclaim a
> > > > page, it might have to scan through gigabytes of file pages before being
> > > > able to reclaim a single one.
> > > 
> > > I completely agree with this opinion.
> > > Having all pages stay on the active list is awful.
> > > 
> > > In addition, my measurements show that this patch causes a latency regression under a really heavy I/O workload.
> > > 
> > > 2.6.27-rc8: Throughput 13.4231 MB/sec  4000 clients  4000 procs  max_latency=1421988.159 ms
> > >  + patch  : Throughput 12.0953 MB/sec  4000 clients  4000 procs  max_latency=1731244.847 ms
> > > 
> > > 
> > > > While it would be really nice to be able to just lazily set PageReferenced
> > > > and nothing else in mark_page_accessed, and then do file page aging based
> > > > on the referenced bit, the fact is that we virtually have O(1) reclaim
> > > > for file pages now, and this can make it much more like O(n) (in worst case,
> > > > especially).
> > > > 
> > > > I don't think it is right to say "we broke aging and this patch fixes it".
> > > > It's all a big crazy heuristic. Who's to say that the previous behaviour
> > > > wasn't better and this patch breaks it? :)
> > > > 
> > > > Anyway, I don't think it is exactly productive to keep patches like this in
> > > > the tree (that doesn't seem ever intended to be merged) while there are
> > > > other big changes to reclaim there.
> > 
> > Well yes.  I've been hanging onto these in the hope that someone would
> > work out whether they are changes which we should make.
> > 
> > 
> > > > Same for vm-dont-run-touch_buffer-during-buffercache-lookups.patch
> > > 
> > > I measured it too:
> > > 
> > > 2.6.27-rc8: Throughput 13.4231 MB/sec  4000 clients  4000 procs  max_latency=1421988.159 ms
> > >  + patch  : Throughput 11.8494 MB/sec  4000 clients  4000 procs  max_latency=3463217.227 ms
> > > 
> > > dbench max latency increased by roughly 2.4x.
> > > 
> > > So, the patch description already describes this risk:
> > > dropping metadata can decrease performance significantly.
> > > IMHO, that is exactly what showed up here.
> > 
> > Oh well, that'll suffice, thanks - I'll drop them.
> 
> Which means that after vmscan-split-lru-lists-into-anon-file-sets.patch,
> shrink_active_list() simply does
> 
> 	while (!list_empty(&l_hold)) {
> 		cond_resched();
> 		page = lru_to_page(&l_hold);
> 		list_add(&page->lru, &l_inactive);
> 	}
> 
> yes?
> 
> We might even be able to list_splice those pages..

OK, that wasn't a particularly good time to drop those patches.

Here's how shrink_active_list() ended up:

static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
			struct scan_control *sc, int priority, int file)
{
	unsigned long pgmoved;
	int pgdeactivate = 0;
	unsigned long pgscanned;
	LIST_HEAD(l_hold);	/* The pages which were snipped off */
	LIST_HEAD(l_active);
	LIST_HEAD(l_inactive);
	struct page *page;
	struct pagevec pvec;
	enum lru_list lru;

	lru_add_drain();
	spin_lock_irq(&zone->lru_lock);
	pgmoved = sc->isolate_pages(nr_pages, &l_hold, &pgscanned, sc->order,
					ISOLATE_ACTIVE, zone,
					sc->mem_cgroup, 1, file);
	/*
	 * zone->pages_scanned is used for detect zone's oom
	 * mem_cgroup remembers nr_scan by itself.
	 */
	if (scan_global_lru(sc)) {
		zone->pages_scanned += pgscanned;
		zone->recent_scanned[!!file] += pgmoved;
	}

	if (file)
		__mod_zone_page_state(zone, NR_ACTIVE_FILE, -pgmoved);
	else
		__mod_zone_page_state(zone, NR_ACTIVE_ANON, -pgmoved);
	spin_unlock_irq(&zone->lru_lock);

	pgmoved = 0;
	while (!list_empty(&l_hold)) {
		cond_resched();
		page = lru_to_page(&l_hold);
		list_del(&page->lru);

		if (unlikely(!page_evictable(page, NULL))) {
			putback_lru_page(page);
			continue;
		}

		list_add(&page->lru, &l_inactive);
		if (!page_mapping_inuse(page)) {
			/*
			 * Bypass use-once, make the next access count. See
			 * mark_page_accessed and shrink_page_list.
			 */
			SetPageReferenced(page);
		}
	}

	/*
	 * Count the referenced pages as rotated, even when they are moved
	 * to the inactive list.  This helps balance scan pressure between
	 * file and anonymous pages in get_scan_ratio.
	 */
	zone->recent_rotated[!!file] += pgmoved;

	/*
	 * Now put the pages back on the appropriate [file or anon] inactive
	 * and active lists.
	 */
	pagevec_init(&pvec, 1);
	pgmoved = 0;
	lru = LRU_BASE + file * LRU_FILE;
	spin_lock_irq(&zone->lru_lock);
	while (!list_empty(&l_inactive)) {
		page = lru_to_page(&l_inactive);
		prefetchw_prev_lru_page(page, &l_inactive, flags);
		VM_BUG_ON(PageLRU(page));
		SetPageLRU(page);
		VM_BUG_ON(!PageActive(page));
		ClearPageActive(page);

		list_move(&page->lru, &zone->lru[lru].list);
		mem_cgroup_move_lists(page, lru);
		pgmoved++;
		if (!pagevec_add(&pvec, page)) {
			__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
			spin_unlock_irq(&zone->lru_lock);
			pgdeactivate += pgmoved;
			pgmoved = 0;
			if (buffer_heads_over_limit)
				pagevec_strip(&pvec);
			__pagevec_release(&pvec);
			spin_lock_irq(&zone->lru_lock);
		}
	}
	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
	pgdeactivate += pgmoved;
	if (buffer_heads_over_limit) {
		spin_unlock_irq(&zone->lru_lock);
		pagevec_strip(&pvec);
		spin_lock_irq(&zone->lru_lock);
	}

	pgmoved = 0;
	lru = LRU_ACTIVE + file * LRU_FILE;
	while (!list_empty(&l_active)) {
		page = lru_to_page(&l_active);
		prefetchw_prev_lru_page(page, &l_active, flags);
		VM_BUG_ON(PageLRU(page));
		SetPageLRU(page);
		VM_BUG_ON(!PageActive(page));

		list_move(&page->lru, &zone->lru[lru].list);
		mem_cgroup_move_lists(page, lru);
		pgmoved++;
		if (!pagevec_add(&pvec, page)) {
			__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
			pgmoved = 0;
			spin_unlock_irq(&zone->lru_lock);
			if (vm_swap_full())
				pagevec_swap_free(&pvec);
			__pagevec_release(&pvec);
			spin_lock_irq(&zone->lru_lock);
		}
	}
	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);

	__count_zone_vm_events(PGREFILL, zone, pgscanned);
	__count_vm_events(PGDEACTIVATE, pgdeactivate);
	spin_unlock_irq(&zone->lru_lock);
	if (vm_swap_full())
		pagevec_swap_free(&pvec);

	pagevec_release(&pvec);
}


Note the first use of pgmoved there.  It no longer does anything.  erk.

