public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <andrea@suse.de>
To: Andrew Morton <akpm@osdl.org>
Cc: piggin@cyberone.com.au, riel@redhat.com,
	marcelo.tosatti@cyclades.com, j-nomura@ce.jp.nec.com,
	linux-kernel@vger.kernel.org, torvalds@osdl.org
Subject: Re: [2.4] heavy-load under swap space shortage
Date: Mon, 15 Mar 2004 19:51:47 +0100	[thread overview]
Message-ID: <20040315185147.GH30940@dualathlon.random> (raw)
In-Reply-To: <20040315103510.25c955a3.akpm@osdl.org>

On Mon, Mar 15, 2004 at 10:35:10AM -0800, Andrew Morton wrote:
> Andrea Arcangeli <andrea@suse.de> wrote:
> >
> > On Tue, Mar 16, 2004 at 01:37:04AM +1100, Nick Piggin wrote:
> > > This case I think is well worth the unfairness it causes, because it
> > > means your zone's pages can be freed quickly and without freeing pages
> > > from other zones.
> > 
> > freeing pages from other zones is perfectly fine, the classzone design
> > gets it right, you have to free memory from the other zones too or you
> > have no way to work on a 1G machine. you call the thing "unfair" when it
> > has nothing to do with fariness, your unfariness is the slowdown I
> > pointed out,
> 
> This "slowdown" is purely theoretical and has never been demonstrated.

on a 32G box the slowdown is zero, as it's zero on a 1G box too, you
definitely need a 2G box to measure it.

The effect is that you can do stuff like 'cvs up' and you will end up
caching just 1G instead of 2G. Or do I miss something? If I would own a
2G box I would hate to be able to cache just 1 G (yeah, the cache is 2G
but half of that cache is pinned and it sits there with years old data,
so effectively you lose 50% of the ram in the box in terms of cache
utilization).

> One could just as easily point at the fact that on a 32GB machine with a
> single LRU we have to send 64 highmem pages to the wrong end of the LRU for
> each scanned lowmem page, thus utterly destroying any concept of it being
> an LRU in the first place.  But this is also theoretical, and has never
> been demonstrated and is thus uninteresting.

the lowmem zone on a 32G box is completely reserved for zone-normal
allocation, and dcache shrinks aren't too frequent in some workload, but
you're certainly right that on a 32G box per-zone lru is optimal in
terms of cpu utilization (on 64bit either ways doesn't make any
difference, the GFP_DMA allocations are so seldom that throwing a bit of
cpu at those seldom allocation is fine).

> 
> Worked out why my box is going into a 3-5 minute coma with one test.
> Think what the LRUs look like when the test first hits page reclaim
> on this 2.5G ia32 box:
> 
>                head                           tail
> active_list:   <800M of ZONE_NORMAL> <200M of ZONE_HIGHMEM>
> inactive_list:          <1.5G of ZONE_HIGHMEM>
> 
> now, somebody does a GFP_KERNEL allocation.
> 
> uh-oh.
> 
> VM calls refill_inactive.  That moves 25 ZONE_HIGHMEM pages onto
> the inactive list.  It then scans 5000 pages, achieving nothing.

I fixed this in my tree a long time ago, you certainly don't need
per-zone lru to fix this (though for a 32G box the per-zone lru doesn't
only fix it, it also save lots of cpu too compared to the global lru).
See the refill_inactive code in my tree:

static void refill_inactive(int nr_pages, zone_t * classzone)
{
	struct list_head * entry;
	unsigned long ratio;

	ratio = (unsigned long) nr_pages * classzone->nr_active_pages /
(((unsigned long) classzone->nr_inactive_pages * vm_lru_balance_ratio) +
1);

	entry = active_list.prev;
	while (ratio && entry != &active_list) {
		struct page * page;
		int related_metadata = 0;

		page = list_entry(entry, struct page, lru);
		entry = entry->prev;

		if (!memclass(page_zone(page), classzone)) {
			/*
			 * Hack to address an issue found by Rik. The
			 * problem is that
			 * highmem pages can hold buffer headers
			 * allocated
			 * from the slab on lowmem, and so if we are
			 * working
			 * on the NORMAL classzone here, it is correct
			 * not to
			 * try to free the highmem pages themself (that
			 * would be useless)
			 * but we must make sure to drop any lowmem
			 * metadata related to those
			 * highmem pages.
			 */
			if (page->buffers && page->mapping) { /* fast path racy check */
				if (unlikely(TryLockPage(page)))
					continue;
				if (page->buffers && page->mapping && memclass_related_bhs(page, classzone)) /* non racy check */
					related_metadata = 1;
				UnlockPage(page);
			}
			if (!related_metadata)
				continue;
		}

		if (PageTestandClearReferenced(page)) {
			list_del(&page->lru);
			list_add(&page->lru, &active_list);
			continue;
		}

		if (!related_metadata)
			ratio--;

		del_page_from_active_list(page);
		add_page_to_inactive_list(page);
		SetPageReferenced(page);
	}
	if (entry != &active_list) {
		list_del(&active_list);
		list_add(&active_list, entry);
	}
}


the memclass checks guarantees that we make progress. the old vm code
(that you inherit in 2.5) missed those bits I believe.

without those fixes the 2.4 vm wouldn't perform on 32G (as you also
found during 2.5).

  reply	other threads:[~2004-03-15 18:51 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-02-02 10:12 [2.4] heavy-load under swap space shortage j-nomura
2004-02-02 13:29 ` Hugh Dickins
2004-02-03  7:53   ` j-nomura
2004-02-03 17:19     ` Hugh Dickins
2004-02-04 11:40       ` j-nomura
2004-02-05 18:42         ` Hugh Dickins
2004-02-06  9:03           ` j-nomura
2004-03-10 10:57           ` j-nomura
2004-03-14 19:47             ` Marcelo Tosatti
2004-03-14 19:54               ` Rik van Riel
2004-03-14 20:15               ` Andrew Morton
     [not found]                 ` <20040314230138.GV30940@dualathlon.random>
2004-03-14 23:22                   ` Andrew Morton
2004-03-15  0:14                     ` Andrea Arcangeli
2004-03-15  4:38                       ` Nick Piggin
2004-03-15 11:49                         ` Andrea Arcangeli
2004-03-15 13:23                           ` Rik van Riel
2004-03-15 14:37                             ` Nick Piggin
2004-03-15 14:50                               ` Andrea Arcangeli
2004-03-15 18:35                                 ` Andrew Morton
2004-03-15 18:51                                   ` Andrea Arcangeli [this message]
2004-03-15 19:02                                     ` Andrew Morton
2004-03-15 21:55                                       ` Andrea Arcangeli
2004-03-15 22:05                                 ` Nick Piggin
2004-03-15 22:24                                   ` Andrea Arcangeli
2004-03-15 22:41                                     ` Nick Piggin
2004-03-15 22:44                                       ` Andrea Arcangeli
2004-03-15 22:41                                     ` Rik van Riel
2004-03-15 23:32                                       ` Andrea Arcangeli
2004-03-16  6:27                                         ` Nick Piggin
2004-03-16  7:25                                   ` Marcelo Tosatti
2004-03-16  6:31                     ` Marcelo Tosatti
2004-03-16 13:47                       ` Andrea Arcangeli
2004-03-16 16:59                         ` Marcelo Tosatti
2004-11-22 15:01                     ` Lazily add anonymous pages to LRU on v2.4? was " Marcelo Tosatti
2004-11-22 19:49                       ` Andrea Arcangeli
2004-11-22 15:58                         ` Marcelo Tosatti
2004-05-26 12:41             ` Marcelo Tosatti
2004-05-26 18:24               ` Marc-Christian Petersen
2004-05-27 11:16                 ` Marcelo Tosatti
2004-05-26 19:06               ` Hugh Dickins
2004-05-26 22:23               ` Andrea Arcangeli
2004-05-28  2:55               ` j-nomura

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20040315185147.GH30940@dualathlon.random \
    --to=andrea@suse.de \
    --cc=akpm@osdl.org \
    --cc=j-nomura@ce.jp.nec.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marcelo.tosatti@cyclades.com \
    --cc=piggin@cyberone.com.au \
    --cc=riel@redhat.com \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox