From: Andrea Arcangeli <andrea@suse.de>
To: Andrew Morton <akpm@osdl.org>
Cc: piggin@cyberone.com.au, riel@redhat.com,
marcelo.tosatti@cyclades.com, j-nomura@ce.jp.nec.com,
linux-kernel@vger.kernel.org, torvalds@osdl.org
Subject: Re: [2.4] heavy-load under swap space shortage
Date: Mon, 15 Mar 2004 19:51:47 +0100
Message-ID: <20040315185147.GH30940@dualathlon.random>
In-Reply-To: <20040315103510.25c955a3.akpm@osdl.org>
On Mon, Mar 15, 2004 at 10:35:10AM -0800, Andrew Morton wrote:
> Andrea Arcangeli <andrea@suse.de> wrote:
> >
> > On Tue, Mar 16, 2004 at 01:37:04AM +1100, Nick Piggin wrote:
> > > This case I think is well worth the unfairness it causes, because it
> > > means your zone's pages can be freed quickly and without freeing pages
> > > from other zones.
> >
> > freeing pages from other zones is perfectly fine, the classzone design
> > gets it right, you have to free memory from the other zones too or you
> > have no way to work on a 1G machine. you call the thing "unfair" when it
> > has nothing to do with fairness, your unfairness is the slowdown I
> > pointed out,
>
> This "slowdown" is purely theoretical and has never been demonstrated.

On a 32G box the slowdown is zero, and it's zero on a 1G box too; you
definitely need a 2G box to measure it.

The effect is that you can do stuff like 'cvs up' and you will end up
caching just 1G instead of 2G. Or am I missing something? If I owned a
2G box I would hate to be able to cache just 1G (yeah, the cache is 2G
but half of that cache is pinned and sits there with years-old data,
so effectively you lose 50% of the ram in the box in terms of cache
utilization).

> One could just as easily point at the fact that on a 32GB machine with a
> single LRU we have to send 64 highmem pages to the wrong end of the LRU for
> each scanned lowmem page, thus utterly destroying any concept of it being
> an LRU in the first place. But this is also theoretical, and has never
> been demonstrated and is thus uninteresting.

The lowmem zone on a 32G box is completely reserved for zone-normal
allocations, and dcache shrinks aren't too frequent in some workloads, but
you're certainly right that on a 32G box per-zone lru is optimal in
terms of cpu utilization (on 64bit either way makes no difference:
GFP_DMA allocations are so rare that throwing a bit of cpu at them is
fine).

>
> Worked out why my box is going into a 3-5 minute coma with one test.
> Think what the LRUs look like when the test first hits page reclaim
> on this 2.5G ia32 box:
>
>                  head                                    tail
>  active_list:    <800M of ZONE_NORMAL> <200M of ZONE_HIGHMEM>
>  inactive_list:           <1.5G of ZONE_HIGHMEM>
>
> now, somebody does a GFP_KERNEL allocation.
>
> uh-oh.
>
> VM calls refill_inactive. That moves 25 ZONE_HIGHMEM pages onto
> the inactive list. It then scans 5000 pages, achieving nothing.

I fixed this in my tree a long time ago; you certainly don't need
per-zone lru to fix this (though on a 32G box the per-zone lru doesn't
only fix it, it also saves lots of cpu compared to the global lru).

See the refill_inactive code in my tree:

static void refill_inactive(int nr_pages, zone_t * classzone)
{
	struct list_head * entry;
	unsigned long ratio;

	ratio = (unsigned long) nr_pages * classzone->nr_active_pages /
		(((unsigned long) classzone->nr_inactive_pages * vm_lru_balance_ratio) + 1);

	entry = active_list.prev;
	while (ratio && entry != &active_list) {
		struct page * page;
		int related_metadata = 0;

		page = list_entry(entry, struct page, lru);
		entry = entry->prev;

		if (!memclass(page_zone(page), classzone)) {
			/*
			 * Hack to address an issue found by Rik. The problem is that
			 * highmem pages can hold buffer headers allocated
			 * from the slab on lowmem, and so if we are working
			 * on the NORMAL classzone here, it is correct not to
			 * try to free the highmem pages themself (that would be useless)
			 * but we must make sure to drop any lowmem metadata related to those
			 * highmem pages.
			 */
			if (page->buffers && page->mapping) { /* fast path racy check */
				if (unlikely(TryLockPage(page)))
					continue;
				if (page->buffers && page->mapping && memclass_related_bhs(page, classzone)) /* non racy check */
					related_metadata = 1;
				UnlockPage(page);
			}
			if (!related_metadata)
				continue;
		}

		if (PageTestandClearReferenced(page)) {
			list_del(&page->lru);
			list_add(&page->lru, &active_list);
			continue;
		}

		if (!related_metadata)
			ratio--;

		del_page_from_active_list(page);
		add_page_to_inactive_list(page);
		SetPageReferenced(page);
	}
	if (entry != &active_list) {
		list_del(&active_list);
		list_add(&active_list, entry);
	}
}

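To make the budget computed at the top of the function concrete, here is a small user-space sketch of the same arithmetic. The helper name refill_budget() and the sample numbers in the note below are assumptions made for illustration; they are not taken from the tree.

```c
/*
 * User-space sketch of the scan budget ("ratio") computed at the top of
 * refill_inactive(). The helper name and its parameters are illustrative
 * assumptions; the kernel reads these values from the classzone and from
 * vm_lru_balance_ratio instead.
 */
unsigned long refill_budget(unsigned long nr_pages,
			    unsigned long nr_active,
			    unsigned long nr_inactive,
			    unsigned long balance_ratio)
{
	/* The "+ 1" keeps the divisor non-zero when the inactive list is empty. */
	return nr_pages * nr_active / (nr_inactive * balance_ratio + 1);
}
```

With nr_pages = 32, 1000 active pages, 500 inactive pages and an assumed balance ratio of 2, the budget is 32 * 1000 / 1001 = 31 pages (integer division). The budget counts only pages that belong to the classzone, since the loop decrements it only for those.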
The memclass checks guarantee that we make progress. The old vm code
(that you inherited in 2.5) missed those bits, I believe.

Without those fixes the 2.4 vm wouldn't perform on 32G (as you also
found during 2.5).
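
The progress argument can be illustrated with a toy user-space model of the scan. This is only a sketch under stated assumptions: memclass() is modeled as a plain zone-index comparison, the active list is a flat array ordered tail first, and the referenced-bit rotation and the buffer-header (related_metadata) path are left out. The function name deactivate() and the classzone_aware flag are hypothetical.

```c
#include <stddef.h>

/* Zone indices; assumed ordering DMA < NORMAL < HIGHMEM. */
enum zone_idx { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM };

/*
 * Assumed model of memclass(): a page can satisfy a classzone allocation
 * only if its zone index does not exceed the classzone's index.
 */
static int memclass(enum zone_idx page_zone, enum zone_idx classzone)
{
	return page_zone <= classzone;
}

/*
 * Walk a toy active list from the tail (index 0) and return how many
 * pages usable by `classzone` would be moved to the inactive list.
 *
 * classzone_aware == 0: every scanned page consumes the budget.
 * classzone_aware == 1: pages outside the classzone are skipped
 *                       without consuming the budget.
 */
size_t deactivate(const enum zone_idx *tail_first, size_t len,
		  size_t budget, enum zone_idx classzone,
		  int classzone_aware)
{
	size_t useful = 0;
	size_t i;

	for (i = 0; i < len && budget; i++) {
		int in_class = memclass(tail_first[i], classzone);

		if (classzone_aware && !in_class)
			continue;	/* skipped, budget untouched */

		budget--;
		if (in_class)
			useful++;
	}
	return useful;
}
```

For a list whose tail holds 200 ZONE_HIGHMEM pages followed by 100 ZONE_NORMAL pages, a budget of 25 and a ZONE_NORMAL classzone, the unaware variant spends the whole budget on highmem pages and deactivates nothing useful, while the classzone-aware variant walks past the highmem pages and deactivates 25 ZONE_NORMAL pages.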