From: Andrea Arcangeli <andrea@suse.de>
To: Andrew Morton <akpm@osdl.org>
Cc: piggin@cyberone.com.au, riel@redhat.com,
marcelo.tosatti@cyclades.com, j-nomura@ce.jp.nec.com,
linux-kernel@vger.kernel.org, torvalds@osdl.org
Subject: Re: [2.4] heavy-load under swap space shortage
Date: Mon, 15 Mar 2004 19:51:47 +0100
Message-ID: <20040315185147.GH30940@dualathlon.random>
In-Reply-To: <20040315103510.25c955a3.akpm@osdl.org>

On Mon, Mar 15, 2004 at 10:35:10AM -0800, Andrew Morton wrote:
> Andrea Arcangeli <andrea@suse.de> wrote:
> >
> > On Tue, Mar 16, 2004 at 01:37:04AM +1100, Nick Piggin wrote:
> > > This case I think is well worth the unfairness it causes, because it
> > > means your zone's pages can be freed quickly and without freeing pages
> > > from other zones.
> >
> > freeing pages from other zones is perfectly fine, the classzone design
> > gets it right, you have to free memory from the other zones too or you
> > have no way to work on a 1G machine. you call the thing "unfair" when it
> > has nothing to do with fairness, your unfairness is the slowdown I
> > pointed out,
>
> This "slowdown" is purely theoretical and has never been demonstrated.

on a 32G box the slowdown is zero, and it's zero on a 1G box too; you
definitely need a 2G box to measure it.

The effect is that you can do stuff like 'cvs up' and you will end up
caching just 1G instead of 2G. Or am I missing something? If I owned a
2G box I would hate to be able to cache just 1G (yeah, the cache is 2G
but half of that cache is pinned and it sits there with years-old data,
so effectively you lose 50% of the ram in the box in terms of cache
utilization).

> One could just as easily point at the fact that on a 32GB machine with a
> single LRU we have to send 64 highmem pages to the wrong end of the LRU for
> each scanned lowmem page, thus utterly destroying any concept of it being
> an LRU in the first place. But this is also theoretical, and has never
> been demonstrated and is thus uninteresting.

the lowmem zone on a 32G box is completely reserved for zone-normal
allocations, and dcache shrinks aren't too frequent in some workloads, but
you're certainly right that on a 32G box per-zone lru is optimal in
terms of cpu utilization (on 64bit either way doesn't make any
difference, the GFP_DMA allocations are so rare that throwing a bit of
cpu at those rare allocations is fine).

>
> Worked out why my box is going into a 3-5 minute coma with one test.
> Think what the LRUs look like when the test first hits page reclaim
> on this 2.5G ia32 box:
>
>                 head                      tail
> active_list:    <800M of ZONE_NORMAL>     <200M of ZONE_HIGHMEM>
> inactive_list:            <1.5G of ZONE_HIGHMEM>
>
> now, somebody does a GFP_KERNEL allocation.
>
> uh-oh.
>
> VM calls refill_inactive. That moves 25 ZONE_HIGHMEM pages onto
> the inactive list. It then scans 5000 pages, achieving nothing.

I fixed this in my tree a long time ago, you certainly don't need
per-zone lru to fix this (though for a 32G box the per-zone lru doesn't
only fix it, it also saves lots of cpu compared to the global lru).

See the refill_inactive code in my tree:

static void refill_inactive(int nr_pages, zone_t * classzone)
{
	struct list_head * entry;
	unsigned long ratio;

	ratio = (unsigned long) nr_pages * classzone->nr_active_pages /
		(((unsigned long) classzone->nr_inactive_pages * vm_lru_balance_ratio) + 1);

	/* scan the active list from the tail towards the head */
	entry = active_list.prev;
	while (ratio && entry != &active_list) {
		struct page * page;
		int related_metadata = 0;

		page = list_entry(entry, struct page, lru);
		entry = entry->prev;

		if (!memclass(page_zone(page), classzone)) {
			/*
			 * Hack to address an issue found by Rik. The problem is that
			 * highmem pages can hold buffer headers allocated
			 * from the slab on lowmem, and so if we are working
			 * on the NORMAL classzone here, it is correct not to
			 * try to free the highmem pages themselves (that would be useless)
			 * but we must make sure to drop any lowmem metadata related to those
			 * highmem pages.
			 */
			if (page->buffers && page->mapping) { /* fast path racy check */
				if (unlikely(TryLockPage(page)))
					continue;
				if (page->buffers && page->mapping && memclass_related_bhs(page, classzone)) /* non racy check */
					related_metadata = 1;
				UnlockPage(page);
			}
			if (!related_metadata)
				continue;
		}

		/* recently referenced: move it back to the head of the active list */
		if (PageTestandClearReferenced(page)) {
			list_del(&page->lru);
			list_add(&page->lru, &active_list);
			continue;
		}

		/* only pages that belong to the classzone consume the scan budget */
		if (!related_metadata)
			ratio--;

		del_page_from_active_list(page);
		add_page_to_inactive_list(page);
		SetPageReferenced(page);
	}

	/* rotate the list head so the next scan resumes where this one stopped */
	if (entry != &active_list) {
		list_del(&active_list);
		list_add(&active_list, entry);
	}
}

the memclass checks guarantee that we make progress. the old vm code
(that you inherited in 2.5) missed those bits I believe.

without those fixes the 2.4 vm wouldn't perform on 32G (as you also
found during 2.5).