From: Andrew Morton <akpm@digeo.com>
To: Rik van Riel <riel@conectiva.com.br>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: inactive_dirty list
Date: Fri, 06 Sep 2002 13:42:06 -0700
Message-ID: <3D79131E.837F08B3@digeo.com>
Rik, it seems that the time has come...
I was doing some testing overnight with mem=1024m. Page reclaim
was pretty inefficient at that level: kswapd consumed 6% of CPU
on a permanent basis (workload was heavy dbench plus looping
make -j6 bzImage). kswapd was reclaiming only 3% of the pages
which it was looking at.
This doesn't happen at mem=768m, and I'm sure it won't happen at
mem=1.5G.
What is happening here is that the logic which clamps dirty+writeback
pagecache at 40% of memory is working nicely, and the allocate-from-
highmem-first logic is ensuring that all of ZONE_HIGHMEM is dirty
or under writeback all the time. kswapd isn't allowed to block
against that pagecache, so it's scanning zillions of pages.
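To make the scanning cost concrete, here is a minimal userspace sketch of a non-blocking scan pass over an inactive list. It is not kernel code: struct model_page, scan_inactive() and reclaim_efficiency_pct() are hypothetical names invented for illustration, and the only behavior modelled is "skip anything dirty or under writeback".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page on the inactive list (not struct page). */
struct model_page {
    bool dirty;      /* models PageDirty */
    bool writeback;  /* models PageWriteback */
};

struct scan_result {
    size_t scanned;    /* pages looked at */
    size_t reclaimed;  /* pages that could actually be freed */
};

/*
 * Model one scan pass by a reclaimer that is not allowed to block:
 * pages that are dirty or under writeback are counted as scanned
 * but skipped; every other page is counted as reclaimed.
 */
static struct scan_result scan_inactive(const struct model_page *pages, size_t n)
{
    struct scan_result r = { 0, 0 };

    for (size_t i = 0; i < n; i++) {
        r.scanned++;
        if (pages[i].dirty || pages[i].writeback)
            continue;
        r.reclaimed++;
    }
    assert(r.reclaimed <= r.scanned);
    return r;
}

/* Percentage of scanned pages that were reclaimed, rounded down. */
static unsigned reclaim_efficiency_pct(struct scan_result r)
{
    return r.scanned ? (unsigned)(r.reclaimed * 100 / r.scanned) : 0;
}
```

In this model, if 97 of every 100 pages on the list are dirty or under writeback, the scanner touches all 100 to free 3, which is the kind of ratio described above.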
This is a fundamental problem when the size of the highmem zone is
approximately equal to 40% of total memory.
We could fix it by changing the page allocator to balance its
allocations across zones, but I don't think we want to do that.
I think it's best to split the inactive list into reclaimable
and unreclaimable. (inactive_clean/inactive_dirty).
I'll code that tonight; please let me run some thoughts by you:
- inactive_dirty holds pages which are dirty or under writeback.
- end_page_writeback() will move the page onto inactive_clean.
- every place that adds a page to the inactive list will now
add it to either inactive_clean or inactive_dirty, based on
its PageDirty || PageWriteback state.
- the inactive target logic will remain the same. So
zone->nr_inactive_pages will be the sum of the pages on
zone->inactive_clean and zone->inactive_dirty.
- swapcache pages don't go on inactive_dirty(!). They remain on
inactive_clean, so if a page allocator or kswapd hits a swapcache
page, they block on it (swapout throttling).
A result of this is that we never need to scan inactive_dirty.
Those pages will always be written out in balance_dirty_pages
by the write(2) caller, or by pdflush.
(Hence: we don't need inactive_dirty at all. We could just cut
those pages off the LRU altogether. But let's not do that).
- Hence: the only pages which are written out from within the VM
are swapcache.
- So the only real source of throttling for tasks which aren't
running generic_file_write() is the call to blk_congestion_wait()
in try_to_free_pages(). Which seems sane to me - this will wake
up after 1/4 of a second, or after someone frees a write request
against *any* queue. We know that the pages which were covered
by that request were just placed onto inactive_clean, so off
we go again. Should work (heh).
- with this scheme, we don't actually need zone->nr_inactive_dirty_pages
and zone->nr_inactive_clean_pages, but I may as well do that - it's
easy enough.
- MAP_SHARED pages will be on inactive_clean, but if we change the
logic in there to give these pages a second round on the LRU then
the pages will automatically be added to inactive_dirty on the
way out of shrink_zone().
How does that all sound?
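The placement rules in the list above can be sketched as a small userspace model. This is an illustration of the proposal, not the patch itself: struct model_page, struct model_zone, pick_inactive_list(), add_to_inactive() and model_end_page_writeback() are hypothetical names, and only per-list counters are kept rather than real list linkage. One assumption beyond the text: when writeback ends, the model re-evaluates the page state, so a page redirtied during writeback stays on inactive_dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_list { LIST_NONE, LIST_INACTIVE_CLEAN, LIST_INACTIVE_DIRTY };

/* Hypothetical stand-in for struct page; only the flags discussed above. */
struct model_page {
    bool dirty;       /* models PageDirty */
    bool writeback;   /* models PageWriteback */
    bool swapcache;   /* models a swapcache page */
    enum model_list list;
};

/* Hypothetical stand-in for the per-zone counters. */
struct model_zone {
    size_t nr_inactive_clean;
    size_t nr_inactive_dirty;
};

/* Swapcache always goes to inactive_clean; otherwise dirty||writeback decides. */
static enum model_list pick_inactive_list(const struct model_page *p)
{
    if (p->swapcache)
        return LIST_INACTIVE_CLEAN;
    if (p->dirty || p->writeback)
        return LIST_INACTIVE_DIRTY;
    return LIST_INACTIVE_CLEAN;
}

/* Drop the page from whichever inactive list it is on, if any. */
static void del_from_inactive(struct model_zone *z, struct model_page *p)
{
    if (p->list == LIST_INACTIVE_CLEAN) {
        assert(z->nr_inactive_clean > 0);
        z->nr_inactive_clean--;
    } else if (p->list == LIST_INACTIVE_DIRTY) {
        assert(z->nr_inactive_dirty > 0);
        z->nr_inactive_dirty--;
    }
    p->list = LIST_NONE;
}

/* Single entry point for "add a page to the inactive list". */
static void add_to_inactive(struct model_zone *z, struct model_page *p)
{
    del_from_inactive(z, p);
    p->list = pick_inactive_list(p);
    if (p->list == LIST_INACTIVE_CLEAN)
        z->nr_inactive_clean++;
    else
        z->nr_inactive_dirty++;
}

/* Writeback completion: a page on inactive_dirty is re-filed by its new state. */
static void model_end_page_writeback(struct model_zone *z, struct model_page *p)
{
    p->writeback = false;
    if (p->list == LIST_INACTIVE_DIRTY)
        add_to_inactive(z, p);
}

/* The inactive target logic sees the sum of both lists. */
static size_t nr_inactive_pages(const struct model_zone *z)
{
    return z->nr_inactive_clean + z->nr_inactive_dirty;
}
```

With this shape, a reclaimer only ever needs to walk the clean list, and the total returned by nr_inactive_pages() is unchanged when a page migrates from one list to the other.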
btw, it is approximately the case that the pages will come clean
in LRU order (oldest-first) because of the writeback logic. fs-writeback.c
walks the inodes in oldest-dirtied to newest-dirtied order, and
it walks the inode pages in oldest-dirtied to newest-dirtied
order. But I think that end_page_writeback() should still move
cleaned pages onto the far (hot) end of inactive_clean?
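The "far (hot) end" question can be pictured with a minimal circular doubly linked list, in the style of (but not using) the kernel's list_head. This is a sketch under an assumption the text only implies: reclaim takes pages from the cold end, so a page added at the hot end is the last one reclaim reaches. The names struct node, list_init(), add_hot() and take_cold() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node; 'id' just labels a page. */
struct node {
    struct node *prev, *next;
    int id;
};

/* An empty list is a head that points at itself. */
static void list_init(struct node *head)
{
    head->prev = head;
    head->next = head;
}

/* Insert directly after the head: this is the hot end of the list. */
static void add_hot(struct node *head, struct node *n)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Remove and return the node at the cold end (just before the head), or NULL. */
static struct node *take_cold(struct node *head)
{
    struct node *n = head->prev;

    if (n == head)
        return NULL;
    n->prev->next = head;
    head->prev = n->prev;
    n->next = NULL;
    n->prev = NULL;
    return n;
}
```

If cleaned pages are added with add_hot() in the order writeback completes them, take_cold() hands them back oldest-cleaned first, so the list order follows the oldest-first writeback order described above.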
I think all of this will not result in the zone balancing logic
going into a tailspin. I'm just a bit worried about corner cases
when the number of reclaimable pages in highmem is getting low - the
classzone balancing code may keep on trying to refill that zone's free
memory pools too much. We'll see...
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/