From: Andrew Morton <akpm@digeo.com>
To: Rik van Riel <riel@conectiva.com.br>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: inactive_dirty list
Date: Fri, 06 Sep 2002 22:28:44 -0700
Message-ID: <3D798E8C.3DCD883C@digeo.com>
In-Reply-To: Pine.LNX.4.44L.0209062309580.1857-100000@imladris.surriel.com
Rik van Riel wrote:
>
> On Fri, 6 Sep 2002, Andrew Morton wrote:
>
> > I have a silly feeling that setting DEF_PRIORITY to "12" will
> > simply fix this.
> >
> > Duh.
>
> Ideally we'd get rid of DEF_PRIORITY altogether and would
> just scan each zone once.
>

What I'm doing now is:

	#define DEF_PRIORITY 12		/* puke */

	for (priority = DEF_PRIORITY; priority; priority--) {
		int total_scanned = 0;

		shrink_caches(priority, &total_scanned);
		if (that didn't work) {
			wakeup_bdflush(total_scanned);
			blk_congestion_wait(WRITE, HZ/4);
		}
	}

and in shrink_caches():

	max_scan = zone->nr_inactive >> priority;
	if (max_scan < nr_pages * 2)
		max_scan = nr_pages * 2;
	nr_pages = shrink_zone(zone, max_scan, gfp_mask, nr_pages);

So in effect, for a 32-page reclaim attempt we'll scan 64 pages
of ZONE_HIGHMEM, then 128 pages of ZONE_NORMAL/DMA. If that doesn't
yield 32 pages we ask pdflush to write 3*64 pages. Then take a nap.

Then do it again: scan 64 pages of ZONE_HIGHMEM, then 128 of ZONE_NORMAL/DMA,
then write back 192 pages, then nap.

Then do it again: scan 128 pages of ZONE_HIGHMEM, then 256 of ZONE_NORMAL/DMA,
then write back 384 pages, then nap.

etc. Plus there are the actual pages which we started IO against
during the LRU scan - there can be up to 32 of those.
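
A minimal sketch of the clamp in the shrink_caches() fragment above.
The function name zone_max_scan() and the example zone size are made up
for illustration; this is a standalone model, not kernel code:

```c
/*
 * Pages to scan in one zone on one pass: shift the inactive count
 * by the priority, then clamp to at least twice the reclaim target.
 * zone_max_scan() is a hypothetical helper, not a kernel function.
 */
static unsigned long zone_max_scan(unsigned long nr_inactive, int priority,
				   unsigned long nr_pages)
{
	unsigned long max_scan = nr_inactive >> priority;

	if (max_scan < nr_pages * 2)
		max_scan = nr_pages * 2;
	return max_scan;
}
```

For a hypothetical zone with 131072 inactive pages and a 32-page
reclaim target this gives 64 pages at priority 12, 64 at priority 11
and 128 at priority 10, in line with the ZONE_HIGHMEM numbers above.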

BTW, it turns out that the main reason why kswapd was going silly was
that the VM is *not* treating the `priority' as a logarithmic thing at
all:

	int max_scan = nr_inactive_pages / priority;

so the claims about scanning 1/64th of the list are crap. That
thing scans 1/6th of the queue on the first pass. In the mem=1G
case, that's 30,000 damn pages. Maybe someone should take a look
at Marcelo's kernel?
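
To make the difference concrete, here is a small standalone sketch
comparing the two interpretations of `priority'. The function names are
hypothetical, and the starting priority of 6 is only inferred from the
"1/6th" and "1/64th" figures above:

```c
/* Linear form, as in the line quoted above: priority 6 scans 1/6th. */
static unsigned long scan_linear(unsigned long nr_inactive, int priority)
{
	return nr_inactive / priority;
}

/* Logarithmic form: priority 6 scans 1/64th (a shift by 6 bits). */
static unsigned long scan_log(unsigned long nr_inactive, int priority)
{
	return nr_inactive >> priority;
}
```

With an assumed 180,000-page inactive list (picked so that the divide
lands on the 30,000 figure), scan_linear() returns 30000 at priority 6
while scan_log() returns 2812.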

There are a few warts: pdflush_operation will fail if all pdflush threads
are out doing something (pretty unlikely with the nonblocking stuff.
Might happen if writeback has to run get_block()). But we'll be writing
back stuff anyway.

I changed blk_congestion_wait a bit too. The first version would
return immediately if no queues were congested (> 75% full). Now,
it will sleep even if no queues are congested. It will return
as soon as someone puts back a write request. If someone is silly
enough to call blk_congestion_wait() when there are no write requests
in flight at all, they get to take the full 1/4 second sleep.
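
The change in blk_congestion_wait() behaviour can be pictured with a
toy model that works in ticks instead of really sleeping. Everything
here is hypothetical (names, arguments, and the assumption that the
first version slept the same way once a queue was congested); it only
illustrates the rules described above:

```c
/*
 * Toy model: returns how many ticks the caller would sleep.
 * next_completion is the tick at which a write request is next put
 * back, or a negative value if no write requests are in flight.
 */
static long wait_new(long next_completion, long timeout)
{
	if (next_completion < 0 || next_completion > timeout)
		return timeout;		/* nobody wakes us: full sleep */
	return next_completion;		/* woken by a returned request */
}

/* First version: bail out at once unless some queue is congested. */
static long wait_old(int any_queue_congested, long next_completion,
		     long timeout)
{
	if (!any_queue_congested)
		return 0;
	/* Assumption: the congested case slept as the new version does. */
	return wait_new(next_completion, timeout);
}
```

Taking a timeout of 250 ticks (HZ/4 if HZ were 1000, which is an
assumption), wait_old(0, 10, 250) returns 0, wait_new(10, 250) returns
10, and wait_new(-1, 250) returns the full 250.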

The mem=1G corner case is fixed, and page reclaim just doesn't
figure:

c012c034    288   0.317709  do_wp_page
c0144ae0    316   0.348597  __block_commit_write
c012c910    342   0.377279  do_anonymous_page
c0143efc    353   0.389414  __find_get_block
c012f7e0    356   0.392724  find_lock_page
c012f9f0    356   0.392724  do_generic_file_read
c01832bc    367   0.404858  ext2_free_branches
c0136e70    371   0.409271  __free_pages_ok
c010e7b4    386   0.425818  timer_interrupt
c01e3cfc    414   0.456707  radix_tree_lookup
c0141894    434   0.47877   vfs_write
c012f580    474   0.522896  unlock_page
c0134348    500   0.551578  kmem_cache_alloc
c01347d0    531   0.585776  kmem_cache_free
c013712c    574   0.633212  rmqueue
c0141320    605   0.667409  generic_file_llseek
c0156924    616   0.679544  count_list
c0142c04    617   0.680647  fget
c01091e0    793   0.874803  system_call
c0155914    860   0.948714  __d_lookup
c0144674   1076   1.187     __block_prepare_write
c014c63c   1184   1.30614   link_path_walk
c012fcd4  10932  12.0597    file_read_actor
c0130674  16443  18.1392    generic_file_write_nolock
c0107048  31293  34.5211    poll_idle

The balancing of the zones looks OK from a first glance and of course
the change in system behaviour under heavy writeout loads is profound.

Let's do the MAP_SHARED-pages-get-a-second-round thing, and it'd
be good if we could come up with some algorithm for setting the
current dirty pagecache clamping level rather than relying on the
dopey /proc/sys/vm/dirty_async_ratio magic number.

I'm thinking that dirty_async_ratio becomes a maximum ratio, and
that we dynamically lower it when large amounts of dirty pagecache
would be embarrassing. Or maybe there's just no need for this. Dunno.