From: Andrea Arcangeli <aarcange@redhat.com>
To: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Rik van Riel <riel@redhat.com>,
lwoodman@redhat.com,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: FWD: [PATCH v2] vmscan: limit concurrent reclaimers in shrink_zone
Date: Fri, 18 Dec 2009 17:23:32 +0100
Message-ID: <20091218162332.GR29790@random.random>
In-Reply-To: <Pine.LNX.4.64.0912172055570.15788@sister.anvils>
On Thu, Dec 17, 2009 at 09:05:23PM +0000, Hugh Dickins wrote:
> Please first clarify whether what Larry is running is actually
> a workload that people need to behave well in real life.
Anything handling 10000 connections with a connection-per-thread/process
model should use threads, not processes, if good performance is expected.
Most things that use a multi-process design will never use a
one-connection-per-process design (yes, there are exceptions, and
no, we can't expect to fix those as they're proprietary ;). So I'm not
particularly worried.
Also make sure this happens on older kernels too: newer kernels use
rmap chains and mangle the ptes even when there's no VM pressure, for
no good reason. Older kernels would only hit the anon_vma chain of
an anon page after that page was converted to swapcache
and swap was hit, so it makes a whole lot of difference. Anon_vma
chains should only be touched after we are I/O bound if anybody is to
expect decent performance out of the kernel.
> I'm not asserting that this one is purely academic, but I do
> think we need more than an artificial case to worry much about it.
Tend to agree.
> An rwlock there has been proposed on several occasions, but
> we resist because that change benefits this case but performs
> worse on more common cases (I believe: no numbers to back that up).
I think an rwlock for anon_vma is a must. Whatever extra overhead it adds
to the uncontended fast path is practically zero, and on large SMP it
allows rmap walks on long chains to run in parallel. So it is very much
worth it: the downside is practically zero and the upside may be
measurable in certain corner cases. I don't think it'll be enough, but I
definitely like it.
> Substitute a MAP_SHARED file underneath those 10000 vmas,
> and don't you have an equal problem with the prio_tree,
> which would be harder to solve than the anon_vma case?
That is a very good point.
Rik suggested to me that a COWed, newly allocated page could use its
own anon_vma. Conceptually Rik's idea is a fine one, but the only
complication then is how to chain the same vma into multiple anon_vmas
(in practice insert/removal will be slower and more metadata will be
needed for the additional anon_vmas and for vmas queued in more than one
anon_vma). But this will only help if the mapcount of the page is 1.
If the mapcount is 10000, no change to anon_vma or prio_tree will solve
this, and we have to start breaking the rmap loop after 64
test_and_clear_young instead, to mitigate the inefficiency on pages
that are used and will never go into swap. For those, touching 10000
cachelines just because the used page eventually reaches the tail
position of the lru is entirely wasted.
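
To make the "break the rmap loop after 64 test_and_clear_young" idea
concrete, here is a rough userspace sketch. It is not kernel code: struct
mapping, RMAP_YOUNG_CAP and page_referenced_capped() are names made up
for illustration, and it assumes the cutoff counts ptes tested rather than
ptes found young.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one pte mapping the page. */
struct mapping {
	bool young;	/* simulated accessed bit */
};

/* Cutoff taken from the text above; the macro name is an assumption. */
#define RMAP_YOUNG_CAP 64

/*
 * Test and clear the young bit on at most `cap` mappings of a page,
 * then stop, no matter how many mappings remain.
 * Returns how many of the visited mappings were young and stores the
 * number of mappings actually touched in *visited.
 */
static size_t page_referenced_capped(struct mapping *maps, size_t nr,
				     size_t cap, size_t *visited)
{
	size_t young = 0;
	size_t i;

	assert(cap > 0);

	for (i = 0; i < nr && i < cap; i++) {
		if (maps[i].young) {
			maps[i].young = false;	/* test and clear */
			young++;
		}
	}
	*visited = i;
	return young;
}
```

With a mapcount of 10000 the walk touches 64 entries instead of 10000, at
the price of a less precise referenced count for that page.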