From: Rik van Riel <riel@redhat.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	lwoodman@redhat.com,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: FWD:  [PATCH v2] vmscan: limit concurrent reclaimers in shrink_zone
Date: Fri, 18 Dec 2009 12:43:32 -0500
Message-ID: <4B2BBF44.2090104@redhat.com>
In-Reply-To: <20091218162332.GR29790@random.random>

On 12/18/2009 11:23 AM, Andrea Arcangeli wrote:
> On Thu, Dec 17, 2009 at 09:05:23PM +0000, Hugh Dickins wrote:

>> An rwlock there has been proposed on several occasions, but
>> we resist because that change benefits this case but performs
>> worse on more common cases (I believe: no numbers to back that up).
>
> I think an rwlock for anon_vma is a must. Whatever extra overhead it
> adds to the uncontended fast path is practically zero, and on large SMP
> it allows rmap walks on long chains to run in parallel. It is very much
> worth it, because the downside is practically zero and the upside may be
> measurable in certain corner cases. I don't think it'll be enough, but I
> definitely like it.

I agree, changing the anon_vma lock to an rwlock should
work a lot better than what we have today.  The tradeoff
is a tiny slowdown in medium-contention cases, in exchange
for avoiding catastrophic slowdown in some cases.

With Nick Piggin's fair rwlocks, there should be no issue
at all.

> Rik suggested to me that a newly allocated COWed page should use its
> own anon_vma. Conceptually Rik's idea is a fine one, but the only
> complication then is how to chain the same vma into multiple anon_vmas
> (in practice insert/removal will be slower and more metadata will be
> needed for additional anon_vmas and vmas queued in more than one
> anon_vma). But this will only help if the mapcount of the page is 1;
> if the mapcount is 10000, no change to anon_vma or prio_tree will solve
> this,

It's even more complex than this for anonymous pages.

Anonymous pages get COW copied in child (and parent)
processes, potentially resulting in one page, at each
offset into the anon_vma, for every process attached
to the anon_vma.

As a result, with 10000 child processes, page_referenced
can end up searching through 10000 VMAs even for pages
with a mapcount of 1!
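
The effect described above can be illustrated with a toy model. This is
only a sketch in Python, not kernel code: the names VMA, AnonVMA,
fork_children and walk_rmap are hypothetical, and it assumes every
process has already written to each offset, so each one holds its own
COW copy. It also assumes the walk stops once the expected number of
mappings has been found, which makes the figure below a worst case
rather than a typical one.

```python
from dataclasses import dataclass, field


@dataclass
class VMA:
    pid: int
    # offset within the mapping -> id of the physical page mapped there
    pages: dict = field(default_factory=dict)


@dataclass
class AnonVMA:
    # every VMA (one per process) attached to this shared anon_vma
    vmas: list = field(default_factory=list)


def fork_children(n_children, n_offsets):
    """Model a parent plus n_children sharing one anon_vma.

    Assumption: each process has written to every offset, so each
    process owns a private COW copy (a distinct page id) per offset.
    """
    anon = AnonVMA()
    next_page = 0
    for pid in range(n_children + 1):  # pid 0 stands for the parent
        vma = VMA(pid)
        for off in range(n_offsets):
            vma.pages[off] = next_page
            next_page += 1
        anon.vmas.append(vma)
    return anon


def walk_rmap(anon, page_id, offset, mapcount):
    """Return how many VMAs are visited while looking for page_id.

    The walk has no way to know which VMA maps the page, so it checks
    each one in list order and stops once `mapcount` mappings are found.
    """
    visited = 0
    remaining = mapcount
    for vma in anon.vmas:
        visited += 1
        if vma.pages.get(offset) == page_id:
            remaining -= 1
            if remaining == 0:
                break
    return visited


anon = fork_children(n_children=10000, n_offsets=1)
last = anon.vmas[-1]
# The page is mapped by exactly one process (mapcount 1), yet the walk
# visits the parent and all 10000 children before it finds the mapping.
print(walk_rmap(anon, last.pages[0], 0, mapcount=1))  # prints 10001
```

In this model the cost of the walk depends on the number of VMAs
attached to the anon_vma, not on the mapcount of the page being looked
up.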

-- 
All rights reversed.
