From: Rik van Riel <riel@redhat.com>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
lwoodman@redhat.com, akpm@linux-foundation.org,
Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
aarcange@redhat.com
Subject: Re: [PATCH -mm] change anon_vma linking to fix multi-process server scalability issue
Date: Thu, 28 Jan 2010 12:24:10 -0500 [thread overview]
Message-ID: <4B61C83A.20301@redhat.com> (raw)
In-Reply-To: <1264696641.17063.32.camel@barrios-desktop>
On 01/28/2010 11:37 AM, Minchan Kim wrote:
> On Thu, 2010-01-28 at 00:20 -0500, Rik van Riel wrote:
>> This patch changes the way anon_vmas and VMAs are linked, which
>> allows us to associate multiple anon_vmas with a VMA. At fork
>> time, each child process gets its own anon_vmas, in which its
>> COWed pages will be instantiated. The parents' anon_vma is also
>> linked to the VMA, because non-COWed pages could be present in
>> any of the children.
>
> any of the children?
>
> IMHO, "parent" is right. :)
> Do I miss something? Could you elaborate it?
I am talking about an anonymous page that is shared by parent
and children processes and has not been COW copied yet.
>> -void vma_adjust(struct vm_area_struct *vma, unsigned long start,
>> +int vma_adjust(struct vm_area_struct *vma, unsigned long start,
>> unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert)
>> {
>> struct mm_struct *mm = vma->vm_mm;
>> @@ -542,6 +541,29 @@ again: remove_next = 1 + (end> next->vm_end);
>> }
>> }
>>
>> + /*
>> + * When changing only vma->vm_end, we don't really need
>> + * anon_vma lock.
>> + */
>> + if (vma->anon_vma&& (insert || importer || start != vma->vm_start))
>> + anon_vma = vma->anon_vma;
>> + if (anon_vma) {
>> + /*
>> + * Easily overlooked: when mprotect shifts the boundary,
>> + * make sure the expanding vma has anon_vma set if the
>> + * shrinking vma had, to cover any anon pages imported.
>> + */
>> + if (importer&& !importer->anon_vma) {
>> + /* Block reverse map lookups until things are set up. */
>> + importer->vm_flags |= VM_LOCK_RMAP;
>> + if (anon_vma_clone(importer, vma)) {
>> + importer->vm_flags&= ~VM_LOCK_RMAP;
>> + return -ENOMEM;
>
> If we fail in here during progressing on next vmas in case of mprotect case 6,
> the previous vmas would become inconsistent state.
I've re-read the code, but I don't see what you are referring
to. If vma_adjust bails out early, no VMAs will be adjusted
and all the VMAs will stay the way they were before mprotect
was called.
What am I overlooking?
>> @@ -2260,6 +2306,12 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
>> }
>> }
>> return new_vma;
>> +
>> + out_free_mempol:
>> + mpol_put(pol);
>> + out_free_vma:
>> + kmem_cache_free(vm_area_cachep, new_vma);
>> + return NULL;
>> }
>
>
> As I said previously, I have a concern about memory footprint.
> It adds anon_vma_chain and increases anon_vma's size for KSM.
>
> I think it will increase 3 times more than only anon_vma.
>
> Although you think it's not big in normal machine,
> it's not good in embedded system which is no anon_vma scalability issue
> and even no-swap. so I wanted you to make it configurable.
That is a fair point. With CONFIG_SWAP=n we do not need the
anon_vma structs or anon_vma_chain structs at all.
I would be happy to integrate a patch into my series that
stubs out all of that code for CONFIG_SWAP=n, but I am going
to work on something else myself right now :)
--
All rights reversed.
WARNING: multiple messages have this Message-ID (diff)
From: Rik van Riel <riel@redhat.com>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
lwoodman@redhat.com, akpm@linux-foundation.org,
Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
aarcange@redhat.com
Subject: Re: [PATCH -mm] change anon_vma linking to fix multi-process server scalability issue
Date: Thu, 28 Jan 2010 12:24:10 -0500 [thread overview]
Message-ID: <4B61C83A.20301@redhat.com> (raw)
In-Reply-To: <1264696641.17063.32.camel@barrios-desktop>
On 01/28/2010 11:37 AM, Minchan Kim wrote:
> On Thu, 2010-01-28 at 00:20 -0500, Rik van Riel wrote:
>> This patch changes the way anon_vmas and VMAs are linked, which
>> allows us to associate multiple anon_vmas with a VMA. At fork
>> time, each child process gets its own anon_vmas, in which its
>> COWed pages will be instantiated. The parents' anon_vma is also
>> linked to the VMA, because non-COWed pages could be present in
>> any of the children.
>
> any of the children?
>
> IMHO, "parent" is right. :)
> Do I miss something? Could you elaborate it?
I am talking about an anonymous page that is shared by parent
and children processes and has not been COW copied yet.
>> -void vma_adjust(struct vm_area_struct *vma, unsigned long start,
>> +int vma_adjust(struct vm_area_struct *vma, unsigned long start,
>> unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert)
>> {
>> struct mm_struct *mm = vma->vm_mm;
>> @@ -542,6 +541,29 @@ again: remove_next = 1 + (end> next->vm_end);
>> }
>> }
>>
>> + /*
>> + * When changing only vma->vm_end, we don't really need
>> + * anon_vma lock.
>> + */
>> + if (vma->anon_vma&& (insert || importer || start != vma->vm_start))
>> + anon_vma = vma->anon_vma;
>> + if (anon_vma) {
>> + /*
>> + * Easily overlooked: when mprotect shifts the boundary,
>> + * make sure the expanding vma has anon_vma set if the
>> + * shrinking vma had, to cover any anon pages imported.
>> + */
>> + if (importer&& !importer->anon_vma) {
>> + /* Block reverse map lookups until things are set up. */
>> + importer->vm_flags |= VM_LOCK_RMAP;
>> + if (anon_vma_clone(importer, vma)) {
>> + importer->vm_flags&= ~VM_LOCK_RMAP;
>> + return -ENOMEM;
>
> If we fail in here during progressing on next vmas in case of mprotect case 6,
> the previous vmas would become inconsistent state.
I've re-read the code, but I don't see what you are referring
to. If vma_adjust bails out early, no VMAs will be adjusted
and all the VMAs will stay the way they were before mprotect
was called.
What am I overlooking?
>> @@ -2260,6 +2306,12 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
>> }
>> }
>> return new_vma;
>> +
>> + out_free_mempol:
>> + mpol_put(pol);
>> + out_free_vma:
>> + kmem_cache_free(vm_area_cachep, new_vma);
>> + return NULL;
>> }
>
>
> As I said previously, I have a concern about memory footprint.
> It adds anon_vma_chain and increases anon_vma's size for KSM.
>
> I think it will increase 3 times more than only anon_vma.
>
> Although you think it's not big in normal machine,
> it's not good in embedded system which is no anon_vma scalability issue
> and even no-swap. so I wanted you to make it configurable.
That is a fair point. With CONFIG_SWAP=n we do not need the
anon_vma structs or anon_vma_chain structs at all.
I would be happy to integrate a patch into my series that
stubs out all of that code for CONFIG_SWAP=n, but I am going
to work on something else myself right now :)
--
All rights reversed.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2010-01-28 17:25 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-01-28 5:20 [PATCH -mm] change anon_vma linking to fix multi-process server scalability issue Rik van Riel
2010-01-28 5:20 ` Rik van Riel
2010-01-28 6:43 ` [PATCH -mm] rmap: remove obsolete check from __page_check_anon_rmap Rik van Riel
2010-01-28 6:43 ` Rik van Riel
2010-02-01 15:26 ` Minchan Kim
2010-02-01 15:26 ` Minchan Kim
2010-01-28 6:43 ` [PATCH -mm] rmap: move exclusively owned pages to own anon_vma in do_wp_page Rik van Riel
2010-01-28 6:43 ` Rik van Riel
2010-02-01 15:25 ` Minchan Kim
2010-02-01 15:25 ` Minchan Kim
2010-02-01 16:33 ` Rik van Riel
2010-02-01 16:33 ` Rik van Riel
2010-01-28 16:37 ` [PATCH -mm] change anon_vma linking to fix multi-process server scalability issue Minchan Kim
2010-01-28 16:37 ` Minchan Kim
2010-01-28 17:24 ` Rik van Riel [this message]
2010-01-28 17:24 ` Rik van Riel
2010-01-29 0:55 ` Minchan Kim
2010-01-29 0:55 ` Minchan Kim
2010-01-29 1:18 ` Rik van Riel
2010-01-29 1:18 ` Rik van Riel
2010-01-29 23:14 ` Andrew Morton
2010-01-29 23:14 ` Andrew Morton
2010-01-29 23:57 ` Rik van Riel
2010-01-29 23:57 ` Rik van Riel
2010-01-30 0:22 ` [PATCH -mm] further cleanups to " Rik van Riel
2010-01-30 0:22 ` Rik van Riel
2010-01-30 0:34 ` [PATCH -mm] remove VM_LOCK_RMAP code Rik van Riel
2010-01-30 0:34 ` Rik van Riel
2010-02-01 6:15 ` Nick Piggin
2010-02-01 6:15 ` Nick Piggin
2010-02-01 15:55 ` Rik van Riel
2010-02-01 15:55 ` Rik van Riel
2010-02-02 6:44 ` Nick Piggin
2010-02-02 6:44 ` Nick Piggin
[not found] ` <20100211130422.f262be71.akpm@linux-foundation.org>
2010-02-11 21:49 ` [PATCH -mm] change anon_vma linking to fix multi-process server scalability issue Rik van Riel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4B61C83A.20301@redhat.com \
--to=riel@redhat.com \
--cc=Lee.Schermerhorn@hp.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lwoodman@redhat.com \
--cc=minchan.kim@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.