linux-mm.kvack.org archive mirror
From: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-mm@kvack.org, borntraeger@de.ibm.com, hughd@google.com,
	izik.eidus@ravellosystems.com, chrisw@sous-sol.org,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 1/1] mm/ksm: improve deduplication of zero pages with colouring
Date: Wed, 18 Jan 2017 16:15:56 +0100
Message-ID: <1e1e7589-9713-e6a4-f57c-bfd94eb3e1e9@linux.vnet.ibm.com>
In-Reply-To: <20170112172132.GM4947@redhat.com>

Hi Andrea,

On 12/01/17 18:21, Andrea Arcangeli wrote:
> Hello Claudio,
> 
> On Thu, Jan 12, 2017 at 05:17:14PM +0100, Claudio Imbrenda wrote:
>> +#ifdef __HAVE_COLOR_ZERO_PAGE
>> +	/*
>> +	 * Same checksum as an empty page. We attempt to merge it with the
>> +	 * appropriate zero page.
>> +	 */
>> +	if (checksum == zero_checksum) {
>> +		struct vm_area_struct *vma;
>> +
>> +		vma = find_mergeable_vma(rmap_item->mm, rmap_item->address);
>> +		err = try_to_merge_one_page(vma, page,
>> +					    ZERO_PAGE(rmap_item->address));
> 
> So the objective is not to add the zero pages to the stable tree but
> just convert them to readonly zeropages?

Yes. I thought that would be the easiest and cleanest way to do it.

> Maybe this could be a standard option for all archs to
> enable/disable with a new sysfs control similarly to the NUMA aware
> deduplication. The question is if it should be enabled by default in
> those archs where page coloring matters a lot. Probably yes.

I'm not sure it would make sense to have this for archs that don't have
page coloring. There, merging empty pages with the ZERO_PAGE() instead
of with each other would save exactly one page and would bring no speed
advantage (or rather: not using the ZERO_PAGE() would not bring any
speed penalty).
That's why I have #ifdef'd it, so that it is built only when page
coloring is present. Also, from what I could see, only MIPS and s390
have page coloring; I don't like the idea of imposing any overhead on
all the other archs.

I agree that this should be toggleable with a sysfs control, since it's
a change that can hurt performance in some cases. I'll add it in the
next iteration.

> There are guest OS creating lots of zero pages, not linux though, for
> linux guests this is just overhead. Also those guests creating zero

Unless the userspace in the guests is creating lots of pages full of
zeroes :)

> pages wouldn't constantly read from them so again for KVM usage this
> is unlikely to help. For certain guest OS it'll create less KSM
> metadata with this approach, but it's debatable if it's worth one more

Honestly I don't think this patch will bring any benefits regarding
metadata -- one page more or less in the metadata won't change much. Our
issue is just the reading speed of the deduplicated empty pages.

> memcpy for every merge-candidate page to save some metadata, it's very

I'm confused, why memcpy? Did you mean memcmp? We are not doing any
additional memops except when a candidate non-empty page happens to
have the same checksum as an empty page, in which case we have one
extra memcmp compared to the normal operation.
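
To illustrate the cost argument, here is a rough userspace sketch (not
the patch itself). The checksum is a plain FNV-1a used as a stand-in
for the KSM checksum, and SKETCH_PAGE_SIZE, page_checksum() and
page_is_empty() are names made up for this example. The point is only
that the memcmp runs when the checksum already matches:

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size, illustration only */

/* Stand-in checksum (32-bit FNV-1a); NOT the hash KSM actually uses. */
static uint32_t page_checksum(const unsigned char *page)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
		h ^= page[i];
		h *= 16777619u;
	}
	return h;
}

/* Returns 1 only if the page really contains nothing but zeroes. */
static int page_is_empty(const unsigned char *page, uint32_t zero_checksum)
{
	static const unsigned char zero_page[SKETCH_PAGE_SIZE];

	/* Common case: checksum differs, no extra memory operation. */
	if (page_checksum(page) != zero_checksum)
		return 0;

	/* Checksum matches: one memcmp rules out a hash collision. */
	return memcmp(page, zero_page, SKETCH_PAGE_SIZE) == 0;
}
```

Here zero_checksum would be computed once, as page_checksum() of an
all-zero buffer, and then reused for every candidate page.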

> guest-workload dependent too. Of course your usage is not KVM but
> number crunching with uninitialized tables, it's different and the
> zero page read speed matters.
> 
> On the implementation side I think the above is going to call
> page_add_anon_rmap(kpage, vma, addr, false) and get_page by mistake,
> and it should use pte_mkspecial not mk_pte. I think you need to pass
> up a zeropage bool into replace_page and change replace_page to create
> a proper zeropage in place of the old page or it'll eventually
> overflow the page count crashing etc...

Maybe an even less intrusive change could be to check in replace_page()
whether is_zero_pfn(page_to_pfn(kpage)) is true. And of course I would
#ifdef that too, to avoid the overhead for archs without page coloring.
So if the replacement page is a ZERO_PAGE(), no get_page() and no
page_add_anon_rmap() would be performed, and the set_pte_at_notify()
would use
pte_mkspecial(pfn_pte(page_to_pfn(kpage), vma->vm_page_prot))
instead of mk_pte().
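
As a rough illustration of that branching, here is a toy userspace
model. It is untested against the real code and none of it is kernel
API: the toy_* types and functions and TOY_ZERO_PFN are hypothetical,
and it assumes a single zero page, whereas with coloring there are
several. The counters only stand in for what get_page() and
page_add_anon_rmap() would do:

```c
#include <stdbool.h>

/* Toy stand-ins for illustration; these are NOT kernel types. */
struct toy_page {
	unsigned long pfn;
	int refcount;	/* models the page reference count */
	int mapcount;	/* models the anon rmap accounting */
};

struct toy_pte {
	unsigned long pfn;
	bool special;	/* models a pte built with pte_mkspecial() */
};

#define TOY_ZERO_PFN 0UL	/* hypothetical pfn of the zero page */

static bool toy_is_zero_pfn(unsigned long pfn)
{
	return pfn == TOY_ZERO_PFN;
}

/* Builds the new pte for kpage, as the modified replace_page() might. */
static struct toy_pte toy_replace_page(struct toy_page *kpage)
{
	struct toy_pte pte = { .pfn = kpage->pfn, .special = false };

	if (toy_is_zero_pfn(kpage->pfn)) {
		/* Zero page: take no reference, add no rmap, special pte. */
		pte.special = true;
	} else {
		kpage->refcount++;	/* models get_page() */
		kpage->mapcount++;	/* models page_add_anon_rmap() */
	}
	return pte;
}
```

With this shape the zero page's count never grows, however many pages
get merged into it, which is the overflow you were worried about.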


thanks,
Claudio
