Re: [PATCH v4] KSM: numa awareness sysfs knob

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Andrea Arcangeli <aarcange@redhat.com>
To: Hugh Dickins <hughd@google.com>
Cc: Petr Holasek <pholasek@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Chris Wright <chrisw@sous-sol.org>,
	Izik Eidus <izik.eidus@ravellosystems.com>,
	Rik van Riel <riel@redhat.com>,
	David Rientjes <rientjes@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Anton Arapov <anton@redhat.com>
Subject: Re: [PATCH v4] KSM: numa awareness sysfs knob
Date: Mon, 1 Oct 2012 13:53:09 +0200	[thread overview]
Message-ID: <20121001115309.GE20924@redhat.com> (raw)
In-Reply-To: <alpine.LSU.2.00.1209301639240.6304@eggly.anvils>

Hi Hugh,

On Sun, Sep 30, 2012 at 05:36:33PM -0700, Hugh Dickins wrote:
> I'm all for the simplest solution, but here in ksm_migrate_page()
> is not a good place for COW breaking - we don't want to get into
> an indefinite number of page allocations, and the risk of failure.

Agreed, not a good place to break_cow.

> I was toying with the idea of leaving the new page in the old NUMAnode's
> stable tree temporarily, until ksmd comes around again, and let that
> clean it up.  Which would imply less reliance on get_kpfn_nid(),
> and not skipping PageKsm in ksm_do_scan(), and...

There a break_cow could more easily run to cleanup the errors in the
stable tree. It'd be one way to avoid altering migrate.

> But it's not all that simple, and I think we can do better.

Agreed.

> It's only just fully dawned on me that ksm_migrate_page() is actually
> a very convenient place: no pagetable mangling required, because we
> know that neither old nor new page is at this instant mapped into
> userspace at all - don't we?  Instead there are swap-like migration
> entries plugging all ptes until we're ready to put in the new page.

Yes.

> So I think what we really want to do is change the ksm_migrate_page()
> interface a little, and probably the precise position it's called from,
> to allow it to update mm/migrate.c's newpage - in the collision case

I agree your proposed modification to the ->migratepage protocol
should be able to deal with that. We should notify the caller the
"newpage" has been freed and we transferred all ownership to an
"alternate_newpage". So then migrate will restore the ptes pointing to
the alternate_newpage (not the allocated newpage). It should be also
possible to get an hold on the alternate_newpage, before having to
allocate newpage.

> when the new NUMAnode already has a stable copy of this page.  But when
> it doesn't, just move KSMnode from old NUMAnode's stable tree to new.

Agreed, that is the easy case and doesn't require interface changes.

> How well the existing ksm.c primitives are suited to this, I've not
> checked.  Probably not too well, but shouldn't be hard to add what's
> needed.
> 
> What do you think?  Does that sound reasonable, Petr?

Sounds like a plan, I agree the modification to migrate is the best
way to go here. Only cons: it's not the simplest solution.

> By the way, this is probably a good occasion to remind ourselves,
> that page migration is still usually disabled on PageKsm pages:
> ksm_migrate_page() is only being called for memory hotremove.  I had
> been about to complain that calling remove_node_from_stable_tree()
> from ksm_migrate_page() is also unsafe from a locking point of view;
> until I remembered that MEM_GOING_OFFLINE has previously acquired
> ksm_thread_mutex.
> 
> But page migration is much more important now than three years ago,
> with compaction relying upon it, CMA and THP relying upon compaction,
> and lumpy reclaim gone.

Agreed. AutoNUMA needs it too: AutoNUMA migrates all types of memory,
not just anonymous memory, as long as the mapcount == 1.

If all users break_cow except one, then the KSM page can move around
if it has left just one user, we don't need to wait this last user to
break_cow (which may never happen) before can move it.

> Whilst it should not be mixed up in the NUMA patch itself, I think we
> need now to relax that restriction.  I found re-reading my 62b61f611e
> "ksm: memory hotremove migration only" was helpful.  Petr, is that
> something you could take on also?  I _think_ it's just a matter of
> protecting the stable tree(s) with an additional mutex (which ought
> not to be contended, since ksm_thread_mutex is normally held above
> it, except in migration); then removing a number of PageKsm refusals
> (and the offlining arg to unmap_and_move() etc).  But perhaps there's
> more to it, I haven't gone over it properly.

Removing the restriction sounds good. In addition to
compaction/AutoNUMA etc.. KSM pages are marked MOVABLE so it's likely
not good for the anti frag pageblock types.

So if I understand this correctly, there would be no way to trigger
the stable tree corruption in current v4, without memory hotremove.

> Yes, I agree; but a few more comments I'll make against the v4 post.

Cool.

Thanks for the help!
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Andrea Arcangeli <aarcange@redhat.com>
To: Hugh Dickins <hughd@google.com>
Cc: Petr Holasek <pholasek@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Chris Wright <chrisw@sous-sol.org>,
	Izik Eidus <izik.eidus@ravellosystems.com>,
	Rik van Riel <riel@redhat.com>,
	David Rientjes <rientjes@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Anton Arapov <anton@redhat.com>
Subject: Re: [PATCH v4] KSM: numa awareness sysfs knob
Date: Mon, 1 Oct 2012 13:53:09 +0200	[thread overview]
Message-ID: <20121001115309.GE20924@redhat.com> (raw)
In-Reply-To: <alpine.LSU.2.00.1209301639240.6304@eggly.anvils>

Hi Hugh,

On Sun, Sep 30, 2012 at 05:36:33PM -0700, Hugh Dickins wrote:
> I'm all for the simplest solution, but here in ksm_migrate_page()
> is not a good place for COW breaking - we don't want to get into
> an indefinite number of page allocations, and the risk of failure.

Agreed, not a good place to break_cow.

> I was toying with the idea of leaving the new page in the old NUMAnode's
> stable tree temporarily, until ksmd comes around again, and let that
> clean it up.  Which would imply less reliance on get_kpfn_nid(),
> and not skipping PageKsm in ksm_do_scan(), and...

There a break_cow could more easily run to cleanup the errors in the
stable tree. It'd be one way to avoid altering migrate.

> But it's not all that simple, and I think we can do better.

Agreed.

> It's only just fully dawned on me that ksm_migrate_page() is actually
> a very convenient place: no pagetable mangling required, because we
> know that neither old nor new page is at this instant mapped into
> userspace at all - don't we?  Instead there are swap-like migration
> entries plugging all ptes until we're ready to put in the new page.

Yes.

> So I think what we really want to do is change the ksm_migrate_page()
> interface a little, and probably the precise position it's called from,
> to allow it to update mm/migrate.c's newpage - in the collision case

I agree your proposed modification to the ->migratepage protocol
should be able to deal with that. We should notify the caller the
"newpage" has been freed and we transferred all ownership to an
"alternate_newpage". So then migrate will restore the ptes pointing to
the alternate_newpage (not the allocated newpage). It should be also
possible to get an hold on the alternate_newpage, before having to
allocate newpage.

> when the new NUMAnode already has a stable copy of this page.  But when
> it doesn't, just move KSMnode from old NUMAnode's stable tree to new.

Agreed, that is the easy case and doesn't require interface changes.

> How well the existing ksm.c primitives are suited to this, I've not
> checked.  Probably not too well, but shouldn't be hard to add what's
> needed.
> 
> What do you think?  Does that sound reasonable, Petr?

Sounds like a plan, I agree the modification to migrate is the best
way to go here. Only cons: it's not the simplest solution.

> By the way, this is probably a good occasion to remind ourselves,
> that page migration is still usually disabled on PageKsm pages:
> ksm_migrate_page() is only being called for memory hotremove.  I had
> been about to complain that calling remove_node_from_stable_tree()
> from ksm_migrate_page() is also unsafe from a locking point of view;
> until I remembered that MEM_GOING_OFFLINE has previously acquired
> ksm_thread_mutex.
> 
> But page migration is much more important now than three years ago,
> with compaction relying upon it, CMA and THP relying upon compaction,
> and lumpy reclaim gone.

Agreed. AutoNUMA needs it too: AutoNUMA migrates all types of memory,
not just anonymous memory, as long as the mapcount == 1.

If all users break_cow except one, then the KSM page can move around
if it has left just one user, we don't need to wait this last user to
break_cow (which may never happen) before can move it.

> Whilst it should not be mixed up in the NUMA patch itself, I think we
> need now to relax that restriction.  I found re-reading my 62b61f611e
> "ksm: memory hotremove migration only" was helpful.  Petr, is that
> something you could take on also?  I _think_ it's just a matter of
> protecting the stable tree(s) with an additional mutex (which ought
> not to be contended, since ksm_thread_mutex is normally held above
> it, except in migration); then removing a number of PageKsm refusals
> (and the offlining arg to unmap_and_move() etc).  But perhaps there's
> more to it, I haven't gone over it properly.

Removing the restriction sounds good. In addition to
compaction/AutoNUMA etc.. KSM pages are marked MOVABLE so it's likely
not good for the anti frag pageblock types.

So if I understand this correctly, there would be no way to trigger
the stable tree corruption in current v4, without memory hotremove.

> Yes, I agree; but a few more comments I'll make against the v4 post.

Cool.

Thanks for the help!
Andrea

next prev parent reply	other threads:[~2012-10-01 11:53 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-24  0:56 [PATCH v4] KSM: numa awareness sysfs knob Petr Holasek
2012-09-24  0:56 ` Petr Holasek
2012-09-27 15:45 ` Rik van Riel
2012-09-27 15:45   ` Rik van Riel
2012-09-28 11:26 ` Andrea Arcangeli
2012-09-28 11:26   ` Andrea Arcangeli
2012-10-01  0:36   ` Hugh Dickins
2012-10-01  0:36     ` Hugh Dickins
2012-10-01 11:53     ` Andrea Arcangeli [this message]
2012-10-01 11:53       ` Andrea Arcangeli
2012-10-01 23:47       ` Hugh Dickins
2012-10-01 23:47         ` Hugh Dickins
2012-10-01  1:37 ` Hugh Dickins
2012-10-01  1:37   ` Hugh Dickins
2012-10-01  8:14   ` Hugh Dickins
2012-10-01  8:14     ` Hugh Dickins
2012-10-08 23:00   ` Petr Holasek
2012-10-08 23:00     ` Petr Holasek
2012-10-23  6:28 ` Ni zhan Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121001115309.GE20924@redhat.com \
    --to=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=anton@redhat.com \
    --cc=chrisw@sous-sol.org \
    --cc=hughd@google.com \
    --cc=izik.eidus@ravellosystems.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=pholasek@redhat.com \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.